Article Title

Coordination Guided Reinforcement Learning




Table of Contents

Introduction

Two-Level RL System

Experimental Results

Discussion and Related Work

Conclusion





Excerpt from the Article

Only the correct methods

A tabular function is equivalent to linear function approximation in which each feature is a Boolean variable corresponding to one entry of the table. Here we compare only the coordinated and the fixed players. The soccer field is 6 × 4 units, and the RL side has 2 soccer players playing against 1 player for each opponent type. The RL players used ϵ-greedy policies with discounting (γ = 0.99), a constant ϵ = 0.1, and a constant step size (α = 10⁻³).
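To illustrate the equivalence mentioned above, the following Python sketch writes a tabular action-value function as linear function approximation over Boolean (one-hot) indicator features, with ϵ-greedy selection and the constants quoted in the excerpt (γ = 0.99, ϵ = 0.1, α = 10⁻³). The state and action counts, the helper names (phi, q, update), and the use of a plain Q-learning update are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Illustrative sketch: a tabular action-value function expressed as linear
# function approximation, where each Boolean (one-hot) feature corresponds to
# one (state, action) entry of the table. Constants follow the excerpt:
# gamma = 0.99, epsilon = 0.1, alpha = 1e-3. The sizes below and the plain
# Q-learning update are assumptions for illustration only.

n_states, n_actions = 24, 5          # e.g. a 6 x 4 grid of positions, 5 actions
gamma, epsilon, alpha = 0.99, 0.1, 1e-3

w = np.zeros(n_states * n_actions)   # one weight per table entry

def phi(s, a):
    """Boolean indicator feature vector for the (s, a) table entry."""
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

def q(s, a):
    return w @ phi(s, a)             # identical to looking up the table entry

def epsilon_greedy(s, rng):
    """ϵ-greedy action selection with constant ϵ."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax([q(s, a) for a in range(n_actions)]))

def update(s, a, r, s_next, done):
    """Constant step-size update; reduces to the usual tabular update rule."""
    global w
    target = r if done else r + gamma * max(q(s_next, b) for b in range(n_actions))
    w = w + alpha * (target - q(s, a)) * phi(s, a)
```

Because phi(s, a) is a one-hot vector, each update changes exactly one weight, which is why the linear and tabular views coincide.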






Keywords:

Coordination Guided Reinforcement Learning

Qiangfeng Peter Lau♠, Mong Li Lee† and Wynne Hsu§
Department of Computer Science, National University of Singapore
13 Computing Drive, Singapore 117417, Republic of Singapore
{plau♠, leeml†, whsu§}@comp.nus.edu.sg

Appears in: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Conitzer, Winikoff, Padgham, and van der Hoek (eds.), 4-8 June 2012, Valencia, Spain. Copyright © 2012, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

ABSTRACT

In this paper, we propose to guide reinforcement learning (RL) with expert coordination knowledge for multi-agent problems managed by a central controller. The aim is to learn to use expert coordination knowledge to restrict the joint action space and to direct exploration towards more promising states, thereby improving the overall learning rate. We model such coordination knowledge as constraints and propose a two-level RL system that utilizes these constraints for online applications. Our declarative approach towards specifying coordination in multi-agent learning allows knowledge sharing between constraints and features (basis functions) for function approximation. Results on a soccer game and a tactical real-time strategy game show that coordination constraints improve the learning rate compared to using only unary constraints. The two-level RL system also outperforms an existing single-level approach that utilizes joint action selection via coordination graphs.

Categories and Subject Descriptors

I.2.6 [Artificial Intelligence]: Learning; I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, Search

General Terms

Algorithms, Performance, Experimentation

Keywords

Reinforcement learning, guiding exploration, coordination constraints, factored Markov decision process

1. INTRODUCTION

Expert knowledge is commonly employed in large-scale reinforcement learning (RL) in a variety of ways. In particular, hierarchical RL handles single-agent Markov decision processes (MDPs) by recursively partitioning them into smaller problems using a task hierarchy [19, 7, 1]. The task hierarchy constrains the solution space (policies) of the learning problem so that only relevant actions for a task can be selected at each time step. Learning a good task selection policy will direct exploration towards the more promising parts of the MDP.

For multi-agent problems, each agent has a set of actions whose Cartesian product forms the joint action space. This space is exponential in the number of agents and therefore, RL with naive exploration is slow. Hierarchical RL has been adapted to multi-agent problems [15, 9] by having one task hierarchy per agent where the actions are selected jointly. Once each individual agent's task is selected, it will have a constrained (reduced) set of actions to consider. However, this framework cannot be easily extended to incorporate coordination behavior among multiple agents.

Consider Fig. 1, which depicts a state in a soccer game where player P1 has the ball. Let N, S, E, W be the four compass directions. P1's action set is A1 = {S, E, pass2, pass3, shoot}, where pass2 and pass3 denote passing the ball to players P2 and P3 respectively, and shoot denotes the action of kicking the ball into the goal. Players P2 and P3 have the action sets A2 = {N, S, E, W} and A3 = {N, W} respectively. We denote a joint action as ⟨a1, a2, a3⟩ ∈ A1 × A2 × A3. The size of this joint action space is 5 × 4 × 2 = 40. A closer examination reveals that much of this space does not need to be explored, as those joint actions are unlikely to lead to a winning state. For example, P1 certainly should not pass the ball to P2 if P2 is moving adjacent to an opponent, as the ball can easily be intercepted. With this simple coordination strategy, the set of disallowed joint actions is {pass2} × {S, E, W} × A3. Similarly, P1 should not pass the ball to P3, and the set of disallowed joint actions is {pass3} × A2 × A3. Immediately, the size of the joint action space is reduced by 35%.

Figure 1: Example states in a simplified soccer game, white versus black players. (a) Bad pass. (b) Good pass.
In this paper, we focus on a central learner with multiple agents where communication is free. This corresponds to the scenario of a computer player managing an army in real-time strategy (RTS) games or a team of players in soccer. We aim to exploit coordination knowledge for improving the learning rate of good policies by modeling coordination among agents