PREPARATION OF ARTICLES IN TWO COLUMNS

MICRO-ROBOTS PLAYING SOCCER GAMES: A REAL IMPLEMENTATION BASED ON A MULTI-AGENT DECISION-MAKING STRUCTURE

JOSEP LLUIS DE LA ROSA peplluis@eia.udg.es
ALBERT OLLER aoller@etse.urv.es
Universitat de Girona
Universitat Rovira i Virgili

Abstract: Multi-Agent decision-making structures are expected to be extensively applied in complex industrial systems. Multi-robotic soccer competitions are a forum where rules and constraints are good to develop for MIROSOT, and test them. This paper describes an implementation of such a robotic team consisting of three micro-robots. It shows an agent-based decision-making method that takes into account an index of difficulty to execute collision-free path obtained by a path-planning method specially devised for this domain. Some real situations are used to explain how robots behave, and which are their movements through the playground. Then, some agents persistence problems are analyzed, and simulation results are shown.
Keywords : Decision-making, Multi-Agent Systems, Co-operation, Multi-robots, MIROSOT.
I. INTRODUCTION
Decision-making structures are expected to be applied in complex industrial systems (namely distributed systems like energy or material distribution networks, big production plants with several stations and dynamical or discrete processes with huge number of variables)
These systems are so-called Complex Systems (COSY) because all of them have some common properties: complexity, dynamism, uncertainty, and goal variability. COSY can be physically distributed, i.e., resources, data bases, plants, and so on.
To facilitate the automatic control community to develop agents in control systems analysis, design, and implementation, this work proposes to incorporate agents’ capabilities and facilities in Computer Aided Control Systems Design frameworks (CACSD). These facilities were initially introduced in [ 1] by means of an Agent-Oriented Language (AOL) called AGENT0[ 2] that consists of a language to program agents by means of belief, goals, commitments and capabilities.
This system is developed to do some research on physical agents design, and practical results are obtained from MIROSOT’97 [ 3] competition. Because of FIRA [ 4] rules, robots are limited to the size 7,5´ 7,5´ 7,5cm. and must run autonomously through a 130´ 90 cm. sized ground. So far, teams can use a centralized vision system to provide position and orientation of the robots so as ball position. Does this domain implies Agent Distributed Artificial Intelligence? This domain possesses many characteristics given that it is a real-world system, with multiple collaborating and competing robots. Here follows some properties [ 5] :
Complexity. Playmates do not possess perfect information about the game field, provided by the vision system.
Dynamism. Multiple playmates or opponents robots can independently execute actions in this environment.
Uncertainty. A real-time system with many vehicles ensures that even identical start states lead to very different scenario configurations and outcomes.
Goal variability. Playmates are constantly moving, then an individual robot at any one time is executing individual actions. When team plan changes, i.e., attack instead of defend, individual goals change instantaneously.
This properties fulfil in the COSY. This micro-robotic system is designed to get up to 0.6 m/s controlled motion. Work went on specializing the approach of automatic control community by introducing the classical structure behavior- supervision- control algorithms [ 6][ 7] . Thus, both the multi color based-vision system and the micro-robot control structure are specified in this paper.
This work focuses on how co-operation is implemented in this dynamic and real-world (real-time) domain. This co-operation among robots is introduced at the behavior (agent) level, that is also responsible for the behavior of each robot by exchanging messages to the supervision level (see "Fig.K").
Finally, this work has developed the soccer system that contains the following new features:
1) Explicit partition of the behavior-supervision-control levels, developing mainly the behavior-supervision levels.
2) Exchange of information between behavior « supervisor levels. The supervisor (developed with toolboxes with MATLAB/SIMULINK (M/S) [ 8] ) executes targets proposed by the behavior level, and the behavior level obtains suggestions about difficulty indexes in possible targets to feed its co-operative behavior.
3) The exchange of fuzzy evaluations in the behavior « supervisor communication is included. The fuzzy capabilities in the AOL of the behavior can use now fuzzy evaluations of difficulty indexes in possible targets.
II. INTELLIGENT AGENTS
A. What Agents Are?
The agent A beliefs in good_weather that agent B has told him. Before this first agent introduces this belief in his knowledge base together with other evidences he will criticise this information, that could be called modification of beliefs. The process of critics or review could work with three hypotheses:

Agent B wants that A goes out with him on excursion, and therefore is perfectly possible that he's trying to lie him. Perhaps A will not belief that good_weather because B isn't reliable.

Agent B, is an unreliable system because its reasoning or perception isn't good. For instance, that continuously says good_weather regardless of the current weather. In this case, the certitude coefficients that could emit aren't neither reliable.

Agent A beliefs whatever B informs (classical approach in systems exchange of information).

B. What Physical Agents Are?
Definition 1: Software agents. This term denotes a software-based computer system that has several properties [19] as autonomy, social ability, reactivity, pro-activeness, mobility, rationality, etc.
Physical Agents are software agents that contain the N/S (Numerical/Symbolical) and S/N (Symbolical / Numerical) interface that is typical of real systems, which according to [1] and [8] are constrained by imprecision, uncertainty, changing through time, and others.
One typical implementation of physical agents (but not the unique) is mobile robots, that in current research are progressively more and more autonomous and co-operative. The traditional AI has focused on symbolic paradigms (toy problems) and has not expended time on real applications. On the other hand, robotics has focused on design and construction of hardware and its control.
For solving current problems in autonomous robots, traditional AI has evolved into perception based and multi-agent approaches. The conjunction of AI and robotics in autonomous mobile robots that solve in an emergent way, complex problems by Cupertino is important, especially when the environment continuously changes because of the movement of the physical agents.
Having a "physical body" according to [18] could be summarised as:

Sensorial and action capacities are closely related.

The agent’s sensorial and actuation spaces have to be abstracted in the onboard resources (CPU memory, CPU speed, controllers, etc.).
This abstraction depends on the interactions that this agent experiments in the environment.
As a result of the abstraction, every agent has a representation model of the environment.
The interactions among agents and environment-agents are asynchronous, complex, and in concurrency in the real world. The physical agents are continuously running in continuous time.
On the other hand, the real physical interactions give good data for learning (software agents lack this).

Definition 2: Extension of the agent concept from software to physical agent. This definition consists of definition 1 and also a physical body.

Let us assume, for reasons of performance, that there is no single agent capable of doing a task alone. Every agent has to apply to the others for help. Then, each individual has to decide between the following problems: Which agent am I going to help to? On what terms am I committed to? The decision could be either deliberative or reactive. An interpretation of commitment [2] will be used for the agents. Agents that accept to help (are committed), have to know the implications of these commitments. In other words, whether they could do it or not.

For knowing what is possible or not to be done, some physical knowledge has to be taken into account. This means, physical inputs and outputs from the environment have to be mapped in the knowledge base of each agent. This is because agents have to control their physical body by means of proper physical decisions.

C. What Dynamical Physical Agents Are?

Previous analysis is true but now consider systems whose movements can be described by differential equations, that is, can be portrayed by its dynamics. Then the automatic control theory has something to assert.

Definition 3: Extension of the agent concept from physical agent to dynamical physical agent. Complementing Definition 2 with the following new assertion: The knowledge is obtained from dynamics of the physical body, which is represented by a further declarative control and supervision levels [16] [17].

III. MATERIALS AND METHODS

Planning of coordinated motions for robots playing soccer game opens a new deal in motion planning, namely coordinated motion planning in team competition. The game situation continuous varies, that is, this kind of game consist of a dynamic system which constantly evolves.

The robots are devised and controlled taking into account two main tasks:

Low-level tasks. Main task is to execute the movement commands with accuracy. This level consists of velocity closed-loop control implemented on the robots.
High-level tasks. Main task is to generate the movement commands to be executed by the robots. This process is done by using two computers: one for the vision system and the other one for the command generation procedure.

A: Technical Specifications of the Robots:

According to the FIRA rules, robots have strong limitations on their size so that communication features, such as autonomous motion capabilities, must be implemented by using miniaturized instruments. Therefore, it is necessary to design a control architecture that can adequately control the robot without overtaxing the available computation resources. Paying attention to "Fig. 1", robots are composed by one receiver, a CPU unit, a power unit, two micro-motors, and batteries for motion, communication and control unit supply voltages.

1. GENERAL VIEW OF THE MICRO-ROBOT

The hardware robot is devised from a 87C552 microcontroller, which is capable to change the speed of the robot using two PWM outputs according to the velocity control requirements. The actuators are two DC motors, including gear and encoder in the same axis of the motor. The connection between microcontroller and these motors is made through a L293 chip. Optional sensors consist of two photoelectric ones: one is placed high enough in order to detect the obstacles but not the ball, and the other is placed to detect the ball into a closed area in front of the robot. By integrating these two sensors, robot control procedure can efficiently distinguish which kind of thing is detected in front of the robot: ball or obstacle. Finally, communication between the host computer and robots is carried out by RS232 radio-communication, then every robot has one receiver and one identifying code (here called ID_ROBOT).

Here follows a brief description of some important components of the robots:

87C552: This microcontroller contains a non-volatile 8K´ 8 read-only program memory, a volatile 256´ 8 read/write data memory, 6 configurable I/O ports, two internal timers, 8 input ADC and a dual pulse with modulate output.

L293: Contains two full bridges and drivers, with an external component which is capable to control the speed and the sense of rotation of both motors.

Sensors: Two optional photoelectric switches with adjustable distance of detection. Its outputs are open collector.

Motors: Maxon DC motor, 9V, 1,5 W, nominal speed of 12.300 r.p.m., and Maxon gear (16:1).

Encoder: Digital magnetic encoder with 16 counts per turn.

Receiver: Super-heterodyne receiver with monolithic filter oscillator (433,92 Mhz).

Batteries: Each robot is supplied by 9 batteries of 1,2 V - 1000 mA/h.

2. TOP VIEW OF ROBOTS (A) BOARD DETAILS, (B) CHASSIS SCHEMA

B. Overall System

Robots do not have any camera on-board so that a global vision system is needed. In this work, the vision system [ 9] is implemented on a Pentium PC-computer (PC_2, see "Fig. 3") which process 7 frames/s. Obtained data are positions of all the objects in the game field (ball and robots) and orientations of the proper team robots. Data are shared with another computer using TCP/IP communication protocol (PC_1, see "Fig. 3"), and then used for the command generation procedure. This computer host also process strategy parameters, decision-making and collision-free path planning algorithms. The emulation of software agents is in the host computer with feed back from the physical agents, that is to say, the robots.

3. HIGH-LEVEL CONTROL STRUCTURE

C. Low-level Control

This section explains the main program of the robot which allows velocity control. When movement commands sent by the host computer are received by the robot, they are immediately executed. Every command has the following frame:

ID_ROBOT | #movements | mov_1 | mov_2 |. . . | ETX (1)

,where mov_X := linear_velocity | rotation_ velocity | time. Since the maximum speed of the robot can be 120m/s (straight way), usual speed values are set up to 60cm/s using a slope function in order to avoid skidding. When the last movement is executed, another slope function is used to stop the robot. The control of the movements is carried out by using three signals: those coming from the encoders (velocity closed-loop) and batteries voltage.

While movements are executed, the sensors can interrupt the main program because photoelectric sensors are directly connected to the interruption lines. When an obstacle or a ball is detected, the robot can react according to interruption function procedure; on-board sensors can be helpful to the robot in order to react in front of obstacles and walls, or look for the ball. These two sensors are optional in the sense that it is possible to get the robot totally or partially controlled with information from the host computer. Without sensors (disabled) the command frame is the previous frame (1), but when the sensors are enabled then frames must be as follows:

mov_X := linear_velocity | rotation_ velocity | time | behavior_byte

The value ‘behavior_byte’ is used as a parameter in the interruption function so that the movement of the robot becomes technologically speaking more autonomous, that is to say, reactive. Then, this kind of frame allows the robot itself to respond to contingencies in real time. However, these sensors were disabled for MIROSOT competition to avoid purely reactive behavior because they can get the robot stuck in loops. An example of this loops occurs when the robot is placed in front of the wall: the host computer sends commands to move the robot forward in order to turn it whenever the reactive robot is forced to back up when upper IR sensor detects the wall.

Although random perturbations will eventually get a reactive robot out of many loops, the environment of the soccer game in MIROSOT competitions is very dynamic: it consists of six robots moving fast through a 130´ 90 cm. sized ground. Under this point of view, the sensors in MIROSOT competition where disabled. In future competitions the sensors will be enabled using internal states to improve performance as shown in [ 10] .

D. The vision System

Every robot is equipped with two color bands: the first one is the color of the team and the other one is a robot specific-color. Finding the center of gravity of both regions allows the system to know the position and orientation of the robots.

The vision system is composed of a color camera, an specific color processor [9] and a multiple purpose frame-grabber. The camera is located at 2 meters over the playing field and the vision system is able to segment and track the moving robots and the ball over a sequence (see "Fig. 3"). The segmentation step is performed by means of an specific processor which performs a real-time conversion from RGB to the HSI space [ 9] . This processor uses the discriminatory properties of the three color attributes: hue, saturation, and intensity, in order to segment everyone of the regions in scene. Selecting different values o these attributes for every object avoids the correspondence problem in multi-tracking environments. The result of the segmentation model is an ../files/image 768´ 576 with several labeled regions on a black background. The cycle time of the segmentation module is 40 ms, that is, video rate on the CCIR standard.

The second step consists of the computation of the position and orientation of every object by using the enhanced ../files/image from the segmentation module. As a first step a median filter is applied to the region-labeled ../files/image, in order to reduce noise. Then a multiresolution approach is applied: a second ../files/image is constructed, taking one pixel over four from the filtered ../files/image. This second ../files/image is scanned searching for the labeled regions, and when these regions are detected then the exact position is computed in the high-resolution ../files/image, to increase accuracy of the measurements.

4. TRACKING SEQUENCE

There is a state vector which keeps the information about the position of every robot. At every time step, a new prediction is performed of the next position of every object using a 2D polynomial regressive model over the low-resolution ../files/image. This prediction is used in the next time step in order to search for the object, and an error factor is measured to be taken into account for the next prediction. The cycle time of the tracking module depends on the data to be processed and the accuracy of the prediction. Normal cycle time is around 150 ms.

At the end of every processing cycle the vision system sends the state vector to the host computer by means of a TCP/IP connection.

E. Path Planning

The used algorithm is based on Artificial Potential Fields (APF) approach [ 11] and some modifications proposed in [ 12] . In that way, the playing field is equivalent to a 36´ 48 matrix. Then, a Cartesian co-ordinate system is used to define the position of the center of each square cell, which allocates the value of the associated potential. When the guidance matrix elements have been settled, the algorithm will systematically scan the robot’s three neighboring cells to search for a potential value that is lower than that of the cell currently occupied by the robot. The trio of the cells to be scanned (see "Fig.E") depends on the orientation information given by the relative direction from the robot to the set-point position, that is to say, eight possible orientations: {North (N), North-East (NE), East (E), South-East (SE), South (S), South-West (SW), West (W), North-West (NW)}. In such a way, the scanning method is goal finding.

5. EXAMPLE. ONLY CELLS WITH DASHED LINES ARE SCANNED WHEN RELATIVE DIRECTION IS SE

Obtained guidance matrix give a path that could followed by the robot but the real execution of such path has no sense because it has a high number of turns. Therefore, a filtering process is applied and a smoothed path is obtained, as shown in "Fig. 6". At the end of the algorithm, an index of difficulty to execute this collision-free path is obtained taking into account the number of turns and the whole distance to be covered by the robot. This index of difficulty is normalized from 0 to 1:

0 ® Set-point position is straight way.

1 ® Set-point position can not be reached in 500 steps.

F. Co-operation

The decision-making structure is devised in two hierarchical blocks: higher one contains those tasks related to the collectivity (co-operative block), that is to say, communicative actions, and lower one where the agent executes its private actions (private block).

6. EXAMPLE OF COLLISION-FREE PATH FINDING RESULT. (A) 3-DIMENSIONAL VISUALIZATION OF THE POTENTIAL FIELD ASSOCIATED TO THE GAME FIELD. THIS EXAMPLE CONTAINS FOUR OBSTACLES. (B) ZOOM IN AREA: DOTTED LINE CORRESPONDS TO FILTERED PATH. INDEX OF DIFFICULTY = 0,7.

Private Block. Every agent is able to execute several actions according to a set of capabilities, in terms of AGENT0 language. In this block, this set is forced to include persistence capabilities because the environment is dynamic and continuously changing. This block manages several types of information so that three levels are defined depending on the abstraction degree of this information (see "Fig. 11"): agent level (behavior decisions), supervisory level (path planning -hidden intentions- and set-point calculations), and control level (execution of set-points).

Here follows some examples of real situations with only one robot; it must score to the right side of the plots. Data are acquired every 150 ms., and plots uses a Cartesian co-ordinate system where units corresponds to ‘cm’. In this examples, the behavior of the robot is set manually but the system automatically assigns different actions to the robot.

Example 1:

Situation® "The ball is not moving and close to the goal"

Behavior ® Defend_as_goalkeeper

Actions® Go_back_to_ball, Stand_close_to_goal_following_the_ball

Example 2:

Situation®"Initially the ball is not moving. Then it moves"

Behavior ® Defend_as_goalkeeper

Actions® Go_back_to_ball, Stand_close_to_goal_following_the_ball

Example 3:

Situation®"The ball is not moving and it is easy to score"

Behavior ® Attack_as_goal_scorer

Actions® Go_back_to_ball, Obtain_good_orientation, Shoot

7. (A) EXAMPLE 1, (B) EXAMPLE 2

Example 4:

Situation ® "The ball is not moving but it is not easy to score"

Behavior ® Attack_as_ goal_scorer

Actions® Go_back_to_ball,.Avoid_wall, Obtain_good_orientation, Shoot

8. (A) EXAMPLE 3, (B) EXAMPLE 4

Example 5:

Situation ®"The ball is not moving"

Behavior ® Protect_goal

Actions ® Go_to_baricenter, Stand_there

9. EXAMPLE 5

Co-operative Block. The agent level is the connection between the agent itself (autonomous robot) and the collectivity so that it includes communicative actions to request or inform about three parameters which will be helpful to the agent decisions. These three parameters consist of which behavior has been decided (private block), what is the index of difficulty of its execution, and what is the relative distance from the robot to the set-point position. This block is based on Agent-Oriented Programming (AOP), and uses a subset of the communicative primitives defined by AGENT0: {‘request’, ‘inform’}.

With this structure, decision-making procedure can take into account both collective and individual decisions, based on the communication between agents and the exchanged information between the behavior and supervisory levels. By this way, collective decisions are based on strategy rules, and individualized decisions are only based on the index of difficulty and the relative distance from the robot to the set-point position. These two parameters are calculated in the supervisory level and, because it is contained into private block, it contains different parameters for every agent, in other words, different behavior for every agent that is included in the path planning method.

Here follows some more examples of real situations with three robots; one has ‘Protect_goal’ behavior constantly assigned. In this examples, the behavior of two robots are set automatically by co-operation procedure using relative distances as evaluation parameters. Although these two examples use relative distance as the evaluation parameter, the use of the index of difficulty as the evaluation parameter shows the same results. It must be noticed that both parameters are useful when obstacles are present in the game. Then, behaviors are changed when relative distances are on some predefined threshold values.

Example 6:

Situation®"The ball is not moving and near to the wall"

Behavior_1® Protect_goal (because D1>D2) Behavior_2 ® Attack_as_ goal_scorer (D2<D1)

Actions_1 ® Go_to_baricenter, Stand_there

Actions_2 ® Go_back_to_ball, Avoid_wall, Obtain_good_orientation, Shoot

Example 7:

Situation ® "The ball is not moving and near to the wall"

Behavior_1® Attack_as_goalie (because D1<D2) Behavior_2 ® Protect_goal (D2>D1)

Actions_1® Go_back_to_ball, Avoid_wall, Obtain_good_orientation, Shoot

Actions_2 ® Go_to_baricenter, Stand_there

10. (A) EXAMPLE 6, (B) EXAMPLE 7

IV. DESING OF THE CO-OPERATION SYSTEM

The scheme in "Fig. 11" represents the way the information, analysis, design and implementation, is distributed in each level. Also, all co-operative actions are responsibility of the behavior level. The AOP and agents could be applied to any levels although to start the application of agents in the behavior level is advisable.

11. THIS SCHEME REPRESENTS THE WAY THE INFORMATION MOVES

The co-operation of the team of robots is done at the agent (behavior) level, that is also responsible for the behavior of each robot by exchanging information to the supervision level. Originality of this work is the representation of co-operative behavior by means of an AOL that is developed in a CACSD framework so that the link behavior-supervisor could be easily implemented with tools from the automatic control field.

However, new difficulties appear in the functional protocol between the behavior and the supervisor levels, namely the persistence of goals that the AGENT0 language lacks. This lack is overcome by tuning the sample time of agent® supervisor communication that leads to persistence develop at the supervisor level.

A. Persistence: Which is the problematic of all three levels [ 6][ 13][ 14] ? The fact is that the behavior level controls the path planning of the supervision level, and the latter calculates set-points and supervises the performance of the control level that is responsible for the effective movements of the micro-robots. The environment of the game is changing continuously and this represents a difficult scenario for doing path planning that is responsibility of the supervision level. The behavior level must overcome this problem by choosing correctly the target positions that the path planner must get through. But, to do that the behavior level needs of some evaluation of the difficulties of possible targets, to decide what to do and then start the co-operative procedures to finally precisely decide what to do.

In continuously changing scenario the robot could be the whole time changing decisions of where to go. This problem we call it lack of persistence because from agents’ point of view, the decisions of the behavior level do not last enough time to success in targets, regardless of whether the targets are accurate or not. This persistence could be developed at the behavior level or the supervision level. If the behavior level does not work till the last target is succeeding by the supervision level then this is the case the responsibility for persistence is due to the supervision level. On the other hand, if the behavior level is the whole time inspecting the accuracy of targets but explicitly dealing with persistence of goals so that any change is a measured decision (not purely reactive), and then this is the case the responsibility is due to the behavior level.

To implement the latter case, responsibility for persistence in the behavior, a new AOL is required to explicitly represent intention, because the AGENT0 language does not contain this specification. Future work is to implement this persistence at the behavior level by means of PLACA language, following the instructions of [ 15] . Therefore, in this work the former case of responsibility due to the supervision level is initially chosen because of its simpler conception.

To solve the problem of difficulty evaluation of situations to help the behavior level to decide, a fuzzy measure is implemented, obtained from the path planner procedure (section 2.5.). This difficulty measure depends on the value of two parameters: the number of cells (or matrix elements) to be crossed by the robot, and the number of scanned cells. With the first parameter, the supervisor level can obtain a measure of distance and, combining both parameters a fuzzy measure of turns can be also obtained. When supervisory level knows these two values, behavior (agent) level uses them to improve its reactive and cooperative decisions.

Here follows another example similar to example 6, but now the ball is moving. In this example the persistence is at the behavior level, and only the relative distance is used as evaluation parameter.

Example 8:

Situation ® "The ball is moving. Start point is near to the wall"

Behavior_1 ® Protect_goal (because D1>D2)

Behavior_2 ® Attack_as_goalie (because D2<D1)

Actions_1 ®Go_to_baricenter, Stand_there

Actions_2 ®Go_back_to_ball, Obtain_good_orientation, Go_back_to_ball

EXAMPLE 8. SHOWING THE PERSISTENCE OF ROBOT 2

B. Simulation Results: The simulator introduced in [ 1] that uses the tools of M/S contains good continuous dynamic behavior of the soccer micro-robots and ball movements. The behavior level is inspected in some cases, thus testing the strategy and its impact in a continuous world. The simulator also provides with animation tools to see the evolution of the system over the soccer field. One result due to the discrete nature of the behavior level is that cyclic sequences sometimes appear because of the non linear components of the negotiation algorithms. For instance, cycle normally comes up when two agents decide ‘Go_back_to_ball’ action at a time: the fulfillment evaluation oscillate periodically because the environment conditions change significantly in a short period of time.

However, the simulation permits to evaluate the working conditions of the decision process and its impact in a continuous world. This decision process depends on the selected profile of each agent in the case that specialized rolls are intended.

V.CONCLUSIONS

The previous work [ 1] contained important new items such as the adaptation of an AOL (AGENT0) to the M/S application. Although M/S has not proper language to implement AOL the AGENT0 development was possible although difficult. That implementation contemplated the automatic code generation with RTW (Real-time Workshop) toolbox of M/S that was partially solved. Another novelty was the inclusion of fuzzy logic in the behavior decision process, incorporated to the AGENT0 language.

Lacks of the current work:

A.- The persistence problem is not finely solved because the solution of focusing on the supervision level to maintain the decision of the behavior level is rather rough due to the highly changing nature of the soccer game. The persistence means to maintain previous decisions while it is obvious there is no proper alternative.

B.- To link supervision « control levels is also necessary because supervision systems are not so far fully developed to calculate physically good set-points that the control level of the robot could execute. To incorporate this exchange, new controllers technology should be included to robots.

C.- A new language closer to control engineers and applied mathematics is necessary to develop agents under AGENT0 language.

D.- Engineers do still not accept the design methodology proposed in previous work [ 1] since they do not have the tools to develop it.

The experience in MIROSOT is about the lack of robustness in our approach since, even though the persistence in agents was a task of the supervision level, the generation of events from the dynamically (continuous state and time dynamically variables) is a complex matter. The suggestion to solve it is to apply theory from hybrid systems.

Future work is to introduce predictive control algorithms in the control level to easy the supervision « control activity. Also, the PLACA language [ 15] will be included to compare how does it improve our current work in terms of persistence and analysis representation poverty, after the improvement of the behavior « supervision communication contents. Finally, an increase of the use of agents in simple automatic control application is expected, with the development of agents’ librarian toolboxes within this framework.

VI. ACKNOWLEDGEMENT

This work has been partially funded by the TAP98-0955-C03-02 "Diseño de agentes físicos (DAFNE)" y TAP99-1354-E, "Eurobot-Iberia: red temática en agentes físicos" of the CICYT Spanish government.

VII. REFERENCES

[ 1] De la Rosa J.Ll., Oller A., et al. "Soccer Team Based on Agent-Oriented Programming". Robotics and Autonomous Systems. 1997. No. 21, pp. 167-176.

[ 2] Shoham Y. "Agent-Oriented Programming". Artificial Intelligence. 1993. No. 60, pp. 51-92.

[ 3] MIROSOT is the acronym of IEEE Micro-Robot Soccer Tournament, held in November 5-11 in KAIST (Taejon, Korea), 1997.

[ 4] FIRA is the acronym of Federation of International Robotic-soccer Association. http://www.fira.org.

[ 5] Tambe M. "Teamwork in real-world, dynamic environments". Proceedings of the International Conference on Multi-agent Systems (ICMAS). December 1995.

[ 6] Brooks R.A. "A Robust Layered Control System for a Mobile Robot". IEEE Journal of Robotics and Automation. March 1986. Vol.RA-2, No.1, pp. 14-23.

[ 7] Chatila R. "Deliberation and Reactivity in Autonomous Mobile Robots". Robotics and Autonomous Systems. December 1995. Vol. 16, pp. 197-211.

[ 8] MATLAB (ver. 4.2c.1) and SIMULINK (ver. 1.3c) are trademarks of Mathworks Inc. 1994.

[9] Forest J., García R. et al. "Position, Orientation Detection and Control of Mobile Robots through Real Time Processing". Proceedings of World Automation Congress, Robotic and Manufacturing Systems. May 1996. Vol. 3, pp. 307-312.

[ 10] Gat E., Rajiv R., et al. "Behavoir Control for Robotic Exploration of Planetary Surfaces". IEEE Tran. Robotics and Automat., 1994. Vol. 10, No. 4, pp. 490-503.

[ 11] Kathib O. "Real-Time Obstacle Avoidance for manipulators and Mobile Robots". International Journal of Robotics Research. 1986. Vol. 5 No.1, pp. 90-98.

[ 12] Clavel N., Zapata R., et al.. "Non-holonomic Local Navigation by means of Time and Speed Dependent Artificial Potencial Fields". IFAC Intelligent Autonomous Vehicles. 1993. pp.233-238.

[ 13] Albus J.S. "Outline of a theory of Intelligence". IEEE Transactions on Systems, Man and Cybernetics. May 1991. Vol.21, No.3, pp. 473-509.

[ 14] Simmons R.G. "Structured Control for Autonomous Robots". IEEE Transactions on Robotics and Automation. February 1994. Vol. 10, No. 1, pp. 34-43.

[ 15] Thomas S.R. "A Logic for Representing Action, Belief, Capability, and Intention". Stanford working document. Stanford, CA. 1992.

[16] Kitano H., Veloso M., et al., "The RoboCup Synthetic Agent Challenge 97". XV IJCAI-97 International Joint Conference on Artificial Intelligence, Vol 1, pp.24-29, Nagoya, August 1997.

[17] Oller A, de la Rosa J.Ll., and del Acebo E., "DPA2: Architecture for Co-operative Dynamical Physical Agents", MAMAAW ’99, Valencia June 1999

[18] Asada M., Kuniyoshi Y., et al. "The RoboCup Physical Agent Challenge". First RoboCup Workshop in the XV IJCAI-97 International Joint Conference on Artificial Intelligence, pp.51-56, 1997.

[19] Wooldridge M., Jennings N. "Intelligent Agents: Theory and Practice". Knowledge Engineering Review, October 1994.

VIII. CURRICULA

Josep Lluís de la Rosa es Doctor en Ciencias por la Universidad Autónoma de Barcelona desde 1993 http://eia.udg.es/~peplluis. Tras consecutivas estancias pre y postdoctorales en Francia, es el director del grupo de investigación eXiT, Ingeniería de Control y Sistemas Inteligentes, desde el 1994, http://eia.udg.es/exit. Sus trabajos más conocidos son los agentes físicos y especialmente los relacionados con los desarrollos con robots futbolistas, http://rogiteam.udg.es

Albert Oller: Físico por la Universidad Autónoma de Barcelona. Actualmente es profesor asociado. La investigación se centra en el ámbito del diseño de agentes físicos dinámicos.

AGENTES FÍSICOS DINÁMICOS PARA MEJORAR LA COOPERACIÓN ENTRE ROBOTS MÓVILES. EJEMPLO DE UN CONVOY DE DOS ROBOTS

JOSEP LLUIS DE LA ROSA peplluis@eia.udg.es
JOSEP ANTONI RAMON jar@eia.udg.es
Enginyeria de Control i Sistemes Intel·ligents (eXit) Escola Politècnica Superior
Universitat de Girona Universidad de Girona

Resumen: En los últimos años el tratamiento de sistemas robóticos móviles y cooperantes se ha basado en el paradigma agente. Tanto en las técnicas de análisis y diseño como en los algoritmos de decisión individual y colectiva de estos sistemas, no se tienen en cuenta las ecuaciones diferenciales que describen la dinámica del movimiento de los robots. En este trabajo se presentan algunos resultados encaminados a introducir dichas ecuaciones en la toma de decisiones individuales y colectivas de dichos agentes. Para ello se ha extendido la interpretación del agente de Shoham no ya como agente físico sino como agente físico con una dinámica propia la cual conlleva que ciertas acciones sean realizables y otras no.
I. INTRODUCCIÓN
El campo de la robótica móvil está asistiendo a una emergente cantidad de trabajos relacionados con la problemática del trabajo cooperativo en sistemas multi-robot gracias a la introducción del paradigma agente como herramienta de análisis y diseño, y a su inminente aplicabilidad. Sin embargo, aunque la introducción de dicha técnica de inteligencia artificial permita resolver problemas complejos mediante cooperación, no tiene en cuenta que su aplicación en robots móviles necesita que el agente físico sea capaz de controlar el cuerpo físico asociado, es decir, ejecutar trayectorias correctamente, seguir y evitar objetos, etc. De hecho, cualquier tarea por simple que sea contiene infinitas situaciones debido a los continuos cambios del entorno y a los propios movimientos del robot-agente, o también llamados acciones. Los estudios con robots reales como [1], [2], [3]
se dedican a testear diversas técnicas de negociación y protocolos de comunicación mientras que los problemas que causan las dinámicas de los robots se interpretan desde el punto de vista de supervisión y de detección de fallos. Es decir, son problemas que se analizan a posteriori y no en la etapa inicial de análisis.
El diseño de los controladores incorporados en los sistemas autónomos realimentados (robots móviles) se hace teniendo en cuenta diversas especificaciones en la respuesta del sistema ante diversos tipos de entradas. En este trabajo se mostrará que hay situaciones en que este tipo de análisis puede ser insuficiente puesto que no se tienen en cuenta ciertos aspectos dinámicos que pueden ser importantes no ya en la respuesta del sistema, sino en consecuencias físicas derivadas. Por tanto, aunque la decisión del agente sea correcta desde un punto de vista lógico, la ejecución de la acción correspondiente conllevará consecuencias no deseadas.

Este trabajo se centra en como evitar los problemas que se derivan del proceso de decisión cuando no se utiliza la información del mundo físico, en concreto, la información relativa a las características dinámicas del sistema físico subyacente al agente. A modo de ilustración, se utilizará un ejemplo de un convoy formado por dos vehículos autónomos: el problema que se planteará se concentra en cómo controlar la velocidad del segundo vehículo de forma que mantenga una cierta distancia de seguridad con el primero. Para la resolución de esta problemática se propone ampliar la noción del agente de Shoham [4] incluyendo parámetros dinámicos en las capacidades. Así, se obtiene un agente físico conocedor de su dinámica de forma que sus limitaciones físicas pueden influir en la toma de decisiones e inhibirlas en caso que supongan acciones no realizables. Esta ampliación supone una mejora cualitativa respecto a la que se introdujo en [5] en dónde el proceso de decisión incluía un parámetro de certeza que aparecía de forma natural puesto que el método de razonamiento estaba basado en lógica fuzzy. En este trabajo, independientemente del método de razonamiento, también aparecerá un parámetro de certeza indicativo del grado de precisión de la acción de control, que está completamente vinculado a las características dinámicas del sistema físico (cuerpo físico) del agente.
II. EL PROBLEMA DE LA COLISIÓN EN UN CONVOY
Sea un sistema formado por dos vehículos (A y B) que circulan en línea recta y con velocidades independientes. El problema de control que se plantea es como controlar la velocidad del vehículo de atrás (V_B) de forma que mantenga una cierta distancia de seguridad (D) ante variaciones de V_A. Desde un punto de vista lineal, la dinámica de este sistema convoy controlado es como sigue: el sistema A es el guía, y dispone de un sistema de control aparentemente independiente de B. El segundo vehículo tiene una estructura inherentemente más compleja puesto que el mantenimiento de la formación está completamente bajo su responsabilidad. Asumiendo que los sistemas A y B presentan ambos funciones de transferencia de primer orden en lazo cerrado (G_A(s) y G_B(s)), el comportamiento en situación de convoy del sistema B responde a un sistema de segundo orden (véase la figura 1). Este cambio cualitativo de dinámica en los sistemas convoy debe tenerse en cuenta para obtener un buen comportamiento en situación de formación.
La variable a controlar es V_B con la especificación de mantener una distancia regular D entre los dos sistemas autónomos. El problema se puede atacar con un controlador proporcional de forma que la señal de control sobre G_B(s) es: K_P´(D - (X_A-X_B)). En la figura 1 se muestra la evolución del sistema para las condiciones iniciales D=5m, V_A(t₀)=V_B(t₀)=8m/s, X_A(t₀)-X_B(t₀)=0, y con el cambio V_A(t=5)=4m/s. Como puede observarse, el comportamiento de V_B(t) es de segundo orden.

FIG.1: VELOCIDADES DE LOS MÓVILES A Y B

En el análisis previo no se ha hecho mención alguna sobre la posibilidad de colisión. Sin embargo, a causa de este comportamiento de orden superior parece razonable deducir que el uso de información sólo de tipo estático puede resultar insuficiente en las etapas de decisión agente-agente. Por tanto, la decisión basada únicamente en parámetros estáticos puede tener resultados peligrosos (colisiones, como en este ejemplo) cuando se aplica en agentes físicos dinámicos.
III. INTRODUCCIÓN DE PARÁMETROS DINÁMICOS EN LOS AGENTES
Existen dos tipos de comportamiento para los agentes: el reactivo y el deliberativo. En líneas generales, en un tipo reactivo el agente guía decidiría frenar o acelerar sin tener en cuenta la dinámica del otro agente y sería pura responsabilidad de éste el no colisionar. Este comportamiento se corresponde con lo que se ha mostrado en la sección anterior. Sin embargo, en el deliberativo los dos agentes pueden negociar e intercambiar información y el guía modifica su conducta basado en la opinión del otro. Así, cuando los parámetros dinámicos (y estáticos) de un agente no permitan la ejecución de la acción correspondiente, la decisión quedará inhibida y deberá ser negociada hasta llegar a un acuerdo. Por tanto, para basar la decisión en la dinámica de ambos agentes será necesario contar con su cooperación para que las decisiones se tomen conjuntamente. Para llevar a cabo esta idea se ha tomado como referencia el AGENT0 [4], un lenguaje de programación orientado a agentes: el agente queda especificado con diversos conjuntos de datos (capacidades, creencias y compromisos), y un conjunto de reglas de compromisos. El funcionamiento del agente se basa en la ejecución de estas reglas cuyos consecuentes son las acciones. Las acciones pueden ser comunicativas, las cuales permiten el intercambio de información, y privadas, correspondientes a los procesos internos del agente.
Este estudio se basa en incluir la dinámica de los vehículos añadiendo un subconjunto de capacidades. Estas nuevas capacidades se construyen a partir de los resultados de la simulación de la respuesta del sistema frente a un conjunto finito de consignas y con variaciones de los parámetros del controlador. Con estos resultados se crea una base de datos en donde constan, para cada consigna o ensayo, los parámetros dinámicos fundamentales de la respuesta, léase ganancia, tiempo de respuesta, parámetros de control, estabilidad, etc. De esta manera, cuando el agente guía comunique su intención de frenar o acelerar a una determinada velocidad podrá especificar directamente el tiempo requerido para llevarla a cabo, o la precisión de su ejecución u otros datos dinámicos de interés para la resolución del problema. Para sistemas de primer orden como los presentados en la segunda sección, se han utilizado los siguientes parámetros para construir las capacidades:

Acción (Act)	Consigna (SP)		Constante proporcional del controlador (K_P)
Tiempo de respuesta (t )		Error en estado estacionario (e_SS)

TABLA 1. LISTA DE PARÁMETROS USADOS EN LA CONSTRUCCIÓN DE LAS CAPACIDADES DEL AGENTE.
En estas condiciones, el agente ‘A’ está en disposición de pedir al ‘B’ si un cambio de velocidad de V_I a V_F en un tiempo determinado puede producirse sin peligro de colisión.
IV. INTERCAMBIO DE PARÁMETROS DINÁMICOS: EJEMPLO CON EL CONVOY
Una vez se ha conseguido que el agente físico sea consciente de sus limitaciones físicas, debe ser capaz de usarlas en el proceso de decisión de acciones. Para ello hace falta establecer el mecanismo de negociación y usar las acciones comunicativas para permitir el intercambio de información. De las acciones comunicativas inicialmente propuestas en AGENT0, aquí se utilizarán dos: request y inform, y respecto a las acciones privadas sólo se han definido dos: ‘frenar’ y ‘acelerar’, pero matizadas con la introducción de los parámetros dinámicos.
En cuanto al subconjunto de capacidades, se han ordenado de forma que cada capacidad quede especificada con el siguiente conjunto de parámetros: {specified_time_for_action, initial_state, final_state, certainty}. La variable specified_time_for_action t_d se define como un múltiplo (3 veces) de la constante de tiempo (t ) de la respuesta dinámica del sistema. El parámetro initial_state e_i se define como la velocidad inicial; el final_state e_f corresponde a la velocidad final; y certainty corresponde a un coeficiente calculado a partir del conocimiento de la dinámica y de la estática del sistema f(G(s),t_d,e_i,e_f). Por ejemplo, se podría definir este coeficiente a partir de la constante de tiempo y del error del sistema en estado estacionario. En sistemas más complejos se pueden definir coeficientes más elaborados: para la ejecución de trayectorias 2D libres de colisión se puede usar un índice de la dificultad de ejecución de dicha trayectoria [6].
Respecto al mecanismo de negociación, a continuación se muestra como funciona el protocolo de comunicación entre los agentes A y B para el caso SP_A<0, es decir, cuando el guía decide ejecutar la acción ‘frenar’ (en cursiva se muestran los parámetros dinámicos):
(i) ‘A’ decide cambiar su velocidad y pide confirmación a ‘B’:
request( A, B, v_init_A, v_fin_A, time, t_A, e_SS)
(ii) ‘B’ recibe el mensaje y procesa la información:

Cálculos previos: SP_A= v_fin_A - v_actual_A

Acción privada (procedimiento de búsqueda)

IF_EXISTS Essay WITH SPÎ[SP_A- e_SS , SP_A+e_SS ]
THEN SELECT Essay
t_B = t (#Essay)
e_SS_B = e_SS(#Essay)
ELSE [t_B, e_SS _B]=calculate_time(SP_A, time)
END

Acción comunicativa

IF t_B<t_A
THEN inform( B, A, v_fin_A, time)
do( B, v_fin_A, time)
ELSE request( B, A, v_fin_A, t_B, e_SS _B )
END
(iii) ‘A’ recibe la respuesta. Si ‘B’ inhibe la acción se negocia t :
IF answer IS_TYPE ‘inform’
THEN do( A, v_fin_A, time)
ELSE [t_A, e_SS _A]=calculate_time( SP_A, 4´ (t_B+t_A)/2) inform( A, B, v_fin_A, time, t_A, e_SS_A)
do( A, v_fin_A, time)
END
Descripción: una vez el guía inicia la comunicación, el otro agente busca entre sus capacidades si puede cumplir con los requisitos dinámicos y, en caso afirmativo, confirma la decisión. En caso contrario simula su comportamiento con los datos provistos por el agente guía y le confirma que puede realizar la acción pero con un tiempo y una precisión diferentes. Basándose en esta información, el guía decidirá si ejecutar o negociar la acción decidida. Aunque el mecanismo de negociación podría llegar a ser muy complejo, se ha optado por una solución de compromiso: cuando los tiempos de respuesta son inadecuados se impone un tiempo promedio para ambos.
V. RESULTADOS
En el ejemplo que se muestra a continuación se utilizarán dos móviles en formación de convoy cuyas velocidades quedan representadas a partir de las siguientes funciones de transferencia:

En una primera simulación se supondrá que el agente ‘A’ toma sus decisiones de forma completamente independiente del agente ‘B’ lo cual conllevará a situaciones de riesgo o incluso de peligro inminente. Por falta de espacio en este artículo, en este apartado sólo se mostrará una situación de peligro. Al final, se verá cómo mejora el resultado de la simulación con la introducción del mecanismo de negociación mostrado en la sección anterior.
Supongamos que el agente ‘A’ decide frenar hasta una velocidad de 2m/s menor a la actual. Por su parte el agente ‘B’ deberá reaccionar a dicho cambio de velocidad y mantener la especificación estática de la distancia de separación entre ambos (D>0). Es decir, estamos suponiendo un comportamiento reactivo de ambos agentes.
Se puede comprobar que la decisión de frenar no se podrá llevar a cabo debido a una colisión previa a la llegada al estado estacionario de los dos sistemas (véase la figura 2). Tal y como se ha discutido anteriormente, las causas de dicha colisión están en las diferencias entre las dinámicas de los dos sistemas, representadas por medio de sus funciones de trasferencia. Por otra parte, desde un punto de vista agente, el agente ‘B’ debe ser consciente (deberá tener la capacidad de introspección) de sus limitaciones dinámicas e informar de ello al otro agente.
Por lo tanto, la decisión de frenar ya no es responsabilidad exclusiva ni del agente ‘A’ ni del ‘B’, sino de los dos.

FIG. 2: RESPUESTA DE AGENTES REACTIVOS. HAY COLISIÓN.

Para obtener resultados con agentes deliberativos, la decisión deberá ser cooperativa, es decir, tendrá que haber intercambio de información entre los dos agentes. Sin embargo, para definir el método de negociación entre ambos se debe especificar sobre qué se negocia. En este experimento la negociación busca un acuerdo con el parámetro t (véase ‘Descripción’ en la sección anterior) y con la ayuda del parámetro certainty: cuando una capacidad ofrece un certainty inferior al 0.6 se inhibe. Cabe comentar en este punto que la negociación podría ser más elaborada con el uso de más parámetros (dinámicos y estáticos).
A continuación se presenta la misma situación que en el caso anterior pero resuelto con agentes deliberativos: el agente ‘A’ decide parar con t=0.3s. Después que el agente ‘B’ inhiba la ejecución a causa de su baja fiabilidad (0.52), acuerdan un valor t=0.6s puesto que ‘B’ obtiene una fiabilidad superior (0.77). Puede observarse que la distancia no se anula, es decir, se evita la colisión (véase la figura 3).

FIG. 3: RESPUESTA DE AGENTES DELIBERATIVOS. SE EVITA LA COLISIÓN.

En este ejemplo cabe comentar que los controladores son los mismos que en el caso reactivo sólo que con una mejor decisión mutua entre agentes.
Los resultados que se han mostrado son suficientes para comprobar que el funcionamiento del sistema en su conjunto es mejor. De hecho, la mejora de la decisión se debe al hecho que los agentes se intercambian información relativa a la fiabilidad sobre sus capacidades, es decir, pueden suministrar un índice de bondad de la decisión.

FIG. 4: EVOLUCIÓN DE LA DISTANCIA DE LOS AGENTES

FIG. 5: VELOCIDAD DE LOS AGENTES EN EL CASO DE DECISIÓN REACTIVA

FIG. 6: VELOCIDAD DE LOS AGENTES EN EL CASO DE DECISIÓN DELIBERATIVA

VI. CONCLUSIONES Y TRABAJO FUTURO
El análisis de los sistemas reales multi-robot en trabajo cooperativo presenta una gran cantidad de problemas si se hace hincapié en los fenómenos físicos del sistema global (agentes + entorno). La motivación de este trabajo es poder establecer un método de diseño que tenga en cuenta que el sistema en conjunto está formado por subsistemas autónomos con dinámicas propias. En la Universitat de Girona se está llevando a cabo un proyecto con micro-robots móviles reales que cooperan para desplazar objetos grandes. En este caso los problemas ocasionados por la dinámica de los robots son mucho más complejos que los presentados en este trabajo ya que hay colisiones de diversos tipos y cambios en las dinámicas de los robots cuando entran en contacto con los objetos que han de empujar. El presente estudio permite asegurar que la introducción de parámetros dinámicos en el conjunto de las capacidades del agente permite obtener un mejor comportamiento colectivo de los robots.
Aunque hay más problemas a analizar como los derivados de las limitaciones que impone la ejecución del proceso en tiempo real, queda por analizar qué tipos de parámetros deberían usarse para sistemas con componentes no lineales o con dinámicas de orden superior. En este caso la negociación debería ser más compleja puesto que el sistema podría ser no homogéneo ya que las dinámicas no serían comparables.
VII. AGRADECIMIENTOS
Este trabajo ha sido parcialmente financiado por el proyecto CICYT TAP98-0955-C03-02 "Diseño de agentes físicos (DAFne)" del MEC y por 1999GR00161 "Grup de Recerca Consolidat de Sistemes Integrats" CIRIT del Gobierno Catalán.
VIII. REFERENCIAS
R.Beckers, O.E.Holland, and J.L.Deneubourg. "From Local to global tasks: Stigmergy and collective robotics". Proc. A-Life IV. MIT Press, 1994.
McFarland. "Towards robot cooperation". Proc. Simulation of Adaptive Behavior, 1994.
L.Steels. "A case study in the behavior-oriented design of autonomous agents". Proc. Simulation of Adaptive Behavior, 1994.
Y. Shoham. "Agent-oriented programming". Artificial Intelligence, 60:51-92, 1993.
J.Ll. de la Rosa, A. Oller, J.Vehí, J.Puyol. "Soccer Team based on Agent-Oriented Programming". Robotics and Autonomous Systems. 21 (1997) 167-176. Ed. Elsevier B.V. 1997.
A. Oller, J.Ll. de la Rosa, R.García, J.A.Ramon, A.Figueras. "Micro-Robots Playing Soccer Games: A Real Implementation Based on Multi-Agent Decision-Making Structure". Intelligent Automation and Soft Computing. En prensa. 1999.
VIII. CURRICULA

Josep Lluís de la Rosa es Doctor en Ciencias por la Universidad Autónoma de Barcelona desde 1993 http://eia.udg.es/~peplluis. Tras consecutivas estancias pre y postdoctorales en Francia, es el director del grupo de investigación eXiT, Ingeniería de Control y Sistemas Inteligentes, desde el 1994, http://eia.udg.es/exit. Sus trabajos más conocidos son los agentes físicos y especialmente los relacionados con los desarrollos con robots futbolistas, http://rogiteam.udg.es

Josep Antoni Ramon es Ingeniero en Automática por la Universidad Politécnica de Cataluña. Actualmente es profesor asociado. La investigación se centra en el ámbito de la supervisabilidad de los agentes físicos mediante técnicas de sistemas híbridos.