Brückner, Sven: Return From The Ant. Synthetic Ecosystems for Manufacturing Control
The GMC system presented in Chapter 4 accepts a strategy and tries to implement the required material flow pattern. The following chapter proposes detailed concepts for the automatic evaluation of the performance of a strategy in terms of global production goals and concepts for the automatic generation of new strategies (Section 5.1). Furthermore, an integrated approach to criticality-driven visualization is presented (Section 5.2).
The purpose of Chapter 5 is to provide a starting point for future research. Even though agents and pheromones are already proposed in the following sections, further research is required to realize and validate the concepts.
System operators run complex production systems like a paint-shop based on flow-patterns. The basic tasks are executed by local control (manual and automatic) on the shop floor. The goal of the operators is to provide the global perspective and to guide and coordinate the local task execution. Their advice is given in terms of local patterns (e.g., at switch XYZ, x1 percent of all spot-repair cars go to exit one and the rest go to exit two). The local control has to translate the advice into control decisions for the single car.
The GMC system enables the operators to run the production system as they did before. The system takes care of the automatic translation of the advice into control decisions according to the current situation in the plant. The control system makes effective use of the flexibility and robustness of a production system and, at the same time, it provides the user with an intuitive way of interfering with the operation.
An automatic on-line optimization adds more layers to the advisory system, supporting an operator by selecting and even generating strategies. Instead of just executing a given strategy, the extended system tries to evaluate the effectiveness of the strategy in achieving production goals. Learning from past experience, the system accumulates strategies and their situation-specific evaluation. The space of possible strategies is searched for better performing ones at the highest layer of the advisory system.
An approach to the evaluation of the currently implemented strategy in terms of production goals is presented first. The strategy evaluation layer is the second layer of the advisory system placed on top of the strategy implementation layer. A third layer is introduced to realize strategy ranking and generation.
With the introduction of the strategy evaluation layer multiple strategies are handled by the advisory system. One of these strategies is the currently implemented one while all the other strategies are passive strategies. In the following, the currently implemented strategy is called the current one for short.
The activities of the Policy-agents in their attempt to implement their goals change the
execution of the production. The consequences may be positive or negative in terms of the usually globally defined production goals. The requirements specified by these goals have to be met by the GMC system as a whole and not by a single agent. Since the behavior of the system emerges from the actions of its agents, the evaluation of the goal fulfillment must emerge too.
A high global throughput of the production system is the most important goal in the paint shop. Global throughput is measured in the number of workpieces leaving the production system successfully processed in a fixed period. Therefore, it is sufficient to place simple agents at the exits of the system and have them count all passing Workpiece-agents. Other production goals require other methods of measurement.
When the current goal fulfillment is accessible to the agent system, the credit assignment problem is still left to be solved. The perceived output is a result of the current material flow. But, does the observed pattern match the goal of the current strategy? In other words, is the currently implemented strategy to be given credit for the current production goal fulfillment? Implementing its own local aspect of the strategy, each Policy-agent perceives the local difference between the current material flow pattern and its goal pattern. According to the perceived difference, the agent changes its advice. Thus, a Policy-agent not changing its advice must have met its goal.
The proposed emergent evaluation of a strategy in terms of production goals is related to resource-based approaches in multi-agent coordination. Each strategy in the advisory system is assigned an account. The account of the current strategy is filled continuously according to the fulfillment of production goals, and it is reduced by Policy-agents adapting their advice.
In terms of production goals, strategies are successful if their account rises and they are unsuccessful when it falls. The degree of change in the account of a strategy provides a rough estimate of the quantitative evaluation of the strategy. But, it is not more than an estimate since the proposed resource-based approach operates on a very high level of abstraction. Figure 5.1 illustrates the relations among the elements of the resource-based evaluation of a strategy.
Figure 5.1. Feedback Loop in the Resource-Based Strategy Evaluation
The implementation of the resource-based approach to the evaluation of a strategy in the example of the global throughput production goal requires two additional agent types. There are Tollbooth-agents, translating the local goal fulfillment into an input to the account of the current strategy, and there are Strategy-agents, representing a strategy and managing its account. Furthermore, the specification of a Policy-agent is extended to transmit its resource-usage when it changes its advice.
A Tollbooth-agent provides the resource input to the current strategy. A resource-input is directly linked to the fulfillment of the goal of high global throughput. The agent type derives its name from the fact that Workpiece-agents meeting a Tollbooth-agent have to pay a toll if they represent a finished workpiece. A Tollbooth-agent is usually co-located with an Unloader-agent.
After it is set up, a Tollbooth-agent requests from its Place-agent a notification when a Workpiece-agent arrives at the place (Section 3.3.2). Upon receiving such a notification, the Tollbooth-agent contacts the Workpiece-agent, requesting the processing state of the workpiece. If the state matches one of the specified final states, the Tollbooth-agent provides an input to the account of the agent of the current strategy by sending a message.
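The toll-collection step described above can be sketched in a few lines. All class, method, and state names below (`TollboothAgent`, `notify_arrival`, `credit`) are illustrative assumptions, not identifiers taken from the thesis:

```python
class StrategyAccount:
    """Holds the resource account of the current strategy."""
    def __init__(self):
        self.balance = 0.0

    def credit(self, amount):
        self.balance += amount


class TollboothAgent:
    """Charges a toll for every finished workpiece passing its place."""
    def __init__(self, final_states, strategy_account, toll=1.0):
        self.final_states = set(final_states)
        # the current Strategy-agent is known by reference
        self.account = strategy_account
        self.toll = toll

    def notify_arrival(self, workpiece_state):
        # Called by the Place-agent when a Workpiece-agent arrives;
        # only workpieces in a specified final state pay the toll.
        if workpiece_state in self.final_states:
            self.account.credit(self.toll)


booth = TollboothAgent({"painted_ok"}, StrategyAccount())
booth.notify_arrival("painted_ok")   # finished workpiece pays the toll
booth.notify_arrival("spot_repair")  # unfinished workpiece pays nothing
```

In this sketch the message exchange between Place-, Workpiece-, and Tollbooth-agent is collapsed into a single method call; in the distributed system each step would be an asynchronous message.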
The current Strategy-agent is known to the Tollbooth-agents by reference. The restriction to local communication only is broken to keep the design simple. The interaction does not take up much bandwidth in global communication.
A Strategy-agent represents one strategy. If the strategy is the current one, its implementation is attempted by its Policy-agents. Each strategy has its own set of Policy-agents, which are known to the Strategy-agent by reference.
The account of the strategy is handled by its Strategy-agent. The agent receives positive inputs to the account from Tollbooth-agents. The sequence of inputs represents the current level of the throughput-goal fulfillment. The account is reduced with every change of advice by one of the Policy-agents. Every time a Policy-agent changes its advice, the strength of the change is used to compute the resource usage. In a message, the Policy-agent tells its Strategy-agent by how much the account is to be reduced.
To cooperate within the strategy evaluation layer, Policy-agents have to extend their cyclic adaptation of their advice. After the required changes are computed, the resulting resource usage is determined. The resource usage is proportional to the absolute strength of the change. A Policy-agent sends the amount of the resource usage to its Strategy-agent.
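The account mechanics of the evaluation layer can be illustrated as follows, combining the Strategy-agent's bookkeeping with the Policy-agent's resource usage. The proportionality constant and all names are assumptions for illustration:

```python
class StrategyAgent:
    """Manages the resource account of one strategy."""
    def __init__(self, name):
        self.name = name
        self.account = 0.0

    def credit(self, amount):
        # positive input from a Tollbooth-agent (goal fulfillment)
        self.account += amount

    def debit(self, usage):
        # resource usage reported by a Policy-agent
        self.account -= usage


class PolicyAgent:
    """Adapts its advice and reports the resulting resource usage."""
    COST_PER_UNIT_CHANGE = 0.5   # assumed proportionality constant

    def __init__(self, strategy):
        self.strategy = strategy
        self.advice = 0.0

    def adapt_advice(self, new_advice):
        change = new_advice - self.advice
        self.advice = new_advice
        # usage is proportional to the absolute strength of the change
        usage = self.COST_PER_UNIT_CHANGE * abs(change)
        self.strategy.debit(usage)


strategy = StrategyAgent("low-load")
policy = PolicyAgent(strategy)
strategy.credit(2.0)       # throughput input from a Tollbooth-agent
policy.adapt_advice(1.0)   # advice change of strength 1.0 costs 0.5
```

A rising balance then indicates a successful strategy, a falling balance an unsuccessful one, exactly as described for the feedback loop of Figure 5.1.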
The behavior of the Workpiece-agents is also extended, permitting the Tollbooth-agent to request the current processing state of the workpiece.
The strategy ranking and generation layer supports the human operator in the selection of the currently appropriate strategy. Its first task is to provide a ranking of a given set of strategies according to the current situation in the production process. The second task is to explore the space of possible strategies and to change the currently available ones.
The agents realizing the adaptive ranking of a set of strategies are organized in a spreading activation network called Strategy-Ranking-Net. The abstract model of the net comprises three types of nodes (Figure 5.2). There are nodes matching the places in the PI, there are nodes representing the elements in a classification of the pattern segments at each place, and finally there are nodes for each strategy.
A classification of the locally perceivable segment of a material flow pattern is required, because potentially, there may be an infinite number of patterns. A (small) set of pattern classes permits reasoning on a range of patterns. A local classification should be designed to assign the most often encountered patterns in the actual production system to different classes.
The nodes of the places are linked to the nodes of the local pattern classes. The links are not weighted. Each pattern class node is linked to each strategy. These links are labeled. The label, a real number between zero and one, represents the current applicability of a strategy considered for a specific place and pattern class.
In the network model, a ranking of a set of strategies specific to one flow pattern is realized by a spreading activation starting at the nodes of the places. For each place, the node of the pattern class that is assigned to the considered flow pattern is activated. From there, the activation spreads via the labeled links to all strategy nodes. This last step sees a reduction of the propagated strength according to the applicability of a strategy to the flow pattern. The label of the link represents the applicability.
The activation arriving at a strategy node is accumulated in an internal activation value. After the propagation process is completed, the ranking of the strategies is given through their activation values. The largest activation value indicates the highest rated strategy for the considered flow pattern.
The network model is adapted over time by changing the relevance labels according to the performance of a strategy in the face of the current flow pattern. The performance evaluation considers production goals.
In Figure 5.2 the abstract model of the Strategy-Ranking-Net is shown. The mapping of the entities of the model to agents is indicated by the shading in the background.
Figure 5.2. Strategy-Ranking-Net Model
The model of the Strategy-Ranking-Net is closely related to the Case-Retrieval-Net model, which is applied in case-based reasoning for the efficient organization of a case base. The Case-Retrieval-Net has been developed at Humboldt-University Berlin [Lenz and Burkhard, 1996a] [Lenz and Burkhard, 1996b] and it is applied to a number of practical problems (e.g., technical diagnosis in [Lenz et al., 1996]).
The information-entity nodes (attribute-value pairs describing one aspect of a case) of a Case-Retrieval-Net relate to a pair of one place node and one pattern-class node, whereas the case nodes equal the nodes of strategies. With all case-descriptions being structurally the same (each place-pattern pair is linked to each strategy), the cases are only distinguished in the specific relevance of a strategy for a pattern. The basic retrieval process in a Case-Retrieval-Net operates in the same way as in a Strategy-Ranking-Net.
The adaptive Strategy-Ranking-Net is realized by the interaction of three agent types. There are Match-agents representing the place-pattern pairs for one place in the infrastructure. Then, there are Strategy-agents fulfilling the role of a strategy node in the model. Finally, there are Policy-agents providing evaluation information for the adaptation of the ranking.
The relevance labels are adapted continuously according to the current performance of a strategy. Furthermore, there may be requests for a ranking of the currently available strategies coming from a user at any time.
A Match-agent is able to perceive and to classify the local segment of the current flow pattern at its place. Furthermore, the agent manages the currently assumed relevance of each available strategy for each local pattern class. The Match-agent knows the strategies by reference to the Strategy-agents.
When a ranking of the currently available strategies is requested from a Match-agent, it accesses the current flow pattern and maps it to one of the pattern classes. The pattern class provides the agent with the current relevance of each strategy. Each Strategy-agent is sent an activation value computed by the multiplication of the relevance value and the activation specified in the request. Finally, the sender of the request is notified that the task of the Match-agent is fulfilled.
The Strategy-agents add up the received activation values in an internal variable. When their activation is eventually requested, they pass it on and reset the variable to zero.
The following steps have to be taken to retrieve the ranking of strategies for the current flow pattern. After making sure each Strategy-agent carries zero activation, all Match-agents are requested to spread activation according to their adapted evaluation. When confirmations have been returned from all Match-agents all Strategy-agents are asked for their activation. The strength of the activation determines the ranking of the strategies.
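The retrieval steps above can be condensed into a single function. The nested-dictionary layout and all names below are assumptions chosen for illustration, not data structures taken from the thesis:

```python
def rank_strategies(relevance, pattern_class, strategies, input_activation=1.0):
    """
    Spread activation from the place nodes through the currently
    matching pattern-class nodes to the strategy nodes, and return
    the strategies sorted by accumulated activation.

    relevance:     {place: {pattern_class: {strategy: label in [0, 1]}}}
    pattern_class: {place: pattern class perceived at that place}
    """
    # all Strategy-agents start with zero activation
    activation = {s: 0.0 for s in strategies}
    for place, classes in relevance.items():
        labels = classes[pattern_class[place]]
        for s in strategies:
            # propagation is damped by the applicability label of the link
            activation[s] += labels[s] * input_activation
    return sorted(strategies, key=lambda s: activation[s], reverse=True)


relevance = {
    "switch_1": {"congested": {"A": 0.9, "B": 0.2}},
    "switch_2": {"smooth":    {"A": 0.4, "B": 0.8}},
}
pattern = {"switch_1": "congested", "switch_2": "smooth"}
ranking = rank_strategies(relevance, pattern, ["A", "B"])
```

Here strategy A accumulates 0.9 + 0.4 = 1.3 and strategy B accumulates 0.2 + 0.8 = 1.0, so A is ranked first for this flow pattern.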
To realize the adaptation of the relevance data held by the Match-agents, the gap between the Strategy-agents, which handle the account changes, and the Match-agents has to be bridged. The reinforcement learning approach requires the Match-agents to know the performance evaluation of the currently implemented strategy. The evaluation is generated in a resource-based manner and presents itself in the change of the account of the strategy. An increasing account indicates a good strategy; a decreasing account is attributed to a bad strategy.
The performance of the current strategy is transmitted to the Match-agents using a pheromone type Evaluation (PE). Inputs to PE do not propagate. PE carries one additional data slot referencing a Strategy-agent. Policy-agents implementing a strategy perceive the change of the account of their strategy over time. The PE pheromone of the current strategy is refreshed proportional to the strength of the change.
The behavior of the Policy-agents is extended by one more cyclic process. A Policy-agent regularly accesses the current status of the account. Comparing the current status to the one retrieved in the previous cycle, the Policy-agent determines the refresh strength from the difference of these two values. After the new account status has been stored for use in the next cycle, the Policy-agent refreshes the PE pheromone matching its strategy.
Using a pheromone-based transmission of the evaluation, Policy-agents are decoupled from Match-agents. The separation is necessary because the agents operate on different time-scales. Furthermore, the computational power of the agent environment is tapped, since the strength of the PE pheromone matching the current strategy approximates the change of the account over time (its first derivative).
A Match-agent runs an additional cyclic process to realize the adaptation of the relevance data. In regular intervals, the strength of the PE pheromone of the current strategy is sampled. According to the perceived strength, the relevance label of the link from the current pattern class to the strategy is changed by a very small value. Positive pheromone strength reinforces the links. Negative strength weakens future propagation of activation.
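Both ends of this adaptation loop can be sketched together: the Policy-agent's refresh of the PE pheromone from the account difference, and the Match-agent's small update of the relevance label. Evaporation factor, learning rate, and all names are assumptions:

```python
class EvaluationPheromone:
    """Non-propagating PE pheromone; strength decays toward zero."""
    def __init__(self, evaporation=0.5):
        self.strength = 0.0
        self.evaporation = evaporation

    def refresh(self, amount):
        self.strength += amount

    def evaporate(self):
        self.strength *= self.evaporation


def policy_cycle(pheromone, account_now, account_prev):
    # the refresh strength approximates the first derivative
    # of the strategy account over time
    pheromone.refresh(account_now - account_prev)
    return account_now            # stored for the next cycle


def match_cycle(pheromone, label, learning_rate=0.01):
    # nudge the relevance label of the current pattern-class /
    # strategy link by a very small value; positive strength
    # reinforces the link, negative strength weakens it
    label += learning_rate * pheromone.strength
    return min(1.0, max(0.0, label))   # labels stay in [0, 1]


pe = EvaluationPheromone()
prev = policy_cycle(pe, account_now=12.0, account_prev=10.0)  # account rose
label = match_cycle(pe, label=0.5)                            # link reinforced
```

With an account rise of 2.0 and the assumed learning rate of 0.01, the label moves from 0.5 to 0.52, a deliberately small step that keeps the ranking stable over single cycles.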
Whereas the spreading activation network of Match-agents and Strategy-agents ranks the current set of strategies, a further extension generates new strategies, realizing an ongoing exploration of the infinite search-space of strategies.
An evolutionary search for strategies takes the fitness of the currently available strategies into account. New strategies are generated from the recombination of fit strategies including mutational changes. Strategies are deleted from the system if their fitness is low.
The fitness of a strategy is based on the relevance to the current flow pattern in the production system as perceived by Match-agents. But while Match-agents only perceive the local relevance of a strategy, the Incubator-agent considers its global fitness. The Incubator-agent is the central element of the evolutionary strategy generation. It is not located at any of the places in the PI. Instead it communicates directly with all local Match-agents. At regular intervals the Incubator-agent requests the complete mapping of pattern classes to strategy rankings from each Match-agent. The returned data is aggregated into one numerical fitness value for each currently available strategy.
In a probabilistic selection, the Incubator-agent chooses strategies for recombination and strategies for extinction. Strategies with a higher than average fitness have a higher probability of being selected for recombination. Weaker strategies are more often selected for extinction. The recombination of strategies into a new one may operate on different levels of detail. The new strategy could be a selection of pattern segments from the parent strategies (high level recombination). Or the local load requirements may be recombined (low level recombination). Mutation only changes load values.
In one Incubator-agent cycle there is always the same number of extinct strategies as there are new ones generated. The Incubator-agent executes a genetic algorithm with a constant population size. The representation of the genetic code is not a bit-vector, but a hierarchy of places and patterns.
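One Incubator-agent cycle, as described above, can be sketched as a single evolutionary step with constant population size. The simplified genetic code (one load value per place), the selection scheme, and all names are assumptions for illustration:

```python
import random

def incubator_cycle(strategies, fitness, protected=(), mutation_scale=0.05):
    """
    One evolutionary step with constant population size:
    fitness-proportional selection for recombination, inverse
    selection for extinction, mutation of local load values only.

    strategies: {name: {place: load value}}  -- simplified genetic code
    fitness:    {name: non-negative fitness aggregated from Match-agent data}
    protected:  operator-defined basic strategies, never removed
    """
    names = list(strategies)
    # parents: higher fitness, higher probability of selection
    weights = [fitness[n] for n in names]
    p1, p2 = random.choices(names, weights=weights, k=2)
    # low-level recombination: combine the local load requirements
    child = {place: (strategies[p1][place] + strategies[p2][place]) / 2
             for place in strategies[p1]}
    # mutation only changes load values
    for place in child:
        child[place] += random.gauss(0.0, mutation_scale)
    # extinction: lower fitness, higher probability of removal
    candidates = [n for n in names if n not in protected]
    inverse = [1.0 / (1e-9 + fitness[n]) for n in candidates]
    victim = random.choices(candidates, weights=inverse, k=1)[0]
    del strategies[victim]
    strategies["offspring"] = child   # population size stays constant
    return strategies


pool = {"base": {"p": 0.5}, "weak": {"p": 0.9}}
fit = {"base": 3.0, "weak": 0.1}
incubator_cycle(pool, fit, protected=("base",))
```

Protecting the operator-defined basic strategies from extinction, as required below, is handled here simply by excluding them from the extinction candidates.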
To ensure a stable operation of the advisory system, the Incubator-agent runs at a much slower rate than the other agents. Strategies need time to be implemented and evaluated in different scenarios before they have a meaningful fitness value. Furthermore, there should be a basic set of strategies defined by the operator, which the Incubator-agent is not permitted to select for extinction.
Visualization is another important issue in the extension of the GMC system. With the distribution of the actual control over the production system, there is no central point of access to data on the current operation. Furthermore, because of the complexity of the ongoing operation, human operators are not able to understand and influence the processes when provided only with raw data. Aggregation and filtering are required.
In the following, a third sub-system is proposed. The visualization system operates on information provided by the control system, by the interface layer, and by the advisory system. The visualization system provides an interface to the human user. Its agents generate specific local and global views on the current state of the production system, providing the user with up-to-date information, aggregated and presented in a human-friendly fashion.
Presenting the human operator with up-to-date information focused on current points of interest in location and aggregation is a challenging task that most of today's state-of-practice systems are not yet fully up to. The pheromone-enhanced synthetic ecosystems approach presents the designer of the system with an opportunity to integrate high-quality visualization seamlessly.
The different components of the GMC system generate a lot of locally available data on the current operation of the production process. The available information concerns different levels of aggregation (workpieces, flows, strategies), and different time scales (control system versus advisory system). There is data representing the past, present, and an approximation of the future of the production process. Some of the available data is directly accessible in specific pheromones (e.g. processing capabilities). Other information has to be requested from the agents (e.g. strategy evaluation).
The visualization system operates in two modes. In the surf mode, the user specifies the current point of interest (focus) and the required abstraction. In analogy to the Internet, the operator surfs the information in the production system. The user may point the visualization to different sites (places) and access specific pages (local information / local view) there.
Figure 5.3. Pheromone-Based Focusing in the Visualization System
The second mode of visualization is the auto-pilot mode. The visualization system automatically relocates the focus of the user towards critical situations when triggered by events inside the other layers. The user initially gives the criteria for criticality. An adaptive visualization system re-evaluates its criteria constantly, analyzing the reaction of the user to the presented view.
Figure 5.3 illustrates the proposed realization of the visualization system. One possible focus of the view of a user into the production system is represented by a Focus-agent. Focus-agents constantly move through the PI. The pheromone type View (PV) transmits attracting forces that guide the movements of the Focus-agents. Whenever a Focus-agent finds a point of interest, it triggers the user interface to focus on the current place of the agent.
The generation of the local view on the production process is the task of the View-agents. For each place in the PI there may be a View-agent. A View-agent accesses the available information and it aggregates this information into a set of status documents. When a Focus-agent migrates to a place of a View-agent and if the local criticality level is sufficiently high, then the Focus-agent transmits the documents of the View-agent to the user interface.
It is the task of the View-agents to gain the attention of the user in the auto-pilot visualization mode. View-agents generate attracting forces influencing the movement of the Focus-agents. The generation of the forces is based on the criteria of criticality of the respective View-agent. The more critical a local situation appears to a View-agent, the stronger is the interest of the agent to attract Focus-agents. In very urgent situations the View-agent spawns a new Focus-agent if it is not able to attract one of the existing ones in time.
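The attraction of Focus-agents by PV pheromone strength can be sketched as a weighted random walk over the places of the PI. The movement rule, the small exploration floor, and all names are assumptions for illustration:

```python
import random

def focus_step(current_place, neighbors, pv_strength):
    """
    Move a Focus-agent one step through the PI: a neighboring place
    is chosen with probability proportional to the strength of the
    View pheromone (PV) that local View-agents generate according to
    their criteria of criticality.

    neighbors:   places reachable from current_place
    pv_strength: {place: PV strength at that place}
    """
    # a small floor keeps some exploration alive even where
    # no View-agent currently signals a critical situation
    weights = [pv_strength.get(p, 0.0) + 1e-6 for p in neighbors]
    return random.choices(neighbors, weights=weights, k=1)[0]


counts = {"a": 0, "b": 0}
for _ in range(200):
    counts[focus_step("start", ["a", "b"], {"b": 1000.0})] += 1
```

Under this rule, a place whose View-agent signals a highly critical situation (here "b") attracts nearly all Focus-agent movements, while quiet places are still visited occasionally.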
There are several advantages of approaching the visualization of the system in such a distributed manner. The operation remains decoupled from the monitoring, communication of the process status is reduced to points of interest, and data is aggregated according to the requirements of the user.
In comparison to state-of-practice visualization systems, the main advantage of the approach is the guidance of the focus of the user on the basis of critical situations previously defined by the user or even learned by the system. The user is always provided with a ranking of the criticality of different foci. Based on their own experience, the user selects a view from the presented ranking. Thus, the system may re-evaluate its ranking parameters and learn the preferred view on a place.
In addition to the display of the operation, the visualization system may also present the user with suggestions for strategies to be implemented according to the results of the strategy ranking and generation layer. In this case, the user may enter new strategies and change existing ones. Or, the user may change their relevance for specific flow patterns. Finally, the user may require the system to begin implementing a different strategy. Thus, the user operates on a high-level abstraction of the production process.