The Case-Based Reasoner described in Learning to Win is slightly different than others. It is built around the idea that the game can be in 20 states. These states are defined by what buildings have been constructed. These states are organized in a lattice network based on what states can follow from a previous state. The following figure is the lattice as shown in the Learning to Win article:
Each state has associated with it a set of tactics to be used in that state. In the paper, there are 8 different tactics included in this (as indicated by a-h in the figure), each designed to counteract a specific opponent strategy. I will be starting out with the same number of tactics as there are states which follow from the current state; so for instance state one will have 3 different tactics. Implementing the other tactics is a later project goal, and I hope to be able to receive these tactics from the original authors.
The CBR has the responsibility of selecting which tactic to use while the game is in the current state. Only one tactic will be used per state per play through. Upon entering the state, the CBR must evaluate the current state of the game, and decide which tactic it will use while in the state. Once it decides, the tactic does not change until it enters the next state.
When the CBR first begins playing, it will be in an 'exploration mode'. This means that during this time it will choose each tactic x amount of times to get an idea of how each one performs. Once the exploration mode has finished, it will begin evaluating each state normally.
The normal workflow of the CBR selecting a tactic for a state is as follows:
The normal workflow of the CBR selecting a tactic for a state is as follows:
- The CBR enters a new lattice state.
- The CBR receives information about the current state of the game (such as number of workers, combat units, etc.)
- The CBR uses the the current game state to query its database of cases associated with the current lattice state, and returns a set of the k closest matches using a k-nearest neighbor algorithm. It does this by using the information about the game state as an 8 dimensional coordinate, and then checks it against the previous game state cases that were used by using the distance formula.
- The CBR selects the case with the highest performance rating.
- The CBR implements the tactic in the case.
- The CBR moves on to other cases.
- At the end of the game, the CBR saves the cases it used by either creating new cases if the exact cases did not exist, or updating the performance of the cases if they did exist.
If in step 4 the CBR finds that none of the nearest neighbors have a performance rating of above .5, the CBR will enter exploration mode for this step and try one of the least used tactics.
The jCOLIBRI CBR was not set up to handle cases in this way, and thus the idea for BARC (Barton CBR) was born. BARC was designed to fit these criteria:
- Organize tactics and cases by state
- Save cases for later use
- Update the performance of a case when it has been selected
- Have an 'exploration mode' for when the CBR first starts.
- Enter 'exploration mode' when a none of the retrieved cases have a performance above .5.
- Create new cases.
- Retrieve k-closest cases from the knowledge base
- Select case with best performance from the k-closest cases.
- Send selected tactic to game.
- Select 1 case per state.
More specific design and implementation notes on BARC will follow in the coming posts.

No comments:
Post a Comment