Title Page
1a: Technical Paper (html) 1b: Technical Paper (LaTeX) 1c: Technical Paper (pdf) 2: References 3: Sample Runs 4a: CA Code 4b: GA Code 5: Technical Paper Reading 6: Project Description 7: Oral Report 8a: Daily Logs 8b: Bi-Weekly Goals 9: Scientific Method 10: Tutorial 11: Next Year 12: Web Portfolio |
Solving the Majority Classification Problem                                                                                    by Austin RachlinABSTRACT      The Majority Classification Problem deals with a test array with binary cells that are randomly turned either on or off at the beginning of the program. The test array has an odd number of cells so that a majority of the cells are either turned on or off. A rule array is created that defines how each cell of the test array should evolve in each step of the cellular automata. The purpose of the rule array is to turn the entire test array either on or off, whichever is the majority in the original test array. The goal of this problem is to create an algorithm that will produce an array that will turn a high percentage of test arrays toward their majorities. ALGORITHMS    The Majority Classification problem requires two main components: a cellular automata and a genetic algorithm.     Cellular Automata        The CA is responsible for the initialization of all arrays and termination of the program. The CA drives the program and defines the interactions between the test and rule arrays. The CA randomly creates 100 rule arrays 1000 test arrays. Each rule is performed on each test array. Because a 7-neighbor system is used, the specific cell and the four cells on either side of it are converted into a binary number that relates to a position on the rule array. If that position of the rule array is on, then the specific cell in the test array is turned on for the next step; otherwise it is turned off. The system is run until all of the cells of the test array are in the same state or if the rule has run for 1000 steps without turning the test array to an absolute state. The success rate of every rule is recorded and stored with the rule.     Genetic Algorithm        The GA is responsible for "breeding" the rule arrays to produce resulting arrays that are better than the previous generations. The rule arrays are combined in a fashion similar to DNA reproduction in offspring. There are many options available to determine what combinations of factors works best in this problem. The first step is determining the fitness of each rule. This is already done in the CA, which records the success rate of each rule. A higher fitness rating means that the rule has a higher probability of having its "genes" survive until the next generation. Next, mating must occur. Arrays are randomly chosen, with a bias given to arrays with higher fitness ratings. Two arrays can be joined in any of these fashions: one-point crossover, two-point crossover, or one-for-one selection. One-point crossover cuts the two arrays at a random spot and combines them there. Two-point cuts the arrays at two spots and splices them at those points. One-for-one selection randomly picks which array to take each gene from. There are other options to supplement the GA. For instance, varying levels of mutation can be added to ensure the introduction of new genes. Also, elitism can be allowed to ensure the survival of the best or the top few rules. The goal of the GA is to find the best combination of all these factors to breed the best rule. I use 100 generations to develop a rule BASIC PROBLEM      This specific variation of the problem deals with test arrays that are 149 cells long. Because a 7-neighbor system is used (the specific cell and four cells on either side are observed to determine the resultant cell), the rule arrays are 128 cells long to cover every possible combination of test cells. Wrap-around is used when considering cells close to either end of the test array: if there aren't enough cells on one side, cells from the other end of the array are used. If a rule array fails to correctly evaluate a test array in 1000 steps, that particular run is considered a failure. Lastly, the final and best array is tested on a test space of 1,000,000 test arrays. SAMPLE RUN| TechLab Page | |