Lexicase Selection

thelmuth · July 19, 2015, 11:25pm

Lexicase selection is a parent selection mechanism invented by @lspector and studied extensively by him and @thelmuth (and others such as @mcphee) in various papers and in @thelmuth’s dissertation. Lexicase selection is a behavior-based selection approach that does not aggregate error values on test cases into a single fitness value. Instead, it considers test cases individually, placing more emphasis on some cases than others.

Here is some pseudocode for the lexicase algorithm:

Initialize
a. Set candidates to be the entire population of programs.
b. Set cases to be a list of all of the test cases in the training set in random order.
Loop
a. Set candidates to be the subset of the current candidates that have exactly the best performance of any individual currently in candidates for the first case in cases.
b. If candidates contains just a single individual then return it.
c. If cases contains just a single test case then return a randomly selected individual from candidates.
d. Otherwise remove the first case from cases and go to Loop.

The lexicase selection algorithm first randomly orders the test cases from the training set. It then eliminates any individuals in the population that do not have the best performance on the first test case. Assuming that more than one individual remains, it then loops, eliminating any individuals that do not have the best performance of the remaining individuals on the second test case. This process continues until only one individual remains and is selected, or until all test cases have been used, in which case it randomly selects one of the remaining individuals.

Lexicase selection sometimes selects individuals that perform well on a relatively small number of test cases, even if they perform very poorly on other cases. This differs from most other selection algorithms, which select individuals based on aggregations of performance on all test cases into a single scalar fitness value. As such, lexicase often selects specialist individuals that solve parts of the problem extremely well, as opposed to tournament selection and implicit fitness sharing (IFS), which select generalist individuals that have good performance on average across the test cases. Although these individuals may have worse summed error across all test cases, the hope is they will be able to reproduce in ways that pass on their preeminence on certain cases while improving with respect to others. In order to give every test case equal selection pressure, each lexicase selection event uses a randomly shuffled list of test cases to determine which test cases are treated as most important.

Time Complexity

The theoretical worst-case time complexity of the lexicase selection algorithm for selecting parents each generation is $O(P^2T)$, where $P$ is the population size and $T$ is the number of test cases. In comparison, traditional tournament selection must sum the errors from every test case for every individual, giving a time complexity of $O(PT)$. While lexicase selection is theoretically slower in the worst case, in practice it often quickly eliminates many candidates and does not need to loop over every test case, leading to better running times. Additionally, if lexicase selection allows us to more often solve problems than other selection methods, it may be preferred even if it runs slower than those methods. In practice, we have found that lexicase is about 2 to 10 times slower per generation than tournament selection.

thelmuth · July 19, 2015, 11:30pm

Publications

Spector, L. 2012. Assessment of Problem Modality by Differential Performance of Lexicase Selection in Genetic Programming: A Preliminary Report. In Companion Publication of the 2012 Genetic and Evolutionary Computation Conference, GECCO’12 Companion. ACM Press. pp. 401-408
Helmuth, T., and L. Spector. 2013. Evolving a digital multiplier with the PushGP genetic programming system. In Companion Publication of the 2013 Genetic and Evolutionary Computation Conference, GECCO’13 Companion. ACM Press. pp. 1627-1634.
Helmuth, T., L. Spector, and J. Matheson. 2014. Solving Uncompromising Problems with Lexicase Selection. In IEEE Transactions on Evolutionary Computation. DOI: 10.1109/TEVC.2014.2362729.
Helmuth, Thomas, 2015. General Program Synthesis from Examples Using Genetic Programming with Parent Selection Based on Random Lexicographic Orderings of Test Cases. Doctoral Dissertations. 465.
https://scholarworks.umass.edu/dissertations_2/465
Helmuth, T., and L. Spector. 2014. Word Count as a Traditional Programming Benchmark Problem for Genetic Programming. In Proceedings of the 2014 Genetic and Evolutionary Computation Conference, GECCO’14. ACM Press. pp. 919-926.
Helmuth, T., and L. Spector. 2015. General Program Synthesis Benchmark Suite. In Proceedings of the 2015 Genetic and Evolutionary Computation Conference, GECCO’15. ACM Press. pp. 1039-1046.
Liskowski, P., K. Krawiec, T. Helmuth, and L. Spector. 2015. Comparison of Semantic-aware Selection Methods in Genetic Programming. In Companion Publication of the 2015 Genetic and Evolutionary Computation Conference, GECCO’15 Companion. ACM Press. pp. 1301-1307.
Helmuth, T., N. F. McPhee, and L. Spector. 2016. Lexicase selection for program synthesis: a diversity analysis. In Genetic Programming Theory and Practice XIII, edited by R. Riolo, W. Worzel, M. Kotanchek, and A. Kordon, pp. 151-167. New York: Springer.
McPhee, Nicholas Freitag, Donatucci, David, and Helmuth, Thomas. 2016. Using graph databases to explore the dynamics of genetic programming runs. In Genetic Programming Theory and Practice XIII, Genetic and Evolutionary Computation. Springer.
Helmuth, T., N. F. McPhee, and L. Spector. 2016. The Impact of Hyperselection on Lexicase Selection. In Proceedings of the 2016 Genetic and Evolutionary Computation Conference, GECCO’16. ACM Press. pp. 717-724. Nominated, Best Paper Award, Genetic Programming Track.
La Cava, W., L. Spector, and K. Danai. 2016. Epsilon-lexicase Selection for Regression. In Proceedings of the 2016 Genetic and Evolutionary Computation Conference, GECCO’16. ACM Press. pp. 741-748.
Helmuth, T., N. F. McPhee, and L. Spector. 2016. Effects of Lexicase and Tournament Selection on Diversity Recovery and Maintenance. In Companion Publication of the 2016 Genetic and Evolutionary Computation Conference, GECCO’16 Companion. ACM Press. pp. 993-990.
La Cava, W., T. Helmuth, L. Spector, and J. H. Moore. 2018. A probabilistic and multi-objective analysis of lexicase selection and ε-lexicase selection. In Evolutionary Computation. (arXiv preprint, MIT Press, PubMed)

thelmuth · July 20, 2015, 3:22pm

Source Code

The implementation of lexicase selection in Clojush can be found here.

williamlacava · July 21, 2015, 5:03pm