Better GP Benchmarks: Community Survey Results and Proposals: David R. White, James McDermott, Mauro Castelli, Luca Manzoni, Brian W. Goldman, Gabriel Kronberger, Wojciech Jaskowski, Una-May O’Reilly, and Sean Luke. Genetic Programming and Evolvable Machines 14:1 (3-29), 2013. (This is an authors’ preprint, with minor corrections compared to published version.)
Abstract. We present the results of a community survey regarding genetic programming (GP) benchmark practices. Analysis shows broad consensus that improvement is needed in problem selection and experimental rigor. While views expressed in the survey dissuade us from proposing a large-scale benchmark suite, we find community support for creating a “blacklist” of problems which are in common use but have important flaws, and whose use should therefore be discouraged. We propose a set of possible replacement problems. [BibTeX]
Genetic Programming Needs Better Benchmarks. James McDermott, David R. White, Sean Luke, Luca Manzoni, Mauro Castelli, Leonardo Vanneschi, Wojciech Jaskowski, Krzysztof Krawiec, Robin Harper, Kenneth De Jong, and Una-May O’Reilly. GECCO 2012, Philadelphia, USA. ACM. (This version corrects a few errata in the conference-published version: changes are enclosed in rectangles within the paper)
Abstract. Genetic programming (GP) is not a field noted for the rigor of its benchmarking. Some of its benchmark problems are popular purely through historical contingency, and they can be criticized as too easy or as providing misleading information concerning real-world performance, but they persist largely because of inertia and the lack of good alternatives. Even where the problems themselves are impeccable, comparisons between studies are made more difficult by the lack of standardization. We argue that the definition of standard benchmarks is an essential step in the maturation of the field. We make several contributions towards this goal. We motivate the development of a benchmark suite and define its goals; we survey existing practice; we enumerate many candidate benchmarks; we report progress on reference implementations; and we set out a concrete plan for gathering feedback from the GP community that would, if adopted, lead to a standard set of benchmarks. [BibTeX]
Slides from GECCO 2012.
These slides were presented on 9 July 2012 at GECCO in Philadelphia, PA. The images are not always self-explanatory without their accompanying narrative, but the presentation is provided for reference. For a more detailed overview, please see the full paper.
We also make the following data available.
We carried out a survey of the problems used as benchmarks at the EuroGP conference and in the GP track of the GECCO conference during the years 2009-2012. The raw data is available here.
Data from the 2012 Community Survey is also available: