Thanks to all those who attended and contributed to the lively debate at GECCO 2012 in Philadelphia. For reference, and for the benefit of those who didn’t attend the conference, below is a list of the points made during the discussion. Each of these points was made by a single attendee and certainly does not reflect the opinion of everyone in attendance!
- Past competitions could serve as a starting point for benchmark selection.
- Perhaps a shift of focus away from symbolic regression and classification would benefit the field, as these areas are well-studied in the wider field of machine learning.
- Programming and planning may be more suitable domains for Genetic Programming.
- Automated programming was John Koza’s original vision, but the dream appears to have fallen by the wayside. Can we reignite it?
- At least some of the benchmarks should be exceedingly difficult.
- The benchmarks must be aligned with the overall goals of the field, and therefore identifying those goals is important.
- Perhaps a “benchmark blacklist” would be a good start, discouraging the use of trivial or overused problems as benchmarks.
- Any benchmark suite must take into account how comparisons between algorithms will be made, and steps should be taken when selecting and specifying benchmarks to ensure that such comparisons are as fair as possible.
- Do we really need benchmarks? Or is it simply the case that we need to change our methodology?
- Preordaining benchmarks could be a bad idea. Why not allow the natural selection of benchmarks to occur?
- We should be application-driven, and therefore benchmarks should be drawn from domains where GP is in use.
- Real-world data is difficult to work with for a number of reasons; some of it is proprietary, for example.
- Code must be completely open in order to facilitate comparison.
- When choosing benchmarks, we should focus on the issues that GP in particular faces.
- How difficult a benchmark is to implement is an important consideration when selecting it.
- Single-objective problems are dated; some benchmarks should be multi-objective.
In addition, there was some controversy over whether comparisons should be made using fitness values or computational effort. No clear consensus was reached on this issue, either in discussions on the mailing list or at the conference.
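For readers unfamiliar with the second option: “computational effort” in the GP literature usually refers to Koza’s statistic I(M, i, z), the minimum number of individuals that must be processed to find a solution with probability z (commonly 0.99). The sketch below is a minimal illustration under that reading; the function name and parameters are illustrative only and were not part of the workshop discussion.

```python
import math

def koza_computational_effort(success_generations, population_size, num_runs, z=0.99):
    """Minimal sketch of Koza's computational-effort statistic I(M, i, z).

    success_generations: for each successful run, the generation at which a
    solution was first found. num_runs is the total number of runs
    (successful or not), population_size is M, and z is the target
    probability of success.
    """
    if not success_generations:
        return float("inf")  # no successful runs: effort is unbounded

    best_effort = float("inf")
    for i in range(max(success_generations) + 1):
        # P(M, i): cumulative probability of success by generation i
        p = sum(1 for g in success_generations if g <= i) / num_runs
        if p == 0:
            continue
        # R(z): number of independent runs needed to succeed with probability z
        runs_needed = 1 if p >= 1 else math.ceil(math.log(1 - z) / math.log(1 - p))
        # I(M, i, z) = M * (i + 1) * R(z), minimized over generations i
        best_effort = min(best_effort, population_size * (i + 1) * runs_needed)
    return best_effort
```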