# This file contains trees, the output of a coding process carried out
# for all the free-text survey responses. For each question, we read
# the free-text responses and try to identify the essential points. We
# try to express them in a way that allows multiple responses making
# the same point to be categorised together. Each specific point is
# assigned a short codeword (written one per line, in uppercase below)
# and a definition (lowercase, following the codeword and the
# colon). All responses which make the corresponding point are then
# counted and the number of them is placed after the second colon. In
# some cases it is useful to create a hierarchy of responses, for
# grouping at a level above that of individual points. That leads to
# the tree structures below. Levels of the tree are indicated by the
# number of asterisks. Individual points are always at the leaves of
# the trees.
#
# The coding was carried out by David White and James McDermott. Each
# coded all the responses, then compared results, then merged them to
# form a single consensus coding.
#
# The coding process is a standard method for processing free-text
# responses in fields where questionnaires and similar texts are
# common:
#
# http://onlineqda.hud.ac.uk/Intro_QDA/how_what_to_code.php
# http://onlineqda.hud.ac.uk/Intro_QDA/phpechopage_titleOnlineQDA-Examples_QDA.php
# http://www.sagepub.com/upm-data/24614_01_Saldana_Ch_01.pdf
# http://www.southalabama.edu/coe/bset/johnson/lectures/lec17.pdf
# http://www.qualitative-research.net/index.php/fqs/article/view/1634/3154

######################################################################
Question 2 What types of problems do you use? Please check all that apply.
######################################################################
* PROBLEM_TYPES ::
** COMPETITIONS : existing competitions :1
** SCIENTIFIC_SYMBOLIC_REGRESSION : symbolic regression on scientific data :1
** UCI_KDD : UCI and KDD data sets :2
** REINFORCEMENT_LEARNING_PROBLEMS : reinforcement learning problems :1
** RANDOMLY_GENERATED : randomly generated problems :1
** CURRENT_PROBLEM : use the problem I want to solve, not a benchmark :1
** NON_GP : Problems from outside GP :1
** REINFORCEMENT_LEARNING: Reinforcement Learning:1

######################################################################
Question 6 If yes [current experimental regime has disadvantages], what are those disadvantages? Please check all that apply.
######################################################################
* BENCHMARK_SELECTION::
** COVERAGE:narrow selection of problems used:
*** GENERAL : some problem classes not covered :1
*** NO_REAL_PROGRAMMING:few problems address automatic programming:1
** NO_CONSENSUS:no agreement on what constitutes a 'good' benchmark:2
** TOY_PROBLEMS:results based on trivial problems/not realistic:5
** CHOOSING_BENCHMARKS : how should researcher choose benchmarks, given goal :2
** SCALABILITY:few benchmarks are truly scalable:1
** REPRESENTATIVE:benchmarks do not reflect real-world performance:2
** OUT_OF_DATE:problems that are at least 20 years old are being used:1
* REPORTING::
** REPLICABILITY:papers omit implementation details, inhibiting repeatability:3
* EXPERIMENTAL_METHOD::
** STANDARD_APPROACH:no standard approach to comparing methods:1
** AWARENESS:authors not aware of previous results:1
** EXPERIMENTAL_DESIGN:authors do not carry out rigorous experiments:1
** CHERRY_PICKING:only measures that show favourable results are used:1
* STATISTICS
** NO_STANDARD_PERFORMANCE_MEASURE:no agreed performance measure:5
** UNSUITABLE_PERFORMANCE_MEASURES: weak performance measures :4
** STATISTICAL_RIGOUR:lack of statistical rigour:2
** SIGNIFICANCE:no agreed way of establishing statistical significance:1
* NEED_GOLD_STANDARD:we should have a list of best values achieved on each problem:1

######################################################################
Question 7 What forms of GP do you use? Please check all that apply.
######################################################################
* FORMS_OF_GP_USED::
** GRNs : Genetic regulatory networks :1
** LINEAR_GP : Linear GP :1
** CARTESIAN_GP : Cartesian GP :1
** OOGP : Object-oriented GP :1
** MICRO_GP : Micro GP :1
** NATIVE_CODE_TREES : Native code trees :1
** TREE_ENSEMBLES : Ensembles of trees :1
** LOGIC_PROGRAMMING_GP : Logic programming GP :1
** BAYESIAN_NETWORKS : Bayesian networks :1
** DETERMINISTIC_SR: Deterministic symbolic regression :1
** OTHER : unspecified new forms :1

######################################################################
Question 8 Which form of GP do you use most often?
######################################################################
* FORMS_OF_GP_USED_MOST_OFTEN::
** GRNs : genetic regulatory networks :1
** CARTESIAN_GP : Cartesian GP :1
** MICRO_GP : MicroGP :1
** TREE_ENSEMBLES : Ensembles of trees :1
** NON_STANDARD_TREE : non-standard tree representations :1
** STANDARD_GP : standard GP :1
** PROJECT_DEPENDENT : depends on current project :1

######################################################################
Question 13 If yes [you are in favour of a standardised GP benchmark suite], why? Please check all that apply.
######################################################################
* IMPROVE_QUALITY::
** QUALITY:Improve quality of research:1
** YOUNG:Help young researchers improve empirical method:1
** PROGRESS:Drive the field forward:1
** REPUTATION:Improve the reputation/maturity of the field:2
** PRACTICE:Improve experimental practice:1
* PROVIDE_RESOURCE::
** EMPIRICAL_STUDIES:useful for empirical studies:1
** OTHER_DISCIPLINES:Provide a resource to other disciplines too:1
* FOCUS::
** USAGE:Enable analysis of benchmark usage:1
** ELIMINATE_OLD:Eliminate old and outdated benchmarks:1
** BEST_RESULTS:Enable us to keep track of best results:1
** TAXONOMY:Provide a taxonomy to organise problems:1
** IDENTIFY_STRENGTHS:Will identify those problems GP is good at:1
* BENEFIT_VERSUS_EFFORT::
** NO_RISK:No risk of over-focus on benchmarks:1
** LOW_EFFORT:Not much effort required for the potential benefit:1

######################################################################
Question 14 If not [you are not in favour of a standardised GP benchmark suite], why not? Please check all that apply.
######################################################################
* WHY_NOT_STANDARDISE
** NOT_REAL_WORLD : Benchmarks don't reflect the real world :4
** NOT_BENCHMARKABLE: Applications aren't always benchmarkable:1
** DICTATING_TO_THE_FIELD: Small group decides benchmarks :1
** POOR_COMPROMISE: Compromising between different types of practitioners will result in a set of benchmarks that only limits us:1
** DISTORTION : Benchmarks encourage research distortion or "teaching to the test" :1
** DIFFICULTY : Hard to create a benchmark suite satisfying everyone :1
** WRONG_FOCUS : Benchmarks miss the point -- it would be better to focus on some other aspect :
*** FOCUS_ON_OPENNESS : Open source, open data, replicability :2
*** FOCUS_ON_RIGOUR : Experimental rigour, including avoiding straw man comparisons :2
*** FOCUS_ON_COMPLEXITY : time/space complexity :1
*** CODE_AVAILABLE_LONG_TERM : Focus on making code available long term in a distributed fashion :1
*** NO_NEED_FOR_CENTRALISATION: don't make a big centralised repository :1
** LIMIT_INNOVATION : Don't limit researchers or stifle innovation :1
** UNNECESSARY : Good data sets exist or are in progress :1

######################################################################
Question 17 Should a benchmark suite aim to include real-world problems, synthetic problems, or a mixture of both?
######################################################################
* SYNTHETIC_EMPIRICAL:Synthetic problems could be used for empirical research:1
* REAL_WORLD_PROGRESS:Real-world problems could be used to drive research forward:1
* BAD_IDEA: Benchmarks are not a good way of promoting development:1

######################################################################
Question 19 What application domains and problem types should the benchmark suite contain [assuming one is to be created]? Please check all that apply.
######################################################################
* APPLICATION_DOMAINS
** PATTERN : Pattern identification :2
** AGENT_CONTROL : Agent/robot behaviour and control :4
** GAMES : Game-playing :3
** SIGNAL_PROCESSING : Signal processing :2
** DESIGN : Design :2
** VIDEO_COMPRESSION : Compression of videos :1
** BIOINFORMATICS : Bioinformatics :2
** DATA_MINING : Data mining :1
** TEXT_PROCESSING : Text processing :2
** NUMERICAL : Real-valued numerical problems (finite element, PDE, time series) :1
** VISION : Computer vision :1
** STOCK_FORECASTING : forecasting the stock market :1
** REAL_PROGRAMMING : Real programming, with multiple data types :1
** META : Comments on desirable aspects, not domain-specific :
*** HYPERHEURISTICS : Benchmarks should involve hyperheuristics somehow :1
*** HARD: Difficult problems:2
*** REAL_WORLD : desirability of real-world problems :6
*** MORE_DIVERSITY : multiple categories, diversity, completeness :2
*** CORRECT_DIVERSITY: Diversity, but not too many:1
*** LESS_DIVERSITY : benchmarks shouldn't be too numerous/too diverse :1
*** REUSE : existing competitions and repositories solve this problem for us :1
** NO_RESPONSE : Effectively no response :2

######################################################################
Question 21 Are there any other details which should be specified [as part of the benchmark suite]?
######################################################################
* COMPUTATIONAL_BUDGET:Specify how much computation may be used:
** RUNS:Specify allowed number of runs:2
** NODE_EVALS:Number of node evaluations:2
** FITNESS_CASES:Number of fitness case evaluations:1
* EXPERIMENTAL_EVALUATION:Specify how to evaluate results:
** COMPUTATIONAL_PERFORMANCE:Computational performance comparison:2
** SOLUTION_COMPLEXITY:Measure of an individual's complexity should be specified:1
** PERFORMANCE_MEASURE:Performance measure should be specified:1
** EFFICIENCY_ANALYSIS:Specify how efficiency is reported:1
** REPORTING:Specify standard report methodology:2
** PERFORMANCE_ANALYSIS:How to analyse performance:1
** QUALITY_ANALYSIS:How to report solution sizes and examples:1
** ACCEPTANCE_TESTS:Specify tests to be passed before comparisons are valid:1
** COMPARISON_METHOD:Specify how results are to be compared:1
** ONLINE_SUBMISSION_METHOD:Provide online submission method to improve credibility:1
* ALGORITHM:How much of the GP algorithm should be specified?:
** NOT_OPERATORS:Do not specify crossover, mutation, initialisation:1
** JVM:Specify a JVM version for Java programs:1
* INDIVIDUALS:Which components of an individual should be controlled:
** LANGUAGE:Programming language of individuals:1
** FUNCTIONS:Functions in function set:1
** TERMINALS:Terminals in function set:1
** CONSTANTS:Constant ranges for random constants:1
* DATA::
** DATA_GENERATION:Where data came from, techniques for generation:1
** DATASETS:Full datasets provided:1
** FOLDS:Folds for cross-fold validation:1
* FITNESS_FUNCTION::
** DEFINE_FITNESS_FUNCTION:Precise definition of fitness function:3
** DEFINE_ERROR:Definition of how error is calculated:1
** FITNESS_FUNCTION:Code for generating fitness from evaluation results:1
** HIDDEN_FITNESS:Fitness cases should be black-box:1
* DYNAMIC_BEHAVIOUR::
** ITERATION_DETAILS:Details of iteration, e.g. episodes:1
** ENVIRONMENT:Details of interaction with environment:1
** INTERPRETER:Provide code that interprets a given candidate program:1
** EXCEPTIONS:Specify how to deal with overflows etc.:1
* PHILOSOPHY::
** FLEXIBILITY:Details should be specified but adaptable:1
** FOCUS_ON_PROBLEM:Standardise the problem, not the algorithm implementation:1
** REMAIN_OPEN_TO_CHANGE:Remain open to changing nature of specifications:1
** FOR_REPETITION_EVERYTHING:When encouraging repetition, all details should be specified:2
** OPEN_TO_NEW_IDEAS:Should be flexible to enable new ideas to be explored:1
** DO_NOT_OVERSPECIFY:Don't invent a nanny state, allow exploration:2
** SPECIFIC:Specify as much as possible:1
** FOCUS_ON_REPORTING:Reporting parameters afterwards rather than specify:1
* CONTEXT::
** CASE_STUDY:Full case study data should be given, not just example:1
** DEPENDS:Specification should depend on the application and purpose of benchmark:1
* META
** BEST_KNOWN_RESULTS:Best known results should be kept:1
** SPECIFICATION_LANGUAGE:Create an interchangeable description language for contributions:1
** COPY_BBOB:Use BBOB method:1
* OTHER
** NOT_IN_AREA:Not working within GP at the moment:1
** UNKNOWN_INTENT:Intent of comment unclear:1

######################################################################
Question 22 Can you suggest any existing benchmark problems you think SHOULD be part of a benchmark suite? Please give reasons if possible. Please supply enough information to precisely identify the problems.
######################################################################
* SUGGESTED_BENCHMARKS
** SYMBOLIC_REGRESSION::
*** GENERAL_SR:General SR:1
*** DYNAMIC : Dynamic symbolic regression:1
*** TIME_SERIES_FORECASTING:Time series forecasting:1
*** MULTIVARIATE_SPLINES:Problems from "Multivariate adaptive regression splines":1
*** NON_LINEAR: Problems requiring non-linear regression with variable dimensionality:1
*** SPROTT: Sprott's chaotic flow x''' = -2.017 x'' + (x')^2 - x:1
*** Q-FUNCTION: Q-Function as featured at GECCO 2012:1
*** DOW: Dow Chemical datasets:1
*** SR_FOR_PROTEIN_FOLDING: Using symbolic regression for protein folding :1
*** FRIEDMAN_AND_BREIMAN: Synthetic symbolic regression problems proposed by Friedman, used by Breiman in work on bagging predictors:1
** CLASSIFICATION ::
*** UCI : UCI :2
*** KDD : KDD :2
*** BIOINFORMATICS : bioinformatics classification:2
** PTSP : Physical travelling salesman problem :1
** BOOLEAN : Better Boolean problems including multi-out parallel multiplier :1
** SIGNAL_PROCESSING : signal processing such as image processing :1
** REAL_VALUED_OPTIMISATION : Real-valued optimisation :1
** GAMES : Games such as chess, checkers, pacman :1
** PRIMES : Miller's prime number prediction :1
** TRUE_PROGRAMMING : True programming :2
** SYNTHETIC : Synthetic problems e.g. tree-shape, unique known solution:1
** OTHER_DISCIPLINES : Copy problems from other disciplines :
*** KDD: KDD:2
*** RL_GLUE ::1
*** FROM_NEURAL_NETWORKS: some well-known problems from the field of neural networks :1
** META : miscellaneous comments, desirable/undesirable features, not problem-specific:
*** BLACKLIST : avoid UCI and Proben because many have been solved :1
*** ALL_AS_APPROPRIATE : all listed problems could be used :1
*** AVOID_CHERRYPICKING : avoid authors cherrypicking by defining subsets of problems. Helps avoid time-consuming running on all problems :1
*** HUMAN_COMPETITIVE : grand challenges, human competitive results :1
*** MODULARITY : problems which require modularity :1
** NO_RESPONSE: Effectively no response:2

######################################################################
Question 23 Can you suggest any existing benchmark problems you think SHOULD NOT be part of a benchmark suite? Please give reasons if possible. Please supply enough information to precisely identify the problems.
######################################################################
* PROBLEM_CHARACTERISTICS:Characteristics that should be avoided:
** TOY_PROBLEMS:Toy/Trivial Problems:4
** KOZA:Problems from Koza's books:6
** OLD:Problems solved more than 10 years ago:1
** PROBLEMS_TAILORED:Exclude problems tailored to GP:1
* PROBLEM_TYPES:Problem types that should be avoided:
** BOOLEAN:Boolean logic:1
** CONTROL:Control Problems:1
** SYMBOLIC_REGRESSION::2
* SPECIFIC_PROBLEMS
** ORAL_BIOAVAILABILITY:Oral bioavailability dataset:1
** LAWNMOWER::4
** ANT::3
** QUARTIC::3
** MULTIPLEXER::1
** PARITY::3
** CART:Cart centering:1
** TWO_BOX::1
** DISPATCH_RULES::1
** ROYAL_TREE::1
* META:Meta-information, ideas and opinions:
** KEEP_AS_APPROPRIATE:41,51:
** KEEP_EASY_FOR_REF_ONLY:Retain easy problems just for reference not publication:1
** POLYNOMIAL_TIME:Anything solvable by a simple polynomial algorithm, brute force, hill-climber:1
** ANY:All can be worth keeping:1
** MAX:The more the merrier, if appropriate for purpose:1
** RETAIN_OLD:Keep old examples, for sanity checks etc. but not for use in papers:1
** PISZCZ:Refer to Alan Piszcz's Papers:1

######################################################################
Question 26 For how many years have you worked in or studied GP?
######################################################################
* 0-5 Years::34
* 6-10 Years::16
* 11-15 Years::14
* 16-20 Years::10

######################################################################
Question 28 Where did you hear about this questionnaire?
######################################################################
* COLLEAGUES::
** SUPERVISOR:From supervisor:1
** COLLEAGUE:Heard from Colleague (not specified):3
** EMAIL:Email (Source not specified):2
* RESEARCH::
** PHD:From PhD Research:1
** GOOGLE:Google:2
* OFFICIAL::
** ORGANISER_EMAIL:Email from survey organisers:3
** OTHER_CONFERENCE:Attendance at another conference:1
** TWITTER:From Twitter Tag #gecco2012:1

######################################################################
Question 29 Please give any other comments you wish to make.
######################################################################
* OTHER_COMMENTS
** POSITIVITY : generally positive comments, good discussion, good luck:8
** NEGATIVITY : generally negative comments, warnings that things could go wrong :4
** FOCUS_ON_RIGOUR : It would be better to focus on and enforce experimental rigour, statistics, openness, academic honesty:5
** FIELD_IS_UNDIRECTED: Much work in GP is not progressing, just an exercise in comparing one technique to another:1
** TEACHING_TO_THE_TEST : Danger that researchers will focus on benchmarks, losing sight of other issues:1
** REAL_WORLD_PERFORMANCE : Making benchmarks relate to real-world performance is key, and is difficult:2
** BEST_PRACTICE : Existence of a benchmark suite would reassure new practitioners :1
** CROSS_PAPER_COMPARISON : Comparing results across papers is key, and is difficult:2
** CODE_QUALITY_STANDARDS : Code quality is a concern, if creating a repository :1
** CHERRYPICKING: Authors currently cherry-pick datasets:1
** WITHIN_TECHNIQUE: Benchmarks are also useful for tweaking/parameter sweeping of a single technique, not just between techniques:1
** SUGGESTED_PROBLEMS_AND_TYPES::
*** REAL_WORLD : Real world problems:3
*** DIVERSITY : Diverse problems :1
*** NON_GP_REPRESENTATION_INDEPENDENT : Problems which are from outside GP and/or are representation independent:4
*** FLEXIBLE: Be flexible in required specification detail:1
*** CONTROL_POLE_BALANCING : Control problems like pole balancing:1
*** TRUE_PROGRAMMING : True automatic programming:1
*** SIMPLICITY: Benchmarks should be simple to implement and understand:1
*** FAST : Fast to execute:1
*** DIFFICULT : Benchmark suite should cause some techniques to fail, not just easy toy problems:2
*** TUNABILITY : At least some benchmarks should be tunable:2
*** THEORETICAL : Theoretical and synthetic problems have value even if they look like toy problems:2
** USE_CASES : A benchmark suite should recognise different use-cases for problems, e.g. sanity checking, verifying code, publishing... :1
** CLARIFICATIONS : answers clarifying previous multiple-choice answers :4
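
# The short Python sketch below is illustrative only: it shows one
# possible way to read the tree lines in this file into (level,
# codeword, definition, count) tuples, following the format described
# in the header comment at the top of the file. The regular expression
# and the names LINE_RE and parse_tree_line are assumptions made for
# illustration, not part of the coding process. Grouping nodes written
# without any colon (e.g. "* STATISTICS") are simply skipped.

import re
import sys

# Level = number of leading asterisks; the codeword runs up to the first
# colon; the definition sits between the first and last colon; the count
# (if present) follows the last colon.
LINE_RE = re.compile(r"^(\*+)\s*([^:]+?)\s*:(.*?):\s*(\d*)\s*$")

def parse_tree_line(line):
    """Return (level, codeword, definition, count) for a tree line, else None."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # comment, separator, question text, or colon-less node
    stars, codeword, definition, count = m.groups()
    return (len(stars), codeword, definition.strip(),
            int(count) if count else None)

if __name__ == "__main__":
    # Usage (hypothetical script name): python parse_trees.py <path to this file>
    with open(sys.argv[1]) as f:
        for raw in f:
            parsed = parse_tree_line(raw.rstrip("\n"))
            if parsed is not None:
                print(parsed)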