# This file contains trees, the output of a coding process carried out
# for all the free-text survey responses. For each question, we read
# the free-text responses and try to identify the essential points. We
# try to express them in a way that allows multiple responses making
# the same point to be categorised together. Each specific point is
# assigned a short codeword (written one per line, in uppercase below)
# and a definition (lowercase, following the codeword and the
# colon). All responses which make the corresponding point are then
# counted and the number of them is placed after the second colon. In
# some cases it is useful to create a hierarchy of responses, for
# grouping at a level above that of individual points. That leads to
# the tree structures below. Levels of the tree are indicated by the
# number of asterisks. Individual points are always at the leaves of
# the trees.
#
# The coding was carried out by David White and James McDermott. Each
# coded all the responses, then compared results, then merged them to
# form a single consensus coding.
#
# The coding process is a standard method for processing free-text
# responses in fields where questionnaires and similar texts are
# common:
#
# http://onlineqda.hud.ac.uk/Intro_QDA/how_what_to_code.php
# http://onlineqda.hud.ac.uk/Intro_QDA/phpechopage_titleOnlineQDA-Examples_QDA.php
# http://www.sagepub.com/upm-data/24614_01_Saldana_Ch_01.pdf
# http://www.southalabama.edu/coe/bset/johnson/lectures/lec17.pdf
# http://www.qualitative-research.net/index.php/fqs/article/view/1634/3154

######################################################################
Question 2 What types of problems do you use? Please check all that apply.
######################################################################
* PROBLEM_TYPES ::
** COMPETITIONS : existing competitions :1
** SCIENTIFIC_SYMBOLIC_REGRESSION : symbolic regression on scientific data :1
** UCI_KDD : UCI and KDD data sets :2
** REINFORCEMENT_LEARNING_PROBLEMS : reinforcement learning problems :1
** RANDOMLY_GENERATED : randomly generated problems :1
** CURRENT_PROBLEM : use the problem I want to solve, not a benchmark :1
** NON_GP : Problems from outside GP :1
** REINFORCEMENT_LEARNING: Reinforcement Learning:1

######################################################################
Question 6 If yes [current experimental regime has disadvantages], what are those disadvantages? Please check all that apply.
######################################################################
* BENCHMARK_SELECTION::
** COVERAGE:narrow selection of problems used:
*** GENERAL : some problem classes not covered :1
*** NO_REAL_PROGRAMMING:few problems address automatic programming:1
** NO_CONSENSUS:no agreement on what constitutes a 'good' benchmark:2
** TOY_PROBLEMS:results based on trivial problems/not realistic:5
** CHOOSING_BENCHMARKS : how should researcher choose benchmarks, given goal :2
** SCALABILITY:few benchmarks are truly scalable:1
** REPRESENTATIVE:benchmarks do not reflect real-world performance:2
** OUT_OF_DATE:problems that are at least 20 years old are being used:1
* REPORTING::
** REPLICABILITY:papers omit implementation details, inhibiting repeatability:3
* EXPERIMENTAL_METHOD::
** STANDARD_APPROACH:no standard approach to comparing methods:1
** AWARENESS:authors not aware of previous results:1
** EXPERIMENTAL_DESIGN:authors do not carry out rigorous experiments:1
** CHERRY_PICKING:only measures that show favourable results are used:1
* STATISTICS
** NO_STANDARD_PERFORMANCE_MEASURE:no agreed performance measure:5
** UNSUITABLE_PERFORMANCE_MEASURES: weak performance measures :4
** STATISTICAL_RIGOUR:lack of statistical rigour:2
** SIGNIFICANCE:no agreed way of establishing statistical significance:1
* NEED_GOLD_STANDARD:we should have a list of best values achieved on each problem:1

######################################################################
Question 7 What forms of GP do you use? Please check all that apply.
######################################################################
* FORMS_OF_GP_USED::
** GRNs : Genetic regulatory networks :1
** LINEAR_GP : Linear GP :1
** CARTESIAN_GP : Cartesian GP :1
** OOGP : Object-oriented GP :1
** MICRO_GP : Micro GP :1
** NATIVE_CODE_TREES : Native code trees :1
** TREE_ENSEMBLES : Ensembles of trees :1
** LOGIC_PROGRAMMING_GP : Logic programming GP :1
** BAYESIAN_NETWORKS : Bayesian networks :1
** DETERMINISTIC_SR: Deterministic symbolic regression :1
** OTHER : unspecified new forms :1

######################################################################
Question 8 Which form of GP do you use most often?
######################################################################
* FORMS_OF_GP_USED_MOST_OFTEN::
** GRNs : genetic regulatory networks :1
** CARTESIAN_GP : Cartesian GP :1
** MICRO_GP : MicroGP :1
** TREE_ENSEMBLES : Ensembles of trees :1
** NON_STANDARD_TREE : non-standard tree representations :1
** STANDARD_GP : standard GP :1
** PROJECT_DEPENDENT : depends on current project :1

######################################################################
Question 13 If yes [you are in favour of a standardised GP benchmark suite], why? Please check all that apply.
######################################################################
* IMPROVE_QUALITY::
** QUALITY:Improve quality of research:1
** YOUNG:Help young researchers improve empirical method:1
** PROGRESS:Drive the field forward:1
** REPUTATION:Improve the reputation/maturity of the field:2
** PRACTICE:Improve experimental practice:1
* PROVIDE_RESOURCE::
** EMPIRICAL_STUDIES:useful for empirical studies:1
** OTHER_DISCIPLINES:Provide a resource to other disciplines too:1
* FOCUS::
** USAGE:Enable analysis of benchmark usage:1
** ELIMINATE_OLD:Eliminate old and outdated benchmarks:1
** BEST_RESULTS:Enable us to keep track of best results:1
** TAXONOMY:Provide a taxonomy to organise problems:1
** IDENTIFY_STRENGTHS:Will identify those problems GP is good at:1
* BENEFIT_VERSUS_EFFORT::
** NO_RISK:No risk of over-focus on benchmarks:1
** LOW_EFFORT:Not much effort required for the potential benefit:1

######################################################################
Question 14 If not [you are not in favour of a standardised GP benchmark suite], why not? Please check all that apply.
######################################################################
* WHY_NOT_STANDARDISE
** NOT_REAL_WORLD : Benchmarks don't reflect the real world :4
** NOT_BENCHMARKABLE: Applications aren't always benchmarkable:1
** DICTATING_TO_THE_FIELD: Small group decides benchmarks :1
** POOR_COMPROMISE: Compromising between different types of practitioners will result in a set of benchmarks that only limits us:1
** DISTORTION : Benchmarks encourage research distortion or "teaching to the test" :1
** DIFFICULTY : Hard to create a benchmark suite satisfying everyone :1
** WRONG_FOCUS : Benchmarks miss the point -- it would be better to focus on some other aspect :
*** FOCUS_ON_OPENNESS : Open source, open data, replicability :2
*** FOCUS_ON_RIGOUR : Experimental rigour, including avoiding straw man comparisons :2
*** FOCUS_ON_COMPLEXITY : time/space complexity :1
*** CODE_AVAILABLE_LONG_TERM : Focus on making code available long term in a distributed fashion :1
*** NO_NEED_FOR_CENTRALISATION: don't make a big centralised repository :1
** LIMIT_INNOVATION : Don't limit researchers or stifle innovation :1
** UNNECESSARY : Good data sets exist or are in progress :1

######################################################################
Question 17 Should a benchmark suite aim to include real-world problems, synthetic problems, or a mixture of both?
######################################################################
* SYNTHETIC_EMPIRICAL:Synthetic problems could be used for empirical research:1
* REAL_WORLD_PROGRESS:Real-world problems could be used to drive research forward:1
* BAD_IDEA: Benchmarks are not a good way of promoting development:1

######################################################################
Question 19 What application domains and problem types should the benchmark suite contain [assuming one is to be created]? Please check all that apply.
######################################################################
* APPLICATION_DOMAINS
** PATTERN : Pattern identification :2
** AGENT_CONTROL : Agent/robot behaviour and control :4
** GAMES : Game-playing :3
** SIGNAL_PROCESSING : Signal processing :2
** DESIGN : Design :2
** VIDEO_COMPRESSION : Compression of videos :1
** BIOINFORMATICS : Bioinformatics :2
** DATA_MINING : Data mining :1
** TEXT_PROCESSING : Text processing :2
** NUMERICAL : Real-valued numerical problems (finite element, PDE, time series) :1
** VISION : Computer vision :1
** STOCK_FORECASTING : forecasting the stock market :1
** REAL_PROGRAMMING : Real programming, with multiple data types :1
** META : Comments on desirable aspects, not domain-specific :
*** HYPERHEURISTICS : Benchmarks should involve hyperheuristics somehow :1
*** HARD: Difficult problems:2
*** REAL_WORLD : desirability of real-world problems :6
*** MORE_DIVERSITY : multiple categories, diversity, completeness :2
*** CORRECT_DIVERSITY: Diversity, but not too many:1
*** LESS_DIVERSITY : benchmarks shouldn't be too numerous/too diverse :1
*** REUSE : existing competitions and repositories solve this problem for us :1
** NO_RESPONSE : Effectively no response :2

######################################################################
Question 21 Are there any other details which should be specified [as part of the benchmark suite]?
######################################################################
* COMPUTATIONAL_BUDGET:Specify how much computation may be used:
** RUNS:Specify allowed number of runs:2
** NODE_EVALS:Number of node evaluations:2
** FITNESS_CASES:Number of fitness case evaluations:1
* EXPERIMENTAL_EVALUATION:Specify how to evaluate results:
** COMPUTATIONAL_PERFORMANCE:Computational performance comparison:2
** SOLUTION_COMPLEXITY:Measure of an individual's complexity should be specified:1
** PERFORMANCE_MEASURE:Performance measure should be specified:1
** EFFICIENCY_ANALYSIS:Specify how efficiency is reported:1
** REPORTING:Specify standard report methodology:2
** PERFORMANCE_ANALYSIS:How to analyse performance:1
** QUALITY_ANALYSIS:How to report solution sizes and examples:1
** ACCEPTANCE_TESTS:Specify tests to be passed before comparisons are valid:1
** COMPARISON_METHOD:Specify how results are to be compared:1
** ONLINE_SUBMISSION_METHOD:Provide online submission method to improve credibility:1
* ALGORITHM:How much of the GP algorithm should be specified?:
** NOT_OPERATORS:Do not specify crossover, mutation, initialisation:1
** JVM:Specify a JVM version for Java programs:1
* INDIVIDUALS:Which components of an individual should be controlled:
** LANGUAGE:Programming language of individuals:1
** FUNCTIONS:Functions in function set:1
** TERMINALS:Terminals in function set:1
** CONSTANTS:Constant ranges for random constants:1
* DATA::
** DATA_GENERATION:Where data came from, techniques for generation:1
** DATASETS:Full datasets provided:1
** FOLDS:Folds for cross-fold validation:1
* FITNESS_FUNCTION::
** DEFINE_FITNESS_FUNCTION:Precise definition of fitness function:3
** DEFINE_ERROR:Definition of how error is calculated:1
** FITNESS_FUNCTION:Code for generating fitness from evaluation results:1
** HIDDEN_FITNESS:Fitness cases should be black-box:1
* DYNAMIC_BEHAVIOUR::
** ITERATION_DETAILS:Details of iteration, e.g. episodes:1
** ENVIRONMENT:Details of interaction with environment:1
** INTERPRETER:Provide code that interprets a given candidate program:1
** EXCEPTIONS:Specify how to deal with overflows etc.:1
* PHILOSOPHY::
** FLEXIBILITY:Details should be specified but adaptable:1
** FOCUS_ON_PROBLEM:Standardise the problem, not the algorithm implementation:1
** REMAIN_OPEN_TO_CHANGE:Remain open to changing nature of specifications:1
** FOR_REPETITION_EVERYTHING:When encouraging repetition, all details should be specified:2
** OPEN_TO_NEW_IDEAS:Should be flexible to enable new ideas to be explored:1
** DO_NOT_OVERSPECIFY:Don't invent a nanny state, allow exploration:2
** SPECIFIC:Specify as much as possible:1
** FOCUS_ON_REPORTING:Reporting parameters afterwards rather than specify:1
* CONTEXT::
** CASE_STUDY:Full case study data should be given, not just example:1
** DEPENDS:Specification should depend on the application and purpose of benchmark:1
* META
** BEST_KNOWN_RESULTS:Best known results should be kept:1
** SPECIFICATION_LANGUAGE:Create an interchangeable description language for contributions:1
** COPY_BBOB:Use BBOB method:1
* OTHER
** NOT_IN_AREA:Not working within GP at the moment:1
** UNKNOWN_INTENT:Intent of comment unclear:1

######################################################################
Question 22 Can you suggest any existing benchmark problems you think SHOULD be part of a benchmark suite? Please give reasons if possible. Please supply enough information to precisely identify the problems.
######################################################################
* SUGGESTED_BENCHMARKS
** SYMBOLIC_REGRESSION::
*** GENERAL_SR:General SR:1
*** DYNAMIC : Dynamic symbolic regression:1
*** TIME_SERIES_FORECASTING:Time series forecasting:1
*** MULTIVARIATE_SPLINES:Problems from "Multivariate adaptive regression splines":1
*** NON_LINEAR: Problems requiring non-linear regression with variable dimensionality:1
*** SPROTT: Sprott's chaotic flow x''' = -2.017 x'' + (x')^2 - x:1
*** Q-FUNCTION: Q-Function as featured at GECCO 2012:1
*** DOW: Dow Chemical datasets:1
*** SR_FOR_PROTEIN_FOLDING: Using symbolic regression for protein folding :1
*** FRIEDMAN_AND_BREIMAN: Synthetic symbolic regression problems proposed by Friedman, used by Breiman in work on bagging predictors:1
** CLASSIFICATION ::
*** UCI : UCI :2
*** KDD : KDD :2
*** BIOINFORMATICS : bioinformatics classification:2
** PTSP : Physical travelling salesman problem :1
** BOOLEAN : Better Boolean problems including multi-out parallel multiplier :1
** SIGNAL_PROCESSING : signal processing such as image processing :1
** REAL_VALUED_OPTIMISATION : Real-valued optimisation :1
** GAMES : Games such as chess, checkers, pacman :1
** PRIMES : Miller's prime number prediction :1
** TRUE_PROGRAMMING : True programming :2
** SYNTHETIC : Synthetic problems e.g. tree-shape, unique known solution:1
** OTHER_DISCIPLINES : Copy problems from other disciplines :
*** KDD: KDD:2
*** RL_GLUE ::1
*** FROM_NEURAL_NETWORKS: some well-known problems from the field of neural networks :1
** META : miscellaneous comments, desirable/undesirable features, not problem-specific:
*** BLACKLIST : avoid UCI and Proben because many have been solved :1
*** ALL_AS_APPROPRIATE : all listed problems could be used :1
*** AVOID_CHERRYPICKING : avoid authors cherrypicking by defining subsets of problems. Helps avoid time-consuming running on all problems :1
*** HUMAN_COMPETITIVE : grand challenges, human competitive results :1
*** MODULARITY : problems which require modularity :1
** NO_RESPONSE: Effectively no response:2

######################################################################
Question 23 Can you suggest any existing benchmark problems you think SHOULD NOT be part of a benchmark suite? Please give reasons if possible. Please supply enough information to precisely identify the problems.
######################################################################
* PROBLEM_CHARACTERISTICS:Characteristics that should be avoided:
** TOY_PROBLEMS:Toy/Trivial Problems:4
** KOZA:Problems from Koza's books:6
** OLD:Problems solved more than 10 years ago:1
** PROBLEMS_TAILORED:Exclude problems tailored to GP:1
* PROBLEM_TYPES:Problem types that should be avoided:
** BOOLEAN:Boolean logic:1
** CONTROL:Control Problems:1
** SYMBOLIC_REGRESSION::2
* SPECIFIC_PROBLEMS
** ORAL_BIOAVAILABILITY:Oral bioavailability dataset:1
** LAWNMOWER::4
** ANT::3
** QUARTIC::3
** MULTIPLEXER::1
** PARITY::3
** CART:Cart centering:1
** TWO_BOX::1
** DISPATCH_RULES::1
** ROYAL_TREE::1
* META:Meta-information, ideas and opinions:
** KEEP_AS_APPROPRIATE:41,51:
** KEEP_EASY_FOR_REF_ONLY:Retain easy problems just for reference not publication:1
** POLYNOMIAL_TIME:Anything solvable by a simple polynomial algorithm, brute force, hill-climber:1
** ANY:All can be worth keeping:1
** MAX:The more the merrier, if appropriate for purpose:1
** RETAIN_OLD:Keep old examples, for sanity checks etc. but not for use in papers:1
** PISZCZ:Refer to Alan Piszcz's Papers:1

######################################################################
Question 26 For how many years have you worked in or studied GP?
######################################################################
* 0-5 Years::34
* 6-10 Years::16
* 11-15 Years::14
* 16-20 Years::10

######################################################################
Question 28 Where did you hear about this questionnaire?
######################################################################
* COLLEAGUES::
** SUPERVISOR:From supervisor:1
** COLLEAGUE:Heard from Colleague (not specified):3
** EMAIL:Email (Source not specified):2
* RESEARCH::
** PHD:From PhD Research:1
** GOOGLE:Google:2
* OFFICIAL::
** ORGANISER_EMAIL:Email from survey organisers:3
** OTHER_CONFERENCE:Attendance at another conference:1
** TWITTER:From Twitter Tag #gecco2012:1

######################################################################
Question 29 Please give any other comments you wish to make.
######################################################################
* OTHER_COMMENTS
** POSITIVITY : generally positive comments, good discussion, good luck:8
** NEGATIVITY : generally negative comments, warnings that things could go wrong :4
** FOCUS_ON_RIGOUR : It would be better to focus on and enforce experimental rigour, statistics, openness, academic honesty:5
** FIELD_IS_UNDIRECTED: Much work in GP is not progressing, just an exercise in comparing one technique to another:1
** TEACHING_TO_THE_TEST : Danger that researchers will focus on benchmarks, losing sight of other issues:1
** REAL_WORLD_PERFORMANCE : Making benchmarks relate to real-world performance is key, and is difficult:2
** BEST_PRACTICE : Existence of a benchmark suite would reassure new practitioners :1
** CROSS_PAPER_COMPARISON : Comparing results across papers is key, and is difficult:2
** CODE_QUALITY_STANDARDS : Code quality is a concern, if creating a repository :1
** CHERRYPICKING: Authors currently cherry-pick datasets:1
** WITHIN_TECHNIQUE: Benchmarks are also useful for tweaking/parameter sweeping of a single technique, not just between techniques:1
** SUGGESTED_PROBLEMS_AND_TYPES::
*** REAL_WORLD : Real world problems:3
*** DIVERSITY : Diverse problems :1
*** NON_GP_REPRESENTATION_INDEPENDENT : Problems which are from outside GP and/or are representation independent:4
*** FLEXIBLE: Be flexible in required specification detail:1
*** CONTROL_POLE_BALANCING : Control problems like pole balancing:1
*** TRUE_PROGRAMMING : True automatic programming:1
*** SIMPLICITY: Benchmarks should be simple to implement and understand:1
*** FAST : Fast to execute:1
*** DIFFICULT : Benchmark suite should cause some techniques to fail, not just easy toy problems:2
*** TUNABILITY : At least some benchmarks should be tunable:2
*** THEORETICAL : Theoretical and synthetic problems have value even if they look like toy problems:2
** USE_CASES : A benchmark suite should recognise different use-cases for problems, e.g. sanity checking, verifying code, publishing... :1
** CLARIFICATIONS : answers clarifying previous multiple-choice answers :4
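
# The short Python sketch below is illustrative only: it shows one
# possible way to read the tree lines in this file into (level,
# codeword, definition, count) tuples, following the format described
# in the header comment at the top of the file. The regular expression
# and the names LINE_RE and parse_tree_line are assumptions made for
# illustration, not part of the coding process. Grouping nodes written
# without any colon (e.g. "* STATISTICS") are simply skipped.

import re
import sys

# Level = number of leading asterisks; the codeword runs up to the first
# colon; the definition sits between the first and last colon; the count
# (if present) follows the last colon.
LINE_RE = re.compile(r"^(\*+)\s*([^:]+?)\s*:(.*?):\s*(\d*)\s*$")

def parse_tree_line(line):
    """Return (level, codeword, definition, count) for a tree line, else None."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # comment, separator, question text, or colon-less node
    stars, codeword, definition, count = m.groups()
    return (len(stars), codeword, definition.strip(),
            int(count) if count else None)

if __name__ == "__main__":
    # Usage (hypothetical script name): python parse_trees.py <path to this file>
    with open(sys.argv[1]) as f:
        for raw in f:
            parsed = parse_tree_line(raw.rstrip("\n"))
            if parsed is not None:
                print(parsed)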