Population Evaluator

The following task is the second step on our path to make a system, which automatically creates programs for a robot in a maze via genetic programming. At the end, the programs will be created from example pairs of the state of the maze before and after the execution of the program. The best program is the shortest and quickest program that can transform any input maze to the desired output maze.

The basic law driving the evolution is the higher chance of survival of the best individuals. The individuals in our case are the programs for the robot simulator. Your second assignment is to implement a function, which will choose the best programs from a given population and sort them according to their quality.

Input

(evaluate <prgs> <pairs> <threshold> <stack_size>) where

<prgs>
is a list of programs for the robot (as in Task 1). Each program includes a procedure named "start", which is the initial expression for the simulation;
<pairs>
is a list of pairs of states, including the position and the orientation of the robot;
<threshold>
is lower bounds on the quality of the program in order to appear in the output;
<stack_size>
is the limit on the robot simulator stack size (see Task 1).

Output

A sorted list of pairs ((<value> <program>) ... (<value> <program>)) including only the programs, which made it under the threshold.

The value of a program has four components that are used in lexicographical order for the final ordering. The components in the decreasing order of importance are the following:

The final value of a program is a list of four numbers, each representing the sum of the corresponding component above (the length of the program is of course computed only once) over all the input pairs of states.

The role of the threshold is to remove obviously wrong programs and to make the evaluation more efficient. As soon as it is clear that an individual will not make it to the output set, the evaluator should stop evaluating (e.g. simulating) the individual in order not to waste computational resources. The threshold for each of the fitness function components is applied separately. It means that a program which has any of the components of the fitness function higher than the threshold will not appear in the output. Note that the threshold is applied to the sum of the evaluations for all pair not to each evaluation separately.

Example:

Consider the following part of Scheme program:
(define prgs
'(
   ( 
      (procedure start
         (turn-right (if wall? (turn-left 
             (if wall? (turn-left (if wall? turn-left step)) step)) step)
                 put-mark start )
      )   
      (procedure turn-right (turn-left turn-left turn-left turn-left turn-left))
  )
  (
      (procedure start  (put-mark (if wall? turn-left step) start))
  )
  (
      (procedure start (step step step put-mark))
  )
)
)


(define pairs
'(
  (
   (((w w w w w w) 
      (w 0 w 0 w w) 
     (w 1 w 0 0 w) 
      (w 1 0 0 w w) 
     (w w w w w w)) 
     (1 3) southwest)

   (((w w w w w w) 
      (w 0 w 0 w w) 
     (w 0 w 0 0 w) 
      (w 0 0 0 w w) 
     (w w w w w w)) 
     (1 1) northeast)
   )
   (
   (((w w w w w w) 
      (w 0 w 0 w w) 
     (w 0 w 2 0 w) 
      (w 1 3 0 w w) 
     (w w w w w w)) 
     (3 3) northwest)

   (((w w w w w w) 
      (w 0 w 0 w w) 
     (w 0 w 0 0 w) 
      (w 0 0 0 w w) 
     (w w w w w w)) 
     (1 1) northeast)
  ))
 )

Now we call:
>  (evaluate prgs pairs '(20 20 20 20) 5)
And we obtain (according to the task specification):
(
((8 7 5 1) ((procedure start (step step step put-mark)))) 
((18 8 6 20) ((procedure start (put-mark (if wall? turn-left step) start))))
)

Example 2, Example 3, Example 4, Example 5, Example 6