even-quicker-push.m4v (29.4 MB)
Also available on Youtube.
even-quicker-push.m4v (29.4 MB)
Also available on Youtube.
Transcript of An Even Quicker Introduction to the Push Programming Language, by Lee Spector
I am Lee Spector, and this is an even quicker introduction to the Push programming language.
Push is a language for programs that evolve by random variation and selection, not for programs that humans write directly.
Push is unusually simple in terms of syntax, which makes it easy to randomly generate and change programs. The surprising and useful thing is that Push’s simplicity doesn’t prevent it from expressing programs that use complex data and control structures.
The key to this is typed data stacks. All Push instructions take their inputs from stacks, and push their outputs to stacks. In most languages, the way data flows depends on what’s next to what in the text of the program. But in Push data always flows through the stacks. There’s a stack for each data type, and instructions always go to stacks of the right types so they always get appropriate inputs, no matter what they’re next to. If an instruction needs something and the stack for it is empty, it just does nothing.
More specifically, the entire syntax of the language is just that Push programs are made of instructions, literals like numbers or strings, and parentheses that group blocks of code, all of which can be in any order as long as the parentheses are balanced. As long as that’s true, it’s a valid program that can be run and produce some kind of result, which may not be what you want but it won’t crash anything.
We use the name PushGP for any genetic programming system that evolves Push programs. PushGP can be used for pretty much anything you would use any genetic programming system for, plus problems that call for rich data and control structures.
A few of the areas in which PushGP has produced significant results are general software synthesis and the discovery of algorithms for quantum computers.
How do you get Push?
The ideas behind the language are simple and if you’re a programmer writing an evolutionary computing system then you may just want to write a simple version of your own as part of your system. An interpreter with enough instructions to be useful can probably be written in a few hundred lines of code in any reasonable high-level language.
That said, Push and PushGP systems have been developed in a variety of host languages, and some of these are publicly available.
See pushlanguage.org for more on these.
Push is expressive, compared to other program representations used for genetic programming, because it allows for the use of multiple data types, arbitrary control structures, including novel forms of conditional execution, iteration, and recursion, and the completion of multiple tasks, through the production of multiple outputs from multiple inputs, by a single program.
In most languages, features like these come with syntax restrictions and brittleness. If you change one thing randomly then everything probably breaks. But in Push, adding, removing, or switching the order of instructions often makes no difference at all to what a program does. This supports neutral variation, which may help with evolutionary search.
Here’s a diagram of a Push interpreter, or a Push virtual machine, with data on stacks of several types. The exec stack contains code, and is central to the way the interpreter works.
Now lets look at how Push programs are executed.
When a Push program is executed it is first pushed onto the exec stack, and then, while the exec stack isn’t empty and a pre-defined step limit hasn’t yet been reached, the top item is popped and processed.
If that popped item is an instruction, then it is executed by popping the inputs that it needs, doing the instruction, and pushing outputs onto the right stacks. If the stacks don’t have everything the instruction needs, then it just does nothing.
If the popped item is a literal, like a number or a string of characters, then it is pushed onto the right stack.
Finally, if it’s a block of code in parentheses, then each item in it is pushed individually back onto the exec stack, last first, so the first item ends up on top.
Push implementations have to enforce some limits, which the user can usually set, for example on the number of steps to execute. Some implementations may also support limits on wall-clock time, on stack depths, or on data item sizes, and they may provide different ways to handle them.
Here’s how it works starting from state we saw previously. First integer_mult is popped and processed, which causes the top two integers to be popped and multiplied, and their result to be pushed back onto the integer stack.
Next, boolean_and is popped and processed, which pops the top two boolean values and pushes the result of and-ing them together. True and false is false.
Next we have a code block, which is unwrapped and its contents are pushed individually back on the exec stack.
Then 3 is pushed onto the integer stack.
Then string_dup pops and then pushes two copies of the string on top of the string stack.
Then integer_add pops the top two integers and pushes their sum.
And then the exec stack is empty so it stops. The results are whatever is on the stacks, but usually we’ll only care about the top of some particular stack or stacks, depending on what the program is supposed to do.
When we want to run a full program, in an initially empty interpreter, we push it onto the exec stack and then proceed before. Here we’ll see that 1 plus 2 is 3.
A few examples without the diagram. Here is a program using or and not on booleans, which leaves false on the boolean stack.
This one uses an integer comparison instruction, which pops two integers and pushes a boolean indicating whether the deeper one is less than or equal to the shallower one, which it is in this case, so the value left on the boolean stack is true.
Here we see how that boolean value can be used to cause the execution of one or another block of code, using exec_if. When exec_if is executed, the two code blocks after it will be under it on the exec stack. Depending on what’s on the boolean stack, one or the other will be removed, and the remaining one will be left to be executed.
In this case this leaves yes on the string stack, and 1 on the integer stack.
Here’s an example involving multiple types, with string_length producing an integer from a string, string_reverse producing a string from a string, and then string_eq taking two strings an producing a boolean, which is then used by exec_if to determine which of two integer instructions to apply to the top two integers.
Which in this case produces a result of 5.
And finally, here we see an iterative calculation of the factorial of 8, using the exec_do*range instruction to execute integer_mult in the body of a loop 8 times on a running product and the loop index.
Producing a result of 40 thousand, three hundred and twenty, which is the factorial of 8.
Here are two programs that look pretty different but do the same thing.
In the first it’s easy to see that it multiplies 2 and 3, adds 4.1 and 5.2, and computes the result of “or” on true and false.
The second does the same thing except the operations on the different types are interleaved, and a couple of instructions are included that won’t do anything because there won’t be enough inputs for them when they’re executed.
Most Push implementations include dozens or hundreds of instructions, but we’ll just look at a few here.
Here are some typical integer instructions, for example to add two integers, divide two integers, find the minimum of two integers, determine whether one integer is greater than another, read an integer from a string, and to generate a random integer.
And here are instructions for taking the exclusive or of two booleans, and for converting an integer to a boolean, which will be false unless the integer is zero.
Here are instructions for concatenating two strings, for checking whether a string contains a substring, for getting the length of a string, and for replacing one character with another in a string.
Finally, input instructions allow a program to push copies of program inputs on demand, and print instructions send output to a write-only stack that functions like a console or file.
Exec instructions implement control structures such as conditionals and loops by manipulating items on the exec stack, which will be executed later. Here are a few examples of those.
Stack manipulation instructions are also available for every type, for example to duplicate the top item on the stack, to test if a stack is empty, to test the top two items for equality, and so on.
How should we make random changes to Push programs during evolution?
One nice approach is to make the changes not to the programs themselves, but instead to what we call genomes, which are programs without parentheses. This makes it easy to perform uniform variation, meaning that everything is equally likely to be mutated, or included in a recombination.
We translate genomes into programs only when we need to execute the for testing. Aside from that the evolutionary process operates on genomes.
The translation process adds opening parentheses where instructions that use code blocks indicate they should be. For example, exec_if takes two code blocks, so one is added after the exec_if, and then another after the first code block is closed.
We’ve developed two ways to say where the closing parentheses go. One uses epigenetic markers attached to genes in the genome, saying how many closing parentheses go in each place. We call this Plush, which is Push with an L added for linear.
The other way is to allow close instructions in the genome, which turn into closing parentheses during translation. We call this Plushy, since it’s like Plush but not quite the same.
With either Plush or Plushy, unmatched closing parentheses are ignored, and extras are added at the end if necessary to close everything up.
I’ll briefly mention a few optional features that are available in some Push implementations.
Some include instructions that create associations between names or tags and data items, allowing for non-stack data storage.
Some also have instructions for creating local environments, in which non-local effects can only be produced with explicit return instructions. This involves the addition of stacks for environments and return values.
Push makes it easy to simplify programs automatically, for example by just repeatedly removing random things and putting them back whenever that breaks something. This usually shrinks evolved programs pretty dramatically, and sometimes it even makes them more general in the process.
Since Push programs can express operations on other Push programs and on genomes, they can express their own reproduction and variation processes in an evolutionary computation system. Usually they don’t, so far, but this is being studied under the name autoconstructive evolution.
In conclusion, The Push programming language supports the evolution of expressive programs that may use arbitrary data types and control structures, possibly to perform multiple tasks.
The core concepts of the language are nonetheless relatively straightforward, which means that Push interpreters, and systems that evolve Push programs, are relatively easy to write in any high level host language.
More information is available at pushlanguage.org.
Thank you.