Autosimplification is a method for automatically simplifying a Push program, invented by @lspector. For it to work, you need to be able to test a program deterministically on a set of test cases, resulting in an error vector.
To autosimplify a program, you loop a large number of times (anywhere from 50 to 10,000 iterations have been used) over the following: remove a small number (currently 1 or 2) of points (instructions or code blocks) from the program, or remove a pair of matched parentheses. If the resulting program produces the exact same error vector as the original, keep it; otherwise, revert to the previous program. You’ll notice that this is a hill-climbing algorithm that cannot backtrack.
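To make that concrete, here’s a minimal sketch of the loop in Clojure. This is not the actual Clojush implementation; `error-fn`, `remove-random-points`, and `remove-random-paren-pair` are hypothetical helpers standing in for whatever your system provides:

```clojure
;; Minimal hill-climbing autosimplification sketch (hypothetical helpers):
;; - error-fn: deterministically maps a program to its error vector
;; - remove-random-points: deletes n random points (instructions/blocks)
;; - remove-random-paren-pair: deletes one matched pair of parentheses
(defn autosimplify
  [program error-fn steps]
  (let [target (error-fn program)]
    (loop [prog program, step 0]
      (if (>= step steps)
        prog
        (let [candidate (if (< (rand) 0.8)
                          (remove-random-points prog (inc (rand-int 2)))
                          (remove-random-paren-pair prog))]
          (recur (if (= (error-fn candidate) target)
                   candidate ; same error vector: keep the smaller program
                   prog)     ; otherwise revert; no backtracking
                 (inc step)))))))
```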
So far, no one has implemented autosimplification using Plush genomes instead of Push programs, but there is no reason to think it wouldn’t work similarly.
Since the advent of Plush genomes, we have not used autosimplification as a genetic operator, because it works on Push programs and not on Plush genomes. Before that, we did use it as a genetic operator, with no evidence that it improved problem-solving performance and some evidence that it evolved smaller programs. See Frode’s student workshop paper for details.
There is evidence (see @thelmuth’s dissertation) that simplified programs are significantly more likely to generalize than unsimplified programs. That is, it is not uncommon for an evolved program to succeed on all the training cases but fail one or more of the test cases that weren’t seen during training. The simplified version, however, is often less likely to fail these unseen test cases, presumably because some weird edge stuff gets removed that didn’t interfere on the (limited) training set but would cause trouble on some larger or more complicated test case. @thelmuth plans on working on this over the summer of 2016.
As I understand the code, the current system removes either 1 or 2 instructions from a program in a simplification step. @thelmuth has commented on several occasions that removing such small numbers of instructions might be a bit restrictive, and it might be worth playing with removing several instructions as a group. In going over a program that I evolved today, I ran across a simple example where this would have been useful. I was evolving a “simple” bit of arithmetic (the kind of thing that tree-based GP would be pretty good at, actually). Looking at the simplified program, I had two things on the stack, $x$ and $y$, and if I just multiplied them together I’d have the desired answer. Instead of just integer_mult, though, I had the following:
`8 49 integer_shove integer_mult`
The 8 and 49 were completely irrelevant and in the way of finishing up the problem. The integer_shove “dealt” with that by eating the 49 and pushing the 8 down below $x$ and $y$. Then the integer_mult could do its thing and we were good.
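To make the mechanics concrete, here’s the :integer stack at each step (top of stack on the left), assuming standard Push shove semantics in which the index is popped from the integer stack first:

```clojure
;; start:               (x y)
;; after 8:             (8 x y)
;; after 49:            (49 8 x y)
;; after integer_shove: (x y 8)      ; 49 popped as the index, 8 buried at the bottom
;; after integer_mult:  ((* x y) 8)  ; the desired product, with the 8 harmlessly below
```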
The three-instruction sequence `8 49 integer_shove` was completely unnecessary and could in principle be removed by simplification, but the current simplification can’t “see” that. Removing any one of them on its own breaks the program, as does removing any pair. You’d have to remove all three, which can’t happen with the current simplification system.
I don’t have any idea how often this happens, to be honest, but my guess is that it happens often enough to warrant some digging. I could also imagine that it might make sense to let the number of points removed vary over the course of simplification, although I’m not sure whether it should go up over time or down.
It can be, and has been, used on programs evolved using autoconstruction, where it does a really nice job of stripping out most of the “baby making” code so you can focus on how the program solves the test problem. It’s less clear (as discussed elsewhere) how to use something like this to remove the problem-specific code instead, to aid in understanding the baby-making code.
Autosimplification will only guarantee that you’ll keep the behaviors you’re testing for, and it won’t tell us when it has changed something we aren’t measuring. The trouble is that we can’t make a good enough measure of reproductive behavior. For what it’s worth, a possible test of any such measure would be to check whether the reproductive behavior of the simplified programs differs from that of the originals.
So, with determinism on, if the simplified program’s progeny always had the same error vectors as the original program’s progeny, then you did it right. But I have no idea how to make a good measure.
When I was working on the Plush genome implementation for the new thing, I noticed that the genes all have :silent keys, and wondered if I might be able to do simplification by trying to maximize the number of silenced genes using a GA or equivalent, under the hard constraint of not changing the training-case scores.
I’ll make some edits to clarify the first post, which I assume is what you wanted since you made it a wiki.
This is mostly because the simplification code hasn’t changed much since before Plush genomes were even a thing. I’ve been arguing that using the :silent epigenes for simplification would be useful, especially with not-fully-hill-climbing simplification.
This is correct, looking at the code. And yes, it would sometimes be better to remove more things. One reason not to, though, is that the number of ways to remove “the right 3” instructions grows much faster than for “the right 2” or “the right 1” ($\binom{n}{3}$ vs. $\binom{n}{2}$ vs. $n$ candidate subsets), at least for large programs. Still, it would be worth a shot, probably up to ~4-5; sometimes you’ll get a few things that aren’t related but are good to remove anyway.
This is something that I hope to try this summer, when I’m going to dig into simplification more, especially as a post-hoc generalization improver (as @mcphee mentioned above). Well, to be specific, I imagine a simple non-hill-climbing-but-mostly-hill-climbing version would likely be sufficient almost all of the time, and would likely be better than what we’ve done previously.
Exactly! I figured I’d say something half-kinda-sorta-right, and someone with a clue would jump in and start fixing all the mistakes. After a few rounds of that, we’ll have a positively useful entry!
Agreed!
I definitely see your worry, but it’s also clear to me that there are more than a few places (like my example) where three things need to be removed together if they’re ever going to be removed, and I bet there are groups of 4 as well. I’m guessing there’s a sweet spot, and I’m not sure 2 is it.
Also, based on very little poking around (so ignore this), it seems that the groups often are lexically proximate in the program. So maybe the trick would be to look for three or four (nearly) adjacent instructions and try to remove them?
FWIW I think that we’ve messed with removing more than 1 or 2 things in a single step at some point. Perhaps the best thing would be to use a Gaussian distribution or something like that, so that any number of removals in a step would be possible but it’d be rare for the number to be very high.
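As a sketch of that idea (hypothetical, not existing Clojush code), the removal count could be drawn from a rounded half-Gaussian, so that 1 and 2 stay common but larger groups occasionally occur:

```clojure
;; Sample how many points to remove in one simplification step.
;; A half-Gaussian keeps small counts common and large ones rare.
(defn num-points-to-remove
  [sigma]
  (let [u1 (max (rand) 1e-12) ; avoid log of zero
        u2 (rand)
        ;; Box-Muller transform for a standard normal sample
        z  (* (Math/sqrt (* -2.0 (Math/log u1)))
              (Math/cos (* 2.0 Math/PI u2)))]
    (max 1 (Math/round (Math/abs (* sigma z))))))

;; (num-points-to-remove 1.5) => usually 1-3, occasionally more
```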
Another thing that was in earlier code was replacement of a “thing” (instruction or literal or parenthesized sub-expression) by a no-op instruction (or its equivalent). This doesn’t make the program smaller, but it removes “stuff that does stuff,” which can also be helpful for analysis, and may enable other removals. The reason that this sometimes works is that sometimes a thing needs to be there not because of what it does, but for structural reasons, e.g. because an exec_stackdepth instruction needs a certain number of things to be on the :exec stack.
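A quick sketch of what such an operator might look like; `replace-point-at` and `count-points` are hypothetical helpers, and I’m assuming an exec_noop-style no-op instruction is available:

```clojure
;; Neutralize a random point instead of deleting it, so the program
;; keeps its shape while losing "stuff that does stuff".
(defn noop-random-point
  [program]
  (replace-point-at program (rand-int (count-points program)) 'exec_noop))
```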
Yeah, I was thinking something like this, but maybe with not-so-drastic differences. It might also be good to have it vary based on the length of the current program: it might be easier to remove 4 instructions from a 1000-instruction program than from a 10-instruction program and still get the same error vector.
So here’s what I’d do (about half done already): Start with a genome you want to simplify; say it has $g$ genes. Run a simple metaheuristic (SA or GA) over binary vectors of length $g$, in which a true in position $i$ means “silence the gene in position $i$” (even if already silent), and a false means don’t silence it. Subject to the hard constraint that the performance vector must not change, of course.
There would be some easy stuff: all the positions in which the genes in the original genome are already silent would be “free points”. But after that it’s just the same old nonlinear integer programming that we know and love.
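Here’s a minimal sketch of a hill-climbing version of that search, under the stated hard constraint. `silence-genes` (apply a mask to a genome, where false leaves a gene as it was), `genome->program`, and `error-fn` are hypothetical helpers:

```clojure
;; Simplification as search over silencing masks (sketch).
(defn simplify-by-silencing
  [genome error-fn steps]
  (let [g      (count genome)
        errors (fn [mask]
                 (error-fn (genome->program (silence-genes genome mask))))
        target (errors (vec (repeat g false)))] ; all-false mask = original genome
    (loop [mask (vec (repeat g false)), step 0]
      (if (>= step steps)
        (silence-genes genome mask)
        ;; flip 1 or 2 random mask bits; keep the candidate only if the
        ;; error vector is unchanged and we haven't silenced fewer genes
        (let [candidate (reduce #(update %1 %2 not) mask
                                (repeatedly (inc (rand-int 2)) #(rand-int g)))]
          (recur (if (and (= (errors candidate) target)
                          (>= (count (filter true? candidate))
                              (count (filter true? mask))))
                   candidate
                   mask)
                 (inc step)))))))
```

An SA or GA version would just swap out the acceptance rule and population handling; the hard constraint stays the same.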