Article summary
Back in March of 2012, I wrote some code that disturbed me immensely. The experience was profound and a bit frightening.
It was a code generator — specifically, it was a program that could emit random abstract syntax trees for a language that a friend and I are writing. Our idea was to emit random ASTs from our types and, in turn, convert those random ASTs into pretty-printed code. With this randomly generated code, we could test to ensure our parser was adequately able to parse everything we expected.
The approach worked great. We used it to verify a subset of our in-the-works parser. Specifically, we used it to test the correct operation of the parsers responsible for interpreting the textual representation of types.
While I’m a huge fan of this approach and will hopefully have more material to write about concerning it at a later time, I’m far more interested in what happened in my mind while trying to get the random-AST-generator to work.
Realization
I was unable to write this tool quickly. In theory, it was simple. I used the Arbitrary typeclass in the QuickCheck library to define how a random AST could be generated. While the program compiled and ran, it would never terminate. I couldn’t figure out why. “I’m not missing anything,” I thought.
Being unable to figure out what was happening, I turned to Job and asked for some help. “Job, can you see what I’ve done wrong?” He thoughtfully looked at my code and shared my confusion for a second — until suddenly, the still-running program exhausted the available memory on the system and crashed.
It struck us both at the same time. The program was generating enormous (but finite, given enough RAM) structures.
“It’s generating random types that cannot fit into system memory!” I exclaimed. This was amazing. I had never even considered that such types could exist before. Normally a type is a small and simple statement. Even large ones normally only take several lines.
These were types that managed to exhaust hundreds of megabytes of RAM. We tweaked the parameters of the system so that the random generation selected fixed types more frequently than the parametric types. This change caused the system to halt much more quickly. What it output still shocked me. Here, see for yourself.
That looks like gunk, and it is! But it’s a valid type signature that our AST could represent. It took me a while to recognize it as such. I had to run the program several times before I finally recognized the code as valid.
Reflection
After I got over how cool it was that this technique was going to be useful, I got uncomfortable. “There’s a giant space of programs that this computer can emit and consume that I have no way of ever understanding,” I thought. That’s cool, but it’s also scary. I made something that caused me to feel deeply insignificant.
Programming became scary. The potential of the machine suddenly felt limitless.
After understanding why these huge trees were emitted, I recognized it as an obvious property of such trees that’s usually masked by how we, as humans, normally use them. Even though I could have written that randomly generated code, I didn’t. I wouldn’t. No one would. This experience tangibly showed me the stuff of abstraction and has since made me question how it is used.
The programs that were generated were correct; that is, they would compile and run without encountering errors. Once this property is achieved, the rest (it seems to me) is entirely human. Most of the work we put into these machines is for our own benefit. I fear that we’ve been holding this technology back by depending too much on our understanding of each specific piece of implementation. Here, go watch this presentation from StrangeLoop 2012. By the end of it, the presenters, Daniel P. Friedman and William E. Byrd, are using a program to emit other programs that have the property of being quines). Do they care how the quines work? No! They only care that they are correct and have the desired properties. They only care that they are quines.
That perspective, I think, is the future of programming.
Fascinating article. In your case, I can see that you were only concerned that you got types, and that the quine guys only cared about getting quines.
If I am interested in correct payroll calculation or my car only shifting into gears that won’t explode the engine, I’m a bit concerned about dealing with programs that no human can understand.
Fascinating in any case. Thanks.
I’m concerned about the same things. My first line of work is normally in embedded software. What that little experience of mine showed me was that there is, perhaps, a way to convince the machine to show me the program I want rather than attempt to write it myself.
In this case, I only cared that they were types. The quine guys only cared that they were quines.
I don’t think it’s unrealistic to eventually create programs that can emit a human-verifiable program that is an RS-232 driver with a 16 byte buffer suitable for a real-time environment.
Just not yet…
Though, let’s not forget that human-verifiable is a separate requirement from correct.
Human-readable vs. correct is the challenge that assemblers, compilers, code generators for high level languages (such as UML) have had for a while. Typically the output of such applications can be read if necessary (they usually have a pattern and style), but it is not going to be as easy as reading hand-written code.
But at some point, it doesn’t matter. All that matters is that the output does what the input stated. In your case you wanted types, and you got types. Here’s the key part of course, you never stated that you wanted the types to be useable on your machine.
This is the key to programming regardless of the language being used, being able to fully constrain the output so that it behaves as desired in normal circumstances, and it behaves correctly in all circumstances, even if those circumstances were not explicitly specified.
At the moment we use software engineers as the initial translator between the desired program
and the output, the final translator is usually a compiler of some sort. Because this translation is fallible we also employ a variety of validation steps (such as testing, reviews, or perhaps different development methodologies) to ensure that the output program works as desired.
As software development evolves I would expect and hope this translation becomes less manual.
I don’t think it’s unrealistic to eventually create programs that can emit a human-verifiable program that is an RS-232 driver with a 16 byte buffer suitable for a real-time environment.
Aren’t we already doing that simply by using higher level languages to generate code in lower level languages? You have to describe what you want somehow… Or, are you just suggesting declarative programming of some sort?
My thought is that we’d describe some criteria that the program has to meet, and the program generator begins to emit all possible (likely infinite in number) programs capable of meeting those criteria. It’s not the same as using higher level constructs to inform the machine how to perform its task.
I looked at the “gunk” link and it looked for all the world like you made an ascii art generator.
That’s what confused me at first as well. The whole thing is actually fairly deterministic. Here’s some examples:
Using this, we can interpret a short snippet from the ASCII art type:
This, firstly, is a function. It takes 6 parameters and returns a polymorphic type (in this case, named “anuE30Hi5”). Of those 6 parameters, two are fixed types, and the rest are polymorphic.
I think the author is reading too much into what actually happened. He wrote a program to generate a logical process. The program was only doing what it was told. Since it was a random generator it randomly cascaded to generate what it was told to do. If I wrote a program to take 3 colors and randomly mix those three colors then theoretically it should come up with infinite color combination. Should we be amazed at the hues it generates? It was only doing what it was told to do, nothing more. Just because it generated a type that exceeded the memory does not make the type it generated useful. Programming is a tool, it can not think for itself unless it is programmed to make decisions, and those decisions will be predictable because it was programmed to do it a specific way. Random is a predicable way.
For my purposes, fuzz testing, this outcome was exactly what I wanted. What caught me off guard was a representation of a type I hadn’t expected could even exist.
Random is predictable, but not always intuitive. At least, not to me.
I personally think the author had the right idea towards the end, but didn’t perhaps phrase / look into it in enough depth.
This experiment shows the vast potential, but it is still random. If such an approach of random code generation were coupled with a structured genetic programming model whereby program goals could be stated in a grammar-bound way (i.e stating you want it to calculate payroll) then conceivably over a period of time you would eventually hit upon a program that does just that.
I think programming in its current form will stick around for a time to come, languages and paradigms will shift though.
Unless computers get so fast that it can do all the permutations and evolutionary selection in a short space of time, then I imagine the approach will be limited to simpler tasks, whereby programmers will have the computer create some of the easier code for them.
Fascinating prospects for the future though.
Genetic programming isn’t really what I had in mind. The StrangeLoop talk I reference is about logic programming. There’s a more fine-tuned way to approach these sorts of problems than recombining digital bits of DNA.
I probably should have made that bit more clear. While I find GP very interesting (especially the digital evolution work done by Dr. Ofria at MSU), I don’t think it’s quite what I’d want to use when building device drivers.
A logic programming system seems to be far more useful for this sort of effort. Fascinating prospects, indeed. :)
Seems to me that programs writing other programs with no “useful” purpose that consume all available resources is something akin to digital cancer.
Not necessarily… The program runs and meets all it’s requirements, it is only that the supposed “author” doesn’t understand HOW it achieves it’s purpose. Not that there is none, but rather that the code to achieve it is not human readable.
This also doesn’t mean complexity, it could mean it’s written poorly by the computer.
Combine this technique with some AI and you have the birth of “The Matrix” :-)
Wow!
A structure which is an array of size[129871239847904872309847329087432984] ?
Who could have though it is possible?? Except for everyone.
And you didn’t even get to the Halting problem.
2nd-year Computer Science, right?
I too worry about programming that I can’t understand, only I usually call it a bug.
I think what the author was impacted by was the –space– of all possible compliant programs. Normally we don’t see this space, just as normally we don’t see air. But when a little fog comes in, then we go “oh, that looks so beautiful,” because the fog lets us “see the air” and we become aware of the space. I think the random types were the fog (not in the muddling sense, but as just noted). Although a “few 100 megabytes” is –huge– in its own way, the space of possible types is combinatoric on this. In fact, the iceberg image pales by comparison. This is kernel to a central issue in quantifying evolution: until one can approxamate the percentage of “usable” deviations from a given state compared to the total number of possible deviations it makes no sense to apply a truth value to the possibility of usable (survivable) new organisms.
Anyone whose ever supported VB code knows it doesn’t take a machine to generate code that no human can understand.
I laughed out loud at this.