Dream debugging

At the weekend I watched a Tomorrow Corporation Tech Demo. That’s the games company that developed World of Goo and Little Inferno. They show off some of the tools they use for development and debugging. The development tools are nice, but the debugging tool is amazing and I want it.

The basics

It has the basics you’d expect from an integrated editor and debugger:

  • See the code.
  • Run the code.
  • Set breakpoints.
  • Step through the code.
  • Inspect variables.

None of that is surprising. It does have a bonus that you don’t always find:

  • Hot loading of data and code assets.

The dream

We are used to stepping through code line by line, following along with what’s happening. Sometimes you see the problem as it happens, which is great. However, sometimes you miss the problem and the first you know about it is an assert, a crash or the variables suddenly looking all wonky. In that case we have to restart, then run and/or step through the code trying to get back there. Maybe it’s at the beginning of the code but often it’s not. Getting back to the interesting point can be difficult, however careful you were with breakpoints. Not in this debugger. You just step backwards through the code. You effectively reverse time in the debugger and step back to a previous state. The problem you just missed hasn’t happened yet. Step back again if you want, as much as you want. You can re-examine the code and step forwards. Go back and forth as many times as you like.

Profiling code to find a bottleneck can be awkward. You might need to rebuild or run it in a special mode. The added instrumentation can make things run slowly, so getting to the point of interest can take longer. If you’re lucky you can disable profiling until you get there. Once you’re there you collect data, stop the program and start trawling through the figures. Not in this debugger. Profiling is not included by default but it’s easy to hot load the profiler. In the middle of a normal session you just recompile the full executable and start collecting data, no interruptions. Visualisations of the data are available immediately and you can drill down to the culprit function. Not only can you jump to the function responsible in the editor, you can also jump backwards in time to when this profile data was collected. It lets you investigate what was causing the slowdown in the exact situation where it was slow. Update the code and continue profiling to see if it’s fixed. When you’ve finished you can remove the profiler and continue the debugging session.

If you’re working in a team, bug reports may come in from anyone. Whoever they come from, bug reports come in many flavours. You might get one that carefully describes the exact steps needed to create the problem. That could be a 15 step process involving specific files, menu items to pick and buttons to press. Repeating all that and recreating the problem can be hard. If you can’t recreate it, are the steps wrong or did you just do them wrong? On the other hand, you may just get told that it broke when they loaded a file. Maybe it’s a specific file, maybe it’s any file, but the bug report doesn’t say. It works with the first file you try, it works with the second, so I guess you need to chase up the reporter. Not with this debugger. Every session saves enough information to completely repeat the session. Every bug report can link to the exact occurrence of the problem. Just click a link, scrub through a timeline of the program’s execution and see the problem. Debug the problem. See exactly what line and what variable is causing the problem. Update the code and see if it’s fixed.

I think that all sounds like a dream debugging experience.

The technology

I don’t know how they’ve done this, and it seems that this video is the main source of information available. The developers have complete control over the tech stack. They have their own programming language, compiler front end and back end, editor, debugger, build system and game engine, which is impressive. I don’t know if such complete control is required for all these features but it probably helps smooth off most of the rough spots. With just the video to go on, the rest of this section has some supposition on my part.

There are probably a few things going on to make this possible:

  • We know they save out a session file for each run and that it’s not too big. This needs to include the version of the source and assets that were used. That could be an existing source control version but it must also be able to include local changes, or maybe all local changes also go to the server. If someone applies hot fixes during a session those will also have to be stored. All user input would have to be recorded and timestamped to an exact frame.
  • We know their system is deterministic. That means given the same setup and same inputs it will always proceed in the same way. For this to work everything has to be deterministic. If one thing can vary from run to run then it can affect everything else. Any random number generation will have to come from the same seed. Rendering probably has to be at a fixed frame rate or operate separately from the rest of the system.
  • We know systems must be coupled. To use small session files the game engine and debugger must know about source control. To support dynamic graphics during debugging the debugger must know about the engine. There are probably a lot of other things going on.
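
To make the first two points concrete, here is a minimal sketch of recording and replaying a deterministic session. It is my own toy illustration, not Tomorrow Corporation’s implementation. The names Session, Game, record and replay are all made up, and I’m assuming the session only needs a random seed plus the inputs keyed by frame number.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Session:
    """Hypothetical session file: just the seed and the inputs per frame."""
    seed: int
    inputs: Dict[int, str] = field(default_factory=dict)  # frame -> input


class Game:
    """Toy simulation whose state depends only on the seed and the inputs."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)  # private, seeded generator
        self.frame = 0
        self.x = 0

    def step(self, user_input: Optional[str]) -> None:
        if user_input == "left":
            self.x -= 1
        elif user_input == "right":
            self.x += 1
        self.x += self.rng.choice((-1, 0, 1))  # seeded "random" jitter
        self.frame += 1


def run(session: Session, frames: int) -> int:
    """Run the game for a number of frames, feeding in the stored inputs."""
    game = Game(session.seed)
    for _ in range(frames):
        game.step(session.inputs.get(game.frame))
    return game.x


def record(seed: int, inputs: Dict[int, str], frames: int) -> Tuple[Session, int]:
    """Play a session and return the data needed to repeat it."""
    session = Session(seed, dict(inputs))
    return session, run(session, frames)


def replay(session: Session, frames: int) -> int:
    """Repeat a recorded session; the result should match the original run."""
    return run(session, frames)
```

Because nothing in the toy game reads the clock, the global random generator or any other outside state, replaying the stored session for the same number of frames ends in the same state as the original run. A real engine has many more sources of variation to pin down.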

None of this is completely unknown. Searching finds multiple mentions, e.g. “GDB and Reverse Debugging” and “Step-back while debugging with IntelliTrace”.

I’ve even heard of it before in combination with virtual machines. That sounded hard but not too hard. Some operations are perfectly reversible, say, adding two numbers, but other operations are not, say, dividing two numbers. Once you’ve divided your numbers some information has just gone. Fortunately you can have complete control over a virtual machine. Any information that would be lost can be stored and retrieved when necessary. Alternatively it would be possible to take periodic snapshots of the machine’s memory. Then you could fake a single “step back” by jumping back to the previous snapshot then quickly stepping forwards to the right instruction. The more snapshots you take the faster you can fake it but the more data you need to store. Maybe something clever could be done with virtual memory so that you only needed to consider pages that had been changed.
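
The snapshot idea can be sketched in a few lines. This is a toy of my own, assuming a made-up Machine with two registers and a straight-line program with no jumps, so the program counter doubles as the number of instructions executed. ReversibleDebugger and its methods are hypothetical names, not any real debugger’s API.

```python
import copy


class Machine:
    """Toy virtual machine: a program counter and a couple of registers."""

    def __init__(self, program):
        self.program = program  # list of (op, register, value) tuples
        self.pc = 0             # also the count of instructions executed
        self.regs = {"a": 0, "b": 0}

    def step(self):
        op, reg, value = self.program[self.pc]
        if op == "add":
            self.regs[reg] += value
        elif op == "div":
            self.regs[reg] //= value  # lossy: the remainder is thrown away
        self.pc += 1


class ReversibleDebugger:
    """Fakes "step back" by restoring a snapshot and re-running forwards."""

    def __init__(self, machine, snapshot_every=4):
        self.machine = machine
        self.snapshot_every = snapshot_every
        self.snapshots = {0: self._capture()}  # pc -> saved state

    def _capture(self):
        return (self.machine.pc, copy.deepcopy(self.machine.regs))

    def _restore(self, snapshot):
        pc, regs = snapshot
        self.machine.pc = pc
        self.machine.regs = copy.deepcopy(regs)

    def step_forward(self):
        self.machine.step()
        pc = self.machine.pc
        if pc % self.snapshot_every == 0 and pc not in self.snapshots:
            self.snapshots[pc] = self._capture()

    def step_back(self):
        target = self.machine.pc - 1
        if target < 0:
            return  # already at the start
        # Jump to the latest snapshot at or before the target...
        base = max(pc for pc in self.snapshots if pc <= target)
        self._restore(self.snapshots[base])
        # ...then quickly step forwards to the right instruction.
        while self.machine.pc < target:
            self.machine.step()
```

Stepping back over a division recovers the value from before the remainder was discarded, even though the division itself cannot be undone. The snapshot_every parameter is the trade-off from the paragraph above: a smaller interval means less re-execution per step back but more stored state.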

Are they doing this the easy way with a virtual machine or are they running on real hardware? No idea.

In the end

While I’d love to be able to use something like this, I’m not sure when it will arrive. The GDB and Microsoft articles are both many years old but this video is the first practical application of the idea that I’ve seen. It could make a huge difference in day to day productivity and I definitely want it.

P.S.

There is a bit more detail in the comments at Tomorrow Corporation, specifically in reply to Joseph Garvin’s questions. They capture the game state by simply copying the game’s heap in between frames. This happens every 2 minutes to facilitate the timeline, with more fine grained but temporary snapshots taken more frequently. That’s low level enough that random number generation and input buffers may be captured without additional effort. It makes me think that there is some fraction of the game engine that is outside this full debugging framework but it still sounds great.
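
As a rough illustration of that two-tier scheme, here is a sketch that keeps sparse permanent snapshots plus a small rolling window of recent ones. It is guesswork on my part: SnapshotStore is a made-up name, the heap is stood in for by a bytearray, and the intervals are counted in frames with arbitrary small numbers rather than the real 2 minutes.

```python
from collections import deque


class SnapshotStore:
    """Sparse permanent keyframes plus a rolling window of recent snapshots."""

    def __init__(self, keyframe_interval, recent_capacity):
        self.keyframe_interval = keyframe_interval
        self.keyframes = {}  # frame -> heap copy, kept for the whole session
        # (frame, heap copy) pairs; the oldest is dropped when full.
        self.recent = deque(maxlen=recent_capacity)

    def capture(self, frame, heap):
        """Copy the heap between frames and file it in the right tier(s)."""
        heap_copy = bytes(heap)
        if frame % self.keyframe_interval == 0:
            self.keyframes[frame] = heap_copy
        self.recent.append((frame, heap_copy))

    def nearest(self, frame):
        """Return (frame, heap copy) for the latest snapshot at or before frame."""
        available = dict(self.keyframes)
        available.update(dict(self.recent))
        best = max(f for f in available if f <= frame)
        return best, available[best]
```

Scrubbing to a recent frame finds a nearby temporary snapshot, while scrubbing far back falls through to the last permanent one and would need more re-execution to reach the exact frame.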

