Introducing Mr. Crowbar! Untold minutes of reverse engineering fun for ages 26-29!

22 October 2015

Mr. Crowbar logo

I believe that one of the best ways of allowing creativity to overcome limited technical skill is to build upon stuff that already exists. Take game modding; just try and count the number of successful games that started out as a modification of an existing one. Would these developers have gotten anywhere near as far with their idea if they couldn't build on an existing engine and assets, not to mention community support? WOULD THEY?

Indeed, game modding is an important springboard for new developers to cut their teeth in a familiar setting and form a good understanding of how games work in the real world. There is, however, one rather large barrier to entry for people who want to start playing with the innards of their favourite game: tools. Occasionally the engine author will encourage custom content and give their developer tools out for free. But most of the time games are 100% not designed with modding in mind, leaving it up to a tiny number of skilled reverse engineers to write their own tools and share them with the community.

And these tools? Sometimes they'll let you edit stuff, sometimes only view. Sometimes they will be open source, sometimes not. Sometimes they'll be actively maintained, sometimes a rando will post an EXE on a forum then vanish off the face of the planet. Sometimes there won't even be a tool; just some old C code, or a meticulously-put-together yet scabby text file from the bulletin board age describing the file format. There's no standard!

It occurred to me that most of the tools that are out there share a ton of common ground, in that they are mostly just fancy GUIs that implement a CRUD (Create Read Update Delete) interface. Sometimes it's just R! Well, what if you didn't have a fancy GUI per se, but instead had a nice understandable CRUD model bolted onto a scripting language with the option of a fancy GUI later on?

So, having gotten this dumb idea after unearthing some horribly-written tools I made in high school, I reached for my favourite language for messy hacking times (Python), put together a test model targeted at a game that I mostly knew the file formats for (the 1991 classic "Lemmings" by DMA Design), and named it Mr. Crowbar.

The object model steals a lot of ideas from Django and Schematics; each Block class is defined as a set of Field objects, but thanks to the mindbending power of metaclasses the user can get and set them at runtime as normal types like ints and strings, with all of that tedious validation and type-safety hidden away. In Mr. Crowbar, each Block class is strongly typed and defines the equivalent of a C struct, and gives you importing and exporting stuff as bytes for free. For non-trivial storage (e.g. compression algorithms) you can write manual import/export routines and wrap them in a Transform class, which can be used as a preprocessor for loading bytes into Blocks. Finally there's a Loader class, which does most of the spadework in loading the full spread of files from a game and linking up cross-references.
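To give a rough feel for the metaclass trick, here is a minimal standalone sketch. It is not Mr. Crowbar's actual code: the Field, BlockMeta, Block and Point classes below are stand-ins invented for illustration, and they lean on the standard struct module for the byte conversion. The metaclass scoops the Field definitions out of the class body, so that instances end up holding plain ints under the same names.

```python
import struct


class Field:
    """Describes where a value lives in the raw bytes and how to convert it."""

    def __init__(self, fmt, offset):
        self.fmt = fmt        # struct format string, e.g. '>H' for big-endian uint16
        self.offset = offset  # byte offset from the start of the block

    def load(self, buffer):
        return struct.unpack_from(self.fmt, buffer, self.offset)[0]

    def save(self, buffer, value):
        # struct.pack_into raises struct.error if the value doesn't fit the type
        struct.pack_into(self.fmt, buffer, self.offset, value)


class BlockMeta(type):
    """Collects Field attributes out of the class body at class-creation time."""

    def __new__(mcs, name, bases, namespace):
        fields = {}
        for base in bases:
            fields.update(getattr(base, '_fields', {}))
        for key, value in list(namespace.items()):
            if isinstance(value, Field):
                # remove the Field so the name is free for a plain instance value
                fields[key] = namespace.pop(key)
        namespace['_fields'] = fields
        return super().__new__(mcs, name, bases, namespace)


class Block(metaclass=BlockMeta):
    _block_size = 0

    def __init__(self, raw_buffer):
        assert len(raw_buffer) >= self._block_size
        for name, field in self._fields.items():
            setattr(self, name, field.load(raw_buffer))

    def export_data(self):
        output = bytearray(self._block_size)
        for name, field in self._fields.items():
            field.save(output, getattr(self, name))
        return bytes(output)


# Hypothetical 3-byte structure, purely for demonstration
class Point(Block):
    _block_size = 3
    x = Field('>H', 0x00)
    y = Field('B', 0x02)


p = Point(b'\x01\x02\x03')
print(p.x, p.y)          # plain ints, not Field objects
p.y = 200
print(p.export_data())
```

In this sketch the type checking happens lazily at export time (struct refuses values that don't fit); a fuller version would presumably validate on assignment as well.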

Here's an example of a Block object:

class Terrain( mrc.Block ):
    _block_size =       4

    x_raw =             mrc.UInt16_BE( 0x00, bitmask=b'\x0f\xff', range=range( 0, 1600 ) )
    draw_back =         mrc.Bits( 0x00, 0b10000000 )
    draw_upsidedown =   mrc.Bits( 0x00, 0b01000000 )
    draw_erase =        mrc.Bits( 0x00, 0b00100000 )
    y_raw_coarse =      mrc.Int8( 0x02, range=range( -17, 82 ) )
    y_raw_fine =        mrc.Bits( 0x03, 0b10000000 )
    obj_id =            mrc.UInt8( 0x03, bitmask=b'\x3f', range=range( 0, 64 ) )

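    @property
    def x( self ):
        # undo the offset of 16 applied to the stored value
        return (self.x_raw-16)

    @property
    def y( self ):
        # coarse part counts in steps of 2, fine part adds the odd unit
        return (self.y_raw_coarse*2 + self.y_raw_fine)-4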

This is taken from the Lemmings test model. Every Level object has a list of Terrain objects; basically, info on where to put selected background tiles to build the level up with. Each of these Terrain references is 4 bytes, and there's a lot packed in there! The x position of the background piece is stored in the first two bytes (unsigned big-endian) as the least significant 12 bits. Three of the remaining four most significant bits are used as flags which affect how the tile is rendered. The y position is stored in two parts: a signed coarse component in byte 2, and a fine modifier as the most significant bit of byte 3. The 6 least significant bits of byte 3 make up the ID of the terrain object.
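To make the bit-twiddling concrete, here is the same layout unpacked by hand with nothing but the standard struct module. The four sample bytes are made up for illustration, not lifted from a real level, and representing the flags as bools is my own choice rather than what mrc.Bits necessarily returns.

```python
import struct

raw = bytes([0x41, 0x30, 0x0A, 0x85])  # made-up sample data

# big-endian: unsigned 16-bit word, signed byte, unsigned byte
x_word, y_raw_coarse, last = struct.unpack('>HbB', raw)

x_raw = x_word & 0x0FFF                # least significant 12 bits
draw_back = bool(raw[0] & 0x80)        # flags live in the top bits of byte 0
draw_upsidedown = bool(raw[0] & 0x40)
draw_erase = bool(raw[0] & 0x20)
y_raw_fine = (last & 0x80) >> 7        # most significant bit of byte 3
obj_id = last & 0x3F                   # least significant 6 bits of byte 3

x = x_raw - 16
y = (y_raw_coarse * 2 + y_raw_fine) - 4

print(x, y, obj_id, draw_back, draw_upsidedown, draw_erase)
# 288 17 5 False True False
```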

What does this all mean? Well, if we instantiate this class with 4 bytes as input, or it gets chainloaded by a larger struct (e.g. Level), all of the above field definitions will get replaced with a variable you can edit. Notice how we've exposed proper cartesian x and y as properties; maybe later we could add setters which cast back to the original packed format. Then, after you're done changing stuff up in the class, you can run export_data() and get four bytes back. Or do it on the parent Level class and get a full level file back as bytes. What's not to love?
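As for casting cartesian coordinates back to the packed format, one possible shape for the setters is sketched below. TerrainCoords is a plain stand-in class invented for this example, not part of Mr. Crowbar, and it skips the range checks that the real fields declare.

```python
class TerrainCoords:
    """Stand-in for the x/y part of Terrain; hypothetical, no validation."""

    def __init__(self, x_raw=0, y_raw_coarse=0, y_raw_fine=0):
        self.x_raw = x_raw
        self.y_raw_coarse = y_raw_coarse
        self.y_raw_fine = y_raw_fine

    @property
    def x(self):
        return self.x_raw - 16

    @x.setter
    def x(self, value):
        self.x_raw = value + 16

    @property
    def y(self):
        return (self.y_raw_coarse * 2 + self.y_raw_fine) - 4

    @y.setter
    def y(self, value):
        # divmod splits the value into the coarse (steps of 2) and fine (0 or 1)
        # parts; Python's floor division keeps fine non-negative for negative y
        self.y_raw_coarse, self.y_raw_fine = divmod(value + 4, 2)


t = TerrainCoords()
t.x = 288
t.y = 17
print(t.x_raw, t.y_raw_coarse, t.y_raw_fine)
```

Because the getter and setter are exact inverses, setting a coordinate and reading it back gives the same number, which is the property you want before trusting an export.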

Also, we strongly recommend you use a terminal with support for ANSI 24-bit colour, such as GNOME Terminal for Linux or ConEmu for Windows. Why? Well...

Bitmap graphics from Lemmings printed into a terminal with UTF-8

That's right. Live bitmap previews in the terminal! At two pixels per letter! This is exactly what Guido van Rossum had in mind when he allowed people to override __str__ and balls to you if you think otherwise.
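If you're wondering how you squeeze two pixels into one letter, a common trick (and I'm only sketching the general technique here, not quoting Mr. Crowbar's code) is to print the Unicode upper half block character, using the foreground colour for the top pixel and the background colour for the bottom one. The function and class names below are made up for the example.

```python
def render_half_blocks(pixels):
    """pixels: list of rows, each a list of (r, g, b) tuples. Returns a string."""
    lines = []
    for top in range(0, len(pixels), 2):
        upper = pixels[top]
        # pad odd-height images with a black row
        lower = pixels[top + 1] if top + 1 < len(pixels) else [(0, 0, 0)] * len(upper)
        cells = []
        for (r1, g1, b1), (r2, g2, b2) in zip(upper, lower):
            # foreground colour paints the upper half, background the lower half
            cells.append(f'\x1b[38;2;{r1};{g1};{b1}m\x1b[48;2;{r2};{g2};{b2}m\u2580')
        # reset colours at the end of each line so the terminal isn't left stained
        lines.append(''.join(cells) + '\x1b[0m')
    return '\n'.join(lines)


class Bitmap:
    """Hypothetical wrapper showing the __str__ hook."""

    def __init__(self, pixels):
        self.pixels = pixels

    def __str__(self):
        return render_half_blocks(self.pixels)


# a 2x2 test card: red and green on top, blue and white underneath
print(Bitmap([[(255, 0, 0), (0, 255, 0)],
              [(0, 0, 255), (255, 255, 255)]]))
```

The escape sequences are the standard 24-bit SGR colour codes, which is why the terminal needs to support them.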

BONUS ARCHAEOLOGICAL FIND: I got sidetracked at one point and wrote a parser for those colour text screens you sometimes see at the end of shareware games, nagging you to buy the full thing. Take a look at the nag screen for the game "Boppin'" published by Apogee:

Apogee ordering screen from the end of Boppin', with and without colouring

The top is what you would normally see, painstakingly recreated with Unicode box-drawing characters to mimic the ol' DOS text mode. Underneath is exactly the same text, but without any of the colours. Notice how there's a bunch of hidden text in the blank areas? Looks like the Nag Screen Creative Director was caught short and had to steal a graphic from the company BBS. After all, no-one would ever suspect that it wasn't an original work... UNTIL NOW!
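For the curious, here is roughly how such a screen can be picked apart, assuming it is stored as a raw dump of DOS text-mode memory: two bytes per cell, a code page 437 character followed by an attribute byte holding the colours. That storage format is my assumption for this sketch, the function names are invented, and treating "same foreground and background colour" as the definition of hidden is just one plausible way for text to vanish.

```python
def decode_cells(data, columns=80):
    """Split a raw text-mode dump into rows of (char, fg, bg) tuples."""
    cells = []
    for i in range(0, len(data) - 1, 2):
        char = bytes([data[i]]).decode('cp437')
        attr = data[i + 1]
        fg = attr & 0x0F          # low nibble: foreground colour
        bg = (attr >> 4) & 0x07   # bits 4-6: background colour (bit 7 is blink)
        cells.append((char, fg, bg))
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


def plain_text(rows):
    """Everything on the screen, colours ignored."""
    return '\n'.join(''.join(char for char, _, _ in row) for row in rows)


def hidden_text(rows):
    """Keep only characters drawn in the same colour as their background."""
    return '\n'.join(
        ''.join(char if fg == bg and char != ' ' else ' ' for char, fg, bg in row)
        for row in rows
    )


# tiny made-up sample: "Hi" in grey on black, then a blue-on-blue "X"
sample = b'H\x07i\x07X\x11'
rows = decode_cells(sample, columns=3)
print(repr(plain_text(rows)))   # 'HiX'
print(repr(hidden_text(rows)))  # '  X'
```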

Anyway, right now the project is just a babby; the only thing it can do is read some files from the 1991 classic "Lemmings" by DMA Design. Here's a list of features in the pipeline:

  • Export support! The whole idea of having a strongly-typed data model is so you can edit the content in Python, then reverse the import steps to get a binary again.
  • Foreign key support! Games love nothing more than to have multiple files that reference one another.
  • More standard models for common structures like lookup tables and streams!
  • More games!

We're not even up to a proper release yet; I just needed to write something down before I exploded. First release will be when enough of Lemmings is satisfactorily editable; as of this moment we are an R application, with CUD to follow. Thanks for reading!