Show Your Working: Fixing a pile of bugs in MAME

03 December 2017

The parfait eating contest in Daisu-Kiss

Show Your Working is a brand new segment where I write up things I have attempted to fix in open source software. Sometimes it's interesting to debug a problem yourself from first principles, even if the codebase is huge and you don't know anything going in. I will try and explain my thought process as I venture out into the weeds armed only with a butter knife.

The software

If you haven't heard of it, MAME is an emulation project with the aim of supporting basically every arcade machine ever made. To do this, thousands of circuit boards and custom chips have been probed, analysed, decapped with acid, and ultimately written out as code. By adding board ROMs (whole dumps of the memory chips containing the copyrighted program data), MAME can emulate a system by connecting together drivers for all of the chipsets as per the layout of the original board. Recently, the sister project MESS has been merged back into the MAME main codebase, adding ~2200 consoles and home computer systems to the lineup. It is an incredible feat of volunteer-driven engineering.

Unlike other emulators, MAME places accuracy and preservation above all else. All CPU code is interpreted (with some IL speedups via the UML architecture) and runs time-shared in a single thread. Instead of tightly coupling the hardware code with the framework for performance, MAME has an infinitely-rewirable generic module architecture, to encourage reuse of chip drivers across platforms. Imagine MAME as a big box of all the different types of chips, plus some templates that explain what chips to take from the box and how to wire them together to make a system.

Also I was not exaggerating when I said every arcade machine ever made. The MAME source tree has slightly north of 6900 .cpp files, and a princely 2.9 million lines of code (4 million if you count headers!). This is a Big Codebase.

The problem

The reason I started looking at MAME was because of this tweet:

<@bfod> I am beyond sad that MAME doesn’t support save states for Daisu Kiss, so I can’t put it in Multibowl

Multibowl is a pretty amazing project; Bennett Foddy and AP Thomson have turned hundreds of MAME ROMs into this weird tournament of two-player minigames, thanks to judicious abuse of memory scraping and savestates. Similar to Joe Dante's The Movie Orgy, Multibowl is only available as a touring art installation due to the staggering amount of copyright infringement involved. Nevertheless I have a soft spot for illegal art, and I was looking for an opportunity to get my feet wet with emulators, and this seemed like it! I mean the hard part is already done, the game is playable, all that's left is to make sure all the bits of state are being saved... right?

The preparation

The only software we need installed to debug this is the Git VCS and the standard Linux command line tools, plus whatever build dependencies MAME has. Start by grabbing the latest MAME source code

$ git clone https://github.com/mamedev/mame
$ cd mame

The MAME build system is sort of nutso, however there is a nice way to build just the machine you want to debug. (And you will want to do this, as a full symbols build of all the arcade machines is a 2.2gb executable.)

First, find the .cpp file for the driver of the machine named on the MAME splash screen when you try and emulate it (in my case it was src/mame/drivers/konamigx.cpp). Then, run the following command from the project root:

make REGENIE=1 SUBTARGET=arcade OPTIMIZE=0 SYMBOLS=1 SYMLEVEL=2 NOWERROR=1 PYTHON_EXECUTABLE=/usr/bin/python2 TOOLS=1 ARCHOPTS=-flifetime-dse=1 VERBOSE=1 SOURCES=src/mame/drivers/konamigx.cpp -j3

(For further recompiles, remove the REGENIE=1)

This will build a copy of MAME with symbols, but without the features of the debug build level, like asserts! I did it this way because one of the asserts in the audio CPU was killing the emulator after loading a save state, and I wasn't quite feeling brave enough to debug the whole stack to find out why. (Alternatively you can add DEBUG=1 and comment out all the things)

The other modification I recommend for save state hacking:

diff --git a/src/emu/save.cpp b/src/emu/save.cpp
index c85f0d1..5e33895 100644
--- a/src/emu/save.cpp
+++ b/src/emu/save.cpp
@@ -291,6 +291,8 @@ save_error save_manager::write_file(emu_file &file)
        if (m_illegal_regs > 0)
                return STATERR_ILLEGAL_REGISTRATIONS;

+    dump_registry();
+
        // generate the header
        u8 header[HEADER_SIZE];
        memcpy(&header[0], STATE_MAGIC_NUM, 8);
@@ -353,7 +355,7 @@ u32 save_manager::signature() const
void save_manager::dump_registry() const
{
        for (auto &entry : m_entry_list)
-               LOG(("%s: %d x %d\n", entry->m_name.c_str(), entry->m_typesize, entry->m_typecount));
+               osd_printf_verbose("%s: %d x %d\n", entry->m_name.c_str(), entry->m_typesize, entry->m_typecount);
}

This will modify the save state command to print out a big list of all the saved properties. Apparently you can get the logging mode to trigger the list, but I gave up trying. Make sure not to commit this change :P

For testing I ran my single-machine build inside GDB, with MAME's built-in debugger switched on. The MAME debugger is great for inspecting the code and memory usage of the emulated CPUs, but if you want to see what's happening inside those chips you need to break out into C++ land with GDB.

gdb --args ~/Development/mame/arcade64 -debug -window -verbose daiskiss.zip

The diagnosis

Start by doing the obvious thing: open Daisu Kiss up in MAME and check out the machine info:

Daisu-Kiss (ver JAA)
1996 Konami
Driver: konamigx.cpp

CPU:
M68EC020 24.000000MHz
M68000 8.000000MHz
TMS57002 12.000000MHz

Sound:
TMS57002 12.000000MHz
2xSpeaker
2xK054539 ADPC 18.432000MHz

Video:
288 x 224 (H) 59.185606 Hz

So it's a game using the Konami GX model of arcade boards, with a couple of trusty old Motorola 68000 processors doing the computing spadework, some Texas Instruments DSP thingo, and... what looks like an avalanche of custom Konami chips doing the graphics and sound! Lovely.

As expected, loading a saved state from a cold boot didn't work; the game would either hang with the music looping in the background, or hang with the music stuck on a slur. However save states usually DID work if you loaded them after the machine has finished booting, which indicates there's some bit of state which gets set on boot that isn't accounted for.

Wrapping your head around save states

My initial assumption about how save states would work: MAME dumps a copy of the memory, CPU registers, and anything else which holds machine state to a file. Loading does the reverse: updates the memory and sets all the CPU registers. The documentation mostly confirms this (ok fine so I missed timing information, bank-switching and perhipherals from my list), and reassures that most of the stock hardware has state support coded in already! So that's something. There was a rather tantalising page about save states in the MAME wiki, but it's maybe two codebase refactoring sprees behind and none of the names matched up anymore.

Anyway, the way it works is there's a small window at the end of the driver init cycle where a component can flag the parts of itself which contain state. MAME provides two helper methods for this:

void save_item(_ItemType &value, const char *valname, int index = 0)
void save_pointer(_ItemType *value, const char *valname, u32 count, int index = 0)

save_item is pretty malleable, and works for scalar values, 1D and 2D static arrays, and 1D and 2D std::vectors. For malloc'd memory you can use save_pointer, but you need to provide a count of the number of elements. Most of the time you'll see them used in the form save_item(NAME(state_object)), where NAME(state_object) is a macro that expands to 'state_object, "state_object"', to save on repetition.

(I am sad that save_item can't be used on structs. I mean, it totally can, but every compiler aligns items inside structs differently, meaning that save states from different compilers would break in terrible ways).

And that's it. Once an item has been flagged, the contents will be written to and loaded from save states. The simplicity of this system is that because each component tracks its own state, reuse between different arcade boards is quite high. For instance the Motorola 68000 has a fairly well-defined set of internal state, so every game which uses the M68000 driver will have that chip's state saved automatically.

Bonus fact: there's no actual seperation of concerns in the save state file. There's a game ID, a CRC signature for the set of fields, then an unlabelled stream of raw data. If the CRC in the file matches the one generated by MAME from the field list, it will load said data into the fields sequentially.

In which we consider the majestic Konami GX arcade board in all its proprietary splendour

First place to look would be in /mame/src/mame/drivers/konamigx.cpp, which defines the hardware layout for all Konami GX games. Search around for "daiskiss":

GAME( 1996, daiskiss, konamigx, konamigx, gokuparo, konamigx_state, konamigx, ROT0, "Konami", "Daisu-Kiss (ver JAA)", MACHINE_IMPERFECT_GRAPHICS )

"gokuparo" is the name of another Konami GX game, so presumably this piggy-backs off the hardware config. The actual definition of how the hardware in the konamigx driver is wired together is in a big macro block MACHINE_CONFIG_START:

static MACHINE_CONFIG_START( konamigx, konamigx_state )
    /* basic machine hardware */
    MCFG_CPU_ADD("maincpu", M68EC020, MASTER_CLOCK)
    MCFG_CPU_PROGRAM_MAP(gx_type2_map)
    MCFG_CPU_VBLANK_INT_DRIVER("screen", konamigx_state, konamigx_type2_vblank_irq)

    MCFG_CPU_ADD("soundcpu", M68000, SUB_CLOCK/2)
    MCFG_CPU_PROGRAM_MAP(gxsndmap)

    MCFG_CPU_ADD("dasp", TMS57002, MASTER_CLOCK/2)
    MCFG_CPU_DATA_MAP(gxtmsmap)

    MCFG_DEVICE_ADD("k053252", K053252, MASTER_CLOCK/4)
    MCFG_K053252_OFFSETS(24, 16)
    MCFG_K053252_INT1_ACK_CB(WRITELINE(konamigx_state, vblank_irq_ack_w))
    MCFG_K053252_INT2_ACK_CB(WRITELINE(konamigx_state, hblank_irq_ack_w))
    MCFG_VIDEO_SET_SCREEN("screen")

    MCFG_QUANTUM_TIME(attotime::from_hz(6000))

    MCFG_MACHINE_START_OVERRIDE(konamigx_state,konamigx)
    MCFG_MACHINE_RESET_OVERRIDE(konamigx_state,konamigx)

    MCFG_EEPROM_SERIAL_93C46_ADD("eeprom")

    /* video hardware */
    MCFG_SCREEN_ADD("screen", RASTER)
    MCFG_SCREEN_VIDEO_ATTRIBUTES(VIDEO_UPDATE_AFTER_VBLANK)
    MCFG_SCREEN_RAW_PARAMS(8000000, 384+24+64+40, 0, 383, 224+16+8+16, 0, 223)
    /* These parameters are actual value written to the CCU.
    tbyahhoo attract mode desync is caused by another matter. */

    //MCFG_SCREEN_VBLANK_TIME(ATTOSECONDS_IN_USEC(600))
    // TODO: WTF, without these most games crashes? Some legacy call in video code???
    MCFG_SCREEN_SIZE(1024, 1024)
    MCFG_SCREEN_VISIBLE_AREA(24, 24+288-1, 16, 16+224-1)
    MCFG_SCREEN_UPDATE_DRIVER(konamigx_state, screen_update_konamigx)

    MCFG_PALETTE_ADD("palette", 8192)
    MCFG_PALETTE_FORMAT(XRGB)
    MCFG_PALETTE_ENABLE_SHADOWS()
    MCFG_PALETTE_ENABLE_HILIGHTS()

    MCFG_DEVICE_ADD("k056832", K056832, 0)
    MCFG_K056832_CB(konamigx_state, type2_tile_callback)
    MCFG_K056832_CONFIG("gfx1", K056832_BPP_5, 0, 0, "none")
    MCFG_K056832_PALETTE("palette")

    MCFG_K055555_ADD("k055555")

    MCFG_DEVICE_ADD("k054338", K054338, 0)
    MCFG_K054338_MIXER("k055555")
    MCFG_K054338_SET_SCREEN("screen")
    MCFG_K054338_ALPHAINV(1)

    MCFG_DEVICE_ADD("k055673", K055673, 0)
    MCFG_K055673_CB(konamigx_state, type2_sprite_callback)
    MCFG_K055673_CONFIG("gfx2", K055673_LAYOUT_GX, -26, -23)
    MCFG_K055673_SET_SCREEN("screen")
    MCFG_K055673_PALETTE("palette")

    MCFG_VIDEO_START_OVERRIDE(konamigx_state, konamigx_5bpp)

    /* sound hardware */
    MCFG_SPEAKER_STANDARD_STEREO("lspeaker", "rspeaker")

    MCFG_DEVICE_MODIFY("dasp")
    MCFG_SOUND_ROUTE(0, "lspeaker", 0.3)
    MCFG_SOUND_ROUTE(1, "rspeaker", 0.3)

    MCFG_K056800_ADD("k056800", XTAL_18_432MHz)
    MCFG_K056800_INT_HANDLER(INPUTLINE("soundcpu", M68K_IRQ_1))

    MCFG_DEVICE_ADD("k054539_1", K054539, XTAL_18_432MHz)
    MCFG_K054539_REGION_OVERRRIDE("shared")
    MCFG_K054539_TIMER_HANDLER(WRITELINE(konamigx_state, k054539_irq_gen))
    MCFG_SOUND_ROUTE_EX(0, "dasp", 0.5, 0)
    MCFG_SOUND_ROUTE_EX(1, "dasp", 0.5, 1)
    MCFG_SOUND_ROUTE(0, "lspeaker", 1.0)
    MCFG_SOUND_ROUTE(1, "rspeaker", 1.0)

    MCFG_DEVICE_ADD("k054539_2", K054539, XTAL_18_432MHz)
    MCFG_K054539_REGION_OVERRRIDE("shared")
    MCFG_SOUND_ROUTE_EX(0, "dasp", 0.5, 2)
    MCFG_SOUND_ROUTE_EX(1, "dasp", 0.5, 3)
    MCFG_SOUND_ROUTE(0, "lspeaker", 1.0)
    MCFG_SOUND_ROUTE(1, "rspeaker", 1.0)
MACHINE_CONFIG_END

static MACHINE_CONFIG_DERIVED( gokuparo, konamigx )
    MCFG_DEVICE_MODIFY("k055673")
        MCFG_K055673_CONFIG("gfx2", K055673_LAYOUT_GX, -46, -23)
MACHINE_CONFIG_END
**

That was long, but we have our full list of chips that the board uses, which we can cross-reference against the MAME driver source to determine what they do.

  • M68EC020 (Motorola 68020 - main CPU)
  • M68000 (Motorola 68000 - sound CPU)
  • TMS57002 (Texas Instruments- sound DASP)
  • K053252 (Konami - display timing/interrupt)
  • K056832 (Konami - tilemap)
  • K055555 (Konami - video mixer/priority encoder)
  • K054338 (Konami - video mixer/alpha)
  • K055673 (Konami - sprite generator)
  • K056800 (Konami - sound controller)
  • K054539_1, K054539_2 (Konami - PCM sound)

Yep. That's a lot. Uhhh.

Finding where the lockup is

Okay, we can reproduce the issue and we have access to a fancy debugger. My plan is to open two debug copies of MAME: one pre-boot, one post-boot, load the state in each, then step through them both line by line until execution diverges. Then that line shall magically disclose what the problem is. No, it will! Shush.

Well. Well well well. The main CPU activity on the hung session is basically 4 lines of assembly.

# The code which is responsible for the hang loop (A1: 0x00c08400):

28707C   tst.w    (A1)       # check that the uint16_be at address 0x00c08400 == 0x0000
28707E   beq      $287184    # if so, jump to 287184!
...
287184   tst.w    ($646,A1)  # check that the uint16_be at address (0x00c08400 + 0x646) == 0x0000
287188   beq      $28707c    # if so, jump to 28707C!

Now, at the time of execution, register A1 is set to 0x00c08400. What part of the memory map is this?

static ADDRESS_MAP_START( gx_base_memmap, AS_PROGRAM, 32, konamigx_state )
    AM_RANGE(0x000000, 0x01ffff) AM_ROM // BIOS ROM
    AM_RANGE(0x200000, 0x3fffff) AM_ROM // main program ROM
    AM_RANGE(0x400000, 0x7fffff) AM_ROM // data ROM
    AM_RANGE(0xc00000, 0xc1ffff) AM_RAM AM_SHARE("workram")
    AM_RANGE(0xd00000, 0xd01fff) AM_DEVREAD("k056832", k056832_device, k_5bpp_rom_long_r)
    AM_RANGE(0xd20000, 0xd20fff) AM_DEVREADWRITE16("k055673", k055673_device, k053247_word_r, k053247_word_w, 0xffffffff)
    AM_RANGE(0xd21000, 0xd21fff) AM_RAM // second bank of sprite RAM, accessed thru ESC
    AM_RANGE(0xd22000, 0xd23fff) AM_RAM // extra bank checked at least by sexyparo, pending further investigation.
    AM_RANGE(0xd40000, 0xd4003f) AM_DEVWRITE("k056832", k056832_device, long_w)
    AM_RANGE(0xd44000, 0xd4400f) AM_WRITE(konamigx_tilebank_w)
    AM_RANGE(0xd48000, 0xd48007) AM_DEVWRITE16("k055673", k055673_device, k053246_word_w, 0xffffffff)
    AM_RANGE(0xd4a000, 0xd4a00f) AM_DEVREAD16("k055673", k055673_device, k055673_rom_word_r, 0xffffffff)
    AM_RANGE(0xd4a010, 0xd4a01f) AM_DEVWRITE16("k055673", k055673_device, k055673_reg_word_w, 0xffffffff)
    AM_RANGE(0xd4c000, 0xd4c01f) AM_DEVREADWRITE8("k053252", k053252_device, read, write, 0xff00ff00)
    AM_RANGE(0xd4e000, 0xd4e01f) AM_WRITENOP // left-over for "secondary" CCU, apparently (used by type 3/4 for slave screen?)
    AM_RANGE(0xd50000, 0xd500ff) AM_DEVWRITE("k055555", k055555_device, K055555_long_w)
    AM_RANGE(0xd52000, 0xd5201f) AM_DEVREADWRITE8("k056800", k056800_device, host_r, host_w, 0xff00ff00)
    AM_RANGE(0xd56000, 0xd56003) AM_WRITE(eeprom_w)
    AM_RANGE(0xd58000, 0xd58003) AM_WRITE(control_w)
    AM_RANGE(0xd5a000, 0xd5a003) AM_READ_PORT("SYSTEM_DSW")
    AM_RANGE(0xd5c000, 0xd5c003) AM_READ_PORT("INPUTS")
    AM_RANGE(0xd5e000, 0xd5e003) AM_READ_PORT("SERVICE")
    AM_RANGE(0xd80000, 0xd8001f) AM_DEVWRITE("k054338", k054338_device, long_w)
    AM_RANGE(0xda0000, 0xda1fff) AM_DEVREADWRITE("k056832", k056832_device, ram_long_r, ram_long_w)
    AM_RANGE(0xda2000, 0xda3fff) AM_DEVREADWRITE("k056832", k056832_device, ram_long_r, ram_long_w)
ADDRESS_MAP_END

Why, it's smack in the middle of the work RAM! Let's dump it from the save state and do a diff with one dumped from a clean run.

dump /tmp/work.dmp,0xc00000,0x20000

Eh, the RAM contents match. Seems legit, MAME will add any part of the address range flagged as RAM to the save state.

(taken from my dump_registry printout)
...
memory/:maincpu/0/00c00000-00c1ffff: 4 x 32768
...
memory/:maincpu/0/00d20000-00d2ffff: 4 x 16384

Still, we can confirm by putting breakpoints immediately after those two jumps that no amount of waiting will break the main CPU out of this hang. Contrast that with the full run, where the breakout happens instantaneously!

After a more thorough examination, it doesn't look like that section of work RAM has a direct memory access (DMA) arrangement, or is touched by anything other than the CPU. Which means the CPU has to be the one to change it, despite being stuck in an infinite loop. Perhaps there's an interrupt that causes it to jump to another chunk of code...

Oh look! Using the debugger's "Run to the next Interrupt on this CPU" command on the working copy, we can see there are two interrupts that get called every frame in the working copy: IRQ 1 and IRQ 4. And neither of those are being called in the busted save state copy! Dun dun duuuuuuuuunnnn.

/*
 * IRQs:
 * 1: VBL (at 60 Hz)
 * 2: HBL
 * 3: Sprite DMA complete
 * 4: from protection device, indicates command completion for "ESC" chip
 */

As far as I can tell it's the vertical blanking interrupt (IRQ 1) that breaks the main CPU out of the loop and gives it more things to do, so let's inspect that. In real life, you'd interrupt the M68020C by pulling the voltage up on one of the chip legs or something. In MAME this happens in the driver C++ code with the handy command:

m_maincpu->set_input_line(1, HOLD_LINE);

There are two spots in the code (?!) which call IRQ 1, and only one of them is called every frame for the hung session:

INTERRUPT_GEN_MEMBER(konamigx_state::konamigx_type2_vblank_irq)
{
    // lift idle suspension
    if (m_resume_trigger && m_suspension_active)
    {
        m_suspension_active = 0;
        machine().scheduler().trigger(m_resume_trigger);
    }

    // IRQ 1 is the main 60hz vblank interrupt
    if (m_gx_syncen & 0x20)
    {
        m_gx_syncen &= ~0x20;

        if ((m_gx_wrport1_1 & 0x81) == 0x81 || (m_gx_syncen & 1))
        {
            m_gx_syncen &= ~1;
            // TODO: enabling ASSERT_LINE breaks opengolf, annoying.
            device.execute().set_input_line(1, HOLD_LINE);
        }
    }

    dmastart_callback(0);
}

There it is! If either of those two if statements fail, it doesn't call IRQ 1! In the hung session, m_gx_syncen is always 0, and that's not saved in the state. Oh hoh hoh hoh hoh.

MACHINE_START_MEMBER(konamigx_state,konamigx)
{
    save_item(NAME(m_gx_wrport1_0));
    save_item(NAME(m_gx_wrport1_1));
    save_item(NAME(m_gx_wrport2));

    save_item(NAME(m_gx_rdport1_3));
    save_item(NAME(m_gx_syncen));
    save_item(NAME(m_suspension_active));
    save_item(NAME(m_prev_pixel_clock));
}

I added all the registers that were nearby m_gx_syncen in MACHINE_RESET_MEMBER, because why the hell not. AND IT WORKS! Man what an adventure that was, ah well let's hammer out a pull request and wrap this up nicely.

Hang on, what about the dodgy sound you mentioned earlier? Don't be cheap

You got me, it mostly works. The main CPU execution saunters through without a hitch, but I could still break the audio by loading a save state on a cold boot. There must be some more state variables we haven't captured yet.

Good thing we have a canary to help us out; the crashy assert in the debug build mentioned earlier.

Thread 1 "arcade64d" received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff3e3e367 in kill () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff3e3e367 in kill () at /usr/lib/libc.so.6
#1  0x0000555555fc7475 in osd_break_into_debugger(char const*) ([email protected]=0x55555a530448 "assert: ../../../../../src/devices/cpu/tms57002/tms57002.cpp:790: samples == 1") at ../../../../../src/osd/modules/lib/osdlib_unix.cpp:97
#2  0x0000555555acb405 in emu_fatalerror::emu_fatalerror(char const*, ...) (this=0x55555a530440, format=<optimized out>) at ../../../../../src/emu/emucore.cpp:29
#3  0x00005555559ba7c2 in tms57002_device::sound_stream_update(sound_stream&, int**, int**, int) (this=<optimized out>, stream=..., inputs=<optimized out>, outputs=<optimized out>, samples=<optimized out>) at ../../../../../src/devices/cpu/tms57002/tms57002.cpp:790
#4  0x0000555555c006c4 in delegate_base<void, sound_stream&, int**, int**, int>::operator()(sound_stream&, int**, int**, int) const (args#3=721, args#2=<optimized out>, args#1=<optimized out>, args#0=..., this=0x555559687cb0) at ../../../../../src/lib/util/delegate.h:544
#5  0x0000555555c006c4 in sound_stream::generate_samples(int) ([email protected]=0x555559687c00, samples=721) at ../../../../../src/emu/sound.cpp:640
#6  0x0000555555c01759 in sound_stream::update() (this=0x555559687c00) at ../../../../../src/emu/sound.cpp:289
#7  0x0000555555c01759 in sound_stream::sync_update(void*, int) (this=0x555559687c00) at ../../../../../src/emu/sound.cpp:299
#8  0x0000555555be64ae in delegate_base<void, void*, int>::operator()(void*, int) const (args#1=<optimized out>, args#0=<optimized out>, this=<optimized out>) at ../../../../../src/lib/util/delegate.h:544
#9  0x0000555555be64ae in device_scheduler::execute_timers() (this=<optimized out>) at ../../../../../src/emu/schedule.cpp:908
#10 0x0000555555be64ae in device_scheduler::timeslice() ([email protected]=0x7fffffffce68) at ../../../../../src/emu/schedule.cpp:519
#11 0x0000555555b99398 in running_machine::run(bool) ([email protected]=0x7fffffff6820, qu[email protected]=false) at ../../../../../src/emu/machine.cpp:348
#12 0x0000555555743942 in mame_machine_manager::execute() ([email protected]=0x555556afb0e0) at ../../../../../src/frontend/mame/mame.cpp:223
#13 0x00005555557cb8c4 in cli_frontend::start_execution(mame_machine_manager*, int, char**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ([email protected]=0x7fffffffda50, [email protected]=0x555556afb0e0, [email protected]=5, [email protected]=0x7fffffffe128, option_errors="") at ../../../../../src/frontend/mame/clifront.cpp:287
#14 0x00005555557cbdc9 in cli_frontend::execute(int, char**) ([email protected]=0x7fffffffda50, [email protected]=5, [email protected]=0x7fffffffe128) at ../../../../../src/frontend/mame/clifront.cpp:314
#15 0x00005555557416b1 in emulator_info::start_frontend(emu_options&, osd_interface&, int, char**) (options=..., osd=..., argc=5, argv=0x7fffffffe128) at ../../../../../src/frontend/mame/mame.cpp:319
#16 0x000055555561271c in main(int, char**) (argc=5, argv=0x7fffffffe128) at ../../../../../src/osd/sdl/sdlmain.cpp:214

Actually I think this is a red herring. The issue is that upon loading a save state, generate_samples is getting called with a large sample count (~720) which gets passed on to sound_stream_update, which has an assert(samples == 1) at the beginning. (Under normal operation the function gets called like a billion times a second with a sample count of 1.) This doesn't have any reflection on whether or not sound is broken, as it always happens when you load a state.

So, back to zero clues! I revisted src/mame/drivers/konamigx.cpp to try and get more of an idea of how the audio system is wired together.

These two K054539 chips interest me. K054539 is listed as a "PCM Sound Chip", and a quick peek at the registers makes it clear that it's designed for wavetable music playback.

/* Registers:
00..ff: 20 bytes/channel, 8 channels
  00..02: pitch (lsb, mid, msb)
      03: volume (0=max, 0x40=-36dB)
      04: reverb volume (idem)
  05: pan (1-f right, 10 middle, 11-1f left)
  06..07: reverb delay (0=max, current computation non-trusted)
  08..0a: loop (lsb, mid, msb)
  0c..0e: start (lsb, mid, msb) (and current position ?)

100.1ff: effects?
  13f: pan of the analog input (1-1f)

200..20f: 2 bytes/channel, 8 channels
  00: type (b2-3), reverse (b5)
  01: loop (b0)

214: Key on (b0-7 = channel 0-7)
215: Key off          ""
225: ?
227: Timer frequency
228: ?
229: ?
22a: ?
22b: ?
22c: Channel active? (b0-7 = channel 0-7)
22d: Data read/write port
22e: ROM/RAM select (00..7f == ROM banks, 80 = Reverb RAM)
22f: Global control:
     .......x - Enable PCM
     ......x. - Timer related?
     ...x.... - Enable ROM/RAM readback from 0x22d
     ..x..... - Timer output enable?
     x....... - Disable register RAM updates

 The chip has an optional 0x8000 byte reverb buffer.
 The reverb delay is actually an offset in this buffer.
 */

At first I assumed the two chips were for some form of stereo playback, but the MCFG_SOUND_ROUTE stuff showed that each chip was outputting stereo and sending to both speakers. The K054539 has 8 sound channels with adjustable panning and effects, including a luxury 8kb buffer just for reverb! That means 16 audio channels all up. For kicks I tried turning down the gain on each chip to hear the changes; both chips are handling a mixture of PCM sample and music playback.

Sound programming is handled by the M68000 audio CPU, which I'm guessing keeps the two K054539s fed with frame-by-frame playback instructions, and meaning the main CPU only has to worry about firing off the odd cue every now and then. Communication between the two is managed with another beloved K chip, the K056800! Mercifully there's not too much to this one, it's basically a set of registers that the main CPU and sound CPU can read/write to.

Because clues are so thin on the ground, we need to make our own luck. The K054539 doesn't sound like it's playing back the music properly, so let's gets some eyes on those registers and check if that's indeed the case.

diff --git a/src/devices/sound/k054539.cpp b/src/devices/sound/k054539.cpp
index c275494..15e4b10 100644
--- a/src/devices/sound/k054539.cpp
+++ b/src/devices/sound/k054539.cpp
@@ -94,12 +94,14 @@ bool k054539_device::regupdate()

 void k054539_device::keyon(int channel)
 {
+    osd_printf_verbose("k054539:%p Ch%d ON\n", this, channel);
         if(regupdate())
                 regs[0x22c] |= 1 << channel;
 }

 void k054539_device::keyoff(int channel)
 {
+    osd_printf_verbose("k054539:%p Ch%d OFF\n", this, channel);
         if(regupdate())
                 regs[0x22c] &= ~(1 << channel);
 }

The above code prints a message every time one of the sound chips starts or ends playing a note. Here's a text dump from a working copy:

Samples: 871
k054539:0x55f4aa874710 Ch0 OFF
k054539:0x55f4aa874710 Ch0 ON
k054539:0x55f4aa874710 Ch1 OFF
k054539:0x55f4aa874710 Ch1 ON
k054539:0x55f4aa874710 Ch2 OFF
k054539:0x55f4aa874710 Ch2 OFF
k054539:0x55f4aa874710 Ch2 ON
k054539:0x55f4aa874710 Ch3 OFF
k054539:0x55f4aa874710 Ch3 ON
k054539:0x55f4aa874710 Ch4 OFF
k054539:0x55f4aa874710 Ch4 ON
k054539:0x55f4aa874710 Ch5 OFF
k054539:0x55f4aa874710 Ch5 ON
k054539:0x55f4aa874710 Ch0 OFF
k054539:0x55f4aa874710 Ch0 ON
k054539:0x55f4aa874710 Ch1 OFF
k054539:0x55f4aa874710 Ch1 ON
k054539:0x55f4aa874710 Ch0 OFF
k054539:0x55f4aa874710 Ch0 ON
k054539:0x55f4aa874710 Ch1 OFF
k054539:0x55f4aa874710 Ch1 ON
k054539:0x55f4aa874710 Ch2 OFF
k054539:0x55f4aa874710 Ch2 ON
...

And here's a dump from a busted save state copy:

Samples: 871
k054539:0x5600870a8680 Ch3 OFF
k054539:0x5600870a8680 Ch4 OFF
[slight pause]
k054539:0x5600870a8680 Ch0 OFF
k054539:0x5600870a8680 Ch1 OFF
k054539:0x5600870a8680 Ch2 OFF
k054539:0x5600870a8680 Ch3 OFF
k054539:0x5600870a8680 Ch4 OFF
k054539:0x5600870a8680 Ch5 OFF
k054539:0x5600870a8680 Ch6 OFF
k054539:0x5600870a8680 Ch7 OFF
k054539:0x5600870a9a40 Ch0 OFF
k054539:0x5600870a9a40 Ch1 OFF
k054539:0x5600870a9a40 Ch2 OFF
k054539:0x5600870a9a40 Ch3 OFF
k054539:0x5600870a9a40 Ch4 OFF
k054539:0x5600870a9a40 Ch5 OFF
k054539:0x5600870a9a40 Ch6 OFF
k054539:0x5600870a9a40 Ch7 OFF
[nothing]

To my eyes that looks like a CPU somewhere tripping over and resetting, so the K054539 is mostly absolved of blame. Reloading the save state after the reset provides working sound. Checking the IRQs, on the busted save state the sound CPU infiniloops and doesn't receive any signals on IRQ 2 until juuust before the reset dance, which is very unusual. Normal behaviour is lots and lots and lots of IRQ 2 all of the time.

Oh man. Interest is fading fast, but we're so close! SO CLOSE! Let's find out what is responsible for smashing IRQ 2.

Wait, what?! There's a whole other chunk of address map that I missed?!?

WRITE16_MEMBER(konamigx_state::tms57002_control_word_w)
 {
     if (ACCESSING_BITS_0_7)
     {
         if (!(data & 1))
             m_soundcpu->set_input_line(M68K_IRQ_2, CLEAR_LINE);

         m_dasp->pload_w(data & 4);
         m_dasp->cload_w(data & 8);
         m_dasp->set_input_line(INPUT_LINE_RESET, data & 0x10 ? CLEAR_LINE : ASSERT_LINE);

         m_sound_ctrl = data;
     }
 }

 /* 68000 memory handling */
 static ADDRESS_MAP_START( gxsndmap, AS_PROGRAM, 16, konamigx_state )
     AM_RANGE(0x000000, 0x03ffff) AM_ROM
     AM_RANGE(0x100000, 0x10ffff) AM_RAM
     AM_RANGE(0x200000, 0x2004ff) AM_DEVREADWRITE8("k054539_1", k054539_device, read, write, 0xff00)
     AM_RANGE(0x200000, 0x2004ff) AM_DEVREADWRITE8("k054539_2", k054539_device, read, write, 0x00ff)
     AM_RANGE(0x300000, 0x300001) AM_READWRITE(tms57002_data_word_r, tms57002_data_word_w)
     AM_RANGE(0x400000, 0x40001f) AM_DEVREADWRITE8("k056800", k056800_device, sound_r, sound_w, 0x00ff)
     AM_RANGE(0x500000, 0x500001) AM_READWRITE(tms57002_status_word_r, tms57002_control_word_w)
     AM_RANGE(0x580000, 0x580001) AM_WRITENOP // 'NRES' - D2: K056602 /RESET
 ADDRESS_MAP_END

 static ADDRESS_MAP_START( gxtmsmap, AS_DATA, 8, konamigx_state )
     AM_RANGE(0x00000, 0x3ffff) AM_RAM
 ADDRESS_MAP_END


 WRITE_LINE_MEMBER(konamigx_state::k054539_irq_gen)
 {
     if (m_sound_ctrl & 1)
     {
         // Trigger an interrupt on the rising edge
         if (!m_sound_intck && state)
             m_soundcpu->set_input_line(M68K_IRQ_2, ASSERT_LINE);
     }

     m_sound_intck = state;
 }

 **

So there's two spots inside the address mapper that change the state of IRQ 2 on the sound CPU, which rely on (surprise!) some untracked state. Now keep cool, but I think I've got this. Calm blue ocean, calm blue ocean, calm blue ocean...

diff --git a/src/mame/drivers/konamigx.cpp b/src/mame/drivers/konamigx.cpp
index e200d1c..e58a552 100644
--- a/src/mame/drivers/konamigx.cpp
+++ b/src/mame/drivers/konamigx.cpp
@@ -3707,6 +3707,9 @@ ROM_END

 MACHINE_START_MEMBER(konamigx_state,konamigx)
 {
+       save_item(NAME(m_sound_ctrl));
+       save_item(NAME(m_sound_intck));
+
        save_item(NAME(m_gx_wrport1_0));
        save_item(NAME(m_gx_wrport1_1));
        save_item(NAME(m_gx_wrport2));

In true monkey's paw fashion, the music plays back now but never stops, even as the game continues on. Ah well, like before it's only happening from a cold boot, so it's the same problem.

Backtracking a little: how does the main CPU tell the sound CPU to cue up a different thing? If you guessed "some ridiculous one-off chip starting with a K", you're right! It's the K056800.

Oh look. There's one variable that wasn't saved out of the four that makes up the chip.

Surely it couldn't be that.

diff --git a/src/devices/sound/k056800.cpp b/src/devices/sound/k056800.cpp
index 19e572d..fe42ed0 100644
--- a/src/devices/sound/k056800.cpp
+++ b/src/devices/sound/k056800.cpp
@@ -36,6 +36,7 @@ void k056800_device::device_start()
        m_int_handler.resolve_safe();

        save_item(NAME(m_int_pending));
+       save_item(NAME(m_int_enabled));
        save_item(NAME(m_host_to_snd_regs));
        save_item(NAME(m_snd_to_host_regs));
 }

YEAH BOIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

The solution

Keep adding save_item() calls to chunks of untracked state until the machine can resurrect itself.

(MAME pull request #1857)

Bonus content!

After getting in contact with Bennett Foddy, I foolishly asked if there were any more MAME games he wanted to include in Multibowl but couldn't for technical reasons. And, uh, there were some! So in my hubris I did some more debugging.

pc_vga/cs4031: fix save state support

(MAME pull request #2127)

With the VGA driver I could see what was wrong immediately (registers not being saved), but I had to comb through the huuuuuge commit history for the file to figure out what was going on. Quite often the problems are easy to read, but the line of reasoning for the current design choices are much harder to guess.

In short, the registers for the VGA card were being stored twice; once as a flat array called vga.crtc.data, and again as a whole bunch of unpacked discrete variables (e.g. vga.crtc.horz_total, vga.crtc.horz_disp_end, ...). All of the business logic was checking the discrete variables, but vga.crtc.data was the only thing being save_pointer()'d. Some repo dredging revealed the mystery: vga.crtc.data was originally the only thing, but 5 years ago there was a refactor that added the discrete variables and ported all the business logic to use them. vga.crtc.data hung around I guess for ease of testing (when you save_pointer() a thing, you can watch it in the debugger) and still gets updated by MAME, but it remains functionally useless.

So far so good. But what I couldn't figure out was why all the Protected Mode DOS games I ran would crash when loading the state from a cold boot? And at points which seemed to have nothing at all to do with the code?

Unfortunately, I couldn't find a nice way of trapping the MAME debugger when the crash happened, so I had to do the old standby of manually stepping through a bunch of times and taking lots of notes about e.g. which iteration of an inner loop to step into rather than over. Here are my scrawly notes from debugging the game Boppin':

  • Find that the function called at 0x002a326f causes the crash and severs control to the debugger!
  • Step into the function and keep hitting "step over" until the crash instruction is reached
  • It was on the 4th call to 0x002ac752, but why?!?
  • 0x002ac9ca, step into function, loop ahead until DL = 0x16
  • 0x002b2a2a, step 6 times
  • it's this iteration of the inner loop which is triggering it
0x002B2A18      mov ecx,[ebp+20h]
0x002B2A1B      lodsb               // load byte at address DS:ESI into AL, increment ESI
0x002B2A1C      or al, al
0x002B2A1E      je 2B2A30h          // jump if AL == 0
0x002B2A20      stosb               // store AL at address ES:EDI, increment EDI
0x002B2A21      loop 2B2A1Bh        // decrement ECX, continue if == 0

...

0x002B2A30      inc edi
0x002B2A31      loop 2B2A1Bh

The value of ESI is loaded in as garbage! Or rather, the 32-bit int from 0x0030248c is loaded in. And it's wrong. Very wrong.

Before my eyes in the debugger, on that stosb command for loop iteration CL=0D, the whole stack memory segment gets overwritten by garbage, from 0x300000 to 0x310000. Oh lord it looks like a reeaaaaaallll MAAAAAAAME buuuuuggggg. I'm sooooooooo saaaaaaaaddddddd.

Showing off my high pain tolerance, I ran MESS through Valgrind to see if the garbage came from uninitialized memory. It was very slow, but thankfully by now I knew what to type and could afford to wait 10 seconds after each action. No luck, the garbage was perfectly reproducible each time and didn't trigger any warnings.

INTERESTINGLY, when I tried to hot reload the save state, that region of memory remained as garbage! This suggests this region wasn't part of the saved CPU RAM map, which is definitely problematic. Most of the PC address range above 0x100000 is just free extended memory.

Now we're walking the knife edge of sanity that is not trusting your tools, or rather not trusting the MAME debugger. The only way to bring back certainty is to start proving individual facts. First up, is what we're seeing here really the allocated memory used for the RAM getting trashed?

We can do this in GDB with the following steps:

  • Put a breakpoint in the constructor of ram_device so you have eyes on all new RAM objects created
  • Grab the last buffer that appears before the emulator initializes
&((ram_device * const) 0x555557257380)->m_pointer[0]
  • For the default PC configuration it should be 4M in size. This maps perfectly to the address space.
  • Confirm that this is the right buffer by finding a string in the memory viewer and searching for it in GDB with find
find /b &((ram_device * const) 0x555557257380)->m_pointer[0], +0x400000, '\\' ,'B', 'O', 'P', 'P', 'I', 'N', '.', 'E', 'X', 'E'
  • Add a memory write watch
p &((ram_device * const) 0x555557257380)->m_pointer[0x300000]

// $39 = (unsigned char * ) 0x7fffacab2010

watch * 0x7fffacab2010

No change. Even after the debugger shows the address space getting sprayed, the RAM pointer is fine! Does this mean the bug is in MAME's memory mapper?

We need to go up one more level. The memory mapper is run inside the i386_device, so grab the device pointer from a breakpoint (I did it in i386_common_init), and try running READ8

p ((i386_device * ) 0x5555572ae3f0)->READ8(0x300000)
$47 = 0 '\000'

As expected. Now we can fast forward until after the memory has been sprayed.

p ((i386_device * ) 0x5555572ae3f0)->READ8(0x300000)
$53 = 65 'A'

FINALLY. FINALLY WE CAN REPRODUCE THE PROBLEM IN GDB SPACE. AFTER MANY GORMLESS HOURS OF PISSING IN THE WIND. I AM CRYING.

So, next question is: what in God's name is going on in READ8 to mess up like this?

b i386_device::READ8
Breakpoint 7 at 0x555555d4412c: file ../../../../../src/devices/cpu/i386/i386priv.h, line 596.
p ((i386_device * ) 0x5555572ae3f0)->READ8(0x300000)

From there we step through things a bit. The address is ANDed with m_a20_mask (uh?) which in the post-spray scenario reduces it from 0x300000 to 0x200000. Then it gets sent through a wall of templatese to read the single byte, which ends up in address_space::read_native. From HERE it does a lookup for the right read handler for the job, which itself has a pointer to some memory!

The difference seems to be the offset passed to ramptr. After spray, the byteaddress is 0x200000 and offset returned is 0x100000. Before spray, the byteaddress is 0x300000 and offset returned is 0x200000. Why, that's the same amount that m_a20_mask thing removed. Maybe I should read up a bit on this A20 thing and we might get som

OH

OH NO

OH SWEET MOTHER OF GOD NO

aaaaaaaaaaaAA AAAAAAAA̧̩̖A͔̱̖AAA̘̖̗̠͟A̶͕͙͍̪̣ AA̛̺̱̗͇̳̬̺̟͕͢À͕͍̀Ą͍̺̜̬͍͟A̻̠̦͞A͇̦̻͔͕̤̪̤͢͜A͓̭̫̺̪̜A̯̮A͉̠̕͡Ḁ͙̺͡A̡̛͉̞̪̕Ą͙A̮͚̞͇͇͕͢͠ Ą͚̯̭̰̹̮̫̬̥Ą̡͇̙̫̤̟̲̖͚̞̖͔͎̫͈̯̕͟A̢̮̪͈̳̝̖̘̤̟̜͔̲͕͓̱̘͢Ą͍̺̜̬͍͟A̻̠̦͞A̵̴͈̱͔̳̝̠̙̘̗̹͟͟A̴̛͚̠̰̻̻̣̟͔̭̯̪̮̬Ą̸̝̹͖̻́͞͠A͏̵҉̨̗̝̖̜̻̟̲̺̥̪̯A̸̛̝̼͍̹͚̦̤͉̠͙͉͖͜A̘͚̤̤̺̜̗͘͜͠͝ͅḀ̶̹̜͖̠̝̟̮͉̟̖̣̙̲̻̞̣͡A̡͙̞̫̭̳͙͈͖͢ͅ


Okay I'm back now. This whole experience in learning Old Testament x86 has been quite troubling.

x86 nerds probably know what went wrong. So say you're IBM, carving up a storm with your Intel 8088 range of business computers. After a while you decide to upgrade to Intel's fancier 80286 chip, which features a luxurious 24 line address bus (16MB) instead of 20 lines (1MB). But what happened on the old chip when you tried to access memory addresses greater than 1MB? Well the upper bits weren't used, so it wrapped around. And because segment math in x86 is so bad and adding things is fast, smart programmers everywhere started to use this "feature".

No problem, says Intel! The 80286 will boot in Real Mode to satisfy people who wrote bad apps between 1981 and 1984, and have the new Protected Mode which allows addressing all those new megabytes!

The 80286 ships, and Intel completely forgot to add 8086-style addressing to Real Mode. 0x100000 still accesses Extended Memory. At this point the IBM engineers started to get clever/desperate; nearly all of the bad apps only need to wrap around once (from 0xfffff to 0x00000), so technically we only need to be able to disable a single bit of the address space (0x100000, bit 20). That's easy; chip exposes address line 20, we AND-gate it against a switch, bish bash bosh. Better yet, force it to off by default, because it's 1984 and our future selves definitely won't mind having to do this dumb workaround again and again for eternity! Now how to control this switch from userspace... saaaayyyy, is that a spare line I see on the 8042?

What's that? You don't think the absence of a bit in the IBM AT keyboard controller should make the memory map repeat itself every alternate 1MB? Pat yourself on the back, that one took Intel 29 years.

The one silver lining to all this is that countless people have written low-level x86 and survived the freaky design cruft before; once your monster has a name, it's easy to track.

One mystery left; where is the state for the A20 switch? The trail goes cold from the i386 side: m_a20_mask is set by an interrupt fired by lord knows what code, and isn't read back anywhere. Better check out the ct486.cpp driver, as that defines the routing between components.

MCFG_DEVICE_ADD("keybc", AT_KEYBOARD_CONTROLLER, XTAL_12MHz)
MCFG_AT_KEYBOARD_CONTROLLER_SYSTEM_RESET_CB(DEVWRITELINE("cs4031", cs4031_device, kbrst_w))
MCFG_AT_KEYBOARD_CONTROLLER_GATE_A20_CB(DEVWRITELINE("cs4031", cs4031_device, gatea20_w))
MCFG_AT_KEYBOARD_CONTROLLER_INPUT_BUFFER_FULL_CB(DEVWRITELINE("cs4031", cs4031_device, irq01_w))
MCFG_AT_KEYBOARD_CONTROLLER_KEYBOARD_CLOCK_CB(DEVWRITELINE("pc_kbdc", pc_kbdc_device, clock_write_from_mb))
MCFG_AT_KEYBOARD_CONTROLLER_KEYBOARD_DATA_CB(DEVWRITELINE("pc_kbdc", pc_kbdc_device, data_write_from_mb))
MCFG_DEVICE_ADD("pc_kbdc", PC_KBDC, 0)
MCFG_PC_KBDC_OUT_CLOCK_CB(DEVWRITELINE("keybc", at_keyboard_controller_device, keyboard_clock_w))
MCFG_PC_KBDC_OUT_DATA_CB(DEVWRITELINE("keybc", at_keyboard_controller_device, keyboard_data_w))
MCFG_PC_KBDC_SLOT_ADD("pc_kbdc", "kbd", pc_at_keyboards, STR_KBD_MICROSOFT_NATURAL)

Oh yeah, the chipset! I forgot that the PC architecture uses those.

OH! OH! cs4031.cpp has 3 registers for A20 state (to deal with the three different ways of triggering it that existed in 1993), and only two of them had the state saved.

So there we go. A single missing register in the IBM keyboard controller was causing the PC's whole memory map to fail explosively. What day is it again?

i386/pc_vga: save state fixes

(MAME pull request #2178)

More PC fun! Liero would run for about 4 frames after cold-loading a state, then the screen would go black. By watching the the vga.memory buffer in the debugger as the frames went by, it was clear the animation continued on well after the screen disappeared, suggesting that maybe the VGA palette was getting trashed. And lo, it was: I forgot to add vga.dac to the save state. The MAME palette device acts as a cache which gets the real palette in vga.dac pulled into it every so often. I guess I didn't run into it with the test game I first tried because it only forced an update after the palette had physically changed? Weird.

Jump 'n Bump, on the other hand, would crash after a single frame somewhere in the VGA redraw wait loop. More precisely, all CPU instructions to pull info from the VGA port about redraw (0x3da) were getting cut off at the knees by a macro CRTC_PORT_ADDR which queried vga.miscellaneous_output (?!?)? I would've thought that getting as far as the READ8_MEMBER called "port_03d0_r" would be enough proof, but maybe this fixes some other crappy edge case.

Pressing further would make DOS4GW throw a page fault; the mini stack trace it printed showed the instruction pointer (EIP) was being set to some huge address like 0x3DF80000. That looks really suspicious; we know for a fact there's no memory mapped there, and the low word being all blank suggests that it was actually a 16-bit address being loaded. Jump 'n Bump is a proper 32-bit game, perhaps the issue is some "am I 16 or 32 bit" flag inside the CPU isn't being saved? I'm chuffed, this is officially the first time I've used the mini stack trace given by a DOS program to fix something.

Of course, vanilla DOS doesn't care about page faults! By loading a 16-bit savestate of the DOS command prompt from 32-bit mode, I could fill the screen with garbage as soon as e.g. a RET instruction hit and it tried to read the wrong size address off the stack. And it would chug along fine; one time it even ran AUTOEXEC.BAT again without asking.

Okay now I've probed some more, it appears to be crashing in the system timer IRQ 0 interrupt. Is it because that code is 16-bit and the stack segment is 32-bit? A working session changes SS, SSBASE, SSFLAGS et al. to be 16-bit, but the save state run doesn't do that. Let's throw a gdb breakpoint into the code which changes the segments and see how often it gets called right up until the interrupt handover.

BROKEN:
#0  0x0000555555d06841 in i386_device::i386_load_protected_mode_segment(i386_device::I386_SREG*, unsigned long*) (this=0x55555dfffda0, seg=0x7fffffff6690, desc=0x0) at ../../../../../src/devices/cpu/i386/i386.cpp:180
#1  0x0000555555d08ec9 in i386_device::i386_trap(int, int, int) (this=0x55555dfffda0, irq=136, irq_gate=1, trap_level=0) at ../../../../../src/devices/cpu/i386/i386.cpp:845

#0  0x0000555555d06841 in i386_device::i386_load_protected_mode_segment(i386_device::I386_SREG*, unsigned long*) (this=0x55555dfffda0, seg=0x55555e0024ac, desc=0x0) at ../../../../../src/devices/cpu/i386/i386.cpp:180
#1  0x0000555555d06b42 in i386_device::i386_load_segment_descriptor(int) (this=0x55555dfffda0, segment=1) at ../../../../../src/devices/cpu/i386/i386.cpp:244
#2  0x0000555555d09f8b in i386_device::i386_trap(int, int, int) (this=0x55555dfffda0, irq=136, irq_gate=1, trap_level=0) at ../../../../../src/devices/cpu/i386/i386.cpp:1089

WORKING:
#0  0x0000555555d06841 in i386_device::i386_load_protected_mode_segment(i386_device::I386_SREG*, unsigned long*) (this=0x55555dfffda0, seg=0x7fffffff6690, desc=0x0) at ../../../../../src/devices/cpu/i386/i386.cpp:180
#1  0x0000555555d08ec9 in i386_device::i386_trap(int, int, int) (this=0x55555dfffda0, irq=136, irq_gate=1, trap_level=0) at ../../../../../src/devices/cpu/i386/i386.cpp:845

#0  0x0000555555d06841 in i386_device::i386_load_protected_mode_segment(i386_device::I386_SREG*, unsigned long*) (this=0x55555dfffda0, seg=0x7fffffff66b0, desc=0x0) at ../../../../../src/devices/cpu/i386/i386.cpp:180
#1  0x0000555555d0929d in i386_device::i386_trap(int, int, int) (this=0x55555dfffda0, irq=136, irq_gate=1, trap_level=0) at ../../../../../src/devices/cpu/i386/i386.cpp:895

#0  0x0000555555d06841 in i386_device::i386_load_protected_mode_segment(i386_device::I386_SREG*, unsigned long*) (this=0x55555dfffda0, seg=0x55555e0024c0, desc=0x0) at ../../../../../src/devices/cpu/i386/i386.cpp:180
#1  0x0000555555d0991f in i386_device::i386_trap(int, int, int) (this=0x55555dfffda0, irq=136, irq_gate=1, trap_level=0) at ../../../../../src/devices/cpu/i386/i386.cpp:971

#0  0x0000555555d06841 in i386_device::i386_load_protected_mode_segment(i386_device::I386_SREG*, unsigned long*) (this=0x55555dfffda0, seg=0x55555e0024ac, desc=0x0) at ../../../../../src/devices/cpu/i386/i386.cpp:180
#1  0x0000555555d06b42 in i386_device::i386_load_segment_descriptor(int) (this=0x55555dfffda0, segment=1) at ../../../../../src/devices/cpu/i386/i386.cpp:244
#2  0x0000555555d09f8b in i386_device::i386_trap(int, int, int) (this=0x55555dfffda0, irq=136, irq_gate=1, trap_level=0) at ../../../../../src/devices/cpu/i386/i386.cpp:1089

This looks promising, whole chunks of i386_trap aren't being reached!

                    CPL = m_CPL;  // current privilege level
                    DPL = (desc.flags >> 5) & 0x03;  // descriptor privilege level
...
                    if((desc.flags & 0x0004) == 0 && (DPL < CPL))
                    {

There it is. m_CPL isn't saved. Ughhhhh. So far I've tried to be austere about which registers I track, but that is proving to be a worse idea than just tracking everything. Shortly after finding that, a whole slew of new bugs pop up from not having certain two-letter CPU flags in the state, so I add all of them (after a cursory check that they were all persistent and not clobbered each cycle or whatnot). Surprise! No more crashes.

taito_f3: fix missing graphics on state load

(MAME pull request #1903)

This was pretty straightforward! CPU and sound were both fine, only problem was large chunks of graphics were missing. The one big catch was this editable set of tiles (called the "pixel layer") which wasn't being updated in MAME's gfx_element when the state loaded, even when the backing RAM for it was! I didn't really want to mess with the behaviour of gfx_element and friends, so after a lot of hemming and hawing I added a device_post_load hook on taito_t3_state to mark all the tiles in the gfx_element as "dirty", which hints to MAME to reload all the tiles from the source (i.e. the backing RAM) on the next access. I'm still not sure why this doesn't happen by default. And I probably never will! Because it's done.

sblaster/ymf262: save state fixes

(MAME pull request #2193)

Another boring one. The DSP and YMF262 chips on a Sound Blaster card have like a billion registers that needed to be tracked for sound playback to survive a save state. Actually while testing this I got really irritated by another bug; every time MAME dropped a frame my speakers would make this horrendous popping noise, but this behaviour would immediately stop once the soundcard had played some music. I recorded my computer's audio output with Audacity; on the signed 16-bit scale audio output was being held at -1 by default, but during the pops it would step back to 0. The fix was to explicitly set the twin DACs to 0 (or rather 0x8000, as the DACs are unsigned 16-bit) in sb_device::device_reset().