OAM DMA tutorial
A brief introduction
The purpose of this article is to explain DMA transfers on the Gameboy, specifically how to use them. We will be going over some assembly code examples, as well as a bit of disassembled code from one of my favorite Gameboy games, Megaman. By the end of this article you should have a strong understanding of not only what a DMA transfer is, but also how to use one in your own homebrew games.
This article does not cover the basics of Gameboy homebrew, and it assumes you have at least a little understanding of assembly and/or the Gameboy. Many topics are only glanced at because teaching them is out of the scope of this article. Other articles cover them, and I recommend doing your research beforehand. Should anyone feel stuck or confused about anything, feel free to contact me on IRC or Twitter and I will help to the best of my ability.
The code in this article is written specifically for RGBASM. The tools used are gb-convert, my own tool for converting images to data supported by the Gameboy, and various reverse-engineering tools such as Radare2, used to disassemble the Megaman ROM.
If you require a working example of the code covered in this article, feel free to check out my most recent homebrew game Exeman on GitHub, which I wrote for Ludum Dare 38.
So without further ado, let’s get started.
OAM and its relation to sprites
If you have toyed with game development in the past, chances are you know what a “sprite” is. For those unfamiliar with the term, a sprite is essentially just a 2D image of arbitrary size. In the context of games it generally refers to entities, moving objects, etc.
The same is true for the Gameboy, with one exception being the size of the image. On the Gameboy sprites are limited to two possible sizes, one being 8x8, the other being 8x16. Pretty small, right? Due to these limitations you have to form larger sprites by stitching numerous small sprites together.
The sprites actually get their image from a larger image split up into either 8x8 or 8x16 “tiles”. We will refer to this as a “tile map”. Every sprite has a number associated with it which corresponds to one of the tiles in our tile map.
To render sprites we need to utilize what is referred to as the “OAM”, or “Object Attribute Memory”. All this really is, is a memory location that is used to store the information of each sprite. On the Gameboy the OAM is a 160-byte long chunk of memory, and each sprite takes up 4 bytes which leaves just enough room for exactly 40 sprites.
So what are these 4 bytes for? Well, those 4 bytes are used to store some information about each sprite. The information being the following:
- Y location
- X location
- Tile number
- Attributes (flags)
The first two bytes store the position of our sprite on the screen. Note that the hardware offsets these values: a sprite with Y 16 and X 8 sits exactly in the top-left corner of the screen, which lets sprites slide in smoothly from off-screen. The third byte stores the tile number which corresponds to a tile in our tile map. And the fourth byte stores numerous attributes in its bits. Those attributes are as follows:
- Bit 7: Render priority
- Bit 6: Y flip
- Bit 5: X flip
- Bit 4: Palette number (original Gameboy only)
- Bit 3: VRAM bank (GB Color only)
- Bits 2-0: Palette number (GB Color only)
Knowing this, let’s take a look at Megaman. On the left we have Megaman as seen in the game world. At the top right we have our tile map, and at the bottom right we have the OAM sprites. Megaman consists of 12 sprites in total. This number could be reduced to 6 if it were to use 8x16 tiles. Also note that Megaman is facing the opposite direction of the tiles. This is done by setting bit 5 (the X flip bit) in the attribute byte to 1, which mirrors the sprite on the X axis.
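As a rough sketch of how one of these flags might be set in code, the routine below turns on the X flip bit for a single sprite entry held in RAM. The constant names and the SetXFlip label are made up for this example, and it assumes HL points at the first byte of a 4-byte sprite entry. The DEF keyword needs a reasonably recent RGBDS; older versions write the constants without it.

```asm
; Hypothetical names for the attribute bits described above
DEF ATTR_PRIORITY EQU %10000000 ; bit 7
DEF ATTR_YFLIP    EQU %01000000 ; bit 6
DEF ATTR_XFLIP    EQU %00100000 ; bit 5

SECTION "Attribute example", ROM0
; In: HL = address of a 4-byte sprite entry in work RAM
SetXFlip:
    inc hl
    inc hl
    inc hl            ; HL now points at byte 4, the attributes
    ld a, [hl]
    or ATTR_XFLIP     ; set bit 5, leave the other flags alone
    ld [hl], a
    ret
```

Keep in mind that for a character built from several sprites, flipping each tile is only half the job: the X positions of the columns have to be mirrored as well, otherwise the pieces end up in the wrong order.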
So we know we need 4 bytes per sprite, but how would that look in code? Well here’s a quick example. I use a separate file for my OAM variables to keep things organized. We’ll define these bytes somewhere in work RAM because they are going to get changed a lot while the game is running.
SECTION "OAM Vars",WRAM0[$C100]
megaman_sprites: DS 4*12
megaman_bullets: DS 4*4
As you can see, we are defining these in WRAM0, which is work RAM, at address $C100 onwards. Because Megaman consists of 12 sprites, and each sprite requires 4 bytes, we reserve 48 bytes. We also reserve enough space for 4 projectiles. Setting these bytes to a specific value is nice and easy. Let’s set the position of the first bullet to X 32 and Y 24.
; byte 1 is the Y position
ld a,24
ld [megaman_bullets],a
; byte 2 is the X position
ld a,32
ld [megaman_bullets+1],a
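Filling in the remaining two bytes works the same way. Here is a sketch that writes a whole entry through HL, assuming the megaman_bullets label from the section above. The InitFirstBullet label and the tile number $10 are made up for illustration, so use whichever tile actually holds your bullet graphic.

```asm
SECTION "Bullet example", ROM0
InitFirstBullet:
    ld hl, megaman_bullets
    ld a, 24
    ld [hli], a     ; byte 1: Y position
    ld a, 32
    ld [hli], a     ; byte 2: X position
    ld a, $10       ; hypothetical tile number
    ld [hli], a     ; byte 3: tile number
    xor a           ; A = 0
    ld [hl], a      ; byte 4: attributes, all flags off
    ret
```

Using "ld [hli], a" writes the byte and moves HL on to the next one, so the four bytes can be filled in order without repeating the address.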
Pretty straightforward, huh? So far we have stored the bytes for the sprites in work RAM, not OAM, and until we get those bytes to the OAM nothing will happen. So why don’t we just define these bytes in OAM? Well, the thing with OAM is that, much like VRAM, we cannot access it while the display is updating (which is a lot of the time!). This is where the so-called “DMA” comes into play.
DMA transfers, and how to use them
So we know what the OAM is, we know we need 4 bytes per sprite, and that larger sprites are comprised of numerous smaller ones. We’ve got the OAM sprite data stored in RAM, but now we need to get it to OAM. Manually accessing the OAM is impossible while the display is updating, and since that is most of the time, accessing it manually just isn’t an option.
This is where DMA, or “Direct Memory Access”, steps in. DMA transfers copy data from ROM or RAM to the OAM in a timely manner. Getting these to function takes a little bit of work and understanding. There are a few quirks we need to learn and work around. Most notably, the CPU can only access HRAM (the memory between $FF80 and $FFFE) while a DMA transfer is taking place. On top of that, DMA transfers take roughly 160 microseconds, so we’re going to need to do something interesting to make this work.
First off, let’s go over the code required to make a DMA happen. To do so we need to write some data to a location in memory. This location is actually a register and it resides at $FF46 (the code below calls it rDMA, the name the commonly used hardware.inc file gives it). As soon as anything is written to this location, a DMA transfer will begin. We also need to specify a location for the DMA transfer to copy data from, and the location we want is $C100, which is where we have stored our OAM bytes. The way to specify this is by loading the upper half of the address into the DMA register. Because the register is a single byte, we cannot give it our entire address $C100, which is two bytes. So we give it the upper half, $C1, and the lower half defaults to $00, leaving it with the address $C100. The transfer always copies 160 bytes, the full size of the OAM, starting from that address.
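Because the lower half is always $00, the source buffer has to start on a 256-byte boundary. A hedged sketch of how that could be checked at build time, assuming the megaman_sprites label from the "OAM Vars" section earlier and an RGBDS version that supports ASSERT:

```asm
; Fails the build if the sprite buffer does not start on a page boundary
ASSERT LOW(megaman_sprites) == 0, "OAM DMA source must be 256-byte aligned"
; HIGH() gives the byte we write to the DMA register: $C1 for $C100
ASSERT HIGH(megaman_sprites) == $C1, "expected the sprite buffer at $C100"
```

These lines generate no code. They only stop the build if the buffer ever moves somewhere the DMA register cannot point to.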
Let’s see what a DMA transfer would look like in assembly.
    ; first we load $C1 into the DMA register at $FF46
    ld a, $C1
    ld [rDMA], a
    ; DMA transfer begins, we need to wait 160 microseconds while it transfers
    ; the following loop takes exactly that long
    ld a, 40
.loop:
    dec a
    jr nz, .loop
    ret
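To see where the count of 40 comes from, here is the same wait loop on its own with the cost of each instruction noted in machine cycles. This is only an annotated sketch: the TimedWait label is made up, and the figures assume the usual timings for these instructions and a transfer length of 160 machine cycles, each of which lasts a little under one microsecond.

```asm
SECTION "DMA timing sketch", ROM0
TimedWait:
    ld a, 40        ; 2 cycles
.loop
    dec a           ; 1 cycle
    jr nz, .loop    ; 3 cycles when taken, 2 on the final pass
    ret
; loop:        39 * (1 + 3) + (1 + 2) = 159 cycles
; with "ld a": 2 + 159                = 161 cycles
; so the CPU reaches "ret" just after the 160-cycle transfer is over
```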
So, we load $C1 into the DMA register which is at $FF46, the DMA transfer begins immediately and we wait 160 microseconds for it to finish. Pretty simple, huh? If only life was that easy...
As I mentioned before, while the DMA transfer is in progress the CPU can only access HRAM. As the above subroutine would reside in ROM, this would simply not work. So we need a workaround, and as it turns out one exists that’s not overly difficult to implement. What we need to do is copy the above subroutine from ROM, where it resides, into HRAM. Along with the DMA subroutine, we are also going to need one to copy it into HRAM.
To make life a little less difficult for ourselves, we are going to let the assembler do the work: the DMA subroutine sits in ROM between two labels, and a small copy routine moves it into HRAM byte by byte. This version of the DMA subroutine also expects the upper half of the source address to already be in A when it is called. I’m going to save you some time and hassle and give you both parts.
SECTION "OAM DMA routine", ROM0
CopyDMARoutine:
    ld hl, DMARoutine
    ld b, DMARoutineEnd - DMARoutine ; Number of bytes to copy
    ld c, LOW(hOAMDMA) ; Low byte of the destination address
.copy
    ld a, [hli]
    ldh [c], a
    inc c
    dec b
    jr nz, .copy
    ret

DMARoutine:
    ldh [rDMA], a
    ld a, 40
.wait
    dec a
    jr nz, .wait
    ret
DMARoutineEnd:

SECTION "OAM DMA", HRAM
hOAMDMA::
    ds DMARoutineEnd - DMARoutine ; Reserve space to copy the routine to
Looks a little confusing, doesn’t it? At first glance this subroutine is pretty unintuitive so I’ll break it down for you.
The ".copy" loop copies "DMARoutineEnd - DMARoutine" bytes from "DMARoutine" to "hOAMDMA" (the destination address). "DMARoutine" is the address of the routine we aim to copy, which makes sense; "DMARoutineEnd - DMARoutine" is the number of bytes between the two labels, that is, exactly the size of our routine. "LOW(hOAMDMA)" is the low byte of "hOAMDMA", so if hOAMDMA is $FF83, then LOW(hOAMDMA) is $83. We do this because the "ldh [c], a" instruction takes the low byte of an address, automatically uses $FF as the high byte, and writes to the resulting address. This lets us use a single 8-bit register instead of a register pair, which saves a byte when loading the address and a cycle on every increment.
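For comparison, here is a sketch of the same copy loop written both ways. Both routines are illustrative only (the labels are made up) and assume HL already holds the source address and B the number of bytes, with $FF80 standing in for the destination.

```asm
SECTION "Copy comparison", ROM0
; Variant A: HRAM destination held in C, as CopyDMARoutine does
CopyViaC:
    ld c, $80           ; low byte of $FF80 (2-byte instruction)
.copy
    ld a, [hli]
    ldh [c], a          ; writes A to $FF00 + C
    inc c               ; 1 cycle
    dec b
    jr nz, .copy
    ret

; Variant B: full destination address held in DE
CopyViaDE:
    ld de, $FF80        ; 3-byte instruction
.copy
    ld a, [hli]
    ld [de], a
    inc de              ; 2 cycles
    dec b
    jr nz, .copy
    ret
```

The difference is tiny for a routine this short, but variant A also leaves DE free for other uses, and it only works because the destination lives in the $FF00-$FFFF range.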
Now that we have those subroutines out of the way, we can work them into our game loop. The first thing we want to do is call our “CopyDMARoutine” subroutine. We only want to call this once, as the DMA subroutine will reside in HRAM for the remainder of the game. After this the game loop begins, and now we need to call the subroutine from HRAM. Since this version of the routine expects the upper byte of the source address in A, we load HIGH(wShadowOAM) first, where wShadowOAM is the 160-byte buffer in work RAM that holds our sprite bytes. We want to do this every frame so as to keep the OAM updated with the latest information about our sprites. Here’s what that code might look like:
SECTION "Program Start",ROM0
Start:
    ; *enable everything here*
    ; move DMA subroutine to HRAM
    call CopyDMARoutine
.game_loop:
    ; wait for the display to finish updating
    call WaitVBlank
    ; update megaman and the OAM bytes
    call MegamanUpdate
    ; call the DMA subroutine we copied to HRAM
    ; which then copies the bytes to the OAM and sprites begin to draw
    ld a, HIGH(wShadowOAM)
    call hOAMDMA
    jr .game_loop

; ALIGN[8] puts the buffer on a 256-byte boundary, which the DMA register requires
SECTION "Shadow OAM", WRAM0, ALIGN[8]
wShadowOAM:
    ds 4 * 40 ; This is the buffer we'll write sprite data to
And, hey presto, it’s done! Suddenly it’s less daunting once you take it apart bit by bit, huh?
Cracking open Megaman.gb
So we have the Megaman ROM open in our favorite disassembler. The first thing we will need to do is locate either the DMA subroutine, or a subroutine that copies it to HRAM. First I’m going to do a search for the DMA register and see what we can find. Doing so brings up this:
ROM:15E0            ld a, $C0
ROM:15E2            ld [$FF46], a
ROM:15E4            ld a, $28
ROM:15E6 loc_15E6:
ROM:15E6            dec a
ROM:15E7            jr nz, loc_15E6
ROM:15E9            ret
Huh, that appears to be a DMA transfer subroutine. It’s not appearing as a subroutine in my disassembler, so I assume nothing calls it in place and it gets copied to HRAM just like ours does. It loads $C0 into the DMA register, so they are using the location $C000 for OAM data. Let’s see if we can find the subroutine that copies this to HRAM.
Getting the address of the first byte for the above assembly code and searching for it brings up the following:
ROM:15D2 ; ========== S U B R O U T I N E ==========
ROM:15D2 sub_15D2:
ROM:15D2            ld c, $80
ROM:15D4            ld b, $A
ROM:15D6            ld hl, $15E0
ROM:15D9 loc_15D9:
ROM:15D9            ldi a, [hl]
ROM:15DA            ld [c],a
ROM:15DB            inc c
ROM:15DC            dec b
ROM:15DD            jr nz, loc_15D9
ROM:15DF            ret
ROM:15DF ; End of function sub_15D2
At a glance, that does indeed look like a subroutine that copies something. What’s odd is that it appears to be copying to the address in C, which starts at $80?
Obviously it’s not copying to the address $80, that wouldn’t make any sense. It looks like it’s actually a slightly different opcode and my disassembler just isn’t making note of it. Checking out the hex dump for the address that contains “ld [c],a” gives us $E2.
Giving this hexadecimal number to rasm2 produces the expected output.
$ rasm2 -a gb -d "E2"
"ld [0xff00 + c], a"
So it appears this subroutine is copying the 10-byte ($A) DMA subroutine at $15E0 to HRAM at $FF80. This is essentially the same approach as ours. The main differences are that the source page $C0 is hard-coded inside the DMA routine rather than passed in A, and that the addresses and length show up as plain constants where our source uses labels.
Not much to look at, but it’s interesting making the comparison and seeing how it was done in the commercial world. Disassembling commercial games is often a good opportunity to learn something new and get insight into different ways to approach a problem.
I hope by this point you have a firm understanding of OAM, DMA, and how to implement it into your own projects. The reason for this article is that while most other topics on Gameboy homebrew are pretty well covered, I couldn’t personally find a lot of information on DMA transfers and struggled for a little bit. Hopefully this helps someone avoid the difficulties I had with DMA transfers.
If you find yourself stuck or confused, or notice something wrong in this article, feel free to contact me via IRC or Twitter. Contributions and fixes are more than welcome and I look forward to perhaps doing more articles like this in the future.
Adapted from Exezin's post