Discussion about software development for the old-school Gameboys, ranging from the "Gray brick" to Gameboy Color
(Launched in 2008)
Hey y'all,
100% noob question time.
I've been trying to wrap my head around DMA usage, and the pros/cons of using DMA vs. just writing to the OAM (specifically the DMA usage in not-color gameboys, i.e. not the HDMA stuff). Have any of you written programs that utilize DMA, and is DMA the preferred method of doing things?
/end noob post
Thanks!
Replying to myself.
Did some math, and DMA is definitely faster and smaller than copying things into OAM without it. If my math is correct, then a rolled DMA loop is preferable to a rolled not-DMA loop if you are handling more than 4 sprites, and is preferable to an unrolled not-DMA loop if you are handling more than 20 (the unrolled loop would be MUCH larger in size also).
Is this for a classic DMG Game Boy only, or are the results the same for a GBC game?
Anyway, that's always interesting to know!
This is for DMG only (I'm focusing on devving for DMG), but from what I read regarding the GBC, the general-purpose DMA is slower, but the HDMA is even faster. The DMA on the GBC is also a lot more dynamic/useful.
In re-writing the non-DMA code, I forgot about the OAM bug, so my code was wrong. Adjusting for that bug makes the non-DMA code even slower.
So, here's how I figured out timing (in assembly, sorry).
Preamble: this is only the actual copying, no wrapper routines or anything. Wrapper routines would add overhead, sure, but not enough to skew the results majorly. Also, these functions will copy to the ENTIRE OAM ram, not just 2 or 3 sprites.
First, a rolled DMA loop.
    ld A,[dma_copy_location]    ;3 bytes, 16 cycles
    ldh [rDMA],A                ;2 bytes, 12 cycles - note the "LDH", not LD
    ld A,40                     ;2 bytes, 8 cycles
.loop
    dec A                       ;1 byte, 4 cycles
    jr nz,.loop                 ;2 bytes, 12 cycles if it jumps, 8 if it doesn't
    ret                         ;1 byte, 16 cycles
Total byte count: 11
Total cycle count: 688
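For anyone who wants to double-check the tally, here is a small Python sketch that adds up the per-instruction costs from the comments above. The table and the function name are just for illustration; the only assumption is that the wait loop runs 40 times, with the jump taken on every pass except the last.

```python
# (mnemonic, bytes, cycles), copied from the comments in the DMA listing.
SETUP = [
    ("ld A,[dma_copy_location]", 3, 16),
    ("ldh [rDMA],A", 2, 12),
    ("ld A,40", 2, 8),
]
DEC_A_BYTES, DEC_A_CYCLES = 1, 4
JR_BYTES, JR_TAKEN, JR_NOT_TAKEN = 2, 12, 8
RET_BYTES, RET_CYCLES = 1, 16
WAIT_PASSES = 40  # value loaded into A before the wait loop


def dma_routine_cost():
    """Return (size in bytes, total cycles) for the DMA routine."""
    size = sum(b for _, b, _ in SETUP) + DEC_A_BYTES + JR_BYTES + RET_BYTES

    cycles = sum(c for _, _, c in SETUP)
    # Every pass but the last takes the jump back to .loop.
    cycles += (WAIT_PASSES - 1) * (DEC_A_CYCLES + JR_TAKEN)
    # Final pass: A reaches zero, so the jump falls through.
    cycles += DEC_A_CYCLES + JR_NOT_TAKEN
    cycles += RET_CYCLES
    return size, cycles


print(dma_routine_cost())  # (11, 688)
```

That matches the 11 bytes and 688 cycles quoted above.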
No unrolled DMA loop, as that wouldn't fit into HRAM (and would be kinda pointless, as the loop above only serves to wait until DMA is complete.)
Rolled non-DMA copy:
    ld HL,sprite_copy_location  ;3 bytes, 12 cycles
    ld BC,_OAMRAM               ;3 bytes, 12 cycles
    ld D,160                    ;2 bytes, 8 cycles
.loop
    ld A,[HL+]                  ;1 byte, 8 cycles
    ld [BC],A                   ;1 byte, 8 cycles
    inc C                       ;1 byte, 4 cycles
    jr nz,.skip_B               ;2 bytes, 12/8 cycles - can't just "inc BC", or
    inc B                       ;1 byte, 4 cycles     - trash gets written to the OAM
.skip_B                         ;
    dec D                       ;1 byte, 4 cycles
    jr nz,.loop                 ;2 bytes, 12/8 cycles
    ret                         ;1 byte, 16 cycles
Total bytes: 18
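Total cycles: 7724(!) (32 setup + 159 passes x 48 + 44 for the final pass + 16 for the ret; both paths through .skip_B cost 12 cycles, so the "inc B" never adds anything on top of the jump)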
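A hand tally of a loop with two conditional jumps is easy to get wrong, so here is a hedged Python sketch that recomputes the cycle count from the costs in the comments above, so it can be checked against the total. The function name and the `count` parameter are made up for illustration. The one thing worth noticing is that the `.skip_B` branch costs 12 cycles either way: a taken jump is 12, and a jump that falls through plus `inc B` is 8 + 4.

```python
def rolled_copy_cycles(count=160):
    """Cycles for the rolled non-DMA copy of `count` bytes into OAM."""
    setup = 12 + 12 + 8    # ld HL,.. / ld BC,.. / ld D,160
    copy = 8 + 8 + 4       # ld A,[HL+] / ld [BC],A / inc C
    skip_b = 12            # jr taken (12), or jr not taken + inc B (8 + 4)
    dec_d = 4
    loop_jump_taken = 12   # jr nz,.loop while D is still non-zero
    loop_jump_exit = 8     # jr nz,.loop falling through on the last pass
    ret = 16

    full_pass = copy + skip_b + dec_d + loop_jump_taken
    last_pass = copy + skip_b + dec_d + loop_jump_exit
    return setup + (count - 1) * full_pass + last_pass + ret


print(rolled_copy_cycles())    # whole OAM, 160 bytes
print(rolled_copy_cycles(16))  # e.g. only 4 sprites' worth of bytes
```

Passing a smaller `count` gives a rough feel for where copying by hand stops being worth it compared to the fixed cost of the DMA routine.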
And, finally, a non-DMA unrolled copy routine
    ld HL,sprite_copy_location  ;3 bytes, 12 cycles
    ld BC,oam_ram               ;3 bytes, 12 cycles
    ld A,[HL+]                  ;1 byte, 8 cycles - must be [HL+], or the same byte gets copied every time
    ld [BC],A                   ;1 byte, 8 cycles
    inc C                       ;1 byte, 4 cycles
    jr nz,.skip_B               ;2 bytes, 12/8 cycles
.skip_B
    ;... Repeat from "ld A,[HL+]" to ".skip_B" 158 more times (160 byte copies in total).
    ld A,[HL]                   ;1 byte, 8 cycles
    ld [BC],A                   ;1 byte, 8 cycles
    ret                         ;1 byte, 16 cycles
Total bytes: 486(!)
Total cycles: 5744
If there's a faster way to code those routines, my math was off, or you just have questions, lemme know.
EDIT: Aligning comments in the code tags.
Last edited by l0k1 (2015-09-24 23:26:29)