Discussion about software development for the old-school Gameboys, ranging from the "Gray brick" to Gameboy Color
(Launched in 2008)
Hello. I'm new to ASM, and I'm trying to make a Pokédex like the one in Red/Blue (not looking at the disassembly though, since it's kind of intimidating for me at my current level). But I'm having this problem with the screen flickering as I scroll up and down the list:
https://drive.google.com/file/d/1LxxoO8 … sp=sharing
I believe that the problem is probably that I have too much code between turning off the screen and turning it back on. Here's what I have:
```asm
; [. . .]
; now we need to move the cursor. so we first need to turn off the screen...
    call turn_off_LCD
; is the cursor already in slot 7 (i.e. at y=136)?
    ld a,[_cursor1_y]
    cp 136
    jp nz,.continue
; if so, increase the top of the list by 1
    ld a,[_top]
    inc a
    ld [_top],a
    call RefreshScreen ; redraw the list
    jr .turn_on_screen
.continue
; if not, move the cursor downwards a slot (i.e. down 16 pixels)
    add a,16
    ld [_cursor1_y],a
    ld [_SPR0_Y],a
.turn_on_screen
; turn the screen back on:
    ld a,[rLCDC]
    set 7,a
    ld [rLCDC],a
; and disable the input
    ld a,1
    ld [_input_disabled],a
.end
    ret

RefreshScreen:
; pseudocode:
; for(i=0;i<7;i+=1){
;     hl = LoafNames+(10*(top+i))
;     de = _SCRN0+(64*i)+100
;     bc = 10
;     call DrawText
;
;     hl = Numbers+(3*(top+i))
;     de = _SCRN0+(64*i)+65
;     bc = 3
;     call DrawNumber
; }
    ld a,0          ; a = 0
.loop
    ld [_i],a       ; [_i] = a
    ld a,[_top]     ; hl = [_top]
    ld h,0          ; ''
    ld l,a          ; ''
    ld b,0          ; bc = [_i]
    ld a,[_i]       ; ''
    ld c,a          ; ''
    add hl,bc       ; hl is now equal to ([_top]+[_i])
    add hl,hl       ; we now make hl equal to (10*([_top]+[_i]))
    ld b,h          ; ''
    ld c,l          ; ''
    add hl,hl       ; ''
    add hl,hl       ; ''
    add hl,bc       ; ''
    ld b,h          ; bc = hl
    ld c,l          ; ''
    ld hl,LoafNames
    add hl,bc       ; hl is now equal to LoafNames+(10*([_top]+[_i]))
    push hl         ; save it for later
    ld a,[_i]       ; hl = [_i]
    ld h,0          ; ''
    ld l,a          ; ''
    add hl,hl       ; we now make hl equal to (64*[_i])
    add hl,hl       ; ''
    add hl,hl       ; ''
    add hl,hl       ; ''
    add hl,hl       ; ''
    add hl,hl       ; ''
    ld b,0          ; bc = 100
    ld c,100        ; ''
    add hl,bc       ; hl is now equal to (64*[_i])+100
    ld b,h          ; bc = hl
    ld c,l          ; ''
    ld hl,_SCRN0
    add hl,bc       ; hl is now equal to _SCRN0+(64*[_i])+100
    ld d,h          ; de is now equal to _SCRN0+(64*[_i])+100
    ld e,l          ; ''
    pop hl          ; hl is again equal to LoafNames+(10*([_top]+[_i]))
    ld b,0          ; bc = 10
    ld c,10         ; ''
    call DrawText
    ld a,[_top]     ; hl = [_top]
    ld h,0          ; ''
    ld l,a          ; ''
    ld b,0          ; bc = [_i]
    ld a,[_i]       ; ''
    ld c,a          ; ''
    add hl,bc       ; hl is now equal to ([_top]+[_i])
    ld b,h          ; we now make hl equal to (3*([_top]+[_i]))
    ld c,l          ; ''
    add hl,hl       ; ''
    add hl,bc       ; ''
    ld b,h          ; bc = hl
    ld c,l          ; ''
    ld hl,Numbers
    add hl,bc       ; hl is now equal to Numbers+(3*([_top]+[_i]))
    push hl         ; save it for later
    ld a,[_i]       ; hl = [_i]
    ld h,0          ; ''
    ld l,a          ; ''
    add hl,hl       ; we now make hl equal to (64*[_i])
    add hl,hl       ; ''
    add hl,hl       ; ''
    add hl,hl       ; ''
    add hl,hl       ; ''
    add hl,hl       ; ''
    ld b,0          ; bc = 65
    ld c,65         ; ''
    add hl,bc       ; hl is now equal to (64*[_i])+65
    ld b,h          ; bc = hl
    ld c,l          ; ''
    ld hl,_SCRN0
    add hl,bc       ; hl is now equal to _SCRN0+(64*[_i])+65
    ld d,h          ; de is now equal to _SCRN0+(64*[_i])+65
    ld e,l          ; ''
    pop hl          ; hl is again equal to Numbers+(3*([_top]+[_i]))
    ld b,0          ; bc = 3
    ld c,3          ; ''
    call DrawNumber
    ld a,[_i]
    inc a
    cp 7
    jp nz,.loop
    ret

DrawText:
    ld a,[hl]       ; we load the byte into A
    sub 32          ; map the ASCII to the tile number
    add 6           ; map the ASCII to the tile number
    ld [de],a       ; we copy the byte to the destination
    dec bc          ; one less byte to copy
; check if bc is zero:
    ld a,c
    or b
    ret z           ; if it's zero, we return
; but if it's not zero, we continue:
    inc hl
    inc de
    jr DrawText

DrawNumber:
    ld a,[hl]       ; we load the byte into A
    sub 32          ; map the ASCII to the tile number
    add 6           ; map the ASCII to the tile number
    ld [de],a       ; we copy the byte to the destination
    dec bc          ; one less byte to copy
; check if bc is zero:
    ld a,c
    or b
    ret z           ; if it's zero, we return
; but if it's not zero, we continue:
    inc hl
    inc de
    jr DrawNumber
```
I suppose I could optimize it some, but is that really going to solve the issue? Or am I going about this in completely the wrong way?
Last edited by h0tp3ngu1n (2019-09-01 18:56:22)
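For readers following the arithmetic in RefreshScreen, here is a small host-side Python model (not Game Boy code) of how the shift-and-add sequences compute 10*n, 3*n and 64*i, and where the destinations land in the 32-tile-wide background map. It assumes `_SCRN0` is $9800, as in the usual hardware.inc; `LoafNames` and `Numbers` are the poster's own tables, so the model only returns byte offsets into them. All function names are hypothetical.

```python
# Host-side model of RefreshScreen's address arithmetic (illustration only).
SCRN0 = 0x9800      # _SCRN0 in hardware.inc: first background tilemap
MAP_WIDTH = 32      # tiles per tilemap row


def times10(n: int) -> int:
    """10*n the way the asm does it: 2n, save it, 4n, 8n, then 8n + 2n."""
    hl = n
    hl = (hl + hl) & 0xFFFF    # add hl,hl -> 2n
    bc = hl                    # ld b,h / ld c,l -> keep 2n
    hl = (hl + hl) & 0xFFFF    # add hl,hl -> 4n
    hl = (hl + hl) & 0xFFFF    # add hl,hl -> 8n
    return (hl + bc) & 0xFFFF  # add hl,bc -> 10n


def times3(n: int) -> int:
    """3*n: save n in BC, double HL, then add BC."""
    bc = n
    hl = (n + n) & 0xFFFF      # 2n
    return (hl + bc) & 0xFFFF  # 3n


def times64(n: int) -> int:
    """64*n via six doublings (add hl,hl)."""
    hl = n
    for _ in range(6):
        hl = (hl + hl) & 0xFFFF
    return hl


def map_position(addr: int) -> tuple[int, int]:
    """(row, column) of a tilemap address."""
    return divmod(addr - SCRN0, MAP_WIDTH)


def char_to_tile(ch: str) -> int:
    """DrawText/DrawNumber mapping: ASCII code - 32 + 6, kept to 8 bits."""
    return (ord(ch) - 32 + 6) & 0xFF


def slot_addresses(top: int, i: int) -> dict:
    """Source offsets and tilemap destinations computed for list slot i."""
    entry = top + i
    return {
        "name_offset": times10(entry),       # added to LoafNames
        "name_dest": SCRN0 + times64(i) + 100,
        "number_offset": times3(entry),      # added to Numbers
        "number_dest": SCRN0 + times64(i) + 65,
    }
```

For example, slot 0 puts its name at row 3, column 4 and its number at row 2, column 1; each following slot is two map rows (64 bytes) further down.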
My initial guess is that it's just due to turning the LCD off and on again. You can leave the LCD on, but then you need to check the STAT register before you write to VRAM, since VRAM is only accessible in certain modes:
http://gbdev.gg8.se/wiki/articles/Video … 28R.2FW.29
Ok, thanks. I completely forgot that you could leave the screen on.
That fixed the flickering.
But is there any way to sort of pause the screen, without turning it off? I want to keep the gameboy out of mode 3 for as long as the game is still in the RefreshScreen subroutine. Do I need to do some sort of trickery using the LCDC status interrupt?
Last edited by h0tp3ngu1n (2019-09-02 06:42:30)
Turning the LCD off and on again causes the first frame output by the LCD to be blank, except on SGB where things are a tad more complicated.
The LCD, if turned on, is always cycling through its modes; if you want to access VRAM, you need to wait until it's in a mode that allows VRAM access (anything but Mode 3).
By the way, you should avoid the "_label" convention.
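The mode mentioned above is exposed in bits 1-0 of the STAT register. The following Python sketch (a host-side model, not Game Boy code) decodes those bits and encodes the DMG access rules as they are commonly documented: VRAM is blocked only during Mode 3, OAM during Modes 2 and 3, and both are free while the LCD is off. The function names are hypothetical.

```python
# Host-side model of the STAT mode bits and DMG video-memory access rules.
MODE_NAMES = {0: "HBlank", 1: "VBlank", 2: "OAM scan", 3: "Drawing"}
STAT_MODE_MASK = 0b11
STAT_BUSY_BIT = 0b10     # bit 1: set in Modes 2 and 3


def ppu_mode(stat: int) -> int:
    """Current PPU mode from a STAT value."""
    return stat & STAT_MODE_MASK


def vram_accessible(stat: int, lcd_on: bool = True) -> bool:
    """The CPU can access VRAM in every mode except Mode 3."""
    return not lcd_on or ppu_mode(stat) != 3


def oam_accessible(stat: int, lcd_on: bool = True) -> bool:
    """OAM is additionally blocked during Mode 2."""
    return not lcd_on or ppu_mode(stat) in (0, 1)


def blank_period(stat: int) -> bool:
    """True when bit 1 is clear, i.e. Mode 0 (HBlank) or Mode 1 (VBlank)."""
    return (stat & STAT_BUSY_BIT) == 0
```

A test of bit 1 alone treats Mode 2 as busy even though VRAM is accessible then. That conservative choice is what the `wait_vram` macro discussed further down in the thread relies on, because Mode 2 always leads straight into Mode 3.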
ISSOtm wrote:
By the way, you should avoid the "_label" convention.
Those are equates, not labels. But is there any particular reason why I should avoid that convention? (I normally wouldn't write code that way, but I picked up the habit from a tutorial. I'm probably going to stop using it anyway, since I find it a bit ugly-looking and it slows down my typing.)
So anyway, I guess my question is, if I keep the LCD turned on, but my code (to copy data to VRAM) takes a lot of cycles and doesn't finish before the gameboy goes into mode 3, what can I do about it? Surely I don't need to check for mode 3 every time I want to write a byte to VRAM?
In my game I use a queue data structure to manage all my VRAM writes. This lets me process all of the VRAM-related changes in a very tight loop, so the game logic doesn't stall waiting for VRAM access. After all the game logic is complete, I process the VRAM queue. My queue currently supports 2 VRAM commands: one for writing tiles and one for writing 4x4 background blocks. They are optimized to wait for the Mode 3 -> 0 transition or VBlank and then run a set of unrolled writes. The goal is to push as much data as possible per wait and avoid checking the mode for every byte. Writing the full 16 bytes takes about 2 scanlines, with a second wait between writes. Interrupts need to be disabled for something like this to work well, though; otherwise you don't have much choice but to check the mode for every byte.
There are downsides to this, though: it takes a lot of RAM to dedicate to the queue, and it limits the ability to use scanline effects, since it doesn't play nicely with interrupts.
The idea is simply to find a known safe state for writing to VRAM (the Mode 3 -> 0 transition, or VBlank) and write as much data as possible before the PPU leaves that state, avoiding the costly mode check on each write.
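The queue approach described above can be sketched as a host-side Python model. The class and method names, the 16-entry capacity, and the flat bytearray standing in for VRAM are assumptions for illustration, not the poster's implementation. Real code would also wait for the Mode 3 -> 0 transition or VBlank before each unrolled burst, which this model does not simulate.

```python
# Hypothetical host-side model of a deferred VRAM write queue.
from dataclasses import dataclass, field

VRAM_BASE = 0x8000
VRAM_SIZE = 0x2000
MAP_WIDTH = 32  # tilemap row stride, in tiles


@dataclass
class VramQueue:
    capacity: int = 16
    pending: list = field(default_factory=list)

    def _push(self, command: tuple) -> bool:
        if len(self.pending) >= self.capacity:
            return False  # full: the caller retries on a later frame
        self.pending.append(command)
        return True

    def write_tiles(self, addr: int, data) -> bool:
        """Queue a contiguous copy, e.g. tile pixel data."""
        return self._push(("tiles", addr, bytes(data)))

    def write_block(self, addr: int, width: int, tile_ids) -> bool:
        """Queue a rectangle of tile IDs, `width` tiles per row, for a tilemap."""
        return self._push(("block", addr, width, bytes(tile_ids)))

    def flush(self, vram: bytearray) -> int:
        """Apply every queued command in order; return how many were applied."""
        for command in self.pending:
            if command[0] == "tiles":
                _, addr, data = command
                start = addr - VRAM_BASE
                vram[start:start + len(data)] = data
            else:
                _, addr, width, tile_ids = command
                for row in range(len(tile_ids) // width):
                    start = addr - VRAM_BASE + row * MAP_WIDTH
                    vram[start:start + width] = tile_ids[row * width:(row + 1) * width]
        applied = len(self.pending)
        self.pending.clear()
        return applied
```

In this shape, game logic calls `write_tiles`/`write_block` whenever it wants during the frame, and `flush` runs once after the logic is done, which is where the real implementation would do its timed, unrolled writes.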
In my game Motherboard, which is very VRAM-heavy (animations are streamed, the background is streamed, and so on), VRAM is accessed in two ways: through a VRAM queue, and through individual accesses.
The VRAM queue is processed by the VBlank handler, and is intended for large transfers; to be able to push as much data as possible, an unrolled "popslide" loop is used:
```asm
; This copies 2 bytes in a mere 9 cycles (4.5 cycles/byte)
; Source = SP
; Destination = HL
; Trashes A, D, E
; Given the use of SP, make sure interrupts are disabled, or that you're OK with an
; interrupt writing its return address into the already-consumed part of the source buffer
    pop de
    ld a, e
    ld [hli], a
    ld a, d
    ld [hli], a
```
Since handling the queue has some overhead, both in setting it up and in reading entries, it is really meant for large transfers.
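To see why the popslide preserves byte order and what throughput the quoted 9 cycles per 2 bytes implies, here is a host-side Python model of the unrolled sequence. It treats memory as a flat 64 KiB bytearray and counts M-cycles as in the comment (pop de = 3, ld a, r = 1, ld [hli], a = 2). The VBlank figure (10 lines of 114 M-cycles) is only an upper bound: it ignores handler entry, queue parsing, and saving and restoring SP. The names are hypothetical.

```python
# Host-side model of the "popslide" copy: SP is the source, HL the destination.
POP_DE = 3        # M-cycles
LD_A_R = 1
LD_HLI_A = 2
PAIR_CYCLES = POP_DE + 2 * (LD_A_R + LD_HLI_A)  # 9 M-cycles per 2 bytes

LINE_MCYCLES = 114                  # 456 dots / 4
VBLANK_MCYCLES = 10 * LINE_MCYCLES  # lines 144-153


def popslide(memory: bytearray, sp: int, hl: int, pairs: int):
    """Copy 2*pairs bytes from [SP] to [HL]; return (sp, hl, cycles)."""
    cycles = 0
    for _ in range(pairs):
        e = memory[sp]                  # pop de: E comes from the lower address
        d = memory[(sp + 1) & 0xFFFF]
        sp = (sp + 2) & 0xFFFF
        memory[hl] = e                  # ld a, e / ld [hli], a
        hl = (hl + 1) & 0xFFFF
        memory[hl] = d                  # ld a, d / ld [hli], a
        hl = (hl + 1) & 0xFFFF
        cycles += PAIR_CYCLES
    return sp, hl, cycles


def max_bytes_in_vblank() -> int:
    """Upper bound on popslide throughput in one VBlank, ignoring all overhead."""
    return 2 * (VBLANK_MCYCLES // PAIR_CYCLES)
```

Because `pop de` loads E from the lower address and E is written first, the destination receives the bytes in the same order as the source. A real VBlank handler will land below the 252-byte bound once its overhead is counted.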
Smaller accesses are simply done here and there in the code. The first building block is a simple *Blank wait loop, which exits during either Mode 0 or Mode 1:
```asm
wait_vram: MACRO
.waitVRAM\@
    ldh a, [rSTAT]
    and STATF_BUSY ; $02
    jr nz, .waitVRAM\@
ENDM
```
Let's analyze what this loop guarantees. Reads are done on the last cycle of a read instruction, so we can say that the mode is guaranteed to be 0 or 1 on the last cycle of `ldh a, [rSTAT]`. The worst case would be that this very cycle is the last one of the current mode, and that it will be different starting with the following instruction.
What modes can come after 0 and 1? Well, Mode 0 can be followed by Mode 1 (line $8F -> line $90), but that case is trivial because it's VBlank, so let's skip it. The other mode that can follow is Mode 2, which consistently lasts for 20 cycles and allows VRAM access.
So, we will assume Mode 2 starts at the beginning of the `and STATF_BUSY` instruction. Time to count cycles! The `and` takes 2 cycles, and the `jr` (not taken, by hypothesis) 2 more. This means that after the loop exits, there are 16 cycles during which VRAM is guaranteed to be accessible. (Keep in mind that accesses occur on the last cycle of most instructions, the main exception being `push`.)
This also means it's quite possible to squeeze more than one write in a single check:
```asm
    wait_vram
    ld [hl], e       ; 0 + 2 = 2
    ld a, l          ; 2 + 1 = 3
    add a, SCRN_VX_B ; 3 + 2 = 5
    ld l, a          ; 5 + 1 = 6
    ld [hl], d       ; 6 + 2 = 8
    ld a, l          ; 8 + 1 = 9
    add a, SCRN_VX_B ; 9 + 2 = 11
    ld l, a          ; 11 + 1 = 12
    ld [hl], e       ; 12 + 2 = 14
```
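The worst-case argument above can be checked with a small host-side simulation in Python. It models only visible scanlines: 114 M-cycles each, with Mode 2 for 20 M-cycles, then Mode 3 at its shortest documented length (172 dots, i.e. 43 M-cycles), then Mode 0. It assumes STAT is sampled on the last M-cycle of `ldh a, [rSTAT]` (3 M-cycles), that `and n8` takes 2, and that `jr` takes 3 when taken and 2 when not. It ignores interrupts, VBlank and STAT quirks, so treat it as a sanity check of the reasoning, not a cycle-accurate emulator; the names are hypothetical.

```python
# Host-side sanity check of the wait_vram window (not cycle-accurate emulation).
LINE = 114                  # M-cycles per scanline
MODE2_END = 20              # Mode 2 occupies M-cycles 0-19 of each line
MODE3_END = MODE2_END + 43  # shortest Mode 3 (172 dots); a longer one only helps


def mode_at(t: int) -> int:
    x = t % LINE
    if x < MODE2_END:
        return 2
    if x < MODE3_END:
        return 3
    return 0


def free_cycles_after_wait(start: int) -> int:
    """Run wait_vram from M-cycle `start`; return M-cycles left before Mode 3."""
    t = start
    while True:
        t += 3                  # ldh a, [rSTAT]
        mode = mode_at(t - 1)   # STAT is read on the instruction's last M-cycle
        t += 2                  # and STATF_BUSY
        if mode in (2, 3):
            t += 3              # jr nz taken: loop again
            continue
        t += 2                  # jr nz not taken: the loop exits
        end = t
        while mode_at(end) != 3:
            end += 1
        return end - t


# (instruction, M-cycles, writes VRAM?) for the unrolled writes after wait_vram
WRITES = [
    ("ld [hl], e", 2, True), ("ld a, l", 1, False),
    ("add a, SCRN_VX_B", 2, False), ("ld l, a", 1, False),
    ("ld [hl], d", 2, True), ("ld a, l", 1, False),
    ("add a, SCRN_VX_B", 2, False), ("ld l, a", 1, False),
    ("ld [hl], e", 2, True),
]


def write_times(sequence) -> list:
    """Cumulative M-cycle on which each VRAM write lands (its last cycle)."""
    t, times = 0, []
    for _, cycles, writes in sequence:
        t += cycles
        if writes:
            times.append(t)
    return times


# Smallest window over every possible starting phase within a scanline
WINDOW = min(free_cycles_after_wait(s) for s in range(LINE))
```

The minimum comes out at 16 M-cycles (STAT read on the last cycle of Mode 0), and the three writes land on cycles 2, 8 and 14. A fourth row (`ld a, l` / `add` / `ld l, a` / `ld [hl], d`) would put its write on cycle 20, outside the window, so it would need a fresh `wait_vram`.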
However, interrupts can screw up this whole process. In Motherboard, I work around this as follows:
- The VBlank interrupt skips performing a lot of operations when not "waited for", which means it will not overflow VBlank and accesses will go through normally (there's no guarantee of this, actually, but so far I've just rolled with it and it's OK).
- The HBlank interrupt performs some raster FX, and thus returns during HBlank, guaranteeing it ends before Mode 2 begins, which won't break the loop's guarantee of 16 VRAM cycles.
- No other interrupts are enabled, except occasionally the timer interrupt, but when it's enabled the game logic is in an entirely different state anyway. Otherwise, the best way to avoid trouble is to wait for the beginning of Mode 0; don't forget that returning from an interrupt takes some cycles!
Again, the rationale behind these "one-off" accesses is that pushing them into the fairly small VRAM queue (16 entries, so that all the relevant memory stays in the same 256-byte memory "page") would both saturate it and cost more than needed. It can be argued that accessing VRAM, especially the tilemap, outside of VBlank can cause tearing. However, these accesses are mostly used to update the infinitely-scrolling tilemap, so the updates happen off-screen and no tearing occurs. (Some of these accesses are on-screen, such as placing text in the textbox, but they are small enough that the user wouldn't notice any tearing.)
Thanks guys!
I've got my program working, and I'll keep those tips in mind for future reference.
If it would help, I wrote a VRAM transfer framework for Libbet and the Magic Floor. It includes a generic VRAM queue that pushes 160 or so bytes during vblank (popslide_blit) and a bulk transfer that can push up to 1K in a single frame's hblank periods.
Source: popslide.z80
License: zlib
Last edited by PinoBatch (2019-09-12 22:18:44)