Gameboy Development Forum

PinoBatch · 2018-03-22 17:34:56

I have years of 6502 programming experience on the NES (and before that the Apple II) but hadn't done anything with the 8080 family (8080/8085, Z80, LR35902) before yesterday.

So I finally got around to trying RGBDS, ending up with a port of my NES sprite demo after six hours. It draws a background with a sprite over it and lets the player move it around with Left and Right on the Control Pad. It runs in mGBA 0.7-5026-1a6b47a2 and on my GBC with an EverDrive GB X5. With my luck, there are probably a bunch of 6502 cargo cult practices that made their way in, much like the blatant lack of optimization that code written by 68000-to-6502 converts often shows.

Screenshot

Download hello-0.01.zip (13 kB): hello.gb and RGBDS source

Things I could do from here:

* Make the code less "Ho-ly-shit" in its (lack of) organization
* Test on my cousin's Super Game Boy to make sure I'm not hitting any mono bugs that were fixed in color
* Write a preprocessor to allow local non-label symbols, anonymous local symbols, and sane indentation

Any advice for an immigrant from 6502 land?

ISSOtm · 2018-03-22 20:09:18

First of all, welcome !

Here are a couple of things.

If you want to use an emulator for testing, either use BGB or SameBoy. Both of these feature very good debuggers and emulation accuracy.
BGB is especially widely used, mostly because it's been around for longer :p, but also because it has a more evolved, GUI-based debugger.
mGBA's GB emulation is still WIP-ish, too.

I've read your code, and here are my impressions. Take them as suggestions; if you prefer writing your code the way you do, I won't force anything upon you.

Why do you only use one section ? The point of sections is to be independent - thus, you can have `SECTION "Vectors", ROM0[$0000]` and put your RST and interrupt vectors there (or one section per vector), then one `SECTION "Startup", ROM0[$0100]` with the header, entry point, etc.
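A minimal sketch of the layout described above, for illustration only. The section names, the `Start` label and the placeholder handler are assumptions, not taken from hello.gb, and it assumes rgbfix (for example `rgbfix -v`) fills in the logo and checksums afterwards.

```asm
; Sketch only: names are illustrative.
SECTION "VBlank vector", ROM0[$0040]
    reti                    ; placeholder handler: just return

SECTION "Startup", ROM0[$0100]
    nop
    jp Start                ; entry point occupies $0100-$0103
    ds $0150 - $0104        ; reserve the header area at $0104-$014F

SECTION "Main", ROM0        ; floating: the linker picks the address
Start:
    di
.forever:
    jr .forever
```

Because only the vector and startup sections carry fixed addresses, the linker is free to place everything else wherever it fits.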

You should probably use this : https://github.com/tobiasvl/hardware.in … rdware.inc
That file is more or less the standard include file in the community, although not everyone uses it. It's still pretty handy, and standardizes different notations.

Why no RAM sections ? These would let the linker do the allocation for you (and you can force alignment of sections, if you want to do super-optimized wizardry)
Example :

Code:

SECTION "DMA routine", HRAM

hDMARoutine::
    ds 11

Instead of `REPT X ; nop ; ENDR`, you may want to use `ds X`, if the value there doesn't matter (it's set to the "fill value", by default 0, but you can change it via command-line params)
I wrote a `dbfill` macro anyway, so I can fill areas with a known value in only one line. RGBDS macros really are awesome.
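The `dbfill` macro itself isn't shown in the thread. A hypothetical reconstruction, assuming it takes a count and a value, might look like this. It uses the `name: MACRO` form that was current when the thread was written; newer RGBDS releases prefer `MACRO dbfill`.

```asm
; Hypothetical reconstruction; the real macro may differ.
; Usage: dbfill COUNT, VALUE
dbfill: MACRO
    REPT \1
    db \2
    ENDR
ENDM

SECTION "Padding example", ROM0
UnusedArea:
    dbfill 16, $FF          ; emits sixteen $FF bytes
```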

Why setting all your variables so close to the stack ? You have all of C000-DEFF available, even if you allocate 256 bytes to the stack (I never saw stack usage growing past 64 bytes except with SDCC)

Don't use `memset` to clear OAM, since the 16-bit increment of hl will cause corruption. (BGB would break and warn you of this)

You always push hl (for example in lcd_off), but it's not necessary. On the GBz80, stack ops are much more costly than on the 6502, so personally I prefer to have registers caller-saved (also a lot of the time, I use side effects, eg. I know this function returns with a = 0, so I use that).
You might argue that you often need to save registers, but it hasn't been the case for my game. It mostly comes down to arranging your registers cleverly (eg. BC and DE are basically the same, so if you know there's a subroutine using DE, you might want to use BC for your vars, etc.)

You can turn on the LCD at any time when it's shut down, no need to ensure it's in VBlank. Writing to LCDC with bit 7 set while the LCD is on is also perfectly safe.

VBlank starts at LY = $90, ie. 144, not 145.

Pretty clever joypad routine, I did mostly the same, but `xor b` is more efficient than `and b ; cpl`, nice one !

Regarding your code at lines 448-454 to perform sign extension, here's a shorter version :

Code:

ld a, [player_dxsub]
ld e, a
rla ; Bit 7 into carry
sbc a, a ; $00 if no carry, $FF if carry
ld d, a

Line 467, you could have removed the `ld e, 0` on the line above, and jumped to the one right before .notHitRight

Line 502, you can save one byte :

Code:

and $FC
rrca ; Only 1 byte
rrca
ld b, a
rra ; Carry was 0 because 0 bit was rotated in
and a ; \
rrca  ; / Same in size and speed as `sra a`, I'm just showing an alternative.

Overall, this code isn't as bad as what I have seen people write as their first program. I'm not really seeing anything that would smell like 6502, either.

As for general GB advice ; the GB is much more comfortable to work with than the NES, imo (more interrupts, much more convenient LCD for classy raster effects, and overall I prefer the GBz80/LR35902 to the 6502). Where it bites is graphics : accessing VRAM is always a bottleneck, the screen is really small, and you have barely any sprites to work with. The GBC adds its own twists, both good and bad, to the table.

Again, welcome to the community ! If you want, please also consider joining the IRC channel and/or Discord server.

AntonioND · 2018-03-22 21:24:35

A few comments...

> Don't use `memset` to clear OAM, since the 16-bit increment of hl will cause corruption. (BGB would break and warn you of this)

But clear the OAM, though! It's initialized to random values in a real GB/GBC. Just do it in a different way.
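One possible "different way", sketched under the assumption that the concern is specifically the 16-bit `inc hl`: step the pointer with an 8-bit `inc l` instead. The routine name is made up for illustration, and the sketch assumes OAM is accessible when it runs (LCD off, or during VBlank).

```asm
; Sketch: clear all 160 bytes of OAM without any 16-bit inc/dec.
; Assumes OAM is accessible (LCD off, or during VBlank).
ClearOAMDirect:
    ld hl, $FE00            ; start of OAM
    xor a                   ; A = 0; a Y coordinate of 0 hides a sprite
    ld b, 160               ; 40 sprites * 4 bytes
.loop:
    ld [hl], a
    inc l                   ; 8-bit increment; L only runs $00..$A0, so H never changes
    dec b
    jr nz, .loop
    ret
```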

> You can turn on the LCD at any time when it's shut down, no need to ensure it's in VBlank. Writing to LCDC with bit 7 set while the LCD is on is also perfectly safe.

Well, with the LCD off there aren't VBlank periods at all. The LCD controller is simply disabled.

> Pretty clever joypad routine, I did mostly the same, but `xor b` is more efficient than `and b ; cpl`, nice one !

My concern with modified routines is that changing from one matrix input to the other one affects a real circuit with higher capacitance than any other access. I'm not sure if doing it any faster is good, it may cause more bouncing than the regular routine used everywhere. If I can't make it faster for the risk of not getting the actual input, I don't see any reason not to use the routine used everywhere... After all, anyone that sees that routine knows exactly what's going on there.

PinoBatch · 2018-03-22 22:14:08

Thank you for the detailed review.

> If you want to use an emulator for testing, either use BGB or SameBoy.

My ThinkPad X61 runs Debian 9. Though beware tests bgb in Wine, bgb is 32-bit only, and it has no source code. These tend to cause problems for some purists (source; source). Some GNU/Linux distributions are even threatening to drop support for running 32-bit applications (which is a rawther Apple-like move). With no source code, there's no way to audit it for ransomware or other malware. What precautions do most users here take when using proprietary emulators?

SameBoy's tile viewer requires macOS (source), which requires Mac hardware, which (as beware pointed out to me in #gbdev on EFnet) has terrible keyboards.

My NES development uses FCEUX, which isn't the most accurate at PPU timing but is reasonably fast, and its CPU is still accurate enough to step through game logic. I'd then test the game authoritatively on a PowerPak, where hardware compatibility issues would become more apparent but input lag was a non-issue. The same was true back when I tried the GBA: Ctrl+R builds and runs in VisualBoyAdvance, then I send the program to the system through a flash cart or MBV2 cable for play testing.

> Why do you only use one section ?

The lack of "SECTION" arose from having followed "Tutorial: Making an Empty Game Boy ROM (in RGBDS)". I think the "push hl" habit arises from some other tutorial. Now that I have a working baseline, I'll be able to work on organizing the ROM better.

> Why setting all your variables so close to the stack ?

That actually relates to something else I had planned to ask about later, related to how to work around the LR35902's lack of indexed addressing modes. The Z80 has the [IX+n] mode with a small constant offset from a large register pointer, and the 6502 has the aaaa,X mode with a small register offset from a large constant pointer, and the (dd),Y mode with a small register offset from a large memory pointer. The implications of this conspicuous absence on organization of game state data structures could fill another topic, which I plan to post later after I put into place some of the suggested changes. A preview: treating RAM as if it were a 256x32-cell 2D array.

> Don't use `memset` to clear OAM, since the 16-bit increment of hl will cause corruption. (BGB would break and warn you of this)

That's a bug. I thought I had already removed that. The first DMA would copy $DE00-$DE9F to $FE00-$FE9F, thereby erasing the corruption anyway.

> You can turn on the LCD at any time when it's shut down, no need to ensure it's in VBlank.

Does LY increment while the LCD is off? Does the interrupt handler for vblank get called? I ask because someone else wondered why I'm using a busy wait as opposed to enabling interrupts and using halt, and I want to ensure I understand what's going on now before I dive into interrupts.

> VBlank starts at LY = $90, ie. 144, not 145.

I take it there's no NES-style "post-render line", where there's a 1-line gap between the last rendered line and the vertical blank interrupt, and no Super NES-style inclusion of the pre-render pixel pipeline priming line in LY with the top visible line being 1. Do I understand correctly?

Thanks for the sign extension idiom. It'll take me a while to get used to the implications of the carry flag for subtraction having the opposite sense that it does on ARM and 6502. Is there a page for Z80 idioms analogous to the 6502 idioms that NESdev members have collected?

> I'm not really seeing anything that would smell like 6502, either.

If you're curious, the code this was based on is nrom-template.

> Where it bites is graphics : accessing VRAM is always a bottleneck

But unlike the NES, the Game Boy opens VRAM during hblank. This allows for 20-30 fps engines that do all the calculation in one frame and the VRAM updates in the other, and DMG's slow STN LCD makes 20-30 fps engines far more forgivable.

> you have barely any sprites to work with.

On the NES, 64 sprites arranged 8 to a line cover 25% of 128 scanlines. On the Game Boy, 40 sprites arranged 5 to a line also cover 25% of 128 scanlines, and that's more overall coverage to boot.

I joined #gbdev a few hours ago. What Discord server are you referring to? It can't be the one Kaydus the Dragon just started. Is it the one under "Community" on this page?

Now for AntonioND's reply:

> But clear the OAM, though!

I have a display list at $DE00-$DE9F, which gets DMA'd to OAM during vblank. The lcd_clear_oam subroutine clears from $DE00+[oam_used] to the end of this display list, which moves all unused sprites offscreen in the display list that gets DMA'd to OAM. I really ought to make this clearer in comments. Or does OAM need to be cleared even before DMAing on top of it?

> Well, with the LCD off there aren't VBlank periods at all. The LCD controller is simply disabled.

Is that anything like AMD FreeSync?

> My concern with modified routines is that changing from one matrix input to the other one affects a real circuit with higher capacitance than any other access. I'm not sure if doing it any faster is good

In this case, I was optimizing for size, not speed, hence the "call .onenibble". And this "xor b" happens after the read anyway. In any case, it's probably already faster than the serial read one has to use on the NES.

Last edited by PinoBatch (2018-03-22 23:21:27)

ISSOtm · 2018-03-23 03:13:39

AntonioND wrote:
> My concern with modified routines is that changing from one matrix input to the other one affects a real circuit with higher capacitance than any other access. I'm not sure if doing it any faster is good, it may cause more bouncing than the regular routine used everywhere. If I can't make it faster for the risk of not getting the actual input, I don't see any reason not to use the routine used everywhere... After all, anyone that sees that routine knows exactly what's going on there.

The routine I have in Aevilia has been tested on DMG, MGB, CGB and AGS, and it didn't fail on any of them -- yet it's perfectly nonstandard, and polls only three times.
At the same time, there is a small delay between both polls due to U+D/L+R cancelling. Maybe that's why.

His optimization takes place after both polls, so it doesn't affect the polling delay anyways. Just the size and speed of the function.

PinoBatch wrote:
> With no source code, there's no way to audit it for ransomware or other malware. What precautions do most users here take when using proprietary emulators?

Imo, it's a matter of trust. If someone doesn't trust any code they can't audit, then it's their choice, but they can't force it upon those who don't want to open-source. Beware has made clear points several times as to why he doesn't want to open-source BGB, and the argument that convinced me the most is that he doesn't have to if he doesn't want to. He's offering the result of years of work for free, and really that's enough.

As for Win32-only support, he has his reasons, and I made the choice to install Wine to be able to run it. If some want to never ever have to run Wine, it's their choice, and they do it knowing fully well that it implies dropping a part of the existing library. Besides, some people still have 32-bit architectures, and what reason would beware have to drop support for them ?

PinoBatch wrote:
> The implications of this conspicuous absence on organization of game state data structures could fill another topic, which I plan to post later after I put into place some of the suggested changes. A preview: treating RAM as if it were a 256x32-cell 2D array.

I don't see anything wrong about a lack of addressing modes. It's possible to make a game with a completely garbled internal state, if it doesn't prevent the game from working, then nothing is wrong.
Further, there's nothing preventing you from using arrays when relevant. I don't see the point of game state structures -- the game's state is the entirety of the RAM, and how it's laid out doesn't change anything to that.

I'm doubtful that approach has any benefit, tbh.

PinoBatch wrote:
> > You can turn on the LCD at any time when it's shut down, no need to ensure it's in VBlank.
> Does LY increment while the LCD is off? Does the interrupt handler for vblank get called? I ask because someone else wondered why I'm using a busy wait as opposed to enabling interrupts and using halt, and I want to ensure I understand what's going on now before I dive into interrupts.

LY doesn't increment, no LCD-related interrupts occur : the driver is off. IIRC the PPU mode is stuck in VBlank (mode 1), and at LY 90 ?

PinoBatch wrote:
> > VBlank starts at LY = $90, ie. 144, not 145.
> I take it there's no NES-style "post-render line", where there's a 1-line gap between the last rendered line and the vertical blank interrupt, and no Super NES-style inclusion of the pre-render pixel pipeline priming line in LY with the top visible line being 1. Do I understand correctly?

The PPU behavior is complex, but the gist of it is that line 0 is the first render line (although LY is zero for a short time near the end of VBlank). VBlank starts at $90, and ends somewhere in the middle of line 00 (btw, it's better to check for scanline number than for Mode 1, since that ensures you have some VBlank time once the loop exits).
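A minimal sketch of waiting on the scanline number rather than on Mode 1, as suggested above. The label name is illustrative, and `rLY` is defined locally here only so the snippet stands alone; drop that line if hardware.inc is already included, since it defines the same name.

```asm
rLY EQU $FF44               ; LCD Y coordinate register

SECTION "VBlank wait", ROM0
; Busy-wait until the PPU has just entered VBlank.
; Assumes the LCD is on; with the LCD off, LY does not advance
; and this loop would spin forever.
wait_vblank_start:
.loop:
    ldh a, [rLY]
    cp 144                  ; $90 = first VBlank line
    jr nz, .loop
    ret
```

Comparing for equality with 144 means the loop exits at the start of VBlank, not at some arbitrary point inside it, which is what leaves the caller a useful amount of VBlank time.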

PinoBatch wrote:
> Is there a page for Z80 idioms analogous to the 6502 idioms that NESdev members have collected?

Not to my knowledge.

PinoBatch wrote:
> This allows for 20-30 fps engines that do all the calculation in one frame and the VRAM updates in the other, and DMG's slow STN LCD makes 20-30 fps engines far more forgivable.

I prefer 60 fps engines, it's much simpler to have all frames equal, imo.

PinoBatch wrote:
> I joined #gbdev a few hours ago. What Discord server are you referring to? It can't be the one Kaydus the Dragon just started. Is it the one under "Community" on this page?

It's that one, yep.

PinoBatch wrote:
> Or does OAM need to be cleared even before DMAing on top of it?

It's recommended but not necessary. Personally I init all the CGB's memory to zeros at boot (although someone in one of the topics you linked discouraged that, I'm doing it to make sure I am in a known state as much as possible)

It's also nice to have a professional NES coder in the community !

PinoBatch · 2018-03-23 15:36:44

I've managed to change all WRAM allocations to SECTION-based. As for HRAM, man 5 rgbasm says "NOTE: If you use this method of allocating HRAM the assembler will NOT choose the short addressing mode in the LD instructions". My best guess is that RGBDS has no counterpart to ca65's ".globalzp" command that asserts at link time that an imported address is a short address.

Is it possible to make multiple WRAM0 or HRAM sections overlap, so that a subroutine can get (say) its own set of local labels at $FF80-$FF8F? In my NES work, I have a calling convention that $0000-$000F is local variables, where each subroutine's doc comment defines which locals it clobbers.

Another frustration I've found is that RGBASM is indentation-sensitive, which clashes with my coding style that uses indentation to delimit the equivalent of compound statements (like if, while, and for in C++, Java, PHP, or JavaScript). I guess I could write a preprocessor that emulates ca65's convention of using a colon to mark off labels. I previously wrote one to add support for C++-style "//" line comments to the m68k assembler in GNU Binutils. GNU as for m68k normally uses "|" as the comment separator, allowing either comments or bitwise OR of bits to go into a register but not both in the same file. My preprocessor stripped C++ comments into a copy of the file in a temporary folder before handing the file off to the assembler.

In order to make sure my HALT support is doing what it's supposed to: Does bgb have an option to test for several consecutive frames with display on and no HALT, like the "Break on" options? I tried logging the ROM under the "Execution profiler" menu, but choosing the number of instructions seen didn't display anything. Should I be looking at the green bar at top right?

> Further, there's nothing preventing you from using arrays when relevant. I don't see the point of game state structures -- the game's state is the entirety of the RAM, and how it's laid out doesn't change anything to that.

It affects the speed and size of the code that moves actors of each type.

> I prefer 60 fps engines, it's much simpler to have all frames equal, imo.

There's a reason Super Mario Land 2 looks less motion-blurry on a DMG than Super Mario Land, and I don't think it's just sprite outlining. But again, I'll expound on this more in a separate topic.

AntonioND · 2018-03-23 16:47:32

ISSOtm wrote:
> AntonioND wrote:
> > My concern with modified routines is that changing from one matrix input to the other one affects a real circuit with higher capacitance than any other access. I'm not sure if doing it any faster is good, it may cause more bouncing than the regular routine used everywhere. If I can't make it faster for the risk of not getting the actual input, I don't see any reason not to use the routine used everywhere... After all, anyone that sees that routine knows exactly what's going on there.
>
> The routine I have in Aevilia has been tested on DMG, MGB, CGB and AGS, and it didn't fail on any of them -- yet it's perfectly nonstandard, and polls only three times.
> At the same time, there is a small delay between both polls due to U+D/L+R cancelling. Maybe that's why.
>
> His optimization takes place after both polls, so it doesn't affect the polling delay anyways. Just the size and speed of the function.

Well, you're not polling, you're just waiting and getting the last result. Maybe reading from the register helps move the electrons a bit, but I don't think the effect is significant. And sure, it will most likely work, but I'd say that you are more likely to get bouncing or wrong reads if you wait less than the regular routine, for example.

ISSOtm wrote:
> PinoBatch wrote:
> > With no source code, there's no way to audit it for ransomware or other malware. What precautions do most users here take when using proprietary emulators?
>
> Imo, it's a matter of trust. If someone doesn't trust any code they can't audit, then it's their choice, but they can't force it upon those who don't want to open-source. Beware has made clear points several times as to why he doesn't want to open-source BGB, and the argument that convinced me the most is that he doesn't have to if he doesn't want to. He's offering the result of years of work for free, and really that's enough.
>
> As for Win32-only support, he has his reasons, and I made the choice to install Wine to be able to run it. If some want to never ever have to run Wine, it's their choice, and they do it knowing fully well that it implies dropping a part of the existing library. Besides, some people still have 32-bit architectures, and what reason would beware have to drop support for them ?

I agree with PinoBatch on this one, being able to audit and to modify the source of the program is also important to me. It annoys the hell out of me that the best emulator is closed source. Everything in my dev environment is completely FOSS except for that emulator... I'm also on Debian, and I chose this distro because it's not related directly to any company, unlike Fedora or Ubuntu (or Mint, through Ubuntu).

ISSOtm wrote:
> PinoBatch wrote:
> > > You can turn on the LCD at any time when it's shut down, no need to ensure it's in VBlank.
> > Does LY increment while the LCD is off? Does the interrupt handler for vblank get called? I ask because someone else wondered why I'm using a busy wait as opposed to enabling interrupts and using halt, and I want to ensure I understand what's going on now before I dive into interrupts.
>
> LY doesn't increment, no LCD-related interrupts occur : the driver is off. IIRC the PPU mode is stuck in VBlank (mode 1), and at LY 90 ?

IIRC most registers read 0, and the mode reads as HBL (mode 0).

-------------------------

PinoBatch wrote:
> I've managed to change all WRAM allocations to SECTION-based. As for HRAM, man 5 rgbasm says "NOTE: If you use this method of allocating HRAM the assembler will NOT choose the short addressing mode in the LD instructions". My best guess is that RGBDS has no counterpart to ca65's ".globalzp" command that asserts at link time that an imported address is a short address.
>
> Is it possible to make multiple WRAM0 or HRAM sections overlap, so that a subroutine can get (say) its own set of local labels at $FF80-$FF8F? In my NES work, I have a calling convention that $0000-$000F is local variables, where each subroutine's doc comment defines which locals it clobbers.
>
> Another frustration I've found is that RGBASM is indentation-sensitive, which clashes with my coding style that uses indentation to delimit the equivalent of compound statements (like if, while, and for in C++, Java, PHP, or JavaScript). I guess I could write a preprocessor that emulates ca65's convention of using a colon to mark off labels. I previously wrote one to add support for C++-style "//" line comments to the m68k assembler in GNU Binutils. GNU as for m68k normally uses "|" as the comment separator, allowing either comments or bitwise OR of bits to go into a register but not both in the same file. My preprocessor stripped C++ comments into a copy of the file in a temporary folder before handing the file off to the assembler.

If you want to use the short addressing, use 'ldh a,[address]'. There have been long discussions in the GitHub repository of RGBDS about why it's better to leave this optimization for the developer: https://github.com/rednex/rgbds/issues/243
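A small sketch of what explicit short addressing looks like with a linker-allocated HRAM variable. The names `hCounter` and `bump_counter` are hypothetical, chosen only for this example.

```asm
SECTION "HRAM example", HRAM
hCounter: ds 1              ; hypothetical one-byte variable

SECTION "LDH example", ROM0
bump_counter:
    ldh a, [hCounter]       ; short (2-byte) form, requested explicitly
    inc a
    ldh [hCounter], a
    ret
```

Writing `ld a, [hCounter]` instead would still assemble, but as the longer 3-byte absolute load described in the man page note quoted above.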

You can use UNION to overlap things on WRAM, VRAM, OAM and HRAM sections.

Yes, the indentation thing is really annoying, but I'm not sure about changing that.

PinoBatch · 2018-03-24 23:52:29

AntonioND wrote:
> You can use UNION to overlap things on WRAM, VRAM, OAM and HRAM sections.

But can I declare the different members of a UNION in different files, such as the file in which (say) a background-updating subroutine is defined and the file in which a sprite-updating subroutine is defined? Or would I have to have one file declaring all the members of that UNION used by all the subroutines in a program? I don't see how that would be any more practical to maintain than allocating locals with EQU.

AntonioND wrote:
> Yes, the indentation thing is really annoying, but I'm not sure about changing that.

I made the preprocessor that changes that, and I plan to release it in a couple days once I have the reorganization where I want it.

ISSOtm · 2018-03-25 08:47:02

I personally declare all my variables in the same file (actually one per memory region), so UNIONs aren't a problem.

It's more convenient to declare them in that way, because I don't have to worry when I re-arrange my RAM, such as when I change the size of one of my buffers. (With EQUs I'd have to either modify all variables after the affected buffer, or change their order, which sometimes isn't possible)
I have also seen some programs declare one RAM SECTION per file.

PinoBatch wrote:
> ISSOtm wrote:
> > Further, there's nothing preventing you from using arrays when relevant. I don't see the point of game state structures -- the game's state is the entirety of the RAM, and how it's laid out doesn't change anything to that.
>
> It affects the speed and size of the code that moves actors of each type.

I'm not sure I read you there ; to me, the game state variables themselves can be located anywhere, although I do use arrays when it makes sense, even when they span more than 256 bytes.
That's why I don't get the point of treating RAM in units of any size, since I have many different structures of different sizes -- and that aren't processed in any similar way.

AntonioND · 2018-03-25 17:29:55

PinoBatch wrote:
> AntonioND wrote:
> > You can use UNION to overlap things on WRAM, VRAM, OAM and HRAM sections.
>
> But can I declare the different members of a UNION in different files, such as the file in which (say) a background-updating subroutine is defined and the file in which a sprite-updating subroutine is defined? Or would I have to have one file declaring all the members of that UNION used by all the subroutines in a program? I don't see how that would be any more practical to maintain than allocating locals with EQU.

Well, yeah, in that case it is a bit complicated. I wouldn't recommend you to use EQU, but RSRESET, RB, RL, etc: https://rednex.github.io/rgbds/rgbasm.5.html#SYMBOLS EQU is definitely not a good idea because of what ISSOtm has said, it's not maintainable.

You could allocate a section that is X bytes in size and just check with the last label in the RS group that you're not overflowing the limits.

You could simply declare the labels in a header file and include it:

UNION
INCLUDE "header1.inc"
NEXTU
INCLUDE "header2.inc"
ENDU

PinoBatch wrote:
> AntonioND wrote:
> > Yes, the indentation thing is really annoying, but I'm not sure about changing that.
>
> I made the preprocessor that changes that, and I plan to release it in a couple days once I have the reorganization where I want it.

I'm not a fan of doing this kind of thing because of purely aesthetic reasons, but well, maybe someone will find it useful.

PinoBatch · 2018-03-25 23:59:52

I would prefer the "one RAM SECTION per file" method, but that doesn't work with UNIONs. (Insert political joke here.)

I prefer the "one RAM SECTION per file" method over declaring all variables in the same file because it allows for Don't repeat yourself and Single source of truth. Consider a program that uses a third-party library. When you first add the library to your program, you copy its RAM label definitions into the file declaring all of your program's variables. Halfway into development, you decide to update to a new version of the library that happens to use more RAM than the older version used. You forget to manually copy the changes to the library's RAM label definitions into the file declaring all of your program's variables. Now the library is silently corrupting other variables because it assumes the variables have the new sizes while the file declaring all of your program's variables is providing the old sizes.

AntonioND wrote:
> You could allocate a section that is X bytes in size and just check with the last label in the RS group that you're not overflowing the limits.

Statically allocating a 16-byte area at $FF80 for local variables and then divvying it up into individual variables adjacent to each subroutine would probably be the closest counterpart to how I had handled local variables on the NES. But doing RSSET $FF80 and then a bunch of RB after that is semantically the same as EQU, and RGBASM doesn't allow RSSET to a label. When I try to assemble this:

Code:

SECTION "test", HRAM[$FF80]
hLocals: ds 16

 rsset hLocals
Lobjxposition rb 1
Lobjyposition rb 1

I get this error:

Code:

ERROR: test.s(4):
    Expression must have a constant value

Or did you instead have a structure analogous to the following in mind, with each use of a local variable containing an explicit addition?

Code:

SECTION "test_hram", HRAM[$FF80]
hLocals: ds 16

; [snipped]
SECTION "test_rom", ROM0
; [snipped]
 rsreset
Lobjxposition rb 1
Lobjyposition rb 1
  ld a,[hLocals+Lobjxposition]
  ld b,a
  ld a,[hLocals+Lobjyposition]

AntonioND wrote:
> You could simply declare the labels in a header file and include it

Following this would require my source code to have a separate header file devoted to local variables for every single subroutine that uses at least one local variable. In order to keep the count of source code files to a reasonable number and keep the local variable declarations close to the code for the subroutine that actually uses said local variables, I would need to write a preprocessor that outputs a set of such header files for each source code file.

By the time I write these preprocessors, I could have almost written LR35902 support for ca65, an assembler that's a bit more tolerant of expressions not being constant at assembly time. (HHOS)

EDIT: I think I found a compromise solution that minimizes but does not completely eliminate EQU. In global.inc:

Code:

hLocals EQU $FF80
locals_size EQU 16

In the file containing hardware init code, add a SECTION:

Code:

section "hram_locals", HRAM[hLocals]
  ds locals_size

In each subroutine that uses local variables:

Code:

draw_metasprite::
  RSSET hLocals
Lbasex rb 1
Lbasey rb 1
Lwidth rb 1
Lheight rb 1
  ; Code goes here
  ret

mul8::
  RSSET hLocals
Lfactor1 rb 1
Lfactor2 rb 1
Lproduct rw 1
  ; Code goes here
  ret

Last edited by PinoBatch (2018-03-26 15:13:12)

PinoBatch · 2018-03-27 16:11:02

I tried to take most of these suggestions into account.

* Split into separate files paralleling those of nrom-template
* Use RLA SBC A sign extension idiom (suggested by ISSOtm)
* Remove unnecessary and broken OAM clearing
* Add optional zealous memory clearing to satisfy BGB exceptions
* Add doc comments to most subroutines
* lcd_*: Don't push HL so much
* Put global variables and the DMA routine's run address under SECTION control
* Try RSSET/RB to allocate local variables, nearly but not fully eliminating EQU allocation
* Display frame count and initial A, B contents at top
* Change frame count in vblank IRQ, and HALT until it changes
* Explicitly mark MMIO port and local variable accesses as LDH because RGBDS lacks a counterpart to ca65 .importzp. What is this, NESASM? Ref
* Add indentation correction to the build process
* Change padding value to $FF
* makefile: Add BGB as secondary emulator for "make debug"
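
Two of the idioms in this list are short enough to sketch. This is an illustration under stated assumptions, not the code shipped in hello-0.02: the label names, section names, and the frame_count variable are hypothetical, and the wait loop assumes interrupts are enabled and that the vblank handler increments frame_count.

```asm
section "sketch_vars", WRAM0
frame_count: ds 1          ; hypothetical: incremented by the vblank IRQ handler

section "sketch_code", ROM0

; Sign-extend the signed 8-bit value in A into DE.
sign_extend_a_to_de:
  ld e,a
  rla                      ; carry = bit 7 of A (the sign bit)
  sbc a                    ; A = A - A - carry: $00 if carry clear, $FF if set
  ld d,a
  ret

; Sleep until the vblank handler has changed frame_count.
wait_next_frame:
  ld hl,frame_count
  ld a,[hl]                ; remember the current count
.loop:
  halt                     ; sleep until any interrupt is serviced
  cp [hl]                  ; still the same frame?
  jr z,.loop               ; woken by some other interrupt: sleep again
  ret
```

Comparing against the saved count rather than just executing HALT once keeps the loop correct when an interrupt other than vblank wakes the CPU.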

Video

Download hello-0.02.zip: hello.gb and RGBDS source

nitro2k01 · 2018-03-27 18:06:59

PinoBatch wrote:
* Remove unnecessary and broken OAM clearing
* Add optional zealous memory clearing to satisfy BGB exceptions

Could you elaborate further? If you're a perfectionist, you should either clear OAM before turning the LCD on, or make sure sprites are disabled in LCDC until the first DMA has been performed. Failure to do so would give one frame where OAM is uninitialized and could display junk sprites.

What type of read did the BGB exception trigger on? It should only trigger on actual reads of uninitialized memory, so if your program is only using 256 bytes of memory, you should only need to initialize those 256 bytes to pass the exception. Chances are the exception happened due to an actual bad read which could have caused a bug.

PinoBatch · 2018-03-31 17:25:34

nitro2k01 wrote:
you should either clear OAM before turning the LCD on, or make sure sprites are disabled in LCDC until the first DMA has been performed.

As a rule, my NES and Game Boy programs do the latter. On second look, I did mess this one up when I worked around the fact that the Game Boy doesn't assert vblank IRQ if the LCD is off. I plan to fix it in the next revision by leaving the LCD on for a frame with both BG and OBJ hidden and then fully populating LCDC after the game loop has run for one full frame (which draws sprites and clears unused Y coordinates). Just clearing OAM (or clearing shadow OAM and doing DMA) would also be "imperfect", showing one frame with background and no sprites.

nitro2k01 wrote:
What type of read did the BGB exception trigger on?

My shadow OAM clearing loop was clearing only the Y coordinate of unused sprites. BGB was giving exceptions when copying the X, tile number, and attribute bytes. Is it important to clear those bytes even if the Y coordinate has a value marking the sprite as unused (namely 0 or 176-255)?

ISSOtm · 2018-04-01 08:02:49

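No, it's not. If a sprite's Y coordinate is $00, or $A0 or higher (below the bottom of the screen), it won't be displayed no matter what.
Its X, tile number, and attribute bytes stay in that OAM entry, though, so the next sprite placed there will inherit them if you only write the Y value again.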

nitro2k01 · 2018-04-05 11:15:53

PinoBatch wrote:
My shadow OAM clearing loop was clearing only the Y coordinate of unused sprites. BGB was giving exceptions when copying the X, tile number, and attribute bytes. Is it important to clear those bytes even if the Y coordinate has a value marking the sprite as unused (namely 0 or 176-255)?

No, that's fine. I'm curious why, though. It seems like it would be less effort to fully clear OAM (or the OAM DMA source) using an existing memory clear function rather than clearing every 4th byte...

PinoBatch · 2018-04-05 22:14:10

My sprite drawing routines typically start each frame at the beginning of shadow OAM, let's say $DE00, and write to increasing addresses from there through $DE9F. A variable called oam_used holds the number of bytes that have been written to shadow OAM during this frame, or four times the number of sprites. It's equivalently the low byte of the pointer to the first unused OAM entry. Once all sprites have been written, the lcd_clear_oam subroutine moves sprites from $DE00+oam_used through the end of shadow OAM to an offscreen Y coordinate. And because one ld [hl+],a and three inc l take fewer cycles than four ld [hl+],a, only the Y coordinate gets cleared.

The Game Boy code:

Code:

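lcd_clear_oam::
  ; Destination address in shadow OAM (SOAM is 256-byte aligned)
  ld h,high(SOAM)
  ld a,[oam_used]
  and $FC
  ld l,a

  ; Iteration count: C = (entries used) - 40, counted up to zero
  rrca
  rrca
  add 256 - 40
  ld c,a

  xor a  ; Y = 0 is offscreen
.rowloop:
  ld [hl+],a  ; clear Y, then skip X, tile number, and attributes
  inc l
  inc l
  inc l
  inc c
  jr nz, .rowloop
  ret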

corresponds to the following code taken from the NES demo I ported:

Code:

ppu_clear_oam:
  lda oam_used
  and #$FC
  tax
  lda #$FF  ; Any Y value from $EF through $FF is offscreen
.rowloop:
  sta SOAM,x
  inx
  inx
  inx
  inx
  bne .rowloop
  rts
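
One edge case in the Game Boy version is worth a hedged sketch. As far as I can tell from the code as posted, when all 40 entries are in use (oam_used = 160) the count in C starts at 0, so the loop runs 256 times instead of zero and hides every sprite. The variant below adds an early return for that case. The label lcd_clear_oam_guarded is hypothetical, and SOAM and oam_used are assumed to be the same symbols the routine above uses, with SOAM 256-byte aligned.

```asm
lcd_clear_oam_guarded::
  ld a,[oam_used]
  and $FC
  cp 160                   ; 40 entries * 4 bytes
  ret nc                   ; every entry is in use: nothing to hide

  ; Destination address in shadow OAM
  ld l,a
  ld h,high(SOAM)

  ; Iteration count: C = (entries used) - 40, counted up to zero
  rrca
  rrca
  add 256 - 40
  ld c,a

  xor a                    ; Y = 0 is offscreen
.rowloop:
  ld [hl+],a               ; clear Y only
  inc l
  inc l
  inc l
  inc c
  jr nz,.rowloop
  ret
```

The guard costs a CP and a conditional RET per frame, which is small next to the loop itself. If the sprite drawing code can never fill all 40 entries, the original routine is fine as it stands.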

PinoBatch · 2018-05-12 01:06:28

Slowly but surely, I'm coming to terms with this hardware.

0.03 (2018-05-12)

* Switch to de facto standard hardware.inc (requested by gbdev Discord members)
* makefile: Use target name for rgbfix instead of hardcoding hello.gb
* Hide BG and sprites before game loop to avoid sprite garbage (reported by nitro2k01)
* Use palette registers instead of LCDC to hide BG and sprites for GBC friendliness (requested by ISSOtm)
* Fade out Nintendo logo while detecting Super Game Boy on DMG
* Add GBC palette approximating that used by nrom-template on NES

Video, now in color

Download hello 0.03 (ROM and source)
