ASM Snippets
Latest revision as of 12:53, 10 December 2023
Here are some useful and common ASM snippets. You may encounter them in others' code, or use them to solve problems yourself. It's a good idea to read all of the snippets and pick the one that best fits your use case.
Sign extension
Assuming we have a signed 8-bit value in A, we may want to sign-extend it to 16 bits. Computing the high byte ($00 for a non-negative value, $FF for a negative one) takes as few as two instructions!
    add a, a ; Shift MSB into carry
    sbc a, a
Bytes: 2; Cycles: 2
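As a usage sketch, the two instructions can turn a signed byte into a 16-bit offset. The choice of DE as the destination and the final `add hl, de` are assumptions made for illustration, not part of the snippet itself:

```asm
; Sketch: sign-extend the signed byte in A into DE, then use it as an offset.
    ld e, a     ; The low byte is the value itself
    add a, a    ; Shift MSB (the sign bit) into carry
    sbc a, a    ; A = $00 if carry was clear, $FF if it was set
    ld d, a     ; DE now holds the 16-bit sign-extended value
    add hl, de  ; For example, apply it as a signed offset to HL
```

Note that A no longer holds the original value afterwards, which is why the low byte is saved first.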
16-bit arithmetic
It's tempting to answer that all the work is already done by hl instructions, but unlike the Z80, the GB CPU's only 16-bit arithmetic instructions (besides `inc` and `dec`) are additions: there is no `sbc hl, reg16` or `adc hl, reg16`. The rest must be implemented by hand, as the following examples do:
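    ; hl = hl - value
    ld a, l
    sub a, {low byte}
    ld l, a
    ld a, h
    sbc a, {high byte}
    ld h, a

    ; hl = hl + value + carry (use `add` instead of the first `adc` for a plain addition)
    ld a, l
    adc a, {low byte}
    ld l, a
    ld a, h
    adc a, {high byte}
    ld h, a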
This pattern can also be used for additions that can't be carried out using `add hl, reg16`, such as adding a value to some memory address, or when hl is clobbered.
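A minimal sketch of the memory case, assuming a hypothetical little-endian 16-bit variable named `wScore` and an arbitrary constant of 1000 (neither comes from the original page):

```asm
; Sketch: wScore += 1000, where wScore is a hypothetical 16-bit variable in RAM.
    ld hl, wScore
    ld a, [hl]
    add a, LOW(1000)    ; Add the low bytes; carry holds the overflow
    ld [hli], a         ; `ld` does not touch the flags, so carry survives
    ld a, [hl]
    adc a, HIGH(1000)   ; Add the high bytes plus the carry
    ld [hl], a
```

Here hl is only used as a pointer; the arithmetic itself goes through A one byte at a time.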
16-bit and 8-bit
Often, we need to add an 8-bit offset to a 16-bit position. Common examples include accessing fields in a struct, or adding an 8-bit vector to a 16-bit value. It's tempting to use the above pattern with the high byte as zero, but it's possible to do better:
    ; Assuming A already contains the value to be added...
    add a, l
    ld l, a
    jr nc, .noCarry
    inc h
.noCarry
This, however, requires creating a label, which is tedious in some cases (such as with older assemblers that lack anonymous labels). There is an alternative that is slightly trickier to understand but identical in size and speed, and doesn't use any labels:
    add a, l ; a = low + old_l
    ld l, a  ; a = low + old_l = new_l
    adc a, h ; a = new_l + old_h + carry
    sub l    ; a = old_h + carry
    ld h, a
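To make the label-free version easier to follow, here is the same sequence traced with concrete values. The starting state (hl = $C1F0, a = $20) is an arbitrary assumption chosen so that the low byte overflows:

```asm
; Assumed starting state: hl = $C1F0, a = $20
    add a, l    ; a = $20 + $F0 = $10, carry set
    ld l, a     ; l = $10 (flags unchanged)
    adc a, h    ; a = $10 + $C1 + 1 = $D2
    sub l       ; a = $D2 - $10 = $C2
    ld h, a     ; hl = $C210, which is $C1F0 + $20
```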
Multiplying a signed 8-bit number by an unsigned one
Multiply signed H by unsigned C, leaving the 16-bit signed product in HL. If H is negative, the routine relies on the distributive property: we want (unsigned H - 256) * C, which is the same as (unsigned H * C) + (256 * -C). So if H is negative, we put -C into A (which is otherwise zero) and add it to the high byte of the product (thus multiplying it by 256).
    xor a ;clear our workspace
    ld l, a
    ld b, a
    ; special first iteration
    add hl, hl ; this shifts the sign of the signed multiplier into carry
    jr nc, .positive
    ; for the multiplication part, load instead of add for this first round
    ld l, c
    ; get negative c into a to add it later
    sub c
    ; now a is negative c if h was negative, or zero if h was positive
.positive
    REPT 7
    ; the other 7 iterations are standard binary long multiplication
    add hl, hl
    jr nc, :+
    add hl, bc
:
    ENDR
    ; finish off by adding a to h to correct for the sign
    add a, h
    ld h, a
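A worked example may help check the sign correction. It assumes the snippet above has been wrapped in a routine with a `ret` at the end; the label `MulSignedHByC` is hypothetical and does not appear in the original:

```asm
    ld h, -2            ; Signed multiplier ($FE)
    ld c, 3             ; Unsigned multiplicand
    call MulSignedHByC  ; Hypothetical label: the snippet above followed by `ret`
    ; Unsigned product: $FE * $03 = $02FA
    ; Sign correction:  A = -3 = $FD, and $02 + $FD = $FF
    ; Result:           HL = $FFFA = -6
```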
Multiplying two signed 8.8 fixed-point numbers
Note that for unsigned fixed-point, using regular multiplication routines is sufficient without much change. There are few routines for signed multiplication, however!
; @param bc: One factor.
; @param de: The other factor.
; @return ah: The product.
;             (That is, the integer part is returned in `a`, and the fractional part in `h`.)
; @destroy de l
MulQ7_8::
    xor a ; We're temporarily holding `l` in `a` for faster calculations.
    ld h, a
    bit 7, d
    jr z, :+
    sub c
:
    bit 7, b
    jr z, :+
    sub e
:
    ld l, a

    ld a, e ; Holding `e` in `a` instead lets us use shorter (and faster!) instructions.
    ld e, 16
.loop
    add hl, hl
    rla
    rl d
    jr nc, :+
    add hl, bc
    adc a, 0 ; We don't propagate the carry into `d` because we don't care about its final value.
:
    dec e
    jr nz, .loop
    ret
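This is based on the 16-bit signed multiplication routine on WikiTI (https://wikiti.brandonw.net/index.php?title=Z80_Routines:Math:Multiplication#16.2A16_multiplication_2), although that routine is bugged when both inputs are negative (and not both 0x8000); fortunately, this modified version is not affected by this bug.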
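A short usage sketch for `MulQ7_8`, with operand values picked arbitrarily for illustration. In 8.8 fixed point the high byte is the integer part and the low byte the fraction, so 1.5 is $0180 and -2.0 is $FE00:

```asm
    ld bc, $0180    ; 1.5 in 8.8 fixed point
    ld de, $FE00    ; -2.0 in 8.8 fixed point
    call MulQ7_8
    ; a = $FD (integer part), h = $00 (fractional part)
    ; Read together, $FD00 is -3.0 in 8.8 fixed point
```

Only the middle 16 bits of the 32-bit product are returned, so results that don't fit in a signed 8.8 value wrap around.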
Advancing to the next aligned struct
One of the largest issues when processing a struct is that we don't always find ourselves at a constant offset within the struct. For example, if you stop processing an NPC because it's actually far off-screen, you'll be at a different position than if you had read until the end. Thus, it's useful to be able to go to the beginning of the struct (or of the next one) from any point.
Assuming we have a pointer to anywhere within a struct in any 16-bit register, advance to the next one. (Example given with the 16-bit register being de)
Requirements: The size of the struct must be a power of two.
Clobbers: A
    ld a, e
    or {size of struct}-1
    ld e, a
    inc de
Bytes: 5; Cycles: 6
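The following trace shows the snippet with concrete values. The constant name `NPC_SIZE`, the struct size of 16 and the starting pointer are assumptions for illustration, and the structs are assumed to be aligned to their size:

```asm
DEF NPC_SIZE EQU 16 ; Hypothetical struct size (must be a power of two)

    ; Assumed starting state: de = $C1F5, inside the struct at $C1F0
    ld a, e         ; a = $F5
    or NPC_SIZE - 1 ; a = $FF, the low byte of the struct's last byte
    ld e, a         ; de = $C1FF
    inc de          ; de = $C200, the start of the next struct
```

Because `inc de` is a 16-bit increment, the carry into the high byte is handled for free, as in this example.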
Without page crossing
If the high byte of the pointer is constant across all structs, it's possible to save one cycle:
    ld a, e
    or {size of struct}-1
    inc a
    ld e, a
It also puts the low byte of the next struct's address in A, which is useful for termination comparisons:
    ld a, e
    or 16-1
    inc a
    ld e, a
    cp LOW(wArrayEnd)
    jr c, .processStruct
Comparisons
Set carry if two values match
Set the carry flag based on whether the value in A is zero. This can be useful to feed a following adc, sbc, rra, rr, or rl instruction.
    cp $01 ; Set carry if A == 0
    ; or
    add $FF ; Set carry if A != 0
    ; or
    cp $01 ; Set carry if A != 0 and preserve A
    ccf
Set the carry flag based on comparing A to another value.
    sub b
    cp $01 ; Set carry if A == B
    ; or
    sub b
    add $FF ; Set carry if A != B
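As a sketch of how the resulting carry can feed an `adc`, the routine below counts the bytes in a buffer that equal B. The routine name, its register assignments and the buffer are hypothetical, not taken from the original page:

```asm
; @param hl: Start of the buffer (hypothetical).
; @param d:  Number of bytes to check (must be nonzero).
; @param b:  Value to look for.
; @return c: Number of bytes equal to `b` (wraps past 255).
CountMatches:
    ld c, 0
.loop
    ld a, [hli]
    sub b
    cp $01      ; Set carry if the byte == B
    ld a, c     ; `ld` leaves the carry untouched
    adc a, 0    ; Increment the count only on a match
    ld c, a
    dec d       ; `dec` does not affect carry either
    jr nz, .loop
    ret
```

Using the carry this way avoids a conditional jump inside the loop body.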
Less optimized snippets
These are common snippets which are less optimized than the ones shown above, but are documented regardless to help you recognize them when reading code. Note that the snippet you are looking for might differ slightly. Look at each one, and consider whether one 16-bit register could have been swapped for another, for instance.
Adding an 8-bit number to a 16-bit one
In hl:
    ld c, a
    ld b, 0
    add hl, bc
In another register:
    ld l, a
    ld h, 0
    add hl, de
    ld d, h
    ld e, l
These clobber hl and another register pair, that is, 4 of the 7 general-purpose 8-bit registers. They do, however, preserve a and the Z flag.
Advancing to the next aligned struct
    ld a, l
    and -{size of struct}
    add a, {size of struct}
    ld l, a
This is a more naive implementation of the "advance to next struct" snippet (in its "without page crossing" form); it's 1 byte bigger and 1 cycle slower.
External links
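- "Optimizing assembly code" from pret: https://github.com/pret/pokecrystal/wiki/Optimizing-assembly-code (archived copy: https://web.archive.org/web/20210210161944/https://github.com/pret/pokecrystal/wiki/Optimizing-assembly-code)
- "Z80 Routines:Math:Multiplication" (https://wikiti.brandonw.net/index.php?title=Z80_Routines:Math:Multiplication) and "Z80 Routines:Math:Division" (https://wikiti.brandonw.net/index.php?title=Z80_Routines:Math:Division) on WikiTI (many Z80 routines can be adapted to SM83 with few changes)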