ASM Snippets

From GbdevWiki
Latest revision as of 13:53, 10 December 2023

Here are some useful and common ASM snippets. You may encounter them in others' code, or use them to solve problems yourself. It's a good idea to read all of the snippets and pick the one best suited to your use case.


Sign extension

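Given an 8-bit value in A, we may want to sign-extend it, i.e. compute the high byte of its 16-bit equivalent ($FF if bit 7 is set, $00 otherwise). This takes as few as two instructions. Note that it overwrites A, so copy the original value elsewhere first if you still need it: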

 add a, a ; Shift MSB into carry
 sbc a, a ; A = A - A - carry: $FF if the MSB was set, $00 otherwise

Bytes: 2; Cycles: 2
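To make the trick easier to check, here is a small C model of what the two instructions leave in A. This is not Game Boy code, and the function names are hypothetical. The second function shows the usual follow-up of pairing the saved original byte with the computed high byte.

```c
#include <stdint.h>

/* Models `add a, a` followed by `sbc a, a`: returns the final value of A. */
uint8_t sign_extend_high(uint8_t a) {
    unsigned carry = (a >> 7) & 1;     /* add a, a: bit 7 is shifted out into carry */
    a = (uint8_t)(a << 1);             /* ...and A is doubled (its value no longer matters) */
    a = (uint8_t)(a - a - carry);      /* sbc a, a: A - A - carry = 0 - carry, so $00 or $FF */
    return a;
}

/* Combines the original byte (assumed saved elsewhere) with the computed high byte. */
int16_t sign_extend16(uint8_t value) {
    uint16_t combined = (uint16_t)((sign_extend_high(value) << 8) | value);
    /* Convert the 16-bit two's complement pattern to a signed value portably. */
    return (int16_t)(combined >= 0x8000 ? (int32_t)combined - 0x10000 : (int32_t)combined);
}
```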


16-bit arithmetic

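It's tempting to assume that all the work is already done by the hl instructions, but unlike the Z80, the GB CPU has no `sbc hl, r16` or `adc hl, r16`: its only 16-bit addition on hl is `add hl, r16`, and there is no 16-bit subtraction at all. The rest must be implemented by hand, as the following examples do:

 ; hl = hl - {value}
 ld a, l
 sub a, {low byte}
 ld l, a
 ld a, h
 sbc a, {high byte}
 ld h, a

 ; hl = hl + {value} + carry (use add instead of the first adc for a plain addition)
 ld a, l
 adc a, {low byte}
 ld l, a
 ld a, h
 adc a, {high byte}
 ld h, a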

This pattern can also be used for additions that can't be carried out using `add hl, r16`, such as adding to a value stored in memory, or when hl is in use and must not be clobbered.
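The carry flag is what links the two 8-bit halves. As a sketch, the C model below reproduces both sequences, treating the SM83 carry flag as a borrow bit for `sub`/`sbc`. It is not Game Boy code, and the function names are hypothetical.

```c
#include <stdint.h>

/* Models: ld a, l / sub a, lo / ld l, a / ld a, h / sbc a, hi / ld h, a */
uint16_t sub16_via_8bit(uint16_t hl, uint16_t value) {
    uint8_t l = (uint8_t)(hl & 0xFF), h = (uint8_t)(hl >> 8);
    uint8_t lo = (uint8_t)(value & 0xFF), hi = (uint8_t)(value >> 8);

    unsigned carry = l < lo;            /* sub sets carry when a borrow occurs */
    l = (uint8_t)(l - lo);
    h = (uint8_t)(h - hi - carry);      /* sbc subtracts the borrow from the high byte */
    return (uint16_t)((h << 8) | l);
}

/* Models the adc/adc sequence: hl = hl + value + carry_in. */
uint16_t adc16_via_8bit(uint16_t hl, uint16_t value, unsigned carry_in) {
    unsigned sum_lo = (hl & 0xFF) + (value & 0xFF) + carry_in;  /* adc a, {low byte} */
    unsigned carry = sum_lo >> 8;                              /* carry out of the low byte */
    unsigned sum_hi = (hl >> 8) + (value >> 8) + carry;        /* adc a, {high byte} */
    return (uint16_t)(((sum_hi & 0xFF) << 8) | (sum_lo & 0xFF));
}
```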


16-bit and 8-bit

Often, we find ourselves needing to add an 8-bit offset to a 16-bit position. Common examples include accessing fields in a struct, or adding an 8-bit vector to a 16-bit value. It's tempting to use the above pattern with the high byte as zero, but it's possible to do better:

 ; Assuming A already contains the value to be added...
 add a, l
 ld l, a
 jr nc, .noCarry
 inc h
.noCarry

This requires creating a label, though, which is tedious in some cases (such as with older assemblers that lack anonymous labels). There is, however, an alternative that is slightly trickier to understand but exactly equal in size and speed (5 bytes, 5 cycles), and that doesn't use any labels.

 add a, l ; a = low + old_l
 ld l, a  ; a = low + old_l = new_l
 adc a, h ; a = new_l + old_h + carry
 sub l    ; a = old_h + carry
 ld h, a
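To see why the label-free version matches the branching one, the C model below (hypothetical function names, not Game Boy code) walks through both. The key step is that `adc a, h` followed by `sub l` adds new_l and then removes it again, leaving old_h + carry.

```c
#include <stdint.h>

/* Branching version: add a, l / ld l, a / jr nc, .noCarry / inc h */
uint16_t add8_branch(uint8_t h, uint8_t l, uint8_t a) {
    unsigned sum = (unsigned)a + l;
    l = (uint8_t)sum;
    if (sum > 0xFF) h = (uint8_t)(h + 1);        /* inc h runs only when carry was set */
    return (uint16_t)((h << 8) | l);
}

/* Label-free version: add a, l / ld l, a / adc a, h / sub l / ld h, a */
uint16_t add8_branchless(uint8_t h, uint8_t l, uint8_t a) {
    unsigned sum = (unsigned)a + l;              /* add a, l */
    unsigned carry = sum >> 8;
    uint8_t new_l = (uint8_t)sum;                /* ld l, a */
    uint8_t acc = (uint8_t)(new_l + h + carry);  /* adc a, h: new_l + old_h + carry */
    acc = (uint8_t)(acc - new_l);                /* sub l: leaves old_h + carry */
    return (uint16_t)((acc << 8) | new_l);       /* ld h, a */
}
```

Both functions agree for every input, because (new_l + old_h + carry) - new_l wraps modulo 256 exactly like `inc h` does. Looping over all 2^24 combinations of h, l and a is a quick way to confirm this.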

Multiplying a signed 8-bit number by an unsigned one

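Multiplies signed H by unsigned C, returning the 16-bit signed product in HL (A and B are clobbered). A non-negative H is handled by plain unsigned multiplication, and a negative H is handled using the distributive property. As an unsigned byte, a negative H reads as H + 256, so we want (unsigned H - 256) * C, which is the same as (unsigned H * C) + (256 * -C). So if H is negative, we put -C into A (which is otherwise zero) and add it to the high byte of the product, thus multiplying it by 256.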

    xor a ;clear our workspace
    ld l, a
    ld b, a 
    ; special first iteration
    add hl, hl ; this shifts the sign of the signed multiplier into carry
    jr nc, .positive
    ; for the multiplication part, load instead of add for this first round
    ld l, c
    ; get negative c into a to add it later
    sub c
    ; now a is negative c if h was negative, or zero if h was positive
.positive
    REPT 7 ; the other 7 iterations are standard binary long multiplication
    add hl, hl
    jr nc, :+
    add hl, bc
:
    ENDR
    ; finish off by adding a to h to correct for the sign
    add a, h
    ld h, a
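As a way to check the routine against ordinary signed arithmetic, here is a C model that mirrors it step by step. It is a sketch with a hypothetical function name, not Game Boy code. It returns HL as a 16-bit pattern, which should equal the two's complement encoding of (int8_t)H * C.

```c
#include <stdint.h>

/* Models the signed-H by unsigned-C routine; returns the final HL. */
uint16_t mul_s8_u8(uint8_t h, uint8_t c) {
    uint16_t hl = (uint16_t)(h << 8);        /* xor a / ld l, a: HL = H:00 */
    uint16_t bc = c;                         /* ld b, a: B = 0, so BC = C */
    uint8_t a = 0;

    /* Special first iteration: the sign bit of H is shifted into carry. */
    unsigned carry = hl >> 15;
    hl = (uint16_t)(hl << 1);                /* add hl, hl */
    if (carry) {
        hl = (uint16_t)(hl | c);             /* ld l, c (L is known to be 0 here) */
        a = (uint8_t)(a - c);                /* sub c: A = -C */
    }

    /* REPT 7: standard binary long multiplication. */
    for (int i = 0; i < 7; i++) {
        carry = hl >> 15;
        hl = (uint16_t)(hl << 1);            /* add hl, hl */
        if (carry) hl = (uint16_t)(hl + bc); /* add hl, bc */
    }

    /* add a, h / ld h, a: subtracts 256 * C if H was negative. */
    uint8_t new_h = (uint8_t)(a + (hl >> 8));
    return (uint16_t)((new_h << 8) | (hl & 0xFF));
}
```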

Multiplying two signed 8.8 fixed-point numbers

For unsigned fixed-point numbers, a regular unsigned multiplication routine works with little change: multiplying two 8.8 values yields a 16.16 product, whose middle 16 bits are the 8.8 result. Routines for signed multiplication are scarcer, however!

; @param bc: One factor.
; @param de: The other factor.
; @return ah: The product.
;     (That is, the integer part is returned in `a`, and the fractional part in `h`.)
; @destroy de l
MulQ7_8::
    xor a ; We're temporarily holding `l` in `a` for faster calculations.
    ld h, a
    bit 7, d
    jr z, :+
    sub c
:
    bit 7, b
    jr z, :+
    sub e
:
    ld l, a

    ld a, e ; Holding `e` in `a` instead lets us use shorter (and faster!) instructions.
    ld e, 16
.loop
    add hl, hl
    rla
    rl d
    jr nc, :+
    add hl, bc
    adc a, 0 ; We don't propagate the carry into `d` because we don't care about its final value.
:
    dec e
    jr nz, .loop
    ret
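The routine is easier to trust once it is compared against a reference. Below is a C model of MulQ7_8 that follows its register usage: D:A:H:L acts as a 32-bit shift register, the multiplier starts in D:A, and the sign correction starts in L. Next to it is a reference that keeps the middle 16 bits of the exact signed 32-bit product. Both function names are hypothetical, and this is a sketch for testing on a PC rather than part of the original routine.

```c
#include <stdint.h>

/* Models MulQ7_8: bc and de are signed 8.8 values; returns A:H (integer:fraction). */
uint16_t mul_q7_8(uint16_t bc, uint16_t de) {
    uint8_t b = bc >> 8, c = bc & 0xFF, d = de >> 8, e = de & 0xFF;
    uint8_t a = 0, h = 0, l;

    if (d & 0x80) a = (uint8_t)(a - c);      /* bit 7, d / sub c */
    if (b & 0x80) a = (uint8_t)(a - e);      /* bit 7, b / sub e */
    l = a;                                   /* ld l, a: correction ends up in bits 16-23 */

    a = e;                                   /* ld a, e */
    for (int count = 16; count > 0; count--) {   /* ld e, 16 ... dec e / jr nz, .loop */
        uint16_t hl = (uint16_t)((h << 8) | l);
        unsigned carry = hl >> 15;           /* add hl, hl */
        hl = (uint16_t)(hl << 1);
        unsigned a_out = a >> 7;             /* rla: carry enters bit 0, bit 7 leaves */
        a = (uint8_t)((a << 1) | carry);
        carry = d >> 7;                      /* rl d: bit 7 of d becomes the carry */
        d = (uint8_t)((d << 1) | a_out);
        if (carry) {
            uint32_t sum = (uint32_t)hl + bc;        /* add hl, bc */
            hl = (uint16_t)sum;
            a = (uint8_t)(a + (sum >> 16));          /* adc a, 0 (nothing propagates to d) */
        }
        h = hl >> 8;
        l = hl & 0xFF;
    }
    return (uint16_t)((a << 8) | h);
}

/* Reference: middle 16 bits of the exact signed 32-bit product. */
uint16_t mul_q7_8_reference(int16_t x, int16_t y) {
    uint32_t product = (uint32_t)((int32_t)x * y);
    return (uint16_t)((product >> 8) & 0xFFFF);
}
```

Like the routine it models, the result is truncated rather than rounded, and products outside the 8.8 range wrap around.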

This is based on this 16-bit signed multiplication routine (https://wikiti.brandonw.net/index.php?title=Z80_Routines:Math:Multiplication#16.2A16_multiplication_2), although that routine is bugged when both inputs are negative (and not both 0x8000). Fortunately, this modified version is not affected by the bug.

Advancing to the next aligned struct

One of the biggest issues when processing a struct is that we don't always find ourselves at a constant offset within it. For example, if you stop processing an NPC because it's actually far off-screen, you'll be at a different position than if you had read it until the end. Thus, it's useful to be able to go to the beginning of the struct (or of the next one) at any point.


Assuming a 16-bit register holds a pointer to anywhere within a struct, this advances it to the start of the next struct. (The example uses de.)

Requirements: The size of the struct must be a power of two (at most 256), and each struct must be aligned to that size.

Clobbers: A

 ld a, e
 or {size of struct}-1
 ld e, a
 inc de

Bytes: 5; Cycles: 6
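Setting the low bits with `or` jumps to the last byte of the current struct, and `inc de` then steps onto the next one, carrying into d if needed. Here is a C model of that, assuming a power-of-two size of at most 256. The function name is hypothetical, and this is not Game Boy code.

```c
#include <stdint.h>

/* Models: ld a, e / or size-1 / ld e, a / inc de */
uint16_t next_aligned_struct(uint16_t de, unsigned size) {
    uint8_t e = (uint8_t)(de & 0xFF);
    e = (uint8_t)(e | (size - 1));           /* or {size of struct}-1: last byte of this struct */
    de = (uint16_t)((de & 0xFF00) | e);      /* ld e, a */
    return (uint16_t)(de + 1);               /* inc de: may carry into d */
}
```

A pointer that already sits at the start of a struct is also moved on to the next struct, which is usually what a "skip the rest of this entry" path needs.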

Without page crossing

If the high byte of the pointer is constant across all structs, it's possible to save one cycle (Bytes: 5; Cycles: 5):

 ld a, e
 or {size of struct}-1
 inc a
 ld e, a

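It also leaves the low byte of the next struct's address in A, which is useful for termination comparisons. In this example, with 16-byte structs, the loop repeats while E is below LOW(wArrayEnd). This assumes the array doesn't end exactly at a page boundary; otherwise LOW(wArrayEnd) is 0 and the loop exits after the first struct: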

 ld a, e
 or 16-1
 inc a
 ld e, a
 cp LOW(wArrayEnd)
 jr c, .processStruct
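The C model below simulates this loop under stated assumptions: 16-byte structs, and processing that stops `bytes_read` bytes into each struct (fewer than 16). It counts how many structs get processed. All names are hypothetical, and only the low byte is modelled, since D never changes.

```c
#include <stdint.h>

/* Models: ld a, e / or size-1 / inc a / ld e, a (D is left untouched). */
uint8_t next_struct_low(uint8_t e, unsigned size) {
    return (uint8_t)((e | (size - 1)) + 1);
}

/* Counts the structs a loop like the one above would process. */
unsigned count_structs(uint8_t start_low, uint8_t end_low, uint8_t bytes_read) {
    unsigned count = 0;
    uint8_t e = start_low;
    for (;;) {
        count++;                          /* .processStruct reads some fields... */
        e = (uint8_t)(e + bytes_read);    /* ...leaving E somewhere inside the struct */
        e = next_struct_low(e, 16);       /* ld a, e / or 16-1 / inc a / ld e, a */
        if (e >= end_low) break;          /* cp LOW(wArrayEnd) / jr c, .processStruct */
    }
    return count;
}
```

The last assertion in the checks shows the page-boundary caveat: with an end low byte of 0, the comparison never sets carry, so only one struct is processed.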

Comparisons

Set carry if two values match

Set the carry flag based on whether the value in A is zero. This can be useful to feed a following adc, sbc, rra, rr, or rl instruction.

 cp $01   ; Set carry if A == 0
          ; or
 add $FF  ; Set carry if A != 0
          ; or
 cp $01   ; Set carry if A != 0 and preserve A
 ccf

Set the carry flag based on comparing A to another value.

 sub b
 cp $01   ; Set carry if A == B
          ; or
 sub b
 add $FF  ; Set carry if A != B
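The carry results above come from two facts about the SM83 flags. `cp n` sets carry when A < n (an unsigned borrow), and `add n` sets carry when the 8-bit sum overflows. The C model below spells this out. The function names are hypothetical, and this is not Game Boy code.

```c
#include <stdbool.h>
#include <stdint.h>

/* cp n: carry is set when A < n (unsigned borrow); A is unchanged. */
bool carry_after_cp(uint8_t a, uint8_t n) { return a < n; }

/* add n: carry is set when A + n overflows 8 bits; A is modified. */
bool carry_after_add(uint8_t a, uint8_t n) { return (unsigned)a + n > 0xFF; }

/* sub b: A becomes A - B (mod 256), which is 0 exactly when A == B. */
uint8_t after_sub(uint8_t a, uint8_t b) { return (uint8_t)(a - b); }

bool carry_if_zero(uint8_t a)           { return carry_after_cp(a, 0x01); }   /* cp $01 */
bool carry_if_nonzero(uint8_t a)        { return carry_after_add(a, 0xFF); }  /* add $FF */
bool carry_if_nonzero_keep_a(uint8_t a) { return !carry_after_cp(a, 0x01); }  /* cp $01 / ccf */

bool carry_if_equal(uint8_t a, uint8_t b)     { return carry_if_zero(after_sub(a, b)); }    /* sub b / cp $01 */
bool carry_if_not_equal(uint8_t a, uint8_t b) { return carry_if_nonzero(after_sub(a, b)); } /* sub b / add $FF */
```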

Less optimized snippets

These are common snippets that are less optimized than the ones shown above, but they are documented regardless to help when reading others' code. Note that the snippet you are looking for might differ slightly; for instance, consider whether one 16-bit register has been swapped for another.

Adding an 8-bit number to a 16-bit one

In hl:

 ld c, a
 ld b, 0
 add hl, bc

In another register:

 ld l, a
 ld h, 0
 add hl, de
 ld d, h
 ld e, l

These modify four of the seven 8-bit registers (HL plus BC in the first snippet, DE plus HL in the second). They do, however, preserve A and the Z flag.

Advancing to the next aligned struct

 ld a, l
 and -{size of struct}
 add a, {size of struct}
 ld l, a

This is a more naive implementation of the "Without page crossing" variant of the "advance to next struct" snippet; it's 1 byte bigger and 1 cycle slower.
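Both forms compute the same low byte for power-of-two sizes. `and -size` rounds down to the start of the struct and `add a, size` steps forward, while `or size-1` rounds up to its last byte and `inc a` steps forward. The byte and cycle saving comes from replacing the 2-byte `add a, {size}` with the 1-byte `inc a`. Here is a C sketch comparing the two, with hypothetical names and a size assumed to be a power of two of at most 256.

```c
#include <stdint.h>

/* Naive: ld a, l / and -size / add a, size / ld l, a (H untouched). */
uint8_t next_struct_naive(uint8_t l, unsigned size) {
    uint8_t a = (uint8_t)(l & (uint8_t)(-(int)size));  /* and -{size}: round down to struct start */
    return (uint8_t)(a + size);                         /* add a, {size}: step to the next struct */
}

/* Optimized: ld a, e / or size-1 / inc a / ld e, a. */
uint8_t next_struct_or(uint8_t e, unsigned size) {
    return (uint8_t)((e | (size - 1)) + 1);             /* round up to last byte, then step past it */
}
```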

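External links

* "Optimizing assembly code" from pret (https://github.com/pret/pokecrystal/wiki/Optimizing-assembly-code; archived copy: https://web.archive.org/web/20210210161944/https://github.com/pret/pokecrystal/wiki/Optimizing-assembly-code)
* "Z80 Routines:Math:Multiplication" (https://wikiti.brandonw.net/index.php?title=Z80_Routines:Math:Multiplication) and "Z80 Routines:Math:Division" (https://wikiti.brandonw.net/index.php?title=Z80_Routines:Math:Division) on WikiTI (many Z80 routines can be adapted to SM83 with few changes)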