Difference between revisions of "CPU speed comparison"

(Record access: Preparing for article about struct and ISSOtm's rgbds-structs)
(separate CPU/SoC and core columns)
Line 6:
  {| class="wikitable"
  |+ Relative clock rates of three third-generation game platforms
- ! Platform || CPU || Divider || Effective rate
+ ! Platform || CPU/SoC || Core || Divider || Effective rate
  |-
- | Nintendo Entertainment System || 6502 || 12 || 1.79 MHz
+ | Nintendo Entertainment System || Ricoh 2A03 || 6502 || 12 || 1.79 MHz
  |-
- | Super Game Boy || LR35902 || 20 (M-cycles) || 1.07 MHz
+ | Super Game Boy || Sharp LR35902 || SM83 || 20 (M-cycles) || 1.07 MHz
  |-
- | Sega Master System || Z80 || 6 (T-states) || 3.58 MHz
+ | Sega Master System || Zilog Z80 || Z80 || 6 (T-states) || 3.58 MHz
  |}
- To "beat the spread", or complete an operation in fewer ''master clock'' cycles than a 6502 does, the LR35902 must complete it in three-fifths of the CPU cycles that the 6502 uses. The LR35902's work per clock is usually superior to 6502 but not always. Because the LR35902 and Z80 are architecturally very similar, both having been derived from the Intel 8080, Z80 results are shown only when the differences cause a noticeable speed discrepancy, such as things that can be done with <code>IX</code>, <code>[HL+]</code>, or <code>LDH</code>.
+ To "beat the spread", or complete an operation in fewer ''master clock'' cycles than a 6502 does, the 8080-family CPU must complete it in three-fifths of the CPU cycles that the 6502 uses. The 1-cycle register-register operations make the 8080 family's work per M-cycle usually superior to 6502 but not always. Because CPUs derived from the Intel 8080 are architecturally very similar, Z80 results are shown only when the differences cause a noticeable speed discrepancy compared to SM83, such as <code>JP</code> vs. <code>JR</code>, or things that can be done with <code>IX</code>, <code>[HL+]</code>, or <code>LDH</code>.
  (We're not considering Game Boy Color at the moment because it'd always get trounced by the TurboGrafx-16's 7.16 MHz 65C02-based CPU.)
Line 22:
  An action game may store data related to the position and behavior of each of several enemies in a [[struct]] or record.
- For random access to an 8-bit field via a pointer, 6502 wins for one byte, but LR35902 catches up for two consecutive bytes and pulls ahead for more.
+ For random access to an 8-bit field via a pointer, 6502 wins for one byte, but SM83 catches up for two consecutive bytes and pulls ahead for more.
  <pre>
     ; 6502, pointer in zero page
Line 32:
     lda ($00),y                    ; 6, minus 1 for read not crossing page
-    ; LR35902, pointer in DE
+    ; SM83, pointer in DE
     ; One byte: 7 cycles or 140 clocks
     ; Two consecutive: 9 cycles or 180 clocks
Line 54:
     lda otherfieldname,x
-    ; LR35902 or Z80, SOA index in DE
+    ; SM83 or Z80, SOA index in DE
-    ; Each byte (LR35902): 7 cycles or 140 clocks
+    ; Each byte (SM83): 7 cycles or 140 clocks
     ld hl,fieldname      ; 3
     add hl,de            ; 2
Line 72:
     lda otherfieldname,x
-    ; LR35902 or Z80, SOA index in DE
+    ; SM83 or Z80, SOA index in DE
-    ; First byte (LR35902): 7 cycles or 140 clocks
+    ; First byte (SM83): 7 cycles or 140 clocks
-    ; Two bytes, 1 bit different (LR35902): 11 cycles or 220 clocks
+    ; Two bytes, 1 bit different (SM83): 11 cycles or 220 clocks
     ld hl,fieldname      ; 3
     add hl,de            ; 2

Revision as of 13:56, 18 June 2019

This article attempts to assess the relative speed of the Game Boy's CPU compared to those of other third-generation video game platforms (Nintendo Entertainment System and Sega Master System).

Handicapping

The speed of a processor depends on both its clock rate and its work per clock. To put all three platforms on a common scale, we count cycles of a standardized master oscillator at 21.47 MHz, or six times NTSC chroma. This same oscillator was used in the NTSC NES and Super NES.

{| class="wikitable"
|+ Relative clock rates of three third-generation game platforms
! Platform || CPU/SoC || Core || Divider || Effective rate
|-
| Nintendo Entertainment System || Ricoh 2A03 || 6502 || 12 || 1.79 MHz
|-
| Super Game Boy || Sharp LR35902 || SM83 || 20 (M-cycles) || 1.07 MHz
|-
| Sega Master System || Zilog Z80 || Z80 || 6 (T-states) || 3.58 MHz
|}
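
The effective rates in the table follow from dividing the master oscillator by each platform's divider. A minimal Python sketch of that arithmetic follows. The names `NTSC_CHROMA_HZ`, `MASTER_HZ`, `DIVIDERS`, and `effective_mhz` are illustrative and not part of the article, and the chroma frequency is taken as the usual rounded figure of 3.579545 MHz.

```python
# Hypothetical names; the dividers come from the table above.
NTSC_CHROMA_HZ = 3_579_545          # NTSC color subcarrier, rounded
MASTER_HZ = 6 * NTSC_CHROMA_HZ      # about 21.47 MHz

# Master clocks per CPU cycle (M-cycle for SM83, T-state for Z80)
DIVIDERS = {"6502": 12, "SM83": 20, "Z80": 6}

def effective_mhz(cpu):
    """Effective cycle rate in MHz for one of the CPUs in DIVIDERS."""
    return MASTER_HZ / DIVIDERS[cpu] / 1e6

for cpu in DIVIDERS:
    print(f"{cpu}: {effective_mhz(cpu):.2f} MHz")
```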

To "beat the spread", or complete an operation in fewer master clock cycles than a 6502 does, the SM83 must complete it in fewer than three-fifths as many M-cycles as the 6502 uses cycles (12/20), and the Z80 in fewer than twice as many T-states (12/6). The 1-cycle register-register operations make the 8080 family's work per M-cycle usually, but not always, superior to the 6502's. Because CPUs derived from the Intel 8080 are architecturally very similar, Z80 results are shown only when the differences cause a noticeable speed discrepancy compared to SM83, such as JP vs. JR, or things that can be done with IX, [HL+], or LDH.
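
The three-fifths figure is the ratio of the two dividers. The sketch below works it out with exact fractions and converts cycle counts to master clocks. The names `CLOCKS_PER_CYCLE`, `break_even_ratio`, and `master_clocks` are made up for this illustration.

```python
from fractions import Fraction

# Master clocks per CPU cycle, from the divider column (hypothetical names)
CLOCKS_PER_CYCLE = {"6502": 12, "SM83": 20, "Z80": 6}

def break_even_ratio(cpu):
    """Cycles `cpu` may spend per 6502 cycle to tie in master clocks."""
    return Fraction(CLOCKS_PER_CYCLE["6502"], CLOCKS_PER_CYCLE[cpu])

def master_clocks(cpu, cycles):
    """Convert a cycle count on `cpu` to master clocks."""
    return cycles * CLOCKS_PER_CYCLE[cpu]

print(break_even_ratio("SM83"))   # 3/5
print(break_even_ratio("Z80"))    # 2
```

For example, `master_clocks("SM83", 3)` and `master_clocks("6502", 5)` both come to 60, so a 3 M-cycle SM83 sequence exactly ties a 5-cycle 6502 sequence.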

(We're not considering Game Boy Color at the moment because it'd always get trounced by the TurboGrafx-16's 7.16 MHz 65C02-based CPU.)

Record access

An action game may store data related to the position and behavior of each of several enemies in a struct or record.

For random access to an 8-bit field via a pointer, 6502 wins for one byte, but SM83 catches up for two consecutive bytes and pulls ahead for more.

  ; 6502, pointer in zero page
  ; One byte: 7-8 cycles or 84-96 clocks
  ; Two bytes (random or consecutive): 14-16 cycles or 168-192 clocks
  ldy #offsetof(type, fieldname)  ; 2
  lda ($00),y                     ; 6, minus 1 for read not crossing page
  iny                             ; 2
  lda ($00),y                     ; 6, minus 1 for read not crossing page

  ; SM83, pointer in DE
  ; One byte: 7 cycles or 140 clocks
  ; Two consecutive: 9 cycles or 180 clocks
  ld hl,offsetof(type, fieldname) ; 3
  add hl,de                       ; 2
  ld a,[hl+]                      ; 2
  ld a,[hl+]                      ; 2

  ; Z80, pointer in IX
  ; One byte: 19 T-states or 114 clocks
  ; Two random: 38 T-states or 228 clocks
  ld a,[ix+offsetof(type, fieldname)]    ; 19
  ld a,[ix+offsetof(type, fieldname)+1]  ; 19
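
To see where the crossover falls, the cycle counts from the listings above can be turned into master clocks for n consecutive bytes. This is a rough sketch using only the per-instruction counts quoted in the comments. The function names are hypothetical, and `page_cross=True` assumes the worst case where every 6502 read crosses a page.

```python
# Master clocks per CPU cycle (hypothetical names, values from the article)
CLOCKS = {"6502": 12, "SM83": 20, "Z80": 6}

def cost_6502(n, page_cross=False):
    """ldy/iny (2) + lda (zp),y (5, or 6 on a page cross) per byte."""
    return n * (2 + (6 if page_cross else 5)) * CLOCKS["6502"]

def cost_sm83(n):
    """ld hl (3) + add hl,de (2) once, then ld a,[hl+] (2) per byte."""
    return (3 + 2 + 2 * n) * CLOCKS["SM83"]

def cost_z80_ix(n):
    """ld a,[ix+d] at 19 T-states per byte."""
    return 19 * n * CLOCKS["Z80"]

for n in (1, 2, 3):
    print(n, cost_6502(n), cost_6502(n, True), cost_sm83(n), cost_z80_ix(n))
```

Under these assumptions the SM83 trails for one byte (140 clocks against 84-96), lands inside the 6502's 168-192 clock range for two bytes (180), and leads from three bytes on (220 against 252-288).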

However, the 6502 has a trick up its sleeve: structure of arrays. If records are statically allocated, all the values for one field can be placed together. This lets the 6502 use "absolute indexed" addressing, which adds an index in an 8-bit register to a 16-bit base address encoded in the instruction.

  ; 6502, SOA index in X
  ; Each byte: 4-5 cycles or 48-60 clocks
  lda fieldname,x       ; 5, minus 1 for read not crossing page
  lda otherfieldname,x

  ; SM83 or Z80, SOA index in DE
  ; Each byte (SM83): 7 cycles or 140 clocks
  ld hl,fieldname       ; 3
  add hl,de             ; 2
  ld a,[hl]             ; 2
  ld hl,otherfieldname  ; 3
  add hl,de             ; 2
  ld a,[hl]             ; 2
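
The difference between the two layouts is easiest to see in how a field's address is computed. The sketch below assumes a made-up table of 16 enemies with three one-byte fields starting at `0xC000`. None of these names or values come from the article.

```python
# Hypothetical layout: 16 enemies, each with x, y, and hp byte fields.
NUM_ENEMIES = 16
FIELDS = ["x", "y", "hp"]
BASE = 0xC000  # assumed start of the table in RAM

def aos_address(index, field):
    """Array of structures: records are contiguous, fields interleaved."""
    return BASE + index * len(FIELDS) + FIELDS.index(field)

def soa_address(index, field):
    """Structure of arrays: one NUM_ENEMIES-byte array per field."""
    return BASE + FIELDS.index(field) * NUM_ENEMIES + index

print(hex(aos_address(5, "y")))
print(hex(soa_address(5, "y")))
```

In the structure-of-arrays form, the field's array base is a constant known at assembly time and only the enemy index varies, which is the shape that `lda fieldname,x` handles in a single instruction.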

But if the addresses of two fields differ in only one bit, such as if the fields are 8, 16, 32, or 64 bytes apart, bit operations on L can speed up calculating the address for subsequent accesses. This requires thinking of your record as a binary hypercube, where each field is connected to the fields whose addresses differ from its own in exactly one bit.

  ; 6502, SOA index in X
  ; Each byte: 4-5 cycles or 48-60 clocks
  ; Two random: 8-10 cycles or 96-120 clocks
  lda fieldname,x       ; 5, minus 1 for read not crossing page
  lda otherfieldname,x

  ; SM83 or Z80, SOA index in DE
  ; First byte (SM83): 7 cycles or 140 clocks
  ; Two bytes, 1 bit different (SM83): 11 cycles or 220 clocks
  ld hl,fieldname       ; 3
  add hl,de             ; 2
  ld a,[hl]             ; 2
  set log2(otherfieldname^fieldname),l  ; 2
  ld a,[hl]             ; 2
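
Whether two field addresses qualify for this trick can be checked ahead of time. The helper names below are hypothetical. The sketch also assumes the arrays are aligned so that adding the index in DE never carries into the bit being set.

```python
def single_bit_index(addr_a, addr_b):
    """Return the bit number in which two addresses differ, or None
    if they differ in zero bits or in more than one bit."""
    diff = addr_a ^ addr_b
    if diff == 0 or diff & (diff - 1):
        return None
    return diff.bit_length() - 1

def set_reaches(addr_from, addr_to):
    """True if `set n,l` alone turns addr_from into addr_to: they differ
    by one bit, that bit is in the low byte, and it is clear in addr_from."""
    bit = single_bit_index(addr_from, addr_to)
    return bit is not None and bit < 8 and not (addr_from >> bit) & 1

print(single_bit_index(0xC000, 0xC010))
print(set_reaches(0xC005, 0xC015))
```

Going the other way, from the address with the bit set to the one with it clear, would need `res` instead of `set`, and a difference in bit 8 or above would have to be applied to H rather than L.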

Memory clearing

Memory copying