Unfortunately, 6502 can't XOR the accumulator with itself. I don't recall if the Z80 can, and loading an immediate 0 would be most efficient on those anyway.
XOR A absolutely works on Z80 and it's of course faster and shorter than loading a zero value with LD A,0.
LD A,0 encodes to 2 bytes, while XOR A is a single-byte opcode.
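XOR A has the additional benefit of leaving the flags in a known state: it clears C, N and H (and S), and sets Z and P/V because the result is zero with even parity. SUB A will also clear the accumulator, but it always sets the N flag on the Z80.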
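For anyone who wants to see the three options side by side, here is a rough Python sketch. The opcode bytes and T-state counts are the documented Z80 ones, but the flag dictionary and the function names are my own simplified model (undocumented flag bits 3 and 5 are ignored), not an emulator.

```python
# Simplified model of three ways to zero the Z80 accumulator.
# The flag dict is a hand-written illustration, not a real emulator.

def parity_even(value):
    """Return 1 if the 8-bit value has an even number of set bits."""
    return int(bin(value & 0xFF).count("1") % 2 == 0)

def xor_a(a, flags):
    """XOR A: result is always 0; P/V reports parity, N/H/C are reset."""
    result = (a ^ a) & 0xFF
    return result, {"S": result >> 7, "Z": int(result == 0), "H": 0,
                    "PV": parity_even(result), "N": 0, "C": 0}

def sub_a(a, flags):
    """SUB A: result is always 0; N is set, no borrow or overflow occurs."""
    result = (a - a) & 0xFF
    return result, {"S": result >> 7, "Z": int(result == 0), "H": 0,
                    "PV": 0, "N": 1, "C": 0}

def ld_a_0(a, flags):
    """LD A,0: loads zero and leaves every flag untouched."""
    return 0, dict(flags)

# mnemonic -> (encoding, T-states)
ENCODINGS = {
    "XOR A":  (bytes([0xAF]), 4),
    "SUB A":  (bytes([0x97]), 4),
    "LD A,0": (bytes([0x3E, 0x00]), 7),
}

if __name__ == "__main__":
    before = {"S": 1, "Z": 0, "H": 1, "PV": 0, "N": 1, "C": 1}
    for name, op in (("XOR A", xor_a), ("SUB A", sub_a), ("LD A,0", ld_a_0)):
        a, after = op(0x5A, before)
        code, tstates = ENCODINGS[name]
        print(name, len(code), "byte(s)", tstates, "T-states", after)
```

So XOR A and SUB A cost the same; the difference is only in what they leave in N and P/V, and LD A,0 is the one to use when the flags must survive.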
Yeah, the article seems to have missed what is likely the biggest reason that this is the popular x86 idiom: it was already the popular 8080/Z80 idiom from the CP/M era, and there's a direct line from there. (A bunch of early 8086 DOS applications were mechanically translated assembly code, so while they are "different" architectures they're still solidly related.)
The 6502 gets by with an immediate load: 2 clock cycles, 2 bytes (frequently followed by a single-byte register transfer instruction). Out of curiosity I did a quick scan of the MOS 1.20 ROM of the BBC Micro:
Are you sure you're not an LLM? There is no way anybody writing 6502 would do anything else, because there's no other way to do it.
(You can squeeze in a cheeky Txx instruction afterwards to get a 2-or-more-for-1, if that would be what you need - but this only saves bytes. Every instruction on the 6502 takes 2+ cycles! You could have done repeated immediate loads. The cycle count would be the same and the code would be more general.)
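To put numbers on the bytes-versus-cycles point, here is a quick tally in Python. The sizes and cycle counts are the standard ones for these 6502 instructions; the table and the `tally` helper are just my own illustration.

```python
# mnemonic -> (size in bytes, cycles) for the instructions discussed above
COST = {
    "LDA #imm": (2, 2),
    "LDX #imm": (2, 2),
    "LDY #imm": (2, 2),
    "TAX":      (1, 2),
    "TAY":      (1, 2),
}

def tally(sequence):
    """Return (total bytes, total cycles) for a list of mnemonics."""
    total_bytes = sum(COST[m][0] for m in sequence)
    total_cycles = sum(COST[m][1] for m in sequence)
    return total_bytes, total_cycles

# Zero A, X and Y two ways.
with_transfers = ["LDA #imm", "TAX", "TAY"]
all_immediate = ["LDA #imm", "LDX #imm", "LDY #imm"]

if __name__ == "__main__":
    print("transfers :", tally(with_transfers))
    print("immediates:", tally(all_immediate))
```

Same cycle count either way; the transfers only buy back two bytes.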
I suppose using Txx instructions rather than LDx is more of an idiom than a deliberate attempt to conserve space. Also, could an LDx #0 potentially be 3 cycles in the edge case where the PC crosses a page boundary? (I'm probably confused? Red herring?)
I don't know how the 6502's PC increment actually worked, but it was an exception to the general rule that page crossings (or the possibility thereof) either incurred a penalty or, as was also sometimes the case, were just ignored entirely. (One big advantage of the latter approach: doing nothing does take 0 cycles.)
The full 16 bits would be incremented after each instruction byte fetched, and it didn't cost any extra if there was a carry out of the MSB.
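As a toy illustration of that behaviour (function names are mine, and this models only the address arithmetic, not the real hardware): the PC is treated as one 16-bit counter, so a 2-byte instruction straddling a page boundary is fetched from consecutive addresses like any other.

```python
def next_pc(pc):
    """Increment the program counter as a single 16-bit value."""
    return (pc + 1) & 0xFFFF

def fetch_addresses(pc, length):
    """Addresses touched when fetching an instruction of `length` bytes."""
    addresses = []
    for _ in range(length):
        addresses.append(pc)
        pc = next_pc(pc)
    return addresses

if __name__ == "__main__":
    # A 2-byte immediate load whose opcode is the last byte of page $10.
    print([hex(a) for a in fetch_addresses(0x10FF, 2)])
```

The carry from the low byte into the high byte happens as part of the same increment, which is why the instruction still takes its usual 2 cycles.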
That's a good idea because, although I love this, 1 minute per token is absolutely savage. Whereas if you can juice the performance you're into semi-credible Jar Jar Binks simulator territory.
It does also make me wonder what you could do with somewhat more powerful retro hardware. I'd love to see what a transformer running on a PSX or an N64 could do.
A possible point of comparison might be pool drain injuries (a/k/a suction entrapment), and some of these have disemboweled people, though largely children. Vacuum toilets in cruise ships have also been implicated in such incidents (see, among others, https://www.upi.com/Archives/1987/03/06/70-year-old-womans-i... ). A more, er, pressing concern in adults might be rectal prolapse.
I'd say up to a couple of hundred is much more than 40. Not a full decimal order of magnitude, but even without compression the 170KB on one side is up to 4½×.
Although he's trying to avoid using floating point, the dirty secret in many Microsoft-derived BASICs, including Commodore's, is that everything is floating point. In fact, even if you explicitly declare a variable as an integer, it actually gets truncated and expanded: the native format for calculations is still 40-bit MBF. The only advantage integer variables have is smaller array storage. Every variable in his program is actually handled internally as a floating point value even though they all hold integers.
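If you're curious what that 40-bit value looks like, here is a small Python sketch of the packed 5-byte layout as I understand it for Commodore BASIC: one exponent byte (excess 128, with 0 meaning the value is zero), then a 32-bit mantissa whose implied leading 1 bit is replaced by the sign. The function names are mine, and this only handles integers that fit in 32 bits.

```python
def to_mbf40(n):
    """Pack an integer into the 5-byte packed float layout (sketch)."""
    if n == 0:
        return bytes(5)                  # exponent byte 0 means zero
    sign = 0x80 if n < 0 else 0x00
    mag = abs(n)
    exp = mag.bit_length()               # mag = 0.1xxx (binary) * 2**exp
    if exp > 32:
        raise ValueError("magnitude needs more than 32 mantissa bits")
    mant = mag << (32 - exp)             # left-justify: top bit is now 1
    mant &= 0x7FFFFFFF                   # drop the implied leading 1
    m = mant.to_bytes(4, "big")
    return bytes([0x80 + exp, m[0] | sign, m[1], m[2], m[3]])

def from_mbf40(raw):
    """Unpack the 5-byte layout back into a Python float."""
    if raw[0] == 0:
        return 0.0
    mant = int.from_bytes(raw[1:5], "big") | 0x80000000  # restore leading 1
    value = mant / 2**32 * 2.0 ** (raw[0] - 128)
    return -value if raw[1] & 0x80 else value

if __name__ == "__main__":
    for n in (0, 1, 10, -1, 32767):
        packed = to_mbf40(n)
        print(n, packed.hex(" "), from_mbf40(packed))
```

Even a loop counter like 10 ends up stored and computed on as `84 20 00 00 00`, which is the point: the integer-looking values still go through the float path.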