As trollbridge said, there isn't enough VRAM in the CGA to have 200 rows of 80 columns (2 bytes per cell - one for character and one for attribute). But it can be done in 40-column mode. As the CRTC can only do 128 rows unattended, it requires the CPU to be involved (reprogramming the CRTC several times per frame on particular scanlines) but 8088 MPH and Area 5150 did such things.
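A quick sketch of the arithmetic behind that, assuming the standard 16 kB of CGA video RAM and taking the 128-row CRTC limit mentioned above at face value. The function names are made up for illustration.

```python
# Assumed figures: 16 KiB of CGA video RAM, 2 bytes per text cell,
# and at most 128 rows per CRTC frame (as stated in the comment above).
CGA_VRAM_BYTES = 16 * 1024
BYTES_PER_CELL = 2  # one character byte + one attribute byte
MAX_ROWS_PER_CRTC_FRAME = 128


def text_buffer_bytes(columns, rows):
    """Bytes needed for a text screen of the given size."""
    return columns * rows * BYTES_PER_CELL


def fits_in_vram(columns, rows):
    """True if the whole text screen fits in CGA video RAM."""
    return text_buffer_bytes(columns, rows) <= CGA_VRAM_BYTES


def crtc_frames_needed(rows):
    """Minimum number of CRTC frames per screen (ceiling division)."""
    return -(-rows // MAX_ROWS_PER_CRTC_FRAME)


print(text_buffer_bytes(80, 200), fits_in_vram(80, 200))  # 32000 False
print(text_buffer_bytes(40, 200), fits_in_vram(40, 200))  # 16000 True
print(crtc_frames_needed(200))  # 2
```

So 200 rows of 40 columns just fits, and the CRTC has to be restarted at least once partway down the screen.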
I haven't published it yet as there are still some rough edges to clear up, but if you email me (andrew@reenigne.org) I'll send you the current work-in-progress (the same one that nand2mario is working from).
I wrote a program called xtce-trace (https://www.reenigne.org/software/xtce_trace.zip ) to do just this (albeit non-interactively - you just give it a program and it will generate a cycle-by-cycle trace of which lines of microcode are executed). GloriousCow aka Daniel Balsom recently fixed some of its bugs and turned it into an actual emulator (https://github.com/dbalsom/XTCE-Blue ), though it's not finished yet so there's no binary release at the moment.
xtce_trace sounds fantastic — exactly what I need right now while debugging my Verilog 8086 core. Thank you also for the microcode disassembly. It’s been fun to work through.
It's not - the "DA" means that the write happens to the ES segment, while "DS" means that the read happens from the DS segment. The use of different segments for the source and destination gives these instructions extra flexibility.
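To illustrate the two-segment behaviour, here is a minimal Python model, assuming the instruction in question is a string move such as MOVSB with the direction flag clear. The register dictionary and helper names are hypothetical, and this is a sketch of the architectural effect only, not of the microcode.

```python
def physical(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


def movsb(mem, regs):
    """One MOVSB (direction flag clear): read from DS:SI, write to ES:DI."""
    value = mem[physical(regs["ds"], regs["si"])]   # the "DS" read
    mem[physical(regs["es"], regs["di"])] = value   # the "DA" (ES) write
    regs["si"] = (regs["si"] + 1) & 0xFFFF
    regs["di"] = (regs["di"] + 1) & 0xFFFF


mem = bytearray(1 << 20)  # 1 MiB address space
mem[physical(0x1000, 0x0010)] = 0xAB
regs = {"ds": 0x1000, "si": 0x0010, "es": 0xB800, "di": 0x0000}
movsb(mem, regs)
print(hex(mem[0xB8000]))  # 0xab
```

Because source and destination use different segment registers, a single instruction can copy between two regions that are more than 64 kB apart, such as ordinary RAM and video memory.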
Yes, it pops the stack value into the CS register. But it doesn't update the PC as well - it'll continue to point to the offset (in the old CS) of the instruction after the "POP CS". So whatever code is in the new CS, it has to be "compatible" with the code in the old CS, in terms of the instructions starting in the right place. However, it's even more complicated than that because the prefetch queue is not flushed, so the exact address where it switches over will be unpredictable. For certain very specific scenarios it could potentially be useful to speed up conditional execution by eliminating those prefetch queue flushes, though.
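A small sketch of the effect on the nominal fetch address, assuming the one-byte 8086/8088 "POP CS" encoding and deliberately ignoring the prefetch queue (which, as noted above, makes the real switch-over point unpredictable). The register dictionary and helper names are hypothetical.

```python
def physical(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


def pop_cs(mem, regs):
    """Model of POP CS: load CS from the stack, leave the offset alone."""
    lo = mem[physical(regs["ss"], regs["sp"])]
    hi = mem[physical(regs["ss"], regs["sp"] + 1)]
    regs["cs"] = lo | (hi << 8)
    regs["sp"] = (regs["sp"] + 2) & 0xFFFF
    # The offset only advances past the one-byte opcode; it is not reloaded.
    regs["ip"] = (regs["ip"] + 1) & 0xFFFF


mem = bytearray(1 << 20)
regs = {"cs": 0x1000, "ip": 0x0100, "ss": 0x3000, "sp": 0xFFFE}
mem[physical(0x3000, 0xFFFE)] = 0x00  # low byte of the new CS
mem[physical(0x3000, 0xFFFF)] = 0x20  # high byte of the new CS

old_next = physical(regs["cs"], regs["ip"] + 1)
pop_cs(mem, regs)
new_next = physical(regs["cs"], regs["ip"])
print(hex(old_next), hex(new_next))  # 0x10101 0x20101
```

The offset is the same in both cases; only the segment base under it changes, which is why the code in the new segment has to line up with the code in the old one.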
A more recent example of a finite loop with no exit condition can be found in the credits section of the 2015 demoscene production "8088 MPH". The loop (which can be found at https://www.reenigne.org/blog/8088-pc-speaker-mod-player-how... under "v:") has two jumps (one conditional, one unconditional), both backwards, and runs with interrupts disabled. The CPU's instruction pointer stays within that block until the end of the routine - there's no wraparound to make a forward jump from a backward one.
There's a POP instruction in the loop that pops to a memory location addressed by a register. When that register contains the address of the final JMP instruction, the latter gets overwritten by a forward JMP.
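The mechanism can be shown with a toy model. This is not the 8088 code from the demo: the three-instruction machine, the memory layout, and the address schedule below are all invented for illustration. The only thing it shares with the real routine is the idea that a store inside the loop eventually overwrites the loop's own backward jump with a forward one.

```python
def run(memory, stack, addresses, max_steps=1000):
    """Run the toy machine until it halts; return the number of steps."""
    pc = 0       # toy instruction pointer
    reg = None   # toy address register used by the pop
    steps = 0
    while steps < max_steps:
        op = memory[pc]
        steps += 1
        if op[0] == "load_reg":      # reg <- next address in the schedule
            reg = addresses.pop(0)
            pc += 1
        elif op[0] == "pop_to_reg":  # memory[reg] <- popped value
            memory[reg] = stack.pop(0)
            pc += 1
        elif op[0] == "jmp":
            pc = op[1]
        elif op[0] == "halt":
            return steps
    raise RuntimeError("loop never exited")


memory = [
    ("load_reg",),    # 0
    ("pop_to_reg",),  # 1
    ("jmp", 0),       # 2: the only jump, and it goes backwards
    ("halt",),        # 3: unreachable until cell 2 is overwritten
    None, None, None  # 4..6: ordinary data cells
]
# The last popped value is a forward jump, and the last address is cell 2.
stack = [11, 22, 33, ("jmp", 3)]
addresses = [4, 5, 6, 2]

steps = run(memory, stack, addresses)
print(memory[2], memory[4:7])
```

Nothing in the loop tests for an exit; it ends only because the data being popped happens to include a replacement for the final jump.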
I used the DOSBox debugger to debug several of the effects during the making of Area 5150, but other than that I used real hardware. 86Box is probably the most accurate one at the moment but still isn't accurate enough to run the entire demo correctly.
Most of the tricks in the demo should work unmodified (or could be made to work) on most CGA implementations of the era. Some clones had slightly different font ROMs so it would look a little bit wrong on those. The final lake effect (and the ripply picture shortly before it) use very tight cycle counting so probably won't work on anything except a genuine IBM PC/XT and CGA. Some effects (like the radial fire effect and the voxel landscape) should work on just about anything. The Amstrad PC1512 will have trouble with any effect that modifies the CRTC timing registers as it doesn't have a fully programmable CRTC and always generates a 15.7kHz/59.92Hz 640x200 image. I don't have personal experience with the Tandy 1000 and don't know how compatible it is off the top of my head.
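For reference, the 15.7kHz/59.92Hz figures can be reproduced from the usual CGA timing numbers. The constants below are assumptions about the standard CGA (a 14.31818 MHz clock, 912 clocks per scanline, 262 scanlines per frame), not values taken from PC1512 documentation.

```python
# Assumed standard CGA timing constants.
PIXEL_CLOCK_HZ = 14_318_180
CLOCKS_PER_SCANLINE = 912
SCANLINES_PER_FRAME = 262

horizontal_hz = PIXEL_CLOCK_HZ / CLOCKS_PER_SCANLINE
vertical_hz = horizontal_hz / SCANLINES_PER_FRAME

print(round(horizontal_hz / 1000, 1))  # 15.7
print(round(vertical_hz, 2))           # 59.92
```

Effects that reprogram the CRTC timing registers change the line and frame counts that feed into these numbers, which is what a fixed-timing machine cannot follow.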
Thank you for your reply, it is comforting to know the tricks are solid. Maybe it will help some retroprogrammed CGA games to appear with more colors and effects.
I have a VGA XT clone (4.77/8 MHz) and an ATI Small Wonder CGA clone board. I will try to bind them together and run Area 5150.
The ATi Small Wonder is not entirely cycle-accurate. When I studied the CGADEMO by Codeblasters, I found that the scroller didn't look entirely correct, because the Small Wonder was slightly slower: https://scalibq.wordpress.com/2014/11/22/cgademo-by-codeblas...
It appears to insert a few extra waitstates compared to an IBM card.
This may trip up some effects in Area 5150.
As I remember it, the term "XT class" was used back in the day to distinguish between "AT class" PCs and those prior - i.e. to mean an IBM PC, XT or comparable machine. So (weirdly enough) the 5150 was considered XT class despite predating the XT (they're almost identical as far as software is concerned anyway). The term "PC class" wasn't a thing because PC came to be shorthand for any IBM PC/XT/AT or later x86 machine. Some more powerful machines like the Amstrad PC1512 (with an 8MHz 8086) were also considered XT class - these machines could run games like Bruce Lee and Digger which were designed for the PC/XT (and which were too fast on AT and later machines), though the gameplay was quicker than intended, which made the games extra-challenging.
Yeah, that seemed to be the usage in most contemporary literature/magazines. There are also those other architectural factors like the bus width, the number of PICs, the keyboard subsystem and so on, which is why you could have "XT-class" 8086 and even 286 machines (like a Tandy 1000 model or two).
What may also have had an impact was that clones were generally XT-clones (using the newer, smaller ISA card layout, and no cassette port), not PC-clones. And indeed, they were specifically advertised as 'XT-clone', 'XT-class' and such.
So the 5150 is really the oddball here.
Back in the old days I understood the usage as follows: 'PC' was a catch-all term for all IBM PC-compatible machines that ran DOS. 'XT' was the term used for machines with an 8088 (or sometimes V20) CPU (and of course there was the 'Turbo XT' subclass for CPUs running at more than 4.77 MHz). 'AT' was the term for 286 machines.
After that era, people just identified PCs by the CPU used, so you had '386es', '486es', 'Pentiums' etc.
The C000 segment was used for the EGA/VGA extension ROM. I'm guessing that using D000-EFFF would be unnecessary (because of the planar addressing squeezing 256kB of video memory into a 64kB address space), inconvenient (because the addresses wouldn't be contiguous - EGA and VGA were designed to coexist with either CGA or monochrome adapters in B000-BFFF) and (for VGA) insufficient - you'd still not have enough to map the entire 256kB of VRAM linearly. I also expect that IBM's engineers didn't want to take up all the extension ROM space because then it wouldn't be possible to add EMS cards, network cards, and whatever else ended up being mapped there. Though 192kB of write-only video memory in that space would be an interesting design!
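A rough sketch of the address-space arithmetic, under one reading of the argument above: that B000-BFFF stays reserved for a coexisting CGA or monochrome adapter, so a linear mapping would have only A000-AFFF plus D000-EFFF to work with. The helper name is made up for illustration.

```python
KIB = 1024


def region_bytes(first_segment, last_segment):
    """Size of an inclusive range of real-mode segments, e.g. 0xD000..0xEFFF."""
    return ((last_segment + 1) << 4) - (first_segment << 4)


a000_window = region_bytes(0xA000, 0xAFFF)  # the EGA/VGA graphics window
d000_efff = region_bytes(0xD000, 0xEFFF)    # the extra space discussed above
c000_efff = region_bytes(0xC000, 0xEFFF)    # the whole extension ROM area
vram = 256 * KIB                            # full EGA/VGA video memory

print(a000_window // KIB, d000_efff // KIB, c000_efff // KIB)  # 64 128 192
print((a000_window + d000_efff) // KIB, vram // KIB)           # 192 256

# Planar addressing: 4 planes share the same 64 kB window.
print(4 * a000_window == vram)  # True
```

On that reading, the extra space still falls 64 kB short of a linear 256 kB mapping, while four planes behind a single 64 kB window cover all of it.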