No, these are panels from which interposers will be made. Interposers are now larger than chips and rectangular, so the wasted edge area from a 300mm wafer is high. The proposed panel size is much larger than chip-grade ingots.
They don't need perfect silicon. It can be grown on a continuous ribbon which is sliced into panel sizes like they do for solar cells. If they need a perfect surface they can deposit some pure Si to finish it. Maybe we will eventually see that replace ingots for chip grade.
MIPS had load-constant-to-high-or-low-half instructions. More than 40 years ago the Transputer had shift-and-load 8-bit constants. Lots of ancient precedents for rare big constants.
So does classic PowerPC, SPARC, and many other ISAs. It's the most common way to handle it on RISC. The Power10 prefixed instruction idea just expands on it.
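To make the two-instruction pattern concrete, here is a rough sketch in Python of the MIPS-style approach (an illustration of the idea, not any particular ISA's exact encoding or semantics): one instruction loads 16 bits into the upper half of a register, a second ORs in the lower 16 bits.

```python
def lui(imm16):
    # Load Upper Immediate: 16-bit value into the high half, low half zeroed.
    return (imm16 & 0xFFFF) << 16

def ori(reg, imm16):
    # OR Immediate: fill in the low 16 bits.
    return reg | (imm16 & 0xFFFF)

const = 0xDEADBEEF
r = lui(const >> 16)         # r = 0xDEAD0000
r = ori(r, const & 0xFFFF)   # r = 0xDEADBEEF
```

Small constants need only one of the two instructions, which is why this costs little in practice: the full pair is paid only for the rare big constant.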
Interesting idea. Effectively it moves the extra decode stage in front of the Icache, making the Icache a bit like a CISC trace/micro-op cache. On a 512b line you would add 32 bits to mark the instruction boundaries. At which point you start to wonder if there is anything else worth adding that simplifies the later decode chain. And whether the roughly 5% addition to Icache size (somewhat less than the raw 1/16th, since a lot of per-line overhead is shared) is worth it.
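The back-of-envelope behind that ~5% figure, assuming one boundary-marker bit per 16-bit parcel (my assumption about the minimum instruction granularity):

```python
line_bits = 512
parcel_bits = 16                           # assumed minimum instruction granularity
boundary_bits = line_bits // parcel_bits   # 32 marker bits per cache line
raw_overhead = boundary_bits / line_bits   # 1/16 = 6.25% of the data array

# Tags, valid bits, and replacement state are shared per-line overhead that
# dilutes the marker cost, which is how you land nearer 5% of total Icache area.
print(boundary_bits, raw_overhead)
```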
When Moore wrote in 1965, commercial use of MOS was 10 years in the future, and Dennard scaling would not become widely understood, and start stirring interest in CMOS, until 15 years in the future. So he was actually observing an era much like now, with multiple chiplets inside the can and all sorts of random improvements adding up to an emergent trend. The Dennard era, which gave Moore its main impulse, was about 20 years long. Maybe 25 years if you include controlling tunnel leakage by introducing Hf-based dielectrics, and FinFETs, since they sort of crinkle the surface of the chip to give you double the area while otherwise obeying the classic Dennard laws of constant power per unit area.
But even during the Dennard era there were a bunch of big random innovations needed to keep things going. CMP allowing the number of metal routing layers to balloon, keeping distances under control. Damascene metals allowing much finer metals carrying heavier currents. Strained channels for higher frequency and better balance between P and N. Work-function-biased fully depleted transistors to avoid the stochastic problems with doping small channels. Etc.
So what really happened is not that Moore ended. We still have a wealth of random improvements (where "wealth" is the driving force) which contribute to an emergent Moore improvement. The big change is that Dennard ended, which is what gave us scaling at constant power. Although some of the random improvements do improve energy efficiency per unit of computation, they are not holding the line on power per cm2 overall. At the end of the classic Dennard era we were around 25 W/cm2, but now we commonly see 50 W/cm2 in server chips, and there are schemes in the works to use liquid cooling at hundreds of W/cm2.
Well, ok. But does that kill Moore? Not if it keeps getting cheaper per unit function. And by that I do not mean per transistor. As long as that SoC in your phone keeps running faster radio, drawing better graphics, understanding your voice, generating 3D audio, etc., and is affordable by hundreds of millions of consumers, Moore remains undead.
DRAM is not greatly affected by radiation, because the capacitors are large structures relative to radiation events. SRAM is affected, which is why SRAM arrays should always use SECDED ECC.
The dominant cause of DRAM failures is bit flips from variable retention time (VRT), where the cell fails to hold charge long enough to meet refresh timing. These are believed to be caused by stray charge trapped in the gate dielectric, a bit like an accidental NAND cell, and they can persist for days to months. This is why the latest generations (LPDDR4X, LP/DDR5) have single-bit correction built into the DRAM chip. Along with permanent single-cell failures due to aging, this probably fixes more than 95% of DRAM faults.
The DRAM vendors sure could do a lot better on publishing error statistics. They are probably the least transparent critical technology used in everything, but no regulation requires them to explain, and they generally refuse to share fault statistics even with major customers (which is why folks like AMD run large experiments at supercomputer sites to investigate, and most clouds gather their own data).
That said, DRAM chips are pretty good. The DDR4 generation probably had better than a 1000 FIT rate per 2GB chip, so a laptop with 16GB would have seen less than 10 errors per million hours, or under 1 error per 50 laptops used for a year.
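The arithmetic behind those numbers, where FIT is the standard unit of failures per billion device-hours. The powered-on hours per year is my own assumption chosen to illustrate the "1 in 50" figure; a machine running 24/7 would see errors proportionally more often:

```python
fit_per_chip = 1000                 # failures per 10^9 device-hours, per 2 GB chip
chips = 16 // 2                     # a 16 GB laptop built from 2 GB chips
fit_total = fit_per_chip * chips    # 8000 FIT for the whole array

errors_per_million_hours = fit_total / 1000          # 8, i.e. "less than 10"

hours_per_year = 2500               # assumed powered-on hours (roughly 7 h/day)
errors_per_laptop_year = fit_total * 1e-9 * hours_per_year   # 0.02 => ~1 in 50
```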
For many of us the vast majority of data is in media files. I personally notice broken photos and videos every now and then. I would love to have a laptop with a competent ECC level, but they do not exist. Even desktop servers often come without it. It is unclear how much better the LP/DDR5 generation will be, since the on-die ECC still does not fix higher-order faults in word lines and other shared structures, which may sum to as much as 10% of aging faults. All simply educated guesses, since the industry will not publish.
So this might be a dumb question but it's been bothering me and you sound like you might know.
What's the big advantage of DRAM over SRAM? In school we learned that DRAM was cheaper -- but surely the difference between 1T1C and 6T isn't more than 6 and my intuition says C is big so it's probably 2 or 3 or something for a given process generation. The problem is that the latency of DRAM is absolutely dreadful. On one hand I see a staggering amount of engineering that goes into hiding DRAM latency, and on the other hand I see that DRAM has become so cheap that many systems are over-provisioned by a factor larger than its theoretical cost advantage purely by accident. The "obvious solution" would seem to be DIMMs of SRAM with (comparatively) wicked fast timings -- but this doesn't happen, despite the fact that the memory industry is extremely competitive and filled to the gills with smart people, so presumably there's another factor that stops "DIMMs of SRAM" from being viable. Do you happen to know what it is?
Density. Storing a bit in DRAM requires one transistor and one capacitor, whose dielectric is simply the gate insulation layer on a transistor. Storing a bit in SRAM takes at least four transistors, six in a standard cell.
That's precisely the answer I didn't find convincing for the reasons I mentioned. If it were that simple, I strongly suspect we would have SRAM-DIMMs and DRAM-DIMMs duking it out in the marketplace in analogy to SSD vs HDD a decade ago.
> one capacitor, whose dielectric is simply the gate insulation
Every DRAM cell depiction I've seen in the last ~5 years has had a gigantic trench capacitor. Are those not in production?
Density isn't the only issue - power usage is also a contributing factor. Because SRAM uses more transistors per bit, leakage of the transistors in large arrays is a significant source of power draw. In DRAM leakage of the single transistor per cell can be compensated for by adjusting the refresh rate.
MRAM and other persistent memory technologies might be used someday, but there's a lot of R&D work to get them to the same level of price and performance as DRAM. It's sad that Intel gave up prematurely (imho) on Optane.
>SRAM is affected, which is why SRAM arrays should always use SECDED ECC.
How many SRAM arrays exist in the electronics we use every day that don't have, at minimum, an error detection and reset mechanism? What about hardware registers that aren't structured as 2D arrays; how often are those protected? Things like buses and counters are at least somewhat vulnerable too.
Those cell sizes are theoretical anyway. They do exist, but on sample chips optimized for SRAM. If you design a chip and select an IP macro for SRAM, the area you actually get may be 3x or more per bit. This is due to compromises for the process the rest of the chip goes through, plus the need for row and column periphery, ECC, and ports. One advantage AMD's V-Cache has is that the cache chiplet appears to be fabricated on a process optimized for SRAM. It has 2x the capacity of the cache on the CPU chip even though it directly matches the overall cache outline on the base die.
The main problem with scaling is that SRAM found an optimal layout in FinFET with no significant wiring issues. Scaling from 7nm on down, none of the tricks that benefit logic density (fewer tracks, reduced gaps, contacts over active gate) help SRAM, because it did not need them. It was already optimal. The only thing it benefits from is genuinely finer features, which is happening only slowly.
The next major jump expected is CFET, where going vertical is matched by a new optimal pattern that takes advantage of the N and P devices sitting above each other. That is generally expected for the N2/20A processes.
Nice idea, but I honestly don't think it has much value for study. It was a solution to a problem which is no longer important, and what impressed David (and was fun for me) was implementing it under constraints (8086) that are no longer relevant. I would vote for some of the other stuff mentioned by others, like TeX as an example of mastery of both the application requirements with beautiful algorithms and inspirational documentation, PostgreSQL as a thriving large system with brilliant modularization that has enabled research, or LLVM as a pinnacle of code generation which has enabled a golden era of language and hardware design over the last 20 years.
Hi David, thanks for the kind words. I think I still have the source for that; it was written around 1984, originally for a project of my own, then sold to Logitech, Zorland, and Borland (before I went to work for Borland). The Borland one was probably tweaked a bit; like you, my memory does not include the details.
There were 7 registers to play with: AX, BX, CX, DX, SI, DI, and occasionally BP. MS-DOS had no rules about the frame pointer, although later Borland would prefer it be kept to a standard use so that exceptions could be handled.
It would have run faster in 32 bit. Many fewer cross products for multiply and faster convergence for division and sqrt, plus just fewer registers needed. But updating the entry and exit code may have been the largest win. By the mid 1990s the FPU was moving onto the CPU chips even at the low end so the main goal would have been stability and backwards compatibility.
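To make the "fewer cross products" point concrete, here is a toy schoolbook multiprecision multiply (my own illustration, not the original library code): a 64-bit mantissa split into 16-bit limbs needs 4 x 4 = 16 partial products, while 32-bit limbs need only 2 x 2 = 4.

```python
def multiply_limbs(a, b, limb_bits):
    # Schoolbook multiprecision multiply of two 64-bit values,
    # counting the partial (cross) products along the way.
    mask = (1 << limb_bits) - 1
    n = 64 // limb_bits
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(n)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(n)]
    result, count = 0, 0
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            result += (x * y) << ((i + j) * limb_bits)
            count += 1
    return result, count

a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
assert multiply_limbs(a, b, 16) == (a * b, 16)  # 16 cross products on a 16-bit machine
assert multiply_limbs(a, b, 32) == (a * b, 4)   # only 4 on a 32-bit machine
```

Division and square root converge in fewer iterations for the same reason: each step processes a wider slice of the operand.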
I wrote assembly code for many machines before and after that. In the early days such careful juggling of registers was routine. I generally wrote my algorithms in Pascal (later in Modula or Ada and by Borland days, in C++) and then hand compiled down to assembly. That separation of algorithm and coding stopped it getting too tangled.
Thanks for the shout out! These days I write System Verilog (algorithms modelled with C++) for hardware projects, and Julia for the pure software.