
Off topic but "Break the ___ barrier" has got to be my least favorite expression that PR people love. It's a "tell" that an article was written as meaningless pop science fluff instead of anything serious. The sound barrier is a real physical phenomenon. This is not. There's no barrier! Nothing was broken!


The Exascale barrier is an actual barrier in HPC/Distributed Systems.

It took 15-20 years to reach this point [0]

A lot of innovations in the GPU and distributed ML space were subsidized by this research.

Concurrent and Parallel Computing are VERY hard problems.

[0] - http://helper.ipam.ucla.edu/publications/nmetut/nmetut_19423...


The comment isn't saying that the benchmark isn't useful. They are saying that there is no 'barrier' to be broken.

The sound barrier was relevant because there were significant physical effects to overcome specifically when going transonic. It wasn't a question of just adding more powerful engines to existing aircraft. They didn't choose the sound barrier because it's a nice number; it was a big deal because all sorts of things behaved outside of their understanding of aerodynamics at that point. People died in the pursuit of understanding the sound barrier.

The 'exascale barrier', afaict, is just another number chosen specifically because it is a round(ish) number. It didn't turn computer scientists into smoking holes in the desert when it went wrong. This is an incremental improvement in an incredible field, but not a world changing watershed moment.


When Exascale was defined as a barrier in the mid-late 2000s, a lot of computing technologies and techniques that are taken for granted today did not exist outside of the lab.

For example, FPGAs were considered much more viable than GPUs for sparse matrix computation, BLAS implementations were not as robust yet, parallel programming APIs like CUDA and Vulkan were in their infancy, etc.

Just because you didn't do well in your systems classes or you think spinning up an EC2 instance on AWS is "easy" doesn't mean it's an easy problem.

That's like saying Einstein or Planck are dummies because AP Physics E&M students can handle basic relativity and quantum theory.


Exascale was an arbitrary target; it didn't unlock any magical capabilities. In fact the supercomputer folks are now saying the real barrier that will unlock their science is "zettaflops" (just saw a post from a luminary about it).

(also please don't be rude or condescending, it detracts from your argument)


Exascale was the primary target agreed on across the HPC community in the mid-late 2000s because FLOPS (floating point operations per second) is the unit used to benchmark, since other variables like compiler, architecture, etc. are very difficult to account for.

It's functionally the same as arguing that GB or TB are arbitrary units to represent storage.


Yes, that's correct: GB and TB are arbitrary units. We use them because historically, successive multiples of 1000 have been used to represent significant changes in size, mass, and velocity.

I used to work with/for NERSC and was around when they announced exascale as a target, and now they want to target zettascale. There is no magic threshold where science simulations suddenly work better as you scale up. It's mainly about setting goals that are 10-15 years away to stimulate spending and research.


> I used to work with/for NERSC and was around when they announced exascale as a target

We most likely crossed paths. How close were you to faculty in AMPLab?

> It's mainly about setting goals that are 10-15 years away to stimulate spending and research.

And that's not a barrier to you?

That feels very condescending toward applied research and the amount of effort put into the entire Distributed Systems field.


It’s not a barrier because there is nothing qualitatively different at 999 vs 1000. It’s just a goal.

This is not condescending to the field at all. Crossing an arbitrary goal that is very difficult to get to is still impressive. Just stop using the “breaking the barrier” phrase.


I am being condescending about the effort put into the distributed systems field; very specifically, about classical supercomputers.

How close was I to faculty in AMPLab? Pretty close; I attended a retreat one year, helped steer funding their way, tried to hire Matei into Google Research, and have chatted with Patterson extensively around the time he wrote this (https://www.nytimes.com/2011/12/06/science/david-patterson-e...) then later when he worked on TPUs at Google.

(I'm not a stellar researcher or anything, really more of a functionary, but the one thing I do have is an absolutely realistic understanding of how academic and industrial HPC works)


> > It's mainly about setting goals that are 10-15 years away to stimulate spending and research.

> And that's not a barrier to you?

In what way is that a barrier? Barrier and goal aren't synonymous, it's just marketing speak that confuses them.

Running a marathon is not a barrier, it's a goal, even if many people can't reach it. A combat zone at the halfway point of the marathon is a barrier because it requires a completely different approach to solve.


> There is no magic threshold where science simulations suddenly work better as you scale up.

For certain problems there are absolutely magic thresholds. ML was famously abandoned for two decades because the computers were too slow, and the ML revolution has only been possible because of having roughly a teraflop at the per-machine level. Weather and climate models are another example with concrete compute targets: a whole-earth model requires about 10 teraflops (hence the Earth Simulator supercomputer), and ~1 meter resolution is an exascale-level target.
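
To put numbers on the resolution point, here is a rough back-of-the-envelope sketch (a toy scaling argument assuming a simple explicit grid model with CFL-limited time steps; the constants are illustrative, not taken from any particular code):

    # Refining an explicit 3D grid model by a factor k multiplies the cell
    # count by ~k^3 and, via the CFL condition, the number of time steps by
    # ~k, so total work grows roughly as k^4.
    def cost_multiplier(refinement_factor):
        return refinement_factor ** 4

    baseline_tflops = 10   # assumed work rate of a coarse whole-earth model
    k = 10                 # e.g. refining 10 km grid spacing down to 1 km
    print(cost_multiplier(k) * baseline_tflops, "TFLOPS of equivalent work")
    # -> 100000 TFLOPS, i.e. ~0.1 EFLOPS for a single 10x refinement

That is why each big jump in usable resolution tends to line up with a roughly 1000x jump in machine size.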


What specifically about the system changes as you cross from "almost exascale" to "definitely exascale" to justify calling it a 'barrier'?


The same reason we choose to define a Gigabyte as 10^9 or use the Richter scale to measure earthquakes.

We need a benchmark to delineate between large magnitudes.

Furthermore, there are very real engineering problems that had to be solved to even reach this point.

A lot of noobs take GPUs, SciPy, BLAS, CUDA, etc. for granted when in reality all of these were subsidized by HPC research.

The DoE's Exascale project was one of the largest buyers of compute for much of the 21st century and helped subsidize Nvidia, Intel, AMD, and other vendors when they were at their worst financially.


Did you even read the question?


You're wasting your time by engaging here.


Indeed.

I rarely do these days.


And I answered: building the logistics and ecosystem.


You did not answer the question. Reaching 0.7 exaflops also requires those logistics and ecosystem. You didn't say what changes when you reach 1.0 (because it's nothing).

It's an easy to understand mark on a very smooth difficulty curve. Not a barrier.


He asked what is significant about one exaflop that makes it a 'barrier'.

Now granted, the original rendition of this saying ("breaking the sound barrier") is also arbitrary, since Mach 1 is just the speed of sound travelling through air on planet Earth, but it's still a valid question that you did not answer.


Breaking the sound barrier isn't arbitrary. Indeed, even measuring it using the Mach number shows that: Mach 1 isn't fixed; it varies based on a number of attributes.

At Mach numbers above 1 the compressibility of the air is entirely different. The medium in which the airplane operates behaves differently, in other words. The sound barrier was a barrier because the planes they were using stopped behaving predictably at Mach > 1. They had to learn to design planes differently if they wanted to fly at those speeds.

Mach 1 is an external constraint mandated by the laws of physics. There is a good reason that sound can't travel faster.

That is why it is a barrier to be broken. It is a paradigm shift imposed entirely by the properties of our physical world.
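
To make that concrete, here is a minimal sketch using the ideal-gas speed-of-sound formula a = sqrt(gamma * R * T), with standard constants for dry air:

    import math

    GAMMA = 1.4      # ratio of specific heats for dry air
    R_AIR = 287.05   # specific gas constant for dry air, J/(kg*K)

    def speed_of_sound(temp_kelvin):
        # Ideal-gas speed of sound a = sqrt(gamma * R * T), in m/s
        return math.sqrt(GAMMA * R_AIR * temp_kelvin)

    print(round(speed_of_sound(293.15)))  # ~343 m/s at 20 C near sea level
    print(round(speed_of_sound(216.65)))  # ~295 m/s at cruise altitude (about -56.5 C)

So "Mach 1" isn't a fixed speed at all; it moves with the state of the air the plane is flying through, which is exactly why the barrier is defined by the physics of the medium rather than by a round number.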


> Just because you didn't do well in your systems classes or you think spinning up an EC2 instance on AWS is "easy" doesn't mean it's an easy problem.

Please leave personal attacks out of this, it is not in the spirit of HN, or in helping to see people's perspectives.

I'm not saying it's not a hard problem, not at all. I respect the hell out of the work that has been done here.

I'm saying that the sound barrier is called a barrier for a very good reason. Aerodynamics on one side of the sound barrier are different from aerodynamics on the other. It is a different game entirely. That is why it is considered a barrier. A plane that has superb subsonic aerodynamics will not perform well on the other side of that barrier.

Exascale computers, on the other hand, while truly amazing, are not operating differently by hitting 10^18 FLOPS. If your computer does 10^18 - 20 FLOPS, it is not operating under a fundamentally different set of rules than one running above the exaflop benchmark.

I never said that the achievement wasn't laudable. I argued that there is no barrier there.

If I'm wrong I would appreciate you explaining why doing things at 10^18 FLOPS is fundamentally different than computing just below that benchmark.


Because the unit used to measure compute performance is FLOPS [0], and most societies have settled on base-10 as their numerical system of choice for thousands of years. Furthermore, it is not possible to predict the exact number of cycles you'll need, since that depends on the architecture, the compiler, and many other factors. This is why FLOPS are used as the unit of choice.

Each jump in FLOPS by a factor of 10^3 is a significant problem in concurrency, IO, parallelism, storage, and existing compute infrastructure.

Managing racks is difficult, managing concurrent workloads is difficult, managing I/O and storage is difficult, designing compute infra like FPGAs/GPUs/CPUs for this is difficult, etc.
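
As a minimal sketch of what a FLOPS figure actually measures (a toy matrix-multiply timing, nothing like the full HPL benchmark used for the Top500 rankings):

    import time
    import numpy as np

    n = 2048
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    start = time.perf_counter()
    c = a @ b
    elapsed = time.perf_counter() - start

    # A dense n x n matrix multiply does roughly 2*n^3 floating point operations.
    flops = 2 * n**3 / elapsed
    print(f"~{flops / 1e9:.1f} GFLOPS sustained on this machine")

Sustaining that kind of number across tens of thousands of GPUs, a parallel file system, and an interconnect is where the real engineering lives.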

[0] - https://en.m.wikipedia.org/wiki/FLOPS


So it's just an arbitrary base-10 number, and not a constraint imposed by physics or some other outside force, like the sound barrier?

That's kind of my point: an exaflop is a benchmark and not a 'barrier'. The sound barrier wasn't a 10^3 change in speed; the difference between a subsonic plane and a supersonic plane is measured in percentages when it comes to speed, e.g. a plane that is happy with Mach 0.8 as a top speed is going to be designed under a completely different set of rules than one that tops out at Mach 1.2.

Again, I'm not saying that the accomplishments are insignificant, I'm just arguing press-release semantics here.


If you're not being facetious I recommend listening to the 2022 ACM Gordon Bell Award Winner lecture.

What you're doing is the equivalent of asking why do we use Gigabyte or Terabyte as a metric.

Just reaching 10^18 floating point operations per second was not something that was done until 2018.


No, it’s not equivalent at all. Nobody is asking about the unit of measure.


It is definitely not incremental. If you watch some of the talks on the Gordon Bell prize work for Frontier, you see that the dynamics have changed completely.

Data, the software stack, and I/O have suddenly become bottlenecks in multiple places. So yes, it is a watershed moment.


Can you explain how 10^18 FLOPS is fundamentally different than (10^18 - 20) FLOPS? Do the conventional rules of computing completely change at that exact number?


https://www.osti.gov/servlets/purl/1902810

See this for example. Different applications have different scales at which they reach similar problems. Exascale is a decent upper bound for most of the fields. If you really dig in deep, the bound may be found at a slightly lower value, > 500 Pflops. But it's a good rule of thumb to consider 1 EFlop/s to be safe.

Also see this https://irp.fas.org/agency/dod/jason/exascale.pdf


But that same point was made at every three orders of magnitude of improvement in supercomputing history.


For example, in situ visualization is now preferred due to the I/O bottleneck in applications rather than the compute bottleneck. This seems different from previous generations.
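
A rough, hedged back-of-the-envelope (the snapshot size and bandwidth below are assumptions for illustration, not any real machine's figures) shows why writing full snapshots for post-hoc visualization stops being practical at this scale:

    # Illustrative numbers only.
    snapshot_bytes = 1e15    # assume a ~1 PB simulation state snapshot
    fs_bandwidth = 5e12      # assume ~5 TB/s aggregate file-system bandwidth
    seconds_per_dump = snapshot_bytes / fs_bandwidth
    print(f"~{seconds_per_dump:.0f} s just to write one snapshot to disk")

    # If the solver produces interesting state every few seconds, dumping full
    # snapshots at that cadence is hopeless, hence rendering in situ instead.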


“It didn't turn computer scientists into smoking holes in the desert when it went wrong. “

Shame, really; HPC could use a little bit of high-stakes adventure to make it sexier. (Funny how risking death makes things more attractive?)

There’s something about working with equipment where the line between top performance and a smoking hole is a matter of degree.

Also opens up a lot more Netflix production opportunities and I bet code safety would get a bump as well.


100% agreed. I might put my thoughts about it into an essay and name it "Breaking barriers considered harmful".



