Hacker News | past | comments | ask | show | jobs | submit | SuperV1234's comments

Please check the article again -- I made a mistake in the original measurements (the Docker image I used had GCC compiled in debug mode) and now the (correct) times are ~50% faster across the board.

Not free, still need to audit, but much better than before. Sorry.


The data was incorrect due to an oversight on my part (the Docker image I had used compiled GCC with internal assertions enabled).

Including <print> is still very heavy, but not as bad as before (from ~840 ms to ~508 ms).


I've updated the article to say "translation unit" the first time "TU" is introduced. The data was also incorrect due to an oversight on my part, and it's now much more accurate and ~50% faster across the board.

Update & Apology:

I've fully updated the article with new benchmarks.

A reader pointed out that the GCC 16 Docker container I originally used was built with internal compiler assertions enabled, skewing the data and unfairly penalizing GCC.

I've re-measured everything on a proper release build (Fedora 44), and the compile times are ~50% faster across the board.

The article now reflects the accurate numbers, and I've added an appendix showing the exact cost of the debug assertions.

I sincerely apologize for the oversight.


Nah, many great features are extremely cheap. E.g. constexpr, templates, fold expressions, equality operator defaulting, concepts...

constexpr means running code at compile time. Templates mean duplicating lots of code lots of times. These are not cheap.

Yeah this is the very first time I am hearing that templates are "extremely cheap". Template instantiation is pretty much where my project spends all of its compilation time.

It depends on what you are instantiating and how often you're doing so. Most people write templates in header files and instantiate them repeatedly in many many TUs.

In many cases it's possible to only declare the template in the header, explicitly instantiate it with a bunch of types in a single TU, and just find those definitions via linker.


On the few times that I have looked at clang traces to try to speed up the build (which has never succeeded) the template instantiation mess largely arose from Abseil or libc++, which I can't do much about.

Until you try to add or modify a feature of the software, run into confusing template, operator, or other C++-specific errors, and need to deconstruct a larger part of the code to find out (if possible) where they come from, then spend even more time trying to correct them.

C++ is the opposite of simplicity and clarity in code.


Yes, but they are still all hidden.

I ran some more measurements using import std; with a properly built module that includes reflection.

I first created the module via:

  g++ -std=c++26 -fmodules -freflection -fsearch-include-path -fmodule-only -c bits/std.cc

And then benchmarked with:

  hyperfine "g++ -std=c++26 -fmodules -freflection ./main.cpp"

The only "include" was import std;, nothing else.

These are the results:

- Basic struct reflection: 352.8 ms

- Barry's AoS -> SoA example: 1.077 s

Compare that with PCH:

- Basic struct reflection: 208.7 ms

- Barry's AoS -> SoA example: 1.261 s

So PCH actually wins for just <meta>, and modules are not that much better than PCH for the larger example. Very disappointing.


Update: I had originally used a Docker image with a version of GCC built with assertions enabled, skewing the results. I am sorry for that.

Modules are actually not that bad with a proper version of GCC. The checking assertions absolutely crippled C++23 module performance in the original run:

Modules (Basic 1 type): from 352.8 ms to 279.5 ms (-73.3 ms)

Modules (AoS Original): from 1,077.0 ms to 605.7 ms (-471.3 ms, ~43% faster)

Please check the updated article for the new data.


This post completely misunderstands how to use exceptions and provides "solutions" that are error-prone to a problem that doesn't exist.

And this is coming from someone that dislikes exceptions.


I avoid using exceptions myself so I wouldn't be surprised if I misunderstand them :) I love to learn and welcome new knowledge and/or correction of misunderstandings if you have them.

I'll add that inspiration for the article came about because it was striking to me how Bjarne's example, which was supposed to show a better way to manage resources, introduced so many issues. The blog post goes over those issues and talks about possible solutions, all of which aren't great. I think, however, that these problems with exceptions don't manifest into bigger issues because programmers just kinda learn to avoid exceptions. So, the post was trying to go into why we avoid them.


RAISI is always wrong, because the whole advantage of the try block is to write a unit of code as if it can't fail, so that what said unit is intended to do when no errors occur stays very local.

If you really want to handle an error coming from a single operation, you can create a new function or immediately invoke a lambda. This removes the need to break RAII and to make your class more brittle to use.

You can be exhaustive with try/catch if you're willing to lose some information, whether that's catching a base exception or using the catch-all block.

If you know all the base classes your program throws, you can centralize your catch-all and recover some information using a Lippincott function.

I've done my own exploring in the past with the thought experiment of: what would a codebase that only uses exceptions for error handling look like, and can you reason about it? I concluded you can; there's just a different mentality in how you look at your code.


No offense, but why did you decide to write an instructional article about a topic that you "wouldn't be surprised that you misunderstand"? Why are you trying to teach to others what you admittedly don't have a very solid handle on?


None taken :) I think sharing our thoughts and discussing them is how we learn and grow. The best people in their craft are those who aren't afraid to put themselves out there, even if they're wrong, how else would you find out?


I'm not very familiar with proper exception usage in C++. Would you mind expanding a bit on this comment and describing the misunderstanding?


What about the misreported errno problem?


Obviously the errno should have been obtained at the time of failure and included in the exception, maybe using a simple subclass of std::exception. Trying to compute information about the failure at handling time is just stupid.



FYI: This is a compile-time regex class implemented with C++ templates.

A 54:47 presentation at CppCon 2018 is worth more than a thousand words...

see https://www.youtube.com/watch?v=QM3W36COnE4

followup CppCon 2019 video at https://www.youtube.com/watch?v=8dKWdJzPwHw

As the above GitHub repo mentions, more info at https://www.compile-time.re/


Note that taking a 'const' by-value parameter is very sensible in some cases, so it is not something that could be detected as a typo by the C++ compiler in general.


Right. Copying is very fast on modern CPUs, at least up to the size of a cache line. Especially if the data being copied was just created and is in the L1 cache.

If something is const, whether to pass it by reference or value is a decision the compiler should make. There's a size threshold, and it varies with the target hardware. It might be 2 bytes on an Arduino and 16 bytes on a machine with 128-bit arithmetic. Or even as big as a cache line. That optimization is reportedly made by the Rust compiler. It's an old optimization, first seen in Modula 1, which had strict enough semantics to make it work.

Rust can do this because the strict affine type model prohibits aliasing. So the program can't tell if it got the original or a copy for types that are Copy. C++ does not have strong enough assurances to make that a safe optimization. "-fstrict-aliasing" enables such optimizations, but the language does not actually validate that there is no aliasing.

If you are worried about this, you have either used a profiler to determine that there is a performance problem in a very heavily used inner loop, or you are wasting your time.


Yes. For example, if an argument fits into the size of a register, it's better to pass by value to avoid the extra indirection.


> if an argument fits into the size of a register, it's better to pass by value to avoid the extra indirection.

Whether an argument is passed in a register or not is unfortunately much more nuanced than this: it depends on the ABI calling conventions (which vary depending on OS as well as CPU architecture). There are some examples where the argument will not be passed in a register despite being "small enough", and some examples where the argument may be split across two or more registers.

For instance, in the x86-64 ELF ABI spec [0], the type needs to be <= 16 bytes (despite registers only being 8 bytes), and it must not have any nontrivial copy / move constructors. And, of course, only some registers are used in this way, and if those are used up, your value params will be passed on the stack regardless.

[0] Section 3.2.3 of https://gitlab.com/x86-psABIs/x86-64-ABI


clang-tidy can often detect these, e.g. when the body of the function doesn't modify the value.

But it needs to be conservative, of course; in general you can't do this.


Level 4: switching to C++

