Hacker News | past | comments | ask | show | jobs | submit | SuperV1234's comments

Please check the article again -- I made a mistake in the original measurements (the Docker image I used had GCC compiled in debug mode) and now the (correct) times are ~50% faster across the board.

Not free, still need to audit, but much better than before. Sorry.


The data was incorrect due to an oversight on my part (the Docker image I had used compiled GCC with internal assertions enabled).

Including <print> is still very heavy, but not as bad as before (from ~840 ms to ~508 ms).


I've updated the article to say "translation unit" the first time "TU" is introduced. The data was also incorrect due to an oversight on my part, and it's now much more accurate and ~50% faster across the board.

Update & Apology:

I've fully updated the article with new benchmarks.

A reader pointed out that the GCC 16 Docker container I originally used was built with internal compiler assertions enabled, skewing the data and unfairly penalizing GCC.

I've re-measured everything on a proper release build (Fedora 44), and the compile times are ~50% faster across the board.

The article now reflects the accurate numbers, and I've added an appendix showing the exact cost of the debug assertions.

I sincerely apologize for the oversight.


Nah, many great features are extremely cheap. E.g. constexpr, templates, fold expressions, equality operator defaulting, concepts...

constexpr means running code at compile time. Templates mean duplicating lots of code lots of times. These are not cheap.

Yeah this is the very first time I am hearing that templates are "extremely cheap". Template instantiation is pretty much where my project spends all of its compilation time.

It depends on what you are instantiating and how often you're doing so. Most people write templates in header files and instantiate them repeatedly in many many TUs.

In many cases it's possible to only declare the template in the header, explicitly instantiate it with a bunch of types in a single TU, and just find those definitions via linker.


On the few times that I have looked at clang traces to try to speed up the build (which has never succeeded) the template instantiation mess largely arose from Abseil or libc++, which I can't do much about.

Until you try to add or modify a feature of the software, run into confusing template, operator, or other C++-specific errors, and need to deconstruct a larger part of the code to find out (if possible) where they come from, then spend even more time trying to correct them.

C++ is the opposite of simplicity and clarity in code.


Yes, but they are still all hidden.

I ran some more measurements using import std; with a properly built module that includes reflection.

I first created the module via:

  g++ -std=c++26 -fmodules -freflection -fsearch-include-path -fmodule-only -c bits/std.cc

And then benchmarked with:

  hyperfine "g++ -std=c++26 -fmodules -freflection ./main.cpp"

The only "include" was import std;, nothing else.

These are the results:

- Basic struct reflection: 352.8 ms

- Barry's AoS -> SoA example: 1.077 s

Compare that with PCH:

- Basic struct reflection: 208.7 ms

- Barry's AoS -> SoA example: 1.261 s

So PCH actually wins for just <meta>, and modules are not that much better than PCH for the larger example. Very disappointing.


Update: I had originally used a Docker image with a version of GCC built with assertions enabled, skewing the results. I am sorry for that.

Modules are actually not that bad with a proper version of GCC. The checking assertions absolutely crippled C++23 module performance in the original run:

Modules (Basic 1 type): from 352.8 ms to 279.5 ms (-73.3 ms)

Modules (AoS Original): from 1,077.0 ms to 605.7 ms (-471.3 ms, ~43% faster)

Please check the updated article for the new data.


This post completely misunderstands how to use exceptions and provides "solutions" that are error-prone to a problem that doesn't exist.

And this is coming from someone that dislikes exceptions.


I avoid using exceptions myself so I wouldn't be surprised if I misunderstand them :) I love to learn and welcome new knowledge and/or correction of misunderstandings if you have them.

I'll add that inspiration for the article came about because it was striking to me how Bjarne's example, which was supposed to show a better way to manage resources, introduced so many issues. The blog post goes over those issues and talks about possible solutions, all of which aren't great. I think, however, that these problems with exceptions don't manifest into bigger issues because programmers just kinda learn to avoid exceptions. So, the post was trying to go into why we avoid them.


RAISI is always wrong, because the whole advantage of the try block is to write a unit of code as if it can't fail, so that what said unit is intended to do when no errors occur stays very local.

If you really want to handle an error coming from a single operation, you can create a new function or immediately invoke a lambda. This removes the need to break RAII and to make your class more brittle to use.

You can be exhaustive with try/catch if you're willing to lose some information, whether that's catching a base exception or using the catch-all block.

If you know all the base classes your program throws, you can centralize your catch-all and recover some information using a Lippincott function.

I've done my own exploring in the past with the thought experiment of: what would a codebase that only uses exceptions for error handling look like, and can you reason about it? I concluded you can; there's just a different mentality in how you look at your code.


No offense, but why did you decide to write an instructional article about a topic that you "wouldn't be surprised that you misunderstand"? Why are you trying to teach to others what you admittedly don't have a very solid handle on?


None taken :) I think sharing our thoughts and discussing them is how we learn and grow. The best people in their craft are those who aren't afraid to put themselves out there, even if they're wrong, how else would you find out?


I'm not very familiar with proper exception usage in C++. Would you mind expanding a bit on this comment and describing the misunderstanding?


What about the misreported errno problem?


Obviously the errno should have been obtained at the time of failure and included in the exception, maybe using a simple subclass of std::exception. Trying to compute information about the failure at handling time is just stupid.



FYI: This is a compile-time regex class implemented with C++ templates.

A 54:47 presentation at CppCon 2018 is worth more than a thousand words...

see https://www.youtube.com/watch?v=QM3W36COnE4

followup CppCon 2019 video at https://www.youtube.com/watch?v=8dKWdJzPwHw

As the above GitHub repo mentions, more info at https://www.compile-time.re/


Note that taking a 'const' by-value parameter is very sensible in some cases, so it is not something that could be detected as a typo by the C++ compiler in general.


Right. Copying is very fast on modern CPUs, at least up to the size of a cache line. Especially if the data being copied was just created and is in the L1 cache.

If something is const, whether to pass it by reference or value is a decision the compiler should make. There's a size threshold, and it varies with the target hardware. It might be 2 bytes on an Arduino and 16 bytes on a machine with 128-bit arithmetic. Or even as big as a cache line. That optimization is reportedly made by the Rust compiler. It's an old optimization, first seen in Modula 1, which had strict enough semantics to make it work.

Rust can do this because the strict affine type model prohibits aliasing. So the program can't tell if it got the original or a copy for types that are Copy. C++ does not have strong enough assurances to make that a safe optimization. "-fstrict-aliasing" enables such optimizations, but the language does not actually validate that there is no aliasing.

If you are worried about this, you have either used a profiler to determine that there is a performance problem in a very heavily used inner loop, or you are wasting your time.


Yes. For example, if an argument fits into the size of a register, it's better to pass by value to avoid the extra indirection.


> if an argument fits into the size of a register, it's better to pass by value to avoid the extra indirection.

Whether an argument is passed in a register or not is unfortunately much more nuanced than this: it depends on the ABI calling conventions (which vary depending on OS as well as CPU architecture). There are some examples where the argument will not be passed in a register despite being "small enough", and some examples where the argument may be split across two or more registers.

For instance, in the x86-64 ELF ABI spec [0], the type needs to be <= 16 bytes (despite registers only being 8 bytes), and it must not have any nontrivial copy / move constructors. And, of course, only some registers are used in this way, and if those are used up, your value params will be passed on the stack regardless.

[0] Section 3.2.3 of https://gitlab.com/x86-psABIs/x86-64-ABI


clang-tidy can often detect these, e.g. when the body of the function doesn't modify the value.

But it needs to be conservative, of course; in general you can't do this.


Level 4: switching to C++

