sfbea's comments

sfbea · on March 1, 2024

Apologies, the benchmark is fine. The system allocator is faster than I expected because Linux's slab allocator takes over for especially small allocation sizes, and it's terrifically fast.

I'm changing up my random-actions benchmark to display results over various allocation sizes, as some allocators do much better than others at different sizes. As a heads up, Frusa takes a large hit at higher allocation sizes. Perhaps tuning bucket sizes or something could help? I'll try to have the benchmarks on GitHub this weekend so you can play around with them, if you'd like to investigate.

sfbea · on Feb 29, 2024

Your results caught me off guard. In particular, the (Linux) system allocator is too fast. I think the simplicity of the benchmark (allocating and immediately deallocating) might be causing issues... perhaps unwanted optimizations? I'm not sure.
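To illustrate the "unwanted optimizations" concern: a compiler is allowed to elide an allocation whose result is never observed, so a bare allocate-then-free loop can end up measuring very little. Below is a minimal sketch of one way to guard against that with `std::hint::black_box`. It is not the benchmark under discussion; `time_alloc_free` and its parameters are hypothetical names, and it exercises whatever the global allocator happens to be.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `iters` allocate-then-free pairs of `size` bytes (hypothetical helper).
/// `black_box` makes the pointer opaque to the optimizer, so the pair is
/// less likely to be removed as dead code.
fn time_alloc_free(size: usize, iters: u32) -> Duration {
    assert!(size > 0, "zero-sized allocations are not allowed with `alloc`");
    let layout = Layout::from_size_align(size, 8).expect("valid layout");
    let start = Instant::now();
    for _ in 0..iters {
        unsafe {
            let ptr = black_box(alloc(layout));
            assert!(!ptr.is_null(), "allocation failed");
            // Touch the memory so the allocation is actually used.
            ptr.write(0xAB);
            dealloc(black_box(ptr), layout);
        }
    }
    start.elapsed()
}

fn main() {
    let elapsed = time_alloc_free(64, 1_000_000);
    println!("1M alloc/free pairs of 64 bytes: {:?}", elapsed);
}
```

Even with `black_box`, this pattern mostly hits an allocator's fastest path (freeing the block it just handed out), so it says little about fragmentation or mixed sizes.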

On my random actions benchmarks (this resembles real allocation patterns somewhat better?):

- 1 thread: Talc is faster than Frusa and System, Frusa is comparable to System

- 4 threads: System is fastest, Frusa does about half as well, Talc does about half as well as Frusa

Our benchmarks agree on the Frusa vs Talc comparison.
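For anyone unfamiliar with the term, a "random actions" workload can be sketched roughly as follows: at each step, randomly either allocate a block of random size or free a randomly chosen live block. This is a single-threaded sketch under my own assumptions, not the actual benchmark code; `random_actions`, `XorShift`, and all parameters are hypothetical names, and it uses the global allocator rather than Talc or Frusa specifically.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Performs `actions` random allocate/free steps, keeping at most `max_live`
/// blocks alive, each between 1 and `max_size` bytes.
/// Returns (allocations made, frees made during the run).
fn random_actions(actions: usize, max_live: usize, max_size: usize, seed: u64) -> (usize, usize) {
    assert!(max_live > 0 && max_size > 0 && seed != 0);
    let mut rng = XorShift(seed);
    let mut live: Vec<(*mut u8, Layout)> = Vec::with_capacity(max_live);
    let (mut allocs, mut frees) = (0, 0);

    for _ in 0..actions {
        let want_alloc = rng.next() % 2 == 0;
        if live.len() < max_live && (want_alloc || live.is_empty()) {
            // Allocate a block of random size and remember it.
            let size = 1 + (rng.next() as usize) % max_size;
            let layout = Layout::from_size_align(size, 8).expect("valid layout");
            let ptr = unsafe { alloc(layout) };
            assert!(!ptr.is_null(), "allocation failed");
            unsafe { ptr.write(0) };
            live.push((ptr, layout));
            allocs += 1;
        } else {
            // Free a randomly chosen live block (`live` is non-empty here).
            let idx = (rng.next() as usize) % live.len();
            let (ptr, layout) = live.swap_remove(idx);
            unsafe { dealloc(ptr, layout) };
            frees += 1;
        }
    }

    // Release whatever is still live so the sketch does not leak.
    for (ptr, layout) in live.drain(..) {
        unsafe { dealloc(ptr, layout) };
    }
    (allocs, frees)
}

fn main() {
    let (allocs, frees) = random_actions(100_000, 256, 4096, 0x9E37_79B9_7F4A_7C15);
    println!("{allocs} allocations, {frees} frees");
}
```

Wrapping the call in a timer, and running one instance per thread, gives a workload that mixes sizes and lifetimes instead of always freeing the block that was just allocated.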

Benchmarks aside, Frusa seems neat. In particular, I had some misconceptions about how to tackle concurrency in Talc which Frusa's code demonstrates not to be true. I may give writing a concurrent version of Talc another shot soon.

sfbea · on Feb 29, 2024

Because the glibc allocator is designed for hosted systems with threading (it uses pthreads) and memory management utilities not found on bare metal or other smaller platforms. You shouldn't be using Talc where mimalloc, jemalloc, the glibc allocator, etc. would be used instead, except in some very particular situations. (Correct me if I'm wrong.)

I could add these benchmarks. They were there at one point in the past, but it's a disingenuous comparison unless the reader understands the particulars of the workload and the particulars of the tradeoffs each allocator makes. Talc will probably beat these allocators in single-threaded allocation, but will suffer under heavily multithreaded loads. It also does not currently have the system integrations to release unused blocks of memory back to the system (this can be achieved to a degree via the OOM handler system, but I haven't yet implemented something like this), nor will it be making syscalls like mmap/sbrk at all.

There is the case where you'd want a faster single-threaded allocation pool within a larger application though, which is a case to be made for using Talc when you have access to the system allocator or mimalloc/jemalloc. Perhaps I'll set up something for that.

sfbea · on Feb 29, 2024

Thanks for opening the issue. The allocator looks pretty interesting. Happy to try adding it to the benchmarks, although doing apples-to-apples tests with its limitations might not be possible without some changes.

sfbea · on Feb 29, 2024

[Author of talc] Glad this feature is proving useful. Seeing this makes me think I should implement a better-looking Display implementation than the default Debug impl though. Something for the next update ^-^
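To show the kind of difference meant here, this is a sketch of a derived `Debug` next to a hand-written `Display` on a made-up stats struct. `AllocStats` and its fields are hypothetical and are not Talc's actual types; the output format is only one possible choice.

```rust
use std::fmt;

/// Hypothetical statistics struct, for illustration only.
#[derive(Debug, Default)]
struct AllocStats {
    allocation_count: u64,
    allocated_bytes: u64,
    available_bytes: u64,
}

impl fmt::Display for AllocStats {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // A compact, human-oriented line instead of the struct-literal
        // style that the derived `Debug` produces.
        write!(
            f,
            "allocations: {}, allocated: {} B, available: {} B",
            self.allocation_count, self.allocated_bytes, self.available_bytes
        )
    }
}

fn main() {
    let stats = AllocStats {
        allocation_count: 3,
        allocated_bytes: 192,
        available_bytes: 832,
    };
    println!("{:?}", stats); // derived Debug
    println!("{}", stats); // hand-written Display
}
```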