Hacker News
The Linux audio stack demystified (rtrace.io)
134 points by ruffyx64 43 days ago | hide | past | favorite | 54 comments



Wrote this blog article as I needed to get a better understanding of the audio stack on Linux (esp. PipeWire, PulseAudio, ALSA, etc.). The article turned out to be a lengthy in-depth explanation of how audio works, how digital audio works, and what sound servers on Linux actually do. Tried to write it in a way that is accessible and understandable for beginners but also enlightening for experienced users. Hope it's helpful to HN.


My experience:

I'm interested in how Linux Audio works. The first half of the article covers other topics. It could be a separate article. An article focused on Linux Audio could say "For audio basics, click this link to my article on Audio Basics."

Even for beginners, that's useful because even beginners just want to get sound out of their speakers and anatomy and physics lessons are in the way. It's ok to start with ALSA. There's no need to boil the ocean.


Learning by trying to teach is probably the best way to clarify and crystallize what we think we know. Always appreciate these kinds of posts, especially since they tend to shine a light on all the contextual bullshit that experts take for granted.

Right now I’m doing the same for an identification/contextual guide of local weeds and insects for seasonal scouts (I’m an agronomist). Unfortunately I find complexity tends to quickly become fractal and highly interlinked and it’s hard to set an entry point or tell when to limit scope.

I think you’ve done a great job of doing just that.


A friend of mine who is a doctor said that when he was learning the saying was “watch one, do one, teach one.”

He made a morbid modification; “watch one, botch one, do one, teach one.”


I really appreciate blogs/articles like this. It really helps me get beyond the surface on things and I always learn something. Thanks for taking the time to share.


I can explain it much more simply:

"At first Linus created /dev/dsp, and the user did smile upon him, and the user did see that it was good, and the user did see that it was simple, and people did use their sound, and people did pipe in and out sound as they did please, and Ken Thompson Shined upon them for following the way"

"Then the fiends got in on it and ruined it all, with needless complexities and configurations and situationships, with servers and daemons, and server and daemon wrappers to wrap the servers and daemons, and wrappers for those server wrappers, and then came security permissions for the server wrapper wrapper wrappers, why doesn't my sound work anymore, and then the server wrapper server wrapper wrapper server did need to be managed for massive added complexity, so initd was replaced by systemd, which solves the server wrapper wrapper server server wrapper through a highly complicated system of servers and services and wrappers"

RIP /dev/dsp you will be missed

- Kernighan 3:16
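For the youngsters: it really was that simple. A rough Python sketch, assuming an OSS-style /dev/dsp (e.g. via the osspd compatibility shim on modern kernels):

```python
import math

def sine_pcm(freq_hz=440.0, rate=8000, seconds=1.0):
    """Generate 8-bit unsigned mono PCM, the historical /dev/dsp default format."""
    n = int(rate * seconds)
    return bytes(
        int(128 + 127 * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n)
    )

# Playing it back was a single write to a file (OSS systems only,
# or via the osspd shim on modern kernels):
#
#   with open("/dev/dsp", "wb") as dsp:
#       dsp.write(sine_pcm())
```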


I'm not going to be routing 6 input streams into VSTs then out my stereo monitors with just /dev/dsp.


Thanks for the nice writing. But do you have any insight into why Bluetooth audio is so clunky on Linux? I'm using a pair of Sony XM4s and I have never had any problems on my 4 Windows machines. But on Ubuntu (both 22.04 and 24.04), I have had to jump through many hoops, from editing a bunch of config files, changing kernel flags, and disabling and enabling a bunch of things I don't understand (mostly from reading the Arch Wiki), just to get it working some of the time. Some days it will just outright refuse to connect, sometimes it connects but doesn't play anything (switching the audio device to it generates some undecipherable error logs), and (probably worst) sometimes it connects very quickly but stays locked in low-fidelity mode instead of acting as an A2DP sink. I'm so fed up that I just switch to wired headphones every time I use my Ubuntu machine.


I also have XM4's and they worked fine on Arch after addressing two problems:

Do you dual boot? Different OS's on the same computer will generate different pairing keys even though they share the same MAC, and this will cause connection issues. Usually that's reported as having to re-pair every time you switch OS's though.

https://unix.stackexchange.com/questions/255509/bluetooth-pa...

I've also experienced audio skipping & popping using a dual WiFi/Bluetooth card that were eliminated by disabling WiFi. Apparently the Linux driver was faulty and allowed some interference; the card worked fine on Windows.


Thank you for linking the SE thread! So this is the reason I've been having so much trouble with my BT devices recently. I've used Linux as my daily driver for years but started dual-booting Windows a few weeks ago, and I've had to re-pair every single time I switch systems. I just chalked it up to generic Bluetooth issues.


Debian does not ship the AAC codec, due to legal quagmire surrounding the necessary code. The same probably goes for Ubuntu. That might be the cause of at least some of your problems. https://tookmund.com/2024/02/aac-and-debian


It's so clunky, IMO, because Bluetooth is a dumbass protocol with things in the standard that should not be there (including which audio codecs are supported with which levels of Bluetooth). Rather than just being a simple network of wireless devices, it's a very complex protocol which makes everything more complicated.

Why you may have struggled could be anything from the firmware blob for your Bluetooth device, to the kernel driver installed, to BlueZ, to the sound server you are using. Any one of those things messing up will lead to a bad experience.

I've had a relatively good experience with kde-plasma's bluetooth management stuff. But I still have to do dumb things like manually selecting which audio codec to use when I go on a call.

How could bluetooth be better? It should be at least 2 standards. 1 defining the wireless data transfer and network capabilities, a second which defines how a computer negotiates with a device to send audio. It shouldn't be 2 standards merged together like it currently is. Wifi Direct is more what bluetooth should be.


> It's so clunky, IMO, because bluetooth is a dumbass protocol with things in the standard that should not be there

And yet GP has no issues on Windows...

> Why you may have struggled could be anything from the firmware blob for your Bluetooth device, to the kernel driver installed, to BlueZ, to the sound server you are using. Any one of those things messing up will lead to a bad experience.

Ah, so actually the complexity and instability of the Linux audio stack _could_ be at fault after all. But let's blame the protocol instead, even though it works fine on other operating systems.

To be fair, I agree that BT is a mess. And I've personally also had bad experiences on Windows with it. But the insanity of the Linux audio stack is indefensible. It's a major part of the problem, even if BT were a flawless and simple protocol.


> And yet GP has no issues on Windows...

How well Bluetooth works depends largely on the quality of drivers from chipset manufacturers. As you can imagine, manufacturers put a priority on making sure their Windows drivers function well.

You'll also notice that Android (usually) has well-functioning Bluetooth drivers even though it's ultimately the same Linux kernel under the covers.

> Ah, so actually the complexity and instability of the Linux audio stack _could_ be at fault after all. But let's blame the protocol instead, even though it works fine on other operating systems.

The Linux audio stack doesn't help things, for sure; however, a lot of the complexity between the audio stack and Bluetooth revolves around the fact that Bluetooth requires a well-implemented driver to work well with the audio stack.

If you compare it to something like a regular sound card you'd quickly see why that's the case. For a sound card, the manufacturer just needs a driver that can convert PCM into sound waves. The interface is quite simple, which is why you generally don't see issues between the Linux audio stack and a hard-wired sound card/chip.

That's why I blame the protocol more than the stack. The protocol is very complex (needlessly so). So instead of something that could just be "send these packets to this device" you have to hope and pray that the driver you are integrating with has properly coded up various codecs needed to talk to your headphones. Instead of just throwing a bitstream at a device you are now stuck with your audio stack negotiating with the driver about which codecs to select before sending in an audio signal. This is part of what adds complexity to the audio stack in the first place.

You end up with two routes through the audio stack: one for all other sound-producing and -receiving devices, and another for Bluetooth.


> I'm using a pair of Sony XM4 and I have never had any problems on my 4 Windows machines. But on Ubuntu (both 22.04 and 24.04), I have had to jump through many hoops [...]

I also have XM4's (best headphones in my life; seriously, they've saved my sanity and lowered my stress levels, more than a few times), but I never had any problems with BT pairing. I use them with my phone, Ubuntu, OpenSUSE, ArchLinux and macOS, although not Windows, and they always pair up perfectly fine. I have two-device mode activated at all times.

My SO uses them (she has her own XM4's) with Windows and her phone, and also never had any problems.

Maybe it's a hardware issue?


I have no issues with bluetooth. Just click on the device, associate and then it works. After the 1st time just being on is enough.


I use Arch Linux and have never had an issue pairing Bluetooth with anything. In fact, IMHO, it works much more smoothly than Windows because I keybind bluetoothctl to connect to any Bluetooth headphones, speakers, keyboard or whatever automatically using their Bluetooth device IDs. To do this you must first pair them (I use the blueman-manager GUI), then get their Bluetooth device IDs and keybind the bluetoothctl command. All of this is easy to do by asking ChatGPT. Hope this helps.


I've never done much with Bluetooth under desktop Linux, but that sounds like a woeful pain in the ass compared to the usual steps for Android or Windows:

1. Pair headphones in a couple of clicks/taps; sound comes out.


You can just pair as usual, yes, like any other OS, via a similar gui. And the device will then reconnect in the future.

What the parent is describing is an advanced flow, that can be helpful if you have lots of computers & need to juggle bt devices.

Setting up a hotkey just takes some pre-work. This workflow is optional, but it saves time & effort if for some reason you are one of the very few users who moves devices around a lot.


A hotkey is more work than GP is describing. Pairing is a one-time thing, after that they connect automatically when the headphones are on and nearby.

...which, also, is exactly what mine do with Ubuntu. I used bluetoothctl to pair them once when I first got them, and when I turn them on Ubuntu automatically connects and switches the audio over. I don't have the same model headphones as GGGP, so I'm guessing it's a problem specifically with that model's implementation (Edit: or from another person who has the same model and no issues, perhaps some combination of hardware/software specific to that user).


I think we're actually somewhat in alignment, but when you say

> Pairing is a one-time thing,

You ignore the two scenarios I face regularly, that stem from me having lots of devices and lots of computers & wanting to switch around what's paired to what.

We both seem to be trying to defeat the notion that using Bluetooth in Linux is hard or special (it's not at all, it works like anywhere else, and these reports of it being hard are from people with at best extremely small domains of experience & knowledge).

I was trying to add that Linux has further upsides for when you do want to go further, and to highlight & interpret the parent post to show how I have those issues & describe how adding hotkeys (something only Linux does) would help me, an advanced user juggling many systems & devices. I've clarified my post to mention that auto-reconnecting will just work in most scenarios (but I get why some folks might think it's cool to have hotkeys).


Yes the couple of clicks is the pairing. You have to pair.


Then this keybinding and device ID management business accomplishes what, exactly, other than exercising extra steps?


He likes to do it from command line. The steps are always the same.


I miss the simplicity of OSS :\


Hardware gets more nuanced and Linux needs to accommodate it. Otherwise we'd be stuck with blurry fonts and no UI scaling like it's 2014


>Otherwise we'd be stuck with blurry fonts

Things have only gotten worse as Pango has killed off bitmap font support. We already had crisp, clean, sharp, beautiful fonts, and apparently that upset some people who have more power than they ought to have. Back in 2014 everything was grand. You have to choose your terminal emulator and other programs carefully now.

It's insane that people get monitors so pixel dense you can't use them normally, and post-scaling you have equal or less usable space than the monitors of old, just to avoid blurry fonts that didn't even need to be used in the first place. Then people try to use circular logic to justify it all.


Consumer grade audio hardware has not gotten any more "nuanced" for several decades now. For the vast majority of use cases OSS was perfectly fine and it offered more than enough API to handle new features.

For the small minority of uses cases where you might have two sound cards and you may want to do some kind of sample accurate combined production between the two at very low latencies, sure, OSS was _somewhat_ inadequate.

So we ended up with a giant complicated audio stack where the boundaries between kernel space and user space are horribly blurred and create insane amounts of confusion and lost hours to benefit the 1% of users who might actually use those features.

It was a complete mistake.


OSS was inadequate from the day it was introduced; it couldn't even handle hardware available at the time (the GUS, for example). It was really just a mapping of the SoundBlaster onto a device file. For a single process, of course; all the others would have to wait, mute. For mixing multiple inputs, you would need that dreaded daemon, or GUS-like hardware with enough channels that you won't run out of them. But then, mixing them on the CPU is more efficient than pushing them all over an external bus.

In a modern computer, you might have more sound cards than you are aware of: the onboard sound codec, the outputs on your graphics card (the thing that pushes sound over DP/HDMI is a separate "sound card"), you might have some USB device (soundbars on monitors are usually USB sound devices), heck, even microphones from the last two decades have their own outputs. Webcam? Another sound device. Gamepad? That one too. And that's before anyone connects anything Bluetooth. So it is not a small minority; in fact, it is the vast majority.

The audio stack's boundary is in user space, period. It does stuff that doesn't belong in the kernel and is a perfect candidate for a daemon.
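The core of what such a mixing daemon does is small; a toy Python sketch (the function name and the saturating-clip approach are mine, not any particular sound server's):

```python
def mix(streams):
    """Sum 16-bit PCM sample streams and saturate back to the int16 range."""
    length = max(len(s) for s in streams)
    out = []
    for i in range(length):
        acc = sum(s[i] for s in streams if i < len(s))
        out.append(max(-32768, min(32767, acc)))  # saturate, don't wrap around
    return out

left_app = [30000, -30000, 100]    # e.g. a music player
right_app = [10000, -10000, 200]   # e.g. a notification sound
mixed = mix([left_app, right_app])
# → [32767, -32768, 300]
```

The clipping step is why competing clients can share one device at all; the real daemons add resampling, format conversion, and per-stream volume on top of this.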


> it couldn't even handle hardware available at the time

I would not call the GUS "consumer hardware." It was also the case that most games offered support for it, but most companies did not put significant effort into it, and the support was either broken or buggy.

> For a single process, of course

ALSA is no different. dmix is purely in userspace. Which is why it has IPC keys that you can configure, and have to configure under certain circumstances.

> you would need that dreaded daemon.

You could use any of a number of different daemons depending on your particular use case and you weren't required to make one of them work or keep it compatible with your kernel driver versions. The OSS audio API was completely stable. The ALSA audio API eventually was.

> So it is not a small minority, in fact, it is the vast majority.

The import of my comparison is that the problem with OSS is attempting to use these cards /simultaneously/ in a "sample accurate and low latency way." OSS could, of course, handle multiple different cards and devices (easy as /dev/dsp0 vs /dev/dsp1). It did not offer any way to time them with a common reference, which made it inadequate for certain types of _professional recording_ scenarios.

You have not, so far, described anything OSS could not do.

> Audio stack boundary is in user space; period.

Yea, except the timing, which is effectively the only benefit ALSA brings over OSS. Which, by the way, is a feature that is not at all in user space.

> that doesn't belong to kernel and is a perfect candidate for a daemon.

The one you dread?


I use Void Linux, and find it reasonably simple :) (the reason I like the distro essentially)

Nothing against complex things, if that's your thing though. (usually complex things are made to be 'easier'/more convenient to operate too, for some definition of easier)


I think they meant OSS (Open Sound System), not OSS (Open Source Software). In the Linux space, OSS predates ALSA.

(Back in the OSS days, we tended to use the term "free software" or even "copyleft" more than we did "OSS" to describe software licensing.)


I think OSS is still a default sound framework on FreeBSD?


Yeah, IMHO the best audio experience Linux ever had was with OSS v3 and a sound card that did hardware mixing. No software mixers like ESD or aRts were needed.


There have been no cards that can do hardware mixing in production for more than 15 years. This is delusional.

Also, the cards that could do that back in the day were, audio quality speaking, shite.

If that's really what you consider "the best audio linux has ever", I think you don't know audio on linux very well.

I will grant you one thing: if you did have one of those cards, it certainly made multiple applications all playing (same sample rate) audio at the same time as easy as it could be. But that's all.


> There have been no cards that can do hardware mixing under production for more than 15 years. This is delusional.

I think you are mistaken. (And a bit rude.)

https://us.creative.com/p/sound-blaster/sound-blaster-audigy...


Alright, fair enough.

That's totally the exception however, whereas back in, say, 2000, such devices were the norm. These days, the current crop of prosumer/proaudio audio interfaces (both PCI and USB) do not offer this sort of facility.

Yeah, that was a bit rude, sorry.


An informative article for the Linux parts, I skipped the basics/intro.

I’d like to see some more detail on the rating chart, particularly on the axes where pipewire doesn’t surpass JACK/pulseaudio.

As an embedded software engineer who deals with processing at hundreds of kilohertz, it is funny hearing anything running Linux called “real time”.

If it’s not carefully coded on bare metal for well understood hardware, it’s not real time, it’s just low latency. No true Scotsman though(looking over my shoulder for the FPGA programmers).


So far the audio section is a great intro to audio and digitization, and applies to any A-to-D process at some level. Looking forward to plowing through the rest.

The problem with audio is it's realtime (isochronous), which means good audio processing requires a guarantee of sorts. To get that guarantee requires a path through the system that's clear, which can be difficult to construct.


Well, the rest of the article read a bit like a readme summary of the various audio daemons.

I was hoping to see more of the "how does Linux audio guarantee low latency/timing guarantees" kinds of things, especially when IPC is known to be slow. What kinds of shortcuts through the kernel are there to get those processing guarantees, if any? Etc. etc.

Still, coming from someone who is clueless about Linux audio subsystems, the article gave me a starting point so I know where to dig.


"Professional audio will typicall utilize 24-bit. Everything higher than that is usually bogus. Bogus where only audiophiles will hear a difference." Does he mean internal DAW bit rates like 64/32bit float are bogus, I am probably reading it wrong ?


If you listen to an audio file at 24 bit vs 64 bits (bit depth, not bitrate), you won't notice a difference. However, if you're manipulating audio in a DAW or similar, it's possible for noise to end up amplified in the final output, so a higher bit depth could make a difference.

Think of it this way: every time you add a filter or any type of audio manipulation in your DAW, you're discarding some information and replacing it with noise (how much depends on what manipulation you're doing, but it's almost always >0). If you start at 24 bits and then don't manipulate anything, it's all good. But if you start at 24 bits and then lose 10 bits of the true signal, you're down to just 12 bits of information. But if you start at 64 bits, you can lose 40 bits before you start to notice anything (or really it depends quite a lot on many different factors, but in general there's a threshold where noise goes from "not noticeable" to "noticeable" and it's probably usually between 8 bits and 32 bits).
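You can simulate that gist directly; a toy Python sketch (the halve/double loop stands in for a chain of plugins that should cancel out, and the numbers are purely illustrative):

```python
import math

def quantize(x, bits):
    """Round a [-1, 1] signal onto a signed integer grid of the given depth."""
    scale = 2 ** (bits - 1) - 1
    return round(x * scale) / scale

# One sample value, pushed through gain changes that should cancel
# exactly: halve 10 times, then double 10 times, requantizing each step.
x_true = math.sin(1.234)
x16, x24 = x_true, x_true
for _ in range(10):
    x16, x24 = quantize(x16 / 2, 16), quantize(x24 / 2, 24)
for _ in range(10):
    x16, x24 = quantize(x16 * 2, 16), quantize(x24 * 2, 24)

err16 = abs(x16 - x_true)   # rounding error compounds at each step
err24 = abs(x24 - x_true)   # the 256x finer grid loses far less
```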

Don't quote me on the details (I am not an audio engineer or anything even slightly related), but that's the general gist of it.


I read them as talking about listening, as represented in mentioning audiophiles.

The extra depth/range available in DAW's are useful for effects processing, mixing, and mastering and are a little colored by trying to squeeze max-performance DSP on a general-purpose/commodity CPU. I just don't take them as talking about that here though.


And the bits are basically free. If we had very cheap 24-bit floats and nothing bigger, maybe we'd use those, but we've got cheap 32-bit floats, so those are fine.

The most important property of floating point is "infinite headroom". In integer space, sixteen times quieter means 4 fewer bits of audio, get the levels wrong badly enough and people can hear your mistake even if you fix it later - but in float space it barely makes any difference, so long as the levels are correct in the final consumed audio nobody cares.
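A minimal sketch of that headroom point, with a hypothetical 16x gain cut and restore (sample values are illustrative):

```python
def int16_chain(x):
    """Attenuate by 16x, then restore, staying on a 16-bit integer grid."""
    q = round(x / 16)      # the 4 low bits of the sample are gone here
    return q * 16

def float_chain(x):
    """Same gain moves in floating point: the exponent absorbs the change."""
    return (x / 16.0) * 16.0

sample = 12345
assert int16_chain(sample) != sample   # detail lost to rounding
assert float_chain(sample) == sample   # roundtrips exactly
```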


We would NOT use 24-bit floats, since that would make them less than ideal at matching the hypothetical (and almost certainly never reached) 24-bit resolution of integer DAC/ADC hardware.

The reason why 32 bit floats work great is that they can handle a 24 bit integer without any loss, and then if for some reason the values get kicked up above the maximum you can represent there, you get subtle noise rather than heavy distortion.
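That "handle a 24 bit integer without any loss" claim is easy to check in Python, since an IEEE 754 single has a 24-bit significand:

```python
import struct

def to_f32(x):
    """Round-trip a number through an IEEE 754 single (32-bit float audio)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Every 24-bit signed integer sample survives the trip exactly:
assert all(to_f32(v) == v for v in (-(2**23), -1, 0, 1, 2**23 - 1))

# One step past the 24-bit significand, exactness is gone:
assert to_f32(2**24 + 1) != 2**24 + 1
```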


I don't think I agree. As you say, those extra few bits in your integer PCM are probably just noise, worse they might be correlated noise. They're not worthless, but I can't agree that they're automatically better than the infinite headroom option.

We don't have a world with 24-bit float DAWs; in our world stuff tends to offer 32-bit float, and so that's a no-brainer. But just as I'm sure a 14-bit CD would have been perceived much the same as our world's 16-bit CD (bad engineers would do a bad job with it, good engineers would learn to use it well, some people would hate it for no reason), I think 24-bit float in the studio would have similar fans to 32-bit float.


“16 times quieter” is not 4 bits.

“Half volume” is subjective, and for music is typically between 6 and 10dB (most US audio engineering classes use 10dB).
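Both readings hinge on which scale you mean: 16x smaller in raw amplitude is exactly 4 bits (about 24 dB), while the ~10 dB "half as loud" figure is perceptual. A quick Python check of the arithmetic:

```python
import math

def amplitude_db(ratio):
    """Decibel change for an amplitude ratio (20 * log10)."""
    return 20 * math.log10(ratio)

# In sample-value terms, a 16x amplitude drop really is 4 bits...
assert math.log2(16) == 4
# ...which corresponds to roughly 24 dB:
assert abs(amplitude_db(16) - 24.08) < 0.01
# A perceptual "half as loud" of 10 dB is only about a 3.16x
# amplitude ratio, i.e. well under 2 bits.
assert abs(10 ** (10 / 20) - 3.162) < 0.001
```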


I think he's kind of wrong. As you say, anything going through any kind of professional audio editing software is probably 32/64 bit float. AFAIK all audio plugin standards work on 32/64 bit floats.

Although I imagine at least historically that's more because 32 bit floats are a native data type.


I don’t deal with audio, but I do use high frequency DACs/ADCs.

I have never found a DAC that actually has useful/detectable output differences above 16-18 bits. I’m not talking about audible, I mean with oscilloscopes. Many DACs take 32 bit inputs, but those extra bits aren’t useful in the real world.

The integral and differential non linearity of DACs in the real world make those extra bits misleading.


Very nice article, I love posts that go right from the basics and build up to answer the question. And I certainly have a better understanding of DACs as a bonus!


Dupe from three days ago by the same author https://news.ycombinator.com/item?id=41042753


No mention of AoIP. I make heavy use of Netjack2 in my production / streaming studio. Great way to move 25/30 channels of audio between 5 PCs in real-time.

Beats the pants off DANTE.


PipeWire is starting to get AES67 support, which seems to be the audio and/or video streaming standard the industry is rallying around. PTPv2 vs DANTE's PTPv1, and just a much clearer protocol. I'm so excited for it! https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/AES...

There's a bunch of neat hardware listed in a ticket thread that folks have been playing with. Bluetooth to AES67 adapters, analog to AES67, whole huge video wall streamers. https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/32...


Well, the most confusing part of linux is definitely the audio stack. Thanks for the writeup.



