I am not having a good time with systemd on my embedded device.
Somewhat complicated scenario: I need to change the mode of a wifi driver (which I do via modprobe), use iw to create a second interface for that wifi device, and finally use NetworkManager to create a wifi hotspot based on the hostname of the device. We set the hostname based on the MAC address of wlan0.
There are a lot of places this can go wrong, and you need a lot of deep knowledge of systemd to get it right.
I start with three services, one to set the hostname, one to create the wlan0_ap device, and one to start up the wifi hotspot using nmcli.
Alright, simple enough, this is the kind of thing systemd is good at, right?
Well no (and pardon me if my memory isn't perfect on this, it's tricky to debug weird race conditions), sometimes setting the hostname fails. Why? Well, sometimes wlan0 doesn't exist: there's a udev rule that renames wlan0 during startup. Apparently I can add the systemd tag to that udev rule, and it will create a `.device` dependency I can require. Alright, use both the `Requires=` and `After=` keywords just to be sure. Good enough.
Huh, oneshot services are listed as stopped even though they've run. Do some research, find out I need the `RemainAfterExit` keyword too. Sure, dependencies are working.
Now wlan0_ap device creation sometimes fails. Why? No really, why, this is hard to debug. Well looking around at dmesg it looks like the problem is that networkmanager is trying to do something with the device at the same time I am. Looks like it's enumerating device capabilities at the same time I'm trying to create a new device? Oh well, I'll just add the retry keyword to my service.
Oh, wait, I can't add the retry keyword to one-shot services? Really? Internet tells me to change the service type to simple. Sure, we'll try that. Now that wlan0_ap device exists, let's start our access point.
Also, it's still sometimes failing on the hostname step? The set-hostname command is asynchronous, I think? I'll add a line to the start_access_point script that waits until the hostname isn't the default.
Everything is looking good. Oh wait, it's failing intermittently. What the hell, why doesn't wlan0_ap exist? Service type `simple` considers a service to be started when the command first launches, not when it exits successfully. I've introduced another race condition.
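A wait like that can be sketched as a small polling helper. This is only an illustration, not the actual start_access_point script: the names `wait_until` and `hostname_changed` are hypothetical, and it reads the hostname with `uname -n` rather than whatever the real script used.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Polls COMMAND once per second until it succeeds.
# Returns 0 on success, 1 if TIMEOUT_SECONDS elapse first.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# hostname_changed DEFAULT_NAME
# Succeeds once the current hostname differs from DEFAULT_NAME.
hostname_changed() {
    [ "$(uname -n)" != "$1" ]
}
```

The access point script could then start with something like `wait_until 30 hostname_changed localhost || exit 1`, where `localhost` and the 30-second limit are assumed placeholders for the device's real default hostname and an acceptable boot delay.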
How can I make a service that works like a oneshot, but that you can retry? What, write a shell script and use systemd-notify to say it's complete?
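One way to get oneshot semantics with retries is to keep the unit a oneshot and move the retry loop into the script it runs, so the unit only counts as started once the command has actually succeeded. A minimal sketch, where the `retry` helper and its argument order are hypothetical:

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
        # Only sleep if another attempt is coming.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

The script behind the oneshot's `ExecStart=` could then end with something like `retry 5 1 iw dev wlan0 interface add wlan0_ap type __ap`. That exact `iw` invocation is an assumption about how the interface gets created, and the attempt count and delay are arbitrary.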
Long story short I ended up just writing a traditional shell-script based init script. Maybe I'm thinking about systemd wrong or something.
> I need to change the mode of a wifi driver (which I do via modprobe)
Is that something that could be added to modprobe.conf to have it work right without any modprobe calls?
> Huh, oneshot services are listed as stopped even though they've run. Do some research, find out I need the `RemainAfterExit` keyword too. Sure, dependencies are working.
Yeah, I've also been burned by not having RemainAfterExit=y in a oneshot unit. What's worse, such a unit can get started multiple times.
> Well looking around at dmesg it looks like the problem is that networkmanager is trying to do something with the device at the same time I am. Looks like it's enumerating device capabilities at the same time I'm trying to create a new device?
To be fair, such a race condition could have happened in any init system.
> Oh well, I'll just add the retry keyword to my service.
Perhaps having a dependency to start your service before network-manager would have solved this.
> Long story short I ended up just writing a traditional shell-script based init script. Maybe I'm thinking about systemd wrong or something.
It's perfectly fine IMO to have small shell script services for such stuff.
Probably some of the problems are due to network-manager being mostly designed for and used in desktop use cases. It also likes to take complete control of all the interfaces, so calling iw/ifconfig behind its back will cause tears (as you found out).
I do agree that race conditions caused by parallel service startup really suck on embedded devices. Systemd really could use a "please be as deterministic as possible" mode for embedded (if it doesn't already have one).
>Is that something that could be added to modprobe.conf to have it work right without any modprobe calls?
Yes, that's how I do it. But that just switches the mode to one that supports virtual interfaces; I still need to add the virtual interface, and I can't do that in a modprobe conf.
>To be fair such race condition could have happened in any init system.
I feel like it's really easy to make that happen under systemd if you're not completely on the ball, whereas it's more difficult in things like openrc.
>Yeah I've also been burned not having RemainAfterExit=y in a oneshot unit. What's worse, such a unit can get started multiple times.
You live and you learn; on its own, something like that isn't a huge problem.
>Perhaps having a dependency to start your service before network-manager would have solved this.
It was one in a long chain of similar issues, but yes. There are a lot of ways I probably could have made this work, but at the end of the day I'm still writing shell scripts and not using very many of systemd's "helpful" features.
I think you're trying to force the init and service manager side of systemd to do network things. Those are more complex to handle than pure service management, and that's ultimately why you're struggling.
All that extra complexity is the reason systemd ships with its own network daemon, systemd-networkd, to handle it separately. It sounds like you need a .netdev file to create your virtual interface. .device files are also handled by networkd. I use them on our embedded devices pretty successfully. Take a look (especially the wlan type): https://www.freedesktop.org/software/systemd/man/latest/syst...