How to Crash Systemd in One Command

ltbarcly3 · on Sept 28, 2016

It seems like it is quite fashionable to hate on systemd, and it seems like systemd is kind of a piece of crap - in some ways.

However, linux is missing basic functionality other os's offer, and systemd is showing up and trying to fill in those blanks. This is open source: if you don't like systemd, if you think it's crap, if you think there are obviously better ways to do it, well, what are you waiting for then?

Systemd is winning because they showed up and basically nobody else did. It's open source: fork it and 'do it right'. Or point out the problems and hope that someone else will step up instead?

I think the world would be a better place if people spent less time griping and more time fighting to get their patch that fixes the problems merged. I guess coding is harder than writing blog posts though.

EDIT: I'm not saying systemd is perfect or even good. My point is that the people writing the code are going to have the final say, whether it's right or wrong, and it is unlikely that people writing blog posts lampooning it are really going to make a difference. I mean systemd has been despised and highly criticized from the beginning, and look where we are.

djsumdog · on Sept 28, 2016

We don't live in the open source world of the 1990s. People don't make this stuff in their basements. What was a community of devs that made things in their own time has turned into a huge industry-supported system of semi-bullshit sharing. The OSS devs of the 90s wanted open source end products: a GIMP that could replace Photoshop, a Blender that could replace 3D Studio. Today we mostly have OSS middleware.

As the other comments have stated, there have been several attempts at a systemd drop-in, such as this one:

http://uselessd.darknedgy.net/

But maintaining them is terribly difficult when Red Hat is pouring money into people to work on it full-time. There are lots of systemd-free forks, including ones for Debian and Arch. But you need a large user base to continue to use these forks so other people will be encouraged to support them. Money, donations and good ole kudos help too.

TL;DR we don't live in the open source world you think we do.

wmf · on Sept 28, 2016

Is Red Hat funding over 50% of Linux development? If systemd was so bad it seems like maybe our corporate overlords like IBM, Google, Intel, or Canonical could have spoken up or even banded together to compete with it, but they didn't.

fridsun · on Sept 29, 2016

Why would IBM care? Its AIX is alive and well.

Canonical tried to compete with Upstart but failed.

What's curious is why Debian fell. It was the cornerstone.

foepys · on Sept 28, 2016

Canonical tried with Upstart but failed.

MrFurious · on Sept 29, 2016

Really not, the problem was upstream. Debian decided to go with systemd.

jude- · on Sept 28, 2016

Uselessd has been abandoned, as your link points out. The rationale [1] is, predictably, that "mangling systemd into something a bit saner turned out to be frustrating" and it was "a fool's errand" to have tried.

[1] https://forums.darknedgy.net/viewtopic.php?id=4963

korethr · on Sept 28, 2016

> However, linux is missing basic functionality other os's offer, and systemd is showing up and trying to fill in those blanks.

I'm curious what functionality other OSes had that Linux did not before systemd came along. It seems to me that it has been subsuming functions of other systems on Linux that were already present, as opposed to offering anything new.

Socket activation? inetd/xinetd. Timed start of programs? Cron and at. Logging? rsyslog/syslog-ng/etc. DNS? dnsmasq, powerdns, bind, etc.

Honestly, I think if systemd had stopped at doing init, and focused solely on that and doing the best damned possible job of it, there'd be less hate. I think what's pissing off a lot of the systemd haters is the scope creep the project has undergone. I was personally fine with the idea of systemd when all it was promising was init and service management. But then oh, we're going to take over /dev, and cron and logging, and logins and DNS and (to be determined). Ahh, thanks, but no thanks.

bjourne · on Sept 28, 2016

> I'm curious what functionality other OSes had that Linux did not before systemd came along.

Uniform daemon/service handling. Just look at the scripts in /etc/init.d (or wherever they are located in your particular SysV init setup). It's a total mess.

The rest of what systemd offers I can't comment on because I'm not familiar with it. But it is clear to me that systemd's declarative .service files are a billion times better than shell scripts.

dijit · on Sept 28, 2016

I'm definitely not claiming that OpenBSD has an awesome init system but _all_ my init files look like this:

   #!/bin/sh
   #
   # $OpenBSD: apmd,v 1.1 2011/07/06 18:55:36 robert Exp $
   
   daemon="/usr/sbin/apmd"
   
   . /etc/rc.d/rc.subr
   
   rc_reload=NO
   
   rc_cmd $1

There's also SMF, or runit, or any of the other dozens of init systems which solved the issue you have with consistency. They also solved other problems like socket activation, which people wanted in init for some reason.

kakwa_ · on Sept 28, 2016

The messiest part of the old SysV init was dependency handling; LSB headers are just a horrible hack and do a very poor job of it.

However, imho, shell scripts are great for that kind of work. Services come in all sorts of shapes, some are proper daemons (fork, cd /, close([stdin, stdout, stderr]) and the rest), some are a little worse to say the least. And more often than not, the latter is your super business critical important app built in house by incompetent developers.
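For readers who haven't seen the "proper daemon" dance spelled out, the steps in the parentheses look roughly like this. It is a minimal Python sketch of the traditional double-fork sequence, not code from any particular daemon; the function name `daemonize` is made up for illustration, and real daemons usually add PID files, umask handling and signal setup ("and the rest").

```python
import os


def daemonize():
    """Detach the traditional way: fork, setsid, fork, chdir, redirect stdio.

    Hypothetical helper for illustration only.
    """
    if os.fork() > 0:
        os._exit(0)      # original process exits, so the caller gets its prompt back
    os.setsid()          # become session leader, drop the controlling terminal
    if os.fork() > 0:
        os._exit(0)      # session leader exits; survivor cannot reacquire a tty
    os.chdir("/")        # the "cd /" step: do not keep any mount point busy

    # The "close stdin/stdout/stderr" step: point all three at /dev/null.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
```

Only the last surviving process returns from `daemonize()`; both intermediate processes exit, which is why whoever launched the program sees it "finish" immediately.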

The point being that services come in all forms and shapes, and having a scripting language for integrating them and, if needed, compensating for crappy software is extremely useful.

Another aspect is that programs contain bugs, and the more complex a program is, the more bugs it contains. Keeping PID 1 as simple as possible should be a goal; nobody needs an HTTP server inside their init system (or a DHCP client, or a DNS resolver, or...).

For some years now, the Gentoo Project has OpenRC/runscript (specialized shell for init scripts). It works really really well and it's quite simple to write an init script in it. It's a bit sad that it wasn't given more consideration by other distributions.

justinsaccount · on Sept 28, 2016

> proper daemons (fork, cd /, close([stdin, stdout, stderr]) and the rest), some are a little worse to say the least. And more often than not, the latter is your super business critical important app built in house by incompetent developers.

"proper daemons" are broken by design. Perhaps your "incompetent developers" know something that you don't?

https://cr.yp.to/daemontools/faq/create.html

> How can I supervise a daemon that puts itself into the background? When I run inetd, my shell script exits immediately, so supervise keeps trying to restart it.

> Answer: The best answer is to fix the daemon. Having every daemon put itself into the background is bad software design.
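Why a supervisor wants its daemon in the foreground can be seen in a few lines. The sketch below is not how daemontools' supervise is implemented; it is a hypothetical Python loop (the name `supervise` and the `runs` and `delay` parameters are assumptions for illustration) that starts a command, waits for it, and starts it again.

```python
import subprocess
import time


def supervise(argv, runs=3, delay=0.1):
    """Run argv as a direct child, and start it again each time it exits.

    Returns the exit codes that were observed. `runs` and `delay` are
    illustrative knobs, not daemontools options.
    """
    exit_codes = []
    for _ in range(runs):
        child = subprocess.Popen(argv)    # the service stays our direct child
        exit_codes.append(child.wait())   # blocks for as long as it keeps running
        time.sleep(delay)                 # avoid a tight restart loop
    return exit_codes
```

If the command forks itself into the background, `wait()` returns as soon as the launching process exits, so the loop believes the service died and starts another copy. That is the symptom described in the FAQ question.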

tete · on Sept 29, 2016

I agree on OpenRC. OpenRC is a great, simple, portable solution that really solves the problem without introducing new ones.

What I think is that we see an extreme case of "Not Invented Here". To be honest, OpenRC (at times) was presented as a Gentoo-only project, despite being very portable.

I however want to highlight another general purpose Linux distribution that seems to be kicking along well (i.e. not just being an anti-systemd distribution) and uses OpenRC.

Alpine Linux: https://alpinelinux.org/

I have seen it being used and worked with people who have been using it for a while. I never really got into it myself - lack of time being the reason.

bjourne · on Sept 29, 2016

Are you saying that there are a lot of daemons out there that don't fit into systemd's declarative architecture? If so, the daemons could be fixed or systemd could be extended to cater for "not perfectly written" daemons.

On the other hand if the daemon is written by incompetent developers and is doing very stupid stuff then it is a good thing that it doesn't fit. The developers will have an incentive to fix their broken code. And even with systemd it is possible to run arbitrary shell scripts at startup (afaik) which could start the daemon so they aren't totally out of luck.

> For some years now, the Gentoo Project has OpenRC/runscript (specialized shell for init scripts).

But is a scripting language really needed for daemon management? If the answer is no, then a more limited configuration language (systemd's .service files) is better.

regularfry · on Sept 28, 2016

It might be better than the shell scripts that were there. It's not better than the shell scripts that could have been there.

ltbarcly3 · on Sept 28, 2016

Since shell scripts are Turing complete...

pdkl95 · on Sept 28, 2016

While it is very important to avoid Turing completeness in some areas such as anything going over a network, daemon start needs flexibility for non-standard use cases.

Limiting startup scripts wouldn't protect against much anyway, because it would only defer problems into the daemon you are starting up.

sp332 · on Sept 28, 2016

Didn't upstart and launchd cover this before systemd?

ltbarcly3 · on Sept 28, 2016

I don't know launchd, but upstart was a piece of crap that didn't even implement its own documented functionality.

rocqua · on Sept 28, 2016

True, upstart wasn't good. Its existence rebuts the claim that systemd won because it was the only replacement for sysv.

Daviey · on Sept 28, 2016

"piece of crap" really isn't fair. It powered many Ubuntu releases and Red Hat 6. I liked upstart.

amaranth · on Sept 28, 2016

RHEL 6 only supported using upstart in sysvinit compatibility mode, I believe. They didn't actually use any upstart jobs.

Daviey · on Sept 28, 2016

...and that still doesn't make it a POC.

lsh · on Sept 28, 2016

I liked upstart!

rspeer · on Sept 29, 2016

Upstart also had inscrutable crashing bugs, kind of like what this article describes about systemd, and it didn't even do as much.

loeg · on Sept 28, 2016

launchd was for OSX.

sp332 · on Sept 28, 2016

Oh, whoops. Wasn't there another Linux init system around the same time?

gens · on Sept 28, 2016

There were plenty

http://blog.darknedgy.net/technology/2015/09/05/0/

JdeBP · on Sept 29, 2016

> > I'm curious what functionality other OSes had that Linux did not before systemd came along.

> Uniform daemon/service handling.

As others have pointed out, Linux did not lack this, and there were quite a number of systems that provided such on Linux.

5ilv3r · on Sept 28, 2016

You refuse to look, and thus have clarity. Dude that is not a good argument.

What is not uniform about the previous service handling implementation in debian?

bjourne · on Sept 28, 2016

> You refuse to look, and thus have clarity. Dude that is not a good argument.

What? Should I have to know everything about systemd to comment on one specific part of it? I don't know about alternative init systems either, all I wrote is that for service management: systemd >> SysV.

> What is not uniform about the previous service handling implementation in debian?

I can't tell you since I don't use Debian. I can tell you that in Ubuntu there are lots of inconsistencies. Like different service scripts respond differently to the help command.

Even the "status" command's output varies between scripts.

This means there is no way to introspect a service to know if it supports things like the "restart" command or if "stop" and then "start" should be used instead. This also means that making a robust GUI tool for handling services becomes hard/impossible.

pdkl95 · on Sept 28, 2016

> systemd >> SysV

Why are you framing the conversation as if these were the only options?

> I don't know about alternative init systems either

Oh, so you are speaking from ignorance. I suggest learning about the alternatives such as OpenRC and the management tools such as daemontools, runit, and s6.

There are many solutions to these problems, although they may not be supported (by default?) in your favorite distro. Judging the Linux ecosystem by the failings of one distro isn't helpful.

bjourne · on Sept 28, 2016

Don't be obtuse and don't misunderstand on purpose. It is very clear that I compared SysV vs. systemd in their service/daemon management capabilities and nothing else.

5ilv3r · on Sept 28, 2016

Ubuntu has used a different init system each time for the last 3 LTS releases. It's only bad because they keep changing it!

namecast · on Sept 28, 2016

I strongly disagree. I didn't need anyone to "show up" to replace my init system in production across several thousand servers, and the way systemd "won" that battle wrt Debian was, to be polite, controversial, political and extremely divisive. They sure as heck didn't win because "they showed up and no one else did". They showed up and Red Hat's money showed up with them.

I hold no animus towards the systemd project's goals or aspirations, but I have yet to hear a compelling reason for systemd to be shipping in its current state.

korethr · on Sept 28, 2016

> I hold no animus towards the systemd project's goals or aspirations, but I have yet to hear a compelling reason for systemd to be shipping in its current state.

I am, in a way, reminded of PulseAudio. I kept PA off my desktop and laptop for years, having heard about PA being a complex nightmare to configure and make work that caused audio to break. Eventually, there was an application I needed to use that had a hard dependency on PA, so I bit the bullet and installed it. And everything magically Just Worked. My only complaint at the moment is that the pactl command seems more fiddly and verbose than it might need to be.

A closer look at the complaints about PA showed that many of them were older. This gives me the impression that PA was in fact a steaming pile that broke audio at one point, but eventually matured into something functional and useful.

And I think this is where systemd is at present: perhaps a good idea, but its implementation is not yet production ready, and should not be the default on production systems. And the project's continued scope creep is working against it in that regard.

Twirrim · on Sept 28, 2016

PulseAudio took ages to get to a stable and reliable state. There are still warts (try and make your machine act as a bluetooth audio receiver. It's horribly ugly.)

systemd wasn't reliable or fit for purpose for servers when it got shoehorned into everything.

When it comes to production servers you need stable and reliable. Not new and shiny. You're ideally rarely having anything happen on the server other than "run this software"

digi_owl · on Sept 28, 2016

Sadly there are two definitions of server floating around these days.

There is the classical "box in the corner sending and receiving data over the network".

And then there is the new "bunch of software in a container in the cloud somewhere".

And with the latter definition, stable and reliable seems to be of lesser importance in the eyes of the faithful. This is because they can just fire off a new instance of the container if the existing one falls over (Uptime by machinegun as i like to think of it).

http://www.commitstrip.com/en/2015/07/08/true-story-fixing-a...

marcosdumay · on Sept 28, 2016

In either case, a fault means that at least one person will get a bad interaction with your service. And usually a few.

In either case, respawning time and scalability are limited by data coherence needs.

digi_owl · on Sept 28, 2016

Best i can tell, Poettering cooked up PA back when ALSA's dmix was something you manually enabled on cheap soundchips.

Frankly i think it would have gone nowhere if Canonical hadn't decided they needed to copy Windows' "one volume slider per program" thingy that was introduced in Windows 7 or thereabouts.

Note btw that these days Poettering is no longer involved with PA development.

BTW, the main goal of systemd seems to be to create a single baseline for desktop Linux in code rather than spec.

It is a "continuation" of the -kit mentality that spawned within Freedesktop (or frankly within Fedora, as it seems said distro is basically the Petri dish for anything under the Freedesktop umbrella).

korethr · on Sept 28, 2016

I find the 'one slider per program' frequently useful. Different programs can be outputting at wildly different levels. I might have my music playing at a comfortable volume, then discover a video on YouTube I want to watch, and it comes in painfully loud, such that I'm immediately reaching for the volume control on the speakers or yanking off my headphones. Then when the video's over, I get to turn my speakers/headphones back up.

With only a single master volume control I'm often fiddling with it to keep things at a reasonable level. With per program volume, I'm fiddling with levels less often and spending more time just listening.

digi_owl · on Sept 28, 2016

Frankly the problem there is that browsers, even though they are becoming a second OS, do not have a central volume control. Heck, i swear Youtube resets the volume setting at near random (or maybe it's some kind of insane AB testing?).

kefka · on Sept 28, 2016

With PulseAudio, I have 1 singular feature that makes most people wow:

When I plug in any new audio device, whatever I'm playing/recording automatically sinks to the most recent device. Seamlessly. So if I'm on mail.google.com and a GVoice call comes in, I can take the call, and during the call, plug in my USB headset.

It just works.

Mac can't do that. Windows can't do that. Linux/PulseAudio can. And that kicks ass.

dijit · on Sept 28, 2016

Mac absolutely can do that, I'm not sure why you think it can't, perhaps it couldn't in the past.

I do it almost daily when I come home and plug in my Bose Companion 5 into my Macbook (USB Speakers)

kefka · on Sept 28, 2016

Admittedly, it's been quite some time since I last worked on a Mac. Think 10.6.8 era.

It could also have been a borked install as well. But I'm sitting in front of a Win10 install on a Surface 4 Pro, and it can't auto-sink audio.

gman99 · on Sept 29, 2016

Windows 10 can _definitely_ do that (I do the exact same thing with bluetooth/wired headsets frequently; I'm "fairly" sure I used to do that on Windows 7 too -- I remember being quite amazed when my Ubuntu machine "finally" caught up and was able to do the same thing.)

In fact, I quite often start music playing on the laptop (chrome/Google Play Music) and then power up my bluetooth headset -- it initially uses eSCO when it routes the audio over (mono) and then switches to A2DP a few seconds later

Maybe the issue is something to do with the SW (chrome/firefox?) you use for GVoice? Or maybe there are some edge cases that I'm not triggering (in particular, a USB headset is effectively a new soundcard and not just an audio sink; I can completely believe windows is flaky with them especially with third party drivers)

lilyball · on Sept 28, 2016

Mac can do that, but it's also up to the software. Software can either detect when the default audio device changes, or it can grab the device at startup and never change. Google's software, for example, has demonstrated the latter behavior in the past for me (specifically, Google Hangouts has had a tendency to not notice if I plug in headphones+mic and continues to use the built-in mic instead).

digi_owl · on Sept 28, 2016

USB headsets are a technological abomination.

You are talking about a headset welded to a soundcard, that you then insert into a data bus each time you want to use it.

fridsun · on Sept 29, 2016

Now that Apple ditched the headphone jack in iPhone 7...

boobsbr · on Sept 29, 2016

That's actually a pretty funny definition.

fridsun · on Sept 29, 2016

Windows definitely can do that.

boobsbr · on Sept 29, 2016

I'm still bitter about PulseAudio...

It completely fucked up the audio on my desktop at the time. Lots of cracking and hissing, some really weird delays as well.

ansible · on Sept 29, 2016

Up to Ubuntu 14.04, PulseAudio worked just great for me. To the point that I didn't notice it was running, because everything just worked.

With 16.04, PA has decided to not recognize the analog ports (headphone, microphone) on my motherboard. They are still there, and I can use ALSA programs to control the volume and play audio. But they just disappear after a couple seconds. It is annoying enough I might just move back to Ubuntu 14.04.

merb · on Sept 28, 2016

Well without production testing a product can never mature. It just depends how fast things will get shipped and fixed.

brainspider · on Sept 28, 2016

> However, linux is missing basic functionality other os's offer

Could you please elaborate on what some of those are?

> systemd is showing up and trying to fill in those blanks

I haven't really kept on top of all systemd releases, but all I can see is them replacing things that already exist, just doing them in a different way.

I agree the init system replacement was necessary, but I'm struggling to understand why and how it extended to trying to extinguish and replace every core function of the operating system ecosystem.

That's rubbing a lot of people the wrong way, especially as mentioned in the article, those who adhere to the unix philosophy of do one thing and do it right.

ltbarcly3 · on Sept 28, 2016

I basically agree with everything you're saying, I just think if the people who think systemd is misguided spend their time writing blog posts about it, nothing will change and the people actually writing the code will have the final say, right or wrong.

brainspider · on Sept 28, 2016

Just to play Devil's advocate here, I wouldn't say that nothing will change. Not everybody can code, and not everybody that can code has time to code for a new project. And there's no guarantee that if someone does submit code, it will be accepted.

I get the strong impression that systemd is driven by ego first, and technical innovation second.

But writing blogposts and making non-coders aware of some of the frustrations and dangers system administrators face will help the "business people" understand, and hopefully can put pressure from another angle.

We'll never all agree on one way of doing things, but I think that it's valid to socially encourage coders who have such a large sphere of influence and whose code can potentially be very harmful, to take more care and consider things outside their immediate bubble.

That blogpost alone has driven a lot of good discussions in this very thread, and I think that's as good and as healthy as helping more directly with submitting code.

ltbarcly3 · on Sept 28, 2016

I think it's probably not that valuable to write a blog post. Let's see if the UMASK problem, which is totally legitimate and the blog post makes a great case for fixing, actually gets fixed.

rocqua · on Sept 28, 2016

If the issue with systemd is deemed to be the backing culture, instead of specific code, writing patches isn't going to work. Instead you need to change hearts and minds. Blog posts work better for that (then again, if you are arguing, you've lost).

ultramancool · on Sept 28, 2016

The thing is though - their code is already written. Systemd is entirely unnecessary for 99% of people; probably everything except some of its LXC features has clear and popular alternatives. LXC is the only thing making me eye systemd with interest, but I'll come back to it in a few years once all the weird stuff is worked out.

dijit · on Sept 28, 2016

It's nice to hear that you have the choice of not running systemd.

All production-ready Linux distros (RedHat (and by extension CentOS), SuSE and Debian) are running systemd, and as a systems administrator for a large company I _must_ adopt it; I don't have a choice.

ultramancool · on Sept 29, 2016

If those major distros weren't switching though, you wouldn't have to, which is what my point was. The only reason you're adopting it is not for a practical benefit, just because the distros did.

I run devops though for a small business and we've avoided the issue entirely using runit for our software.

jude- · on Sept 28, 2016

It's probably because all the features they need are already met by existing tools. In that case, there's no need to write more code.

rhinoceraptor · on Sept 28, 2016

I think Systemd is winning because it works (in the overwhelming majority of cases), and no one cares about init systems except the vocal minority.

1_2__3 · on Sept 28, 2016

Can we please just have one OS we use in data centers and servers whose features and development are guided by professionals, not rubes? I'm so sick of people telling me I'm not the target user or some variant thereof. Linux is not OSX or Windows, and not swayed by arguments of the form "most people don't care about that". I do. They should. And the fact that they don't shouldn't result in crappy operating systems.

rhinoceraptor · on Sept 28, 2016

You're free to use Linux (or BSD or Illumos for that matter, have fun!) with any init you want.

For every person whinging online about systemd, there are hundreds of people using it every day on their laptops and servers without giving it a second thought.

wott · on Sept 29, 2016

Without having given it a first thought either.

sparewalking · on Sept 28, 2016

So what features don't work for you?

gaius · on Sept 28, 2016

> Systemd is winning because they showed up and basically nobody else did.

Systemd is "winning" because Red Hat are throwing money at it as a way to control the ecosystem.

sparewalking · on Sept 28, 2016

Red Hat refused the project initially.

snorrah · on Sept 29, 2016

But after the Fedora Technical Committee were convinced to accept it, it ballooned from there. Seed blossomed from within.

zdw · on Sept 28, 2016

Alternatives have been implemented before:

http://cr.yp.to/daemontools.html

http://smarden.org/runit/

The more you look at any of DJB's software, the more of the future you'll see.

ultramancool · on Sept 28, 2016

I've switched everything I write to use runit because runit doesn't require any particular init system: you can easily integrate it with sysvinit, FreeBSD's rc, Ubuntu's upstart, systemd, etc. It just runs on top and doesn't need to be the king, and it's incredibly simple to write scripts for it. Completely removes the need to worry about this init mess.

xaduha · on Sept 29, 2016

I did too. At one point. Then I abandoned all that in favor of NixOS. It uses systemd and I don't care.

qwertyuiop924 · on Sept 28, 2016

you forgot http://skarnet.org/software/s6/

zdw · on Sept 28, 2016

Ah, forgot to list that... he's got this great page too: http://skarnet.org/software/skalibs/djblegacy.html

qwertyuiop924 · on Sept 28, 2016

Heh. Yeah. I just wrote a comment on your post of that about how I had posted http://skarnet.org/software/execline/. Which is a cool thing in its own right.

lasermike026 · on Sept 28, 2016

The Unix philosophy: Write programs that do one thing and do it well.

Systemd doesn't observe the Unix philosophy and that bothers many people.

koffiezet · on Sept 29, 2016

I'm not that strict on the "unix philosophy". It tends to work, but sure there are exceptions. But if you happen to write something that does many things, you should also make sure you do every single one of those many things well.

What bothers me is that systemd doesn't do that, and that somehow this "unix philosophy" criticism seems to be an excuse/defense for crappy design and code.

geofft · on Sept 28, 2016

It's baffling to see people insist that that's the UNIX philosophy when the UNIX philosophy has always been about monolithic kernels over microkernels, so "do one thing" is out the window, and about worse-is-better over do-the-right-thing (https://www.dreamsongs.com/RiseOfWorseIsBetter.html), so "do it well" is out the window.

And the userspace tools don't follow this either. Quoting http://prog21.dadgum.com/139.html :

> The UNIX ls utility seemed like a good idea at the time. It's the poster child for the UNIX way: a small tool that does exactly one thing well. Here that thing is to display a list of filenames. But deciding exactly what filenames to display and in what format led to the addition of over 35 command-line switches. Now the man page for the BSD version of ls bears the shame of this footnote: "To maintain backward compatibility, the relationships between the many options are quite complex."

Why, for instance, does ls have its own sorting options if the sort command exists? If ls, the most quintessential UNIX command, can't do one thing and do it well, why should anything else?

lasermike026 · on Sept 28, 2016

The Unix philosophy is most applicable in user space. It's a philosophy, not a law. I didn't say Systemd was bad, just that it doesn't observe the Unix philosophy.

Crashing an init process is a very bad thing, is it not? Wouldn't less surface area be a better approach?

geofft · on Sept 28, 2016

Yes, I argued in another comment that it would be a good idea to split systemd into two processes, where PID 1 does nothing other than reap children, send info to PID 2 actual systemd, and restart PID 2 if needed. But it seems to make sense for PID 2 to be as monolithic as systemd currently is (which is not very monolithic; udev, journald, D-Bus, etc. are separate daemons).
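The reaping half of that split is small. A rough Python sketch, assuming a hypothetical helper named `reap_children` that such a minimal PID 1 would call whenever it receives SIGCHLD (an illustration of the idea, not systemd's code):

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) pairs; a child killed by a signal
    is reported with the negated signal number.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break                  # there are no children at all
        if pid == 0:
            break                  # children exist, but none has exited yet
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)
        reaped.append((pid, code))
    return reaped
```

A PID 1 built this way would compare each returned pid against the pid of its "PID 2" and respawn it on a match; every other entry is an orphan that only needed to be waited for.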

JdeBP · on Sept 29, 2016

That's a common misconception. The major task of process #1 with modern kernels is not, ironically, reaping children but receiving signals. Reaping orphaned children can be, and with several softwares is, the job of other processes.

* https://hackernews.hn/item?id=8904429

* http://unix.stackexchange.com/a/197472/5132

* https://hackernews.hn/item?id=8384251

* http://unix.stackexchange.com/a/177361/5132

geofft · on Sept 29, 2016

That makes no sense. Who's sending SIGWINCH to pid 1? Is it in a tty? Why is it in a tty? Why does it care about the size of its window?

It is true that the userspace helper programs of some init system suites, such as sysvinit's telinit, work by sending signals to pid 1. But 1) they send signals to pid 1 because pid 1 is the service manager, not because it's the system manager, and 2) they could do something else other than sending signals (like sending D-Bus messages). The choice to use signals is a design decision of the init system suite, and not a part of the inherent requirements of being pid 1.

It is also true that with prctl(PR_SET_CHILD_SUBREAPER) you can move reaping away from pid 1. But at that point, the proper function of pid 1 is nothing, except perhaps to respawn pid 2 if it wants to. And as I argued in the other comment in this thread https://hackernews.hn/item?id=12600734, that's silly and you should just make the kernel do it.
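For reference, the subreaper flag is a single prctl(2) call on Linux 3.4 and later. Python's standard library has no wrapper for it, so this sketch goes through ctypes; the two constants are taken from <linux/prctl.h>, and the helper names are made up for illustration.

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36   # values from <linux/prctl.h>
PR_GET_CHILD_SUBREAPER = 37

# CDLL(None) resolves symbols from the running process, which includes libc.
_libc = ctypes.CDLL(None, use_errno=True)


def _check(result):
    """Turn a -1 return from prctl into an OSError carrying errno."""
    if result != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def set_child_subreaper(enable=True):
    """Mark (or unmark) the calling process as a child subreaper."""
    zero = ctypes.c_ulong(0)
    _check(_libc.prctl(PR_SET_CHILD_SUBREAPER,
                       ctypes.c_ulong(1 if enable else 0), zero, zero, zero))


def is_child_subreaper():
    """Read the flag back with PR_GET_CHILD_SUBREAPER."""
    flag = ctypes.c_int(0)
    zero = ctypes.c_ulong(0)
    _check(_libc.prctl(PR_GET_CHILD_SUBREAPER,
                       ctypes.byref(flag), zero, zero, zero))
    return bool(flag.value)
```

Once the flag is set, orphaned descendants of that process are reparented to it instead of to pid 1, so it has to call wait() for them itself.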

I don't see any explanation in any of those links, or the things they link to, about why pid 1 is receiving signals. You claim twice that SIGINT, SIGPWR, and SIGWINCH are things that pid 1 can receive, but I have no idea why it might receive them. Can you explain?

JdeBP · on Sept 29, 2016

It's explained two times over, and they are indeed system state controls, not service state controls.

You are exemplifying the people with off-the-top-of-the-head designs for init that I mentioned. The signals to process #1 from the kernel are rarely included, or even thought of, in such designs; despite the fact that they are the things that a process #1 program cannot escape and the child reaping is actually the least of its work and the thing that it can mostly escape. Here you are with yet another off-the-top-of-the-head design being thrashed out in a discussion forum, and you have not even encountered this stuff to know that a design has to include it, as evidenced by the questions about TTYs and windows.

As I said, go and look at many actual process #1 programs, such as Gerrit Pape's runit, Felix von Leitner's minit, and the system-manager program from the nosh package. They all have to handle these signals, and they all do. Don't repeat ... ahem! ... some other people's mistake of not learning about existing software and mechanisms.

The nosh system-manager and systemd both discuss them in their manual pages. They both have an explicit list of signals and what they trigger. The manual for /etc/inittab in van Smoorenburg init also discusses these signals and what they are, as do the manual for Joachim Nilsson's finit (http://troglobit.com/finit.html) and its TODO list.

I recommend some manual reading. (-:

geofft · on Sept 30, 2016

OK, I went and learned some things and it turns out I was wrong but (at least assuming Linux, which is fair since this is a thread about systemd) you're wrong too. None of these signals have to be handled by pid 1.

SIGWINCH for a keyboard request is a thing that I did not know about, yes. But this is opt-in functionality for any process that has any VT open and is suitably privileged, and it is simply traditional for init to call ioctl(0, KDSIGACCEPT, SIGWINCH). Any process can request this, not just init; any signal can be used, not just SIGWINCH. And by default, no signal gets sent, neither to init nor to any other process. So init does not need to handle it.

(This of course leaves aside the question of whether anyone uses this functionality. How many users who are not init system developers know about this? And how many people have machines with physical keyboards that are not just running some window system that puts the keyboard in raw mode, anyway?)

See https://github.com/torvalds/linux/blob/v4.7/drivers/tty/vt/k... and L593-602 (initialized to 0 and disabled by default), https://github.com/torvalds/linux/blob/v4.7/drivers/tty/vt/v... (KDSIGACCEPT ioctl allows any suitably privileged process to request this), and http://www.fifi.org/doc/console-tools-dev/examples/spawn_log... (a separate daemon for doing this - via SIGHUP, incidentally - which says, "Note: this functionality has become part of init", implying it was originally not part of init!).

SIGINT on Ctrl-Alt-Del (if reboot(LINUX_REBOOT_CMD_CAD_OFF) has been called), and SIGPWR on power failure from the two drivers that implement it, are sent via the kill_cad_pid() function. This does default to pid 1, but that's just a default. And it's very easy to change that setting in userspace; it's a sysctl, /proc/sys/kernel/cad_pid.

See https://github.com/torvalds/linux/blob/v4.7/init/main.c#L998, which defaults cad_pid to init, and https://github.com/torvalds/linux/blob/v4.7/kernel/sysctl.c#..., the sysctl implementation.

The two drivers that implement SIGPWR, incidentally, are the S390 NMI handler and the SGI SN Platform's system controller driver, both of which I'm pretty sure see no use today. It's usually sent by a userspace process these days, which again could just choose to not use signals.

So none of the signals you've mentioned are pid 1's responsibility. They are traditionally pid 1's responsibility, yes. But those are two very different things, especially if we're talking about using PR_SET_CHILD_SUBREAPER to move all the interesting work out of pid 1.

The remainder of the signals handled by sysvinit, finit, runit, and nosh are all generated by userspace, and minit handles no other signals. And not all of these init systems bother to call KDSIGACCEPT.

I recommend being less condescending, especially if you're going to be wrong. (-:

fridsun · on Sept 29, 2016

You just described how daemontools family works. They are less monolithic than systemd because they didn't subsume udev.

ltbarcly3 · on Sept 28, 2016

Agree 100%. There isn't some magic potion that makes commands compose simply and efficiently.

justinsaccount · on Sept 28, 2016

There is, but it involves using a standardized structured data exchange format instead of formatted text.

PowerShell solves this by using structured data everywhere, at the expense of not working that well with plain text.

You should be able to run something like

  ls -l | sort +date,-size

but you can't :-)

geofft · on Sept 28, 2016

Right, but the same people who keep advocating "do one thing and do it well" also keep advocating "Write programs to handle text streams, because that is a universal interface," and there's no way to handle that sort of thing with text streams. A "universal interface," whatever that means, cannot carry that metadata.

justinsaccount · on Sept 28, 2016

I know.. It's almost like the people who came up with these ideas 40 years ago didn't think of everything.

There are plenty of ways to do things better with basic text interfaces - a more standardized header format for example.

If most tools just output tab-separated values with a header, or the same thing vertically, i.e.:

    foo bar baz
    1   2   3
    4   5   6

or

    foo 1   4
    bar 2   5
    baz 3   6

It would have been a lot easier to write more "universal" tools that worked with this data.
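As a rough illustration of how little a "universal" tool would need if that convention existed, here is a minimal Python sketch. The helper names `read_table` and `transpose` are made up for this example, and it assumes real tab characters between fields and a header in the first row.

```python
import csv
import io

def read_table(text):
    """Parse tab-separated text whose first row is a header into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]

def transpose(text):
    """Convert the horizontal layout (one record per row) to the vertical one."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    return "\n".join("\t".join(column) for column in zip(*rows))

# The same sample data as in the comment above, with tabs as separators.
sample = "foo\tbar\tbaz\n1\t2\t3\n4\t5\t6\n"

print(read_table(sample))
print(transpose(sample))
```

With a shared header convention, any downstream tool can address fields by name rather than by column position, which is the point being made above.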

Or at least more tools like 'ifdata' that is part of moreutils.

  $  ifdata -pa wlan0
  192.168.1.2

Instead of the mess people come up with to parse ifconfig output.

Bryan Cantrill even said in a presentation that these days it should be something like "Write programs to handle JSON text streams, because that is a universal interface."

ddalex · on Sept 29, 2016

Sure you can. 5th column of `ls -l` is size; sort can do numerical values

> ls -tl | sort -k 5 -n

justinsaccount · on Oct 1, 2016

Not the same thing,

  ls -l | sort +date,-size

was to show how sorting by date ascending and size descending could work. 'sort' can sort by different fields in different orders, but to sort times properly you'd need to use ls with `--full-time` or `--time-style=long-iso`, which is in GNU ls but not the BSD ls on OS X.

You'd end up with something like

  ls -l --full-time |sort -k 6,5rn

But that doesn't quite work. Besides, the 'sort -k 6,5rn' instead of 'sort +date,-size' is exactly the kind of thing I was trying to show.

We could have had a sort that let you do '+date,-size', instead we have a 'universal interface'.
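To make the '+date,-size' idea concrete, here is a small Python sketch of what such a sort could do once the input is structured records instead of text columns. This is only an illustration: `parse_sort_spec` and `sort_records` are hypothetical names, the spec syntax is the one invented in this thread, and the dates are assumed to be ISO-formatted strings so that they compare correctly as text.

```python
def parse_sort_spec(spec):
    """Turn '+date,-size' into [('date', False), ('size', True)] as (name, descending)."""
    keys = []
    for part in spec.split(","):
        part = part.strip()
        descending = part.startswith("-")
        keys.append((part.lstrip("+-"), descending))
    return keys

def sort_records(records, spec):
    """Sort dicts by several named fields, each ascending (+) or descending (-)."""
    result = list(records)
    # Python's sort is stable, so apply the keys from least to most significant.
    for name, descending in reversed(parse_sort_spec(spec)):
        result.sort(key=lambda record: record[name], reverse=descending)
    return result

# Hypothetical records standing in for structured `ls` output.
files = [
    {"name": "a.txt", "date": "2016-09-28", "size": 10},
    {"name": "b.txt", "date": "2016-09-27", "size": 99},
    {"name": "c.txt", "date": "2016-09-28", "size": 500},
]

print([f["name"] for f in sort_records(files, "+date,-size")])
# prints ['b.txt', 'c.txt', 'a.txt']
```

The sort itself is trivial; the hard part with plain text is knowing which column is the date and which is the size, and that is exactly the metadata the text stream does not carry.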

regularfry · on Sept 28, 2016

Because that specific thing is crufty and bad, and now can't be fixed for back-compat reasons. Neither of these are justifications for introducing more crufty badness.

thescribe · on Sept 29, 2016

Instead of talking about Unix philosophy let me put it this way. I love whitebox components I can swap out for competing components. Systemd threatens that for me.

wruza · on Sept 28, 2016

Probably because sorting human-oriented dates is not a trivial task, as it is for recursive listings.

fapjacks · on Sept 28, 2016

> "Systemd is winning because they showed up and basically nobody else did"

I can't see how you would qualify this statement, which is utterly false. Specifically, systemd adoption was the result of political pressure by LP (the author), who works at Red Hat. There are plenty of other choices out there. The idea that systemd won because of some kind of meritocracy is wrong. It was mostly political with some technical discussion at the edges. Most people "making the decision" actually just outsourced their critical thinking to the Debian Technical Committee. That's how we ended up with most distros adopting systemd. Systemd is better than sysvinit (for some values of "better"), and that is what the Debian TC voted on, and what caused them to adopt systemd. Not that systemd is better than other, more modern init systems.

georgyo · on Sept 28, 2016

systemd is by far the best linux init system. The options are sysvinit, OpenRC, Upstart, and systemd. Of those, the only reasonable choice is systemd.

Upstart was almost good, but it's really frustrating to use. OpenRC is decent, but confusing.

systemd solved problems; not only that, it solved them well.

The scope creep of systemd is something that is a bit alarming; however, even with the creep it's doing a good job. The tools that it is building work fairly well, and it's a relatively new project.

djsumdog · on Sept 28, 2016

The only thing systemd solves, and I will say this is a big problem, is full process management. It also unifies startup scripts which does make it easier to support multiple distributions with a simpler packaging process. I'll give you that too.

The scope creep is really bad. It's not just bad, it's terrible. I don't want a program that's an init, and replaces xinetd and replaces my logging system and replaces cron. That's all totally beyond unnecessary.

Even with full process management, I'd hardly say it is the best. I have had cases where I stopped a service on a systemd box and it did leave child processes running. Its command line tools are terrible.

In fact it's this creep that keeps people from even being able to make drop in replacements. uselessd was a fork that kept up for a bit, but lost maintainership. With systemd being RedHat funded for full time development, it's very difficult for independent programmers to put in the time to make anything that can get adopted in the way it is.

captainmuon · on Sept 28, 2016

I know not everybody has the same needs, but at least for me, systemd did not solve any problem I had. In fact, I found that its main selling point, "better" service management, got much worse.

Before, I could simply do "sudo service xxx start". Now, with systemctl, everything is much more cumbersome. Just run "systemctl list-units". Ho boy, you get a huge list of... things. Not just what used to be called services or daemons, but units belonging to mount points, /dev entries, and who knows what.

Granted, it is probably easier to create service scripts for developers than with init.rc or upstart (although I never had a problem the few times I had to muck with them). But as a simple consumer, I do not feel systemd solves any problem I had. It just made some things that worked fine before slightly more confusing, and there are some regressions every now and then while other programs are getting ported to systemd.

fapjacks · on Sept 28, 2016

The one thing that is painfully obvious about the situation with init systems is that people have strong opinions about their preferences. And that makes the systemd situation even more terrible, since it's a choice unwillingly foisted upon a huge number of people. This situation really deserves some choices. For me, that is the number one reason I do not like systemd, even if it's solved some problems.

mjevans · on Sept 28, 2016

There are exactly two compelling (for me) arguments against systemd, and one that seems to result from its current focus:

* Binary log files (WHYYYYY; OTOH you could argue that the only proper place to do logging is to a remote database.)

* Monolithic nature - everything gets shoved into the daemon, making it harder and more unusual to do things without it.

* Servers - systemd seems to be focused on a desktop lifecycle, where users interactively log in, have settings and log out. It is less clear how secure or well this inter-operates with services running as 'users' for privilege separation. I think this may have recently gotten better.

georgyo · on Sept 28, 2016

All of these are false.

It stores everything in binary logs; however, your old system didn't store it at all. You had a daemon (syslog, rsyslog, etc) that wrote them as plain text. If you start those daemons, you are at exactly the same spot as before, just with an additional copy in a binary log. And btw, journalctl can do some magical things, and you should take some time to play with it.

systemd is not monolithic, or at the very least it's a distributed monolith. The vast majority of services happen in their own userspace process.

Systemd excels with servers, since it standardizes how their execution gets defined. It takes only an added line to spawn them in their own namespaces, set cgroup restrictions, prevent network access, etc. Beforehand, doing these things was very hard, so no one did them.

Sanddancer · on Sept 28, 2016

If you had a basic syslog setup, it stored as text files. However, a more interesting syslog setup would take those entries and send them to a remote database, filter them to match certain contents, run scripts based on those matches, etc. Systemd only seems magical because most distros don't use even a sliver of the abilities that modern syslogs provide.

Systemd is a Jenga tower. Yes, there are lots of little pieces, and yes, removing a few won't hurt anything, but it's still fragile, and rather inconsistent. "Parts" like the journal, are a separate executable, but are not managed like other services. It uses a poorly documented ipc interface to talk to the various parts, and the configuration files are a nasty tangle of symlinks and control files stretching across the filesystem. It's just not a very well engineered piece of software.

fapjacks · on Sept 28, 2016

Yes, these are also reasons I dislike systemd. Also, one of the original selling points of systemd was "It will reduce boot times significantly!" ... And then it ends up that systemd is actually slower to boot than any of the other ones out there.

dijit · on Sept 28, 2016

It can be, with the process timeout weirdness: init used to be 'fire and forget', so if something didn't respond it would carry on, whereas systemd actually depends on the output, so it will sleep for 180 seconds or so.

But honestly, even an extra 30s booting a server is nothing; I don't give a shit about boot times on my servers. I care about what order my services are started in. My servers should be 100% reproducible, otherwise you could have bugs that never get found except in 1% of cases.

And since it's turtles all the way down 1% of one system can be a lot of problems in a 2,000,000 server architecture.

So "Fast boot" in the way systemd achieves it is not beneficial for me.

justinsaccount · on Sept 28, 2016

> Also, one of the original selling points of systemd was "It will reduce boot times significantly!"

It was not.

http://0pointer.de/blog/projects/the-biggest-myths.html

Myth: systemd is about speed.

rkeene2 · on Sept 28, 2016

http://0pointer.de/blog/projects/systemd.html The original blog post introducing systemd on 30-APR-2010.

The first points it makes regarding the thesis of systemd are that it is a good init system, that a good init system is fast, and that SysV init is not fast.

The first time it begins talking about things a service management system should be interested in is about half-way through with "Keeping Track of Processes".

It spends roughly the rest of the document on important service management things, but to dismiss "systemd is about speed" as a myth is too much.

systemd IS about speed of booting -- during the development there were flame graphs showing how doing all this in parallel you could make the system boot really really fast. A lot of time was spent making the boot process with systemd minimal, or speedy.

Personally I'd prefer a less speedy service management system that didn't open listening sockets for my daemons so that it could deal with starting services out-of-order... to make the boot occur in a more parallel fashion... to make the boot faster.

wott · on Sept 29, 2016

No, that's revisionism, Poettering rewriting history.

georgyo · on Sept 28, 2016

I agree with your reason, however for me they get outweighed by other factors. If I am a maintainer of a popular distribution, I want my distro to have the best init system, and I want that init system to have the largest user base.

EL6 and its derivatives was upstart, debian and its derivatives was upstart. The vast majority of linux systems were upstart for a long period of time. Other systems did not switch to upstart, because it introduced headache.

Switching init systems requires quite a bit of work. The fact that they all switched to systemd shows that beyond political pressure, they needed a better system.

regularfry · on Sept 28, 2016

That's not my recollection. Upstart was an Ubuntu project, Debian never adopted it. Red Hat picked up upstart, then Poettering managed to convince them that they'd be better off with one they controlled. Somehow that filtered back through to Debian, and they jumped straight to systemd from sysvinit.

At no point was it necessary for systemd to actually be better than upstart at being an init system. It just had to be bigger by the time it got to Debian.

jude- · on Sept 28, 2016

I don't think the distro maintainers had that much say, since the most popular Linux desktop software began to depend on it. It started with GNOME, and now KDE is on its way. This basically forces their hand--if they want to provide a good out-of-the-box experience, they either have to switch to systemd, or continuously fight upstream's patches.

fapjacks · on Sept 28, 2016

The choice that was put before the Debian TC was literally "Should we use systemd instead of sysvinit" and did not consider any of the many other init systems. In fact, a motion was put forward to the Debian TC to consider other init systems, and guess what? Pro-systemd TC members pointed to the original motion ("Should we use systemd instead of sysvinit") as the reason not to consider other init systems. That is pretty much as political as it gets.

Aeolos · on Sept 28, 2016

For the record, this post is completely false, as even a cursory glance at the relevant Debian bug report will prove: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=727708#672...

  We exercise our power to decide in cases of overlapping jurisdiction (6.1.2) by asserting that the default init system for Linux architectures in jessie should be

  D    systemd 
  U    upstart
  O    openrc
  V    sysvinit (no change)
  F    requires further discussion

  [...]

  Therefore, the resolution reads:

  We exercise our power to decide in cases of overlapping jurisdiction
  (6.1.2) by asserting that the default init system for Linux
  architectures in jessie should be systemd.

fridsun · on Sept 29, 2016

Clearly you have missed this article: http://blog.darknedgy.net/technology/2015/09/05/0/

ltbarcly3 · on Sept 28, 2016

What are the other choices? Upstart was awful, and init is a hundred years old.

qwertyuiop924 · on Sept 28, 2016

runit, openrc, s6, later nosh, bsdinit, take your pick.

Sanddancer · on Sept 28, 2016

launchd, solaris smf, etc. There were tons of choices that were thrown away for silly reasons.

qwertyuiop924 · on Sept 28, 2016

launchd is almost as bad, as is SMF (XML - Yuck). But at least they're still mostly focusing on process and service management.

yedpodtrzitko · on Sept 28, 2016

OpenRC

qwertyuiop924 · on Sept 28, 2016

what functionality? systemd integrated components, it didn't do a whole lot new. I'm not firmly convinced that replacing cron was a good idea, integrating udev and the -kits definitely wasn't, and neither was integrating the hostname, or filesystem mounting. And again, I really can't think of anything truly new that it does.

thescribe · on Sept 29, 2016

I'm not sure forking would be worthwhile, I think the issue people have with systemd is one of goals and structure of the operating system. Any fork that satisfied the other side would not even be in the same category of software.

rcarmo · on Sept 28, 2016

First they came for PID 1, and we were silent.

Oh, wait, we weren't. But although I don't miss the rickety init scripts, I've grown more and more worried about stuff like this, enough to ponder replacements (in Go or Python, which seem to be generally acceptable for systems software these days).

Ignoring for the moment the usual biases against Python, how feasible would it be, and how much support would there be, to build something to replace systemd (by having PID 1 do the minimum possible, in C, and having the rest done with Python)?

djsumdog · on Sept 28, 2016

I've been working at a Docker shop for a while and I have a feeling it will be the future for running services. It makes systemd somewhat irrelevant.

There are still a lot of issues I've had with Docker, which I won't go into. It also can't replace everything running on your Linux system. But it's a neat idea and big players use it in production for huge workloads.

Now the whole AWS-containers, docker-machine, swarm, mesos, coreos, kubernetes thing is a cluster fuck of choices .. but you have choices. And unlike running an Ansible script on some provider's base image (which is totally different from Rackspace, Linode, etc.), you are guaranteed your container will work on any docker engine with little modification (your orchestration files may change though).

I have a feeling, for better or worse, containers will be the future and will stick around long after systemd hopefully dies.

digi_owl · on Sept 28, 2016

Nah, systemd is angling to supplant docker for containers.

There is already a great deal of bad blood between the two because of the reworked cgroups management.

And there is also the likes of CoreOS that is basically systemd-the-distro.

Never mind that we have the likes of xdg-app/flatpak that is basically aiming for what you are thinking about, and it is systemd dependent.

fridsun · on Sept 29, 2016

It was quite amusing seeing systemd causing systemd-specific trouble in CoreOS development though, such as systemd requiring too much privilege when running inside container. Many people just use s6 happily.

digi_owl · on Sept 29, 2016

The schadenfreude is real.

wyldfire · on Sept 29, 2016

> The immediate question raised by this bug is what kind of quality assurance process would allow such a simple bug to exist for over two years (it was introduced in systemd 209). Isn't the empty string an obvious test case?

All bugs are terribly easy to spot in hindsight. I don't think it's fair to criticize this one in that fashion. Systemd has lots of pros and cons, but the age of this bug isn't interesting IMO. Its severity is, but that only speaks to the critical need to patch it ASAP.

milankragujevic · on Sept 28, 2016

Archived page, since it's not loading for me... http://archive.is/8xUSY

agwa · on Sept 28, 2016

My hosting provider thought I was under DDoS attack and started aggressively throttling port 443. I failed over to my backup provider, probably permanently.

mavdi · on Sept 28, 2016

How to crash a blog with one submission

SysArchitect · on Sept 28, 2016

The issue has a bug report on the systemd repo on Github: https://github.com/systemd/systemd/issues/4234

xemdetia · on Sept 28, 2016

At this point in time my biggest frustration is that even compared to upstart writing concise and clear config files for network services is just really hard, and there isn't great documentation or a short book around that covers all the edge cases other than TIAS. I feel like I've wasted more time trying to get a tidy systemd configuration than I ever had even with upstart or some of the other ones.

redbeard0x0a · on Sept 28, 2016

This article got me wondering if maybe the NSA or similar has somebody on the inside of the systemd project that is helping the project along.

Systemd has replaced a lot of init systems in the linux ecosystem and the general development practices of the project are leaving a huge footprint of code that might be exploited.

If they aren't involved, I'm sure they probably know a few different exploits for the system already.

pdkl95 · on Sept 28, 2016

If you want to go down that rabbit hole, see:

https://igurublog.wordpress.com/2014/02/17/biography-of-a-cy...

johnfjacobi · on Sept 28, 2016

I've heard this a lot and first just dismissed it as a conspiracy theory. Is there actually any substance to these thoughts, or are they only speculation?

belovedeagle · on Sept 28, 2016

This is an excellent writeup explaining without hysterics why systemd is terrible software. Unfortunately, I expect that this post will be brigaded by systemd-apologists anyways, denying newcomers to the debate the opportunity to hear the facts...

andrewstuart2 · on Sept 28, 2016

I'm not sure how this comment is helpful to the discussion. As far as I can tell, you'd rather only hear the facts that fit your opinion of systemd as terrible software. I think the obvious right answer is that both sides present only facts and let newcomers decide based on that.

belovedeagle · on Sept 28, 2016

As the other reply said, the problem is that newcomers aren't allowed to hear the anti-systemd facts, as the systemd gods have already decided that all Linux users (or at least all distros) must bow down and accept them, and no dissenting opinion is allowed to stay on the front page of HN for long (this post has already been brigaded off the front page, apparently).

fapjacks · on Sept 28, 2016

It could be the "controversial topic" penalty, which is extremely heavy-handed. But I also notice that both anti-systemd and pro-other-init-system posts and comments are barraged with downvotes on all kinds of forums, including HN. It is definitely something that certain people don't want out in the world.

fapjacks · on Sept 28, 2016

I completely agree with you. That would be nice, if people were presented with facts and a choice. But they weren't, and aren't.

andrewstuart2 · on Sept 28, 2016

Well, it's never too late to start. :-) As history shows, you can try to suppress truth for a while, but it is usually either unstoppable, or insignificant.

geofft · on Sept 28, 2016

I've been thinking for a bit that it's reasonably easy to write a PID 1 that reaps children and sends siginfo_t structs over a pipe to PID 2, which can be an init system as monolithic as you want.
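For what it's worth, the reaping half of that idea fits in a few lines. The following Python sketch is only an illustration of the shape, not a real init: a real one would be written in C, would run as PID 1, and would forward actual siginfo_t structs rather than the text lines used here as a stand-in. The function names are made up for this example.

```python
import os
import time

def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, exit_code) pairs; exit_code is negative if the
    child was killed by a signal.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        if os.WIFSIGNALED(status):
            code = -os.WTERMSIG(status)
        else:
            code = os.WEXITSTATUS(status)
        reaped.append((pid, code))
    return reaped

def forward(reaped, write_fd):
    """Send one 'pid exit_code' line per reaped child down a pipe."""
    for pid, code in reaped:
        os.write(write_fd, f"{pid} {code}\n".encode())

def demo():
    """Fork one child that exits with status 3, reap it, and relay the record."""
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        os._exit(3)  # child: exit immediately
    reaped = []
    for _ in range(500):  # poll for up to about five seconds
        reaped = reap_children()
        if reaped:
            break
        time.sleep(0.01)
    forward(reaped, write_fd)
    os.close(write_fd)
    message = os.read(read_fd, 4096).decode()
    os.close(read_fd)
    return child, message
```

Calling `demo()` returns the child's pid and the line that the "PID 2" end of the pipe would read. Everything interesting (deciding what to restart, and when) would live on the reading side.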

Of course, it would also be simple for the kernel to allow PID 1 to die/crash, and just restart it instead of panicking. I don't know why the kernel continues to treat PID 1 so specially—if the ewontfix.com PID 1 is sufficient, it's simpler for the kernel not to bother with the reparenting logic and stop treating PID 1 as a special process.

mschuster91 · on Sept 28, 2016

> I don't know why the kernel continues to treat PID 1 so specially

By design all processes inherit from PID 1 (directly or indirectly), so you always have an unbroken tree from every process back to PID 1. PID1 aka init is supposed to do bookkeeping (e.g. zombie processing), and if PID1 crashes the bookkeeping records are lost.

Because of this, it is better to fail hard, instead of allowing for corrupted process trees or similar.

Also, ps, top and basically every program that works with processes expects one root process for the tree. Breaking this would basically break compatibility with every existing userland (which the kernel cares the most about!)

justinsaccount · on Sept 28, 2016

> Breaking this would basically break compatibility with every existing userland

By "Breaking this" you mean "fixing the fact that pid 1 can't crash and restart"?

> Because of this, it is better to fail hard

It can't be fixed because it would "break compatibility" where that "compatibility" is that the machine should panic?

You're not really making a good case for why this can't be changed.

Why not

  * pid 1 exits
  * kernel re-executes init as a new pid 1
  * userspace continues on as if nothing happened.

geofft · on Sept 28, 2016

Right. If init opts into it, e.g. by setting some sysctl or something, the kernel changes behavior in the following way:

1. Processes that are reparented to init aren't actually considered children of init for the purposes of the wait() system call. Their ppid still becomes 1, so parent process walks work, but when they exit, the kernel immediately cleans them up.

Zombies still exist in this change, as long as their parent still exists; the only thing that changes is that processes (zombies or not) are not reparented to init when their parent dies. Direct children of pid 1 are not treated specially.

2. When init is about to exit or crash, the kernel turns it into a zombie. It then starts a new init process and reparents the old init to the new one, allowing it to do things depending on the cause of the crash. Direct children of the old init have lost their parent, and are treated like any other parentless process as described above.

3. If execing the new init fails, the kernel panics.

This would be completely backwards-compatible. (I might even argue against 3, to remove all worry about init crashing; since pid 1 is no longer required to do anything, if exec fails, the kernel can just leave pid 1 as a stuck process table entry, let the system continue running, and log something to dmesg so the sysadmin can diagnose the problem and reboot the machine cleanly.)

justinsaccount · on Sept 28, 2016

Sounds reasonable. Any ideas why no one has ever done this?

Kind of reminds me when the behavior of rm -rf / was changed in solaris.

I can't find the article, but basically someone pointed out that they couldn't change the behavior of rm -rf / because posix defined how it should work. Someone then pointed out that the spec says that trying to remove the parent directory (I think?) should fail, and since rm -rf / includes removing the parent directory, it was within the spec that it should fail.

JdeBP · on Oct 6, 2016

Lennart Poettering's response to this was interesting:

> The bug is caused by an error check that filters out garbage sent to PID 1. The check works correctly, except that the resulting action is a too harsh: instead of complaining and dropping it will abort the process.

* https://www.reddit.com/r/linux/comments/54yfcd/how_to_crash_...

In fact, the opposite was the case. The root of the problem was identified in the GitHub bug report:

* https://github.com/systemd/systemd/issues/4234#issuecomment-...

* https://github.com/systemd/systemd/commit/d875aa8ce10b458dc2...

What happened was that Lennart Poettering removed an error check that filtered out zero-length messages. This left the flow of control to fall through to a later point where an assertion, that had been earlier added by Lennart Poettering with the assumption that this check for zero was in place (as it had been at the time), then triggered.

Of course, no-one that I have seen has yet asked why "" as a command-line argument to systemd-notify results in a zero-length message in the first place. After all, according to the doco that would be length 1, a single terminating LF byte, not zero length. That said, the server still should be proof against zero-length network input.
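To illustrate what "proof against zero-length input" could look like, here is a hedged Python sketch of a notification parser that rejects bad datagrams instead of asserting on them. It is not systemd's code. It only assumes the documented sd_notify message shape of newline-separated KEY=VALUE assignments, and `parse_notification` is a hypothetical name.

```python
def parse_notification(datagram):
    """Parse an sd_notify-style datagram of newline-separated KEY=VALUE pairs.

    Returns a dict of assignments, or None if the message should be dropped.
    Malformed input is rejected; nothing here asserts or aborts.
    """
    if not datagram:
        return None  # zero-length message: complain and drop, do not abort
    try:
        text = datagram.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid text at all
    fields = {}
    for line in text.split("\n"):
        if not line:
            continue  # tolerate a trailing newline or blank lines
        key, separator, value = line.partition("=")
        if not separator or not key:
            return None  # not a KEY=VALUE assignment
        fields[key] = value
    return fields or None

print(parse_notification(b""))
print(parse_notification(b"READY=1\nSTATUS=ok\n"))
```

The design choice is that every branch for unexpected input ends in "return None", so the caller can log and carry on. An assertion is reserved for conditions the program itself guarantees, never for data that arrives from another process.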

aorth · on Sept 29, 2016

A CVE has been requested, albeit after publication of the bug:

http://www.openwall.com/lists/oss-security/2016/09/28/9

JdeBP · on Sept 29, 2016

Discussed on Hacker News at https://hackernews.hn/item?id=12600994 .

fridsun · on Sept 30, 2016

Lennart Poettering has replied on Reddit.

https://www.reddit.com/r/linux/comments/54yfcd/how_to_crash_...

> So let's summarize this. There's a bug in some software. OMG! Shock! This of course never happened before!

> The bug is in not exploitable remotely. The bug does not allow privilege escalation nor insertion of code. This of course makes the bug a massive vulnerability like there was no other on the planet ... ever. As bad as heartbleed multiplied by the Debian OpenSSL random generator bug to the power of 10.

> The bug is caused by an error check that filters out garbage sent to PID 1. The check works correctly, except that the resulting action is a too harsh: instead of complaining and dropping it will abort the process. Such a bug is of course unprecedented and the authors of said software should be stoned and flogged right away given the severity of the issue: after all a safety check worked a bit too well, and we really can't have that because undefined behaviour of course would be a lot better than a local DoS.

> The project the error was found in is large. Yet the number of CVEs collect so far is pretty small comparing it witht other projects of similar extent. Given that another bug was discovered now this obviously shows how incompetent the programmers are and that security is a unknown concept to them.

> The program the bug was found in is longer than 50 lines of code but runs with privileges, all written in a low-level programming language that many call little more than a fancy macro assembler. The code runs on top of an operating system kernel written in the same language but running with a lot higher privileges and consisting of exponentially more lines of code including drivers of questionable quality. This together is of course proof that the project at hand is flawed conceptually to its core.

> Dha!

> Lennart

> (More seriously: yes this is a bug, we should fix it. But it's very low impact and the brouhaha it generated appears wildly out of scale. If all bugs in the wider Open Source ecosystem would have a similarly low impact we'd live in a much much safer world!)

kokada · on Sept 28, 2016

You could simply do a DoS attack as a local user with a fork bomb too. It would be even shorter than the command shown in this blog post.

So while this is a bug, the author is overreacting just to make systemd look bad.

rascul · on Sept 28, 2016

Your fork bomb can be mitigated with ulimit.
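
For reference, `ulimit -u` in a shell adjusts the per-user process limit, which a program can also set through setrlimit(). A minimal sketch in C, assuming Linux (RLIMIT_NPROC is not in POSIX); the helper name `cap_process_limit` is made up for illustration:

```c
#include <sys/resource.h>

/* Hypothetical helper: lower the soft per-user process limit to at most
 * `max_procs`, the same knob that `ulimit -u` adjusts in a shell.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int cap_process_limit(rlim_t max_procs)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NPROC, &lim) < 0)
        return -1;

    /* Only ever lower the soft limit, so it never exceeds the hard limit. */
    if (max_procs < lim.rlim_cur)
        lim.rlim_cur = max_procs;

    return setrlimit(RLIMIT_NPROC, &lim);
}
```

The limit is inherited by child processes, so once it is set, fork() starts failing with EAGAIN when the user already has that many processes, instead of the fork bomb exhausting the machine.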

kokada · on Sept 29, 2016

I am not saying that this is not a bug, just that it isn't the problem the author is making it out to be. If you have local access, you can probably do other things if your only objective is a DoS.

reactor4 · on Sept 29, 2016

Your point is covered in the article.

"If you think systemd doesn't need privilege separation because it only parses messages from local users, keep in mind that in the Internet era, local attacks tend to acquire remote vectors. Consider Shellshock, or the presentation at this year's systemd conference which is titled "Talking to systemd from a Web Browser.""

5ilv3r · on Sept 28, 2016

Ran as a normal user on Debian jessie, and now ssh logins are delayed by 30 seconds. I don't see any other problems yet, but this is a VM host, so I'll leave it running for a while for fun. :)

aorth · on Sept 29, 2016

Ran it on an up to date Ubuntu 16.04 and lost the ability to run any systemctl commands like status, restart, etc. Web server and other system services were still working, but I couldn't `reboot` or `systemctl halt` etc... had to hard reboot the VPS from hosting console. :)

andreyv · on Sept 29, 2016

"systemctl reboot --force --force" should work in such cases. The reboot will be executed by systemctl itself.

JdeBP · on Sept 30, 2016

It's interesting to see a "hacker" discussion of this that entirely misses several topics that one would expect programmers to at least mention:

* Whether release builds should have assert() enabled.

* Whether a process #1 program should use assertions at all.

* Whether assert() is the right mechanism to use for input validation.

* How a process #1 program should recover from assertion failures.
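
To illustrate the first and third points, a rough sketch in C. The function names are invented for this example and this is not systemd's code; it only contrasts the two styles of handling a message length:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Style 1: the length is treated as an invariant. With assertions enabled,
 * n == 0 calls abort(); compiled with -DNDEBUG the check vanishes and the
 * code below runs on input it was never written for. */
int handle_with_assert(const char *buf, size_t n)
{
    assert(buf != NULL);
    assert(n > 0);
    /* ... parse buf[0..n) ... */
    return 0;
}

/* Style 2: the length is treated as untrusted input. Bad messages are
 * rejected with an error code and the caller carries on. */
int handle_with_check(const char *buf, size_t n)
{
    if (buf == NULL || n == 0)
        return -EINVAL;
    /* ... parse buf[0..n) ... */
    return 0;
}
```

Neither outcome of the first style is attractive for data that arrives from other processes: the program either aborts or continues past a check that is no longer there. The second style behaves the same in debug and release builds.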

xaduha · on Sept 28, 2016

Don't really care about that bug, it will be fixed I'm sure. But thank you for this post anyway, the word LLMNR caught my eye, very nice to have if it works.

EDIT: works fine with Windows, without installing Bonjour/iTunes.

tcoppi · on Sept 28, 2016

> "Systemd is defective by design."

The most succinct explanation of Systemd possible.

mSparks · on Sept 28, 2016

I've complained a few times that systemd haters haven't explained the hate.

Having read this post, I'd say a slightly less succinct but infinitely more informative version would be

> "Systemd is defective by design because they totally disregard the fail-safe by design rules"

qwertyuiop924 · on Sept 28, 2016

Yes. I wish people would explain their hatred. I understand that there are valid reasons to hate systemd, which is why I avoid it like the plague. But so many people get sucked in by the initial usefulness, and don't see the problems lurking beneath the surface.

fapjacks · on Sept 28, 2016

Yes, a large part of which is the process behind systemd. You're adopting more than just software when you adopt software, and I think investigating what the systemd developers are like is a worthy exercise -- in particular how they approach bugs. I think most people would be surprised to learn what that is like.

qwertyuiop924 · on Sept 28, 2016

Ah, yes: When is a bug not a bug? According to the systemd developers, it's when we don't encounter it on our personal machines.

fapjacks · on Sept 28, 2016

... And also when they can pass the buck to kernel developers, even when it's not a kernel bug.

digi_owl · on Sept 28, 2016

Reminds me that I ran into an email chain a while back where the kernel devs pondered forking or taking over udev development because they were tired of power games from the maintainer. This was from back before udev became part of systemd, btw.

qwertyuiop924 · on Sept 28, 2016

Greg-KH is both one of the original developers on udev and a major force behind systemd. He maintains -stable on the linux kernel.

So either the kernel devs did take over udev development...

digi_owl · on Sept 28, 2016

I do wonder if the kernel will split between GregKH and Ts'o once Torvalds steps down.

BTW GKH was oddly absent in that thread, and the kernel devs' ire was aimed at Sievers...

qwertyuiop924 · on Sept 28, 2016

Well, GKH is One Of Them. He's less likely to attract their ire, and if he does, at the end of the day, they have to answer to him.

But in a fight between GKH and Ts'o, I'll take Ts'o. /dev/random's ...interesting design aside, I frankly trust Ts'o far more than I trust GKH, who backs LP, a person I emphatically do not trust, in a lot of things.

5ilv3r · on Sept 28, 2016

Binary-only logs were enough for me to hate it.

qwertyuiop924 · on Sept 28, 2016

That's it? Man that's just the start...

bitmapbrother · on Sept 28, 2016

>Despite the banality, the bug is serious, as it allows any local user to trivially perform a denial-of-service attack against a critical system component.

Is it really that serious of a bug? You have to be a local user in order to execute it. Are other OSes somehow resilient to a local user's attempts to degrade the OS?

>The Linux ecosystem has fallen behind other operating systems in writing secure and robust software. While Microsoft was hardening Windows and Apple was developing iOS, open source software became complacent.

He must be unaware of the numerous patches Microsoft and Apple issue on a regular basis to fix their "secure and robust software".

falcolas · on Sept 28, 2016

Any remote code execution bug can exploit this. Normally, a remote code execution bug in a process running as a non-root user isn't a huge issue, because the unprivileged account effectively sandboxes the attacker. This bug bypasses all of those benefits.

Containers may prove to be a sufficient sandbox, assuming this is not executable against kdbus (which is shared across containers since it's in the kernel).

packetized · on Sept 28, 2016

Erm, this does not actually seem to work on Centos7, running systemd-219-19.el7_2.13. Even tried it as root inside of a sandbox - nothing.

agwa · on Sept 28, 2016

Hmmm... I've repro'd on v215, v229, and v230, although only on Debian and Ubuntu. The bug has been in systemd upstream since v209 as far as I can tell by reading the code.

SysArchitect · on Sept 28, 2016

  while true; do NOTIFY_SOCKET=/run/systemd/notify systemd-notify ""; done

Try this. As soon as you loop it, the bug is hit pretty quickly.

nilved · on Sept 28, 2016

I don't understand why that makes a difference if the bug is `assert(n > 0)`

JdeBP · on Sept 29, 2016

There is a race in the systemd-notify mechanism that means that some notification messages get discarded if the systemd-notify process happens to complete its work and exit before systemd picks up the message. They never reach the point of the assertion that is triggered here, because systemd is unable to determine the service unit that encompasses the message sender.

* http://jdebp.eu./FGA/unix-daemon-readiness-protocol-problems...

SysArchitect · on Sept 28, 2016

I have no idea either. Race condition somewhere? It was mentioned here: https://github.com/systemd/systemd/issues/4234

lake99 · on Sept 28, 2016

It's the same on Archlinux, systemd-231. Nothing happened. I was able to start and stop a couple of services as quickly as before.

walter_bishop · on Sept 28, 2016

Doesn't work here on Ubuntu 16.04 xenial

Twirrim · on Sept 28, 2016

I've put a comment on the GitHub bug with an updated PoC. It does work on 16.04, but you seem to have to do it over and over until it triggers. A working PoC seems to be:

  while true; do NOTIFY_SOCKET=/run/systemd/notify systemd-notify ""; done

zzxxmm11aajj · on Sept 28, 2016

Note: 112 comments from the peanut gallery before someone looked at the code enough to submit the trivial fix.