Hacker News | past | comments | ask | show | jobs | submit | webvictim's comments

I genuinely thought the same thing. I opened my MBP and it was sluggish, felt like it was dead. Browser wouldn't load, Zoom wouldn't load, I rebooted and the same problems persisted. I honestly thought the hardware was giving out.

I almost cannot believe the actual cause. Absolutely awful experience.


The problems are very real if you work at any large organisation which has compliance requirements.


Setting all of that infrastructure up and subsequently maintaining it involves a considerable amount of time and knowledge. Some people just want a solution that's easy to deploy and that handles all of the management for you.


You went through all the effort of having a manually configured fleet of machines and the thing that's daunting is solved with this?


Author here. My take on this is that fail-closed is a vastly better security model than fail-open. I am genuinely surprised that OpenSSH's default is to issue certificates with no expiry date.

If you have a certificate which expires within a day by default then an unsuccessful revocation is no longer a huge cause of stress. In the worst case, you lock down access to your bastions and disallow issuing any future certificates for that user. Within a day, any potential threat from that certificate has vanished. This seems preferable to having a mandatory requirement of an up-to-date revocation database which is synced everywhere.
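To make the one-day-expiry idea concrete, here's a minimal sketch using stock OpenSSH tooling; the key paths, the `alice` identity and the `deploy` principal are all illustrative, not from the comment:

```shell
set -eu
tmp=$(mktemp -d)
# Throwaway CA keypair and user keypair, purely for illustration.
ssh-keygen -q -t ed25519 -N '' -f "$tmp/ca"
ssh-keygen -q -t ed25519 -N '' -f "$tmp/id_user"
# Sign the user's public key; -V +1d limits validity to the next 24 hours.
ssh-keygen -q -s "$tmp/ca" -I alice -n deploy -V +1d "$tmp/id_user.pub"
# Inspect the cert: the "Valid:" line shows the expiry window sshd will enforce.
ssh-keygen -L -f "$tmp/id_user-cert.pub"
```

Once the window passes, sshd rejects the certificate on its own; no revocation data needs to reach the servers.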


Just to make sure I understand your example, for users who require regular access you re-issue certificates daily? I could see that being useful for a "one off" type thing (i.e. you want to temporarily grant access for one day) but how does that help regular users?

I'm also not sure it's easier to "lock down access to your bastions" and wait out the certificate expiration instead of having a certificate revocation database. Although OpenSSH does not provide a mechanism to distribute the revocation list, it seems trivial to add a certificate to the revocation list and distribute it in an automated fashion.
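OpenSSH's Key Revocation List (KRL) format does make that step fairly concrete; a hedged sketch (file paths illustrative):

```shell
set -eu
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/id_user"
# Create a KRL containing the compromised public key.
ssh-keygen -k -f "$tmp/revoked_keys" "$tmp/id_user.pub"
# Check a key against the KRL; revoked keys are reported explicitly.
ssh-keygen -Q -f "$tmp/revoked_keys" "$tmp/id_user.pub" 2>&1 || true
# Distribution is up to you: push the file to every server and point sshd
# at it with "RevokedKeys /etc/ssh/revoked_keys" in /etc/ssh/sshd_config.
```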

Lastly, since you have to both lock down hosts and wait out the expiration, does that not constitute a fail-open system? I really don't think an expiration date mechanism makes this a fail closed system. Either method requires manual intervention upon compromise.


Yes, even for very regular users I would recommend setting up a process requiring users to get a new certificate on a daily basis with a short validity period. You can automate a lot of this and make it a simple one-command process to get a new certificate - even something like a simple shell script called by ProxyCommand is a good habit to get into. In bigger organisations you'd likely want to centralise this process somehow or institute other tooling.
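As a sketch of what such a one-command habit might look like, the wrapper below checks whether the local certificate is still within its validity window before connecting; the actual re-issuance step is site-specific, so it's only a placeholder here, and all names are illustrative:

```shell
set -eu
tmp=$(mktemp -d)
# Throwaway CA and user keys so the sketch is self-contained.
ssh-keygen -q -t ed25519 -N '' -f "$tmp/ca"
ssh-keygen -q -t ed25519 -N '' -f "$tmp/id_user"
ssh-keygen -q -s "$tmp/ca" -I alice -V +1d "$tmp/id_user.pub"
cert="$tmp/id_user-cert.pub"

# Parse the expiry timestamp from the cert and compare it to "now";
# ISO-8601 timestamps sort lexically, so `sort` doubles as a comparator.
end=$(ssh-keygen -L -f "$cert" | awk '/Valid:/ {print $NF}')
now=$(date +%Y-%m-%dT%H:%M:%S)
if [ "$(printf '%s\n%s\n' "$now" "$end" | sort | head -n 1)" = "$now" ]; then
    echo "cert still valid until $end"
else
    echo "cert expired; request a new one here (site-specific CA call)"
fi
```

Hooked up via ProxyCommand, a script like this runs transparently on every connection, which is what makes the daily re-issue cheap for users.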

The overarching reason isn't really a question of "helping users" as such, although I would strongly encourage making the certificate issuing process as quick and easy as possible to encourage adoption and reduce pushback. The people it really helps are security teams and organisations as a whole who can now have more confidence that they haven't left holes in their infrastructure which can be exploited by bad actors. It also checks a lot of boxes for auditing, compliance and reporting purposes which are huge positives in a corporate environment. If you're able to say "yes, disgruntled former employee X had a certificate that would have given them access to all these servers, but it expired three days ago" then that's a lot better than saying "X has a certificate that gives them access to all our servers, but we _think_ we've blocked it from being used everywhere".

Overall, I agree that the model does lend itself better to things like access to critical production infrastructure (where access should be the exception rather than the rule), but in my opinion it's a good practice to get into for access to everything. The ability to log that a certain user requested a certificate at a certain time and then link that to exactly where the certificate was used (via centralised logging, for example) is incredibly powerful.

You're perhaps correct that both do constitute fail-open systems at first. The difference is in the vulnerability period - with an expiring certificate, that ends at a fixed point in the future. With a certificate that has no expiry, that period never ends until such time as you rotate your CA and force everyone to get a new certificate - something which is also far less of a burden when your certificates expire every day by default and you have a process for getting a new one, incidentally.


I appreciate your detailed response but I think we'll just have to agree to disagree here. My personal opinion is that there isn't any value in this arbitrary temporal benchmark for certificates expiring. When a certificate is compromised, or needs to be revoked, it needs to be revoked immediately. At that point, you're trusting the same mechanisms to remove access in either system. An auditor is going to be interested in the period between the user having access and that access being revoked. The fact that the key expires later on (even within just hours) is irrelevant, as it's after revocation and the key is already invalid. Anything less provides the bad actor with plenty of time to do something malicious. The example you give in quotes would be immediately followed with "Okay, but how did you disable that access immediately?"

You could make keys valid for only a minute and it wouldn't add any security, as only seconds are needed for a malicious action to take place.


Author here - yes, this is why. I looked into Ed25519 and while there are a lot of great reasons to use it (such as a shorter key footprint and it being much quicker on mobile devices), RSA is still more widely supported and has more documentation/examples available. ECDSA was an option too but doesn't provide the same benefits as Ed25519 would.
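The "shorter key footprint" point is easy to see by generating one of each; key names are illustrative:

```shell
set -eu
tmp=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N '' -f "$tmp/id_rsa"     # widely supported
ssh-keygen -q -t ed25519 -N '' -f "$tmp/id_ed25519"     # smaller and faster
# The Ed25519 public key is an order of magnitude smaller than RSA-4096.
wc -c "$tmp/id_rsa.pub" "$tmp/id_ed25519.pub"
```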


Don't get me wrong, using AuthorizedKeysCommand is a lot better than having a static ~/.ssh/authorized_keys file on a server, but it isn't anything like as powerful as using user certificates.

Certificates can do a lot more than authorized keys can, like enforcing the use of specific principals, commands and options and embedding that information into the file itself without needing to modify each server's SSH configuration. They're also self-contained and will still work in situations where some external service providing a list of keys goes down. I've been on the rough side of a huge LDAP outage which prevented necessary access to the infrastructure to fix it, and it was a horrible experience. There's none of that problem with certificates as long as you make sure you have one which is currently valid.
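A sketch of what "embedding that information into the file itself" looks like in practice; the principals, forced command and network range are all made up for illustration:

```shell
set -eu
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$tmp/ca"
ssh-keygen -q -t ed25519 -N '' -f "$tmp/id_user"
# Restrictions live inside the certificate: allowed principals (-n), a
# forced command and a source-address limit (-O). No per-server sshd
# configuration is needed for any of this to be enforced.
ssh-keygen -q -s "$tmp/ca" -I alice -n deploy,backup \
    -O force-command="/usr/bin/rsync --server" \
    -O source-address=10.0.0.0/8 \
    -V +1d "$tmp/id_user.pub"
# The restrictions are visible when inspecting the certificate.
ssh-keygen -L -f "$tmp/id_user-cert.pub"
```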

I'm also generally of the opinion that it's safer to enforce the use of authentication which expires by default rather than relying on some external process to do that for you.


But AuthorizedKeysCommand and certs are at least equally powerful because they're both ways of specifying the content of the same authorized_keys file.


It's something of an implementation detail - you don't generally specify the usage of certs on a user-by-user level, you do it by trusting the entire CA in /etc/ssh/sshd_config and then using the signed content of the individual cert (expiry date, principals etc) to dictate whether someone should be allowed to get access or not.
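For concreteness, the server side of that trust is a couple of lines of sshd configuration; the paths here are illustrative:

```
# /etc/ssh/sshd_config (illustrative): trust every cert signed by this CA
TrustedUserCAKeys /etc/ssh/user_ca.pub

# Optionally map which certificate principals each local account accepts:
AuthorizedPrincipalsFile /etc/ssh/principals/%u
```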

Look at it in terms of building in a decision at compile-time rather than at runtime. With AuthorizedKeysCommand, you're running something just-in-time on an SSH login to determine whether something should be allowed to proceed. With a CA and a process for issuing certificates, that decision is made at the time the cert is issued and then the cert is good for the duration it's issued for. It's entirely self-contained as sshd itself is making the decision about whether the cert is within its validity period or not.

It's obviously a decision that people can make based on their own infrastructure, but my opinion is that the compile-time model is more reliable as it's a fully self-contained system and doesn't rely on an entire fleet of servers being able to connect back to an external service at runtime to determine whether you should be allowed to log in. That sort of thing invariably comes back to bite you when you really _need_ to be able to log in and you can't because the external service is down.


Having been on the rough end of this during a huge LDAP outage, I can confirm that LDAP is great until such time as it isn't.


+1 this is no fun.


This is definitely the premise of what I was going for with the post. I'm a firm believer in the idea that short-lived certificates which expire by default are one of the best ways to provide access to infrastructure, and enforcing that access comes from a limited list of bastions gives you an easy choke point to withdraw access as desired when you need to.
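On the client side, funnelling everything through the bastion choke point is a small piece of ssh_config; hostnames and paths here are illustrative:

```
# ~/.ssh/config (illustrative): all production access goes via the bastion
Host *.prod.example.com
    ProxyJump bastion.example.com
    CertificateFile ~/.ssh/id_ed25519-cert.pub
    IdentityFile ~/.ssh/id_ed25519
```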


Isn't there a Netflix SSH CA that does this?



Author here - thanks for the feedback. As another reply points out, I did try to also cover the use of a bastion host along with one form of 2-factor authentication.

I'm considering doing a future post on how to set up U2F for SSH with hardware devices (like a Yubikey) as well. I'm curious if you have anything else you'd like to see on this topic.


Author here. If you specify an IdentityFile then that’ll be tried first (as an explicit identity) but if that doesn’t work then by default, ssh-agent identities will be tried sequentially afterwards. IdentitiesOnly suppresses that behaviour.
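In ssh_config terms, that suppression looks like this (host and key name illustrative):

```
# ~/.ssh/config (illustrative)
Host example.com
    IdentityFile ~/.ssh/id_special
    # Only offer the explicit identity above; don't fall back to trying
    # every identity loaded into ssh-agent one by one.
    IdentitiesOnly yes
```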

