Key pairs are all right, but what about certificate-based auth? Anybody use that? I set up Teleport on my home cluster once; it was smooth but (I think?) required all sessions to be proxied through a public central server. I wonder whether there’s a way to combine its cert management capabilities with point-to-point SSH sessions like the ones Tailscale enables.
Edit: reading the docs I think proxying is only necessary for reverse tunneling when nodes are behind NAT or firewall, which Tailscale will take care of. Maybe time to set this up again and check; the free VM you can get on Oracle cloud seems like a decent option for hosting the public Teleport cluster management server. Heck, maybe I’ll set up my own Tailscale cluster management server on the same box!
I set up certificate-based auth for a team a couple of years ago and it generally worked pretty well. We didn’t have a great system in place to manage revocation lists, but had a good-enough workaround. Before doing this, the situation was:
- Dev systems generally just had a common username and password that everyone knew
- Prod systems had strong passwords that no one knew and manually-managed authorized_keys files
The issue that remained when we moved to certs without a good revocation list strategy was that, upon termination, we didn’t have a good way to revoke the ex-employee’s long-term dev certificate. We punted on solving that problem in the short term, reasoning that it was no worse than having to roll passwords on the dev systems every time someone quit.
On the prod side, devs very rarely needed access to any prod infrastructure, but they would occasionally need it. Certificates were awesome for that; we could issue them a certificate that only gave them access to a particular set of prod machines for a very limited timespan (e.g. 24 hours). No passwords ever had to be revealed to the devs, nor did we have to worry about accidentally leaving something in the authorized_keys files down the road.
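For anyone who wants to try the short-lived cert idea, here is a rough sketch of what the signing step could look like with plain OpenSSH. It only builds the `ssh-keygen` command line rather than running it. The file names, key ID, and the `prod-db` principal are made up for illustration, and it assumes the prod hosts trust the CA (e.g. via `TrustedUserCAKeys`) and map principals to accounts however you prefer; this is not necessarily how the setup described above did it.

```python
import shlex


def build_sign_command(ca_key, user_pubkey, key_id, principals,
                       validity="+24h", serial=1):
    """Return the ssh-keygen argv that signs user_pubkey with the CA key.

    All paths and names passed in are placeholders chosen by the caller.
    """
    if not principals:
        # Without -n the certificate is not scoped to specific principals,
        # which defeats the purpose of limited prod access.
        raise ValueError("refusing to sign a certificate with no principals")
    return [
        "ssh-keygen",
        "-s", ca_key,                # CA private key used for signing
        "-I", key_id,                # key identity, shows up in sshd logs
        "-n", ",".join(principals),  # principals the cert is valid for
        "-V", validity,              # validity interval, e.g. +24h
        "-z", str(serial),           # serial number, handy for revocation
        user_pubkey,                 # public key to sign
    ]


# Hypothetical example: 24-hour access for one dev to one group of hosts.
cmd = build_sign_command(
    "ca/user_ca", "alice_id_ed25519.pub",
    key_id="alice-prod-db", principals=["prod-db"], serial=42,
)
print(shlex.join(cmd))
```

Giving each certificate a distinct serial and key ID costs nothing at signing time and makes later revocation much easier.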
It wasn’t perfect, but it worked quite a bit better than how it had been running previously. I would have likely gotten a revocation solution put together, but I ended up leaving the company before then.
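On the revocation gap: OpenSSH has key revocation lists (KRLs) that `ssh-keygen -k` can generate and that sshd checks via the `RevokedKeys` option. Below is a minimal sketch of producing a KRL spec and the matching command, assuming certificates were issued with known serials or key IDs. The paths and IDs are hypothetical, and getting the resulting file out to every host is left open, which was probably the hard part anyway.

```python
def krl_spec(serials=(), key_ids=()):
    """Build the text of a KRL specification file.

    Each line revokes certificates by serial number or by key ID.
    """
    lines = [f"serial: {s}" for s in serials]
    lines += [f"id: {k}" for k in key_ids]
    return "".join(line + "\n" for line in lines)


def build_krl_command(krl_path, ca_pubkey, spec_path, update=False):
    """Return the ssh-keygen argv that writes (or updates) a KRL."""
    cmd = ["ssh-keygen", "-k", "-f", krl_path]
    if update:
        # -u adds to an existing KRL instead of creating a new one.
        cmd.append("-u")
    # -s names the CA public key that the revoked serials/IDs belong to.
    cmd += ["-s", ca_pubkey, spec_path]
    return cmd


# Hypothetical example: revoke one cert by serial and one by key ID.
spec = krl_spec(serials=[42], key_ids=["alice-dev-longterm"])
print(spec, end="")
print(" ".join(build_krl_command("revoked_keys.krl", "ca/user_ca.pub",
                                 "revoked.spec")))
```

Each host would then point `RevokedKeys` in sshd_config at its copy of the KRL file.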
The issue that remained when we moved to certs without a good revocation list strategy was that, upon termination, we didn’t have a good way to revoke the ex-employee’s long-term dev certificate…. I would have likely gotten a revocation solution put together, but I ended up leaving the company before then.
Ironically, you could still go back and fix that for them ;)
Bahahahaha, that’s probably true! I didn’t take my SSH key with me when I left, but probably have a copy of the signing root certificate somewhere…
The downside of this is that someone who compromises your system has a list of the servers that your key will work with. OpenSSH can hash the server names in the known hosts file (the HashKnownHosts option) specifically to prevent this kind of attack, because it’s trivially wormable: compromise one machine, harvest the ssh keys and known hosts file, then log into every machine listed and repeat. Encrypted ssh keys only slow things down slightly, because an attacker who gets account-level access can easily install a Trojan that captures your password, or just attach a debugger to ssh-agent and dump the key. (If ssh-agent is running, they can also simply use it to mount an online attack on the listed machines and install new authorised keys.)
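For reference, the hashed known_hosts format is simple enough to reproduce: each entry is `|1|salt|hash`, where the hash is HMAC-SHA1 of the host name keyed with a random salt, both base64-encoded. A minimal sketch follows; the host names are made up. It also shows that hashing only hides names from someone reading the file, since anyone holding the file can still test guesses against it.

```python
import base64
import hashlib
import hmac
import os


def hash_known_host(hostname, salt=None):
    """Return a hashed known_hosts host field: |1|base64(salt)|base64(hmac)."""
    if salt is None:
        salt = os.urandom(20)  # random per-entry salt
    digest = hmac.new(salt, hostname.encode("utf-8"), hashlib.sha1).digest()
    return "|1|{}|{}".format(
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    )


def matches(hashed_entry, candidate):
    """Check whether a candidate host name corresponds to a hashed entry."""
    _, magic, salt_b64, digest_b64 = hashed_entry.split("|")
    if magic != "1":
        raise ValueError("unknown hash format")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    actual = hmac.new(salt, candidate.encode("utf-8"), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)


entry = hash_known_host("prod-db-1.example.com")
print(entry)
print(matches(entry, "prod-db-1.example.com"))  # the right guess matches
print(matches(entry, "dev-box.example.com"))    # a wrong guess does not
```

Key file names derived from host names leak the same information in the clear, which is the point being made above.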
Oh good point. I’m using the host-based file name scheme and had not thought of it.
Would be cool if OpenSSH supported naming key files after the hashed host name, maybe?
Using -sk (U2F/FIDO) SSH keys also fixes this, although since best practice is to have a second key generated with a backup U2F element, you end up managing a lot of keys with this one-key-per-host strategy.
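To put a number on "a lot of keys", here is a small sketch that lists the `ssh-keygen` invocations you would need for one `ed25519-sk` key per host per token. The host names, token labels, and file naming scheme are invented for the example, and each command has to be run with the matching token plugged in.

```python
def sk_keygen_commands(hosts, tokens, key_dir="~/.ssh"):
    """Return one ssh-keygen argv per (host, token) pair."""
    commands = []
    for host in hosts:
        for token in tokens:
            commands.append([
                "ssh-keygen",
                "-t", "ed25519-sk",                      # FIDO-backed key type
                "-f", f"{key_dir}/id_{host}_{token}",    # hypothetical naming scheme
                "-C", f"{host} ({token})",               # comment to tell keys apart
            ])
    return commands


hosts = ["alpha", "beta", "gamma"]
tokens = ["primary", "backup"]
cmds = sk_keygen_commands(hosts, tokens)
print(len(cmds), "keys to generate and track")
for c in cmds:
    print(" ".join(c))
```

The count grows as hosts times tokens, so three hosts with a primary and a backup token already means six key files.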