ssh – Multi-user/multi-service authentication + HSM as a key signing/encrypting key?


I’m looking to implement a multi-user authentication environment for a small (11 people) but growing team, covering a reasonable number (currently 500+) of managed devices and services (routers, firewalls, Linux cloud instances, on-prem physical servers etc.). I’m struggling to understand where and how to anchor the root(s) of trust for a large amount of unique key material in a way that achieves compartmentalisation, particularly as the number of services, devices and users grows, and how to tie it back to a control system for validating and revoking those keys.

This is mostly about infrastructure, so SSH, VPN tunnels etc., rather than web apps with built-in authentication via single sign-on/AD integration. That said, I’m interested in how a solution might also provide authorisation for HTTPS web interface sign-in (obviously subject to whatever the specific app/service/site supports for its own authentication integration: SAML/TACACS+/RADIUS/LDAP/AD etc.). Perhaps that can be done by:

  • tying a key to a user in LDAP/AD/RADIUS/TACACS+?
  • tying a certificate to a user and presenting a signed key and certificate?

I’m leaning towards an on-premises, centralised system using a bastion host (or something to that effect). If, however, there are good suggestions for a distributed, largely decentralised ‘local to the user’ solution, I’m all ears. Either way, we need to maintain access control securely: even if a key is known to a host, we need the ability to invalidate it as a login credential (ideally in real time). So we must either reclaim users’ keys or have a way of stopping any keys they retain from being valid login credentials.

There are two problems here as I see it:

  1. Secure generation and use of keys when there are lots of them.
  2. Externally validated and centrally administered access control, based on those user keys, to control valid logins over time.

Key security:

It would be nice for each user to have their own key (so a key is tied to an identity), and for that user to use a unique key for each service on each system, so that a compromise affects only one service/system…hopefully. This obviously requires a lot of keys, and some sort of key agent to help the user manage them all.
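
For the "agent that selects the right key" part, the stock OpenSSH client can already pin one dedicated key per host if each host gets its own `Host` block. A minimal Python sketch that renders such blocks, assuming keys live as one file per host; the `HostKey` type, aliases and paths are hypothetical, while `HostName`, `User`, `IdentityFile`, `IdentitiesOnly` and `ProxyJump` are real `ssh_config` options (the optional `ProxyJump` covers the bastion-host variant).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostKey:
    """One managed host and the dedicated key a user holds for it (hypothetical)."""
    alias: str          # name the user types, e.g. "fw-core-01"
    hostname: str       # real DNS name or IP of the device
    user: str           # remote login name
    identity_file: str  # path to the private key (or FIDO key handle) for this host


def render_ssh_config(entries: list[HostKey], proxy_jump: Optional[str] = None) -> str:
    """Render OpenSSH client 'Host' blocks that pin one key per host.

    IdentitiesOnly yes stops the client from offering every identity the
    agent holds; only the configured IdentityFile is offered to that host.
    """
    blocks = []
    for e in entries:
        lines = [
            f"Host {e.alias}",
            f"    HostName {e.hostname}",
            f"    User {e.user}",
            f"    IdentityFile {e.identity_file}",
            "    IdentitiesOnly yes",
        ]
        if proxy_jump:
            # Route the connection through the bastion host.
            lines.append(f"    ProxyJump {proxy_jump}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Pinning keys this way also means a server never sees the other public keys a user holds, which helps the compartmentalisation goal.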

A nice answer would be an HSM per user that can hold an arsenal of keys, tied to an agent that automatically selects the right one. However, low-cost USB HSMs (YubiKey/Nitrokey etc.) seem to have a very limited number of slots for storing keys. Is it valid to try to extend this so the HSM can authenticate (somewhat indirectly) to more services by storing keys externally, but making them usable only via the HSM?

i.e. the HSM acts as a master key: it generates keys and exports them encrypted, a software agent stores the encrypted keys, and the agent passes them back to the HSM for decryption when needed for login?
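
What this describes is close to how FIDO/U2F security keys already work, and OpenSSH supports those through the `ecdsa-sk` and `ed25519-sk` key types: for non-resident keys, the file on disk holds a key handle that is useless without the token that created it, so the key count is not bounded by slots (11 users × 500+ devices is already 5,500+ keys at one key per user per device). The real handle format is vendor-specific. The sketch below models one common scheme (HMAC-based derivation plus a MAC'd handle) with a software stand-in for the token; the class and method names are hypothetical, and a real token would sign a challenge internally rather than ever releasing the seed.

```python
import hashlib
import hmac
import secrets


class SoftTokenStandIn:
    """Software stand-in for a USB token; NOT secure, for illustration only.

    The only long-term secret is the device secret, which on real hardware
    never leaves the token. Per-service keys are not stored anywhere: they
    are re-derived on demand from the secret plus a key handle kept on disk.
    """

    def __init__(self) -> None:
        self._device_secret = secrets.token_bytes(32)

    def _derive(self, service: bytes, nonce: bytes) -> bytes:
        # 32-byte per-service key seed = HMAC(device secret, service || nonce)
        return hmac.new(self._device_secret, service + nonce, hashlib.sha256).digest()

    def _tag(self, service: bytes, seed: bytes) -> bytes:
        # MAC binding the handle to this device and this service
        return hmac.new(self._device_secret, service + seed, hashlib.sha256).digest()

    def new_credential(self, service: str) -> bytes:
        """Create a credential; returns the 64-byte key handle the agent stores."""
        svc = service.encode()
        nonce = secrets.token_bytes(32)
        seed = self._derive(svc, nonce)
        return nonce + self._tag(svc, seed)

    def use_credential(self, service: str, handle: bytes) -> bytes:
        """Re-derive the seed, rejecting handles from another device or service."""
        if len(handle) != 64:
            raise ValueError("malformed key handle")
        svc = service.encode()
        nonce, tag = handle[:32], handle[32:]
        seed = self._derive(svc, nonce)
        if not hmac.compare_digest(tag, self._tag(svc, seed)):
            raise ValueError("key handle not valid for this device/service")
        return seed
```

On the EDIT point below: OpenSSH requires a physical touch for each `-sk` signature unless the key was generated with `-O no-touch-required`, which limits (but does not eliminate) silent misuse from a compromised workstation while the token is plugged in.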

Similarly, with a central bastion host that users log in to, the bastion would still need to hold a heap of keys. Is there any reason this approach would be unwise there?

EDIT:
I suppose the other obvious point is that if a user’s machine is compromised, the keys are open to misuse by the malicious actor until the keys can no longer reach the HSM (i.e. it is removed, or the attacker tries to move the keys elsewhere). Is that a fair statement?

Key administration – validation/revocation:

I suspect this is going to involve PKI and some sort of online CRL…

Assuming there is a good solution to generating and storing lots of unique keys, what is the best way to provide a separately managed validation server for those keys?

The granularity only needs to be basic authorisation, i.e. this user can log in to this server/service: yes or no.

It seems that, in order to scale, this would require a PKI root of trust and certificates associated with users, either as a ‘signed key’ or as a separate traditional certificate that must also be presented and validated.
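
OpenSSH's built-in certificates are one concrete form of the ‘signed key’ option: a CA signs a user's public key together with an identity, a set of principals and a validity window, and hosts trust the CA rather than individual keys. A minimal Python sketch that builds the `ssh-keygen` signing command for a short-lived certificate; the function name, paths and principals are hypothetical, while `-s`, `-I`, `-n` and `-V` are real `ssh-keygen` options.

```python
def build_user_cert_command(ca_key: str, user_pubkey: str, key_id: str,
                            principals: list[str], hours: int = 8) -> list[str]:
    """Build an ssh-keygen invocation that signs a short-lived user certificate.

    ssh-keygen writes the certificate next to the public key as
    <name>-cert.pub. Principals are the login names/roles the cert allows.
    """
    if not principals:
        raise ValueError("at least one principal is required")
    if hours <= 0:
        raise ValueError("validity must be positive")
    return [
        "ssh-keygen",
        "-s", ca_key,                # CA private key used to sign
        "-I", key_id,                # key identity, shown in sshd logs
        "-n", ",".join(principals),  # comma-separated principals
        "-V", f"+{hours}h",          # valid from now until now + hours
        user_pubkey,                 # public key being certified
    ]
```

The list can be passed straight to `subprocess.run(cmd, check=True)` on the signing host. On managed hosts, `TrustedUserCAKeys` in `sshd_config` points sshd at the CA public key and `RevokedKeys` at a key revocation list built with `ssh-keygen -k`; combined with short validity, that covers much of the CRL role without a network call per login, though revocation is only as fresh as the last revocation-list push.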

In my mind, the user would authenticate once a day (LDAP or similar), and the server would then validate their cert/key for, say, 8 or 12 hours (but an admin could remove that validation at any time, nullifying login attempts from that point on). When a login request hits a managed device, the device would query that server over a secure connection to check authorisation for the cert/key and allow or deny the login accordingly.
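
The flow in this paragraph can be modelled as a small grant store. A minimal sketch in Python, assuming an in-memory store and an injectable clock; `GrantStore` and its methods are hypothetical names, and a real deployment would sit behind an authenticated API (e.g. HTTPS with mutual TLS) and persist its state.

```python
import time
from typing import Callable


class GrantStore:
    """In-memory model of a validation server issuing time-boxed login grants.

    A grant means "user U may log in to host H until time T". Admins can
    revoke at any moment; devices call is_authorised() on every login.
    """

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._grants: dict[tuple[str, str], float] = {}  # (user, host) -> expiry

    def grant(self, user: str, hosts: list[str], hours: float = 8) -> None:
        """Called after the user's daily (e.g. LDAP) authentication succeeds."""
        expiry = self._clock() + hours * 3600
        for host in hosts:
            self._grants[(user, host)] = expiry

    def revoke_user(self, user: str) -> None:
        """Admin kill switch: drop every grant this user holds."""
        for key in [k for k in self._grants if k[0] == user]:
            del self._grants[key]

    def is_authorised(self, user: str, host: str) -> bool:
        """Yes/no answer a managed device would request at login time."""
        expiry = self._grants.get((user, host))
        return expiry is not None and self._clock() < expiry
```

The clock is injectable so expiry can be tested without waiting eight hours; the (user, host) key matches the yes/no granularity described above.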

I know commercial solutions exist for certain environments (e.g. AD, proprietary firewall managers), but nothing fairly simple and ‘cross-environment’ for, say, OpenVPN/SSH/WireGuard authentication. LDAP or RADIUS seem like the best bet, but I’m not sure how to tie that into SSH as a permissions cross-check, even when the key is authorised on the host.
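
For the SSH cross-check, OpenSSH's `AuthorizedKeysCommand` lets sshd run a program on each public-key login attempt and accept only the keys that program prints, so a central yes/no answer can be enforced per login. A minimal Python sketch of such a program; `authorized_keys_for`, `main` and the in-script tables are hypothetical stand-ins for queries to an LDAP/RADIUS-backed service, and the key text is a placeholder.

```python
import socket
from typing import Callable


def authorized_keys_for(user: str, host: str,
                        is_authorised: Callable[[str, str], bool],
                        lookup_keys: Callable[[str], list[str]]) -> list[str]:
    """Return authorized_keys lines for `user` on `host`, or [] if denied."""
    try:
        if not is_authorised(user, host):
            return []
        keys = lookup_keys(user)
    except Exception:
        # Fail closed: if the validation service is unreachable, allow nothing.
        return []
    # Emit only non-empty, single-line entries.
    cleaned = (k.strip() for k in keys)
    return [k for k in cleaned if k and "\n" not in k]


def main(argv: list[str]) -> int:
    """Entry point for sshd, which passes the username (%u) as the argument."""
    if len(argv) != 2:
        return 1
    user = argv[1]
    host = socket.gethostname()
    # Hypothetical stand-ins; replace with queries to the central service.
    allowed = {("alice", host)}
    directory = {"alice": ["ssh-ed25519 <alice-public-key> alice@laptop"]}
    for line in authorized_keys_for(user, host,
                                    lambda u, h: (u, h) in allowed,
                                    lambda u: directory.get(u, [])):
        print(line)
    return 0
```

This would be wired up with `AuthorizedKeysCommand /usr/local/bin/check_keys %u` and `AuthorizedKeysCommandUser nobody` in `sshd_config` (the script path is hypothetical; a thin wrapper calls `sys.exit(main(sys.argv))`). Setting `AuthorizedKeysFile none` matters: otherwise sshd falls back to the user's own `authorized_keys` file when the command prints nothing, which is exactly the "key already authorised on the host" bypass. For certificate logins, `AuthorizedPrincipalsCommand` plays the same role.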