
    Nice. Absolutely the right direction. We discussed it not long ago, but “identity” is basically a core issue in package trust. If I know a package is written by ddevault, or ltorvalds, and I can verify their identity … huge win there. What’s a bit weird is that it has taken this long, when the machinery to achieve it all (GPG, for example) already exists. I guess UX.


      That doesn’t really scale unless there’s some accountability mechanism. If the identity of a package author is Itorvalds, I need to be very careful that I don’t confuse it with ltorvalds and I need to have the context to know that one of these is the well-known maintainer of a couple of popular open source things. I doubt I could name the maintainers of more than 1% of the things installed on my system.
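      To make the look-alike problem concrete: a naive defence is to fold visually confusable characters to a common “skeleton” before comparing names. This is only a toy sketch; the mapping table is a small hand-picked assumption, whereas real systems use the full Unicode confusables data (Unicode Technical Standard #39).

```python
# Toy "skeleton" comparison for look-alike usernames.
# CONFUSABLES is a tiny hand-picked table (an assumption for illustration);
# Unicode TS #39 defines the real, much larger confusables mapping.
CONFUSABLES = {
    "I": "l",       # capital i vs lowercase L
    "1": "l",       # digit one vs lowercase L
    "0": "o",       # digit zero vs lowercase o
    "\u0430": "a",  # Cyrillic small a vs Latin a
}


def skeleton(name: str) -> str:
    """Map each confusable character to a canonical stand-in."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name)


def looks_alike(a: str, b: str) -> bool:
    """True if two distinct names collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(looks_alike("Itorvalds", "ltorvalds"))  # True
print(looks_alike("ddevault", "ltorvalds"))   # False
```

      Note that this only flags a collision; it cannot tell you which of the two names belongs to the well-known maintainer, which is exactly the reputation problem described next.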

      It’s useful in the context of something like the Apple App Store because a corporate ID will have been validated against some official documentation, cost money to acquire, and will be revoked (losing a revenue stream) if found to be behaving maliciously. Identity for this kind of thing is useful only as a building block for a reputation system and, after decades of research, the summary of the state of the art for reputation systems is ‘it’s hard’.


      Interesting take on authentication, but perhaps a bit too convoluted?

      Leaving aside the fact that it relies on JWTs, which have a reputation for being nearly impossible to implement correctly, what caught my eye was how verification (by PyPI, for example) is done: by applying certain policies to the verified claims in the received JWT.

      Because the article doesn’t say, and doesn’t include any screenshots for PyPI, I can only wonder how this is actually achieved:

      • either it supports a predefined set of providers (like GitHub, GitLab, etc.) and provides out-of-the-box policies for them (but then the openness argument falls apart);
      • or the user now has to write some policies themselves (perhaps in some DSL) to check the various claims (user, repository, branch, etc.).

      So I’m left wondering whether they have just moved the problem from one place to another, introducing yet another attack surface (besides JWT, now the policies).

      They have even identified one such issue themselves (resurrecting dead or renamed accounts), and they’ve solved it with GitHub’s intervention, which means that this issue has to be solved separately by each individual provider.
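      One general mitigation for resurrected or renamed accounts is to pin the trust policy to an immutable numeric account ID in addition to the human-readable name, so that a new account registered under an old name no longer matches. This is a sketch, not necessarily how PyPI or GitHub implement it; GitHub’s Actions OIDC tokens do carry a `repository_owner_id` claim, but the stored policy shape, the string encoding of the ID, and the values below are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedOwner:
    """Hypothetical stored policy: owner name plus the ID seen at setup time."""
    owner: str
    owner_id: str  # kept as a string here (an assumption about claim encoding)


def owner_matches(policy: PinnedOwner, claims: dict) -> bool:
    # A name match alone is not enough: if the original account was deleted
    # and someone re-registered the name, the numeric ID will differ.
    return (
        claims.get("repository_owner") == policy.owner
        and claims.get("repository_owner_id") == policy.owner_id
    )


policy = PinnedOwner(owner="example-org", owner_id="1234")
original = {"repository_owner": "example-org", "repository_owner_id": "1234"}
resurrected = {"repository_owner": "example-org", "repository_owner_id": "9999"}
print(owner_matches(policy, original))     # True
print(owner_matches(policy, resurrected))  # False
```

      The catch the comment points out still applies: this only works if every IdP exposes a stable, non-reusable ID claim.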

      I can’t believe we can’t find simpler approaches than this…


        This was meant to be just a technical summary of the architecture. The link at the very top of the post goes to the PyPI blog, which contains an announcement that’s more specific to the features available at launch time; it also contains a link to the technical documentation [1], which should address some of your questions around policies and the security model.

        To summarize:

        • Yes, JWTs are bad. They’re also, practically speaking, the only game in town: OIDC (which builds on OAuth2) uses them, so we have to use them. There would be no trusted publishing at all without these underlying technologies.

        • At the moment, PyPI supports just one provider: GitHub. That provider was selected because it accounts for an overwhelming majority of publishing activity, meaning it’s the one that would be most immediately useful to potential users. I’m currently working on support for two more providers: Google’s IdP and GitLab.

        • It would be fantastic if we could do vendor-neutral policy verification, but neither OAuth2 nor OIDC is rigid enough for us to assume a ‘baseline’ of shared relevant claims. Each IdP needs to be added manually, since each has its own claim set and quirks; fixing that is a job for standards bodies, not relying parties like PyPI.

        • The “policies” in question are just HTML form inputs for (in GitHub’s case) the username, repository name, workflow name, and an optional actions environment. You don’t have to know a DSL; that’s something we considered early on in design, and rejected for complexity reasons. There are examples of this in the docs (linked above).
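        A rough sketch of what checking those form fields against an already verified GitHub token could look like. The claim names (`repository`, `job_workflow_ref`, `environment`) are ones GitHub documents for its Actions OIDC tokens, but the matching logic is a simplified assumption rather than PyPI’s actual code, and the owner/repo/workflow values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GitHubPublisherPolicy:
    """The values a user types into the form (hypothetical model)."""
    owner: str
    repository: str
    workflow_filename: str
    environment: Optional[str] = None  # optional, as described above


def policy_matches(policy: GitHubPublisherPolicy, claims: dict) -> bool:
    """Compare signature-verified claims against the configured policy."""
    slug = f"{policy.owner}/{policy.repository}"
    if claims.get("repository") != slug:
        return False
    # job_workflow_ref looks like "owner/repo/.github/workflows/file.yml@ref";
    # strip the "@ref" suffix and compare the workflow path.
    workflow_path = claims.get("job_workflow_ref", "").split("@", 1)[0]
    if workflow_path != f"{slug}/.github/workflows/{policy.workflow_filename}":
        return False
    # Only enforce the environment if the user configured one.
    if policy.environment is not None and claims.get("environment") != policy.environment:
        return False
    return True


policy = GitHubPublisherPolicy("example-org", "example-repo", "release.yml", environment="pypi")
claims = {
    "repository": "example-org/example-repo",
    "job_workflow_ref": "example-org/example-repo/.github/workflows/release.yml@refs/tags/v1.0",
    "environment": "pypi",
}
print(policy_matches(policy, claims))                               # True
print(policy_matches(policy, {**claims, "environment": "staging"}))  # False
```

        The point of the design is that the user never writes a rule; they only fill in four values, and the per-IdP comparison logic lives on the server.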

        Trusted publishing is indeed architecturally complex. But it’s using OIDC as intended, and we’ve tried to abstract nearly all of this complexity away from the end user (who only needs to know which user/repo and workflow they trust). The attack surface it exposes is roughly equivalent to an attacker gaining access to your CI (including all the secrets in it), so I don’t think that part is a significant new risk.
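        To make the hidden complexity a little more concrete, here is a sketch of the two-step exchange a CI job performs: fetch an OIDC token from GitHub Actions, then trade it for a short-lived PyPI API token. The environment-variable names follow GitHub’s OIDC documentation and the mint endpoint path follows PyPI’s trusted-publishing docs as we understand them; treat both as assumptions to double-check. Neither request is sent here. In a real job you would pass each one to `urllib.request.urlopen` and read the token out of the JSON response (GitHub’s `value` field, then PyPI’s `token` field).

```python
import json
import urllib.request

# Assumed endpoint, per PyPI's trusted-publishing documentation.
PYPI_MINT_URL = "https://pypi.org/_/oidc/mint-token"


def github_id_token_request(request_url: str, request_token: str,
                            audience: str = "pypi") -> urllib.request.Request:
    """Step 1: ask GitHub Actions for an OIDC token for the given audience.

    In a workflow with `id-token: write`, GitHub exposes request_url and
    request_token as ACTIONS_ID_TOKEN_REQUEST_URL and
    ACTIONS_ID_TOKEN_REQUEST_TOKEN; the URL already has a query string.
    """
    return urllib.request.Request(
        f"{request_url}&audience={audience}",
        headers={"Authorization": f"bearer {request_token}"},
    )


def pypi_mint_request(oidc_token: str) -> urllib.request.Request:
    """Step 2: exchange the OIDC token for a short-lived PyPI API token."""
    return urllib.request.Request(
        PYPI_MINT_URL,
        data=json.dumps({"token": oidc_token}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

        The resulting API token is then used like any other PyPI token, but it expires shortly after minting, which is what limits the damage from a compromised CI run.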


          I’m not trying to dismiss the work you have done on this integration. It is a first step towards a better solution to a hard problem.

          However, PyPI is now in the same league as Rust’s crates.io in that it requires a GitHub account to be able to “play” in Rust’s (and now Python’s) ecosystem. I don’t have anything against GitHub either (I use it for all my code), but there are developers (certainly including some Python ones) who don’t want to have a GitHub account. For this category of users, the problem is still unsolved.

          That said, assuming GitHub is only the first such identity provider, what are the next steps to support multiple independent providers? How would the UI change to support something that doesn’t have GitHub’s particularities? Would a separate HTML UI (i.e. a form with inputs) be developed for each?

          I ask because every time such a proposal appears that in theory supports multiple identity providers, it falls short and ends up supporting only the largest ones, like Google, Facebook, or Apple.

          To summarize, I understand why PyPI (and crates.io, for that matter) chose to start (and stop) with GitHub: at the moment it has a large user base and it’s pretty secure (or at least has deep enough pockets to throw money at the problem if needed). In short, it’s a practical approach, one I would have taken myself. But I doubt “trusted publishing” will ever expand to other providers…

          So, if GitHub is the only one, why bother with all the complexity?

          “Google’s IdP and GitLab.”

          GitLab is similar to GitHub, so it presumably has the concepts of repositories, branches, etc. But would you support only the official hosted GitLab instance, or independent GitLab deployments as well?

          But what are the equivalents for Google? They’ve killed their code.google.com long ago. Are you targeting another of their products?


            “However, PyPI is now in the same league as Rust’s crates.io in that it requires a GitHub account to be able to ‘play’ in Rust’s (and now Python’s) ecosystem.”

            This new feature, which offers the ability to publish from CI with a short-lived token in order to make CI compromises less severe, chose GitHub as the first supported IdP, not the only one that will ever be supported, and you’re literally replying to someone who says they’re working on adding support for others.

            And if you don’t have or want a GitHub account, or object to using a GitHub account, or don’t want to publish from CI at all, you can still use plenty of other PyPI publishing workflows to put your packages onto PyPI, all the way down to just running python -m build && python -m twine upload dist/* manually from your own personal machine. So claiming that PyPI now “requires a GitHub account” is false.


              I appreciate this concern!

              I’ll state it unambiguously: trusted publishing is not intended to deprecate ordinary PyPI API tokens. Users will continue to be able to create API tokens, which can then be used however they like. Trusted publishing is only intended to provide a faster and more convenient alternative, where the platform (i.e. GitHub, and soon others) permits.

              Support for independent providers is something that PyPI’s admins will probably consider on a case-by-case basis, as a balancing act between implementation complexity (each IdP needs special handling), the security margins of each new IdP (there’s relatively little value in a random self-hosted IdP versus a normal API token), and the impact to the community (e.g., major CI platforms).

              “But what are the equivalents for Google? They’ve killed their code.google.com long ago. Are you targeting another of their products?”

              Yes, sorry if this was unclear: in this context, the IdP being supported is the one that provides proofs for machine identities on Google Cloud Build. In other words, similar to GitHub Actions’ machine identities.