1. 19
    1. 2

      Interesting article! Maybe I’m missing context, but I’m not sure the post sufficiently explained how they verify that a “trusted source” wasn’t injecting something it shouldn’t into some software artifact that it then wrote to cache (think SolarWinds). The closest the post got was:

      Creating trusted build workers is easier said than done. Having our trusted worker run Bazel actions implies that we’re evaluating arbitrary untrusted code on our trusted machine. It’s critical that untrusted actions are prevented from writing directly to the action cache or otherwise influencing other co-tenant actions.

      However, it then only really talks about isolation.

      1. 4

        All cache entries in Bazel are verified with a SHA256 hash. So if you upgrade from lib-a-v1 to lib-a-v2, that’s a change to its SHA256. This SHA256 is used to calculate a Merkle tree of build actions, so when you update a dependency’s SHA256, all the downstream build actions get cache-invalidated.
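
        Roughly, each action’s cache key is derived from the digests of its inputs, which themselves depend on upstream outputs, so one changed digest ripples all the way down the graph. Here’s a toy sketch of that cascade (plain Python, not Bazel’s actual data structures; the upstream action key stands in for the digest of its output):

        ```python
        import hashlib

        def digest(data: bytes) -> str:
            # SHA256 hex digest, standing in for Bazel's content hashing.
            return hashlib.sha256(data).hexdigest()

        def action_key(command: str, input_digests: list[str]) -> str:
            # An action's cache key covers its command plus the digests of all of
            # its inputs. Change any input digest and you get a brand-new key, so
            # the old cached result is simply never looked up again.
            return digest("\n".join([command] + sorted(input_digests)).encode())

        # lib-a feeds into lib-b, which feeds into the final binary.
        lib_a_v1 = digest(b"lib-a-v1 sources")
        lib_a_v2 = digest(b"lib-a-v2 sources")

        lib_b_v1 = action_key("compile lib-b", [lib_a_v1])
        lib_b_v2 = action_key("compile lib-b", [lib_a_v2])
        assert lib_b_v1 != lib_b_v2  # the direct consumer is invalidated

        app_v1 = action_key("link app", [lib_b_v1])
        app_v2 = action_key("link app", [lib_b_v2])
        assert app_v1 != app_v2      # ...and so is everything downstream of it
        ```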

        So if lib-a-v2 is detected to be vulnerable, you can just revert the commit and rebuild the entire repo to mitigate (and guarantee that your cache will behave correctly). Moreover, the article notes that the remote cache is only accessible from their trusted Remote Build Executor, which increases the security of the deployable artifact.

        On the side, I’m writing a series of articles explaining how Bazel works: https://sluongng.hashnode.dev/bazel-caching-explained-pt-1-how-bazel-works

        1. 1

          Yes, I think the cache invalidation is useful for recovery, but I was thinking more about how the trusted Remote Build Executor is actually trusted/validated. I think there are some interesting problems there, and while you likely can’t ever be 100% certain, I was wondering which SLSA requirements they were meeting. I think they probably do meet several (build service, scripted builds, hermetic), but I’m curious about what else they’re doing.

          On the side, I’m writing a series of articles explaining how Bazel works: https://sluongng.hashnode.dev/bazel-caching-explained-pt-1-how-bazel-works

          Thanks for the link! I’m curious, as this seems to indicate hermetic builds are also guaranteed to be deterministic. Is that always the case? I’ve always thought of them as very close / related guarantees but not necessarily the same properties.

          1. 1

            I’m curious about what else they’re doing.

            I have seen several orgs where these are not applicable to their setup, i.e. they self-host their infrastructure and have runtime monitoring to ensure all the configs (bare metal, VM, container) are up to date. So an additional validation layer just for the build system would be redundant and was not a requirement for these orgs.

            For some other orgs I know of, only a certain degree of hermeticity is required. There are certain loopholes, such as the OS patch version or the git version, that are relatively safe to ignore. They are pragmatic about what needs to be reproducible and to what degree it needs to be validated. Over-engineering a validation system does come with a downside: changes to such a system become more difficult and slower over time, and speed is definitely something folks pay a lot of attention to.

            Finally, there are some orgs that really need correctness. Bazel is used by companies with physical hardware: Tesla, Nvidia, SpaceX, ASML, as well as some banking/healthcare institutions. These companies have a low tolerance for errors and highly value correctness. I have seen folks in similar orgs investigating the Nix package manager recently as a means to wrap the environment Bazel operates in. Bazel itself guarantees hermeticity during build execution but does have leaks when it comes to external dependency and toolchain management. So by using Nix to wrap around Bazel and provide Bazel with the needed external dependencies, they were able to plug the places where things previously tended to leak. Before Nix, these leaks were often patched over with Docker containers, but when the container itself is not reproducible, you have to fall back on a lot of manual investigation and fixes.

            Thanks for the link! I’m curious, as this seems to indicate hermetic builds are also guaranteed to be deterministic. Is that always the case? I’ve always thought of them as very close/related guarantees but not necessarily the same properties.

            So a Bazel build is cryptographically verifiable thanks to the Merkle tree made of the SHA256 of everything related to your build: source code, scripts, environment variables, external dependencies, Bazel version, compiler version, etc.

            But all cryptography is breakable under certain assumptions, and it’s possible to design a build that is not reproducible in Bazel. For example, I could intentionally make my build action depend on some API over the network, and if that API goes down, my build would break.

            So I would say that with Bazel it’s easier to make your build hermetic, and eventually deterministic. But you can certainly break Bazel’s hermeticity if you use it wrong, hence the high cost of adopting Bazel today (I also wrote about this in another blog post).
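
            To make the hermetic-vs-deterministic gap concrete, here’s a toy sketch (plain Python, nothing Bazel-specific) of an action that mixes in an input the build graph never declared, standing in for that network call or a timestamp: the declared inputs are identical across runs, yet the output bytes differ, so the output hash is not reproducible:

            ```python
            import hashlib
            import os

            def hermetic_action(source: bytes) -> bytes:
                # Output is a pure function of the declared input, so every run on
                # every machine yields the same bytes, and the same cache entry.
                return hashlib.sha256(b"compile:" + source).digest()

            def leaky_action(source: bytes) -> bytes:
                # Mixes in something the build graph doesn't know about (random bytes
                # here, standing in for a network response or a timestamp). The action
                # still looks cacheable, but its output is no longer reproducible.
                undeclared = os.urandom(8)
                return hashlib.sha256(b"compile:" + source + undeclared).digest()

            src = b"int main() { return 0; }"
            assert hermetic_action(src) == hermetic_action(src)  # deterministic
            assert leaky_action(src) != leaky_action(src)        # not reproducible
            ```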

        2. 1

          Hah! You just helped me out earlier today on the Bazel Slack. Nice to see you around here as well!

      2. 1

        I think we need to understand their threat model to better understand why it is a “trusted source”. It seems they tried to use ephemeral VMs (or locked-down containers) to prevent the builders from being compromised through the Bazel remote build execution path (i.e. injecting code to take over these builders and then injecting malicious code into the cache). Of course, it doesn’t protect against cases where their container repos are compromised and start to ship malicious container images to these VMs.

    2. 1

      Isn’t this what commercial tools like JFrog Xray are solving?

      https://jfrog.com/screencast/jfrog-xray-securing-your-builds-and-artifact-downloads/

      1. 7

        No, this is about protecting your remote build cache from accidental cache poisoning and/or a malicious actor. Xray is more about scanning external dependencies for known vulnerabilities.