1. 14
  1. 5

    I saw a lot of confusion on Twitter around what this is for. It’s very similar to systems like Souffle. There’s a great blog post about doing interesting static analysis using Souffle.

    According to the authors, Glean is a bit different in that it’s optimised for doing quick analysis, e.g. for an editor or IDE. On the Souffle side, I previously tried to analyse a medium-sized Java code base using Doop (which is built on Souffle), and it consumed all my memory and crashed.

    Hopefully Glean is more promising for this type of work. Sadly the Java indexer is not open source yet.

    1. 3


      Any opinions on how these relate to semgrep?

      1. 2

        For history: Semgrep was originally a Facebook project built on their pfff project. Facebook stopped working on both of those, so I guess Glean is being used as their replacement.

        From my understanding:

        • Semgrep has parsers and typers for all of the languages it supports, while Glean is designed to read data dumped out from a compiler
        • Pretty-printing and semantic diffing are part of Semgrep but wouldn’t really be feasible with Glean
        • Angle is a query language in Glean allowing abstraction of facts, e.g. you could define the concept of “type class” and have it applied to Scala and Haskell - it wouldn’t be based just on syntax patterns

        I’d say that Semgrep is more of a syntactic tool, while Glean is a more full-on static-analysis tool.
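
        For a flavour of the Angle point above, here is a hypothetical schema sketch (the predicate and field names are invented for illustration; they are not Glean’s real schemas):

        ```
        # Hypothetical schema modelling "type class" as a language-neutral concept
        schema typeclass.1 {
          predicate TypeClass :
            {
              name : string,
              language : string,  # e.g. "Haskell" or "Scala"
            }
          predicate Instance :
            {
              class_ : TypeClass,
              instanceType : string,
            }
        }
        ```

        A query such as `typeclass.Instance { class_ = { name = "Functor" } }` could then return instances whether the facts came from a Haskell or a Scala indexer, which is the kind of abstraction pure syntax patterns can’t give you.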

    2. 3

      Not that useful. If you are familiar with the domain, you would want to look up Google Kythe and how it works (completely open source, with a few talks on YouTube).

      Essentially, to extract code intelligence, your ‘indexer’ needs to be very closely intertwined with the language’s compiler, which is the source of truth for how the syntax is interpreted into the actual AST/bytecode.

      Glean is essentially like Kythe, but more flexible. Instead of a universal schema, they let you define your own schema, so you can more easily decide to extract more or less info from a language. E.g. comments might be very useful to extract in Go or Java but don’t exist in JSON, so you can have separate schemas for each. Or if there is some special concept that only your language has (e.g. Go struct tags), you can add a special schema for it.
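
      As a sketch of that flexibility (predicate names invented for illustration, not Glean’s actual schemas), a Go-specific schema could model struct tags directly, while a JSON schema would simply not define such predicates at all:

      ```
      # Hypothetical Go-specific schema: struct tags only exist in Go,
      # so only this schema needs to model them
      schema golang.1 {
        predicate StructField :
          {
            struct_ : string,
            field : string,
          }
        predicate StructTag :
          {
            field : StructField,
            tag : string,
          }
      }
      ```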

      However, the bread and butter of this is actually the indexers. Glean’s indexers are closed source, with only the Hack and Flow ones open-sourced in those compilers (not in Glean), as FB controls the compilers of those languages. To use Glean for other languages, you will most likely need to change the schema in Glean as well as bring your own compiler, which is a huge undertaking, especially for languages whose compilers don’t have a built-in extraction plugin.

      It would be interesting if somebody started to hook up existing LSP/LSIF implementation into Glean for a better/easier adoption.

      1. 1

        > It would be interesting if somebody started to hook up existing LSP/LSIF implementation into Glean for a better/easier adoption.

        Yes but this would lose a bunch of the supposed benefit of Glean. The point is to have language specific schemas, where LSP is designed to be the opposite.

        1. 2

          Every language has its own LSP implementation. The protocol is the only thing in common.

          1. 1

            Yes, that’s my point

          2. 1

            That’s OK; the trade-off is ease of adoption. Once users have adopted the tool, the schema/indexer could be modified.

          3. 1

            Are these components not available, or simply closed source? There’s a big difference.

            1. 2

              Facebook uses C++ and has a closed-source indexer for it.

              Languages that Facebook does not use won’t have an indexer available.

          4. 2

            Any example instances we can browse around?

            1. 1

              The thing missing here is an explanation of what kinds of data are provided.

              Marketing tip: I know what it is from the title of the lobste.rs article. What I’m interested in is what this brings to my toolkit that I didn’t already have. What new data will it let me see?

              The only other source was the getting started guide, the first page of which describes the tools you can use to view the data, and your tech stack, but at no point actually shows the data that I can supposedly derive from my compiler output.

              So as someone with ADHD that’s about as far as my brain can go before it loses interest.