1. 35
  1.  

  2. 17

    Perhaps it’s just me, but running CLI apps with docker run seems rather weird. I tried using the semgrep-v0.6.1-ubuntu-16.04.tgz from the releases page, but can’t get that to work. Could probably figure that out, but having to figure out how to even run it for a quick test to see if I like it is a bit of a turn-off IMO.

    1. 4

      Hey, I’m on the semgrep team, sorry it’s been giving you trouble! If you just wanna give it a quick go, semgrep.live might be to your liking. As for the installation woes, we provide an install script[0] (warning, will download) for the time being.

      [0]: The install script’s needed cause the fastest way we could get things working was to do the parsing and heavy lifting in OCaml, and write the more feature-packed and user-friendly CLI in Python, so we have two binaries to ship together. I assume your preferred way here would be to just install a .deb package?

      1. 17

        heavy lifting in OCaml, and write the more feature-packed and user-friendly CLI in Python,

        Is writing a featureful CLI in OCaml really that hard? Shipping a single binary would be much more palatable than Docker, IMO.

        1. 2

          Daniel Buenzli made a reasonable arg parsing library for OCaml but I can’t remember the name. It’s pretty good, though.

          EDIT: It’s cmdliner

        2. 12

          My preferred way would be to install a single binary with a simple build into $PATH. Packaging would be nice, but I’m using OpenBSD, so a deb package (or a docker container) doesn’t really help.

          (Docker as a part of someone’s build has become a red flag for “hellish to get building” – it’s led me to steer away from a whole bunch of packages when I was doing stuff that had to run on Android, as well as for my personal computing environment.)

          1. 11

            I think you’re insanely undervaluing this by saying “like grep, but for code.” This is a language for writing linters that can target multiple languages, and it has “one liner” support for ad hoc searching. Don’t sell yourself short.

            1. 1

              Did you write your own parser for all 5 languages currently supported?

              1. 1

                Yes, Debian package would be great.

            2. 13

              Typescript — Coming… PHP — Coming…

              This is basically why grep, diff and all the text-based tools still live. Supporting a usable subset of individual languages is simply impractical in the long run for any volunteer team, and relying on third-party support is hit or miss. Hence you lose to simple text-based tools on reliability: I know grep is always going to work, while for your tool I have to check every time.

              P.S. This is not to discourage, it’s still useful for those languages that are supported.

              1. 1

                Maybe using Language Server Protocol could help with this issue?

                1. 2

                  As things stand now, it’s gonna be one slooooow grep indeed…

              2. 11

                This is a program that reads text files and writes to stdout; why do you use Docker to distribute it? Is it difficult to compile? If that is the case, why not give us a static binary?

                OK, I would love to try it, but I don’t use docker. Typically, in these cases, I can read the dockerfile and reproduce the build steps. But here, it seems to be really complex. Why does it mess with certificates?

                I would very much appreciate a clear explanation of the compilation instructions, if a binary cannot be made available. As in: I have just installed Debian and cloned the semgrep repo. Which packages do I need to apt-get before compiling semgrep? (And a similar thing for OpenBSD.)

                1. 2

                  Same maintainer speaking as above. Thanks for the feedback; we just shipped binaries for the first time for macOS and Debian in the most recent release, click through to find them.

                  As for the certificates: I’m not sure off the top of my head! Perhaps it’s because Nuitka[0] doesn’t embed the certificates that our Python dependency chain brings in via certifi? That’s sort of a wild guess, but I’m quite curious now. I’d imagine some tools might expect to get a path to a certificate as opposed to just the certificate content itself.

                  And if you want to compile from source, development.md will point you in the right direction. None of us tried compiling on OpenBSD so far, so I’d suggest a Debian base for self-compiling.

                  [0]: https://github.com/Nuitka/Nuitka compiles our Python package into a binary

                  1. 2

                    Thank you very much, it looks great! Will try to compile it on debian.

                2. 5

                  This is really cool! I’m not sure the flag here is fair; it’s not like grep but for code. I use grep for code every day, it works fine for the majority of use cases, and I wouldn’t use semgrep for them.

                  It’s a semantic grep, and it’s very useful for security. The examples show exactly that, for instance: requests.get(..., verify=False, ...)

                  This will show every place in your codebase where someone is using SSL while not verifying the CA. This would help an organization locate which data might be leaking externally, and also which internal services haven’t been properly signed or are missing certificates…

                  With grep alone, you would take ages to be able to cover all formats like they show: https://semgrep.live/jqn
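
                  A minimal sketch of that gap, using plain grep only (semgrep itself is not run here). The sample file, its three call sites, and the regex are made up for illustration:

                  ```shell
                  # Hypothetical sample: three spellings of the same insecure call.
                  sample="$(mktemp)"
                  printf '%s\n' \
                    'requests.get(url, verify=False)' \
                    'requests.get(url, timeout=5, verify=False, stream=True)' \
                    'requests.get(' \
                    '    url,' \
                    '    verify=False,' \
                    ')' > "$sample"

                  # A line-oriented regex only sees calls that fit on a single line,
                  # so the call spread over several lines is missed.
                  matches="$(grep -c 'requests\.get(.*verify=False' "$sample")"
                  echo "grep matched $matches of 3 calls"
                  rm -f "$sample"
                  ```

                  The regex could be extended to handle more layouts, but each new formatting variant needs another special case, which is the effort the pattern-based approach is meant to avoid.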

                  For the security use case, I’d even deal with the weird Docker image distribution.

                  1. 4

                    Cool! On the other hand, this is a lot longer of a command than grep foo

                    docker run --rm -v "${PWD}:/home/repo" returntocorp/semgrep --lang python --pattern '$X == $X' test.py
                    
                    1. 1

                      For programs that run in Docker, it’s handy to wrap the boilerplate in a shell function. For example, you might stick something like this in your Bash or ZSH startup script.

                      semgrep() {
                        docker run --rm -v "${PWD}:/home/repo" returntocorp/semgrep "$@"
                      }
                      

                      Then running your example looks a little better:

                      semgrep --lang python --pattern '$X == $X' test.py
                      
                      1. 1

                        Wouldn’t an alias be more appropriate? But yes, a function works fine.

                        1. 1

                          I think the alias wouldn’t be able to interpolate the working directory into the volume mount with -v "${PWD}:/home/repo".

                          1. 1

                            It can; why wouldn’t it?
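
                            A small sketch of the difference, assuming bash or another POSIX-style shell. The alias names are made up, and echo stands in for the docker command so it runs without Docker:

                            ```shell
                            # bash only expands aliases in scripts when this option is set;
                            # other shells lack shopt, so failures are ignored.
                            shopt -s expand_aliases 2>/dev/null || true

                            # Single quotes: ${PWD} is stored literally and expanded on every use.
                            alias mount_late='echo "${PWD}:/home/repo"'
                            # Double quotes: ${PWD} is expanded once, when the alias is defined.
                            alias mount_early="echo '${PWD}:/home/repo'"

                            cd /
                            mount_late    # reflects the directory at call time
                            mount_early   # still shows the directory where the alias was defined
                            ```

                            So an alias can pick up the current directory as long as its body is single-quoted. A function is still the more flexible choice if arguments ever need to go anywhere other than the end of the command.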

                    2. 2

                      Building something like this has been on my todo someday list for years now. I’m excited to give it a try!

                      1. 2

                        This is great! It looks a lot like Semmle’s QL, except usable from the command line. This is the kind of tool I’ve been dreaming of for a long time.

                        I’m wondering, how are you parsing the languages? Are you using tree-sitter or implementing your own parsers? Why did you choose to base your work on pfff rather than GitHub’s semantic?

                        1. 1

                          As you’ve seen, semgrep is a frontend to a larger program analysis library named pfff. Pfff began and was open-sourced at Facebook, but is now archived. Its primary maintainer now works in our team at r2c.

                          The syntax for queries largely originated in INRIA’s coccinelle project, which created automatic semantic patches for the Linux kernel. The original creator of the tool, Yoann, did a PhD there.

                          Because of that, the parsers are quite custom at the moment. Semantic lacks some linter-specific functionality we have in pfff; that being said, we are quite interested in using https://tree-sitter.github.io/ as a base.

                        2. 1

                          Somewhat related: comby, also written in OCaml, is a better search-and-replace for code.

                          1. 1

                            None of the examples seem like anything that can’t just be solved using a somewhat simple regular expression.