Can this tool report things like unused variables or references to undeclared variables?
From what I can tell, the tool can only match against AST nodes and report problems if the conditions are met. But it can’t combine the information from the different nodes it encountered. So I’m guessing these kinds of rules are not possible?
With the current release, there is no proper way to detect unused/undeclared variables.
The next release (planned for next week) will add the possibility to test predicates against “far away” nodes (that are not a direct parent/child/sibling), making it relatively easy to express such rules.
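To make the “combining information from different nodes” point concrete, here is a rough sketch of what such a rule has to do. It uses Python's standard `ast` module on Python source, not Sylver or SYLQ, and the function name `check_names` is made up for illustration. It also ignores scopes, imports and function definitions, so treat it as a toy rather than a real linter.

```python
import ast
import builtins


def check_names(source):
    """Return (unused, undeclared) variable names for a piece of Python source."""
    assigned = set()  # names that are written somewhere
    read = set()      # names that are read somewhere

    # A single node is not enough: we have to remember what we saw elsewhere.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                read.add(node.id)

    unused = sorted(assigned - read)
    # Names like print() are provided by the language, so skip builtins.
    undeclared = sorted(read - assigned - set(dir(builtins)))
    return unused, undeclared


unused, undeclared = check_names("a = 1\nb = 2\nprint(a + c)\n")
print("unused:", unused)
print("undeclared:", undeclared)
```

The decision for each name depends on every other node in the file, which is why a matcher that only looks at a node and its direct neighbours cannot express it.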
If you can remember to, please post a link here when that gets released :)
This belongs squarely in the “it’s so obvious, why didn’t anyone think of it before” category (in other words, it’s brilliant).
Thanks a lot!
The links you gave don’t really explain much about the why of this project, only the how.
When do you reckon that this would be a good tool to use? I can see the value for languages without existing tooling, but fail to see it when the target language already has an extensible linter.
There is value in having a consistent syntax for writing rules and a consistent interface for the CLI tool itself. Here are two random semgrep rules, one for Python [1] and one for JS [2], to find stray breakpoints left in the code. You will find they share a lot of structure, and you might even be able to write these rules without knowing Python or JS yourself, just by defining some “shape”.
I feel like we are moving towards one-ish “tool” to rule them all, instead of re-inventing the wheel for each language. This makes sense if you realize more and more developers work with multiple languages over their careers (sometimes in a single day).
Other examples in this space:
Comby [3]: a tool for searching and changing code structure, i.e. refactoring across files
Tree-sitter [4]: a parser generator, which is used in Emacs to do syntax highlighting and manipulation across a variety of languages
[1] https://github.com/returntocorp/semgrep-rules/blob/develop/python/lang/best-practice/pdb.yaml
[2] https://github.com/returntocorp/semgrep-rules/blob/develop/javascript/lang/best-practice/leftover_debugging.yaml
[3] https://comby.dev/
[4] https://tree-sitter.github.io/tree-sitter/
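For readers who have not seen what a “shape” for stray breakpoints looks like, here is a hand-written sketch of the idea. It is not a semgrep rule and not taken from the linked files; it uses Python's standard `ast` module, and the function name `find_breakpoints` is made up for illustration. It flags calls to `breakpoint()` and to anything ending in `.set_trace()`.

```python
import ast


def find_breakpoints(source):
    """Return the line numbers of calls that look like leftover breakpoints."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Shape 1: a bare call to breakpoint()
        if isinstance(func, ast.Name) and func.id == "breakpoint":
            hits.append(node.lineno)
        # Shape 2: <something>.set_trace(), e.g. pdb.set_trace()
        elif isinstance(func, ast.Attribute) and func.attr == "set_trace":
            hits.append(node.lineno)
    return sorted(hits)


code = "import pdb\nx = 1\npdb.set_trace()\nbreakpoint()\n"
print(find_breakpoints(code))
```

A pattern-based rule expresses the same two shapes declaratively instead of walking the tree by hand, which is what makes rules for different languages look so similar.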
First, the goal of this tool is to be very approachable: rules can be written iteratively using the REPL, and usually don’t require much code.
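If you are only ever using a single programming language, this might be the most visible upside of Sylver. But if you use multiple languages/formats, I think there is value in learning a tool that can lint your JS frontend, your Go backend as well as your .env files and your Kubernetes config:
1 query language/interface to learn instead of multiple APIs and interfaces
1 central configuration
1 tool to install/run in your CI
the ability to discover rulesets through Sylver’s registry (sylver init will install them for you, and a web UI is in the works) instead of googling endlessly to find the right linter/plugin.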
In the end, I don’t think I’ll be writing the majority of the rules anyway, I’ll just use some standard “package” of them. But then we’ll add our project-specific things that we can all share and understand.
There’s value in doing that with one tool for bigger teams.
Congrats on the release! I’ve stumbled upon semgrep a couple of times but never used it. It seems to do similar things. How would Sylver compare to it?
Putting aside the difference in maturity, here is my take on it:
@sevender:
How complex can the rules get?
For example, would I be able to use this to create a rule that does things like flag an unnecessary use of let vs const?
Original code:
with the 2nd line being the only reassignment. This can be rewritten:
I’d love to be able to catch things like this in an automated way…
This specific example (with the assignment right after the let) should be encodable in SYLQ! I’ll post the query as soon as I get back in front of a computer.
Here is the ruleset detecting your pattern:
You can run it on your project with the following command:
Let me know if it helped!
This is awesome, thanks for responding!
Now, hard mode… is this possible: instead of “next sibling” I really want something like: there have been no reads of x and x is transformed at any later time, then trigger the flag.
Another, unrelated one that came up today: We have SQL queries inside of multi-line template strings (in JS), and we currently manually enforce in code review that they are formatted with a specific SQL formatter. There are two challenges:
1. You have to recognize it is SQL (easy to do manually), but it probably needs a heuristic isSql function to automate, or else the ability to mark with comments.
2. The rule to identify “not formatted correctly” involves actually running the string through the formatter and checking whether it changes.
2 is probably out of scope for your tool, but since you’re deep in this space maybe you have some advice: other tools or the approach you’d use?
You’re welcome!
The hard mode version can’t be expressed with SYLQ in the current release, but the next big feature to land will allow the user to test predicates against “far away” nodes, so it will be possible to check that the first reference to x in the nodes that follow its declaration is indeed part of a call to map.
The second question is really interesting. A feature that would allow running a parser on a node’s text to produce a “subtree” (possibly using another language’s parser) would make a lot of sense. Regarding point 2, I very strongly believe that proper formatting should be checked by the formatter itself and not by an analyzer, so the approach that you are suggesting is definitely the one I’d use. It’s also the approach used in: https://github.com/gajus/eslint-plugin-sql
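The “run it through the formatter and see if it changes” approach can be sketched in a few lines. Everything below is an assumption made for illustration: `is_sql` is a crude keyword heuristic, `toy_format` is a stand-in for whatever real SQL formatter you use, and the regex for template strings ignores `${...}` interpolation and nested backticks. It is not how eslint-plugin-sql or Sylver is implemented.

```python
import re

# Naive match for JS template strings; does not handle nested backticks.
TEMPLATE_STRING = re.compile(r"`([^`]*)`")
SQL_START = re.compile(r"^\s*(select|insert|update|delete|with)\b", re.IGNORECASE)


def is_sql(text):
    """Heuristic guess: does the text start with a common SQL keyword?"""
    return SQL_START.match(text) is not None


def toy_format(sql):
    """Stand-in formatter: collapse whitespace and uppercase a few keywords."""
    keywords = {"select", "from", "where", "and", "or"}
    return " ".join(w.upper() if w.lower() in keywords else w for w in sql.split())


def unformatted_queries(js_source, formatter=toy_format):
    """Return the SQL template strings that the formatter would change."""
    results = []
    for match in TEMPLATE_STRING.finditer(js_source):
        text = match.group(1)
        # The formatter is the source of truth: flag only if its output differs.
        if is_sql(text) and formatter(text) != text:
            results.append(text)
    return results


js = (
    "const a = `SELECT id FROM users`;\n"
    "const b = `select id  from users`;\n"
    "const c = `hello world`;\n"
)
print(unformatted_queries(js))
```

Passing the formatter in as a parameter keeps the check independent of any particular SQL style: the analyzer only finds the strings, and the formatter decides what “correct” means.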
This is way cool and super exciting. Thanks so much for doing this and for sharing it.
You are very much welcome!
This is impressive! Nice work!
I used to do a lot of work in a pretty obscure language called Pawn which had zero tooling. I wrote a package manager for it and always wanted to write a proper AST-based parser (as the original compiler just generated bytecode directly from the token stream with no AST stage) to allow me to write a linter and formatter.
Sadly, I never got around to it; a tool like this would have been a game changer years ago! But I’ve shared this in the still-alive community for that language to see if anyone finds it useful!
Thanks, it means a lot!