I am currently working in a challenging javascript codebase, desperately in need of refactoring and tidying up. There’s a lot of convoluted logic and repetition that I’d like to eliminate, but there are no tests to help me not break things. The original developer no longer works on the project. This code powers an application that is prominently displayed in the offices and headquarters of a number of very large companies, so I have to be careful not to introduce bugs.
What I would like is a tool that can take two pieces of javascript and tell me that they’re equivalent (possibly with some qualifiers). Does such a thing exist? Are there any other techniques I could use to help this work (one is obvious: Right tests for the current code, and then make the changes)
That sounds like it’s veering into halting problem territory - and since you’re not trying to solve the general case in theory it’s not impossible, but it does tend to get inconveniently difficult fast.
I’d go with golden master testing. Write some high-level integration tests that click around the app and save the final state of the app - HTML output, if possible - to a fixture. Then write the test to fail if the output ever changes from the saved example. Now you can refactor to your heart’s content with high confidence that you haven’t introduced subtle, user-visible changes.
The book Working Effectively With Legacy Code by Michael Feathers is also a must-read for this sort of scenario. Good luck!
I like this Golden Master idea, thanks. It would be great to have this kind of testing anyway, even if I didn’t want to refactor the code.
I have also been thinking about instrumenting the code somehow to try and figure out the dead code paths, because right now there’s a fair amount of uncertainty over whether certain functions get called, etc… If I could see that then I’d have some idea of what’s high-risk (critical path, called frequently, major customer, etc…) and what’s not.
A few tips:
isEqualcan be implemented with===only for primitive types):Computer science trivia: the halting problem is trivially solvable for finite-memory Turning machines. Proof: if you have b bits of memory then you can represent 2^b states. Load the program and step it 2^b + 1 times. If it’s halted then you’re done. If not then according to the pigeonhole principle there are two steps corresponding to the same state. This proves the existence of an infinite loop.
Unfortunately, I’m not aware of any tools that could help you prove the equivalence of two programs.
The short answer is no. It’s a very hard problem in general.
The longer answer would be it depends. I am not aware of any such tool, but if the pieces of code over which you want to run such an analysis use a very specific subset of javascript in a very specific way, then it might be doable.
If you want to test the equivalence of pure functions, generative testing (e.g. QuickCheck in Haskell) could also be used.
Sadly, there’s very little in the way of pure functions to be found in this particular codebase. If I could snapshot the program state at the entry and exit points of a function I was refactoring, and then replace the old code with the new, I might be able to glean some insight. But that seems… hard 🤔
you can introduce seams, to use the WEwLC language, by turning the relevant state into input to the function and turning any relevant mutations into output that get applied once the functions has returned. Now you do have a pure function whose behaviour can be captured in tests.
BTW if that feels like too big a leap to make in one go, look at introducing
proxyquireinto your tests, which lets you replace any dependencies when you set up your fixtures. We use a combination ofproxyquireandsinonat work where we need to test that a function performs some expected external action.“JavaScript code” is too broad. Can you provide some more concrete examples?
Would it help to say that I’m interested in arbitrary, but relatively small, amounts of code? Refactoring within functions, for example.
I can imagine something like (contrived, but sadly this kind of thing isn’t out of the question in this codebase)
And a tool which can tell me that this code is equivalent. But now that I type this out, I can see how it would be pretty hard. In this case it’s obvious to me, because I know how
mapworks, and I can see the inner function code is pure, but how could such a tool know thatlistOfStrings.map, or the function that is provided tomap, didn’t have some kind of side-effect that is no longer happening.Indeed, this would be difficult if not impossible, however, perhaps you should try this from another angle. A tool like ESLint could help by flagging or not flagging each piece of code. If you put the “new” code next to the old, and ESLint flags one, there’s a good chance they’re not doing the same thing. You mentioned “side-effects”, so maybe a rule set like this would be of use:
https://github.com/jfmengels/eslint-plugin-fp
In any case, the only true way to test code like this is to write unit tests. Karma, Ava, Jest: these can help. Check out egghead.io or similar sites to help there.
Good luck!
It’s great that you’ve identified the problem – work on this first! Create tests for the existing code. Unfortunately the best way that you can validate that the tests’ expected results are accurate is by interviewing the team that created the existing code.
IMO if your team thinks that refactoring the existing code is a good way to go, the best move is to make these tests first. You will inevitably create incomplete test cases and discover gaps after you start refactoring. But the good news is that you will force yourself and your team to go through the exercise of evaluating “what is the intent of the existing design?” I predict that you will end up with net fewer bugs as a result.
They call these formal, equivalence checks. Hardware developers do it routinely since the HDL’s are easier to handle formally. Software is more complicated with formal equivalence done in limited ways (often custom). If you can’t use formal equivalence, you can get some assurance with automated testing that compares the outputs of the old and new programs. The testing should get deep into the code’s behavior. For instance, the KLEE people have a tool that combines testing and symbolic execution to do this. For this and empirical testing, I was thinking more of just several categories of automated test generation to get strengths of each. Example categories here.
I can’t tell you what’s available for Javascript, though, since I haven’t looked at those tools in a long time. Just that you’d establish some probability of equivalence with structure-based and random testing with output comparisons.
No, it’s not possible for any nontrivial program, because it’s equivalent to the halting problem: take a program you know halts and determine if the other program is semantically/logically equivalent. You can’t even do it in principle.
Suppose two different programs result in the exact same machine code after compilation. That’s a proof that their behavior is equivalent, no matter how nontrivial they are. In fact, it even proves their halting behavior is equivalent.
Yes, but that’s just coincidental for those two programs. If your compiler always compiles a program in a Turing-complete language to the same machine code if that program is guaranteed to halt, you’ve got a magic compiler. :)
More accurately: your example doesn’t prove the general case. Two programs can be logically equivalent with vastly different machine code. In fact, there’s no way for the compiler to guarantee identical machine code for logically identical but different programs.
Of course and fortunately the OP isn’t asking for a solution to the general case. He is asking for a solution to his practical problem of proving equivalence of a certain set of code snippets and their intended replacements. It’s likely that their practical problem is solvable for a large subset of the cases, which would already help then immensely.
Telling them it is impossible in theory is much less helpful than telling them what is possible in practice.
To be fair, the question just says “What I would like is a tool that can take two pieces of javascript and tell me that they’re equivalent (possibly with some qualifiers).”
And the problem is that the set of qualifiers would have to be very specific for this to work even in limited cases. What’s possible in practice is pretty small, especially for a language like JavaScript where you can change fundamental parts of the language (modifying prototypes, etc) and can have non-local control transfers (exceptions, etc).
Formal equivalence verification tools are not magic… but they’ll often get you farther than just throwing up your hands and saying “halting problem”! Here’s one that works pretty well with small chunks of fairly well-behaved C or Java. Won’t solve OP’s Javascript problems, though, for all the reasons you mention.
any testing-based approach will very likely report that f and g are equivalent
With a code coverage tool that measures branch coverage you could know that your test didn’t exercise all branches of this code.
So to answer the OP: if you can find a code coverage tool with which you can measure a sufficient coverage metric (e.g. multiple condition decision coverage), you can create a test set that covers the code you intend to replace, in the sure knowledge all cases are covered. Then assert your replacement passes the same test set. It’s not a proof, but perhaps it’s enough for your needs.
And if that doesn’t fool it, this almost certainly will:
Can you at least identify the type information for these functions? If you have type information, and you have the ability to identify plausible bounds for the primitive types, you can rely on sampling to some extant to get reasonably high confidence that two functions that take similar types of arguments are the same. The good news is that pathological cases usually occur in with values that are reasonably small see small scope hypothesis. Further, if you rely on statistical sampling, you can be have a statistically high confidence after sampling a constant number of inputs (under certain assumptions).