It seems to be using geckodriver, the Firefox implementation of the WebDriver protocol, and that’s nice.
But how can it be high-confidence if it tests only in one browser?
Happy to report back with: https://docs.quickstrom.io/running/cross-browser.html
As you say, geckodriver is one implementation of the WebDriver protocol. Consequently, I expect that you could replace geckodriver with another tool that uses that protocol and, bugs notwithstanding, the same tests would run. I suspect the documentation mentions geckodriver only as an example: using the WebDriver protocol for browser automation suggests a desire for widespread browser compatibility.
While I agree that testing with multiple browsers is valuable and important, it’s not been a big focus so far.
The novelty in Quickstrom lies in the combination of generative testing with DOM introspection and the specification language. I’ve only been building this for the last five months and I’ve had to focus almost exclusively on that part to get something running “end to end”.
I very much plan to support multiple browsers, and as pointed out, by using WebDriver it shouldn’t be very hard. There’s a lot left to do in this project, for sure. :)
This project was previously called WebCheck. I posted here on Lobsters about it a few weeks back, how I found problems in TodoMVC implementations using a specification: https://wickstrom.tech/programming/2020/07/02/the-todomvc-showdown-testing-with-webcheck.html
Hey all, author here. If this sounds interesting, feel free to sign up to my newsletter at https://buttondown.email/webcheck to get more updates. The source code for WebCheck is not available yet, but I’m planning on getting it released in a not-too-distant future. Cheers!
What’s your plan for making sure the tool can discover actions even for, let’s say, non-semantic SPAs? I.e., one might have onClick handlers on arbitrary divs instead of on anchors or buttons.
I guess another way to ask is: when in the example you say:
WebCheck generates click actions for all clickable elements.
What is the definition of “all clickable elements”?
I guess it could be a nice way to force folks to make their SPAs more accessible by requiring either correct elements or some ARIA / role=“button”-like attributes… But to help adoption, it should be possible for users to define their own rules for what counts as clickable?
Yeah, that’s a good point. Currently, WebCheck has rather strict rules about “clickability”: the element has to be an anchor, button, or submit input, not be disabled or out of the viewport, and so on. Maybe there’s a need for a custom “forced” click or something similar to support such JS solutions.
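For illustration, strict rules like the ones described could be sketched as a predicate over element attributes. This is a hypothetical Python sketch (the element representation and field names are invented), not WebCheck’s actual implementation:

```python
# Hypothetical sketch of strict "clickability" rules like those described
# above. Elements are modelled as plain dicts of observed DOM attributes.

CLICKABLE_TAGS = {"a", "button"}

def is_clickable(element: dict) -> bool:
    """Decide whether a generated click action may target this element."""
    tag = element.get("tag", "").lower()
    # Anchors, buttons, and submit inputs qualify...
    is_candidate = (
        tag in CLICKABLE_TAGS
        or (tag == "input" and element.get("type") == "submit")
    )
    # ...unless the element is disabled or outside the viewport.
    return (
        is_candidate
        and not element.get("disabled", False)
        and element.get("in_viewport", True)
    )

print(is_clickable({"tag": "button"}))                    # True
print(is_clickable({"tag": "button", "disabled": True}))  # False
print(is_clickable({"tag": "div", "onclick": "..."}))     # False: misses JS-only handlers
```

The last case shows exactly the gap raised above: a div with only a JS click handler never satisfies a tag-based rule, which is why some user-defined or “forced” click would be needed.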
Most frontend frameworks bind a document click handler and then dispatch events based on the current target.
Figuring out which elements have event handlers attached statically is essentially impossible in these cases, other than by instrumenting all common frameworks.
Yeah. Ultimately, it’s up to the user to define a set of possible actions (in the spec) that provide good enough coverage. WebCheck can be somewhat smart, but the specification writer might need to narrow down the search and be more specific than just “all possible clicks”.
Yeah - that was roughly where I got to when I looked at implementing a similar tool (targeting screenshot testing).
I’d actually be quite interested in a SaaS which would explore the state space of a page and show me screenshots of all the unique states it finds (particularly if it does so for a few browsers / screen sizes). Being able to attach an image to a pull request showing what the page looked like before / after is awesome but often overlooked.
Definitely! I’ve thought about attaching screenshots in WebCheck to introspect failed state transitions. I don’t really have any plans yet for non-failing behaviors, but I guess it could be an option!
Selling any kind of developer tooling is super hard - here’s hoping for your success!
Ugh, yeah, it’s likely a steep uphill battle. Maybe I can win the hearts of developers, but it’s hard with test tools. I think it’s not enough to just “find weird bugs”. It has to do that without adding a bunch of effort, because in many jobs you aren’t encouraged to do advanced testing or formal specification; it’s something you do by your own choice. If WebCheck can help without introducing tons of extra work, I think it might be very appealing.
Is there more info available other than that single page? I’m for example interested in what the motivations are why this tool exists, what makes it stand out from other tools, what tech it’s using etc etc.
Not quite yet. I would like to get a more detailed blog post out soon. I’m painfully close to something that I can release to a few brave souls to try out, so I’ve been focusing on wrapping it up. If you’re interested, please sign up (or follow me on Twitter if that’s your thing), and I’ll make sure to get some more details out there soon.
Super quick summary: This tool is meant to marry property-based testing with models, temporal logic specifications (much like in TLA+), and browser testing with WebDriver. I want to make it easy to use this on any web page, without loads of boilerplate test setup. Basically: write a spec, point it at your website, off you go.
It takes a rather black-box approach to the SUT, although it needs to know about CSS selectors and other DOM attributes. But a web page can be React, server-side rendered, a mix, whatever. You can test multiple sites with one spec. It detects changes to relevant DOM elements so your state changes can be synchronous or asynchronous (like changing something after an HTTP request is completed).
It is different from PBT with models because you don’t write the simplified/abstract model. You write a specification (which is a bit different), and you can gradually make that specification more detailed. With models and PBT, in my experience, you need to capture all of the essential complexity of the SUT in your model.
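To make the contrast concrete, here is a minimal hand-rolled sketch of model-based property testing (the toy counter SUT and all names are invented for illustration). Note how the abstract model has to mirror every essential behaviour of the system, including the clamp at zero:

```python
# Minimal hand-rolled sketch of model-based property testing, to contrast
# with writing a specification. The SUT and names are invented examples.
import random

class CounterSUT:
    """The 'system under test': a counter that never goes below zero."""
    def __init__(self):
        self._n = 0
    def increment(self):
        self._n += 1
    def decrement(self):
        self._n = max(0, self._n - 1)
    def read(self):
        return self._n

def run_model_based_test(seed: int, num_actions: int = 50) -> None:
    rng = random.Random(seed)
    sut = CounterSUT()
    model = 0  # the model must capture *all* essential behaviour
    for _ in range(num_actions):
        if rng.random() < 0.5:
            sut.increment()
            model += 1
        else:
            sut.decrement()
            model = max(0, model - 1)  # forget this clamp and the test misfires
        assert sut.read() == model, f"model {model} != SUT {sut.read()}"

for seed in range(100):  # a hundred random action sequences
    run_model_based_test(seed)
print("all cases passed")
```

If the model omits the `max(0, ...)` clamp, the test reports a spurious failure even though the SUT is correct, which is the burden being described: the model duplicates the system’s essential complexity.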
It’s different from model checking as seen in TLA+, because this is testing the implementation. It’s also using finite traces which are much smaller. In TLA+ you can check huge state spaces in seconds. Here you might test a few hundred or thousand different cases with, say, 50 actions in each, and it’s currently in the order of minutes. But even if failing traces can be somewhat large, WebCheck tries to shrink it down (like QuickCheck does).
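The shrinking step mentioned here can be sketched as a greedy loop that keeps dropping actions from a failing trace as long as the property still fails. This is a simplified illustration of the QuickCheck-style idea, with a toy stand-in property instead of re-running a browser test:

```python
# Sketch of QuickCheck-style shrinking of a failing action sequence:
# greedily drop actions while the property still fails. `prop_fails`
# stands in for actually re-running the test against the browser.

def shrink(actions, prop_fails):
    """Return a smaller action list that still makes the property fail."""
    changed = True
    while changed:
        changed = False
        for i in range(len(actions)):
            candidate = actions[:i] + actions[i + 1:]
            if prop_fails(candidate):
                actions = candidate
                changed = True
                break
    return actions

# Toy failing property: the trace fails whenever "toggle" occurs twice.
fails = lambda acts: acts.count("toggle") >= 2
trace = ["click", "toggle", "type", "toggle", "click", "toggle"]
print(shrink(trace, fails))  # -> ['toggle', 'toggle']
```

A real implementation would re-execute the whole action sequence against the browser for each candidate, which is why shrinking a web test takes minutes rather than milliseconds.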
Regarding implementation, the current version is built in Haskell, and the logic DSL is an adaptation of PureScript. I’m using the PureScript compiler as a library, but interpreting the CoreFn representation and adding the temporal logic operators next and always, along with queryAll. The specification must be a pure expression; it doesn’t support Effect from PureScript. But it supports PureScript packages! You can use monad transformers in your specs, if you’re so inclined.
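The temporal operators can be illustrated as functions over a finite trace of observed DOM states. This is a hypothetical Python sketch of the semantics (state shape and property names invented), not Quickstrom/WebCheck’s actual PureScript interpreter:

```python
# Hypothetical sketch of `always` and `next` interpreted over a finite
# trace of observed DOM states; not the actual interpreter.

def always(prop, trace):
    """prop must hold at every suffix of the (finite) trace."""
    return all(prop(trace[i:]) for i in range(len(trace)))

def next_(prop, trace):
    """prop must hold at the next state; vacuously true at trace end."""
    return len(trace) < 2 or prop(trace[1:])

# Each state could be the result of queries against the DOM, e.g. the
# text of a counter element (names invented for the example).
trace = [{"count": "0"}, {"count": "1"}, {"count": "2"}]

# Safety property: the counter never decreases between adjacent states.
non_decreasing = lambda t: next_(
    lambda t2: int(t2[0]["count"]) >= int(t[0]["count"]), t
)
print(always(non_decreasing, trace))  # -> True
```

Checking such a property over a few hundred generated traces of ~50 actions each is much smaller than TLA+-style exhaustive state-space exploration, which matches the trade-off described above.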
Recommended tag: show, because this is definitely something you should be proud of :)
Nice work @cmeiklejohn. Congrats on the awesome results!
Author here. This case study dives into a bit more complex functionality than the previous one, including a few fun bugs that I found using “oracle generators” and property-based testing. I hope these real-world examples are motivating showcases of the effectiveness of PBT in an “industrial” setting. This is not a commercial project, but complicated nonetheless. Enjoy!
Thanks for writing this series. I’m looking forward to adding some property tests to my projects and these posts really help with the thinking and approach.
Thank you so much for the feedback! I’m happy that they’re appreciated. On to the final case study… :)
Great article! Very interesting to read about an application of property-based testing to an unusual domain. Also a very clear example of using oracle generators. I found that generating input data was a tricky problem in PBT, and oracle generators look like a good alternative to distribution-based input generation.
Thank you! Yeah, I’m glad I found this example to showcase how the “backwards” style of oracle generators is practical and can help overcome that issue. And also that it found some bugs! It’s fun when actual bugs are found, not only “what if I had made this error” kinds of scenarios.
I really enjoy how this article shows various techniques, evaluating how they fail or succeed, not only the technique that worked best. Hillel’s blog is a streak of good posts!
Thank you! I think it’s important to talk both about what works and what doesn’t work, so people know the pitfalls before they do the same thing.
Using property-based testing at the “integration” level. Put differently, testing the entire application stack with minimal stubs using properties, running thousands of cases. I’m also working on a series of articles on this very topic, where 2 out of 5 are published so far at https://wickstrom.tech/blog.html. I love this stuff!