Wow. Bleach has been a staple in my Python toolkit for years - v1 was released about 12 years ago, and I’m fairly certain Bitbucket was using it when I worked there almost 8 years ago. Thank you for maintaining it for so long. I hope another library steps up to take its place.
Though, outside of compatibility updates and major bug fixes, is there much that needs to be done with Bleach? Part of the reason for its longevity seems to be that its API has stayed largely the same for a long time, and it’s been very stable. Maybe it could be changed to use lxml.html under the hood to get around html5lib seemingly being unmaintained, but that doesn’t seem like a very high priority unless there’s a security issue or major bug in the underlying library.
While it is sad to see Bleach deprecated, as it is a library I’ve used for so long, I expect a number of projects like nh3 to emerge soon to fill the void.
That can even be seen as a win-win situation, as the ammonia crate describes itself as a 15x faster alternative to Bleach.
This feels like it’s been a while coming. Gloriously, html5lib deprecated its sanitizer in favor of Bleach in 2020, but the project’s owners haven’t passed the torch.
Around that time I looked into forking html5lib due to the lack of maintenance (they aren’t great about merging PRs) and slow performance. My thought was to type annotate it enough to run mypyc on it. However, after triaging all the open issues and digging into the implementation, I don’t think it’s really worth salvaging, for a number of reasons:
It feels like the real success of the project was html5lib-tests, which was used to build html5ever, rather than the software itself.
I’ll probably end up porting my HTML processing code to Rust so I can use html5ever directly.
If you’re using Rust, I can highly recommend https://github.com/rust-ammonia/ammonia, which already sits on top of html5ever.