Imagine making a silly, somewhat innocuous bug that nobody told you about and getting slammed by a random dude on the web. And this dude isn’t even Linus.
Just amazing programmers are.
Innocuous would mean something not harmful or offensive. A significant portion of the author’s traffic was failed requests from their bots: 88% of the failed traffic came from this business’s bots, and the failed requests appear to be the result of some pretty basic parsing issues. While the author could be a little more generous in his language, this is not an innocuous bug. I also think comparing the author to Linus is over the top; Linus was abusive in his language. I’ve included an example of how Linus talked for comparison. (A sketch of how a failed-traffic figure like that 88% can be computed from access logs follows the two quotes below.)
Here’s the author’s response:
I’ve read your page on the mj12 bot, and I don’t necessarily mind the 404s your bot generates, but I think there’s a problem with your bot making totally bogus requests, such as:
I’m not a proxy server, so requesting a URL will not work, and even if I was a proxy server, the request itself is malformed so badly that I have to conclude your programmers are incompetent and don’t care.
Could you at the very least fix your robot so it makes proper requests?
Here’s an example of Linus:
I’m upset, because I expect better quality control. In fact, I expect some quality control, and this piece-of-shit driver has clearly seen none at all.
And those patches were apparently committed yesterday.
WHAT THE ACTUAL FUCK?
in tinydrm-helpers.h gets rid of the complete build failure, and only leaves the tens of lines of warnings.
How the hell did this get to the point where crap like this is even sent to me? Nobody tested *anything*?
AND WHY THE HELL WAS THIS UTTER SHITE SENT TO ME IF IT WAS COMMITTED YESTERDAY?
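For anyone curious how a failed-traffic figure like that 88% gets computed, here is a rough sketch over a standard nginx/Apache “combined” access log. The log path is a placeholder, and treating 4xx responses as “failed” is my assumption, not the author’s methodology.

```python
# Rough sketch: tally total and failed (4xx) requests per user agent from a
# standard "combined" format access log, to see how much of the failed
# traffic a single bot accounts for.
import re
from collections import Counter

LINE = re.compile(r'"(?:GET|POST|HEAD) [^"]*" (\d{3}) (?:\d+|-) "[^"]*" "([^"]*)"')

totals, failed = Counter(), Counter()
with open("access.log") as log:  # placeholder path
    for line in log:
        m = LINE.search(line)
        if not m:
            continue
        status, agent = m.groups()
        totals[agent] += 1
        if status.startswith("4"):
            failed[agent] += 1

grand_total = sum(failed.values()) or 1
for agent, n in failed.most_common(5):
    share = 100 * n / grand_total
    print(f"{n}/{totals[agent]} failed ({share:.0f}% of all failures)  {agent[:60]}")
```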
Totally innocuous: he wasn’t against having those requests hitting his web server, but he was against the code that generated them. So nothing would change if they all turned into valid requests.
The hope is that a bot will improve the SEO such that real users read his page down the line. Why did you think we had bots?

Lots of uses, to be fair; one is that, that’s true.
Your reply assumes the author said any of this poses an immediate practical problem, such as increased hosting costs. But I read nothing like that, so I am not sure how you conclude that this bug is not innocuous.
When you get a bunch of invalid requests, it’s bad. It’s bad for you: you need to go through and make sure they weren’t sent via a bad link by a major site. It’s bad for scrapers, which is bad for you, because you might get marked as a dead site because “//%22https://www.zaxbys.com//%22” was invalid. The author went to pretty decent lengths to explain why it was a problem for them and what steps they needed to follow to figure out why it was happening. I could go on and on about why it is a burden when a business whose job is to connect you with potential viewers instead does nothing but spam your page with clearly invalid requests, to the point where a significant percentage of your requests are invalid.
He looked at the stats and saw some ugly data. If this is what he calls a practical problem, I truly envy him for his problems.
He has a blog; he wants viewers to see his blog. The bot is taking up traffic and not providing value. Seems like a pretty reasonable practical problem to me. Just because you have bigger problems doesn’t mean his problem isn’t a real problem.
Is he being DoSed? This is missing from the article. The numbers don’t suggest this. I still don’t see the problem at all. I see somebody who gets worked up over a bug they’re not really affected by.
You do have to pay for traffic, so it’s convenient when that traffic is purposeful in some way. You’re going through a lot of effort to defend a company that probably abandoned this product two years ago.
You’ve found a bug and reported it - there is no need to call them ‘incompetent’.
As far as I can determine, that particular bug has existed since November 2017. I’m a bit surprised that no one working there (or for them) has found it in a year and a half. It appears to be a pretty basic parsing bug.
Whatever the bug is, if somebody put me on public display like that I would still block them wherever possible. I assume that’s what’s happening already.
Following a URL is the primary purpose of a bot, yet that feature is completely broken in a way which would have been obvious had they tested the bot on any web site.
Whether you want to call that incompetence or not is up to you, but, well, it’s not a good look.
If you find a bug and want it to get fixed, do you think that insulting maintainers is going to be helpful? I don’t think so.
“feature is completely broken in a way which would have been obvious had they tested the bot on any web site”
And how do you know that? Maybe there is something special about the site/content in question that exposes a bug that doesn’t affect 99% of other websites? Instead of assuming the worst possible interpretation, it might be better to just report the issue without judging the project maintainers :)
It can be helpful in making sure that nobody uses it. Not everyone has a vested interest in getting every bug fixed. Sometimes the best solution is encouraging people to use other software and not something completely broken. The article was pretty clear about the bug in question, and it would affect practically any page with comments or discussion.
I’ve seen pretty obvious bugs in code written by very competent programmers. These things slip through. It’s a bit like shouting “YOU ARE SUCH A FUCKING IDIOT” at a friend because he calls a wrong number.
I agree, these things don’t make you look particularly good, but they happen to anyone, and calling someone incompetent over something like this is just being a dick. The bad part is that it hits insecure people disproportionately hard. If you get an aggressive reply, then maybe it’s time to act like a dick.
Definitely. The wording in this blog post is really too pedantic.

I don’t think this comes across as well as you think it does. The incompetency comments are uncalled for.
This “Majestic 12” bot actually looks like an interesting project. According to https://www.majestic12.co.uk/ it was started around 2005 as a distributed search engine. From what I can tell it’s more of a “distributed crawler” that feeds info to a centralized search/analytics tool at https://majestic.com. However, it’s cool that there’s still a little community of people running nodes for the project, even if it’s awful software.
The content of the post is just that programming sucks and web crawler authors don’t read their mail. Nothing really novel. I don’t know the reason for the upvotes.
From the crawler’s download page:
If you have any problems installing or running software then please report them in forum.
The forum link is a list of bugs, the most recent of which was added in December of 2018. I don’t see this issue reported in the way they’re asking. Perhaps that would be a first step before calling them incompetent?
The user-agent string MJ12Bot uses links to this page. I cannot find a link to the forum anywhere on that page, but it does give an email address to contact them.
But I did get another email reply from them—the person who handles questions about the MJ12Bot is out of the office for the day.
People are getting pretty worked up about calling someone incompetent, and hey, I get it. However, when 77% of the requests being made fail, the company may have no real incentive to provide working software. At that point it may be effective to discourage people from using or seeking out that product, and calling its maker incompetent is a potentially effective move towards that end.
Before I even clicked the link, “I bet it’s mj12bot.” Is this really a bug? I just assumed it was an attack and blocked their network. They do this all day, every day, in parallel:
“GET /product-category/collections/[deletia]/page/3/?remove_from_wishlist=61345&add_to_wishlist=60862 HTTP/1.1”
That with different product IDs. Just beating the crap out of one particular Woo’ site I host. Is that really REALLY just a shitty bot and not an attack?
I find the name Majestic-12 to be quite amusing. I guess instead of robots.txt, you could always send JC Denton on the case.
I always assumed these must be coming from actual links generated by other garbage websites (e.g. amateurish SEO content mills that scrape other sites).
It’s hard to believe that a crawler would have such a basic parsing deficiency. HTML is full of subtle gotchas, but quoted attributes are pretty obvious. They must be fetching millions of pages a day, and they’d have to notice a bug after crawling the first couple of pages.
Hmm, I’ve seen worse mistakes in bigger businesses. They don’t “have” to notice anything. Issues that regularly happen, and have happened since before you joined a business, often get seen as “well, if I fix it and anything else goes wrong, I’ll be fired for it.” Sometimes “if it ain’t broke, don’t fix it” means “don’t fix any issue unless I specifically tell you to.”
But //%22https://www.zaxbys.com//%22 is not a valid request. Nor is //"https://www.zaxbys.com//". And while you can give a full URL as a web request, that only works for web proxies, which I’m not running. So even if this is a garbage link from some content mill, it should be rejected.
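As a quick illustration of how little sense that request-target makes, here is what a standard URL parser does with it (Python’s urllib, my choice of tool, not anything from the thread):

```python
from urllib.parse import urlsplit

# The path the bot requested, exactly as it appears in the logs. The leading
# "//" makes the percent-encoded quote look like the start of a host name.
bogus = "//%22https://www.zaxbys.com//%22"
print(urlsplit(bogus))
# SplitResult(scheme='', netloc='%22https:', path='//www.zaxbys.com//%22',
#             query='', fragment='')
```

As a request line to an origin server, this is neither a sane path nor a proxy-style absolute URL.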
But http://www.zaxbys.com//%22https://www.zaxbys.com//%22 could be a valid link somewhere. It could happen if something mis-parsed HTML and then tried to make all URLs absolute.
http://web.archive.org/web/*/https://lobste.rs/ is totally a valid URL, though.
Probably it was developed in an environment that does not have dependency management. I remember most PHP projects contained no external libs; everything the standard library lacked¹ was developed in-house. Before Composer appeared there was PEAR, and it simply didn’t work at all. And parsing everything with regexps was common. This example is probably the result of parsing with regexps when you don’t have an easily available “soup”-family library (and such parsing is complex).
So this is the opposite of the JS ecosystem, and an example of what happens when you try to run too far away from the npm problems.
¹ For added fun, the standard library was not standard enough: it depended on the global php.ini and on the arguments with which PHP was compiled.
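For what it’s worth, here is a minimal sketch of the failure mode the last two comments describe: a regex-based link extractor that drags escaped quotes along with the URL, then percent-encodes them and resolves the result against the page URL. The markup, regex, and base URL are all invented for illustration; the crawler’s actual code is not public.

```python
# Hypothetical reconstruction of the bug: a greedy href regex captures the
# escaped quotes around a URL (as found in inline JSON or escaped HTML),
# and the crawler then percent-encodes them and resolves the mess against
# the page it came from.
import re
from urllib.parse import urljoin, quote

page_url = "http://www.zaxbys.com/"
html = r'<a href=\"https://www.zaxbys.com/\">link</a>'

# Too-greedy pattern: grabs everything after href= up to the next '>' or
# space, including the backslash-escaped quote characters.
for raw in re.findall(r'href=([^ >]+)', html):
    link = raw.replace("\\", "")      # leaves bare " around the URL
    encoded = quote(link, safe=":/")  # the quotes become %22
    print(urljoin(page_url, encoded))
# -> http://www.zaxbys.com/%22https://www.zaxbys.com/%22
# the same shape as the requests in the author's logs
```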
How is “incompetent” offensive now? Do you people look for new words to be offended by? Is that the entire reason this has received so much attention?
Anyway, I don’t feel sorry for anyone who writes a WWW robot; in particular, this is one of those robots that doesn’t obey robots.txt, which makes it even worse than usual. I’ve not experienced this parsing flaw myself, though, even though all of my HTML uses double quotes for attribute values and whatnot.
These people are cretinous and don’t warrant the kindness the author gave them, yet several of you believe he was insufficiently kind? How absurd.
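For anyone who would rather just turn the bot away, the conventional robots.txt block looks like this (“MJ12bot” is the user-agent token the crawler’s own page documents):

```
User-agent: MJ12bot
Disallow: /
```

Since this particular bot reportedly ignores robots.txt, blocking its user agent at the web-server level may be the only effective option.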
I’ve personally worked in the business long enough to have done dirty and naughty things that the business context forced upon me. I took shortcuts when I would’ve rather fully understood the spec, and I have released things knowing that some areas were less tested than others. If someone called me incompetent because of a bug resulting from that, I’d think of them as ignorant and assume they live in a bubble where software isn’t built with business needs as the number one priority ;)
Also, nothing against the author, but their website mentions “bug free” software in multiple places, yet when I checked their GitHub there were a couple of bug-fix commits. I personally think that’s completely fine, and I’d still appreciate the quest for developing bug-free software, but if I were as uncharitable in my interpretation of circumstances I don’t know about as they were with the bot, I’d call them a liar.
Edit: I don’t find “incompetent” offensive by the way. I find the author’s use of it in this case to be “ignorant”.
Answering the question in the title: “commercial grade” just means you have to pay to use it. It doesn’t say anything about the quality. It was a bit of a surprise for me as well that, when I started working as a programmer, the quality of the code I saw wasn’t particularly good in comparison to the open-source programs I had seen before.
Seems the latest version of that bot is v1.4.8, from April 2017.
For the same reason “military grade” isn’t a compliment except among marketers.
People always use the truism “You get what you pay for” without understanding it: If you pay for absolute shit, you get it, regardless of how much it cost. Price and quality are at least loosely correlated in some industries, but software isn’t one of them and it never has been.