1. 53
  1.  

  2. 16

    My biggest complaint about anti-spam systems like Gmail’s is that they offer no indication that your message was quarantined or dropped, so you’re just left wondering if your message ever got anywhere. I don’t buy the argument that telling spammers if their messages were rejected would benefit them, as if they will suddenly get smart and be able to work around the filters. That’s basically security through obscurity.

    1. 16

      This is not about spam. Combating spam is not difficult, especially for Google’s Gmail team, and they were doing a wonderful job until someone in management decided to also use the anti-spam system to wipe out competition and “encourage” more people to switch to Gmail (and in the case of the OP author, they succeeded).

      This is about capturing more users and getting more people to switch to Google services like Gmail.

      The correct response is to do the opposite.

      Google’s use of spam filtering for purposes other than spam filtering is both anti-competitive and willfully deceptive, and therefore constitutes fraud. A web host that’s willing to disrupt millions of conversations and use deception to get its way is not a host I want to get in bed with or give up my autonomy to, so I will continue to self-host my email.

      1. 8

        That sounds like a bit of a stretch to me. There will always be the Hotmails and big corporations of the world that will never switch to Google for e-mail. I think it’s more a case of Google not caring about the little guys than of actual malice. Granted, not caring about the little guys on the internet when you have the size and weight of Google is dangerous, but I don’t think it’s a hidden agenda.

        1. 3

          I think it’s more a case of Google not caring about the little guys than of actual malice. Granted, not caring about the little guys on the internet when you have the size and weight of Google is dangerous, but I don’t think it’s their hidden agenda.

          This is fraud because it fits the definition of fraud:

          • a: intentional perversion of truth in order to induce another to part with something of value or to surrender a legal right
          • b: an act of deceiving or misrepresenting

          Google is deceiving its users and misrepresenting their offering:

          1. It tells users that they should use Gmail because it will do a 99.9% accurate job of filtering their email for spam.
          2. Everything we know about Google’s technical expertise and their unique access to millions of inboxes indicates that there is no reason they should not live up to the claim of 99.9% perfect spam filtering.
          3. And yet there is plenty of evidence to the contrary:
            • As OP shows, Google marks newly set up email servers as spam even though they correctly implement Internet anti-spam standards.
            • They have a 30% false positive rate according to Linus Torvalds, and we hear similar horror stories from others.
          4. Google has been warned about this repeatedly for a long time, and yet it chooses to do nothing about the problem.

          TLDR: All signs point to Google knowingly and intentionally marking email as spam that it knows it shouldn’t. This is fraud, and fraud that causes real damages to Gmail users. Who knows how many critically urgent emails went missing, how many deals failed to close, how many opportunities were lost, and so on.

          1. 7

            Google only claims 99.9% total accuracy in classifying spam vs. non-spam, not that Linus Torvalds specifically will have his own personal inbox filtered with 99.9% accuracy. I wouldn’t be surprised if they meet their stated accuracy across all of Gmail. What you link is them reporting the results of a study which claims to have found just that.

            It’s also quite unlikely to be “fraud”; if it were anything illegal, it’d be false advertising, but: 1) they don’t advertise 99.9% accuracy as any kind of SLA, it’s just a report of what they currently find their accuracy to be; and 2) my guess is the claim is actually correct.
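The gap between an aggregate accuracy figure and one person's experience can be shown with a little arithmetic. The sketch below uses made-up numbers (999 inboxes with 5 misclassified messages each, plus one inbox with 3,000 misclassified out of 10,000); none of these figures come from Google or from the linked study.

```python
# Hypothetical data: (messages_received, messages_misclassified) per inbox.
# 999 well-served inboxes plus one badly served inbox.
inboxes = [(10_000, 5)] * 999 + [(10_000, 3_000)]


def accuracy(received, misclassified):
    """Fraction of messages that were classified correctly."""
    return 1 - misclassified / received


total_received = sum(r for r, _ in inboxes)
total_misclassified = sum(m for _, m in inboxes)

# Accuracy over all messages in all inboxes combined.
overall = accuracy(total_received, total_misclassified)
# Accuracy of the single worst-served inbox.
worst = min(accuracy(r, m) for r, m in inboxes)

print(f"overall accuracy: {overall:.4f}, worst inbox: {worst:.2f}")
```

Under these assumed numbers the overall figure stays above 99.9% even though one inbox sees only 70% of its mail classified correctly, so both statements can be true at once.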

            1. 2

              Google only claims 99.9% total accuracy in classifying spam vs. non-spam, not that Linus Torvalds specifically will have his own personal inbox filtered with 99.9% accuracy.

              The claim is 99.9% accuracy. The word “total” was not there, nor was there even any fine print in that article.

              Regardless, even if they didn’t give a specific number, the intentional marking of legitimate messages as spam for personal gain is fraud. It is a lemon of a product that’s causing actual harm. Users expect spam filtering. Not legit-email-filtering.

              If it were a mistake then it can be called false advertising or whatever. If it’s intentional it’s fraud.

        2. 6

          I’m willing to believe this is a purely anti-competitive move on Google’s part, and certainly it does have that effect on new email services. But I’m not convinced that’s the motivation. I think it’s just easier (and frequently correct) to assume that a new email host we’ve never heard of is a guilty spammer, until proven innocent. Any email service that actually caused Google to be concerned about competition would also be big enough that they would be forced to accept their mail.

          1. 3

            I think it’s just easier (and frequently correct) to assume that a new email host we’ve never heard of is a guilty spammer, until proven innocent.

            On the claim that it is “easier”:

            • From a purely technical point of view, it is easier to not add any special code for keeping track of whether an email host is new or not.

            On the claim that it is “frequently correct”:

            • By no metric that I’m aware of is it “correct” to have a policy that directly conflicts with the idea of the Internet itself. New nodes are the norm, not the exception. The more nodes there are, the healthier and more decentralized the Internet is, which is as it should be.
            • Spam filtering could be done entirely on the basis of the email content (EDIT: yes, you can incorporate user input as well to help). Do you only have real-life conversations with people you’ve already known for two years? Of course not. New people come into your life all the time. It is mostly the content of the message that determines whether or not it is spam. Google’s Deep Learning tech and access to millions of Gmail accounts is more than enough to get a near-perfect spam filtering score.
            • EDIT: And we do have a standardized definition of “correct”: the Internet standards that are mentioned in the blog post: SPF, DKIM, and DMARC, which his server correctly implemented.
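For readers unfamiliar with what filtering "on the basis of the email content" looks like, here is a minimal sketch of a word-count naive Bayes scorer. It is a toy under stated assumptions: the training messages are invented, the function names `train` and `spam_log_odds` are hypothetical, and nothing here describes how Gmail actually classifies mail.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label. `messages` is a list of (text, label) pairs,
    where label is 'spam' or 'ham'. Assumes both labels occur at least once."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def spam_log_odds(text, word_counts, doc_counts):
    """Return log P(spam | text) - log P(ham | text) under naive Bayes.
    Positive means 'more likely spam', negative means 'more likely ham'."""
    vocab_size = len(set(word_counts["spam"]) | set(word_counts["ham"]))
    score = math.log(doc_counts["spam"] / doc_counts["ham"])  # prior odds
    for label, sign in (("spam", 1), ("ham", -1)):
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out.
            p = (word_counts[label][word] + 1) / (total_words + vocab_size)
            score += sign * math.log(p)
    return score


# Invented training data, for illustration only.
training = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
word_counts, doc_counts = train(training)
```

With this toy data, `spam_log_odds("buy cheap pills", word_counts, doc_counts)` comes out positive and `spam_log_odds("agenda for the meeting", word_counts, doc_counts)` comes out negative. Whether a content-only approach holds up against adversarial senders at scale is exactly what this thread disagrees about.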

            Any email service that actually caused Google to be concerned about competition would also be big enough that they would be forced to accept their mail.

            This sort of thinking can lead to total centralization, even if it appears there are a handful of email hosts. That’s anti-competitive, as it prevents new entrants. The Internet is literally an escape from centralization. I know of no one who wants a future where everyone on the planet has their email hosted by a handful of mega corps who have total and complete access to, and control over, the world’s email.

            1. 9

              Your comments indicate to me that you haven’t actually worked on spam filtering. It’s a terribly difficult problem, and the open internet is ridiculously adversarial. Spam filtering on content alone is insanely hard. SPF and DKIM aren’t actually that helpful. And most importantly, the single most useful signal of illegitimacy is domain age. This goes for email and malicious websites. If the author had let his domain sit around for 90 days or so, that would have done more to avoid blacklisting than anything else he could do.

              You assume that, for some reason, new servers popping up is an infrequent event. For every legitimate email server that stands up, thousands if not tens of thousands of spam servers appear, send mail, and disappear. Whether you like it or not, it’s not feasible to successfully identify every tiny yet legitimate domain in the deluge of spam email bullshit. Email sucks big time; any admin who deals with that crap has my utmost respect and deepest sympathy.

              Furthermore, some IP blocks are more trustworthy than others. When I set up an email server, my provider blocked connections on port 25 unless you had a discussion with tech support. Once I had a lengthy and cordial discussion with them on why I wanted to use port 25, they opened it for my machine. Every email I sent made it through spam filters, even asinine emails like “test” with no subject. That port 25 policy helps make my provider’s IP blocks trustworthy, as they likely have a good reputation with regard to spam. EC2, on the other hand, just rate limits port 25 by default. Thus, the reputation of EC2 IP blocks is extremely low.

              I appreciate your ideals about an open internet, but you have to be realistic.
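To make the domain-age signal concrete, here is a minimal sketch of such a check. The 90-day threshold is only the "90 days or so" figure from the comment above, not a documented Gmail rule; the helper names are hypothetical, and how a filter learns a domain's registration date (WHOIS, passive DNS, and so on) is left out.

```python
from datetime import date

# Assumption: taken from the "90 days or so" remark above, not from any
# published filter policy.
MIN_DOMAIN_AGE_DAYS = 90


def domain_age_days(registered_on, today):
    """Number of whole days between registration and `today`."""
    return (today - registered_on).days


def is_young_domain(registered_on, today, min_age_days=MIN_DOMAIN_AGE_DAYS):
    """True if the domain is younger than the threshold and therefore,
    under this heuristic, treated with extra suspicion."""
    return domain_age_days(registered_on, today) < min_age_days


# Example dates are invented for illustration.
print(is_young_domain(date(2015, 5, 1), date(2015, 6, 1)))
print(is_young_domain(date(2015, 1, 1), date(2015, 6, 1)))
```

A real filter would presumably weigh this alongside other signals such as IP block reputation rather than use it as a hard cutoff.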

              1. 1

                I appreciate your ideals about an open internet, but you have to be realistic.

                I am being realistic. I ask you to be realistic about the consequences that follow from using spam as a (very poor) excuse to turn the Internet into one giant population control mechanism.

                Even if spam really were such an intractable problem, I would prefer having to manually sort every piece of spam than living a life where I cannot communicate with my friends and acquaintances, or for the rare times I succeed my conversations are monitored for the purpose of controlling me and everyone I love.

                So, “full stop” as they say.

                But, back to the technicals of spam:

                And most importantly, the single most useful signal of illegitimacy is domain age.

                Bullshit.

                As I pointed out, Gmail had no problem with classifying spam prior to their recent behavior of blocking small mail servers.

                I do not have any problem making split second decisions on whether or not a piece of email is spam just by looking at it. I do not use domain age as a classifier. It is totally unnecessary information. I could build a DL net to classify spam easily. Google seems to have already done this, but in addition to that they intentionally mark legit email as spam.

                Again: this is not about spam.

                1. 1

                  I expect better of Lobsters. If I were looking for downvotes without cogent reasoning I’d be having this conversation on Hacker News. :-\

                  Downvote if you feel I’m wrong, fine, but reply with logical arguments.

                  1. 1

                    +2, -2 off-topic — …. Someone upvoted the GP so it’s no longer grey, maybe that’s the confusion?

      2. 13

        This isn’t how the internet is supposed to work.

        Sure, but 90% spam is also not how the internet is supposed to work. Email was broken as a medium for a long time before these anti-spam reputation systems were put into place.

        1. 11

          I feel the problem the author faces. Sadly I think going back to Google apps only helps the issue to persist. I commented on this once on HN and more recently directly mentioned the email issue on my last blog post.

          Run your own, be loud about it. Complain that people don’t get your messages but please don’t give up. That will only make the problem harder to solve.

          1. 10

            Using these services contributes to the problem itself. It’s usually best to avoid digging your own grave; self-host or use a friend’s server instead.

            Decentralized email alternatives:

            If you know of others please let us know! I’m always interested in hearing about such projects.

            1. 4

              It is nice to see these projects, because setting up email properly is intimidating (even for people who routinely set up lots of other types of servers), and the “silent dropping” that Google (and many others) engage in makes it even harder to feel confident about it. An additional problem is that lots of ISPs don’t like email being run by their customers. Heck, a lot of hosting providers too – EC2 requires you apply to be on a whitelist.

            2. 9

              I ban at least a /24 a day for sending me spam. So far these bans have been permanent since nobody has complained, but I do sometimes wonder at what point somebody will inherit these addresses.

              (It’s hard for me to reconcile “IPv4 is exhausted” with the spam I get. solarpanels.science is apparently able to burn through thousands of addresses per week for what must be a pretty low margin business.)
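For anyone curious what banning "a /24" amounts to, here is a minimal sketch using Python's standard `ipaddress` module. The helper names `block_of`, `ban`, and `is_banned` are hypothetical, and the addresses come from the ranges reserved for documentation, not from real spam sources.

```python
import ipaddress


def block_of(address, prefix=24):
    """Return the network of the given prefix length containing `address`."""
    # strict=False lets us pass a host address and get its enclosing network.
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


def ban(banned, address):
    """Add the whole /24 around a spam-sending address to the ban set."""
    banned.add(block_of(address))


def is_banned(banned, address):
    """True if the address falls inside any banned /24."""
    return block_of(address) in banned


banned = set()
ban(banned, "203.0.113.57")  # one spam source bans its 255 neighbours too
print(is_banned(banned, "203.0.113.200"))
print(is_banned(banned, "198.51.100.1"))
```

Each /24 covers 256 addresses, which is why permanent bans raise the question of who eventually inherits them.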

              1. 7

                Email is definitely one of the last standing bastions of how the internet was supposed to work. The other being HTTP, I suppose.

                1. 6

                  Maybe I got lucky, but I’ve had no problems like this. I set up my own email server about a year ago, which I still use as my primary email. I correspond with quite a few people at companies, Gmail, Office365, etc., and it seems to be successful (in that I’m getting replies, anyway). The only deliverability problem I had was actually due to SPF: I had previously set up a hard-fail SPF, which caused my mail to bounce from some people who forward their mail (that SPF breaks forwarding is a known problem). Since dialing back the SPF to soft-fail I haven’t had trouble.
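In SPF terms, hard-fail versus soft-fail comes down to the qualifier in front of the `all` mechanism: `-all` asks receivers to treat non-matching senders as a fail, while `~all` marks them as a softfail. A minimal sketch follows; the record strings are illustrative examples rather than anyone's real DNS data, and `all_policy` is a hypothetical helper, not part of any library.

```python
# SPF qualifiers and the results they map to (RFC 7208).
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def all_policy(record):
    """Return the result an SPF record assigns to its `all` mechanism,
    or None if the record has no `all` term."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    for term in terms[1:]:
        # A missing qualifier defaults to "+" (pass).
        qualifier, mechanism = "+", term
        if term[0] in QUALIFIERS:
            qualifier, mechanism = term[0], term[1:]
        if mechanism.lower() == "all":
            return QUALIFIERS[qualifier]
    return None


print(all_policy("v=spf1 mx -all"))  # hard-fail style record
print(all_policy("v=spf1 mx ~all"))  # soft-fail style record
```

The sketch only inspects the `all` term; it does not evaluate `mx`, `include`, or other mechanisms, and it does nothing about the forwarding problem itself.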

                  1. 2

                    At least one of the stats quoted in this piece is stale. Gmail now has over 900 million active users according to http://www.usatoday.com/story/tech/2015/05/28/google-inbox-gmail-900-million-users/28016983/