1. 16
    The world needs more search engines privacy web 0x65.dev

  2. 11

    Heh, the Steven Black adblock hosts file includes cliqz.com, and an attempt to remove it was denied: https://github.com/mitchellkrogza/Badd-Boyz-Hosts/issues/34

    I flagged this as spam. Even though the premise of the article is correct (we need more search engines), it’s clearly an advertisement. (The title of the article on the webpage is “The world needs Cliqz. The world needs more search engines.”)

    1. 5

      Aren’t you saying, then, that the only people who can post something titled “the world needs more search engines” are those who do nothing about it?

      1. 6

        That’s a rather reductive argument given that the title of the article includes “The world needs Cliqz”. If we allow anyone and everyone who says we should use their product, we will be inundated with spam.

        1. 4

          And if we allow none, then people who work on something because they consider it important can’t post about it.

          The way to avoid these undesirable consequences is IMO to not use such tests, i.e. to not decide whether a posting is permissible based on who posts, but rather to consider its content.

          1. 7

            We should allow no advertisements. There’s a difference between a product announcement and a product advertisement. The arguments you’re making here are hyperbolic.

            1. -1

              I have good (offline) reason to believe that at least some of the people at Cliqz work on Cliqz because they believe in that headline and the article, so my argument is directly relevant to the case at hand.

              1. 4

                Many people who advertise believe in what they’re advertising; that doesn’t make it not advertising. It makes it honest advertising, but it’s still advertising. For the record I didn’t (yet) mark it spam, but I felt that your argument that it wasn’t spam was also lacking.

                1. 1

                  I assume s/advertising/marketing/g

                  If I understand you correctly, then I personally could post something about some random hack on Lobsters, and I could post about deep technical details of my work, but I could not post a user-level or API-level description of the thing I’m working on, because that would be marketing and marketing is impermissible. Other people, who know less about it, could however post such a description. Do I understand you correctly?

                  If so, then I feel that this is an unfortunate affordance. Fora such as lobste.rs have a tendency to boost people who spend more time reading and posting stuff on the internet, compared to people who spend more time in an editor or IDE. Lobste.rs isn’t particularly bad (certainly not compared to horrors like twitter) but I still feel that the tendency is an unfortunate one, and any rule or affordance that strengthens the tendency is bad. IMO.

                  1. 1

                    Again, you’re failing to consider the tone of the article. If your overarching goal with the written piece is to sell me something, I don’t want to read it. If you are trying to describe something, say with a user-level or API-level description, that is fine. Saying the world needs my product is needless self-aggrandizing, and to my mind it calls the rigor of the article, if any, into question. I would not mind if the author said, “Here is my product, here is why I think it is valuable, here’s what I think it does better than the competition”. However, if instead they say “My product is the best, here’s why you should use it”, it has the tone of a sales pitch and really undermines the article’s value.

                    I know it sounds like I’m splitting hairs here but tone matters.

            2. 5

              I’d be okay with an established community member submitting an article they wrote (under the show tag). I object to advertising that comes from someone with no other post history (“I joined to sell you my product”), but “I hang out here and this is what I work on” is fine by me.

      2. 9

        Maybe helpful: If you care about your privacy, go to myactivity.google.com/ and check out what information is saved. (It’s really a lot, and very fine-grained).

        The main things are search history, location history, and YouTube view and search query history. Since I don’t really need any services that rely on that stuff, I deleted all of the history, and turned off future saving of that info.

        I have my own wiki where I save information and I find that categorizing is a better method of retention than letting the search engine find the same thing over and over again. The wiki also has a “grep” feature which is just as fast and effective as Google for many use cases.

        1. 4

          The article misrepresents both Qwant and DuckDuckGo as only providing results from Bing. Apparently Qwant used Bing to bootstrap, but now uses its own crawler. DuckDuckGo aggregates results from many sources, not just Bing.

          1. 4

            I’m “pleasantly” surprised that Google is at only 93% market share. I assumed it would be higher. Although I suspect if one were to exclude China, Google would be 98+%.

            1. 3

              Good effort. I’ll give it a try while backing up with ddg. The interface is clean.

              1. 7

                Have you actually tried it? This Cliqz thing appears to have a smaller index than www.Gigablast.com, which is not only 100% independent and does not depend on the big 4 — Google, Bing, Yandex or Baidu — but is actually OSS on GitHub, and, as mentioned, appears to have a much bigger index than this Cliqz thingy.

                Also, Cliqz is broken for me, because they show the whole UI in a GeoIP-based language which I don’t understand, without any way to switch to English. GeoIP is so 1999, BTW.

                1. 3

                  There are seven notable search engines, right? Google, Bing, Yandex, Baidu, Gigablast, Seznam and now Cliqz.

                  I tried a search on all. I searched for reviews of an expensive machine that’s been produced for several decades by a small company with more engineering expertise than SEO skills. There are (at least) two good reviews on the web of that machine, as well as innumerable pages titled “buy [name] here”, and not a few pages with shallow or uninformed opinions. None of the seven search result pages included either of the two well-informed pages. All of the seven search results were almost interchangeable. Notably, the search engine with the biggest index did not find the two needles in the haystack.

                  I’m not sure what this shows. Perhaps that distinguishing well-informed pages from uninformed, shallow or shilling pages is terribly difficult? That doing better than Google is actually really difficult? Or maybe I was just unlucky.

                  BTW, I wondered whether Cliqz is regional. They don’t claim to be, but I tested with a search for someone down the street from Cliqz’ main office. The results were a year old. No, Cliqz isn’t regional ;) Good luck to them. It’s a worthwhile task, but not a simple one.

                  1. 1

                    How about DuckDuckGo? Many people use it.

                    1. 1

                      DDG outsources most of the search work to Bing.

                      The web page search results generally come from pages crawled by Bingbot. IIRC the “instant answer” part is done by DDG itself and the web page search is all Bing; I have no idea about the image or video searches.

                    2. 1

                      Naver is #1 in South Korea.

                      1. 2

                        Neat, I hadn’t heard of it.

                        FWIW it gives worse results than the other seven on my query (which I haven’t described in detail, just in case I want to use it again). The Wikipedia page on Naver implies that it’s focused on Hangul/Korean pages, and that machine is not built and perhaps not even sold in South Korea, so perhaps that’s why.

                2. 4

                  At first I thought this is about Lucene and Xapian… Wikipedia redirects “search engine” to “web search engine”, I guess the battle is lost.

                  The world also needs more search engines, not just web search engines. I think the design space is rather underexplored.

                  1. 2

                    Agreed. Having a better search engine for things like GitHub, where punctuation should explicitly not be ignored, would be great. I’d guess, though, that the market doesn’t really want such a thing as attempts to do this outside of/on top of GitHub haven’t exactly succeeded.

                    1. 1

                      Please let us know which attempts didn’t exactly succeed.

                      I was very impressed by livegrep, but yes, it didn’t take off.

                  2. 4

                    If competition was going to solve the problem, shouldn’t we have seen something happen by now? And now the barrier to entry is even higher.

                    Perhaps instead Google should be strictly regulated, or outright nationalized, with all code and documentation being put into the public domain, and control of the organization being placed under democratic review.

                    1. 7

                      I kind of don’t want governments to have the power that Google currently has, either. Speaking as somebody whose job for many years was protecting Google’s data.

                      1. 4

                        Governments already have that kind of power - they work right with Google and have the power to coerce it too, a lot of times with very little accountability.

                        What I’m talking about is transparent democracy: everything they do is available for public review, and management of the newly, basically independent public company is up for general election rather than appointed by the president.

                        1. 1

                          Regarding the data, kind-of. You’re not wrong, but it’s more complicated than that. It’s kind of late and figuring out how much detail I can go into without describing sensitive business information is not an easy task, so I apologize for not elaborating.

                          I’m in favor of the transparent democracy aspect of this, but that doesn’t make the data-gathering aspects okay.

                      2. 2

                        How would that actually improve the state of the world?

                        1. 6

                          The advertising incentive can be completely eliminated so we the people can manage the public service instead of being a product to be sold.

                      3. 1

                        I like how the search results have a little radial chart of trackers on the page: Screenshot