1. 9
  1.  

  2. 11

    This may be incomplete or wrong, but my understanding is that you send a search query to a searX instance, which then forwards the query to various conventional search engines (Google, Bing, etc.), receives the results, and passes them on to you.

    It seems to me that hosting your own searX instance offers very little benefit in terms of privacy. If there are only a few searX instances, each with many users, then an instance acts a bit like a VPN: Google still gets all the search queries, but it can’t identify individual users of the instance. However, if you have your own searX instance which only you use, then Google can still build up a picture of an individual (you), because the searX instance is just a simple proxy between you and Google.

    Is this correct, or does searX do something more complex under the hood (such as caching or federation with other instances)?
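For illustration, the "simple proxy" model described above can be sketched as a fan-out and merge. This is a minimal sketch and not searX's actual code: `metasearch`, the engine callables, and the reciprocal-rank scoring are all hypothetical, and no caching or federation is modelled.

```python
from collections import defaultdict

def metasearch(query, engines):
    """Send one query to every engine and merge the ranked results.

    `engines` is a list of callables (hypothetical stand-ins for upstream
    search engines) that take a query string and return a ranked list of URLs.
    """
    scores = defaultdict(float)
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            # Reciprocal-rank scoring: higher positions and agreement
            # between engines both push a result up.
            scores[url] += 1.0 / (rank + 1)
    # Best score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Two fake engines standing in for real upstream services.
def engine_a(query):
    return ["a", "b"]

def engine_b(query):
    return ["b", "c"]

merged = metasearch("privacy", [engine_a, engine_b])
```

Every upstream engine still sees the full query text in this model; what it learns about the person asking depends on what else the proxy forwards.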

    1. 2

      I don’t know about SearX, but there’s YaCy, which does federate. It’s effectively a distributed search index. As with all approaches that don’t ultimately get their results from Google or Bing, though, the search results aren’t of high quality.

      1. 2

        > It seems to me that hosting your own searX instance offers very little benefit in terms of privacy.

        But it does. It strips, or at least normalizes, any identifying information in the request.

        From the searx FAQ:

        Searx protects the privacy of its users in multiple ways regardless of the type of the instance (private, public). Removal of private data from search requests comes in three forms:

        • removal of private data from requests going to search services
        • not forwarding anything from a third party services through search services (e.g. advertisement)
        • removal of private data from requests going to the result pages

        Removing private data means not sending cookies to external search engines and generating a random browser profile for every request. Thus, it does not matter if a public or private instance handles the request, because it is anonymized in both cases. IP addresses will be the IP of the instance. But searx can be configured to use proxy or Tor. Result proxy is supported, too.
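As a rough illustration of "not sending cookies" and "a random browser profile for every request", here is a minimal sketch. It is not searx's implementation: the header pools, `random_profile`, and `build_upstream_request` are hypothetical names, and the request is only built, never sent.

```python
import random
import urllib.request

# Hypothetical pools; a real instance would use a larger, maintained list.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/115.0",
]
ACCEPT_LANGUAGES = ["en-US,en;q=0.9", "en-GB,en;q=0.8", "en;q=0.7"]

def random_profile(rng=random):
    """Pick a fresh, randomly chosen set of browser-like headers."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": rng.choice(ACCEPT_LANGUAGES),
    }

def build_upstream_request(url, client_headers, rng=random):
    """Build the request that would go to the search engine.

    `client_headers` (what the user's browser sent to the instance) is
    deliberately ignored: no Cookie, Referer, or real User-Agent is forwarded.
    """
    return urllib.request.Request(url, headers=random_profile(rng))

req = build_upstream_request(
    "https://example.org/search?q=test",
    {"Cookie": "sid=abc", "User-Agent": "MyBrowser"},
)
```

The engine still sees the instance's IP address and the query itself; only the browser-level identifiers change from request to request.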

        1. 2

          The “generating a random browser profile for every request” part is what makes SearX interesting to self-host, because a random browser profile for every request makes it that much harder for the queried engines to track me.

          1. 2

            Exactly. And because everything comes from a single IP, they may attempt to build a ‘user profile’ for that IP address, but any number of users could be hidden behind it, so it would definitely be an exercise in futility for Google, etc.