1. 131

Hi all,

I put together an experimental search engine for developers: quickref.

Features:

  • index is a community-curated subset of the web
  • can filter between docs/forums/QAs/blogs/repos
  • explicit by default
  • custom helper cards and bangs

I am not sure this is a viable thing, curious to see if people find it valuable.

  2. 32

    There’s a huge vacuum in the search space right now since Google is so broken. This has so much potential!

    Could you tell us a little about how the search engine is implemented?

    1. 20

      Sure!

      I use Bing search services (via MS Azure). Quickref itself is a simple RoR app which loads metadata from https://github.com/freetonik/quickref.dev and talks to Azure, while maintaining privacy and not transferring any client info to MS.

      Bing is pretty good, since I didn’t have to crawl and index the sites myself (the first iteration was built with scrapy + elasticsearch), and the results are generally ok, but there are limits to the number of sites in the index and the number of priority adjustments. So, if this project proves to be of value, I’d probably need to go back to a custom crawl/search setup, hopefully with the help of others. I’ll definitely open source the app.

      1. 4

        Very interesting, I had no idea that Bing provides that kind of service. I think it’s good to use their service for now; for the users of a search engine the value should be in the search results, not in how the results are computed. The results are good as far as I can see, I’m finding lots of things I never knew about!

        If you have some thoughts about it, could you provide some guidance on what sites belong in the index and what is out of scope?

        1. 4

          If you have some thoughts about it, could you provide some guidance on what sites belong in the index and what is out of scope?

          Yeah, I’ll put that info on the GitHub page. In short, I think it’s a good idea to stick to these:

          • official websites and docs
          • community-driven docs
          • private blogs
          • blogs of engineering departments of companies

          The borders are vague though, it’s a bit tricky to define. Like, I feel like dev.to is okay, but something like programiz.com is not…

      2. 14

        There’s a huge vacuum in the search space right now since Google is so broken.

        Absolutely yes. I believe that nowadays Google presents the most popular results and no longer the most accurate ones. This is especially hard for programmers because we sometimes search for very narrow and specific things where accuracy actually matters. You can still find your way around this (I call this search skills) but it seems to have gotten harder recently.

        Maybe it’s time to have more specialized search providers / search engines in the future for different fields or purposes.

        1. 11

          One idea I had floating around my head is to seed a search engine with this human-curated content. These “awesome” lists have been appearing to fill a hole in Google, IMO.

          https://github.com/sindresorhus/awesome – 133 K stars

          https://github.com/vinta/awesome-python – 82K stars

          https://github.com/ziadoz/awesome-php - 24K stars

          https://github.com/aalhour/awesome-compilers – 5K

          Wikipedia is also another obvious source of human-curated content (and one that Google heavily relies on to answer many queries)


          It’s like a distributed Yahoo on top of Github! (if anyone even remembers what Yahoo did originally …!) Human curation rather than algorithmic.

          Basically the theory is that github collaboration prevents spam and is a filter for quality. Spam and dupes are the #1 problem you’ll run into when creating a search engine. And I think the web is full of pages that just BARELY meet the Google threshold of “not spam”. But those pages don’t make it onto these lists…

          So it should be an easier ranking problem in theory. Too bad I don’t have years to work on this :-/
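
A minimal sketch of the seeding idea, assuming the curated lists are Markdown files whose entries are inline `[title](url)` links. The function name `seed_hosts` and the sample entries are hypothetical, and fetching the lists from GitHub is left out:

```python
import re
from urllib.parse import urlparse

# Matches inline Markdown links of the form [text](http://...) or [text](https://...).
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def seed_hosts(markdown_text):
    """Return the distinct hostnames linked from a Markdown list, in first-seen order."""
    seen = []
    for _title, url in LINK_RE.findall(markdown_text):
        host = urlparse(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen


# Hypothetical excerpt in the style of an "awesome" list.
sample = """
- [Requests](https://requests.readthedocs.io/) - HTTP for humans.
- [Flask](https://flask.palletsprojects.com/) - A microframework.
- [Requests again](https://requests.readthedocs.io/en/latest/) - Same host.
"""

print(seed_hosts(sample))
```

Deduplicating by hostname gives a crawl seed list; ranking within those hosts would still be a separate problem.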

          1. 3

            I mean, that’s how DMOZ and Yahoo worked in the Before Times™.

            1. 2

              If we think of a search engine as a shared view into a projected subset of the web and not a gate keeper, then the search service itself should be a form of collaboration so that groups of people can train/code/filter a specific hyperplane in a high dimensional space.

              There are plenty of niche industries that would be better served by industry specific search services.

              Combine reddit, full text search, rss/federated queries. Basically a community workbench for creating custom search and presentation services.

        2. 7

          I too find it interesting, but am somewhat disappointed that only the metadata is public, while the site itself seems closed source. Do you plan to change that?

          1. 6

            Probably, but there’s not much there anyway. The search engine is Azure’s Bing services, and the app itself is just a simple RoR app.

            1. 4

              Ah, I see. For some reason I was hoping you had been working on your own search-engine implementation, but I guess that’s not something one should expect from a side project.

              1. 1

                It is a good starting point, and changing user expectations is already a huge gain, giving similar projects more margin to maneuver later on.

          2. 6

            Definitely some potential. But the search results will need some prioritization before they truly hit the mark. Searching for “mongoc_client_pool_new”, I could not see the documentation page on the first page of results. And searching for “std::vector” came up with a bunch of Stack Overflow links.

            1. 4

              Agreed, some prioritization is needed, maybe even by users themselves (voting?).

              Also, have you tried narrowing the search to “docs” only?

              1. 4

                I had not but I just did. It is better, but still needs some refinement from what I would expect.

                Still, keep up the good work!

                1. 1

                  I didn’t see that narrowing the search was possible at first. In Firefox, your form select { appearance: none } CSS style hides the disclosure triangle, so the “All” part of the search bar doesn’t look at all like a menu. It looks more like a button, which makes it feel janky that a pop-up menu opens after I click it.

                  If you’re going to hide the browser-default disclosure triangle, I suggest adding your own (which can match your site’s theme) so the menu still looks like a menu. Alternatively, if you really want to make the menu look like a button, I suggest giving it a darker background color and making its background color change on hover.

                  1. 1

                    Got it, I’ll work on it, thank you!

                  2. 1

                    Also, have you tried narrowing the search to “docs” only?

                    I do not understand this reasoning. If this is a search engine for programmers, then looking for “std::vector” must give you the most official documentation of that class first, no matter what.

                2. 4

                  all I want is to be able to search $ & &= >>= <=> etc and get some answers. I realize that’s a big tokenization issue, but overloaded special characters are ruining searchability in a lot of languages.

                  1. 1

                    i think one of the issues is resource cost, which is much greater than that of simple keyword lookup in a tree

                  2. 4

                    Is there some way to make it actually search for what you write? It seems to handle a search for operator< about as well as Google (i.e. treating it as a search for operator).

                    1. 2

                      I was wondering this. I suspect that since it depends on bing for the results, it can only deal with what bing considers to be a valid word/token.

                      Sometimes it’s really frustrating trying to find out what some obscure operator does in a language or framework, because you can’t search for something like ?&, and if you try to describe it in words, there are multiple different ways to describe it, many of which won’t appear on pages which document it (e.g. pages will write “The floob operator, ?& floobs things.”, not “The floob operator, question mark ampersand, floobs things.”). Some dedicated language documentation search engines can handle operator searches. e.g. Haskell’s hoogle: https://hoogle.haskell.org/?hoogle=<?>
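
To illustrate the tokenization point, here is a rough sketch of a tokenizer that keeps runs of punctuation as searchable tokens instead of discarding them, plus a tiny inverted index built on top of it. This is not how Bing or quickref tokenizes; the regex, the function names and the two sample documents are all assumptions made for the example:

```python
import re
from collections import defaultdict

# Either a word (optionally a C++-style scoped name such as std::vector),
# or a run of non-space, non-word characters such as <=> or >>=.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+(?:::[A-Za-z0-9_]+)*|[^\sA-Za-z0-9_]+")


def tokenize(text):
    """Split text into word tokens and operator tokens."""
    return TOKEN_RE.findall(text)


def build_index(docs):
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents.
docs = {
    "spaceship": "The three-way comparison operator <=> orders two values",
    "shift": "The >>= operator shifts right and assigns",
}
index = build_index(docs)

print(tokenize("operator< vs <=>"))
print(search(index, "operator <=>"))
```

One limitation of this naive version: an operator directly followed by punctuation, as in `?&,`, becomes the single token `?&,`, so a real implementation would need a list of known operators per language.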

                    2. 2

                      Very cool. I found some helpful blog posts right away!

                      1. 2

                        Thanks! I’ve been wanting something like this for a while!

                        Just a small thing: my browser (Firefox 68.8, Xwayland, dark theme) renders <input type="text"> with white text on a black background, so the background-color:#fffdf9; directive on form input makes it unreadable

                        1. 2

                          This is pretty cool! It works better than I would have expected. I just made a pull request for Common Lisp.

                          1. 2

                            Merged! Thank you.

                          2. 2

                            Consider allowing searching via POST, so that the search string is not sent to sites via referrer URL.

                            1. 3

                              I disagree in this instance, only because a linkable result set is extremely helpful. These days we have the Referrer-Policy header to better control what is leaked.

                              1. 2

                                i think the request is to allow it, not to make it the default. you can receive GET and POST at the same address.
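
As a sketch of both suggestions together, the WSGI handler below accepts the query through GET or POST at the same path and sets a `Referrer-Policy` header on the response. The quickref app itself is a Rails app, so this is only an illustration in Python; `search_app` and the `call` helper are hypothetical names, and the response body is a placeholder rather than real search results:

```python
import io
from urllib.parse import parse_qs


def search_app(environ, start_response):
    """Read the q parameter from the URL (GET) or the form body (POST)."""
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length).decode("utf-8")
    else:
        raw = environ.get("QUERY_STRING", "")
    query = parse_qs(raw).get("q", [""])[0]
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Ask browsers not to send a Referer header when a result link is followed.
        ("Referrer-Policy", "no-referrer"),
    ]
    start_response("200 OK", headers)
    return [f"results for: {query}".encode("utf-8")]


def call(method, query_string="", body=b""):
    """Invoke the app directly with a hand-built WSGI environ (no server needed)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {
        "REQUEST_METHOD": method,
        "QUERY_STRING": query_string,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    return captured, b"".join(search_app(environ, start_response))


print(call("GET", query_string="q=std%3A%3Avector"))
print(call("POST", body=b"q=operator%3C"))
```

With GET the result page stays linkable; with POST the query never appears in the URL, at the cost of the resubmission prompt mentioned elsewhere in the thread.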

                              2. 2

                                Is there a downside of using rel=noreferrer instead? https://html.spec.whatwg.org/multipage/links.html#link-type-noreferrer

                                Some pages that use POST over GET make it very annoying to go backwards, prompting for a ‘do you want to submit the form again’ kind of alert.

                                1. 2

                                  i think gp was talking about allowing it as an option, not making it default?

                              3. 2

                                I have put it on my topsites on Firefox. If it is decently ok in its results, as it seems for now, I’d continue to use it.

                                Nice go.

                                1. 2

                                  This is so cool! I just tried this for the issue I’m working on now, and it surfaced a relevant github issue I haven’t seen yet!

                                  I think this really fills a need. I’ve been finding myself using !g on DuckDuckGo a lot, since google’s results are a lot more relevant for dev related searches than DDG’s. This looks like a far more privacy-friendly option, with better results too!

                                  The only feedback I have is to pad the search results to appear in the center of the screen on wide monitors like most other search engines do.

                                  1. 1

                                    Thanks! I’ll work on improving the layout soon. Btw, maybe we should also add the !g bang :-)

                                  2. 2

                                    I posted a comment here not too long ago bemoaning the lack of a reasonable tech-centric search engine.

                                    Thank you. I will use this every day.

                                    1. 2

                                      Is it open source?

                                      1. 2

                                        Is the “explicit” vs “implicit” distinction a common one for search engines? I’m not familiar with it. I’m guessing that explicit means “literal” match, implicit means some kind of fuzzy/“guess what I mean” matching?

                                        1. 5

                                          Most search engines assume you’re looking for something common, and thus:

                                          • “Fix” queries with typo & misspelling correction based on string edit distance and term frequency
                                          • “Expand” queries by projecting your terms into some kind of topic space (embedding) and giving you results that score highly on nearby terms

                                          These are great for natural language queries for popular resources, and really terrible for precise, niche queries.
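
A small sketch of why the “fix” step hurts precise queries. It uses the similarity ratio from Python’s `difflib` as a stand-in for edit distance, which is a simplification; real engines also weigh term frequency. The vocabulary and function names are made up for the example:

```python
import difflib

# Hypothetical set of terms the engine has seen often.
VOCAB = ["vector", "victor", "createwindow", "mongoc_client_pool_new"]


def literal_match(term, vocab):
    """'Explicit' behaviour: only return the term if it is present as typed."""
    return [term] if term in vocab else []


def corrected_match(term, vocab, cutoff=0.8):
    """'Fixing' behaviour: silently substitute the most similar known term."""
    return difflib.get_close_matches(term, vocab, n=1, cutoff=cutoff)


# A typo is helpfully repaired...
print(literal_match("vectr", VOCAB), corrected_match("vectr", VOCAB))
# ...but a rare identifier that merely resembles a common word is rewritten too.
print(literal_match("vektor", VOCAB), corrected_match("vektor", VOCAB))
```

The second call is the niche-query failure mode: the user typed exactly what they meant, and the correction step replaced it with something more popular.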

                                          1. 1

                                            Hmm, I haven’t heard those terms being used either in this context. I definitely would find it a little more clear if it used the terminology you mentioned instead.

                                            1. 4

                                              Thanks, I’ve updated the terminology on the frontpage, and it should go live in a few minutes.

                                          2. 1

                                            I will definitely be trying this out, it looks quite handy, and will be very useful for the various “lang + lib” searches I do in a week. I also like that it searches over a curated subset of the web, that’s very handy, since there’s often a lot of noise and distraction on Google and DDG.

                                            I’ll keep notes on using it for a week or so.

                                            If you’re taking suggestions, I’d consider adding 4guysfromrolla.com as a resource, they’ve helped me track down some odd issues in the past.

                                            1. [Comment removed by author]

                                              1. 1

                                                Could you please make a pull-request? (https://github.com/freetonik/quickref.dev)

                                              2. 1

                                                With Google and friends, automatic indexation of anything means poor quality of indexation on everything.

                                                Documentation is not of this kind: it is structured and very heavily used every day. Search engines are moving toward using more of these structured sources of data (among the first of these: Wikipedia).

                                                Thank you!

                                                1. 1

                                                  –foo excludes word

                                                  I think you mean

                                                  -foo excludes word

                                                  (Something auto-converting hyphen to en dash)
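
The same hyphen versus dash mix-up can also reach the query box when users paste text. A hedged sketch of an exclusion parser that tolerates it is below; `parse_query` and the set of dash characters are assumptions, not quickref’s actual parsing:

```python
# Characters that tools commonly substitute for a plain hyphen:
# en dash, em dash, minus sign.
DASHES = "\u2013\u2014\u2212"


def parse_query(query):
    """Split a query into (include, exclude) term lists; -foo excludes foo."""
    include, exclude = [], []
    for term in query.split():
        if term[0] in DASHES:
            # Normalise a leading dash look-alike to an ASCII hyphen.
            term = "-" + term[1:]
        if term.startswith("-") and len(term) > 1:
            exclude.append(term[1:])
        else:
            include.append(term)
    return include, exclude


print(parse_query("logging \u2013pandas -java"))
```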

                                                  1. 1

                                                    Yeah, something in my toolchain auto-converts this and it drives me crazy… Fixed, deployed, thanks!

                                                  2. 1

                                                    Would it be a crazy idea to add some basic regular expression support?

                                                    1. 1

                                                      “No JS” is a bad feature. Instead of adding more value (autocomplete, autosuggestion, etc.) you traded it for the small benefits of “no JS”. Graceful degradation would be better: more features for normal users, fewer for “NoScript” adepts.

                                                      1. 3

                                                        That’s probably how I’ll proceed from now on. Autocomplete, keyboard shortcuts, etc via js, but never breaking the core functionality.

                                                      2. 1

                                                        Does it include lobste.rs users’ blogs in the search stack?

                                                        1. 1

                                                          Sorry, what are lobste.rs users’ blogs?

                                                          1. 1

                                                            Blogs written by people on lobste.rs :-) See for example here.

                                                            1. 1

                                                              I’ll try to, gradually :) Thanks!

                                                        2. 1

                                                          Passed the litmus test for searching “createwindow”, and that’s already exceptional for software nowadays.

                                                          I can imagine myself using this when I use e.g. string manipulation in a language I’m not super familiar with.

                                                          1. 1

                                                            This is really cool. Besides it being useful and a good idea, I’ve been using it to see what comes up when I search for non-dev-related things (such as band names or celebrities) and getting hilarious results.

                                                            1. 1

                                                              I’m also slightly terrified of entering stuff like “how to kill children processes” into Google.

                                                            2. 1

                                                              Might be interesting to add devdocs.io docs collection as a source. It has a really large number of docs collected.

                                                                1. 2

                                                                  It doesn’t work though. It only finds the documentation groups, but cannot find symbols in the documentation, e.g. I’d expect https://quickref.dev/search?q=site%3Adevdocs.io+godot+object.connect&type=all to point to https://devdocs.io/godot~3.1/classes/class_object#class-object-method-connect

                                                                  1. 2

                                                                    I could add a bang for devdocs, though. Maybe, !dd ?
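
For readers unfamiliar with bangs, a minimal sketch of how a `!dd` or `!g` prefix could be turned into a redirect. The table and `resolve_bang` are hypothetical, and the devdocs URL pattern in particular is an assumption that should be checked against the site:

```python
from urllib.parse import quote

# Hypothetical bang table: prefix -> URL template for the target search page.
BANGS = {
    "!g": "https://www.google.com/search?q={}",
    "!dd": "https://devdocs.io/#q={}",  # assumed URL pattern
}


def resolve_bang(query):
    """Return a redirect URL if the query starts with a known bang, else None."""
    parts = query.strip().split(maxsplit=1)
    if len(parts) == 2 and parts[0] in BANGS:
        return BANGS[parts[0]].format(quote(parts[1], safe=""))
    return None


print(resolve_bang("!dd godot object.connect"))
print(resolve_bang("std::vector"))
```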

                                                                    1. 1

                                                                      Hmm, google yields similar results with site:devdocs.io godot object.connect. Maybe devdocs loads that content dynamically and somehow doesn’t let crawlers scrape the data… Quickref uses Bing’s index.

                                                                2. 1

                                                                  Neat idea, interested in seeing where this ends up.

                                                                  One big thing that’s missing is issue trackers. A lot of folk knowledge only resides in various issue trackers.

                                                                  1. 1

                                                                    Atm, “repos” filter includes github, gitlab, sr.ht and sourceforge (https://github.com/freetonik/quickref.dev/blob/master/sites/repositories.txt). Are there other issue trackers I should add?
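
A sketch of how a filter like “repos” could be applied to result URLs, assuming each category is simply a set of hostnames. The four hosts come from the comment above; the data structure and `in_category` are illustrative and not the format of the actual `repositories.txt` file:

```python
from urllib.parse import urlparse

# Category -> curated hostnames (hosts taken from the comment above).
CATEGORIES = {
    "repos": {"github.com", "gitlab.com", "sr.ht", "sourceforge.net"},
}


def in_category(url, category):
    """True if the URL's host is a curated site or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == site or host.endswith("." + site)
        for site in CATEGORIES.get(category, ())
    )


print(in_category("https://github.com/freetonik/quickref.dev", "repos"))
print(in_category("https://notgithub.com/x", "repos"))
```

Matching on the full host or a dot-prefixed suffix avoids treating look-alike domains as members of the list.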

                                                                    1. 2

                                                                      Oops, sorry. I guess the issue is then “it didn’t order search results in the way I expected”.

                                                                      I maintain a somewhat obscure logging library named eliot, but searches for eliot logging python pandas do not return the issue in the Eliot issue tracker covering Pandas support. It is plausibly too hard to get this right in a general search engine, so I’m not sure whether it’s a flaw or just a hard-to-fix edge case.

                                                                      Apache’s JIRA might be useful.

                                                                      1. 1

                                                                        Ah, seeing that it’s driven by Bing, my guess is Bing (and definitely Google) considers individual GitHub issues in less popular repos to be not worth indexing.