This is not ready to be posted :(
You didn’t add anything to suggest it was a draft, and it’s ‘live’ on the internet, so you can’t blame them for posting it here…
Making your own search engine comes with a lot of challenges:
That is why I am working on one.
Cloudflare cannot block every IP, and people can spoof their user agent. My project is meant to help people host their own search engine.
Too costly at scale, but not necessarily for a personal search engine.
I thought about that a few months ago. I came to the conclusion that the only way was to do a hybrid SE: meta search + collaborative.
The meta part uses the API (or any privileged access) of some reference websites (Wikipedia, SO, official websites, …)
The collaborative part is a web browser plug-in that reads every page you visit, builds the inverted index, and sends it to the SE pipelines. The advantage is that you bypass any Cloudflare check or captcha because you are a real human. The human is the crawler.
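To make the indexing step concrete, here is a rough sketch of an inverted index built from visited pages. It is only an illustration: the function names (`tokenize`, `build_inverted_index`, `search`), the example URLs, and the simple lowercase word splitting are all assumptions, and a real plug-in would run in the browser rather than in Python.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each term to the set of URLs whose text contains it.

    `pages` is a hypothetical dict of {url: page_text}, standing in for
    whatever the plug-in would extract from visited pages.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term of the query (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Example pages are made up for illustration only.
pages = {
    "https://example.org/a": "Python socket hangs on recv",
    "https://example.org/b": "Python list comprehension tips",
}
index = build_inverted_index(pages)
print(search(index, "python socket"))
```

Only the term-to-URL mapping would need to leave the browser, not the page text itself, although that mapping still reveals which pages were visited.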
Problem to be solved: privacy. How to anonymize data that reveals your browsing history?
As for the PageRank algorithm, let users decide which pages are relevant by voting through the plug-in. The plug-in may ask: “Is this page relevant to your terms “Python”, “socket”, “hang”?”
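One possible way to turn those votes into a ranking is sketched below. This is not PageRank: it is a plain smoothed yes/no ratio per (term, page) pair, and the class name `VoteRanker`, its methods, and the example URLs are hypothetical.

```python
from collections import defaultdict


class VoteRanker:
    """Rank pages by user votes, using a Laplace-smoothed yes ratio per term."""

    def __init__(self):
        # (term, url) -> [yes_count, no_count]
        self.votes = defaultdict(lambda: [0, 0])

    def record_vote(self, terms, url, relevant):
        """Store one user's answer to 'is this page relevant to these terms?'."""
        for term in terms:
            counts = self.votes[(term.lower(), url)]
            counts[0 if relevant else 1] += 1

    def score(self, terms, url):
        """Average smoothed yes ratio over the query terms (0.5 means no data)."""
        if not terms:
            return 0.0
        total = 0.0
        for term in terms:
            yes, no = self.votes.get((term.lower(), url), (0, 0))
            total += (yes + 1) / (yes + no + 2)
        return total / len(terms)

    def rank(self, terms, urls):
        """Return the URLs sorted from most to least relevant."""
        return sorted(urls, key=lambda u: self.score(terms, u), reverse=True)


# Made-up votes for illustration only.
ranker = VoteRanker()
query = ["Python", "socket", "hang"]
ranker.record_vote(query, "https://example.org/a", relevant=True)
ranker.record_vote(query, "https://example.org/a", relevant=True)
ranker.record_vote(query, "https://example.org/b", relevant=False)
print(ranker.rank(query, ["https://example.org/b", "https://example.org/a"]))
```

The smoothing keeps a page with a single vote from jumping straight to the top or bottom, which matters when few people have voted on a given query.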
I have no idea what the result would be. However, I’m sure it’d be pretty fun to run that.
Look into https://aurelieherbelot.net/pears/