but why is it not being caught by the filter? I mean, I know the URL is not the same, but… am I supposed to track every single submission to lobsters before posting, just in case the URL is different but points to the same thing?
I’m guessing (the source is on Github, but I’m busy) that it’s a dumb filter, and that capitalization and paths matter. So / and /index would probably be unique. @jcs is better equipped to handle this sort of bug report.
I don’t know that you’re expected to track every submission… And the filter ought to work. Personally, I saw the post and thought “gee, I think I’ve seen this already,” hit the search and presto.
I wouldn’t call it dumb, but there’s only so much it can do in SQL. There are a ton of possibilities for a trailing “/” in a URL to match, like /index.php, /index.html, /index.shtml, /index.cgi, /index, etc.
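For illustration only (this is a sketch of that kind of variant expansion, not lobsters’ actual code), the trailing-slash case boils down to generating the equivalent index-document forms and checking each against stored URLs:

```python
def url_variants(url):
    """Expand a submitted URL into the forms that likely serve
    the same page, for an exact-match duplicate lookup."""
    if not url.endswith("/"):
        # Also try the trailing-slash form of a bare URL.
        return [url, url + "/"]
    # A trailing "/" could be any of the usual index documents.
    return [url] + [url + "index" + ext
                    for ext in (".php", ".html", ".shtml", ".cgi", "")]

print(url_variants("http://example.com/"))
```

Each variant then becomes one term in a SQL `IN (...)` clause, which is about as far as a plain URL-equality filter can go.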
Everybody likes random suggestions about how to do it better, right? It would be faster to build and save a canonical url for each story. strip protocol, www, lowercase, etc. and store that in the database as well. Later submissions compute the same and check for it.
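A rough sketch of that suggestion (hypothetical names, lowercasing everything per the suggestion above):

```python
from urllib.parse import urlsplit

def canonical_url(url):
    """Reduce a URL to a canonical form for duplicate matching:
    strip the scheme, a leading "www.", any index.* filename,
    and a trailing slash, then lowercase the result."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.lower()
    # Drop a trailing index document, e.g. /index.html or /index.php.
    last = path.rsplit("/", 1)[-1]
    if last.startswith("index."):
        path = path[: -len(last)]
    return host + path.rstrip("/")

print(canonical_url("https://www.Example.com/OpenCatalog/index.html"))
```

Store that alongside the submitted URL, compute the same for each new submission, and the dupe check becomes a single indexed equality lookup instead of a pile of variant comparisons.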
Wouldn’t have helped with these two urls, though.
The issue with stripping the protocol and www, and strtolower'ing the path (or removing index.html etc.), is that there is no guarantee that /OpenCatalog/ and /opencatalog/ are the same, or that https://example.com and http://example.com are serving the same content. In fact, RFC 3986 (Uniform Resource Identifier (URI): Generic Syntax) defines the path component as case-sensitive.
The “better” way would be to judge not the URL but the content. But then you’re storing the content (or a hash of it), which means fetching the content and thereby opening up lobste.rs' submission form for use as a proxy or ddos attack vector.
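To make the trade-off concrete, content-based dedup reduces to something like this (a sketch, not a proposal to actually ship it, given the fetch-as-proxy problem above):

```python
import hashlib

def content_fingerprint(body):
    """Hash a fetched page body; two submissions with the same
    fingerprint are presumed to point at the same content.
    Note the fetch itself is the attack surface, not the hash."""
    return hashlib.sha256(body).hexdigest()

print(content_fingerprint(b"<html>same post</html>"))
```

Comparing fingerprints is cheap; the cost is that the server must fetch every submitted URL, which is exactly the proxy/DDoS exposure mentioned above.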
Well, that’s what the code is already doing. I’m just pointing out a faster way to do it. And if somebody is silly enough to make /OpenCatalog/ and /opencatalog/ have different content, they don’t deserve to be linked from lobsters. :)
Looking at content would probably work less well. If somebody fixes a typo on their blog post, that doesn’t make it a new link.
Thanks for linking, and stay classy, @jcs
My problem is the “down vote” trigger of “I’ve seen this already, repost!”. I’m not complaining about lobsters the tech, but about the community. I understand that if something was posted two hours ago and somebody posts exactly the same thing, then yeah, down vote.
Anyway, “internet points”, who cares
I think that right now, lobsters will redirect you to the old post for thirty days if it detects a duplicate. The idea is to consolidate discussion to a single lobsters page. A few months ago, it was easier to do this because lobsters had very low throughput. Now we might get the same number of posts in two hours that we got in a month. Maybe we should consider lowering that time limit.
I think that downvoting on duplicates was originally for downvoting duplicate comments. It might make sense to change the “downvote on duplicate” instead to “flag on duplicate”. I think that consolidating the discussion is useful.
People often do care about internet points, so taking away internet points often feels like punishment. Making this kind of mistake doesn’t really make sense as something to punish.
Older discussion on downvoting duplicates here and here.
Yeah, I know about that algorithm and to me it makes sense. But in this case, where the URLs don’t match, I think it’s an honest mistake on the part of the person submitting the story.