1. 1

    Cierge utilises reCAPTCHA to ensure magic codes (which expire quickly) are not brute-forceable.

    Is there server side account-based throttling or locking? Relying exclusively on reCAPTCHA means that anyone bypassing it will be able to easily brute-force the small magic code.

    1. 1

      Bypassing reCAPTCHA doesn’t sound easy. Your first link mentions, at the end of the post, that it doesn’t work anymore. And the second link is based on humans solving reCAPTCHA with an average response time of 10s, which is way too long to brute force. Am I missing something? Anyway, with or without reCAPTCHA, throttling is a must-have.

      1. 1

        My point with the first two links is to show that every once in a while, someone finds a way to bypass reCAPTCHA. reCAPTCHA is not provably secure; it’s just security through a bunch of heuristics. The last link shows how, for a few bucks, you can solve many thousands of reCAPTCHAs. 10s might look slow, but you can do them concurrently, and if there’s no throttling, the attacker has as many tries as he wants, so he’s bound to win sooner or later.
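
        To make it concrete, here’s a minimal sketch of the kind of server-side, account-based throttling meant here (the limits and the in-memory store are invented for illustration, not anything Cierge actually does):

        import time

        # Hypothetical per-account throttle for magic-code attempts.
        MAX_ATTEMPTS = 5
        WINDOW_SECONDS = 15 * 60

        _attempts = {}  # account id -> (attempt count, window start timestamp)

        def attempt_allowed(account_id):
            """Record one guess and report whether it was allowed."""
            now = time.time()
            count, start = _attempts.get(account_id, (0, now))
            if now - start > WINDOW_SECONDS:
                count, start = 0, now  # window expired: reset the counter
            if count >= MAX_ATTEMPTS:
                return False  # locked out: issue a new code or escalate
            _attempts[account_id] = (count + 1, start)
            return True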

    1. 3

      AMD claims “zero vulnerability due to AMD architecture differences”, but without any explanation. Could someone enlighten us about this?

      1. 10

        AMD’s inability to generate positive PR from this is really an incredible accomplishment for their fabled PR department.

        1. 7

          The Spectre PoC linked elsewhere in this thread works perfectly on my Ryzen 5. From my reading, it sounds like AMD processors aren’t susceptible to userspace reading kernelspace because the cache is in some sense protection-level-aware, but the speculative-execution, cache-timing one-two punch still works.

          1. 4

            From reading the Google paper on this, it’s not quite true but not quite false. According to Google, AMD and ARM are vulnerable to a specific limited form of Spectre. They’re not susceptible to Meltdown. The Google Spectre PoCs for AMD and ARM aren’t successful in accessing beyond the user’s memory space so it’s thought that while the problem exists in some form it doesn’t lead to compromise as far as we currently know.

            1. 2

              aren’t successful in accessing beyond the user’s memory space so … it doesn’t lead to compromise as far as we currently know.

              Well, no compromise in the sense of breaking virtualization boundaries or OS-level protection boundaries, but still pretty worrying for compromising sandboxes that are entirely in one user’s memory space, like those in browsers.

            2. 4

              I just found this in a Linux kernel commit:

              AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.

              1. 4

                Which is a much stronger statement than in the AMD web PR story. Given that it is AMD, I would not be surprised if their design does not have the problem but their PR is unable to make that clear.

              2. 2

                AMD is not vulnerable to Meltdown, an Intel-specific attack.

                AMD (and ARM, and essentially anything with a speculative execution engine on the planet) is vulnerable to Spectre.

              1. 4

                The provided link leads to a 404. Maybe replace with https://github.com/spencertipping/jit-tutorial ?

                1. 2

                  Yep, thanks!

                  1. 1

                    Thanks!

                  1. 3

                    This gets close to a thing I enjoy in Swift:

                    struct Vec2 {
                        var x: Float
                        var y: Float
                    }
                    
                    var pos = Vec2(x: 42, y: -42)
                    

                    Free memberwise initializer for structs. You can also provide default values by appending = val to the members. The @dataclass proposal looks closer still.

                    You don’t get free comparison operators (but it’s getting there) so this is nicer in that regard.

                    1. 3

                      Yup I think this is basically “value types” for Python! I wanted this a long time ago!

                      Namedtuple was close but the syntax is a bit awkward.
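
                        For comparison, a minimal sketch of both spellings (the dataclass decorator did end up in the standard library in Python 3.7):

                        from collections import namedtuple
                        from dataclasses import dataclass

                        # The namedtuple spelling, which works but reads awkwardly:
                        Vec2Tuple = namedtuple('Vec2Tuple', ['x', 'y'])

                        # The dataclass spelling, with default values thrown in:
                        @dataclass
                        class Vec2:
                            x: float = 0.0
                            y: float = 0.0

                        pos = Vec2(x=42, y=-42)
                        print(pos)                   # Vec2(x=42, y=-42)
                        print(pos == Vec2(42, -42))  # True: __eq__ comes for free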

                      1. 3

                        In C99 you can do this:

                        typedef struct {
                                float x;
                                float y;
                        } Vec2;
                        
                        Vec2 pos_foo = {42, -42};
                        
                        // or using designated initializers (x will be 0 in this case):
                        Vec2 pos_bar = {.y = 42 };
                        
                        // or using compound literals:
                        draw(foo, (Vec2){42, -42});
                        
                        // more with designated initializers (everything else will be 0):
                        enum { FOO, BAR, BAZ, COUNT };
                        Vec2 pos[COUNT] = { [BAR] = {.y = 42} };
                        
                        1. 2

                          It’s funny that statically typed languages like Swift and Go cover this better than a dynamically typed language like Python ;-)

                          1. 1

                            Not sure what this has to do with dynamic vs static typing.

                        1. 1

                          Thanks!!!

                          1. 12

                            I agree that the concentration of data at Slack makes them a very valuable target. But I’m wondering if self-hosting is really safer than using Slack:

                            • If everyone switches from Slack to Mattermost, this would make Mattermost a valuable target too, and then hackers will target Mattermost instances like they target Wordpress instances.
                            • System administration is hard. I would guess that Slack does a better job than most in-house teams at securing their system.
                            • For better or worse, most organizations don’t host on their own hardware anymore. They rent virtual machines (AWS, Google Cloud, Digital Ocean, etc.) or physical machines (OVH, Hetzner, etc.). This makes the hosting provider an extremely valuable target too.

                            What is your opinion on this?

                            1. 11

                                I agree, but there are many reasons beyond those to run your own service, such as policy reasons.

                                My main complaint about Slack for FOSS projects, for example, is that Slack is policy-wise built around being a corporate chat (which has implications for privacy policy, etc.).

                              1. 10

                                I very much agree about Slack not being a good fit for FOSS projects.

                                1. 1

                                  Is there a service that works much like Slack but with a default-public design intent? Is Gitter the go-to for this kind of thing?

                                  1. 8

                                    Gitter has become better in that regard (especially with moderation tooling).

                                    Discord, Matrix and (almost) Zulip are options, but all with different drawbacks. Zulip has the drawback that moderation features are currently not in the hosted offering. Discord seems to lead the pack when it comes to moderation. I’m obviously not a full-time user of all of them.

                                    As much as I dislike IRC, IRC as practiced has a very good model for FOSS: many channels, optional logging and clients geared towards being “AFK by default”.

                                      Sadly, there’s almost no chat software built around the needs of open communities.

                                    1. 2

                                      We’ve used Gitter and Slack for various OSS projects (both ours and others). Gitter’s great because it’s so easy for people to join. However, it doesn’t scale well as more people join a channel, because the search is really bad, no threading, scrolling through history is really cumbersome, etc. Also, the mobile app is terrible at notifications.

                                      Slack definitely feels like a closed ecosystem. The workflow of getting an invite, then signing into the Slack client, etc. adds a lot more friction to the process. Plus, switching between Slack chats on the Mac client is SLOW.

                                      1. 1

                                        the search is really bad, no threading, scrolling through history is really cumbersome, etc.

                                        It feels to me you’re looking for email, not a chat.

                                        1. 2

                                          That’s a simple statement to make, but the difference between email and chat is none of these properties.

                                          1. 3

                                            It’s time for forums to make a comeback.

                                            1. 3

                                                They have? Everyone and their dog has a Discourse now.

                                            2. 1

                                              I agree, but how would you characterize the difference between email and chat?

                                              1. 1

                                                Temporal and conversational characteristics. Chat is built around real-time exchange of short messages, mail is built around slower discussion with larger, self-contained messages.

                                            3. 1

                                              Not really. We want real-time conversation with people. For example, someone may download Telepresence (one of our OSS projects) and struggle to get it working. Telepresence under the hood does a bunch of networking stuff: deploys a proxy in Kubernetes, sets up a VPN tunnel via kubectl port-forward, etc. So being able to talk to someone with problems in real-time (versus filing a GitHub issue, or email) is extremely helpful to accelerate the debug process.

                                              And then, it would be nice to search through and say oh yes, so-and-so had an issue with Mac OS X and kube 1.7 that was like this … but you can’t really.

                                          2. 1

                                            YES this! I don’t hear this being talked about nearly enough, but I seriously dislike the whole model where individual communities are shuffled off to their own ‘teams’ or ‘rooms’ or whatever. IRC’s channels offer an invitation to collaboration and discovery - NONE of these services offer that, and I don’t understand why so many people are willing to throw the baby out with the bath water like this.

                                      2. 3

                                        That addresses only the question of external attackers. But can you trust Slack to keep your data private? I think many people are also worried about this.

                                        1. 4

                                          I also feel like we should definitely model “attackers” and “state actors” differently. Within the laws of the state Slack is in, the state can just walk in and ask and get things. No amount of anti-intrusion measures can counter that.

                                          (In Germany, that’s the same, but at least everything happens in the territory I have my lawyers in :) )

                                          1. 2

                                            I agree. It’s a very legitimate concern. For example, if I was working for a defense company, I would definitely not use Slack.

                                            1. 1

                                              We treat Slack internally as open to the internet (or assumed to be). No passwords in Slack, no secrets of any kind. If the contents of our conversations leaked, it’d maybe be bad, but you have to assume someone will be reading them at some point regardless, for potential compliance reasons.

                                            2. 3

                                            Sure, but you have to know a lot more things. WP is easy to spam because it’s pretty easy to find. Even if 60% of companies started using Mattermost, you’d have to find their instance (is it chat.example.com or is it mattermost.example.com or… ?). Plus, it’s not that difficult to fend off wide script-kiddie attacks like this: IP rate limiting, regular patches, etc. That’s not very difficult to do with WP or Mattermost. The upside is that Mattermost, being a Go application, has a much smaller attack surface; plus, they take security fairly seriously and release often. WP started with a negative security outlook, and PHP only added to the problem. PHP is like the opposite of safe and sane (it’s getting better, but still).

                                            System administration is hard, but if you aren’t investing in good people, then chat data is probably the least of your worries; see Equifax, etc.

                                              I’m quite sure Slack also rents virtual instances, so it’s the same issue.

                                              1. 2

                                              I agree that if a company already self-hosts some services, then it makes sense to also self-host a service like Mattermost, which is easy to install and administer.

                                                Good point about Slack also renting virtual instances.

                                              2. 2

                                              I know you are not addressing me, but: when you host yourself, you can (easily) add additional layers of security, for example by putting the services behind a VPN.

                                                1. 6

                                                  True, but I don’t believe that perimeter security is very useful with the open networks we have nowadays (see BeyondCorp).

                                              1. 14

                                                Sigh. Here we go.

                                                Why is it so slow?

                                                Not because of MySQL. Actually, MySQL is doing the right thing here.

                                                It’s because the term “commercial” used in my query is common (~300,000 documents contain the term)

                                                Yes, sort of.

                                                and because MySQL is not good at merging indexes

                                                Not relevant.

                                                Basically, MySQL uses the full-text index to lookup for the term “commercial”, and then it does a kind of nested join to lookup for the 300,000 rows and check the condition on id.

                                                Mostly true.

                                                The former is very quick,

                                                Nope.

                                                but the latter is extremely slow.

                                                No, that’s the fast part.

                                                It would be really great for MySQL to be able to combine multiple index using an in memory bitmap.

                                                For full text search? That would be fucking insane. And I do mean insanely dumb, not insanely cool.

                                                Since InnoDB wasn’t written in an intro to algorithms course by a college sophomore, it doesn’t just use a basic inverted index of words -> document ids. That’s a blatantly awful strategy. Sure, looking up the document list for a single word is O(1), but then you still have a list of n documents to deal with. English writing is mostly the same 10,000 words, and each document typically has more than one word, so these per-word document lists are going to be huge. Even with a small dataset of 2 million documents, a word like “commercial” has 300,000 documents! For people with significant data volume, that type of index would be completely worthless.
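
                                                 To spell out the strategy being dismissed, here’s a toy sketch (data invented):

                                                 # Naive inverted index: word -> list of document ids. Lookup is O(1),
                                                 # but for a common word you still materialize an enormous posting list.
                                                 index = {
                                                     "commercial": list(range(300_000)),  # one common word, 300k docs
                                                     "xylophone": [17, 42],
                                                 }

                                                 def search(word):
                                                     # For "commercial" this hands back 300,000 ids that still
                                                     # have to be joined, filtered and sorted somehow.
                                                     return index.get(word, [])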

                                                Bitmap joins are cute but they can’t change that you’ll be joining on 1-5% of all your documents for every single search. 15% in your example.

                                                Why would you even want that sort of join to begin with? When your database says “ur query matches these 5 million docz haha LOL here u go” you do not LOL, you weep with your head in your hands, wondering why anyone let the JS hipsters touch a problem like full text search. Because a huge unsorted pile of 5 million documents isn’t quite the level of specificity your user had in mind, and they will not LOL either.

                                                A good full text search engine is optimized to return the top N most relevant results for a search.

                                                You took a full text search engine that cleverly, carefully, iteratively produces the top N results rather than materializing ~300,000 rows for teh lolz, and forced it to materialize all ~300,000 anyway. And then you wrote a bitchy blog post about how MySQL full text search is slow, when really you just have no idea what you’re doing. Maybe you would have saved some of your “precious hours” if you had read the fucking docs.

                                                Optimizations are applied to certain kinds of FULLTEXT queries against single InnoDB tables. Queries with these characteristics are particularly efficient:

                                                • FULLTEXT queries that sort the matching rows in descending order of score and apply a LIMIT clause to take the top N matching rows. For this optimization to apply, there must be no WHERE clauses and only a single ORDER BY clause in descending order.

                                                It’s almost as if it was written just for you.

                                                And there’s a good reason too. Exactly how the fuck would MySQL optimize that query? If you go through the normal filters first, you can’t use the full text index. If you go through the full text index first, the most relevant documents might not match the filters for a while. Or ever, in this contrived example. MySQL prefers the full text index. Why? Because if you wrote a full text match in your where clause, you probably want to use the fucking full text index.

                                                And if you don’t want the full text index because you know your filters are highly selective, you do this:

                                                select id, match(content) against ('commercial' in boolean mode) as rank
                                                from document 
                                                where id > 10000000
                                                and rank > 0
                                                order by rank desc
                                                limit 50
                                                

                                                This applies the normal filters first, then orders by relevance within that subset of documents. A perfectly legitimate thing to do. It’s a waste to go through the more complex full text index looking for just a few documents. Likewise, it would be downright stupid to join a few documents with ~300,000 document ids when you can just match on the fly. Should MySQL try to apply this optimization using table statistics? Maybe. Since statistics can be incomplete or out of date, and that would fuck you particularly hard in this circumstance, I personally say no.

                                                 If your filters are broad, no problem. If you’re smart you’ll still protect yourself from runaway queries by limiting the full text search first in a sub select, and logging anything that hits it. The tricky part is when your filters are categorizing your data into big groups that aren’t perfectly uniform. For example, a comment system provider that supports multiple sites. Filtering where site = 'database-love.org' and match(content) against ('sql' in boolean mode) won’t cause any problems, but when someone searches for sql on cupcake-nation.net, you’re gonna trudge through a ton of stuff before you hit even a single document you want.

                                                A compound key (site_id, content) would be nice, but you can’t make compound keys that cover normal and fulltext indexes in MySQL. Instead you’d need to partition the indexes out to multiple tables, which honestly isn’t a terrible idea to begin with. Full text indexes need maintenance that regular tables don’t, so having them detached can save your ass. Anyway, something like this:

                                                create table group_within_things_fts_index (
                                                    id bigint unsigned primary key,
                                                    content longtext not null,
                                                    fulltext key (content),
                                                    foreign key (id) references things(id) on delete cascade
                                                );
                                                

                                                Naming the primary key FTS_DOC_ID causes InnoDB to use that as the internal document key, saving a little space.

                                                If you don’t have exclusive groups like sites or date ranges, or small groups you can match without the full text index, it kinda sucks. And not only in MySQL. I don’t know of a compact way to combine arbitrary general indexes with full text indexes, while still ordering the index by some notion of text relevancy. I’ve only seen a few systems that handle non-exclusive tagging well, and they’ve all been full custom proprietary stacks operated by big teams.

                                                In conclusion, full text search is hard, don’t be a whiner, especially when you’re wrong.

                                                1. 6

                                                  Since InnoDB wasn’t written in an intro to algorithms course by a college sophomore, it doesn’t just use a basic inverted index of words -> document ids.

                                                   And what do you think InnoDB uses? If you check the source code, you’ll discover that InnoDB full-text search uses an inverted index, like almost every other FTS engine (Lucene, Xapian and PostgreSQL FTS for example). Implementation details may differ, but the general principle remains the same.

                                                   The FTS index is stored in a hidden InnoDB table with the following schema (source):

                                                  CREATE TABLE $FTS_PREFIX_INDEX_[1-6] (
                                                  	word         VARCHAR(FTS_MAX_WORD_LEN),
                                                  	first_doc_id INT NOT NULL,
                                                  	last_doc_id  UNSIGNED NOT NULL,
                                                  	doc_count    UNSIGNED INT NOT NULL,
                                                  	ilist        VARBINARY NOT NULL,
                                                  	UNIQUE CLUSTERED INDEX ON (word, first_doc_id))
                                                  

                                                  The column ilist contains the list of document IDs and word positions where the word appears (source). The list is sorted.

                                                  Bitmap joins are cute but they can’t change that you’ll be joining on 1-5% of all your documents for every single search. 15% in your example.

                                                  Databases that support bitmap joins do that all the time without any issue.

                                                  A good full text search engine is optimized to return the top N most relevant results for a search.

                                                  You are implying that full-text search results should be sorted by relevance, which is a narrow use case. It is quite common to sort search results by date or by price, for example, and most FTS engines support this well.

                                                  You took a full text search engine that cleverly, carefully, iteratively produces the top N results rather than materializing ~300,000 rows for teh lolz, and forced it to materialize all ~300,000 anyway.

                                                  I’m not forcing MySQL to materialize the 300,000 rows. Quite the contrary. My query even includes a limit.

                                                  Maybe you would have saved some of your “precious hours” if you had read the fucking docs.

                                                  Optimizations are applied to certain kinds of FULLTEXT queries against single InnoDB tables. Queries with these characteristics are particularly efficient: […] FULLTEXT queries that sort the matching rows in descending order of score and apply a LIMIT clause to take the top N matching rows. For this optimization to apply, there must be no WHERE clauses and only a single ORDER BY clause in descending order.

                                                   You’d be surprised, but I have read the “fucking docs”, including the paragraph you quoted. It says some kinds of queries are “particularly efficient”. It doesn’t say other queries are extremely slow and should be avoided.

                                                  Since we are quoting the docs, and you are cherry-picking the parts that support your point, I’d like to point out the following paragraph that explicitly acknowledges that being unable to use the Index Merge optimization with full-text indexes is a known deficiency of MySQL (source): “The Index Merge optimization algorithm has the following known deficiencies: […] Index Merge is not applicable to full-text indexes”.

                                                  Exactly how the fuck would MySQL optimize that query? If you go through the normal filters first, you can’t use the full text index. If you go through the full text index first, the most relevant documents might not match the filters for a while.

                                                  Your comment is funny because you’re almost describing the raison d’être of in-memory bitmaps, according to Wikipedia: “A bitmap index scan combines expressions on different indexes, thus requiring only one index per column to support all possible queries on a table”.

                                                  MySQL could optimize the query by combining multiple indexes using an in memory bitmap, as I wrote in the article. The technique is explained in PostgreSQL documentation. Even MongoDB is considering it.
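
                                                   For readers who haven’t seen the technique, a toy sketch of the idea (document ids invented for illustration):

                                                   # Each index lookup yields a bitmap with one bit per document; bitmaps
                                                   # from different indexes are combined with a bitwise AND.
                                                   def to_bitmap(doc_ids):
                                                       bitmap = 0
                                                       for doc_id in doc_ids:
                                                           bitmap |= 1 << doc_id
                                                       return bitmap

                                                   fts_hits = to_bitmap([2, 5, 7, 11])  # docs matching the full-text term
                                                   id_filter = to_bitmap(range(6, 16))  # docs satisfying id > threshold
                                                   combined = fts_hits & id_filter      # one cheap AND instead of a join

                                                   print([d for d in range(16) if combined >> d & 1])  # [7, 11]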

                                                  select id, match(content) against ('commercial' in boolean mode) as rank
                                                  from document 
                                                  where id > 10000000
                                                  and rank > 0
                                                  order by rank desc
                                                  limit 50
                                                  

                                                  Your query doesn’t work (Unknown column 'rank' in 'where clause'). You can’t use in the WHERE clause an alias declared in the SELECT clause.

                                                  I can replace rank by the MATCH .. AGAINST condition:

                                                  select id, match(content) against ('commercial' in boolean mode) as rank
                                                  from document 
                                                  where id > 10000000
                                                  and match(content) against ('commercial' in boolean mode) > 0
                                                  order by rank desc
                                                  limit 50
                                                  

                                                  But then I’m back to my original issue: the query is extremely slow because I combine a “full-text” condition with a “normal” condition.

                                                  I don’t know of a compact way to combine arbitrary general indexes with full text indexes, while still ordering the index by some notion of text relevancy.

                                                   In most FTS engines, document selection and document sorting are mostly independent. The posting lists of each term are not sorted by relevance; they are sorted by document IDs. First, the documents are filtered according to the terms in the query, then the result is sorted by relevance, or by something else like the date or the price.
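
                                                   As a toy illustration of that filter-then-sort pipeline (posting lists and prices invented):

                                                   # Posting lists are kept sorted by document id, so they can be
                                                   # merge-intersected in linear time; only the survivors get sorted.
                                                   postings = {
                                                       "commercial": [1, 4, 7, 9, 12],
                                                       "lease": [4, 9, 12, 15],
                                                   }
                                                   price = {1: 30, 4: 10, 7: 25, 9: 5, 12: 40, 15: 8}

                                                   def intersect(a, b):
                                                       out, i, j = [], 0, 0
                                                       while i < len(a) and j < len(b):
                                                           if a[i] == b[j]:
                                                               out.append(a[i]); i += 1; j += 1
                                                           elif a[i] < b[j]:
                                                               i += 1
                                                           else:
                                                               j += 1
                                                       return out

                                                   hits = intersect(postings["commercial"], postings["lease"])  # [4, 9, 12]
                                                   print(sorted(hits, key=price.get))  # sorted by price: [9, 4, 12]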

                                                  In conclusion, full text search is hard, don’t be a whiner, especially when you’re wrong.

                                                  FTS is hard, I agree, and as you can see, I tried to do my homework. For the record, I’m not the only one whining about MySQL FTS.

                                                  One last thing: calling people “JS hipsters” or “whiners” and using the word “fuck” five times doesn’t help; technical arguments do.

                                                  1. 1

                                                    Apologies for any typos, it’s pretty late here and I really ought to go to bed.

                                                     And what do you think InnoDB uses? If you check the source code, you’ll discover that InnoDB full-text search uses an inverted index, like almost every other FTS engine (Lucene, Xapian and PostgreSQL FTS for example). Implementation details may differ, but the general principle remains the same.

                                                    Details? Details? The difference between having an index queryable in O(n) time for every operation, and an index actually usable at scale. That’s like saying the implementation details between a CSV file and a B-tree differ, because they’re both row stores, the general principle remains the same.

                                                    Bitmap joins are cute but they can’t change that you’ll be joining on 1-5% of all your documents for every single search. 15% in your example.

                                                    Databases that support bitmap joins do that all the time without any issue.

                                                    Yes, but not for large scale document search. You highlight postgres, so what does it do? I’m ready to be impressed, I’ve been told that this… this is the good stuff!

                                                    Index Scan using documents_pkey on documents
                                                      Index Cond: (id > 1000000)
                                                      Filter: (tsv_precomputed @@ '''commercial'''::tsquery)
                                                    

                                                    Hmmm, that’s weird… oh! Silly me, I forgot to run analyze!

                                                    Seq Scan on documents
                                                      Filter: ((id > 1000000) AND (tsv_precomputed @@ '''commercial'''::tsquery))
                                                    

                                                    Um. Okay, that wasn’t so good.

                                                    With set enable_seqscan=false; postgres uses the index to search the whole table. Now querying for 15% of the database is decently quick, partly because the data scale is small. And then I order by rank and we’re back to miserably slow. Darn.

                                                    A good full text search engine is optimized to return the top N most relevant results for a search.

                                                    You are implying that full-text search results should be sorted by relevance, which is a narrow use case. It is quite common to sort search results by date or by price, for example, and most FTS engines support this well.

                                                    Do they? Well postgres was a bust, but people mostly agree its FTS isn’t that great. I personally think it’s alright if you know how to wrangle FTS, e.g. the ability to embed precomputed text search vectors in the table (like I did above in the tsv_precomputed column) speeds up queries where you can’t rely on the GIN index. But the whole issue where you can’t quickly order by relevance, or anything else, is a legitimate concern. Anyway, onwards to a real FTS engine, Elasticsearch!

                                                    Sorting on a full-text analyzed field can use a lot of memory.

                                                    Yikes. Well I sure hope that this wasn’t what you had in mind, cause it sounds like it’s materializing all the doc ids in memory to sort.

                                                    Hey now though, that was dishonest of me. I know full well we don’t actually want to sort by price or date, we want the top N for a price or date. ES supports this with the top_hits aggregation. Ah, but that’s a metrics aggregation, which doesn’t contribute to index selection. Still, MySQL doesn’t have a built in top N aggregation, so 1 point to ES for… making it easier to do a linear scan of 15% of your documents, I guess. At least ES supports multi-node setups so you can spread your hideously slow algorithm across a ton of CPUs, thus approximating the sensation of understanding algorithmic complexity.

                                                    I’m not forcing MySQL to materialize the 300,000 rows. Quite the contrary. My query even includes a limit.

                                                    Alright, do explain the execution plan you would choose for your query. Ah right, a bitmap join. Which will pull all 300,000 doc ids into a bitmap, perform the join on id relatively quickly, and then return 50 rows in no particular order whatsoever. Very useful. And if you then wanted to order by price or date you sort all the rows, because a bitmap join by nature destroys index ordering.

                                                    [The docs say] some kinds of queries are “particularly efficient”. It doesn’t say other queries are extremely slow and should be avoided.

                                                    Well I think you could have inferred that, since you ran a query violating the performance guidelines and it was extremely slow. However I will grant that the MySQL FTS docs suck. As is often the case with complex and nuanced technologies, pretty much all FTS docs suck. The MySQL ones are particularly bad since they outright lie to you about some things, but that’s just because some of the docs are legacy and haven’t been updated. No one pays enough attention to docs. Nevertheless, I think it’s pretty easy to deduce what’s actually going on if you know a thing or two about databases and full text search.

                                                    Since we are quoting the docs, and you are cherry-picking the parts that support your point, I’d like to point out the following paragraph that explicitly acknowledges that being unable to use the Index Merge optimization with full-text indexes is a known deficiency of MySQL (source): “The Index Merge optimization algorithm has the following known deficiencies: […] Index Merge is not applicable to full-text indexes”.

                                                    True, that does suck. However since your fundamental problem is about excessively large full text search result sets, this is irrelevant.

                                                    Exactly how the fuck would MySQL optimize that query? If you go through the normal filters first, you can’t use the full text index. If you go through the full text index first, the most relevant documents might not match the filters for a while.

                                                    Your comment is funny because you’re almost describing the raison d’être of in-memory bitmaps, according to Wikipedia: “A bitmap index scan combines expressions on different indexes, thus requiring only one index per column to support all possible queries on a table”.

                                                    MySQL could optimize the query by combining multiple indexes using an in memory bitmap, as I wrote in the article. The technique is explained in PostgreSQL documentation. Even MongoDB is considering it.

                                                    When I wrote this paragraph I assumed that we were on the same page that a scan of 15% of your doc ids was not a winning move, and constant factor optimizations won’t solve that algorithmic complexity problem. I saw from your response above, and now from your response here, that I was mistaken. But I do appreciate that you at least skimmed the wikipedia article for bitmap indexes before writing your blog post.

                                                    A bitmap index scan does well when there isn’t any other strategy except suck it up and accept that you’re going to query a significant fraction of the table. For full text search this works great if your data volume is small, and querying significant fractions of your documents table for every user search is acceptable. You mentioned sorting by price, well that’s a great use case for not giving a shit about algorithmic complexity, because most eCommerce sites have a completely negligible SKU cardinality.

                                                    Your query doesn’t work (Unknown column ‘rank’ in ‘where clause’). You can’t use in the WHERE clause an alias declared in the SELECT clause.

                                                    I can replace rank by the MATCH .. AGAINST condition:

                                                    select […]

                                                    But then I’m back to my original issue: the query is extremely slow because I combine a “full-text” condition with a “normal” condition.

                                                    Ah my mistake. I admit I wrote the query in the lobste.rs comment field and didn’t run it. You COULD put the match in the where clause, but you apparently know already that won’t work. I’m mystified how someone who did their homework doesn’t know about subselects:

                                                    select id
                                                    from (
                                                      select id, match(content) against ('commercial' in boolean mode) as rank
                                                      from document 
                                                      where id > 10000000
                                                      order by rank desc
                                                      limit 50
                                                    ) top_ranked
                                                    where rank > 0;
                                                    

                                                    I don’t know of a compact way to combine arbitrary general indexes with full text indexes, while still ordering the index by some notion of text relevancy.

                                                     In most FTS engines, document selection and document sorting are mostly independent. The posting lists of each term are not sorted by relevance; they are sorted by document IDs. First, the documents are filtered according to the terms in the query, then the result is sorted by relevance, or by something else like the date or the price.

                                                     I don’t recall saying I don’t know how rudimentary FTS indexes work; I said I don’t know of a compact way to combine arbitrary general indexes with full text indexes, while still ordering the index by some notion of relevancy. Which you absolutely want to do, because separating document selection and document sorting only works at tiny scale. Making a specialized inverted index for an external key like price is trivial. Making a fast full text index optimized for relevancy is a lot harder, which is why all these full text search engines come with order by relevancy.

                                                    You have a go at sorting your 300,000 documents matching “commercial” post-selection and let me know how that works out for you.

                                                    I tried to do my homework.

                                                    You did, and I’m proud of you. But you didn’t do enough to declare MySQL full text search is a waste of time.

                                                    For the record, I’m not the only one whining about MySQL FTS.

                                                    True. But you could at least bother to do it well. Otherwise, I’m calling your bullshit.

                                                1. 4

                                                  It would have been useful to see the EXPLAIN statement and attempt to rewrite the query to see if it helps with the query planner.

                                                  1. 3

                                                    I agree.. but I can’t shake the feeling that the output would be truncated somehow..

                                                    1. 1

                                                      Exactly. This is why I haven’t included EXPLAIN output in the article. Here is the output: https://lobste.rs/s/p12ocv/don_t_waste_your_time_with_mysql_full_text#c_tyclwd

                                                    2. 1

                                                      Checking the query execution plan is of course one of the first things I did:

                                                      mysql> EXPLAIN SELECT id
                                                          -> FROM document
                                                          -> WHERE match(content) AGAINST ('commercial' IN BOOLEAN MODE)
                                                          -> LIMIT 50;
                                                      +----+-------------+----------+----------+---------------+---------+---------+------+------+-------------+
                                                      | id | select_type | table    | type     | possible_keys | key     | key_len | ref  | rows | Extra       |
                                                      +----+-------------+----------+----------+---------------+---------+---------+------+------+-------------+
                                                      |  1 | SIMPLE      | document | fulltext | content       | content | 0       | NULL |    1 | Using where |
                                                      +----+-------------+----------+----------+---------------+---------+---------+------+------+-------------+
                                                      1 row in set (0.01 sec)
                                                      
                                                      mysql> EXPLAIN SELECT id
                                                          -> FROM document
                                                          -> WHERE match(content) AGAINST ('commercial' IN BOOLEAN MODE)
                                                          -> AND id > 10000000
                                                          -> LIMIT 50;
                                                      +----+-------------+----------+----------+-----------------+---------+---------+------+------+-------------+
                                                      | id | select_type | table    | type     | possible_keys   | key     | key_len | ref  | rows | Extra       |
                                                      +----+-------------+----------+----------+-----------------+---------+---------+------+------+-------------+
                                                      |  1 | SIMPLE      | document | fulltext | PRIMARY,content | content | 0       | NULL |    1 | Using where |
                                                      +----+-------------+----------+----------+-----------------+---------+---------+------+------+-------------+
                                                      1 row in set (0.03 sec)
                                                      

                                                      I didn’t share it in the article because, as you can see, the query is so simple (no joins) that there is nothing relevant in EXPLAIN output.

                                                      As a side note, I observe that EXPLAIN output is a lot more detailed and useful in PostgreSQL than MySQL.

                                                    1. 19

                                                      My favorite example of your point from when I first discovered LISP was CLOS. The typical way to get C programmers to have OOP was to tell them to use horribly-complex C++ or switch to Java/C#. The LISP people just added a library of macros. If they don’t like OOP, don’t use the library. Done. People thought Aspect-Oriented Programming would be cool. Started writing pre-compilers for Java, etc. LISP folks did a library. I like your emphasis on how easy it is to undo such things if it’s just a library versus a language feature. A lot of folks never knew Aspect LISP happened because their language isn’t stuck with it. ;)

                                                      1. 14

                                                        A lot of folks never knew Aspect LISP happened because their language isn’t stuck with it.

                                                        Now that’s what I call a selling point.

                                                        1. 7

                                                              I sometimes get the feeling that some programming communities are very against the sentiment of extending the language in userspace, for reasons of consistency or too much power. I find such sentiments vaguely authoritarian and off-putting, and don’t buy them at all. Users almost always end up extending the language somehow, whether via giant frameworks, metaprogramming, or code generation.

                                                          1. 6

                                                            A difference should be made between extending the vocabulary and extending the grammar:

                                                            • All programmers are okay with extending the vocabulary.
                                                            • But some programmers are reluctant to extend the grammar, not for authoritarian reasons, but because it hinders readability and maintainability.

                                                            Using a framework is not extending the language; it’s extending the vocabulary. Using code generation is not extending the language; it’s translating some source language to some target language.

                                                                In a natural language like English, you sometimes extend the vocabulary, but you rarely extend the grammar, and this is how we can understand each other. When you read an English sentence that contains an unknown word, you can still parse the sentence because you know the grammar. But if you read an English sentence that uses some kind of “syntactic macro for English”, then it will be very hard to understand what’s going on without learning the macro in the first place.

                                                            1. 2

                                                                  Users almost always end up extending the language somehow, whether via giant frameworks, metaprogramming, or code generation.

                                                              They might, but that isn’t necessarily a good thing. When I am implementing algorithms (admittedly, “algorithms” aren’t the same thing as “programs”), I find both large and extensible languages to be a distraction: The essence of most algorithms can be expressed using basic data types (integers, sums, products, rarely first-class functions) and control flow constructs (selection, repetition and procedure calls; where by “selection” I mean “pattern matching”, of course). Perhaps the feeling that you need fancy language features (whether built into the language or implemented using metaprogramming) is just a symptom of accidental complexity in either the problem you are solving, or the language you are using, or both.

                                                              1. 2

                                                                    Completely agree: you just end up with really baroque methods of metaprogramming if you try to prevent it.

                                                              2. 3

                                                                  I think code generators work fine. I don’t really get why people think Lisp macros are better than just writing a tool like lex/yacc; they seem strictly less flexible to me.

                                                                1. 12

                                                                    Tools like yacc are “applied once”. In Lisp I can write a function that symbolically differentiates another function and produces another function. Then this higher order function can be differentiated again, yielding an even higher order function, and so on. You can’t do this with yacc.
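
                                                                    A rough sketch of the idea in Python, using nested tuples where Lisp would use its own lists:

                                                                    # d() maps an expression to its derivative, which is itself an
                                                                    # expression, so the result can be differentiated again.
                                                                    def d(expr, var):
                                                                        if expr == var:
                                                                            return 1
                                                                        if not isinstance(expr, tuple):  # a constant or another variable
                                                                            return 0
                                                                        op, a, b = expr
                                                                        if op == '+':
                                                                            return ('+', d(a, var), d(b, var))
                                                                        if op == '*':                    # product rule
                                                                            return ('+', ('*', d(a, var), b), ('*', a, d(b, var)))
                                                                        raise ValueError(op)

                                                                    f = ('*', 'x', 'x')  # x * x
                                                                    df = d(f, 'x')       # ('+', ('*', 1, 'x'), ('*', 'x', 1)), i.e. 2x
                                                                    ddf = d(df, 'x')     # and the derivative can be differentiated again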

                                                                    In fact, symbolic differentiation and other computer algebra things are precisely the reason why Lisp was invented. It’s in the original Lisp paper.

                                                                  In fact homoiconicity is the only good reason in favor of dynamic typing that I ever found. In most dynamically-typed languages I feel like the author simply didn’t know better. In those languages dynamic typing is only a gun to shoot yourself with. Lisp is the only dynamically-typed language that I found where dynamic typing is truly fundamental and seems to be put to good effect.

                                                                  1. 2

                                                                    Great point about the link between homoiconicity and dynamic typing. Answered a question I was asking myself for a few years.

                                                                    1. 1

                                                                      Interesting point

                                                                    2. 6

                                                                      Lex/Yacc are mostly awful to work with. They complicate the build (even with native support in Make), suck to maintain, and are painful to debug. LLVM/Clang don’t bother using them, and the code is better for it. (Having debugged that stuff, I’m thankful they didn’t use them.) Maybe you can use lex to generate a state machine for you, or you can just do it manually. It’s a one time cost without the unholy mess of getting the proper includes.

                                                                      If your language is small, then lex/yacc is likely more of a burden than any kind of boon. Just write a recursive descent parser and be done with it. It will likely be fast enough and you can still deal with the oddities, and probably with fewer contortions.
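
                                                                      To show how little code that is, a minimal sketch that parses and evaluates expressions like 2+3*(4+1):

                                                                      import re

                                                                      # Grammar: expr = term ('+' term)* ; term = factor ('*' factor)* ;
                                                                      #          factor = INT | '(' expr ')'
                                                                      def evaluate(source):
                                                                          tokens = re.findall(r'\d+|[+*()]', source)
                                                                          pos = 0

                                                                          def peek():
                                                                              return tokens[pos] if pos < len(tokens) else None

                                                                          def eat():
                                                                              nonlocal pos
                                                                              pos += 1
                                                                              return tokens[pos - 1]

                                                                          def factor():
                                                                              if peek() == '(':
                                                                                  eat()              # consume '('
                                                                                  value = expr()
                                                                                  eat()              # consume ')'
                                                                                  return value
                                                                              return int(eat())      # an integer literal

                                                                          def term():
                                                                              value = factor()
                                                                              while peek() == '*':
                                                                                  eat()
                                                                                  value *= factor()
                                                                              return value

                                                                          def expr():
                                                                              value = term()
                                                                              while peek() == '+':
                                                                                  eat()
                                                                                  value += term()
                                                                              return value

                                                                          return expr()

                                                                      print(evaluate('2+3*(4+1)'))  # 17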

                                                                      1. 3

                                                                        Using macros lets me do straight-forward stuff that cleanly integrates into the language and its tooling. I can also use code generators if they’re better suited for the job. I can also build code generators much more easily with macros in a language that’s already an AST. ;)

                                                                        There are also highly-optimized implementations, formally-verified subsets, existing libraries in a sane language, and IDEs. The overall deal is much better than yacc, etc. Such benefits are how the Julia folks built a compiler quickly for a powerful, complex language: it was sugar coating over a LISP (femtolisp). An industrial one might have worked even better. sklogic’s toolkit with DSLs was pretty interesting, too.

                                                                        1. 1

                                                                          They most likely are less flexible. Just as programming languages and functions are strictly less flexible than assembly.

                                                                        2. 3

                                                                          What always made me smile a little was the fact that Gregor Kiczales, one of the authors of The Art of the Metaobject Protocol (published in 1991, and the best book on OO I’ve ever read), is one of the main contributors to AspectJ.

                                                                          1. 2

                                                                          Oh damn. Didn’t know that. I might need a new example out of respect for his MOP work. Or just keep the irony coming. :)

                                                                        1. 5

                                                                          Another paper on this topic, shared on Lobsters two years ago: “Branch Prediction and the Performance of Interpreters - Don’t Trust Folklore”: https://lobste.rs/s/dek4o5/branch_prediction_performance

                                                                          1. 4

                                                                              What…? MIT is basically the same as BSD… so instead of removing the patent section they switched licenses?

                                                                            1. 5

                                                                              They removed the patent grant and switched to MIT. They switched to MIT because the language used in the license implies a patent grant. See my other comment here.

                                                                            1. 6

                                                                                After having removed the controversial patent grant, Facebook replaced BSD with MIT, because some of the terms used by the MIT license are closer to the language that defines the rights of a patent holder.

                                                                              MIT says: “Permission is hereby granted […] to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software”.

                                                                              35 U.S. Code section 154 says: “Every patent shall contain […] a grant to the patentee, his heirs or assigns, of the right to exclude others from making, using, offering for sale, or selling the invention”.

                                                                              https://en.wikipedia.org/wiki/MIT_License#Comparison_to_other_licenses

                                                                              1. 2

                                                                                 It seems a PaaS like Google App Engine fulfills most promises of serverless computing (scalability and not having to deal with machines, storage, etc.) without having to switch to a FaaS architecture completely based on serverless functions.

                                                                                1. 2

                                                                                  This looks so much better than await/yield in JavaScript, Python and C#, because the await keyword is not viral.

                                                                                  1. 26

                                                                                    I ought to be more concerned with where the trend in memory is going rather than where it has been.

                                                                                    Having heard (and sometimes used, whenever I wanted to do something really cool or was just feeling lazy) this argument for my entire 30-year career, I feel the need to debunk it: If you apply this argument habitually, the hardware never catches up! You’re always designing for where it’s going and it never works where it is. And now that people have stopped upgrading to 2X performance every 18 months, and are even upgrading to reduced performance machines (think tablets), that applies more than ever.

                                                                                    1. 12

                                                                                       Reminds me of the endless blog posts when the new MacBook Pro specs came out.

                                                                                      I’m a pro, so I need at least 32GB of RAM!

                                                                                      I can see how that could make sense if every single application you run (text editor, note-taking app, email client and IRC) is a standalone JavaScript VM. Plus, this quote really gets me:

                                                                                      But advanced development machines of today are the entry level computers of tomorrow. Electron’s dream world would be one where every computer user had a luxurious amount of ram. But I think this world is fast approaching.

                                                                                       So you’re using some faith in a future that you sort of made up (the author doesn’t seem to understand the kinds of trade-offs involved in throwing more RAM at everything) to justify objectively bad programming practices? That’s as evil as littering the street with garbage because some day machines will be able to pick it up.

                                                                                      1. 3

                                                                                        objectively bad programming practices

                                                                                        Objectively bad in what way in what situation? The whole premise of the article is that it might be objectively bad when your primary target is current hardware, but be just fine targeting future hardware. I’m not saying the article is right. But “objectively” is not the same thing as “universally.”

                                                                                        1. 4

                                                                                          Fair point.

                                                                                           The author is advocating purposely ignoring an entire dimension of constraints by claiming to rely on future technology, especially when the burden of delivering such technology is not even on you.

                                                                                          We give Electron a lot of crap because of memory, but that’s just the more expensive one if you’re in the first world. Visual Studio Code is >200MB uncompressed and uses a ton of CPU. I don’t know about Atom, but I heard it’s worse.

                                                                                           Every profession’s “dream world” is one where there are few to no constraints, but that’s no good reason to ignore them.

                                                                                          1. 5

                                                                                            I agree with most of your points regarding the original article. However…

                                                                                            Visual Studio Code is >200MB uncompressed and uses a ton of CPU.

                                                                                             The disk space use is about half that of JetBrains’ WebStorm, which is probably its closest competitor. And it does not use a ton of CPU. During very heavy usage, it consumes about 5% CPU on my machine. WebStorm, on the other hand, would regularly use 50%.

                                                                                            The specific issue you mentioned was unfortunate, but very uncharacteristic. VS Code is one of the best performing pieces of software on my machine.

                                                                                            1. 4

                                                                                              VS Code is one of the best performing pieces of software on my machine.

                                                                                              My understanding is that this is only true because they sunk a lot of time into reimplementing a bunch of the parts of Electron which gave them performance problems. Doesn’t fare so well for the “electron gives you an easy way to produce efficient desktop applications” argument.

                                                                                              1. 2

                                                                                                That may be true. I don’t have any commitment to Electron being terrible for performance or not. I just hate to see a good-performing piece of software misrepresented as a resource hog.

                                                                                                1. 2

                                                                                                  they sunk a lot of time into reimplementing a bunch of the parts of Electron which gave them performance problems

                                                                                                  Which ones? Any source supporting this claim?

                                                                                        2. 8

                                                                                           I’ll double down on what you’re saying about the 18-month doubling: even the people creating chips say Moore’s Law is effectively dead, either now or soon. They’re doing heterogeneous architectures with more threads, HW offloading of specific functions, and so on. This means developers can’t be lazy anymore, expecting the next generation of hardware to cover up their incompetence or apathy.

                                                                                           The OP is right a bit, though, about memory: maybe developers can count on that one, as costs will eventually come down, at least for what can be produced now. It happens as the equipment/I.P. gets paid off and suppliers bring in shiny, new things.

                                                                                          1. 6

                                                                                            Memory uses battery too, you know…

                                                                                            1. 2

                                                                                              I was just addressing his point. That should be factored in though.

                                                                                        1. 6

                                                                                          This inspired me to dig up a Swiss e-mail service provider I remembered seeing on Hacker News.

                                                                                          Here it is: https://www.migadu.com/en/index.html

                                                                                          I’m not a customer (nor affiliate), but it seems like it could be a good service.

                                                                                          1. 2

                                                                                             I set up Migadu for one of my customers and it was worth it.

                                                                                          1. 2

                                                                                            This is disappointing.

                                                                                            With an automated, zero-cost CA, there are very few legitimate cases for wildcard certificates, and the risks increase with their use.

                                                                                            I don’t understand why LE couldn’t simply allow for higher thresholds on certificate issuance, and instead support certificates that are actually a worthwhile goal: free S/MIME that doesn’t involve suckling at the Comodo teat.

                                                                                            1. 8

                                                                                               The biggest use case for wildcard certs is SaaS. If I have 10,000 SaaS customers with hosted domains like customer.example.com, LE wouldn’t want to issue (and renew!) that many certs. It may also exceed their rate limits.

                                                                                              1. 3

                                                                                                Yes, this is exactly why I can’t use LE for my business right now.

                                                                                                1. 2

                                                                                                  LE creates SAN certificates, which let you group together multiple domains under one certificate. So you can use LE for a SaaS product like this if you’re clever about automatically grouping domains together. See: https://letsencrypt.org/docs/rate-limits/
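
                                                                                                   The grouping step itself is simple. Here is a rough, untested OCaml sketch (the function name is made up) that splits customer hostnames into batches, one SAN certificate per batch:

                                                                                                     (* Split hostnames into groups of at most [max_names], one SAN
                                                                                                        certificate per group. 100 is LE's names-per-certificate limit. *)
                                                                                                     let group_domains ?(max_names = 100) domains =
                                                                                                       let rec go groups current n = function
                                                                                                         | [] -> List.rev (if current = [] then groups else List.rev current :: groups)
                                                                                                         | d :: rest ->
                                                                                                             if n = max_names then go (List.rev current :: groups) [d] 1 rest
                                                                                                             else go groups (d :: current) (n + 1) rest
                                                                                                       in
                                                                                                       go [] [] 0 domains

                                                                                                   Each batch then goes through the normal issuance and renewal flow. The trickier part in practice is keeping the groups stable between renewals so names don’t get reshuffled across certificates.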

                                                                                                  1. 5

                                                                                                    I know that LE can support up to 100 domains in the same certificate with SAN certificates. But I feel like the complexity implied by grouping domains together is not worth the few hundred bucks of a wildcard certificate.

                                                                                                    1. 2

                                                                                                      I’ve not known many companies that want to publish their full customer list so publicly :)

                                                                                                2. 4

                                                                                                  What are the risks for wildcard certificates?

                                                                                                  1. 2

                                                                                                    I do like the option when it’s there. For example when SNI is not available and you are running low on IPs.

                                                                                                    1. 0

                                                                                                      The main concern is phishing.

                                                                                                      If you look at your URL bar and see a green lock next to https://www.paypal.com.mysite.biz/login.php, you’re a lot more likely to log in.

                                                                                                      1. [Comment removed by author]

                                                                                                        1. 3

                                                                                                          I agree. If you can prove you own the domain, shouldn’t you be able to call your domain whatever you want and get a certificate for it?

                                                                                                          So the real risk, it seems to me, is in the way you show that proof. If the CA asks for this proof in a way that’s not secure, that to me would be a problem.

                                                                                                        2. 7

                                                                                                          You may be interested to know that browsers limit wildcard certs to one level deep, for this reason.
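
                                                                                                           (That is, a certificate for *.example.com covers a.example.com but not b.a.example.com, since the wildcard matches only a single label.)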

                                                                                                          1. 2

                                                                                                            What does this risk have to do with phishing?

                                                                                                            In any event, the CAs aren’t the right place to solve phishing, services like SafeBrowsing are.

                                                                                                        3. 1

                                                                                                          I like supporting wildcards but I do wish they’d dramatically increase the rate limits and decrease the suspension time. Getting banned for a week after a fuckup or bug is nuts.

                                                                                                          1. 1

                                                                                                            Agreed 100%.

                                                                                                        1. 4

                                                                                                           This is very similar to what Digital Ocean offers. I wonder if Amazon was feeling any kind of threat from companies like DO, or just looking for another lucrative hosting business? I am glad they are taking steps to simplify getting started on AWS. Having taken a class in using AWS, I know there is a lot of functionality there that can be confusing to navigate.

                                                                                                          1. 2

                                                                                                             Yes, this is very similar, except that Digital Ocean doesn’t provide a built-in firewall (you have to mess with iptables), doesn’t offer something similar to S3 (you have to use S3, which means network latency and increased cost for egress), and can’t manage my database for me (as AWS does with RDS for MySQL or PostgreSQL). DO will be perfect for my use case when they add these, and I hope they will :-)

                                                                                                            1. 1

                                                                                                               Another area where I’ve found DO falls short is that they allow only 1 external IP per droplet. Almost every other host sells extra IPs.

                                                                                                              1. 3

                                                                                                                That’s true, but with TLS Server Name Indication being supported by almost all clients in use nowadays, I think this is becoming less important.

                                                                                                                1. 2

                                                                                                                  You’re right, SNI fixed one use-case for multiple IPs, but many others remain. There’s the privacy argument and SEO penalties for starters. My biggest issue right now, though, is that I want to run email for multiple domains on one droplet. However, one IP only gets one PTR record, and some email hosts count a mismatched PTR record as an indication of spam.

                                                                                                                  1. 3

                                                                                                                    I’ve been hosting email for two domains on one VPS with one IP for a while with no problems. It’s true that you should have your mailserver’s forward and reverse DNS match, so for mail purposes it can only really have one hostname if it has one IP. I.e. the same IP can’t be addressed as both mail.example.com and mail.example.org, because one of these will then fail to have a matching PTR. But there’s no requirement that the MX record for a domain point to a server in the same domain. So you can have the MX records for both example.com and example.org point to mail.example.com, which has one IP and a matching PTR record of mail.example.com. Works for outgoing mail as well, as SPF and DKIM also don’t have any same-domain requirement.
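
                                                                                                                     Concretely, the records might look something like this (192.0.2.1 is a placeholder documentation address):

                                                                                                                       example.com.             MX  10 mail.example.com.
                                                                                                                       example.org.             MX  10 mail.example.com.
                                                                                                                       mail.example.com.        A   192.0.2.1
                                                                                                                       1.2.0.192.in-addr.arpa.  PTR mail.example.com.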

                                                                                                                2. 2

                                                                                                                  Can you not use DO’s floating IP service to provide multiple external IPs (I haven’t tried to do so, hence asking)?

                                                                                                                  I’m still working through the Lightsail documentation, but I don’t see any mention of IPv6 support? I do wonder why Amazon hasn’t yet been able to deliver VPC IPv6 support.

                                                                                                                  1. 1

                                                                                                                    Not really. DO gives only one “floating IP” per droplet, so they’re very limited too. Also, the floating IP will only route incoming traffic to the “anchor IP” on the droplet. So you can use it to respond to incoming connections, but you can’t initiate an outgoing connection from it (I think, not 100% sure there). Also, I’m not sure if you can even set the PTR record for the floating IP.

                                                                                                                    I really don’t get why DO doesn’t just allow extra IPs. It’s not a problem at any other host, and lots of their customers have been asking for years [0].

                                                                                                                    I’ve been seriously considering migrating to another provider over this. It’s a shame, because I like everything else about DO.

                                                                                                                    [0] https://www.digitalocean.com/community/questions/can-i-get-additional-ip-addresses-for-a-droplet

                                                                                                                    1. 1

                                                                                                                      Ah, thanks for the info. I assumed (never a good thing!) that it would be possible to add more than one to a droplet.

                                                                                                                      Yes, the single IP per droplet is odd. I can’t help but think it’s intentional, forcing you to use multiple droplets instead of a single droplet hosting services on unique IPs.

                                                                                                            1. 5

                                                                                                               sidenote: i think they look a bit like the classic plan9 fonts :)

                                                                                                              1. 6

                                                                                                                 Plan 9 fonts were also designed by B&H. Plan 9 uses Lucida Sans Unicode and Lucida Typewriter as default fonts. Lucida Sans Unicode, with some minor alterations, was renamed Lucida Grande, the original system font on OS X, replaced only recently by Helvetica Neue. It’s funny that several people say this reminds them of Plan 9, but not OS X :-).

                                                                                                                However, these fonts are more similar to the Luxi family of fonts (also from B&H) than the Lucida family.

                                                                                                                Personally, I am going to continue programming (in acme, of course) using Lucida Grande (yes, I use a proportional font for programming).

                                                                                                                1. 4

                                                                                                                  What do you like in acme, compared to other editors (vim, Emacs, Atom, Visual Studio Code, Sublime Text…)?

                                                                                                                  1. [Comment removed by author]

                                                                                                                    1. 6

                                                                                                                      Does it have any affordance for keybindings, or is it strictly mouse-driven other than text entry? I’ve always been interested in its plugin model, but haven’t had a sense of how I’d like it given my general dislike of using the mouse.

                                                                                                                      1. 5

                                                                                                                        shameless self promotion

                                                                                                                        If you’re interested in the other Plan 9 editor, sam, there’s an updated version here: http://www.github.com/deadpixi/sam that has scalable font support and extensive support for keybindings.

                                                                                                                        1. [Comment removed by author]

                                                                                                                          1. 3

                                                                                                                             Sadly none that I’m aware of. Sam is much, much simpler than Acme though, so it’s probably (IMHO) easier to just dive right into.

                                                                                                                            1. 2

                                                                                                                               I would not say it’s significantly simpler; the command language is the same. It lacks interaction through a file system, so the lack of features could be interpreted as being simpler, I guess.

                                                                                                                              I use sam when I need to edit files on remote computers, but in my opinion the UI model makes it harder to use than acme.

                                                                                                                        2. 4

                                                                                                                          No keybindings outside of basic unix keybindings (C-A, C-E, C-W), sorry.

                                                                                                                        3. 5

                                                                                                                           Not to expressly shit on Acme, but that doesn’t sound like anything editors such as Emacs or Vim can’t do. Well, it depends on how nicely you want to be able to move tiled windows around. I think transpose-frame does this on Emacs, but it’s not mouse-driven.

                                                                                                                          1. [Comment removed by author]

                                                                                                                            1. 3

                                                                                                                               In Emacs, the only thing in that list you can’t do with the mouse is the first item. All the others are certainly possible; I’ve even bound Mouse4/5 to copy and paste.

                                                                                                                        4. 7

                                                                                                                          Executable text, mutable text (including in win terminal windows), mouse chording, and mouse support in general, structural regexp, integrates well with arbitrary Unix tools, tiled window management, no distracting fluff; no options, no settings, no configuration files, no syntax highlighting, no colors.
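
                                                                                                                           For a taste of the structural regexp part: acme exposes the sam command language through its Edit command, so replacing every occurrence of a string in the whole file is a one-liner (foo and bar are placeholders):

                                                                                                                             Edit ,x/foo/ c/bar/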

                                                                                                                           Acme is by far the most important tool I use. If it were to disappear from the face of the earth, the first thing I would do is reimplement acme. Luckily, it would not take me very long, as acme has very few features to implement; it relies on abstractions, not features.

                                                                                                                          A good demo: http://research.swtch.com/acme

                                                                                                                          1. [Comment removed by author]

                                                                                                                            1. 3

                                                                                                                               One of the distinguishing features of Plan 9 software is the rejection of the idea that software always needs constant development. It’s done, it works, it doesn’t need further development.

                                                                                                                              As someone who has done multiple Go ports to new hardware architectures and operating systems, I would be very unhappy if plan9port would be implemented in Go because I would not be able to use it until I would be finished.

                                                                                                                        5. 3

                                                                                                                           To expand on that, I think macOS uses San Francisco for its UI nowadays. Helvetica Neue didn’t last long.

                                                                                                                          1. 3

                                                                                                                            Indeed. AFAIK Helvetica Neue was only used by macOS 10.10 - it was replaced with (Apple-designed) San Francisco in 10.11.

                                                                                                                          2. 2

                                                                                                                            It’s funny that several people say this reminds them of Plan 9, but not OS X :-).

                                                                                                                            well, i’ve never really used os x ;)

                                                                                                                          3. 2

                                                                                                                            I loved the classic Plan 9 pelm font. The enormous and curvaceous curly brackets are still a wonder.

                                                                                                                          1. 4

                                                                                                                            Am I reading this accurately?

                                                                                                                            1. They chose OCaml because it is “fast” and “safe”
                                                                                                                            2. But the standard library stack overflows
                                                                                                                             3. And stack traces are hard to debug
                                                                                                                            4. So they gladly would trade 10x perf for debug

                                                                                                                             Something does not add up.

                                                                                                                            1. 5

                                                                                                                               I upvoted this comment tree but I’ll still offer a semi-counter. The language has already established itself as having reasonable performance vs mainstream, Web-oriented languages, plus more safety than many. The common gripes about it are the lack of a single, good standard library, along with tooling. Points 2-4 fit those common gripes. They’re even expected, since it’s a language developed by academics who wrote compilers and did formal verification, for use by those same types, then extended for convenience by them and a thriving community. Yet these main problems remain. Why? Apathy, people not adopting Jane St’s stuff, or simply being outside the original scope? Idk.

                                                                                                                               The starting point isn’t countered by the rest, as it applies just to the foundation the language provides. That’s good. It just needs work on the library and tooling side to make it more suitable for the real-world applications it wasn’t necessarily designed for. Meanwhile, the high-assurance sector is getting a lot of mileage out of languages like OCaml by using them where their attributes are strong. Esterel’s report is a nice example, with Section 3 being enlightening:

                                                                                                                              http://users.eecs.northwestern.edu/~clk800/rand-test-study/_eruoctdsetiacf/uoctdsetiacf-2009-10-8-12-02-00.pdf

                                                                                                                              1. 4

                                                                                                                                When I read,

                                                                                                                                OCaml was initially introduced for its execution speed and ease of refactoring.
                                                                                                                                

                                                                                                                                 I wonder if those characterizations of OCaml were based on their own experience and measurements against a defined target, or simply a restatement of commonly held beliefs. I’ve certainly read both statements elsewhere, but I rarely read of people or companies running in-house experiments to make such decisions. Google and disk performance and lifespan, yes. Jane Street and OCaml, probably. Here?

                                                                                                                                1. 6

                                                                                                                                   I used OCaml for a proof-of-concept on a large networked application many years ago, but I found the unpredictable performance under high load, and the difficulty of tracking down its source, prohibitive. Concept proved, I rewrote it all in C and lived happily ever after.

                                                                                                                                2. 4

                                                                                                                                  So they gladly would trade 10x perf for debug

                                                                                                                                   The article doesn’t say they would trade a 10x performance decrease in general, only a decrease when handling an exception. Here is the article quote:

                                                                                                                                  We would happily accept an order-of-magnitude performance decrease in exception-handling performance if we could have better stack traces instead.

                                                                                                                                  1. 2

                                                                                                                                    Stack overflows are relatively easy to debug even with a truncated stack trace.

                                                                                                                                    (and as /u/ngrilly says, the 10x perf was specifically about exception throwing/catching, not in general)

                                                                                                                                    Also the sad reality of the industry is that the bar for “safe” is ridiculously low.

                                                                                                                                    1. 2

                                                                                                                                      But the standard library stack overflows

                                                                                                                                       The compiler’s standard library does, but it is trivial to write your own, and in fact many people do. When you compile your code, it is fast. And if I remember correctly, the reason for the pervasives being non-tail-recursive has something to do with performance in the average case, where inputs are small enough not to overflow the stack.
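
                                                                                                                                       To illustrate the trade-off with an untested sketch: the stdlib-style map costs a stack frame per element, while the usual replacement builds the result through an accumulator and reverses once at the end, which is the extra average-case work the stdlib avoids:

                                                                                                                                         (* Stdlib-style map: not tail recursive, because the cons is applied
                                                                                                                                            after the recursive call returns. A big enough list overflows. *)
                                                                                                                                         let rec map f = function
                                                                                                                                           | [] -> []
                                                                                                                                           | x :: rest -> f x :: map f rest

                                                                                                                                         (* Tail-recursive version: constant stack, one extra List.rev pass. *)
                                                                                                                                         let map_tr f l =
                                                                                                                                           let rec go acc = function
                                                                                                                                             | [] -> List.rev acc
                                                                                                                                             | x :: rest -> go (f x :: acc) rest
                                                                                                                                           in
                                                                                                                                           go [] l

                                                                                                                                       (List.rev composed with the stdlib’s own List.rev_map gets you the same behavior.)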

                                                                                                                                      1. 5

                                                                                                                                        If it’s “trivial” to write, why not include a decent version in the first place?

                                                                                                                                           “Most people rewrite the standard library” is not a great endorsement of a language.

                                                                                                                                        1. 7

                                                                                                                                           Agreed - that’s exactly the kind of fragmentation of the community that Common Lisp is often criticized for. It means people can’t read each other’s code, and integrating is much harder. It seemed perfectly fine for a while, but it’s a core reason that Clojure is far more talked-about today.