1. 6

    I agree with parts. GMail is definitely a trap, but after some tweaking I did get mbsync working fine. It’s only broken if I need to restore mail into GMail, and the fault lies with GMail. There is a bit that seems to suggest you need postfix to send, but you only need msmtp to relay out via GMail. I haven’t seen a need for a local IMAP server… I like that my emails are stored in flat files because I’ve done some things that really needed that before. I only wish I could go back in time and have chosen Fastmail for email.
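
    Roughly the shape of that setup, as a minimal sketch (not my exact config; account names, paths, and the password command are placeholders, and GMail wants an app password):

        # ~/.mbsyncrc  (isync 1.4+ syntax; older versions say Master/Slave instead of Far/Near)
        IMAPAccount gmail
        Host imap.gmail.com
        User you@gmail.com
        PassCmd "pass show mail/gmail"
        SSLType IMAPS

        IMAPStore gmail-remote
        Account gmail

        MaildirStore gmail-local
        Path ~/mail/gmail/
        Inbox ~/mail/gmail/INBOX

        Channel gmail
        Far :gmail-remote:
        Near :gmail-local:
        Patterns * !"[Gmail]/All Mail"
        Create Near
        SyncState *

        # ~/.msmtprc  (relay outgoing mail through GMail)
        account gmail
        host smtp.gmail.com
        port 587
        tls on
        auth on
        from you@gmail.com
        user you@gmail.com
        passwordeval "pass show mail/gmail"
        account default : gmail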

    The spam issue is correct. Most of it is either companies relentlessly emailing me and making it hard to unsubscribe (I particularly want to shame Hanes and Mack Weldon. No one needs almost daily emails from Hanes.), or cold-emailing marketers, who have taken to ‘this is the last email I will send’ shortly followed by more emails. I basically never see my email shared with partners–and I use unique addresses for each registration. I get a lot of unwanted mail because I am not motivated enough to fight it and it is easier to just delete it in mutt in weekly cleanup sessions, and then those deletes get pushed to GMail. That is one habit I’ve developed–skimming email for urgent things and then handling less urgent stuff in batches. I’m not sure I need LinkedIn anymore, but they are very chatty. I have to have Facebook, but they are maddening with friend suggestions to my phone and email that just are not even good guesses.

    The problem is that leaving email moves me farther from sources of information that survive over time. Usenet has only a few active communities, and forums are even dying in favor of things like Slack and Discord that are extremely hard to export data from, if it is possible at all–knowledge just disappears now. Scrolling to the top of a Slack channel is so very slow, and then they only preserve things for a while. I had an immediate negative response to Discord, so I am not authoritative on it, other than noting that it never shows up in search results. Things don’t get indexed, and search engines think they are smarter than me, requiring you to use just the right phrasing to find what you want. It has turned me into a link hoarder.

    I do however have emails and list emails from the early 2000s (it would go back to the 90s, but there was an incident.)… And lists are great at providing web archives.

    I am not very confident that things sent over a Gemini-inspired mail system will persist over time, or that it would avoid the same email spam war if it became at all successful.

    I think really that what the internet is has just evolved over time, and I’m getting old. Just like my music collection stagnated mostly, my tastes in how I use the internet have not kept up with how it has changed. I have respect for Gemini as an enclave for some, but it also has NIH problems. I think the lower stress solution to email is to just do it in batches and not stress over inbox 0. And actually, as some online communities disappear and I am sad, I also do find new stuff that I like or double down on old haunts.

    I guess that’s my rant response to a rant about email.

    1. 5

      I think really that what the internet is has just evolved over time, and I’m getting old. Just like my music collection stagnated mostly, my tastes in how I use the internet have not kept up with how it has changed. I have respect for Gemini as an enclave for some, but it also has NIH problems. I think the lower stress solution to email is to just do it in batches and not stress over inbox 0. And actually, as some online communities disappear and I am sad, I also do find new stuff that I like or double down on old haunts.

      Just want to gently push back a bit on this. I personally think there is WAY too much “Meh” going on in tech right now.

      E-mail has problems as it currently exists mostly due to the way things evolved over time. If someone is actually thinking about making a fresh start and potentially fixing some of these problems, why not support them?

      1. 5

        If someone is actually thinking about making a fresh start and potentially fixing some of these problems, why not support them?

        If the plan is to fix things for everyone, then the effort is doomed to fail because of path dependence and network effects, and I’m therefore uninterested. If the plan is to fix things for a few nerds who are willing to pour a lot of effort into their communication stack, then I’m uninterested because that group doesn’t include me.

        Either way, I’m unaffected. And I think that is, more or less, how most people feel.

        1. 5

          You’re making the assumption that SUCCESS == widespread adoption.

          I would argue that even an attempt at fixing some of the larger issues is a success whether or not it sees wide adoption because even in failure we gain anti-patterns and counter-examples we can use when we enact an actual fix further down the line.

          Tech has a negativity addiction, and I think it’s both unhealthy and counter-productive.

          1. 3

            That’s fair. To be clear, I wish all of these people well, I’m just not holding my breath that I’ll ever actually be able to use any of the stuff they’re building.

          2. 5

            If you are unaffected and uninterested either way, why not just look away and move on? Why make a point of saying you think it’s going to fail or is uninteresting?

          3. 5

            Just want to gently push back a bit on this. I personally think there is WAY too much “Meh” going on in tech right now.

            Too much “meh”? Every other post on every tech news aggregator I’m on is talking about how tech is destroying the world.

            I think the true answer here is that the mindshare behind Email is gone. Valuable content is being created on places like Slack, Discord, Matrix, Discourse, Zulip, Reddit, and Twitter. From the start, these platforms cut these problems out of the solution space, so most folks don’t have to think about these problems. As @b1twise mentioned:

            I think really that what the internet is has just evolved over time, and I’m getting old. Just like my music collection stagnated mostly, my tastes in how I use the internet have not kept up with how it has changed.

            To some extent, I think an effort at “reviving” email for email’s sake is really an attempt to fix a social problem with a technical solution, without meaningfully addressing email’s competitors/descendants. I think https://delta.chat/ takes a more reasonable approach here in solving a specific subset of use cases via Email.

            1. 2

              Respectfully, I reject the assertion that E-mail isn’t worth saving because chat, reddit, and Twitter.

              These are fundamentally different technologies with different characteristics and use cases, and I’ll also note that the services you cite are:

              • Not decentralized
              • Not open source
              • Struggling, except for Slack, to monetize themselves

              E-mail is distributed, open-standards compliant, and not reliant on a single-source server platform that would make the entire thing vanish if one company went under.

              Saying “None of the cool kids are using it so it’s irrelevant” is not a particularly compelling argument to me.

              Also, this argument (in my mind, anyway) has VERY little to do with judging the merit of an alternative technology that shares some of E-mail’s goals but learns from its mistakes.

              1. 3

                I did refer to Matrix, Discourse, and Zulip in the list, all of which are open-source, and Matrix is actually decentralized! Discourse does support Email as a first-class way to read and reply to posts, although there’s no way to search through the protocol itself for older messages.

                I didn’t mean to say that E-mail isn’t worth saving. I think it’s the rare example of a decentralized protocol that still has a lot of mindshare. What I mean to say is that it’s worth taking a focused use case and using Email as the substrate for that.

                A lot of decentralized projects, like, say, ActivityPub, are focused on creating and promoting the actual substrate protocol instead of creating a “product” for a user to use. It took Mastodon to really create and popularize the experience around ActivityPub (well, kinda; the protocol evolution is a bit messy and there’s lots of prior art before Mastodon) before ActivityPub itself had worth as a communications protocol. I find that Delta Chat actually focuses on a product experience end-to-end and by doing so gives a compelling user story for using Email. Matrix is also focusing on an end-to-end experience and is developing its protocol alongside its product experience, so that there’s a useful product for end-users first and a synchronization and update protocol second.

                Focusing on the networking and the decentralization is cool for us hackers, but it’s not that useful for actually communicating. What I meant to say was that trying to popularize Email as the “next thing to use” is tackling the social problem of “facilitating communication” with the technical solution “use Email”, when what’s needed is a focused application atop Email. Hope that makes some more sense.

                1. 2

                  Matrix is both decentralised and open source

                  1. 1

                    It is! However, there isn’t an easy way that I know of to maintain long-term archives of Matrix traffic, and also @GrayGnome didn’t cite Matrix in his post :)

                    Matrix is awesome, but it’s not the same kind of animal as Email or a modern Email alternative.

            2. 1

              I recently got a full CLI email client working for gmail with mbsync, fdm and some other stuff. Works brilliantly on macOS. Documented here: https://github.com/elobdog/mailhelp

            1. 4

              Claws Mail offers a great offline mode with a single button press. I often use it to sync up on a WLAN, go offline, and then reply to all new E-Mails, sort them, etc. while offline (on the train, for example), then go online later, which syncs everything up neatly.

              Regarding SPF/DKIM/DMARC: To be honest, I didn’t find setting them up particularly difficult compared to setting up postfix and other software on the server itself. When I received my first DMARC-reports I was surprised how many servers were trying to impersonate my addresses. Despite lots of complexity, mail still works, and I never have issues sending E-Mail from my own mail-server to any of the big players. It used to be different a few years ago, but it has become much better.

              Having hurdles like SPF/DKIM/DMARC, not being able to send from a domestic network, etc. is all necessary, because at the end of the day, how else should you be able to distinguish the spammers? In a way, the effort put into these authentication methods is the price you pay to be accepted as a player in this network.
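
              For anyone who hasn’t set these up: they are just three DNS TXT records. A minimal sketch for a hypothetical example.org (the selector, key, and report address are placeholders):

                  ; SPF: which hosts may send mail for the domain
                  example.org.                  IN TXT "v=spf1 mx -all"
                  ; DKIM: public half of the signing key, published under a selector
                  mail._domainkey.example.org.  IN TXT "v=DKIM1; k=rsa; p=<base64-encoded public key>"
                  ; DMARC: what receivers should do on failure, and where to send reports
                  _dmarc.example.org.           IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.org"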

              I see no way anything could replace E-Mail, even though I can imagine many ways E-Mail could be improved. Why? Because advertising drives the web and its technologies. It’s one reason why Gopher died. Any improved solution would by definition not become successful, as it would prevent such advertising in some way.

              I’m an Inbox 0 Taliban and I’m mortified each time I get a glance at a « normal person’s inbox ». It’s basically a long list of companies (lots of Facebook but also local companies) where only one mail out of ten has ever been opened.

              We as hackers need to pay the price (=effort) for universal communication (=E-Mail) on our own terms because normies neither understand nor care about its value.

              1. 1

                I didn’t find setting them up particularly difficult compared to setting up postfix and other software on the server itself.

                Heh yeah, configuring mail software really assumes an understanding of mail. Questions like: is this a local recipient, is this a remote recipient, is your mail going through a relay, what’s the retry policy of email, how are bounce addresses configured, will TLS be terminated by the receiving SMTP daemon, are you sending over a UNIX socket/pipe/or TCP, etc. These are all front-and-center with email config, IMO because email just isn’t used anymore by anyone who’s not already very interested in email.
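
                As a rough illustration (a hypothetical Postfix main.cf fragment with placeholder values, not anything from the article), several of those questions map straight onto config parameters:

                    # /etc/postfix/main.cf  (Postfix wants comments on their own lines)
                    # which recipients count as local
                    mydestination = example.org, localhost
                    # send everything else through a relay
                    relayhost = [smtp.example.net]:587
                    # how long to keep retrying before bouncing
                    maximal_queue_lifetime = 5d
                    # opportunistic TLS when relaying out
                    smtp_tls_security_level = may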

                We as hackers need to pay the price (=effort) for universal communication (=E-Mail) on our own terms because normies neither understand nor care about its value.

                I’d like to push back on this thinking. Email is still fairly ubiquitous, albeit captured by the large platforms, among non-technical folks. Usability and configurability improvements could go a long way to encouraging folks to move more of their communications to Email (though I admit, it’s a losing battle in the face of more featureful platforms that don’t have the baggage of Email to deal with). Moreover, I’d challenge the “hackers” vs. “normies” dichotomy; there are more oddballs and privacy sticklers out there than you’d think. To this day, my mostly technologically illiterate father distrusts the Cloud because he’s afraid of what could happen to his data when it’s not in his hands (even though, most emphatically, he’s pretty terrible at managing his digital life).

                I think https://delta.chat is doing a great job at trying to improve some of the pain points around Email for non-technical users.

              1. 4

                I just don’t see the problem here. If we’re talking about fellow tech enthusiasts, IMAP servers are old/boring technology; setting up Dovecot is covered in hundreds of guides online (to the point where even blogspam syndicates the articles!) and really doesn’t change much. Setting up a local Dovecot just doesn’t seem that difficult to me. I’ve also been running mbsync for years now and, unlike OP, have had no issues grabbing my mail and no weird sync failures. In my mind, the bigger issue with Email is just how much the entire architecture is based around the old mainframe design of having multiple users on a machine and having machines talk to other machines (MTAs and MDAs). It’s a very different paradigm from “today’s” (not even today, PCs have been a thing for 40 years) PC/Mobile form of computing where everyone owns their own individual machine, often without good, persistent network connectivity.

                If you want to use Email in a more async/offline-first fashion, there’s the venerable UUCP which, when paired with ssh, stunnel, or spiped, gives you a secure offline-first way of sending and receiving files/data. There’s also the newer (my preference) NNCP project, which offers a secure UUCP-like experience with some nice modern conveniences. I’ve even written a silly/extremely basic tool called nmail to send/receive mail over NNCP. I use it to send mail across my machines, and I’m using NNCP pretty extensively right now to fetch YouTube videos and archives of websites. I even have a Telegram bot that lets me send/receive messages over NNCP, which I successfully used in a low-internet situation somewhat recently.
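
                If you’ve never seen NNCP, the flavor is roughly this (node name and paths are made up, and this is from memory, so check the project docs for exact invocations):

                    # queue a file for the node "alice"; it lands in her incoming dir once tossed
                    nncp-file ./backup.tar.zst alice:
                    # no network? shuttle the queued, encrypted packets via a USB stick
                    nncp-xfer /mnt/usb
                    # or exchange packets online when a link happens to be available
                    nncp-call alice
                    # on the receiving side, process whatever has arrived
                    nncp-toss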

                That said, I really appreciate the history narrative around Email. If nothing else, thanks for talking about all of these issues succinctly and in a way which captures the accidental complexity of the whole system. Thanks for the write-up! Excited to see where this project/these thoughts go.

                1. 5

                  This article quotes the main thrust of the article it’s responding to, but it doesn’t address it. I’ll quote it again because it’s so damned important.

                  Then, they return to real life.

                  The point that the original author was making is so critical that it’s likely to go entirely unnoticed by any subsequent conversation. Because it’s not about whether the web is better (nicer or less nice, more boring or more interesting, more powerful or less powerful) with all the advanced doohickeys at our disposal. It’s not even, primarily, about accessibility. The vision it’s proposing is of a web that is less nice, and more boring, so that we don’t live there. We live here, instead, where our bodies and our people are. And even if our people are only accessible via the web, the web itself is as thin a layer as possible over their words (or their video, or audio…). The less designed it is, the less of it there is to get stuck into. And ultimately, we want it not to exist at all. We want it to be so thin and transparent that it melts away.

                  1. 7

                    The vision it’s proposing is of a web that is less nice, and more boring, so that we don’t live there.

                    That’s not a vision that I like, personally. When I was young and all I had was books, I used to devour my local library. While I certainly would dumpster dive and try to play around with discarded computers since my family didn’t have money for much, I spent the vast majority of my intellectual time in the world of books. I wasn’t really an exception either; I knew many other kids my age, and honestly many adults at the time, who also spent their free time with their noses glued to books. Some of these books were high-brow, but many of them were just pulp mystery, science fiction, romance, or thrillers. Generations of my family have loved reading books and reading newspapers. The internet then was just the next step to this consumption and sharing of information, not all of it intellectual or even useful. I may have spent a little more time in the books than was good for me at that age, but almost nobody proposed having books be some place where we “don’t live there” or that books should be “as thin a layer as possible over their content” or “the less designed it is, the less of it there is to get stuck into”.

                    I’m of the opinion that individuals should be able to make their lives revolve around whatever it is they want it to revolve around. Whether that’s code, prose, art, or whatever, it doesn’t matter. Usual caveats about harming others and such apply. I also largely agree with @fouric here. You’re trying to propose a technical solution to what is essentially a social problem.

                    1. 6

                      The vision it’s proposing is of a web that is less nice, and more boring, so that we don’t live there.

                      This is exactly the wrong way to solve this problem.

                      The problem is both content that is designed to be addicting (which can certainly survive even in a world without stylesheets) and users who have been habituated to addictive patterns and have lost the discipline to decide what to do with their own time.

                      These are both social problems, not technical ones, and need social(-adjacent) solutions, not technical ones.

                      Crippling the web, which is an incredibly useful and valuable tool, to try (and fail) to fix a social problem (which is independent of the technology) is an incredibly bad idea, somewhat analogous to getting rid of GPUs because of the number of people addicted to video games.

                      Not to mention that it’s far easier (and more probable) to change the content and consumers than it is to get Google and Mozilla and Apple to all remove rich media features that virtually every one of their users will be extremely unhappy about.

                    1. 3
                      1. 2

                        An exception here is something like MicroPython / CircuitPython. I don’t really enjoy writing Python, and it’s too big/limited to be viable for most projects, but for me (nearly all my history with high-level languages, absolutely terrible at C, etc.) the “drop some code on a USB drive and attach to a repl with screen” workflow was kind of a revelation for quickly sketching out ideas.

                        (I worked on CircuitPython stuff for Adafruit a few years ago, but I haven’t kept up with the space at all since moving on to other things. Just revisited it the other day for a Halloween costume and got that “I’d be way less constrained if I wrote Arduino code or whatever for this, but a scripty-feeling language plus repl is so much more pleasant that I’m gonna use it anyway” feeling.)

                        1. 4

                          Yep! And if you’re not keen on Python, NodeMCU/Whitecat can make a Lua REPL fit in 80 KB of RAM. It’s like night and day vs “regular” embedded development.

                          1. 2

                            Yeah using a REPL on a device which is cumbersome to interact with is a godsend for productivity. And if memory is really tight, make/use a Forth instead.

                      1. 1

                        If there are any forth experts here, perhaps you can help me past this extremely basic problem I’m having. My gforth system apparently does not know the word [:. net2o uses this word in err.fs, and so I can’t get the project to run.

                        $ cat rep.fs; gforth ./rep.fs -e bye
                        : print-some-warning ( n -- )
                         [: cr ." warning# " . ;] stderr outfile-execute :
                        
                        in file included from *OS command line*:-1
                        rep.fs:2: Undefined word
                         >>>[:<<< cr ." warning# " . ;] stderr outfile-execute :
                        Backtrace:
                        $7FE497110A00 throw 
                        $7FE497126C08 no.extensions 
                        $7FE497114338 compiler-notfound1 
                        

                        What am I missing here? Gforth version 0.7.3, void linux AND gforth 0.7.3 on ubuntu:latest

                        Edit: From what I understand the word [: is a built-in feature of gforth.

                        1. 2

                          FWIW I do know some Forth and got the same issue with gforth 0.7.3 on Ubuntu. I wonder if you need to include some file to get the quoting syntax?
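
                          In the meantime, a workaround sketch that sidesteps the [: … ;] quotation entirely, assuming outfile-execute and stderr are available in your gforth (they should be, but I haven’t checked 0.7.3 specifically):

                              \ factor the quotation body into a named word and pass its xt instead
                              : (warn) ( n -- ) cr ." warning# " . ;
                              : print-some-warning ( n -- )
                                ['] (warn) stderr outfile-execute ;
                              42 print-some-warning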

                        1. 6

                          Replace IP and TCP, but also replace the whole web stack. Seems like a lot of projects

                          1. 8

                            It’s the Forth way 😉

                          1. 1

                            So how can I use my smartphone with this? Because there are lots of situations when I want content and I have my phone, but I don’t have a laptop/desktop/RPi/what have you. And how can I share P2P things with others who are non-technical? SSB, DAT, and IPFS have flows that are still somewhat explainable to someone who doesn’t write code as a hobby/living. I dread explaining git to other engineers, let alone lay people…

                            1. 3

                              My take: In “modern” OSs, the abstraction presented by malloc breaks down when you allocate huge amounts of memory. At those scales, you can’t keep pretending memory is free and just comes out of a tap like water. You have to take into account swap space, overcommit, your OS’s naughty-process killer, and such factors.

                              It’s nice that we have this abstraction — I speak as someone who spent decades coding on systems that didn’t have it — it’s just not perfect.

                              1. 1

                                I’d much rather have malloc return NULL than overcommit memory, fear the OOM killer, and run something like getFreeMemory(&how_much_memory_can_my_app_waste); in a loop.

                                1. 2

                                  But isn’t this only an issue in a process that allocates “huge” amounts of memory? Where today on a desktop OS “huge” means “tens/hundreds of gigabytes”? If you’re doing that, you can take responsibility for your own backing store by creating a big-enough (and gapless) file, mmap’ing it, then running your own heap allocator in that address space.

                                  (Pre-apologies if I’m being naive; I don’t usually write code that needs more than a few tens of MB.)
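
                                  Something like this is what I have in mind, as a rough sketch (the path and size are made up, and a real program would hand the region to its own allocator rather than exiting):

                                      #include <fcntl.h>
                                      #include <stdio.h>
                                      #include <string.h>
                                      #include <sys/mman.h>
                                      #include <unistd.h>

                                      int main(void) {
                                          const size_t size = 1ULL << 30;  /* 1 GiB here; tens of GiB in the real case */
                                          int fd = open("/var/tmp/myapp.heap", O_RDWR | O_CREAT, 0600);
                                          if (fd < 0) { perror("open"); return 1; }

                                          /* Allocate real blocks up front so the file is gapless, not sparse;
                                             a sparse file could fail much later, when a page is first dirtied. */
                                          int rc = posix_fallocate(fd, 0, (off_t)size);
                                          if (rc != 0) { fprintf(stderr, "posix_fallocate: %s\n", strerror(rc)); return 1; }

                                          void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
                                          if (base == MAP_FAILED) { perror("mmap"); return 1; }

                                          /* ... run a custom heap allocator inside [base, base + size) ... */

                                          munmap(base, size);
                                          close(fd);
                                          unlink("/var/tmp/myapp.heap");
                                          return 0;
                                      }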

                                  1. 2

                                    Basically creating your own swap file. It’s a fun concept, but here are some things you may have to consider in practice:

                                    • you must find a place for the temp file that’s actually on a disk, not an in-memory tmpfs, and it has to be a fast disk with enough space
                                    • because mmap was designed for I/O, not this, it would slow you down by flushing your memory to disk unnecessarily… but okay, you’ve found the non-standard MAP_NOSYNC flag to turn that off
                                    • now you think you have your region backed by enough disk space – you initialized that memory with something after all – but oh no, the user has filesystem compression! Your initial data fits into the available disk space, but as you’re replacing it with less compressible data (all when you’re out of RAM), it doesn’t fit anymore. It explodes! Do you want your possessions identified? [ynq]
                                    • now if the user has a copy-on-write filesystem like ZFS, and you’re running out of space there… your blocks are not rewritten in-place, so whoops you kinda needed even more free space than you assumed

                                    Oh, and in something like a desktop app, there’s a good chance users will hate you for hogging the disk space :)

                                    1. 1

                                      I don’t really write those big applications either. But Java (Tomcat), Browsers and other proprietary business apps are memory hogs. And because they are used to malloc pretty much always returning success, they employ various techniques (ugly hacks) to find out how much RAM there really is, instead of backing off once they hit a malloc error.

                                      Rolling your own allocator can sometimes be the answer, but most of the time it’s just dangerous to override your system’s malloc (debuggability, bug-proneness, security risks).

                                      1. 1

                                        But Java (Tomcat), Browsers and other proprietary business apps are memory hogs.

                                        The JVM preallocates heap memory, though direct byte buffers are allocated outside of this heap. Generally this means it’s rare for the JVM to continue allocating. You can also force the JVM to commit the memory so it doesn’t hit a copy-on-write fault. As such it shouldn’t have much of an issue if the system runs out of available memory.
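
                                                For reference, committing the heap up front is usually done with flags like these (sizes are placeholders):

                                                    # Fixing -Xms equal to -Xmx reserves the whole heap once, and
                                                    # -XX:+AlwaysPreTouch touches every page at startup so it is actually
                                                    # committed instead of being faulted in lazily later.
                                                    java -Xms8g -Xmx8g -XX:+AlwaysPreTouch -jar app.jar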

                                      2. 1

                                        That’s exactly what I do in a production DB (single 32TB mmap) and it works very well. It does freak out customers when they run top though.

                                  1. 1

                                    Really wish there was an integration between this and JSON-LD using some vocabulary.

                                    1. 8

                                      FWIW I gave up on this post because it displays nothing in my browser without JS …

                                      1. 1

                                        Yeah I couldn’t grab it into Wallabag.

                                        1. 1

                                          I have JS enabled and it’s still a black screen for me…

                                        1. 8

                                          The README shows usage with Gmail, i.e. reading via IMAP and sending via SMTP. Does Himalaya explicitly support working with Maildirs/mboxes, or is this mostly an IMAP/SMTP MUA?

                                          1. 9

                                            For now it is only via IMAP/SMTP, but there is an open issue on reading emails from Maildir (https://github.com/soywod/himalaya/issues/43). It is in the pipeline!

                                            1. 1

                                              Any plans on implementing JMAP? The protocol looks nice but it’s suffering from a bootstrapping problem: most mail clients don’t want to support it until servers (read: Dovecot) support it, and it isn’t a priority for servers until clients can show a better experience from using it.

                                              1. 1

                                                Not at the moment, but why not in the future (if the tool develops well). Thanks for the idea (I invite you to open an issue to keep a trace).

                                          1. 4

                                            My problem with RSS/Atom has always been the lack of prioritization. Using RSS is like drinking from a firehose. For example, if I’m interested in the “programming” tag RSS of Lobsters, I can fetch https://lobste.rs/t/programming.rss which gives me 25 programming-tagged posts on this site. Because there’s no in-band way to indicate scoring information, I don’t have any of the community ranking features associated with this site. Even if I were to score these myself based on some keyword scorer, NB classifier, or some RNN, there’s no way to persist my scoring onto the original RSS feed itself, unless I extend the feed’s XML with a new metadata element and then build/modify a reader to respect this new element.

                                            Combine one RSS feed like this with something from, say, HackerNews and a couple other tech sites, and I’m looking at hundreds of articles per day with no way of understanding what’s good and what’s bad. For now I’ve been playing around with a workflow where I use a classifier and a lot of human tagging to pick articles I want to read later which then get resyndicated into a feed that Wallabag ingests and adds into its Unread queue, which I then read in my own time. But this setup takes intentionality and maintenance. If there were an in-band way to indicate priority, that would go a long way to fixing RSS workflow issues for me.
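
                                            To make “in-band” concrete, what I’m wishing for is something like a namespaced extension element that feeds and readers could agree to honor; this one is entirely made up, nothing like it is standardized:

                                                <item>
                                                  <title>Example programming post</title>
                                                  <link>https://example.org/post</link>
                                                  <!-- hypothetical extension: a score a reader could sort or filter on -->
                                                  <rank:score xmlns:rank="https://example.org/ns/feed-ranking">0.87</rank:score>
                                                </item>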

                                            1. 4

                                              My first tech job was at a company that had some nice solutions to this. Unfortunately they got bought by Google and all the tech was shelved so they could be integrated into the Google Reader team, which then got axed.

                                              1. 2

                                                Because there’s no in-band way to indicate scoring information, I don’t have any of the community ranking features associated with this site. Even if I were to score these myself based on some keyword scorer, NB classifier, or some RNN, there’s no way to persist my scoring onto the original RSS feed itself, unless I extend the feed’s XML with a new metadata element and then build/modify a reader to respect this new element.

                                                This feels like something that should be built at the protocol level rather than the format level, but it’s very difficult to do in any kind of generic way. First, it’s hard to differentiate between a signal in the client of ‘I’m not interested in this subject’ and ‘this is a bad article and you should feel bad’.

                                                To make use of the first on the server, you need to do some clustering to identify users with similar interests and silo their ratings (people who like the things you like, also like this). This requires tracking users, which is fine for something like lobste.rs because it requires account creation, but if you wanted to embed this in an arbitrary feed protocol then you’d need some kind of user-authentication protocol. That’s fairly tractable if it’s a protocol for aggregators to talk to clients but it’s much harder if you don’t have an aggregator in the middle (and if you do then this means that the feed providers don’t ever get this feedback and the value in the ecosystem is shifted away from the folks who produce the source material).

                                                Making use of the second is much simpler in theory but requires you to have a solution to review spam. If I can create 10,000 accounts and mod-bomb articles that I do / don’t like then I can massively influence the outcome. Again, lobste.rs does this by having an audit trail for account creation and requiring referral, but that doesn’t easily scale up to a protocol that communicates feedback with arbitrary sites.

                                                Both of these have the problems of providing useful ratings for things with different numbers of readers. A news article about an MP being stabbed to death (to pick something recent from the news) probably had millions of readers, whereas the article from yesterday about the magic_enum C++ library probably had hundreds. Being able to scale these usefully so that people like me still see low-circulation geeky stories in the middle of a news feed is incredibly hard unless you completely isolate different kinds of story. Even within a publication, the number of readers can vary hugely. A typical article in Communications of the ACM gets 10-50K downloads, my most popular one got over 400K (and is definitely not ten times as interesting, I just picked an incredibly clickbaity title). When I was writing for InformIT, I think my least-read article got about 3K downloads in the first month, the most read got around 200K.

                                                I’d love to see something like this that works at scale but I have no idea how you’d even start to design it.

                                                1. 2

                                                  It helps to realize that finding the “perfect” feed, one that captures all and only the most relevant content, is not possible, nor even desirable. It’s the same as with movies and books. Each year, more good books are written than I could read in a lifetime. More good movies come out than I could see in a lifetime, even if I were to completely stop reading books. It makes no sense to get frustrated over that. Instead, that is a fact that should make me happy, because it means that I can fill all my time with reading and watching the good stuff and never worry that it stops. From my personal perspective, the supply of good stuff is endless.

                                                  So if I enjoy reading a book, then I’ll continue reading it. If not, I’ll move to the next one. Who cares if I missed a book that could theoretically have scored 0.5 higher on my personal enjoyment scale on that particular day in my life? I am still enjoying myself, right?

                                                  The same applies to RSS feeds. Like you, I don’t enjoy fire hoses, so Lobsters, Hackernews, Reddit, etc. are not part of my list. Same goes for newspapers and other high-volume sites. Except for the ones that are actually able to curate their feed and send only a limited amount of articles per day. Whenever I read something interesting, I’ll check whether the site has RSS and I add it. Whenever I notice that I skip a lot of articles from a specific feed I’ll remove it. Over time this means that I have grown a list of almost 200 feeds that I can easily process during a relaxing morning coffee. And if there is time left, I’ll go over to the big sites and amuse myself with watching the hype trains.

                                                  1. 2

                                                    Except for the ones that are actually able to curate their feed and send only a limited amount of articles per day. Whenever I read something interesting, I’ll check whether the site has RSS and I add it.

                                                    Right, I’ve tried that workflow too, but then RSS just becomes another link aggregator that I have to check, one with less breadth than Lobste.rs or HackerNews. My interest in RSS is for it to be my “one stop” for news. A lot of social networks saw this issue and implemented either community scoring (like Reddit, Lobsters, and HN) or algorithmic filtering (like FB and Twitter) for this purpose.

                                                    That’s the motivation behind my current setup. The thing is, if I don’t have a solution for breadth, I find myself trawling news aggregators for the breadth, and then I get distracted reading comment trees and flamewars that take up my time/energy. RSS is part of my strategy to be intentional about where my time and focus goes online, especially as I’m getting older and busier and my backlog of projects I want to get done is getting larger.

                                                1. 8

                                                  “RSS” is like four distinct formats, the majority of which are underspecified garbage. When people say “RSS” do they usually mean “RSS 2.0” or is it still a mishmash like it was in the early aughts?

                                                  1. 9

                                                    At this point it’s mostly RSS 2.0 and Atom 1.0 from what I’ve seen.

                                                    1. 6

                                                      You got me thinking about this, so I did a quick check of my own subscribed feeds. Out of 451:

                                                      • 359 are RSS 2.0
                                                      • 81 are Atom 1.0 (2 were Atom 0.3, which I updated)
                                                      • 11 are RSS 1.x (RDF)

                                                      (It’s interesting that I can almost correlate formats. Atom has Google properties and self-publishers, RSS 1.x are mostly advisories, and RSS 2.0 is the de facto default).

                                                      1. 2

                                                        I was a big proponent of RSS2 for years, but these days it surprises me how popular the least specified option became.

                                                        1. 1

                                                          Is RSS2 XML? It looks as if it is, from a quick skim of some docs. This was the big reason I preferred Atom to RSS back in the day. RSS allowed embedding arbitrary HTML, which meant that you needed an HTML-aware parser to handle it. In particular, I could always embed Atom in XMPP and anything that could parse XMPP could handle it, even if it didn’t know what to do with it, whereas embedding RSS would cause the server to see invalid XML and drop the connection.

                                                          On the other hand, not being XML was one of the reasons that RSS was more popular than Atom: you could embed whatever tag-soup HTML you’d written in an RSS feed and it would be valid. These days, the XML vision of allowing arbitrary document types to be embedded in others with graceful fallback if you didn’t understand them is largely gone. I haven’t used XMPP for years, HTML5 is not XML and so you can’t embed HTML5 in SVG, and even SVG seems to be largely dying in favour of imperative JavaScript and canvas.

                                                          1. 4

                                                            Every RSS variant is XML and requires well-formed XML. In all variants except Atom the only way to embed HTML is to escape it. I worked at a company ingesting every RSS or Atom feed that could be found and I don’t recall running across one that tried to embed tag soup unescaped as though that would work.

                                                            RSS2 has two big problems: it doesn’t specify any way to tell if a description contains escaped HTML or plain text, and in practise people did both; and as an especial sin against XML it does not use a namespace, so you cannot easily embed it in other XML contexts. The latter is why all XMPP feed protocols use Atom, but Atom was created to fix both of these.
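
                                                            A tiny illustration of that first problem, with made-up entries: in RSS2 the reader has to guess what the description is, while Atom states it outright.

                                                                <!-- RSS 2.0: escaped HTML, or literal text with angle brackets? The spec doesn't say. -->
                                                                <item>
                                                                  <description>&lt;p&gt;Hello, &lt;em&gt;world&lt;/em&gt;&lt;/p&gt;</description>
                                                                </item>

                                                                <!-- Atom 1.0: the type attribute removes the guesswork. -->
                                                                <entry>
                                                                  <content type="html">&lt;p&gt;Hello, &lt;em&gt;world&lt;/em&gt;&lt;/p&gt;</content>
                                                                </entry>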

                                                            RSS1 of course uses namespacing properly, being RDF. In practise it had the same problem about not knowing what kind of content people were embedding. I think probably there could have been an RSS1.5 that kept the RDF and put in the features from Atom that were needed, but back then (and sometimes today) even XML nerds didn’t always appreciate RDF enough to care about that.

                                                            1. 2

                                                              I do believe RSS(2.0) is XML but the issue is how it handles embedded content. There were some subtle ambiguities regarding embedded HTML that Dave Winer simply wasn’t interested in addressing.

                                                              For example, in the RSS 0.92[1] spec at Userland, we can read

                                                              Further, 0.92 allows entity-encoded HTML in the <description> of an item, to reflect actual practice by bloggers, who are often proficient HTML coders.

                                                              http://backend.userland.com/rss092

                                                              This is obviously not a future-proof spec.

                                                              Atom did the technically correct thing by insisting on unambiguous standards, damnit, but then they ran into the buzz-saw of “countercultural” bloggers who were on Winer’s side and disliked any restrictions or rewrites as akin to demands from The Man.

                                                              In the end, the Syndication Wars were a technical culture war, like the parallel XHTML vs HTML. Feed readers and parsers quickly learned to deal with feeds heuristically, just like HTML, and the requirements for strict standards compliance were seen as overly onerous in the real world.

                                                              I prided myself in having a standards-compliant Atom feed but it turns out my elements are apparently not up to spec. No-one has complained so far.

                                                              As an aside, Atom has no problem handling Gemini content, even if the intrusion of strict XML into the plaintext paradise of Gemini does feel a bit weird.

                                                              [1] RSS 2.0 is just a light re-skin of RSS 0.92 along with a version bump to indicate it’s “final”.

                                                              1. 2

                                                                That’s a shame. XMPP was a great transport for Atom (reliable delivery, push semantics). I hoped that it would eventually displace HTTP as the aggregator -> client transport at least, and ideally as the originator -> aggregator transport. From what I remember, neither Atom nor RSS on their own provided a good mechanism for handling large feeds: either they expired old entries and so aggregators had to cache everything or they kept old things and ended up with huge files. With XMPP, you could use HTTP to fetch historical entries and XMPP to get tiny updates.

                                                                I suppose you could still transport RSS over XMPP in CDATA elements, but that feels like missing the point somewhat. The nice thing about Atom over XMPP was that your XMPP client could pull out a couple of core elements (e.g. title) for a notification and also push the rest to a dedicated reader application.

                                                                1. 3

                                                                  XMPP is still a great protocol for Atom, and we’re starting to see movement in the client space.

                                                                  1. 1

                                                                    Atom’s ambitions were always bigger than blogging. I keep forgetting XMPP (mostly because I never use it). Wasn’t Atom supposed to be a part of Pub/Sub too? Lots of bright ideas, lots of ambitions about computer-to-computer communications, and yet it all really just settled on… syndication. Before the silos took over, slinging JSON internally.

                                                                    RSS at least kept it simple. Maybe that’s the lesson to take from all this…

                                                                  2. 2

                                                                    Note that Atom also allows embedded escaped HTML; it just requires that you say it is HTML instead of making the parser guess.

                                                                    1. 1

                                                                      Correct, I was unclear about that.

                                                            2. 2

                                                              I’ve used (self-hosted) both tt-rss and freshrss (went back to tt-rss because it works with rssguard), and both will be able to use any RSS or Atom format. Some readers can even automatically get the feed for a YouTube channel (which is now sorta hidden / difficult to find).

                                                              1. 1

                                                                Yeah, as a publisher, RSS and Atom both stink. JSON-Feed is better, but no one uses it. NinJS is great, but even fewer people use it.

                                                                1. 2

                                                                  What makes JSON-Feed better than RSS or Atom? I implement all three for my blog, and I found each to be just a variation on a theme (with Atom being at least better described).

                                                                  1. 1

                                                                    Atom and RSS both have incomplete metadata and to actually describe a story in a reasonable way, you need to use one with namespaced tags from the other. JSON Feed has all the fields of both out of the box and a couple more, like link post support.

                                                                    See the table in http://www.intertwingly.net/wiki/pie/Rss20AndAtom10Compared and notice that neither is a superset, whereas JSON Feed covers both: https://www.jsonfeed.org/mappingrssandatom/
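
                                                                    For the flavor, a minimal (made-up) JSON Feed with one link-style item:

                                                                        {
                                                                          "version": "https://jsonfeed.org/version/1.1",
                                                                          "title": "Example Blog",
                                                                          "home_page_url": "https://example.org/",
                                                                          "feed_url": "https://example.org/feed.json",
                                                                          "items": [
                                                                            {
                                                                              "id": "https://example.org/posts/1",
                                                                              "url": "https://example.org/posts/1",
                                                                              "external_url": "https://example.com/the-thing-being-linked",
                                                                              "title": "A link post",
                                                                              "content_html": "<p>Short commentary on the link.</p>",
                                                                              "date_published": "2021-10-26T12:00:00Z",
                                                                              "date_modified": "2021-10-27T08:30:00Z"
                                                                            }
                                                                          ]
                                                                        }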

                                                                    1. 1

                                                                      I used a pre-existing library from CPAN when I added it to my blog engine, but I suspect the major appeal is that it’d be a lot easier to roll by hand for most devs than anything built on XML. I was hoping it’d take off for that reason, but it pretty much seems to have stalled out.

                                                                    2. 1

                                                                      I’ve been using atom for over a decade and I’ve always been happy with it. You can definitely tell it was standardized by someone for whom it wasn’t their first spec.

                                                                      1. 1

                                                                        It has pub date but not modified date. Seems really obvious that you want both.

                                                                        1. 3

                                                                          This seems incorrect to me.

                                                                          From the link you posted in a sibling comment: https://www.jsonfeed.org/mappingrssandatom/

                                                                          Atom has an array of entry objects. In JSON Feed these are item objects.

                                                                          • Atom’s published and updated dates map to date_published and date_modified in JSON. Both Atom and JSON Feed use the same date format.
                                                                          1. 2

                                                                            Okay, it’s been a while since I implemented this. Looks like Atom does have updated. In general, Atom is better than RSS, for sure. I remember there being some missing tags in Atom, but I can’t recall what it was if it’s not updated time.

                                                                  1. 11

                                                                    Both the linked article and a lot of comments are skating towards where the puck was 10 years ago.

                                                                    First off, the majority of people with “computing devices” are using Linux because they’re using Android.

                                                                    Second, “Linux on the desktop” started as a goal, evolved into a bitter joke, and has since been overtaken by events. A computer with a desktop is simply not the main way people use computing devices nowadays. If you use one, you use it because your work requires it, or you’re a gamer, or you have specialized needs or are a developer “for fun”. A significant number of these people have access to more than one computer[1] and might be perfectly happy to install Linux on one of them, use it for whatever, and never really run into any pain points (or be savvy enough to google for solutions).

                                                                    In other words, Linux fills a niche, but a Linux machine is probably not that user’s only computer. Pain points in Linux are easy to deal with if you just visit your bank (for example) using a phone, or attend meetings like @b1twise mentions.

                                                                    So where does this leave the idealistic idea of the original “Linux on the desktop”? It was, in my opinion, based on the idea that people would get a computer, pay less for it because there was no OS bundled in the price, and get an equivalent or superior computing experience.

                                                                    But computers got more complex, and more locked down, and Apple and Microsoft stopped charging upfront for their OS (probably because Linux was seen as a competitor), and Linux distros ran into problems keeping up with stuff like audio, high-DPI monitors, and Bluetooth. And as a swelling tsunami, phones appeared as the main computing device and only Apple caught that wave. Linux couldn’t keep up.

                                                                    Apart from abstruse ideological justifications, Linux is not a better experience for most users. But it is a great experience for stuff like servers, or pi-holes, or for people who want to do home automation, etc.

                                                                    “Linux on the desktop” is outdated because the desktop is not where people are anymore.

                                                                    [1] Personally I have access to one work computer running Windows, a personal Macbook, one VPS running Ubuntu, a used NUC running Ubuntu as well, and a RPi4 running Raspbian.

                                                                    1. 6

                                                                      According to the 2021 Stack Overflow survey, even developers use Windows or macOS more frequently than Linux for development. If devs, probably the demographic most equipped to thrive with a Linux install, don’t use Linux as a majority, I can only imagine what the usage is like for non-developers.

                                                                      1. 2

                                                                        I’ve always taken the approach of using at home what I get paid to use at work. That has been mostly Linux or a version of Unix of some kind. Knowing that my career would be doing this kind of thing, I was happy to invest time in learning the tools of my trade. This has resulted in an environment I’m very comfortable with and that extends over time as I learn new things. I can boot into Windows or bring it up in a VM, but it just doesn’t work the way I am used to doing things and that ends up being frustrating. I think the closest thing to success for Linux on the desktop is ChromeOS, but the hardware manufacturers are really hurting it with unreasonable prices and specs.

                                                                        I do agree that generally the normal person has a laptop because they use it for work or are in school. Beyond that they can probably be happy with an iPad and/or a cell phone. And most casual software is just designed for phones. The Android tablet experience is still plagued by poor app support. I see it as: a child gets a tablet, gets older and graduates to a phone, gets farther in education and needs a laptop, graduates and probably gets a laptop for work and buys their own phone. If they are inclined they may also have a tablet or a game console. And a smart TV. And I’m fine with that. Over the years, it has significantly lowered the amount of tech support I provide to friends and family.

                                                                        I considered giving someone an old laptop with Linux on it due to its lower specs. For the extended support lifetime, I installed Ubuntu 20.04. I got as far as the app store and realized it wouldn’t be fair to put them through that.

                                                                        I’m quite interested in the reviews of the new MBP models. If thermals and battery life are good, I am willing to deal with macOS until Linux is viable (if ever).

                                                                      1. 4

                                                                        There’s a lot of ideological language there, but I don’t see the actual point, i.e. how winning this suit would benefit users.

                                                                        How does access to the GPL’d source code used in Vizio TVs make it possible to repair the TV? It doesn’t make it any easier to modify the proprietary software in the TV, and it doesn’t provide access to the build system or docs of the specs of the internal hardware.

                                                                        And how likely is a TV to fail because of a flaw in the firmware? Usually it’s a hardware failure, or else network-based services fail because the manufacturer turns off the servers they talk to, neither of which is related to this.

                                                                        The most likely outcome seems to be that Vizio will just avoid copyleft software in the future.

                                                                        1. 21

                                                                          IANAL, but if successful, it would set a precedent allowing for companies violating software licenses to be sued by or on behalf of their users, as opposed to the current situation where only the copyright holders themselves are considered to have standing.

                                                                          This would be a Good Thing.

                                                                          1. 16

                                                                            There are some other good comments about direct benefits to users, but I think it’s worth keeping in mind that these kinds of enforcement actions can have really positive indirect benefits as well. For example, a successful enforcement action against Cisco/Linksys years ago laid the groundwork for the OpenWRT project, an open-source wireless router firmware project that supports a wide range of devices today. OpenWRT, in turn, fueled a bunch of important work on low-cost wireless radio equipment in the years since, and shows up routinely in mesh networking and long-distance WiFi projects that support efforts to expand low-cost access to the Internet today (as, of course, one small piece of a larger, mostly non-technical, puzzle).

                                                                            1. 5

                                                                              Users are entitled to the source code. You shouldn’t have to justify the benefits: they are entitled to it because those are the license terms, and Vizio is not living up to them.

                                                                              If Vizio would rather take on the costs of maintaining another set of software than live up to the terms of the license, that’s on them. Their use of GPLed software doesn’t benefit the community if they don’t live up to the license, so there’s no loss if they decide to go that route.

                                                                              1. 5

                                                                                How does access to the GPL’d source code used in Vizio TVs make it possible to repair the TV?

                                                                                The article says so:

                                                                                Copyleft licensing was designed as an ideological alternative to the classic corporate software model because it: allows people who receive the software to fix their devices, improve them and control them; entitles people to curtail surveillance and ads; and helps people continue to use their devices for a much longer time (instead of being forced to purchase new ones).

                                                                                “run this same nice software, but without ads and data grabbing” is already a very nice proposition for many customers I would say. And having a way to keep the TV (and more importantly, its apps) functioning properly is important as well if you don’t intend to buy a new TV every 5 or so years or however soon the manufacturer decides to stop providing software updates.

                                                                                The most likely outcome seems to be that Vizio will just avoid copyleft software in the future.

                                                                                I agree that’s probably the net effect of all these GPL law suits, and the GPL in general. If a company doesn’t have good intentions, copyleft vs non-copyleft isn’t going to make much of a difference in the end.

                                                                                1. 2

                                                                                  The article answered “why” — I’m asking how technically. What is necessary to allow someone to rebuild a TV’s firmware? It seems likely it would require Vizio to make public some of their proprietary code, which I bet they wouldn’t do. They’d just pay damages instead (assuming that’s an option; IANAL.)

                                                                                  “run this same nice software, but without ads and data grabbing” is already a very nice proposition for many customers I would say

                                                                                  Again, ain’t gonna happen. There was a news story a few months ago about how Vizio is making more money from ads and data grabbing than from hardware sales. Making their TVs hackable would imperil their biggest revenue source.

                                                                                  1. 2

                                                                                    It seems likely it would require Vizio to make public some of their proprietary code, which I bet they wouldn’t do.

                                                                                    This was spoken to in a previous post: https://sfconservancy.org/blog/2021/jul/23/tivoization-and-the-gpl-right-to-install/

                                                                                    1. 2

                                                                                      The article answered “why” — I’m asking how technically. What is necessary to allow someone to rebuild a TV’s firmware?

                                                                                      Ah, I misunderstood. Well, that’s a good question. Typically though, there are always tinkerers willing to take apart the TV and figure out how to access the flash memory that stores the firmware. But you’re right, Vizio is not likely to tell you how to do it.

                                                                                  2. 2

                                                                                    Most smart TVs I’ve ever worked with were rendered useless by unmaintained apps that no longer work, especially the browser and YouTube apps. With access to replace the firmware, we could put Kodi, Firefox, Chromium, or whatever is needed on the TV and make it usable again.

                                                                                    The most likely outcome seems to be that Vizio will just avoid copyleft software in the future.

                                                                                    I hope so.

                                                                                    1. 7

                                                                                      My LG smart TV, purchased recently (within the last 2 years), does not have support for Let’s Encrypt’s new root certificate, so the situation is much worse than imagined.

                                                                                      1. 1

                                                                                        not have support for Let’s Encrypt’s new root certificate

                                                                                        Oh gosh. Does that mean the TV just can’t open an HTTPS connection to any site using a Let’s Encrypt derived cert anymore?

                                                                                        1. 1

                                                                                          Yeah, I get a whole bunch of SSL handshake failures in my server-side logs. It’s extremely infuriating!
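
                                                                                          For anyone who wants to see the failure mode from the client side, here is a rough Python sketch (illustrative only; “example.org” is a placeholder for a host serving a Let’s Encrypt certificate). A client whose trust store is missing the new ISRG Root X1 fails in roughly this way, and the server logs it as a failed handshake:

                                                                                          ```python
                                                                                          # Hypothetical illustration: attempt a TLS handshake the way any client does
                                                                                          # and report the failure. A device whose trust store lacks Let's Encrypt's
                                                                                          # ISRG Root X1 fails verification here. "example.org" is a placeholder host.
                                                                                          import socket
                                                                                          import ssl

                                                                                          host = "example.org"
                                                                                          ctx = ssl.create_default_context()  # uses whatever roots the local store has

                                                                                          try:
                                                                                              with socket.create_connection((host, 443), timeout=10) as sock:
                                                                                                  with ctx.wrap_socket(sock, server_hostname=host) as tls:
                                                                                                      print("handshake ok:", tls.version())
                                                                                          except ssl.SSLError as exc:
                                                                                              # Without the new root this surfaces as a certificate-verify /
                                                                                              # handshake failure, which shows up in server-side logs.
                                                                                              print("handshake failed:", exc)
                                                                                          ```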

                                                                                    2. 1

                                                                                      How does access to the GPL’d source code used in Vizio TVs make it possible to repair the TV? It doesn’t make it any easier to modify the proprietary software in the TV, and it doesn’t provide access to the build system or docs of the specs of the internal hardware.

                                                                                      If the code is GPLv3 (the article doesn’t say), they would have to provide instructions for installing modified versions of the software.

                                                                                      If it’s an earlier GPL version, it would still let consumers know what the software is doing, which could be relevant to privacy concerns or developing external tools to interface with the TV.

                                                                                      1. 2

                                                                                        If the code is GPLv3 (the article doesn’t say), they would have to provide instructions for installing modified versions of the software.

                                                                                        This is also true for GPLv2

                                                                                        1. 2

                                                                                          No, it’s not - see Tivoization, a problem which GPLv3 was explicitly designed to address.

                                                                                          Perhaps you’re thinking of GPLv2’s provisions that (at least IIRC) require distributing any build systems, etc. needed to build the software? Just because you can build it doesn’t mean you can install it on the actual device.

                                                                                          1. 2

                                                                                            https://sfconservancy.org/blog/2021/jul/23/tivoization-and-the-gpl-right-to-install/

                                                                                            Tivoization unfortunately is widely misunderstood. It’s understandable: I’ve never seen a TiVo, but I have seen a locked Android bootloader, and the way many people talk about them, the two sound the same on the surface.

                                                                                            What TiVo did was use technical measures to ensure that if you did install your own versions of the freedomware components, their nonfree components would stop working. They did not, it turns out, wholesale block installation of modified freedomware components. This is not a violation of GPLv2 (or, arguably, GPLv3).

                                                                                            What many manufacturers do now is block installation entirely. It’s not that the nonfree components will stop working but that the device will reject the installation attempt (or brick itself in some cases). This is a violation of both GPLv2 and GPLv3.

                                                                                    1. 2

                                                                                      I haven’t used Fibery, but I do use TiddlyWiki and Notion quite a bit, and I find this space super interesting. I know TBL is working on Solid these days, but I wonder if there’s any interest in bringing these richer hypertext ideas to the current WWW, or whether the failure of Xanadu and the greater emphasis on the app web has made this a non-starter these days.

                                                                                      1. 4

                                                                                        What are the disadvantages of Wikidata here? I’ve made SPARQL queries against Wikidata and have gotten similar behavior to what the Mathematica example was showing. Wikidata is certainly still incomplete compared to Wikipedia, but I think it’s fairly feasible to programmatically bridge the two.

                                                                                        1. 6

                                                                                          That’s a good question, one I answer every time I talk about this project/set of projects (so maybe the next blog post will be dedicated to it).

                                                                                          First of all, I am not rejecting Wikidata (the “reality” project used it, too, alongside Wikipedia and OpenStreetMap). But currently, Wikipedia has two huge advantages: a) it simply has more content (including structured content), and b) the path to that content is more discoverable for the “casual” developer or data user.

                                                                                          On “more content”: look, for example, at Everest’s entries in Wikipedia and Wikidata. The former has many interesting structured and semi-structured tables and lists not (yet?) represented in the latter, like “Selected climbing records” or even “Climate” (a neat, regular table present in many geographic entities), in addition to unstructured text data which can still be fruitfully regex-mined.

                                                                                          Or look at another item linked from the Wikidata one, the list of 20th-century summiters of Mount Everest… and the corresponding Wikipedia article.

                                                                                          On “discoverability”: if we look at, say, Bjork on Wikipedia and Wikidata, the latter does have her albums associated, but not her filmography (the movies Bjork starred in). If you start from a movie you can see they are linked via the “cast member” predicate, so “all movies where Bjork is a cast member” can be fetched with SPARQL, but you need to investigate and guess that the data is there; on Wikipedia, most people-from-movies just have a “Filmography” section in their articles.
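
                                                                                          As a concrete illustration of the SPARQL route, here is a rough sketch of that query run against the public Wikidata endpoint from Python (illustrative only; it assumes P161 is the “cast member” property and that the English label “Björk” is specific enough, both worth verifying):

                                                                                          ```python
                                                                                          # Rough sketch: "all movies where Bjork is a cast member" via the public
                                                                                          # Wikidata SPARQL endpoint. Assumptions to verify: P161 is the "cast member"
                                                                                          # property, and the English label "Björk" finds the intended item.
                                                                                          import requests

                                                                                          QUERY = """
                                                                                          SELECT ?film ?filmLabel WHERE {
                                                                                            ?person rdfs:label "Björk"@en .   # crude lookup by English label
                                                                                            ?film wdt:P161 ?person .          # P161 = cast member (assumed)
                                                                                            SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
                                                                                          }
                                                                                          """

                                                                                          resp = requests.get(
                                                                                              "https://query.wikidata.org/sparql",
                                                                                              params={"query": QUERY, "format": "json"},
                                                                                              headers={"User-Agent": "wikidata-discoverability-example/0.1"},
                                                                                              timeout=30,
                                                                                          )
                                                                                          resp.raise_for_status()
                                                                                          for row in resp.json()["results"]["bindings"]:
                                                                                              print(row["filmLabel"]["value"])
                                                                                          ```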

                                                                                          That being said, playing with Wikidata to make it more accessible is definitely in the scope of my studies :)

                                                                                          1. 4

                                                                                            If you can extract the data from Wikipedia programmatically, then it sounds as if it would be quite easy to programmatically insert it into Wikidata. I’d love to see tooling for this improved. I hope that eventually Wikipedia will become a Wikidata front end, so any structured data added there is automatically reflected back.
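
                                                                                            For the “extract it programmatically” half, a rough sketch of pulling a page’s raw wikitext over the standard MediaWiki API (illustrative only; the page title is just an example, and turning the wikitext into Wikidata statements is the actual hard part):

                                                                                            ```python
                                                                                            # Minimal sketch: fetch the raw wikitext of a Wikipedia article via the
                                                                                            # MediaWiki API. The infoboxes and tables discussed above live in this
                                                                                            # wikitext; mapping them onto Wikidata statements is left to real tooling.
                                                                                            import requests

                                                                                            resp = requests.get(
                                                                                                "https://en.wikipedia.org/w/api.php",
                                                                                                params={
                                                                                                    "action": "parse",
                                                                                                    "page": "Mount Everest",
                                                                                                    "prop": "wikitext",
                                                                                                    "format": "json",
                                                                                                },
                                                                                                headers={"User-Agent": "wikitext-fetch-example/0.1"},
                                                                                                timeout=30,
                                                                                            )
                                                                                            resp.raise_for_status()
                                                                                            wikitext = resp.json()["parse"]["wikitext"]["*"]
                                                                                            print(wikitext[:500])  # first few hundred characters, e.g. the infobox start
                                                                                            ```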

                                                                                            1. 3

                                                                                              Yeah, just posted the same here: https://lobste.rs/s/mbd6le/why_wikipedia_matters_how_make_sense_it#c_kxgtem (down to the front/back wording :))

                                                                                          2. 2

                                                                                            Maybe we can kill two birds with one stone here. What is the chance that a project like this could actually be used to assist Wikidata contributors? I don’t think it would be great to attempt to just programmatically fill out Wikidata like this, but a nice tool where you can produce Wikidata content for a specific Wikipedia entry and then look it over and manually check it would save a lot of time for contributors.

                                                                                            I also feel like the Wikidata approach is slightly flawed. Wikipedia collects its data by leveraging the combined knowledge of all humans (ideally) in a very accessible way. If we have a common data system that tries to collect the same data, but in a much less accessible way and only from trained data professionals, we have lost a huge amount of utility, especially given that the less accessible approach means far fewer eyeballs looking for errors.

                                                                                            1. 2

                                                                                              Maybe we can kill two birds with one stone here. What is the chance that a project like this could actually be used to assist wikidata contributors?

                                                                                              Bingo! One of my hopes is indeed that the new tool (once it matures a bit) can be used fruitfully both on Wikipedia (to analyze article structure) and on Wikidata (to propose missing data on the basis of Wikipedia).

                                                                                              I also feel like the wikidata approach is slightly flawed. Wikipedia collects its data by leveraging the combined knowledge of all humans (ideally) in a very accessible way. If we have a common data system that tries to collect the same data, but in a much less accessible way and only from trained data professionals we have lost a huge amount of utility. Especially given that the less accessible approach means way fewer eyeballs looking for errors.

                                                                                              That’s totally true. My (distant) hope is that Wikipedia and Wikidata might one day become more tightly coupled, with Wikipedia as the “front-end” (friendlier for both the reader and the contributor) and Wikidata as the “back-end” (ensuring the formal structure of the important parts). But there is a long road ahead, even if this direction is what Wikimedia wants.

                                                                                              1. 1

                                                                                                This was exactly where I was going with my reply in fact. Cheers.

                                                                                            1. 13

                                                                                              I have a theory that the popularity of vision and language is due to data. There are endless amounts of unlabeled data, and labeled data can be crowdsourced cheaply.

                                                                                              People like Horvath are hailed as genius-level polymaths in molecular biology for calling 4 scikit-learn functions on a tiny dataset.

                                                                                              Looking at https://en.wikipedia.org/wiki/Epigenetic_clock, I read:

                                                                                              Horvath spent over 4 years collecting publicly available Illumina DNA methylation data… (snip) The age estimator was developed using 8,000 samples from 82 Illumina DNA methylation array datasets.

                                                                                              It is true both that 8,000 samples is tiny and that it took 4 years to collect them. The majority of machine learning effort is data collection, not data modeling. Data collection is easier with vision and language, even though data modeling is higher impact elsewhere.
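
                                                                                              For a sense of proportion, the modeling side of an epigenetic-clock-style analysis really is only a few library calls; here is a toy sketch on synthetic data (my illustration only; Horvath’s published clock used penalized regression on real methylation arrays, and assembling those arrays was the multi-year effort):

                                                                                              ```python
                                                                                              # Toy sketch with synthetic data: predict "age" from methylation-like
                                                                                              # features using a penalized linear model, roughly the shape of an
                                                                                              # epigenetic-clock fit. The modeling is a few calls; the data is the work.
                                                                                              import numpy as np
                                                                                              from sklearn.linear_model import ElasticNetCV
                                                                                              from sklearn.model_selection import train_test_split

                                                                                              rng = np.random.default_rng(0)
                                                                                              n_samples, n_sites = 600, 1000                    # stand-ins for samples x CpG sites
                                                                                              X = rng.uniform(0.0, 1.0, size=(n_samples, n_sites))   # fake beta values
                                                                                              true_w = np.zeros(n_sites)
                                                                                              true_w[:50] = rng.normal(size=50)                 # only a few sites matter
                                                                                              y = X @ true_w * 10 + 40 + rng.normal(scale=2.0, size=n_samples)  # fake ages

                                                                                              X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
                                                                                              model = ElasticNetCV(cv=5).fit(X_tr, y_tr)
                                                                                              print("R^2 on held-out samples:", round(model.score(X_te, y_te), 3))
                                                                                              ```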

                                                                                              1. 3

                                                                                                I have a theory that popularity of vision and language is due to data.

                                                                                                Well to add to this, CV and NLP are old fields. They’ve been collecting and cleaning datasets for decades now. Applying newer NN-based ML techniques was as easy as taking an academic dataset and training the nets. Other fields don’t have nearly the same history of pedagogy and so they’re probably going to take a lot longer to collect data (usefully) for.

                                                                                                1. 3

                                                                                                  Bio is also special… Consider a genome: most of your genome is the same as everyone else’s, so even though it’s a lot of data, the degrees of freedom are considerably lower. Also, even if we were to sequence every American and European, that’s only 750M data points (most of which self-correlate)… Wikipedia alone has 3.9 billion words.

                                                                                                  1. 2

                                                                                                    This would be true for just DNA and genetics. If you include epigenetic information and start going into gene expression in different cell types/tissues, there’s probably a lot more variation, but I don’t think we’ve quantified it yet.

                                                                                                  2. 2

                                                                                                    I agree with this; for hard problems you can’t just use Mechanical Turk to label commonly available data.

                                                                                                    1. 1

                                                                                                      I would attribute the popularity of language and vision more to problems that can be modeled well on GPUs with neural nets, as well as to the massive amounts of labelled data. ImageNet existed well before the computer vision breakthrough by Hinton’s team; it was applying GPUs for processing, plus neural networks, that did the trick.

                                                                                                    1. 8

                                                                                                      My own take on this has been to avoid hype-driven areas like ML in favor of seemingly-underrated areas like control theory.

                                                                                                      1. 2

                                                                                                        I’m not sure how this follows from the article. The core piece of wisdom seems to be:

                                                                                                        For me, two ingredients for figuring out what to spend time learning are having a relative aptitude for something (relative to other things I might do, not relative to other people) and also having a good environment in which to learn.

                                                                                                        If anything, that take sounds closer to the generic internet advice that should be closely scrutinized (not to mention the large similarities between ML and control theory, and the frequent overlap in things such as PIDs).
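
                                                                                                        For what it’s worth, the PID mentioned above is small enough to sketch in a few lines (a generic textbook-style illustration, not anything from the article; the gains and the toy plant are arbitrary placeholders):

                                                                                                        ```python
                                                                                                        # Generic discrete PID controller sketch: drive a simple first-order
                                                                                                        # system toward a setpoint. Gains (kp, ki, kd) are arbitrary values.
                                                                                                        class PID:
                                                                                                            def __init__(self, kp, ki, kd, dt):
                                                                                                                self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
                                                                                                                self.integral = 0.0
                                                                                                                self.prev_error = 0.0

                                                                                                            def update(self, setpoint, measurement):
                                                                                                                error = setpoint - measurement
                                                                                                                self.integral += error * self.dt
                                                                                                                derivative = (error - self.prev_error) / self.dt
                                                                                                                self.prev_error = error
                                                                                                                return self.kp * error + self.ki * self.integral + self.kd * derivative

                                                                                                        pid = PID(kp=1.2, ki=0.5, kd=0.05, dt=0.1)
                                                                                                        temp = 20.0                      # toy plant state (e.g. a temperature)
                                                                                                        for _ in range(100):
                                                                                                            u = pid.update(setpoint=50.0, measurement=temp)
                                                                                                            temp += 0.1 * (u - 0.05 * (temp - 20.0))   # crude first-order response
                                                                                                        print("final value:", round(temp, 2))
                                                                                                        ```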

                                                                                                        1. 1

                                                                                                          Everyone knows there’s overlap between RL and optimal control theory. (Ditto with linear feedforward control and ARIMA regression.) It’s really a question of approach, not techniques.