Threads for knl

  1. 11

    you might think that Java was in fact in a period of great decline

    I wonder what drives that perception 🤔

    This article is a little weird to me. I don’t know a single person excited to be working in Java because it means Java 7 or Java 8. They are excited for the general JVM ecosystem, usually in Kotlin or Scala.

    I think you’ll see that Java is going to remain in the top three of loved languages

    Weird outro when it was 53% dreaded in the last StackOverflow survey.

    1.  

      I actually know a couple of people that are pretty excited about working in Java 11 (with prospects of moving to Java 17), who moved away from Scala to Java. Their take is that Java has a fairly rich ecosystem and well understood tooling and upgrade paths, while Scala doesn’t seem to have a good direction.

      1. 6

        Their take is that Java has a fairly rich ecosystem and well understood tooling and upgrade paths

        You kind of echo my point here: the JVM ecosystem is rich, but Java the language isn’t the desirable part, in my experience.

      2.  

        I don’t know, maybe excited is simply the wrong word. I inherited 2 Spring codebases last year and brought them from Java 8 to Java 11, and soon to Java 17 - it’s been fun, but then again working with Spring was kinda new to me and I can see how this could get stale again at some point. I’ve never been excited for the JVM ecosystem per se (but I do like Clojure, so don’t read this as “it’s all just boring tech”).

        But I guess this piece was trying to give a specific vibe which does not fit. Enterprise languages (as in: not only used by hip startups) are usually not chosen because the developers love them but because they make sense.

      1. 4

        At $WORK we use Nix, and nix-shell in particular, to provision reproducible environments for developers and for CI/CD pipelines. This allows anyone to just do a git clone and then make run to see the project running, and we have the assurance that this will run the same in a CI pipeline.

        We also use Nix for building container images, so that we get full reproducibility and granular control over our software supply chain.

        We don’t use Nix as our build system, though, for that we use Make, which I feel takes the core experience of shell scripting and adds declarative rules on top.

        Reading about Bazel, it would be well suited to replace Make. Maybe there is a way for Nix to grow in this area while streamlining its features and commands?

        1. 3

          So you use nix to ensure all dependencies are present but then let your project build “naturally”? If so, that sounds like an appealing approach to me. I’ve been put off using Nix because it seemed like I’d end up deep down a rabbit hole reproducing build tooling that was already working.

          1. 3

            I had something similar in the previous two companies, and it was quite a pleasant experience. There were some rough edges when upgrading Haskell dependencies, but in general it worked quite smoothly.

            One of the companies open-sourced the system we used back in the day: https://github.com/digital-asset/dev-env. This is a lazy-loading dev environment, where each tool is fetched on first use. Quite handy for monorepos, where one half uses Scala exclusively and the other half uses Haskell - neither has to wait for the tools from the other side to be downloaded.

            1. 1

              I use nixos for most of my current and new infrastructure, but I still maintain a bunch of “legacy” Proxmox machines and Debian VMs via an older ansible setup.

              The thing includes stuff like custom plugins for secret handling, cmdb interaction and so on, and uses ansible together with mitogen. A relatively simple flake.nix uses nixpkgs, poetry2nix and a devshell with a shellhook to set up the correct python environment, the password manager and a bunch of env vars, so that a simple nix develop starts a shell with everything needed to deploy, update and maintain that ansible setup across machines (works on nixos and debian, and worked on macOS, but we don’t use that anymore).

          1. 1

            Great presentation on nix flakes (I’ve never used them, but I admit they look really nice from a reproducibility perspective).

            Though I have to point out his initial comparison to Docker is a bit disingenuous IMHO; he basically picks some of the worst things you can do in a Dockerfile for the initial example, including:

            1. FROM ubuntu:latest; it’s best practice to use the most specific tagged image possible, at bare minimum ubuntu:xenial, but even better would be ubuntu:xenial-20210804. Expanding on that it’s also generally better to pull your toolset’s image rather than a generic one if possible, e.g. at work for our Golang projects we use golang:1.xxx where 1.xxx is the version of Go we need. That way it doesn’t change underneath us between builds.

            2. apt-get update && apt-get upgrade; yes, this changes every time. So if you’re really worried about that you should have a “base” Dockerfile image to build, tag that and push it to your registry, and then have your app’s image use that in its FROM line so that you don’t have to worry about that layer ever changing.

            3. CMD ["hello"]; ok, I’ll concede this one since the path can change but usually it’s a better idea to put the entire path to the binary as ENTRYPOINT/CMD so you don’t have to worry about $PATH being wrong (and the risk of this gets lower if you use an intermediate image after the install-ey lines too since that controls the change more).

            tl;dr: just like there are probably bad flake.nix files out there, this is a REALLY bad Dockerfile example. It might not be quite as reproducible as a flake.nix, but you can make OCI containers more reproducible than the example the speaker initially presented.

            I do want to say, as an outsider, that having to learn an entire language leaves a weird taste in my mouth compared to just describing the state of a container via a Dockerfile, but that might be because I’ve been happily using them for too long. I really need to do more research and fiddle more with nix…

            1. 2

              It might be a bad example of a Dockerfile, but sadly I’m seeing ones like it on a daily basis. That’s why I’m hopeful that Nix prevents one from shooting themselves in the foot.

              1. 1

                Agreed, if teams don’t have container experts (or expensive consultants) things get out of hand quickly. I wish I could tag my post as a rant because I’m not really “mad” about it.

                User error ultimately is the problem and not the tool!

            1. 1

              This article contains some pretty great visualizations and advice! I just learned about Observable Plot and it seems like a good complement to ggplot2/plotnine and matplotlib.

              But to get to the point of good visualizations, one usually needs to do some data processing and get the data into the right shape. So far, I’ve been using pandas (as we use python), but that experience is usually quite frustrating. Does anyone have good recommendations for processing data in a comfortable way?
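
              To give a concrete idea of the kind of reshaping I mean, something like this (made-up columns; pandas only because that’s what we already use):

                import pandas as pd

                # Made-up example: wide per-day measurements -> long/tidy form for plotting.
                wide = pd.DataFrame({
                    "day": ["mon", "tue"],
                    "api_ms": [120, 95],
                    "db_ms": [40, 55],
                })

                tidy = (
                    wide
                    .melt(id_vars="day", var_name="component", value_name="latency_ms")
                    .groupby("component", as_index=False)["latency_ms"]
                    .mean()
                )
                print(tidy)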

              1. 30

                aka “It’s time to make programming easier by changing reality”

                I feel like, in this case, we could also make programming easier by changing programming. The root cause of this isn’t leap seconds per se, but the fact that the de-facto-standard computer timekeeping system doesn’t understand them, and we hacked it up in such a way that they completely break everything.

                If UNIX time counted actual, rather than idealized, seconds, most things would become easier. (That is, for each tick of a naïve clock, the current UTC second is labelled with the numerically next integer). Converting the current time without current leap second data would be wrong. But clocks don’t need to care about this, only frontend systems do, and in 2022 those run lots of things that need to be updated more frequently than every six months.
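
                To sketch what I mean (toy code; the table is abbreviated and only illustrative, and the 23:59:60 label itself isn’t handled):

                  import time

                  # If the clock counted actual elapsed seconds, only the "frontend"
                  # formatting step would need leap-second data.
                  # Illustrative entries: (actual-seconds threshold, leap seconds inserted so far).
                  LEAPS = [
                      (78796801, 1),   # after the 1972-06-30 leap second
                      (94694402, 2),   # after the 1972-12-31 leap second
                      # ... one entry per announced leap second ...
                  ]

                  def utc_label(actual_secs):
                      inserted = 0
                      for threshold, total in LEAPS:
                          if actual_secs >= threshold:
                              inserted = total
                      return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(actual_secs - inserted))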

                1. 8

                  The article doesn’t bring it up, but the problems with leap seconds can’t be fixed by better programming alone. Since leap seconds are only announced about 6 months in advance, the number of seconds between now and a UTC timestamp more than 6 months into the future is unknown. Therefore, if Unix time counted actual seconds as you suggest, it would be impossible to convert UTC to Unix time for such future timestamps. That would mean that calendars and other applications that need to represent future timestamps couldn’t use Unix time.

                  As I see it, the root cause of this problem is that civil time, when used to express plans more than 6 months into the future, is undefined. Better programming can’t fix that.

                  1. 18

                    The date and time of an event in several years can’t be defined in terms of seconds from now, but you can easily define it in terms of date, time and timezone.

                    1. 1

                      You cannot easily define the time and date of an event in the future in terms of date, time, and timezone! This has nothing to do with UNIX timestamps being stored in seconds.

                      If you care about the elapsed time, you cannot count the amount of actual time that will pass from now to some date even a year from now. Not with precision at the level of seconds or less. X amount of time into the future doesn’t map to a fixed date, time, and timezone because we’re redefining time constantly with these leap seconds.

                      FB is right, kill the leap second.

                      1. 9

                        This goes beyond leap seconds. With a fixed date, time, and timezone, the timezone can change, and does with some regularity.

                        Unless we kill political control of timezones, this will still need to be taken into consideration.

                        1. 1

                          To some extent that’s true, but not generally.

                          The definition of UTC-5, modulo leap seconds, doesn’t change. In that sense removing leap seconds does allow you to compute future times just fine. If I have a device that I need to check 5 years from now, I know exactly what time that will be UTC-5, modulo leap seconds.

                          Now if you mean, timezone in the sense of EST/EDT, then plenty of time zones have not changed in well over a century and it’s hard to see them ever changing. Perhaps ET may change by fixing it to EST or EDT, but generally, as countries become more developed they stop making these changes because of the disruption to the economy. Check out https://data.iana.org/time-zones/tzdb/NEWS

                          So yes, political control of timezones is actually being killed as the economic consequences of changing them becomes severe. Things are slowly freezing into place, aside from leap seconds.

                            1. 4

                              Basically, “18:30 on 2038-01-19 in the host system timezone” is the only more or less well-defined concept of a future date that is useful in practice. When that time comes, a system that is up to date with all natural and political changes can correctly detect that it came.

                              Applications that deal with arbitrary time intervals in the future like “2.34 * 10^9 seconds from now” should use a monotonic scale like TAI anyway, they don’t need UTC.
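
                              For the civil-time case the pattern is just to store the description and re-resolve it when needed; a minimal sketch (Python’s zoneinfo only as an example, the zone name is arbitrary):

                                from datetime import datetime
                                from zoneinfo import ZoneInfo  # stdlib since Python 3.9

                                # Keep the *civil* description of the future moment, not a second count.
                                event = {"local": "2038-01-19 18:30", "zone": "Europe/Zurich"}

                                def has_arrived(event):
                                    # Resolve against whatever tz rules are current when asked, so later
                                    # natural and political changes are picked up automatically.
                                    wall = datetime.strptime(event["local"], "%Y-%m-%d %H:%M")
                                    target = wall.replace(tzinfo=ZoneInfo(event["zone"]))
                                    return datetime.now(ZoneInfo(event["zone"])) >= target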

                              1. 2

                                Eh, scheduling meetings a year or two in advance can happen, and it could be well defined and useful. But it’s important to note that the further into the future something is happening, the less the accuracy matters, unless it’s astronomy at which point you have TAI or UT1 depending on context.

                                1. 1

                                  Except that there is no safe way to compute TAI on most systems.

                                  1. 1

                                    A GPS receiver costs $1000 at most. If you need precise accuracy, it’s what you’re going to use, and it’s just GPS_time + 19 s to get to TAI. Big companies run their own NTP pools for reliability, and if you have your own pool, you can run it at TAI.

                                2. 2

                                  I’ve seen that! It’s what I meant about ET changing its definition. It’s far from done sadly :( The house seems to have abandoned the bill https://thehill.com/homenews/house/3571007-permanent-daylight-saving-time-hits-brick-wall-in-house/

                                  In any case, the problem is with redefining time zones not dropping them.

                            2. 3

                              Can you elaborate on this? I’m really curious why that is. I was under the impression that if we say a meeting will happen on August 1st, 2050, at 3:30pm CEST, in Bern, Switzerland, not many things can make this ambiguous. If Switzerland stops using CEST, I’ll probably just switch to the replacement timezone. The reason I’m confused is that I don’t see how leap seconds play any role.

                              1. 4

                                It is ambiguous because extra seconds of time may be inserted between now and then. So no one can tell you how long from now that time is (in seconds).

                              2. 2

                                In what situations do you need to know the exact number of seconds to a future (civil) time more than a year in the future?

                            3. 14

                              As I see it, the root cause of this problem is that civil time, when used to express plans more than 6 months into the future, is undefined.

                              Civil time is not “undefined”. Definitions of local civil time for various locations may change, but that’s not the same thing at all as “undefined”.

                              I also don’t generally agree with “better programming can’t fix” – the issue simply is programmers demanding that a messy human construct stop being messy and instead become perfectly neat and regular, since we can’t possibly cope with the complexity otherwise. You slip into this yourself: you assume that the only useful, perhaps the only possible, representation of a future date/time is an integer number of seconds that can be added to the present Unix timestamp. The tyranny of the Unix timestamp is the problem here, and trying to re-orient all human behavior to make Unix timestamps easier to use for this purpose is always going to be a losing proposition.

                              1. 7

                                As I see it, the root cause of this problem is that civil time, when used to express plans more than 6 months into the future, is undefined. Better programming can’t fix that.

                                This is true to an extent, but I think it’s true independently of leap seconds. The timezone, and even the calendar, that will apply to dates in the future are also undefined.

                                I also think it’s not the whole story. It seems intuitively reasonable to me that “the moment an exact amount of real time from this other moment” is a different type from “the moment a past/future clock read/reads this time”, and that knowledge from the past or future is required to convert between the two. I think we’ve been taking a huge shortcut by using one representation for these two things, and that we’d probably be better off, regardless of the leap second debate, being clear which one we mean in any given instance.
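
                                Something as simple as two distinct types would already force the distinction (a throwaway sketch, names made up):

                                  from dataclasses import dataclass

                                  @dataclass(frozen=True)
                                  class Elapsed:
                                      """An exact amount of real time measured from a known instant."""
                                      from_unix: int
                                      seconds: float

                                  @dataclass(frozen=True)
                                  class ClockReading:
                                      """The moment a past/future civil clock reads this time."""
                                      local: str   # e.g. "2050-08-01 15:30"
                                      zone: str    # e.g. "Europe/Zurich"

                                  # Converting between the two needs leap-second / tz knowledge that may
                                  # only be available once the moment is near.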

                            1. 4

                              This is a pretty well thought out visualization, I was looking for something like this! Seems that the original idea comes from Hadley Wickham: https://r4ds.had.co.nz/relational-data.html — he seems to have a knack for producing easy-to-understand tools and visualizations.

                              1. 8

                                The only problem with lots of custom aliases (or custom keybindings in other programs like editors), is that the muscle memory burns you every time you have to work on a remote machine. I used to go custom-to-the-max with my config, but I’ve gradually shifted back to fewer and fewer aliases except for the most prevalent build/version control commands I run dozens of times each day.

                                1. 9

                                  When I need to remote into machines where I can’t set up my shell profile for whatever reason, I just config ssh to run my preferred shell setup commands (aliases, etc) as I’m connecting.

                                  My tools work for me, I don’t work for my tools.

                                  1. 5

                                    You mean, for a single session only? Care to share that lifehack? I’m assuming something in ssh_config?

                                    1. 2

                                      Yeah, single session only. There are a bunch of different ways to skin this cat — LocalCommand and RemoteCommand along with RequestTTY in ssh_config can help.

                                      Conceptually you want to do something like (syntax probably wrong, I’m on my phone)

                                      scp .mypreferedremoterc me@remote:.tmprc; ssh -t me@remote "bash --rcfile ~/.tmprc -l; rm .tmprc"

                                      which you could parameterize with a shell function or set up via LocalCommand and RemoteCommand above, or skip the temp file entirely with clever use of an env variable to slurp the rc file in and feed it into the remote bash (with a heredoc or SendEnv/SetEnv)

                                  2. 2

                                    Every time I have to work on a remote machine I do the commands through ssh or write a script to do it for me.

                                    1. 2

                                      Naming a meta-archive-extractor “atool” doesn’t help either. OP used unzip for this, but that name is overloaded; uncompress is also taken.

                                      What word would you guys use for aliasing it?

                                      1. 3

                                        I use extract as a function that just calls the right whatever based on the filename.
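
                                        (Not my exact function, but the idea is tiny; e.g. in Python the stdlib already does the file-name dispatch:)

                                          #!/usr/bin/env python3
                                          """extract: unpack an archive, picking the format from the file name."""
                                          import shutil
                                          import sys

                                          def extract(path, dest="."):
                                              # shutil.unpack_archive dispatches on the extension (.zip, .tar.gz, ...)
                                              shutil.unpack_archive(path, dest)

                                          if __name__ == "__main__":
                                              extract(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else ".")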

                                        1. 2

                                          I think prezto comes with an x alias, and I like it a lot. It burns easily into the muscle memory.

                                        2. 2

                                          To defeat muscle memory when changing tools, I make sure the muscle memory command fails:

                                          alias unzip="echo 'use atool'"

                                          It doesn’t take many times to break the muscle memory. Then I remove the alias.

                                          1. 1

                                            Is atool there by default on Linux boxes?

                                            1. 1

                                              Nope. At least I’m not aware of any Linux distro installing it by default.

                                              But being installed by default is IMHO totally overrated. The main point is that it is available in many Linux distributions’ repos without having to add 3rd party repos—at least in Debian and all derivatives like Devuan, Kali or Ubuntu.

                                              1. 2

                                                I understand, but it’s not the same. If I’m not regularly in a shell there, and don’t have my own dotfiles, I likely want to avoid installing and removing system packages on other people’s systems. When stuff breaks, I want the minimum amount of blame :)

                                                Not that this is not a useful tool.

                                                1. 1

                                                  Ok, granted. Working as a system administrator it’s usually me who has to fix things anyway. And it happens only very, very seldom that something breaks just because you install a commandline tool. (Saying this with about 25 years of Linux system administration experience.)

                                                  Only zutils can theoretically have an impact, as it renames commandline system tools and replaces them with wrappers. But so far in the past decade, I’ve never seen any system break due to zutils. (I only saw things not working properly because it was not installed. But that was mostly because I’m so used to it that I take it as given that zutils is installed. :-)

                                                  1. 2

                                                    Yep, different role. I did some freelance work long ago, and learned from (fortunately) my predecessor’s mistake: they hired me to do some work, because I guess someone before me updated some stuff, and that broke… probably the PHP version? Anyway, their shop didn’t work any more and they were bleeding money till I fixed it. It was one of my early freelance jobs, so that confirmed the age-old BOFH mantra of if it ain’t broke, don’t fix it. So given time, I would always explicitly ask permission to do this or that or install the other, if needed.

                                                    But I went a different route anyway, so even though I am still better than average, I think, I’m neither good nor professional. But I think old habits die hard, so that’s why I’m saying “if this stuff isn’t there by default, you’ll just have to learn your tar switches” :)

                                          2. 2

                                            muscle memory burns you every time you have to work on a remote machine

                                            Note that this doesn’t apply for eshell as the OP is using: If you cd to a remote machine in eshell, your aliases are still available.

                                            1. 1

                                              Command history and completion suggestions have really helped me avoid new aliases.

                                            1. 1

                                              Thanks for this post, it was quite informative. Earlier this week I was thinking how we don’t see many articles about build and CI systems, and to my delight I was proved wrong :)

                                              1. 4

                                                A response to https://lobste.rs/s/w21yxt/congratulations_we_now_have_opinions_on from James Bennett of Django fame, which I found quite reasonable and covering most of the points raised in the linked discussion. I liked the part about responsibility that we take when we release something.

                                                1. 1

                                                  Alternative title: Web dev rediscovers cache coherence

                                                  1. 12

                                                    An alternative way to approach this article is as learning material for all the other “web devs”.

                                                    It’s a well told story with good insights into how one can improve performance by looking at it holistically.

                                                    1. 2

                                                      Putting aside the fact that the author is not a web dev (per grand-poster), the reasons why I posted the article are exactly the ones you phrased so well.

                                                  1. 1

                                                    I haven’t personally done benchmarks, but I thought I read that using rsync over SSH was a lot slower than using the rsync protocol, since the latter has no encryption overhead. Did you see differently?

                                                    1. 3

                                                      Yeah, I can definitely believe that SSH can become the bottleneck, or pose a significant overhead, in many setups. But, in my tests, using unencrypted rsync daemon mode was even slower for some reason!

                                                      I ran some tests with my network storage PC, downloading a 50GB zero file from my workstation (both connected via a 10 Gbit/s link):

                                                      • curl -v -o /dev/null reaches ≈1000 MB/s — maximum achievable on this 10 Gbit/s link
                                                      • ssh midna.lan cat > /dev/null reaches ≈368 MB/s — SSH overhead
                                                      • rsync (writing to tmpfs) via SSH reaches ≈321 MB/s
                                                      • rsync (writing to tmpfs) unencrypted reaches ≈337 MB/s

                                                      But, once you write to disk, throughput drops even further:

                                                      • scp (writing to disk) reaches ≈280 MB/s — SSH+disk overhead
                                                      • rsync (writing to disk) via SSH reaches ≈213 MB/s — rsync overhead
                                                      • rsync (writing to disk) unencrypted reaches ≈199 MB/s (!) not sure why this is slower
                                                      1. 1

                                                        As you have a really fast network, I’m wondering if compression is to blame? IIRC, in both programs compression is disabled by default, but distributions might change it.

                                                    1. 4

                                                        rope.go in the repository talks about persistence, but it sounds more like immutability — a new, modified instance is returned for every change. Is the use of the word persistence correct here? I thought it implied being written to disk or some other storage.

                                                      1. 3

                                                        https://en.wikipedia.org/wiki/Persistent_data_structure

                                                        In this case, “persistence” means “the old versions stick around after modification as long as you need them to.”

                                                        Among other reasons, it’s a nice property because it makes undo/redo trivial.

                                                        1. 2

                                                          Yeah unfortunately it seems the term is overloaded, meaning either immutable data structures or persisted-on-disk (or equivalent) data structures, depending on the author.

                                                        1. 21

                                                          For such a use case I’d really recommend tqdm, which does basically all of that (and more) automatically.

                                                          https://tqdm.github.io/

                                                          1. 4

                                                            A progress library with custom merch. This is where we’re going?

                                                            1. 3

                                                                I recommend this lib in general too! Main problem is just having the foresight to have it installed in whatever env I’m working with. Of course tqdm is Python-specific, but you can replicate stuff like that in 20 lines or so if you want to.
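
                                                                Roughly this kind of thing (nowhere near tqdm’s features, just the ~20-line stand-in I mean):

                                                                  import sys
                                                                  import time

                                                                  def progress(iterable, total=None, min_interval=0.5):
                                                                      """Tiny tqdm stand-in: print count, percent and rate to stderr."""
                                                                      total = len(iterable) if total is None else total  # pass total= for generators
                                                                      start = last = time.monotonic()
                                                                      for i, item in enumerate(iterable, 1):
                                                                          yield item
                                                                          now = time.monotonic()
                                                                          if now - last >= min_interval or i == total:
                                                                              rate = i / (now - start) if now > start else 0.0
                                                                              sys.stderr.write(f"\r{i}/{total} ({100 * i / total:.0f}%) {rate:,.0f} it/s")
                                                                              sys.stderr.flush()
                                                                              last = now
                                                                      sys.stderr.write("\n")

                                                                  # usage: for row in progress(rows): ...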

                                                              1. 1

                                                                Yes, I think especially because it’s a nice-to-have dependency it’s often not included in a dependency list or environment explicitly. It also “feels” a bit heavy-weight, although it hardly adds a performance penalty to the execution itself.

                                                                  While tqdm is python-specific, if you go via the command line, then you can use tqdm as an executable, e.g. from the website:

                                                                seq 9999999 | tqdm --bytes | wc -l
                                                                

                                                                  Otherwise the stuff you outline in the post is more generally applicable.

                                                              2. 2

                                                                  There is also the language-agnostic pv, but it’s limited in what it can output.

                                                              1. 2

                                                                    After reading the landing page I’m still unsure what this project is about. What would definitely help me are some examples on the page, because the given example has hardly any annotations.

                                                                1. 3

                                                                  From what I can tell from the docs and source code, the idea seems to be a language independent CLI parsing library. Essentially, you write a nix script that identifies all your sub commands, flags, and positional parameters. It then generates a Bash script which handles all the command line parsing and validation and runs the correct executable.

                                                                      Some things, like allowing a CLI to default back to a certain environment variable, are a nice touch that isn’t present in every CLI parsing library. I can also imagine that it would be about the only sane option if you have multiple utilities written in different languages that you want to expose behind a single subcommand interface.

                                                                  Again, though, this is based on a five minute skim of the code.

                                                                  1. 2

                                                                    Exactly

                                                                        BTW I was doing some module work to allow polyglot codegen. So far it only generates shell script, but implementing more languages should be trivial: your own code could be readFile’d into the prelude, and then it generates your main function for C, for example. I might move some target options somewhere else.

                                                                1. 4

                                                                          I posted this because I’m still trying to wrap my mind around the concept of DPUs and the implications they bring. If someone has any experience and good use cases to share, I’d love to hear.

                                                                  1. 8

                                                                    There’s no huge difference between a box full of DPUs and a blade system. They are both physical formats for increasing the density of servers. The blade system will have better support for replacing units without taking down all the neighbors, and should have well-designed cooling and power from the beginning. The DPU box will have a faster backend connection between units – PCIe4 or, soon, 5.

                                                                    The OCP/OpenRack system is a less extreme but more scalable system. If you’re not familiar with it, it’s a datacenter-scale system where compatible products fit into a specific rack format where 12V DC power is centrally provided in the rack, and there are standards for networking distribution. Essentially, each rack is a chassis into which you fit compute/RAM nodes, storage nodes, and networking nodes. None of them need power supplies of their own.

                                                                    1. 4

                                                                            Unless I’m missing something, it’s just a computer on your SmartNIC. The main reason things like this are attractive in the datacenter is that clock cycles and RAM on them can be a lot cheaper than on the host. If you run a cloud service on conventional hardware, you need to reserve some amount of host RAM and some amount of CPU (possibly one or more cores) to run the host. This includes any device emulation / paravirtualisation and your control plane. With a modern SmartNIC, you can typically offload a lot of the device emulation / paravirtualisation by having the device support SR-IOV (S-IOV / SF-IOV coming soon) and expose an MMIO space to each guest that either looks like a real device or the PV device. This can even include things like translating SCSI or ATA commands into iSCSI or similar to talk to your back-end storage system. The control plane still needs to run on the host though. If you put a cheapish Arm core on the device, it can have some very cheap (slow) RAM and do things like DMA page-table updates for the second-level address translation, send interrupts to the guests for startup / shutdown, and even DMA initial boot images into host memory. This combination lets you sell 100% of the RAM (minus a tiny bit for page tables, which are often tiny if you’re able to use 1 GiB superpages) to your customers.

                                                                            A lot of SmartNICs have a general-purpose core now. They generally have some combination of slightly programmable ASICs for things like line-rate packet filtering, FPGAs for slower / less power-efficient filtering and transforming, and CPU cores for slower control-plane things. Converting ATA commands into iSCSI, for example, is much easier to do on a general-purpose core. The command messages are generally tiny in comparison to the data, so you don’t need the performance of an ASIC, and having a general-purpose core means that you can update the translation to add new features (e.g. move from iSCSI to some other protocol on the back end or support a different emulated PV interface) trivially by just deploying a firmware update.

                                                                      A few people (including NVIDIA and AWS) are trying to make security claims from this. Aside from some side-channel resistance, I’m not convinced that running security-critical software on a less-reviewed platform is actually a big win for security.

                                                                    1. 1

                                                                              It was also interesting to see the reaction from open source developers to unsolicited pull requests from what looks like a bot (a bot really did find the problem and make the PR, but a human developer at Code Review Doctor did triage the issue before the PR was raised). After creating 69 pull requests the reaction ranged from:

                                                                      I wonder if you’d get better reactions if a human made the PR and didn’t say it came from a bot.

                                                                      1. 1

                                                                                That’s exactly the quote that prompted me to share the article. I think there was recently a case in the linux kernel community where some university group was submitting (arguably bad) patches that were generated by a tool – it didn’t go well, if I recall correctly. Maybe initial reactions would be better, but long term, if the project finds out, it would lead to loss of trust.

                                                                        1. 1

                                                                          It was the University of Minnesota and they got their entire university banned from submitting anything to the Linux kernel.

                                                                                  The biggest argument against stuff like this I saw was that the heads of the groups being tested had not agreed to participate in the study.

                                                                        2. 1

                                                                          A flipside of that is that you might expect better analysis from a human if it had been filed under a human’s name. These are clearly mostly auto-generated bug reports, and a number of false positives were filed, despite the triaging (from just spot-checking: 1, 2). So filing them under a bot’s name is maybe more honest to manage expectations.

                                                                        1. 1

                                                                          A stated reason to avoid KSUID was that it uses base-62, so comparison would fail depending on the sorting preference. As base-62 is just the representation, could KSUID still be used with a different encoding, for example base-32?
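
                                                                                    To illustrate what I mean: the sortable thing is the 20-byte payload itself, so any order-preserving text encoding should do. A rough sketch (hex only as a stand-in; payload layout and epoch as I understand them from the KSUID readme):

                                                                                      import os
                                                                                      import struct
                                                                                      import time

                                                                                      KSUID_EPOCH = 1_400_000_000  # KSUID's custom epoch, if I remember the readme right

                                                                                      def ksuid_payload():
                                                                                          # 4-byte big-endian timestamp + 16 random bytes = the sortable 20 bytes.
                                                                                          ts = int(time.time()) - KSUID_EPOCH
                                                                                          return struct.pack(">I", ts) + os.urandom(16)

                                                                                      ids = [ksuid_payload() for _ in range(5)]
                                                                                      # Hex is order-preserving and single-case, so string order == byte order.
                                                                                      assert sorted(k.hex() for k in ids) == [k.hex() for k in sorted(ids)]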

                                                                          1. 1

                                                                                      Absolutely, nothing would prevent that; we could have added our own encoding methods for handling this quirk.

                                                                                      One reason for using an existing scheme as the basis of our IDs was to get a battle-tested system, which reduces the risk of us having bugs in that layer. I would have been concerned that our encoding and decoding might have included a weird edge case (although fuzzing would have picked this up).

                                                                                      The main reason we did not use it in the end was that we wanted in-process ordering. A lot of things can happen in the same second, and if we ever had a race-condition type bug, being able to extract the exact order in which objects were created would be incredibly useful.

                                                                          1. 2

                                                                            Fairly new to go, so I’m puzzled by this:

                                                                            type Ordered interface {
                                                                                Integer|Float|~string
                                                                            }
                                                                            

                                                                            ~string is described as

                                                                            For type constraints we usually don’t care about a specific type, such as string; we are interested in all string types. That is what the ~ token is for. The expression ~string means the set of all types whose underlying type is string. This includes the type string itself as well as all types declared with definitions such as type MyString string.

                                                                                        Shouldn’t the ~ token also be used in front of Integer and Float?

                                                                            1. 4

                                                                              Your instinct is right, it’s just that Integer and Float are names of other interfaces and not the basic core types. They are defined as you expect in the constraints package: https://pkg.go.dev/golang.org/x/exp/constraints#Float, e.g.:

                                                                              type Float interface {
                                                                                  ~float32 | ~float64
                                                                              }
                                                                              

                                                                              and similarly for Integer.

                                                                            1. 2

                                                                              What this article doesn’t mention is my biggest frustration with R: lack of language support for 64-bit integers. Given that I frequently work with timestamps with nanosecond precision, it’s frustrating. Even with bit64, things are not so easy.

                                                                              1. 2

                                                                                (copy of HN comment, full thread)

                                                                                The popularity of the Tidyverse is a major blow to your motivation to learn R. Why would anyone want to learn a language that is treated as secondary to some packages? Worse still, if that turns out to be the best way to use R, then you’re forced to admit that R is a polished turd with a fragmented community.

                                                                                As others have mentioned, just use tidyverse. I picked it up 4 years ago, and last week I went back to the code I wrote then.

                                                                                I was productive in minutes. I could read the code, modify it, and easily test it in the REPL. The docs for dplyr are good.

                                                                                ggplot2 is still awesome and the docs are good there too. ggplot2 is the fastest way to figure out what you want and make a pretty plot.

                                                                                (However one thing that still annoys me is that R moves faster than Debian. So it’s possible to do install.packages() in R, and it will break telling you your Debian R interpreter is too old. There is no easy solution for this, just a bunch of workarounds)


                                                                                OK, sure you can call it a polished turd, and to some degree that’s true. But a polished turd is better than just using … a turd!

                                                                                The error messages in R are not quite as good as Python, but I wouldn’t call it a problem. I’m able to localize the source of an error, even when using tidyverse.

                                                                                My article comparing tidyverse to some other solutions:

                                                                                What Is a Data Frame? (In Python, R, and SQL) http://www.oilshell.org/blog/2018/11/30.html


                                                                                But would I recommend learning it to anyone else? Absolutely not. We can do so much better.

                                                                                            I would recommend it, with the caveat that it’s one of the hardest languages I’ve had to learn. However, that is partly because it changes how you think. But if you have a certain type of problem then you have to change how you think, or you’ll never get it done. Data analysis is surprisingly laborious, even for people who have, say, written compilers and such.

                                                                                1. 2

                                                                                  But would I recommend learning it to anyone else? Absolutely not. We can do so much better.

                                                                                              I’m wondering what the author had in mind when he said we could do better? I’m not aware of any other ecosystem with such ergonomic and performant libraries.

                                                                                  1. 1

                                                                                    Yeah the problem is that Julia and Python are both imitating R and playing “catch up”. Unfortunately that takes 10 or more years. Meanwhile R has a lot of very smart people moving the ecosystem forward.

                                                                                    I started using R before tidyverse – around 2010, and tidyverse was 2016 or so. And maybe in 2016 you could make an argument that Python was catching up.

                                                                                    But then tidyverse was released and documented with books, and now R is light years ahead again.

                                                                                                So yeah, in my mind there’s basically nothing equivalent, so if you want to solve certain problems, you just have to suck it up.

                                                                                                Sort of like shell. Shell has a lot of warts, but at the end of the day it’s easier just to learn it than to flail around with inferior alternatives.

                                                                                    (As a side note, tidyverse generally performs well, but with R in general you do have to be wary of performance. So often I use Python or shell as “pre-filtering steps” to cut down what you deal with in R. I generally deal only with clean TSV files in R; dealing with arbitrary text can be very slow)