1. 2

    If one is inclined to look for bugs as old as this one or older, I suggest looking into indent(1), which has some potential, I think, since it was started in 1977 and is still used. I’m not aware of other programs with a similar history, but I’m almost sure they exist.

    1. 3

      SELECT ... FOR UPDATE SKIP LOCKED was introduced in Postgres 9.5 to solve (some of) these issues. With it, dequeueing a set of jobs looks like this. No need to hang on to a long-running transaction either, as long as you have some sort of process to requeue dropped jobs.
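
      Roughly like this (a sketch only; the jobs table and its id/status/payload columns are illustrative):

      -- Atomically claim up to ten pending jobs; rows other workers already
      -- hold locks on are skipped rather than waited for.
      UPDATE jobs
      SET status = 'running'
      WHERE id IN (
          SELECT id FROM jobs
          WHERE status = 'pending'
          ORDER BY id
          LIMIT 10
          FOR UPDATE SKIP LOCKED
      )
      RETURNING id, payload;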

      1. 1

        The article was published in 2015.

        1. 1

          Oh, I wasn’t aware it got fixed in Postgres 9.5. However, I still like the article, as it goes into depth on various intricacies of Postgres and gives good insight into how things work. Good, albeit outdated, learning material.

          1. 1

            It doesn’t however solve the “dead tuples problem” mentioned in the article. Correct me if I’m wrong.

            1. 2

              That’s right. That’s an issue for any table with high churn in updates and deletes, and it’s something you’d work around by tuning autovacuum (to run frequent, fast vacuums – the goal being predictable performance rather than very high throughput) and by setting things like the fillfactor for the jobs table specifically.
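
              For example, a sketch of that kind of per-table tuning (the jobs table name and the numbers are purely illustrative):

              -- Keep pages partly empty so updates can stay on the same page (HOT),
              -- and trigger autovacuum after a fixed number of dead rows instead of
              -- the default 20% of the table.
              ALTER TABLE jobs SET (
                  fillfactor = 50,
                  autovacuum_vacuum_scale_factor = 0.0,
                  autovacuum_vacuum_threshold = 1000
              );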

              Another option I’ve heard about but never tried is having two jobs tables and periodically switching between them and truncating the other. Presumably, that’d involve turning off autovacuum for the two tables since you’re effectively performing a full vacuum “in software” at every switch. Personally, I think that if you get to the point where you need to do something like that, it’s probably time to move on to a real message queue (like RabbitMQ).

          1. 4

            I like unlogged tables, but you need to remember that they are not replicated to standby servers :(

            1. 3

              Another even shadier trick is to have temporary tables on unjournaled filesystems or just tmpfs.

              If they really are just for intermediate computations, there is no end-to-end loss in reliability: on the rare reboot you start fresh and the computation can be restarted from scratch.

            1. 2

              Here’s what didn’t work or was not available:

              Sublime Text

              Does linux-sublime3 not work?

              vmware; vagrant

              Bhyve

              FreeBSD performs better than Linux

              At least for desktop use, my experience was the opposite.

              Docker/kubernetes

              Though note that, if you can use only freebsd, jails are a cleaner option.

              Virtualbox

              Why would you use virtualbox when you could use qemu?

              1. 2

                Bhyve

                … and Virtualbox :)

                At least for desktop use, my experience was the opposite.

                Depends what your needs are. Linux still falls short in several places. For example:

                • FreeBSD/GhostBSD has smaller memory usage: https://youtu.be/fes9o5iI0J8
                • Linux does not support ZFS Boot Environments (Ubuntu) out of the box anywhere
                • FreeBSD/GhostBSD is a lot SIMPLER than any Linux

                … but if you compare the out-of-the-box experience for a ‘typical desktop’ then yes, Linux is the winner here.

                Why would you use virtualbox when you could use qemu?

                Because QEMU will be dead slow. On Linux, QEMU uses the KVM backend for hardware-based virtualization. On FreeBSD, QEMU is just emulation. Fast virtualization on FreeBSD is either Bhyve or Virtualbox.

                1. 1

                  Why would you use virtualbox when you could use qemu?

                  Because QEMU will be dead slow. On Linux, QEMU uses the KVM backend for hardware-based virtualization. On FreeBSD, QEMU is just emulation. Fast virtualization on FreeBSD is either Bhyve or Virtualbox.

                  TFA suggested using vbox as a host for freebsd, presumably on top of linux. IME, vbox is alright, but rather slow and complicated compared with ‘native’ qemu, bhyve, hyperv, etc. (Have heard good things about vmware, but haven’t had a chance to try it.) (I’ve also been planning to set up xen for home use, but that might be a bit much for someone who just wants to try an OS.)


                  FreeBSD/GhostBSD has smaller memory usage

                  This is incredibly dependent on the specifics of init, userland, services, etc. I wouldn’t take this as indicative of anything. Memory usage of the ‘base’ system (without graphical DE, apps, …) is likely to be negligible anyway, and graphical environments are not OS-specific. Beyond that, freebsd uses jemalloc by default, which does tend to perform better than ptmalloc, but it’s possible to override the system allocator with jemalloc on linux.

                  Linux does not support ZFS Boot Environments (Ubuntu) out of the box anywhere

                  FreeBSD/GhostBSD is a lot SIMPLER than any Linux

                  To be sure! I really like freebsd, and currently boot both linux and freebsd. But I was specifically talking about performance.

                  Wrt your first point, I don’t really see the point of root/boot-on-zfs. My homedir is zfs, as is all my important data, but root is just ufs or xfs. Those tend to be faster (and suck up less memory for cache), and I don’t really care about integrity for system packages that can trivially be re-downloaded.

                  1. 2

                    I don’t really see the point of root/boot-on-zfs. My homedir is zfs, as is all my important data, but root is just ufs or xfs. Those tend to be faster (and suck up less memory for cache), and I don’t really care about integrity for system packages that can trivially be re-downloaded.

                    I also thought like that in the past (ZFS for data and something simple/fast for the system), but once I realized the freedom and safety ZFS Boot Environments give, I always set up root on ZFS now.

                    These two things are possible with both root and /boot on ZFS with ZFS Boot Environments on FreeBSD/Illumos:

                    1. Complete system rollback and snapshot.

                    Upgrade went wrong? No problem, just reboot into the Boot Environment created before the upgrade. Like it never happened.

                    Upgrade went fine, but some package has a very nasty bug and the package repository does not have the older package to go back to? Same thing: just reboot into the BE you created before the upgrade.

                    2. Very easy migration of system between systems.

                    Need to move a ‘system’ with its packages and configuration to another server? Just send that ZFS BE over SSH to the other server, change the hostname/IP and reboot.

                    Need to provision a new laptop? Just send that BE to the other laptop and reboot.

                    ZFS Boot Environments are so powerful and great that I do not want to live without them anymore.

                    1. 2

                      ZFS also has checksumming.

                    2. 1

                      Wrt your first point, I don’t really see the point of root/boot-on-zfs.

                      A few things:

                      • If everything is on ZFS, it’s easier to administer than if only some things are on ZFS
                      • UFS requires a fixed-size partition. What is the split between your home directory and other data? With ZFS, it’s entirely dynamic.
                      • ZFS can do compression and there’s a lot of redundancy in system files. ZFS is actually very performant for read-mostly workloads, even with compression enabled.
                      • Boot environments. If you do a major-version upgrade and find a bug in a driver that makes your system unbootable, ZFS makes it trivial to roll back (and this is integrated in the FreeBSD bootloader)
                      • Snapshots before package updates.
                      • Snapshots before you go and futz with configuration. If you’re disciplined and put all of /etc and /usr/local/etc in some revision control system, you don’t need this. If you’re lazy, being able to do a snapshot before you modify any config files and then discard it once you know you’re doing it right is nice.
                      • Backups. The system files are easy to replace, but your configs, the specific set of packages you have installed, and so on are a bit harder. ZFS makes backing up entire filesystems trivial.
                      • Block-level checksums. Really bad things can happen as a result of corruption in programs that run as root (or, worse, the kernel).
                1. 2

                  Recently I was wondering whether it’d be feasible to compile a custom matching routine and plug it into grep with LD_PRELOAD, so I found this repository interesting and gave it a star :)

                  1. 1

                    Now I’m wondering if it would be possible to build a modular tool that will spit out the minimum source code needed for a given command, so that it could be compiled separately and possibly provide a speed benefit.

                    1. 2

                      Do you mean something like re2c or libfsm?

                      1. 1

                        Hadn’t heard of them before. I glanced through them and feel like they could be used to create what I’m thinking of, but they’re not quite it yet.

                        Instead of translating the script, like the sed-to-C being discussed in this thread, the grep or sed implementation itself could provide the source code. For example, something like grep --give-source 'foo.*bar' gives source code that can be compiled and then run against the input to get lines matching the given regexp. I don’t have an idea of how complicated these tools are, so this may be too difficult to implement.

                        1. 2

                          Slightly unrelated but reminds me a bit of a bunch of tools that can output their bash completion code, npm completion for instance.

                  1. 1

                    I think the idea of using base-several-tens encodings for binary data in URLs is more useful for sets than for identifiers. I’ve even played with the idea of doing it at the PostgreSQL level, like this:

                    SELECT translate(en, '+/', '-_') -- safe for URLs
                    FROM to_hex('0100001000100000010000011000010000000000011011010000000000000000'::bit(64)::bigint) t, --set of boolean options
                        octet_length(t) o, -- count the number of digits
                        lpad(t, o + (o & 1), '0') l, -- decode() requires an even number of digits
                        decode(l, 'hex') dec, -- convert (hex-encoded) string to binary data
                        set_byte(dec, 0, 14) s1, -- set lowest 8 bits to integer 14
                        encode(s1, 'base64') en; -- convert bytes to (base64-encoded) string
                    
                    1. 2

                      That’s all cool, but my biggest concern with statically linked binaries is: how does ASLR even work? What mechanism can a static binary use to make sure the libc it shoved into itself isn’t predictably located?

                      1. 5

                        Look into static PIE. gcc has had support for a few years now, and musl even before that (musl-cross-make patched gcc before support was upstreamed in version 8).

                        1. 2

                          Does ASLR work?

                        1. 7

                          EDIT: subthread TL;DR for future readers: the overall query plan has a wildly inaccurate cost estimation, based on some dubious row count estimation on both sides of a hash join. The massive cost estimate trips the thresholds for JIT compilation, even though the real cost of the query is negligible. The 13 second JIT compilation time is a red herring: PostgreSQL measures wall time rather than CPU time, so the times were artificially inflated by the extreme system load of numerous competing compilations. Run in isolation, the query compiled in ~100ms.

                          A few things jump out at me, that I’ll address one at a time:

                          • The initial cost estimate is quite small: cost=0.86..12.89
                          • JIT was slow for every query, not just one.
                            • The query has an inlined constant: Index Cond: (user_id = 7002298).
                          • Initial code generation ran shockingly long: Timing: Generation 108.775 ms.
                            • Consider that code generation is fairly similar to writing a C file into memory with sprintf.

                          The initial cost estimate was basically 13. Why on earth was this query JIT optimized? The PostgreSQL 12 query planning docs for JIT cost parameters specify that queries won’t be JIT optimized at all until their cost reaches 100,000 by default. Inlining and optimization passes default to 500,000. Yet all 3 occurred.
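
                          For reference, these are the knobs involved, with their PostgreSQL 12 defaults (a sketch of how you’d inspect or override them per session):

                          SHOW jit_above_cost;           -- 100000: below this, no JIT at all
                          SHOW jit_inline_above_cost;    -- 500000: below this, no inlining
                          SHOW jit_optimize_above_cost;  -- 500000: below this, no expensive optimization passes
                          SET jit = off;                 -- or disable JIT for the session entirely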


                          Why was the JIT slow for every query? Surely PostgreSQL caches query plans and JIT code? If so, could the inlined constant interfere with plan caching in this case? I actually don’t know, so I looked it up and found the PostgreSQL 12 docs for the plan_cache_mode planner option. Here’s the relevant snippet:

                          Custom plans are made afresh for each execution using its specific set of parameter values, while generic plans do not rely on the parameter values and can be re-used across executions. Thus, use of a generic plan saves planning time, but if the ideal plan depends strongly on the parameter values then a generic plan may be inefficient.

                          From this I surmise it’s entirely possible PostgreSQL did not reuse a generic plan in this case. @piranha specified prepared statements did not help in another Lobste.rs comment. It might be worth trying this query again using force_generic_plan, just to see what happens.
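
                          A sketch of that experiment, with a stand-in prepared statement since I don’t have the real query:

                          SET plan_cache_mode = force_generic_plan;  -- new in PostgreSQL 12

                          -- Stand-in statement; with the setting above, every EXECUTE reuses
                          -- the generic (parameter-independent) plan.
                          PREPARE lookup (text) AS
                              SELECT rolname FROM pg_roles WHERE rolname = $1;

                          EXPLAIN (ANALYZE) EXECUTE lookup('postgres');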


                          Why did code generation take so long? This query seems extremely simple–both from what we see in the explain, and what @piranha tells us the query does.

                          Going out on a limb here, I guess that the long code generation time indicates a lot of code was generated. Then the inlining phase took 1s and the emission phase took a whopping 5s! I further guess that whatever took so long to inline exploded the code size even more, leading to the massive emission time.

                          But what could cause this? The PostgreSQL 12 JIT accelerated optimizations docs specify:

                          Currently PostgreSQL’s JIT implementation has support for accelerating expression evaluation and tuple deforming.

                          Not much. Either the query contains some expression with lots of underlying complexity, or the table schema is quite complex to decode. For both cases I suspect something like JSON field extraction. If the JIT pulls in a full JSON decoder for this query, that would definitely increase the code size and compile time.

                          But I’m really reaching on the JSON guess, with hardly a shred of evidence.


                          That’s what comes to mind so far, but there’s really no way to tell without seeing explain (analyze, verbose, buffers), the table schema, the full query, and all of the query planner settings used in production.

                          1. 3

                            Quickly replying to the last part: I’ll get to my laptop in an hour or so and update the post with the isolated query, schemas and full explain. I’m not sure if we changed any query planner settings at all…


                            OTOH, I’m a bit wary of leaking too much information, but looking at the query plan it’s understandable that PG just thinks too many rows will be returned from this join. So JIT kicks in because its default cost configuration permits that. :) No problem here.

                            Why it takes so much time is indeed a more interesting question!

                            1. 2

                              For the part of the plan you posted, PostgreSQL correctly estimates the result will have 1 row. Does the redacted part of the plan have a higher cost= field?

                              1. 3

                                https://paste.in.ua/4215/ - that’s the whole plan, but without verbose, buffers - will check on that later on. :)

                                EDIT: argh, something has changed since April, and when I enable JIT right now for this query the results are like this:

                                 Planning Time: 2.240 ms
                                 JIT:
                                   Functions: 101
                                   Options: Inlining false, Optimization false, Expressions true, Deforming true
                                   Timing: Generation 13.719 ms, Inlining 0.000 ms, Optimization 3.280 ms, Emission 83.755 ms, Total 100.753 ms
                                 Execution Time: 102.812 ms
                                

                                EDIT2: I think that’s because of load. When I enable JIT for a single query it takes 105 ms, but when it’s enabled for everything, this being one of the most common queries just destroys the CPUs. So that’s where the weird 13 seconds are coming from. Updated my post.

                                1. 2

                                  I thought about that possibility, and I hoped PostgreSQL was recording CPU time rather than wall time for JIT compilation. Sadly you are correct. PostgreSQL ExplainPrintJIT clearly uses instr_time, which in turn clearly uses clock_gettime.

                                  This time around your query only compiled 101 functions, not 138, so there may be a difference there. But under tremendous load 100ms could become 13s–I’ve seen worse.

                                  I think that closes the book on the 13s compilation time. Now you just need to convince PostgreSQL to reuse the query plan, or estimate properly. Looks like these are your two biggest problems estimation-wise:

                                  ->  Seq Scan on role role_1  (cost=0.00..2.49 rows=98 width=64) (actual time=0.013..0.013 rows=1 loops=1)
                                  

                                  Based on the table name I assume you could slap a unique index on role and get a correct rows=1 estimate. Honestly that looks like an oversight, and should probably be fixed if so. In theory there’s nothing wrong with a sequential scan on a small table, but it can confuse the query planner.

                                  ->  Bitmap Heap Scan on supplier sup  (cost=9.23..145.41 rows=62 width=85) (actual time=0.070..0.070 rows=0 loops=1)
                                        Recheck Cond: (emails @> ARRAY[(u.email)::text])
                                        ->  Bitmap Index Scan on supplier_emails  (cost=0.00..9.22 rows=62 width=0) (actual time=0.057..0.057 rows=0 loops=1)
                                              Index Cond: (emails @> ARRAY[(u.email)::text])
                                  

                                  For this one, I’d guess supplier_emails is also relatively unique. I’m not sure how PostgreSQL analyzes array values, so if you filter on emails often, converting the array column to a join table may help out the optimizer.
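
                                  A rough sketch of that conversion (I’m assuming supplier has an id primary key; the names are illustrative):

                                  -- One row per (supplier, email) instead of a text[] column, so the
                                  -- planner gets ordinary per-column statistics and a plain btree index.
                                  CREATE TABLE supplier_email (
                                      supplier_id integer NOT NULL REFERENCES supplier (id),
                                      email       text    NOT NULL,
                                      PRIMARY KEY (supplier_id, email)
                                  );
                                  CREATE INDEX supplier_email_email_idx ON supplier_email (email);

                                  INSERT INTO supplier_email (supplier_id, email)
                                  SELECT id, unnest(emails) FROM supplier;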

                                  1. 2

                                    The part about role looks like this:

                                    LEFT JOIN
                                      (SELECT ru.user_id,
                                              unnest(role.permissions) AS permissions
                                       FROM ROLE
                                       INNER JOIN roles_users ru ON ru.role_id = role.id) role ON role.user_id = u.id
                                    

                                    roles_users is:

                                    # \d roles_users
                                                 Table "public.roles_users"
                                     Column  |  Type   | Collation | Nullable | Default 
                                    ---------+---------+-----------+----------+---------
                                     user_id | integer |           | not null | 
                                     role_id | integer |           | not null | 
                                    Indexes:
                                        "roles_users_pkey" PRIMARY KEY, btree (user_id, role_id)
                                        "roles_users_role_idx" btree (role_id)
                                    

                                    role is:

                                    # \d role
                                                                          Table "public.role"
                                       Column    |          Type          | Collation | Nullable |             Default              
                                    -------------+------------------------+-----------+----------+----------------------------------
                                     id          | integer                |           | not null | nextval('role_id_seq'::regclass)
                                     name        | character varying(80)  |           |          | 
                                     description | character varying(255) |           |          | 
                                     permissions | text[]                 |           |          | 
                                    Indexes:
                                        "role_pkey" PRIMARY KEY, btree (id)
                                        "role_name_key" UNIQUE, btree (name)
                                    

                                    Not sure what I can make unique here? Am I missing something?

                                    As for supplier_emails… It’s a new addition, we slapped it on to do something quickly and yeah, it’s being converted to a join table, because it’s used more and more.

                                    1. 2

                                      Ah. PostgreSQL thinks that the average user has 3 roles and the average role has 10 permissions. Because of that, it’s using a hash join for role inner join roles_users, instead of a nested loop lateral join on index seek. You have all the unique indexes you need, the query planner just doesn’t care.

                                      I’ve seen this plenty of times with PostgreSQL. You could try running ANALYZE on roles_users and role to update the statistics, but it may never get better. Autovacuum should analyze tables automatically, but those tables may not have changed enough to trip the automatic analyze, as they are presumably relatively small. Worth a shot.
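
                                      Concretely, the “worth a shot” part is just this (raising the per-column statistics target first is optional):

                                      ANALYZE roles_users;
                                      ANALYZE role;

                                      -- Optionally collect finer-grained statistics for the join column first:
                                      ALTER TABLE roles_users ALTER COLUMN role_id SET STATISTICS 1000;
                                      ANALYZE roles_users;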

                                      Luckily this doesn’t actually create a real problem for you, as the query plan works alright as long as the JIT is disabled. It’s much more frustrating when this kind of thing creates an outrageously slow query plan and you have to separate your query into subqueries at the application layer, just to avoid PostgreSQL picking a wildly inappropriate join strategy.

                                      I’ll look at this again in the morning but for now it’s 3AM in my time zone, so I’m going to bed. I don’t get to analyze SQL queries at my current job, so I’m getting a kick out of it.

                                      1. 2

                                        PostgreSQL thinks that the average user has 3 roles and the average role has 10 permissions. Because of that, it’s using a hash join for role inner join roles_users, instead of a nested loop lateral join on index seek. You have all the unique indexes you need, the query planner just doesn’t care.

                                        Luckily this doesn’t actually create a real problem for you, as the query plan works alright as long as the JIT is disabled. It’s much more frustrating when this kind of thing creates an outrageously slow query plan and you have to separate your query into subqueries at the application layer, just to avoid PostgreSQL picking a wildly inappropriate join strategy.

                                        What if the JOIN were transformed into a LATERAL one, as sketched below? It’d be perfect if Postgres figured it out by itself, but perhaps query separation at the application layer isn’t necessary.
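
                                        Something like this, as a sketch (users u stands in for the outer query’s user table, which isn’t shown here):

                                        SELECT u.id, role.permissions
                                        FROM users u
                                        LEFT JOIN LATERAL
                                          (SELECT unnest(r.permissions) AS permissions
                                           FROM roles_users ru
                                           JOIN role r ON r.id = ru.role_id
                                           WHERE ru.user_id = u.id) role ON true;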

                                        1. 2

                                          I’ve had this problem before, and tried exactly that. It ignored the lateral join, the same as it’s already ignoring the unique index on role. It thinks it’s doing the cost-effective thing as-is.

                                          Unfortunately even if the lateral join worked, I don’t think it would change the row estimate or overall cost estimate. A lateral join would be more efficient here, but the query is already fast enough (unlike the other times I’ve seen this happen). The cost analysis is what’s improperly triggering the JIT.

                            2. 2

                              I looked at this yesterday as well, and I couldn’t really figure out why the JIT was being used. One thing that struck me about the JIT query plan, though, is Functions: 138, which seems rather high. I suspect this may be the cause, and may also explain why it takes quite long to compile?

                              It’s a bit difficult to say without looking at the actual query/schema though, and I couldn’t really find what “Functions” means exactly (it doesn’t seem documented as far as I can find, but I think it’s the number of generated functions).

                              CC: @piranha

                              1. 2

                                https://lobste.rs/s/r6ydjp/postgresql_query_jit#c_qfxrsk - I think EDIT2 captures the essence of the problem. :-)

                            1. 6

                              Great read! I use scp quite a lot. It sounds like I should learn sftp.

                              1. 15

                                I’d really recommend rsync instead. As the article mentions, it’s essentially a drop-in replacement for scp and it’s much more performant in most use cases.

                                1. 8

                                  This. Also, while rsync can be used like a drop-in scp replacement, it can do a lot more than that and for copying or syncing whole trees or filesystems there’s nothing better. Just as ssh is the swiss-army knife of secure connections, rsync is the swiss-army knife of file tree copying / moving / duplicating. It’s well worth becoming thoroughly familiar with its command-line options.

                                  1. 5

                                    rsync is a bit painful about when it copies files versus directories and where exactly they will end up in which situations. I came to prefer https://github.com/DragonFlyBSD/cpdup because its UI is safe enough for me.

                                    1. 3

                                      You only have to remember one basic Unix rule: foo is the directory itself and foo/ is the contents of the directory

                                  2. 2

                                    Doesn’t rsync require a server, or at least the rsync binary to be present on the remote machine?

                                    1. 2

                                      No server required on the remote, but yes, you do need the binary on the remote. I haven’t had any problems with it not being installed in the places I need, but you may have different targets than I do. There are a couple of statically compiled copies of rsync you can drop in for different arches.

                                    2. 1

                                      rsync can’t do what scp -3 does, can it?

                                      1. 1

                                        Can’t say I’ve ever used -3. So without -3 it sounds like it tries to copy directly from one remote to another, but with -3 it’ll copy to the local machine and then pass the file through to the remote? If I understand correctly, then I believe you’re correct: rsync has no builtin capability for copying between two remotes. You’ll have to set up port forwards.

                                    3. 7

                                      sftp behaves very similarly to ftp when it comes to commands, plus any recent releases come with tab completion for both local and remote files.

                                      All you’ll probably ever need is put filename, get filename, put -r folder, get -r folder, ls, lls, cd, lcd. You can also run commands locally like “!vim filename”.

                                      It’s very easy to get used to.

                                      1. 2

                                        Tried it out today. Seems nice and easy to use

                                    1. 2

                                      Could have used flat (denormalized) tables. They can be refreshed whole at once and in the background (using table inheritance) or on a row basis (using triggers); a sketch of the latter is below.
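
                                      A minimal sketch of the trigger-based, row-basis variant, with made-up tables:

                                      -- Hypothetical normalized schema...
                                      CREATE TABLE customers (id serial PRIMARY KEY, name text NOT NULL);
                                      CREATE TABLE orders    (id serial PRIMARY KEY,
                                                              customer_id integer NOT NULL REFERENCES customers,
                                                              total numeric NOT NULL);

                                      -- ...and the flat table that reads go against.
                                      CREATE TABLE orders_flat (order_id integer PRIMARY KEY,
                                                                customer_name text,
                                                                total numeric);

                                      -- Row-basis refresh: keep the flat row in sync whenever an order changes
                                      -- (deletes would need a similar trigger).
                                      CREATE FUNCTION orders_flat_refresh() RETURNS trigger AS $$
                                      BEGIN
                                          INSERT INTO orders_flat (order_id, customer_name, total)
                                          SELECT o.id, c.name, o.total
                                          FROM orders o JOIN customers c ON c.id = o.customer_id
                                          WHERE o.id = NEW.id
                                          ON CONFLICT (order_id) DO UPDATE
                                              SET customer_name = EXCLUDED.customer_name,
                                                  total         = EXCLUDED.total;
                                          RETURN NEW;
                                      END;
                                      $$ LANGUAGE plpgsql;

                                      CREATE TRIGGER orders_flat_sync
                                      AFTER INSERT OR UPDATE ON orders
                                      FOR EACH ROW EXECUTE FUNCTION orders_flat_refresh();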

                                      1. 5

                                        Because C/C++ became popular long before auto-formatters became popular (and e.g. part of pre-submit hooks), there are many different (and incompatible) C/C++ styles in use.

                                        classic indent(1) is from some time around 1977. See https://github.com/freebsd/freebsd/blob/master/usr.bin/indent/README for a bit of its history.

                                        1. 1

                                          How is this related to https://github.com/freebsd/freebsd/? The commit hashes don’t match the tree on GitLab.

                                            1. 1

                                              Wow, that’s… wow. The description of how vendor branches are going to work to avoid using submodules makes me so very glad I don’t maintain anything in contrib. Thank you for your work keeping dtc in sync, I’m sorry to hear that the git working group has decided to make your life harder than it needs to be.

                                              1. 1

                                                What is the best way to handle vendor branches in cases like this?

                                                1. 2

                                                If you don’t have an allergic reaction to submodules, clone each of the upstreams and then make the directories in the contrib directory submodules. Keep an upstream branch in each of the cloned repos and a contrib branch that carries any changes you’ve made. It requires users to add --recurse when they clone and pull, but given how many things use submodules now that’s a pretty common requirement.

                                                If you don’t have any changes, then the submodule just points at your mirror of the upstream. If you do, then having a self-contained git repo for that project makes merging relatively painless and preserves history. If you’re happy to burn some CPU cycles, you can use git-imerge in rebase-with-history mode every time you pull from upstream to give a clean history of your changes applied on top of upstream, while also making it possible for people who have local changes to merge your new version.

                                                In both cases, you update the shared version by just bumping the submodule version. Your main repo’s history just contains a hash and, for extra benefits, people can clone the whole history of your project but use shallow clones of the submodules to avoid having, for example, the entire history of the LLVM versions that you’ve shipped.

                                                  1. 1

                                                    It requires users to add --recurse when they clone and pull, but given how many things use submodules now that’s a pretty common requirement.

                                                    I keep all my vim plugins as submodules in .vim/bundle and every single time I’ve ever cloned my .vim I’ve gone:

                                                    $ git clone $URL
                                                    $ ln -s .vim/vimrc .vimrc
                                                    $ vim 
                                                    oh, no plugins, wtf? oh ffs
                                                    $ rm -rf .vim
                                                    $ git clone $URL --recurse-submodules
                                                    $ vim
                                                    :Helptags
                                                    

                                                    I don’t think I’ve ever remembered it once, and I can never remember the command for submodules. Is it git submodule init --update? Or git submodule update --init? Easier to just clone again with --recurse-submodules.

                                                    Your main repo’s history just contains a hash and, for extra benefits, people can clone the whole history of your project but use shallow clones of the submodules to avoid having, for example, the entire history of the LLVM versions that you’ve shipped.

                                                    And of course they do have the full history of the LLVM versions that you’ve shipped if they want them, they just don’t have to if they don’t want them. It really is the best of both worlds.

                                                    I must say I find keeping submodules in sync with what they’re meant to be just bizarrely painful, though. Do you know of any way to just say ‘always keep the submodules at whatever commit they’re meant to be’?

                                                    1. 1

                                                      I must say I find keeping submodules in sync with what they’re meant to be just bizarrely painful, though. Do you know of any way to just say ‘always keep the submodules at whatever commit they’re meant to be’?

                                                    There’s a bot on GitHub that notices if your submodules are out of date and sends a PR to update them. I’m not sure if you have to explicitly opt in to it (it seems to run on everything in the Microsoft org).

                                                    I know that several of my colleagues have just added some custom git verbs to handle them (and also do common things like stash, update, stash pop), so they never have to deal with the core git UI abominations for them.

                                          1. 11

                                            I would love to see an explanation on these wats, I have no idea why they happen.

                                            1. 9

                                              Luckily there is a spec! Many of these are explained in chapter 2, which is only 1732 pages long. Should be pretty straightforward.

                                              1. 1

                                                select a, b from nums order by 2;

                                                sorts by values in the second column

                                                select a, b from nums order by 1+1;

                                                sorts by a constant, just like order by 3*1 would

                                                ((select a from nums) union (select a from nums)) order by b;

                                                the set resulting from this operation (union) does not include a column named b

                                                select b from nums group by a;

                                                b is functionally dependent on the primary key a

                                                select user, password from users;

                                                see what select user; returns on its own. I suggest you always qualify your columns, as in the example below.
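
                                                For example, assuming a users table that really does have user and password columns:

                                                CREATE TABLE users ("user" text, password text);  -- "user" must be quoted here: it is a reserved word
                                                INSERT INTO users VALUES ('alice', 'hunter2');

                                                SELECT user, password FROM users;              -- unqualified user is the session user, not the column
                                                SELECT users.user, users.password FROM users;  -- qualified names resolve to the real columns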

                                                I’ll add my own “wat”:

                                                select sum(nums.a) from nums group by ();

                                              1. 2

                                                Cool, it understands Postgres’s parameterized queries, if I’m not mistaken.

                                                1. 3

                                                  As long as you don’t have hundreds of millions of jobs, yes. If you have that many jobs, things become interesting. It is very nice to be able to create jobs inside transactions.

                                                  Using a varchar() type instead of text in PostgreSQL in 2019, hmmm.

                                                  1. 3

                                                    Using a varchar() type instead of text in PostgreSQL in 2019, hmmm.

                                                    Hah, it’s funny to see this comment, I was just today wondering which one of these I should be choosing for a column that contains a URL fragment - eg oregon in /usa/oregon/portland. Being a newbie to Postgres (my SQL background is in MySQL) I have been choosing text essentially because of the name. Would you please expand on why text is the better choice than varchar in 2019?

                                                    1. 3

                                                      why text is the better choice than varchar in 2019

                                                      It isn’t, they’re equivalent (text and varchar without a length modifier behave identically). TEXT is considered idiomatic in the Postgres community, and that’s it.
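
                                                      If you later do need a length limit, a CHECK constraint on a text column gives the same effect as varchar(n); a sketch:

                                                      CREATE TABLE page (
                                                          slug text NOT NULL CHECK (char_length(slug) <= 80)
                                                      );

                                                      -- Changing the limit later is just a constraint swap:
                                                      ALTER TABLE page DROP CONSTRAINT page_slug_check;
                                                      ALTER TABLE page ADD CONSTRAINT page_slug_check CHECK (char_length(slug) <= 120);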

                                                  1. 1

                                                    This has got to be PR, right? This guy completely glosses over the existence of Docker.

                                                    1. 3

                                                      There is no Docker for FreeBSD.

                                                      1. 2

                                                        I meant like, he’s complaining about how you can’t use Linux to have reproducible environments, and he completely glosses over Docker

                                                        1. 1

                                                          Was Docker already widely-used in 2016? (Genuine question)

                                                          1. 2

                                                            I don’t have as good of an eye on the space as others, but from what I remember, it started picking up steam in 2014-2015

                                                        2. 2

                                                          It’s in progress, as far as I know: https://reviews.freebsd.org/D21570

                                                        3. 3

                                                          We don’t need docker, we’ve had jails since 1999

                                                          1. 2

                                                        Early in 2016 somebody tried out kubernetes and got a long way towards having it up & running, but couldn’t keep up with the maintenance burden versus moving the environment onto it. At the time docker was too fast-moving a target for a 3-person dev team to pick up the burden of maintaining this. Now it would be a different kettle of fish, with mainstream stable distros providing this. LXC would have been an option at the time, though.

                                                        Don’t get me wrong, I’m violently in favour of containers, but violently against FLOSS using a commercially controlled service as the main way of building and consuming software. Feel free to enlighten me if Docker’s moved on now and isn’t beholden to the VC behemoths for a sweet exit & cash windfall?

                                                            1. 1

                                                              To be clear, I have never used Docker (other than messing around with it for an hour a few years back). I have no place to say if it’s good software or not.

                                                              I just find it fishy that Docker didn’t get a mention at all.

                                                          1. 8

                                                        The fact that you apparently can’t compose functions seems a bit disappointing for something that aims to mimic SQL, but oh well. The only thing that seems really bad there is that, in the example given, it returned a nonsensical result instead of an error.

                                                            $ fselect "MIN(YEAR(modified)) from Downloads"
                                                            -1
                                                            

                                                        This looks really nice though, because it looks like the syntax may be a lot easier to use and remember than find’s minilanguage, plus it looks like they’ve done a really good job of making the output formats helpful and easy.

                                                            One of the things that’s endlessly painful with scripts that start with a find invocation is feeding filenames into pipelines that then get messed up by funny characters in filenames.

                                                            Stoked to try this out later.

                                                            1. 2

                                                          hey @0x2ba22e11, the author of this tool read the blog post and corrected the bug that prevented function composition here; I haven’t tried the fix yet but it’s being worked on at least!

                                                              1. 1

                                                                Nice! ❤

                                                              2. 1

                                                                You could use something like this, which doesn’t require a 5MB Rust binary and probably some insane dependency chain (tested on OpenBSD):

                                                                find . -print0 | xargs -0 stat -f "%m%t%Sm%t%N" | sort -n | sed 1q | cut -f 2-
                                                                
                                                                1. 4

                                                                  Yeah, that is precisely the sort of thing I hate writing. What does that snippet do if one of the filenames has a newline in the middle of it? Nothing good, I’d wager.

                                                                  1. 2

                                                                    What does that snippet do if one of the filenames has a newline in the middle of it

                                                                    Wait, how often do you encounter filenames with characters like newlines in them? Is foo\n.bar really a thing?

                                                                    1. 2

                                                                      Anything except / or \0 is legal on some filesystems. Just dealing with spaces in shell scripts is already a disaster.

                                                                      1. 1

                                                                        Well, sure. I didn’t ask if it was possible, I asked how likely it was you would see files with \n in the name… I’ve never come across this, which is why I ask.

                                                              1. 1

                                                                No spec means there’s nothing keeping rustc honest.

                                                                On this topic, I found this to be interesting and informative: https://users.rust-lang.org/t/is-there-a-plan-to-get-a-formal-rust-language-specification-at-some-point/21622/8

                                                                1. 2

                                                                  There is also https://github.com/n-t-roff/heirloom-ex-vi which is: “The Traditional Vi (vi with many enhancements from Gunnar Ritter)”

                                                                  1. 1

                                                                  I have a feeling that vim would make a great part of her programming environment due to its telegraphic style of interaction. I wonder what options for HCI will be available if I get both RSI and glaucoma. I also wonder if this talon thing could be a little easier to use if it worked like a trie or something similar.

                                                                    1. 1

                                                                      I immediately thought of vim when she explained how “delete 4” is ambiguous. In vim it never is and I agree that it could be very useful for her.