1. 2

    Working on revamping my Vim reference guide.

    Currently reading Ella Enchanted - whimsical fantasy, liking it so far.

    1. 4

      After a break of about 50 days, I will start working on a book again. I’m planning to revamp the Vim reference guide that I wrote a few years back.

      1. 1

        After a hectic week (finishing a GUI app, traveling, newsletter issue, programming deals, etc.), hoping for a work-free weekend.

        Currently reading Skate the Thief.

        1. 2

          Wow, the book looks great! Do you know the author, or how did you come across it?

          1. 1

            I saw it on /r/FreeEBOOKS/

        1. 12

          This isn’t present in the manpage (at least, not on my install), but it is in the info page. And since I never check the info page (does anyone?), I had no clue.

          For GNU tools, the info pages always have the complete documentation. I prefer to look that up online rather than use the info command though, since I haven’t troubled myself to learn the tricks of info navigation and the online pages are visually easier to digest.

          Regarding multibyte characters, I think very few tools in coreutils support them. I only know of the wc -m option (and it will not treat a grapheme cluster as a single character). Some more tools that’ll trip you up if you expect multibyte processing: tr, head and tail (edit: just remembered sort -k<f>.<c> as well).
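To make the bytes vs. characters vs. grapheme clusters distinction concrete, here is a minimal Python sketch. The function names are made up for illustration, and the last function is only a rough approximation (it skips combining marks), not real grapheme cluster segmentation.

```python
import unicodedata


def count_bytes(text: str) -> int:
    """Bytes in the UTF-8 encoding (comparable to what `wc -c` counts)."""
    return len(text.encode("utf-8"))


def count_code_points(text: str) -> int:
    """Unicode code points (roughly what `wc -m` counts in a UTF-8 locale)."""
    return len(text)


def count_base_characters(text: str) -> int:
    """Rough stand-in for user-perceived characters: ignore combining marks.

    Assumption: this is NOT full grapheme cluster segmentation; it only
    handles the simple case of a base character plus combining marks.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as a single "é"
sample = "e\u0301"
print(count_bytes(sample), count_code_points(sample), count_base_characters(sample))
# prints: 3 2 1
```

So a tool that counts code points still reports 2 for what a reader sees as one character.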

          1. 21

            Yeah, I’d straight up forgotten that info pages exist. It feels very “GNU-y” that they decided to keep the actual complete documentation in something that only they use.

            1. 7

              Man pages are Unix, and GNU’s Not Unix.

              I have my issues with the current state of the FSF, and I can intellectually grasp that GNU code can be really gnarly and probably doesn’t need to be now that all the world’s a Linux, but complaining that GNU code isn’t POSIX-compliant or that it doesn’t adhere to the “Unix ethos” is missing the point. GNU’s goal isn’t to recreate Unix but to make a new thing that is better.

              That said, not handling Unicode correctly in 2021 feels a bit off.

              1. 16

                I have a longer blogpost that I mean to write about how deeply frustrated I am with emacs core development (and I say this as someone who’s used it as an editor for years and desperately wants it to succeed), and how I think it stems from GNU/the FSF not being willing to let go of the past and the fact that they’ve failed in one of their fundamental goals. This is probably the most emblematic example of that that comes to mind.

                Also yeah. This would’ve been okay in 2000, or maybe 2010, but… come on.

              2. 3

                Info manuals can be, and are, exported to HTML, which everyone can access through a web browser. That doesn’t seem like ‘something that only they use’.

                1. 6

                  I mean that nobody else writes info manuals.

            1. 1

              Continue working on Python GUI for regex practice. Hope to finish it by next weekend.

              Started a weekly newsletter today, will spend some time to prepare for the next week’s issue.

              In a reading slump (didn’t much like the last one I read and gonna drop the current one). Might skip starting another novel this weekend and instead watch a movie, go for a walk (weather permitting), etc.

              1. 2

                Write a couple of pending book reviews.

                Check out a few newsletters so that I can start one of my own.

                1. 2

                  Festival (Diwali) week here and Reaper (Cradle 10) releasing tomorrow, so I won’t get much done work wise.

                  1. 1

                    Continue working on GUI app for practicing Python regex. Hope to finish MVP tomorrow. Code is quite messy, need to refactor before moving on to add more features.

                      1. 4

                        The which command is a broken heritage from the C-Shell and is better left alone in Bourne-like shells.

                        That’s enough for me! Burn the which!

                        1. 2

                          Am I the only (t)csh user around here?? 8-)

                          1. 4

                            I am sure you’re not… but maybe one of the few who admits it.

                      1. 4

                        Going to see the new Dune! I have pretty high expectations, so I’m a little afraid of being disappointed. But, I must not fear, fear is the mind-killer.

                        1. 2

                          All the reviews I’ve come across so far are very positive.

                          1. 2

                            Oh, shit, I’m doing that too, and taking my wife along! She has no history with Dune, except for one evening when she suffered through the Lynch version, which, when you think about it, has aged badly for the uninitiated.

                            1. 1

                              Yes, he threw in a few too many David Lynch-y things, and it just was too much.

                            2. 1

                              I was a little worried about the runtime, but I actually enjoyed it a lot and the two and a half hours went by without any problems. The soundtrack is also very well executed. I think you won’t be disappointed :)

                            1. 3

                              Building a GUI app in Python that’ll help programmers to practice Python regex exercises.

                              1. 2

                                Just published Command line text processing with GNU Coreutils, so I’m going to relax this weekend (might stay away from book writing for the rest of the month).

                                I’ve enjoyed reading slice-of-life fantasy novels like The Wizard’s Butler recently, so I’d be looking for more such books to read.

                                1. 2

                                  Looks like a pretty good weekend read, good job; I could use a coreutils refresher for sure.

                                  1. 2

                                    Thanks. And yeah, this works as a reference for me as well :)

                                  1. 10

                                    Once we move beyond one-liners, a natural question is why. As in ‘Why not use Python? Isn’t it good at this type of thing?’

                                    The reasons provided are fine, but for me the main reason is speed. AWK is much, much faster than Python for “line at a time” processing. When you have large files, the difference becomes clear. (perl -p can be a reasonable substitute.)

                                    Once you are writing long AWK programs, though, it’s time to consider Python or something else. AWK isn’t very fun once data manipulation gets complicated.

                                    1. 4

                                      (perl -p can be a reasonable substitute.)

                                      +1. In my eyes, it’s Awk and then Perl. Perl turns out to be much better for these purposes than other scripting languages. The difference in startup time between Perl and Python is very significant. If you don’t use (m)any modules, Perl scripts usually start just as quickly as Awk scripts.

                                      1. 2

                                        I’m sure that’s true for some kinds of scripts, but that doesn’t match my experience/benchmarks here (Python is somewhat faster than AWK for this case of counting unique words). For what programs did you find AWK “much, much faster”? I can imagine very small datasets being faster in AWK because its startup time is 3ms compared to Python’s 20ms.
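For anyone who wants to try this kind of comparison themselves, a streaming word counter in Python might look roughly like the sketch below. This is not the benchmark code referred to above; the function name and the choice to lowercase words are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated words, one line at a time.

    Accepts any iterable of lines (an open file, sys.stdin, a list), so the
    whole input never has to be held in memory at once.
    Assumption: words are lowercased before counting.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


sample = ["the quick brown fox", "The lazy dog", ""]
for word, n in count_words(sample).most_common(1):
    print(word, n)
# prints: the 2
```

For a real file you would pass an open file object, for example `count_words(open(path, encoding="utf-8"))`, where `path` is whatever input you are measuring.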

                                        1. 2

                                          For what programs did you find AWK “much, much faster”?

                                          Any time the input file is big. As in hundreds of MBs big.

                                          I used to have to process 2GB+ of CSV on a regular basis and the AWK version was easily 5x faster than the Python version.

                                          1. 1

                                            Was the Python version streaming, or did it read the whole file in at once?

                                            1. 1

                                              Streaming.

                                          2. 2

                                            Regarding your results, 3.55 under awk is with or without -b?

                                            I get 1.774s (simple) and 1.136s (optimized) for Python. For simple awk, I get 2.552s (without -b) and 1.537s (with -b). For optimized, I get 2.091s and 1.435s respectively. I’m using gawk here; mawk is of course faster.

                                            Also, I’ve noticed that awk does poorly when there are a large number of dictionary keys. If you are making field-based decisions, awk is likely to be much faster. I tried printing the first field of each line (I removed empty lines from your test file since line.split()[0] gives an error for empty lines). I got 0.583s for Python compared to 0.176s (without -b) and 0.158s (with -b).
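As a side note on the line.split()[0] error: a small guard avoids having to strip empty lines from the input first. This is only a sketch with a hypothetical function name, not the script that was timed above.

```python
from typing import Iterable, Iterator


def first_fields(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first whitespace-separated field of each non-blank line."""
    for line in lines:
        fields = line.split()
        # Blank or whitespace-only lines split into an empty list, so
        # indexing [0] directly would raise IndexError.
        if fields:
            yield fields[0]


print(list(first_fields(["alpha beta\n", "\n", "  gamma delta\n"])))
# prints: ['alpha', 'gamma']
```

Note the behaviour is not identical to awk '{print $1}', which prints an empty line for a blank input line instead of skipping it.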

                                            1. 4

                                              Also, I’ve noticed that awk does poorly when there are a large number of dictionary keys.

                                              Same here. If you are making extensive use of arrays, then AWK may not be the best tool.

                                          3. 2

                                            Once you are writing long AWK programs, though, it’s time to consider Python or something else. AWK isn’t very fun once data manipulation gets complicated.

                                            I dunno, I think it’s pretty fun.

                                            I am consistently surprised that there aren’t more tools that support AWK-style “record oriented programming” (since a record need not be a line, if you change the record separator). I found this for Go, but that’s about it. This style of data interpretation comes up pretty often in my experience. I feel like as great as AWK is, we could do better - for example, what about something like AWK that can read directly from CSV (with proper support for quoting), assigning each row to a record, and perhaps with more natural support for headers.
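To illustrate the kind of thing I mean, here is a rough Python sketch of record-oriented processing over CSV, where each row becomes a record addressed by header name. It is not an AWK replacement; the `records` helper and the sample data are made up for illustration, and quoting is handled by the standard csv module.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def records(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield each CSV row as a dict keyed by the header line.

    csv.DictReader takes care of quoted fields, including embedded commas
    and doubled quotes.
    """
    yield from csv.DictReader(stream)


# Hypothetical sample: the first data row has a comma and quotes inside fields
sample = io.StringIO('name,note\n"Smith, J","said ""hi"""\nLee,ok\n')

# AWK-style "pattern { action }": act only on records matching a condition
for rec in records(sample):
    if "," in rec["name"]:
        print(rec["name"], "->", rec["note"])
# prints: Smith, J -> said "hi"
```

The pattern/action part is just an if statement here, which is wordier than AWK but gets header names and proper quoting for free.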

                                            1. 2

                                              You are right. Recently I was mixing AWK and Python: AWK produced key,value output that was easy to read and was processed later by a Python script. Nice, simple and quick to develop.

                                            1. 2
                                              • Working on second draft of my book on coreutils text processing tools
                                              • Read an advance copy of a book last week, have to compile and send the typos to the author
                                              • Planning to read The Wizard’s Butler by Nathan Lowell
                                              1. 3
                                                • Working on my book on text processing with coreutils. Hope to finish first draft by next week.
                                                • Reading an advanced-reader-copy (progression fantasy genre)
                                                1. 2

                                                  Cool topic. I feel like this is one of those minor superpowers that generally has to be cobbled together from gobs of sources.

                                                  1. 1

                                                    Thanks. I was familiar with some of the commands from personal use, reading/answering stackoverflow questions, etc. For some, I’m reading the manual and just trying out examples that make sense to me. I’m not a native English speaker and some of the documentation has been difficult for me to understand.

                                                1. 10

                                                  The Python script in the article is really an example of how shell scripts are superior to Python for a lot of jobs.

                                                  #!/usr/bin/env python3
                                                  import subprocess, random, glob
                                                  print('HERE WE GO')
                                                  
                                                  sounds = glob.glob('/Users/{yourName}/Library/Sounds/*.aiff')
                                                  sound = random.choice(sounds)
                                                  print('randomly selected sound: ', sound)
                                                  command = 'defaults write .GlobalPreferences com.apple.sound.beep.sound /Users/{yourName}/Library/Sounds/{}.aiff'.format(sound)
                                                  
                                                  # You could also define an array of sounds yourself,  
                                                  # if you don't want every .aiff file to be a possibility
                                                  # sounds=['tabarnak1', 'tabarnak2', 'tabarnak3']
                                                  # sound=random.choice(sounds)
                                                  # command = 'defaults write .GlobalPreferences com.apple.sound.beep.sound /Users/{yourName}/Library/Sounds/{}.aiff'.format(sound)
                                                  
                                                  subprocess.call(command, shell=True)
                                                  

                                                  {yourName} could be fixed by using Python’s env or ~ expansion or whatever, but this is much simpler as a shell expansion: ~/Library/Sounds/*.aiff. Using the shell=True flag on subprocess makes the subprocess call totally unsafe and subject to being broken by files with spaces in them anyway. This script is functionally identical but safer:

                                                  #!/bin/bash
                                                  set -euo pipefail
                                                  echo "HERE WE GO"
                                                  FILE=$(ls ~/Library/Sounds/*.aiff | sort --random-sort | head -n 1)
                                                  echo "randomly selected sound: $FILE"
                                                  defaults write .GlobalPreferences com.apple.sound.beep.sound "$FILE"
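For comparison, if you did want to stay in Python, the shell=True and {yourName} problems can be avoided with something like the sketch below. The function names are hypothetical, and the defaults arguments are simply the ones from the scripts above. Passing the command as a list means the path is handed over as a single argument, so spaces in filenames are not an issue.

```python
import random
import subprocess
from pathlib import Path
from typing import List


def pick_sound(directory: Path) -> Path:
    """Return a randomly chosen .aiff file from the given directory."""
    sounds = sorted(directory.glob("*.aiff"))
    if not sounds:
        raise FileNotFoundError(f"no .aiff files in {directory}")
    return random.choice(sounds)


def build_command(sound: Path) -> List[str]:
    """Build the argument list; no shell is involved, so no quoting is needed."""
    return [
        "defaults", "write", ".GlobalPreferences",
        "com.apple.sound.beep.sound", str(sound),
    ]


def set_random_beep() -> None:
    """Pick a sound from ~/Library/Sounds and apply it (macOS only)."""
    sound = pick_sound(Path.home() / "Library" / "Sounds")
    print("randomly selected sound:", sound)
    subprocess.run(build_command(sound), check=True)
```

Calling set_random_beep() on a Mac would do the same job as the scripts above. It is still longer than the shell version, which is kind of the point.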
                                                  
                                                  1. 2
                                                    ls ~/Library/Sounds/*.aiff | sort --random-sort | head -n 1
                                                    

                                                    can be simplified to

                                                    shuf -n1 -e ~/Library/Sounds/*.aiff
                                                    

                                                    Checked with GNU shuf, works even with spaces in filenames, not sure about other implementations.

                                                    1. 7

                                                      I thought about shuf, but Mac doesn’t have shuf built-in, and this is meant to pick Mac sounds.

                                                  1. 9

                                                    Here’s the feature I wish awk had: regex groups.

                                                    Awk has deep support for regexes, allowing a script to grab lines matching subtle patterns and process them.

                                                    Awk has deep support for tabular data, allowing a script to pick out individual columns, process them, and spit out new columns in response.

                                                    For some reason, Awk doesn’t let you write a regex that matches interesting parts of a line, and then process them. The best you can do is write a regex that matches all the interesting parts of a line, then write a bunch of gsub() calls with regexes that match each individual part. Those regexes are similar to the original, but each is different in small, easy-to-get-wrong ways.

                                                    I get it, it’s an old language, I shouldn’t judge it against my modern perspective. But still, writing a regex to pick out parts of a line, and then a block of code to process them, feels like the awkiest thing ever, and it baffles me that it doesn’t work like that.

                                                    1. 9

                                                      If you have gawk, you can use the match() function and access the matched portions via an array. You’d still need a loop if there are multiple matches:

                                                      # using substr and RSTART/RLENGTH
                                                      $ s='051 035 154 12 26 98234'
                                                      $ echo "$s" | awk 'match($0, /[0-9]{4,}/){print substr($0, RSTART, RLENGTH)}'
                                                      98234
                                                      # using the array 3rd argument (gawk only)
                                                      $ echo "$s" | gawk 'match($0, /[0-9]{4,}/, m){print m[0]}'
                                                      98234
                                                      
                                                      # matched portion of first capture group
                                                      $ echo 'foo=42, baz=314' | gawk 'match($0, /baz=([0-9]+)/, m){print m[1]}'
                                                      314
                                                      
                                                      # extract numbers only if they are followed by a comma
                                                      $ s='42 foo-5, baz3; x-83, y-20: f12'
                                                      $ echo "$s" | gawk '{ while( match($0, /([0-9]+),/, m) ){print m[1];
                                                                          $0=substr($0, RSTART+RLENGTH)} }'
                                                      5
                                                      83
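For anyone more at home in Python, the same two capture-group extractions can be sketched with the re module. The helper names are made up for illustration; the inputs are the ones from the examples above.

```python
import re
from typing import List, Optional


def first_capture(pattern: str, text: str) -> Optional[str]:
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


def all_captures(pattern: str, text: str) -> List[str]:
    """Return capture group 1 of every non-overlapping match, left to right."""
    return [m.group(1) for m in re.finditer(pattern, text)]


# matched portion of the first capture group
print(first_capture(r"baz=([0-9]+)", "foo=42, baz=314"))
# prints: 314

# numbers only if they are followed by a comma
print(all_captures(r"([0-9]+),", "42 foo-5, baz3; x-83, y-20: f12"))
# prints: ['5', '83']
```

re.finditer does the job of the while loop with substr() in the gawk version.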
                                                      
                                                      1. 2

                                                        Yeah, this is exactly why I use gawk, because you can capture groups with the match() function. It might be the only reason I use gawk!

                                                        FWIW Bash actually lets you capture groups with [[ $x =~ $pat ]] and ${BASH_REMATCH[1]}, etc. If for some reason you’re using bash but not gawk.

                                                        Oil makes this a little nicer with (x ~ pat) and _match(1) or _match('name')

                                                        I agree this feature is missing from the traditional tools in the POSIX spec.

                                                        1. 1

                                                          Quick random idea. What if we combined structural regular expressions + awk? That would be something! I haven’t seen structural regexes used in practice at all.

                                                          1. 1

                                                            If I recall correctly, the structural regex paper includes a pseudocode example for an awk-like language based on structural regexes. Just looking at it, it seemed like the Obvious Right Thing To Do, and it makes me sad that nobody (including myself) seems to have implemented such a thing.

                                                      1. 1
                                                        • Continue working on GNU coreutils book. Already behind schedule, but that happens with me for every book.
                                                        • Hopefully write a blog post, been a long time.
                                                        • Currently reading To Sleep in a Sea of Stars, good so far.
                                                        1. 2

                                                          Is this your project? The ‘show’ tag is for showing your own project, as far as I know.

                                                          I’d also suggest adding the ‘graphics’ tag.

                                                          1. 7

                                                            It’s not really graphics programming, so that tag doesn’t fit.

                                                            1. 3

                                                              yeah idk about graphics… It’s a Spotify client written in Rust that happens to use a GPU-enabled UI toolkit. The project itself doesn’t do much with it.