1. 5

  2. 2

    If you are looking for a great (libgit2) git to html generator, this one rocks: http://git.2f30.org/stagit/log.html

    About escaping in shell, I solved it this way: https://github.com/josuah/bin/blob/master/git-index

    git -C "$1" log --graph --date=short --format='%H  %cd  %cn <%ce>%n%s%n' |
    sed -r -e 's|&|\&amp;|g' -e 's|<|\&lt;|g' \
    	-e 's|([0-9a-f]{8})([0-9a-f]{32})|<a href="commit/\1\2.html">\1</a>|'

    The idea is to keep HTML out of --format='' and then use sed to 1) escape < and &, and 2) find anything that looks like a commit hash and turn it into a link.
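
    For comparison, here is a rough Python sketch of the same two steps. The function name linkify_log_line is made up for illustration, and the commit/<hash>.html layout is simply copied from the sed command above. Note that html.escape() also escapes >, which the sed version does not.

```python
import html
import re

# 40 hex digits: the first 8 become the link text, all 40 go into the href.
HASH_RE = re.compile(r"([0-9a-f]{8})([0-9a-f]{32})")

def linkify_log_line(line: str) -> str:
    """Escape one line of `git log` output, then link the first hash in it."""
    # Step 1: escape &, < and > (quotes are left alone with quote=False).
    escaped = html.escape(line, quote=False)
    # Step 2: count=1 mirrors the sed command, which has no `g` flag.
    return HASH_RE.sub(r'<a href="commit/\1\2.html">\1</a>', escaped, count=1)
```

    Escaping has to happen before the link is inserted, otherwise the generated <a> tag would be escaped as well.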

    result: http://josuah.net/git/bin.git/

    1. 2

      That solution has all the properties I criticized in the third section: “the worst part of shell” :-(

      It’s a solution that has to be recreated for each type of input data. It doesn’t handle multiline strings. It also doesn’t escape > or ".

      I’m curious if you understood my solution and if you have any criticisms of it. Some people didn’t want to pipe to Python, which could be understandable if you know say Perl better. But that is a detail. Awk is possible, but it’s much more awkward because it doesn’t have re.sub() that takes a callback (no first-class functions in Awk).

      The generator looks nice, but at that point it seems like cgit does the same thing dynamically, and saves you from all the tiny files.

      1. 3

        You are right, I did not see it. “The Worst Part of Shell: Pushing the Problem Around”.

        A dynamic generator saves storage, but then you save RAM and CPU with a web cache (Varnish… which uses storage), and you end up with a huge, complex pile of software (you also need CGI, but wait, nginx does not handle CGI, so you also need fcgiwrap). I prefer statically generated content. That may be a matter of taste… :)

        For multiline input, one could use awk.

        There is one thing that looks really trivial and useful but has no standard solution as far as I am aware: match “$1” and replace it with “$2”, without interpreting “$1” or “$2” as a regular expression. (I guess I am paraphrasing your “callback” point.)
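
        To make that concrete, here is a small Python sketch of a literal match-and-replace. Both function names are hypothetical. The second variant is only there to show what it takes to get the same behaviour out of a regex engine: escape the pattern, and pass the replacement through a callback so that backslashes in it are not read as group references.

```python
import re

def replace_literal(text: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` with `new`, both taken literally."""
    return text.replace(old, new)

def replace_literal_re(text: str, old: str, new: str) -> str:
    """Same result through re.sub, with the pattern and replacement defused."""
    # re.escape() neutralises regex metacharacters in `old`; the lambda
    # returns `new` as-is, so sequences like \1 in it are not interpreted.
    return re.sub(re.escape(old), lambda m: new, text)
```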

        For a simple case like generating an HTML git log, I prefer a specific but simpler solution. For solving the problem in general in the shell, Python might be overkill, but it works every time, unlike POSIX commands.

        I will keep the tip in mind: the greedy regex trick that prevents “\001” and “\002” in the content from being taken as delimiters.

        On a side note about Oil shell: do you aim to solve problems at a more global scale, rather than finding a completely different solution every time?

        1. 1

          Yes, glad you understood what I’m saying.

          Keep in mind the regex is now \x00([^\x00]*)\x00 so you don’t need the non-greedy trick.


          I’m no longer suggesting using 0x01 and 0x02. NUL is better, but it needs specific support from git rather than being something you can do from bash.
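
          As a sketch of what that git support can look like: the %x00 placeholder in a git log format string emits a literal NUL byte. The choice of fields, the FORMAT constant, and the function names below are assumptions made for illustration, not the script from the post.

```python
import subprocess

# Each field is terminated by a NUL byte; git adds a newline after each commit.
FORMAT = "%H%x00%cn%x00%s%x00"

def parse_fields(raw: bytes):
    """Split NUL-terminated fields into (hash, committer, subject) tuples."""
    pieces = raw.decode("utf-8", "replace").split("\x00")
    # The last piece is whatever follows the final NUL (a newline or nothing).
    # Every hash after the first is preceded by the per-commit newline.
    fields = [p.lstrip("\n") for p in pieces[:-1]]
    return [tuple(fields[i:i + 3]) for i in range(0, len(fields) - 2, 3)]

def git_log(repo: str):
    """Run git log in `repo` and return the parsed records."""
    raw = subprocess.run(
        ["git", "-C", repo, "log", "--format=" + FORMAT],
        check=True, stdout=subprocess.PIPE,
    ).stdout
    return parse_fields(raw)
```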

          Yes the idea is to find a solution that works for all types of input. I don’t want to think about individual spaces, newlines, etc. I am going to fold in the re.sub() functionality of Python, which awk/grep/sed are missing.

          This is what lets you simply call html.escape() on each segment without fiddling with match positions and so forth.
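
          A minimal sketch of that idea, assuming the untrusted fields arrive wrapped in NUL bytes inside an otherwise trusted HTML template (the function name escape_segments is hypothetical):

```python
import html
import re

# A field is everything between two NUL bytes; it may span several lines.
NUL_FIELD = re.compile(r"\x00([^\x00]*)\x00")

def escape_segments(text: str) -> str:
    """HTML-escape each NUL-delimited field and drop the delimiters."""
    # The callback receives each match, so there is no bookkeeping of
    # match positions: escape the captured group and return it.
    return NUL_FIELD.sub(lambda m: html.escape(m.group(1)), text)
```

          The markup outside the delimiters passes through untouched, and html.escape() covers &, <, >, and both quote characters.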

          Also I’m not sure if awk/grep/sed even support \x00 to match NUL bytes?

          python -c 'print "a\0b"' | egrep ...? ...

          I don’t think there is anything you can do there to match NUL bytes. So yes the traditional Unix toolset is missing another thing I should have mentioned.

          1. 3

            awk '/\000/ { print }' works.

            This looks a bit like a Python script with a shell-like syntax.

      2. 2

        I just realized that this does not work for your case (a table), as there might be commit hashes in the commit message, which would produce one more cell in the table. I used a <pre> and an <a> for each commit.