1. 10
  2. 2

    Nice article! HTML parsing is also quite feasible in Emacs via the xml library and the url-retrieve-synchronously function. Together they let you essentially curl a page and then extract information from it with the xml- functions.

    1. 1

      How does the xml library handle tag-soup? It’s pretty common for HTML documents to have extremely invalid markup.

      1. 1

        xml.el was written for Gnus, so it has to handle HTML somewhat, but I haven’t tested it.

        https://www.emacswiki.org/emacs/XmlParsers says that as of Emacs 24 you can use libxml2 directly from Emacs Lisp. libxml2 is one of the best HTML tag-soup parsers out there (it powers Nokogiri for Ruby, for example).

    2. 1

      If you just object to the beaks in SGML-inspired syntax, you could use https://en.wikipedia.org/wiki/SXML instead.
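
For anyone wanting to try the fetch-and-parse approach discussed in this thread, here is a minimal Emacs Lisp sketch. It assumes an Emacs built with libxml2 support and new enough to ship dom.el (Emacs 25 or later). The `my/` function names are hypothetical helpers made up for illustration, and the header-skipping regexp is a simplification rather than anything the url library prescribes.

```elisp
(require 'url)
(require 'dom)

(defun my/parse-html-string (html)
  "Parse HTML, a possibly malformed string, into a DOM tree via libxml2.
Hypothetical helper; signals an error if libxml2 support is missing."
  (unless (fboundp 'libxml-parse-html-region)
    (error "This Emacs was built without libxml2 support"))
  (with-temp-buffer
    (insert html)
    (libxml-parse-html-region (point-min) (point-max))))

(defun my/fetch-dom (url)
  "Fetch URL synchronously and return its body parsed as a DOM tree.
Hypothetical helper; returns nil if the request yields no buffer."
  (let ((buf (url-retrieve-synchronously url)))
    (when buf
      (with-current-buffer buf
        (goto-char (point-min))
        ;; Assumption: the HTTP headers end at the first blank line.
        (re-search-forward "\r?\n\r?\n" nil t)
        (prog1 (libxml-parse-html-region (point) (point-max))
          (kill-buffer buf))))))

;; Tag soup: neither <p> nor <b> is ever closed, yet the <b> node is found.
(dom-by-tag (my/parse-html-string "<p>unclosed <b>tag soup") 'b)
```

From there, `(dom-by-tag (my/fetch-dom "https://example.com/") 'a)` would collect the link nodes of a page, with the URL standing in as a placeholder. Parsing from a string in a temp buffer keeps the tag-soup handling testable without touching the network.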