1. 39
  1.  

  2. 45

    PDFs are not mobile friendly. That’s already a strong reason for HTML.

    I know the author claims that PDF reflow allegedly solves this issue, but I’ve yet to find a reader that supports it.

    1. 21

      Strongly agree. And what about accessibility? PDFs are awkward for people who need large text accommodations. And I don’t know how amenable they are to screen readers nowadays: originally it was quite difficult to recover the text from a PDF because each line of text was an independent element with no clear indication of how they connect to each other.

      1. 2

        These days you can at least add accessibility annotations to PDFs that support devices like screen readers. Not sure how many readers support this stuff though.

      2. 2

        KOReader does a halfway decent job, but I agree it’s not nearly good enough to obviate HTML.

      3. 35

        While this is somewhat better than Gemini, it misses the same point: just because most people make bad web pages doesn’t mean you can’t make good ones. I mean, it seems so obvious, but somehow a lot of people don’t understand this. It’s actually much easier than continuing to make JS applications!

        Every single one of the “advantages of PDF” except “self-contained” is also true of a well-written web page.

        HTML encourages document creep. Are you disciplined enough to stick to that subset? Are you sure you won’t be tempted to embed an interactive tweet or some other faustian convenience?

        Uh… yes, obviously?

        How will your users trust that your site’s plain-HTML-look doesn’t hide some malicious JavaScript tracking anyway

        It’s called noscript, duh.

        Even this page admits (albeit only in a footnote) that PDFs have a massive disadvantage over HTML: they’re much more difficult to edit and remix. And when I went to copy the text for that quote above, even that didn’t work right! The line breaks were omitted and words ran together.

        1. 7

          Agreed, and many of the benefits of PDF put forward are only true if you assume you’re restricting yourself to the “good kind” of PDF (i.e. not letting authors embed a bunch of active/dynamic/remote content), but the article then goes on to argue that you can’t do the same thing with a subset of web standards because you can’t trust authors to actually limit themselves like that.

          Further, the bad actors who abuse web content would be just as happy to do the same to PDFs if there were a critical mass of audience there… we’d see the same monetization/ad tech/surveillance platform gold rush as soon as it was lucrative.

          1. 6

            Even “self-contained” can be done on web pages pretty easily. You can inline all your CSS and your scripts (if any). For images, data: URLs are well-supported by browsers and are a published standard.

            This proposal seems completely pointless to me. None of the cited advantages are true, and the disadvantages are significant.
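As a rough illustration of the inlining described in this comment, here is a minimal sketch that builds a single-file page with inline CSS and an image embedded as a base64 data: URL. The function names `to_data_url` and `inline_page` are hypothetical, and the page template is an assumption made for the example, not anything taken from the article.

```python
import base64
import html


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_page(title: str, css: str, image: bytes, image_mime: str) -> str:
    """Build one self-contained HTML document: CSS inline, image embedded."""
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title>\n'
        f"<style>{css}</style></head>\n"
        f'<body><img src="{to_data_url(image, image_mime)}" alt=""></body></html>\n'
    )


if __name__ == "__main__":
    # Placeholder bytes; a real page would read an actual image file here.
    print(inline_page("Demo", "body{margin:0}", b"\x89PNG", "image/png"))
```

The resulting string can be saved as a single .html file with no external requests.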

            1. 4

              Are you disciplined enough to stick to that subset?

              Am I lazy enough? Absolutely. I can’t even be bothered to install node or npm. The most I use is CGI.

              1. 3

                Gemini has a lot of great ideas, but in practice misses so many accessibility things, like compression, abbreviations, caching. The problem, as others are saying, is that we absolutely have the ability to use existing tech to do this, but there aren’t enough restrictions on the web platform, and as browser vendors do get stricter, there’s constant, capitalist pushback from advertising firms and those that profit off user data. Example: JavaScript can be a great tool, and getting notifications for an automatically-sandboxed web app I install would be ideal, as would a web app that could update firmware over Bluetooth for a device so I didn’t need to install an executable, but all of these technologies are used against us in the form of fingerprinting.

              2. 13

                You can make self-contained HTML documents.

                They can include text, images, videos, PDFs and even interactivity all in a single HTML file. That file is viewable on almost any device. Responsive layouts that work well with mobile devices are basically the default. Ah, and let’s not forget that browsers are highly secure renderers and runtimes for untrusted content. That’s a claim most PDF viewers cannot make.

                1. 1

                  Which technology do you use?

                  1. 6

                    Most browsers support data URLs, which allow you to embed any resource that the browser can reach via URL inline, base64-encoded. This lets you create single-file HTML. It isn’t generally done because it’s quite useful to have different files for resources, for several reasons:

                    • Images shared across multiple pages can’t be cached by the browser, they must be fetched inline.
                    • You hit head-of-line blocking issues. If you want a large image to be displayed at the top of a page then nothing below can be rendered until that image has been fetched, whereas multiple files let the browser asynchronously fetch the image and keep laying out the rest of the page while it waits.
                    • You must serve everything from the same server and with the same cache policy, you can’t have a CDN or aggressive cache policy for the images and a more dynamic server for the rest of the page.
                    • You need to regenerate every page that uses a script / image / whatever if you want to update that image.

                    There are also some problems with the data URL implementation, specifically that base64 encoding is pretty inefficient and you can’t layer compression on top of it. In contrast, PDF allows any object to be independently compressed and embedded as raw data with the length defined in the object dictionary. The lack of compression doesn’t matter too much if the embedded data is already compressed, except for the size increase from base64 encoding, which uses only 6 out of every 8 bits and so will increase the size by a third. This can be mitigated if the HTTP connection does compression, but that adds some latency.
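The one-third figure follows from base64 turning every 3 input bytes into 4 output characters. The sketch below checks that arithmetic and measures how much of the overhead a generic compressor recovers. Using gzip as a stand-in for HTTP compression, and random bytes as a stand-in for an already-compressed image, are both assumptions made for the example; `base64_encoded_size` and `measure` are hypothetical names.

```python
import base64
import gzip
import os


def base64_encoded_size(raw_size: int) -> int:
    """Padded base64 length: every 3 input bytes (or part) become 4 characters."""
    return 4 * ((raw_size + 2) // 3)


def measure(raw: bytes) -> dict[str, int]:
    """Compare raw, base64, and gzip-over-base64 sizes for one payload."""
    encoded = base64.b64encode(raw)
    return {
        "raw": len(raw),
        "base64": len(encoded),
        "base64_gzip": len(gzip.compress(encoded)),
    }


if __name__ == "__main__":
    # Random bytes stand in for an already-compressed image such as a JPEG.
    print(measure(os.urandom(30_000)))
```

On incompressible input the compressed base64 should land close to the raw size rather than below it, since the compressor can only undo the encoding overhead.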

                    That said, it would be fairly easy for most generators to provide a ‘download this page’ link that embedded every resource in a data URL. Or you could use a format like ePub, which is HTML in a zip file with all of the resources embedded.

                2. 7

                  Moving back to a print-like publication-format in PDF has good reasons and is interesting, but goes too far in my opinion.

                  I think the best approach is to make a static website with very light JavaScript on top (no AJAX). It’s also very well-archivable and easier to access than a PDF, which still involves a file-download and is impractical for content that changes often.

                  1. 13

                    I was interpreting it as heavy-handed sarcasm. If they’re really serious about switching to PDF, I guess I’ll just have to file it under the growing Hairshirt Computing movement. Is there a good PDF reader for CP/M?

                    1. 2

                      I think the best approach is to make a static website with very light JavaScript on top (no AJAX). It’s also very well-archivable and easier to access than a PDF, which still involves a file-download and is impractical for content that changes often.

                      for example, this page seems to archive pretty well. file > save creates an HTML file and a sidecar directory with images and CSS/script files. you could inline everything into the HTML if you really wanted everything to be one file, at the expense of load time.

                      1. 4

                        you could inline everything into the HTML if you really wanted everything to be one file, at the expense of load time.

                        It’s sad that browsers don’t do that by default. Having to carry around a .htm file and a folder of assets is a bit cumbersome.

                        1. 5

                          Safari’s had the “web archive” file format forever. Dunno why other browsers don’t do something similar; just create a zip archive of HTML + assets.

                          1. 5

                            The concept is much older than Safari: https://en.wikipedia.org/wiki/MHTML

                            I think we ended up here because people want to be able to pull individual assets out of downloaded web pages, and a zip file makes a lot of sense for supporting a single archive with simple extraction of pieces.
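To make the "zip archive of HTML + assets" idea concrete, here is a minimal sketch using Python's standard `zipfile` module. The layout (`index.html` plus an `assets/` directory) and the function names are assumptions for illustration; this is not the Safari web archive format, MHTML, or EPUB.

```python
import io
import zipfile


def bundle_page(html: str, assets: dict[str, bytes]) -> bytes:
    """Pack a page and its assets into one in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.html", html)
        for name, data in assets.items():
            # Hypothetical layout: every asset lives under assets/.
            zf.writestr(f"assets/{name}", data)
    return buf.getvalue()


def extract_asset(archive: bytes, name: str) -> bytes:
    """Pull a single asset back out without unpacking the rest."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(f"assets/{name}")


if __name__ == "__main__":
    archive = bundle_page("<p>hello</p>", {"style.css": b"p{color:red}"})
    print(extract_asset(archive, "style.css"))
```

Because each asset is a separate archive member, extracting one piece does not require parsing the HTML, which is the convenience this comment points to.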

                        2. 2

                          you could inline everything into the HTML if you really wanted everything to be one file, at the expense of load time.

                          I suspect it would improve load time, or at least not be noticeable: you remove a few round trips, at the cost of adding a couple of packets of data that can’t be cached. 1k of CSS fits comfortably into 1 packet, and allows for a lot.

                          1. 2

                            Inlining large images may add considerable load time because Base64 decoding is slower than reading binary files.

                            Still, I believe the simplicity of single-file archives loadable in any browser justifies the load time increase and removes the need for custom “web archive” formats.

                      2. 6

                        Interesting observation: some people (myself included) believe that the best way to produce PDFs with complex layouts is by authoring an HTML file and asking the browser to convert that to PDF.

                        HTML+CSS is an ok language for describing what the stuff should look like, and browsers are great layout and rendering engines. PDF does seem like a more future-proof way to store and distribute the end result.

                        See https://github.com/asciidoctor/asciidoctor/issues/2972#issuecomment-441473190 for context.

                        1. 4

                          Its message ignored for a moment, this is an excellent webpage. Its content is engaging and in-depth, and the addition of metadata like a hash and the random statistics at the bottom are good old fun, without getting in the way of the functioning of the site or requiring a good graphics card (like fancy Javascript animations might).

                          1. 4

                            Yes, let’s go back to pretending to be paper, that’s a great idea. Not only that, but using a format everybody hates, is definitely not actually open for practical purposes, and actually has very, very few (if any) fully-functional OSS readers or writers.

                            1. 3

                              “But how can I add legitimate interactive features to my site, like user comments?!”

                              I can answer that : let news aggregation websites act as your comment section. You can link to lobsters, hackernews, reddit, etc in your PDF, and avoid the hassle of moderating a comment section.

                              1. 1

                                I have seen a few people have a per post email address like comment-ae127ty@example.com which I assume goes into an automated backend for SPAM detection and moderation before making its way to the webpage during the next static generation.

                                I have mulled over doing similar for my blog.

                              2. 3

                                Yes, great idea to have a 1MiB file to download so that I can read that website later. Much later.

                                Seriously I hope this is just a kind of humor that I don’t get here.

                                1. 3

                                  You saw that it includes a 1.44 MB floppy disk image, right?

                                  I’m not sure if that’s humor in itself, but in all seriousness, a standalone file containing a description of a coding concept with an attached code sample actually makes sense.

                                2. 3

                                  I was browsing the website for a local business recently, and it was just a PDF uploaded somewhere. It struck me as sad that for many people it’s still much easier to create a document in Word or whatever, export it to PDF, and upload that than to create a proper website. We need better tools. Dreamweaver didn’t work because it was very brittle and you couldn’t extend the documents it created, but maybe with modern CSS (flexbox, grid), it would be possible to do a better job of making something that works for ordinary users.

                                  1. 2

                                    I guess that is the gap that products such as Wix are supposed to fill. However, for the average business owner even that is too much to use; they just want to be running their business, not learning how to create a website. Thus the job of “Wix Developer” was spawned.

                                    1. 1

                                      I’d be happier even if it was made in Word and exported to HTML. I wonder if it is that they are worried about printed layout looking right or being usable by their third party print shop (a lot of the organizations that do this are actually primarily concerned about printing it for various reasons with the website being an afterthought) or if they don’t even know that option is possible.

                                      1. 1

                                        I think the particular example I was thinking of was made primarily to be printed, and only secondarily uploaded to the web.

                                    2. 3

                                      I will say that websites should be pleasant to read offline, for the benefit of people who are traveling or live somewhere with poor/inconsistent internet.

                                      There are many approaches to making web pages accessible when offline:

                                      • a print stylesheet, for people who want a hard copy
                                      • RSS/Atom to be consumed in a feed reader
                                      • some progressive web app (I still don’t fully understand these)
                                      • an ebook format, like EPUB, or yes, PDF

                                      I’m not sure how much I agree with this article, but it’s not unreasonable to publish your website as a PDF as well as a regular website.

                                      1. 2

                                        I use a feed reader that can save articles locally. Works great.

                                      2. 3

                                        There’s definitely a fundamental tension between web pages as documents (which should last for a very long time and be immune to bitrot) versus web pages being deliverable, dynamic applications. The latter deserves churn and the former deserves stability. Not sure how to reconcile but I think JSON APIs are ironically a step in the right direction, because the application just becomes a view for http-gettable content. Maybe we can get people to start storing xml documents again?

                                        1. 1

                                          The peak of that tension is that if I send someone a link to a site I intend to share as a static document, they might get the webpage, or an application, or something different entirely — depending on a number of hidden variables. The conversation goes on, but we might not have seen even remotely the same thing. While a surefire recipe for confusion and misunderstanding, the consequences from that can run deep and shatter relationships; verification costs time and is not exactly a natural thing in informal communication.

                                        2. 3

                                          I thought we were for urbanization. More public transport, more efficient use of land etc. Fads change too quickly these days.

                                          1. 3

                                            I saw the “pdf” tag and was simply going to skip it. pdfs are so hard to read that I rarely find them worth the bother, but the comments made me curious. Of course it is really hard to use, but the one bit that amused me is on page 7: it says pdf has feature creep but you can just ignore it, but then on the very same page, says you can’t just ignore html’s feature creep. OK.

                                            But that does bring me to a nice thing: I do legitimately like being able to say “on page 7” (though I’d often still say “halfway down page 7”). In html documents, sometimes I’ll say things like “about 3/4 down the page, under the ‘backwards to the future’ header”, but it isn’t quite as nice. Ironically I’d probably paper (lol) this over with some javascript on the web, making a numeric scroll position indicator.

                                            I also agree with the immutable aspect, to some extent. What I like to do is indeed avoid edits, and if I do make one, clearly mark when and where an edit was made. I also agree all documents should have a date on them. It is often frustrating to me when it is hard to figure out when something was written - the date is part of the context of the article.

                                            1. 4

                                              There was a mini-trend of “purple hashes” on blogs ca. 2005 where each paragraph had an auto-generated HTML anchor so you could easily link to a section on a paragraph basis. Like so many other early aughts blogging trends it was way too twee to catch on, but I’ve since then become friends with the <a name> tag and kind of miss it.

                                              When I need to reference a section on another page unambiguously I cite it verbatim - it both gives a context to my comment, as well as making it relatively easy for someone else to search the page for the text. Copying and pasting from PDFs is fraught, though.

                                              1. 3

                                                IIRC, browsers will treat any id attribute like <a name>; kill two birds (CSS and anchors) at once. Chrome has an extension to link directly to unmarked text too.
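For anyone curious what per-paragraph anchors might look like in a static build step, here is a minimal sketch that gives each paragraph an id attribute and a visible link to itself. Deriving the id from a hash of the paragraph text is an assumption made for this example, not a description of how the 2005-era tools worked, and the regex only handles bare `<p>` tags; a real generator would use a proper HTML parser.

```python
import hashlib
import re

# Matches only attribute-free <p>...</p> pairs (a simplification).
_P_TAG = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def add_paragraph_anchors(html: str) -> str:
    """Give every bare <p> an id and a self-link so it can be cited directly."""

    def repl(match: re.Match) -> str:
        text = match.group(1)
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
        anchor = f"p-{digest}"
        return f'<p id="{anchor}">{text} <a href="#{anchor}">#</a></p>'

    return _P_TAG.sub(repl, html)


if __name__ == "__main__":
    print(add_paragraph_anchors("<p>First point.</p><p>Second point.</p>"))
```

A content-derived id stays stable when paragraphs are reordered, but it changes whenever the paragraph text is edited, so links into edited paragraphs would break.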

                                            2. 2

                                              I think that ideas like this and Gemini are fine, and cool, but as soon as the author claims that this is the right direction to go for ALL web publishing, I get extremely skeptical.

                                              I think the theory behind these ideas, the problems with the web of today, the problems with web browsers, etc, is all very valid WITHIN ITS NICHE of blogging and self-publishing and I agree with it WHEN APPLIED TO THAT NICHE. However, I don’t think that’s the main value proposition of the internet. It’s an analysis of the current situation that only looks at the consumption side, and mostly only looks at the negative side effects of the consumption side.

                                              But there is more stuff going on than that. There is also the production side: Web sites and web applications that people use because they make our jobs easier, not because we want to read or watch something for fun. We can’t forget that the reason the web exploded like it did was not just because “consumers wanna consume”, it was also because it enabled things that previously seemed like economic miracles. Zero-congestion warehousing. Knowing things in advance before they happen. Getting a text on your phone when your package gets delivered. Increased safety for folks in traditionally dangerous occupations. Generally increased automation. Remote collaboration. The ability for our economy to survive a deadly and extremely infectious pandemic. The list goes on and on and on.

                                              We are not just consumers. There is more going on in the world than that. Web applications are economic miracle workers, and not just when they have business investment behind them. I think that as generations Y & Z age, they are gaining invaluable wisdom about the internet that their parents lacked… I think they might seek out healthier relationships with technology, personal and community ownership over technology, etc, both out of necessity and out of desire for the comfort of privacy and safety. And when they do, I doubt they will want to give up on web applications along the way.

                                              1. 1

                                                Thanks for this comment, it’s articulated a lot of what bugs me about the doom-and-gloom crowd who equate all Javascript with surveillance capitalism.

                                              2. 2

                                                Based on comments from the orange site, there’s mixed reception over this move. While moving to a PDF format isn’t necessarily freedom-friendly, or even filesize-friendly, I think this action depends highly on your audience and who you write for. You lose some traits of the web like an open publishing format with HTML, but if your audience enjoys the format of PDF, maybe it’s not the worst thing in the world.

                                                I think it’s a bold move to try and I’m wishing the best for the author(s). The web is vastly different compared to only a few years ago, but if this is what they wish to go for, then it is their choice and we can only wish them the best.

                                                1. 2

                                                  A related work by the same folks that I think builds on this issue: https://lab6.com/1

                                                  In 1, they show that plaintext and even mp3 formats can be included in a pdf file in a way that allows the file to be read by a plaintext reading program, a pdf reading program, and an mp3 playback program! I think that alleviates some of the concerns I’m seeing in comments here about the accessibility concerns of pdfs.

                                                  1. 1

                                                    Nothing prevents you from adding a link to a pdf render of your page if you genuinely want to do that. Or like, an epub, too. If you give me articles in epub format that i can read on my kobo I’ll be happy. PDF handling on the kobo is meh.