1. 16

I am curious to know what the general opinions, directions and best practices are for server-side PDF generation. I want to break out my app’s client-side PDF export feature, and I was looking to spin up a service that renders a JSON payload to PDF.

Is there a language/stack you would suggest, or that you have found particularly fit for this task?

  • Node/PDFKit?
  • PHP?
  • Clojure/PDFBox?
  • pandoc?

I am even planning to do a spike testing Handlebars to render LaTeX and then tex2pdf, but perhaps this is a good scenario for AWS Lambda.

What do you think?

  1.  

  2. 5

    I would pick one from these two options for server-side pdf generation:

    • using pdflatex (it comes with the LaTeX distribution called TeX Live; on Ubuntu, in particular the texlive-base package)

    • using pandoc to convert a markup document to PDF (by default pandoc also goes through pdflatex for this)

    If you go with the first option, you may need to have some templates ready and fill them with data.

    The second option is more convenient since writing markup (for example markdown) is easier.
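As a sketch of the first option (a template filled with data), here is a Python version using only the standard library. The choice of placeholder delimiter matters: `$` (the default for `string.Template`), `{}`, `#` and `%` all mean something to TeX, so this assumes the marker `@@` never otherwise occurs in the template. Every value from the JSON payload is escaped so that characters such as `&` or `_` are typeset literally. `TexTemplate`, `latex_escape`, `render_invoice` and the invoice template are illustrative names, not part of any library; the resulting string would then be compiled with pdflatex.

```python
import json
from string import Template

# Characters LaTeX treats specially, mapped to replacements that print them literally.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(value) -> str:
    """Escape one data value so LaTeX typesets it as plain text."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(value))


class TexTemplate(Template):
    # The default "$" delimiter would collide with TeX math mode,
    # so use a marker assumed not to appear in the template's own TeX.
    delimiter = "@@"


# Hypothetical invoice template; note that "$\times$" passes through untouched.
INVOICE_TEX = TexTemplate(r"""\documentclass{article}
\begin{document}
Invoice for @@customer\\
@@qty $\times$ @@item
\end{document}
""")


def render_invoice(payload_json: str) -> str:
    """Turn a JSON payload into LaTeX source, escaping every value."""
    data = json.loads(payload_json)
    fields = {key: latex_escape(value) for key, value in data.items()}
    # substitute() raises KeyError for a missing placeholder instead of
    # silently emitting a half-filled document.
    return INVOICE_TEX.substitute(fields)
```

For example, `render_invoice('{"customer": "Smith & Sons", "qty": 3, "item": "widget"}')` yields a document containing `Invoice for Smith \& Sons`.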

    1. 4

      ++ for pdflatex! TeX isn’t the easiest thing to get into (and some templating engines don’t work well with it), but the results are consistent and the layout options are vast!

      1. 1

        Yes, this is one of the options I was seriously considering. It seems a reasonable effort, especially if starting from a Dockerized TeX Live. Since you mentioned it, what sort of issues did you encounter with the templating engines?

        1. 2

          It was with Jinja2: I was trying to Ansible-ize some invoice generation, and Ansible was puking when processing the .tex file.

          Further in the past, I used Perl to build the .tex files dynamically (no templates).

    2. 5

      (t/g)roff also lends itself to PDF rendering and is forgotten most of the time. For simple tasks you don’t need to know much about it to create a PDF of a few pages.

      Using -mpdfmark you can have all the fancy PDF features, too.
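Because troff input is plain text, generating it from data is straightforward; the main care point is escaping. A sketch in Python, assuming a groff recent enough to have the `-Tpdf` output device and using the ms macro package; `troff_escape`, `ms_document` and `render_pdf` are hypothetical names, and `-mpdfmark` is not used here.

```python
import subprocess


def troff_escape(line: str) -> str:
    """Make one line of user text safe to use as troff input text."""
    # Backslash is troff's escape character; \e prints it literally.
    line = line.replace("\\", r"\e")
    # A line starting with "." or "'" would be read as a request or macro call;
    # the zero-width \& in front keeps it as plain text.
    if line.startswith((".", "'")):
        line = r"\&" + line
    return line


def ms_document(title: str, paragraphs: list[str]) -> str:
    """Build a minimal document for groff's ms macro package."""
    lines = [".TL", troff_escape(title)]
    for para in paragraphs:
        lines.append(".PP")
        lines.extend(troff_escape(text) for text in para.splitlines())
    return "\n".join(lines) + "\n"


def render_pdf(source: str) -> bytes:
    """Pipe the source through groff's PDF output device and return the PDF."""
    # Assumes ASCII input; non-ASCII text may need groff's -k (preconv) option.
    result = subprocess.run(
        ["groff", "-ms", "-Tpdf"],
        input=source.encode("ascii"),
        capture_output=True,
        check=True,
    )
    return result.stdout
```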

      1. 2

        After looking at this thesis, I have been convinced that troff does a great job!

        1. 2

          Troff’s default fonts are not ideal and put many people off, I guess. With a nice font, modern troffs can produce really nice output.

          I really like using heirloom-doctools as my troff (https://github.com/n-t-roff/heirloom-doctools), together with EB Garamond (http://www.georgduffner.at/ebgaramond/) as the font.

          1. 2

            Thank you for pointing me to this implementation. I will start using it for personal projects.

            PS: this Garamond looks good.

            1. 2

              You’re welcome :) I’ve also written/customized some macros for use with Heirloom troff which, though not perfect, may be useful for you: https://github.com/rbns/vortrofflich

      2. 3

        I generate reports including some text, tables with figures, charts, and maps/pictures.

        I used ConTeXt (an alternative to LaTeX) for a long time.

        But more recently, I wanted to generate reports based on HTML, in order to reuse code from the web application.

        I used WeasyPrint to convert the HTML to PDF. It’s a really great project, but I ran into performance issues (it used too much memory), so I switched to wkhtmltopdf.

        I produce the HTML with the usual templating engine, and I launch WeasyPrint or wkhtmltopdf as a subprocess, streaming the HTML to its stdin (no temporary file needed).
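A minimal sketch of the stdin-streaming approach with Python’s subprocess module. It assumes both CLIs accept `-` for stdin and stdout (wkhtmltopdf and WeasyPrint document this, but check the version you install); `html_to_pdf` is a hypothetical helper. Relative URLs for stylesheets and images may not resolve when the HTML arrives on stdin, so absolute URLs or a base-URL option may be needed.

```python
import subprocess
from typing import Sequence

# Assumed command lines; "-" means stdin for the input and stdout for the output.
WKHTMLTOPDF = ("wkhtmltopdf", "--quiet", "-", "-")
WEASYPRINT = ("weasyprint", "-", "-")


def html_to_pdf(html: str, command: Sequence[str] = WKHTMLTOPDF, timeout: float = 60.0) -> bytes:
    """Stream HTML into a converter's stdin and return the bytes it writes to stdout."""
    result = subprocess.run(
        list(command),
        input=html.encode("utf-8"),
        capture_output=True,
        timeout=timeout,
    )
    # Fail loudly rather than returning an empty or partial document.
    if result.returncode != 0:
        stderr = result.stderr.decode("utf-8", errors="replace")
        raise RuntimeError(f"{command[0]} failed ({result.returncode}): {stderr}")
    return result.stdout
```

Switching engines is then a matter of passing a different command, e.g. `html_to_pdf(rendered_html, WEASYPRINT)`.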

        It works quite well.

        WeasyPrint handles page breaks really well; wkhtmltopdf is a lot weaker in this regard. I hope to find something more lightweight and reliable in the future.

        1. 3

          I would recommend libharu (C) or itextpdf (Java/C#). Both have excellent CJK support.

          I have some example itextpdf RESTful code here: https://github.com/konomiya/invoice_generator

          1. 2

            It really depends on your needs and team experience.

            How high-fidelity are the designs? Are there fancy gradients, dynamic graphs, transparency? What technologies are your team already familiar with? What is the SLA? How many PDFs are being rendered per day?

            Do you want to manage/monitor another service, or is something like DocRaptor a better fit?

            For example, for my current side project the PDF export plan is literally just ‘Save as PDF’ from Chrome -> email the result to the client. I’m really hoping it’s successful enough that this becomes a scaling pain point.

            1. 2

              I use PDF::Writer/Prawn in a few Rails projects to generate data- and table-heavy PDF reports (the customers still like printing things out). While it does give good control over the PDF, generating documents with it is very tedious, and so is making changes later. For future projects I’d probably switch to something like pandoc that can just convert an HTML template to PDF.

              1. 2

                I’ve got two sites using Python and ReportLab. It gets the job done, but certain layout tasks are particularly painful. Where performance isn’t much of an issue and you’re not super picky about the output, wkhtmltopdf and its associated libraries are super easy to get up and running with.

                Lambda does seem like it’d be particularly well suited for this.

                1. 2

                  Why not? Considering the type of task, the process seems a good candidate for going the “serverless” way. It’s a simple 1:1 pipeline; am I missing something?

                  1. 1

                    I think you inserted a “not” into my last sentence (I probably could’ve found a better way to word it).

                    I would absolutely give Lambda a shot for this.
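Since the 1:1 JSON-in, PDF-out shape fits Lambda well, here is a minimal sketch of a Python handler. It assumes the function sits behind an API Gateway proxy integration (request body arrives as a JSON string, binary responses are returned base64-encoded with `isBase64Encoded`); depending on the API type, binary media types may also need enabling. `make_handler` and the injected `render_pdf` callable are hypothetical. The converter itself (pdflatex, wkhtmltopdf, etc.) is not part of the Lambda runtime and would have to be bundled, e.g. in a container image or layer.

```python
import base64
import json
from typing import Any, Callable


def make_handler(render_pdf: Callable[[dict], bytes]) -> Callable[[dict, Any], dict]:
    """Wrap a JSON -> PDF renderer as a Lambda handler for proxy-style events."""

    def handler(event: dict, context: Any) -> dict:
        try:
            body = event.get("body") or "{}"
            if event.get("isBase64Encoded"):
                body = base64.b64decode(body).decode("utf-8")
            payload = json.loads(body)
        except ValueError as exc:  # covers bad JSON, bad base64 and bad UTF-8
            return {
                "statusCode": 400,
                "headers": {"Content-Type": "text/plain"},
                "body": f"bad payload: {exc}",
            }
        pdf = render_pdf(payload)
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/pdf"},
            # Binary bodies are returned base64-encoded in proxy responses.
            "body": base64.b64encode(pdf).decode("ascii"),
            "isBase64Encoded": True,
        }

    return handler
```

Keeping the renderer injectable means the same handler can be tested locally with a fake renderer before any TeX or HTML engine is packaged.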

                2. 2

                  Like a few others who have chimed in, I’d recommend pdflatex. We use it at my company as well. One lesson we’ve learned, and a bit of an embarrassing one, is to make sure to check the return codes, output files, etc. from pdflatex (or whatever you end up choosing). A failing template->PDF generation step had me spend a few hours raging at the PHP template engine we use. Suffice it to say, once I get a free moment, all the PHP will be removed and replaced with, well, anything else.
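Following the advice to check return codes and output files, here is a sketch of a pdflatex wrapper that refuses to return anything unless the exit code is zero and the expected file actually starts with a PDF header. It assumes pdflatex is on PATH; `-interaction=nonstopmode` and `-halt-on-error` are standard TeX engine options, while `compile_latex` and `looks_like_pdf` are made-up names. Only a single pass is shown; documents with cross-references or a table of contents need more runs.

```python
import subprocess
import tempfile
from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Cheap sanity check: the file exists and starts with a PDF header."""
    try:
        with path.open("rb") as fh:
            return fh.read(5) == b"%PDF-"
    except FileNotFoundError:
        return False


def compile_latex(tex_source: str, engine: str = "pdflatex", timeout: float = 120.0) -> bytes:
    """Compile LaTeX in a scratch directory and raise on any sign of failure."""
    with tempfile.TemporaryDirectory() as workdir:
        tex_file = Path(workdir) / "doc.tex"
        tex_file.write_text(tex_source, encoding="utf-8")
        # nonstopmode: never wait for terminal input; halt-on-error: stop at the first error.
        result = subprocess.run(
            [engine, "-interaction=nonstopmode", "-halt-on-error", tex_file.name],
            cwd=workdir,
            capture_output=True,
            timeout=timeout,
        )
        pdf_file = tex_file.with_suffix(".pdf")
        # Check both the exit status and the file that is supposed to exist.
        if result.returncode != 0 or not looks_like_pdf(pdf_file):
            log_file = tex_file.with_suffix(".log")
            if log_file.exists():
                details = log_file.read_text(encoding="utf-8", errors="replace")
            else:
                details = result.stdout.decode("utf-8", errors="replace")
            # The tail of the log usually holds the actual error message.
            raise RuntimeError(f"{engine} failed (exit {result.returncode}):\n{details[-2000:]}")
        return pdf_file.read_bytes()
```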

                  1. 2

                    Save yourself a ton of time by buying this: http://www.princexml.com

                    It takes HTML and CSS and poops out a PDF. It has some CSS for managing page breaks and such.

                    It’s a static binary you plop on a server.

                    1. 1

                      Irrelevant but neat: it’s written in http://www.mercurylang.org

                      1. 1

                        Seconded.

                        We used this at my last gig and it was quite nice. We wanted our output to look almost identical to our previous pipeline’s output made out of {REDACTED} and {EXPLETIVE} {REDACTED}. Support for the CSS paged media module was pretty good; standards-based page handling means it’s easier to research how things are supposed to work.

                        The HTML files we generated for PDF conversion were actually built using JS and the DOM APIs; it was a productive, though quirky, use of headless browsers.

                        I forget which alternative tool we were looking at; I think we skipped it partly because it was licensed on a per-concurrent-process or per-core basis. Also, I think Prince was started by at least one of the major authors of CSS.
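For the HTML route, here is a sketch of turning JSON-style records into a printable HTML document that carries CSS Paged Media rules, using only Python’s standard library. The `@page` margin box with page counters is the kind of paged-media CSS Prince and WeasyPrint support; wkhtmltopdf’s support for it is more limited, so treat the stylesheet as an assumption to check against whichever converter you use. `report_html` and `PRINT_CSS` are hypothetical names.

```python
import html

# Paged-media CSS: page size, margins, a page-number footer, and break control.
PRINT_CSS = """
@page {
  size: A4;
  margin: 2cm;
  @bottom-center { content: "Page " counter(page) " of " counter(pages); }
}
table { width: 100%; border-collapse: collapse; }
tr { break-inside: avoid; }
"""


def report_html(title: str, rows: list[dict]) -> str:
    """Render a title and a list of records as a printable HTML table."""
    columns = list(rows[0]) if rows else []
    head = "".join(f"<th>{html.escape(str(col))}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row.get(col, '')))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html><html><head><meta charset=\"utf-8\">"
        f"<title>{safe_title}</title><style>{PRINT_CSS}</style></head>"
        f"<body><h1>{safe_title}</h1>"
        f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
        "</body></html>"
    )
```

The returned string can then be handed to Prince or another HTML-to-PDF converter; escaping every value keeps stray `<` or `&` characters in the data from breaking the markup.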

                      2. 2

                        I’ve used HTML/CSS and wkhtmltopdf successfully on a couple of projects. HTML/CSS isn’t the nicest typesetting language, but it is pretty easy, you can easily reuse assets from your website, and there are plenty of HTML/CSS people on the market …