Bookdown is Neat and Automation is Hard
Anyone and their hamster is writing {bookdown} books these days, and that’s arguably a good thing – because as long as everything renders nicely, it’s a pretty easy way to get some knowledge out there.
But then there’s the case where stuff doesn’t render nicely, and that’s where the fun ends and the “learning about stuff you didn’t know was relevant or come to think of it was particularly interesting to begin with is a required step in the process of making things happen that you kinda want to happen” game starts. Or, as I like to call it, “the ol’ lasydkwrocttoiwpitbwiarsitpomthtywth” 1.
So now that I more or less successfully switched our R-Intro for psychology undergrads (German) over from “auto-built on our server mostly” to “auto-built on Travis CI”, I thought it might be a good time to consolidate some of the things I’ve learned along the way as someone not terribly familiar with travis outside of R-package testing.
If you’re not at all familiar with travis (or the concept of CI), then you might want to brush up on the basics before you continue. For everything else, I’m going to assume that you’re at least kind of familiar with with git / GitHub (if not, this is your go-to reference) and have dabbled in bookdown already.
But why though?
The first question one might ask is: Why even bother doing the travis-dance when you can just render your bookdown project locally in your bazillion formats? Then either directly upload that somewhere or commit the whole output to version control and let GitHub Pages or netlify pick it up from there.
That’s certainly a possibility, but it’s also prone to some reproducibility issues.
For example, to render our R-Intro, we were relying (directly or indirectly) on R packages with specific system dependencies – on Ubuntu systems, they needed to be installed via apt
. Wile I was starting up the project on macOS, I wasn’t aware of these dependencies because, well, it worked for me. On my machine. Locally.
That might be all fine and dandy and good enough, but as soon as you’re collaborating with other people, presumably on different operating systems, the dependency hell starts to inch closer and you’re beginning to feel the pain of the “it works because I did some stuff here, dunno”-workflow.
Building your project via services like travis has the neat effect of forcing you to think about what is required to make you project reproducible – that principlie applies to R-packages the same as it does to things we don’t usually think about having “dependencies” in this sense: Books, for example. The idea is: If you can make it work on travis, you can probably make it work on other people’s machines, too.
Besides that, building stuff on travis (and deploying from there) also means that you don’t run the risk of making some small changes to your book and then forget to render/upload, or making a change and not realizing that it introcuded an issue that breaks your book.
The workflow make changes -> git commit -> git push to GitHub -> travis -> deploy target
ensures that you don’t have to worry about anything past the point where you modify the content of your book. Well, at least as long as all the other bits of the pipeline are set up approrpiately.
Build & deploy to GitHub Pages
Probably the easiest solution as you only need a GitHub account and token.
The GH pages deplyoment is documented here, and you’ll at least need to get a personal access token from GitHub (in R, you can let usethis::browse_github_token()
take you where you need to go) and add it to your project on travis as an environment variable (perferrably named GITHUB_PAT
), and make sure it is available to all branches and won’t display in the build log.
Now, to have travis build your project and push to a gh-pages
branch, you’ll first have to create that branch. These bits are explained in the bookdown book already.
You’ll also need a dummy DESCRIPTION
file at the root of your project, just because that’s the key component of an R package which travis needs to figure out which R packages need to be installed before anything else can happen. It should look something like this:
|
|
Most fields don’t matter, except Imports:
(definitely) and Remotes:
(probably?).
Maintaining such a file for your bookdown project is also a good way to keep track of the packages you use along the way.
Your .travis.yml
might look something like this (for the first 3 lines, refer to the travis docs for R projects):
|
|
The steps are executen in order of appearance. As there’s only 3, and they’re named before_script
, script
and deploy
, they’re probably pretty self-explanatory.
You might stumble over this part though:
|
|
and it’s kind of ugly. What this does is “if there’s not an executable at $HOME/bin/phantomjs
, execute this Rscript
command to install it using the webshot
package”. In general, phantomjs
is required if you’re using some kind of htmlwidget in your book, but also render non-interactive output like PDF or epub – it basically makes a snapshot of your widget and pastes the image into your book.
The first bit of the command is a bash test expression, checking if there’s an executable. Read up on tests/conditions/logic in bash if you’re interested, or accept that this is a thing that we did 2.
Anyway – why did we do this again? Well, we told travis to cache $HOME/bin
for us, where phantomjs
gets installed to, and we don’t want to download and install phantomjs
every time we trigger a build, so this seems like a neat way to solve that issue.
Anyway, now travis is building your book in two formats, sequentially, and deploys it to GitHub Pages and everything is awesome and you’ll never have to worry about anything ev– aaaand it broke.
But fonts though
The thing about simple setups is that they only work for simple projects.
For me, things start breaking around the topic of fonts a lot. Maybe because I like Fira Sans in my ggplot2 themes, or maybe because I like TeX Gyre Pagella in my PDFs.
After diving into the old google rabbit hole of terrible font things, I ended up with this:
|
|
This adds a script to the build process using carefully indented YAML
that you need to be careful about, otherwise it’s not recognized by travis and there won’t be a build at all.
Anyway, what I did here was, as the comments suggest:
- Install TinyTex: Maybe I didn’t need to, but as far as TeX distributions go, with TinyTex I at least somewhat know how it works.
- Create a
.fonts
directory where, well, fonts go. Usemkdir -p
to only create it if it doesn’t exist already without throwing an error – we also cache that folder. - Manually download fonts as
.zip
files and extract them to$HOME/.fonts/
– checking if the zip already exists in the cache. - Do stuff with
texlive-fontconfig.conf
– I don’t know man. It ended up (probably) solving the issue that XeLaTeX didn’t find “TeX Gyre Pagella
” because the font name contained spaces. I think. Honestly at this point I’m too tired of this to do more testing. fc-cache -fv
makes the new fonts in.fonts
available to the system.fc-list
is only there for debugging purposes so the lst of currently available fonts gets printed in the travis build log.
And… it works. Neat.
This makes TeX Gyre Pagella
and TeX Gyre Heros
available to XeLaTeX, and adds Fira fonts for both XeLaTeX (where I use Fira Mono
as a monofont
) and ggplot2 plots, where I use Fira Sans
via the aforementioned theme/package.
Honestly, this should have been a script called make-ze-font-stuff-be-good.sh
to keep .travis.yml
more readable, but I actually prefer to deal with as much as possible in the travis config so it’s easier to copy “the travis bits” to a new project than when you have to remember/copy multiple scripts.
I think.
I might change my mind later 3.
Regarding the Fira fonts, I should mention that I had to install the firasans
package manually from GitHub as it’s not on CRAN, and for some reason the Remotes:
field in DESCRIPTION
wasn’t enough to convince travis to install it for me:
|
|
I also use Asana Math as mathfont
, but you can just get it as a regular ol’ apt
package.
|
|
So, at this point you should have working travis deployment to GitHub Pages while also using custom fonts in your PDF output.
Nice.
Speeding things up maybe
One thing that still annoyed me about this setup was the sequential rendering of the different output formats:
|
|
What if you also added bookdown::epub_book
or bookdown::tufte_book2
or whatever you feel like? That would stretch the build time quite a bit.
Building multiple formats in parallel on your local machine might not be feasible, because the same intermediate files might be generated and/or deleted simultaneously, but on travis we don’t have to care about this. One of the neatest features of travis is that we can spin up 3 different jobs running on 3 different virtual machines doing 3 things totally isolated from each other.
So what I tried was this:
|
|
Travis will start the parallel jobs for each value of the environment variable $BOOKDOWN_FORMAT
, and we can access the value of that variable in other sections of the config, like the script
bit where the fun happens. Here we’re throwing render_book
at whatever format is defined in $BOOKDOWN_FORMAT
. I suggest looking up bash / shell environment variables if you’re not familiar with the concept.
The problem with this is, well, kind of a biggie: Each of the three jobs deploys to GH Pages. Each of these jobs overwrites the output of the other job. At the end of the build, your gh-pages
branch will only contain one of the three formats.
That’s kind of a bummer.
And this is what brings us to server-deployment.
Deploy stuff to your own server
Servers are relatively cheap, at least the “just webhosting”-level VPS. Ours is hosted on Scaleway and does the job. It already hosts all of our sites, and I never really intended for our R-Intro to be hosted on GitHub anyway, so it was inevitable to go the ssh-deployment-route anyway.
To get started, I followed (and suggest following) this blogpost.
At the end of the day, it was easier to set up than I anticipated. Creating a keypair is simple, and encrypting the private key via the travis
command line tool isn’t exactly rocket science either.
On the server side, I created a dedicated travis
user, added the public key to its authorized_keys
4, and made sure the server directory where the output should go is writable by the travis
user.
The config bits are as follows:
|
|
The whole shebang is located here.
Conclusion
Using this approach we can have travis render multiple bookdown output formats simultaneously, and after each is finished, rsync
the output to our server – where each format lives in the same folder without deleting the other output.
And as long as you don’t leak credentials, this is also probably kind of maybe secure™!
…and figuring all that out and making it (kind of) work only cost me a weekend.
What a steal.
-
still sucking at acronyms ↩︎
-
It should also be noted that this command is set up so that the overall exit status is
0
in either case, because otherwise travis would consider the build failed, which is kind of annoying. ↩︎ -
2020-03-11: I did. My most recent overhaul of a similiar setup calls a script to download the Adobe Source Pro fonts because it was starting to get really messy. ↩︎
-
if you’re doing this manually, make sure the file has
0600
permissions because you won’t be able to login if the file is readable by other users ↩︎