How to Hugo-fy Your Old Flat HTML Site

This site (http://otisbean.com/) had the same very basic look and feel from ~2002 to 2019. Its header and nav were defined by Server Side Includes, and the styling was a hundred-line CSS file. All the HTML was hand-coded in Emacs. About as old-skool as you can get.

It looked like this:

[Screenshot of the old website]

At the start of 2020, after years of neglect, I wanted to add a nice photo gallery to show off the results of my new hobby. In typical geek fashion, instead of just adding a gallery I turned a small change into a giant project. The time had come to rebuild my site. A few months earlier I had read up on “static site generators” while helping a friend with his site, and Hugo seemed like a great tool for managing a simple site like mine. Writing Markdown is a lot easier than hand-rolling HTML, and it’s nice to have theming support. Not having to deal with active server-side code and the attendant security nightmares is a big benefit.

The simplicity of my HTML made it a good candidate for semi-automated conversion. A bit of Googling turned up this very helpful article on converting a static site. That got the basics mostly done, and having header & nav already broken out into SSIs made it a bit easier.

I had fewer than fifty “interesting” pages to convert, which wasn’t too bad. But there were ~550 legacy pages (generated image galleries, ancient content, etc.) that I didn’t want to convert. Worse, I had over 3500 images and another 500 miscellaneous files (some of them large zip files). Hugo bills itself as “The world’s fastest framework”, taking about 1ms per page. This means that “the average site builds in less than a second”, but my site was taking ten seconds to render. That’s still fast, but it’s irritating when the tool’s whole development workflow is built around a one-second turnaround.
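To see what is actually weighing a site down, a quick tally of files by extension can help. This is only a sketch: `count_by_ext` is a hypothetical helper name, and it counts files under whatever directory you point it at, ignoring files with no extension.

```shell
# Sketch: count files under a directory, grouped by extension.
# count_by_ext is a hypothetical helper, not part of Hugo.
count_by_ext() {
    find "$1" -type f |
        # Keep only the extension after the last dot of the file name.
        sed -n 's/.*\.\([A-Za-z0-9]\{1,\}\)$/\1/p' |
        sort | uniq -c | sort -rn
}
```

Running `count_by_ext static` from the site root would list the most common extensions first, which makes it easier to decide what belongs outside the Hugo tree.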

I set about dividing my content into two parts, “dynamic” and “legacy”. In the latter went the thousands of images, hundreds of ancient generated HTML files, and the rest of the stuff that Hugo would never need to care about. The fifty interesting pages went into the actual Hugo site. I ended up with a directory layout like so:

.../website/
         |
         |__ legacy/
         |       |
         |       |__ stuff/
         |       |
         |       |__ more-stuff/
         |
         |__ dynamic/
                 |
                 |__ static/
                 |
                 |__ content/
                 |       |
                 |       |__ important/
                 |       |
                 |       |__ supercool/
                 |
                 |__ layouts/
                 |
                 |__ [more hugo dirs]
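A layout like this could be scaffolded with a few `mkdir -p` calls. The sketch below assumes the placeholder directory names from the tree above (`stuff`, `more-stuff`, `important`, `supercool`); `make_layout` is a hypothetical helper name.

```shell
# Sketch: create the two-part layout shown above.
# Directory names are the placeholders from the tree, not real paths.
make_layout() {
    root=$1
    # Files Hugo never needs to see.
    mkdir -p "$root/legacy/stuff" "$root/legacy/more-stuff"
    # The actual Hugo site.
    mkdir -p "$root/dynamic/static" "$root/dynamic/layouts"
    mkdir -p "$root/dynamic/content/important" "$root/dynamic/content/supercool"
}
```

Calling `make_layout website` creates the skeleton; running `hugo` inside `website/dynamic` then only has to walk the small tree.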

When I’m ready to push changes to production I run hugo --cleanDestinationDir, then my syncing script, which looks kind of like this:

rsync -av --delete --exclude=/important --exclude=/supercool local:website/legacy/ remote:website/

rsync -av --delete --exclude=/stuff --exclude=/more-stuff local:website/dynamic/public/ remote:website/

The first sync copies over legacy files, skipping anything handled by Hugo. The second copies over the Hugo-generated content, skipping the legacy gorp. The reality is more complex, but you get the idea.
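The two passes could be wrapped in a small function. This is a sketch, not the real script: `deploy` is a hypothetical name, the excludes are the placeholder directories from the layout above, and the `RSYNC` variable is an assumption added so the command can be swapped out (for example with `echo`) to preview what would run.

```shell
# Sketch of the two-pass sync. deploy and RSYNC are hypothetical names.
deploy() {
    site=$1   # local checkout, e.g. ~/website
    dest=$2   # rsync destination, e.g. user@host:website
    shift 2   # any remaining arguments go to rsync, e.g. --dry-run

    # Pass 1: legacy files. The excludes keep --delete away from
    # the paths that Hugo owns on the destination.
    "${RSYNC:-rsync}" -a --delete "$@" \
        --exclude=/important --exclude=/supercool \
        "$site/legacy/" "$dest/"

    # Pass 2: Hugo output. The excludes keep --delete away from
    # the legacy paths copied in pass 1.
    "${RSYNC:-rsync}" -a --delete "$@" \
        --exclude=/stuff --exclude=/more-stuff \
        "$site/dynamic/public/" "$dest/"
}
```

A first run might be `deploy ~/website user@host:website --dry-run`. The trailing slashes matter: they make rsync copy the contents of each source directory, so the anchored excludes line up with top-level paths on the destination. Note that any top-level file not covered by an exclude is deleted by one pass and copied back by the other, so the exclude lists need to cover everything the other pass owns.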

Converting the old HTML pages to Markdown looked like it was going to be tedious, but then I found Pandoc, an all-singing, all-format document converter. Not only will it cleanly turn HTML into Markdown, you can even customize the output with a template. The process went like so:

# Move to static dir
$ cd static

# Create parallel directory structure in `content`
$ find . -type d | xargs -IX mkdir -p ../content/X

# Bulk process all HTML files to Markdown
$ find . -name '*.html' | xargs -IFNAME pandoc --template ../../scripts/pandoc-template.txt -s -f html -t commonmark -o ../content/FNAME FNAME

# Back to the root of the site
$ cd ..

# Rename all the created markdown files from *.html to *.md
$ find content/ -name "*.html" -exec bash -c 'mv "$1" "${1%.html}".md' - '{}' \;

# Pandoc occasionally does strange things with the `title` front matter. In my case I had to replace `\!` with `!` in a few files.

# `index.md` needs to be `_index.md`
$ find content/ -name "index.md" | rename -v 's/index\.md$/_index.md/'
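The `\!` cleanup mentioned above could be automated rather than done by hand. The sketch below is one possible approach, assuming the front matter uses a `title = "..."` line as in the template; `fix_titles` is a hypothetical helper name, and it only touches title lines so the body text is left alone.

```shell
# Sketch: replace \! with ! in the title line of each Markdown file.
# fix_titles is a hypothetical helper; file names with newlines are not handled.
fix_titles() {
    find "$1" -type f -name '*.md' | while IFS= read -r f; do
        # Edit via a temp file so this works with both GNU and BSD sed.
        sed '/^title = /s/\\!/!/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

Running `fix_titles content` from the site root would apply the fix across all converted pages.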

Here’s the template I used with pandoc to create Hugo documents with front matter:

scripts/pandoc-template.txt:

+++
title = "$title$"
+++

$if(titleblock)$
$titleblock$

$endif$
$for(header-includes)$
$header-includes$

$endfor$
$for(include-before)$
$include-before$

$endfor$
$if(toc)$
$toc$

$endif$
$body$
$for(include-after)$

$include-after$
$endfor$
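After a bulk conversion it can be worth checking which pages ended up without a title. The sketch below assumes that a page with no usable title comes out of this template with a literal `title = ""` line, which is worth confirming on a sample file first; `find_untitled` is a hypothetical helper name.

```shell
# Sketch: list Markdown files whose front matter has an empty title.
# find_untitled is a hypothetical helper; the empty-title form is an assumption.
find_untitled() {
    find "$1" -type f -name '*.md' -exec grep -l '^title = ""$' {} +
}
```

`find_untitled content` prints one path per affected file, or nothing if every page has a title.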