Getting a Grip on Hugo's URL Space

When I converted my static site to Hugo in January of 2020, everything went smoothly except the URL mapping. I wanted all the existing pages to retain their URLs, but Hugo has Very Definite Opinions about how the URL space should be organized.

Unfortunately Hugo’s documentation doesn’t clearly articulate these opinions. I wasted many hours trying to understand what was going on.

My site had a fairly standard layout of directories, each containing one or more HTML files or subdirectories. An index.html file would be presented when loading the bare directory: .../www/foo/bar/index.html has the URL http://site.com/foo/bar/. Other *.html files in a directory would have URLs ending in .html: .../www/baz/page.html has the URL http://site.com/baz/page.html.
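
The old mapping is simple enough to write down as a function. The following is only a sketch of the rule described above; the name legacy_url and the base parameter are made up for illustration, and paths are taken relative to the web root.

```python
from pathlib import PurePosixPath


def legacy_url(path: str, base: str = "http://site.com") -> str:
    """Map a file path (relative to the web root) to the URL the old site served.

    Hypothetical helper: index.html is served at its directory's URL,
    every other file keeps its name and extension.
    """
    p = PurePosixPath(path)
    if p.name == "index.html":
        parent = p.parent.as_posix()
        # The parent of a top-level file is ".", which means the web root.
        return f"{base}/" if parent == "." else f"{base}/{parent}/"
    return f"{base}/{p.as_posix()}"


print(legacy_url("foo/bar/index.html"))
print(legacy_url("baz/page.html"))
```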

On my first pass at converting to Hugo, I retained the directory and file layout and just converted everything from HTML to Markdown. So we have something like this:

$ ls -RF content/
content/:
dir-one/  index.md

content/dir-one:
index.md  stuff.md  subdir/

content/dir-one/subdir:
index.md  sub-stuff.md

Publishing with hugo -D got me the following (eliding *.xml, dist/*, categories, tags and other boilerplate):

$ ls -RF public/
public/:
index.html

What the hell? Where is all of my content?

I read more docs and figured out that I needed to name the index files _index.md (with a leading underscore), not index.md. Like so:

$ ls -RF content/
content/:
dir-one/  _index.md

content/dir-one:
_index.md  stuff.md  subdir/

content/dir-one/subdir:
_index.md  sub-stuff.md
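
Renaming by hand gets tedious on a larger tree. Here is a minimal sketch of doing it in bulk, assuming every index.md under the content directory is meant to be a directory's index page (if you deliberately use index.md for page bundles, this is not what you want). The function name rename_indexes is hypothetical.

```python
from pathlib import Path


def rename_indexes(content_root: str) -> list:
    """Rename every index.md under content_root to _index.md.

    Returns the new paths, relative to content_root. Files that already
    have an _index.md sibling are left alone rather than overwritten.
    """
    renamed = []
    # Materialize the matches first so renaming does not disturb the walk.
    for old in sorted(Path(content_root).rglob("index.md")):
        new = old.with_name("_index.md")
        if new.exists():
            continue
        old.rename(new)
        renamed.append(new.relative_to(content_root).as_posix())
    return renamed
```

Calling rename_indexes("content") from the site root would turn the first listing into the second one.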

Publish, and now I can at least see something:

$ ls -RF public/
public/:
dir-one/  index.html

public/dir-one:
index.html  stuff/  subdir/

public/dir-one/stuff:
index.html

public/dir-one/subdir:
index.html  sub-stuff/

public/dir-one/subdir/sub-stuff:
index.html

But it’s nothing like the simple layout that I’ve got in my content directory. Each leaf-node file is its own directory instead of being an HTML file in a directory. What the heck?
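
To make the pattern explicit, here is a small model of the mapping as it appears in the listing above. It is inferred purely from that output, not from Hugo's source or documentation, and the name pretty_output is made up.

```python
from pathlib import PurePosixPath


def pretty_output(content_path: str) -> str:
    """Predict the file under public/ for a path under content/.

    Inferred from the listing: _index.md becomes its directory's index.html,
    and any other page becomes a directory of its own with an index.html inside.
    """
    p = PurePosixPath(content_path)
    if p.name == "_index.md":
        out = p.parent / "index.html"
    else:
        out = p.parent / p.stem / "index.html"
    return out.as_posix()


for src in ["_index.md", "dir-one/_index.md", "dir-one/stuff.md",
            "dir-one/subdir/sub-stuff.md"]:
    print(src, "->", pretty_output(src))
```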

OK, so the docs talk about a site configuration setting called uglyURLs. Let’s turn that on and maybe I’ll get something sensible…

$ ls -RF public/
public/:
dir-one.html  index.html  dir-one/

public/dir-one:
stuff.html  subdir/  subdir.html

public/dir-one/subdir:
page/  sub-stuff.html

That’s … not even wrong. The leaf node files are in the right place, but the index files are now leaf nodes in their parent directory. Insane.
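
The same kind of model for this listing shows where things went sideways. Again, this is only inferred from the output above, and ugly_output is a made-up name.

```python
from pathlib import PurePosixPath


def ugly_output(content_path: str) -> str:
    """Predict the file under public/ when ugly URLs are turned on.

    Inferred from the listing: ordinary pages keep their place and just
    swap .md for .html, but a directory's _index.md is written one level
    up, as <directory>.html. The site root still gets index.html.
    """
    p = PurePosixPath(content_path)
    if p.name == "_index.md":
        if p.parent == PurePosixPath("."):
            return "index.html"
        return p.parent.as_posix() + ".html"
    return p.with_suffix(".html").as_posix()


for src in ["_index.md", "dir-one/_index.md", "dir-one/stuff.md",
            "dir-one/subdir/_index.md"]:
    print(src, "->", ugly_output(src))
```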

At this point I gave up trying to get Hugo to do something sensible automatically, and I just put a url parameter in the front matter of every leaf-node page. So now content/dir-one/stuff.md is:

---
title: "Stuff"
date: 2020-02-02T18:11:31-08:00
url: "/dir-one/stuff.html"
---
This is stuff

This means I’d need to hand-edit the front matter if I ever rearrange my site, but I don’t think I’m going to need to move the legacy content around very much. For most of the new content I’ll probably just do things “The Hugo Way”, and live with its ideas about URLs.
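
If the hand-editing ever becomes a chore, the url line could be generated instead. The following is a rough sketch, not what I actually ran: it assumes YAML front matter delimited by --- lines as in the example above, and that each page's legacy URL is simply its content path with .md swapped for .html. Both function names are hypothetical.

```python
from pathlib import PurePosixPath


def legacy_url_for(content_path: str) -> str:
    """dir-one/stuff.md (relative to content/) -> /dir-one/stuff.html"""
    return "/" + PurePosixPath(content_path).with_suffix(".html").as_posix()


def add_url(text: str, url: str) -> str:
    """Insert a url line at the end of the YAML front matter.

    The text is returned unchanged if it has no ----delimited front matter
    or if the front matter already contains a url key.
    """
    lines = text.split("\n")
    if lines[0].strip() != "---":
        return text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return text
    if any(line.startswith("url:") for line in lines[1:end]):
        return text
    lines.insert(end, f'url: "{url}"')
    return "\n".join(lines)


page = '---\ntitle: "Stuff"\n---\nThis is stuff\n'
print(add_url(page, legacy_url_for("dir-one/stuff.md")))
```

Because add_url leaves pages alone once they have a url key, running it over the whole tree more than once is harmless.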

After all of this headdesking I came to some conclusions about Hugo. I don’t know if its developers would agree with them, but they helped me get a working mental model.