I have a side project and I’d like to do some content marketing to potential customers to show how my product is useful. To do this, I need a blog for my project.
Maybe you need a blog for your project too. Have you thought about where your blog will exist on the internet? For me, I considered two choices:
- Use a subdomain like blog.mysite.com.
- Use a route style like mysite.com/blog/. (I learned in my research that SEO experts call this a “subfolder” style.)
I chose the latter approach.
I’m also a big fan of statically generated websites. I like to keep my articles in version control and write my content in Markdown. This approach works well for my writing flow. In fact, if you’re reading this on my website, mattlayman.com, you’re reading content that was generated by Hugo, my static site generator of choice.
Since my Django application runs on my main domain, how could I include a route-based blog onto the domain without tripping over the app?
I could see two strategies for making a route-based blog work.
- Put some software between browsers and my application server that can intercept and route blog traffic to the static files generated by Hugo.
- Make the application server serve the blog.
Putting Software In Front Of The App Server
This is a well-trodden path. This is also the path that I didn’t pick (more on that later).
In many (most?) Django deployments, the Django application runs with a Python application server like Gunicorn or uWSGI. These application servers have the job of delegating the dynamic requests to Django views.
What do we do with requests that aren’t dynamic like JavaScript or image files? Django has a process to manage these static files that will collect all the files into a single directory that can be served by other software.
What other software? Typically, that other software is a general purpose web server like Nginx or the Apache HTTP Server. When one of these web servers is in between and delegating to the Python application server, we call the web server a reverse proxy. Check out Cloudflare's reverse proxy article for a good explanation of why this kind of server is considered a reverse proxy.
A web server like Nginx can be configured to serve static files at a particular route. For Django’s typical static file handling, you would configure Nginx to route the static files directory that Django produces to something like /static/. For any request for a static file that comes to your site, Nginx would detect that the path starts with /static/ and would send back the file directly rather than requesting it from the application server.
Knowing that, it’s not much of a conceptual leap to see how to serve a static blog. In this scenario, you would generate the blog with your tool of choice and configure Nginx to route anything coming to /blog/ to the output of your static site generator.
To be frank, if your infrastructure can handle this style well, this will probably be an easier approach. If you’re feeling a bit more adventurous or like walking a different path, read on.
The Road Less Traveled
Another way of serving static files for your Django app is to let the application server do it. This approach is not as performant as the reverse proxy approach, but it has the advantage of being a simpler setup because you only have one kind of server running, not two.
When you want to use this style, you’d reach for WhiteNoise.
For Django projects, WhiteNoise is designed to work with Django’s static files scheme. That means that the library will have no trouble serving your CSS, JS, images, or whatever else. This also means that you can expect all of these files to be served out of /static/.
If you’re ok with serving your blog from /static/blog/, then your life would be pretty simple. When you deploy, you’d generate your blog content from your static site generator, include the output directory as a directory in the STATICFILES_DIRS Django setting, and you’re done.
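Here’s a minimal sketch of what that simpler setup might look like in a settings file. It relies on Django’s optional (prefix, path) tuple form of STATICFILES_DIRS so the generated files are collected under a blog/ prefix. The blog_out directory name and the way BASE_DIR is defined are assumptions for illustration, not something your project necessarily matches.

```python
from pathlib import Path

# Assumption: a real settings.py usually derives BASE_DIR from __file__;
# the current working directory stands in for it in this sketch.
BASE_DIR = Path.cwd()

STATIC_URL = "/static/"

# Each (prefix, path) tuple asks collectstatic to place the files found in
# `path` under `prefix` inside STATIC_ROOT, so they are served at /static/blog/.
STATICFILES_DIRS = [
    ("blog", BASE_DIR / "blog_out"),  # "blog_out" is a hypothetical output directory
]
```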
That kind of URL path sounds gross to me. What casual non-tech reader would expect to read a blog post at /static/blog/? Yuck. That kind of reader is unlikely to know what “static” would mean.
My goal was to get WhiteNoise to serve my blog at /blog/.
That’s the setup. Let’s see how it worked out.
The Details
Before seeing all the details, let me make sure I address why I did this.
My application is running on Heroku. Heroku makes deployment so simple for basic apps. With a small file called a Procfile that looks like
release: python manage.py migrate
web: gunicorn project.wsgi --log-file -
I can get an entirely operational application on their platform. The downside of using a Platform as a Service (PaaS) like Heroku is that I have less control of the environment.
In this circumstance, I don’t have the ability to introduce a reverse proxy like Nginx. I could cobble together some scheme with a shell script that would let me run both Nginx and Gunicorn, but I’d have difficulty guaranteeing that both processes would stay running.
Unless I wanted the blog to run on a separate subdomain, which I already mentioned that I don’t, I need to make the Gunicorn application server serve the blog.
First, let’s get the blog itself going. I’m going to skip most of the details of working with Hugo, but I want you to have some names that you can consider as we work through this problem. From the root of my repository, I ran:
$ hugo new site blog
This created a new Hugo site with some empty directories. Because I wanted to keep all of the directories, I added some hidden files so that Git would track them. For example,
$ touch blog/data/.gitkeep
Later in my experimentation, I found that the config.toml needed to be in the root of the repository. Since I didn’t want to fill the repository root with other Hugo directories, I had to adjust some variables in the config file to look in the blog directory.
# config.toml
archetypeDir = "blog/archetypes"
contentDir = "blog/content"
dataDir = "blog/data"
layoutDir = "blog/layouts"
staticDir = "blog/static"
themesDir = "blog/themes"
I also set the directory where I wanted the blog output. This name is important later.
# config.toml
publishDir = "blog_out"
To finish off the Hugo setup (aside from the actual blog content generation which I’m not going to describe), I added Hugo generated directories to my .gitignore.
# Blog
blog_out/
resources/
With this much configuration, I can generate my product’s blog with a single command.
$ hugo
                   | EN
+------------------+----+
  Pages            | 13
  Paginator pages  |  0
  Non-page files   |  2
  Static files     | 10
  Processed images |  7
  Aliases          |  0
  Sitemaps         |  1
  Cleaned          |  0

Total in 14 ms
14ms! It’s so fast!
My next job was to teach Heroku how to generate the blog with each deployment. Heroku tries to figure out how to build your application by checking for certain files. Because of the manage.py file, Heroku detected that I have a Python project.
When there are multiple types of things to build, you have to be a bit more explicit. Thus, I needed to add a buildpack. Buildpacks are responsible for assembling a project’s code into a format that the Heroku platform can run.
To make Hugo go, I added the roperzh/hugo community contributed buildpack.
$ heroku buildpacks:add --index 1 roperzh/hugo
I set my Hugo version in Heroku to the same version that I use locally on my Mac to ensure consistency.
$ heroku config:set HUGO_VERSION=0.46
From this configuration, my Heroku deployments now build my Hugo blog and store the output content in the blog_out directory of the built application artifact (which Heroku calls a “slug”).
We can take another step to optimize the blog. WhiteNoise will serve a compressed version (either gzip or brotli) if there is a file on disk with the same name and ending with a .gz or .br extension (e.g., index.html and index.html.gz).
Once Hugo is done generating the blog output, we can instruct WhiteNoise to compress the files. To do this, I used a bin/post_compile script that runs as part of the Python buildpack. My script looks like:
#!/bin/bash
set -e
python -m whitenoise.compress blog_out
This generates the compressed versions so that my application server can serve fewer bytes when sending static blog files to a browser.
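To make the “same name plus .gz” convention concrete, here’s a small standard-library sketch that writes a gzip sibling next to a file. This is only an illustration of the naming idea, not WhiteNoise’s implementation: the gzip_sibling helper is a hypothetical name, and the sketch skips whatever filtering the real whitenoise.compress command applies.

```python
import gzip
from pathlib import Path


def gzip_sibling(path: Path) -> Path:
    """Write a gzip copy of `path` next to it (index.html -> index.html.gz)."""
    compressed = path.with_name(path.name + ".gz")
    compressed.write_bytes(gzip.compress(path.read_bytes()))
    return compressed
```

A server that finds both index.html and index.html.gz on disk can then send the smaller file to any browser that advertises gzip support.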
Now that the static content side is ready to go, let’s teach Django how to serve the blog.
In Django, WhiteNoise works by running a Django middleware. This middleware is designed to run very early in the stack of middleware to intercept requests to static files and return them before the application server wastes too much time processing the request.
The problem with the middleware (if I can even call it a problem) is that it is designed to work exclusively with Django’s static files mechanism. This means that it serves content for /static/ URLs from directories that are either static directories inside of each Django app or directories listed in STATICFILES_DIRS.
Unfortunately, there’s no Django setting that will permit a developer to say “Hey, WhiteNoise, please serve these files at this other path too!”
My solution to this dilemma was to “use the source, Luke (uh, Matt)!”
You can see in “Using WhiteNoise with any WSGI application” that the WhiteNoise class has a method named add_files. This method takes a directory and serves those files at some developer-defined prefix. While inspecting the source, I found that the WhiteNoiseMiddleware is a subclass of WhiteNoise.
Enter MoreWhiteNoiseMiddleware. I decided to make a subclass of the WhiteNoiseMiddleware to take advantage of the add_files method for my Django project. (My side project is called “homeschool” so you’ll see that in the code snippets below.)
# homeschool/middleware.py
from django.conf import settings
from whitenoise.middleware import WhiteNoiseMiddleware


class MoreWhiteNoiseMiddleware(WhiteNoiseMiddleware):
    """Serve extra static directories at prefixes outside of /static/."""

    def __init__(self, get_response=None, settings=settings):
        super().__init__(get_response, settings=settings)
        # Register each extra directory from the MORE_WHITENOISE setting.
        for more_noise in settings.MORE_WHITENOISE:
            self.add_files(more_noise["directory"], prefix=more_noise["prefix"])
Then I made the following changes to my settings file.
# project/settings.py
MIDDLEWARE = [
"django.middleware.security.SecurityMiddleware",
"homeschool.middleware.MoreWhiteNoiseMiddleware",
...
]
...
MORE_WHITENOISE = [
{"directory": os.path.join(BASE_DIR, "blog_out"), "prefix": "blog/"}
]
WHITENOISE_INDEX_FILE = True
That last setting will make sure that WhiteNoise responds to a request for a directory with the index.html file inside that directory. That means that my blog post content that Hugo puts into /blog/some-post/index.html will be accessible at mysite.com/blog/some-post/. Other rules about this setting are in the documentation.
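If it helps to picture the mapping, here’s a rough sketch of how a request path under a prefix could be translated to a file in the Hugo output directory, including the index file rule. This is my own simplification for illustration, not how WhiteNoise is implemented; the resolve_blog_path function is hypothetical, and it ignores concerns like “..” segments that a real server must handle.

```python
from pathlib import Path
from typing import Optional


def resolve_blog_path(
    url_path: str,
    root: Path,
    prefix: str = "/blog/",
    index_file: str = "index.html",
) -> Optional[Path]:
    """Map a request path to a file under `root`, or None if the prefix doesn't match."""
    if not url_path.startswith(prefix):
        return None
    relative = url_path[len(prefix):]
    # A path ending in "/" (or the bare prefix) stands for that directory's index file.
    if relative == "" or relative.endswith("/"):
        relative += index_file
    return root / relative
```

With a root of blog_out, a request for /blog/some-post/ resolves to blog_out/some-post/index.html, while a request for /admin/ is not handled and would fall through to Django.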
With this tiny middleware subclass, I can serve up more static directories! My product blog is served up at the pretty /blog/ URLs that I wanted. Victory!
I’m only doing this for my product’s blog now, but I will probably do something similar with the product documentation in the future.
Tradeoffs
What’s the catch? It seems like there’s always a catch.
The biggest catch is caching, or, the lack thereof. With the scheme I’ve described, Django isn’t able to generate filenames for the blog files that include the hash of the file content.
In a normal static files setup, you can use ManifestStaticFilesStorage to generate those file names. With that storage engine, Django will generate a manifest file that stores a mapping from each original filename to the versioned filename that includes the hash. Django uses the manifest during template rendering to serve up HTML content that includes the hashes (e.g., a file named base.css would be sent to the user as base.1234abcd.css).
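As a sketch of the versioning idea, the snippet below builds a filename that embeds a short hash of the file’s content and records it in a manifest-style dictionary. The hashed_name helper, the sample CSS, and the choice of a 12-character MD5 prefix are assumptions for illustration; Django’s storage class handles more cases than this.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(name: str, content: bytes) -> str:
    """Insert a short content hash before the extension: base.css -> base.<hash>.css."""
    digest = hashlib.md5(content).hexdigest()[:12]
    path = PurePosixPath(name)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


# A manifest maps each original name to its versioned name.
manifest = {"css/base.css": hashed_name("css/base.css", b"body { margin: 0; }")}
```

Because the hash comes from the content, any change to the file produces a new filename, which is what makes the old name safe to cache indefinitely.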
Because Django sends out the versioned filenames for browsers, when the browser comes back to the server to request a CSS file like base.1234abcd.css, WhiteNoise can detect that the file is “versioned.” With a versioned file, WhiteNoise will set the Cache-Control HTTP cache header to tell the browser that the file can be safely cached for a very long time.
The content generated by Hugo doesn’t go through the Django template engine and won’t include those version hashes. Thus, WhiteNoise can’t detect that the files won’t change because of the absence of hashes. Since the code doesn’t know if the file will change, it can’t set Cache-Control far into the future. Instead, it will set the header to one minute, which is configurable via the WHITENOISE_MAX_AGE setting.
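The decision can be pictured with a small sketch like the one below. It is not WhiteNoise’s actual check: the regular expression for spotting a hash segment, the ten-year value used for “a very long time,” and the cache_control function itself are all assumptions made for illustration.

```python
import re

# Assumption: a run of 8 to 32 hex characters before the extension marks a versioned file.
VERSIONED = re.compile(r"\.[0-9a-f]{8,32}\.[^./]+$")
FOREVER = 10 * 365 * 24 * 60 * 60  # ten years in seconds, standing in for "a very long time"
DEFAULT_MAX_AGE = 60  # one minute, the value WHITENOISE_MAX_AGE would override


def cache_control(filename: str, max_age: int = DEFAULT_MAX_AGE) -> str:
    """Pick a Cache-Control value based on whether the filename looks versioned."""
    if VERSIONED.search(filename):
        return f"max-age={FOREVER}, public, immutable"
    return f"max-age={max_age}, public"
```

Under these assumptions, base.1234abcd.css gets the long-lived header, while a Hugo page like some-post/index.html only gets the short default.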
For a small product like mine, this tradeoff is totally reasonable. If I had a product blog with massive amounts of traffic, I’d probably have a more complex infrastructure anyway and be in a position to use a reverse proxy instead.
The other minor tradeoff is the WHITENOISE_INDEX_FILE setting. By enabling that, I open up my Django server to serving directories that include index.html. This is good and desirable for the blog, but the side effect is that any other directory in my static files that happens to have an index.html file in it is also now available. That may not affect your app if you try this approach, but it’s something to be cognizant of.
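If you want to check what that setting would expose, a quick audit along these lines might help. This is a hypothetical helper, not part of WhiteNoise or Django; you would point it at whichever directories your server ends up serving.

```python
from pathlib import Path
from typing import Iterable, List


def find_index_dirs(roots: Iterable[Path], index_file: str = "index.html") -> List[Path]:
    """Return every directory under `roots` that contains an index file."""
    found: List[Path] = []
    for root in roots:
        # rglob walks the whole tree, so nested directories are included.
        found.extend(sorted(p.parent for p in Path(root).rglob(index_file)))
    return found
```

Any directory in the result besides the blog output is one that would become browsable as a page once the index file setting is on.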
Summary
I started this adventure by looking for an alternative way to serve a blog for my Heroku project.
In the process, we learned about:
- Heroku buildpacks and how to use multiple buildpacks to generate an app that requires multiple tools
- WhiteNoise and how to customize the middleware to give it more files to serve
- Caching and the tradeoffs associated with my approach
I hope you found this little adventure interesting.
Next time you need some static content for your Django project outside of /static/, now you know of an option that doesn’t include subdomains or a reverse proxy!
If you have questions or enjoyed this article, please feel free to message me on X at @mblayman or share if you think others might be interested too.