On this episode, we’re going to dig into file management. Unlike the static files that you create for the app yourself, you may want your app to accept files from your users. Profile pictures are a good example of user files. You’ll see how Django handles those kinds of files and how to deal with them safely.
Listen at Spotify.
Last Episode
On the last episode, we looked at how to manage settings on your Django site. What are the common techniques to make this easier to handle? That’s what we explored.
Files In Django Models
While it is possible to store file data directly in a database, you won’t see that happen often. The reason is that storing the data in the database usually affects the performance of the database, especially with a large number of files.
Instead, a common pattern in database usage is to store files separately from the database itself. Within the database, a column would store some kind of reference to the stored file like a path if files are stored on a filesystem. This is the approach that Django takes with files.
Now that you know that Django takes this approach, you can remember:
- Django models hold the reference to a file (e.g., a file path)
- The file data (i.e., the file itself) is stored somewhere else.
The “somewhere else” is called the “file storage,” and we’ll discuss storage in more depth in the next section.
Django includes two fields that help with file management:
FileField
ImageField
FileField
# application/models.py
from django.db import models
class Profile(models.Model):
picture = models.FileField()
# Other fields like a OneToOneKey to User ...
This is the most basic version of using file fields. We can use this model very directly with a Django shell to illustrate file management.
$ ./manage.py shell
>>> from django.core.files import File
>>> from application.models import Profile
>>> f = open('/Users/matt/path/to/image.png')
>>> profile = Profile()
>>> profile.picture.save('my-image.png', File(f))
- The
File
class is an important wrapper that Django uses to make Python file objects (i.e., the value returned fromopen
) work with the storage system. - The name
image.png
andmy-image.png
do not have to match. Django can store the content ofimage.png
and usemy-image.png
as the name to reference within the storage system. - Saving the picture will automatically save the parent model instance by default.
The current model example raises questions.
- Where does that data go?
- What if we have a name conflict between two files like “
my-image.png
”? - What happens if we try to save something that isn’t an image?
If we make no changes to the current setup,
the data will go into the root
of the media file storage.
This will lead to a mess if you’re trying to track many file fields,
but we can fix this with the upload_to
field keyword argument.
The simplest version of upload_to
can take a string
that storage will use as a directory prefix
to scope content
into a different area.
# application/models.py
import uuid
from pathlib import Path
from django.db import models
def profile_pic_path(instance, filename):
path = Path(filename)
return "profile_pics/{}{}".format(uuid.uuid4(), path.suffix)
class Profile(models.Model):
picture = models.FileField(upload_to=profile_pic_path)
# Other fields like a OneToOneKey to User ...
There’s one more problem to fix in this example. How do we know that a user provided a valid image file? This is important to check, because we want to avoid storing malicious files that bad actors might upload to our apps.
This is where the ImageField
has value.
This field type contains extra validation logic
that can check the content of the file
to check that the file is, in fact, an image.
To use ImageField
,
you’ll need to install the
Pillow library.
Pillow is a package
that let’s Python work with image data.
# application/models.py
import uuid
from pathlib import Path
from django.db import models
def profile_pic_path(instance, filename):
path = Path(filename)
return "profile_pics/{}{}".format(uuid.uuid4(), path.suffix)
class Profile(models.Model):
picture = models.ImageField(upload_to=profile_pic_path)
# Other fields like a OneToOneKey to User ...
Files Under The Hood
The setting to control which type
of file storage Django uses is DEFAULT_FILE_STORAGE
.
This setting is a Python module path string
to the specific class.
So, what’s the default?
The default is a storage class
that will store files locally
on the server
that runs the app.
This is found at django.core.files.storage.FileSystemStorage
.
The storage class uses a couple
of important settings:
MEDIA_ROOT
and MEDIA_URL
.
The MEDIA_ROOT
setting defines
where Django should look for files in the filesystem.
MEDIA_ROOT = BASE_DIR / "media"
The other setting important to FileSystemStorage
is MEDIA_URL
.
This settings will determine how files are accessed
by browsers
when Django is running.
Let’s say MEDIA_URL
is:
MEDIA_URL = "/media/"
Our profile picture would have a URL like:
>>> from application.models import Profile
>>> profile = Profile.objects.last()
>>> profile.picture.url
'/media/profile_pics/76ee4ae4-8659-4b50-a04f-e222df9a656a.jpg'
This is the path that we can reference in templates. An image tag template fragment would like:
<img src="{{ profile.picture.url }}">
The Django documentation shows how file storage is a specific interface.
FileSystemStorage
happens to be included
with Django and implements this interface
for the simplest storage mechanism,
the file system
of your server’s operating system.
Recommended Package
What is a problem
that can arise
if you use the built-in FileSystemStorage
to store files
for your application?
There are actually many possible problems!
Here are a few:
- The web server can have too many files and run out of disk space.
- Users may upload malicious files to attempt to gain control of your server.
- Users can upload large files that can cause a Denial of Service (DOS) attack and make your site inaccessible.
The most popular storage package to reach for is django-storages. django-storages includes a set of storage classes that can connect to a variety of cloud services. These cloud services are able to store an arbitrary number of files. With django-storages, your application can connect to services like:
- Amazon Simple Storage Service (S3)
- Google Cloud Storage
- Digital Ocean Spaces
- Or services you run separately like an SFTP server
Why use django-storages?
- You will never need to worry about disk space. The cloud services offer effectively unlimited storage space if you’re willing to pay for it.
- The files will be separated from your Django web server. This can eliminate some categories of security problems like a malicious file trying to execute arbitrary code on the web server.
- Cloud storage can offer some caching benefits and be connected to Content Delivery Networks easily to optimize how files are served to your app’s users.
As with all software choices, we have tradeoffs to consider when using different storage classes. On its face, django-storages seems to be nearly all positives. The benefits come with some setup complexity cost.
For instance,
I like to use Amazon S3
for file storage.
You can see from the
Amazon S3 setup documentation
that there is a fair amount of work to do
beyond setting a different DEFAULT_FILE_STORAGE
class.
This setup includes setting AWS private keys,
access controls,
regions,
buckets,
and a handful of other important settings.
django-storages is a pretty fantastic package,
so if your project has a lot of files to manage,
you should definitely consider using it
as an alternative to the FileSystemStorage
.
Summary
In this episode, you learned about Django file management. We covered:
- How Django models maintain references to files
- How the files are managed in Django
- A Python package that can store files in various cloud services
Next Time
In the next episode,
let’s explore commands.
Commands are the code
that you can run with ./manage.py
.
You can follow the show on Spotify. Or follow me or the show on X at @mblayman or @djangoriffs.
Please rate or review on Apple Podcasts, Spotify, or from wherever you listen to podcasts. Your rating will help others discover the podcast, and I would be very grateful.
Django Riffs is supported by listeners like you. If you can contribute financially to cover hosting and production costs, please check out my Patreon page to see how you can help out.