How to handle uploaded files

Are you a web developer looking for tips or tricks on storing files in a database? Would you like a quick example of how to handle a file upload in Rails and stuff that file into your database? My answer to both questions is: don't do it. Here's why.

The database software you are most likely using is a relational database, which was designed to hold discrete and related chunks of data and to make it easy for you to retrieve that data based on relationships to other data. Your filesystem was designed to store and organize files and to make it easy for you to retrieve those files in a fairly random fashion. In other words, use the file system for storing files, not your database. Still not convinced? Alright, then I can give you an example.

Let's say you are hosting a popular database-backed site that has a lot of static content and various types of media assets (images, videos, etc.). A lot of Rails apps fall into this "custom CMS" category. Sooner or later, as your application gets more popular, you need to improve performance, and you look at distributing the load of all those assets you are serving. Following the standard "shared-nothing" approach to scaling a web application, you decide to move image and asset hosting to another server to spread the load. Rails even provides a configuration option that makes this easy: ActionController::Base.asset_host.
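For illustration, here is roughly what that setting looks like. This is only a sketch: the hostname assets.example.com is a placeholder, and putting the line in the production environment file is my assumption about where you would want it.

```ruby
# config/environments/production.rb (assumed location)
# Asset helpers such as image_tag will prefix generated URLs with this host.
# "assets.example.com" is a placeholder for your own asset server.
ActionController::Base.asset_host = "http://assets.example.com"
```

Newer Rails versions set the same option through config.action_controller.asset_host inside the environment's configure block.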

If your images are stored in the database, now you have a problem. Replicating data from one database server to another is a non-trivial task. Sure, every decent database engine has replication support, but in almost every case database replication is difficult to set up, difficult to maintain, or simply error-prone. However, if you have your asset files on the filesystem where they belong, the solution is much simpler.

Rsync is a tried and tested solution to the problem of synchronizing files between hosts. It's darned easy to replicate the files from one host to another. You can even get easy, secure, and unattended synchronization by installing rsync on both hosts, setting up key-based login via ssh, and then running this on a regular basis (from cron, for example) from the host that has the uploaded assets: rsync -cure ssh /source/path/ destination_server.com:/destination/path/. This command recursively copies, via ssh, every file whose checksum differs from the copy on the destination server, skipping any file that is newer on the destination. It's that easy.
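If you would rather trigger the sync from Ruby (from a Rake task or a script that cron runs, say), the rsync invocation above can be wrapped in a few lines. This is a minimal sketch; rsync_command and sync_assets are hypothetical helper names of mine, and the paths are the same placeholders used above.

```ruby
# Build the rsync argument list used in the command above.
#   -c  compare files by checksum
#   -u  skip files that are newer on the destination
#   -r  recurse into directories
#   -e ssh  use ssh as the transport
def rsync_command(source, destination)
  ["rsync", "-c", "-u", "-r", "-e", "ssh", source, destination]
end

# Run rsync without going through a shell.
# Returns true on success, false on a non-zero exit, nil if rsync could not be run.
def sync_assets(source, destination)
  system(*rsync_command(source, destination))
end
```

Calling sync_assets("/source/path/", "destination_server.com:/destination/path/") then does the same thing as the one-liner. Passing the arguments as separate strings avoids any shell quoting problems with odd filenames.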

Of course, I haven't even addressed the most compelling reason to store your files on the filesystem: performance when serving those files. Media files that live on the filesystem can be served directly by a very fast web server such as lighttpd, which costs far less than streaming the same data out of the database through your Rails application.

I hope I have convinced you to store your files on the filesystem, whether for performance or for ease of scaling. If you do, the world will be a better place.
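If you do go the filesystem route, handling the upload itself takes only a few lines of plain Ruby. The following is a sketch, not a definitive implementation: store_upload is a hypothetical helper, and it assumes the upload arrives as an IO-like object along with the filename the client supplied. You would then save the returned path in your database instead of the file's bytes.

```ruby
require "fileutils"
require "securerandom"

# Hypothetical helper: copy an uploaded IO to disk and return the stored path.
def store_upload(io, original_filename, upload_dir)
  # Keep only the base name so a client-supplied path cannot escape upload_dir.
  base = File.basename(original_filename.to_s.gsub("\\", "/"))
  # Replace anything outside a conservative character set.
  safe = base.gsub(/[^A-Za-z0-9.\-_]/, "_")
  # Fall back to a fixed name for empty or dot-prefixed names.
  safe = "upload" if safe.empty? || safe.start_with?(".")
  # Prefix a random token so two uploads with the same name do not collide.
  stored_name = "#{SecureRandom.hex(8)}_#{safe}"

  FileUtils.mkdir_p(upload_dir)
  path = File.join(upload_dir, stored_name)
  # Stream the upload to disk in binary mode.
  File.open(path, "wb") { |out| IO.copy_stream(io, out) }
  path
end
```

The random prefix is a design choice to avoid overwriting files that share a name; if you need to show the original filename later, store it in its own database column next to the path.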
