High Performance Media Merging with Django, Nginx and Memcached
I admit it: I’m a performance junkie. I can’t stand code that works but performs poorly. That being said, I recently fell in love with Django (a fantastic Python-powered web application framework). In one of my current projects, xarmory.com, the number of requests for static resources issued during page load began to bother me. The project makes intensive use of jQuery, and it’s my personal belief that Django + jQuery is a match made in heaven. When working with jQuery you will often find yourself relying on cool little jQuery plugins, each distributed as a separate JavaScript file of course. As a project grows, those files begin to add up. In the case of xarmory.com we now had eleven JavaScript references in the header section of the page. Although the individual files were only 3-11k in size, the request overhead for all those tiny files became unacceptable in my eyes.
To alleviate the problem I decided to resort to a simple solution and just merge the individual script files into a single file, a pretty common practice. Before reinventing the wheel I searched the net for existing solutions and found two, which I gave a shot. Short story: both solutions had their share of problems, mostly stemming from the fact that they wanted to do it all and integrate their own JavaScript minifier in addition to the merging. During my short evaluation both minifiers choked on fifty percent of my JavaScript files, even on the official jQuery 1.2.6 script. A bit frustrated, I decided to roll my own Django template tag based on this code.
It can be used like this:
{% load mediamerge %}
{% mediamerge merged/standard 1 scripts/jquery-1.2.6.min.js scripts/jquery.cookie.js scripts/superfish.js scripts/supersubs.js scripts/shared.js %}
The first line obviously makes the mediamerge tag available to the template. The second line needs more explanation. The first parameter has two meanings:
- First, it is the relative path of the generated merged JavaScript file (the tag appends the .js extension). The path is relative to settings.MEDIA_ROOT (not used if mediamerge is configured to use memcached; more on that later)
- Second, it defines a URI relative to settings.MEDIA_URL
The second parameter defines a numeric version of the merged scripts and needs to be bumped by you whenever one of the files changes.
After the version follows the list of resources to be merged, all specified relative to settings.MEDIA_ROOT.
Assuming that settings.MEDIA_URL = http://static.example.com/ the tag above would result in the following HTML:
<script src="http://static.example.com/merged/standard.js?1" type="text/javascript"></script>
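To make the mapping from tag arguments to the final URL concrete, here is a minimal Python sketch. The function name merged_script_url is hypothetical and only used for illustration; the logic follows the description above.

```python
def merged_script_url(media_url, name, version):
    """Build the URL of the merged script (illustrative helper).

    media_url -- value of settings.MEDIA_URL
    name      -- first tag argument, e.g. 'merged/standard'
    version   -- second tag argument, an integer
    """
    # Avoid a doubled or missing slash between MEDIA_URL and the path.
    base = media_url if media_url.endswith('/') else media_url + '/'
    # The version ends up as the query string and acts as a cache buster.
    return '%s%s.js?%d' % (base, name, version)


print(merged_script_url('http://static.example.com/', 'merged/standard', 1))
# prints http://static.example.com/merged/standard.js?1
```

Bumping the version changes the URL, so browsers fetch the merged file again instead of reusing a stale cached copy.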
Great, we’ve cut five requests down to one. But it gets even faster. The mediamerge tag can be instructed not to generate an output file but to put the merged content directly into memcached (I’m obsessed with memcached), where a webserver that supports memcached can pick it up. Nginx does, and it is lightning fast at serving static content.
So we instruct the mediamerge tag to use memcached in settings.py:
MEDIAMERGE_ENABLE_MEMCACHED = True
MEDIAMERGE_MEMCACHED_URI_PREFIX = "mmerge_"
MEMCACHED_SERVERS = [ 'localhost:11211' ]
Using these settings, mediamerge will store the merged scripts in memcached under the key 'mmerge_http://static.example.com/merged/standard.js?1', that is, settings.MEDIAMERGE_MEMCACHED_URI_PREFIX followed by the full URL of the merged script, including the version.
The final thing to make it all work is to instruct Nginx to pick up requests for resources handled by the mediamerge tag directly from memcached.
server
{
    listen 80;
    server_name static.example.com;
    expires 24h;

    # mediamerge
    location /merged/
    {
        set $memcached_key mmerge_$scheme://$server_name$request_uri;
        memcached_pass 127.0.0.1:11211;
    }

    location /
    {
        root /var/www/example_static;
    }
}
Bingo, Nginx will now serve all requests for resources in http://static.example.com/merged/ by looking up the contents in the specified memcached server (or cluster). Nginx is already very fast for traditionally served static files, but this is just ridiculously fast.
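The whole setup only works if Django and Nginx arrive at exactly the same memcached key. Here is a small Python sketch of that coupling; both function names are hypothetical, and key_from_request merely imitates how the $memcached_key line above is assembled from the Nginx variables.

```python
def key_from_settings(prefix, media_url, name, version):
    """Key as the template tag side would build it (illustrative)."""
    base = media_url if media_url.endswith('/') else media_url + '/'
    return '%s%s%s.js?%d' % (prefix, base, name, version)


def key_from_request(prefix, scheme, server_name, request_uri):
    """Key as assembled from $scheme, $server_name and $request_uri."""
    # request_uri includes the query string, so '?1' is part of the key.
    return '%s%s://%s%s' % (prefix, scheme, server_name, request_uri)


django_key = key_from_settings(
    'mmerge_', 'http://static.example.com/', 'merged/standard', 1)
nginx_key = key_from_request(
    'mmerge_', 'http', 'static.example.com', '/merged/standard.js?1')

print(django_key == nginx_key)
# prints True
```

If MEDIA_URL, the prefix or the server_name drift apart (for example a different host name or https on one side only), the keys no longer match and Nginx will not find the merged script.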
UPDATE: I’ve updated the source below since the article was originally posted. Changes:
- A new option is now supported when using the disk based method: MEDIAMERGE_DISABLE_MERGE_UPDATES. Setting this option to True in settings.py will disable any checks for updated merge source files and is intended for production servers where your input files should not change in the background unless the web server is restarted. This should improve the rendering time of the tag in production considerably.
- The files are now opened and written in binary mode
Here’s the source code for the mediamerge template tag:
import os
import logging
from os import path
import cStringIO as StringIO

from django.template import Library, Node, TemplateSyntaxError
from django.conf import settings

logger = logging.getLogger('mediamerge')
mc = None  # memcached client, created lazily on first use
register = Library()


class JSMergeNode(Node):
    def __init__(self, js_name, js_ver, js_files):
        self.js_name = '%s.js' % js_name
        self.js_ver = js_ver
        self.js_files = js_files
        self.merge_filename = path.normpath(path.join(settings.MEDIA_ROOT, self.js_name))
        # make sure that the target directory exists (disk based method only)
        if not settings.MEDIAMERGE_ENABLE_MEMCACHED and not path.exists(path.dirname(self.merge_filename)):
            os.makedirs(path.dirname(self.merge_filename))

    def merge_files(self, merge_file):
        '''Join all input files into a single output file'''
        for js in self.js_files:
            jspath = path.normpath(path.join(settings.MEDIA_ROOT, js))
            if not path.isfile(jspath):
                continue  # silently skip missing input files
            self.merge_js(js, jspath, merge_file)

    def render(self, context):
        '''Implementation of the render method of the tag'''
        if settings.MEDIAMERGE_ENABLE_MEMCACHED:
            return self.render_memcached(context)
        else:
            return self.render_file(context)

    def render_passthrough(self, context):
        '''Render one script tag per input file, without merging'''
        result = ""
        for js in self.js_files:
            if settings.MEDIA_URL[-1] == '/':
                result += '<script type="text/javascript" src="%s%s"></script>\n' % (settings.MEDIA_URL, js)
            else:
                result += '<script type="text/javascript" src="%s/%s"></script>\n' % (settings.MEDIA_URL, js)
        return result

    def render_memcached(self, context):
        '''Memcached based render method - merged output is stored directly
        in memcached (where nginx will pick it up based on the request uri)
        without using a temp file'''
        import memcache
        key = settings.MEDIAMERGE_MEMCACHED_URI_PREFIX + str(self.compute_media_url())
        global mc
        if not mc:
            mc = memcache.Client(settings.MEMCACHED_SERVERS, debug=0)
        if not mc.get(key):
            merge_file = StringIO.StringIO()
            self.merge_files(merge_file)
            contents = merge_file.getvalue()
            mc.set(key, contents, 3600)  # expires after one hour
            logger.info('Setting memcached value for resource %s - key %s' % (self.js_name, key))
        else:
            logger.debug('Merge key %s already exists in memcached for resource %s' % (key, self.js_name))
        return self.js_tag()

    def render_file(self, context):
        '''File based render method - merged output is stored directly in the
        file specified with the tag - relative to settings.MEDIA_ROOT'''
        if not path.exists(self.merge_filename) or not self.is_merge_updated():
            logger.debug('Trying to open output file %s for resource %s' % (self.merge_filename, self.js_name))
            merge_file = open(self.merge_filename, 'wb')
            try:
                self.merge_files(merge_file)
            finally:
                merge_file.close()
        else:
            logger.debug('Merge file %s already exists for resource %s' % (self.merge_filename, self.js_name))
        return self.js_tag()

    def is_merge_updated(self):
        '''Compares modification time of all js with merged js'''
        if settings.MEDIAMERGE_DISABLE_MERGE_UPDATES:
            return True
        last_mtime = 0  # last modification time of a js file
        for js in self.js_files:
            jspath = path.normpath(path.join(settings.MEDIA_ROOT, js))
            if not path.isfile(jspath):
                continue  # merge_files skips missing files as well
            mtime = os.stat(jspath).st_mtime
            if last_mtime < mtime:
                last_mtime = mtime
        merge_mtime = os.stat(self.merge_filename).st_mtime
        return merge_mtime > last_mtime

    def merge_js(self, jsname, jspath, fd):
        '''Append one javascript file to the merged output'''
        logger.debug('Merging file %s for resource %s' % (jsname, self.js_name))
        jsfile = open(jspath, "rb")
        try:
            jscontent = jsfile.read()
        finally:
            jsfile.close()
        fd.write('/* -- %s -- */\n\n' % jsname)
        fd.write(jscontent)
        fd.write("\n\n\n")

    def compute_media_url(self):
        # detect if MEDIA_URL ends with a slash
        if settings.MEDIA_URL[-1] == '/':
            js_url = '%s%s?%d' % (settings.MEDIA_URL, self.js_name, self.js_ver)
        else:
            js_url = '%s/%s?%d' % (settings.MEDIA_URL, self.js_name, self.js_ver)
        return js_url

    def js_tag(self):
        '''Write js tag for merged file inclusion'''
        return '<script type="text/javascript" src="%s"></script>' % self.compute_media_url()


def do_jsmerge(parser, token):
    """
    This will merge several javascript files into a single javascript file.

    Usage::

        {% load mediamerge %}
        {% mediamerge <output_js_file> <output_version_integer> [jsfile1] [jsfile2] .. %}

    Example::

        {% load mediamerge %}
        {% mediamerge mediamergefile 1 js/file1.js js/file2.js js/file3.js %}

    This will create (if not exists) a /media/mediamergefile.js with three
    files merged. The HTML output for this will be::

        <script type="text/javascript" src="/media/mediamergefile.js?1"></script>
    """
    tokens = token.contents.split()
    if len(tokens) < 3:
        raise TemplateSyntaxError(u"%r tag requires at least 2 arguments." % tokens[0])
    return JSMergeNode(tokens[1], int(tokens[2]), tokens[3:])

register.tag('mediamerge', do_jsmerge)



This is an interesting idea, I never considered using memcached to serve static media as nginx is so good at it already. I’ll definitely check this out.
Is it a pain to have to reload memcached anytime your static media is updated?
Well not really since my usual update sequence on the production server looks like this:
Great article! I think I will be visiting this again as soon as I have a server with enough memory to let me run all of the goodies alongside my sites :)
Hello,
first of all, congratulations, this seems a really good piece of software and really optimal in terms of network usage!
Have you thought about reusing this system for CSS files? I suppose it is mainly the same except for the name and the corresponding HTML tags. It would also be really amazing to use a system like the one described in http://www.djangosnippets.org/snippets/1324/ to avoid repeating the same CSS or JavaScript if we add it twice, which could be very typical when we use templatetags.
Also, related to Bill’s comment, would it be better to have a process which just removes the current key from memcached? Sometimes memcached is used for other things not related to nginx; in my case it is used to maintain a global user-related structure which takes a lot of time and CPU to calculate.
Anyway, I want to say again that this is something I was looking for !
Robert, the way I envisioned it for my site is that on a production server (and that’s where you would benefit from this) script files usually change only after a new build is pushed to the server. Thus the only expiration strategy is to restart memcached. Therefore there is no sophisticated change tracking.
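For readers who share memcached with other data, the alternative raised above (removing only the merged-script key instead of restarting memcached) could look roughly like the sketch below. This is not part of the tag; invalidate_merged and DictClient are hypothetical names, and the sketch only assumes a client object with a delete(key) method, which the python-memcached client used by the tag provides.

```python
def invalidate_merged(client, prefix, merged_url):
    """Delete one merged-script entry and return the key that was removed."""
    key = prefix + merged_url
    client.delete(key)
    return key


class DictClient(object):
    """In-memory stand-in for a memcached client, for demonstration only."""

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value

    def delete(self, key):
        self.store.pop(key, None)


client = DictClient()
client.set('mmerge_http://static.example.com/merged/standard.js?1', 'merged js')
client.set('user_structure', 'expensive to compute')

invalidate_merged(client, 'mmerge_',
                  'http://static.example.com/merged/standard.js?1')
print(sorted(client.store))
# prints ['user_structure']
```

Unrelated entries stay in the cache, and the next render of the tag stores the merged script again because the key is missing.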
How much faster is nginx on serving content from memcached than static files?
I believe operating systems like Linux and *BSD cache frequently requested files in RAM. With memcached you could guarantee that the content is always in memory, but what about TCP latency? Does nginx connect to memcached on every request?
Though I see one big advantage when it comes to scaling: making a cloud of memcached servers. I guess it really depends on how big the site is, because if it’s really big you would probably want to use something like Amazon CloudFront.
Andreas, agreed. Compared to Nginx’s excellent file-based static file handling, the performance difference is pretty negligible, especially on a single-server setup. It is my understanding that Nginx will not connect to memcached for every request, but I will email the author of Nginx to be sure.
Did some benchmarks out of curiosity.
Static file is almost 4x faster, though I think it’s a cleaner concept to put generated, “semi-static” stuff in memory than on disk.
# test 1, static file served by nginx
Total: connections 1 requests 10000 replies 10000 test-duration 1.646 s
Connection time [ms]: min 1601.6 avg 1601.6 max 1601.6 median 1601.5 stddev 0.0
Connection time [ms]: connect 0.5
Connection length [replies/conn]: 10000.000
Request rate: 6243.8 req/s (0.2 ms/req)
Request size [B]: 71.0
Reply rate [replies/s]: min 0.0 avg 0.0 max 0.0 stddev 0.0 (0 samples)
Reply time [ms]: response 0.2 transfer 0.0
Reply size [B]: header 217.0 content 2181.0 footer 0.0 (total 2398.0)
Reply status: 1xx=0 2xx=10000 3xx=0 4xx=0 5xx=0
CPU time [s]: user 0.10 system 0.93 (user 6.2% system 58.1% total 64.3%)
Net I/O: 15054.6 KB/s (123.3*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
# test 2, same file in memcached served directly from nginx
Total: connections 1 requests 10000 replies 10000 test-duration 4.375 s
Connection rate: 0.2 conn/s (4375.4 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 4375.4 avg 4375.4 max 4375.4 median 4375.5 stddev 0.0
Connection time [ms]: connect 0.5
Connection length [replies/conn]: 10000.000
Request rate: 2285.5 req/s (0.4 ms/req)
Request size [B]: 77.0
Reply rate [replies/s]: min 0.0 avg 0.0 max 0.0 stddev 0.0 (0 samples)
Reply time [ms]: response 0.4 transfer 0.0
Reply size [B]: header 164.0 content 2180.0 footer 0.0 (total 2344.0)
Reply status: 1xx=0 2xx=10000 3xx=0 4xx=0 5xx=0
CPU time [s]: user 0.48 system 2.39 (user 11.0% system 54.6% total 65.6%)
Net I/O: 5403.6 KB/s (44.3*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
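A quick way to compare the two runs is to divide the reported figures. The sketch below uses only numbers copied from the benchmark output above; the variable names are my own.

```python
# Figures copied from the two benchmark runs above.
static_rate = 6243.8      # req/s, test 1 (static file)
memcached_rate = 2285.5   # req/s, test 2 (memcached)
static_duration = 1.646   # seconds, test 1
memcached_duration = 4.375  # seconds, test 2

rate_ratio = static_rate / memcached_rate
duration_ratio = memcached_duration / static_duration

print('request rate ratio: %.2f' % rate_ratio)
print('duration ratio: %.2f' % duration_ratio)
# prints:
# request rate ratio: 2.73
# duration ratio: 2.66
```

By request rate and total duration, the static file comes out roughly 2.7 times faster in this single-connection test.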
Andreas, interesting analysis indeed. If it were not for my personal preference for putting generated, “semi-static” stuff into memory rather than on disk, I’d say screw memcached for this purpose :)
Looks like a great bit of code. I’m wondering why you opted to manually pass in a cache-busting ID rather than use the modification timestamp though. Is there a particular advantage to this? You seem to be “stat”ing the files every time you call the render_file function anyway.
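The timestamp idea from the comment above could be sketched as follows. This is an assumption about how such a variant might look, not something the tag does; version_from_mtime is a hypothetical helper.

```python
import os


def version_from_mtime(media_root, files):
    """Return the newest modification time among the input files.

    The result is an integer and could replace the hand-maintained
    version number in the merged script URL. Missing files are skipped,
    and 0 is returned when no input file exists.
    """
    mtimes = []
    for name in files:
        jspath = os.path.normpath(os.path.join(media_root, name))
        if os.path.isfile(jspath):
            mtimes.append(int(os.stat(jspath).st_mtime))
    return max(mtimes) if mtimes else 0
```

The trade-off is one stat call per input file on every render. The manual version number avoids that, which matters most in the memcached mode where the tag otherwise never touches the disk once the key exists.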