16 February 2009 12 Comments

High Performance Media Merging with Django, Nginx and Memcached

I admit it – I’m a performance junkie. I can’t stand code that works but performs poorly. That said, I recently fell in love with Django (a fantastic Python-powered web application framework). In one of my current projects – xarmory.com – the number of requests for static resources issued during page load began to bother me. The project makes intensive use of jQuery, and it’s my personal belief that Django + jQuery is a match made in heaven. When working with jQuery you will often find yourself relying on cool little jQuery plugins – each distributed as a separate JavaScript file, of course. As a project grows, those files begin to add up. In the case of xarmory.com we had eleven JavaScript references in the header section of the page. Although the individual files were only 3-11k in size, the request overhead for all those tiny files became unacceptable in my eyes.

To alleviate the problem I decided to resort to a simple solution and merge the individual script files into a single file, a pretty common practice. Before reinventing the wheel I surfed the net for existing solutions and found two, which I gave a shot. Long story short, both had their share of problems, mostly stemming from the fact that they wanted to do it all and integrate their own JavaScript minifier in addition to the merging. During my short evaluation both minifiers choked on fifty percent of my JavaScript files, even on the official jQuery 1.2.6 script. A bit frustrated, I decided to roll my own Django template tag based on this code.


It can be used like this:

{% load mediamerge %}
 
{% mediamerge merged/standard 1 scripts/jquery-1.2.6.min.js scripts/jquery.cookie.js scripts/superfish.js scripts/supersubs.js scripts/shared.js %}

The first line obviously makes the mediamerge tag available to the template. The second line needs more explanation. The first parameter has two meanings:

  • First, it is the relative path of the generated merged JavaScript file, without the .js extension. The path is relative to settings.MEDIA_ROOT (not used if mediamerge is configured to use memcached – more on that later).
  • Second, it defines a URI relative to settings.MEDIA_URL.

The second parameter defines a numeric version of the merged scripts and needs to be bumped by you whenever one of the files changes.

After the version follows the list of resources to be merged, all specified relative to settings.MEDIA_ROOT.

Assuming that settings.MEDIA_URL = http://static.example.com/ the tag above would result in the following HTML:

<script src="http://static.example.com/merged/standard.js?1" type="text/javascript"></script>
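
The mapping from the tag arguments to that URL can be sketched in a few lines. This is a standalone illustration written for Python 3, not the tag's actual code; `merged_url` and `merged_script_tag` are names made up for this example.

```python
def merged_url(media_url, name, version):
    """Build the URL of the merged script: MEDIA_URL + name + '.js?' + version."""
    # Tolerate a MEDIA_URL with or without a trailing slash.
    base = media_url if media_url.endswith('/') else media_url + '/'
    return '%s%s.js?%d' % (base, name, version)


def merged_script_tag(media_url, name, version):
    """Return the single <script> element that replaces the individual ones."""
    return '<script type="text/javascript" src="%s"></script>' % merged_url(
        media_url, name, version)


print(merged_script_tag('http://static.example.com/', 'merged/standard', 1))
```

Because the version ends up in the query string, bumping it changes the URL, and browsers fetch the new merged file instead of reusing a cached copy.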

Great, in this example we’ve cut the number of script requests from five to one. But it gets even faster. The mediamerge tag can be instructed not to generate an output file but to put the merged content directly into memcached (I’m obsessed with memcached), where a web server with memcached support can pick it up. Nginx is such a server, and it is lightning fast at serving static content.

So we instruct the mediamerge tag to use memcached in settings.py:

MEDIAMERGE_ENABLE_MEMCACHED = True
MEDIAMERGE_MEMCACHED_URI_PREFIX = "mmerge_"
MEMCACHED_SERVERS = [ 'localhost:11211' ]

Using these settings, mediamerge will store the merged scripts in memcached under the key ‘mmerge_http://static.example.com/merged/standard.js?1’ (settings.MEDIAMERGE_MEMCACHED_URI_PREFIX + settings.MEDIA_URL + the merged file name + ‘?’ + the version).
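
As a rough sketch of what happens at this step, the snippet below merges a few scripts and stores the result under such a key. It is written for Python 3 and uses a tiny in-memory stand-in (`FakeCache`, a hypothetical class for this example) instead of a real memcached client, so it runs without a server; `merge_sources` and `store_merged` are likewise illustrative names, not part of the tag.

```python
import io


class FakeCache(object):
    """In-memory stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, time=0):
        # Expiry is ignored in this stand-in.
        self.data[key] = value


def merge_sources(sources):
    """Concatenate (name, bytes) pairs, each preceded by a comment banner."""
    out = io.BytesIO()
    for name, content in sources:
        out.write(('/* -- %s -- */\n\n' % name).encode('utf-8'))
        out.write(content)
        out.write(b'\n\n\n')
    return out.getvalue()


def store_merged(cache, prefix, url, sources, ttl=3600):
    """Store the merged script under prefix + url unless it is already cached."""
    key = prefix + url
    if cache.get(key) is None:
        cache.set(key, merge_sources(sources), ttl)
    return key


cache = FakeCache()
key = store_merged(
    cache, 'mmerge_', 'http://static.example.com/merged/standard.js?1',
    [('scripts/a.js', b'var a = 1;'), ('scripts/b.js', b'var b = 2;')])
print(key)
```

The one-hour expiry mirrors the value used in the tag source; once an entry expires, the next render of the tag puts it back.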

The final thing to make it all work is to instruct Nginx to pick up requests for resources handled by the mediamerge tag directly from memcached.

server
{
	listen 80;
	server_name static.example.com;
	expires 24h;
 
	# mediamerge
	location /merged/
	{
		set $memcached_key mmerge_$scheme://$server_name$request_uri;
		memcached_pass 127.0.0.1:11211;
	}
 
	location /
	{
		root /var/www/example_static;
	}
}

Bingo, Nginx will now serve all requests for resources under http://static.example.com/merged/ by looking up the contents in the specified memcached server (or cluster). Nginx is already very fast for traditionally served static files but this is just ridiculously fast.
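
The lookup only succeeds if the key Nginx builds is exactly the key the tag stored. The Python 3 sketch below imitates both sides to make that visible; `nginx_key` and `tag_key` are hypothetical helper names for this illustration, and the first one merely mimics the `set $memcached_key` line from the configuration above.

```python
def nginx_key(scheme, server_name, request_uri, prefix='mmerge_'):
    """Mimic: set $memcached_key mmerge_$scheme://$server_name$request_uri;"""
    return '%s%s://%s%s' % (prefix, scheme, server_name, request_uri)


def tag_key(media_url, js_name, version, prefix='mmerge_'):
    """Mimic the Django side: prefix + MEDIA_URL + file name + '?' + version."""
    base = media_url if media_url.endswith('/') else media_url + '/'
    return '%s%s%s?%d' % (prefix, base, js_name, version)


stored = tag_key('http://static.example.com/', 'merged/standard.js', 1)
looked_up = nginx_key('http', 'static.example.com', '/merged/standard.js?1')
print(stored == looked_up)

# A different version in the requested URL yields a different key, so the lookup misses.
print(stored == nginx_key('http', 'static.example.com', '/merged/standard.js?2'))
```

In practice this means settings.MEDIA_URL has to use the same scheme and host name that Nginx sees as $scheme and $server_name, and the prefix in settings.py has to match the one in the Nginx configuration.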

UPDATE: I’ve updated the source below since the article was originally posted. Changes:

  • A new option is now supported when using the disk based method: MEDIAMERGE_DISABLE_MERGE_UPDATES. Setting this option to True in settings.py will disable any checks for updated merge source files and is intended for production servers where your input files should not change in the background unless the web server is restarted. This should improve the rendering time of the tag in production considerably.
  • The files are now opened and written in binary mode
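
The freshness check that this option switches off can be sketched as follows. This is a simplified Python 3 illustration under my own naming (`merge_is_current` and `disable_updates` are not part of the tag): the merged file is rebuilt when it is missing or when any source file has a newer modification time, and the flag skips the per-file stat calls.

```python
import os
import tempfile


def merge_is_current(merged_path, source_paths, disable_updates=False):
    """Return True if the merged file can be reused without rebuilding."""
    if not os.path.exists(merged_path):
        return False
    if disable_updates:
        # Production shortcut: trust the existing file, skip stat calls on sources.
        return True
    newest_source = max((os.stat(p).st_mtime for p in source_paths), default=0)
    return os.stat(merged_path).st_mtime > newest_source


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'a.js')
    merged = os.path.join(tmp, 'merged.js')
    for p in (src, merged):
        with open(p, 'wb') as f:
            f.write(b'var a = 1;')
    os.utime(src, (1000, 1000))     # source last modified at t=1000
    os.utime(merged, (2000, 2000))  # merged file written later
    print(merge_is_current(merged, [src]))
    os.utime(src, (3000, 3000))     # source edited after the merge
    print(merge_is_current(merged, [src]))
    print(merge_is_current(merged, [src], disable_updates=True))
```

With the flag set, a changed source file goes unnoticed until the merged file is deleted, which is the trade-off accepted for production servers.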

Here’s the source code for the mediamerge template tag:

import os, re
import logging
from os import path
import cStringIO as StringIO
from django.template import Library, Node, TemplateSyntaxError
from django.conf import settings
 
logger = logging.getLogger('mediamerge')
mc = None
register = Library()
 
class JSMergeNode(Node):
	def __init__(self, js_name, js_ver, js_files):
		self.js_name = '%s.js' % js_name
		self.js_ver = js_ver
		self.js_files = js_files
		self.merge_filename = path.normpath(path.join(settings.MEDIA_ROOT, self.js_name))
 
		# make sure that the target directory exists
		if not settings.MEDIAMERGE_ENABLE_MEMCACHED and not path.exists(path.dirname(self.merge_filename)):
			os.makedirs(path.dirname(self.merge_filename))
 
	def merge_files(self, merge_file):
		# join all input files into a single output file
		for js in self.js_files:
			jspath = path.normpath(path.join(settings.MEDIA_ROOT, js))
			if not path.isfile(jspath):
				continue
			self.merge_js(js, jspath, merge_file)
 
	def render(self, context):
		'''Implementation of the render method of the tag'''
		if settings.MEDIAMERGE_ENABLE_MEMCACHED:
			return self.render_memcached(context)
		else:
			return self.render_file(context)
 
	def render_passthrough(self, context):
		result = ""
		for js in self.js_files:
			if settings.MEDIA_URL[-1] == '/':
				result += '<script type="text/javascript" src="%s%s"></script>\n' % (settings.MEDIA_URL, js)
			else:
				result += '<script type="text/javascript" src="%s/%s"></script>\n' % (settings.MEDIA_URL, js)
		return result
 
	def render_memcached(self, context):
		'''Memcached based render method - merged output is stored directly in memcached (where nginx will pick it up based on the request uri) without using a temp file'''
		import memcache
		key = settings.MEDIAMERGE_MEMCACHED_URI_PREFIX + str(self.compute_media_url())
 
		global mc
		if not mc:
			mc = memcache.Client(settings.MEMCACHED_SERVERS, debug=0)
 
		if not mc.get(key):
			merge_file = StringIO.StringIO()
			self.merge_files(merge_file)
			contents = merge_file.getvalue()
			mc.set(key, contents, 3600)
			logger.info('Setting memcached value for resource %s - key %s' % (self.js_name, key))
 
		else:
			logger.debug('Merge key %s already exists in memcached for resource %s' % (key, self.js_name))
 
		return self.js_tag()
 
	def render_file(self, context):
		'''File based render method - merged output directly stored in the file specified with the tag - relative to settings.MEDIA_ROOT'''
		if not path.exists(self.merge_filename) or not self.is_merge_updated():
			logger.debug('Trying to open output file %s for resource %s' % (self.merge_filename, self.js_name))
			merge_file = open(self.merge_filename, 'wb')
			self.merge_files(merge_file)
			merge_file.close()
		else:
			logger.debug('Merge file %s already exists for resource %s' % (self.merge_filename, self.js_name))
 
		return self.js_tag()
 
	def is_merge_updated(self):
		""" compares modification time of all js with merged js """
		if settings.MEDIAMERGE_DISABLE_MERGE_UPDATES:
			# production shortcut: skip the per-file stat() checks
			return True
 
		last_mtime = 0 # last modification time of a js file
		for js in self.js_files:
			jspath = path.normpath(path.join(settings.MEDIA_ROOT, js))
			jsstat = os.stat(jspath)
			mtime = jsstat[-2]
			if last_mtime < mtime:
				last_mtime = mtime
		merge_mtime = os.stat(self.merge_filename)[-2]
		return merge_mtime > last_mtime
 
	def merge_js(self, jsname, jspath, fd):
		""" do merging and compressing of javascript """
		logger.debug('Merging file %s for resource %s' % (jsname, self.js_name))
 
		jsfile = open(jspath, "rb")
		try:
			jscontent = jsfile.read()
		finally:
			# always release the file handle, even if the read fails
			jsfile.close()
		fd.write('/* -- %s -- */\n\n' % jsname)
		fd.write(jscontent)
		fd.write("\n\n\n")
 
	def compute_media_url(self):
		# detect if MEDIA_URL ends with a slash
		if settings.MEDIA_URL[-1] == '/':
			js_url = '%s%s?%d' % (settings.MEDIA_URL, self.js_name, self.js_ver)
		else:
			js_url = '%s/%s?%d' % (settings.MEDIA_URL, self.js_name, self.js_ver)
 
		return js_url
 
	def js_tag(self):
		""" write js tag for merged file inclusion """
 
		return '<script type="text/javascript" src="%s"></script>' % self.compute_media_url()
 
 
def do_jsmerge(parser, token):
	"""
	This will merge several javascript files into a single javascript file (no compression is applied).
 
	Usage::
 
		{% load mediamerge %}
		{% mediamerge <output_js_file> <output_version_integer> [jsfile1] [jsfile2] .. %}
 
	Example::
 
		{% load mediamerge %}
		{% mediamerge mediamergefile 1  js/file1.js js/file2.js js/file3.js %}
 
	This will create (if it does not exist) a /media/mediamergefile.js with the three files merged. The HTML output for this will be::
 
		<script type="text/javascript" src="/media/mediamergefile.js?1"></script>
	"""
	tokens = token.contents.split()
	if len(tokens) < 3:
		raise TemplateSyntaxError(u"%r tag requires at least 2 arguments." % tokens[0])
	return JSMergeNode(tokens[1], int(tokens[2]), tokens[3:])
 
 
register.tag('mediamerge', do_jsmerge)

12 Responses to “High Performance Media Merging with Django, Nginx and Memcached”

  1. Bill Evans 16 February 2009 at 9:14 pm #

    This is an interesting idea, I never considered using memcached to serve static media as nginx is so good at it already. I’ll definitely check this out.

    Is it a pain to have to reload memcached anytime your static media is updated?

  2. oliver 16 February 2009 at 10:03 pm #

    Well not really since my usual update sequence on the production server looks like this:

    /etc/init.d/nginx stop
    /etc/init.d/apache2 stop
    cd /usr/local/git/<projectrepo>/
     
    git pull origin master
     
    /etc/init.d/memcached restart
    /etc/init.d/apache2 start
    /etc/init.d/nginx start
  3. codekoala 16 February 2009 at 11:26 pm #

    Great article! I think I will be visiting this again as soon as I have a server with enough memory to let me run all of the goodies alongside my sites :)

  4. robert 17 February 2009 at 1:08 am #

    Hello,

    first of all, congratulations, seems a really good piece of software and really optimal in terms of network usage !

    Have you thought about reusing this system for CSS files? I suppose it is mostly the same except for the name and the corresponding HTML tags. It would also be really amazing to use a system like the one described in http://www.djangosnippets.org/snippets/1324/ to avoid repeating the same CSS or JavaScript if we add it twice, which could be very typical when using template tags.

    Also, related to Bill’s comment, would it be better to have a process which just removes the current key from memcached? Sometimes memcached is used for other things not related to Nginx; in my case it is used to maintain a global user-related structure which takes a lot of time and CPU to calculate.

    Anyway, I want to say again that this is something I was looking for !

  5. oliver 17 February 2009 at 1:17 am #

    Robert, the way I envisioned it for my site is that on a production server – and that’s where you would benefit from this – script files usually change only after a new build is pushed to the server. Thus the only expiration strategy is to restart memcached, and there is no sophisticated change tracking.

  6. Andreas 17 February 2009 at 11:34 am #

    How much faster is Nginx at serving content from memcached than from static files?
    I believe OSes like Linux and *BSD cache frequently requested files in RAM. With memcached you could guarantee that the content is always in memory, but what about TCP latency? Does Nginx connect to memcached on every request?

    I do see one big advantage when it comes to scaling: building a cloud of memcached servers. I guess it really depends on how big the site is, because if it’s really big you would probably want to use something like Amazon CloudFront.

  7. oliver 17 February 2009 at 3:43 pm #

    Andreas, agreed – compared to Nginx’s excellent file-based static file handling the performance difference is pretty negligible – especially on a single-server setup. It is my understanding that Nginx will not connect to memcached for every request, but I will email the author of Nginx to be sure.

  8. Andreas 23 February 2009 at 6:11 am #

    Did some benchmarks out of curiosity.
    Static file is almost 3x faster (about 6,240 vs. 2,290 requests per second), though I think it’s a cleaner concept to put generated, “semi-static” stuff in memory than on disk.

    # test 1, static file served by nginx

    Total: connections 1 requests 10000 replies 10000 test-duration 1.646 s

    Connection time [ms]: min 1601.6 avg 1601.6 max 1601.6 median 1601.5 stddev 0.0
    Connection time [ms]: connect 0.5
    Connection length [replies/conn]: 10000.000

    Request rate: 6243.8 req/s (0.2 ms/req)
    Request size [B]: 71.0

    Reply rate [replies/s]: min 0.0 avg 0.0 max 0.0 stddev 0.0 (0 samples)
    Reply time [ms]: response 0.2 transfer 0.0
    Reply size [B]: header 217.0 content 2181.0 footer 0.0 (total 2398.0)
    Reply status: 1xx=0 2xx=10000 3xx=0 4xx=0 5xx=0

    CPU time [s]: user 0.10 system 0.93 (user 6.2% system 58.1% total 64.3%)
    Net I/O: 15054.6 KB/s (123.3*10^6 bps)

    Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
    Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

    # test 2, same file in memcached served directly from nginx

    Total: connections 1 requests 10000 replies 10000 test-duration 4.375 s

    Connection rate: 0.2 conn/s (4375.4 ms/conn, <=1 concurrent connections)
    Connection time [ms]: min 4375.4 avg 4375.4 max 4375.4 median 4375.5 stddev 0.0
    Connection time [ms]: connect 0.5
    Connection length [replies/conn]: 10000.000

    Request rate: 2285.5 req/s (0.4 ms/req)
    Request size [B]: 77.0

    Reply rate [replies/s]: min 0.0 avg 0.0 max 0.0 stddev 0.0 (0 samples)
    Reply time [ms]: response 0.4 transfer 0.0
    Reply size [B]: header 164.0 content 2180.0 footer 0.0 (total 2344.0)
    Reply status: 1xx=0 2xx=10000 3xx=0 4xx=0 5xx=0

    CPU time [s]: user 0.48 system 2.39 (user 11.0% system 54.6% total 65.6%)
    Net I/O: 5403.6 KB/s (44.3*10^6 bps)

    Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
    Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

  9. oliver 23 February 2009 at 11:10 am #

    Andreas, interesting analysis indeed. If it were not for my personal preference for putting generated, “semi-static” stuff into memory rather than on disk – I’d say screw memcached for this purpose :)

  10. Aaron 24 February 2009 at 4:22 pm #

    Looks like a great bit of code. I’m wondering why you opted to manually pass in a cache-busting ID rather than use the modification timestamp though. Is there a particular advantage to this? You seem to be “stat”ing the files every time you call the render_file function anyway.

  11. Google-TCW 8 September 2009 at 12:45 pm #

    Hi from google Google-TCW


