Inspired by this article, I decided to find out whether the same technique could be used in my current project, which is built with Django.
My first problem was to come up with a viable cache key scheme. Simply using the full request URI, as suggested in the article, wouldn't work for me, because my site renders a different version of the navigation menu depending on the authentication state of the current user. I weighed the advantages and disadvantages of two options. The first is the super clean variant, which factors the session cookie and all other cookies into the memcached key. The second is a lighter-weight method that only appends a server-supplied abstract “page version” field to the request URI. I went for the latter. My resulting nginx virtual host config looked like this:
```nginx
# define application servers
upstream backend {
    server 127.0.0.1:8080 weight=1;
}

server {
    listen 80;
    server_name domain.com;

    access_log /var/log/nginx/domain.com.log;

    location / {
        # we never cache POST requests
        if ($request_method = POST) {
            proxy_pass http://backend;
            break;
        }

        # extract cache key args and compute cache key
        if ($http_cookie ~* "pv=([^;]+)(?:;|$)") {
            set $page_version $1;
        }
        set $memcached_key $request_uri&pv=$page_version;

        # Check if local memcached server can answer this request
        default_type text/html;
        memcached_pass 127.0.0.1:11211;

        # Send to app. server if Memcached could not answer the request
        error_page 404 = @cache_miss;
    }

    location @cache_miss {
        proxy_pass http://backend;
    }
}
```

The important part is the cache key block. The if ($http_cookie ~* ...) test checks the Cookie header of the current request for a cookie named ‘pv’ using a regular expression, and the set $page_version $1; inside it stores the captured value of that cookie in a temporary variable. The following set $memcached_key line then combines that variable with the request URI to form the final memcached key. For example, if the request URI is /foo and the headers contain the cookie pv=acme123, the cache key is /foo&pv=acme123.

The memcached_pass directive is where the actual memcached lookup happens. If the computed key is present in the cache, the cached page is returned immediately, completely bypassing the backend. If the page (or the specific version of the page) is not in the cache, memcached answers with a 404, and the error_page directive sends the request to the @cache_miss location, which forwards it via proxy_pass to our load-balanced cluster of application servers. The servers making up the cluster are defined in the upstream backend block at the top. Currently there is only one lonely application server defined there: an Apache 2.2 listening on localhost port 8080, running our Django application through mod_wsgi.
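To make the request flow concrete, here is a minimal Python sketch that mimics what the nginx configuration does for each request. It is a simulation, not nginx itself: nginx_memcached_key and handle_request are hypothetical names, a plain dict stands in for memcached, and the backend is any callable. A missing pv cookie yields an empty page version, on the assumption that nginx substitutes an empty string for the uninitialized $page_version variable.

```python
import re

# Same pattern as the nginx test: if ($http_cookie ~* "pv=([^;]+)(?:;|$)")
# ~* makes the nginx match case-insensitive, hence re.IGNORECASE.
PV_COOKIE = re.compile(r"pv=([^;]+)(?:;|$)", re.IGNORECASE)


def nginx_memcached_key(request_uri, http_cookie):
    """Mirror of: set $memcached_key $request_uri&pv=$page_version;"""
    match = PV_COOKIE.search(http_cookie or "")
    # Assumption: an unset $page_version ends up as an empty string in nginx.
    page_version = match.group(1) if match else ""
    return "%s&pv=%s" % (request_uri, page_version)


def handle_request(method, request_uri, http_cookie, cache, backend):
    """Simulates 'location /' plus the '@cache_miss' fallback."""
    if method == "POST":
        # POST requests always go straight to the application server.
        return backend(method, request_uri)
    page = cache.get(nginx_memcached_key(request_uri, http_cookie))
    if page is not None:
        # Cache hit: served without touching the backend.
        return page
    # memcached_pass answered 404, so error_page hands off to @cache_miss.
    return backend(method, request_uri)
```

For example, nginx_memcached_key("/foo", "sessionid=abc; pv=acme123") returns "/foo&pv=acme123", matching the example above. One side observation: the pattern also matches cookies whose names merely end in pv (such as xpv=1); nginx's built-in $cookie_pv variable reads the pv cookie by its exact name instead.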
So far I have explained how the front-line request processing in nginx works, but how does the cache get filled and invalidated? To tackle that problem I wrote a Django middleware that can be applied to arbitrary Django views using a decorator:
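```python
from django.shortcuts import render_to_response

# Hypothetical module path: import cache_page_nginx from wherever you keep
# the decorator shown below.
from nginx_cache import cache_page_nginx


def compute_common_page_version(request):
    '''Called by the cache_page_nginx decorator.

    Returns a value that distinguishes different cached versions of the
    same page. For now it is sufficient to differentiate between
    authenticated and anonymous users.'''
    if request.user and request.user.is_authenticated():
        return 1
    return 0


# compute_common_page_version must be defined before this line, because
# decorator arguments are evaluated when the def statement runs.
@cache_page_nginx(3600 * 6, compute_common_page_version)  # cache for 6 hours
def index(request):
    return render_to_response('index.html')
```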
Some readers will notice that the decorator looks very similar to Django’s built-in cache_page decorator. It does work similarly, but with several important differences:
- It relies entirely on nginx doing the actual cache lookup. The decorator is only responsible for storing the rendered content of the page in the cache.
- It bypasses the Django cache framework and talks to memcached directly using cmemcache or python-memcached. This was done because there’s really no point in supporting an additional level of abstraction if the consumer of the cached content won’t look for it anywhere but in memcached.
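The write-only nature of the decorator can be shown without Django. The sketch below is framework-free and entirely hypothetical: requests and responses are plain dicts, DictMemcache is an in-memory stand-in for a memcached client with a set(key, value, time) method, and write_only_cache is an illustrative name, not the decorator defined later. The wrapper only ever writes to the cache and never reads from it, because serving cached pages is nginx's job.

```python
import functools


class DictMemcache(object):
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, time=0):
        # 'time' mimics a memcached expiry argument; it is ignored here.
        self.store[key] = value
        return True


def write_only_cache(client, timeout, page_version):
    """Store successful GET responses under 'path&pv=<version>'; never read them back."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request):
            response = view(request)
            if request["method"] == "GET" and response["status"] == 200:
                pv = page_version(request)
                client.set("%s&pv=%s" % (request["path"], pv), response["body"], timeout)
                # Tell the browser which version nginx should look up next time.
                response.setdefault("cookies", {})["pv"] = str(pv)
            return response
        return wrapper
    return decorator
```

Calling a decorated view twice renders it twice: nothing in the application short-circuits on a cached copy, which is exactly the division of labour described above.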
A word about the compute_common_page_version function: what does it do? Its purpose is to ensure that the cache will contain at most two distinct versions of the home page: one for authenticated users and one for anonymous users. In more complex scenarios this method would become more complex too, resulting in more distinct versions of the same page. The difficulty lies in keeping the page correct for every user while not producing too many cached versions of it, since each additional version increases memory pressure on the server and lowers the cache hit ratio.
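As a rough illustration of how the number of versions grows, here is a hypothetical page-version function for a site that, besides the authentication state, also renders something different for staff users and for each interface language. The User tuple and the role/language scheme are assumptions made for this example, not part of the project described here.

```python
from collections import namedtuple
from itertools import product

# Hypothetical user model: just the two flags this example cares about.
User = namedtuple("User", ["is_authenticated", "is_staff"])


def compute_page_version(user, language):
    """Return a cookie-safe token such as 'anon-en' or 'staff-de'."""
    if user is None or not user.is_authenticated:
        role = "anon"
    elif user.is_staff:
        role = "staff"
    else:
        role = "user"
    return "%s-%s" % (role, language)


def possible_versions(users, languages):
    """All distinct versions a single page can be cached under."""
    return {compute_page_version(u, lang) for u, lang in product(users, languages)}
```

Three roles times two languages already yields six cached copies of every page, three times as many as compute_common_page_version produces; each further dimension multiplies the count again.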
The decorator:
```python
from django.conf import settings
from django.core.cache.backends.base import InvalidCacheBackendError
from django.utils.encoding import smart_str

try:
    # Django 1.2+: required because this middleware's __init__ takes arguments
    from django.utils.decorators import decorator_from_middleware_with_args
except ImportError:
    # Django < 1.2: decorator_from_middleware passed its arguments through
    from django.utils.decorators import decorator_from_middleware as decorator_from_middleware_with_args

try:
    import cmemcache as memcache
except ImportError:
    try:
        import memcache
    except ImportError:
        raise InvalidCacheBackendError("Memcached cache backend requires either the 'memcache' or 'cmemcache' library")


class UpdateCacheMiddleware(object):
    """
    Updates memcached with the response of the request.

    It is of _paramount_ importance that the generated cache key matches
    exactly the key your web server computes to look up the page in the
    cache (set $memcached_key $request_uri&pv=$page_version;).

    This class talks to memcached directly, bypassing Django's cache
    framework, because it is only meant to talk to memcached and nothing else.

    It is applied per view through the cache_page_nginx decorator below;
    it cannot be listed in settings.MIDDLEWARE_CLASSES because its
    constructor requires arguments.
    """

    def __init__(self, cache_timeout, page_version_method):
        '''cache_timeout - timeout in seconds after which the cached response expires in memcached
        page_version_method - called during response processing; must return an arbitrary
                              value that distinguishes different cached versions of the same page
        '''
        self.cache_timeout = cache_timeout
        self.page_version_method = page_version_method
        self.cache = memcache.Client(settings.NGINX_MEMCACHED_SERVERS)
        self.cookie_name = 'pv'

    def process_response(self, request, response):
        """Sets the cache, if needed."""
        # Only successful GET responses are cached. HEAD is excluded because of
        # interactions with the HTTP middleware, which throws the body of a HEAD
        # request away before this middleware gets a chance to cache it.
        if (not settings.NGINX_MEMCACHED_ENABLE
                or request.method != 'GET'
                or response.status_code != 200):
            return response
        self.cache_response(request, response)
        return response

    def cache_response(self, request, response):
        '''Manually inserts the rendered page into the cache'''
        # retrieve page version
        pv = self.page_version_method(request)
        # compute a key that follows the same naming convention as the nginx configuration:
        # set $memcached_key $request_uri&pv=$page_version;
        cache_key = self.compute_cache_key_from_request(request, pv)
        # update cache
        self.cache.set(cache_key, response.content, self.cache_timeout)
        # communicate the page version to the browser using a cookie
        response.set_cookie(self.cookie_name, str(pv))

    def compute_cache_key_from_request(self, request, page_version):
        '''Computes the cache key for the requested page and version of the page'''
        return self.compute_cache_key(request.get_full_path(), page_version)

    def compute_cache_key(self, page, page_version):
        '''Computes the cache key for the specified page and version of the page'''
        return smart_str("%s&%s=%s" % (page, self.cookie_name, page_version))

    def invalidate_page_version_from_request(self, request, page_version):
        '''Removes the specified version of the page requested by 'request' from the cache'''
        cache_key = self.compute_cache_key_from_request(request, page_version)
        self.cache.delete(cache_key)

    def invalidate_page_version(self, page, page_version):
        '''Removes the specified version of 'page' (a full path such as /foo) from the cache'''
        cache_key = self.compute_cache_key(page, page_version)
        self.cache.delete(cache_key)


# decorator
cache_page_nginx = decorator_from_middleware_with_args(UpdateCacheMiddleware)
```
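Because the cache key of every version is predictable, invalidating a page after its content changes boils down to deleting one key per possible page version. The following framework-free sketch assumes the set of page versions is known in advance (0 and 1 for compute_common_page_version); invalidate_page and FakeMemcache are hypothetical names, and the key format copies compute_cache_key above.

```python
# Versions that compute_common_page_version can return.
KNOWN_PAGE_VERSIONS = (0, 1)


class FakeMemcache(object):
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self, store):
        self.store = dict(store)

    def delete(self, key):
        # Deleting a missing key is harmless, as with memcached.
        self.store.pop(key, None)


def compute_cache_key(page, page_version, cookie_name="pv"):
    # Same "<path>&pv=<version>" format nginx uses for $memcached_key.
    return "%s&%s=%s" % (page, cookie_name, page_version)


def invalidate_page(client, page, versions=KNOWN_PAGE_VERSIONS):
    """Delete every cached version of 'page', e.g. after its content changed."""
    for page_version in versions:
        client.delete(compute_cache_key(page, page_version))
```

In a Django project such a helper could be called from a post_save signal handler of the model the page displays; the next request for the page then misses in nginx, reaches Django, and the decorator stores a fresh copy.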
Hello,
I am aiming for a setup similar to this one. I would like to know whether you use a different server (or a different nginx instance) for the static files.
The Django devs suggest using a different server for the static files, but I think that applies to Apache mod_python setups. Am I right?
What is the page_version_method? What does it do?