Load Balancing and Django
NOTE: THIS DOCUMENT IS INCOMPLETE. It is currently undergoing heavy editing, and I make no account for its accuracy as of this moment. -James Crasta
It all started with this IRC conversation on 2006-06-30:
13:55 < mattmcc> There isn't really much about load balancing that's Django-specific.
13:55 < brantley> Mike: I don't think so.
13:55 < Crast> mikearagua: no, but that's not really a django-specific topic anyway, it's more a webserver-config and DNS
13:56 < brantley> Personally, if I were to do that, I'd setup a specific box as the database-box, everything else would pretty much fall into place.
13:56 < mikearagua> yeah, i know, i just assume there would be particularities when doing it with a django app
13:56 < mattmcc> Not really.
13:56 < Crast> I have loadbalanced django using fastcgi and a single lighttpd in front, no proxying, and it worked... but I had no reason for it except for as proof-of-concept
13:57 < Crast> lighttpd does a decent job of load-balancing fcgi
13:57 < brantley> Crast, was that on multiple machines?
13:57 < Crast> brantley: yessuh, though admittedley one of them was the lighttpd machine itself
13:58 < Crast> I took that one out of the loop and substituted a laptop just to see how things would change
13:58 < brantley> Hrm, I don't understand, exactly... Lighttpd sent the requests to fastcgi's on different servers?
13:58 < Crast> and the database server just ran on the lighttpd machine
13:59 < Crast> brantley: yep. you can specify multiple fastCGI servers to a single path using lighttpd
13:59 < brantley> Damn, I didn't know that.
13:59 < brantley> That's pretty sweet.
13:59 < brantley> Yeah, Mike. Do that.
13:59 < benbangert> Crast: lighty has some issues dealing with dead FastCGI processes
13:59 < Crast> therefore you don't get all the ugliness of HTTP proxying
13:59 < aurynn> yay
13:59 < Crast> benbangert: I don't use lighty to spawn the processes though, since they're not even running on the same machine
13:59 < mikearagua> i was mostly thinking about issues with session state
14:00 < benbangert> ie, when a Fast CGI process stops responding, it takes awhile before lightty will shut it down and restart it
14:00 < aurynn> automatic cpu underclocking, as well as load-based rampup fills me with complete joy.
14:00 < Crast> benbangert: however, you are correct
14:00 < benbangert> Crast: yea, but it doesn't seem to do a very good job noticing when a FCGI connection has stopped answering requests
14:00 < brantley> Err, I think the sessions are handled in the database with cookies, it should be fine.
14:00 < benbangert> I heard pound is really slick
14:00 < benbangert> http://www.apsis.ch/pound/
14:01 < Crast> benbangert: that's due to the natural timeout of the protocol though, it doesn't "not notice" it just allows sufficient time to complete the request
14:01 < mikearagua> brantley: that's exactly what i wanted to know
14:01 < mikearagua> thanks.
14:01 < brantley> I can't give you an exact affirmative, you should probably ask on django-users.
14:01 < brantley> But I'm pretty sure that's how it works.
14:01 < Crast> yeah, sessions are stored in the DB
14:02 < Crast> serialized python dicts more or less
14:02 < brantley> And hooked up through cookies.
14:02 < brantley> And as long as the browser thinks it's the same server, it'll work fine.
14:02 < Crast> yep
14:02 < mikearagua> it seemed that way to me but i could have been missing something
14:02 < brantley> Also: I would create a specific server that is dedicated to serving media.
14:03 < mikearagua> maybe i should start messing with lighttpd. i use apache mostly
14:03 < brantley> With big fast hard-drives.
14:03 < Crast> well you do have a possible concurrency issue if two processes are using a session at the same time, whicever one writes second writes the finished session, however that'll happen even without load balancing, and it's really a non-issue
14:03 < brantley> That's really the database that needs to care about that, it seems to me.
14:03 < brantley> The first "load-balancing" should be separating the application server from the media serving.
14:04 < brantley> Then you go from there.
14:04 < Crast> I don't mind writing up a doc of my experimental results
14:04 < mikearagua> that would be cool
14:04 < brantley> That would be very progressive for enterprise Django.
14:04 < mikearagua> i could add to the doc as i go. i would like to do some testing with load balancing before i deploy my app
14:05 < Crast> so I'll put the doc on the wiki then
14:05 < mattmcc> Serving the static media on a separate box is definitely the first step. Optimizations for that server will be quite different from the ones serving up python.
14:05 < mikearagua> yeah, that's fairly easy though
14:06 < mikearagua> there are lots of docs describing how to serve static files fast
14:07 < mikearagua> especially with apache. there are some issues with having dojo on a separate domain name though that i've run into so for the time being i'm running both off the same machine
14:07 < Crast> benbangert: there's a "disable-time" option to fastcgi servers, perhaps that needs to be tweaked for loadbalancing to be effective
14:07 < Crast> (on lighttpd, that is)
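The session concurrency point raised in the conversation can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary `session_table` stands in for the shared database table, `pickle` stands in for however the session dict is actually serialized, and the helper names `load_session` and `save_session` are hypothetical, not Django's API.

```python
import pickle

# Hypothetical stand-in for the shared session table in the database.
session_table = {"abc123": pickle.dumps({"cart": []})}


def load_session(key):
    """Read and deserialize one session (assumed storage format: pickle)."""
    return pickle.loads(session_table[key])


def save_session(key, data):
    """Serialize and overwrite the whole session row."""
    session_table[key] = pickle.dumps(data)


# Two application processes, possibly on different machines,
# load the same session at roughly the same time.
seen_by_a = load_session("abc123")
seen_by_b = load_session("abc123")

# Each one changes its own private copy.
seen_by_a["cart"].append("book")
seen_by_b["theme"] = "dark"

# Both write back; B writes second, so B's copy replaces A's.
save_session("abc123", seen_by_a)
save_session("abc123", seen_by_b)

final = load_session("abc123")
print(final)  # A's cart change is gone
```

As noted in the conversation, the same overwrite can happen with two processes on a single machine, so load balancing does not introduce it.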
There are a few approaches to load balancing, and in many cases they can be combined. Some large websites use more than one of these strategies.
- Round Robin DNS: Setting up multiple DNS records for the same host name, each pointing at a different server.
  - Advantages: Very simple to set up, and usually requires no special server configuration at all. Machines serving the site can be as geographically dispersed as you wish. You can also round-robin more than just web requests, so services such as FTP and IRC can be load-balanced the same way.
  - Disadvantages: Which address a user receives is decided by the recursive DNS server that answers their queries (for most end users, their ISP's), not by you. There is no clean way to move users to another server if one is overloaded or down, and DNS changes can take days to propagate across DNS caches. It is also very difficult to tell whether users are unable to reach your site, since their requests may be going nowhere.
- Reverse Proxying: Using a front-end server or dedicated hardware device in front of multiple web servers.
  - Advantages: This allows a web site to be served by multiple independent web servers. Most load balancer solutions include tools for monitoring server load, sending requests to the least-busy servers, and performing proper failover. With appropriate hardware this can scale to very large sites and achieve very good uptime and consistent performance.
  - Disadvantages: Set-up is tricky, and load-balancer hardware is often expensive. In most set-ups, the static files have to be mirrored to all the web servers, and every machine has to run a full web server stack. Some HTTP meta-information can also be lost in the proxying (for example, the back-end servers see the proxy's address rather than the client's).
- Distributing dynamic content-generating processes: Running the application processes (for example, FastCGI) on several machines behind a single front-end web server.
  - Advantages:
  - Disadvantages:
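The difference between blind rotation (what round-robin DNS effectively gives you) and the load-aware, failover-capable selection a load balancer performs can be sketched as follows. This is a toy model under assumptions, not any real balancer's implementation: the `Backend` class, its `active` and `healthy` fields, and the server names are all hypothetical.

```python
import itertools


class Backend:
    """One web server behind the balancer (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.active = 0      # requests currently in flight
        self.healthy = True  # set to False when a health check fails


def round_robin(backends):
    """Hand out backends in fixed rotation, ignoring load and health."""
    return itertools.cycle(backends)


def least_busy(backends):
    """Pick the healthy backend with the fewest in-flight requests."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")
    return min(candidates, key=lambda b: b.active)


pool = [Backend("web1"), Backend("web2"), Backend("web3")]
pool[0].active = 5       # web1 is busy
pool[1].healthy = False  # web2 is down

rotation = round_robin(pool)
rr_choices = [next(rotation).name for _ in range(4)]
lb_choice = least_busy(pool).name

print(rr_choices)  # the rotation still hands out the dead web2
print(lb_choice)   # the load-aware pick skips web2 and the busy web1
```

The rotation keeps sending traffic to a server that is down, which is the failover weakness listed for round-robin DNS above.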
Example "traditional" Load-balanced network layout
Obligatory ascii art diagram follows:
     ----------
  ((( INTERNET )))
     ----+-----
         |
         |
         |
   +-----------+
   |  Gateway  |-----+
   +-----------+     |
                     |
            +-----------------+   +-------------------+     +----------------+
            |  Load Balancer  |   |  Database Server  |     |  Web Server 1  |
            +-----------------+   +-------------------+     +----------------+
                     |                      |                       |
Internal network --> +----------------------+-----------------------+---------
                                |                       |
                        +----------------+      +----------------+
                        |  Web Server 2  |      |  Web Server 3  |   (more web servers)
                        +----------------+      +----------------+
About this document
This document will describe in detail one possible method of load balancing Django, including a discussion of how to solve the "static file problem." This solution uses the lighttpd web server, FastCGI, and two or more web servers. A potential solution for reverse proxying using Apache and mod_python will also be discussed, but in less detail. Some performance numbers will be given, based on my own tests on a variety of my own machines covering multiple platforms and operating systems, to give a general idea of how load balancing performs.
This document will also discuss potential pitfalls of using a load-balanced setup, how they occur and how to avoid them. It is by no means comprehensive, but usually your issues will fall into one of a few categories, most of which can be solved rather simply.