#1142 closed enhancement (fixed)
Support for multiple database connections
Reported by: | Simon Willison | Owned by: | Alex Gaynor |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | |
Severity: | normal | Keywords: | multiple database feature |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Django currently assumes that all models will be stored in a single database, and that only one database connection will be used for the duration of a request. This assumption does not scale to really large applications, where it is common for multiple database connections to be used in non-obvious ways.
Three examples include:
- Traditional replication, where all writes go to a single master database while reads are distributed across a number of slave databases.
- Sharding, where (for example) user accounts 1-1000 live on db1, 1001-2000 live on db2 etc.
- Different types of data live on different servers / clusters. Frequently accessed user profile data might be stored on a seperate database/cluster from log data which is frequently written but very rarely accessed.
At the very least, Django needs to allow more than one database connection to be maintained by the DB wrapper. The default should still be a single connection as this is the common case, but Django should not get in the way should multiple connections be desired.
Rather than having a single connection, how about maintaining a dictionary of connections? A "default" key could correspond to the connection that is used in most cases, but other connections can be configured in the settings file and assigned names. There would need to be a mechanism somewhere for Django model classes to be told which named connection they should use.
Simple replication may end up being a different issue entirely - it's possible that could be handled just with a custom DB backend that knows to send writes to one server and distribute reads across several others. The above change (where Django allows multiple DB connections) is still essential for more complex configurations.
More about this on the mailing list: http://groups.google.com/group/django-users/browse_thread/thread/d08f4975da831bb/75ede5e9c7f0db53
Attachments (1)
Change History (81)
comment:1 by , 19 years ago
comment:2 by , 19 years ago
My limited experience with really large apps suggests that handling clustering at a lower level won't provide enough flexibility. For huge sites you end up having to take different scaling approaches for different bits of functionality - you might have one web service call (or RSS feed) that is hit more than anything else and needs to be scaled in a different way for example. You end up needing to scale different parts of the application in different ways, often using different databases for different parts of the app.
There are also other use-cases for multiple database connections outside of scaling - talking to two legacy applications at once for example.
comment:3 by , 19 years ago
Django may benefit from the ability to define connections at the application level that can override the sitewide setting. This would make it rather trivial to pull data in from multiple platforms.
An interface to SQLrelay (http://sqlrelay.sourceforge.net/) might help the scalability issue.
comment:4 by , 19 years ago
Simon,
Per your recommendation here I'm just adding a brief description of the scenario in which we typically encounter this current limitation of Django:
In a mixed-database environment, We're typically faced with having to model data types hosted on different database servers. For example, web content may generally live on our MySQL servers, but some applications will need to incorporate invoice data from a MS SQL database. While it's certainly a general mess to have data spread out over multiple engines like this, I think it's also a fact that many developers (especially in small-medium corporate environments) are faced with, and not having the ability to easily manage data from different source within a Django app is going to be a serious limitation to these people.
The solutions of either being able to define a dictionary of database connections in the sitewide config file or being able to specify database connections on an application level, could both work well. I wonder, though, what exactly would be entailed in defining connections on the application level? Would applications generally be able to add their own settings file in which sitewide preferences could be overwritten, or would there be a specific module, like myapp.db.connections for specifying app-level connections? Maybe the simplest and most backward-compatible approach would be to keep a dictionary of named connections available to installed apps in the sitewide settings file as originally suggested.
Thanks. /Morten
comment:5 by , 19 years ago
priority: | normal → low |
---|---|
Severity: | major → normal |
comment:6 by , 19 years ago
I would have to agree that this is a serious limitation.
It is not always possible to restructure existing databases into a single database for use and it would be of great valu to be able to use the django database model to integrate multiple databases at some level.
comment:7 by , 18 years ago
Type: | enhancement |
---|
comment:8 by , 18 years ago
Type: | → defect |
---|
comment:10 by , 18 years ago
Cc: | added |
---|
comment:11 by , 18 years ago
Cc: | added |
---|
comment:12 by , 18 years ago
Component: | Database wrapper → Validators |
---|---|
milestone: | → Version 0.91 |
priority: | low → high |
Severity: | normal → blocker |
Type: | defect → enhancement |
Version: | → new-admin |
comment:13 by , 18 years ago
Component: | Validators → django-admin.py |
---|---|
milestone: | Version 0.91 |
priority: | high → highest |
Severity: | blocker → trivial |
Type: | enhancement → task |
Version: | new-admin → SVN |
comment:14 by , 18 years ago
Cc: | added; removed |
---|---|
Keywords: | rthml tab space editor js added |
Summary: | Support for multiple database connections → hi-world cup |
comment:15 by , 18 years ago
Summary: | hi-world cup → Support for multiple database connections |
---|
comment:16 by , 18 years ago
Cc: | added; removed |
---|
comment:17 by , 18 years ago
What is current status of this? When can we expect to see this in trunk?
thanks,
Forest
comment:18 by , 18 years ago
priority: | highest → normal |
---|---|
Severity: | trivial → normal |
comment:19 by , 18 years ago
Cc: | added |
---|
comment:20 by , 18 years ago
Has there been any progress on this feature? I am in a similar situation. My project needs to be able to query different databases for different pieces of data. Another suggestion would be to be able to over ride the connection settings at the app level.
comment:21 by , 18 years ago
Component: | django-admin.py → Database wrapper |
---|---|
Keywords: | rthml tab space editor js removed |
Type: | task → enhancement |
reverted metadata spam.
comment:22 by , 18 years ago
Keywords: | multiple database added |
---|---|
Triage Stage: | Unreviewed → Design decision needed |
Version: | SVN |
comment:24 by , 18 years ago
Is multiple database support going to implemented?
At least in my case, I was looking at Django for an application for my company, but it will need to access at least 2 databases. The main one, plus a database from another product over which we have no control and can't change.
We would actually like it to be 3 databases with the 3rd one being for reporting data. We prefer to keep reporting data in a separate database for performance purposes and for scalability, but if absolutely necessary, we would put it in the dsame database as the main data. That still leaves us with at least 2 databases.
We have another app that was done in Java with Hibernate and Hibernate is currently accessing 4 databases from within the same app.
comment:25 by , 18 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
Patch needs improvement: | set |
Triage Stage: | Design decision needed → Accepted |
According to MultipleDatabaseSupport, the branch is feature complete and seems to be only lacking documentation.
If anyone is using or has tested the multi db branch, please note your experiences here.
Marking patch needs improvement since the latest commit to the branch notes that tests are still failing.
comment:26 by , 18 years ago
Question: Does the patch for this also provide a work-around for keeping the native Django apps tables out of your database?
comment:27 by , 18 years ago
Cc: | added |
---|
by , 18 years ago
Attachment: | py23_set_fix_28feb07.diff added |
---|
Fixes mulitple-db-support branch django/db/models/manager.py for Python 2.3
comment:28 by , 18 years ago
Has patch: | unset |
---|
It's been a long time since something has happened to this branch.. will it ever get included in the main trunk?
I wish to use this, but i want to stay with the current Django code. Is it possible to check-out the trunk and merge the changes from this branch?
comment:30 by , 18 years ago
Please post question like comment 28 to the users list, not this ticket. Ticket comments are for resolving the issue at hand, not seeking genera. information.
comment:31 by , 18 years ago
Hi there, I have been following this conversation. I am working with some friends on a research project that requires one, maybe several postgres dbs. It is a bit off-label, but were hoping to make use of some of the Django features.
Multiple Dbs is something long-term we might need. Can someone tell me how to install this patch? I am running latest Django code with Python 2.4
So far, I see some patch online with about four lines of code... My Python skills are somewhere around early intermediate.. so I am not sure if :
- if just placing that code in my latest subversion checkout (where the surrounding code looks a bit different than what is there).. constitutes an installation of this patch.
and
- how can I use it?
Appreciate your patience and assistance.
Robert
comment:32 by , 18 years ago
Cc: | added |
---|
comment:33 by , 17 years ago
Cc: | added |
---|
comment:34 by , 17 years ago
Cc: | added; removed |
---|---|
Component: | Database wrapper → Uncategorized |
Owner: | changed from | to
comment:35 by , 17 years ago
Cc: | added; removed |
---|---|
Component: | Uncategorized → Database wrapper |
Owner: | changed from | to
Reverted spam.
comment:36 by , 17 years ago
Keywords: | feature added |
---|
comment:37 by , 17 years ago
Triage Stage: | Design decision needed → Someday/Maybe |
---|
comment:38 by , 17 years ago
Cc: | added |
---|
comment:39 by , 17 years ago
Cc: | added |
---|
comment:40 by , 17 years ago
Cc: | added |
---|
comment:41 by , 17 years ago
Cc: | added |
---|
comment:42 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I reviewed an API design with Jacob and Adrian (and others) and will now work on a prototype. Much of the credit for the API goes to Ben Ford (and his code should give me a big headstart).
follow-up: 44 comment:43 by , 17 years ago
I'm glad to see someone taking this ticket, it's something I've wanted for a long time. Any chance you can write up your planned API?
comment:44 by , 17 years ago
Replying to cmcavoy:
I'm glad to see someone taking this ticket, it's something I've wanted for a long time. Any chance you can write up your planned API?
I'll answer your question on django-developers.
comment:45 by , 17 years ago
Cc: | added |
---|
comment:46 by , 17 years ago
Cc: | added |
---|
comment:47 by , 17 years ago
Cc: | added |
---|
comment:48 by , 17 years ago
Cc: | added |
---|
comment:49 by , 17 years ago
Cc: | added |
---|
comment:50 by , 17 years ago
Cc: | added |
---|
comment:51 by , 17 years ago
Cc: | added |
---|
comment:52 by , 17 years ago
Ben Ford set up a mercurial repository for new work on multiple databases at http://hg.woe-beti.de. A track setup is available at http://trac.woe-beti.de.
comment:53 by , 16 years ago
See this django-developers thread for more discussion (including my API proposal): http://tinyurl.com/5q7oy5
comment:55 by , 16 years ago
Cc: | added |
---|
comment:56 by , 16 years ago
Cc: | added |
---|
comment:57 by , 16 years ago
Cc: | added |
---|
What is the current status on this issue? There doesn't appear to have been any visible movement in several months now, and this is something which would be very useful in a project I'm working on. I have a slightly different use case than the previously mentioned ones, however... and I'm sure it's not an entirely common one, but perhaps something to take into consideration.
The company I work for is sitting on a fairly large data warehouse for a number of clients. Each client has their own database, with a common schema, on one of two MS SQL 2k servers. In the near future, we will hopefully be moving some of these clients onto a PostgreSQL server (and in the distant future, it would be nice to move towards Greenplum; although that should be transparent with psycopg2, afaik). In addition, there is a separate database already running on the PostgreSQL server, which stores configuration parameters and which I am using for the Django system tables.
On every page load, I need Django to connect to database 'A' on the PostgreSQL server for any Django specific operations, then connect to an arbitrary database on either of the MS SQL servers, depending on which of our clients' data is currently being viewed. From what I've seen so far in the discussion in this ticket and on MultipleDatabaseSupport, this doesn't sound like it would likely be a supported scenario, as the connection to be used is determined dynamically, rather than a static connection chosen per model. In my case, all models apply across all database connections.
Now, I have made this work thus far with my own connection manager (http://dpaste.com/hold/84458/) but using it is somewhat kludgy and I'm sure it's far from a full implementation. I also had to apply a hack (http://code.google.com/p/django-pyodbc/issues/detail?id=18) to django-pyodbc and apply a pending patch (#6710) to Django in order to get it to work. Here is an example of why it in use: http://dpaste.com/hold/84465/
comment:58 by , 16 years ago
There has, in fact, been lots of activity. Perhaps you might wish to peruse the archives of django-developers, which is where design work takes place. This ticket is really just a placeholder that will be closed when we commit some kind of final solution (and possibly for patches as we get closer if the work is self-contained enough not to require a branch).
comment:59 by , 16 years ago
Cc: | added |
---|
comment:60 by , 16 years ago
Cc: | added |
---|
comment:62 by , 16 years ago
Cc: | added |
---|
comment:63 by , 16 years ago
(idea #10492)
My look is
# Wrapper classes from re import compile class Db: def __init__(self, cstr): self.link = None __re_dbstr = compile(r'^(?P<engine>[^:]+)://((?P<user>[^\(:]+)(\((?P<role>.+)\))?(:(?P<password>[^@]+))?(@(?P<host>[^\/:]+)(:(?P<port>\d+))?)?)?/(?P<name>.+)') try: self.__db = __re_dbstr.search(cstr).groupdict() # Fix for sqlite if self.__db['engine'].startswith('sqlite'): self.__db['name'] = "/%s" % self.__db['name'] except: self.__db = {} raise Exception("#1") self.connect() def connect(self): self.link = 'Connection link' # .... def keys(self): return self.__db.keys() def items(self): return self.__db.items() def __getitem__(self, key): return self.__db.get(key, None) def __getattr__(self, key): return self.__db.get(key, None) class DbPool: def __init__(self): self.__dbs = {} self.__default = None def __getitem__(self, db_alias): if db_alias not in self.__dbs: raise Exception('#2') return self.__dbs[db_alias] def __getattr__(self, key): if not self.__default: raise Exception('#3') return self.__dbs[ self.__default ].__getattr__(key) def add(self, db_alias, db_str): self.__dbs[db_alias] = Db(db_str) if self.__default: return self.set_default(db_alias) def get_default(self): return self.__default def set_default(self, dbAlias): if dbAlias in self.__dbs: self.__default = dbAlias else: raise Exception("#4") # Settings DATABASES = { 'alpha': 'sqlite3:///:memory:', 'beta': 'sqlite3:///tmp/django.sqlite3', 'gamma': 'mysql://user1/django', 'delta': 'mysql://user1:password1/django', 'default': 'mysql://user1:password1@host1/django', 'reserv1': 'mysql://user1:password1@host1:1234/django', 'reserv2': 'postgresql://user1(role1):password1@host1/django', 'etc': 'postgresql://user1(role1):password1@host1:1234/django', } DATABASE_DEFAULT = 'default' DATABASE_OPTIONS = {} DATABASE_OPTIONS['etc'] = { 'ssl':'...' } # Setting handlers pool = DbPool() for dbase in DATABASES.items(): pool.add(*dbase) pool.set_default(DATABASE_DEFAULT) # Default database print pool.engine, pool.host, pool.name # Some other db print pool['etc'].engine, pool['etc'].host, pool['etc'].name # Models class SomeModel(): # If not defined # meta_connections = [DATABASE_DEFAULT] meta_connections = ['alpha', 'beta'] # API # instead # from django.db import connection # connection.cursor() # # use # from django.db import pool # pool.link.cursor() # pool['etc'].link.cursor()
comment:64 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
Triage Stage: | Someday/Maybe → Accepted |
This ticket is accepted as a part of the 2009 GSOC.
comment:65 by , 16 years ago
I made a Multiple database manager for django-blocks see: http://django-blocks.googlecode.com/svn/trunk/src/python/blocks/apps/core/managers.py
just need to do something like this to your model:
from django.db import models from blocks.apps.core.managers import MultiDBManager class SomeModel(models.Model): code = models.IntegerField(primary_key=True) name = models.CharField(max_length=250) objects = MultiDBManager() class Meta: db_name = 'oracle'
follow-up: 72 comment:66 by , 16 years ago
Cc: | added |
---|
comment:67 by , 16 years ago
Cc: | added |
---|
comment:68 by , 16 years ago
Cc: | added |
---|
comment:69 by , 15 years ago
Cc: | added |
---|
comment:70 by , 15 years ago
Cc: | added; removed |
---|
comment:71 by , 15 years ago
Cc: | added |
---|
follow-up: 73 comment:72 by , 15 years ago
Replying to alexkoshelev:
I've implemented this on a development version of django 1.1. It appears as though it appends the table name of the app in the database established in the project settings instead of adding the model's tables to the declared database
from django.db import models from django.blocks.apps.core.managers import MultiDBManager class FireMap(models.Model): address = models.CharField(max_length=100) city = models.CharField(max_length=50) zip = models.IntegerField() latitude = models.DecimalField(max_digits=17, decimal_places=14) longitude = models.DecimalField(max_digits=17, decimal_places=14) neighborhood = models.CharField(max_length=30) assesorID = models.IntegerField() assessorURL = models.URLField() homeValue = models.CommaSeparatedIntegerField(max_length=8) homeOwner = models.CharField (max_length=30) mainIMGURL = models.URLField() photos = models.CommaSeparatedIntegerField(max_length=40) articleID1 = models.IntegerField() articleID2 = models.IntegerField() articleID3 = models.IntegerField() extrainfo = models.TextField() objects = MultiDBManager() def __str__(self): return self.name class Meta: db_name = 'firemap'
the setting for the Db in my project settings file is set to:
DATABASE_ENGINE = 'postgresql_psycopg2' # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'. DATABASE_NAME = 'newsok' # Or path to database file if using sqlite3.
When the tables are created I see in the newsok Db a firemap_firemap table (firemap is the name of the app as well) but there is nothing in the actual firemap Db.
I apologize if this isn't the place to report this issue. I am new around here and to django in general.
comment:73 by , 15 years ago
Replying to nicktank:
Replying to alexkoshelev:
I've implemented this on a development version of django 1.1.
...
I apologize if this isn't the place to report this issue. I am new around here and to django in general.
Nick - as noted in comment 64, Multiple database connections was the subject of a 2009 Google Summer of Code project. Alex Gaynor has been developing the code to implement this. It's not a minor change - it requires lots of changes throughout Django. Check out Alex's GitHub repository if you want to see the progress he has made. It isn't quite ready for trunk yet, but it's getting close, and it's sufficiently functional that you can try it out.
comment:74 by , 15 years ago
If you want to see the current state of work please use my branch in soc2009/multidb in the Django repository, right now my GitHub repo is undergoing severe code alterations that make it high unstable.
comment:75 by , 15 years ago
Cc: | added |
---|
comment:76 by , 15 years ago
Cc: | added |
---|
comment:77 by , 15 years ago
Cc: | added |
---|
comment:78 by , 15 years ago
Cc: | added |
---|
comment:80 by , 15 years ago
Cc: | removed |
---|
I know posts should be focused on ticket problems or solutions, but I can't help myself.
thank you so much! what a great Christmas gift!
An Alternative to this would be to use something like CJDBC/Sequoia (https://forge.continuent.org/projects/sequoia/) and manage the clustering at a lower level to django itself.
the link is for a JDBC wrapper, but they also have a C++ library which might be useful. all that is needed is a python module ;-)
regards
Ian