Opened 19 years ago

Closed 15 years ago

Last modified 15 years ago

#1142 closed enhancement (fixed)

Support for multiple database connections

Reported by: Simon Willison Owned by: Alex Gaynor
Component: Database layer (models, ORM) Version:
Severity: normal Keywords: multiple database feature
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Django currently assumes that all models will be stored in a single database, and that only one database connection will be used for the duration of a request. This assumption does not scale to really large applications, where it is common for multiple database connections to be used in non-obvious ways.

Three examples include:

  1. Traditional replication, where all writes go to a single master database while reads are distributed across a number of slave databases.
  2. Sharding, where (for example) user accounts 1-1000 live on db1, 1001-2000 live on db2 etc.
  3. Different types of data live on different servers / clusters. Frequently accessed user profile data might be stored on a seperate database/cluster from log data which is frequently written but very rarely accessed.

At the very least, Django needs to allow more than one database connection to be maintained by the DB wrapper. The default should still be a single connection as this is the common case, but Django should not get in the way should multiple connections be desired.

Rather than having a single connection, how about maintaining a dictionary of connections? A "default" key could correspond to the connection that is used in most cases, but other connections can be configured in the settings file and assigned names. There would need to be a mechanism somewhere for Django model classes to be told which named connection they should use.

Simple replication may end up being a different issue entirely - it's possible that could be handled just with a custom DB backend that knows to send writes to one server and distribute reads across several others. The above change (where Django allows multiple DB connections) is still essential for more complex configurations.

More about this on the mailing list: http://groups.google.com/group/django-users/browse_thread/thread/d08f4975da831bb/75ede5e9c7f0db53

Attachments (1)

py23_set_fix_28feb07.diff (570 bytes ) - added by afarnham@… 18 years ago.
Fixes mulitple-db-support branch django/db/models/manager.py for Python 2.3

Download all attachments as: .zip

Change History (81)

comment:1 by ian@…, 19 years ago

An Alternative to this would be to use something like CJDBC/Sequoia (https://forge.continuent.org/projects/sequoia/) and manage the clustering at a lower level to django itself.

the link is for a JDBC wrapper, but they also have a C++ library which might be useful. all that is needed is a python module ;-)

regards
Ian

comment:2 by Simon Willison, 19 years ago

My limited experience with really large apps suggests that handling clustering at a lower level won't provide enough flexibility. For huge sites you end up having to take different scaling approaches for different bits of functionality - you might have one web service call (or RSS feed) that is hit more than anything else and needs to be scaled in a different way for example. You end up needing to scale different parts of the application in different ways, often using different databases for different parts of the app.

There are also other use-cases for multiple database connections outside of scaling - talking to two legacy applications at once for example.

comment:3 by Kozmo Kid, 19 years ago

Django may benefit from the ability to define connections at the application level that can override the sitewide setting. This would make it rather trivial to pull data in from multiple platforms.

An interface to SQLrelay (http://sqlrelay.sourceforge.net/) might help the scalability issue.

comment:4 by Morten Bagai, 19 years ago

Simon,

Per your recommendation here I'm just adding a brief description of the scenario in which we typically encounter this current limitation of Django:

In a mixed-database environment, We're typically faced with having to model data types hosted on different database servers. For example, web content may generally live on our MySQL servers, but some applications will need to incorporate invoice data from a MS SQL database. While it's certainly a general mess to have data spread out over multiple engines like this, I think it's also a fact that many developers (especially in small-medium corporate environments) are faced with, and not having the ability to easily manage data from different source within a Django app is going to be a serious limitation to these people.

The solutions of either being able to define a dictionary of database connections in the sitewide config file or being able to specify database connections on an application level, could both work well. I wonder, though, what exactly would be entailed in defining connections on the application level? Would applications generally be able to add their own settings file in which sitewide preferences could be overwritten, or would there be a specific module, like myapp.db.connections for specifying app-level connections? Maybe the simplest and most backward-compatible approach would be to keep a dictionary of named connections available to installed apps in the sitewide settings file as originally suggested.

Thanks. /Morten

comment:5 by Adrian Holovaty, 19 years ago

priority: normallow
Severity: majornormal

comment:6 by anonymous, 19 years ago

I would have to agree that this is a serious limitation.
It is not always possible to restructure existing databases into a single database for use and it would be of great valu to be able to use the django database model to integrate multiple databases at some level.

comment:7 by Go, 19 years ago

Type: enhancement

comment:8 by anonymous, 19 years ago

Type: defect

comment:9 by Jacob, 19 years ago

(In [3198]) Created branch for MultipleDatabaseSupport. Refs #1142.

comment:10 by anonymous, 19 years ago

Cc: gary.wilson@… added

comment:11 by Andy Dustman <farcepest@…>, 19 years ago

Cc: farcepest@… added

comment:12 by anonymous, 19 years ago

Component: Database wrapperValidators
milestone: Version 0.91
priority: lowhigh
Severity: normalblocker
Type: defectenhancement
Version: new-admin

comment:13 by anonymous, 19 years ago

Component: Validatorsdjango-admin.py
milestone: Version 0.91
priority: highhighest
Severity: blockertrivial
Type: enhancementtask
Version: new-adminSVN

comment:14 by hi-world cup, 18 years ago

Cc: hi-world cup added; gary.wilson@… farcepest@… removed
Keywords: rthml tab space editor js added
Summary: Support for multiple database connectionshi-world cup

comment:15 by Adrian Holovaty, 18 years ago

Summary: hi-world cupSupport for multiple database connections

comment:16 by anonymous, 18 years ago

Cc: gary.wilson@… farcepest@… added; hi-world cup removed

comment:17 by anonymous, 18 years ago

What is current status of this? When can we expect to see this in trunk?

thanks,
Forest

comment:18 by Adrian Holovaty, 18 years ago

priority: highestnormal
Severity: trivialnormal

comment:19 by anonymous, 18 years ago

Cc: eikke@… added

comment:20 by nmccullough@…, 18 years ago

Has there been any progress on this feature? I am in a similar situation. My project needs to be able to query different databases for different pieces of data. Another suggestion would be to be able to over ride the connection settings at the app level.

comment:21 by Gary Wilson <gary.wilson@…>, 18 years ago

Component: django-admin.pyDatabase wrapper
Keywords: rthml tab space editor js removed
Type: taskenhancement

reverted metadata spam.

comment:22 by Simon G. <dev@…>, 18 years ago

Keywords: multiple database added
Triage Stage: UnreviewedDesign decision needed
Version: SVN

comment:23 by namiswaan@…, 18 years ago

I hope this gets merged into the trunk soon.

comment:24 by dleewo@…, 18 years ago

Is multiple database support going to implemented?

At least in my case, I was looking at Django for an application for my company, but it will need to access at least 2 databases. The main one, plus a database from another product over which we have no control and can't change.

We would actually like it to be 3 databases with the 3rd one being for reporting data. We prefer to keep reporting data in a separate database for performance purposes and for scalability, but if absolutely necessary, we would put it in the dsame database as the main data. That still leaves us with at least 2 databases.

We have another app that was done in Java with Hibernate and Hibernate is currently accessing 4 databases from within the same app.

comment:25 by Gary Wilson <gary.wilson@…>, 18 years ago

Has patch: set
Needs documentation: set
Patch needs improvement: set
Triage Stage: Design decision neededAccepted

According to MultipleDatabaseSupport, the branch is feature complete and seems to be only lacking documentation.

If anyone is using or has tested the multi db branch, please note your experiences here.

Marking patch needs improvement since the latest commit to the branch notes that tests are still failing.

comment:26 by mmccarty@…, 18 years ago

Question: Does the patch for this also provide a work-around for keeping the native Django apps tables out of your database?

comment:27 by anonymous, 18 years ago

Cc: peter.kese@… added

by afarnham@…, 18 years ago

Attachment: py23_set_fix_28feb07.diff added

Fixes mulitple-db-support branch django/db/models/manager.py for Python 2.3

comment:28 by namiswaan@…, 18 years ago

Has patch: unset

It's been a long time since something has happened to this branch.. will it ever get included in the main trunk?

I wish to use this, but i want to stay with the current Django code. Is it possible to check-out the trunk and merge the changes from this branch?

comment:29 by namiswaan@…, 18 years ago

Has patch: set

oops, reverting the hash patch flag i removed

comment:30 by Malcolm Tredinnick, 18 years ago

Please post question like comment 28 to the users list, not this ticket. Ticket comments are for resolving the issue at hand, not seeking genera. information.

comment:31 by robertmlamarca@…, 18 years ago

Hi there, I have been following this conversation. I am working with some friends on a research project that requires one, maybe several postgres dbs. It is a bit off-label, but were hoping to make use of some of the Django features.

Multiple Dbs is something long-term we might need. Can someone tell me how to install this patch? I am running latest Django code with Python 2.4

So far, I see some patch online with about four lines of code... My Python skills are somewhere around early intermediate.. so I am not sure if :

  1. if just placing that code in my latest subversion checkout (where the surrounding code looks a bit different than what is there).. constitutes an installation of this patch.

and

  1. how can I use it?

Appreciate your patience and assistance.

Robert

comment:32 by anonymous, 18 years ago

Cc: chris.mcavoy@… added

comment:33 by (removed), 18 years ago

Cc: ferringb@… added

comment:34 by Forte , 18 years ago

Cc: lindemann@… added; gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… removed
Component: Database wrapperUncategorized
Owner: changed from Adrian Holovaty to Jacob

comment:35 by Malcolm Tredinnick, 18 years ago

Cc: gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… added; lindemann@… removed
Component: UncategorizedDatabase wrapper
Owner: changed from Jacob to Adrian Holovaty

Reverted spam.

comment:36 by Philippe Raoult, 17 years ago

Keywords: feature added

comment:37 by Jacob, 17 years ago

Triage Stage: Design decision neededSomeday/Maybe

comment:38 by anonymous, 17 years ago

Cc: iacobs@… added

comment:39 by anonymous, 17 years ago

Cc: andrewjwilkinson@… added

comment:40 by anonymous, 17 years ago

Cc: email@… added

comment:41 by darylspitzer, 17 years ago

Cc: daryl.spitzer@… added

comment:42 by darylspitzer, 17 years ago

Owner: changed from nobody to darylspitzer
Status: newassigned

I reviewed an API design with Jacob and Adrian (and others) and will now work on a prototype. Much of the credit for the API goes to Ben Ford (and his code should give me a big headstart).

comment:43 by cmcavoy, 17 years ago

I'm glad to see someone taking this ticket, it's something I've wanted for a long time. Any chance you can write up your planned API?

in reply to:  43 comment:44 by darylspitzer, 17 years ago

Replying to cmcavoy:

I'm glad to see someone taking this ticket, it's something I've wanted for a long time. Any chance you can write up your planned API?

I'll answer your question on django-developers.

comment:45 by mzdaniel, 17 years ago

Cc: remco@… mzdaniel@… added

comment:46 by anonymous, 17 years ago

Cc: rik24d@… added

comment:47 by anonymous, 17 years ago

Cc: stuartw@… added

comment:48 by Joe Holloway <jholloway7+django@…>, 17 years ago

Cc: jholloway7+django@… added

comment:49 by simeon, 17 years ago

Cc: simeonf+django@… added

comment:50 by Tom Carrick <knyght@…>, 17 years ago

Cc: knyght+django@… added

comment:51 by anonymous, 17 years ago

Cc: spoof@… added

comment:52 by Koen Biermans <koen.biermans@…>, 17 years ago

Ben Ford set up a mercurial repository for new work on multiple databases at http://hg.woe-beti.de. A track setup is available at http://trac.woe-beti.de.

comment:53 by darylspitzer, 17 years ago

See this django-developers thread for more discussion (including my API proposal): http://tinyurl.com/5q7oy5

comment:54 by mrts, 17 years ago

milestone: post-1.0

Not in scope for 1.0.

comment:55 by paulforrester, 16 years ago

Cc: django@… added

comment:56 by flosch, 16 years ago

Cc: flori@… added

comment:57 by Joey Wilhelm, 16 years ago

Cc: tarkatronic@… added

What is the current status on this issue? There doesn't appear to have been any visible movement in several months now, and this is something which would be very useful in a project I'm working on. I have a slightly different use case than the previously mentioned ones, however... and I'm sure it's not an entirely common one, but perhaps something to take into consideration.

The company I work for is sitting on a fairly large data warehouse for a number of clients. Each client has their own database, with a common schema, on one of two MS SQL 2k servers. In the near future, we will hopefully be moving some of these clients onto a PostgreSQL server (and in the distant future, it would be nice to move towards Greenplum; although that should be transparent with psycopg2, afaik). In addition, there is a separate database already running on the PostgreSQL server, which stores configuration parameters and which I am using for the Django system tables.

On every page load, I need Django to connect to database 'A' on the PostgreSQL server for any Django specific operations, then connect to an arbitrary database on either of the MS SQL servers, depending on which of our clients' data is currently being viewed. From what I've seen so far in the discussion in this ticket and on MultipleDatabaseSupport, this doesn't sound like it would likely be a supported scenario, as the connection to be used is determined dynamically, rather than a static connection chosen per model. In my case, all models apply across all database connections.

Now, I have made this work thus far with my own connection manager (http://dpaste.com/hold/84458/) but using it is somewhat kludgy and I'm sure it's far from a full implementation. I also had to apply a hack (http://code.google.com/p/django-pyodbc/issues/detail?id=18) to django-pyodbc and apply a pending patch (#6710) to Django in order to get it to work. Here is an example of why it in use: http://dpaste.com/hold/84465/

comment:58 by Malcolm Tredinnick, 16 years ago

There has, in fact, been lots of activity. Perhaps you might wish to peruse the archives of django-developers, which is where design work takes place. This ticket is really just a placeholder that will be closed when we commit some kind of final solution (and possibly for patches as we get closer if the work is self-contained enough not to require a branch).

comment:59 by anonymous, 16 years ago

Cc: alexandru.palade@… added

comment:60 by anonymous, 16 years ago

Cc: Jay Hargis added

comment:61 by (none), 16 years ago

milestone: post-1.0

Milestone post-1.0 deleted

comment:62 by lao, 16 years ago

Cc: lao added

comment:63 by lao, 16 years ago

(idea #10492)

My look is

# Wrapper classes
from re import compile

class Db:
    def __init__(self, cstr):
        self.link = None
        __re_dbstr = compile(r'^(?P<engine>[^:]+)://((?P<user>[^\(:]+)(\((?P<role>.+)\))?(:(?P<password>[^@]+))?(@(?P<host>[^\/:]+)(:(?P<port>\d+))?)?)?/(?P<name>.+)')
        try:
            self.__db = __re_dbstr.search(cstr).groupdict()
            # Fix for sqlite
            if self.__db['engine'].startswith('sqlite'):
                self.__db['name'] = "/%s" % self.__db['name']
        except:
            self.__db = {}
            raise Exception("#1")
        self.connect()

    def connect(self):
        self.link = 'Connection link'   # ....

    def keys(self):
       return self.__db.keys()

    def items(self):
       return self.__db.items()

    def __getitem__(self, key):
        return self.__db.get(key, None)

    def __getattr__(self, key):
        return self.__db.get(key, None)


class DbPool:
   def __init__(self):
       self.__dbs = {}
       self.__default = None

   def __getitem__(self, db_alias):
       if db_alias not in self.__dbs:
           raise Exception('#2')
       return self.__dbs[db_alias]

   def __getattr__(self, key):
       if not self.__default:
          raise Exception('#3')
       return self.__dbs[ self.__default ].__getattr__(key)

   def add(self, db_alias, db_str):
       self.__dbs[db_alias] = Db(db_str)
       if self.__default:
           return
       self.set_default(db_alias)

   def get_default(self):
       return self.__default

   def set_default(self, dbAlias):
       if dbAlias in self.__dbs:
           self.__default = dbAlias
       else:
           raise Exception("#4")

# Settings
DATABASES = {
    'alpha': 'sqlite3:///:memory:',
    'beta': 'sqlite3:///tmp/django.sqlite3',
    'gamma': 'mysql://user1/django',
    'delta': 'mysql://user1:password1/django',
    'default': 'mysql://user1:password1@host1/django',
    'reserv1': 'mysql://user1:password1@host1:1234/django',
    'reserv2': 'postgresql://user1(role1):password1@host1/django',
    'etc': 'postgresql://user1(role1):password1@host1:1234/django',
}
DATABASE_DEFAULT = 'default'
DATABASE_OPTIONS = {}
DATABASE_OPTIONS['etc'] = {
   'ssl':'...'
}
# Setting handlers
pool = DbPool()

for dbase in DATABASES.items():
    pool.add(*dbase)

pool.set_default(DATABASE_DEFAULT)
# Default database
print pool.engine, pool.host, pool.name

# Some other db
print pool['etc'].engine, pool['etc'].host, pool['etc'].name

# Models 
class SomeModel():
    # If not defined
    # meta_connections = [DATABASE_DEFAULT] 
    meta_connections = ['alpha', 'beta']

# API
# instead
# from django.db import connection
# connection.cursor()
#
# use
# from django.db import pool
# pool.link.cursor()
# pool['etc'].link.cursor() 

comment:64 by Alex Gaynor, 16 years ago

Owner: changed from darylspitzer to Alex Gaynor
Status: assignednew
Triage Stage: Someday/MaybeAccepted

This ticket is accepted as a part of the 2009 GSOC.

comment:65 by kimus, 16 years ago

I made a Multiple database manager for django-blocks see: http://django-blocks.googlecode.com/svn/trunk/src/python/blocks/apps/core/managers.py

just need to do something like this to your model:

    from django.db import models from blocks.apps.core.managers import MultiDBManager

    class SomeModel(models.Model):
        code = models.IntegerField(primary_key=True)
        name = models.CharField(max_length=250)

        objects = MultiDBManager()

        class Meta:
            db_name = 'oracle'

comment:66 by Alexander Koshelev, 16 years ago

Cc: Alexander Koshelev added

comment:67 by _wilane, 16 years ago

Cc: _wilane added

comment:68 by anonymous, 16 years ago

Cc: daemondazz added

comment:69 by eengbrec, 16 years ago

Cc: eengbrec added

comment:70 by eengbrec, 16 years ago

Cc: erik.engbrecht@… added; eengbrec removed

comment:71 by anonymous, 15 years ago

Cc: S2nek added

in reply to:  66 ; comment:72 by nicktank, 15 years ago

Replying to alexkoshelev:

I've implemented this on a development version of django 1.1. It appears as though it appends the table name of the app in the database established in the project settings instead of adding the model's tables to the declared database

from django.db import models 
from django.blocks.apps.core.managers import MultiDBManager

class FireMap(models.Model):
    address = models.CharField(max_length=100)
    city = models.CharField(max_length=50)    
    zip = models.IntegerField()
    latitude = models.DecimalField(max_digits=17, decimal_places=14)
    longitude = models.DecimalField(max_digits=17, decimal_places=14)
    neighborhood = models.CharField(max_length=30)
    assesorID = models.IntegerField()
    assessorURL = models.URLField()
    homeValue = models.CommaSeparatedIntegerField(max_length=8)
    homeOwner = models.CharField (max_length=30)
    mainIMGURL = models.URLField()
    photos = models.CommaSeparatedIntegerField(max_length=40)    
    articleID1 = models.IntegerField()
    articleID2 = models.IntegerField()
    articleID3 = models.IntegerField()
    extrainfo = models.TextField()
    
    objects = MultiDBManager()
    
    def __str__(self):
        return self.name

    class Meta:
        db_name = 'firemap'

the setting for the Db in my project settings file is set to:

DATABASE_ENGINE = 'postgresql_psycopg2'           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
DATABASE_NAME = 'newsok'             # Or path to database file if using sqlite3.

When the tables are created I see in the newsok Db a firemap_firemap table (firemap is the name of the app as well) but there is nothing in the actual firemap Db.

I apologize if this isn't the place to report this issue. I am new around here and to django in general.

in reply to:  72 comment:73 by Russell Keith-Magee, 15 years ago

Replying to nicktank:

Replying to alexkoshelev:

I've implemented this on a development version of django 1.1.

...

I apologize if this isn't the place to report this issue. I am new around here and to django in general.

Nick - as noted in comment 64, Multiple database connections was the subject of a 2009 Google Summer of Code project. Alex Gaynor has been developing the code to implement this. It's not a minor change - it requires lots of changes throughout Django. Check out Alex's GitHub repository if you want to see the progress he has made. It isn't quite ready for trunk yet, but it's getting close, and it's sufficiently functional that you can try it out.

comment:74 by Alex Gaynor, 15 years ago

If you want to see the current state of work please use my branch in soc2009/multidb in the Django repository, right now my GitHub repo is undergoing severe code alterations that make it high unstable.

comment:75 by xdrum, 15 years ago

Cc: swapon@… added

comment:76 by bioform, 15 years ago

Cc: andrew.krasnoff@… added

comment:77 by anonymous, 15 years ago

Cc: david@… added

comment:78 by Robin Breathe, 15 years ago

Cc: Robin Breathe added

comment:79 by Alex Gaynor, 15 years ago

Resolution: fixed
Status: newclosed

Fixed in r11952 :D

comment:80 by Jay Hargis, 15 years ago

Cc: erik.engbrecht@… gary.wilson@… farcepest@… eikke@… peter.kese@… chris.mcavoy@… ferringb@… iacobs@… andrewjwilkinson@… email@… daryl.spitzer@… remco@… mzdaniel@… rik24d@… stuartw@… jholloway7+django@… simeonf+django@… knyght+django@… spoof@… django@… flori@… tarkatronic@… alexandru.palade@… Jay Hargis lao Alexander Koshelev _wilane daemondazz S2nek swapon@… andrew.krasnoff@… david@… Robin Breathe removed

I know posts should be focused on ticket problems or solutions, but I can't help myself.

thank you so much! what a great Christmas gift!

Note: See TracTickets for help on using tickets.
Back to Top