Opened 14 years ago

Closed 11 years ago

Last modified 8 years ago

#16187 closed Cleanup/optimization (fixed)

Refactor lookup system

Reported by: Ulrich Petri Owned by: Anssi Kääriäinen
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: charette.s@…, sfllaw@…, FunkyBob, mpessas@…, loic@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

There are a few tickets that propose adding new lookup types [1][2][3][4] and
also one [5] that suggests adding the possibility for custom fields to have the
ability to add their own lookup types.

The current implementaion is not able to distinguish between field names and
lookup types when querying through a relation.

I briefly talked about this topic with Russel, Janis and Alex at the
djangocon.eu 2011 sprints. The consensus was that it would be worthwile to do a
refactor in this area. It would also allow django to have "awesome fields" as
Alex called it :)

I'm starting this ticket to track work in this area and also "bundle" the other
related tickets. I'll also attach a patch with a basic testcase that fails with
the current implementation.

[1] #8424
[2] #9596
[3] #10911
[4] #12489
[5] #14263

Attachments (6)

tests-t16187.patch (3.5 KB ) - added by Ulrich Petri 14 years ago.
tests-t16187-2.patch (3.5 KB ) - added by Ulrich Petri 14 years ago.
t16187-proof-of-concept.patch (2.4 KB ) - added by Ulrich Petri 14 years ago.
16187.lookup-refactor.diff (6.5 KB ) - added by Julien Phalip 13 years ago.
16187_uqly.diff (12.4 KB ) - added by Anssi Kääriäinen 13 years ago.
An alternate way to resolve the lookup.
lookup_hstore_test.tar.gz (8.0 KB ) - added by Anssi Kääriäinen 11 years ago.
An example project with hstore lookups

Download all attachments as: .zip

Change History (47)

by Ulrich Petri, 14 years ago

Attachment: tests-t16187.patch added

comment:1 by Russell Keith-Magee, 14 years ago

Triage Stage: UnreviewedAccepted

by Ulrich Petri, 14 years ago

Attachment: tests-t16187-2.patch added

comment:2 by Ulrich Petri, 14 years ago

New tests-patch. There was a small error in the old one

comment:3 by Ulrich Petri, 14 years ago

Has patch: set
Owner: changed from nobody to Ulrich Petri
Patch needs improvement: set

The t16187-proof-of-concept patch - as the name says - is a (quick and dirty) proof of concept (the testsuite passes, including those tests in tests-t16187-2.patch)

This means that fields may be named like lookup types and new lookup types can be added at will without backwards compatibility concerns.

This patch definitely needs improvement / cleanup before its RFC.

by Ulrich Petri, 14 years ago

comment:4 by Julien Phalip, 13 years ago

See also #11670 which tackles the conflicts that already exist with the 'year', 'month' and 'day' lookups.

by Julien Phalip, 13 years ago

Attachment: 16187.lookup-refactor.diff added

comment:5 by Julien Phalip, 13 years ago

I've updated the patch to avoid some test failures with foreign keys by adding the following special case:

if len(part) > 3 and part[-3:] == '_id'
    field, _, _, _ = model._meta.get_field_by_name(part[:-3])
...

That feels a bit hacky but it works.

I'd really like to push this over the line for 1.4 as it's causing several issues in the ORM and the admin. So I'm wondering: What is the roadmap for this? How could the current patch be improved? What API is considered to make the lookups configurable?

I'm a total noob in regards to the ORM codebase but I'm keen to help out under some supervision :)

comment:6 by Anssi Kääriäinen, 13 years ago

Not this ticket's problem, but it would make sense to do even bigger refactor in the direction that first do a traversal of the lookup path and determine if there are multi-joins in the path, raise errors if the path is not valid etc, and only after that do setup_joins with the valid path. This ticket's patch is one step in that direction, because this already traverses the lookup path upfront. The next step would be to check errors already at this point and use the info gathered during the traversal for setup_joins calls.

That being said, I will try to review (as in point out what needs improvement in my opinion) tomorrow.

comment:7 by Alex Gaynor, 13 years ago

So here's my idea for how custom field lookups should work (not necessarily related to this ticket):

(1) First the Query identifies the root object of the lookup, this should either be a field, or an aggregate I think.

(2) Then, it traverses the remaining bits, at each section it calls a method on the return value from either part 1, or the previous item in part two. This method should return a representation of the next step, so for example an fkey__field would return something that says "I am a field traversal", and it could raise some exception to bail out (in particular this should be used on the final step, in order to indicate that an __ lookup like __lt should use the comparison operator).

(3) Finally the last step of this should add itself to the query. In theory this calls for an even bigger refactor, because really it should add itself to the compiler at the very end, but I'm not sure if that's possible ATM, and it shouldn't block this happening.

comment:8 by Anssi Kääriäinen, 13 years ago

I didn't have time to do the promised review before. But here it is:

I think the big issue is how to restructure the code in a way that there is not duplication of the lookup path resolving code. Currently you need to first walk the path in resolving the final field and lookup part of the path, and then again in setup_joins(). This means you have to do pretty much the same code in two places, which is IMHO not good. This is not an easy issue to solve, as setup_joins refactoring is needed, and that piece of code isn't the easiest to refactor.

I think there are some mistakes in the patch:

  • IMHO it would be good to have the lookup path resolving done in different method than add_filter, if for no other reason than readability.
  • I don't like this logic:
    for ...:
        try:
            ...
            try:
                ...
            except FieldDoesNotExist:
                if really_not_exists:
                    raise
            ...
        except FieldDoesNotExist:
            do stuff
    
    Related to that, you are looking for aggregates only after you have walked the path, this results in Foo.objects.filter(id__someaggregate__exact=1) being valid. I think the aggregate resolving must be done upfront.

As it happens, I had some code related to one of my projects lying around which could be reused for this. I have attached it as a patch. It does more than is needed (it actually tries to resolves the joins needed), and because of that there is a lot of duplication of code to setup_joins. But it does pass the test suite (well, there is one test failing, but it not failing before is a bug in current implementation). I hope it gives some ideas about how to solve this issue. It is incomplete and downright uqly in places, though.

I don't understand the last step's problem Alex mentioned above. Why can't you just add the lookup normally into the where / having clause, and resolve it into SQL in the compiler in the normal way:

# compiler.py:72-73
       where, w_params = self.query.where.as_sql(qn=qn, connection=self.connection)
       having, h_params = self.query.having.as_sql(qn=qn, connection=self.connection)

Or is the idea that the lookup can return something else than things going into the where / having parts of the query?

Last: I had some problems making this work with GenericRelations. Is there some reason why they are using ManyToManyRel instead of ManyToOneRel? That seems just wrong...

by Anssi Kääriäinen, 13 years ago

Attachment: 16187_uqly.diff added

An alternate way to resolve the lookup.

comment:9 by Julien Phalip, 13 years ago

akaariai, I agree that it'd be nice to avoid multiple traversals of the lookup query and that, to do so, some big-ish refactoring would be required. In the meantime, I have posted a patch (with tests) in #11670 with a slightly simpler algorithm than the ones originally posted here and there. That patch aims to fix the possible collisions between genuine field names and built-in lookup types.

I'm not sure whether that bug should be fixed in priority or if it should wait for the general refactoring to happen.

comment:10 by Julien Phalip, 13 years ago

I've done some further refactoring in the latest patch for #11670 (see specifically ticket:11670#comment:15) following akaariai's suggestion to modify setup_joins in comment:8. Some help pushing that one over the line would be appreciated.

comment:11 by Anssi Kääriäinen, 12 years ago

I have done some initial work for custom lookups (and refactoring of the lookup system in general) - work available from https://github.com/akaariai/django/compare/ticket_10790_new...ticket_16187 (this is based on work done in #10790). The commit messages are worth a read.

The modeltests/field_subclassing/* files shows one trivial custom lookup. However, it should be possible to define complex lookups too.

All tests pass.

That was the positive news. The negative news are:

  • The where.py / fields.py / lookups.py interaction is a total mess (it was messy before, now it is a bit worse).
  • The API of Lookup is totally unfinished. We need to think how Lookup and Field interact (Lookup defines boolean filter conditions, Field datatypes)
  • There should be easier way to add lookups to existing fields than subclassing SomeField.register_lookup comes to mind here.
  • There should be (maybe?) better ways to write multibackend compliant Lookups.
  • There is just one core lookup "BackwardsCompatLookup". All existing lookups should be defined as different Lookup subclasses.
  • Probably a lot more...

Cleaning the mess in where.py & friends can be done later. What I did doesn't make it much worse than what it is already. The cleanup and core lookup rewrite is going to take some time, but it should not be particularly hard to do.

The API however is going to be somewhat hard to define. What does Field do, what does Lookup do. Currently Field still knows the lookup types, and we definitely want to change that. How to do that in a way that is even somewhat backwards compatible is going to be interesting job.

Still, I would say that the hard parts are now done.

comment:12 by Simon Charette, 12 years ago

Cc: charette.s@… added
Version: 1.3master

comment:13 by Anssi Kääriäinen, 12 years ago

Just a little bit more about the Lookup <-> Field interaction problem. Currently, value preparation for lookups is done by get_prep_lookup and get_db_prep_lookup. These are documented and thus public API methods. These do things like convert the given value to proper format for icontains queries ('foo' to 'foo%'). The base implementation of these methods is a long if-else chain listing different lookup names and then passing the actual conversion to the connection used.

We likely would want to move these methods to Lookup. They actually are already in the current Lookup but the implementation just passes the value onto the Field. But if we want to stop passing the implementation to the field, then we are breaking backwards compatibility. We could perhaps detect if the given field has overridden get_db_prep_lookup and get_prep_lookup, if so, pass to field and raise deprecation warning. This is likely going to be somewhat complex... For one we have to still define these methods in Django's core fields as otherwise subclasses calling super() will break...

I will next go through some of the basic lookup types to see how much work converting one lookup is. Will report back...

comment:14 by Anssi Kääriäinen, 12 years ago

I have now something that seems to be fully working: https://github.com/akaariai/django/compare/ticket_16187

The above branch contains #10790 changes, too. After I commit #10790 changes I am going to rebase the above branch on master.

There is a lot to do still:

  • API design (The Lookup <-> Field interface and the Backend <-> Lookup interface, how to register new lookups per field)
  • How to implement backwards compatibility (the above branch should be 100% backwards compatible, it essentially contains both the new and old code...).
  • Cleaning up the implementation
  • ...

Adding new lookups seems to be somewhat easy, though it requires subclassing. Having the ability to hook a custom lookup to a given field instance would be really nice to have...

The new code seems to be a little bit faster for WhereNode generation (around 5% faster query execution speed for "complex" filter clause).

An example of a custom lookup can be found from the tests: https://github.com/akaariai/django/compare/ticket_16187#L16R75

In summary: a lot of work to do still but this feature will likely be in 1.6.

comment:15 by Simon Law, 12 years ago

Cc: sfllaw@… added

comment:16 by Anssi Kääriäinen, 12 years ago

#10790 just went in, rebased branch available from https://github.com/akaariai/django/compare/ticket_16187_new

The commit history is ugly... but the basic status is that the above branch does implement custom lookups in barckwards compatible way. The next steps would be to convert the GIS ORM to use the custom lookups, and then to try to implement things like datefield__year__2000=1.

After that the big question is how exactly to handle backwards compatibility. Otherwise I don't see currently any major problems for this feature.

comment:17 by Anssi Kääriäinen, 12 years ago

I have rebased the work of ticket_16187_new on current master, and I have also created a branch which cherry-picks moving the names_to_path call into add_filter(). This is a needed change in any case, so that we can handle the split_exclude() case properly (currently an F() expression is added two times into the query when doing a split_exclude()).

cherry-pick available from: https://github.com/akaariai/django/compare/django:master...ticket_16187_cherrypick

I am not entirely happy with the cherry-pick branch, but it is IMO move into right direction. So, I will likely commit the cherry-pick into master as separate (rebased) commit from the rest of stuff.

comment:18 by Anssi Kääriäinen, 12 years ago

I implemented one nested lookup, year__somelookup, where somelookup is a lookup supported by IntegerField.

I also converted django-hstore to work with the new lookup code, and added some nested lookups to hstore field. See this django-developers post for more details.

comment:19 by Anssi Kääriäinen, 12 years ago

The following things now work: using .values(), .values_list(), aggregates (.annotate() and .aggregate()), .order_by(), distinct(*fields) and F() expressions with custom lookups. This means one can do things like:

  Book.objects.order_by('pub_date__year')

This should be very useful when used with hstore or other similar fields. Consider something like:

  SportActivity.objects.filter(type=RUNNING).aggregate(Avg('data__distance'))

where data is json field for example. In this example you store different types of sport activities in a DB table with some common fields, and store the details in the "data" field. But you are still able to access the details through the ORM.

I have update django-hstore (available from https://github.com/akaariai/django-hstore/compare/custom_lookups), it contains now one .order_by() and one .aggreagte() example which drills into the hstore.

As an example from the tests:

assertEqual(DataBag.objects.aggregate(avg=Avg('data__v__asint')), {'avg': 1.5})

generates this SQL:

    SELECT AVG(("app_databag"."data" -> %s)::INTEGER) AS "avg" FROM "app_databag", ['v']

The current Django code can be found from a new branch: https://github.com/akaariai/django/tree/nested_lookups.

comment:20 by Anssi Kääriäinen, 12 years ago

I think I know how to implement the nested lookups. Here is the idea: A lookup is something that has get_lookup() method if it supports nesting. Nesting is supported for those lookups that transform the column value in some way (for example, __year lookup will extract the year from the column). These lookups will respond to as_sql() method. Similarly fields have get_lookup() method. Finally a new class Column is introduced, this too responds to as_sql() method.

So for example mydatefield__year__exact=2000 would resolve as follows:

  1. resolve field (mydatefield from table "mytable" found).
  2. field.get_lookup('year') -> resolves to YearLookup instance which is given a Column("mytable", "mydatefield") as lhs value.
  3. resolve 'exact' from the 'year' lookup. Will return an ExactLookup which is given the YearLookup as lhs value.
  4. as the ExactLookup is a final lookup, it will be given 2000 as a value (set_lookup_value() maybe?).
  5. The ExactLookup is added to WhereNode as is (no Constraint objects, just plain where.add(found_lookup, connector)).

When transforming the value to SQL at compiler stage the following things happen:

  1. The ExactLookup.as_sql() is called.
  2. The ExactLookup will call lhs.as_sql(), thus the YearLookup as_sql() is called.
  3. The YearLookup.as_sql() is called, this will again call the given Column's as_sql() and resolve to "mytable"."mydatefield".
  4. The YearLookup will wrap the "mytable"."mydatefield" to EXTRACT 'YEAR' FROM "mytable"."mydatefield".
  5. The ExactLookup will return EXTRACT 'YEAR' FROM "mytable"."mydatefield" = %s, (2000,).

So, a lookup .as_sql() works in two ways depending if the field has been given a value or not - it either does a transformation of the value (cast, extract, ...), or it produces boolean condition. There are a lot of details to make this work correctly. How to treat SQLEvaluators, subqueries, many special cases like isnull checks, how to handle nullability - certain lookups can transform non-null values to non-nullable values, how to handle return type information - from Column through lookup transformations, what is the final type? And so on...

Still, the above idea should have a solid foundation for making things work cleanly. For backwards compatibility I think the best approach is to have the current code as much as-is somewhere, and use that for the backwards compatibility period. The main problem is field.get[_db]_prep_lookup() methods which needs to be called both for new code and old code... Lets see how to make that work nicely.

comment:21 by FunkyBob, 12 years ago

Cc: FunkyBob added

comment:22 by Anssi Kääriäinen, 11 years ago

Owner: changed from Ulrich Petri to Anssi Kääriäinen
Status: newassigned

I am working on this again. I decided to start from implementing current base lookups as classes. The results can be found from: https://github.com/akaariai/django/tree/custom_lookups

Before there is full support for custom lookups, the following things need to be done:

  • Add a way to nest lookups. There is some work on that already in the above branch. This shouldn't be too hard.
  • Backwards compatibility - Maybe just returning None from a field's get_lookup() could enable backwards compat mode. This mode should work exactly like before, so those who have overridden WhereNodes etc will still get a working implementation.
  • Add a way to register lookups to fields (I think either instances or classes is wanted behaviour).
  • Add the ability to use "extracts" in .values(), .order_by() etc. The basic idea is that a Lookup itself can't be used in such situations, you will need to write an Extract class.
  • Add ability to ask if given lookup ('myfield__customlookup') is allowed for given queryset - admin needs this. I think adding QuerySet.is_allowed_lookup(somelookup) is the way to go.
  • Of course, also docs & tests needed.

Of the above the field registration API is the biggest question.

Last edited 11 years ago by Anssi Kääriäinen (previous) (diff)

comment:23 by Anssi Kääriäinen, 11 years ago

I just force-pushed an update to https://github.com/akaariai/django/tree/custom_lookups. I am going to use force-push for some time still.

The basic chained lookup functionality is there, but the implementation is far from finished (the code needs heavy cleanup, and additional features are needed, too).

From the TODO list in above comment nesting and registering lookups are implemented. Backwards compatibility should be there, but I haven't tested that too much. Extract and lookup_exists() functionality is still on TODO list, and no docs written yet. There are a couple of failing test cases related to error handling. The error messages will likely need to be changed anyway, so I am going to fix these later on.

So, work is progressing, but there is still a lot to do.

comment:24 by Anssi Kääriäinen, 11 years ago

I have done some investigation into how GIS works, particularly how GIS transforms select columns. It is obvious there is room for big cleanup in there (see gis compiler, and its overridden get_columns/get_default_columns). Unfortunately this means a lot of work, but it seems this is worth it. The get_*_columns() methods are very much non-DRY. It should be possible to make GIS a lot less dependent on private ORM APIs.

Another possible painpoint is that query.select and query.related_select_cols will get a totally new signature. If subclassers override these, it will be hard to guarantee any form of backwards compatibility. It is possible to keep backwards compat paths, too, but this needs to be done in multiple places. The end result will not look pretty good. It might be enough to add some way to override the SQL output for columns. I have thought of optional connection.col_as_sql() method which can be used for things like django-nonrel.

Still one point: the new way of generating SQL for select queries will be a bit slower than the current way, maybe in the range of 2-10% for common query types. I think this performance penalty is acceptable given what will be gained. Also, I think the new way will be a bit easier to optimize (for example, there is a possibility to cache both Col instances, and their output when tables aren't aliased).

comment:25 by Anssi Kääriäinen, 11 years ago

The changes are cascading a lot.

Current status of the patch (https://github.com/akaariai/django/compare/custom_lookups) is that it not only rewrites how lookups are implemented. It also rewrites how SELECT clause generation works, and how values are converted from db values to Python equivalents. The last change simplifies a lot of GIS and .dates()/.datetimes() queries. There isn't a lot of work to do to remove GeoQuery and GeoCompiler. Even now gis needs a lot less hacks to work.

Actually I think a good goal is to make GIS as much as possible public-API. If that can be done, then it is a strong signal that things are working as expected.

I believe the changes could be really tough for those who subclass SQLCompiler liberally (think django-nonrel).

comment:26 by Marc Tamlyn, 11 years ago

A hack-less gis is definitely a strong indicator that this is going in a good direction. I think there will be a relatively small number of projects subclassing SQLCompiler so I'm not massively worried if we break them, so long as the resulting code will be easier for projects like that to work with in the future. What's your instinct about the possibilities of nonrel with this new style?

comment:27 by Anssi Kääriäinen, 11 years ago

I think django-nonrel should be possible. It might be harder to do (maybe even a lot harder). Currently you can overwrite compiler and wherenode implementations for nonrel and you are done. The new way requires also rewrite of lookups and columns. This should be possible with added hooks (there is currently a dummy hook for lookup rewriting, but no hook for column rewriting). In addition I believe you still need to overwrite wherenode and compiler.

It must be said that it is hard for me to see if nonrel is possible to write - I haven't done any work on nonrel...

I have once again force-updated the branch. The select clause handling should be a lot better now. A lot of select clause generation has changed - query.select, extra and aggregates are gone, they are all collapsed into custom_select OrderedDict. Unfortunately this way is also a bit slower than current master for some use cases. Interestingly enough the reason seems to be OrderedDict creation. Python's OrderedDict implementation is just slow... Otherwise, there is a lot of API cleanup (no rewriting, just cleanup) and then docs + more tests.

comment:28 by Anssi Kääriäinen, 11 years ago

A lot more work done. Biggest change is that now there is support for .alias(somecol=SomeSQLProducingObject) and .annotate(somecol=SomeSQLProducingObject). These can be used with other queryset operations. Best way to see what this means is to look at custom_lookups/tests.py - there are some tests for ARRAY_AGG and case when ... then ... end structures. Another case that uses the .annotate() feature is dates() and datetimes() implementations.

comment:29 by Anssi Kääriäinen, 11 years ago

Recent developments: GeoWhereNode, GeoSQLCompiler and GeoQuery are gone.

GeoWhereNode was converted to GeoLookup. GeoLookup could be improved a lot, but for now on just doing exactly what the old code seems good enough. In general the new custom lookup and custom annotations features could be used to make the implementation of GIS a lot better (from ability to nest lookups to plain bugfixing).

I went into some trouble to remove all usage of GeoSQLCompiler and GeoQuery. The compiler subclass resulted in lots of clutter in different DB backends, and there wasn't almost anything left in any case. The GeoQuery removal was mostly motivated by principle of "least private API usage".

I am now waiting for composite fields implementation to get in to master. It should make some parts of custom_lookups code simpler (mainly in Query.build_lookup()). After that the API still needs to be cleaned up and documented. Finally I need to check how 3rd party SQL backends or non-SQL backends can adapt to the code.

comment:30 by Marc Tamlyn, 11 years ago

Anssi - I'd like to review this as and when you're ready, but I'm not particularly familiar with how things worked before. I feel it would be a good exercise for me getting to know the ORM a bit more, as well as hopefully a useful fresh set of eyes. It's not a small patch, so any guidance you can give as to where to start would be greatly appreciated.

comment:31 by Anssi Kääriäinen, 11 years ago

Just take simple queries and see how they are built and executed by the ORM.

Try to pick one area to concentrate at a time. For the lookup refactor checking where clause entries generation and execution is naturally important. The basic path for generating a clause is qs.filter() -> _filter_or_exclude() -> query.add_q() -> query._add_q() -> query.build_filter() -> query.setup_joins(). At execution time the query.where clause is converted to SQL by WhereNode.as_sql() (and WhereNode.make_atom() as part of that).

Try to ignore HAVING clause generation at start. If you wonder why something is needed, change it and run queries tests - that test module alone covers most of ORM features. The broken tests should tell you why any particular part is needed.

comment:32 by Apostolis Bessas, 11 years ago

Cc: mpessas@… added

comment:33 by loic84, 11 years ago

Cc: loic@… added

comment:34 by Anssi Kääriäinen, 11 years ago

I have done another implementation of custom lookups. I intentionally did *not* implement support for using nested lookups outside of filtering. That is .order_by('datefield__year') isn't supported. The reason for this decision is that implementing that in one go leads to a huge patch (the previous try was nearing 2000 lines of ORM code changes).

I think the code portions of the patch are actually pretty much ready. The biggest things are adding deprecation warnings for old-style lookups, and ensuring Django doesn't use those anywhere. In addition admin needs some work so that custom lookups are usable in admin, too. There is some code messiness inside sql/query.py due to the support for both old-style and new-style lookups. I don't think that needs to be cleaned up before deprecation removal.

The current work is available from https://github.com/akaariai/django/compare/lookups_3. The docs additions contain examples of how to write custom lookups and extracts.

The API feels correct to me now. 3rd party backends can customize SQL as needed, and writing simple extracts is straightforward. Complex cases can be handled, too. The docs of course need a lot of rewriting. But I am hopeful that we can get this into Django soon. Please help by testing the branch!

comment:35 by Anssi Kääriäinen, 11 years ago

I did a little more work on the API. I am confident it is general enough yet simple enough now.

I created a quick-and-dirty HStoreField implementation, I'll attach a ready to run project. Quick instructions: edit database settings to point to a db with hstore extension installed, syncdb and run python tester.py. The interesting bits are in testproject/hstore.py. I'll paste that portion in here:

from django.db.models import TextField, Lookup, Extract

# This one implements "get value of key X from the hstore". The
# value is extracted from the hstore, and so further lookups on
# the value are possible. The extracted values are of type text
# in the DB, so output_type = TextField().
class HStoreExtract(Extract):
    output_type = TextField()

    def as_sql(self, qn, connection):
        lhs, params = qn.compile(self.lhs)
        # extracts know the lookups used for creating them,
        # so init_lookups[0] gives us the 'somekey' in
        # hstorefield__somekey='foo'
        params.append(self.init_lookups[0])
        return '%s -> %%s' % (lhs), params


# This one implements "does hstore contain key X". This is a
# a lookup on the hstore field itself.
class HStoreContains(Lookup):
    lookup_name = 'hcontains'

    def as_sql(self, qn, connection):
        lhs, params = self.process_lhs(qn, connection)
        rhs, rhs_params = self.process_rhs(qn, connection)
        params.extend(rhs_params)
        return '%s ? %s' % (lhs, rhs), params


# Main HStoreField
class HStoreField(TextField):
    # Following two methods needed for createdb & save support.
    def db_type(self, connection):
        return 'hstore'

    def get_db_prep_save(self, value, connection):
        data = []
        for k, v in value.items():
            data.append('%s=>%s' % (k, v))
        return ','.join(data)

    def get_lookup(self, lookup):
        found_lookup = None
        # It is possible to rename Django's own lookups so that conflicts
        # between common names (contains, exact for example) can be avoided.
        if lookup.startswith('dj_'):
            # If it started with 'dj_' use Django's own lookups
            found_lookup = super(HStoreField, self).get_lookup(lookup[3:])
        # If it wasn't something existing, then interpret as extract key from
        # hstore
        if found_lookup is None:
            return HStoreExtract
        else:
            return found_lookup
HStoreField.register_lookup(HStoreContains)

Using HStoreField it is possible to do queries like qs.filter(hstoredata__somekey__icontains='foobar').

The test project contains some examples which go a bit further.

Now, docs need rewriting and this one is RFC.

by Anssi Kääriäinen, 11 years ago

Attachment: lookup_hstore_test.tar.gz added

An example project with hstore lookups

comment:36 by Anssi Kääriäinen <akaariai@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In 20bab2cf9d02a5c6477d8aac066a635986e0d3f3:

Fixed #16187 -- refactored ORM lookup system

Allowed users to specify which lookups or transforms ("nested lookus")
are available for fields. The implementation is now class based.

Squashed commit of the following:

commit fa7a7195f1952a9c8dea7f6e89ee13f81757eda7
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Jan 18 10:53:24 2014 +0200

Added lookup registration API docs

commit eb1c8ce164325e0d8641f14202e12486c70efdb6
Author: Anssi Kääriäinen <akaariai@…>
Date: Tue Jan 14 18:59:36 2014 +0200

Release notes and other minor docs changes

commit 11501c29c9352d17f22f3a0f59d3b805913dedcc
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Jan 12 20:53:03 2014 +0200

Forgot to add custom_lookups tests in prev commit

commit 83173b960ea7eb2b24d573f326be59948df33536
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Jan 12 19:59:12 2014 +0200

Renamed Extract -> Transform

commit 3b18d9f3a1bcdd93280f79654eba0efa209377bd
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Jan 12 19:51:53 2014 +0200

Removed suggestion of temporary lookup registration from docs

commit 21d0c7631c161fc0c67911480be5d3f13f1afa68
Merge: 2509006 f2dc442
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Jan 12 09:38:23 2014 -0800

Merge pull request #2 from mjtamlyn/lookups_3

Reworked custom lookups docs.

commit f2dc4429a1da04c858364972eea57a35a868dab4
Author: Marc Tamlyn <marc.tamlyn@…>
Date: Sun Jan 12 13:15:05 2014 +0000

Reworked custom lookups docs.

Mostly just formatting and rewording, but also replaced the example
using YearExtract to use an example which is unlikely to ever be
possible directly in the ORM.

commit 250900650628d1f11beadb22814abd666029fb81
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Jan 12 13:19:13 2014 +0200

Removed unused import

commit 4fba5dfaa022653ffa72497258ffd8f8b7476f92
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Jan 11 22:34:41 2014 +0200

Added docs to index

commit 6d53963f375c77a1f287833b19b976d23f36c30b
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Jan 11 22:10:24 2014 +0200

Dead code removal

commit f9cc0390078e21f1ea5a7bc1f15b09f8f6b0904d
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Jan 11 19:00:43 2014 +0200

A new try for docs

commit 33aa18a6e3c831930bda0028222a26f9c1d96e66
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Jan 11 14:57:12 2014 +0200

Renamed get_cols to get_group_by_cols

commit c7d5f8661b7d364962bed2e6f81161c1b4f1bcc3
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Jan 11 14:45:53 2014 +0200

Altered query string customization for backends vendors

The new way is trying to call first method 'as_' + connection.vendor.
If that doesn't exist, then call as_sql().

Also altered how lookup registration is done. There is now
RegisterLookupMixin class that is used by Field, Extract and
sql.Aggregate. This allows one to register lookups for extracts and
aggregates in the same way lookup registration is done for fields.

commit 90e7004ec14e15503f828cc9bde2a7dab593814d
Merge: 66649ff f7c2c0a
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Jan 11 13:21:01 2014 +0200

Merge branch 'master' into lookups_3

commit 66649ff891c7c73c7eecf6038c9a6802611b5d8a
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Jan 11 13:16:01 2014 +0200

Some rewording in docs

commit 31b8faa62714b4b6b6057a9f5cc106c4dd73caab
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Dec 29 15:52:29 2013 +0200

Cleanup based on review comments

commit 1016159f34674c0df871ed891cde72be8340bb5d
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Dec 28 18:37:04 2013 +0200

Proof-of-concept fix for #16731

Implemented only for SQLite and PostgreSQL, and only for startswith
and istartswith lookups.

commit 193cd097ca8f2cc6a911e57b8e3fb726f96ee6a6
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Dec 28 17:57:58 2013 +0200

Fixed #11722 -- iexact=F() produced invalid SQL

commit 08ed3c3b49e100ed9019831e770c25c8f61b70f9
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Dec 21 23:59:52 2013 +0200

Made Lookup and Extract available from django.db.models

commit b99c8d83c972786c6fcd0e84c9e5cb08c1368300
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Dec 21 23:06:29 2013 +0200

Fixed review notes by Loic

commit 049eebc0703c151127f4f0265beceea7b8b39e72
Merge: ed8fab7 b80a835
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Dec 21 22:53:10 2013 +0200

Merge branch 'master' into lookups_3

Conflicts:

django/db/models/fields/init.py
django/db/models/sql/compiler.py
django/db/models/sql/query.py
tests/null_queries/tests.py

commit ed8fab7fe8867ff3eb801c3697a426478387bb2f
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Dec 21 22:47:23 2013 +0200

Made Extracts aware of full lookup path

commit 27a57b7aed91b2f346abc4a77da838bffa17c727
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Dec 1 21:10:11 2013 +0200

Removed debugger import

commit 074e0f5aca0572e368c11e6d2c73c9026e7d63d7
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Dec 1 21:02:16 2013 +0200

GIS lookup support added

commit 760e28e72bae475b442b026650969b0d182dbe53
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Dec 1 20:04:31 2013 +0200

Removed usage of Constraint, used Lookup instead

commit eac47766844b90e7d3269e7a8c012eee34ec0093
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Dec 1 02:22:30 2013 +0200

Minor cleanup of Lookup API

commit 2adf50428d59a783078b0da3d5d035106640c899
Author: Anssi Kääriäinen <akaariai@…>
Date: Sun Dec 1 02:14:19 2013 +0200

Added documentation, polished implementation

commit 32c04357a87e3727a34f8c5e6ec0114d1fbbb303
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Nov 30 23:10:15 2013 +0200

Avoid OrderedDict creation on lookup aggregate check

commit 7c8b3a32cc17b4dbca160921d48125f1631e0df4
Author: Anssi Kääriäinen <akaariai@…>
Date: Sat Nov 30 23:04:34 2013 +0200

Implemented nested lookups

But there is no support of using lookups outside filtering yet.

commit 4d219d4cdef21d9c14e5d6b9299d583d1975fcba
Author: Anssi Kääriäinen <akaariai@…>
Date: Wed Nov 27 22:07:30 2013 +0200

Initial implementation of custom lookups

comment:37 by Tim Graham <timograham@…>, 10 years ago

In 8e435a564034c59ac408ec71283d8ac6ede2ce1f:

Added deprecation docs for legacy lookup support; refs #16187.

comment:38 by Tim Graham <timograham@…>, 10 years ago

In 0b3e3e21cfe7463245dc1dfc9258e18903bc1116:

[1.8.x] Added deprecation docs for legacy lookup support; refs #16187.

Backport of 8e435a564034c59ac408ec71283d8ac6ede2ce1f from master

comment:39 by Tim Graham <timograham@…>, 10 years ago

In bb2b4acc7a8435e204b772954995487e87866ef9:

[1.7.x] Added deprecation docs for legacy lookup support; refs #16187.

Backport of 8e435a564034c59ac408ec71283d8ac6ede2ce1f from master

comment:40 by Tim Graham <timograham@…>, 10 years ago

In 5008a4db440c8f7d108a6979b959025ffb5789ba:

Removed legacy ORM lookup support per deprecation timeline; refs #16187.

comment:41 by Simon Charette <charette.s@…>, 8 years ago

In 9ae4362b:

Refs #16187 -- Stopped compiling query compilers during lookup rhs processing.

Lookup right hand side processing was compiling query compilers which happened
to work by chance as SQLCompiler defines a as_sql() method with two optional
parameters albeit it doesn't expect the same type of arguments.

Note: See TracTickets for help on using tickets.
Back to Top