Opened 10 years ago

Closed 10 years ago

Last modified 6 years ago

#23423 closed New feature (fixed)

Integrate unaccent lookups in django.contrib.postgres

Reported by: Thomas C Owned by: Thomas C
Component: contrib.postgres Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Thomas C)

After the introduction of the Lookup API in Django 1.7, I believe that django.contrib.postgres could benefit from some custom lookups based on the unaccent extension of PostgreSQL (see http://www.postgresql.org/docs/9.3/static/unaccent.html). That kind of feature would be very helpful for a lot of people in latin countries, and the existing solutions seems not very clean (and do not make use of the new API).

Change History (12)

comment:1 by Thomas C, 10 years ago

Description: modified (diff)
Has patch: set

I tried something using Tranform, as it looks like it was made for that kind of things: https://github.com/tchaumeny/django/commit/cb3d7808c20094636bacb0576bc738ee466d256a

comment:2 by Aymeric Augustin, 10 years ago

Marc, do you think this belong to django.contrib.postgres, or would you prefer to push it to a third-party app?

Either option would work for me.

comment:3 by Thomas C, 10 years ago

I should mention that I'm not quite satisfied with the implementation in the commit above. The current lookup API in Django does not allow for real accent insensitive lookup. More generally, it does not allow for any *SQL transformation*-insensitive search as the transformation is only applied to the left-hand side of the lookup, never to the right side.

I discussed that with akaariai and made a proposition for bilateral transformations, which is tracked in #23493, which I believe is a prerequisite for implementing robust accent-insensitive lookups in Django.

Version 0, edited 10 years ago by Thomas C (next)

comment:4 by Marc Tamlyn, 10 years ago

Sounds like a reasonable addition to d.c.postgres to me. I agree it needs a bilateral transform, which also sounds like a good feature.

comment:5 by Aymeric Augustin, 10 years ago

Triage Stage: UnreviewedAccepted

comment:6 by Tim Graham, 10 years ago

Needs documentation: set
Patch needs improvement: set

Marking as "patch needs improvement" (pending bilateral transform) and "needs documentation".

comment:7 by Thomas C, 10 years ago

Owner: set to Thomas C
Status: newassigned

comment:8 by Simon Charette, 10 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

Documentation proofreading from a native speaker wouldn't hurt but the patch looks good to me.

comment:9 by Tim Graham <timograham@…>, 10 years ago

Resolution: fixed
Status: assignedclosed

In 17fe0bd808a47f37dd1351adb01a8ad2cc852f24:

Fixed #23423 -- Added unaccent lookup in django.contrib.postgres

comment:10 by Gabor Körber, 6 years ago

bilaterality of this function maybe should be explained in the documentation.

also there is a caveat, and i want to comment it here, as this implementation caused me to debug for a while:

if i do (please note how the sequence of enclosing is for unaccent and lower, also what follows are just abstract examples, i know there are possible workarounds!)

    Model.objects.annotate(a_value=Unaccent(Lower('fieldname'))).filter( a_value=Lower( 'another_field_or_Value' ) )

you would expect (abstractly) something like this SQL:

SELECT * WHERE unaccent(lower(fieldname)) = lower(another_field_or_Value);

however what you get is (because of bilateral "magic"):

SELECT * WHERE unaccent(lower(fieldname)) = lower(unaccent(another_field_or_Value))

less of a problem, as most likely you would want anyway something like

    Model.objects.annotate(a_value=Unaccent(Lower('fieldname'))).filter( a_value=Unaccent(Lower( 'another_field_or_Value' )) )

however this expression results in django as

SELECT * WHERE unaccent(lower(fieldname)) = unaccent(lower(unaccent(another_field_or_Value)))

this would not make much issues, you can just leave out unaccent on RHS as you will get it from annotate, especially if lower and unaccent were interchangable in their position
but they are not, there are exceptions

see in postgres:

select lower(unaccent('ÇÁÈÎËßÖÜ')), unaccent(lower('ÇÁÈÎËßÖÜ'));

results in "caeiesou", "caeieSou"
and these are just the european exceptions i checked, but be warned.

comment:11 by Claude Paroz, 6 years ago

Gabor, thanks for your input. It would be great to open a separate ticket with your findings, ideally with a failing test case for the Django test suite.

comment:12 by Gabor Körber, 6 years ago

well since this writing further research revealed, that postgres 9.6 has fixed the issue (at least for ß). so it's only valid for 9.5 series and lower. i am therefore not sure if its worth a new ticket.

Last edited 6 years ago by Gabor Körber (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top