Opened 4 years ago

Last modified 3 years ago

#32492 closed New feature

Add django API for Postgres pg_trgm word_similarity, fuzzy full-text search. — at Version 1

Reported by: Taneli Owned by: nobody
Component: contrib.postgres Version: dev
Severity: Normal Keywords:
Cc: Paolo Melchiorre Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Taneli)

Use case: filter queryset by an inexact substring. For example, retrieve an object with the following headline Dogecoin is following bitcoin in its dramatic rise when searching for dogge.

Currently it is possible to filter a queryset on the basis of trigram similarity between the search string and full text stored in a column.

Author.objects.annotate(similarity=TrigramSimilarity('name', test),).filter(similarity__gt=0.3)

This is a wrapper around the similarity function of the pg_trgm extension. While it allows comparing full strings, i.e. searching for 'doge' would find 'dogs' or 'dogge' it is useless for fuzzy searching of substrings.

SELECT similarity('dogge', 'doge');
---------
0.57

SELECT similarity('dogge', 'dogecoin is following bitcoin');
------------
0.1

word_similarity does take into account the word boundaries

SELECT word_similarity('doge', 'dogecoin is following bitcoin');
--------------
0.5

Adding a django API to word_similarity would allow for better fuzzy fulltext search without a need to use either raw SQL or external tools like elasticsearch.

Change History (1)

comment:1 by Taneli, 4 years ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top