Opened 4 years ago

Last modified 3 years ago

#32492 closed New feature

Add django API for Postgres pg_trgm word_similarity, fuzzy full-text search. — at Initial Version

Reported by: Taneli Owned by: nobody
Component: contrib.postgres Version: dev
Severity: Normal Keywords:
Cc: Paolo Melchiorre Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Use case: filter queryset by an inexact substring. For example, retrieve an object with the following headline ' Dogecoin is following bitcoin in its dramatic rise' when searching for 'dogge'.

Currently it is possible to filter a queryset on the basis of trigram similarity between the search string and full text stored in a column.

Author.objects.annotate(similarity=TrigramSimilarity('name', test),).filter(similarity__gt=0.3)

This is a wrapper around the similarity function of the pg_trgm extension. While it allows comparing full strings, i.e. searching for 'doge' would find 'dogs' or 'dogge' it is useless for fuzzy searching of substrings.

SELECT similarity('dogge', 'doge');
---------
0.57

SELECT similarity('dogge', 'dogecoin is following bitcoin');
------------
0.1

word_similarity does take into account the word boundaries

SELECT word_similarity('doge', 'dogecoin is following bitcoin');
--------------
0.5

Adding a django API to word_similarity would allow for better fuzzy fulltext search without a need to use either raw SQL or external tools like elasticsearch.

Change History (0)

Note: See TracTickets for help on using tickets.
Back to Top