Opened 4 years ago
Last modified 3 years ago
#32492 closed New feature
Add django API for Postgres pg_trgm word_similarity, fuzzy full-text search. — at Version 1
Reported by: | Taneli | Owned by: | nobody |
---|---|---|---|
Component: | contrib.postgres | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Paolo Melchiorre | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Use case: filter queryset by an inexact substring. For example, retrieve an object with the following headline Dogecoin is following bitcoin in its dramatic rise when searching for dogge.
Currently it is possible to filter a queryset on the basis of trigram similarity between the search string and full text stored in a column.
Author.objects.annotate(similarity=TrigramSimilarity('name', test),).filter(similarity__gt=0.3)
This is a wrapper around the similarity function of the pg_trgm extension. While it allows comparing full strings, i.e. searching for 'doge' would find 'dogs' or 'dogge' it is useless for fuzzy searching of substrings.
SELECT similarity('dogge', 'doge'); --------- 0.57 SELECT similarity('dogge', 'dogecoin is following bitcoin'); ------------ 0.1
word_similarity does take into account the word boundaries
SELECT word_similarity('doge', 'dogecoin is following bitcoin'); -------------- 0.5
Adding a django API to word_similarity would allow for better fuzzy fulltext search without a need to use either raw SQL or external tools like elasticsearch.