Opened 4 years ago
Last modified 3 years ago
#32492 closed New feature
Add django API for Postgres pg_trgm word_similarity, fuzzy full-text search. — at Initial Version
Reported by: | Taneli | Owned by: | nobody |
---|---|---|---|
Component: | contrib.postgres | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Paolo Melchiorre | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Use case: filter queryset by an inexact substring. For example, retrieve an object with the following headline ' Dogecoin is following bitcoin in its dramatic rise' when searching for 'dogge'.
Currently it is possible to filter a queryset on the basis of trigram similarity between the search string and full text stored in a column.
Author.objects.annotate(similarity=TrigramSimilarity('name', test),).filter(similarity__gt=0.3)
This is a wrapper around the similarity function of the pg_trgm extension. While it allows comparing full strings, i.e. searching for 'doge' would find 'dogs' or 'dogge' it is useless for fuzzy searching of substrings.
SELECT similarity('dogge', 'doge'); --------- 0.57 SELECT similarity('dogge', 'dogecoin is following bitcoin'); ------------ 0.1
word_similarity does take into account the word boundaries
SELECT word_similarity('doge', 'dogecoin is following bitcoin'); -------------- 0.5
Adding a django API to word_similarity would allow for better fuzzy fulltext search without a need to use either raw SQL or external tools like elasticsearch.