Opened 4 years ago

Last modified 4 years ago

#32241 closed New feature

Array Many to Many field — at Initial Version

Reported by: Omar Altayyan Owned by:
Component: contrib.postgres Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

let's use an example:

class Item
  related = model.ManyToManyField(Tag)

class Tag
  pass

in relational databases, you sometimes have the use case where u want to filter Item by getting rows that are related to a certain set of Tag, so for example get me the items that are related to tag1 and tag2 at the same time, if we wanted to get the items that have either tag1 or tag2, the query would be as simple as an in condition, but here we want the items that have both tag1 and tag2 and possibly more.

using regular django or SQL you'd solve this by adding a group by and a count, note that this query has to search in the Tag entity, get the records, group them, then check the items where the count of Tag related rows equals the desired number of Tags we are filtering by.

however, using a simple ArrayField in postgres, that has all the IDs of Tag related to Item in an array field in the Item entity, along with a good index like GinIndex, you can simply use instersect on that tags field in Item to do that filtering, and even do a regular OR filtering that is many times more efficient.

we tested this approach vs the group by/count in the M2M field, the performance improvement was almost 2 orders of magnitude, going from 100s of ms to like 4-5 ms

however the only issue with anyone trying this approach in Django is that the arrayField can store the IDs of Tag but there's no ORM backing behind it, there's no prefetch related, nor any other type of help, so you'd have to manually fetch entities which is not only bad for code but also for performance.

there's already a package https://github.com/primal100/django_postgres_extensions that has this applied, but it has been archived and it uses a lot of hacks that either mess up django (which is what happened to us in https://code.djangoproject.com/ticket/32239#comment:4) or make it break when doing a django update

so since the improvement is huge in performance, it's worth considering by the Django core team, you could just fetch the ArrayManyToMany field from this package and apply it's patches in a concrete way and make this field a part of Django, since it's benefits are too good to ignore.

in order to help a bit, i made a pastebin entry:

https://pastebin.com/yPJGi3vu

note that the additions to django's methods are enclosed in OUR CUSTOM CODE comments, the ones after else blocks mean the else isn't found in django.

in order to make these methods work we would patch the methods in django in an App ready() function:

    from django.db.models import query
    from django.db.models.sql import datastructures
    from django.db.models.signals import pre_delete
    from django_postgres_extensions.signals import delete_reverse_related
    from core.lib.django_postgres_extensions.array_m2m_patches import prefetch_one_level
    from core.lib.django_postgres_extensions.array_m2m_patches import as_sql
    from core.lib.django_postgres_extensions.array_m2m_patches import _update
    from core.lib.django_postgres_extensions.array_m2m_patches import update

    query.QuerySet.update = update
    query.QuerySet._update = _update

    datastructures.Join.as_sql = as_sql
    query.prefetch_one_level = prefetch_one_level
    pre_delete.connect(delete_reverse_related)

i hope this has any value.

Change History (0)

Note: See TracTickets for help on using tickets.
Back to Top