Opened 5 years ago

Closed 5 years ago

#31096 closed New feature (needsinfo)

Massively improving ManyToMany caching when using in forms

Reported by: László Károlyi Owned by: nobody
Component: Forms Version: 3.0
Severity: Normal Keywords: ORM Caching Forms ManyToMany
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by László Károlyi)

Hey,

after about half a day of investigating, I think I can provide a meaningful improvement to Django. Let me describe the issue:

If you have a ModelForm that has ManyToMany fields after saving the relations, the cache — with all the prefetch_related() and select_related() queryset adjustments you might have used on the ModelMultipleChoiceField's — will be lost. This often results in tons of unnecessary DB queries when using the saved instance from the ModelForm, because the relations on the newly constructed instance won't be cached yet, although they store the already adjusted relations from the form.

My solution here goes down to ManyToManyField.save_form_data() to fix the problem, and patches it to store the cleaned/validated ModelMultipleChoiceField result on the model instance, thus keeping queryset optimizations, and results.

I have a complete module for manipulating Django's ORM query caches on ForeingKey and ManyToMany fields, I'll paste that here for further review/improvement (it might not be perfect but works well for me), down at the bottom is the hot-replacing of the ManyToManyField.save_form_data().

Feel free to comment on it, but as a result, this change have saved me lots of DB queries when I log from the saved instance after the ModelForm.save() part. This module might spark some other ideas on behalf of core devs as to how to optimize the ORM further.

The only downside I saw so far with this change is, it seems to ignore the Model ordering for some reason, so I had to patch my tests for flakyness.

Here the module:

from typing import Iterable, Optional

from django import VERSION
from django.db.models.base import Model
from django.db.models.fields.related import ManyToManyField
from django.db.models.fields.reverse_related import ManyToOneRel
from django.db.models.manager import Manager
from django.db.models.query import QuerySet


def invalidate_onetomany(objs: Iterable[Model], prefetch_keys: Iterable[str]):
    """
    Invalidate one-to-many caches. These are remote `ForeignKey` and
    `ManyToManyField` fields fetched with `prefetch_related()`.
    """
    if VERSION[0] == 1 or VERSION[0] == 2:
        for obj in objs:
            if not hasattr(obj, '_prefetched_objects_cache'):
                continue
            for key in prefetch_keys:
                if key not in obj._prefetched_objects_cache:
                    continue
                del obj._prefetched_objects_cache[key]


def invalidate_manytoone(objs: Iterable[Model], field_names: Iterable[str]):
    """
    Invalidate many-to-one caches. These are `ForeignKey` and
    `OneToOneField` fields fetched with `select_related()` or
    `prefetch_related()`.
    """
    if VERSION[0] == 1:
        for obj in objs:
            for field_name in field_names:
                if not is_fk_cached(obj=obj, field_name=field_name):
                    continue
                del obj.__dict__[f'_{field_name}_cache']
    elif VERSION[0] == 2:
        for obj in objs:
            for field_name in field_names:
                if not is_fk_cached(obj=obj, field_name=field_name):
                    continue
                del obj._state.fields_cache[field_name]


def get_prefetch_cache_key(relation: Manager) -> str:
    'Return a key used in the prefetched cache for a relation.'
    try:
        # Works on ManyToMany
        return relation.prefetch_cache_name
    except AttributeError:
        # Is a ForeignKey (OneToMany)
        rel_field = relation.field.remote_field  # type: ManyToOneRel
        if rel_field.related_name:
            return rel_field.related_name
        if VERSION[0] == 1:
            return rel_field.name
        elif VERSION[0] == 2:
            return f'{rel_field.name}_set'


def init_prefetch_cache(obj: Model):
    'Init a prefetch cache on the model.'
    if VERSION[0] == 1 or VERSION[0] == 2:
        if hasattr(obj, '_prefetched_objects_cache'):
            return
        obj._prefetched_objects_cache = {}


def is_query_prefetched(relation: Manager) -> bool:
    'Return `True` if the relation is prefetched.'
    if VERSION[0] == 1 or VERSION[0] == 2:
        obj = relation.instance
        if not hasattr(obj, '_prefetched_objects_cache'):
            return False
        prefetch_cache_key = get_prefetch_cache_key(relation=relation)
        return prefetch_cache_key in obj._prefetched_objects_cache
    return False


def set_prefetch_cache(
        relation: Manager, queryset: QuerySet, override: bool = True):
    'Set prefetch cache on a `Model` for a relation.'
    if is_query_prefetched(relation=relation) and not override:
        return
    obj = relation.instance
    init_prefetch_cache(obj=obj)
    if VERSION[0] == 1 or VERSION[0] == 2:
        key = get_prefetch_cache_key(relation=relation)
        obj._prefetched_objects_cache[key] = queryset


def is_queryresult_loaded(qs: QuerySet) -> bool:
    'Return `True` if the query is loaded, `False` otherwise.'
    if VERSION[0] == 1 or VERSION[0] == 2:
        return qs._result_cache is not None
    return False


def set_queryresult(qs: QuerySet, result: list, override: bool = True):
    'Set result on a previously setup query.'
    if VERSION[0] == 1 or VERSION[0] == 2:
        if override or not is_queryresult_loaded(qs=qs):
            qs._result_cache = result


def get_queryresult(qs: QuerySet) -> Optional[list]:
    'Return the cached query result of the passed `QuerySet`.'
    if VERSION[0] == 1 or VERSION[0] == 2:
        return qs._result_cache


def is_fk_cached(obj: Model, field_name: str) -> bool:
    'Return `True` if the `ForeignKey` field on the object is cached.'
    if VERSION[0] == 1:
        return hasattr(obj, f'_{field_name}_cache')
    elif VERSION[0] == 2:
        if getattr(obj, '_state', None) is None or \
                getattr(obj._state, 'fields_cache', None) is None:
            return False
        return field_name in obj._state.fields_cache
    return False


def set_fk_cache(
        obj: Model, field_name: str, value: Model, override: bool = True):
    """
    Set a cache on the `obj` for a `ForeignKey` field, override when
    requested.
    """
    if is_fk_cached(obj=obj, field_name=field_name) and not override:
        return
    if VERSION[0] == 1:
        setattr(obj, f'_{field_name}_cache', value)
    elif VERSION[0] == 2:
        if getattr(obj, '_state', None) is None:
            obj._state = dict()
        if getattr(obj._state, 'fields_cache', None) is None:
            obj._state.fields_cache = dict()
        obj._state.fields_cache[field_name] = value


def del_fk_cache(obj: Model, field_name: str):
    'Delete a cached `ForeignKey` on the `Model`.'
    if not is_fk_cached(obj=obj, field_name=field_name):
        return
    if VERSION[0] == 1:
        delattr(obj, f'_{field_name}_cache')
    elif VERSION[0] == 2:
        del obj._state.fields_cache


_old_m2m_savedata = ManyToManyField.save_form_data


def _save_m2m_form_data(
        self: ManyToManyField, instance: Model, data: QuerySet):
    _old_m2m_savedata(self=self, instance=instance, data=data)
    set_prefetch_cache(
        relation=getattr(instance, self.name), queryset=data, override=True)


ManyToManyField.save_form_data = _save_m2m_form_data

Change History (2)

comment:1 by László Károlyi, 5 years ago

Description: modified (diff)

comment:2 by Carlton Gibson, 5 years ago

Component: Database layer (models, ORM)Forms
Needs documentation: set
Needs tests: set
Patch needs improvement: set
Resolution: needsinfo
Status: newclosed
Type: UncategorizedNew feature
Version: 2.23.0

Hi.

My initial thought here is that the invalidation logic is likely to be far-too fragile to justify the additional complexity of code.
Refetching from the database after save operations is commonplace, simple and reliable. Whereas cache invalidation is notoriously difficult.
There would have to be a big performance gain, when saving is already an expensive operation, so likely dominates the timings vs the refetch unless you are dealing with many rows.

Major changes such as this should be taken to the DevelopersMailingList, where if there is agreement to proceed we can track work with a ticket here.
If you are to post to the mailing list, please think first about tests demonstrating correctness, and ideally provide some kind of benchmark as to the expected performance gains.

Thanks.

Note: See TracTickets for help on using tickets.
Back to Top