Opened 10 hours ago

Last modified 8 hours ago

#35779 assigned Uncategorized

ORM to avoid deferring the reference_id of a prefetch

Reported by: Thiago Bellini Ribeiro Owned by: GunSliger00007
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords:
Cc: Thiago Bellini Ribeiro Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Suppose that I have a User and an Email models, and they are related through Email.user_id

If I query User like this:

User.objects.prefetch_related("email_set", Email.objects.only("email"))

I'll see n+1 issues either way, as user_id was deferred, and the ORM will need to refetch the object from the database.

Although the fix for this is easy, just changing this to:

User.objects.prefetch_related("email_set", Email.objects.only("user_id", "email"))

It can catch some users off-guard, as it is not so obvious that the ORM will not do "the correct thing".

Based on that, I was wondering if maybe the ORM could force the reference_id of a relation to not be deferred when it is used in a Prefetch object, the same way it already does with pk. It makes sense to me at least.

---

On a side note, this caused some unexpected behavior at work today, as we were fixing RemovedInDjango50Warnings and the simple addition of chunk_size to an .iterator() with a Prefetch caused some Model.DoesNotExist issues.

After digging for a while I found out about this issue, which means our prefetch_related was being thrown into the trash, but also because the iteration was taking some time to happen (very large table), some related objects got deleted in the meantime, and when the ORM tried to refresh_from_db to get the related id, it failed with that error.

Change History (3)

comment:1 by GunSliger00007, 8 hours ago

Owner: set to GunSliger00007
Status: newassigned

comment:2 by GunSliger00007, 8 hours ago

Consider optimizing your queries by prefetching only the fields you need. In your case, you’re already using only() to limit the fields loaded by the query, which helps reduce the amount of data loaded into memory.If the related Email objects are large, consider using .values() or .values_list() to reduce the amount of data furthe

User.objects.prefetch_related(

Prefetch("email_set", queryset=Email.objects.only("user_id", "email"))

)

Version 0, edited 8 hours ago by GunSliger00007 (next)

in reply to:  2 comment:3 by Thiago Bellini Ribeiro, 8 hours ago

Replying to GunSliger00007:

Consider optimizing your queries by prefetching only the fields you need. In your case, you’re already using only() to limit the fields loaded by the query, which helps reduce the amount of data loaded into memory.If the related Email objects are large, consider using .values() or .values_list() to reduce the amount of data furthe

User.objects.prefetch_related(
    Prefetch("email_set", queryset=Email.objects.only("user_id", "email"))
)

That User/Email was mostly to exemplify the issue :)

Note: See TracTickets for help on using tickets.
Back to Top