Context Navigation

← Previous Ticket
Next Ticket →

#14474 closed (wontfix)

Unnecessary deepcopying of QuerySet inside filter() method results in slower execution

Reported by:	Piotr Czachur	Owned by:	nobody
Component:	Database layer (models, ORM)	Version:	dev
Severity:		Keywords:	deepcopy, filter
Cc:		Triage Stage:	Unreviewed
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

When building query using multiple filter() calls like that:

    qs = MyModel.objects.filter(...).filter(...).filter(...)

it results in sql.Query.clone() for every filter() call.
Query.clone() uses deepcopy() for "where", "having", "aggregates" and "deferred_loading" properties:

        obj.where = deepcopy(self.where, memo=memo)
        obj.having = deepcopy(self.having, memo=memo)
        obj.aggregates = deepcopy(self.aggregates, memo=memo)
        obj.deferred_loading = deepcopy(self.deferred_loading, memo=memo)

... which can be time-consumig operation.

My question is, do we really need deepcopy when we call filter() like this?
I mean we only need QuerySet created by last filter() call, so we can overwrite previous QuerySet instead of creating a (deep)copy.
Maybe adding flag for filter() method would be possible: filter(overwrite=False).

P.S.

I attach profiler results for non-trivial query run on my Ubuntu box: CPU: Intel(R) Celeron(R) M CPU 530 @ 1.73GHz, Python 2.6.5 / MySQL 5.1@InnoDB
Results show that for complicated query, query building time can be far larger than executing SQL and filling model instances with results.
I tested two version of the query:

    # multi-filter
    result = Zero.objects.select_related('one__two__three') \
        .filter(field_x=None) \
        .filter(field_t='P') \
        .filter(one__id__gt=2) \
        .filter(one__field_j__isnull=True) \
        .filter(one__two__id__isnull=False) \
        .filter(one__two__field_y__year=2009) \
        .filter(one__two__three__field_j__isnull=True) \
        .filter(one__two__three__field_z__gt=2) \
        [:1]
    len(result)

and

    # filter-once
    result = Zero.objects.select_related('one__two__three') \
        .filter(field_x=None, 
        field_t='P', 
        one__id__gt=2, 
        one__field_j__isnull=True, 
        one__two__id__isnull=False, 
        one__two__field_y__year=2009, 
        one__two__three__field_j__isnull=True, 
        one__two__three__field_z__gt=2 
        )[:1]
    len(result)

Attachments (1)

QuerySet_execution_time.ods (18.4 KB ) - added by Piotr Czachur 14 years ago.: QuerySet profiling on Ubuntu box CPU: Intel(R) Celeron(R) M CPU 530 @ 1.73GHz, Python 2.6.5, MySQL 5.1@Innodb

Download all attachments as: .zip

Change History (4)

by Piotr Czachur, 14 years ago

Attachment:	QuerySet_execution_time.ods added

QuerySet profiling on Ubuntu box CPU: Intel(R) Celeron(R) M CPU 530 @ 1.73GHz, Python 2.6.5, MySQL 5.1@Innodb

comment:1 by Russell Keith-Magee, 14 years ago

Resolution:	→ invalid
Status:	new → closed

Yes, a deepcopy is required, because you can build on querysets in parallel paths. i.e.,

qs = Author.objects.all()
qs1 = qs.filter(name='foo')
qs2 = qs.filter(age=5)

If we don't deepcopy, qs1 and qs2 will be partially sharing internal memory structures, which will lead to interesting behavior.

comment:2 by Piotr Czachur, 14 years ago

Resolution:	invalid
Status:	closed → reopened

Maybe I haven't made myself entirely clear.

I'm not asking if deepcopying in general is necessary.
I'm suggesting that for specific - but not rare - use cases we can let users decide whether to skip deepcopying and overwrite existing queryset.

# In such case we are sure that "Author.objects.all()" return value won't be used again, to create other, more specific query - we don't need deepcopying here.
qs = Author.objects.all().filter(overwrite=True, name='foo')

This will give use quite performance boost, while remaining backward compatible (default: overwrite=False).

comment:3 by Russell Keith-Magee, 14 years ago

Resolution:	→ wontfix
Status:	reopened → closed

While we're at it, we could add a 'explode_randomly=True' option, defaulting to false. :-)

In other words: No.

If you are in a situation where optimizing the usage of a queryset if an issue, you should be using a raw queryset.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#14474 closed (wontfix)

Unnecessary deepcopying of QuerySet inside filter() method results in slower execution

Description

Attachments (1)

Change History (4)

by Piotr Czachur, 14 years ago

comment:1 by Russell Keith-Magee, 14 years ago

comment:2 by Piotr Czachur, 14 years ago

comment:3 by Russell Keith-Magee, 14 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us