Opened 17 years ago

Closed 16 years ago

Last modified 16 years ago

#6748 closed (fixed)

Ensure that debug error pages never evaluate QuerySets

Reported by: Kenneth Arnold Owned by: Malcolm Tredinnick
Component: Core (Other) Version: dev
Severity: Keywords: debug slow queryset repr memory
Cc: kenneth.arnold@…, ehs@… Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The local variable display on debug 500 pages unintentionally evaluates QuerySets. At best, the result is a delay [1]. At worst, the attempted evaluation may cause another exception, which will mask the root cause of the error.

Based on my suggestions, Mike Axiak of MIT ESP wrote the attached code (originally bundled with other code). I have attempted to factor out this part in a form that could conceivably apply to trunk. Patch attached.

[1] The load delay may be significantly greater than a normal page load, for two reasons:

  1. The debug page will include all local QuerySets, even if some of them were intermediary results in constructing the final QuerySet. This is particularly bad if the intermediate one does not yet have a row limit; I have occasionally found a debug page iterating over an entire huge table because of this.
  1. The model __repr__ often returns ForeignKey fields that may not be used on the page. The patch doesn't address this yet.

Attachments (4)

django-safereprqueryset-in-debug.patch (3.5 KB ) - added by Kenneth Arnold 17 years ago.
safe-queryset-debug-v2.patch (2.0 KB ) - added by Kenneth Arnold 17 years ago.
Revised, simplified version of the patch
project.zip (13.8 KB ) - added by edsu 16 years ago.
A simple demo application to demo QuerySet evaluation in debug view
queryset-repr.patch (759 bytes ) - added by korpios 16 years ago.

Download all attachments as: .zip

Change History (20)

by Kenneth Arnold, 17 years ago

comment:1 by Jacob, 17 years ago

Triage Stage: UnreviewedDesign decision needed

by Kenneth Arnold, 17 years ago

Revised, simplified version of the patch

comment:2 by Kenneth Arnold, 17 years ago

I've added a smaller and actually more robust patch that takes advantage of the pprint module and the recent qs-rf merge to make this a really simple change. It will now even avoid evaluating querysets in lists or other objects.

The most prominent case for this bug to pop up will probably be views that wrap generic views, since the querysets passed around there rarely have row limits.

(I still have not solved the foreign key problem. But at this point we should consider it a separate issue, I think.)

comment:3 by edsu, 16 years ago

I got burned by this pretty bad, with the result being django devserver sucking up all available memory in my development environment where DEBUG was set to True.

I've recreated the problem as a very minimalistic django app, attached to the ticket as project.zip. This app contains a simple view:

def word(request, word):
    w = Word.objects.get(name=word)
    return HttpResponse(w)

First you'll need to load some demo data, and run the server:

% python manage.py syncdb
% python manage runserver

Run top in another to watch memory usage. Don't worry, this example won't grow past 11MB.

First hit the view for a Model that exists:

http://localhost:8080/word/time

Notice the very boring display and low memory footprint. Now try a URL like:

http://localhost:8080/word/thisworddoesnotexist

Notice your browser pause for a bit. Eventually you should get a debug view back, with the DoesNotExist exception. Look at the 4th stack frame and observe all the Word objects that are present in the self variable?

The reason for this is the debug view wants to display self in that frame, and self is a QuerySet(project.app.Word), which when you call __repr__ on it causes the query to run and to return the entire table into memory.

See line 301 of django/db/models/query.py for where the exception is being thrown. Notice how self is a local variable in that stack frame? The debug view tries to display that QuerySet using __repr__ on line 508 in the inline template in django/views/debug.py.

If the Model in question happens to have millions of wide rows (as ours did) you'll never see the Debug screen since our machine started swapping like mad trying to suck back all the results into memory.

I'm not sure what the fix is. Perhaps QuerySet.__repr__ should simply say it's a QuerySet instead of executing the query and displaying the results?

by edsu, 16 years ago

Attachment: project.zip added

A simple demo application to demo QuerySet evaluation in debug view

comment:4 by edsu, 16 years ago

Note, examples using port 8080 above should be 8000.

comment:5 by edsu, 16 years ago

I had to reduce the amount of initial data in the project.zip demo app, since it exceeded the attachment size. You should still be able to see the large amount of Word objects in the 4th stack frame in the debug view though.

comment:6 by Kenneth Arnold, 16 years ago

Cc: kenneth.arnold@… added

Did you try the patch? It should still apply, perhaps with fuzz; I've been using it locally ever since.

comment:7 by edsu, 16 years ago

Keywords: memory added

Yes, the patch worked beautifully on r8210 of the trunk. Would be nice to see this make it in.

comment:8 by edsu, 16 years ago

Cc: ehs@… added

comment:9 by korpios, 16 years ago

I just got bit by this; I'm wondering if the root issue isn't so much that debug pages evaluate QuerySets, but that QuerySet.__repr__ evaluates the entire thing. An object's __repr__ should be a value useful to a programmer, and a QuerySet containing 14149 objects (to use my current example) is not a useful string while sitting at a Python console.

Attaching my first crack at this; a better implementation might take the result cache into account so you don't end up having the __repr__ of a QuerySet slightly over the cutoff (here, CHUNK_SIZE) being more expensive than it would be in the original implementation. I intentionally kept the output the same if the QuerySet's length was below the cutoff, as I don't want to break countless doctests. ^_^

by korpios, 16 years ago

Attachment: queryset-repr.patch added

comment:10 by korpios, 16 years ago

Cc: korpios@… added

comment:11 by Collin Grady, 16 years ago

I was also just bitten by this, we have a very large table, and when a query died in one spot, it sucked up ram and timed out the connection trying to build an oversized repr of the queryset - applying a patch along these lines fixed it entirely

comment:12 by korpios, 16 years ago

Cc: korpios@… removed

comment:13 by Malcolm Tredinnick, 16 years ago

The first patch here uses some GPL code as the genesis of the patch. It would be illegal for us to include that. Please don't submit any more code that started out life in the ESP project for inclusion in Django unless you are the sole author of that code and permitted to relicense it.

I also don't really like the idea of fundamentally changing the repr output. Querysets are basically lists, so a repr that looks like a list is reasonable. Plus it would break a few hundred thousand tests and tutorials around the planet, which isn't particularly nice.

The final patch suffers from the Butch Cassidy problem -- it's the fall that'll kill you. Printing out the results isn't the problem. It's loading them into memory in the first place (len(qs) loads all results).

Once the test suite finishes running, I'll commit a patch that has the same effect as Tom's patch without loading all the results (it doesn't print the total number, since that requires more database queries and we can live without it).

comment:14 by Malcolm Tredinnick, 16 years ago

Owner: changed from nobody to Malcolm Tredinnick
Status: newassigned

comment:15 by Malcolm Tredinnick, 16 years ago

Resolution: fixed
Status: assignedclosed

(In [9202]) Fixed #6748 -- When printing the repr() of querysets, don't load or display
more than 20 objects.

This means that accidentally executing HugeStoryArchive.objects.all() at the
interactive prompt (or in the debug template) won't try to load all 4,233,010
stories into memory and print them out. That would previously cause resource
starvation and other "interesting" crashes.

If you really, really want the previous behaviour (e.g. in a doctest that
prints more than 20 items), display "list(qs)" instead of just "qs".

comment:16 by Malcolm Tredinnick, 16 years ago

(In [9205]) [1.0.X] Fixed #6748 -- When printing the repr() of querysets, don't load or
display more than 20 objects.

This means that accidentally executing HugeStoryArchive.objects.all() at the
interactive prompt (or in the debug template) won't try to load all 4,233,010
stories into memory and print them out. That would previously cause resource
starvation and other "interesting" crashes.

If you really, really want the previous behaviour (e.g. in a doctest that
prints more than 20 items), display "list(qs)" instead of just "qs".

Backport of r9202 from trunk.

Note: See TracTickets for help on using tickets.
Back to Top