#6748 closed (fixed)
Ensure that debug error pages never evaluate QuerySets
Reported by: | Kenneth Arnold | Owned by: | Malcolm Tredinnick |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Keywords: | debug slow queryset repr memory | |
Cc: | kenneth.arnold@…, ehs@… | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The local variable display on debug 500 pages unintentionally evaluates QuerySets
. At best, the result is a delay [1]. At worst, the attempted evaluation may cause another exception, which will mask the root cause of the error.
Based on my suggestions, Mike Axiak of MIT ESP wrote the attached code (originally bundled with other code). I have attempted to factor out this part in a form that could conceivably apply to trunk. Patch attached.
[1] The load delay may be significantly greater than a normal page load, for two reasons:
- The debug page will include all local
QuerySets
, even if some of them were intermediary results in constructing the finalQuerySet
. This is particularly bad if the intermediate one does not yet have a row limit; I have occasionally found a debug page iterating over an entire huge table because of this.
- The model
__repr__
often returnsForeignKey
fields that may not be used on the page. The patch doesn't address this yet.
Attachments (4)
Change History (20)
by , 17 years ago
Attachment: | django-safereprqueryset-in-debug.patch added |
---|
comment:1 by , 17 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
by , 17 years ago
Attachment: | safe-queryset-debug-v2.patch added |
---|
comment:2 by , 17 years ago
I've added a smaller and actually more robust patch that takes advantage of the pprint
module and the recent qs-rf merge to make this a really simple change. It will now even avoid evaluating querysets in lists or other objects.
The most prominent case for this bug to pop up will probably be views that wrap generic views, since the querysets passed around there rarely have row limits.
(I still have not solved the foreign key problem. But at this point we should consider it a separate issue, I think.)
comment:3 by , 16 years ago
I got burned by this pretty bad, with the result being django devserver sucking up all available memory in my development environment where DEBUG was set to True.
I've recreated the problem as a very minimalistic django app, attached to the ticket as project.zip. This app contains a simple view:
def word(request, word): w = Word.objects.get(name=word) return HttpResponse(w)
First you'll need to load some demo data, and run the server:
% python manage.py syncdb % python manage runserver
Run top in another to watch memory usage. Don't worry, this example won't grow past 11MB.
First hit the view for a Model that exists:
Notice the very boring display and low memory footprint. Now try a URL like:
Notice your browser pause for a bit. Eventually you should get a debug view back, with the DoesNotExist
exception. Look at the 4th stack frame and observe all the Word objects that are present in the self
variable?
The reason for this is the debug view wants to display self
in that frame, and self
is a QuerySet(project.app.Word)
, which when you call __repr__
on it causes the query to run and to return the entire table into memory.
See line 301 of django/db/models/query.py
for where the exception is being thrown. Notice how self
is a local variable in that stack frame? The debug view tries to display that QuerySet
using __repr__
on line 508 in the inline template in django/views/debug.py
.
If the Model in question happens to have millions of wide rows (as ours did) you'll never see the Debug screen since our machine started swapping like mad trying to suck back all the results into memory.
I'm not sure what the fix is. Perhaps QuerySet.__repr__
should simply say it's a QuerySet
instead of executing the query and displaying the results?
by , 16 years ago
Attachment: | project.zip added |
---|
A simple demo application to demo QuerySet evaluation in debug view
comment:5 by , 16 years ago
I had to reduce the amount of initial data in the project.zip demo app, since it exceeded the attachment size. You should still be able to see the large amount of Word objects in the 4th stack frame in the debug view though.
comment:6 by , 16 years ago
Cc: | added |
---|
Did you try the patch? It should still apply, perhaps with fuzz; I've been using it locally ever since.
comment:7 by , 16 years ago
Keywords: | memory added |
---|
Yes, the patch worked beautifully on r8210 of the trunk. Would be nice to see this make it in.
comment:8 by , 16 years ago
Cc: | added |
---|
comment:9 by , 16 years ago
I just got bit by this; I'm wondering if the root issue isn't so much that debug pages evaluate QuerySet
s, but that QuerySet.__repr__
evaluates the entire thing. An object's __repr__
should be a value useful to a programmer, and a QuerySet
containing 14149 objects (to use my current example) is not a useful string while sitting at a Python console.
Attaching my first crack at this; a better implementation might take the result cache into account so you don't end up having the __repr__
of a QuerySet
slightly over the cutoff (here, CHUNK_SIZE
) being more expensive than it would be in the original implementation. I intentionally kept the output the same if the QuerySet
's length was below the cutoff, as I don't want to break countless doctests. ^_^
by , 16 years ago
Attachment: | queryset-repr.patch added |
---|
comment:10 by , 16 years ago
Cc: | added |
---|
comment:11 by , 16 years ago
I was also just bitten by this, we have a very large table, and when a query died in one spot, it sucked up ram and timed out the connection trying to build an oversized repr of the queryset - applying a patch along these lines fixed it entirely
comment:12 by , 16 years ago
Cc: | removed |
---|
comment:13 by , 16 years ago
The first patch here uses some GPL code as the genesis of the patch. It would be illegal for us to include that. Please don't submit any more code that started out life in the ESP project for inclusion in Django unless you are the sole author of that code and permitted to relicense it.
I also don't really like the idea of fundamentally changing the repr output. Querysets are basically lists, so a repr that looks like a list is reasonable. Plus it would break a few hundred thousand tests and tutorials around the planet, which isn't particularly nice.
The final patch suffers from the Butch Cassidy problem -- it's the fall that'll kill you. Printing out the results isn't the problem. It's loading them into memory in the first place (len(qs)
loads all results).
Once the test suite finishes running, I'll commit a patch that has the same effect as Tom's patch without loading all the results (it doesn't print the total number, since that requires more database queries and we can live without it).
comment:14 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:15 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(In [9202]) Fixed #6748 -- When printing the repr() of querysets, don't load or display
more than 20 objects.
This means that accidentally executing HugeStoryArchive.objects.all() at the
interactive prompt (or in the debug template) won't try to load all 4,233,010
stories into memory and print them out. That would previously cause resource
starvation and other "interesting" crashes.
If you really, really want the previous behaviour (e.g. in a doctest that
prints more than 20 items), display "list(qs)" instead of just "qs".
comment:16 by , 16 years ago
(In [9205]) [1.0.X] Fixed #6748 -- When printing the repr() of querysets, don't load or
display more than 20 objects.
This means that accidentally executing HugeStoryArchive.objects.all() at the
interactive prompt (or in the debug template) won't try to load all 4,233,010
stories into memory and print them out. That would previously cause resource
starvation and other "interesting" crashes.
If you really, really want the previous behaviour (e.g. in a doctest that
prints more than 20 items), display "list(qs)" instead of just "qs".
Backport of r9202 from trunk.
Revised, simplified version of the patch