Opened 15 years ago
Closed 12 years ago
#13222 closed Bug (fixed)
unexpected HttpResponse iteration behavior
Reported by: | teepark | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | HTTP handling | Version: | 1.3-rc |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
For all the builtin iterables, and most from libraries, the __iter__
method returns something representing one and only one iteration. django.http.HttpResponse.__iter__
sets an iterator attribute on itself, then returns self (it also defines a next
method so it is its own iterator). This is bad behavior for an iterable and causes problems when trying to do anything but basic iteration.
A use case (and where I encountered this issue) is in a WSGI server that pulls off the first item in the iterable app return value, prefixes anything that was sent via write
calls, and then rebuilds the iterable with itertools.chain
:
if prefix: body_iterator = iter(body) try: first_chunk = body_iterator.next() except StopIteration: first_chunk = '' body = itertools.chain((prefix + first_chunk,), body_iterator)
Using django.core.handlers.wsgi.WSGIHandler
(which just returns the HttpResponse, since it's iterable) with this server, the first chunk of the HttpResponse gets doubled because itertools.chain
calls iter
on its arguments, and that resets the HttpResponse's iterator.
To compare with some builtins and other libraries:
>>> it = iter(["foobar"]) >>> list(it), list(it) (['foobar'], []) >>> it = iter("foobar" for i in [1]) >>> list(it), list(it) (['foobar'], []) >>> import werkzeug >>> it = iter(werkzeug.Response("foobar").response) >>> list(it), list(it) (['foobar'], []) >>> import webob >>> it = iter(webob.Response("foobar").app_iter) >>> list(it), list(it) (['foobar'], []) >>> import django.http >>> it = iter(django.http.HttpResponse("foobar")) >>> list(it), list(it) (['foobar'], ['foobar'])
Attachments (3)
Change History (33)
comment:1 by , 15 years ago
Has patch: | set |
---|
comment:2 by , 15 years ago
Version: | 1.1 |
---|
comment:3 by , 15 years ago
Cc: | added |
---|
comment:4 by , 15 years ago
Cc: | removed |
---|
follow-up: 6 comment:5 by , 15 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
comment:6 by , 15 years ago
Resolution: | duplicate |
---|---|
Status: | closed → reopened |
Replying to ubernostrum:
Duplicate of #7581, which is already open for dealing with issues related to iterators/generators in response content.
In the future please search existing tickets before opening new ones.
I did search through and I saw that ticket. #7581 deals with HttpResponse.content, which accesses _container
directly, so is orthogonal to iteration behavior. My patch won't fix #7581, and fixing #7581 won't help this issue.
by , 15 years ago
Attachment: | httpresponse.diff added |
---|
just makes HttpResponse.__iter__
a generator, and adds a regression test
comment:7 by , 15 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:8 by , 14 years ago
Type: | → Bug |
---|
comment:9 by , 14 years ago
Severity: | → Normal |
---|
comment:10 by , 14 years ago
Patch needs improvement: | set |
---|
Could you rewrite the tests using unittests since this is now Django's preferred way?
by , 14 years ago
Attachment: | httpresponse2.diff added |
---|
same patch but tests moved from doctest to unittest
comment:11 by , 14 years ago
Patch needs improvement: | unset |
---|
follow-up: 13 comment:12 by , 14 years ago
Triage Stage: | Accepted → Design decision needed |
---|---|
Version: | → 1.3-rc1 |
I'm not sure we *want* to fix this. Almost all file-like objects have this behavior. Needs more thought IMO.
comment:13 by , 14 years ago
Replying to Alex:
I'm not sure we *want* to fix this. Almost all file-like objects have this behavior. Needs more thought IMO.
I'd be happy if HttpResponse emulated file objects' behavior, but that's not quite the case either (the code snippet from the OP works fine with a file).
File objects have a single cursor location that all iterators wind up sharing, but they don't set it back to the beginning of the file every time you call iter() on it, which in effect is what HttpResponse does currently.
IMO the design decision should be whether we fix it to be like list (the current patch) or we fix it to be like files.
comment:14 by , 14 years ago
In some respect this is an issue which could perhaps be brought up on Python WEB-SIG because what you are talking about is how the iterable returned by a WSGI application should behave. I know for anything I have seen and implemented, use of an iterator causes data to be consumed and it wouldn't be rewound to the original beginning of the data if a new iterator were created as Django seems to be doing by what is described here. So, perhaps another odd corner case for WSGI specification which should be clarified.
comment:17 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
Triage Stage: | Design decision needed → Accepted |
comment:18 by , 12 years ago
I'm going to follow Graham's advice: remove the rewinding behavior, ie. behave like files and generators. WSGI doesn't expect to iterate the response more than once.
Another advantage is that this behavior can be implemented consistently in HttpResponse
and StreamingHttpResponse
by , 12 years ago
Attachment: | 13222.patch added |
---|
comment:20 by , 12 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:21 by , 12 years ago
LGTM, with the assumption that this works for every python version we care about:
>>> l = [1,2,3] >>> i = iter(l) >>> l.append(4) >>> list(i) [1, 2, 3, 4]
comment:23 by , 12 years ago
@content.setter
actually supports any iterable type, and dictionaries and set iterators (and perhaps others) will raise if their original object is modified since the iterator's creation.
>>> s = set([1,2,3]) >>> i = iter(s) >>> s.add(4) >>> list(i) Traceback (most recent call last): File "<stdin>", line 1, in <module> RuntimeError: Set changed size during iteration
There's not really a good reason to use either of those types as HttpResponse.content
, but _content
becomes the provided object so long as it hasattr("__iter__")
, so this implementation does also create new limitations on what is an acceptable content
.
It's my general feeling that separate iterators for separate __iter__
invocations is a more robust approach because of this kind of issue, but this patch is far and away better than the current behavior.
comment:24 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:25 by , 12 years ago
I committed before seeing your last comment.
It doesn't make much sense to instantiate a response with a dictionary or a set: these aren't ordered types.
comment:26 by , 12 years ago
>>> import collections >>> d = collections.deque([1,2,3]) >>> i = iter(d) >>> d.append(4) >>> list(i) Traceback (most recent call last): File "<stdin>", line 1, in <module> RuntimeError: deque mutated during iteration
it's not about ordered vs unordered, it's about whether pre-existing iterators tolerate mutation.
comment:27 by , 12 years ago
Well, if there's a valid use case for that, let's open a new ticket.
Usually one builds the entire content first, then iterates over it to render the response.
comment:28 by , 12 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
This commit broke pickling of HttpResponse
objects :(
comment:30 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Duplicate of #7581, which is already open for dealing with issues related to iterators/generators in response content.
In the future please search existing tickets before opening new ones.