#6527 closed Bug (fixed)
A bug in HttpResponse with iterators
Reported by: | Owned by: | Aymeric Augustin | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | http iterators |
Cc: | mikolo2@…, jim@…, robert.coup@…, real.human@…, Forest Bond, daemianmack@…, net147 | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
>>> from django.http import HttpResponse >>> response = HttpResponse(file('helloworld.txt','r')) >>> print response Content-Type: text/html; charset=utf-8 helloworld >>> print response Content-Type: text/html; charset=utf-8 >>>
Seems to me django.http.HttpResponse has to be modified so that when its init is given an iterator, it will go through the iterator only once.
I'd be happy to deliver a patch if this is something that should be fixed.
Attachments (4)
Change History (40)
comment:1 by , 17 years ago
Component: | Uncategorized → HTTP handling |
---|---|
Triage Stage: | Unreviewed → Design decision needed |
comment:2 by , 17 years ago
It came up because I downloaded django analytics from google code and added their middleware. What it does is to scan the reponse for the end body tag and insert the analytics tracker code there. I think it's very clean and it worked great.
Unrelated, I have a piece of code that acts as a proxy for the flickr service - I have to keep my key safe so my server has to act as a go between. My view ends with:
flickr_resp = urlopen(flickr_api_url, params) return HttpResponse(''.join(flickr_resp), mimetype='application/json')
and it worked well until I added the middleware.
It's one of the nasties bags I found - The first thing I did was to add "response = ..." and then print it so I can use the local server to debug. The response seemed fine, after a lot of searching and wandering about I decided to print it twice.
It's definitely not the result I expected. When you have a framework that acts like that, it loses my trust.
follow-up: 31 comment:3 by , 17 years ago
Triage Stage: | Design decision needed → Accepted |
---|
The HttpResponse class should turn the iterator into a list and store that result. Might as well do it in the __init__
method.
Response classes cannot work reliably with iterators that aren't read in their entirety at the moment because it's too easy (and common) for middleware to access response.content, which would chew up an iterator. You don't have to look back very far in the archives to see how bad things get when we tried to make things truly iterator-aware at that level. It just doesn't integrate well with middleware.
comment:4 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Taken during PyCon 2008 sprint.
comment:5 by , 17 years ago
Has patch: | set |
---|
Fixed in the attached patch file. I added regression tests to cover this case.
Note, svn thinks the entire AUTHORS file changed due to bug #6545, which refers to a line-ending problem with pool text files.
comment:6 by , 17 years ago
If you're going to read the contents from the iterator, there's no need to create another attribute on the model. Just put the contents into self._contents
. You don't need to store the iterator.
comment:7 by , 17 years ago
Roger. There is no self._contents
, so I'm guessing you meant to write self._container
.
The entire class got simpler, as a result. Excellent!
The patch is 6527 V2.patch
.
The guideline, "If an HttpResponse has been initialized with an iterator as its content, you can’t use the HttpResponse instance as a file-like object. Doing so will raise Exception," is no longer required, so its change is now in the patch. Svn thinks every line changed due to bug #6545.
follow-up: 10 comment:9 by , 17 years ago
nice job stugots! my patch 6527-2.diff is not as good. Thanks for contributing.
comment:10 by , 17 years ago
Replying to daonb <bennydaon@gmail.com>:
nice job stugots! my patch 6527-2.diff is not as good. Thanks for contributing.
Thank you for your kind words. I wish I could do more, but I'm just getting up to speed on the pool. It's thrilling to be able to contribute in a small way to Django's success!
comment:11 by , 16 years ago
Cc: | added |
---|
comment:12 by , 16 years ago
Cc: | added |
---|
comment:13 by , 16 years ago
Cc: | added |
---|
comment:14 by , 15 years ago
comment:15 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:16 by , 15 years ago
@stugots Hopefully we can get this patch into trunk at the end of SoC2009. I changed the ownership so I don't forget to address it when I am reviewing my changes later on.
comment:17 by , 15 years ago
Triage Stage: | Accepted → Fixed on a branch |
---|
Fixed on branches/soc2009/http-wsgi-improvements.
comment:18 by , 15 years ago
So if/when this hits trunk, we will lose the benefit of not having the entire iterator (potentially large and time consuming) saved in memory, which could also potentially cause browser timeouts? If HttpResponse will have no real support for iterators, why not simply make HttpResponse require a string argument? Users wanting to pass an iterator can simply consume it then.
comment:19 by , 15 years ago
Triage Stage: | Fixed on a branch → Design decision needed |
---|
The suggested fix, and the changes committed to soc2009/http-wsgi-improvements cannot possibly be committed to trunk. This severely breaks previously documented behaviour because, as mrmachine points out, you will no longer be able to iteratively generate large responses, and in particular, will not be able to iteratively deliver chunks of content to the client.
This will break any page that takes a long time to fully generate, or about 10% of one of my sites.
I think that any change will need to consider the different types of middleware that operate on a response's contents.
Some types of middleware will only examine the generated response, eg CacheMiddleware will not modify the response, only store a version in the cache when complete.
Others will want to replace the content and modify headers, eg GzipMiddleware.
The current 'fix' only satisfies the requirements of the modifying middleware.
I fixed the limitation of the cache middleware to cope with iterator based responses with code like so:
def buffer_and_cache_response(response, cache_key, timeout): from copy import copy def worker(rsp): from cStringIO import StringIO buf = StringIO() # we need to copy the response before we iterate through it # if we copy it after, then the response will have a generator object # on self._iterator that will not be picklable buffered_response = copy(rsp) for chunk in rsp: buf.write(chunk) yield chunk buffered_response.content = buf.getvalue() buf.close() rsp.close() cache.set(cache_key, buffered_response, timeout) new_resp = copy(response) new_resp._container = worker(response) new_resp._is_string = False return new_resp
Our cache middleware (virtually copy/paste of UpdateCacheMiddleware) determines if this is an iterator derived response by checking response._is_string (hacky, I know), and returns the rv of this function. This is just an example of a more intelligent, correct way to fix this issue.
comment:20 by , 15 years ago
Cc: | added |
---|
comment:21 by , 14 years ago
Cc: | added |
---|
comment:22 by , 14 years ago
I ended up using the exact same "response._is_string" check in my patch on #10627.
If the documentation for "Passing Iterators" also mentioned "generators", it would cover one side of the behavior issue.
If the documentation on writing middleware warned of the potential of responses being iterators/generators, and thus to no indirectly consume them, along with mentioning the _is_string [or equivalent 'correct'] test, it would at least cover that base.
comment:23 by , 14 years ago
I worked on this three years ago. And it's still not resolved. Amazing.
comment:24 by , 14 years ago
Type: | → Bug |
---|
comment:25 by , 14 years ago
Severity: | → Normal |
---|
comment:26 by , 13 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
UI/UX: | unset |
comment:27 by , 13 years ago
Alex and I looked at this at the sprint and thought there had been extensive discussion of iterable responses and middleware during the TemplateResponse discussions, but weren't sure what the conclusions of that discussion were. It doesn't seem that the relevant HttpResponse code has changed.
comment:28 by , 13 years ago
Triage Stage: | Design decision needed → Accepted |
---|
This shouldn't be DDN -- it's still a bug -- but it's far from clear how to solve this problem. We've got to figure out a consistant way of handling lazy vs. non-lazy responses, and we haven't yet. That's a bigger discussion and probably needs a "master ticket" or an extended discussion on django-dev.
comment:29 by , 13 years ago
Patch needs improvement: | set |
---|
comment:30 by , 12 years ago
Owner: | changed from | to
---|
comment:31 by , 12 years ago
Replying to mtredinnick:
The HttpResponse class should turn the iterator into a list and store that result.
Yes, it should.
Might as well do it in the
__init__
method.
Today, it seems that the consensus it to keep the streaming behavior in the (rare) cases where it works. This means turning the iterator into a list only when it's going to be iterated and the content loaded in memory anyway — typically when middleware reads response.content
.
Response classes cannot work reliably with iterators that aren't read in their entirety at the moment because it's too easy (and common) for middleware to access response.content, which would chew up an iterator. You don't have to look back very far in the archives to see how bad things get when we tried to make things truly iterator-aware at that level. It just doesn't integrate well with middleware.
A separate StreamingHttpResponse
was just introduced for this purpose in #7581. In fact, the last pull request also included the changes required to fix this ticket. I didn't commit them because they weren't strictly necessary for #7581. I've extracted them into a separate patch, which I'll be attaching shortly.
This is a logical follow-up for #7581 and I'll commit it quickly if no one objects.
by , 12 years ago
Attachment: | 6527-v3.patch added |
---|
follow-up: 33 comment:32 by , 12 years ago
I think that the desire to keep the partial streaming behaviour as-is for HttpResponse
was primarily to minimise any disruption or edge-case backwards incompatibility when introducing explicit support for streaming responses. I think that eventually, the goal here should be a more explicit and consistent HttpResponse
class that makes user intent clear. Did they pass an iterator as content because they want to stream the response, or as a convenience?
I would like to keep the existing behaviour for now and raise a PendingDeprecationWarning
when an iterator is consumed by accessing .content
, that directs people to use the new streaming response class. When the deprecation cycle is finished, HttpResponse
should consume iterators on assignment. This should give people enough time to switch over to the new streaming response class in those cases when they actually want a streaming response.
An HttpResponse
class that always stores its content in a list internally can be pickled (e.g. by cache middleware and possibly other 3rd party code) without first having to access the content, and will remove any ambiguity in situations where a response currently might or might not stream.
comment:33 by , 12 years ago
Replying to mrmachine:
I think that the desire to keep the partial streaming behaviour as-is for
HttpResponse
was primarily to minimise any disruption or edge-case backwards incompatibility when introducing explicit support for streaming responses. I think that eventually, the goal here should be a more explicit and consistentHttpResponse
class that makes user intent clear. Did they pass an iterator as content because they want to stream the response, or as a convenience?
Yes, I understand that.
I would like to keep the existing behaviour for now and raise a
PendingDeprecationWarning
when an iterator is consumed by accessing.content
, that directs people to use the new streaming response class. When the deprecation cycle is finished,HttpResponse
should consume iterators on assignment. This should give people enough time to switch over to the new streaming response class in those cases when they actually want a streaming response.
Actually we would have to raise the warning when the iterator is *not* consumed by accessing .content
, but rather by the WSGI server iterating the response.
An
HttpResponse
class that always stores its content in a list internally can be pickled (e.g. by cache middleware and possibly other 3rd party code) without first having to access the content, and will remove any ambiguity in situations where a response currently might or might not stream.
Did you mean "stores its content in a string"?
comment:34 by , 12 years ago
No. I meant that when ._container
is a list the response can be pickled. When it is an iterator/generator, it cannot. You are right about raising the warning when an HttpResponse
is implicitly streamed, rather than when a user consumes the iterator content (accidentally or not). That makes more sense.
comment:35 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I'm not sure that this is an actual bug. One of the benefits in passing an iterator is so that you don't get the whole response saved in memory, isn't it?