Opened 12 years ago

Closed 11 years ago

Last modified 11 years ago

#20187 closed Bug (fixed)

Django 1.5 using a cached HttpResponse with WSGI has an empty body

Reported by: smbrooks1@… Owned by: Aymeric Augustin
Component: HTTP handling Version: 1.5
Severity: Normal Keywords: HttpResponse
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

With the change to HttpResponse made in Django 1.5, I'm finding that in my code, which caches a generated response, results in an empty body when that page is requested a second time. The first time the page is requested, it is not in the cache, and the page is generated normally and added to the cache. A subsequent request for the same page finds the response in the cache and that response is returned, but with a content length of zero.

The reason is that the HttpResponse in Django 1.5 does not reset the content iterator when the content is requested to be iterated over again (the next time the response content is required).

I note the comments made about the way an iterator should behave when requested to iterate again:
https://code.djangoproject.com/ticket/13222
and the code which was added to explicitly prevent a reiteration from resetting the iterator. However, that causes a problem when using cached responses.

The HttpResponse in my case was not created by passing an iterator to HttpResponse. It is just using a string.

The problem is that the __iter__ method of HttpResponse contains the line:

    if not hasattr(self, '_iterator'):
      self._iterator = iter(self._container)

This prevents the iterator being reset the next time it is required to iterate over the content.
_container still has the original content, but __iter__ does not reset the iterator as _iterator exists as an attribute since the first time that response was returned. The cached response is returning a used iterator, which returns no content.

I suspect this is a bug.
What about a work-around in the meantime?
When I retrieve the response from the cache, I could do:

response._iterator = iter(response._container)

or:

del response._iterator

This works, but makes my code dependent on the internals of the HttpResponse class, which isn't great.

My suggestion is to change the __iter__ method for HttpResponse such that it creates an instance of a separate iterator class. The __iter__ method should not return self. It would be OK for the __iter__ method of the separate iterator class to return self. This should avoid the problem reported in https://code.djangoproject.com/ticket/13222.

I accept that StreamingHttpResponse instances can only be iterated once, and are therefore not suitable for caching.

Kind regards,
Steve

Change History (11)

comment:2 by andrea_crotti, 12 years ago

Are you able maybe to create a test which show this bug?

comment:3 by andrea_crotti, 12 years ago

Something like this should work in your opinion then?

def test_http_response_rewinds_if_explicit(self):

ht = HttpResponse(content='Something')
self.assertEqual(list(ht), Something)
self.assertEqual(list(ht), Something)

I think a possible solution (if it makes sense) would be to check if the content passed in is an iterator or not, and in case it's not avoid consuming it, what do you think?

comment:4 by Aymeric Augustin, 12 years ago

Owner: changed from nobody to Aymeric Augustin
Status: newassigned

comment:5 by Aymeric Augustin, 12 years ago

Triage Stage: UnreviewedAccepted

comment:6 by anonymous, 11 years ago

Hi Andrea,

your test case should probably have:


   self.assertEqual(list(ht), ['Something'])
   self.assertEqual(list(ht), ['Something'])

The present HttpResponse implementation will result in the second list(ht) being an empty list [].
I like your suggestion of checking if the content passed is an iterator, and the member _base_content_is_iter can probably be used in this test.
In the class HttpResponse __iter__ method:
we could replace:

        if not hasattr(self, '_iterator'):
            self._iterator = iter(self._container)

with:

        if not hasattr(self, '_iterator') or not self._base_content_is_iter:
            self._iterator = iter(self._container)

The documentation on caching should give some mention to this, and warn that HttpResponse instances using iterators, or StreamingHttpResponse instances should not be cached as subsequent use of these instances will return empty bodies.

comment:7 by smbrooks1@…, 11 years ago

Hi Andrea,

that last comment was by me, the creator of the ticket.

Thanks,
Steve

comment:8 by Aymeric Augustin, 11 years ago

I've been looking at this again today. As explained in #13222 it's a bad practice to return self in an iterable's __iter__ method because multiple calls to iter(response) return the same iterator. I fixed most symptoms by preventing Django from rewinding that iterator but it's still a sub-optimal behavior when iteration is repeatable.

Until the deprecation path for old-style streaming responses completes, iteration isn't repeatable in general. Besides, this code is very fragile and sensitive overall. So I'm reluctant to make changes right now.

Once the deprecation path has completed, we'll be able to simplify drastically the implementation of HttpResponse. The content will always be stored as a string internally, say, in self._content. As a consequence __iter__ could be as simple as:

    def __iter__(self):
        return iter([self._content])

This returns a new iterator every time it's called, as suggested in #13222.

This will break some of the tests added when fixing #13222. Those tests should probably be rewritten to make more precise assertion about the problems. Essentially, they should test that calling iter(response) again doesn't rewind a previously obtained iterator, not that it returns an empty iterator.

This goes against my previous decision, but the context will have changed :)

Some documentation may be useful; I'm not sure. We'll be in a better position to decide that once we have a patch.

comment:9 by Aymeric Augustin, 11 years ago

stable/1.6.x was forked and the deprecated code was removed since my last comment. I will now take care of this.

comment:10 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In a480f8320a5b36501bfd0e8cd70b8dc04adf2d08:

Simplified iteration in HTTP response objects.

Fixed #20187 -- Allowed repeated iteration of HttpResponse.

All this became possible when support for old-style streaming responses was
finally removed.

comment:11 by Aymeric Augustin, 11 years ago

(Just to be clear, it isn't possible to backport this to 1.6.)

Note: See TracTickets for help on using tickets.
Back to Top