#6256 closed (fixed)
Extraneous newlines in large amounts of POST data
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Keywords: | ||
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
I ran into a problem doing some stress testing of newforms-admin. I'd bring up a change page for a model that has approx 700 other models edited inline, make no changes to the data (which was all valid), attempt to "save and continue editing", and get seemingly random validation failures for some of the inline-edited fields. I'd get messages about 1-character fields having data that was too long (3 chars), even though all I could see in the field was 1 character, or a message that my foreign key selection wasn't valid, even though it was, etc. Occasionally the save would work, but 4 out of 5 times it would fail with one or more validation errors. Eventually I tracked the problem down to parse_file_upload in django/http/__init__.py (http://code.djangoproject.com/browser/django/trunk/django/http/__init__.py#L100). After adding some code to dump the contents of raw_message to a file, plus the sequence of items appended to POST on line 130, I found that while the post data in raw_message looked fine, sometimes the values returned by submessage.get_payload() had extraneous newlines at the beginning of the value.
Since Django is using Python's email routines here, I searched for relevant Python bugs and found this:
http://bugs.python.org/issue1555570
(Someone else using Django already ran into this! But I can't find anything open in Django's trac that matches....)
The Python bug is still open. The last comment there seems to imply that while the provided patch is probably fine, the issue is really a doc problem and that the doc should make it clear that the Python code is expecting a file that has been opened in universal newline mode, where CRLF will be translated to just LF. For the routine that Django is using to construct the email message, I take that to mean that the string passed to email.message_from_string should have just LFs, not CRLFs as it does now. I tried replacing:
msg = email.message_from_string(raw_message)
with:
msg = email.message_from_string(raw_message.replace('\r\n', '\n')
and it does resolve the problem for my case: six successive saves of this model worked fine, whereas without the fix maybe only one would have worked.
Even though I ran into the problem using newforms-admin, the code here is identical on trunk so that is where I think it should be fixed.
Attachments (2)
Change History (14)
comment:1 by , 17 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:2 by , 17 years ago
Triage Stage: | Design decision needed → Accepted |
---|
Remind me to flog Jdunck with some soggy noodles next time I see him. We have to fix this in Django so that it works with existing Python code, not just in future versions of Python. He should have opened a Django bug.
comment:3 by , 17 years ago
Has patch: | set |
---|
I've attached the patch described in the initial report. It has the benefit of working, but I rather dislike it from a performance point of view, since it introduces a copy of the whole post data string. But I don't see any way to avoid that, since I don't see any way of getting the file object constructed from the request socket to be opened in "universal newline mode" as described here: http://docs.python.org/lib/built-in-funcs.html#built-in-funcs ...I don't even know if that mode is available for file objects backed by sockets, and if it is I don't see that the python SocketServer (which I think is where the file object is created) provides any way to set the mode, nor do I know if using that mode would have adverse effects on other parts of the code. So, here's the simple one liner fix that might at least serve as a starting point. Perhaps it would be better to just do the replace on the post_data and avoid introducing the '\r's when creating the raw_message in the first place; someone more familiar with this code could better make that decision.
comment:4 by , 17 years ago
Cc: | added |
---|
comment:5 by , 17 years ago
Patch needs improvement: | set |
---|
Attention! The current patch modifies uploaded binary data. It needs improvement.
I guess the best solution is to not use email.message_from_string(raw_message), but maintain
a patched copy of this method in django.
comment:6 by , 17 years ago
Sigh. Indeed, I completely overlooked the case of, say, uploading a (binary) file. Any CRLF sequences in that data should not be tampered with but will be corrupted by the patch. This would mean, then, that the comment in the Python bug that states that this issue is more of a doc problem and that the caller of the parser routines should ensure the 'file' has been opened in universal newline mode is incorrect, right? The 'file', in the case of a multipart mime message, may contain a mixture of text and binary data, and the only code with enough information to figure out which CRLFs are actually newline sequences is the code that is parsing based on boundary markers. Have I got that right? If so, someone should probably mention that in the Python issue tracker.
comment:7 by , 17 years ago
OK, I've uploaded a new patch. This one builds its own Parser class with a patched parser function based on the fix attached to the Python issue. It only does this if FeedParser can be imported, since that looks to be where the error was introduced. If FeedParser doesn't exist (true for Python2.3, which Django still supports, right?), then Python's existing Parser class is used.
I have no idea if a patch like this that overrides functions in a Python-provided class is suitable for inclusion in Django, but perhaps it can be useful to others hitting this problem who don't want to or cannot easily patch their Python installation.
comment:8 by , 17 years ago
Note that the (latest) patch for #2070 completely removes Django's use of the Python email parsing routines. So, assuming the parser provided by #2070 is bug-free, this problem goes away when #2070 goes in. FWIW from a quick look through the parsing code in #2070 and a few tests of that code in situations where I can easily hit this bug with current trunk code, the #2070 patch does indeed fix this problem.
comment:9 by , 17 years ago
I somehow don't like the long try: ...except block in this patch (6256-2.diff). I think the own
class should be used always.
comment:10 by , 17 years ago
Perhaps there is a better way to structure the fix, but using the ParserClass provided in the patch is only possible for Python 2.4 and up. FeedParser doesn't exist in 2.3, so the ParserClass in the patch can't be used when Django is running on Python 2.3. That's what the try/except is allowing for; when running on 2.3 the older Python Parser class is used, which doesn't have a problem (unverified conjecture -- I don't have Python 2.3 to test on) since the identified Python bug is in the wrapper around the Python 2.4+ FeedParser class.
Not sure it is worth tweaking the patch, since I think the long-term fix is the #2070 code, which completely replaces the use of Python's routines for email parsing.
comment:11 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:12 by , 16 years ago
Cc: | removed |
---|
A nice bit of detective work!