#21231 closed New feature (fixed)
Limiting the number of variables and files that a POST request can contain
Reported by: | Owned by: | ||
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | andre.cruz@…, Florian Apolloner | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The Problem
When deploying a django application one should be able to asses what
will be *the absolute maximum* of memory consumed by each serving
process/thread. Failing to make this assessment correctly may leave the
application open to denial-of-service attacks. For example, when
deploying with apache, an excessive number of threads should be
available in order protect yourself from slowloris-type attacks. Also,
in order to limit the maximum memory and CPU usage under attack
"LimitRequestBody" should be set to a reasonably small value (1MB for
example). So, here are the naive math: "100 parallel requests" X "1MB
per request" = "100MB maximum memory consumed"
The problem is that the naive math is wrong. A single 1MB POST request
may require *much* more that 1MB to process. 1MB of
"application/x-www-form-urlencoded" type may contain 200,000 short
name-value pairs, which when fetched into the interpreter consume tens of
megabytes (depending on the platform, probably 10-50MB). So the
correct math would be: "100 parallel requests" X "50MB per request" =
"5000MB maximum memory consumed". Probably too much for many systems.
As far as I know, django does not offer an easy way to protect against
such edge cases (lots of simultaneous malicious POST request that consume
unexpected amounts of memory).
The proposal
Introduce two new settings, limiting the maximum number of entities in
a POST request:
FORM_DATA_URLENCODED_MAX_VARS: the maximum number of name-value
pairs in the case of "application/x-www-form-urlencoded". (no limit by
default)
FORM_DATA_MULTIPART_MAX_PARTS: the maximum number of parts in the
case of "multipart/form-data". (no limit by default)
In case those limits are exceeded, the request must be discarded
without being fetched/parsed into memory. (In our example, it would
consume only 1MB instead of 50.)
Change History (52)
comment:1 by , 11 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:2 by , 11 years ago
Resolution: | wontfix |
---|---|
Status: | closed → new |
Obviously, I have not been sufficiently clear in explaining the problem.
I agree that as you say "It isn't unusual to allocate 100MB or 200MB of RAM per process in a Django application server", but it is not usual to allocate 50MB of RAM *for processing a single request*. For example, you may have 1 process (200MB) and 100 threads (say additional 100MB combined), and that's OK, but when you get a blast of requests, continuing malicious POST-data, each and every thread will have to create a huge django request-object (100 X 50 = 5000MB combined).
Memory is a complex problem, and it might very well be the case that I did not get it right. The last reply, though, doesn't appear to be predicated on a very thorough understanding of the proposal.
comment:4 by , 11 years ago
I'm also not convinced this is something that Django needs to provide. Your best bet to get this accepted is to start a mailing list thread as suggested above.
comment:5 by , 11 years ago
Well, I do not see how a good a and configurable validation of the HTTP input is not something that Django needs to provide. The proposal also will help with the PYTHONHASHSEED-problem (maliciously crafted POST that results in O(n*n) dict-behaviour).
In any case I will not spend more energy to persuade anyone to the merits of my proposal (Honestly I do not care that much about Django). I will rather patch django when I deploy a critical web-app.
comment:6 by , 11 years ago
If you have a patch which you believe works and solves this problem, then we will gladly consider it for inclusion upstream. That of course assumes that we can demonstrate that the changes work correctly and don't introduce any other significant issues. If these do provide a suitable security benefit against DDOS style attacks as then it may be worth turning this on by default.
comment:7 by , 11 years ago
I do not have a patch, and probably will not have soon, if ever. I discovered this problem while I was analysing the possible weaknesses of a web-app of mine that *may in the future* be installed as a business critical web-app (http://sourceforge.net/projects/cmb/).
So I decided to report my findings.
comment:8 by , 11 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Ok, I'm going to close this for now, largely as I agree with Aymeric that this will likely be exceedingly difficult (if possible) to implement in python, and I personally don't see it as a major weakness as it can be managed fairly easily by nginx or a similar web server.
If you do at some point write a patch which manages this directly in Django, please feel free to reopen this issue with a concrete patch and we'll see what happens.
comment:9 by , 11 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
I think I have a nicely working patch for this issue. Here it is:
https://github.com/epandurski/django/commit/9ae456e4280843e87a24e508131dfd9919b20db7
What do you think?
comment:10 by , 11 years ago
I have a feeling this doesn't belong into Django. My hunch is that this will be solved only partially, and there are ways to circumvent the protections Django has. In addition, it is likely that Apache or nginx have already solved this problem for us.
Not closing this as I don't understand HTTP protocol that well...
comment:11 by , 11 years ago
It is a hot potato, isn't it?
I think this problem can be solved using an apache module, but I as far as I know, none of the standard modules does the job. (I assume mod_security could be helpful, but I do not know much about mod_security).
I think the problem is much easier to solve in Python/Django than as as a web-server module, exactly because it is not trivial and needs to allow flexible settings. I agree that my patch is quite incomplete, and I can think of other things that should be done as well. But those things are not very hard to do in python (as you can see, the my patch is trivial). Another important factor is that web developer as a whole do not understand those problems, and do not even know that they exists. Unfortunately, this does not make problems to disappear. Django could lead the way and promote good security practices.
Think about this simple example scenario: I have to allow users to upload big files (say 20MB) in my Django app, so I set my "LimitRequestBody" to 20MB. Then suddenly all forms in my Django application should be able to "swallow" 20MB of url-endoded data without eating up my CPU and memory! Also, how many uploaded files a 20MB POST request can contain? 200,000 should be a good guess. So, how much memory and CPU will consume the execution of 200,000 upload-handlers?
I think Django core developers should make a design decision about how to solve those problems, and what the web developers should be advised to do about them.
follow-up: 19 comment:12 by , 11 years ago
I did some additional thinking about what settings should be available to the user so that he/she can have sufficient control over the validation of the POST-ed raw data. I believe that exactly 3 parameters are needed:
FORM_DATA_MAX_FILES_COUNT -- the maximum number of file-parts in the case of "multipart/form-data". (no limit by default)
FORM_DATA_MAX_FIELDS_COUNT -- the maximum number of name-value pairs in the case of "application/x-www-form-urlencoded", and the maximum number of field-parts in case of "multipart/form-data". (no limit by default)
FORM_DATA_MAX_FIELDS_TOTAL_LENGTH -- the maximum total length of the posted fields (both for "application/x-www-form-urlencoded" and "multipart/form-data"). This is the sum of the lengths of all field-names and field-values). Reading from the input socket should not continue after this length-limit has been exceeded. (no limit by default)
Implementing those should be relatively easy (not much more complex than my proposed patch).
More suitable names for these settings may be decided.
One more think, that I forgot to mention: I believe Django is the proper place to validate this things, because they are part of the HTTP request body (the payload). HTTP does not restrict the format of the request payload (as far as I know), and generally it depends of the MIME-type stated in the HTTP-header (application/x-www-form-urlencoded, multipart/form-data, etc.) You should not expect Apache to make such a deep-inspection of request's payload.
comment:13 by , 11 years ago
I agree that these settings would be nice, but I agree with Aymeric et al. that Django is perhaps not the best place for it. Since we are talking about a memory sizing issue it would be preferable to stop processing such requests as early as possible. Since Django is usually situated at or close to the end of the request processing chain, something before it handling this issue would be preferable. I totally understand that Apache usually is not expected to the deep-request-inspection but at the least the application server should be able to do that. Going down to the application framework or the application itself for something like that should be only a last resort.
As reference from other environments, Tomcat has a maxParameterCount (1). Gunicorn offers something similar but sadly only for headers (2). Regarding the number of file-parts, this is something mod_security already offers (3).
(1) http://tomcat.apache.org/tomcat-7.0-doc/config/http.html
(2) http://docs.gunicorn.org/en/latest/configure.html#limit-request-fields
(3) https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#wiki-SecUploadFileLimit
comment:14 by , 11 years ago
"I totally understand that Apache usually is not expected to the deep-request-inspection but at the least the application server should be able to do that."
The application server may serve lots of web applications, each one needing a different validation policy. Only the web application knows what it does with the input data, and therefore only the application (or the application framework) is in a position to correctly validate the input.
I agree that it is good idea to do input validation as early as possible, and be as isolated as possible from the rest. I do not see, however, where else this functionality can be inserted in the Django framework. I do not think it can be implemented as Django middleware. Suggestions?
comment:15 by , 11 years ago
I agree that this is something we should be considering. Take the following...
import time from django.http import QueryDict content = ('a=1&' * 2500000).rstrip('&') d = QueryDict(content, encoding='utf-8');
That represents parsing a single valid 10MB URLEncoded POST request. On my machine it consumes ~350MB and 100% CPU for ~30secs.
There's no built-in configurable protection in either Apache or Nginx against that type of request without having also installed optional plugins, and I'd consider max_client_body_size = 10m
to be a perfectly reasonable nginx setting.
I'll defer to the core dev team on this, but request parsing really does seem to me to be the domain of the application to deal with, rather than the server. For example, suppose you wanted the application to deal with JSON requests that can contain base64 encoded file fields, or some other content type that can contain a mix of file uploads and standard data - the server has no way of parsing and inspecting if the requests are malicious or not. As a further example - would we expect the server to be responsible for parsing and validating against malicious XML payloads and other content types that the application might parse?
FWIW, I believe that PHP includes built-in protection against this type of request, I've not managed to find information about other languages/frameworks.
The simplest option would prob. be similar to the FORM_DATA_MAX_FIELDS_TOTAL_LENGTH
already noted, but perhaps tweak the semantics slightly so that it's simply a setting specifying the maximum allowable request size *excluding* any file upload parts REQUEST_MAX_DATA_SIZE
or similar. That'd allow the developer to differentiate between the maximum allowable file upload size per-request, and the maximum allowable in-memory data structures per-request.
comment:16 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:18 by , 11 years ago
Cc: | added |
---|
comment:19 by , 11 years ago
Cc: | added |
---|
Replying to epandurski@…:
FORM_DATA_MAX_FIELDS_TOTAL_LENGTH -- the maximum total length of the posted fields (both for "application/x-www-form-urlencoded" and "multipart/form-data"). This is the sum of the lengths of all field-names and field-values). Reading from the input socket should not continue after this length-limit has been exceeded. (no limit by default)
Isn't that covered by Apache's LimitRequestBody already?
follow-up: 21 comment:20 by , 11 years ago
Isn't that covered by Apache's LimitRequestBody already?
I almost have forgotten what were my intentions with the "FORM_DATA_MAX_FIELDS_TOTAL_LENGTH" field :)
I believe they are almost the same, except for 2 things:
1) LimitRequestBody's context is "server config". "FORM_DATA_MAX_FIELDS_TOTAL_LENGTH" is specific to the web-application. There might be many different applications running on one server, doing different CPU/memory intensive stuff.
2) FORM_DATA_MAX_FIELDS_TOTAL_LENGTH is only about "application/x-www-form-urlencoded" and "multipart/form-data" MIME types, which I believe are the only MIME types which content django may decide to load into memory (is that correct?). Notice that somebody may want to allow POST-ing other MIME-types of big data, and yet have his/her django web-application protected from DoS attacks.
comment:21 by , 11 years ago
Replying to epandurski@…:
1) LimitRequestBody's context is "server config". "FORM_DATA_MAX_FIELDS_TOTAL_LENGTH" is specific to the web-application. There might be many different applications running on one server, doing different CPU/memory intensive stuff.
LimitRequestBody
can be set per file/location/vhost etc, so that wouldn't count
2) FORM_DATA_MAX_FIELDS_TOTAL_LENGTH is only about "application/x-www-form-urlencoded" and "multipart/form-data" MIME types, which I believe are the only MIME types which content django may decide to load into memory (is that correct?). Notice that somebody may want to allow POST-ing other MIME-types of big data, and yet have his/her django web-application protected from DoS attacks.
I might be wrong here: But if my app allows file uploads and I set this value to be equivalent to 1GB (I have big files ;)) I could still easily DOS the server; in my opinion it would make more sense if that would limit post data excluding file data.
comment:22 by , 11 years ago
LimitRequestBody can be set per file/location/vhost etc, so that wouldn't count
http://httpd.apache.org/docs/2.2/mod/core.html
I might be wrong here: But if my app allows file uploads and I set this value
to be equivalent to 1GB (I have big files ;)) I could still easily DOS the server;
in my opinion it would make more sense if that would limit post data excluding
file data.
Ah, now I remember what was the idea. You are right. The idea is that
FORM_DATA_MAX_FIELDS_TOTAL_LENGTH includes only the length of the form-fields
in the "multipart/form-data" POST. (A "multipart/form-data" POST may contain file
uploads and form fields).
comment:23 by , 11 years ago
I just like to add that the fact that Django does not deal well with large form fields or multipart parts is an unexpected behaviour. Looking at the file upload features one would think that Django could cope well in use cases where we want to receive large file uploads, and so have our web server configured to allow large request bodies. In the case of "application/x-www-form-urlencoded" requests, large bodies should not be expected and normally we can filter them at the web server. But "multipart/form-data" requests are processed by the application, since we want to allow large FILE parts but not large FIELD parts, and so it is there that safety measures need to be.
I know of several other sites that use Django, process large file uploads, which are susceptible to this issue. I've contacted the developers and they were indeed just as surprised as I was by this "feature" which I consider a security problem.
comment:24 by , 11 years ago
@epandurski: Can you change your patch to include the three settings you proposed and open a PR?
@edevil: I am having a hard time to figure out what you are trying to achieve with your comment. Since this was publicly reported we can't hide it, no matter if we consider it a security problem or not. The ticket is accepted, so the next step would be to provide a working patch and put it up for discussion.
comment:25 by , 11 years ago
@apollo13: I'm trying to get this to be considered a security problem so that it is backported to Django 1.5.
comment:26 by , 11 years ago
@evdevil: Even if we consider it security sensitive, if 1.7 comes out before inclusion of this patch, it won't get fixed in 1.5. Since 1.5 is EOL soon I'd strongly encourage you to upgrade or stick with LTS releases.
comment:27 by , 11 years ago
@apollo13: I may try to do this, but I am quite busy lately. Most probably, I will not have enough spare time to work on this soon. I expect the work to take 0.5-2 days, but I remember that the HTTP-parsing code is quite hairy, so it might get longer to do it right.
comment:28 by , 11 years ago
First pass pull request: https://github.com/django/django/pull/2403
Having a single setting limiting the maximum memory for data in a request would address this sufficiently from my POV.
comment:29 by , 11 years ago
I have a few questions regarding the pull request, which I left in the Github page. Should I post them here?
comment:30 by , 11 years ago
@edevil - No, may as well keep those comments just against the PR. (They're not really *design* discussion stuff, just implementation details, so...)
comment:32 by , 11 years ago
This is a serious problem, and there seems to be a patch available. Can someone from the dev team look at it please?
comment:33 by , 11 years ago
Needs tests: | set |
---|---|
Patch needs improvement: | set |
I haven't looked very closely, but there are still some unaddressed comments on the PR, and tests are missing.
comment:34 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:35 by , 10 years ago
Has patch: | set |
---|---|
Needs tests: | unset |
Patch needs improvement: | unset |
Updated PR with tests.
comment:37 by , 10 years ago
Has patch: | unset |
---|---|
Patch needs improvement: | unset |
I submitted a new PR that addresses some of the comments.
comment:38 by , 10 years ago
Has patch: | set |
---|
comment:39 by , 10 years ago
I've answered all the comments on the PR. Is there any additional changes you would like me to make?
Can this be merged?
comment:40 by , 10 years ago
We just had the feature freeze for 1.8 and are working on stabilizing master so we can issue an alpha release, so this likely won't be merged until 1.9 at the earliest.
comment:42 by , 10 years ago
I think this falls under "security hardening features." If you believe this is a security fix that should be backported to all support versions of Django, please use security@… rather than this ticket tracker, as described in our security policies. Thanks.
comment:43 by , 10 years ago
I do believe this is a security problem and I have used security@ to bring attention to this issue. In February and March of 2014 I exchanged emails with Russell Keith-Magee and Florian Apolloner who both asked me not to discuss this publicly. Russell even said that the core team was considering this as a security issue and that they were discussing how to solve it (I'm not on the security@ list).
@tomchristie above already hints that there is a problem with the current implementation so I'm not revealing anything new.
comment:49 by , 10 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:51 by , 9 years ago
Keywords: | 1.9 added |
---|
comment:52 by , 9 years ago
Keywords: | 1.9 removed |
---|
Deferring from 1.9 as the patch is still under active review.
comment:53 by , 9 years ago
Patch needs improvement: | set |
---|
comment:54 by , 9 years ago
Patch needs improvement: | unset |
---|
comment:55 by , 9 years ago
Updated PR. I've put out another call for reviews on django-developers.
comment:56 by , 9 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I'm sorry, but I fail to see how this could be achieved in Python. Memory use will grow until the GC kicks in, and that isn't under your control.
POST data is a rather small part of the memory footprint of a standard Django application. It isn't unusal to allocate 100MB or 200MB of RAM per process in a Django application server.
Your proposal doesn't appear to be predicated on a very good understanding of memory management in Python and memory use in Django applications; I'm not convinced it has enough value to warrant introducing two settings.
Besides, Django doesn't provide a production-grade application server, and doesn't plan to (see the docs for runserver). The common way to avoid excessive memory usage is to have the application server manage it, generally by killing processes that exceed a given threshold.
If you want to discuss possible ways to restrict memory use in Django, I suggest starting a discussion on the django-developers mailing-list. Thank you.