Opened 17 months ago
Last modified 4 months ago
#34654 assigned Bug
Post-normalization performed on the Username field leading to the bypass of the whitespace stripping
Reported by: | Sim4n6 | Owned by: | George Kussumoto |
---|---|---|---|
Component: | contrib.auth | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Summary
In the Django codebase [L74](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/contrib/auth/forms.py#L74), when "strip" is enabled [L265](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/forms/fields.py#L265), the username should not contain any trailing or leading whitespace. However, after stripping those whitespace [L278..L279](https://github.com/django/django/blob/a52bdea5a27ba44b13eda93642231c65c581e083/django/forms/fields.py#L278..L279), normalization is performed. That means stripping whitespaces could be bypassed in the username using similar Unicode characters as "\u2800"
Take the following case:
result = unicodedata.normalize("NFKC", "\u2800\u2800sim4n6\u2800\u2800".strip())
In case, the stripped string "sim4n6" contains a leading and trailing whitespace, that would be deleted and work perfectly.
In case, the stripped string "\u2800\u2800sim4n6\u2800\u2800" contains similar Unicode characters to the whitespace which could be copied for instance from https://emptycharacter.com.
Impact
There is a security concern regarding including spaces in the username, plus that, the username is attacker-controlled input.
This means the whitespace stripping on the username could be bypassed and a remote user could use unintended white space.
@Sim4n6
Change History (13)
comment:1 by , 17 months ago
Easy pickings: | unset |
---|---|
Triage Stage: | Unreviewed → Accepted |
Version: | 4.1 → dev |
comment:2 by , 17 months ago
There wasn't a concrete fix discussed, though some voices said:
- normalize_username() already applies NFKC normalization and Python updates the Unicode Character Database in each release.
- It'd make sense to update the normalization to include these newly identified empty characters.
- But, likely Django shouldn't duplicate the UCD and it should rely on Python's.
Personally, I was wondering if we could signal, visually, the being/end of the username plus, perhaps, showing the username length? This could be a small but straightforward improvement (of course this only makes sense in the templates/forms that Django ships), and perhaps it could serve as a guide to users to apply to their own forms.
comment:4 by , 17 months ago
One way to fix this is normalizing first and then perform the stripping.
comment:6 by , 17 months ago
Indeed. Next is my understanding.
In the strip() function doc https://docs.python.org/3/library/stdtypes.html?highlight=removing%20whitespace#str.strip
If the intended character is omitted, the function removes the whitespace.
The problem is that the stripping of the leading and the trailing whitespace is performed before the Unicode normalization.
This case fits the attack "using Unicode encoding to bypass Validation logic" : https://capec.mitre.org/data/definitions/71.html
I have done some initial fuzzing (nothing fancy) the Unicode character BRAILLE PATTERN BLANK fits partially the criteria.
Visually looks like a space. But this one remains the same when normalized.
The idea is there may be a Unicode character that when stripped using the strip() function, it does not get removed. But after normalization with the NFKC form, it may come back as whitespace.
If a user is able to set a username with whitespace, it kind of looks a bit "fishy". Two usernames "sim4n6" and "sim4n6 " (with a trailing space).
And also some fuzzing leads to this code points too:
Code Point: 0xa8 Fuzzed: [ ̈sim4n6 ̈].
Code Point: 0xaf Fuzzed: [ ̄sim4n6 ̄].
Code Point: 0xb4 Fuzzed: [ ́sim4n6 ́].
Code Point: 0xb8 Fuzzed: [ ̧sim4n6 ̧].
Code Point: 0x2d8 Fuzzed: [ ̆sim4n6 ̆].
Code Point: 0x2d9 Fuzzed: [ ̇sim4n6 ̇].
Code Point: 0x2da Fuzzed: [ ̊sim4n6 ̊].
Code Point: 0x2db Fuzzed: [ ̨sim4n6 ̨].
Code Point: 0x2dc Fuzzed: [ ̃sim4n6 ̃].
Code Point: 0x2dd Fuzzed: [ ̋sim4n6 ̋].
Code Point: 0x37a Fuzzed: [ ͅsim4n6 ͅ].
Code Point: 0x384 Fuzzed: [ ́sim4n6 ́].
Code Point: 0x385 Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1fbd Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fbf Fuzzed: [ ̓sim4n6 ̓].
Code Point: 0x1fc0 Fuzzed: [ ͂sim4n6 ͂].
Code Point: 0x1fc1 Fuzzed: [ ̈͂sim4n6 ̈͂].
Code Point: 0x1fcd Fuzzed: [ ̓̀sim4n6 ̓̀].
Code Point: 0x1fce Fuzzed: [ ̓́sim4n6 ̓́].
Code Point: 0x1fcf Fuzzed: [ ̓͂sim4n6 ̓͂].
Code Point: 0x1fdd Fuzzed: [ ̔̀sim4n6 ̔̀].
Code Point: 0x1fde Fuzzed: [ ̔́sim4n6 ̔́].
Code Point: 0x1fdf Fuzzed: [ ̔͂sim4n6 ̔͂].
Code Point: 0x1fed Fuzzed: [ ̈̀sim4n6 ̈̀].
Code Point: 0x1fee Fuzzed: [ ̈́sim4n6 ̈́].
Code Point: 0x1ffd Fuzzed: [ ́sim4n6 ́].
Code Point: 0x1ffe Fuzzed: [ ̔sim4n6 ̔].
Code Point: 0x2017 Fuzzed: [ ̳sim4n6 ̳].
Code Point: 0x203e Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0x309b Fuzzed: [ ゙sim4n6 ゙].
Code Point: 0x309c Fuzzed: [ ゚sim4n6 ゚].
Code Point: 0xfc5e Fuzzed: [ ٌّsim4n6 ٌّ].
Code Point: 0xfc5f Fuzzed: [ ٍّsim4n6 ٍّ].
Code Point: 0xfc60 Fuzzed: [ َّsim4n6 َّ].
Code Point: 0xfc61 Fuzzed: [ ُّsim4n6 ُّ].
Code Point: 0xfc62 Fuzzed: [ ِّsim4n6 ِّ].
Code Point: 0xfc63 Fuzzed: [ ّٰsim4n6 ّٰ].
Code Point: 0xfe49 Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4a Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4b Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe4c Fuzzed: [ ̅sim4n6 ̅].
Code Point: 0xfe70 Fuzzed: [ ًsim4n6 ً].
Code Point: 0xfe72 Fuzzed: [ ٌsim4n6 ٌ].
Code Point: 0xfe74 Fuzzed: [ ٍsim4n6 ٍ].
Code Point: 0xfe76 Fuzzed: [ َsim4n6 َ].
Code Point: 0xfe78 Fuzzed: [ ُsim4n6 ُ].
Code Point: 0xfe7a Fuzzed: [ ِsim4n6 ِ].
Code Point: 0xfe7c Fuzzed: [ ّsim4n6 ّ].
Code Point: 0xfe7e Fuzzed: [ ْsim4n6 ْ].
Code Point: 0xffe3 Fuzzed: [ ̄sim4n6 ̄].
Those code points can be used as trailing and leading in usernames and result in whitespace, after Unicode normalization the way Django does it.
char = chr(code_point)
# Username input with potential trialing and leading whitespace
username = char + "sim4n6" + char
# this is the way it is done by Django
s = unicodedata.normalize("NFKC", username.strip())
if contains_whitespace(s):
# Print the Unicode character and its code point for demonstration purposes
print(f"Code Point: {hex(code_point)}\tResult: [{s}].")
With those code points, it is not only visually a space but a regular one.
Voila
PS: I wrote about this once: https://sim4n6.beehiiv.com/p/unicode-characters-bypass-security-checks
comment:8 by , 17 months ago
I suggest normalizing first and then performing the stripping as previously mentioned, that would reduce the odds and then handle the unusual cases.
PS: I still insist on the fact there is a security risk here.
comment:9 by , 17 months ago
I think the correct solution to this security error was to add a validation step that rejects user names with certain characters, I hope this is the correct solution and if something else needs to be changed, please let me know, thank you very much.
PR: https://github.com/django/django/pull/17014
comment:10 by , 17 months ago
The contains_unwanted_characters()
would fail more often than you expect.
import unicodedata def contains_unwanted_characters(username): """Check if the username contains characters that normalize to whitespace.""" normalized_username = unicodedata.normalize("NFKC", username) return username != normalized_username.strip() usernames = ["sim4n6", " ̈sim4n6 ̈", "℻", "Sim4n6","Sim4n6"] for username in usernames: print(contains_unwanted_characters(username))
Between I still believe there is a security concern here. Anyway, I have a security background and this issue is classified as non-related. So, that would be my last comment regarding this issue.
Regards,
comment:11 by , 4 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:12 by , 4 months ago
In case you change your mind regarding this issue , please keep me posted on its outcome via https://hackerone.com/sim4n6?type=user
Thanks
comment:13 by , 4 months ago
Here are some ideas to discuss:
- To the main point of bypassing white space removal: while this is true for the
UsernameField
, most forms using this field also have another validation in the model level, such asUnicodeUsernameValidator
andEmailValidator
. These validators run on post-normalized values, so symbols likeBRAILLE PATTERN BLANK
or white space (regardless of position) will raise errors.
- The
AuthenticationForm
might not invalidate such inputs, but to successfully exploit this, one should have bypassed the creation form first (I think). I'm not a security expert, but maybe there's an additional condition of using non-unique usernames.
- The case for the password reset is already addressed in https://www.djangoproject.com/weblog/2019/dec/18/security-releases/
- When using custom user models things are different, the application code should include the appropriate validators. Depending on the implementation, the normalization steps too.
- At first, I thought about adding
UnicodeUsernameValidator
to theUsernameField
. It was promising, all tests passed. Then I realized it not only breaks compatibility, but we shouldn't make this assumption on the username since the validator is particular about what it accepts, compromising the flexibility to setUSERNAME_FIELD
in custom models. For example,UnicodeUsernameValidator
is stricter thanEmailValidator
(at least at first glance).
- Another possibility was to add the validator in the form definition instead of the form field (since it's expected some customization when using custom models). Possible candidates were
UserCreationForm
andUserChangeForm
(custom-users-and-built-in-auth-forms doc) since they are more tied to the default user model. But this is already accomplished by having the validator in the model.
- As mentioned before in the comments, re-running
.strip()
after normalization or comparing string length doesn't seem to address the problem reported.
So far, I don't have a conclusion, but I lean lightly on the side that no code changes are required. I wanted to share my thoughts and hear others.
I don't have a security background so I might have overlooked/simplified things a bit. Please keep that in mind.
Accepting following the conversation with the security team. Though, do note that the issue is not considered a security concern but could lead to potential impersonation incidents.