Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#32778 closed Cleanup/optimization (fixed)

Avoided unnecessary recompilation of token regex in _sanitize_token().

Reported by: Abhyudai Owned by: Abhyudai
Component: CSRF Version: 3.2
Severity: Normal Keywords: middleware, csrf
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

I was looking into the source code of the middleware for some reason, and saw that the regular expression is compiled inside the module. I think compiling it a module level could potentially save some time as the function _sanitize_token is called twice inside the function process_view for the CsrfMiddleware class.

This is the intended patch.

diff --git a/django/middleware/csrf.py b/django/middleware/csrf.py
index f323ffb..deaf7d8 100644
--- a/django/middleware/csrf.py
+++ b/django/middleware/csrf.py
@@ -22,6 +22,8 @@ from django.utils.log import log_response
 
 logger = logging.getLogger('django.security.csrf')
 
+ASCII_ALPHANUMERIC_RE = re.compile('[^a-zA-Z0-9]')
+
 REASON_BAD_ORIGIN = "Origin checking failed - %s does not match any trusted origins."
 REASON_NO_REFERER = "Referer checking failed - no Referer."
 REASON_BAD_REFERER = "Referer checking failed - %s does not match any trusted origins."
@@ -107,7 +109,7 @@ def rotate_token(request):
 
 def _sanitize_token(token):
     # Allow only ASCII alphanumerics
-    if re.search('[^a-zA-Z0-9]', token):
+    if ASCII_ALPHANUMERIC_RE.search(token):
         return _get_new_csrf_token()
     elif len(token) == CSRF_TOKEN_LENGTH:
         return token

I'm not sure how exactly to profile this change. I tried using the djangobench package after some tinkering to its source code. Since it was reporting changes even on queries, I wasn't sure to trust it. Any leads on this front would be great.

I would be happy to make the change, if this seems reasonable.

Change History (9)

comment:1 by Abhyudai, 4 years ago

For what's it worth, the results for the default_middleware section when run on my machine when comparing the patch branch to the main branch were:

Control: Django 4.0.dev20210524043148 (in git branch main)                                                                                            
Experiment: Django 4.0.dev20210524043148 (in git branch feat/optimize-csrf) 

Running 'default_middleware' benchmark ...                                
Min: -0.000020 -> -0.000111: 0.1772x faster
Avg: 0.000751 -> 0.000740: 1.0160x faster                                  
Not significant                                                           
Stddev: 0.00529 -> 0.00522: 1.0144x smaller (N = 50) 

Although, I'm not sure if this just tests the changes for the Csrf class since the default middleware includes a lot more things.

comment:2 by Mariusz Felisiak, 4 years ago

Summary: Compile alphanumeric regex for CSRF middleware at module level.Avoided unnecessary recompilation of token regex in _sanitize_token().
Triage Stage: UnreviewedAccepted

Thanks, sounds good. Would you like to prepare patch? Please use _lazy_re_compile() to avoid compilation when importing the module.

comment:3 by Mariusz Felisiak, 4 years ago

Easy pickings: set

comment:4 by Abhyudai, 4 years ago

Owner: changed from nobody to Abhyudai
Status: newassigned

comment:5 by Abhyudai, 4 years ago

Has patch: set

comment:6 by Mariusz Felisiak, 4 years ago

Triage Stage: AcceptedReady for checkin

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 866dccb6:

Fixed #32778 -- Avoided unnecessary recompilation of token regex in _sanitize_token().

comment:8 by Chris Jerdonek, 4 years ago

I added a follow-up PR here that renames token_re: https://github.com/django/django/pull/14461

comment:9 by GitHub <noreply@…>, 4 years ago

In d270dd58:

Refs #32778 -- Improved the name of the regex object detecting invalid CSRF token characters.

This also improves the comments near where the variable is used.

Note: See TracTickets for help on using tickets.
Back to Top