#32691 closed Cleanup/optimization (fixed)
Performance regression in Exact lookup on BooleanField on MySQL.
Reported by: | Todor Velichkov | Owned by: | Roman Miroshnychenko |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.2 |
Severity: | Normal | Keywords: | bool, booleanfield, orm, sql, mysql |
Cc: | Adam Johnson, Simon Charette, Dan Strokirk | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When filtering on boolean field in Django 2.2 the SQL output would look like this:
WHERE `mytable`.`myfield` = 1
in Django 3.2 the SQL changed to:
WHERE `mytable`.`myfield`
The problem with the 2nd approach is that in mysql8 this field is not used in indexes anymore, causing a great performance degradation.
Change History (19)
comment:1 by , 4 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Type: | Uncategorized → Cleanup/optimization |
comment:2 by , 4 years ago
Summary: | Django 3.2 boolean field SQL output is causing a performance regression in mysql. → Django 3.2 boolean field SQL output is causing a performance regression in MySQL. |
---|
comment:3 by , 4 years ago
Hi Mariusz,
I don't quite agree, both clauses are syntactically correct yes, but they are not the same.
MySQL does not have a boolean type of column, so Django is using tinyint
which stores value between -128..127
or 0..255
When you tell mysql
you are looking for `mytable`.`myfield`
this basically translates to all values between -128..127
except 0
which is a range query equivalent to myfield__in=[-128..-1]+[1..127]
even myfield__exact
lookup is acting the same way.
comment:4 by , 4 years ago
MySQL does not have a boolean type of column, so Django is using tinyint
Django is using BOOL
which is an alias for tinyint(1)
. As far as I'm aware, storing values except 0
and 1
in BOOL
columns has never been supported by Django.
comment:5 by , 4 years ago
Django is not storing values except 0
and 1
in BOOL
, but MySQL tinyint(1)
support it.
So my question is, why is Django asking MySQL to search for values other than 0
and 1
in a BOOL
column. Doesn't it make sense to only search for 0
and 1
when Django is only storing 0
and 1
?
My second question is: Let's say Django is right about asking MySQL to search for values other than 0
and 1
in a BOOL
column, then shouldn't it at least myboofield__exact
act in a different way and ask MySQL to look for a specific value ? Django is literally leaving us w/o an option here and using a boolean field as index/ inside composite index is impossible.
comment:6 by , 4 years ago
Cc: | added |
---|
Django is literally leaving us w/o an option here and using a boolean field as index/ inside composite index is impossible.
I'm not a MySQL expert, can you provide any link to the MySQL docs which will confirm that it's not possible. It's surprising that you cannot create an index with a BOOL
(TINYINT
) column.
comment:7 by , 4 years ago
The issue is not that the index will not be created, the index will be created, but it will not be used against expressions like:
WHERE `mytable`.`myfield`
because such an expression evaluates to True
for a range of values (in the case of signed tinyint(1)
the range is -128..127
except 0
) which turns the query into a range
query which are hard to optimize. MySQL needs a constant value like:
WHERE `mytable`.`myfield` = 1
in order for the index to be used.
An example: If we have a row with a value of myfield = 2
(MySQL allows this because the column is tinyint(1)
) then:
- The first expression will include this row in the results because
2
evaluates toTrue
- The second expression will not include this row because
2 != 1
Django only stores 0
and 1
why its expression is including rows which Django itself do not store.
comment:8 by , 4 years ago
If we want to move forward with this change the only possible resolution I see is to add a has_native_boolean_type
database feature and set it to false for MySQL. Another more intrusive alternative would be to have BooleanField.get_db_prep_value
return an int
instead of a bool
on MySQL so the lookup machinery is made aware of this quirk instead of having it magically happen at the database connector layer (mysqlclient
, pymysql
).
In the mean time this can be worked around by doing .filter(boolean_field=1)
. Entirely reverting this change would possibly cause a breakage in all function index on boolean expressions created on other backends.
In all cases I'd argue that creating an index on a boolean field is a bad practice particularly when it's backed by a b-tree implementation and your backend doesn't support partial indices to allow excluding parts of the covered table as your underlying tree will get heavily unbalanced on way or the other.
comment:9 by , 4 years ago
In the mean time this can be worked around by doing
.filter(boolean_field=1)
. Entirely reverting this change would possibly cause a breakage in all function index on boolean expressions created on other backends.
I tried this before opening the ticket, but it didn't work, however it looks like .filter(boolean_field=models.Value(1))
is working!!
For now this should be sufficient for us to complete our migration to Django 3.2
! Thanks for the time spent on this!
comment:10 by , 3 years ago
Why not add as_mysql()
method to Exact
lookup that simply calls parent's as_sql()
method instead of doing this boolean field magic as in Exact.as_sql()
? We had the same issue as in the original post and this solution worked for us.
comment:11 by , 3 years ago
Cc: | added |
---|
Adding a dedicated Exact.as_mysql
method would also work, feels like a feature flag might be more appropriate though.
comment:12 by , 3 years ago
Cc: | added |
---|
We've also seen this change cause a production outage when we've tried upgrading to Django 3, so either a notice in the docs or some other way to to avoid it changing back again in the future would be great.
Thanks Todor for your tip about models.Value(1)
!
comment:13 by , 3 years ago
Has patch: | set |
---|
comment:15 by , 3 years ago
Resolution: | invalid |
---|---|
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
comment:16 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Summary: | Django 3.2 boolean field SQL output is causing a performance regression in MySQL. → Performance regression in Exact lookup on BooleanField on MySQL. |
Triage Stage: | Accepted → Ready for checkin |
follow-up: 19 comment:18 by , 3 years ago
Any chance that we could cheery-pick the fix into Django 3.2? The issue nearly caused a production outage for us when we upgraded to Django 3.2 and it would be great to make sure nobody else gets burned by this.
comment:19 by , 3 years ago
Replying to Anže Pečar:
Any chance that we could cheery-pick the fix into Django 3.2? The issue nearly caused a production outage for us when we upgraded to Django 3.2 and it would be great to make sure nobody else gets burned by this.
Unfortunately not, this is a regression in Django 3.1 that is no longer supported. Per our backporting policy this means it doesn't qualify for a backport to 3.2.x or 4.0.x anymore.
See Django’s release process for more details.
Thanks for this ticket, however both clauses are correct in MySQL. The way how the ORM builds filters is an implementation detail not something Django should guarantee (see #25367).