Opened 6 years ago
Closed 10 months ago
#30224 closed Bug (wontfix)
Mysql Datetime value is string instead of datetime object
Reported by: | Martin Kuhn | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | mysql |
Cc: | Simon Charette, Sarah Boyce | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
The Problem
Datetimes are sometimes returned as objects or strings. It depends on the value. E.g. '2019-03-01 00:00:00.000000' is returned as a string, while2018-12-12 22:02:53.134000
is provided as a datetime object in the sql/compiler.py.
My Table
CREATE TABLE
myTable
(
id
char(32) COLLATE utf8mb4_unicode_ci NOT NULL,
created_on
datetime(6) NOT NULL,
modified_on
datetime(6) NOT NULL,
first_spot_broadcast
datetime(6) DEFAULT NULL,
expiry_date
datetime(6) DEFAULT NULL
)
DB driver: django.db.backends.mysql
Django version: 2.0.4 tried 2.2 as well
DRF 3.8.2
mysqlclient = "==1.4.2"
This issue only occurred for me when I execute this query. I tried upgrading to 2.2 and the issue persisted.
.annotate( minExpiryDate=Min('buyout__expiry_date', filter=Q(buyout__is_archived=False, buyout__is_deleted=False))) \ .filter(Q(minExpiryDate__gte=(now + expiryInterval)) & Q(minExpiryDate__lte=(now + expiryInterval + timedelta(days=1))))
My stack trace is:
Traceback (most recent call last): File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1741, in <module> main() File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1735, in main globals = debugger.run(setup['file'], None, None, is_module) File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1135, in run pydev_imports.execfile(file, globals, locals) # execute the script File "/usr/share/pycharm/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile exec(compile(contents+"\n", file, 'exec'), glob, loc) File "/server/manage.py", line 15, in <module> execute_from_command_line(sys.argv) File "site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line utility.execute() File "site-packages/django/core/management/__init__.py", line 375, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "site-packages/django/core/management/base.py", line 316, in run_from_argv self.execute(*args, **cmd_options) File "site-packages/django/core/management/base.py", line 353, in execute output = self.handle(*args, **options) File "/server/buyout/management/commands/send_expiry_reminder_email.py", line 8, in handle send_expiry_reminder_summary() File "site-packages/celery/local.py", line 191, in __call__ return self._get_current_object()(*a, **kw) File "site-packages/celery/app/task.py", line 375, in __call__ return self.run(*args, **kwargs) File "/server/buyout/tasks/expiry_reminder.py", line 46, in send_expiry_reminder_summary if len(projects_expiring) > 0: File "site-packages/django/db/models/query.py", line 250, in __len__ self._fetch_all() File "site-packages/django/db/models/query.py", line 1186, in _fetch_all self._result_cache = list(self._iterable_class(self)) File "site-packages/django/db/models/query.py", line 63, in __iter__ for row in compiler.results_iter(results): File "site-packages/django/db/models/sql/compiler.py", line 1009, in apply_converters value = converter(value, expression, connection) File "site-packages/django/db/backends/mysql/operations.py", line 252, in convert_datetimefield_value value = timezone.make_aware(value, self.connection.timezone) File "site-packages/django/utils/timezone.py", line 264, in make_aware return timezone.localize(value, is_dst=is_dst) File "site-packages/pytz/__init__.py", line 222, in localize if dt.tzinfo is not None: AttributeError: 'str' object has no attribute 'tzinfo' Exception ignored in: <generator object cursor_iter at 0x7f7ab3f88390> Traceback (most recent call last): File "site-packages/django/db/models/sql/compiler.py", line 1469, in cursor_iter cursor.close() File "site-packages/MySQLdb/cursors.py", line 86, in close while self.nextset(): File "site-packages/MySQLdb/cursors.py", line 136, in nextset nr = db.next_result() MySQLdb._exceptions.OperationalError: (2006, '')
Change History (27)
comment:1 by , 6 years ago
Description: | modified (diff) |
---|
comment:2 by , 6 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|
comment:3 by , 6 years ago
Type: | Uncategorized → Bug |
---|
comment:4 by , 6 years ago
comment:5 by , 6 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Hello Martin,
There's effectively something broken here but nothing seems to prove Django is at fault; as you've noticed it looks like it's mysqlclient
that returns str
for some datetimes.
What I suggest you do to confirm the origin of the bug is to use mysqlclient
directly to connect to your database and retrieve the problematic rows. If the returned data contains string for datetimes columns then it's likely a mysqlclient
bug and there's not much Django can do.
I'll close the ticket for now but please reopen if you can provide more details about how Django is at fault.
comment:6 by , 6 years ago
Ok, I managed to narrow down the problem. My Query is using annotations to use the aggregation function MIN.
Project.objects \ .prefetch_related('buyout_set') \ .prefetch_related('teammember_set') \ .annotate(min_expiry_date=Min('buyout__expiry_date', filter=Q(buyout__is_archived=False, buyout__is_deleted=False)) ) \ .filter(Q(min_expiry_date__gte=(now + expiryInterval)) & Q(min_expiry_date__lte=(now + expiryInterval + timedelta(days=1))))
I tested the generated query below using mysqlclient
and found that it does return the annotated value as a string. The problem is a custom annotation "MIN(Date)". Probably mysqlclient has no type info for this custom value, since it could be any/multiple types. (E.g. using an IF statement.)
However this auto generated query could be rewritten to use CAST( "annotation" as DateTime).
The resulting query is
SELECT `advertisement_project`.`id`, `advertisement_project`.`created_on`, `advertisement_project`.`modified_on`, `advertisement_project`.`is_deleted`, `advertisement_project`.`created_by_id`, `advertisement_project`.`modified_by_id`, `advertisement_project`.`name`, `advertisement_project`.`code`, `advertisement_project`.`spot_length`, `advertisement_project`.`eta`, `advertisement_project`.`spot_id`, `advertisement_project`.`spot_id_suffix`, `advertisement_project`.`folder_id`, `advertisement_project`.`system_id`, `advertisement_project`.`status`, `advertisement_project`.`type`, `advertisement_project`.`ad_type`, `advertisement_project`.`parent_project_id`, `advertisement_project`.`original_project_id`, `advertisement_project`.`territory_id`, `advertisement_project`.`language_id`, `advertisement_project`.`campaign_id`, `advertisement_project`.`preview_id`, `advertisement_project`.`links_id`, `advertisement_project`.`producer_names`, MIN(CASE WHEN (`buyout_buyout`.`is_archived` = 0 AND `buyout_buyout`.`is_deleted` = 0) THEN `buyout_buyout`.`expiry_date` ELSE NULL END) AS `minExpiryDate` FROM `advertisement_project` LEFT OUTER JOIN `buyout_buyout` ON (`advertisement_project`.`id` = `buyout_buyout`.`project_id`) GROUP BY `advertisement_project`.`id` HAVING (MIN(CASE WHEN (`buyout_buyout`.`is_archived` = 0 AND `buyout_buyout`.`is_deleted` = 0) THEN `buyout_buyout`.`expiry_date` ELSE NULL END) >= '2019-03-01 00:00:00' AND MIN(CASE WHEN (`buyout_buyout`.`is_archived` = 0 AND `buyout_buyout`.`is_deleted` = 0) THEN `buyout_buyout`.`expiry_date` ELSE NULL END) <= '2019-03-02 00:00:00') ORDER BY NULL;
comment:7 by , 6 years ago
I think the most straightforward and graciously solution is to patch Django's django/db/backends/mysql/operations.py
convert_datetimefield_value function. It should simply check for type string and try to convert it manually.
comment:8 by , 6 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
comment:9 by , 6 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Thanks for the investigation Martin, it's greatly appreciated.
I agree that the most straightforward solution for the datetime would be to adapt convert_datetimefield_value
but it'd be great to figure out if other types are affected as well (e.g. decimal, floats, ...) so we can figure out if the solution needs to live at the Aggregate(filter).as_mysql
level as well.
Could you possible try to figure if it's the case?
I also wonder if it could be possible to let mysqlclient
know what the expected type is for a specific cursor instead though. We have this information in the compiler (through .output_field
) so if we can provide per-cursor converter hints I assume it would be even more efficient because it would be performed in C. Another option is figuring out if mysqlclient
can somehow figure out how to introspect such return value on it's own (e.g. maybe MySQL returns a different field type in this case) but in all cases we'll have to ship a Django workaround anyway.
comment:11 by , 6 years ago
The problem seems to be Aggregation functions have a beef with Datetime in general. They convert them into a string. :(
So truly the best workaround would be to Wrap these functions with a CAST operation.
c.execute(""" SELECT MIN(CAST('2018-01-01 00:00:00' AS datetime)); """) 1 c.fetchall()[0][0] '2018-01-01 00:00:00'
By the way mysqlclient's response was "The answer is noooo!" Defiantly not their problem.
We could ask Mysql to fix it, but the same seems to be the case for mariaDB. I experimented with different MySql flavours and versions, before opening this bug. That means a improvement on the server would probably be around 2055.
comment:12 by , 6 years ago
Version: | 2.2 → master |
---|
Ah that's too bad. I guess we'll have to fix it on the Django side in this case.
Does it only happen for Min
and Max
functions? If it's the case I'm tempted to suggest we address the issue by implementing as_mysql
methods on these two functions that wrap the inner expression with a CAST
like you did here when self.output_field.internal_type() == 'DateTimeField
. The easiest way is probably to provide a different template
to the super functioné
Here's an example of how this was done for a duration handling of Avg
on MySQL
- https://github.com/django/django/blob/846624ed0858aec0e51baebaa5b397e135c6d1dc/django/db/models/functions/mixins.py#L25-L29
- https://github.com/django/django/blob/65858119d23e37872505a4476e7141c33981fb50/django/db/models/aggregates.py#L99
Happy to review your PR if you're interested in submitting one.
comment:13 by , 6 years ago
Yes, I can take a loop in my free time. I just need to make the following correction.
Seems like this is only a problem for MySql and Percona. Looks like mariadb is working fine. I just checked against their corresponding latest versions.
comment:14 by , 5 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:15 by , 5 years ago
I am trying to recreate this issue, I have written a (test case)https://github.com/django/django/compare/master...rohitjha941:ticket_30224 which generates the following query:
SELECT "expressions_experiment"."id", "expressions_experiment"."name", "expressions_experiment"."assigned", "expressions_experiment"."completed", "expressions_experiment"."estimated_time", "expressions_experiment"."start", "expressions_experiment"."end", Min(CASE WHEN "expressions_result"."experiment_id" = 1 THEN "expressions_result"."result_time" ELSE NULL END) AS "min_expiry_date" FROM "expressions_experiment" LEFT OUTER JOIN "expressions_result" ON ( "expressions_experiment"."id" = "expressions_result"."experiment_id" ) GROUP BY "expressions_experiment"."id", "expressions_experiment"."name", "expressions_experiment"."assigned", "expressions_experiment"."completed", "expressions_experiment"."estimated_time", "expressions_experiment"."start", "expressions_experiment"."end"
and output is
[{'id': 1, 'name': 'Rohit', 'assigned': datetime.date(2020, 3, 1), 'completed': datetime.date(2020, 3, 21), 'estimated_time': datetime.timedelta(days=20, seconds=36000), 'start': datetime.datetime(2020, 3, 1, 11, 23, 36, 34995), 'end': datetime.datetime(2020, 3, 21, 21, 23, 36, 34998), 'min_expiry_date': datetime.datetime(2020, 4, 20, 21, 23, 36, 35778)}]
comment:16 by , 5 years ago
Cc: | added |
---|
comment:17 by , 3 years ago
Could we proceed with this issue with this very minimalistic failing example?
Failing with AttributeError
>>> SomeModel.objects.annotate(bla=Value(timezone.now(), models.DateTimeField())).first().bla AttributeError: 'str' object has no attribute 'utcoffset'
Working with Cast
>>> SomeModel.objects.annotate(bla=Cast(Value(timezone.now(), models.DateTimeField()), models.DateTimeField())).first().bla datetime.datetime(2022, 6, 21, 8, 25, 39, 678166, tzinfo=datetime.timezone.utc)
Are we asking everyone to always wrap all kinds of expressions involving timestamps with Cast
? There is no such hint in https://docs.djangoproject.com/en/4.0/ref/models/expressions/#value-expressions. We use Value(<some datetime>)
a lot as fallbacks in When
, Coalesce
, Least
, ... expressions.
comment:18 by , 3 years ago
Cc: | removed |
---|
comment:19 by , 21 months ago
I confirm that this issue is related to mysql, in my case with version 5.7
Upgrading to MariaDB solved this issue. However, it can be tricky to do this kind of upgrade, so I'll give my feedback if it can help others that struggle to proceed to the upgrade.
Disclaimer/Environment
First and foremost, please take into consideration the following environment/context as I have NOT tested this process for other OS/version, so if you're not under this environment, you should proceed CAREFULLY and double-check steps!
- CentOS 7.5
- MySQL version 5.7
- Django 2.2.28
- Python 3.6.9
- pip 21.0.1
Prerequisites
I suggest you first to put your server into maintenance. It depends on your WSGI application/webserver, but the goal here is to prevent any inbound HTTP connection to your server. In my case, I just configured a maintenance view from the NGINX server (we have a reverse proxy) in order to let the users know, and we just stopped all redirections to our gunicorn instances.
Then :
- Stop mysql service
systemctl stop mysql
- Take a backup of the databases (it will save all of your Django applications/projects if you have several). PS: For the sake of the example, I've stored it on /root but feel free to store anywhere you want, and also care to specify the root user of your mysql instance to get the dump.
cp -apf /var/lib/mysql /var/lib/mysql_backup cd /root/ mysqldump --all-databases -u <your_user_goes_here> -p > all_databases.sql
- Finally, remove all mysql packages
yum remove mysql-server mysql-client mysql-community-*
Install MariaDB server 10.4
- You need to use the MariaDB official repositories for YUM
yum install wget wget https://downloads.mariadb.com/MariaDB/mariadb_repo_setup chmod +x mariadb_repo_setup sudo ./mariadb_repo_setup
- Before you can install MariaDB, you may need to cleanup your ius and epel releases repos in your server (in my case, I had to, otherwise it would generate errors during installing MariaDB). If you feel you can skip this step, go ahead, you can go back to this step if needed.
yum erase ius-release rm /etc/yum.repos.d/ius*.repo* yum install https://repo.ius.io/ius-release-el7.rpm https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
- Then, install all the packages required for Django to work with MariaDB (MariaDB-devel was required in my case, otherwise some libraries were missing for Django to work)
yum install MariaDB-server MariaDB-client MariaDB-devel
- Start the MariaDB services afterwards
systemctl start mariadb systemctl enable mariadb
Proceed to migration
Welcome to the tricky steps! Good luck...
- You should start by using the MariaDB's command to upgrade the database. If you get no errors, good for you.. :)
mariadb-upgrade -u <your_db_user> -p
- If errors appeared at some steps, you can always try to add the --force flag as per suggested by MariaDB:
mariadb-upgrade -u <your_db_user> -p --force
- If you had no luck with some tables using JSON data type, you may encounter
Unknown data type "MYSQL_JSON"
errors when selecting data in your DB. Then you need to manually fix all the tables with that JSON type columns. First, you need to stop the MariaDB instancesystemctl stop mariadb
- Then, open a new mysql instance with the following settings (in my case, datadir was
/var/lib/mysql/
and plugin-dir was/usr/lib64/mysql/plugin
)mysqld --no-defaults --datadir=<Your data dir directory> --lc-messages_dir=./share --plugin-dir=<path to directory containing type_mysql_json.so> --plugin-maturity=alpha -u <your_sql_user>
- On another terminal, you can then install the mysql_json plugin and fix the tables with
install soname 'type_mysql_json'; ALTER TABLE `database_name`.`table_name` FORCE;
- You can also use this command if you have lots of tables with JSON type columns
SELECT CONCAT("ALTER TABLE `", TABLE_SCHEMA,"`.`", TABLE_NAME, "` FORCE;") AS MySQLCMD FROM TABLES WHERE TABLE_SCHEMA = "<YOUR_DATABASE_NAME>";
- If this, still, does not fix all of your SELECT commands in your DB, you can also try to retrieve the dump directly:
mysql -u <your_db_user> -p < all_databases.sql
- Don't forget to restart the MariaDB instance, if it was shutdown :)
systemctl start mariadb
Testing & troubleshooting
You should deeply test requests (read, create, update and delete) once everything seem to work again. You can, for instance:
- Run random SELECT on several tables of your databases.
- Start a Django instance
- Dive deeply into your Django Admin, and do some CRUD operations from there
Here a few bugs that I've encountered after the migration, if it helps:
Error loading MySQLdb module. Did you install mysqlclient?
when launching a Django instance
To solve this, I just had to reinstall mysql client from pip
pip install --force-reinstall --ignore-installed --no-binary :all: mysqlclient
EnvironmentError: mysql_config not found
when launching a Django instance
Make sure that you have all the MariaDB yum required packages (especially the MariaDB-devel
package)
- For no apparent reasons, some of the tables didn't recovered from the upgrade (Celery-related issue)
I have several django-celery-beat instances running, and for some, the periodic tasks did not recovered from the upgrade. I simply had to manually restore the tables from Celery (days after migration), and here some useful tricks if you want to recover from a previous SQL dump without erasing all your current and accurate data from other tables.
- From your SQL server, create a temp restore user and assign permissions on the only tables you'd want to restore
GRANT SELECT ON django_project.* TO 'admin_restore_temp'@'localhost' IDENTIFIED BY 'its_pwd'; GRANT ALL ON django_project.table1 TO 'admin_restore_temp'@'localhost';
- Restore the data (assuming you're getting a recover from
/data/backup/db1.sql
dump file)mysql --user admin_restore_temp --password --force < /data/backup/db1.sql
- Check that everything is okay, and don't forget to drop the temp user
DROP USER admin_restore_temp;
Useful resources
Hope these instructions can help you to migrate safely from mysql to MariaDB in order to workaround this issue. Here's some of the resources I've used:
- https://mariadb.com/fr/resources/blog/installing-mariadb-10-on-centos-7-rhel-7/
- https://mariadb.com/kb/en/restoring-data-from-dump-files/
Good luck!
comment:20 by , 10 months ago
I experienced the exact same issue with Django 4.1.13, mysqlclient 2.2.4 and MySQL 5.7.44-48. It seems setting USE_TZ=False
will bypass this issue but of course it creates other problems by losing time zone info. The issue can not be reproduced with MySQL 8.0.32-24.
comment:21 by , 10 months ago
Cc: | added |
---|
For the record, Django supports MySQL 8.0.11 and higher.
Simon, do you think this issue is still valid (assuming that in fact is not reproduceable in MySQL 8 and above)?
comment:22 by , 10 months ago
If we've confirmed that the issue is only reproducible on versions on MySQL that Django no longer support then it should effectively be closed.
comment:23 by , 10 months ago
Cc: | added |
---|
Adding Sarah as cc since we agreed she may use their MySQL setup to check on this.
comment:24 by , 10 months ago
As long as the PR covers the issue as you all understand it, I can confirm that this is issue is resolved for MySQL 8.0.22+ but still exists from 8.0.11-8.0.21.
In MySQL 8.0.21
FAIL [0.010s]: test_filtered_aggregate_datetime (aggregation.test_filter_argument.FilteredAggregateTests.test_filtered_aggregate_datetime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/tests/django/tests/aggregation/test_filter_argument.py", line 223, in test_filtered_aggregate_datetime self.assertEqual( AssertionError: {'max_filtered_pubdate': '2008-06-23'} != {'max_filtered_pubdate': datetime.date(2008, 6, 23)} - {'max_filtered_pubdate': '2008-06-23'} + {'max_filtered_pubdate': datetime.date(2008, 6, 23)}
but from MySQL 8.0.22+
. ---------------------------------------------------------------------- Ran 1 test in 0.023s OK
(happy days)
comment:25 by , 10 months ago
Thanks for the sleuthing Sarah!
In this case I think we should either keep this ticket open, as MySQL 8.0 EOL in April 2025, in hope that someone will take a look at it and provide a not too invasive patch that can make the cut for the 5.1 release in August as this won't qualify for 5.2.
Given how long this ticket has been opened and that upgrading to MySQL 8.0.22+ (released 4 years ago) should be relatively trivial if you're on 8.0.11+ already I think we should simply wont-fix this one.
comment:26 by , 10 months ago
Keywords: | mysql added; Mysql removed |
---|---|
Owner: | removed |
Status: | assigned → new |
Triage Stage: | Accepted → Unreviewed |
I think it's best to take the second approach which is to wontfix
this ticket since it's very unlikely that we could get a fix in before feature freeze.
comment:27 by , 10 months ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Could it be related to this bug ?
https://sourceforge.net/p/mysql-python/bugs/325/