#17260 closed Bug (fixed)
`QuerySet.dates()` is computed in UTC when time zone support is enabled
Reported by: | Aymeric Augustin | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | kmike84@…, anssi.kaariainen@…, semente+django@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Aggregation on dates is performed in UTC (the database timezone) when settings.USE_TZ = True
, while an end user may expect its current time zone to be taken into account. This is a known problem in the initial implementation of time zone support.
It affects:
QuerySet.dates()
,- the
year
,month
,day
, andweek_day
lookups, date_hierarchy
in the admin,- the date based generic views.
Attachments (2)
Change History (27)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Owner: | changed from | to
---|
comment:4 by , 13 years ago
For the record, I'm afraid this will quickly become contentious once 1.4 is out...
comment:5 by , 13 years ago
You mean contentious as in "wontfix, breaks existing programs"? In my opinion that would be bad. It is pretty clear to me that the current behavior is broken. The dates should be in the currently active time zone.
I think I know how to fix this:
- aggregates can return parameters (useful for other features, too "conditional aggregates" for example). This is returning parameters from the cols part of the query.
- other needed parts already can return parameters (where conditions etc).
- use the AT TIME ZONE trick above, or if that is not available, use just " + timedelta offset minutes".
- hope that everything works.
The problem is there is a _lot_ of places to fix. In addition, the ORM is a little complex to work with sometimes. I just can't see getting this into 1.4 in time.
Any way to mark this as a known bug which _will_ be fixed in the future? The other way forward is introducing "at_timezone" argument for the operations, but how to do that for __date
for example?
comment:6 by , 13 years ago
I mean "we're going to take some flak".
I plan to fix this eventually, I just don't know how.
comment:7 by , 13 years ago
Yes, I can see this is very complex to fix in a way that actually works even closely the same on every backend.
It might be possible to just add the offset into the datetime, then do the conversion to the date. This all in the DB. This could actually work, as DST is usually around 3-4 o'clock AM, so there would not be error in date conversions.
I am tempted to give this a try...
comment:8 by , 13 years ago
Cc: | added |
---|
comment:9 by , 13 years ago
Cc: | added |
---|
I have a "POC" version ready. It just adds some timedeltas to the values where needed. In the database everything is still done at time zone UTC, we just add the offsets as Python reports them.
The biggest problem with this approach is DST handling. We use the offset as it is _now_, however the database date might have a different offset (offset now is 3 hours, in the summer 2 hours). So, we have an error of one hour for summer dates.
This means that this issue is really hard to fix: if we use current offsets, we will make an error of one hour across DST (not to mention actual time zone changes in South America...). That is not correct enough to call this fixed. We could use "AT TIME ZONE 'tz name'" where available, but that has just one little problem: we don't know the time zone name.
So, I really can't see an easy fix for this. We could invent a mapping mechanism for the Python time zone name -> DB time zone name. That isn't easy either, I believe PostgreSQL can have different time zone names on Windows and Linux. So, the user should actually define his converter dictionary in the settings.py, for each database. Or we could have our own mappings: time zone name -> list of time zones to try. On first connect inspect what time zones are available. Endless source of bugs, but on the other hand that isn't much different from supporting the different locales. This is the best approach available.
After all this MySQL would still be broken. I don't believe there is anything Django can do to make time zones work correctly together with dates queries on that database. I really wonder if it is a good idea to encourage people to use USE_TZ with MySQL. The setup is broken, and as far as I see, will be broken.
The attached patch is, as said, just a proof of concept, or more correctly a poor of the concept not working.
by , 13 years ago
Attachment: | dates_in_tz.diff added |
---|
comment:10 by , 13 years ago
One more possibility: just require that every time zone used is usable as database time zone as is. Current time zones follow this rule (I mean the configured settings' time zone). It might be hard to require this one after 1.4 is released, and it might be also hard for those who do not use pytz.
comment:11 by , 13 years ago
OK, another POC patch which has some support for AT TIME ZONE when using PostgreSQL.
As far as I can see SQLite is somewhat easy to fix, PostgreSQL and Oracle are fixable if we can translate the Python time zone names to something the database understands. It seems the best fix for MySQL is adding the offset which has the DST problem.
I am now done with tinkering with this ticket. I hope the POCs have some value for solving this properly after 1.4 has been released. At this point my only worry is that the option of requiring that all time zone names used must be usable as-is in the database will be lost after 1.4 is released.
Lesson of the day: In POSIX syntax GMT+3 is GMT-3. Of course. As if the timezone handling wasn't confusing enough already...
comment:12 by , 13 years ago
An update about MySQL: it has convert_tz function which does something equivalent to AT TIME ZONE. However, there are some problems with convert_tz:
- The timezone table for MySQL has to be loaded, if I understand the documentation correctly this is not done by default
- If you try to convert a datetime using an unknown (to MySQL) timezone name, you get null back
- If the datetime falls out of the range 1970-2038 you will not get any conversion, you get the input datetime back.
So, the convert_tz function would cure the problem for MySQL, assuming the datetime converted is in range 1970-2038. That could be documented as known limitation. This was the last missing piece of this puzzle:
- SQLite should be easy to support by using Python code directly in the query.
- PostgreSQL and Oracle should work using AT TIME ZONE syntax. Python <-> DB timezone names must match or there must be a converter.
- MySQL should work with convert_tz (with the above cave-eats), Again Python <-> DB timezone names must match or there must be a converter.
There is a lot of work left to actually implement the above properly for all core databases. However, that should be pretty straightforward coding to do. The big problem is converting the timezone names from Python to DB format. Any ideas how to do that in somewhat clean fashion?
comment:14 by , 13 years ago
I've given a lot of thought to this problem when preparing a talk for DjangoCon.eu. I suggest to:
1) Fetch all the datetimes and perform the aggregation in Python to compute QuerySet.dates()
on a DateTimeField
when USE_TZ = True
.
- That shouldn't be hard to implement.
- Performance may suffer, but it has always been an expensive operation. This would just shift the load from the database server to the application server.
- We'll need a parameter to specify which time zone to use. It should default to the default time zone, because that's what the model layer always does. In many cases, programmers will want to use the current time zone instead, but for the sake of explicitness and consistency, I still prefer to use the default time zone by default. I'm open to discussing this.
- If some backends can easily do this operation in the database (PostgreSQL) they can provide their own efficient implementation.
2) Deprecate the __day/month/year/week_day
lookups on a DateTimeField
when USE_TZ = True
- There's no way we can pass an explicit time zone argument with the lookup syntax.
- Even if the
__year
lookup can be emulated with a range lookup, the__day
and__month
lookups can't. They allow finding "all objects on the 1st of any month" or "all objects in January of any year" which is slightly pathological. - Like akaariai says, "there's a lot of work left", and I think that if we start on the path of hacking that, we'll be stuck forever trying to fix a fundamentally broken operation.
comment:15 by , 12 years ago
I'm still making up my mind on this ticket. I like Anssi's last proposal more and more for 1).
comment:16 by , 12 years ago
Not directly related to this ticket, but if we can convert the datetimes to correct time zone in the DB, there are other uses than solving this ticket, too. For example it would be very useful to be able to generate "sum of payments per month and client" aggregate using the ORM. In these cases getting the timezones correctly can be important. Of course, the TZ issue isn't the only issue preventing this type of query currently...
BTW there could be qs.timezone() operation which sets the whole queryset to work in the given timezone.
Still, as said before, in-db support will take a lot of time to get working correctly.
comment:17 by , 12 years ago
My last idea was to introduce a datetime_trunc_sql
operation in addition to date_trunc_sql
.
When USE_TZ = False
, datetime_trunc_sql
would be equivalent to date_trunc_sql
. When USE_TZ = True
, datetime_trunc_sql
would convert to the appropriate time zone before truncating. It could take the timezone as an optional argument.
Then QuerySet.dates()
would use datetime_trunc_sql
for DateTimeField
and date_trunc_sql
for DateField
.
By the way, your use case is totally valid — I've implemented a lot of reports like this in Python, since it's currently not possible in DB.
comment:18 by , 12 years ago
That datetime_trunc_sql seems like a clean solution.
I believe the timezone must be a cursor.execute() parameter, otherwise there could be potential for sql injection. Not 100% sure of this though.
What should one do if the timezone isn't available in the DB? Just error out? This seems like the cleanest approach, but of course, this isn't backwards compatible. Maybe raise a warning, and continue using no conversions? Basically we would then need a method get_db_tz_name for the backends, which would convert the Python timezone to the DB timezone, and error/warn if the timezone can't be converted to db timezone. This should prevent the SQL-injection possibility too. I believe we could ask the get_db_tz_name() to escape the time zone properly.
So, +1 for the datetime_trunc_sql approach. I really like the idea of using the DB for this, as aggregation for example can be a lot cheaper to do directly in the DB instead of fetching the data into Python.
comment:19 by , 12 years ago
I'd better keep things simple by documenting that tzname
must return a string that the database will understand as a time zone name. The tzinfo database, used by pytz
, is the de-facto standard. Adding a conversion functions sounds like overengineering to me.
While SQL injections through time zone names are unlikely, indeed, we must prevent them.
NB: SQLite will be restricted to using the default time zone, as defined by os.environ['TZ']
. I'm willing to live with this limitation. We can put that under the "use a more feature complete database" umbrella.
comment:20 by , 12 years ago
I don't buy the argument that tzinfo names are usable on every DB. Are they on Oracle, DB2, and MSSQL? Are they usable on databases running on Windows? If I understand correctly Windows doesn't use the tzinfo database.
MySQL will be interesting once again. If the timezone table isn't populated, it returns silently unconverted data. This is other part of the get_db_tz_name() - the idea is that this function could warn/error in this case. It would basically query the DB (on MySQL at least) if the timezone is valid. If not, it would do the warn/error, if yes, then it would cache that the tzinfo name is usable.
But, I think you are right here about not adding the conversion functions. They would be absolutely horrible to maintain.
If we require the user to explicitly set the tzname by using .use_tz(tzname) queryset method, then it is possible to document that the user must ensure that the tzname is usable in the DB. The tzname could be a callable (most likely one returning the current active time zone name) which is resolved at execute time. Is this the idea you were referring to by "documenting that tzname must return a string that the database will understand"? If so, +1 to that approach.
comment:21 by , 12 years ago
Cc: | added |
---|
comment:22 by , 12 years ago
I've started a branch to work on this problem: https://github.com/aaugustin/django/compare/queryset-datetimes
This branch is unstable and will be rebased often, don't fork it.
comment:23 by , 12 years ago
I've started by documented the target public API.
Conversions/truncations will use the current time zone by default because:
- aggregation should generally happen in the end user's time zone;
- it allows using
timezone.override
as shown below.
The design is sufficiently flexible to deal with all the cases discussed above.
Even "uncooperative" setups (MySQL, Windows) can be made to work with a custom tzinfo whose tzname
returns the appropriate value for the database:
qs.datetimes('dt', 'day', timezone=non_standard_tzinfo_object)
with timezone.override(non_standard_tzinfo_object): qs.filter(dt__day=1)
Using .datetimes()
doesn't require a custom tzinfo class; the appropriate time zone name for the database can be provided directly:
qs.datetimes('dt', 'day', timezone='Non-standard name")
I don't care if MySQL silently returns wrong data when it's misconfigured. This isn't worse than all the other data truncation or corruption bugs (for instance, when inserting a string longer than the field size) that happen with MySQL's default configuration. If you use MySQL and care about data integrity, you have to read its manual (and much more).
Finally I don't see the need for a get_db_tz_name
method; the standard tzname
suffices.
comment:24 by , 12 years ago
Has patch: | set |
---|
comment:25 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Here are the comments that were sent to the mailing list:
Aymeric Augustin (myself)
Under PostgreSQL, provided the current time zone has a valid name (e.g. it's provided by pytz), we could temporarily set the connection's timezone to this time zone and restore it to UTC afterwards. With other backends, I can't think of a simpler solution than fetching all values and doing the aggregation in Python. Neither of these thrills me, so for the time being, I've documented this as a known limitation.
Luke Plant
You could argue a case for saying that the behaviour has some advantages: if one admin user filters a list of objects by date, and sends the link to a colleague, they would probably expect the colleague to end up with the same list of objects. But even if that isn't very convincing, I'm happy with this limitation given the implementation difficulties of fixing it.
Anssi Kääriäinen
I don't see this as a major issue, except if Django needs to support this behaviour for backwards compatibility reasons. But as this is documented as a limitation, not as a "feature", this is fine as is. It would be nice to fix this in the future. For PostgreSQL, tracking the current time zone for the connection seems a bit hard - we can't rely a SET TIME ZONE is really going to be in effect, unless we also track rollbacks/commits (see #17062). And changing it for every DB query seems bad, too. One idea is to use "SELECT col AT TIME ZONE 'GMT+2' as col". This way the session time zone doesn't need to be changed. For example:
Oracle seems to have a similar feature. I don't know if other databases offer similar functionality. MySQL has probably support for something similar when using TIMESTAMP columns. But as said, I don't see this as a major issue. It would be nice to support this some day.