Opened 18 months ago

Last modified 2 months ago

#34699 assigned Cleanup/optimization

Filtering on annotated TruncSecond expression gives unexpected result.

Reported by: Stefan Owned by: Wes P.
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords:
Cc: Aymeric Augustin Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

With a non-UTC time zone:

  TIME_ZONE = 'Europe/Berlin'

and a simply query like so

  from django.db.models.functions import TruncSecond
  from django.utils import timezone

  book = Book.objects.get(id=2)
  now = timezone.now()
  book.published = now
  book.save()
  Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(id=2, _published_trunc__lte=now)

The result is empty; I have simply filtered now against a second-trunced version of now so I would expect a result.

However under the hood the _published_now column is converted to a naive timestamp using AT TIME ZONE 'Europe/Berlin' and is thus a naive timestamp 2 hours ahead of UTC.

SELECT "book"."id",
       "book"."published",
       DATE_TRUNC('second', "book"."published" AT TIME ZONE 'Europe/Berlin') AS "_published_trunc"
FROM "book"
WHERE ("book"."id" = 2 AND
       DATE_TRUNC('second', "book"."_published_trunc" AT TIME ZONE 'Europe/Berlin') <=
       '2023-07-04 11:59:00+02:00'::timestamptz)

The filter compares a naive timestamp to an aware one, but assumes the LHS naive timestamp is a UTC timestamp - which it is not, it is Berlin time.

Workaround

1) Use TruncSecond(now) in the filter so the compared naive timestamps are the same.

2) Use _published_trunc=TruncSecond('published', tzinfo=datetime.timezone.utc) - I don't like this though. It's not clear without a comment why the tzinfo is needed and it assumes the database will compare timezones using UTC.

Change History (15)

comment:1 by Mariusz Felisiak, 18 months ago

Cc: Aymeric Augustin added
Component: UncategorizedDatabase layer (models, ORM)

in reply to:  description comment:2 by Abdulla Dlshad, 18 months ago

I believe this would be the case for all other Trunc Date expressions as they are subclassed with TruncBase.
Replying to Stefan:

With a non-UTC time zone:

  TIME_ZONE = 'Europe/Berlin'

and a simply query like so

  from django.db.models.functions import TruncSecond
  from django.utils import timezone

  book = Book.objects.get(id=2)
  now = timezone.now()
  book.published = now
  book.save()
  Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(id=2, _published_trunc__lte=now)

The result is empty; I have simply filtered now against a second-trunced version of now so I would expect a result.

However under the hood the _published_now column is converted to a naive timestamp using AT TIME ZONE 'Europe/Berlin' and is thus a naive timestamp 2 hours ahead of UTC.

SELECT "book"."id",
       "book"."published",
       DATE_TRUNC('second', "book"."published" AT TIME ZONE 'Europe/Berlin') AS "_published_trunc"
FROM "book"
WHERE ("book"."id" = 2 AND
       DATE_TRUNC('second', "book"."_published_trunc" AT TIME ZONE 'Europe/Berlin') <=
       '2023-07-04 11:59:00+02:00'::timestamptz)

The filter compares a naive timestamp to an aware one, but assumes the LHS naive timestamp is a UTC timestamp - which it is not, it is Berlin time.

Workaround

1) Use TruncSecond(now) in the filter so the compared naive timestamps are the same.

2) Use _published_trunc=TruncSecond('published', tzinfo=datetime.timezone.utc) - I don't like this though. It's not clear without a comment why the tzinfo is needed and it assumes the database will compare timezones using UTC.

comment:3 by Francesco Panico, 18 months ago

Owner: changed from nobody to Francesco Panico
Status: newassigned

comment:4 by Natalia Bidart, 17 months ago

I've been trying to understand this report to properly triage it. From the docs for Trunc and derivatives, when describing DateTimeField truncation, the example given shows this:

 >>> Experiment.objects.annotate(
...     date=TruncDate("start_datetime"),
...     day=TruncDay("start_datetime", tzinfo=melb),
...     hour=TruncHour("start_datetime", tzinfo=melb),
...     minute=TruncMinute("start_datetime"),
...     second=TruncSecond("start_datetime"),
... ).values("date", "day", "hour", "minute", "second").get()
{'date': datetime.date(2014, 6, 15),
 'day': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')),
 'hour': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')),
 'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=timezone.utc),
 'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=timezone.utc)
}

The text and examples seem to indicate that when the stored datetime is in a given timezone, that tz would be used in the truncation unless otherwise specified by the tzinfo param.

@stefan, can you double check that the values for published are actually datetimes in the tz you configured you app?

comment:5 by Natalia Bidart, 17 months ago

Also @stefan, what database are you using in your example? Thanks!

comment:6 by Natalia Bidart, 17 months ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

I did some more research on this. I'm accepting this ticket because I do believe there is an unexpected behavior, which may be a bug or just a documentation issue.

The simplest test of getting all books and printing their published date and their truncated-to-seconds published date is, IMHO, unexpected: the published value is in UTC but the annotated value is in the timezone the app is configured. My settings:

USE_TZ = True
TIME_ZONE = "Europe/Berlin"

The model:

from django.db import models
from django.utils import timezone


class Book(models.Model):
    published = models.DateTimeField(default=timezone.now)

In the shell:

import time
from django.db.models.functions import TruncSecond
from testapp.models import Book
for i in range(3):
    Book.objects.create(); time.sleep(1)
annotated_books = Book.objects.annotate(_published_trunc=TruncSecond('published'))
for i in annotated_books.all():
    print(i.published, i._published_trunc)

With output:

2023-07-14 18:39:22.620603+00:00 2023-07-14 20:39:22+02:00
2023-07-14 18:39:28.585856+00:00 2023-07-14 20:39:28+02:00
2023-07-14 18:39:29.590469+00:00 2023-07-14 20:39:29+02:00
2023-07-14 18:39:30.595811+00:00 2023-07-14 20:39:30+02:00

comment:7 by Mariusz Felisiak, 17 months ago

I'm pretty sure it's not a valid ticket, but couldn't find quickly find a proper justification. Any change to the current behavior would be backward incompatible.

comment:8 by Natalia Bidart, 17 months ago

Thanks Mariusz for your insight. I was wondering what are your thoughts for my example above, where the result of calling TruncSecond on a timestamp would have a timezone different than the argument.

comment:9 by Mariusz Felisiak, 17 months ago

It works as documented:

  • timezone.now() - "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."
  • Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

in reply to:  9 comment:10 by Natalia Bidart, 17 months ago

Replying to Mariusz Felisiak:

It works as documented:

  • timezone.now() - "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."
  • Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

Right, I understand the docs above and how that match the results I got. But, at the same time, the first paragraph about "Time Zones" says:

When support for time zones is enabled, Django stores datetime information in UTC in the database, uses time-zone-aware datetime objects internally, and translates them to the end user’s time zone in templates and forms.
[...]
Even if your website is available in only one time zone, it’s still good practice to store data in UTC in your database.

The above aligns perfectly (and makes sense) with timezone.now returning an aware datetime in UTC. But then, at least to me, is quite surprising that TruncSecond, which is an operation fully occurring in the DB, would not "respect" that "UTC invariant" and have the TIME_ZONE setting affecting the results. Does my point make sense? Do you have historic information about why TruncSecond (and related functions) would not operate with/keep the tz defined in the datetime being processed?

comment:11 by Stefan, 17 months ago

@Natalia, apologies I should've said, it was PSQL 14.3.

I want to clarify that from a purely ORM perspective the result is surprising, given

Book.objects.update(published=timezone.now())
Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(_published_trunc__lte=timezone.now()).count()  # 0

the result is 0. One has to dive in to the SQL to understand why this is happening, and it's because AT TIME ZONE 'Europe/Berlin' is making the timezone naive in the DB level. The docs do say:

Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

Although it says the datetime is converted to the new timezone before the value is truncated, converted to new timezone doesn't necessarily suggest that the resulting time zone is naive - one could still assume it an aware timezone just with an offset of +11:00. The latter is further suggested because the immediately following examples in that doc show the returned value on the annotation are an aware value with an offset of +11:00, and not naive.

comment:12 by Francesco Panico, 17 months ago

Owner: Francesco Panico removed
Status: assignednew

comment:13 by Wes P., 2 months ago

Owner: set to Wes P.
Status: newassigned

comment:14 by Wes P., 2 months ago

I did a little research into this and I would say Stefan and Mariusz are both correct. The behavior of TruncSecond is unexpected but any changes to the feature would break things for people who have used any Trunc functions in a filter and have manually adjusted for the timezone naive result. My proposal is a change to the documentation with a warning about using any of the Trunc functions in a filter when the set timezone is not UTC and to recommend a way of constructing the filter in such away as to produce the expected results. I have a number of tests that confirm the documented warning so that if someone changes the behavior in the future the documentation should be updated.

The way that Django uses DATE_TRUNC function in Postgres leads us to believe that the underlying filter will also be timezone aware, but the way that Django calls DATE_TRUNC in Postgres returns a timezone naive result (see: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC). Django returns a timezone aware result by converting the naive value which is why you see the different timezone values in your sample query earlier in the ticket.

The TruncSecond (and other Trunc functions) were added (I think) in 2013 and at that time, Postgres did not have the option for doing a timezone aware DATE_TRUNC without doing it the way that Django is currently doing it which returns a timezone naive result. In 2019 Postgres added a new feature to the function that does return a timezone aware result so that the filter in Stefan's situation would work as expected. I did play with this and it would be possible to add this as a new feature, possibly in the same way that Postgres has, by passing an optional parameter to the TruncBase class that would access the feature so that it won't break existing code.

comment:15 by Wes P., 2 months ago

Has patch: set

This is my first ticket and pull request. https://github.com/django/django/pull/18660

Note: See TracTickets for help on using tickets.
Back to Top