Opened 3 weeks ago

Closed 3 weeks ago

Last modified 2 weeks ago

#36293 closed New feature (needsinfo)

Extend `django.utils.text.compress_sequence()` to optionally flush data written to compressed file

Reported by: huoyinghui
Owned by:
Component: HTTP handling
Version: dev
Severity: Normal
Keywords: gzip flush
Cc: Carlton Gibson, Matthew Somerville
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by huoyinghui)

This ticket proposes adding a test to confirm that compress_sequence()
in django.utils.text correctly flushes each chunk during gzip streaming.

The absence of zfile.flush() would cause compressed output to be buffered,
delaying response delivery in streaming contexts. This test uses timed
chunk generation to verify that data is emitted approximately once per second,
indicating that gzip output is non-blocking when flush() is used.
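
The buffering described above can be reproduced with the standard library alone. The following is a minimal sketch, not Django's implementation: `compress_chunks` and its `flush_each` flag are hypothetical names chosen for illustration, and `StreamingBuffer` here is a small stand-in for an in-memory sink that can be drained after each write.

```python
import gzip
import io


class StreamingBuffer(io.BytesIO):
    """In-memory sink whose read() drains everything written so far."""

    def read(self):
        data = self.getvalue()
        self.seek(0)
        self.truncate()
        return data


def compress_chunks(sequence, flush_each=False):
    """Yield gzip-compressed pieces of `sequence`.

    `flush_each` is an illustrative flag (an assumption of this sketch),
    not an existing parameter of django.utils.text.compress_sequence().
    """
    buf = StreamingBuffer()
    with gzip.GzipFile(mode="wb", compresslevel=6, fileobj=buf, mtime=0) as zfile:
        # The gzip header is written as soon as the file object is created.
        yield buf.read()
        for item in sequence:
            zfile.write(item)
            if flush_each:
                # Force the compressor to emit what it has buffered so far.
                zfile.flush()
            data = buf.read()
            if data:
                yield data
    # Closing the GzipFile writes any remaining data plus the gzip trailer.
    tail = buf.read()
    if tail:
        yield tail


events = [b"data: 1\n\n", b"data: 2\n\n", b"data: 3\n\n"]
flushed = list(compress_chunks(events, flush_each=True))
buffered = list(compress_chunks(events))
print(len(flushed), len(buffered))
```

With `flush_each=True` every small event produces its own compressed piece; without it, events this small stay inside the compressor until the stream is closed, which is the delay an SSE client would observe.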

See related PR: https://github.com/django/django/pull/19335

Change History (10)

comment:2 by huoyinghui, 3 weeks ago

Description: modified (diff)

comment:3 by Natalia Bidart, 3 weeks ago

Cc: Carlton Gibson, Matthew Somerville added
Component: Uncategorized → HTTP handling
Keywords: flush added; blocking removed
Resolution: → needsinfo
Status: new → closed
Type: Uncategorized → New feature
Version: 5.2 → dev

Hello huoyinghui, thank you for taking the time to create this ticket. Initially your request made sense, but I did some archeology and in fact zfile.flush() was removed in caa3562d5bec1196502352a715a539bdb0f73c2d (while fixing #24242). I can see in your patch how you added a flag to control whether the flushing occurs or not, but I'm not convinced this is the right solution.

We would need some stronger evidence that the lack of flushing is causing any issues with a streamed response. The commit above mentions:

Testing shows without the flush() the buffer is being flushed every 17k or so and compresses the same as if it had been done as a whole string.

I'll close as needsinfo, add a few folks as cc, and edit the ticket title to be more precise. Please reopen when you can provide more evidence via a Django test project for proper assessment. Thank you!

comment:4 by Natalia Bidart, 3 weeks ago

Summary: Add test to verify non-blocking behavior of compress_sequence() with zfile.flush() → Extend `django.utils.text.compress_sequence()` to optionally flush data written to compressed file

comment:5 by Carlton Gibson, 3 weeks ago

I'd like to see more exploration of solutions in project-space before we add API here in Django. In particular, from the PR, the magic "text/event-stream" restriction here is... well... a little ad hoc.

Better would be to subclass the GZip middleware, and just skip compression for SSE responses (presuming events of less than 17kb). That would be a small number of lines, and quite a simple approach.

in reply to:  5 comment:6 by huoyinghui, 2 weeks ago

I would appreciate help making the naming of the configuration option in the patch more conventional.
I’ve already resolved this in my own project by subclassing GZipMiddleware to skip compression for responses with Content-Type: text/event-stream. However, I believe this is a common enough use case that it deserves better support at the framework level.

SSE relies on real-time delivery, and buffering caused by gzip (especially under the 17KB threshold) can introduce noticeable latency on the client side. Developers may not immediately realize gzip is the cause, leading to unnecessary debugging time.

It would be helpful if Django could:

  1. Document this behavior and its impact on SSE;
  2. Automatically skip compression for text/event-stream responses;
  3. Or offer a more explicit way to opt-out of compression per response.

Supporting this natively would improve the developer experience and better accommodate streaming use cases.

Here’s my test case:

Run the test in PyCharm: tests.middleware.test_gzip.GzipMiddlewareTest.test_flush_streaming_compression
Test results:

  • Case 1: flush_each=True

✅ The client receives each event promptly — no visible delay.

  • Case 2: flush_each=False (default gzip behavior)

⚠️ All events are buffered and may be delivered only after ~17KB, which defeats SSE’s purpose. The client appears stuck until the buffer threshold is reached.

Last edited 2 weeks ago by huoyinghui (previous) (diff)

by huoyinghui, 2 weeks ago

Attachment: image-20250407-181853.png added

by huoyinghui, 2 weeks ago

Attachment: image-20250407-181923.png added

comment:7 by Carlton Gibson, 2 weeks ago

As per the GZip middleware documentation, response content will not be compressed if (among other options):

The response has already set the Content-Encoding header.

You should then be able to set the `identity` content encoding before sending the response to bypass the middleware here.

RFC 2616 (section 3.5) has this:

The default (identity) encoding; the use of no transformation whatsoever. This content-coding is used only in the Accept-Encoding header, and SHOULD NOT be used in the Content-Encoding header.

So you might want to strip that header in a subsequent middleware (but compare the MUST in section 14.11).

I think we could consider adding a check here:

        # Avoid gzipping if we've already got a content-encoding.
        if response.has_header("Content-Encoding"):
            # POSSIBLE ADDITION: remove header if response["Content-Encoding"] == "identity" here. 
            return response

With a line in the docs to make this explicit.

Last edited 2 weeks ago by Carlton Gibson (previous) (diff)

in reply to:  5 comment:8 by huoyinghui, 2 weeks ago

Replying to Carlton Gibson:

This is a good suggestion—it’s simple to implement and avoids having the real-time performance of the SSE request blocked by compress_sequence. I accept it.

I think the significance of this issue lies in the fact that when users use Django to develop SSE requests, they may experience sudden blocking. However, it’s not directly caused by their own actions, making it hard to understand and debug.

Perhaps documentation can be added to explain how Django handles SSE responses. To avoid blocking, users need to set

 response["Content-Encoding"] = "identity".
Last edited 2 weeks ago by huoyinghui (previous) (diff)