Opened 12 years ago

Last modified 4 years ago

#18855 new New feature

persist a socket across reloads of the dev server

Reported by: Dan LaMotte Owned by:
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: unai@…, Florian Apolloner Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Using livereload (livereload.com) with Django becomes painful when updating a file immediately results in reloading the webpage AND the Django dev server. There is a short period of time when the dev server is not listening as it is busy reloading (frequently hit when using livereload).

This is exacerbated with larger projects as reload time is longer.

This has been an itch I've wanted to scratch for a long time. Hopefully, this will be accepted so others can benefit from it also. Helps plenty even with users hitting Reload manually on their browser.

Change History (20)

comment:1 by Dan LaMotte, 12 years ago

Owner: changed from nobody to Dan LaMotte
Status: newassigned

comment:2 by Dan LaMotte, 12 years ago

Has patch: set

comment:3 by Anssi Kääriäinen, 12 years ago

Triage Stage: UnreviewedAccepted
Version: 1.4master

It seems this could only be done on Unix platforms (at least socket.fromfd is documented to be available only on Unix). The best option for platform support is to write the code in a way where the code works when needed socket abilities are available, but if they are not, then we still get a working server, just without socket persistence capabilities.

If I understand the patch correctly, the socket is first created in the main process. Then, it is passed to the server subprocess as os.environ parameter. Finally the server subprocess will try to use the already existing socket by using socket.fromfd().

So, in addition to the above platform problem, at least these possible limitations come to mind:

  • Could we leak the socket's fileno in to os.environ where it is accidentally reused later on? Could we pass it to the server subprocess some other way?
  • It seems we will need to have socket creation logic in Django's code. We don't have that now. This might cause us some headaches later on.
  • Could the socket be left in a state where it contains cruft from the old process (this might be a stupid question, I don't know socket semantics too well)? What I mean here is that maybe the server process reads half of an request, gets reloaded, and then the new process has a broken half-request waiting for it? The same could happen for write-to-client, too, if I am not mistaken.

I would like to get rid of the unavailability of the dev server during reload, no question about that, but we should not commit this unless we have pretty good guarantees that this will cause zero regressions on any supported platforms. This is not worth risking breakages. I am marking this accepted, but this is only on the grounds that we can make this bulletproof.

comment:4 by Dan LaMotte, 12 years ago

Thanks for the review. I've updated the pull request to address the platform issue.

I dont know the platforms supported by Django, but I added a check to the code to test if "fromfd" exists in the socket module. If it does, it will use the new persist socket code. If not, the code will not be used. I believe its the case that if "fromfd" is not supported, it will not exist, however, I don't have a host to test this on.

Yes, the socket needs to be first created in the main process that survives the reload process. This way, the socket stays open and accepting connections.

  • Not sure what you mean by leak? The child process that gets reloaded would be the only process that could get the FD. The spawnve called in the autoreloader ensures that the child's FDs are identical to that of its parent, but besides this, its an entirely new process (it calls exec or something similar for the platform). This means that no other process state is saved from the parent. We can either pass the FD number via the environment OR via the command line. It appears the autoreload module uses an environment variable "RUN_MAIN" to tell the child process that it should run the HTTP server. I think using another environment variable is suitable to pass the FD also.
  • I had not realized this. The socket is only created now, its still configured the way it was before. This should make it less risky IMO.
  • I don't believe it introduces any new risks. Specifically, a "half write" could happen with the current implementation if its not handled (I'm not familiar enough to know if that's the case). This would not change that AFAIK. The child process would (during cleanup due to exit) will close the client's connected socket just as if there wasn't a persisted socket. As for reading a partial response, I believe we're safe from this simply because the .accept() on this main server socket returns a client socket which handles the connection. If it only reads part of the response and the connection is interrupted due to reload, the child exiting will close the socket and the pending data will be discarded by the kernel. It will not remain pending on the server socket (this is guaranteed by TCP I believe, I don't believe this would be the case with something like UDP for instance).

I'll do some looking into writing some tests along with this commit. It'll be significantly interesting/complicated to test I think because of the inherent race conditions involved. Any advice or ideas are welcomed.

Thanks again for your review and acceptance of this. I'll update this request when I've looked into writing the tests and/or have written some.

in reply to:  3 comment:5 by bhuztez, 12 years ago

I think this is duplicate of #16049

comment:6 by Dan LaMotte, 12 years ago

Yes. I didn't see that one before opening this one. However, at the moment, this ticket has a corresponding patch/pull request. Its missing tests, but I plan on fixing that. Also, I have been using the patch in my own development with success and zero issues.

comment:7 by Aymeric Augustin, 12 years ago

Needs documentation: set
Needs tests: set

This is very interesting. To move forwards it just needs:

  • docs — including an explanation that it doesn't work on Windows with Python 2 (see discussion on #16049)
  • tests — or an explanation of why it's impossible to test

In the meantime, I've closed the pull request, because our triage options on GitHub are limited and we don't keep pull requests with partial patches.

Please re-open it or make a new PR when you get a chance to make these improvements!

comment:8 by Dan LaMotte, 12 years ago

Sorry this has been inactive so long. Life has been pretty busy. The testing part is a bit challenging simply because it's very "race-condition-y". I have used the patch in development since I initially wrote it (and I write code everyday with Django) so I'm very certain it works without regressions, however, I can understand adding tests to the code base.

I'll get going on the docs.

comment:9 by Dan LaMotte, 12 years ago

Created new pull request: https://github.com/django/django/pull/893

Added documentation and brought the branch up to date with the latest master branch.

I explain in the pull request why I think it'll be ok to not add tests (given the difficulty). But if there are suggestions, I can surely look into them and update the pull request.

comment:10 by Dan LaMotte, 12 years ago

I think this is slipping through the cracks. This is my first contribution to Django, so I'm probably making this harder than it needs to be. Sorry about that.

Just wanted to check in and see what else I can do to get this into 1.6 (hopefully).

Thanks much for your help and patience.

comment:11 by Unai Zalakain, 11 years ago

Cc: unai@… added

comment:12 by Unai Zalakain, 11 years ago

Needs documentation: unset
Needs tests: unset
Triage Stage: AcceptedReady for checkin

LGTM

comment:13 by Florian Apolloner, 11 years ago

Cc: Florian Apolloner added

What's the reason for adding an option to disable that feature? I can't imagine why someone would like to disable it if it works.

comment:14 by Dan LaMotte, 11 years ago

Could have sworn I added it because I was asked to, but cannot seem to find that at the moment. I guess it's only there so if for some reason its causing more problems than it solves for that user, they can disable it.

comment:15 by Tim Graham, 11 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

I wonder if our time would be better spent in efforts like #21978 -- "Add optional gunicorn support to runserver", rather than trying to enhance our own HTTP server?

In any case, the patch no longer merges cleanly according to GitHub so bumping off RFC.

comment:16 by Dan LaMotte, 11 years ago

Triage Stage: AcceptedReady for checkin

I've just updated the pull request to rebase off of the latest master. Please re-consider this patch.

I think the ticket you reference is great, however, this patch is ready now and it addresses a common problem that many would appreciate being fixed.

comment:17 by Florian Apolloner, 11 years ago

Triage Stage: Ready for checkinAccepted

Did you actually check that this works fine on Python3.4?

comment:18 by Unai Zalakain, 11 years ago

Once addressed the notes added to the PR patch, it works like a charm with 3.4

comment:19 by Tim Graham, 10 years ago

As noted on the pull request, this causes Address already in use errors locally when running ./runtests.py admin_widgets --selenium --liveserver=localhost:8090-8100 in parallel (as well as on Jenkins). It seems that the alternate ports are never tried.

Last edited 10 years ago by Tim Graham (previous) (diff)

comment:20 by Mariusz Felisiak, 4 years ago

Owner: Dan LaMotte removed
Status: assignednew
Note: See TracTickets for help on using tickets.
Back to Top