Opened 3 years ago
Closed 3 years ago
#32913 closed Cleanup/optimization (fixed)
test_glob inside utils_tests.test_autoreload.WatchmanReloaderTests is flaky
Reported by: | Abhyudai | Owned by: | Nick Pope |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Tom Forbes | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The test sometimes raises an exception. The traceback
on my local setup:
ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "../django/django/utils/autoreload.py", line 529, in update_watches self._update_watches() File "../django/django/utils/autoreload.py", line 518, in _update_watches self._watch_root(root) File "../django/django/utils/autoreload.py", line 438, in _watch_root result = self.client.query('watch-project', str(root.absolute())) File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 1052, in query res = self.receive() File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 960, in receive result = self.recvConn.receive() File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 695, in receive buf = [self.transport.readBytes(sniff_len)] File "../django/venv/lib/python3.8/site-packages/pywatchman/__init__.py", line 344, in readBytes raise SocketTimeout('timed out waiting for response') pywatchman.SocketTimeout: timed out waiting for response, while executing ('watch-project', '/tmp/django_p9emwuc2/tmpdx4bsuny') The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/usr/lib/python3.8/unittest/case.py", line 60, in testPartExecutor yield File "/usr/lib/python3.8/unittest/case.py", line 676, in run self._callTestMethod(testMethod) File "/usr/lib/python3.8/unittest/case.py", line 633, in _callTestMethod method() File "/usr/lib/python3.8/unittest/mock.py", line 1325, in patched return func(*newargs, **newkeywargs) File "../django/tests/utils_tests/test_autoreload.py", line 542, in test_glob with self.tick_twice(): File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__ return next(self.gen) File "../django/tests/utils_tests/test_autoreload.py", line 531, in tick_twice next(ticker) File "../django/django/utils/autoreload.py", line 556, in tick self.update_watches() File "../django/django/utils/autoreload.py", line 532, in update_watches if self.check_server_status(ex): File "../django/django/utils/autoreload.py", line 584, in check_server_status raise WatchmanUnavailable(str(inner_ex)) from inner_ex django.utils.autoreload.WatchmanUnavailable: timed out waiting for response, while executing ('watch-project', '/tmp/django_p9emwuc2/tmpdx4bsuny')
Till now, what I have observed is this only occurs when at least all the tests inside the test_utils.test_autoreloader.WatchmanReloaderTest
class are run at once. I made a script that you may use to verify this:
#!/usr/bin/bash iteration=0 while true; do iteration=$((iteration + 1)) ./runtests.py utils_tests.test_autoreloader.WatchmanReloaderTests if [[ $? -ne 0 ]]; then echo -e "-----------\nit failed in the $iteration iteration--------------------\n" break fi echo -e "-------------\nrunning the $iteration iteration\n-----------------\n" done
The flakiness is highly rare when executing only running the tests under WatchmanReloaderTests
, in my case it sometimes took close to 2000 operations to fail. Although, when the whole test suite is run at once, the problems seems to occur more frequently(it occurred twice out of something like 10 times on my end)
Change History (7)
comment:1 by , 3 years ago
Cc: | added |
---|
comment:2 by , 3 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:3 by , 3 years ago
Has patch: | set |
---|---|
Resolution: | needsinfo |
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
Re-opening this as I've been experiencing this same issue.
The issue is that WatchmanReloaderTests.setUp()
sets self.reloader.client_timeout = 0.1
to speed up the tests.
It seems that this is too low as at 0.1s the following fail consistently for me on every execution:
ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests) ERROR: test_glob_recursive (utils_tests.test_autoreload.WatchmanReloaderTests) ERROR: test_multiple_globs (utils_tests.test_autoreload.WatchmanReloaderTests) ERROR: test_multiple_recursive_globs (utils_tests.test_autoreload.WatchmanReloaderTests) ERROR: test_nested_glob_recursive (utils_tests.test_autoreload.WatchmanReloaderTests)
Changing to 0.2s the following fail consistently:
ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests) ERROR: test_glob_recursive (utils_tests.test_autoreload.WatchmanReloaderTests)
Changing to 0.5s the following fail intermittently:
ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests)
In the end I settled for 0.6s which seemed to work consistently. See PR.
Unfortunately, much like the tests for caching, I think we're going to have to suffer longer running tests.
comment:4 by , 3 years ago
Strangely, with 0.6s running just the tests for utils_tests.test_autoreload.WatchmanReloaderTests
works consistently.
However, when running the entire test suite, I still get the following failures:
ERROR: test_glob (utils_tests.test_autoreload.WatchmanReloaderTests) ERROR: test_glob_recursive (utils_tests.test_autoreload.WatchmanReloaderTests)
Bumping to 1 second, and even 1.5 seconds, seemed to result in the same tests failing intermittently.
I seem to have had a reasonable amount of success with 2 seconds, so I've updated the PR with that. (It's still better than 5 seconds which is the default.)
Perhaps something else is at play that makes it worse when there are more tests being executed, but I'm not sure what.
I'm wondering if we shouldn't allow the DJANGO_WATCHMAN_TIMEOUT
environment variable to be applicable when running the test suite also? That'll allow users to easily overriding the value when running Django's test suite. I added a fixup for that to the PR which can be squashed or removed as required.
comment:5 by , 3 years ago
For further info, I bisected this to ed3c59097a01ed3f32f8a8bed95307fb5c181251.
That was when I started getting failures for running just the utils_tests.test_autoreload.WatchmanReloaderTests
.
This reduced the timeout from the pywatchman
default of 1.0
to 0.1
for the tests.
I still get timeouts on the commit before that when running the entire test suite, however, so 1 second isn't enough.
comment:6 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Triage Stage: | Accepted → Ready for checkin |
I'm going to have to say
needsinfo
for now.Looks like it's an issue in
pywatchman
:Is that timeout configurable maybe? 🤔
If we can identify a specific issue in Django, or a specific change we can make to work around the issue, we can reopen.