Opened 8 years ago
Last modified 4 years ago
#27734 new Cleanup/optimization
Add a helpful error message when a parallel test worker is assigned an unexpected index
Reported by: | Adam Wróbel | Owned by: | nobody |
---|---|---|---|
Component: | Testing framework | Version: | 1.10 |
Severity: | Normal | Keywords: | parallel |
Cc: | Chris Jerdonek | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I've had this problem since upgrading to Python 3.6 on macOS Sierra: when I run my tests in parallel, many of them blow up trying to use a database whose index is higher than the number of parallel test processes. On my MacBook the default process count is 4 and tests fail trying to connect to test_my_db_5. On my iMac the default process count is 8 and tests fail connecting to test_my_db_9. The same thing happens if I set the number of processes explicitly, e.g. --parallel=2. A higher number of processes yields fewer failures. If I explicitly set the number of processes to a number greater than the number of my test files, the tests succeed.
    Traceback (most recent call last):
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 216, in __call__
        self._post_teardown()
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 908, in _post_teardown
        self._fixture_teardown()
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/test/testcases.py", line 943, in _fixture_teardown
        inhibit_post_migrate=inhibit_post_migrate)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/__init__.py", line 130, in call_command
        return command.execute(*args, **defaults)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/base.py", line 345, in execute
        output = self.handle(*args, **options)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/commands/flush.py", line 54, in handle
        allow_cascade=allow_cascade)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/core/management/sql.py", line 15, in sql_flush
        tables = connection.introspection.django_table_names(only_existing=True, include_views=False)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/introspection.py", line 88, in django_table_names
        existing_tables = self.table_names(include_views=include_views)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/introspection.py", line 55, in table_names
        with self.connection.cursor() as cursor:
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 233, in cursor
        cursor = self.make_cursor(self._cursor())
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 204, in _cursor
        self.ensure_connection()
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
        self.connect()
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
        six.reraise(dj_exc_type, dj_exc_value, traceback)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise
        raise value.with_traceback(tb)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
        self.connect()
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/base/base.py", line 171, in connect
        self.connection = self.get_new_connection(conn_params)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/django/db/backends/postgresql/base.py", line 176, in get_new_connection
        connection = Database.connect(**conn_params)
      File "/Users/amw/.virtualenvs/envname/lib/python3.6/site-packages/psycopg2/__init__.py", line 164, in connect
        conn = _connect(dsn, connection_factory=connection_factory, async=async)
    django.db.utils.OperationalError: FATAL: database "test_my_db_5" does not exist
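For context, the database name in that error follows the worker-index suffix pattern. A minimal sketch, assuming each parallel worker's clone is named by appending its index (clone_name is an illustrative helper, not Django API):

    def clone_name(test_db_name, worker_index):
        # e.g. clone_name("test_my_db", 5) -> "test_my_db_5", the database the traceback can't find
        return "%s_%s" % (test_db_name, worker_index)

With --parallel=4 only indices 1 through 4 exist, so any worker that ends up with index 5 fails as above.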
My requirements.txt file:
boto3==1.4.3
daemonize==2.4.7
Django==1.10.5
django-appconf==1.0.2
django-compressor==2.1
django-debug-toolbar==1.6
django-dotenv==1.4.1
django-extensions==1.7.5
django-ipware==1.1.6
djangorestframework==3.5.3
filemagic==1.6
inflection==0.3.1
jmespath==0.9.0
psycopg2==2.6.2
python-dateutil==2.6.0
rcssmin==1.0.6
rjsmin==1.0.12
s3transfer==0.1.10
six==1.10.0
sqlparse==0.2.2
uWSGI==2.0.14
Change History (12)
comment:1 by , 8 years ago
I haven't seen that issue in my Python 3.6 testing. Could you investigate and tell us why Django is at fault?
comment:2 by , 8 years ago
I don't know if it's Django's fault, but I do know that it's Django that is creating the test databases and configuring the connection parameters. If you take a look at the traceback, you'll see that the only other Python package on it is psycopg2.
The parallel nature of this problem makes it hard to investigate with my debugger. The exception occurs in the child process, but I don't know where the incorrect information is generated. I was hoping for a hint on where to start the search.
comment:3 by , 8 years ago
The parallel test runner is in django/test/runner.py. Can you provide a small sample project (without any third-party dependencies) that reproduces the issue?
comment:4 by , 8 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:5 by , 7 years ago
I'm having this problem. It's not Django's fault, although there could be better error reporting.
--parallel creates N databases for the N workers to use. Each worker claims the database it will use by incrementing a shared counter. When the multiprocessing.Pool is initialized with N processes, the workers are assigned databases 1 through N.
The problem happens when a worker exits. Normally, the workers in a pool will stay alive for the whole life of the pool. In my case, workers are exiting because of seg faults. When this happens, the pool repopulates with new workers. When the first new worker is initialized, it increments the counter to N + 1 and tries to use that database, and this error results.
Only some of my tests cause the seg fault, so if I run --parallel with other tests, it works fine.
It would have saved some time diagnosing the issue if Django raised an error when a worker incremented the counter over N, which seems like a reasonable thing for the initializer to do.
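To illustrate, here is a minimal, self-contained sketch (not Django's actual runner code) of a worker initializer that claims a database index from a shared counter and fails loudly when the index exceeds the number of databases that were created; names such as init_worker and database_count are illustrative:

    import multiprocessing

    _worker_id = 0  # each worker process remembers the database index it claimed


    def init_worker(counter, database_count):
        """Runs once in every pool worker: claim the next database index."""
        global _worker_id
        with counter.get_lock():
            counter.value += 1
            _worker_id = counter.value
        if _worker_id > database_count:
            # Only `database_count` clones (e.g. test_my_db_1 .. test_my_db_N) exist,
            # so a replacement worker spawned after a crash lands here.
            raise RuntimeError(
                "Worker claimed database index %d, but only %d test databases "
                "were created; a worker process probably crashed and was replaced."
                % (_worker_id, database_count)
            )
        # A real runner would now point this worker's connections at the clone
        # whose name ends in _<worker id>.


    if __name__ == "__main__":
        database_count = 2
        counter = multiprocessing.Value("i", 0)
        with multiprocessing.Pool(
            processes=database_count,
            initializer=init_worker,
            initargs=(counter, database_count),
        ) as pool:
            pool.map(abs, range(4))  # workers are initialized when the pool starts

The guard in init_worker is the kind of check this ticket asks for: a replacement worker would fail immediately with a message pointing at a crashed worker, instead of failing later with "database ... does not exist".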
comment:6 by , 7 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
I agree it would be nice to have a better error message in this case.
comment:7 by , 7 years ago
Summary: | Parallel tests try to use database with too high indices → Add a helpful error message when a parallel test worker is assigned an unexpected index |
---|---|
Type: | Bug → Cleanup/optimization |
comment:8 by , 7 years ago
I don't think my issue was with segfaulting processes, but something certainly caused a worker process to be restarted. A better error message might be nice to have, but it won't be helpful unless it helps determine what caused the process to segfault/restart.
Maybe there would be a way to reuse the connection of a dead process?
comment:9 by , 5 years ago
To anyone else who's still having this problem: it seems to be specific to macOS, or at least this cause and fix are.
Putting this somewhere in your code before the test runner is initialized (the settings module worked well for me) fixes the issue:
    import os
    import platform
    import sys

    if 'test' in sys.argv and platform.system().lower() == 'darwin':
        os.environ.setdefault('OBJC_DISABLE_INITIALIZE_FORK_SAFETY', 'YES')
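Presumably this works because macOS's Objective-C fork-safety check can abort forked worker processes; with the check disabled, the original workers stay alive, the shared counter never climbs past N, and the replacement-worker failure described in comment:5 doesn't arise.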
comment:11 by , 4 years ago
Cc: | Chris Jerdonek added |
---|