Opened 12 years ago

Closed 11 years ago

Last modified 6 years ago

#20483 closed Cleanup/optimization (fixed)

Reduce the set of apps seen by individual tests

Reported by: Anssi Kääriäinen Owned by: nobody
Component: Testing framework Version: dev
Severity: Normal Keywords:
Cc: marc.tamlyn@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently Django's test suite contains close to 1000 models. For every transaction test case we need to flush all the tables, then install contenttypes and permissions for every model. This means around 4000 inserted rows per single test in TransactionTestCase.

To speed Django's test suite we could require that TransactionTestCases always contain information of what applications they depend on. By doing this we only need to flush a couple of models.

For example using PostgreSQL running transactions + transactions_regress takes 4.5 seconds when non-needed applications are masked, and 55 seconds without masking. MySQL and transactions + transactions_regress is 3.6s compared to 70s. The difference increases when more models are added to the test run. For example MySQL with transactions, transatcions_regress and queries tests takes 220s vs 9.1s. [The times do not include database creation which itself takes a lot of time and isn't changed by the patch].

Full suite on SQLite takes 560s on master, 213s patched. If normal TestCases were isolated, too, the test suite would be still faster (around 160s). The reason for this is fixture loading - for any fixture loaded we go through all apps, then check for all combinations of postfixes (.json, .xml, ...), and compression types (.tar.gz, .bzip, ...). Then check if file with any name of any combination exists. This is unsurprisingly very expensive to do.

A proof of concept patch is available from https://github.com/akaariai/django/compare/isolated_apps.

The idea of isolating test apps from each other doesn't work that well for user's tests. The problem is that usually applications contain a lot of dependencies (for example any time you do client.login() you are depending on django.contrib.auth and django.contrib.sessions). So, forcing this to on for user apps can't be done. Opt-in is of course possible, but I am not sure documenting this at all is necessary.

The patch needs some more work. The biggest issues are usage of settings._ALWAYS_INSTALLED_APPS for detecting if isolation should happen, and use of TRUNCATE ... CASCADE when using PostgreSQL (this could be dangerous, as flush could cascade to unmanaged tables).

Change History (47)

comment:1 by Marc Tamlyn, 12 years ago

Cc: marc.tamlyn@… added

This is such an enormous performance increase, I feel it should be available for local testing with sqlite in memory even if making it work on the CI server with a "real" database has it's own problems. Obviously the benefits on the CI server are also significant as we should get much faster feedback of fails, but the priority should be ease of testing locally.

Personally I think it should be ok for _ALWAYS_INSTALLED_APPS to be there all the time - I'm not sure how many models are in these apps but it's certainly a lot less than the 1000+ across the whole test suite. If combined with some better multi threading then we're on to a huge win.

comment:2 by Anssi Kääriäinen, 12 years ago

I think I am going to do these changes & commit:

  1. app_mask is used if the TestCase has it set. Otherwise all apps are used. So, no need for _ALWAYS_INSTALLED_APPS hack.
  2. add some way (environment variable?) to enable "warn on missing app_mask" so that we can eventually have app_mask for every Django's test.
  3. manually cascade the TRUNCATE for postgresql (travel all reverse foreign keys, add found tables to the truncate command)

When all test cases have requirements marked, we could get rid of runtests.py ALWAYS_INSTALLED_APPS and instead create tables for apps in set(t.app_mask for t in testcases).

Some numbers from testing this on postgresql: 4500s unpatched, 300s with this ticket's patch. The time doesn't include database creation, so the full time is probably more like 4700s vs 500s. Still, order of magnitude difference.

comment:3 by Anssi Kääriäinen, 12 years ago

Patch updated (force-pushed to https://github.com/akaariai/django/compare/isolated_apps). Here is the commit message:

Any django.test.testcases.TransactionTestCase subclass can include
app_mask. The app_mask tells the subset of INSTALLED_APPS the test
requires. Any model outside the app_mask can not be queried. The
currently tested app is always included in the app_mask.

The intended use case is to make Django's test suite faster. By
masking applications the non-masked applications do not need to
flushed, nor is there need to install contenttypes or permissions for
masked applications. For example SQLite tests run in about 210s vs
560s, and PostgreSQL tests run in 4500s vs 300s.

The change does not alter behaviour of tests when the app_mask is not
set, unless DJANGO_ALWAYS_MASK_APPS[_TX] environment variable is set.
The _TX variant controls transactional test cases, while plain variant
controls other test cases. runtests.py contains set of the _TX
variable. If set, app_mask is set to current tested app unless the
test case contains its own app_mask.

The code might be a little rough, but it seems to work on both SQLite and PostgreSQL. Please review, I would like to get this committed soon. This should cut our CI cycle time by order of magnitude.

comment:4 by Anssi Kääriäinen, 12 years ago

I updated the patch, the os.environ usage is now gone, instead TestCase has now app_mask. It defaults to None which is equivalent to all INSTALLED_APPS. runtests.py changes the default for TransactionTestCase to "self" which means just the test's own app. Of course, one can set the mask in subclasses and those are not overridden. The test's own app is always included.

I also cleaned up the expansion logic in django_table_names(). Main change is just making it DRY.

There are some changes to models/loading.py (that is, the appcache), and then there are some checks added in various places to make sure only masked apps are used. These might seem a little scary, but as long as the cache's set_app_mask() isn't called there should be very small risk of regressions.

One naming issue: I wonder if app_mask is good, and if my usage of mask is consistent. I raise AppMaskedError when an application is not in TestCase's app_mask. Better ideas? app_dependencies maybe?

EDIT: forgot to add that I plan to squash the commits on merge. The stale commit message of the first commit is a known problem.

Last edited 12 years ago by Anssi Kääriäinen (previous) (diff)

comment:5 by Anssi Kääriäinen, 11 years ago

I think I want to wait and see what Andrew has planned for migrations. There is some risk of conflicts between this patch and migrations work (app_cache and how flush works).

comment:6 by Aymeric Augustin, 11 years ago

Here's my review of the current patch.


First, sorry if this sounds like bikeshedding, but I think app_mask isn't a very good name: it lists applications that are *unmasked*. At least, that's implied by the exception when accessing a model in another app: it says the app is masked.

I suggest to rename app_mask to installed_apps. Indeed, it has the same purpose as settings.INSTALLED_APPS: it's a list of apps available in the app cache.


Then, I'd like to be sure the API added to the app cache, assert_model_unmasked(model), is useful and convenient.

If it is, it should be used in django/contrib/auth/management/__init__.py and django/contrib/contenttypes/management.py.

In the current code, it isn't clear why auth.Permission or contenttypes.ContentType wouldn't be in get_models(). Have you considered the following alternatives?

try:
    get_model('auth', 'Permission')
except AppMaskedError:
    return
if is_masked(auth.Permission):
    return

They make it more obvious that it's related to masking. I like both, if you go with the second one you must make the function available in django.db.models.

You'll have to adjust django/db/models/manager.py accordingly.


Bikeshedding again — I'm sure we can come up with a better name than expand_dependent. Maybe include_masked_relations or follow_forwards_relations?


You haven't changed anything in django/db/backends/postgresql_psycopg2/operations.py, have you?


I have no idea what you're trying to achieve in get_apps(); I can't parse elt[1].__name__.rsplit('.', 1)[0]. This expression overgrew the list comprehension, it's time to switch to a for loop or at least introduce a helper function.


I'm not sure what the change in tests/file_storage/tests.py is trying to achieve. Could you add a comment?


And that's it!

I think this patch is fantastic, and I would very much like to have it committed to master before we fork stable/1.6.x. But I don't want to steal your thunder. If that helps, I can iterate on your patch with the changes I suggested. I just need a quick explanation for the two last items. Let me know!

comment:7 by Aymeric Augustin, 11 years ago

In addition to the reduced app list, this ticket mentioned two other ideas, which are tracked in other tickets:

  • keep the database between test runs: I made a proposal in #20550;
  • load fixtures faster: I committed a patch for #20485.

comment:8 by Aymeric Augustin, 11 years ago

Needs documentation: set

One last thing: the new behavior of TransactionTestCase should be documented, any project can take advantage of it.

in reply to:  6 comment:9 by Anssi Kääriäinen, 11 years ago

Replying to aaugustin:

First, sorry if this sounds like bikeshedding, but I think app_mask isn't a very good name: it lists applications that are *unmasked*. At least, that's implied by the exception when accessing a model in another app: it says the app is masked.

I suggest to rename app_mask to installed_apps. Indeed, it has the same purpose as settings.INSTALLED_APPS: it's a list of apps available in the app cache.

There are a couple of issues with installed_apps:

  1. Installed_apps doesn't really change the installed apps, only the apps seen by the test case. The difference is mostly technical.
  2. From user perspective what is the difference between override_settings(INSTALLED_APPS=something) and installed_apps = something? The answer is that this is needed for technical reasons (see 1.)

So maybe it would suffice to document that installed_apps doesn't really set INSTALLED_APPS and is different than override_settings(INSTALLED_APPS=something).

The installed_apps name is better than app_mask, but maybe there is something even better... allowed_apps? app_dependencies?

Then, I'd like to be sure the API added to the app cache, assert_model_unmasked(model), is useful and convenient.

If it is, it should be used in django/contrib/auth/management/__init__.py and django/contrib/contenttypes/management.py.

In the current code, it isn't clear why auth.Permission or contenttypes.ContentType wouldn't be in get_models(). Have you considered the following alternatives?

try:
    get_model('auth', 'Permission')
except AppMaskedError:
    return
if is_masked(auth.Permission):
    return

They make it more obvious that it's related to masking. I like both, if you go with the second one you must make the function available in django.db.models.

You'll have to adjust django/db/models/manager.py accordingly.

Agreed. Not sure what is the best way. Maybe using get_app always is good enough.

Bikeshedding again — I'm sure we can come up with a better name than expand_dependent. Maybe include_masked_relations or follow_forwards_relations?

include_masked_relations doesn't sound good, in SQL relation is pretty much synonymous for table, and this name could be interpreted that all masked tables should be included. follow_forwards_relations sounds better. Or maybe just cascade? That should immediately ring a bell for SQL users.

You haven't changed anything in django/db/backends/postgresql_psycopg2/operations.py, have you?

Nothing real. I did change something earlier (added a CASCADE to TRUNCATE) but didn't apparently clean up properly.

I have no idea what you're trying to achieve in get_apps(); I can't parse elt[1].__name__.rsplit('.', 1)[0]. This expression overgrew the list comprehension, it's time to switch to a for loop or at least introduce a helper function.

The things in apps are modules related to each apps "models.py" files. The names of these are things like "django.contrib.auth.models", so .rsplit('.', 1)[0] will get you the app name. You are correct that this isn't readable.

I'm not sure what the change in tests/file_storage/tests.py is trying to achieve. Could you add a comment?

file_storage/tests.py depends on servers/tests.py LiveServerBase. LiveServerBase installs a fixture from servers/fixtures. App-masking prevents one from doing this. So, I took the dirtiest way out and just masked the fixture. The proper fix is to refactor LiveServerBase to somewhere else, and not make it include fixtures. In general any dependency between two test apps is a no-no, so this should be fixed no matter what we do with this patch.

And that's it!

I think this patch is fantastic, and I would very much like to have it committed to master before we fork stable/1.6.x. But I don't want to steal your thunder. If that helps, I can iterate on your patch with the changes I suggested. I just need a quick explanation for the two last items. Let me know!

If you have time to work on this, please do. If you don't have time, I will try to find time to improve the patch later this week.

The reason the patch looks like it does is that I first just tried to write a proof of concept. As usual, the proof of concept turned out to be a bit more than that. So, the end result doesn't make much sense in some parts of the patch.

comment:10 by Aymeric Augustin, 11 years ago

For app_mask: what about available_apps?

comment:11 by Marc Tamlyn, 11 years ago

+1 to available_apps - I'd been trying to think of a good name but I think that's about right.

mask was totally confusing terminology to me to start with...

comment:12 by Simon Charette, 11 years ago

@akaariai what's the technical reason that prevent us from reusing the override_settings(INSTALLED_APPS=[...]) mechanism instead of introducing a new one?

Can't TransactionTestCase register a signal receiver and just call cache.set_app_mask?

comment:13 by Anssi Kääriäinen, 11 years ago

The available_apps name sounds good to me.

Currently truncation on PostgreSQL will need to cascade to apps outside available_apps, this is an example where override_settings != available_apps. But now that I think of this the proper fix to this issue isn't automatic cascade outside available_apps - why not include the cascades in available_apps, too? Seems like there is a dependency from the test case's models to the apps we need to cascade, so marking this dependency explicitly seems like the right thing to do. If this leads to crazy amounts of cascades maybe we need to think about why the cascades are needed in the first place.

It isn't actually that easy to find good reasons why override_settings must be different than available_apps. One issue that comes to mind is that available_apps needs to be subset of INSTALLED_APPS. I don't think that needs to be true for override_settings. Another problem is that at least currently we can't actually flush the whole app-cache, so there might be issues where this isn't working like changed INSTALLED_APPS.

Still, I think that marking app-dependencies is special enough to warrant nice declarative API. Also, we might want to mark available_apps for every TestCase in Django so that we can get rid of runtests.py ALWAYS_AVAILABLE_APPS. Check all dependencies of the tests we are going to run. That is our set of INSTALLED_APPS, and no need to include ALWAYS_AVAILABLE_APPS in there. This seems to be a lot easier to achieve with available_apps than override_settings based implementation.

comment:14 by Aymeric Augustin, 11 years ago

The main problem with the app cache is that it's populated at compile time. This task is performed by the ModelBase metaclass.

This has a few unfortunate consequences:

  • importing an application's models module makes its models available in the app cache regardless of INSTALLED_APPS;
  • it's very hard to control (eg. save and restore) the state of the app cache in a given Python process; one can't re-run the import process as Python will just look up the module in sys.modules.

Once in a while someone tries to create models dynamically in a test, and it never works well :)

Since it's impossible to emulate cleanly a different value of INSTALLED_APPS, at least as far as the app cache is concerned, I prefer using another mechanism.

The problem we're trying to work around here is "TransactionTestCase is slow as molasses because it makes 4000 inserts for each test". It makes sense (to me) to solve this at the level of TransactionTestCase.

comment:15 by Simon Charette, 11 years ago

Alright since this alternate mechanism is introduced because of the limitation of the app_cache may I suggest we avoid documenting it until #3591 is fixed and we can revisit the approach taken here?

comment:16 by Aymeric Augustin, 11 years ago

Needs documentation: unset

Yes, we can keep it private for the time being. Anyway it's quite specific to Django's own test suite. I suspect it's fairly uncommon to have hundreds of *unrelated* models.

comment:17 by Aymeric Augustin, 11 years ago

I've started working on a branch: https://github.com/django/django/pull/1240. I will force-push revisions to that branch until it's ready.

I have one significant concern at this point: this idea requires quite a few changes in code that isn't related to tests. I'll look into minimizing these changes.

comment:18 by Preston Holmes, 11 years ago

I've given this a look over and had only a couple minor comments - I'm of course very excited for the speedup!

My only major comment is I'd love for this to be possible without touching the app_cache logic, it seems less than ideal to have a bunch of test optimization code inside the app_cache. I don't think that is any reason alone not to land this improvement. I would at least add some comments as to the purpose of the available/masking business in the docstring, or some comments to get_model*

Here is an idea of how this could be implemented differently, without changing the proposed API from the testcase point of view:

create a separate instance of an app_cache that is a filtered clone of the full version, and then set the global django.db.models.loading.cache global to that filtered version for the testcase, and restore it on teardown. This way all the filtering logic that is specific to test-optimization happens in the testing code, and the app_cache instance doesn't know any different.

While I think that is a bit "better" I really don't feel that it is worth redoing this work given this is working and nearly ready - and it could always be a task that gets done later since we are talking about just some internals of Django's internal test running - I do agree with comments above that this should be private API for now.

It does seem that we should document this in:

https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/unit-tests/

as a practice to follow for Django's own tests.

But overall the benefits are far outweighing any of the cons I've brought up.

comment:19 by Aymeric Augustin, 11 years ago

Well, I think it's worth exploring a few alternative implementations before committing this optimization. Your proposal is interesting.

My main goal is to avoid as much as possible touching non-test code -- ie. anything outside of django/test/ and tests/.

comment:20 by Anssi Kääriäinen, 11 years ago

I think schema migrations will need a way to switch app-cache anyways. Schema migrations will need to use different set of models, not just "mask" some of them. So changing the app-cache is something we will likely need in any case.

Swapping the app-cache isn't zero-risk business. Currently the app-cache class is borg pattern (meaning every instance shares state). This must be first changed so that we can create instances of the app-cache with different state. Then we will also need app-cache swapping logic. To me this is much higher risk than having some filtering capability in the app-cache. In the current patch it is very likely that the app-cache itself will not break if no "app mask" is set. If you remove the borg pattern from app-cache who knows what might break in user apps (likely nothing, but this is much harder to see)?

Another issue is the changes all around Django's code base. Those aren't zero risk. I think we can get rid of almost all of those.

I tried the available_apps = self + ALWAYS_INSTALLED_APPS for every TransactionTestCase, results for PostgreSQL and transactions + transactions_regress is 54s, with Aymeric's patch from pull 1240 it is 5.6s, and on master it is 60s. Full test suite on master is around 4500s, around 750 with self + ALWAYS_INSTALLED_APPS and 300s on pull 1240. In addition there is one test that needs more than its own app (proxy_model_inheritance.tests.ProxyModelInheritanceTests.test_table_exists()). So, ALWAYS_INSTALLED_APPS is possible, but it doesn't give as good performance as setting available_apps to smallest possible set and most of all it doesn't solve any of the hard problems (app-cache changes or changes in non-test code).

I will next try doing the following changes:

  • all dependent apps need to be in the available_apps (that is, delete & truncate cascades, too). This will get rid of the backends changes, and changes in deletion.
  • use of override_settings(INSTALLED_APPS=available_apps). This way we should be able to get rid of changes in the signal listeners for permissions, contenttypes and super user creation. Instead we can disconnect the signal itself.

comment:21 by Anssi Kääriäinen, 11 years ago

I did some more hacking. This time I tried to use override_settings(INSTALLED_APPS). The result is maybe a bit cleaner, but then again, maybe not...

It is now evident that we can't get fully rid of global state. The problem is table truncation and the need to cascade into applications not in current available_apps set. For example you might have app1 and app2 that don't know about each other, but both contain foreign key to auth.user. Now when app1 does flush, it will flush also auth.user, and that flush needs to cascade to app2. This is impossible to do unless we want to use TRUNCATE CASCADE (which could cascade to unmanaged models etc), or we have global installed apps and models available somewhere (likely in global app-cache). So, end result doesn't look much cleaner. (Fundamental problem here is that the database schema itself is global state that is extremely hard to get rid of).

Patch at https://github.com/akaariai/django/tree/pull_1240. We might still try how things look when instantiating multiple app-caches. But I am not sure if that will result in anything truly cleaner.

comment:22 by Aymeric Augustin, 11 years ago

With the current implementation of the app cache, it can contain models from an application that isn't in INSTALLED_APPS. All you need is to import the model, and the metaclass ModelBase will register the model with register_models. It automatically determines the app, see around the comment # Figure out the app_label by looking one level up. in django/db/models/base.py.

So override_settings(INSTALLED_APPS=just_a_few_apps) sounds like a poor API to removing models from the app-cache. Even if INSTALLED_APPS was set to just_a_few_apps in the first place, other models could still be added to the cache if they're imported somehow.

(In the long term, I think the app cache should prevent loading models when their app isn't in INSTALLED_APPS. Maybe app-loading does this, I'm not sure.)

comment:23 by Anssi Kääriäinen, 11 years ago

While working on the experimental branch for using override_settings, I stumbled upon this same issue. The end result is weird - if you use override_settings(INSTALLED_APPS) only apps in INSTALLED_APPS are usable. If you don't apps outside current INSTALLED_APPS are usable.

The branch contains at least one nice addition, too. It seems evident that we can (and IMHO should) get rid of the get_model(masked=True/False) and instead use the "only_installed" flag for this purpose, too. All the places that need changes to masked kwarg already happen to use only_installed.

We can get rid of the delete cascade problem - any model outside currently available apps should be empty, so cascading to those models should not do anything in any case. But the truncate cascade problem is harder. PostgreSQL won't let you truncate a table if all dependant tables aren't truncated in the same query, even if those dependant tables are empty. We can either use TRUNCATE CASCADE (but this can cascade to models outside of Django's control), or the current logic of expanding cascades.

comment:24 by Aymeric Augustin, 11 years ago

In current patches, the app containing the test is implicitly included. But with the new test discovery tests can live outside apps. I understand that including the current app in available_apps would be slightly more verbose, but it's also more consistent and explicit.

The code for implicitly including the current app looks sufficiently robust, I don't have anything against it from a technical point of view. I'm really considering this from an API point of view -- ie. terse vs. explicit.

Last edited 11 years ago by Aymeric Augustin (previous) (diff)

in reply to:  23 ; comment:25 by Aymeric Augustin, 11 years ago

Replying to akaariai:

We can get rid of the delete cascade problem - any model outside currently available apps should be empty, so cascading to those models should not do anything in any case. But the truncate cascade problem is harder. PostgreSQL won't let you truncate a table if all dependant tables aren't truncated in the same query, even if those dependant tables are empty. We can either use TRUNCATE CASCADE (but this can cascade to models outside of Django's control), or the current logic of expanding cascades.


I thought requiring every related app to be in available_apps solved this problem?

I wanted to document it like this: https://github.com/aaugustin/django/commit/4717a1e

If the trade-off is "a few more apps in available_apps" vs. "logic for expanding cascades", I'm choosing the first one :)

As far as I can tell, for Django's own test suite, we don't have the problem because we don't have inter-app FKs, do we?

in reply to:  18 comment:26 by Aymeric Augustin, 11 years ago

Replying to ptone:

create a separate instance of an app_cache that is a filtered clone of the full version, and then set the global django.db.models.loading.cache global to that filtered version for the testcase, and restore it on teardown. This way all the filtering logic that is specific to test-optimization happens in the testing code, and the app_cache instance doesn't know any different.

That's a good idea, but it isn't going to be enough. django.db.models.loading exposes all methods of the app cache singleton as module-level functions. You'd have to swap all these too. Doable, but messy.

Lots of things could be refactored with a cleaner app cache...

in reply to:  25 comment:27 by Aymeric Augustin, 11 years ago

Replying to aaugustin:

I thought requiring every related app to be in available_apps solved this problem?

I missed comment 21. Sorry for the noise.

comment:28 by Aymeric Augustin, 11 years ago

I updated my pull request (https://github.com/django/django/pull/1240) as follows:

  • I simplified the behavior and minimized changes by removing everything that isn't strictly necessary.
  • I updated available_apps declarations, since the current application isnt' implicitly included any longer.
  • I refactored the changes to the app cache, AppCache.available_apps is now a set of app labels, and I added unset_available_apps() which looks slightly better than set_available_apps(None).
  • I added documentation: the API is still explicitly described as private and not subject to the deprecation policy, but it's explained for contributors and committers.

Tests pass under SQLite, I'm stumbling on the same problem as Anssi: flush fails on PostgreSQL. I see two possibilities:

  • use TRUNCATE .. CASCADE -- since this happens on the test database, and isn't recommended outside of Django's test suite, I'm not too concerned about cascading to unmanaged models and destroying data.
  • use DELETE FROM ... instead of TRUNCATE ... -- this needs a benchmark.

Either solution requires a non-trivial amount of changes :(

comment:29 by Aymeric Augustin, 11 years ago

I updated the pull request again to use TRUNCATE ... CASCADE when flushing after a TransactionTestCase when available_apps is set.

Run time reported by the test runner, with Python 2.7:

  • SQLite: 200s
  • Postgres: 273s
  • MySQL: 230s

If memory serves, on the same machine, run time used to be 10 minutes on SQlite, 20 minutes on PostgreSQL, and about the same on MySQL.

This is kinda cool.

comment:30 by Aymeric Augustin, 11 years ago

Patch needs improvement: unset

This patch is about as good as I can make it, it's ready for a final review.

comment:31 by Anssi Kääriäinen, 11 years ago

I wonder if there are databases where TRUNCATE CASCADE isn't available, but TRUNCATE alone isn't possible... I guess such databases could always use DELETE for the tables. So, this seems OK.

I added one comment in the pull request but that isn't anything that can be acted on in this patch.

The biggest worry is that there seems to be zero safety net for querying & modifying data outside the available apps. The question is: how do you know the available_apps you have are actually correct? Tests passing doesn't mean they are correct, it just means that currently the possible data leaks between TransactionTestCases do not matter. Also, how do you know the installed_apps remains correct? (You can create sessions for example in setUp for transaction tests and the data is leaked between tests).

comment:32 by Aymeric Augustin, 11 years ago

Tests tend to blow up rather quickly and badly when you forget something in available_apps, but regardless, you are right -- silent leaks could occur. Now, if a leak doesn't have any consequences, how bad is it? :)

comment:33 by Aymeric Augustin, 11 years ago

A few TestCases (not TransactionTestCases) have available_apps declared in the current patch. I inherited this from your original proof of concept.

Is it still useful, or is it irrevelant now that fixture loading is optimized?

comment:34 by Anssi Kääriäinen, 11 years ago

The available_apps for plain TestCases should be irrelevant.

It is hard to say how bad state leaks are. These are usually annoying to fix (run single test app and tests pass, run the full test suite and you get a failure). I guess we can commit this and then add some autodetection of leaked state if need be.

comment:35 by Aymeric Augustin, 11 years ago

Triage Stage: AcceptedReady for checkin

I've ensured that every TransactionTestCase and every LiveServerTestCase has available_apps (possibly set to the empty list). I've removed superfluous declarations. I've improved the docs a bit.

Updated PR: https://github.com/django/django/pull/1240

comment:36 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

Resolution: fixed
Status: newclosed

In 4daf570b98cc840e1a154f3876bc7463924cb9ae:

Added TransactionTestCase.available_apps.

This can be used to make Django's test suite significantly faster by
reducing the number of models for which content types and permissions
must be created and tables must be flushed in each non-transactional
test.

It's documented for Django contributors and committers but it's branded
as a private API to preserve our freedom to change it in the future.

Most of the credit goes to Anssi. He got the idea and did the research.

Fixed #20483.

comment:37 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In c6e6d4eeb776c473567362405cdbc6a0328eb194:

Defined available_apps in relevant tests.

Fixed #20483.

comment:38 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In dfcce4288ac6887e9995c48f2fde38f555cec03e:

Fixed available_apps for selenium tests.

Refs #20483.

comment:39 by Anssi Kääriäinen, 11 years ago

We have a problem... flush is called as part of teardown of TXTestCase. So flush installs contenttypes / permissions / based on the current test's available_apps, but the contenttypes/permissions are used by the next test. This of course isn't correct.

This isn't easy to fix. If we move flush to pre-test, that leads to other problems (we moved flush to post-test just some time ago). We can't just stop creating contenttypes & permissions in flush unconditionally. But I think that is still the way to go: flush the db after test so that it is totally empty, the next test will then add required data, including contenttypes & permissions to the test database. But I don't know how to actually do this. Maybe we could just call "flush but don't create anything" in post-teardown, then explicitly call "create data" in setup. But this is a scary change to do in alpha stage...

As the changes in this ticket affect only Django's test suite and everything works OK there the situation isn't that bad. But this should still be fixed ASAP.

I noticed the problem while working on some more test speedups - some tests use syncdb directly in TestCase subclasses and the result is creation of a lot of contenttypes & permissions for no reason. As a positive sidenote: the branch gives almost 20% faster tests with relatively minor changes, that is 200s -> 165s on sqlite.

comment:40 by Aymeric Augustin, 11 years ago

Resolution: fixed
Severity: NormalRelease blocker
Status: closednew
Triage Stage: Ready for checkinAccepted

comment:41 by Aymeric Augustin, 11 years ago

Resolution: fixed
Severity: Release blockerNormal
Status: newclosed
Triage Stage: AcceptedReady for checkin

This ticket is too long, I've created #20579 to discuss this problem. Re-closing.

comment:42 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In 55cbd65985bfad02512a64a4cb8468140f15ee84:

Fixed #20579 -- Improved TransactionTestCase.available_apps.

Also moved its documentation to the 'advanced' section. It doesn't
belong to the 'overview'. Same for TransactionTestCase.reset_sequences.

When available_apps is set, after a TransactionTestCase, the database
is now totally empty. post_syncdb is fired at the beginning of the next
TransactionTestCase.

Refs #20483.

comment:43 by Simon Charette <charette.s@…>, 8 years ago

In 973cfd2:

Refs #20483 -- Implemented cascaded flush on Oracle.

The initial implementation added support for PostgreSQL but it is also required
on Oracle (13b7f299de79e3eb101c3f015386eba39a8f3928).

Thanks Mariusz Felisiak for the foreign key retreival queries.

comment:44 by GitHub <noreply@…>, 7 years ago

In 5e9f7f1:

Refs #20483 -- Removed unneeded column from _foreign_key_constraints() on Oracle.

comment:45 by Tim Graham <timograham@…>, 6 years ago

In ce8b65a:

Fixed #30054 -- Implemented cascaded flush on SQLite.

This is required to maintain foreign key integrity when using
TransactionTestCase.available_apps.

Refs #30033, #14204, #20483.

comment:46 by Tim Graham <timograham@…>, 6 years ago

In 2b2ae4ee:

Refs #30054, #20483 -- Cached SQLite references graph retrieval on sql_flush().

comment:47 by Tim Graham <timograham@…>, 6 years ago

In ec7bf6d:

Refs #20483 -- Cached Oracle references retrieval on sql_flush().

Note: See TracTickets for help on using tickets.
Back to Top