#34884 closed Cleanup/optimization (wontfix)
Half bug/half enhancement : inconsistent behavior of get_or_create() regarding related attributes cache
Reported by: | Laurent Lyaudet | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | ORM get_or_create |
Cc: | laurent.lyaudet@… | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hello,
The sample script below demonstrates that get_or_create() avoid unnecessary DB query for related object given as argument only if created is True.
It would be nice if the case created is False could be made coherent with it.
Setting the cache when possible would avoid having to do all over the code:
my_object = MyModel.objects.get_or_create(related_object=related_object) my_object.related_object = related_object
I can code a patch and provide it in a few days if my idea is accepted :)
Here is the script:
In [1]: from django.db import connection In [2]: from django.contrib.contenttypes.models import ContentType In [3]: from django.contrib.auth.models import Permission In [4]: content_type = ContentType.objects.get_for_model(Permission) In [5]: permission, created = Permission.objects.get_or_create(name="Test", content_type=content_type, codename="Test") In [6]: created Out[6]: True In [7]: connection.queries Out[7]: [{'sql': "SELECT t.oid, typarray FROM pg_type t JOIN pg_namespace ns ON typnamespace = ns.oid WHERE typname = 'hstore'", 'time': '0.001'}, {'sql': "SELECT typarray FROM pg_type WHERE typname = 'citext'", 'time': '0.000'}, {'sql': 'SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE ("django_content_type"."app_label" = \'auth\' AND "django_content_type"."model" = \'permission\')', 'time': '0.001'}, {'sql': 'SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = \'Test\' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = \'Test\')', 'time': '0.001'}, {'sql': 'INSERT INTO "auth_permission" ("name", "content_type_id", "codename") VALUES (\'Test\', 2, \'Test\') RETURNING "auth_permission"."id"', 'time': '0.001'}] In [8]: permission.content_type Out[8]: <ContentType: permission> In [9]: connection.queries Out[9]: [{'sql': "SELECT t.oid, typarray FROM pg_type t JOIN pg_namespace ns ON typnamespace = ns.oid WHERE typname = 'hstore'", 'time': '0.001'}, {'sql': "SELECT typarray FROM pg_type WHERE typname = 'citext'", 'time': '0.000'}, {'sql': 'SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE ("django_content_type"."app_label" = \'auth\' AND "django_content_type"."model" = \'permission\')', 'time': '0.001'}, {'sql': 'SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = \'Test\' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = \'Test\')', 'time': '0.001'}, {'sql': 'INSERT INTO "auth_permission" ("name", "content_type_id", "codename") VALUES (\'Test\', 2, \'Test\') RETURNING "auth_permission"."id"', 'time': '0.001'}] In [10]: permission, created = Permission.objects.get_or_create(name="Test", content_type=content_type, codename="Test") In [11]: created Out[11]: False In [12]: connection.queries Out[12]: [{'sql': "SELECT t.oid, typarray FROM pg_type t JOIN pg_namespace ns ON typnamespace = ns.oid WHERE typname = 'hstore'", 'time': '0.001'}, {'sql': "SELECT typarray FROM pg_type WHERE typname = 'citext'", 'time': '0.000'}, {'sql': 'SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE ("django_content_type"."app_label" = \'auth\' AND "django_content_type"."model" = \'permission\')', 'time': '0.001'}, {'sql': 'SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = \'Test\' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = \'Test\')', 'time': '0.001'}, {'sql': 'INSERT INTO "auth_permission" ("name", "content_type_id", "codename") VALUES (\'Test\', 2, \'Test\') RETURNING "auth_permission"."id"', 'time': '0.001'}, {'sql': 'SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = \'Test\' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = \'Test\')', 'time': '0.001'}] In [13]: permission.content_type Out[13]: <ContentType: permission> In [14]: connection.queries Out[14]: [{'sql': "SELECT t.oid, typarray FROM pg_type t JOIN pg_namespace ns ON typnamespace = ns.oid WHERE typname = 'hstore'", 'time': '0.001'}, {'sql': "SELECT typarray FROM pg_type WHERE typname = 'citext'", 'time': '0.000'}, {'sql': 'SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE ("django_content_type"."app_label" = \'auth\' AND "django_content_type"."model" = \'permission\')', 'time': '0.001'}, {'sql': 'SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = \'Test\' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = \'Test\')', 'time': '0.001'}, {'sql': 'INSERT INTO "auth_permission" ("name", "content_type_id", "codename") VALUES (\'Test\', 2, \'Test\') RETURNING "auth_permission"."id"', 'time': '0.001'}, {'sql': 'SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = \'Test\' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = \'Test\')', 'time': '0.001'}, {'sql': 'SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE "django_content_type"."id" = 2', 'time': '0.001'}]
Best regards,
Laurent Lyaudet
Change History (10)
comment:2 by , 16 months ago
The previous patch lowered the running time of my automatic tests from 195 s to 181 s which is quite good for such a small patch.
I tried also a slightly more powerful patch but it yields no mesurable gain.
The second patch on get
can totally replace the previous patch, you don't need both.
original_get = QuerySet.get def patched_get(self, *args, **kwargs): result = original_get(self, *args, **kwargs) for key, value in kwargs.items(): try: if isinstance(result._meta.get_field(key), ForeignKey): # isinstance handles OneToOneField also. setattr(result, key, value) except FieldDoesNotExist: pass return result QuerySet.get = patched_get
This patch could still be improved to handle args also.
But I don't see a use case where one use:
my_object = MyModel.objects.get(Q(related_object=related_object))
instead of
my_object = MyModel.objects.get(related_object=related_object)
And clearly it could not be patched if one use ~Q()
or Q() | Q()
.
follow-up: 4 comment:3 by , 16 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Hello Laurent, thank you for your ticket.
As far as I understand your report, the key to your issue is about using select_related
when getting the Permission
you need. Specifically, see how if I create a permission using the content_type's PK, the resulting object does not have the ContentType instance fetched by default:
>>> permission, created = Permission.objects.get_or_create(name="Test", content_type_id=content_type.pk, codename="OtherTest") >>> print("\n\n".join(i["sql"] for i in connection.queries)) SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = 'OtherTest' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = 'Test') LIMIT 21 BEGIN INSERT INTO "auth_permission" ("name", "content_type_id", "codename") VALUES ('Test', 2, 'OtherTest') RETURNING "auth_permission"."id" COMMIT >>> permission.content_type <ContentType: Authentication and Authorization | permission> >>> print("\n\n".join(i["sql"] for i in connection.queries)) SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = 'OtherTest' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = 'Test') LIMIT 21 BEGIN INSERT INTO "auth_permission" ("name", "content_type_id", "codename") VALUES ('Test', 2, 'OtherTest') RETURNING "auth_permission"."id" COMMIT SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE "django_content_type"."id" = 2 LIMIT 21
You can see the extra query at the end when the content_type
attribute is accessed. Similarly, you can avoid the "extra" query when the object exists by doing:
>>> permission, created = Permission.objects.select_related("content_type").get_or_create(name="Test", content_type_id=content_type.pk, codename="OtherTest") >>> print("\n\n".join(i["sql"] for i in connection.queries)) SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename", "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "auth_permission" INNER JOIN "django_content_type" ON ("auth_permission"."content_type_id" = "django_content_type"."id") WHERE ("auth_permission"."codename" = 'OtherTest' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = 'Test') LIMIT 21 >>> permission.content_type <ContentType: Authentication and Authorization | permission> >>> print("\n\n".join(i["sql"] for i in connection.queries)) SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename", "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "auth_permission" INNER JOIN "django_content_type" ON ("auth_permission"."content_type_id" = "django_content_type"."id") WHERE ("auth_permission"."codename" = 'OtherTest' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = 'Test') LIMIT 21
comment:4 by , 16 months ago
Hello Natalia,
Your solution is suboptimal to say the least and denotes a lack of thought on what happens under the hood.
I was somehow prepared for such an answer that proves that BS reigns suprems in most FOSS.
Here are the problems :
- First: adding a select_related is as cumbersome to write than the line:
my_object.related_object = related_object
in my first ticket description :
my_object = MyModel.objects.get_or_create(related_object=related_object) my_object.related_object = related_object
- Second: adding a select_related is worse than the line above when it comes to the extra computation that you add both in Python and in the DBMS.
This join is useless, it is superfluous work for the machine, bad programming.
- Third and most important: for the consistency of your data, with select_related you now have 2 objects of type ContentType.
If you update one of these 2 objects in Python, the other is not updated unless, you save the first and refresh from db the second.
This is plain bad programming.
- Fourth: setting arrogantly * status: new => closed * resolution: => invalid is a full proof of how BS this kind of answer is.
Replying to Natalia Bidart:
Hello Laurent, thank you for your ticket.
As far as I understand your report, the key to your issue is about using
select_related
when getting thePermission
you need. Specifically, see how if I create a permission using the content_type's PK, the resulting object does not have the ContentType instance fetched by default:
>>> permission, created = Permission.objects.get_or_create(name="Test", content_type_id=content_type.pk, codename="OtherTest") >>> print("\n\n".join(i["sql"] for i in connection.queries)) SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = 'OtherTest' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = 'Test') LIMIT 21 BEGIN INSERT INTO "auth_permission" ("name", "content_type_id", "codename") VALUES ('Test', 2, 'OtherTest') RETURNING "auth_permission"."id" COMMIT >>> permission.content_type <ContentType: Authentication and Authorization | permission> >>> print("\n\n".join(i["sql"] for i in connection.queries)) SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename" FROM "auth_permission" WHERE ("auth_permission"."codename" = 'OtherTest' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = 'Test') LIMIT 21 BEGIN INSERT INTO "auth_permission" ("name", "content_type_id", "codename") VALUES ('Test', 2, 'OtherTest') RETURNING "auth_permission"."id" COMMIT SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE "django_content_type"."id" = 2 LIMIT 21You can see the extra query at the end when the
content_type
attribute is accessed. Similarly, you can avoid the "extra" query when the object exists by doing:
>>> permission, created = Permission.objects.select_related("content_type").get_or_create(name="Test", content_type_id=content_type.pk, codename="OtherTest") >>> print("\n\n".join(i["sql"] for i in connection.queries)) SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename", "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "auth_permission" INNER JOIN "django_content_type" ON ("auth_permission"."content_type_id" = "django_content_type"."id") WHERE ("auth_permission"."codename" = 'OtherTest' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = 'Test') LIMIT 21 >>> permission.content_type <ContentType: Authentication and Authorization | permission> >>> print("\n\n".join(i["sql"] for i in connection.queries)) SELECT "auth_permission"."id", "auth_permission"."name", "auth_permission"."content_type_id", "auth_permission"."codename", "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "auth_permission" INNER JOIN "django_content_type" ON ("auth_permission"."content_type_id" = "django_content_type"."id") WHERE ("auth_permission"."codename" = 'OtherTest' AND "auth_permission"."content_type_id" = 2 AND "auth_permission"."name" = 'Test') LIMIT 21
comment:5 by , 16 months ago
Laurent, first of all, please maintain a professional tone and adhere to the Django Code of Conduct when composing/sending messages.
Secondly, if you disagree with this resolution, you are welcomed to follow the documentation and start a new conversation on the Django Forum, where you'll reach a wider audience and likely get more feedback.
follow-up: 7 comment:6 by , 16 months ago
Hi Laurent,
To echo what Natalian said, please avoid comments like "in case some bikeshedding occurs that puts to trash coherency and efficiency in yet another open source software where some people against free software have influence". It adds nothing helpful. Assuming bad faith on the part of the Django community is no way to introduce yourself and get people to want to work with you. Thank you.
As for the issue, I think it's a bit of a niche case. I mocked up a patch using your proposal. I'm not sure that the extra complexity is worthwhile, as well as the performance penalty of iterating through the kwargs
(which is unnecessary when a foreign key isn't present). The patch may need consideration for other relations like GenericForeignKey
. I'm not sure offhand. I wonder if update_or_create()
has a similar inconsistency. I'm afraid that trying to make the ORM smarter about this might open a can of worms.
Cheers!
-
django/db/models/query.py
diff --git a/django/db/models/query.py b/django/db/models/query.py index 1125302933..818d31faac 100644
a b class QuerySet(AltersData): 939 939 Return a tuple of (object, created), where created is a boolean 940 940 specifying whether an object was created. 941 941 """ 942 from django.db.models.fields.related import ForeignKey 942 943 # The get() needs to be targeted at the write database in order 943 944 # to avoid potential transaction consistency problems. 944 945 self._for_write = True 945 946 try: 946 return self.get(**kwargs), False 947 obj, created = self.get(**kwargs), False 948 for key, value in kwargs.items(): 949 try: 950 if isinstance(obj._meta.get_field(key), ForeignKey): 951 # isinstance handles OneToOneField also. 952 setattr(obj, key, value) 953 except exceptions.FieldDoesNotExist: 954 pass 955 return obj, created 947 956 except self.model.DoesNotExist: 948 957 params = self._extract_model_params(defaults, **kwargs) 949 958 # Try to create an object using passed params. … … class QuerySet(AltersData): 953 962 return self.create(**params), True 954 963 except IntegrityError: 955 964 try: 965 # Another get() case that would need to be updated. 956 966 return self.get(**kwargs), False 957 967 except self.model.DoesNotExist: 958 968 pass -
tests/get_or_create/tests.py
diff --git a/tests/get_or_create/tests.py b/tests/get_or_create/tests.py index 0b56d6b1a2..4aac3ff580 100644
a b class GetOrCreateTests(TestCase): 129 129 self.assertEqual(book.authors.count(), 2) 130 130 131 131 # Try creating a book through an author. 132 _, created = ed.books.get_or_create(name="Ed's Recipes", publisher=p)132 eds_recipes, created = ed.books.get_or_create(name="Ed's Recipes", publisher=p) 133 133 self.assertTrue(created) 134 # In the created=True case, the publisher is set on the new object. 135 with self.assertNumQueries(0): 136 self.assertEqual(eds_recipes.publisher.name, 'Acme Publishing') 137 138 # In the case of created=False, the publisher isn't set on the object. 139 eds_recipes, created = ed.books.get_or_create(name="Ed's Recipes", publisher=p) 140 self.assertFalse(created) 141 # This assertion fails due to the query required to fetch the 142 # publisher, demonstrating #34884. 143 with self.assertNumQueries(0): 144 self.assertEqual(eds_recipes.publisher.name, 'Acme Publishing') 134 145 135 146 # Now Ed has two Books, Fred just one. 136 147 self.assertEqual(ed.books.count(), 2)
comment:7 by , 15 months ago
Hi Natalia, Hi Tim,
I did what was needed : https://pypi.org/project/django-monkey-patches/
Replying to Tim Graham:
Hi Laurent,
To echo what Natalian said, please avoid comments like "in case some bikeshedding occurs that puts to trash coherency and efficiency in yet another open source software where some people against free software have influence". It adds nothing helpful. Assuming bad faith on the part of the Django community is no way to introduce yourself and get people to want to work with you. Thank you.
Sorry, that's the result of previous interactions with the FOSS community among which Django here:
https://groups.google.com/g/django-developers/c/257nJTrf1Qk/m/7wz1AVj_AwAJ
where the silence closed the discussion.
As for the issue, I think it's a bit of a niche case. I mocked up a patch using your proposal. I'm not sure that the extra complexity is worthwhile, as well as the performance penalty of iterating through the
kwargs
(which is unnecessary when a foreign key isn't present). The patch may need consideration for other relations likeGenericForeignKey
. I'm not sure offhand. I wonder ifupdate_or_create()
has a similar inconsistency.
In my tests, as I previously said, there is a global performance gain of more than 7 % ! (for such a small patch), despite the very small penalty in some cases.
There is several orders of magnitude between iterating over some kwargs and avoiding a DB query (a fortiori if you have an n-tiers architecture).
update_or_create()
is already patched since it uses get_or_create()
internally: https://github.com/django/django/blob/f7389c4b07ceeb036436e065898e411b247bca78/django/db/models/query.py#L967
I'm afraid that trying to make the ORM smarter about this might open a can of worms.
FUD, if it's not precise, it's not constructive and does not help having a clear view of the question.
Maybe you do not it intentionally, but maybe you do, I can't know.
https://en.wikipedia.org/wiki/Fear,_uncertainty,_and_doubt
Replying to Natalia Bidart:
Laurent, first of all, please maintain a professional tone and adhere to the Django Code of Conduct when composing/sending messages.
Yes but it says "This code applies equally to founders, mentors and those seeking help and guidance."
And closing my ticket without an open discussion before is a kind of slap in the face that is unfortunately current on the Internet.
It says also "Be friendly and patient", "Be welcoming", "Be considerate", "Be respectful".
I'm not the only one who does not strictly followed this code of conduct here.
Secondly, if you disagree with this resolution, you are welcomed to follow the documentation and start a new conversation on the Django Forum, where you'll reach a wider audience and likely get more feedback.
Well if the forum is as responsive as the dev google-group...
Sorry, I like truth and progress, my energy can bring a lot of good to the world, but it cannot overcome people that do not want to try to see the problem at hand like me for 5 minutes.
Best regards,
Laurent Lyaudet
comment:8 by , 15 months ago
Resolution: | invalid → wontfix |
---|---|
Type: | Uncategorized → Cleanup/optimization |
where the silence closed the discussion.
Unfortunately Django cannot accommodate every feature request. If no one finds your proposal compelling enough to respond, perhaps it is too niche. In the mailing list thread you mentioned, I'm afraid that the lack of indentation may have prevented some engagement. I guess the forum doesn't suffer from that issue. I'm not active on the forum, but I have heard that it's more active than the mailing list.
I'm afraid that trying to make the ORM smarter about this might open a can of worms.
What I meant is that other changes may be required to make created and non-created model instances identical. The patch I offered already adds non-trivial complexity and only handles foreign keys.
And closing my ticket without an open discussion before is a kind of slap in the face that is unfortunately current on the Internet.
I understand that it can come off that way, but as you see, a closed ticket doesn't end the discussion. We don't like to leave tickets in an untriaged state while the discussion takes place. I am going to change the ticket resolution to "wontfix" because I think it's something that could be changed if there is clear community consensus. I offer my opinion that I don't think the proposal would be accepted, but no one is stopping you from engaging further if this feature is important enough to you. A more compelling proposal would address my doubt about whether or not foreign key caching is the only difference between created and non-created model instances.
comment:9 by , 14 months ago
Hello Tim,
Thanks for your constructive answer.
I did not think about making the created and the non-created model instances strictly identical.
I had a much simpler goal, my approach was more agile and more bazaar on this topic.
But I like the cathedral way of thinking.
And it's maybe more appropriate to think about it for a framework where you have no control on the code using the framework.
This should not be possible to make it so, since the "API"/convention of get_or_create() is to apply the defaults values (from "defaults" kwarg) only on creation.
As of now it is not clear yet, in my mind, how hacky a "field taken from a query_filter" can be processed.
I mean, for example, is it possible to have a field that is a valid component of a query filter but also has a "setter" hidden in the class code ?
Can you have :
my_counter = models.IntegerField(...)
my_foreign_key = models.ForeignKey(...)
with some custom class code such that
whenever my_foreign_key is set the field my_counter is incremented.
I would be able to do it with a property but the property would have a name distinct from "my_foreign_key".
If I can be provocative, someone that do this deserves to debug his code when he updates his framework ;) XD.
More seriously, almost anything is possible in the code using Django,
and thus it is about direction.
Breaking changes can happen in any framework on major version releases.
I think my goal/direction was reasonable.
I am just trying to correct a cache inconsistency,
whilst having identical instances in both cases seems to be impossible.
In my point of view, the trade-off is positive,
but I'm open to think about it further.
In any case, correcting this cache inconsistency would be a step toward identical instances in both cases
Can you restrict your goal of identical instances in both cases to something more "possible" ?
What kind of compatibility problems do you foresee ?
Best regards,
Laurent Lyaudet
comment:10 by , 14 months ago
I have said my piece. I don't think the additional complexity in Django is warranted for what looks like a niche case. The workaround of setting the attribute after get_or_create()
isn't particularly burdensome. You can try to solicit the opinions of others if you are so motivated.
I coded a monkey patch for my use case where it will not be soon to upgrade Django and in case some bikeshedding occurs that puts to trash coherency and efficiency in yet another open source software where some people against free software have influence.
Here it is :
get_or_create_monkey_patch.py
You can execute it at the end of the previous script + some test and obtain this: