Opened 14 years ago

Closed 14 years ago

Last modified 12 years ago

#13442 closed (invalid)

only() and defer() will always select id

Reported by: mampf Owned by:
Component: Database layer (models, ORM) Version: 1.1
Severity: Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi there,

when you do a query with defer() or only(), you will still execute a SELECT query including the object's id. This makes it impossible to use distinct() on it.

last_5_uploaded_files = UploadFile.objects.order_by('upload_time')[:5] 
last_changed_list = MainObject.objects \ 
    .filter(upload_file__in=last_5_uploaded_files) \ 
    .only('field_1', 'hostname', 'target', 'modifed_date') \
    .distinct() \ 
    .order_by('-modified_date')[:50]

    response = render_to_response('output/mainobject_list.html',        
    {'object_list': last_changed_list,})

    connection.queries
    print("")
    # Still includes MainObject.id! WHY?
    print("current queue: %s" % connection.queries)
    print("")

    return response

Now if I print the executed SQL using connection.queries I get an SQL statement which includes queryset.id. This is why my distinct won't word - I want a distinct list created without id.
On the other hand, .values('field_1', 'field_n') does work. But this way I do not get the foreign table resolved, just it's id.

Please change only() and defer(), so it doesn't select the id in any case.

Thanks

Change History (4)

comment:1 by Luke Plant, 14 years ago

Resolution: invalid
Status: newclosed

The docs state that the primary key can never be deferred. We have to include the primary key, because the other fields in the return objects need to be lazily loaded. For that to work, we need to store the id so that we can later retrieve the right row of the database if needed.

Your use case doesn't really fit the reasons why either distinct() or only() exist. only() is about only retrieving certain fields of objects, not generating arbitrary SQL. You may need to use some raw SQL to get what you want.

Thanks.

comment:2 by mampf, 14 years ago

Hi lukeplant,

thanks for your answer. Here are some additional thoughts.

I used this document: http://docs.djangoproject.com/en/dev/ref/models/querysets/#defer-fields
It doesn't state that that the primary key an never be deferred.

only() is about only retrieving certain fields of objects

I thought I was doing this. But how useful is distinct when I need raw SQL to make it useful anyway? I see no usecase for that. The docs say:

However, if your query spans multiple tables, it's possible to get duplicate results when a QuerySet is evaluated. That's when you'd use distinct().

If I span over multiple tables, I always get ids as seen above. This is why I will never have an use case for distinct() then.

Thanks for your answer. It made some thinks about how the orm works a little clearer to me.

in reply to:  2 comment:3 by Luke Plant, 14 years ago

Replying to mampf:

Hi lukeplant,

thanks for your answer. Here are some additional thoughts.

I used this document: http://docs.djangoproject.com/en/dev/ref/models/querysets/#defer-fields
It doesn't state that that the primary key an never be deferred.

You need to read it again :-)

only() is about only retrieving certain fields of objects

I thought I was doing this. But how useful is distinct when I need raw SQL to make it useful anyway? I see no usecase for that. The docs say:

However, if your query spans multiple tables, it's possible to get duplicate results when a QuerySet is evaluated. That's when you'd use distinct().

If I span over multiple tables, I always get ids as seen above. This is why I will never have an use case for distinct() then.

The use case for distinct is, as stated, to eliminate duplicate results (which will have the same id).

comment:4 by Anssi Kääriäinen, 12 years ago

Component: ORM aggregationDatabase layer (models, ORM)
Note: See TracTickets for help on using tickets.
Back to Top