document that order_by('?') is a huge performance issue

order_by('?') generates an SQL query that is horrendous from a performance point of view (the "ORDER BY RAND() LIMIT" type query).
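To illustrate why this kind of query scales badly, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real backend. The `item` table and the `counting_random` function are hypothetical names made up for this example, and SQLite spells the function RANDOM() rather than RAND(). The point is only that the database has to compute a random sort key for every row before it can return one.

```python
import random
import sqlite3

# Hypothetical table used only for illustration; not part of Django.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1000)],
)

counter = {"calls": 0}


def counting_random():
    """Stand-in for the database's random function that counts its calls."""
    counter["calls"] += 1
    return random.random()


# Register the Python function as a zero-argument SQL function.
conn.create_function("counting_random", 0, counting_random)

# Same shape as the "ORDER BY RAND() LIMIT 1" query: sort everything, keep one.
row = conn.execute(
    "SELECT id, name FROM item ORDER BY counting_random() LIMIT 1"
).fetchone()

print(row, counter["calls"])
```

Even though only one row comes back, the call count should be at least the number of rows in the table, so the cost grows with table size rather than with the number of rows requested.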

Info on this:

For the current state of affairs, I think at the very least a warning should be added to the documentation page that describes order_by('?').
That page happily states that you can use the method to obtain a random row, but in a real scenario that is a very bad idea and should be avoided at all costs.

As a more useful approach, perhaps an option could be added to a model's Meta class for tables you plan to grab random rows from. This could set up the tables, columns, or constraints needed to extract a random row without such a big performance hit. If you use order_by('?') on a model with this Meta setting, the enhancement would be transparent. Whether and how this improvement could be implemented is open for discussion, and is probably database-dependent. The page I linked above has some discussion on the topic.
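For comparison, here is a rough sketch of one commonly used workaround: probing a random primary key instead of sorting the whole table. This is not the Meta-based mechanism proposed above, just an illustration of the kind of query it might generate. It uses Python's sqlite3 module as a stand-in backend; the `item` table and the `random_row_by_id` helper are hypothetical, and the approach assumes an indexed integer primary key.

```python
import random
import sqlite3


def random_row_by_id(conn):
    """Return one row of the hypothetical `item` table via a random id probe.

    Assumes an indexed integer primary key `id`. Gaps left by deleted rows
    make the choice non-uniform: a row that follows a gap is picked more often.
    """
    low, high = conn.execute("SELECT MIN(id), MAX(id) FROM item").fetchone()
    if low is None:  # empty table
        return None
    target = random.randint(low, high)
    # Seek to the first existing id at or above the target.
    return conn.execute(
        "SELECT id, name FROM item WHERE id >= ? ORDER BY id LIMIT 1",
        (target,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO item (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1000)],
)
conn.execute("DELETE FROM item WHERE id % 7 = 0")  # leave some gaps

picked = random_row_by_id(conn)
print(picked)
```

The trade-off is that the result is only uniformly random when the ids have no gaps, which is part of why a general, transparent replacement for order_by('?') is harder than it looks.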

Added a warning sentence that order_by('?') may be expensive and slow

comment:1 by Simon G.

I think it's fairly common knowledge that ORDER BY RAND() is horrifically inefficient, but it's probably a good idea to place a warning there. Want to write one up?

As for implementing a better random, I think the costs outweigh the benefits, especially if it does mean cracking into weird SQL dialects. This is something to raise on django-developers.

comment:2 by Matt Boersma

comment:3 by Adrian Holovaty

(In [6293]) Fixed #5267 -- Documented that order_by('?') queries can be slow

