#34325 closed Cleanup/optimization (fixed)
Clarify PercentRank() description.
Reported by: | dennisvang | Owned by: | dennisvang |
---|---|---|---|
Component: | Documentation | Version: | 4.1 |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
The documentation for the PercentRank window function says:
Computes the percentile rank of the rows in the frame clause. This computation is equivalent to evaluating:
(rank - 1) / (total rows - 1)
(my emphasis)
However, I'm not so sure "percentile rank" is the correct term.
If you look up the (statistical) term "percentile rank" online, you'll find various definitions, ranging from
(CF - 0.5 * F) / N
where CF (the cumulative frequency) is the count of all scores less than or equal to the score of interest, F is the frequency for the score of interest, and N is the number of scores in the distribution, to
<number of values less than the score of interest> / <total number of values in the data set>
(equivalent to (CF - F) / N).
Both definitions are also used e.g. by scipy.
The latter definition is similar to that in the Django docs, but still subtly different in the denominator.
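To make the difference concrete, here is a minimal pure-Python sketch of the three formulas quoted above. The function names and the sample data are hypothetical, chosen only for illustration. The `relative_rank` helper assumes that "rank" means one plus the number of strictly smaller values, as with SQL's `RANK()`, and that a single-row input yields 0.

```python
def relative_rank(values, x):
    """(rank - 1) / (total rows - 1), the formula in the Django docs.

    Assumes rank = 1 + number of values strictly less than x.
    """
    n = len(values)
    if n < 2:
        return 0.0  # assumption: a single row has relative rank 0
    rank = 1 + sum(v < x for v in values)
    return (rank - 1) / (n - 1)


def percentile_rank_mean(values, x):
    """(CF - 0.5 * F) / N, the first statistical definition."""
    cf = sum(v <= x for v in values)  # cumulative frequency
    f = sum(v == x for v in values)   # frequency of the score itself
    return (cf - 0.5 * f) / len(values)


def percentile_rank_strict(values, x):
    """(CF - F) / N, i.e. the fraction of values strictly below x."""
    cf = sum(v <= x for v in values)
    f = sum(v == x for v in values)
    return (cf - f) / len(values)


scores = [10, 20, 20, 30, 40]  # hypothetical sample data
for s in sorted(set(scores)):
    print(
        s,
        relative_rank(scores, s),
        percentile_rank_mean(scores, s),
        percentile_rank_strict(scores, s),
    )
```

For the highest score, 40, the first formula gives 1.0, while the two "percentile rank" definitions give 0.9 and 0.8 respectively. Only the denominator `N - 1` lets the top row reach exactly 1.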
Note also that the documentation for the percent_rank function in the SQLite and PostgreSQL database backends does not mention "percentile rank" at all. Instead, they use the term "relative rank."
To prevent confusion, wouldn't it be better to use the same terminology as the database backends?
Change History (9)
comment:1 by , 21 months ago
Description: | modified (diff) |
---|
comment:2 by , 21 months ago
Description: | modified (diff) |
---|
comment:3 by , 21 months ago
Description: | modified (diff) |
---|
comment:4 by , 21 months ago
Description: | modified (diff) |
---|
follow-up: 6 comment:5 by , 21 months ago
Summary: | PercentRank confusion → Clarify PercentRank() description. |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
comment:6 by , 21 months ago
Replying to Mariusz Felisiak:
Agreed, "relative rank" is less confusing. Would you like to prepare a patch?
Certainly. Please have a look at https://github.com/django/django/pull/16539
I also replaced "Percent Rank" in the corresponding *table* by "Relative Rank," but I'm not sure if that's necessary. An alternative would be to use "PercentRank," without the space, to match the name of the function.
comment:7 by , 21 months ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Triage Stage: | Accepted → Ready for checkin |