Opened 3 years ago
Closed 3 years ago
#33711 closed New feature (duplicate)
Faceted queries.
Reported by: | Antoine LAURENT | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.0 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hello.
In my current projet, I create an aggregation function that relies on PostgreSQL GROUPING SETS to compute facets on a queryset.
This is how we use it :
> aggregate_sql(series_qs, author='author__name, kinds='kind') { 'authors': {' Agatha Christie': 34, 'Brandon Sanderson': 18, ...}, 'kinds': {'Fantasy': 25, ...}, 'total': 70, }
It's the kind of feature we use a lot on all our lists to have a better user experience with our filters.
Since I think this might be of interest to other people, I would love to add this to Django.
I'm currently unsure on were to add this feature in Django. It would be really easy to use on a queryset object (like values()') but I don't think GROUPING exists on all supported databases, so maybe add it in
django.contrib.postgres` (without the queryset function) ?
I pasted the code bellow.
I relied on queryset.values_list()
to build a SQL query, and then tweaked it to add GROUPING SETS and unnesting ArrayFields.
I will need to re-write the code to build the SQL query properly.
import re from django.contrib.postgres import fields as pg_fields from django.core.exceptions import EmptyResultSet from django.db import connection def aggregate_sql(queryset, **agg_fields): """ Get count for each given agg_fields. """ def _get_first_non_null(data): for i, value in enumerate(data[:-1]): if value: return i, value return None, None db_table = queryset.model._meta.db_table db_pk = queryset.model._meta.pk.name # With pg_fields.ArrayFields, we want to use PostgreSQL unnest function to count the values inside the list unnest_fields = [] field_names: list[str] = list(agg_fields.values()) for field_name in field_names: model = queryset.model while True: first, _delimiter, second = field_name.partition('__') if second: # get related model try: model = getattr(model, first).field.related_model except AttributeError: model = getattr(model, first).related.related_model field_name = second else: field = model._meta.get_field(first) unnest_fields.append(isinstance(field, pg_fields.ArrayField)) break # RAW SQL explenation : # ~~~~~~~~~~~~~~~~~~~~~ # GROUPING SETS will allow us to make a count on each value of each given field. # We need COUNT(DISTINCT id) because we might unnest some ArrayFields to get seperated values. # We use the ORDER_BY clause to ensure that even if some fields have None values, we can # identify the total count as the first line. # # SELECT year, unnest(propulsion), brand_id, COUNT(id) # FROM device # GROUP BY GROUPING SETS (year, unnest(propulsion), brand_id, ()) # ORDER BY count DESC; # # year | unnest | brand_id | count # ------+-----------------+--------------------------------------+------- # | | | 10128 # | electric | | 8412 # | | | 6818 # | | a58e4174-22dc-4001-aa78-dce815b137b0 | 6025 # | | b4d86cbf-2ed7-4605-b756-cac63ce73e46 | 3004 # 2021 | | | 1909 # | human | | 1687 # | | b57bd339-042a-4a41-8815-1f9cbfaa8ef9 | 1099 # 2018 | | | 920 # 2019 | | | 407 # 2020 | | | 74 # | electric_assist | | 29 # # The third line (with 6818 results) counts the devices with year=None # There is currently no way to distinguish None values from different fields # Prepare results result_keys = list(agg_fields.keys()) aggregation = {name: {} for name in result_keys} try: # Use values_list to build most of the SQL query # NB: remove any order that will intefer with the sql parsing later on django_query, params = queryset.values_list(*field_names).order_by().query.sql_with_params() query_start, _delimiter, query_end = django_query.partition('FROM') except EmptyResultSet: aggregation['total'] = 0 return aggregation # extract original select items and add unnest function if required SQL_REGEX = r'SELECT (DISTINCT )?(.*) ' select_items = re.search(SQL_REGEX, query_start).groups()[1] sql_fields = [] for field_name, unnest in zip(select_items.split(', '), unnest_fields): if unnest: sql_fields.append(f'unnest({field_name})') else: sql_fields.append(field_name) fields = ', '.join(sql_fields) query = f""" SELECT {fields}, COUNT(DISTINCT "{db_table}"."{db_pk}") FROM {query_end} GROUP BY GROUPING SETS ({fields}, ()) ORDER BY count DESC """ with connection.cursor() as cursor: cursor.execute(query, params) results = cursor.fetchall() # As stated is the previous comment, the first line is the total count aggregation['total'] = results[0][-1] for data in results: indice, value = _get_first_non_null(data) # if indice si None, then we have a count on an None value from one of the fields if indice is not None: aggregation_name = result_keys[indice] aggregation[aggregation_name][str(value)] = data[-1] return aggregation
Change History (1)
comment:1 by , 3 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Resolution: | → duplicate |
Status: | new → closed |
Summary: | Suggested feature : Faceted queries → Faceted queries. |
Type: | Uncategorized → New feature |
Duplicate of #27646.