GROUP BY-clausule

2025-07-21

Van toepassing op: aangevinkt als ja Databricks SQL Databricks Runtime

De GROUP BY component wordt gebruikt om de rijen te groeperen op basis van een set opgegeven groeperingsexpressies en rekenaggregaties voor de groep rijen op basis van een of meer opgegeven statistische functies. Databricks SQL biedt ook ondersteuning voor geavanceerde aggregaties om meerdere aggregaties uit te voeren op dezelfde set van invoerrecords via de clausules GROUPING SETS, CUBE, ROLLUP. De groeperingsexpressies en geavanceerde aggregaties kunnen worden gemengd in de GROUP BY clausule en genest in een GROUPING SETS clausule.

Zie meer informatie in de sectie Mixed/Nested Grouping Analytics.

Wanneer een FILTER component is gekoppeld aan een statistische functie, worden alleen de overeenkomende rijen doorgegeven aan die functie.

Syntaxis

GROUP BY ALL

GROUP BY group_expression [, ...] [ WITH ROLLUP | WITH CUBE ]

GROUP BY { group_expression | { ROLLUP | CUBE | GROUPING SETS } ( grouping_set [, ...] ) } [, ...]

grouping_set
   { expression |
     ( [ expression [, ...] ] ) }

Aggregatiefuncties worden gedefinieerd als

aggregate_name ( [ DISTINCT ] expression [, ...] ) [ FILTER ( WHERE boolean_expression ) ]

Parameterwaarden

ALLE

Van toepassing op: Databricks SQL Databricks Runtime 12.2 LTS en hoger

Een korte notatie voor het toevoegen van alle SELECT-list-expressies die geen statistische functies als group_expressions bevatten. Als er geen dergelijke expressie bestaat GROUP BY ALL , is gelijk aan het weglaten van de GROUP BY component die resulteert in een globale aggregatie.

GROUP BY ALL is niet gegarandeerd een set groepsexpressies te produceren die kunnen worden opgelost. Azure Databricks geeft UNRESOLVED_ALL_IN_GROUP_BY of MISSING_AGGREGATION als de geproduceerde clausule niet goed is gevormd.
group_expression

Hiermee geeft u de criteria voor het groeperen van rijen samen. De groepering van rijen wordt uitgevoerd op basis van resultaatwaarden van de groeperingsexpressies. Een groeperingsexpressie kan een kolomnaam zijn, zoals GROUP BY akolompositie, zoals GROUP BY 0, of een expressie zoals GROUP BY a + b. Als group_expression een aggregatiefunctie bevat, zal Azure Databricks een GROUP_BY_AGGREGATE fout genereren.
grouping_set

Een groeperingsset wordt opgegeven door nul of meer door komma's gescheiden expressies tussen haakjes. Wanneer de groeperingsset slechts één element heeft, kunnen haakjes worden weggelaten. Is bijvoorbeeld GROUPING SETS ((a), (b)) hetzelfde als GROUPING SETS (a, b).
GROEPERINGSSETS

Groepeert de rijen voor elke groeperingsset die na GROUPING SETS zijn gespecificeerd. Voorbeeld:

GROUP BY GROUPING SETS ((warehouse), (product)) is semantisch gelijk aan een samenvoeging van resultaten van GROUP BY warehouse en GROUP BY product.

Deze clausule is een verkorte notatie voor een UNION ALL waarbij elke tak van de UNION ALL-operator aggregatie uitvoert van elke groeperingsset zoals gespecificeerd in de GROUPING SETS-clausule.

Op dezelfde manier is GROUP BY GROUPING SETS ((warehouse, product), (product), ()) semantisch gelijk aan de samenvoeging van resultaten van GROUP BY warehouse, product, GROUP BY product en een globale aggregatie.

Notitie

Voor Hive-compatibiliteit biedt Databricks SQL ondersteuning voor GROUP BY ... GROUPING SETS (...). De GROUP BY expressies worden meestal genegeerd, maar als ze extra expressies bevatten naast de GROUPING SETS expressies, worden de extra expressies opgenomen in de groeperingsexpressies en is de waarde altijd null. In SELECT a, b, c FROM ... GROUP BY a, b, c GROUPING SETS (a, b)is de uitvoer van kolom c bijvoorbeeld altijd null.

ROLLUP

Hiermee geeft u meerdere niveaus van aggregaties in één instructie. Deze component wordt gebruikt voor het berekenen van aggregaties op basis van meerdere groeperingssets. ROLLUP is een afkorting voor GROUPING SETS. Voorbeeld:

GROUP BY warehouse, product WITH ROLLUP

GROUP BY ROLLUP(warehouse, product)

is gelijk aan

GROUP BY GROUPING SETS((warehouse, product),
                       (warehouse),
                       ())

Terwijl

GROUP BY ROLLUP(warehouse, product, location))

is gelijk aan

GROUP BY GROUPING SETS((warehouse, product, location),
                       (warehouse, product),
                       (warehouse),
                       ())

De N-elementen van een ROLLUP specificatie resulteren in N+1 GROUPING SETS.

KUBUS

De CUBE component wordt gebruikt om aggregaties uit te voeren op basis van een combinatie van groeperingskolommen die zijn opgegeven in de GROUP BY component. CUBE is een afkorting voor GROUPING SETS. Voorbeeld:

GROUP BY warehouse, product WITH CUBE

GROUP BY CUBE(warehouse, product)

is gelijk aan:

GROUP BY GROUPING SETS((warehouse, product),
                       (warehouse),
                       (product),
                       ())

Terwijl

GROUP BY CUBE(warehouse, product, location)

is gelijk aan het volgende:

GROUP BY GROUPING SETS((warehouse, product, location),
                       (warehouse, product),
                       (warehouse, location),
                       (product, location),
                       (warehouse),
                       (product),
                       (location),
                       ())

De N-elementen van een CUBE specificatie resulteert in 2^N GROUPING SETS.

aggregate_name

Een statistische functienaam (MIN, MAX, COUNT, SUM, AVG, enzovoort).
DUIDELIJK

Verwijdert duplicaten in invoerrijen voordat ze worden doorgegeven aan statistische functies.
FILTER

Filtert de invoerrijen waarvoor de boolean_expression in de WHERE-clausule waar is, worden doorgegeven aan de aggregatiefunctie; andere rijen worden verwijderd.

Analyse van gemengde/geneste groepering

Een GROUP BY-component kan meerdere group_expressions en meerdere CUBE, ROLLUPen GROUPING SETSs bevatten.

GROUPING SETS kan ook geneste CUBE, ROLLUP of GROUPING SETS clausules bevatten. Voorbeeld:

GROUPING SETS(ROLLUP(warehouse, location),
              CUBE(warehouse, location)),
GROUPING SETS(warehouse,
              GROUPING SETS(location,
                            GROUPING SETS(ROLLUP(warehouse, location),
                                          CUBE(warehouse, location))))

CUBE en ROLLUP is gewoon syntaxissuiker voor GROUPING SETS.

Raadpleeg de bovenstaande secties voor het vertalen CUBE en ROLLUP naar GROUPING SETS.

group_expression kan in deze context als één groep GROUPING SETS worden behandeld.

Voor meerdere GROUPING SETS in de GROUP BY component genereert Databricks SQL één GROUPING SETS door een kruisproduct van het origineel GROUPING SETSuit te voeren.

Voor geneste GROUPING SETS in de GROUPING SETS clausule neemt Databricks SQL de groeperingssets en stript deze. Bijvoorbeeld de volgende query's:

GROUP BY warehouse,
         GROUPING SETS((product), ()),
         GROUPING SETS((location, size),
         (location),
         (size),
         ());

GROUP BY warehouse,
         ROLLUP(product),
         CUBE(location, size);

zijn gelijk aan het volgende:

GROUP BY GROUPING SETS( (warehouse, product, location, size),
                        (warehouse, product, location),
                        (warehouse, product, size),
                        (warehouse, product),
                        (warehouse, location, size),
                        (warehouse, location),
                        (warehouse, size),
                        (warehouse))

Terwijl

GROUP BY GROUPING SETS(GROUPING SETS(warehouse),
                       GROUPING SETS((warehouse, product)))`

is gelijk aan

GROUP BY GROUPING SETS((warehouse),
                       (warehouse, product))`.

Voorbeelden

CREATE TEMP VIEW dealer (id, city, car_model, quantity) AS
VALUES (100, 'Fremont', 'Honda Civic', 10),
       (100, 'Fremont', 'Honda Accord', 15),
       (100, 'Fremont', 'Honda CRV', 7),
       (200, 'Dublin', 'Honda Civic', 20),
       (200, 'Dublin', 'Honda Accord', 10),
       (200, 'Dublin', 'Honda CRV', 3),
       (300, 'San Jose', 'Honda Civic', 5),
       (300, 'San Jose', 'Honda Accord', 8);

-- Sum of quantity per dealership. Group by `id`.
> SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;
  id sum(quantity)
 --- -------------
 100            32
 200            33
 300            13

-- Use column position in GROUP by clause.
> SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;
  id sum(quantity)
 --- -------------
 100            32
 200            33
 300            13

-- Multiple aggregations.
-- 1. Sum of quantity per dealership.
-- 2. Max quantity per dealership.
> SELECT id, sum(quantity) AS sum, max(quantity) AS max
    FROM dealer GROUP BY id ORDER BY id;
  id sum max
 --- --- ---
 100  32  15
 200  33  20
 300  13   8

-- Count the number of distinct dealers in cities per car_model.
> SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
    car_model count
 ------------ -----
  Honda Civic     3
    Honda CRV     2
 Honda Accord     3

-- Count the number of distinct dealers in cities per car_model, using GROUP BY ALL
> SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY ALL;
    car_model count
 ------------ -----
  Honda Civic     3
    Honda CRV     2
 Honda Accord     3

-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
> SELECT id,
         sum(quantity) FILTER (WHERE car_model IN ('Honda Civic', 'Honda CRV')) AS `sum(quantity)`
    FROM dealer
    GROUP BY id ORDER BY id;
  id sum(quantity)
 --- -------------
 100            17
 200            23
 300             5

-- Aggregations using multiple sets of grouping columns in a single statement.
-- Following performs aggregations based on four sets of grouping columns.
-- 1. city, car_model
-- 2. city
-- 3. car_model
-- 4. Empty grouping set. Returns quantities for all city and car models.
> SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
    ORDER BY city;
      city    car_model sum
 --------- ------------ ---
      null         null  78
      null  HondaAccord  33
      null     HondaCRV  10
      null   HondaCivic  35
    Dublin         null  33
    Dublin  HondaAccord  10
    Dublin     HondaCRV   3
    Dublin   HondaCivic  20
   Fremont         null  32
   Fremont  HondaAccord  15
   Fremont     HondaCRV   7
   Fremont   HondaCivic  10
  San Jose         null  13
  San Jose  HondaAccord   8
  San Jose   HondaCivic   5

-- Group by processing with `ROLLUP` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), ())
> SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY city, car_model WITH ROLLUP
    ORDER BY city, car_model;
      city    car_model sum
 --------- ------------ ---
      null         null  78
    Dublin         null  33
    Dublin  HondaAccord  10
    Dublin     HondaCRV   3
    Dublin   HondaCivic  20
   Fremont         null  32
   Fremont  HondaAccord  15
   Fremont     HondaCRV   7
   Fremont   HondaCivic  10
  San Jose         null  13
  San Jose  HondaAccord   8
  San Jose   HondaCivic   5

-- Group by processing with `CUBE` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
> SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY city, car_model WITH CUBE
    ORDER BY city, car_model;
      city    car_model sum
 --------- ------------ ---
      null         null  78
      null  HondaAccord  33
      null     HondaCRV  10
      null   HondaCivic  35
    Dublin         null  33
    Dublin  HondaAccord  10
    Dublin     HondaCRV   3
    Dublin   HondaCivic  20
   Fremont         null  32
   Fremont  HondaAccord  15
   Fremont     HondaCRV   7
   Fremont   HondaCivic  10
  San Jose         null  13
  San Jose  HondaAccord   8
  San Jose   HondaCivic   5

--Prepare data for ignore nulls example
> CREATE TEMP VIEW person (id, name, age) AS
   VALUES (100, 'Mary', NULL),
          (200, 'John', 30),
          (300, 'Mike', 80),
          (400, 'Dan' , 50);

--Select the first row in column age
> SELECT FIRST(age) FROM person;
  first(age, false)
 --------------------
  NULL

--Get the first row in column `age` ignore nulls,last row in column `id` and sum of column `id`.
> SELECT FIRST(age IGNORE NULLS), LAST(id), SUM(id) FROM person;
  first(age, true)    last(id, false)    sum(id)
 ------------------- ------------------ ----------
  30                  400                1000

Feedback

Is deze pagina nuttig?

Delen via

GROUP BY-clausule

Syntaxis

Parameterwaarden

Analyse van gemengde/geneste groepering

Voorbeelden

Verwante artikelen:

Feedback

Aanvullende resources