你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn

percentile()、percentiles()(聚合函数)

Applies to: ✅Microsoft FabricAzure Data ExplorerAzure MonitorMicrosoft Sentinel

The percentile() function calculates an estimate for the specified nearest-rank percentile of the population defined by expr. 准确性取决于百分位区域中的填充密度。

percentiles() 的工作方式与 percentile() 类似。 但是,percentiles() 可以同时计算多个百分位数值,这比单独计算每个百分位数值更有效。

To calculate weighted percentiles, see percentilesw().

Note

This function is used in conjunction with the summarize operator.

Syntax

percentile( expr,percentile)

percentiles( expr,percentiles)

Learn more about syntax conventions.

Parameters

Name 类型 Required Description
expr string ✔️ 用于聚合计算的表达式。
percentile real ✔️ 一个常数,用于指定百分位。
percentiles real ✔️ 一个百分位数,或多个由逗号分隔的百分位数。

Returns

Returns a table with the estimates for expr of the specified percentiles in the group, each in a separate column.

Note

若要在单个列中返回所有百分位数,请参阅以数组的形式返回百分位数

Examples

本节中的示例演示如何使用语法帮助你入门。

The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.

The examples in this article use publicly available tables, such as the Weather table in the Weather analytics sample gallery. 可能需要修改示例查询中的表名称以匹配工作区中的表。

计算单个百分位数

下面的示例显示大于示例集 95% 和小于示例集 5% 的 DamageProperty 值。

StormEvents | summarize percentile(DamageProperty, 95) by State

Output

显示的结果表仅包括前 10 行。

State percentile_DamageProperty_95
ATLANTIC SOUTH 0
FLORIDA 40000
GEORGIA 143333
MISSISSIPPI 80000
AMERICAN SAMOA 250000
KENTUCKY 35000
OHIO 150000
KANSAS 51392
MICHIGAN 49167
ALABAMA 50000

计算多个百分位数

下面的示例显示同时使用 5、50(中值)和 95 计算的 DamageProperty 值。

StormEvents | summarize percentiles(DamageProperty, 5, 50, 95) by State

Output

显示的结果表仅包括前 10 行。

State percentile_DamageProperty_5 percentile_DamageProperty_50 percentile_DamageProperty_95
ATLANTIC SOUTH 0 0 0
FLORIDA 0 0 40000
GEORGIA 0 0 143333
MISSISSIPPI 0 0 80000
AMERICAN SAMOA 0 0 250000
KENTUCKY 0 0 35000
OHIO 0 2000 150000
KANSAS 0 0 51392
MICHIGAN 0 0 49167
ALABAMA 0 0 50000
... ...

以数组形式返回百分位数

使用 percentiles_array() 函数在动态数组类型的单个列中返回所有百分位数,而不是将每个值在单独的列中返回。

Syntax

percentiles_array( expr,percentiles)

Parameters

Name 类型 Required Description
expr string ✔️ 用于聚合计算的表达式。
percentiles 实际或动态 ✔️ 一个百分位数,或多个由逗号分隔的百分位数,或百分位数的动态数组。 每个百分位都必须是一个实际值。

Returns

Returns an estimate for expr of the specified percentiles in the group as a single column of dynamic array type.

Examples

Comma-separated percentiles

Multiple percentiles can be obtained as an array in a single dynamic column, instead of in multiple columns as with percentiles().

TransformedSensorsData
| summarize percentiles_array(Value, 5, 25, 50, 75, 95), avg(Value) by SensorName

Output

结果表仅显示前 10 行。

SensorName percentiles_Value avg_Value
sensor-82 ["0.048141473520867069","0.24407515500271132","0.48974511106780577","0.74160998970950343","0.94587903204190071"] 0.493950914
sensor-130 ["0.049200214398937764","0.25735850440187535","0.51206374010048239","0.74182335059053839","0.95210342463616771"] 0.505111463
sensor-56 ["0.04857779335488676","0.24709868149337144","0.49668762923789589","0.74458470404241883","0.94889104840865857"] 0.497955018
sensor-24 ["0.051507199150534679","0.24803904945640423","0.50397070213183581","0.75653888126010793","0.9518782718727431"] 0.501084379
sensor-47 ["0.045991246974755672","0.24644331118208851","0.48089197707088743","0.74475142784472248","0.9518322864959039"] 0.49386228
sensor-135 ["0.05132897529660399","0.24204987641954018","0.48470113942206461","0.74275730068433621","0.94784079559229406"] 0.494817619
sensor-74 ["0.048914714739047828","0.25160926036445724","0.49832498850160978","0.75257887767110776","0.94932261924236094"] 0.501627252
sensor-173 ["0.048333149363009836","0.26084250046756496","0.51288012531934613","0.74964772791583412","0.95156058795294"] 0.505401226
sensor-28 ["0.048511161184567046","0.2547387968731824","0.50101318228599656","0.75693845702682039","0.95243122486483989"] 0.502066244
sensor-34 ["0.049980293859462954","0.25094722564949412","0.50914023067384762","0.75571549713447961","0.95176564809278674"] 0.504309494
... ... ...

百分位数的动态数组

可以在整数或浮点数动态数组中指定 percentiles_array 的百分位数。 数组必须是常数,但不一定是文本。

TransformedSensorsData
| summarize percentiles_array(Value, dynamic([5, 25, 50, 75, 95])), avg(Value) by SensorName

Output

结果表仅显示前 10 行。

SensorName percentiles_Value avg_Value
sensor-82 ["0.048141473520867069","0.24407515500271132","0.48974511106780577","0.74160998970950343","0.94587903204190071"] 0.493950914
sensor-130 ["0.049200214398937764","0.25735850440187535","0.51206374010048239","0.74182335059053839","0.95210342463616771"] 0.505111463
sensor-56 ["0.04857779335488676","0.24709868149337144","0.49668762923789589","0.74458470404241883","0.94889104840865857"] 0.497955018
sensor-24 ["0.051507199150534679","0.24803904945640423","0.50397070213183581","0.75653888126010793","0.9518782718727431"] 0.501084379
sensor-47 ["0.045991246974755672","0.24644331118208851","0.48089197707088743","0.74475142784472248","0.9518322864959039"] 0.49386228
sensor-135 ["0.05132897529660399","0.24204987641954018","0.48470113942206461","0.74275730068433621","0.94784079559229406"] 0.494817619
sensor-74 ["0.048914714739047828","0.25160926036445724","0.49832498850160978","0.75257887767110776","0.94932261924236094"] 0.501627252
sensor-173 ["0.048333149363009836","0.26084250046756496","0.51288012531934613","0.74964772791583412","0.95156058795294"] 0.505401226
sensor-28 ["0.048511161184567046","0.2547387968731824","0.50101318228599656","0.75693845702682039","0.95243122486483989"] 0.502066244
sensor-34 ["0.049980293859462954","0.25094722564949412","0.50914023067384762","0.75571549713447961","0.95176564809278674"] 0.504309494
... ... ...

Nearest-rank percentile

P-th percentile (0 <P<= 100) of a list of ordered values, sorted in ascending order, is the smallest value in the list. The P percent of the data is less or equal to P-th percentile value (from Wikipedia article on percentiles).

Define 0-th percentiles to be the smallest member of the population.

Note

由于计算的近似性质,实际返回的值可能不是群体的成员。 Nearest-rank definition means that P=50 does not conform to the interpolative definition of the median. When evaluating the significance of this discrepancy for the specific application, the size of the population and an estimation error should be taken into account.

百分位中的估算错误

The percentiles aggregate provides an approximate value using T-Digest.

Note

  • 估算错误的界限因请求百分位的值而异。 在 [0..100] 范围的两端,准确性最高。 0 和 100 百分位是分布的确切最小值和最大值。 越接近此范围的中间值,准确性越低。 中间值准确性最差,不超过 1%。
  • 错误边界通过级别观察而不是通过值。 假设 percentile(X, 50) 返回了值 Xm。 此估算可保证至少 49%,至多 51% 的 X 值小于或等于 Xm。 对 Xm 与 X 的实际中间值之间的差异没有理论上的限制。
  • 估算有时可能会得出精确的值,但没有可靠的条件来定义何时会出现这种情况。