你当前正在访问 Microsoft Azure Global Edition 技术文档网站。 如果需要访问由世纪互联运营的 Microsoft Azure 中国技术文档网站,请访问 https://docs.azure.cn

scan operator

Applies to: ✅Microsoft FabricAzure Data ExplorerAzure MonitorMicrosoft Sentinel

基于谓词扫描数据、进行匹配和生成序列。

根据运算符的步骤中定义的谓词确定匹配记录。 谓词取决于前面的步骤生成的状态。 匹配记录的输出取决于运算符的步骤中定义的输入记录和分配。

Syntax

T| scan [ with_match_id=MatchIdColumnName ] [ declare(ColumnDeclarations) ] with(StepDefinitions)

ColumnDeclarations syntax

ColumnName:ColumnType[=DefaultValue ] [, ... ]

StepDefinition syntax

step StepName [ output = all | last | none] :Condition [ =>Column=Assignment [, ... ] ] ;

Learn more about syntax conventions.

Parameters

Name 类型 Required Description
T string ✔️ 输入表格源。
MatchIdColumnName string 某个类型为 long 的列的名称,在扫描执行过程中会将其追加到输出中。 指示记录匹配的从 0 开始的索引。
ColumnDeclarations string Declares an extension to the schema of T. These columns are assigned values in the steps. If not assigned, the DefaultValue is returned. Unless otherwise specified, DefaultValue is null.
StepName string ✔️ 用于在扫描状态下引用条件和分配的值。 步骤名称必须是唯一的。
Condition string ✔️ 一个计算结果为 truefalse 的表达式,定义输入中的哪些记录与步骤匹配。 如果对于步骤的状态或前一个步骤的状态,条件为 true,说明记录与步骤相匹配。
Assignment string 当某个记录与步骤相匹配时分配到相应列的标量表达式。
output string 控制步骤在重复匹配时的输出逻辑。 all 输出与步骤匹配的所有记录,last 仅输出步骤的一系列重复匹配项中的最后一条记录,而 none 不会输出任何与步骤匹配的记录。 默认为 all

Returns

输入的某个记录与某个步骤的每个匹配项的记录。 输出的架构是在源的架构基础上使用子句中的 declare 列进行了扩展后得到的架构。

Scan logic

scan 会按每条记录检查序列化输入数据,将每条记录与每个步骤的条件进行比较,同时考虑每个步骤的当前状态。

State

可以将 scan 运算符的基础状态视为一个表,其中每个 step 都有一行。 每个步骤都使用列的最新值以及前面所有步骤和当前步骤中声明的变量来维护自己的状态。 如果相关,它还会保留正在进行的序列的匹配 ID。

If a scan operator has n steps named s_1, s_2, ..., s_n then step s_k would have k records in its state corresponding to s_1, s_2, ..., s_k. The StepName.ColumnName format is used to reference a value in the state. For instance, s_2.col1 would reference column col1 that belongs to step s_2 in the state of s_k. 有关详细示例,请参阅扫描逻辑演练

状态开始时为空,在扫描的输入记录与步骤匹配时更新。 When the state of the current step is nonempty, the step is referred to as having an active sequence.

Matching logic

每个输入记录针对所有步骤按照从最后一个步骤到第一个步骤的反向顺序进行计算。 When a record r is evaluated against some step s_k, the following logic is applied:

  • Check 1: If the state of the previous step (s_k-1) is nonempty, and r meets the Condition of s_k, then a match occurs. 匹配会导致以下操作:

    1. The state of s_k is cleared.
    2. The state of s_k-1 is promoted to become the state of s_k.
    3. The assignments of s_k are calculated and extend r.
    4. The extended r is added to the output and to the state of s_k.

    Note

    If Check 1 results in a match, Check 2 is disregarded, and r moves on to be evaluated against s_k-1.

  • Check 2: If the state of s_k has an active sequence or s_k is the first step, and r meets the Condition of s_k, then a match occurs. 匹配会导致以下操作:

    1. The assignments of s_k are calculated and extend r.
    2. The values that represent s_k in the state of s_k are replaced with the values of the extended r.
    3. If s_k is defined as output=all, the extended r is added to the output.
    4. If s_k is the first step, a new sequence begins and the match ID increases by 1. 这只在使用 with_match_id 时会影响输出。

Once the checks for s_k are complete, r moves on to be evaluated against s_k-1.

有关此逻辑的详细示例,请参阅扫描逻辑演练

Examples

本节中的示例演示如何使用语法帮助你入门。

The examples in this article use publicly available tables in the help cluster, such as the StormEvents table in the Samples database.

The examples in this article use publicly available tables, such as the Weather table in the Weather analytics sample gallery. 可能需要修改示例查询中的表名称以匹配工作区中的表。

Cumulative sum

计算某个输入列的累计和。 The result of this example is equivalent to using row_cumsum().

range x from 1 to 5 step 1 
| scan declare (cumulative_x:long=0) with 
(
    step s1: true => cumulative_x = x + s1.cumulative_x;
)

Output

x cumulative_x
1 1
2 3
3 6
4 10
5 15

具有重置条件的多个列的累积和

计算两个输入列的累积和,每当累积和达到 10 或更多时,将总和值重置为当前记录值。

range x from 1 to 5 step 1
| extend y = 2 * x
| scan declare (cumulative_x:long=0, cumulative_y:long=0) with 
(
    step s1: true => cumulative_x = iff(s1.cumulative_x >= 10, x, x + s1.cumulative_x), 
                     cumulative_y = iff(s1.cumulative_y >= 10, y, y + s1.cumulative_y);
)

Output

x y cumulative_x cumulative_y
1 2 1 2
2 4 3 6
3 6 6 12
4 8 10 8
5 10 5 18

向前填充一列

向前填充一个字符串列。 为每个空值分配上次出现的非空值。

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "",
    2m, "B",
    3m, "",
    4m, "",
    6m, "C",
    8m, "",
    11m, "D",
    12m, ""
]
;
Events
| sort by Ts asc
| scan declare (Event_filled: string="") with 
(
    step s1: true => Event_filled = iff(isempty(Event), s1.Event_filled, Event);
)

Output

Ts Event Event_filled
00:00:00 A A
00:01:00 A
00:02:00 B B
00:03:00 B
00:04:00 B
00:06:00 C C
00:08:00 C
00:11:00 D D
00:12:00 D

Sessions tagging

将输入划分为多个会话:会话在会话的第一个事件后 30 分钟时结束,在此之后将启动新会话。 Note the use of with_match_id flag, which assigns a unique value for each distinct match (session) of scan. Also note the special use of two steps in this example, inSession has true as condition so it captures and outputs all the records from the input while endSession captures records that happen more than 30m from the sessionStart value for the current match. endSession 步骤具有 output=none,这意味着它不会生成输出记录。 endSession 步骤用于将当前匹配项的状态从 inSession 提升为 endSession,从而允许从当前记录起开始新的匹配(会话)。

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "A",
    2m, "B",
    3m, "D",
    32m, "B",
    36m, "C",
    38m, "D",
    41m, "E",
    75m, "A"
]
;
Events
| sort by Ts asc
| scan with_match_id=session_id declare (sessionStart: timespan) with 
(
    step inSession: true => sessionStart = iff(isnull(inSession.sessionStart), Ts, inSession.sessionStart);
    step endSession output=none: Ts - inSession.sessionStart > 30m;
)

Output

Ts Event sessionStart session_id
00:00:00 A 00:00:00 0
00:01:00 A 00:00:00 0
00:02:00 B 00:00:00 0
00:03:00 D 00:00:00 0
00:32:00 B 00:32:00 1
00:36:00 C 00:32:00 1
00:38:00 D 00:32:00 1
00:41:00 E 00:32:00 1
01:15:00 A 01:15:00 2

计算每个用户的会话长度

使用 scan 作员计算每个用户会话的会话开始时间、结束时间和持续时间。 会话定义为用户登录和后续注销之间的句点。 通过组合 partitionscanoutput=noneoutput=all,此模式可确保每个会话(即每个登录/注销对)返回 单个行 ,而不是每个事件一行。

逻辑的工作原理是:

  • 在步骤 s1 中:使用扫描步骤捕获登录时间戳 output=none
  • 在步骤 s2 中:仅当找到匹配的注销时,才发出行 output=all
let LogsEvents = datatable(Timestamp:datetime, userID:int, EventType:string)
[
    datetime(2024-05-28 08:15:23), 1, "login",
    datetime(2024-05-28 08:30:15), 2, "login",
    datetime(2024-05-28 09:10:27), 3, "login",
    datetime(2024-05-28 12:30:45), 1, "logout",
    datetime(2024-05-28 11:45:32), 2, "logout",
    datetime(2024-05-28 13:25:19), 3, "logout"
];
LogsEvents
| sort by userID, Timestamp
| partition hint.strategy=native by userID (
    sort by Timestamp asc 
    | scan declare (start: datetime, end: datetime, sessionDuration: timespan) with (
        step s1 output=none: EventType == "login" => start = Timestamp;
        step s2 output=all: EventType == "logout" => start = s1.start, end = Timestamp, sessionDuration = Timestamp - s1.start;
    )
)
| project start, end, userID, sessionDuration

Output

userID start end sessionDuration
1 2024-05-28 08:15:23.0000 2024-05-28 12:30:45.0000 04:15:22
3 2024-05-28 09:10:27.0000 2024-05-28 13:25:19.0000 04:14:52
2 2024-05-28 08:30:15.0000 2024-05-28 11:45:32.0000 03:15:17

启动与暂停之间的事件

查找 5 分钟内发生的、事件 Start 与事件 Stop 之间的所有事件序列。 为每个序列分配匹配 ID。

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "Start",
    2m, "B",
    3m, "D",
    4m, "Stop",
    6m, "C",
    8m, "Start",
    11m, "E",
    12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with 
(
    step s1: Event == "Start";
    step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
    step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)

Output

Ts Event m_id
00:01:00 Start 0
00:02:00 B 0
00:03:00 D 0
00:04:00 Stop 0
00:08:00 Start 1
00:11:00 E 1
00:12:00 Stop 1

计算事件的自定义漏斗图

根据 Hail 计算序列 > -Tornado> -Thunderstorm WindState 的漏斗图完成情况,并对事件之间的时间间隔设置自定义阈值(Tornado 内的 1hThunderstorm Wind 内的 2h)。 This example is similar to the funnel_sequence_completion plugin, but allows greater flexibility.

StormEvents
| partition hint.strategy=native by State 
    (
    sort by StartTime asc
    | scan with 
    (
        step hail: EventType == "Hail";
        step tornado: EventType == "Tornado" and StartTime - hail.StartTime <= 1h;
        step thunderstormWind: EventType == "Thunderstorm Wind" and StartTime - tornado.StartTime <= 2h;
    )
    )
| summarize dcount(State) by EventType

Output

EventType dcount_State
Hail 50
Tornado 34
Thunderstorm Wind 32

扫描逻辑演练

This section demonstrates the scan logic using a step-by-step walkthrough of the Events between start and stop example:

let Events = datatable (Ts: timespan, Event: string) [
    0m, "A",
    1m, "Start",
    2m, "B",
    3m, "D",
    4m, "Stop",
    6m, "C",
    8m, "Start",
    11m, "E",
    12m, "Stop"
]
;
Events
| sort by Ts asc
| scan with_match_id=m_id with 
(
    step s1: Event == "Start";
    step s2: Event != "Start" and Event != "Stop" and Ts - s1.Ts <= 5m;
    step s3: Event == "Stop" and Ts - s1.Ts <= 5m;
)

Output

Ts Event m_id
00:01:00 Start 0
00:02:00 B 0
00:03:00 D 0
00:04:00 Stop 0
00:08:00 Start 1
00:11:00 E 1
00:12:00 Stop 1

The state

scan 运算符的状态视为一个表,其中每个步骤都有一行,每个步骤都有自己的状态。 此状态包含所有前面步骤和当前步骤中列和声明的变量的最新值。 To learn more, see State.

对于此示例,可以使用下表表示状态:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3

“X”表示特定字段与该步骤无关。

匹配逻辑

This section follows the matching logic through each record of the Events table, explaining the transformation of the state and output at each step.

Note

输入记录针对步骤按照从最后一步 (s3) 到第一步 (s1) 的反向顺序进行计算。

Record 1

Ts Event
0m "A"

每个步骤的记录计算:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there's no previous step. Check 2 isn't passed because the record doesn't meet the condition of Event == "Start". Record 1 is discarded without affecting the state or output.

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3

Record 2

Ts Event
1m "Start"

每个步骤的记录计算:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there's no previous step. Check 2 is passed because the record meets the condition of Event == "Start". 此匹配会启动一个新序列,并分配 m_idRecord 2 and its m_id (0) are added to the state and the output.

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 0 00:01:00 "Start" X X X X
s2 X X
s3

Record 3

Ts Event
2m "B"

每个步骤的记录计算:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 lacks an active sequence.
  • s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts < 5m. 此匹配会导致 s1 的状态被清除,并将 s1 中的序列提升为 s2Record 3 and its m_id (0) are added to the state and the output.
  • s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 0 00:01:00 "Start" 00:02:00 "B" X X
s3

Record 4

Ts Event
3m "D"

每个步骤的记录计算:

  • s3: Check 1 isn't passed because the record doesn't meet the condition of Event == "Stop", and Check 2 isn't passed because s3 lacks an active sequence.
  • s2: Check 1 isn't passed because the state of s1 is empty. it passes Check 2 because it meets the condition of Ts - s1.Ts < 5m. Record 4 and its m_id (0) are added to the state and the output. 此记录中的值会覆盖 s2.Tss2.Event 的先前状态值。
  • s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 0 00:01:00 "Start" 00:03:00 "D" X X
s3

Record 5

Ts Event
4m "Stop"

每个步骤的记录计算:

  • s3: Check 1 is passed because s2 is nonempty and it meets the s3 condition of Event == "Stop". 此匹配会导致 s2 的状态被清除,并将 s2 中的序列提升为 s3Record 5 and its m_id (0) are added to the state and the output.
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 is irrelevant because there's no previous step. Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3 0 00:01:00 "Start" 00:03:00 "D" 00:04:00 "Stop"

Record 6

Ts Event
6m "C"

每个步骤的记录计算:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because s3 doesn't meet the s3 condition of Event == "Stop".
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 isn't passed because there's no previous step, and Check 2 isn't passed because it doesn't meet the condition of Event == "Start". Record 6 is discarded without affecting the state or output.

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3 0 00:01:00 "Start" 00:03:00 "D" 00:04:00 "Stop"

Record 7

Ts Event
8m "Start"

每个步骤的记录计算:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because it doesn't meet the condition of Event == "Stop".
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 isn't passed because there's no previous step. it passes Check 2 because it meets the condition of Event == "Start". 此匹配使用新的 s1m_id 中启动了一个新序列。 Record 7 and its m_id (1) are added to the state and the output.

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 1 00:08:00 "Start" X X X X
s2 X X
s3 0 00:01:00 "Start" 00:03:00 "D" 00:04:00 "Stop"

Note

现在有两个处于该状态的活动序列。

Record 8

Ts Event
11m "E"

每个步骤的记录计算:

  • s3: Check 1 isn't passed because the state of s2 is empty, and Check 2 isn't passed because it doesn't meet the s3 condition of Event == "Stop".
  • s2: Check 1 is passed because the state of s1 is nonempty and the record meets the condition of Ts - s1.Ts < 5m. 此匹配会导致 s1 的状态被清除,并将 s1 中的序列提升为 s2Record 8 and its m_id (1) are added to the state and the output.
  • s1: Check 1 is irrelevant because there's no previous step, and Check 2 isn't passed because the record doesn't meet the condition of Event == "Start".

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 1 00:08:00 "Start" 00:11:00 "E" X X
s3 0 00:01:00 "Start" 00:03:00 "D" 00:04:00 "Stop"

Record 9

Ts Event
12m "Stop"

每个步骤的记录计算:

  • s3: Check 1 is passed because s2 is nonempty and it meets the s3 condition of Event == "Stop". 此匹配会导致 s2 的状态被清除,并将 s2 中的序列提升为 s3Record 9 and its m_id (1) are added to the state and the output.
  • s2: Check 1 isn't passed because the state of s1 is empty, and Check 2 isn't passed because s2 lacks an active sequence.
  • s1: Check 1 isn't passed because there's no previous step. it passes Check 2 because it meets the condition of Event == "Start". 此匹配使用新的 s1m_id 中启动了一个新序列。

State:

单步执行 m_id s1.Ts s1.Event s2.Ts s2.Event s3.Ts s3.Event
s1 X X X X
s2 X X
s3 1 00:08:00 "Start" 00:11:00 "E" 00:12:00 "Stop"

Final output

Ts Event m_id
00:01:00 Start 0
00:02:00 B 0
00:03:00 D 0
00:04:00 Stop 0
00:08:00 Start 1
00:11:00 E 1
00:12:00 Stop 1