dedup
The dedup operator filters out duplicate events.
Specify the field that you want to inspect for duplicates (FieldName). When that field has the same value in multiple events, those events are considered duplicates.
You can also specify multiple fields. In that case, events are considered duplicates only if they match on all of the specified fields. For example, if you specify fields named Phone and Email, events are considered duplicates only if they have an identical Phone value and an identical Email value.
By default, the dedup operator outputs only the first event in each set of duplicates and drops the rest. To keep more than one, specify the number of duplicates to keep (NumberOfDuplicatesToKeep).
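The matching rules above (a composite key over all specified fields, plus a cap on how many duplicates to keep) can be modeled in a few lines of Python. This is a minimal sketch, not the operator's implementation: `dedup_by_fields` is a hypothetical helper, events are modeled as dictionaries, the time window is ignored (as if all events fell inside it), and treating a missing field as `None` is an assumption.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Event = Dict[str, Any]


def dedup_by_fields(
    events: Iterable[Event],
    fields: Sequence[str],
    num_duplicates: int = 1,
) -> List[Event]:
    """Keep at most `num_duplicates` events per combination of field values.

    Hypothetical model of dedup that ignores the time window.
    """
    seen: Dict[Tuple[Any, ...], int] = {}
    kept: List[Event] = []
    for event in events:
        # Composite key: events are duplicates only if ALL listed fields match.
        # Assumption: a missing field is treated as None.
        key = tuple(event.get(f) for f in fields)
        count = seen.get(key, 0)
        if count < num_duplicates:
            kept.append(event)  # still under the cap for this key
        seen[key] = count + 1
    return kept


contacts = [
    {"Phone": "555-0100", "Email": "a@example.com"},
    {"Phone": "555-0100", "Email": "b@example.com"},  # same Phone, different Email
    {"Phone": "555-0100", "Email": "a@example.com"},  # duplicate of the first event
]

print(len(dedup_by_fields(contacts, ["Phone", "Email"])))  # 2
print(len(dedup_by_fields(contacts, ["Phone"])))  # 1
print(len(dedup_by_fields(contacts, ["Phone", "Email"], num_duplicates=2)))  # 3
```

Deduplicating on Phone alone collapses all three events, while Phone plus Email keeps the second event because its Email differs.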
When looking for duplicates, the dedup operator compares only events whose timestamps are within 30 seconds of each other. To use a different window, specify TimeWindow.
Syntax
Scope | dedup [time_window=TimeWindow] [num_duplicates=NumberOfDuplicatesToKeep] by FieldName [, ...]
Arguments
- TimeWindow (int): The dedup operator compares only those events whose _time values are within TimeWindow seconds of each other. If an event lacks an explicit time, the system time at the moment the event is processed is used instead. Default: 30 seconds.
- NumberOfDuplicatesToKeep (int or expression): The number of duplicate events to keep. Default: 1.
- FieldName: The name of the field that you want to inspect for duplicates. Allowed formats: fieldName or ["field name"]. Separate multiple fields with a comma: fieldName1, ["field name 2"], ...
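The TimeWindow argument, including the fallback to processing time when _time is missing, can be sketched in Python as follows. The exact windowing rule is not specified here, so this sketch assumes one: each key's window starts at the first event seen for that key, and an event more than `time_window` seconds after that start opens a new window. It also assumes events arrive in time order. `dedup_in_window` and its `now` parameter are hypothetical; `now` exists only so the fallback time can be fixed in the example.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Event = Dict[str, Any]


def dedup_in_window(
    events: Iterable[Event],
    fields: Sequence[str],
    time_window: float = 30,
    num_duplicates: int = 1,
    now: Callable[[], float] = time.time,
) -> List[Event]:
    """Hypothetical windowed dedup; assumes events arrive in time order."""
    # key -> (window start time, number of events seen in that window)
    windows: Dict[Tuple[Any, ...], Tuple[float, int]] = {}
    kept: List[Event] = []
    for event in events:
        # Fall back to processing time when the event has no _time value.
        ts = event["_time"] if "_time" in event else now()
        key = tuple(event.get(f) for f in fields)
        start, count = windows.get(key, (ts, 0))
        if ts - start > time_window:
            # Assumed rule: too far from the window start, so open a new window.
            start, count = ts, 0
        if count < num_duplicates:
            kept.append(event)
        windows[key] = (start, count + 1)
    return kept


events = [
    {"_time": 0, "Name": "alice"},
    {"_time": 20, "Name": "alice"},  # within 30 s of the window start: dropped
    {"_time": 45, "Name": "alice"},  # more than 30 s after the start: kept
    {"Name": "bob"},                 # no _time: processing time is used
]
result = dedup_in_window(events, ["Name"], now=lambda: 100.0)
print([e.get("_time") for e in result])  # [0, 45, None]
```

With `time_window=60`, the event at 45 seconds falls inside the first window and is dropped as well.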
Results
Filters out events identified as duplicates.
Examples
Filter out events that have the same value in the Name field.
dataset=myDataset
| dedup by Name
Filter out events that have the same values in all three of the Name, Home address, and Work address fields. Field names that contain spaces are enclosed in brackets and quotes.
dataset=myDataset
| dedup by Name, ['Home address'], ["Work address"]
Filter out events that have the same Name value and whose timestamps are within 60 seconds of each other. For each set of such duplicates, keep up to the first 5 events.
dataset=myDataset
| dedup time_window=60 num_duplicates=5 by Name
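Run the query on events from the $vt_dummy dataset that match event<1000, add a randomNumber field to each event by using rand(10), and then keep only the first event for each randomNumber value.
dataset=$vt_dummy event<1000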
| extend randomNumber=rand(10)
| dedup by randomNumber
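Deduplicating on a low-cardinality field such as randomNumber leaves at most one event per distinct value. The Python simulation below illustrates this under stated assumptions: the input is modeled as 1000 events, rand(10) is assumed to return an integer from 0 through 9, and all events are assumed to fall within one time window. Real dummy events spread over more than 30 seconds could legitimately produce more results.

```python
import random

# Assumption: rand(10) yields an integer from 0 through 9.
rng = random.Random(42)  # fixed seed so the run is repeatable
events = [{"event": i, "randomNumber": rng.randrange(10)} for i in range(1000)]

# Keep only the first event seen for each randomNumber value
# (time window ignored, as if every event fell inside it).
seen = set()
deduped = []
for e in events:
    if e["randomNumber"] not in seen:
        seen.add(e["randomNumber"])
        deduped.append(e)

print(len(deduped), sorted(e["randomNumber"] for e in deduped))
```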