dedup

The dedup operator filters out duplicate events.

You need to specify the field that you want to inspect for duplicates (FieldName). When that field has the same value in multiple events, those events are considered duplicates.

You can also specify multiple fields. In that case, events are considered duplicates only if all of the specified fields are duplicated. For example, if you specify fields named Phone and Email, events are considered duplicates only if they have an identical Phone value and an identical Email value.

By default, the dedup operator keeps only the first event in each set of duplicates and drops the rest. To keep more, specify the number of duplicates to keep (NumberOfDuplicatesToKeep).

When looking for duplicates, the dedup operator by default compares only events whose timestamps are within 30 seconds of each other. You can also specify a different time window (TimeWindow).

Syntax

Scope | dedup [time_window=TimeWindow] [num_duplicates=NumberOfDuplicatesToKeep] by FieldName [, ...]

Arguments

  • TimeWindow (int): The dedup operator compares only those events whose _time values are within TimeWindow seconds of each other. If an event lacks an explicit time, the system time at the moment of processing the event is used instead. Default: 30 seconds.
  • NumberOfDuplicatesToKeep (int or expression): The number of duplicate events to keep. Default: 1.
  • FieldName: The name of the field that you want to inspect for duplicates. Allowed formats: fieldName or ["field name"]. Separate multiple fields with a comma: fieldName1, ["field name 2"], ...

Results

Filters out events identified as duplicates.

Examples

Filter out events that have the same value in the Name field.

dataset=myDataset
| dedup by Name
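
Keep up to two events for each Name value instead of one. This is a sketch that assumes myDataset has a Name field; because time_window is omitted, the default 30-second window should apply.

dataset=myDataset
| dedup num_duplicates=2 by Name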

Filter out events that have the same values in the corresponding Name, Home address, and Work address fields.

dataset=myDataset
| dedup by Name, ['Home address'], ["Work address"]
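
Filter out events that have the same Name value and occur within five minutes of each other. This is a sketch that assumes myDataset has a Name field; because num_duplicates is omitted, only the first event in each set of duplicates should be kept.

dataset=myDataset
| dedup time_window=300 by Name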

Filter out events that have the same Name value and occur within a minute of each other. If there are more than 5 such events, keep only the first 5.

dataset=myDataset
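| dedup time_window=60 num_duplicates=5 by Name

Assign a random number to each event in the $vt_dummy dataset, then filter out events that have the same randomNumber value.

dataset=$vt_dummy event<1000
| extend randomNumber=rand(10)
| dedup by randomNumber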
Last updated by: Dritan Bitincka