Parsers are organized into "stages" (named using a "sXX-
s00-raw: low-level parsers, such as syslog
s01-parse: most of the service parsers (ssh, nginx, etc.)
s02-enrich: enrichment that requires parsed events (e.g. geoip-enrichment) or generic parsers that apply to parsed logs (e.g. the second-stage http parser)
The number and structure of stages can be altered by the user: the directory structure and the alphabetical order of its entries dictate the order in which stages and parsers are processed.
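The ordering rule can be illustrated with a small sketch. It only mimics the alphabetical ordering described above and is not the actual loader; the parser file names in the demo layout are hypothetical.

```python
import tempfile
from pathlib import Path


def stage_order(parsers_root):
    """Return (stage name, parser file names) pairs, both sorted alphabetically."""
    stages = sorted(p for p in parsers_root.iterdir() if p.is_dir())
    return [
        (stage.name, sorted(f.name for f in stage.iterdir() if f.is_file()))
        for stage in stages
    ]


def build_demo_layout(root):
    """Create a throwaway directory tree; file names are made up for the demo."""
    layout = {
        "s01-parse": ["sshd-logs.yaml", "nginx-logs.yaml"],
        "s00-raw": ["syslog-logs.yaml"],
        "s02-enrich": ["geoip-enrich.yaml"],
    }
    for stage, files in layout.items():
        (root / stage).mkdir()
        for name in files:
            (root / stage / name).touch()


with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    build_demo_layout(demo_root)
    ORDER = stage_order(demo_root)

for stage, parsers in ORDER:
    print(stage, parsers)
```

Because the order comes only from the names, renaming a directory or adding a new one is enough to change where a stage sits in the pipeline.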
Every event starts in the first stage and moves to the next stage once it has been successfully processed by a parser that has the onsuccess directive set to next_stage, and so on until it reaches the last stage, after which it is matched against scenarios.
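The flow above can be modelled in a few lines. This is a simplified sketch under stated assumptions: the Parser class, its matches callback, and the demo parsers are hypothetical stand-ins, and the sketch does not model what happens to an event that fails to advance.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Parser:
    name: str
    matches: Callable[[dict], bool]  # hypothetical "parsed successfully" test
    onsuccess: Optional[str] = None  # "next_stage" or unset


def run_stages(event, stages):
    """Walk stages in alphabetical order and return the stages the event entered."""
    visited = []
    for stage in sorted(stages):
        visited.append(stage)
        # The event advances only if some parser both succeeds and
        # has onsuccess set to next_stage.
        advanced = any(
            p.matches(event) and p.onsuccess == "next_stage"
            for p in stages[stage]
        )
        if not advanced:
            break
    return visited


STAGES = {
    "s00-raw": [Parser("syslog", lambda e: "line" in e, "next_stage")],
    "s01-parse": [Parser("sshd", lambda e: e.get("program") == "sshd", "next_stage")],
    "s02-enrich": [Parser("geoip", lambda e: True)],
}

SSH_PATH = run_stages({"line": "raw log line", "program": "sshd"}, STAGES)
CRON_PATH = run_stages({"line": "raw log line", "program": "cron"}, STAGES)
print(SSH_PATH)
print(CRON_PATH)
```

In this model the sshd event enters all three stages, while the cron event stops in s01-parse because no parser there handles it.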
The preliminary stage (s00-raw) is mostly the one that parses the structure of the log. This is where syslog logs are parsed, for example: such a parser reads the syslog header to detect the source program.
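To make the idea of header parsing concrete, here is a minimal sketch that extracts the program name from a traditional BSD-style syslog line. The regular expression and the field names are assumptions chosen for illustration and are not the patterns used by the real syslog parser.

```python
import re

# Simplified pattern for lines such as:
#   "Oct 11 22:14:15 myhost sshd[1234]: message text"
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None


SAMPLE = "Oct 11 22:14:15 myhost sshd[1234]: Failed password for root from 192.0.2.10 port 4242 ssh2"
PARSED = parse_syslog_header(SAMPLE)
print(PARSED["program"], "|", PARSED["message"])
```

The program field is what lets a later stage pick the right application parser for the remaining message.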
The main stage (s01-parse) is the one that parses the actual application logs and outputs parsed data and statically assigned values. There is one parser for each type of software. To parse the logs, regular expressions or GROK patterns are used. If the parser is configured to go to the next_stage, the event will then be processed by the stage that follows.
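As an illustration of "parsed data plus static values", the sketch below applies a plain regular expression to an sshd message and merges the captures with fixed labels. The pattern, the field names, and the static labels are all hypothetical; real parsers express this in their own configuration format, often with GROK patterns.

```python
import re

# Hypothetical pattern for one kind of sshd message.
FAILED_AUTH_RE = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<source_ip>\S+) port (?P<port>\d+)"
)

# Hypothetical values assigned to every event this parser handles.
STATIC_VALUES = {"service": "ssh", "log_type": "ssh_failed-auth"}


def parse_sshd_message(message):
    """Return captured fields merged with the static values, or None on no match."""
    match = FAILED_AUTH_RE.search(message)
    if match is None:
        return None
    return {**match.groupdict(), **STATIC_VALUES}


EVENT = parse_sshd_message("Failed password for root from 192.0.2.10 port 4242 ssh2")
print(EVENT)
```

A message that does not match returns None here, which corresponds to the parser not having successfully processed the event.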
The enrichment stage (s02-enrich) is the one that enriches the normalized log (we call it an event now that it is normalized) in order to get more information for the heuristic process. This stage can be composed of grok patterns and the like, as well as plugins that can be written by the community (geoip enrichment, rdns, ...), for example geoip-enrich.
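Conceptually, an enrichment step takes an already parsed event and adds fields to it. The sketch below imitates a geoip-style lookup with a tiny in-memory table; the table, the placeholder country codes, and the field names are invented for the example, and a real enricher would query an actual geolocation database.

```python
import ipaddress

# Hypothetical lookup table built from documentation-only address ranges.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "XX",
    ipaddress.ip_network("198.51.100.0/24"): "YY",
}


def enrich_with_country(event):
    """Return a copy of the event with a 'country' field when the IP is known."""
    enriched = dict(event)
    try:
        address = ipaddress.ip_address(event.get("source_ip", ""))
    except ValueError:
        return enriched  # missing or malformed IP: leave the event unchanged
    for network, country in GEO_TABLE.items():
        if address in network:
            enriched["country"] = country
            break
    return enriched


ENRICHED = enrich_with_country({"source_ip": "192.0.2.10", "service": "ssh"})
print(ENRICHED)
```

The enricher only adds information, so an event whose address is unknown passes through with its original fields intact.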
It is possible to write a custom stage. If you want some specific parsing or enrichment to be done after the s02-enrich stage, you can create a new folder s03-<custom_stage> (and so on). The configuration created in this folder will process the logs configured to go to next_stage in the preceding stage.