A parser is a YAML configuration file that describes how a string must be parsed. Said string can be a log line, or a field extracted from a previous parser.
While a lot of parsers rely on the GROK approach (a.k.a regular expression named capture groups), parsers can also use expressions to perform parsing on specific data (ie. json), refer to external methods for enrichment or even perform whitelisting.
The event enters the parser, and might exit successfully or not:
Parsers are organized into stages to allow pipelines and branching in parsing. An event can go to the next stage if at least one parser in the given stage parsed it successfully while having
onsuccess set to
next_stage. Otherwise, the event is considered unparsed and will exit the pipeline (and be discarded):
Each parser can add, change or even delete data from the event. The current approach is:
s00-raw: takes care of the overall log structure (ie. extract log lines from JSON blob, parse syslog protocol info)
s01-parse: parses the actual log line (ssh, nginx etc.)
s02-enrich: does some post processing, such as geoip-enrich or post-parsing of http events to provide more context
Once an event has successfully exited the parsing pipeline, it is ready to be matched against scenarios. As you might expect, each parser relies on the information that is parsed during previous stages.
Once a scenario overflows, the resulting event is going to be processed by a distinct set of parsers, called "postoverflows".
Those parsers are located in
/etc/crowdsec/postoverflows/ and typically contain additional whitelists, a common example is to whitelist decisions coming from some specific FQDN.
Usually, those parsers should be kept for "expensive" parsers that might rely on external services.
See the Hub to explore parsers, or see below some examples:
The parsers usually reside in