The CrowdSec runtime revolves around a few simple concepts:
- It reads logs (defined via the datasource configuration)
- Those logs are parsed by parsers and, optionally, enriched
- Those normalized logs are matched against the scenarios that the user has deployed
- When a scenario is "triggered", CrowdSec generates an alert and, optionally, one or more associated decisions:
  - The alert is here mostly for traceability and will stay even after the decision expires
  - The decision, on the other hand, is short-lived and tells what action should be taken against the offending IP/range/user...
- This information (the alert and its associated decisions) is then sent to CrowdSec's Local API and stored in the database
As you might have guessed by now, CrowdSec itself does the detection part and stores those decisions. Then, bouncers can "consume" those decisions (via the very same Local API) and apply the actual remediation.
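To make the flow above concrete, here is a minimal Python sketch of it. Everything in it is hypothetical and heavily simplified: `detect` stands in for the parsers plus a single scenario, `LocalAPI` is an in-memory stand-in for the Local API and its database, and the 4-hour ban duration is an arbitrary assumption. It only illustrates that alerts are kept while decisions expire.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Decision:
    value: str        # offending IP, range, user...
    action: str       # e.g. "ban"
    until: datetime   # decisions are short-lived


@dataclass
class Alert:
    scenario: str
    source_ip: str
    decisions: List[Decision] = field(default_factory=list)


class LocalAPI:
    """Hypothetical in-memory stand-in for the Local API and its database."""

    def __init__(self) -> None:
        self.alerts: List[Alert] = []        # kept for traceability
        self.decisions: List[Decision] = []

    def push(self, alert: Alert) -> None:
        self.alerts.append(alert)
        self.decisions.extend(alert.decisions)

    def active_decisions(self, value: str, now: datetime) -> List[Decision]:
        # Expired decisions are ignored; the alerts themselves stay.
        return [d for d in self.decisions if d.value == value and d.until > now]


def detect(log_line: str, now: datetime) -> Optional[Alert]:
    """Toy 'parser + scenario': one failed password triggers an alert."""
    if "Failed password" not in log_line or " from " not in log_line:
        return None
    ip = log_line.rsplit(" from ", 1)[1].split()[0]
    ban = Decision(value=ip, action="ban", until=now + timedelta(hours=4))  # assumed duration
    return Alert(scenario="toy/ssh-bf", source_ip=ip, decisions=[ban])


now = datetime.now(timezone.utc)
lapi = LocalAPI()
alert = detect("sshd[42]: Failed password for root from 192.0.2.10 port 2222 ssh2", now)
if alert is not None:
    lapi.push(alert)

# A bouncer asks about an IP and applies whatever active decision exists.
print([d.action for d in lapi.active_decisions("192.0.2.10", now)])  # ['ban']
print(lapi.active_decisions("192.0.2.10", now + timedelta(hours=5)))  # [] (expired)
```

After the decision has expired, `lapi.alerts` still holds the alert, mirroring the traceability role described above.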
Crowdsourced aspect
Whenever the Local API receives an alert with associated decisions, it shares the alert's meta-information with our central API:
- The source IP address that triggered the alert
- The scenario that was triggered
- The timestamp of the attack
This is the only data that is sent to our API; it is processed on our side so that the relevant blocklists can be redistributed to all the participants. You can check the central API documentation (see the references) for a comprehensive view of what might be shared between your instance and our services.
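As an illustration, the sketch below reduces a local alert to the three pieces of meta-information listed above before anything would leave the machine. The field names (`source_ip`, `scenario`, `timestamp`) and the `build_signal` helper are hypothetical; they do not reproduce the central API's actual schema, which the central API documentation describes.

```python
import json
from datetime import datetime, timezone


def build_signal(alert: dict) -> dict:
    """Keep only the shared meta-information (hypothetical field names)."""
    return {
        "source_ip": alert["source_ip"],
        "scenario": alert["scenario"],
        "timestamp": alert["timestamp"],
    }


local_alert = {
    "source_ip": "192.0.2.10",
    "scenario": "crowdsecurity/ssh-bf",
    "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    # Local-only context: never part of the shared signal in this sketch.
    "raw_logs": ["sshd[42]: Failed password for root from 192.0.2.10 port 2222 ssh2"],
}

print(json.dumps(build_signal(local_alert), indent=2))
```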
Bouncers are standalone software components in charge of acting upon actors that triggered alerts. To do so, bouncers query the Local API to know whether there is an existing decision against a given IP, range, username, etc. You can find a list of existing bouncers on the Hub.
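A bouncer's check can be sketched with the Python standard library. This assumes a Local API reachable at `http://127.0.0.1:8080` that answers `GET /v1/decisions?ip=...` with a JSON list of decisions (or `null` when there are none) and authenticates bouncers with an `X-Api-Key` header. That matches our understanding of the bouncer API, but please verify it against the Local API reference; the key value, the 2-second timeout and the helper names are placeholders.

```python
import json
import urllib.parse
import urllib.request

LAPI_URL = "http://127.0.0.1:8080"   # assumption: local LAPI address
API_KEY = "REPLACE_ME"               # hypothetical bouncer key


def build_decision_query(ip: str) -> urllib.request.Request:
    """Build (but do not send) the decision lookup for one IP."""
    query = urllib.parse.urlencode({"ip": ip})
    return urllib.request.Request(
        f"{LAPI_URL}/v1/decisions?{query}",
        headers={"X-Api-Key": API_KEY},
    )


def remediation_for(body: bytes) -> str:
    """Pick an action from the JSON answer; 'null' or [] means no decision."""
    decisions = json.loads(body) or []
    return decisions[0]["type"] if decisions else "allow"


def check(ip: str) -> str:
    # Only this function performs network I/O; it needs a running Local API.
    with urllib.request.urlopen(build_decision_query(ip), timeout=2) as resp:
        return remediation_for(resp.read())


print(remediation_for(b'[{"type": "ban", "value": "192.0.2.10"}]'))  # ban
```

Splitting request building from response interpretation keeps the remediation logic testable without a running Local API.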
Acquisition configuration defines which streams of information CrowdSec must process.
A stream of information can be a file, a journald event log, a CloudWatch stream, or more or less any kind of stream, such as a Kafka topic.
Acquisition configuration always contains a stream (e.g. a file to tail) and a tag (e.g. "these are in syslog format", "these are non-syslog nginx logs").
A file acquisition configuration lists the file(s) to read, together with a labels part that tags the incoming logs with a type.
The labels.type value is used by the parsers to know which logs to process.
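As far as we know, a file acquisition entry is written in YAML with a `filenames` list and a `labels` map holding `type`. The Python sketch below mirrors two such entries as dictionaries and shows how the type tag travels with each line, so that only interested parsers look at it. The parser registry, its parser names and the matching rule are hypothetical simplifications, not CrowdSec's actual dispatch logic.

```python
# Python mirror of two acquisition entries (YAML keys: filenames, labels.type).
ACQUISITION = [
    {"filenames": ["/var/log/nginx/*.log"], "labels": {"type": "nginx"}},
    {"filenames": ["/var/log/auth.log"], "labels": {"type": "syslog"}},
]

# Hypothetical parser registry: parser name -> labels.type it accepts.
PARSERS = {"toy-nginx-parser": "nginx", "toy-syslog-parser": "syslog"}


def tag_lines(entry: dict, lines):
    """Attach the entry's labels.type to every line read from its files."""
    log_type = entry["labels"]["type"]
    for line in lines:
        yield {"line": line, "type": log_type}


def interested_parsers(event: dict) -> list:
    """Return the parsers whose accepted type matches the event's tag."""
    return [name for name, wanted in PARSERS.items() if event["type"] == wanted]


events = list(tag_lines(ACQUISITION[0], ['192.0.2.10 - - "GET / HTTP/1.1" 200']))
print(interested_parsers(events[0]))  # ['toy-nginx-parser']
```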
The concept of stages is central to data parsing in CrowdSec, as it allows for various "steps" of parsing. All parsers belong to a given stage. While users can add stages or modify their order, the following stages exist by default:
- s00-raw: low-level parsers, such as syslog
- s01-parse: most of the services' parsers (ssh, nginx, etc.)
- s02-enrich: enrichment that requires parsed events (e.g. geoip enrichment) or generic parsers that apply to parsed logs (e.g. a second-stage HTTP parser)
Every event starts in the first stage and moves to the next stage once it has been successfully processed by a parser that has the onsuccess directive set to next_stage, and so on until it reaches the last stage, where it starts being matched against scenarios. Thus an sshd log might follow this pipeline:
- s00-raw: parsed by crowdsecurity/syslog-logs (will move the event to the next stage)
- s01-parse: parsed by crowdsecurity/sshd-logs (will move the event to the next stage)
- s02-enrich: parsed by the relevant enrichment parsers (such as geoip enrichment)
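The stage rule above can be modeled in a few lines of Python. This is a toy model of the behaviour described in this section, not CrowdSec's parser engine: the three parser functions are hypothetical stand-ins for crowdsecurity/syslog-logs, crowdsecurity/sshd-logs and an enrichment parser, `onsuccess` is a plain string, and dropping an event that no next_stage parser pushes forward is a modeling assumption.

```python
import ipaddress

STAGES = ["s00-raw", "s01-parse", "s02-enrich"]


def syslog_parser(evt):
    # Split "<syslog header> sshd[pid]: <message>" (hypothetical, simplified).
    prefix, sep, message = evt["line"].partition(": ")
    if sep and "sshd[" in prefix:
        return {"program": "sshd", "message": message}
    return None


def sshd_parser(evt):
    msg = evt.get("message", "")
    if msg.startswith("Failed password") and " from " in msg:
        return {"source_ip": msg.split(" from ", 1)[1].split()[0]}
    return None


def ip_enricher(evt):
    # Stand-in enrichment: record the IP version of the source address.
    return {"ip_version": ipaddress.ip_address(evt["source_ip"]).version}


# (stage, parser function, onsuccess directive)
PARSERS = [
    ("s00-raw", syslog_parser, "next_stage"),
    ("s01-parse", sshd_parser, "next_stage"),
    ("s02-enrich", ip_enricher, None),
]


def run_pipeline(line):
    evt = {"line": line}
    for index, stage in enumerate(STAGES):
        evt["stage"] = stage
        advanced = False
        for parser_stage, parse, onsuccess in PARSERS:
            if parser_stage != stage:
                continue
            fields = parse(evt)
            if fields is None:
                continue
            evt.update(fields)
            if onsuccess == "next_stage":
                advanced = True
                break  # move on to the next stage
        if not advanced and index < len(STAGES) - 1:
            return None  # modeling assumption: the event is dropped
    return evt  # last stage reached: ready to be matched against scenarios


print(run_pipeline(
    "Jan  1 12:00:00 host sshd[42]: Failed password for root from 192.0.2.10 port 2222 ssh2"
))
```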
For logs to be exploited and analyzed, they need to be parsed and normalized, and this is where parsers come in.
A parser is a YAML configuration file that describes how a string is parsed. The string can be a log line or a field extracted by a previous parser. While a lot of parsers rely on the GROK approach (a.k.a. regular expressions with named capture groups), parsers can also reference enrichment modules to allow specific data processing.
A parser usually has a specific scope. For example, if you are using Nginx, you will probably want to use the crowdsecurity/nginx-logs parser, which allows your CrowdSec setup to parse Nginx's access and error logs.
Parsers are organized into stages to allow pipelines and branching in parsing.
See the Hub to explore parsers.
You can also write your own!
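To illustrate the GROK idea (named patterns that expand into regular expressions with named capture groups), the sketch below expands `%{PATTERN:field}` references using Python's `re` module. The three-entry pattern table is a hypothetical, simplified subset written for this example; real GROK libraries ship much larger and more precise patterns, and this is not how CrowdSec compiles its parsers internally.

```python
import re

# Tiny, hypothetical subset of a GROK pattern library.
PATTERNS = {
    "WORD": r"\w+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",   # IPv4 only, for brevity
    "INT": r"\d+",
}

GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(grok: str) -> re.Pattern:
    """Expand each %{PATTERN:field} into a named capture group (?P<field>...)."""
    expanded = GROK_REF.sub(
        lambda m: f"(?P<{m.group(2)}>{PATTERNS[m.group(1)]})", grok
    )
    return re.compile(expanded)


SSHD_FAILED = grok_to_regex(
    "Failed password for %{WORD:user} from %{IP:source_ip} port %{INT:port}"
)

match = SSHD_FAILED.search("Failed password for root from 192.0.2.10 port 2222 ssh2")
print(match.groupdict())  # {'user': 'root', 'source_ip': '192.0.2.10', 'port': '2222'}
```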
Enrichment is a parser that adds extra context to a log event so that CrowdSec can later make a better decision. In most cases, you should be able to find the relevant enrichers on our Hub.
A common, simple type of enrichment is geoip-enrich, which adds information such as the origin country, origin Autonomous System and origin IP range to an event.
Once again, you should be able to find the ones you're looking for on the Hub!
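A minimal sketch of what an enrichment step does, using Python's `ipaddress` module. The lookup table is hypothetical: it uses documentation prefixes (RFC 5737) and documentation AS numbers (RFC 5398) instead of a real GeoIP/ASN database, and `geoip_enrich` is an illustrative helper, not the actual geoip-enrich parser.

```python
import ipaddress

# Hypothetical lookup table standing in for a GeoIP/ASN database.
# Documentation prefixes (RFC 5737) and documentation ASNs (RFC 5398) only.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), {"country": "FR", "as_number": 64500}),
    (ipaddress.ip_network("198.51.100.0/24"), {"country": "US", "as_number": 64501}),
]


def geoip_enrich(event: dict) -> dict:
    """Return a copy of the event with origin country, AS and IP range added."""
    ip = ipaddress.ip_address(event["source_ip"])
    for network, info in GEO_TABLE:
        if ip in network:
            return {**event, **info, "ip_range": str(network)}
    return dict(event)  # unknown address: no extra context


print(geoip_enrich({"source_ip": "192.0.2.10"}))
# {'source_ip': '192.0.2.10', 'country': 'FR', 'as_number': 64500, 'ip_range': '192.0.2.0/24'}
```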
A scenario is the expression of a heuristic that allows you to qualify a specific event (usually an attack). It is a YAML file that describes a set of events characterizing a scenario. Scenarios in CrowdSec gravitate around the leaky bucket principle.
A scenario description includes at least:
- Event eligibility rules. For example, if we're writing an ssh brute-force detection, we only focus on logs of failed SSH authentications
- Bucket configuration, such as the leak speed or its capacity (in our same ssh brute-force example, we might allow 1 failed auth per 10s and no more than 5 in a short amount of time)
- Aggregation rules: per source IP or other criteria (in our ssh brute-force example, we will group per source IP)
The description allows many other rules to be specified (blackhole, distinct filters, etc.), enabling rather complex scenarios.
See the Hub to explore scenarios and their capabilities.
You can also write your own!
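The leaky bucket principle and the ssh brute-force example above (capacity 5, one failed auth leaking every 10s, one bucket per source IP) can be sketched in Python as follows. This is our simplified reading of the principle, not CrowdSec's bucket implementation: the `failed_auth` event flag and the `toy/ssh-bf` name are hypothetical, levels leak continuously, and blackhole or distinct filters are not modeled, so a full bucket keeps overflowing until it leaks.

```python
from collections import defaultdict
from typing import Optional


class LeakyBucket:
    """Holds up to `capacity` events and leaks one every `leak_speed` seconds."""

    def __init__(self, capacity: int, leak_speed: float) -> None:
        self.capacity = capacity
        self.leak_speed = leak_speed
        self.level = 0.0
        self.last_ts: Optional[float] = None

    def pour(self, ts: float) -> bool:
        """Add one event at time `ts` (seconds); return True if the bucket overflows."""
        if self.last_ts is not None:
            leaked = (ts - self.last_ts) / self.leak_speed
            self.level = max(0.0, self.level - leaked)
        self.last_ts = ts
        if self.level + 1 > self.capacity:
            return True  # already full: this event overflows
        self.level += 1
        return False


class SshBruteForce:
    """Toy scenario: eligibility filter + bucket config + per-IP aggregation."""

    def __init__(self, capacity: int = 5, leak_speed: float = 10.0) -> None:
        self.buckets = defaultdict(lambda: LeakyBucket(capacity, leak_speed))

    def process(self, event: dict) -> Optional[dict]:
        if not event.get("failed_auth"):  # eligibility rule (hypothetical flag)
            return None
        bucket = self.buckets[event["source_ip"]]  # aggregation per source IP
        if bucket.pour(event["ts"]):
            return {"scenario": "toy/ssh-bf", "source_ip": event["source_ip"]}
        return None


fast = SshBruteForce()
results = [fast.process({"failed_auth": True, "source_ip": "192.0.2.10", "ts": t})
           for t in range(6)]
print(results[-1])  # {'scenario': 'toy/ssh-bf', 'source_ip': '192.0.2.10'}
```

With six attempts one second apart, the sixth one overflows; an attacker pacing one attempt every 10 seconds never fills the bucket.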
To make users' lives easier, "collections" are available; they are simply bundles of parsers and scenarios.
This way, if you want to cover basic use cases of, let's say, Nginx, you can just install the crowdsecurity/nginx collection, which is composed of the crowdsecurity/nginx-logs parser as well as generic HTTP scenarios.
As usual, these can be found on the Hub!
A postoverflow is a parser that is applied to overflows (scenario results) before the decision is written to the local database or pushed to the API. Parsers in postoverflows are meant to be used for "expensive" enrichment/parsing processes that you do not want to perform on all incoming events, but rather on a decision that is about to be taken.
Examples include a Slack/Mattermost enrichment plugin that requires human confirmation before the decision is applied, or reverse-DNS lookup operations.
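The benefit of postoverflows is that costly work runs once per overflow rather than once per log line. The Python sketch below attaches a reverse-DNS name to an overflow using the standard `socket.gethostbyaddr`; the `postoverflow` helper and the overflow dictionary layout are hypothetical, and the resolver is injectable so the example can run without network access.

```python
import socket
from typing import Callable, Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Reverse lookup through the system resolver; potentially slow, hence postoverflow."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def postoverflow(overflow: dict,
                 resolver: Callable[[str], Optional[str]] = reverse_dns) -> dict:
    """Hypothetical postoverflow: enrich one overflow before its decision is stored."""
    return {**overflow, "reverse_dns": resolver(overflow["source_ip"])}


# Usage with a fake resolver, so nothing goes over the network.
def fake_resolver(ip: str) -> Optional[str]:
    return "scanner.example.net" if ip == "192.0.2.10" else None


overflow = {"scenario": "toy/ssh-bf", "source_ip": "192.0.2.10"}
print(postoverflow(overflow, resolver=fake_resolver)["reverse_dns"])  # scanner.example.net
```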
An Event is the runtime representation of an item being processed by CrowdSec: it can be a log line being parsed, or an overflow being reprocessed.
The Event object is modified by parsers, by scenarios, and directly by user-defined expressions (for example, statics).
An Alert is the runtime representation of a bucket overflow being processed by CrowdSec: it is embedded in an Event.
The Alert object is modified by postoverflows and profiles.
A Decision represents the consequence of a bucket overflow: a decision against an IP address, an IP range, an AS, a country, a user, a session, etc.
Decisions are generated by the Local API (LAPI) when an Alert is received, according to the existing profiles.
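The relationship between these three objects can be sketched with Python dataclasses. The class and field names are hypothetical simplifications, not CrowdSec's actual types. Profiles are modeled loosely on our understanding of profiles.yaml (a name, filters, decisions with a type and a duration, and on_success: break, approximated here as "first matching profile wins"); the 4-hour ban is an example value.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, List, Optional


@dataclass
class Alert:
    scenario: str
    scope: str   # e.g. "Ip", "Range"
    value: str   # e.g. the offending IP


@dataclass
class Event:
    line: str                      # a log line being parsed...
    alert: Optional[Alert] = None  # ...or, after an overflow, the embedded Alert


@dataclass
class Decision:
    type: str
    scope: str
    value: str
    duration: timedelta


@dataclass
class Profile:
    name: str
    filter: Callable[[Alert], bool]
    decision_type: str
    duration: timedelta


def decisions_for(alert: Alert, profiles: List[Profile]) -> List[Decision]:
    """Toy LAPI logic: the first profile whose filter matches produces the decision."""
    for profile in profiles:
        if profile.filter(alert):
            return [Decision(profile.decision_type, alert.scope, alert.value, profile.duration)]
    return []  # no matching profile: the alert is kept, but no decision is taken


PROFILES = [
    Profile("ip_remediation", lambda a: a.scope == "Ip", "ban", timedelta(hours=4)),
]

evt = Event(line="sshd[42]: Failed password ...", alert=Alert("toy/ssh-bf", "Ip", "192.0.2.10"))
print(decisions_for(evt.alert, PROFILES))
```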