Version: Next

Remediation Component Metrics

Overview

This document provides a comprehensive guide for developers to implement the "Remediation Metrics" feature in a remediation component. The remediation metrics feature allows remediation components to report raw metrics about their activity to the Local API (LAPI), which can then be forwarded to the Central API (CAPI) for monitoring and analytics purposes.

The remediation component should send the following data:

"dropped" metrics: the total number of units (bytes, packets or requests) to which a remediation (ban, captcha, etc.) has been applied. For this metric, data must be split into origin/remediation pairs.
"processed" metrics: the total number of units processed by the remediation component. This total must also include "bypass" units (i.e. units to which no remediation was applied).
"active_decisions" metrics: the number of decisions currently known by the remediation component.

Additionally, some relevant time values must be sent:

"window_size_seconds": The time interval between metric reports (typically 1800 seconds / 30 minutes). We recommend a minimum delay of 15 minutes between transmissions.
"utc_startup_timestamp": When the remediation component started. Its exact meaning depends on the implementation:
- For daemon bouncers: the timestamp when the daemon process started
- For "on-demand" bouncers such as the PHP one: the timestamp of the first LAPI call/pull

As an example, here is the kind of payload you will need to build and send:

Metrics Payload example

{
    "remediation_components": [{
        "name": "my-bouncer",
        "type": "crowdsec-custom-bouncer",
        "version": "1.0.0",
        "feature_flags": [],
        "utc_startup_timestamp": 1704067200,
        "os": {
            "name": "linux",
            "version": "5.4.0"
        },
        "metrics": {
            "meta": {
                "window_size_seconds": 1800,
                "utc_now_timestamp": 1704069000
            },
            "items": [
                {
                    "name": "dropped",
                    "value": 150,
                    "unit": "request",
                    "labels": {
                        "origin": "CAPI",
                        "remediation": "ban"
                    }
                },
                {
                    "name": "dropped",
                    "value": 25,
                    "unit": "request", 
                    "labels": {
                        "origin": "cscli",
                        "remediation": "ban"
                    }
                },
                {
                    "name": "dropped",
                    "value": 12,
                    "unit": "request", 
                    "labels": {
                        "origin": "cscli",
                        "remediation": "captcha"
                    }
                },
                {
                    "name": "processed",
                    "value": 1175,
                    "unit": "request"
                },
                {
                    "name": "active_decisions",
                    "value": 342010
                }
            ]
        }
    }]
}

For more details on valid payloads, please refer to the API specification.
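The items in a payload are related: every dropped unit was also processed, so "processed" can never be smaller than the sum of the "dropped" values. As a minimal sketch (the function `check_payload` is hypothetical, not part of any CrowdSec library, and it checks only a handful of properties rather than the full API specification), a bouncer could run a sanity check like this before sending:

```python
from typing import Any


def check_payload(payload: dict[str, Any]) -> list[str]:
    """Return the problems found in a usage-metrics payload (an empty list if none)."""
    problems: list[str] = []
    for component in payload.get("remediation_components", []):
        metrics = component.get("metrics", {})
        meta = metrics.get("meta", {})
        items = metrics.get("items", [])

        # The report cannot be dated before the component started
        if meta.get("utc_now_timestamp", 0) < component.get("utc_startup_timestamp", 0):
            problems.append("utc_now_timestamp is earlier than utc_startup_timestamp")

        # Each "dropped" item should carry both an origin and a remediation label
        for item in items:
            if item.get("name") == "dropped":
                labels = item.get("labels", {})
                if "origin" not in labels or "remediation" not in labels:
                    problems.append("dropped item without origin/remediation labels")

        # "processed" counts every unit, dropped or bypassed, so it must cover "dropped"
        dropped = sum(i["value"] for i in items if i.get("name") == "dropped")
        processed = sum(i["value"] for i in items if i.get("name") == "processed")
        if processed < dropped:
            problems.append(f"processed ({processed}) < dropped ({dropped})")
    return problems
```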

Architecture Overview

Key Features

Implementing remediation metrics involves several capabilities:

Metrics Storage:
- Store "remediation by origin" counters and relevant time values in persistent storage.
- Update or delete stored values.
Metrics Building:
- Retrieve metrics from storage.
- Format metrics according to the API specification.
Metrics Transmission:
- Send metrics to the LAPI usage-metrics endpoint.
- Subtract the sent values from the stored counters so that the next push only reports fresh metrics.

Core Concepts

Origins: The source of a remediation (e.g., CAPI, lists:***, cscli).

Since we want to track the total number of processed units, we also need to count "bypass" remediations. For this purpose, you can use the clean and clean_appsec origins to track bypass remediations for regular and AppSec traffic respectively.
Remediations: The final action effectively applied by the remediation component (e.g., "ban", "captcha", "bypass").

The remediation stored in metrics must be the final remediation effectively applied by the bouncer, not the original decision from CrowdSec. Examples:
- Captcha Resolution: If the original decision was "captcha" but the user has already solved the captcha and can access the page, store "bypass" as the final remediation.
- Remediation Transformation: If the original decision was "ban" but the bouncer configuration transforms it to "captcha" (and the user hasn't solved it yet), store "captcha" as the final remediation.
- Fallback Scenarios: If a timeout occurs and the bouncer applies a fallback remediation, store the fallback remediation, not the original intended one.
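The rule above can be sketched as a small pure function. Everything in it is a hypothetical illustration: the parameter names `captcha_solved`, `ban_as_captcha`, `timed_out` and `fallback` stand in for whatever your bouncer configuration and session state actually provide.

```python
from typing import Optional


def final_remediation(
    decision: Optional[str],
    captcha_solved: bool = False,
    ban_as_captcha: bool = False,
    timed_out: bool = False,
    fallback: str = "bypass",
) -> str:
    """Return the remediation actually applied, which is what metrics must record."""
    # LAPI could not be reached in time: the configured fallback is what gets applied
    if timed_out:
        return fallback
    # No decision for this IP: the unit is let through
    if decision is None:
        return "bypass"
    # Configuration may turn a ban into a captcha
    if decision == "ban" and ban_as_captcha:
        decision = "captcha"
    # A captcha the user has already solved no longer blocks anything
    if decision == "captcha" and captcha_solved:
        return "bypass"
    return decision
```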

Implementation Guide

1. Storage

1.1 Cached Items

Every time the remediation component handles a unit (request, packet, etc.), it should persist the following data in storage:

origin and remediation
time values

For example, you could have the following cached items:

TIME_VALUES = {
    "utc_startup_timestamp": <timestamp>,      // When the bouncer was started or used for the first time
    "last_metrics_sent": <timestamp>,    // Last successful metrics transmission
}

ORIGINS_COUNT = {
    "<origin>": {
        "<remediation>": <count>
    }
}

Storing a last_metrics_sent value makes it easy to compute the window_size_seconds value.
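As one possible persistent storage, the sketch below keeps both structures in a single JSON file and derives window_size_seconds from last_metrics_sent. The class name `MetricsStore` and the file layout are assumptions made for illustration; a daemon bouncer might simply keep these values in memory, while an on-demand bouncer needs storage that survives between requests (a file, a cache server, a database).

```python
import json
import os
import time
from typing import Optional


class MetricsStore:
    """Persist TIME_VALUES and ORIGINS_COUNT in one JSON file (hypothetical layout)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data = {"TIME_VALUES": {}, "ORIGINS_COUNT": {}}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.data = json.load(fh)

    def save(self) -> None:
        # Write to a temporary file then rename, so a crash never leaves a half-written file
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            json.dump(self.data, fh)
        os.replace(tmp, self.path)

    def startup_timestamp(self, now: Optional[int] = None) -> int:
        # Recorded once: at daemon start, or at the first use of an on-demand bouncer
        times = self.data["TIME_VALUES"]
        if "utc_startup_timestamp" not in times:
            times["utc_startup_timestamp"] = int(now if now is not None else time.time())
            self.save()
        return times["utc_startup_timestamp"]

    def window_size_seconds(self, now: int) -> int:
        # Time elapsed since the last successful push, or since startup if none yet
        times = self.data["TIME_VALUES"]
        since = times.get("last_metrics_sent", times.get("utc_startup_timestamp", now))
        return max(0, now - since)

    def set_last_metrics_sent(self, now: int) -> None:
        self.data["TIME_VALUES"]["last_metrics_sent"] = now
        self.save()
```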

1.2 Metrics Tracking

Once you know the final remediation that has been applied, you should increment the count of the related "origin/remediation" pair.

Below are a few lines of pseudo-code to help you visualize what the final implementation might look like.

function updateMetricsOriginsCount(origin: string, remediation: string, delta: int = 1): int
    // Get current count from cache
    currentCount = getFromCache("ORIGINS_COUNT[origin][remediation]") ?? 0
    
    // Update count (delta can be negative for decrementing)
    newCount = max(0, currentCount + delta)
    
    // Store updated count in cache
    storeInCache("ORIGINS_COUNT[origin][remediation]", newCount)
    
    return newCount
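For a daemon bouncer that keeps its counters in memory, updateMetricsOriginsCount maps naturally onto a nested dictionary guarded by a lock, since requests may be handled concurrently. The class name `OriginsCounter` and the use of `threading.Lock` are choices made for this sketch; a bouncer with persistent storage would read and write its cache instead.

```python
import threading
from collections import defaultdict


class OriginsCounter:
    """In-memory ORIGINS_COUNT: {origin: {remediation: count}}."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def update(self, origin: str, remediation: str, delta: int = 1) -> int:
        """Add delta (may be negative) to one origin/remediation pair, never going below 0."""
        with self._lock:
            new_count = max(0, self._counts[origin][remediation] + delta)
            self._counts[origin][remediation] = new_count
            return new_count

    def snapshot(self) -> dict[str, dict[str, int]]:
        # Plain-dict copy that can be iterated while other threads keep counting
        with self._lock:
            return {origin: dict(rems) for origin, rems in self._counts.items()}
```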

2. Metrics Building Process

In order to send metrics, you will have to retrieve the cached values and build the required payload.

2.1 Build Metrics Items

Most of the information lives in the metrics items:

function buildMetricsItems(originsCount: object): object
    metricsItems = []
    processedTotal = 0
    originsToDecrement = {}
    
    for each origin, remediationCounts in originsCount:
        for each remediation, count in remediationCounts:
            if count <= 0:
                continue

            // Track total processed units (bypass included)
            processedTotal += count

            // Prepare for decrementing after successful send
            originsToDecrement[origin][remediation] = -count
            
            // Skip bypass remediations in "dropped" metrics
            if remediation == "bypass":
                continue
                
            // Create "dropped" metric for blocked requests
            metricsItems.append({
                "name": "dropped",
                "value": count,
                "unit": getMetricUnit(), // "request", "packet", or other relevant unit
                "labels": {
                    "origin": origin,
                    "remediation": remediation
                }
            })
    
    // Add total processed metric
    if processedTotal > 0:
        metricsItems.append({
            "name": "processed",
            "value": processedTotal,
            "unit": getMetricUnit() // "request", "packet", or other relevant unit
        })
    
    // Add active_decisions metric (if supported)
    activeDecisions = getActiveDecisionsCount()
    if activeDecisions > 0:
        metricsItems.append({
            "name": "active_decisions",
            "value": activeDecisions,
        })
    
    return {
        "items": metricsItems,
        "originsToDecrement": originsToDecrement
    }

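Note that it is important to record the number sent for each origin/remediation pair so that it can be subtracted from the respective counter after a successful push. Subtracting, rather than resetting the counter to zero, preserves any units counted while the push was in progress.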
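A direct Python translation of buildMetricsItems is sketched below. As a simplification, the unit and the active-decision count are passed in as arguments instead of being fetched through getMetricUnit() and getActiveDecisionsCount(); the function name `build_metrics_items` is ours.

```python
def build_metrics_items(
    origins_count: dict[str, dict[str, int]],
    unit: str = "request",
    active_decisions: int = 0,
) -> tuple[list[dict], dict[str, dict[str, int]]]:
    """Return (items, origins_to_decrement), following the pseudo-code above."""
    items: list[dict] = []
    processed_total = 0
    origins_to_decrement: dict[str, dict[str, int]] = {}

    for origin, remediations in origins_count.items():
        for remediation, count in remediations.items():
            if count <= 0:
                continue
            # Every counted unit, bypass included, was processed
            processed_total += count
            # Remember exactly what is reported, to subtract it after a successful push
            origins_to_decrement.setdefault(origin, {})[remediation] = -count
            # Bypassed units only contribute to "processed"
            if remediation == "bypass":
                continue
            items.append({
                "name": "dropped",
                "value": count,
                "unit": unit,
                "labels": {"origin": origin, "remediation": remediation},
            })

    if processed_total > 0:
        items.append({"name": "processed", "value": processed_total, "unit": unit})
    if active_decisions > 0:
        items.append({"name": "active_decisions", "value": active_decisions})
    return items, origins_to_decrement
```

With counters such as {"CAPI": {"ban": 150}, "cscli": {"ban": 25, "captcha": 12}, "clean": {"bypass": 988}}, this yields the three "dropped" items of the example payload and a "processed" value of 1175.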

2.2 Build Complete Metrics Payload

In addition to the metrics items, the payload requires the bouncer properties and meta attributes:

function buildUsageMetrics(properties: object, meta: object, items: array): object
    // Prepare bouncer properties
    bouncerProperties = {
        "name": properties.name,
        "type": properties.type,
        "version": properties.version,
        "feature_flags": properties.feature_flags ?? [],
        "utc_startup_timestamp": properties.utc_startup_timestamp
    }
    
    // Add optional OS information
    if properties.os:
        bouncerProperties["os"] = {
            "name": properties.os.name,
            "version": properties.os.version
        }
    
    // Prepare metadata
    metricsMetadata = {
        "window_size_seconds": meta.window_size_seconds,
        "utc_now_timestamp": meta.utc_now_timestamp
    }
    
    // Build final payload
    return {
        "remediation_components": [{
            ...bouncerProperties,
            "metrics": {
                "meta": metricsMetadata,
                "items": items
            }
        }]
    }
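The same payload assembly in Python might look like the following sketch. The helper `os_information` is hypothetical: it fills the optional "os" object from the standard `platform` module, which is one reasonable source for it but not one mandated by the API.

```python
import platform
from typing import Any


def os_information() -> dict[str, str]:
    # e.g. {"name": "linux", "version": "5.4.0-..."} on a Linux host
    return {"name": platform.system().lower(), "version": platform.release()}


def build_usage_metrics(
    properties: dict[str, Any], meta: dict[str, int], items: list[dict]
) -> dict[str, Any]:
    component: dict[str, Any] = {
        "name": properties["name"],
        "type": properties["type"],
        "version": properties["version"],
        "feature_flags": properties.get("feature_flags") or [],
        "utc_startup_timestamp": properties["utc_startup_timestamp"],
    }
    # "os" is optional: only include it when it is known
    if properties.get("os"):
        component["os"] = {
            "name": properties["os"]["name"],
            "version": properties["os"]["version"],
        }
    component["metrics"] = {
        "meta": {
            "window_size_seconds": meta["window_size_seconds"],
            "utc_now_timestamp": meta["utc_now_timestamp"],
        },
        "items": items,
    }
    return {"remediation_components": [component]}
```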

3. Complete Push Metrics Implementation

function pushUsageMetrics(bouncerName: string, bouncerVersion: string, bouncerType: string): object
    // Get timing information
    startupTime = getFromCache("TIME_VALUES.utc_startup_timestamp")
    currentTime = getCurrentTimestamp()
    lastSent = getFromCache("TIME_VALUES.last_metrics_sent") ?? startupTime
    
    // Get current metrics
    originsCount = getOriginsCount()
    metricsData = buildMetricsItems(originsCount)
    
    // Return early if no metrics to send
    if metricsData.items.isEmpty():
        log("No metrics to send")
        return {}
    
    // Prepare properties and metadata
    properties = {
        "name": bouncerName,
        "type": bouncerType,
        "version": bouncerVersion,
        "utc_startup_timestamp": startupTime,
        "os": getOsInformation()
    }
    
    meta = {
        "window_size_seconds": max(0, currentTime - lastSent),
        "utc_now_timestamp": currentTime
    }
    
    // Build and send metrics
    metricsPayload = buildUsageMetrics(properties, meta, metricsData.items)
    
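    // Send to LAPI (which may forward to CAPI); this must raise an error on failure
    // so that the counters below are not decremented
    sendMetricsToAPI(metricsPayload)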
    
    // Decrement counters after successful send
    for origin, remediationCounts in metricsData.originsToDecrement:
        for remediation, deltaCount in remediationCounts:
            updateMetricsOriginsCount(origin, remediation, deltaCount)
    
    // Update last sent timestamp
    storeMetricsLastSent(currentTime)
    
    return metricsPayload
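A compact Python sketch of the whole push is shown below, with the HTTP call injected as `send` so that the transactional part (snapshot, send, then subtract only on success) can be exercised without a running LAPI. To the best of our knowledge, bouncers authenticate with an `X-Api-Key` header and the endpoint is `/v1/usage-metrics`, but treat both as assumptions to check against the API specification. All function names are hypothetical, and this sketch omits active_decisions and is not thread-safe.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional


def send_to_lapi(lapi_url: str, api_key: str, payload: dict[str, Any]) -> None:
    """POST the payload; urlopen raises HTTPError/URLError on failure."""
    req = urllib.request.Request(
        lapi_url.rstrip("/") + "/v1/usage-metrics",  # assumed path, check the API spec
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def push_usage_metrics(
    counts: dict[str, dict[str, int]],
    time_values: dict[str, int],
    properties: dict[str, Any],
    send: Callable[[dict[str, Any]], None],
    unit: str = "request",
    now: Optional[int] = None,
) -> dict[str, Any]:
    """Send the current counters; subtract what was sent only if send() did not raise."""
    now = int(time.time()) if now is None else now
    startup = time_values["utc_startup_timestamp"]
    last_sent = time_values.get("last_metrics_sent", startup)

    # Snapshot what will be reported (bypass counts toward "processed" only)
    sent = {o: {r: c for r, c in rs.items() if c > 0} for o, rs in counts.items()}
    items = [
        {"name": "dropped", "value": c, "unit": unit, "labels": {"origin": o, "remediation": r}}
        for o, rs in sent.items() for r, c in rs.items() if r != "bypass"
    ]
    processed = sum(c for rs in sent.values() for c in rs.values())
    if processed == 0:
        return {}  # nothing to send
    items.append({"name": "processed", "value": processed, "unit": unit})

    payload = {"remediation_components": [{
        **properties,
        "utc_startup_timestamp": startup,
        "metrics": {
            "meta": {"window_size_seconds": max(0, now - last_sent), "utc_now_timestamp": now},
            "items": items,
        },
    }]}
    send(payload)  # if this raises, counters and last_metrics_sent stay untouched

    # Subtract what was reported rather than resetting to zero
    for o, rs in sent.items():
        for r, c in rs.items():
            counts[o][r] = max(0, counts[o][r] - c)
    time_values["last_metrics_sent"] = now
    return payload
```

In production, `send` could be `lambda p: send_to_lapi("http://127.0.0.1:8080", api_key, p)`, where the URL and key come from your bouncer configuration.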

Useful Tips

When to Update Metrics

Call updateMetricsOriginsCount() after each final remediation (including "bypass") has been effectively applied:

// After determining and applying the final remediation
initialRemediation = getRemediationForIP(clientIP)
origin = initialRemediation.origin
finalAction = applyBouncerLogic(initialRemediation.action)

// Increment the counter with the final action
updateMetricsOriginsCount(origin, finalAction, 1)

When to Push Metrics

Metrics are typically pushed on a scheduled interval (e.g., every 30 minutes, and no more often than every 15 minutes):

// In your scheduled metrics push job
try:
    sentMetrics = pushUsageMetrics("my-bouncer", "1.0.0", "crowdsec-custom-bouncer")
    if sentMetrics.isEmpty():
        log("No metrics were sent")
    else:
        log("Successfully sent metrics", sentMetrics)
catch Exception as e:
    log("Failed to send metrics", e)
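For a daemon bouncer, the scheduled job can be a simple loop on a background thread. In the sketch below, the function name `run_metrics_pusher` and its parameters are hypothetical; the interval is clamped to the recommended 15-minute minimum by default. An on-demand bouncer has no long-running loop and would need another trigger (for example a cron job), which is outside the scope of this sketch.

```python
import threading
from typing import Any, Callable


def run_metrics_pusher(
    push: Callable[[], Any],
    stop: threading.Event,
    interval_seconds: float = 1800,
    min_interval_seconds: float = 15 * 60,
    log: Callable[..., None] = print,
) -> None:
    """Call push() every interval until stop is set; a failed push is logged, not fatal."""
    # Never push more often than the recommended minimum delay
    interval = max(interval_seconds, min_interval_seconds)
    # Event.wait() returns True as soon as stop is set, so shutdown is immediate
    while not stop.wait(interval):
        try:
            sent = push()
            log("Successfully sent metrics" if sent else "No metrics were sent")
        except Exception as exc:
            # If push() only decrements counters after a successful send (as in
            # pushUsageMetrics), the unsent counts simply go out on the next run
            log("Failed to send metrics", exc)
```

It could be started with `threading.Thread(target=run_metrics_pusher, args=(push, stop), daemon=True).start()`, calling `stop.set()` on shutdown.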

Existing Implementations

Remediation metrics have already been implemented in various languages and frameworks. You can use them as inspiration for your own implementation:

The Lua library used by the NGINX remediation component
The PHP library used by the WordPress remediation component.
The Firewall Bouncer written in Go. Used for nftables/iptables.
