Introduction to EFK Stack for Kubernetes Logging
In Kubernetes environments, especially those running microservices architectures (MSA), managing logs can become complex. Containers are ephemeral, and pods can be deleted or rescheduled, leading to potential log data loss if not handled properly. To address these challenges, implementing a centralized logging system is crucial. The EFK stack (Elasticsearch, Fluentd/Fluentbit, Kibana) is a popular choice for building a robust and scalable logging solution for Kubernetes. This article focuses on using Fluentbit within the EFK stack to collect, process, and forward logs from your Kubernetes cluster.
Benefits of using EFK with Fluentbit:
- Log Persistence: Even when pods are deleted, container logs are preserved as they are shipped to a central storage.
- Centralized Log Management: Aggregates logs from all nodes and pods in one location, simplifying log analysis and troubleshooting in MSA environments.
- Scalability and High Availability: The EFK stack is designed to handle large volumes of log data, offering scalability and high availability.
- Real-time Log Analysis and Visualization: Elasticsearch enables fast searching, analysis, and storage of logs, while Kibana provides a web interface to visualize and explore log data through charts and dashboards.
- Versatile Data Handling: Suitable for both log and metric data, allowing for unified monitoring and observability.
- Cluster-wide Log Collection: Efficiently gathers container logs from all nodes within the Kubernetes cluster.
- Resource Monitoring: Can also collect CPU, memory, and disk usage metrics from Kubernetes nodes, offering a holistic view of cluster health.
Understanding Fluentbit
Fluentbit acts as the log forwarder in the EFK stack. It is a lightweight, efficient, open-source log processor and forwarder that collects data and logs from different sources, unifies them, and sends them to multiple destinations.
Fluentbit operates through a pipeline architecture:
INPUT -> (PARSER) -> FILTER -> BUFFER -> (ROUTER) -> OUTPUT
This pipeline defines how Fluentbit processes log data:
- INPUT: Defines the sources from which Fluentbit collects logs and metrics. In Kubernetes, this is typically container logs. Refer to the official Fluentbit documentation for Inputs.
- PARSER: Transforms unstructured log data into a structured format. Parsers use regular expressions to extract relevant fields from log messages. For Docker container logs, the `docker` parser is commonly used to format logs into JSON, with the log message stored under the `log` key. See the Fluentbit Parsers documentation for more details.
- FILTER: Allows you to modify or filter log records based on specific criteria. Filters can be used to include only relevant logs, add metadata, or apply further parsing.
- BUFFER: Acts as a temporary storage for logs, handling fluctuations in input and output speeds. Buffers can be configured to use memory or file storage. Logs are grouped into chunks based on tags.
- OUTPUT: Specifies the destination where processed logs are sent. Fluentbit supports various outputs, including Elasticsearch (ES), HTTP endpoints, cloud services like Cloudwatch, S3, and Firehose, and message queues like Logstash. Consult the Fluentbit Outputs documentation for a complete list.
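As a minimal sketch of how these stages map onto a configuration file (assuming Fluentbit's classic configuration syntax; the Elasticsearch host below is a placeholder, not from the original article):

```ini
[SERVICE]
    Flush        1
    Parsers_File parsers.conf

[INPUT]                      # INPUT stage: tail container log files
    Name    tail
    Path    /var/log/containers/*.log
    Parser  docker           # PARSER stage: structure each line as JSON
    Tag     kube.*

[FILTER]                     # FILTER stage: enrich records with pod metadata
    Name   kubernetes
    Match  kube.*

[OUTPUT]                     # OUTPUT stage: ship records to Elasticsearch
    Name   es
    Match  kube.*
    Host   elasticsearch.logging.svc   # placeholder host
    Port   9200
```

Records flow top to bottom: each tailed line is parsed, tagged, enriched by matching filters, buffered, and then routed by tag to every matching output.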
Fluentbit Configuration Options:
Fluentbit Option | Description | Notes |
---|---|---|
INPUT | Defines the source of log or metric events (records) that Fluentbit will collect and process in the pipeline. | Official Documentation |
PARSER | Converts unstructured log data into structured data, often using regular expressions. The `docker` parser is crucial for standardizing Docker container logs into JSON. | Official Documentation |
FILTER | Enables filtering and modification of log events. You can select specific logs or enrich data with additional parsing or metadata. | |
BUFFER | Provides temporary storage to manage differences in processing speeds between input and output stages. Buffers can be memory-based or file-based, and are organized into chunks per tag. | |
OUTPUT | Determines the destination for processed log records. Fluentbit offers a wide range of output plugins, including Elasticsearch, HTTP, and cloud storage solutions. | Official Documentation |
Kubernetes Cluster Deployment for Fluentbit
To deploy Fluentbit in your Kubernetes cluster, you will need the following YAML configuration files. Apply them in the order listed, typically within a dedicated `logging` namespace. These configurations define how Fluentbit collects, filters, parses, and forwards logs based on the options specified in the ConfigMap.
Deployment Files:
- `Service-account.yaml`:
  - Creates a dedicated service account for Fluentbit.
  - Service accounts are essential for assigning specific roles and permissions to applications running in Kubernetes.

  ```yaml
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: fluent-bit
    namespace: logging
  ```
- `Role.yaml`:
  - Defines a ClusterRole named `fluent-bit-read` with restricted permissions.
  - This role grants Fluentbit read-only access to Kubernetes namespaces and pods, necessary for log collection.
  - Role-Based Access Control (RBAC) is implemented for security, ensuring Fluentbit only accesses required resources. Learn more about RBAC in Kubernetes.

  ```yaml
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: fluent-bit-read
  rules:
  - apiGroups: [""]
    resources:
    - namespaces
    - pods
    verbs: ["get", "list", "watch"]
  ```
- `Role-binding.yaml`:
  - Binds the `fluent-bit-read` ClusterRole to the `fluent-bit` service account.
  - This ClusterRoleBinding grants the defined permissions to the Fluentbit application running with the specified service account.

  ```yaml
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: fluent-bit-read
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: fluent-bit-read
  subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging
  ```
- `ConfigMap.yaml`:
  - Contains the core configuration for Fluentbit's behavior.
  - Defines settings for the input, filtering, parsing, and output stages of the Fluentbit pipeline.
  - In this example, it includes configurations for collecting Krakend API Gateway logs and Kubernetes metadata.
  - The `ConfigMap` allows for flexible configuration management of Fluentbit without modifying the DaemonSet directly.

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: fluent-bit-config
    namespace: logging
    labels:
      k8s-app: fluent-bit
  data:
    # Configuration files: server, input, filters and output
    # ======================================================
    fluent-bit.conf: |
      [SERVICE]
          Flush         1
          Log_Level     info
          Daemon        off
          Parsers_File  parsers.conf
          HTTP_Server   On
          HTTP_Listen   0.0.0.0
          HTTP_Port     2020

      [OUTPUT]
          Name   stdout
          Match  *

      [OUTPUT]
          Name    http
          Match   *
          Host    amazonaws.com
          Port    8080
          Format  json

      @INCLUDE input-kubernetes.conf
      @INCLUDE filter-kubernetes.conf

    input-kubernetes.conf: |
      [INPUT]
          Name              tail
          Tag               kube.*
          #Path             /var/log/containers/*.log       # Collect all container logs
          Path              /var/log/containers/hello*.log  # Collect logs only from containers with names starting with 'hello'
          Parser            docker
          DB                /var/log/flb_kube.db
          Mem_Buf_Limit     5MB
          Skip_Long_Lines   On
          Refresh_Interval  10

    filter-kubernetes.conf: |
      [FILTER]
          Name                kubernetes
          Match               kube.*
          Kube_URL            https://kubernetes.default.svc:443
          Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
          Kube_Tag_Prefix     kube.var.log.containers.
          Merge_Log           On
          Merge_Log_Key       log_processed
          K8S-Logging.Parser  On
          K8S-Logging.Exclude Off

    parsers.conf: |
      [PARSER]
          Name    apache
          Format  regex
          Regex   ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
  ```
- `DaemonSet.yaml`:
  - Deploys Fluentbit as a DaemonSet, ensuring one Fluentbit pod runs on each Kubernetes node.
  - This is crucial for collecting logs from every node in the cluster efficiently.
  - DaemonSets are ideal for node-level agents like log collectors.

  ```yaml
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: fluent-bit
    namespace: logging
    labels:
      k8s-app: fluent-bit-logging
      version: v1
      kubernetes.io/cluster-service: "true"
  spec:
    selector:
      matchLabels:
        k8s-app: fluent-bit-logging
    template:
      metadata:
        labels:
          k8s-app: fluent-bit-logging
          version: v1
          kubernetes.io/cluster-service: "true"
        annotations:
          prometheus.io/scrape: "true"
          prometheus.io/port: "2020"
          prometheus.io/path: /api/v1/metrics/prometheus
      spec:
        containers:
        - name: fluent-bit
          image: fluent/fluent-bit:1.5
          imagePullPolicy: Always
          ports:
          - containerPort: 2020
          env:
          - name: FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch"  # Replace with your Elasticsearch host
          - name: FLUENT_ELASTICSEARCH_PORT
            value: "9200"           # Replace with your Elasticsearch port
          volumeMounts:
          - name: varlog
            mountPath: /var/log
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
            readOnly: true
          - name: fluent-bit-config
            mountPath: /fluent-bit/etc/
        terminationGracePeriodSeconds: 10
        volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
        serviceAccountName: fluent-bit
        tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - operator: "Exists"
          effect: "NoExecute"
        - operator: "Exists"
          effect: "NoSchedule"
  ```
Detailed Fluentbit Configuration
By default, Fluentbit, when configured according to standard guides, collects logs from all pods within the Kubernetes cluster. A key aspect of Kubernetes logging is that container logs are stored on the node's host filesystem under `/var/log/containers/{POD_NAME}-{UID}.log`. Kubernetes also employs log rotation (`LogRotate`) to manage log file sizes and prevent disk space exhaustion.
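For context, each of these files is written in the Docker `json-file` log driver format: one JSON object per line, with the original message stored under the `log` key. The `docker` parser essentially performs this decoding. A small Python sketch with a made-up sample line (the line content is illustrative, not from a real cluster):

```python
import json

# Hypothetical line as written by Docker's json-file logging driver.
# The container's original output lives under the "log" key.
raw_line = '{"log":"GET /health 200\\n","stream":"stdout","time":"2021-01-01T00:00:00.000000000Z"}'

# Decoding the JSON is effectively what Fluentbit's `docker` parser does,
# turning an unstructured line into a structured record.
record = json.loads(raw_line)
print(record["log"].rstrip())  # the application's log message
print(record["stream"])        # stdout or stderr
```

After parsing, downstream filters (such as the `kubernetes` filter) can enrich this record with pod and namespace metadata.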
Customizing Data Collection (Input):
```ini
@INCLUDE input-kubernetes.conf

input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        #Path             /var/log/containers/*.log       # Collect all container logs
        Path              /var/log/containers/hello*.log  # Collect logs only from containers with names starting with 'hello'
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
```
- The `input-kubernetes.conf` file configures the log input source.
- Typically, `kubectl logs -f POD_NAME -n logging` uses the Kubernetes API to tail the log file `/var/log/containers/{POD_NAME}-{UID}.log`.
- The `Path` parameter in the `[INPUT]` section specifies which log files Fluentbit should monitor.
  - By default, `Path /var/log/containers/*.log` would collect logs from all containers.
  - To collect logs from specific services, you can narrow down the path. In this example, `Path /var/log/containers/hello*.log` collects logs only from containers whose names start with "hello". This is useful for focusing on specific application logs, for example targeting a service named "hello-lg" via files matching "hello-lg-*.log".
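The `Path` pattern uses glob-style matching, so you can check offline which files a pattern such as `hello*.log` would pick up. A quick sketch (the file names below are hypothetical examples of how container log files are named):

```python
from fnmatch import fnmatch

# Hypothetical file names as they might appear under /var/log/containers/
files = [
    "hello-lg-7d4f9_default_hello-abc123.log",
    "hello-world-5b6c7_default_hello-def456.log",
    "nginx-ingress-2a1b3_kube-system_nginx-789abc.log",
]

# Same glob semantics as the tail input's Path parameter
matched = [f for f in files if fnmatch(f, "hello*.log")]
print(matched)  # only the two files starting with 'hello' match
```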
Configuring Data Output (Output):
```ini
# Output to stdout for debugging
[OUTPUT]
    Name   stdout
    Match  *

# Output to an HTTP endpoint (e.g., Logstash)
[OUTPUT]
    Name    http
    Match   *
    Host    amazonaws.com   # Replace with your Logstash or HTTP endpoint host
    Port    8080            # Replace with your Logstash or HTTP endpoint port
    Format  json
```
- The `[OUTPUT]` section defines where Fluentbit sends the processed logs.
- The first `[OUTPUT]` block configures `stdout` output, which is useful for debugging and verifying that Fluentbit is collecting and processing logs correctly. Logs are printed to the Fluentbit container's standard output.
- The second `[OUTPUT]` block configures `http` output, designed to send logs to an HTTP endpoint, such as Logstash.
  - The `Host` and `Port` parameters should point to your Logstash instance or any other HTTP endpoint that can receive logs.
  - `Format json` specifies that logs should be sent in JSON format.
  - Note that using HTTP output for Logstash (as in the example with `amazonaws.com:8080`) is less common than direct integration via the Elasticsearch output plugin. HTTP output typically requires an additional pipeline configuration in Logstash to handle incoming HTTP requests. For direct Elasticsearch integration, the `es` output plugin is recommended for better performance and features.
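For direct delivery to Elasticsearch, a sketch of an `es` output block is shown below; the host and index name are placeholders to adapt to your environment:

```ini
[OUTPUT]
    Name            es
    Match           kube.*
    Host            elasticsearch.logging.svc  # placeholder Elasticsearch host
    Port            9200
    Index           fluent-bit                 # placeholder index name
    Logstash_Format On                         # write time-based indices (logstash-YYYY.MM.DD)
    Replace_Dots    On                         # replace dots in field names for ES compatibility
```

With `Logstash_Format On`, the `Index` setting is superseded by daily `logstash-*` indices, which pair naturally with Kibana index patterns.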
Results and Verification
Previously, without targeted filtering, logs from `kube-system` and other pods might have been mixed in, making it difficult to isolate specific application logs. By customizing the Fluentbit ConfigMap, specifically the `Path` parameter in `input-kubernetes.conf`, you can ensure that only the logs from your target services are collected. After deploying these configurations, verify that Fluentbit is indeed collecting and forwarding only the intended logs. Check the Fluentbit pod logs (using `kubectl logs -n logging <fluentbit-pod-name>`) and your configured output destination (e.g., Elasticsearch or Logstash) to confirm that logs from your target services are being successfully processed and delivered.
References
Resource | URL |
---|---|
(Official) Fluentbit Kubernetes Guide | https://docs.fluentbit.io/manual/installation/kubernetes |
EFK-Fluentbit Usage Guide (Korean) | https://frozenpond.tistory.com/201 |