Centralized Kubernetes Logging with Fluentbit: A Step-by-Step Guide

Introduction to EFK Stack for Kubernetes Logging

In Kubernetes environments, especially those running microservices architectures (MSA), managing logs can become complex. Containers are ephemeral, and pods can be deleted or rescheduled, leading to potential log data loss if not handled properly. To address these challenges, implementing a centralized logging system is crucial. The EFK stack (Elasticsearch, Fluentd/Fluentbit, Kibana) is a popular choice for building a robust and scalable logging solution for Kubernetes. This article focuses on using Fluentbit within the EFK stack to collect, process, and forward logs from your Kubernetes cluster.

Benefits of using EFK with Fluentbit:

  • Log Persistence: Even when pods are deleted, container logs are preserved because they have already been shipped to central storage.
  • Centralized Log Management: Aggregates logs from all nodes and pods in one location, simplifying log analysis and troubleshooting in MSA environments.
  • Scalability and High Availability: The EFK stack is designed to handle large volumes of log data, offering scalability and high availability.
  • Real-time Log Analysis and Visualization: Elasticsearch enables fast searching, analysis, and storage of logs, while Kibana provides a web interface to visualize and explore log data through charts and dashboards.
  • Versatile Data Handling: Suitable for both log and metric data, allowing for unified monitoring and observability.
  • Cluster-wide Log Collection: Efficiently gathers container logs from all nodes within the Kubernetes cluster.
  • Resource Monitoring: Can also collect CPU, memory, and disk usage metrics from Kubernetes nodes, offering a holistic view of cluster health.

Understanding Fluentbit

Fluentbit acts as the log forwarder in the EFK stack. It is a lightweight, efficient, open-source log processor and forwarder that lets you collect data and logs from different sources, unify them, and send them to multiple destinations.

Fluentbit operates through a pipeline architecture:

INPUT -> (PARSER) -> FILTER -> BUFFER -> (ROUTER) -> OUTPUT

This pipeline defines how Fluentbit processes log data:

  • INPUT: Defines the sources from which Fluentbit collects logs and metrics. In Kubernetes, this is typically container logs. Refer to the official Fluentbit documentation for Inputs.
  • PARSER: Transforms unstructured log data into a structured format. Parsers use regular expressions to extract relevant fields from log messages. For Docker container logs, the docker parser is commonly used to format logs into JSON, with the log message stored under the log key. See the Fluentbit Parsers documentation for more details.
  • FILTER: Allows you to modify or filter log records based on specific criteria. Filters can be used to include only relevant logs, add metadata, or apply further parsing.
  • BUFFER: Acts as a temporary storage for logs, handling fluctuations in input and output speeds. Buffers can be configured to use memory or file storage. Logs are grouped into chunks based on tags.
  • OUTPUT: Specifies the destination where processed logs are sent. Fluentbit supports various outputs, including Elasticsearch (ES), generic HTTP endpoints, cloud services such as CloudWatch, S3, and Kinesis Firehose, and log processors such as Logstash. Consult the Fluentbit Outputs documentation for a complete list.
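
To make the PARSER stage concrete, here is a minimal Python sketch of what the docker parser effectively does with one line from a Docker json-file log. The sample line is illustrative, not taken from a real cluster:

```python
import json

# A raw line as written by the Docker json-file logging driver to a file
# under /var/log/containers/ (made-up sample for illustration).
raw_line = '{"log":"GET /healthz 200\\n","stream":"stdout","time":"2024-01-01T00:00:00.000000000Z"}'

# The docker parser treats each line as JSON, so the PARSER stage is
# essentially a JSON decode; the application message sits under "log".
record = json.loads(raw_line)
print(record["log"].strip())   # the original application message
print(record["stream"])        # stdout or stderr
```

Everything downstream (FILTER, BUFFER, OUTPUT) then operates on this structured record rather than on the raw text line.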

Fluentbit Configuration Options:

| Option | Description | Notes |
| --- | --- | --- |
| INPUT | Defines the source of log or metric events (records) that Fluentbit collects and feeds into the pipeline. | Official Documentation |
| PARSER | Converts unstructured log data into structured data, often using regular expressions. The docker parser is crucial for standardizing Docker container logs into JSON. | Official Documentation |
| FILTER | Enables filtering and modification of log events. You can select specific logs or enrich records with additional parsing or metadata. | |
| BUFFER | Provides temporary storage to absorb differences in processing speed between the input and output stages. Buffers can be memory-based or file-based, and are organized into chunks per tag. | |
| OUTPUT | Determines the destination for processed log records. Fluentbit offers a wide range of output plugins, including Elasticsearch, HTTP, and cloud storage services. | Official Documentation |

Kubernetes Cluster Deployment for Fluentbit

To deploy Fluentbit in your Kubernetes cluster, you will need the following YAML configuration files. These files should be applied in the order listed, typically within a dedicated logging namespace. These configurations define how Fluentbit collects, filters, parses, and forwards logs based on the options specified in the ConfigMap.

Deployment Files:

  1. Service-account.yaml:

    • Creates a dedicated service account for Fluentbit.
    • Service accounts are essential for assigning specific roles and permissions to applications running in Kubernetes.
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: fluent-bit
      namespace: logging
  2. Role.yaml:

    • Defines a ClusterRole named fluent-bit-read with restricted permissions.
    • This role grants Fluentbit read-only access to Kubernetes namespaces and pods, necessary for log collection.
    • Role-Based Access Control (RBAC) is implemented for security, ensuring Fluentbit only accesses required resources. Learn more about RBAC in Kubernetes.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: fluent-bit-read
    rules:
    - apiGroups: [""]
      resources:
      - namespaces
      - pods
      verbs: ["get", "list", "watch"]
  3. Role-binding.yaml:

    • Binds the fluent-bit-read ClusterRole to the fluent-bit service account.
    • This RoleBinding grants the defined permissions to the Fluentbit application running with the specified service account.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: fluent-bit-read
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: fluent-bit-read
    subjects:
    - kind: ServiceAccount
      name: fluent-bit
      namespace: logging
  4. ConfigMap.yaml:

    • Contains the core configuration for Fluentbit’s behavior.
    • Defines settings for input, filtering, parsing, and output stages of the Fluentbit pipeline.
    • In this example, it includes configurations for collecting Krakend API Gateway logs and Kubernetes metadata.
    • The ConfigMap allows for flexible configuration management of Fluentbit without modifying the DaemonSet directly.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fluent-bit-config
      namespace: logging
      labels:
        k8s-app: fluent-bit
    data:
      # Configuration files: server, input, filters and output
      # ======================================================
      fluent-bit.conf: |
        [SERVICE]
          Flush           1
          Log_Level       info
          Daemon          off
          Parsers_File    parsers.conf
          HTTP_Server     On
          HTTP_Listen     0.0.0.0
          HTTP_Port       2020
        [OUTPUT]
          Name            stdout
          Match           *
        [OUTPUT]
          Name            http
          Match           *
          # Replace Host/Port with your Logstash or HTTP endpoint:
          Host            amazonaws.com
          Port            8080
          Format          json
        @INCLUDE input-kubernetes.conf
        @INCLUDE filter-kubernetes.conf
      input-kubernetes.conf: |
        [INPUT]
          Name             tail
          Tag              kube.*
          # Collect all container logs:
          #Path            /var/log/containers/*.log
          # Collect only logs from containers whose names start with "hello":
          Path             /var/log/containers/hello*.log
          Parser           docker
          DB               /var/log/flb_kube.db
          Mem_Buf_Limit    5MB
          Skip_Long_Lines  On
          Refresh_Interval 10
      filter-kubernetes.conf: |
        [FILTER]
          Name             kubernetes
          Match            kube.*
          Kube_URL         https://kubernetes.default.svc:443
          Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token
          Kube_Tag_Prefix  kube.var.log.containers.
          Merge_Log        On
          Merge_Log_Key    log_processed
          K8S-Logging.Parser   On
          K8S-Logging.Exclude  Off
      parsers.conf: |
        # The tail input references the "docker" parser, so it must be defined
        # here: this ConfigMap replaces the image's default parsers.conf.
        [PARSER]
          Name         docker
          Format       json
          Time_Key     time
          Time_Format  %Y-%m-%dT%H:%M:%S.%L
          Time_Keep    On
        [PARSER]
          Name         apache
          Format       regex
          Regex        ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
          Time_Key     time
          Time_Format  %d/%b/%Y:%H:%M:%S %z
  5. DaemonSet.yaml:

    • Deploys Fluentbit as a DaemonSet, ensuring one Fluentbit pod runs on each Kubernetes node.
    • This is crucial for collecting logs from every node in the cluster efficiently.
    • DaemonSets are ideal for node-level agents like log collectors.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluent-bit
      namespace: logging
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      selector:
        matchLabels:
          k8s-app: fluent-bit-logging
      template:
        metadata:
          labels:
            k8s-app: fluent-bit-logging
            version: v1
            kubernetes.io/cluster-service: "true"
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "2020"
            prometheus.io/path: /api/v1/metrics/prometheus
        spec:
          containers:
          - name: fluent-bit
            image: fluent/fluent-bit:1.5
            imagePullPolicy: Always
            ports:
            - containerPort: 2020
            env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch" # Replace with your Elasticsearch host
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200" # Replace with your Elasticsearch port
            volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
          terminationGracePeriodSeconds: 10
          volumes:
          - name: varlog
            hostPath:
              path: /var/log
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
          - name: fluent-bit-config
            configMap:
              name: fluent-bit-config
          serviceAccountName: fluent-bit
          tolerations:
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          - operator: "Exists"
            effect: "NoExecute"
          - operator: "Exists"
            effect: "NoSchedule"
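
The standard Apache access-log pattern used by the apache parser in parsers.conf can be exercised outside the cluster. Python's re module accepts an equivalent pattern, with the caveat that named groups are written (?P<name>) rather than Onigmo's (?<name>); the log line below is a made-up sample:

```python
import re

# The standard Apache access-log pattern, transcribed for Python's re module
# (named groups use (?P<name>) here instead of Onigmo's (?<name>)).
APACHE = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
)

# Made-up sample access-log line.
line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
m = APACHE.match(line)
print(m.group("host"), m.group("method"), m.group("path"), m.group("code"))
```

Testing a parser regex this way before baking it into the ConfigMap saves a redeploy cycle for each typo.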

Detailed Fluentbit Configuration

By default, a standard Fluentbit deployment collects logs from all pods in the Kubernetes cluster. A key aspect of Kubernetes logging is that container logs are stored on each node’s host filesystem under /var/log/containers/{POD_NAME}-{UID}.log. Kubernetes also employs log rotation (logrotate) to cap log file sizes and prevent disk-space exhaustion.
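
These file paths are also where Fluentbit's tags come from: the tail input expands the wildcard in Tag kube.* with the file path (slashes becoming dots), and the kubernetes filter later strips Kube_Tag_Prefix to recover the file name and look up pod metadata. A rough sketch of that round trip, with a made-up file name:

```python
# Sketch of how the tail input derives a tag from a log file path
# (Tag kube.* substitutes the wildcard with the path, "/" becoming "."),
# and how the kubernetes filter undoes it via Kube_Tag_Prefix.
path = "/var/log/containers/hello-lg-7d9f8-abcdef123456.log"  # made-up name

tag = "kube." + path.lstrip("/").replace("/", ".")
print(tag)  # kube.var.log.containers.hello-lg-7d9f8-abcdef123456.log

# The filter strips the configured prefix to get the file name back:
filename = tag[len("kube.var.log.containers."):]
print(filename)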

Customizing Data Collection (Input):

  @INCLUDE input-kubernetes.conf
  input-kubernetes.conf: |
    [INPUT]
      Name             tail
      Tag              kube.*
      # Collect all container logs:
      #Path            /var/log/containers/*.log
      # Collect only logs from containers whose names start with "hello":
      Path             /var/log/containers/hello*.log
      Parser           docker
      DB               /var/log/flb_kube.db
      Mem_Buf_Limit    5MB
      Skip_Long_Lines  On
      Refresh_Interval 10
  • The input-kubernetes.conf file configures the log input source.
  • Typically, kubectl logs -f POD_NAME -n logging uses the Kubernetes API to tail the same log file, /var/log/containers/{POD_NAME}-{UID}.log.
  • The Path parameter in the INPUT section specifies which log files Fluentbit should monitor.
    • By default, Path /var/log/containers/*.log would collect logs from all containers.
    • To collect logs from specific services, narrow down the path. Here, Path /var/log/containers/hello*.log collects logs only from containers whose names start with “hello”. This is useful for focusing on a single application’s logs; for example, a pattern of hello-lg-*.log would target a service named “hello-lg”.
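
The Path glob can be sanity-checked without a cluster: Python's fnmatch applies the same shell-style wildcard semantics. The file names below are made up, standing in for real entries under /var/log/containers/:

```python
from fnmatch import fnmatch

# Made-up file names standing in for entries under /var/log/containers/.
files = [
    "hello-lg-7d9f8-abcdef123456.log",
    "hello-world-5c4b2-0123456789ab.log",
    "nginx-ingress-xyz12-fedcba654321.log",
]

# "hello*.log" keeps only files whose names start with "hello",
# mirroring what the tail input's Path option selects.
matched = [f for f in files if fnmatch(f, "hello*.log")]
print(matched)
```

Only the first two files match, so the nginx-ingress logs never enter the pipeline at all.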

Configuring Data Output (Output):

  # Output to stdout for debugging
  [OUTPUT]
    Name            stdout
    Match           *

  # Output to HTTP endpoint (e.g., Logstash)
  [OUTPUT]
    Name            http
    Match           *
    # Replace Host/Port with your Logstash or HTTP endpoint:
    Host            amazonaws.com
    Port            8080
    Format          json
  • The OUTPUT section defines where Fluentbit sends the processed logs.
  • The first OUTPUT block configures stdout output, which is useful for debugging and verifying that Fluentbit is collecting and processing logs correctly. Logs are printed to the Fluentbit container’s standard output.
  • The second OUTPUT block configures http output, designed to send logs to an HTTP endpoint, such as Logstash.
    • The Host and Port parameters should be configured to point to your Logstash instance or any other HTTP endpoint that can receive logs.
    • Format json specifies that logs should be sent in JSON format.
    • Note that feeding Logstash over the generic http output (as in the example above with amazonaws.com:8080) is less common than writing directly to Elasticsearch, and it requires an HTTP input pipeline on the Logstash side to accept the incoming requests. For direct Elasticsearch integration, the dedicated es output plugin is recommended for better performance and features.
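
On the receiving side, with Format json the http output POSTs batches of records as a JSON array. A minimal sketch of what an endpoint would decode (the exact field names below, such as date and the kubernetes sub-object added by the filter stage, are illustrative assumptions):

```python
import json

# Illustrative request body such as the http output with "Format json"
# might POST; the field names here are assumptions for the example.
body = ('[{"date": 1700000000.0, "log": "GET /healthz 200", '
        '"kubernetes": {"pod_name": "hello-lg-7d9f8", "namespace_name": "default"}}]')

# A Logstash http input (or any custom HTTP endpoint) decodes the array
# and can route each record on its fields.
for rec in json.loads(body):
    print(rec["kubernetes"]["pod_name"], "->", rec["log"])
```

Because each record is a plain JSON object, downstream routing (per namespace, per pod) reduces to ordinary field access.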

Results and Verification

Previously, without targeted filtering, logs from kube-system and other pods might have been mixed in, making it difficult to isolate specific application logs. By customizing the Fluentbit ConfigMap, specifically the input-kubernetes.conf and adjusting the Path, you can ensure that only the logs from your target services are collected. After deploying these configurations, you should verify that Fluentbit is indeed collecting and forwarding only the intended logs. You can check the Fluentbit pod logs (using kubectl logs -n logging <fluentbit-pod-name>) and your configured output destination (e.g., Elasticsearch or Logstash) to confirm that logs from your target services are being successfully processed and delivered.

References

| Resource | URL |
| --- | --- |
| (Official) Fluentbit Kubernetes Guide | https://docs.fluentbit.io/manual/installation/kubernetes |
| EFK-Fluentbit Usage Guide (Korean) | https://frozenpond.tistory.com/201 |
