Introduction to EFK Stack for Kubernetes Logging
In Kubernetes environments, especially those running microservices architectures (MSA), managing logs can become complex. Containers are ephemeral, and pods can be deleted or rescheduled, leading to potential log data loss if not handled properly. To address these challenges, implementing a centralized logging system is crucial. The EFK stack (Elasticsearch, Fluentd/Fluentbit, Kibana) is a popular choice for building a robust and scalable logging solution for Kubernetes. This article focuses on using Fluentbit within the EFK stack to collect, process, and forward logs from your Kubernetes cluster.
Benefits of using EFK with Fluentbit:
- Log Persistence: Even when pods are deleted, container logs are preserved as they are shipped to a central storage.
- Centralized Log Management: Aggregates logs from all nodes and pods in one location, simplifying log analysis and troubleshooting in MSA environments.
- Scalability and High Availability: The EFK stack is designed to handle large volumes of log data, offering scalability and high availability.
- Real-time Log Analysis and Visualization: Elasticsearch enables fast searching, analysis, and storage of logs, while Kibana provides a web interface to visualize and explore log data through charts and dashboards.
- Versatile Data Handling: Suitable for both log and metric data, allowing for unified monitoring and observability.
- Cluster-wide Log Collection: Efficiently gathers container logs from all nodes within the Kubernetes cluster.
- Resource Monitoring: Can also collect CPU, memory, and disk usage metrics from Kubernetes nodes, offering a holistic view of cluster health.
Understanding Fluentbit
Fluentbit acts as the log forwarder in the EFK stack. It is a lightweight, efficient, open-source log processor and forwarder that collects data and logs from different sources, unifies them, and sends them to multiple destinations.
Fluentbit operates through a pipeline architecture:
INPUT -> (PARSER) -> FILTER -> BUFFER -> (ROUTER) -> OUTPUT
This pipeline defines how Fluentbit processes log data:
- INPUT: Defines the sources from which Fluentbit collects logs and metrics. In Kubernetes, this is typically container logs. Refer to the official Fluentbit documentation for Inputs.
- PARSER: Transforms unstructured log data into a structured format. Parsers use regular expressions to extract relevant fields from log messages. For Docker container logs, the `docker` parser is commonly used to format logs into JSON, with the log message stored under the `log` key. See the Fluentbit Parsers documentation for more details.
- FILTER: Allows you to modify or filter log records based on specific criteria. Filters can be used to include only relevant logs, add metadata, or apply further parsing.
- BUFFER: Acts as a temporary storage for logs, handling fluctuations in input and output speeds. Buffers can be configured to use memory or file storage. Logs are grouped into chunks based on tags.
- OUTPUT: Specifies the destination where processed logs are sent. Fluentbit supports various outputs, including Elasticsearch (ES), HTTP endpoints, cloud services like Cloudwatch, S3, and Firehose, and message queues like Logstash. Consult the Fluentbit Outputs documentation for a complete list.
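As a minimal sketch of how these stages map onto a configuration file (assuming Fluentbit's classic configuration syntax; the Elasticsearch host below is a placeholder, not from the original article):

```ini
[SERVICE]
    Flush        1
    Parsers_File parsers.conf

[INPUT]                      # INPUT stage: tail container log files
    Name    tail
    Path    /var/log/containers/*.log
    Parser  docker           # PARSER stage: structure each line as JSON
    Tag     kube.*

[FILTER]                     # FILTER stage: enrich records with pod metadata
    Name   kubernetes
    Match  kube.*

[OUTPUT]                     # OUTPUT stage: ship records to Elasticsearch
    Name   es
    Match  kube.*
    Host   elasticsearch.logging.svc   # placeholder host
    Port   9200
```

Records flow top to bottom: each tailed line is parsed, tagged, enriched by matching filters, buffered, and then routed by tag to every matching output.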
Fluentbit Configuration Options:
Fluentbit Option | Description | Notes |
---|---|---|
INPUT | Defines the source of log or metric events (records) that Fluentbit will collect and process in the pipeline. | Official Documentation |
PARSER | Converts unstructured log data into structured data, often using regular expressions. The `docker` parser is crucial for standardizing Docker container logs into JSON. | Official Documentation |
FILTER | Enables filtering and modification of log events. You can select specific logs or enrich data with additional parsing or metadata. | |
BUFFER | Provides temporary storage to manage differences in processing speeds between input and output stages. Buffers can be memory-based or file-based, and are organized into chunks per tag. | |
OUTPUT | Determines the destination for processed log records. Fluentbit offers a wide range of output plugins, including Elasticsearch, HTTP, and cloud storage solutions. | Official Documentation |
Kubernetes Cluster Deployment for Fluentbit
To deploy Fluentbit in your Kubernetes cluster, you will need the following YAML configuration files. Apply them in the order listed, typically within a dedicated `logging` namespace. These configurations define how Fluentbit collects, filters, parses, and forwards logs based on the options specified in the ConfigMap.
Deployment Files:
- `Service-account.yaml`:
  - Creates a dedicated service account for Fluentbit.
  - Service accounts are essential for assigning specific roles and permissions to applications running in Kubernetes.

  ```yaml
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: fluent-bit
    namespace: logging
  ```
- `Role.yaml`:
  - Defines a ClusterRole named `fluent-bit-read` with restricted permissions.
  - This role grants Fluentbit read-only access to Kubernetes namespaces and pods, necessary for log collection.
  - Role-Based Access Control (RBAC) is implemented for security, ensuring Fluentbit only accesses required resources. Learn more about RBAC in Kubernetes.

  ```yaml
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: fluent-bit-read
  rules:
  - apiGroups: [""]
    resources:
    - namespaces
    - pods
    verbs: ["get", "list", "watch"]
  ```
- `Role-binding.yaml`:
  - Binds the `fluent-bit-read` ClusterRole to the `fluent-bit` service account.
  - This ClusterRoleBinding grants the defined permissions to the Fluentbit application running with the specified service account.

  ```yaml
  apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: fluent-bit-read
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: fluent-bit-read
  subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging
  ```
- `ConfigMap.yaml`:
  - Contains the core configuration for Fluentbit's behavior.
  - Defines settings for the input, filtering, parsing, and output stages of the Fluentbit pipeline.
  - In this example, it includes configurations for collecting Krakend API Gateway logs and Kubernetes metadata.
  - The `ConfigMap` allows for flexible configuration management of Fluentbit without modifying the DaemonSet directly.

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: fluent-bit-config
    namespace: logging
    labels:
      k8s-app: fluent-bit
  data:
    # Configuration files: server, input, filters and output
    # ======================================================
    fluent-bit.conf: |
      [SERVICE]
          Flush         1
          Log_Level     info
          Daemon        off
          Parsers_File  parsers.conf
          HTTP_Server   On
          HTTP_Listen   0.0.0.0
          HTTP_Port     2020

      [OUTPUT]
          Name   stdout
          Match  *

      [OUTPUT]
          Name    http
          Match   *
          Host    amazonaws.com
          Port    8080
          Format  json

      @INCLUDE input-kubernetes.conf
      @INCLUDE filter-kubernetes.conf

    input-kubernetes.conf: |
      [INPUT]
          Name              tail
          Tag               kube.*
          #Path             /var/log/containers/*.log       # Collect all container logs
          Path              /var/log/containers/hello*.log  # Collect logs only from containers with names starting with 'hello'
          Parser            docker
          DB                /var/log/flb_kube.db
          Mem_Buf_Limit     5MB
          Skip_Long_Lines   On
          Refresh_Interval  10

    filter-kubernetes.conf: |
      [FILTER]
          Name                kubernetes
          Match               kube.*
          Kube_URL            https://kubernetes.default.svc:443
          Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
          Kube_Tag_Prefix     kube.var.log.containers.
          Merge_Log           On
          Merge_Log_Key       log_processed
          K8S-Logging.Parser  On
          K8S-Logging.Exclude Off

    parsers.conf: |
      [PARSER]
          Name    apache
          Format  regex
          Regex   ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
  ```
- `DaemonSet.yaml`:
  - Deploys Fluentbit as a DaemonSet, ensuring one Fluentbit pod runs on each Kubernetes node.
  - This is crucial for collecting logs from every node in the cluster efficiently.
  - DaemonSets are ideal for node-level agents like log collectors.

  ```yaml
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: fluent-bit
    namespace: logging
    labels:
      k8s-app: fluent-bit-logging
      version: v1
      kubernetes.io/cluster-service: "true"
  spec:
    selector:
      matchLabels:
        k8s-app: fluent-bit-logging
    template:
      metadata:
        labels:
          k8s-app: fluent-bit-logging
          version: v1
          kubernetes.io/cluster-service: "true"
        annotations:
          prometheus.io/scrape: "true"
          prometheus.io/port: "2020"
          prometheus.io/path: /api/v1/metrics/prometheus
      spec:
        containers:
        - name: fluent-bit
          image: fluent/fluent-bit:1.5
          imagePullPolicy: Always
          ports:
          - containerPort: 2020
          env:
          - name: FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch"  # Replace with your Elasticsearch host
          - name: FLUENT_ELASTICSEARCH_PORT
            value: "9200"           # Replace with your Elasticsearch port
          volumeMounts:
          - name: varlog
            mountPath: /var/log
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
            readOnly: true
          - name: fluent-bit-config
            mountPath: /fluent-bit/etc/
        terminationGracePeriodSeconds: 10
        volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
        serviceAccountName: fluent-bit
        tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - operator: "Exists"
          effect: "NoExecute"
        - operator: "Exists"
          effect: "NoSchedule"
  ```
Detailed Fluentbit Configuration
By default, Fluentbit, when configured according to standard guides, collects logs from all pods within the Kubernetes cluster. A key aspect of Kubernetes logging is that container logs are stored on the node's host filesystem under `/var/log/containers/{POD_NAME}-{UID}.log`. Kubernetes also employs log rotation (`LogRotate`) to manage log file sizes and prevent disk space exhaustion.
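For context, each of these files is written in the Docker `json-file` log driver format: one JSON object per line, with the original message stored under the `log` key. The `docker` parser essentially performs this decoding. A small Python sketch with a made-up sample line (the line content is illustrative, not from a real cluster):

```python
import json

# Hypothetical line as written by Docker's json-file logging driver.
# The container's original output lives under the "log" key.
raw_line = '{"log":"GET /health 200\\n","stream":"stdout","time":"2021-01-01T00:00:00.000000000Z"}'

# Decoding the JSON is effectively what Fluentbit's `docker` parser does,
# turning an unstructured line into a structured record.
record = json.loads(raw_line)
print(record["log"].rstrip())  # the application's log message
print(record["stream"])        # stdout or stderr
```

After parsing, downstream filters (such as the `kubernetes` filter) can enrich this record with pod and namespace metadata.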
Customizing Data Collection (Input):
```ini
@INCLUDE input-kubernetes.conf

input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        #Path             /var/log/containers/*.log       # Collect all container logs
        Path              /var/log/containers/hello*.log  # Collect logs only from containers with names starting with 'hello'
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
```
- The `input-kubernetes.conf` file configures the log input source.
- Typically, `kubectl logs -f POD_NAME -n logging` uses the Kubernetes API to tail the log file `/var/log/containers/{POD_NAME}-{UID}.log`.
- The `Path` parameter in the `[INPUT]` section specifies which log files Fluentbit should monitor.
  - By default, `Path /var/log/containers/*.log` would collect logs from all containers.
  - To collect logs from specific services, you can narrow down the path. In this example, `Path /var/log/containers/hello*.log` collects logs only from containers whose names start with "hello". This is useful for focusing on specific application logs, for example targeting a service named "hello-lg" via files matching "hello-lg-*.log".
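The `Path` pattern uses glob-style matching, so you can check offline which files a pattern such as `hello*.log` would pick up. A quick sketch (the file names below are hypothetical examples of how container log files are named):

```python
from fnmatch import fnmatch

# Hypothetical file names as they might appear under /var/log/containers/
files = [
    "hello-lg-7d4f9_default_hello-abc123.log",
    "hello-world-5b6c7_default_hello-def456.log",
    "nginx-ingress-2a1b3_kube-system_nginx-789abc.log",
]

# Same glob semantics as the tail input's Path parameter
matched = [f for f in files if fnmatch(f, "hello*.log")]
print(matched)  # only the two files starting with 'hello' match
```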
Configuring Data Output (Output):
```ini
# Output to stdout for debugging
[OUTPUT]
    Name   stdout
    Match  *

# Output to an HTTP endpoint (e.g., Logstash)
[OUTPUT]
    Name    http
    Match   *
    Host    amazonaws.com   # Replace with your Logstash or HTTP endpoint host
    Port    8080            # Replace with your Logstash or HTTP endpoint port
    Format  json
```
- The `[OUTPUT]` section defines where Fluentbit sends the processed logs.
- The first `[OUTPUT]` block configures `stdout` output, which is useful for debugging and verifying that Fluentbit is collecting and processing logs correctly. Logs are printed to the Fluentbit container's standard output.
- The second `[OUTPUT]` block configures `http` output, designed to send logs to an HTTP endpoint, such as Logstash.
  - The `Host` and `Port` parameters should point to your Logstash instance or any other HTTP endpoint that can receive logs.
  - `Format json` specifies that logs should be sent in JSON format.
  - Note that using HTTP output for Logstash (as in the example with `amazonaws.com:8080`) is less common than direct integration via the Elasticsearch output plugin. HTTP output typically requires an additional pipeline configuration in Logstash to handle incoming HTTP requests. For direct Elasticsearch integration, the `es` output plugin is recommended for better performance and features.
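For direct delivery to Elasticsearch, a sketch of an `es` output block is shown below; the host and index name are placeholders to adapt to your environment:

```ini
[OUTPUT]
    Name            es
    Match           kube.*
    Host            elasticsearch.logging.svc  # placeholder Elasticsearch host
    Port            9200
    Index           fluent-bit                 # placeholder index name
    Logstash_Format On                         # write time-based indices (logstash-YYYY.MM.DD)
    Replace_Dots    On                         # replace dots in field names for ES compatibility
```

With `Logstash_Format On`, the `Index` setting is superseded by daily `logstash-*` indices, which pair naturally with Kibana index patterns.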
Results and Verification
Previously, without targeted filtering, logs from `kube-system` and other pods might have been mixed in, making it difficult to isolate specific application logs. By customizing the Fluentbit ConfigMap, specifically the `Path` parameter in `input-kubernetes.conf`, you can ensure that only the logs from your target services are collected. After deploying these configurations, verify that Fluentbit is indeed collecting and forwarding only the intended logs. Check the Fluentbit pod logs (using `kubectl logs -n logging <fluentbit-pod-name>`) and your configured output destination (e.g., Elasticsearch or Logstash) to confirm that logs from your target services are being successfully processed and delivered.
References
Resource | URL |
---|---|
(Official) Fluentbit Kubernetes Guide | https://docs.fluentbit.io/manual/installation/kubernetes |
EFK-Fluentbit Usage Guide (Korean) | https://frozenpond.tistory.com/201 |