Kubernetes Monitoring - Prometheus, EFK Stack

Prometheus Stack
There are 4 different areas we need to monitor in Kubernetes:
- How is the K8s cluster itself working? What objects exist in it?
- What is the current status of those objects? EX: Were the replicas we declared in a Deployment actually created?
- We need to monitor the nodes. We run containers on worker nodes, so what do their CPU usage, memory usage, and network traffic look like?
- Logs are generated inside containers. How will we read these logs?
Without using any monitoring tool;
–> I can learn the status of existing Pods with the kubectl get pods command.
–> I can learn the status of all objects in the system with kubectl get all -A.
–> For a specific object (EX: a pod), I can use kubectl describe pod <podName>.
–> I can see what events have occurred in the cluster so far with kubectl get events -A. (-A covers all namespaces.)
–> I can see CPU and memory usage with kubectl top node. (kubectl top pod shows the same for pods.)
–> I can access logs with kubectl logs <podName>.
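For example, putting a few of these together as a quick manual health check of a hypothetical Deployment named web (the name and the app=web label are only placeholders):
kubectl get deployment web        # desired vs. ready replica counts at a glance
kubectl describe deployment web   # events and rollout details for that object
kubectl top pod -l app=web        # CPU/memory of its pods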
Although we can get what we want manually with commands like these, it is easier and more sensible to manage this from one central place, and when an error occurs an alert mechanism should kick in and, for example, send me an email. We can handle all of this with Prometheus (the first three items; logs are covered by the EFK stack later in this section).
What is Prometheus?
–> It's a metrics server, used almost everywhere in the industry. It's a CNCF project (like K8s itself).
–> Pull-based operation: once installed, Prometheus scrapes (pulls) the metrics it needs from its targets by itself. (A minimal scrape configuration is sketched right after this list.)
–> It doesn't only work with K8s; it can monitor other systems too, for example a frontend application.
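To give a rough idea of what pull-based means, a minimal standalone prometheus.yml could look like the sketch below. The job name and target address are just examples; the Helm chart we install later configures all of this for us, so we never write it by hand here.
global:
  scrape_interval: 15s             # how often Prometheus pulls metrics
scrape_configs:
- job_name: my-app                 # label for this group of targets
  static_configs:
  - targets: [ "my-app.default.svc:8080" ]   # Prometheus pulls /metrics from here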
- kube-state-metrics pulls the current status of objects by talking to the K8s API and exposes it for Prometheus to scrape.
- Node Exporter collects node-level metrics (CPU, memory, disk, network) and exposes them for Prometheus.
- Prometheus can talk directly with the Kubernetes API. It can learn the status of the cluster.
- We can run queries in Prometheus, but to visualize this information and build dashboards we use Grafana.
- Installing and integrating Prometheus and the surrounding tools by hand is normally very tedious and complex. For this reason, the prometheus-community project publishes a Helm chart repository (helm-charts) containing the kube-prometheus-stack chart, which bundles everything for us. Let's move on to installing the stack.
Installation
- Let's create a namespace named monitoring:
kubectl create namespace monitoring
- Let's install kube-prometheus-stack:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kubeprostack --namespace monitoring prometheus-community/kube-prometheus-stack
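If we want to override chart defaults (for example the Grafana admin password or how long Prometheus keeps data), we can pass a values file. The keys below exist in the kube-prometheus-stack chart, but the concrete values are just examples:
# values.yaml (example overrides)
grafana:
  adminPassword: "my-secret-pw"
prometheus:
  prometheusSpec:
    retention: 7d
helm upgrade --install kubeprostack --namespace monitoring -f values.yaml prometheus-community/kube-prometheus-stack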
- Let's make sure it's installed:
kubectl get pods -n monitoring
# Output:
NAME                                                      READY   STATUS    RESTARTS   AGE
alertmanager-kubeprostack-kube-promethe-alertmanager-0    2/2     Running   0          15m
kubeprostack-grafana-5c5d98864b-dn2jz                     3/3     Running   0          15m
kubeprostack-kube-promethe-operator-5fcb5784fc-57pvs      1/1     Running   0          15m
kubeprostack-kube-state-metrics-5765b49669-6qqgl          1/1     Running   0          15m
kubeprostack-prometheus-node-exporter-82mcn               1/1     Running   0          15m
prometheus-kubeprostack-kube-promethe-prometheus-0        2/2     Running   0          15m
- The services installed by the chart (Prometheus, Grafana, Alertmanager) are ClusterIP services, so they are not reachable from outside the cluster by default. To connect to the Prometheus web interface we therefore use port-forwarding:
kubectl --namespace monitoring port-forward svc/kubeprostack-kube-promethe-prometheus 9090
# then we can go to http://localhost:9090.
- Let's go to the Prometheus UI at http://localhost:9090 and run a few queries:
kube_pod_created # Shows all pods created so far.
count by (namespace) (kube_pod_created) # Distribution by namespace
sum by (namespace) (kube_pod_info) # Current pods, by namespace
sum by (namespace) (kube_pod_status_ready{condition="false"}) # Pods not in Ready state, by namespace
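Node Exporter metrics can be queried the same way. For example, approximate per-node CPU and memory figures using standard node_exporter metrics (the 5m window is arbitrary):
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)  # % CPU busy per node
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes                      # fraction of memory still available per node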
- Let's check Grafana:
kubectl --namespace monitoring port-forward svc/kubeprostack-grafana 8080:80
# user: admin
# pw: prom-operator
# To get Grafana password:
kubectl get secret kubeprostack-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
- Let's check Alert Manager:
kubectl --namespace monitoring port-forward svc/kubeprostack-kube-promethe-alertmanager 9093
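The operator also lets us define our own alerts as PrometheusRule objects. A minimal sketch is below; the alert name and threshold are made up, and the release: kubeprostack label is what the chart's rule selector matches assuming default values.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-pod-alerts
  namespace: monitoring
  labels:
    release: kubeprostack          # so the operator picks the rule up (assumed default selector)
spec:
  groups:
  - name: custom.rules
    rules:
    - alert: PodNotReady
      expr: sum by (namespace) (kube_pod_status_ready{condition="false"}) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Some pods have not been Ready for 5 minutes"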
EFK Stack (ElasticSearch Fluentd Kibana)
It's the stack used for Logging.
- ElasticSearch is where logs are collected and stored.
- Kibana is used to visualize data taken from ElasticSearch. (Like Grafana in a way.)
- Logstash (an Elastic product) and Fluentd (a CNCF project) are the tools responsible for collecting logs.
- Fluentd is generally more lightweight and performant than Logstash, which is why it is preferred here.
Installation (on minikube)
The installation here targets minikube, and the YAML files have been adjusted accordingly. On a cloud cluster the installation is easily done with Helm, but with Helm on minikube we run into some errors, so we apply the manifests ourselves.
- Let's start minikube and activate the relevant storage addons:
minikube start --cpus 4 --memory 6144
minikube addons enable default-storageclass
minikube addons enable storage-provisioner
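We can verify that the default StorageClass the manifests below rely on is in place (output varies, but a class named "standard" should be listed as default):
kubectl get storageclass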
- Let's create a pod that will continuously generate logs: (testpod.yaml)
apiVersion: v1
kind: Pod
metadata:
  name: loggenerator
spec:
  containers:
  - name: loggenerator
    image: busybox
    # prints a numbered test log line every second
    args: [ "/bin/sh", "-c", 'i=0; while true; do echo "Test Log $i"; i=$((i+1)); sleep 1; done' ]
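Let's apply it and make sure the log lines are flowing (the pod lands in the default namespace since no namespace is set):
kubectl apply -f testpod.yaml
kubectl logs -f loggenerator   # should print "Test Log 0", "Test Log 1", ...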
- Let's create a new namespace named efk:
kubectl create namespace efk
- Let's create the Elasticsearch cluster: (elastic.yaml)
# Headless service (clusterIP: None) gives each StatefulSet pod a stable DNS name
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: efk
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
  - port: 9200
    name: rest
  - port: 9300
    name: inter-node
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: efk
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.16.0
        resources:
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
        - name: cluster.name
          value: k8s-logs
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.seed_hosts
          value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "es-cluster-0,es-cluster-1,es-cluster-2"
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
      # init containers fix volume permissions and kernel settings that Elasticsearch requires
      initContainers:
      - name: fix-permissions
        image: busybox
        command: [ "sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data" ]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: [ "sysctl", "-w", "vm.max_map_count=262144" ]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: [ "sh", "-c", "ulimit -n 65536" ]
        securityContext:
          privileged: true
  # one 1Gi PersistentVolumeClaim per replica, using minikube's "standard" StorageClass
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 1Gi
kubectl apply -f elastic.yaml
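Once all three es-cluster pods are Running, we can check the cluster's health by port-forwarding one of the pods (the pod names follow from the StatefulSet above):
kubectl get pods -n efk
kubectl port-forward es-cluster-0 9200:9200 -n efk
# in another terminal:
curl "http://localhost:9200/_cluster/health?pretty"   # expect "number_of_nodes": 3 and a green status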