AWS Distro for OpenTelemetry
Container Insights EKS Fargate
This section describes the design of the components in an ADOT Collector pipeline that collects Container Insights metrics from EKS Fargate workloads, and explains how to configure and deploy an ADOT Collector to collect system metrics from workloads deployed to an EKS Fargate cluster and send them to CloudWatch.
Design of Container Insights support in ADOT Collector for EKS Fargate
The kubelet on a worker node in a Kubernetes cluster exposes resource metrics such as CPU, memory, disk, and network usage at the /metrics/cadvisor endpoint. However, in the EKS Fargate networking architecture, a pod is not allowed to reach the kubelet on its worker node directly. Hence, the ADOT Collector calls the Kubernetes API server to proxy the connection to the kubelet on each worker node and collects the kubelet's cAdvisor metrics for the workloads on that node. These metrics are exposed in Prometheus format, so the collector uses an instance of the Prometheus Receiver as a drop-in replacement for a Prometheus server and scrapes them from the Kubernetes API server endpoint. Using Kubernetes service discovery, the receiver discovers the worker nodes in an EKS cluster based on the node role configured for target discovery (see kubernetes_sd_configs). Hence, a single collector instance is able to discover all worker nodes in the cluster.
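For illustration, a condensed sketch of such a receiver configuration is shown below. The job name and relabeling rules here are representative assumptions; the authoritative configuration is the one shipped in the eks-fargate-container-insights.yaml manifest referenced later in this section.

receivers:
  prometheus:
    config:
      scrape_configs:
        # Illustrative job name; worker nodes are discovered via the node role
        - job_name: kubelets-cadvisor-metrics
          scheme: https
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          kubernetes_sd_configs:
            - role: node
          relabel_configs:
            # Scrape through the Kubernetes API server instead of the kubelet directly
            - target_label: __address__
              replacement: kubernetes.default.svc:443
            - source_labels: [__meta_kubernetes_node_name]
              regex: (.+)
              target_label: __metrics_path__
              # $$ escapes $ so the collector does not treat it as an environment variable
              replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor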
The metrics then go through a series of processors that perform filtering, renaming, aggregation, and conversion. The following processors are used in the pipeline of the ADOT Collector instance for EKS Fargate illustrated above.
- Filter Processor to include or exclude metrics based on their name
- Metrics Transform Processor to rename metrics as well as perform aggregations on metrics across labels
- Cumulative to Delta Processor to convert cumulative sum metrics to delta sums.
- Delta to Rate Processor to convert delta sum metrics to rate metrics. This rate is a gauge.
- Metrics Generation Processor to create new metrics using existing metrics
The final component in the pipeline is the AWS CloudWatch EMF Exporter, which converts the metrics to the embedded metric format (EMF) and sends them directly to CloudWatch Logs using the PutLogEvents API.
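Putting the pieces together, the metrics pipeline in the collector configuration might be wired up roughly as follows. This is only a sketch: the component identifiers (filter, metricstransform, cumulativetodelta, deltatorate, experimental_metricsgeneration, awsemf) are the names used by the corresponding upstream OpenTelemetry Collector components, and their detailed settings are omitted.

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      # Processor order mirrors the list above
      processors: [filter, metricstransform, cumulativetodelta, deltatorate, experimental_metricsgeneration]
      exporters: [awsemf]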
The following pod metrics are sent to CloudWatch by the ADOT Collector for each of the workloads running on EKS Fargate. Other data points are still sent to CloudWatch as raw CloudWatch logs, but not in metric format.
- pod_cpu_utilization_over_pod_limit
- pod_cpu_usage_total
- pod_cpu_limit
- pod_memory_utilization_over_pod_limit
- pod_memory_working_set
- pod_memory_limit
- pod_network_rx_bytes
- pod_network_tx_bytes
Each metric is associated with the following three dimension sets and collected under the CloudWatch namespace named ContainerInsights; an illustrative exporter configuration covering these dimension sets follows the list.
- ClusterName, LaunchType
- ClusterName, Namespace, LaunchType
- ClusterName, Namespace, PodName, LaunchType
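For illustration, the awsemf exporter section that produces these metrics and dimension sets might look roughly like the sketch below. The region, log group, and log stream values are placeholder assumptions; the authoritative settings live in the eks-fargate-container-insights.yaml manifest.

exporters:
  awsemf:
    region: us-east-1                     # replace if you use a different region
    namespace: ContainerInsights
    log_group_name: '/aws/containerinsights/{ClusterName}/performance'
    log_stream_name: '{PodName}'
    metric_declarations:
      # The three dimension sets listed above, applied to the pod metrics
      - dimensions: [[ClusterName, LaunchType],
                     [ClusterName, Namespace, LaunchType],
                     [ClusterName, Namespace, PodName, LaunchType]]
        metric_name_selectors:
          - pod_cpu_utilization_over_pod_limit
          - pod_cpu_usage_total
          - pod_cpu_limit
          - pod_memory_utilization_over_pod_limit
          - pod_memory_working_set
          - pod_memory_limit
          - pod_network_rx_bytes
          - pod_network_tx_bytes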
Deploying ADOT Collector to EKS Fargate
Let’s get into the details of deploying the ADOT Collector in an EKS Fargate cluster and then collecting metrics data from workloads.
- The following prerequisites must be in place before deploying the ADOT Collector.
- An EKS cluster that supports Kubernetes version 1.21 or higher. You may create the EKS cluster using one of the approaches outlined here.
- When your cluster creates pods on AWS Fargate, the components that run on the Fargate infrastructure must make calls to AWS APIs on your behalf, for example to pull container images from Amazon ECR. The EKS pod execution role provides the IAM permissions to do this. Create a Fargate pod execution role as outlined here.
- Before you can schedule pods running on Fargate, you must define a Fargate profile that specifies which pods should use Fargate when they are launched. For the sample deployments used here, we create two Fargate profiles as outlined here. The first Fargate profile is named fargate-container-insights, specifying the namespace fargate-container-insights. The second one is named applications, specifying the namespace golang.
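As a sketch, the same two profiles could also be declared in an eksctl ClusterConfig file (the cluster name and region below are placeholders) and created with eksctl create fargateprofile -f <file>:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: YOUR-EKS-CLUSTER-NAME        # placeholder
  region: YOUR-EKS-CLUSTER-REGION    # placeholder
fargateProfiles:
  # Profile for the namespace where the ADOT Collector runs
  - name: fargate-container-insights
    selectors:
      - namespace: fargate-container-insights
  # Profile for the namespace where the sample workloads run
  - name: applications
    selectors:
      - namespace: golang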
- The ADOT Collector requires IAM permissions to send performance log events to CloudWatch. This is done by associating a Kubernetes service account with an IAM role using the IAM Roles for Service Accounts (IRSA) feature supported by EKS. The IAM role should be associated with the AWS managed policy CloudWatchAgentServerPolicy. The helper script shown below may be used, after substituting the CLUSTER_NAME and REGION variables, to create an IAM role named EKS-Fargate-ADOT-ServiceAccount-Role that is granted these permissions and is associated with a Kubernetes service account named adot-collector.
#!/bin/bash
CLUSTER_NAME=YOUR-EKS-CLUSTER-NAME
REGION=YOUR-EKS-CLUSTER-REGION
SERVICE_ACCOUNT_NAMESPACE=fargate-container-insights
SERVICE_ACCOUNT_NAME=adot-collector
SERVICE_ACCOUNT_IAM_ROLE=EKS-Fargate-ADOT-ServiceAccount-Role
SERVICE_ACCOUNT_IAM_POLICY=arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy

eksctl utils associate-iam-oidc-provider \
--cluster=$CLUSTER_NAME \
--approve

eksctl create iamserviceaccount \
--cluster=$CLUSTER_NAME \
--region=$REGION \
--name=$SERVICE_ACCOUNT_NAME \
--namespace=$SERVICE_ACCOUNT_NAMESPACE \
--role-name=$SERVICE_ACCOUNT_IAM_ROLE \
--attach-policy-arn=$SERVICE_ACCOUNT_IAM_POLICY \
--approve
- Deploy the ADOT Collector as a Kubernetes StatefulSet using the eks-fargate-container-insights.yaml manifest, after replacing the placeholder variable YOUR-EKS-CLUSTER-NAME with the name of your EKS cluster and replacing region=us-east-1 in the emfexporter configuration if you are not using the us-east-1 region.
- Deploy a sample stateless workload to the cluster with the following deployment manifest.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  namespace: golang
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
      role: webapp-service
  template:
    metadata:
      labels:
        app: webapp
        role: webapp-service
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '3000'
        prometheus.io/path: '/metrics'
    spec:
      containers:
      - name: go
        image: public.ecr.aws/awsvijisarathy/prometheus-webapp:latest
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "256m"
            memory: "512Mi"
          limits:
            cpu: "512m"
            memory: "1024Mi"
- Both of the above deployments target a namespace associated with a Fargate profile, so the workloads will be scheduled to run on Fargate. It may take a couple of minutes for a Fargate worker node to be provisioned for each of these workloads and for the pods to reach a Ready status. Executing the command
kubectl get nodes -l eks.amazonaws.com/compute-type=fargate
should now list the Fargate worker nodes named with the prefix fargate. Verify that the ADOT Collector and the workload pods are all running using the following commands:
kubectl get pods -n fargate-container-insights
kubectl get pods -n golang
Visualizing EKS Fargate resource metrics using CloudWatch Container Insights
The performance log events for the workloads will be found under the log group named
/aws/containerinsights/CLUSTER_NAME/performance
as shown below. A separate log stream is created for each pod running on Fargate.
Shown below is a representative example of the JSON data with embedded metric format contained in one of the log events that identifies the data as pertaining to the metrics named pod_cpu_usage_total
and pod_cpu_utilization_over_pod_limit
.
Shown below is a graph of the same metric pod_cpu_utilization_over_pod_limit
seen in the CloudWatch metrics dashboard.
The metrics may also be visualized using the prebuilt Container Insights dashboards that display data at the cluster, node, namespace, service and pod level. Shown below is a view of the dashboard displaying EKS Fargate metrics at the cluster level.