Introduction: Why Container Orchestration?
Problem Statement:
As microservices-based applications scale, managing containers across multiple environments manually becomes inefficient and error-prone.
Solution:
Container orchestration automates the deployment, scaling, networking, and lifecycle management of containers.
Key Benefits of Kubernetes Orchestration:
- Automated deployment and scaling
- Self-healing (auto-restarting failed containers)
- Service discovery and DNS
- Load balancing and traffic routing
- Resource optimization and bin-packing
Virtual Machines vs Containers vs Kubernetes
Virtual Machines | Docker Containers | Kubernetes |
---|---|---|
Hardware-level virtualization | OS-level virtualization | Container orchestration |
Heavyweight | Lightweight and fast | Automates container ops |
Boot time: Minutes | Boot time: Seconds | Self-healing, scalable |
Key Insight:
Containers solve the portability problem. Kubernetes solves the scalability and reliability problem of containers in production.
Storage in Kubernetes (Dynamic & CSI)
Problem Statement:
How do we abstract and dynamically provision storage in Kubernetes without being tied to a specific cloud or on-premise provider?
Solution:
- CSI (Container Storage Interface): Plugin architecture enabling support for diverse storage backends.
- Storage Class: Defines provisioner type, reclaim policy, volume type (e.g., gp2, io1).
- PVC (Persistent Volume Claim): Application’s request for storage.
- PV (Persistent Volume): Actual disk resource provisioned.
Flow:
App → PVC → StorageClass + CSI → PV
Reference: https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/
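A minimal sketch of this flow, assuming the AWS EBS CSI driver (the provisioner name and parameters are backend-specific and will differ in your environment):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com          # assumed CSI driver; use your backend's provisioner
parameters:
  type: gp3                           # backend-specific volume type
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi

When a Pod references data-pvc, the CSI driver dynamically provisions a PV that satisfies the claim.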
Kubernetes Architecture
Control Plane (Master Node):
- API Server: Entry point for interactions with the Kubernetes cluster (kubectl, CI/CD). Exposes the Kubernetes API, which allows users, administrators, and other components to communicate with the cluster.
- Scheduler: Responsible for placing Pods onto suitable worker nodes. It takes into account factors like resource availability, constraints, and optimization goals.
- Controller Manager: Responsible for monitoring the state of various objects in the cluster and taking corrective actions to ensure the desired state is maintained. It includes several built-in controllers, such as the Replication Controller, Deployment Controller, and StatefulSet Controller.
- Etcd: Distributed key-value store that serves as Kubernetes’ backing store for all cluster data. It holds the configuration data and the state of the entire cluster, including information about Pods, Services, replication settings, and more.
Together, these components form the master control plane, which acts as the brain and command center of the Kubernetes cluster.
Worker Node (Data Plane):
Worker nodes, also known as worker machines or worker servers, are the heart of a Kubernetes cluster. They are responsible for running containers and executing the actual workloads of your applications.
- Kubelet: The Kubelet is an agent that runs on each worker node and communicates with the master control plane. Its primary responsibility is to ensure that containers within Pods are running and healthy as per the desired state defined in the cluster’s configuration. The Kubelet works closely with the master control plane to start, stop, and manage containers based on Pod specifications.
- Container Runtime: Runs containers (e.g., containerd, Docker). A container runtime is the software responsible for running containers on the worker nodes.
- Kube-Proxy: Maintains network rules and provides service load balancing. Kube-Proxy sets up routing and load balancing so that applications can seamlessly communicate with each other and with external resources.
- CSI: Worker nodes need to provide storage for persistent data. Kubernetes supports different storage solutions through Container Storage Interfaces (CSI). These interfaces allow different storage providers to integrate with Kubernetes and offer persistent storage volumes for applications.
Architecture Flow Example:
kubectl run mypod --image=nginx -n dev
Triggers → API Server → etcd → Scheduler → Kubelet (on the chosen node) → Container Runtime
Instruction Flow (From YAML to Running Pod)
- YAML applied → API Server receives the desired state
- State saved in etcd
- Scheduler places pod on a node
- Kubelet receives pod spec and starts it using container runtime
- Kube-Proxy configures networking
- Metrics and HPA may scale resources as needed
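For example, applying a minimal Deployment manifest such as the one below (names and image are illustrative) triggers exactly this flow:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.25          # illustrative image
        ports:
        - containerPort: 80

Running kubectl apply -f deployment.yaml sends this desired state to the API server; the remaining steps happen automatically.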
PODS
Pods are fundamental building blocks in Kubernetes that group one or more containers together and provide a shared environment for them to run within the same network and storage context.
Allows you to colocate containers that need to work closely together within the same network namespace.
They can communicate using localhost and share the same IP address and port space.
Containers within a Pod share the same storage volumes, which allows them to easily exchange data and files.
Volumes are attached to the Pod and can be used by any of the containers within it.
Kubernetes schedules Pods as the smallest deployable unit.
If you want to scale or manage your application, you work with Pod replicas, not individual containers.
A Pod can include init containers, which are containers that run before the main application containers.
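A sketch of a Pod that illustrates these points: an init container plus two containers that share a volume and the same network namespace (names and images are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  initContainers:
  - name: init-setup                  # runs to completion before the app containers start
    image: busybox:1.36
    command: ["sh", "-c", "echo ready > /shared/index.html"]
    volumeMounts:
    - name: shared-data
      mountPath: /shared
  containers:
  - name: web
    image: nginx:1.25
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: log-agent                   # sidecar: same IP and port space as "web"
    image: busybox:1.36
    command: ["sh", "-c", "tail -f /dev/null"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  volumes:
  - name: shared-data
    emptyDir: {}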
Kubernetes High Availability & Failure Scenarios
Component | Failure Impact | Recovery |
---|---|---|
API Server | Cluster becomes unmanageable | Restart or HA deployment |
Etcd | State loss, no new scheduling | Restore from backup, use HA etcd |
Scheduler | No new pods scheduled | Restart scheduler |
Controller Manager | Auto-scaling and replication broken | Restart or recover HA |
Kubelet | Node disconnected, unmonitored pods | Restart kubelet or reboot node |
Kube-Proxy | Service communication broken | Restart kube-proxy |
CoreDNS | DNS lookup failure for services | Restart CoreDNS |
Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
Kubernetes Services
In Kubernetes, Services are a fundamental concept that enables communication and load balancing between different sets of Pods, making your applications easily discoverable and resilient.
Why Do We Need Kubernetes Services?
- Provide stable networking for pods
- Enable communication between components
- Expose applications externally
Types of Services:
Cluster IP: The default service type. It provides internal access within the cluster.
NodePort: Opens a port (30000–32767) on each node, allowing external access to services. Make sure to configure security groups accordingly.
LoadBalancer: Provisions an external load balancer (typically through the cloud provider) that routes incoming traffic to the Service’s pods, exposing the application outside the cluster with high availability.
Ingress: Not a Service type itself, but an API object that provides HTTP/HTTPS routing to Services using host- and path-based rules.
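As a sketch, a NodePort Service exposing pods labeled app=web (names and ports are illustrative; omit the type field to get the default ClusterIP):

apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80              # Service port inside the cluster
    targetPort: 8080      # container port on the pods
    nodePort: 30080       # must fall in the 30000–32767 range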
Network Policies (Ingress & Egress)
Problem Statement:
How do we secure communication between microservices in a Kubernetes cluster?
Use Case: 3-Tier Microservice Architecture
- Frontend ↔ Backend ↔ Database
- Backend requires external API access
Ingress Policy:
- Allow only frontend → backend communication
- Block others from accessing backend
Egress Policy:
- Allow backend to call external APIs
- Deny frontend/database internet access
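A sketch of the ingress policy for this use case, assuming the pods carry tier=frontend and tier=backend labels and the backend listens on port 8080:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend-only
spec:
  podSelector:
    matchLabels:
      tier: backend            # policy applies to backend pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend       # only frontend pods may connect
    ports:
    - protocol: TCP
      port: 8080

The egress rules for the backend would follow the same shape, with policyTypes: [Egress] and a to: block (for example, an ipBlock for external APIs).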
Secrets & ConfigMaps
Resource | Purpose | Security Level |
---|---|---|
ConfigMap | Store non-sensitive config | Plain text in etcd |
Secret | Store sensitive data | Base64-encoded in etcd (not encrypted by default; protect with RBAC and encryption at rest) |
Practical Use Case:
- Store DB_USER, DB_PASS, DB_PORT in Secrets/ConfigMaps
- Inject into pods via envFrom or volumeMounts (see the sketch below)
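A minimal sketch of this pattern (names and values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:                   # stored base64-encoded by Kubernetes
  DB_USER: appuser
  DB_PASS: changeme
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-config
data:
  DB_PORT: "5432"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: nginx:1.25         # illustrative image
    envFrom:
    - secretRef:
        name: db-credentials
    - configMapRef:
        name: db-config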
Kubernetes CI/CD Integration (Brief Outline)
Problem Statement:
How do we automate builds, tests, and deployments on Kubernetes?
Approach:
- GitOps (ArgoCD, FluxCD)
- Pipelines (Jenkins X, GitHub Actions, GitLab CI)
- Helm or Kustomize for manifest management
- Canary/Blue-Green deployments
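As one possible GitOps sketch, an Argo CD Application that keeps a namespace in sync with a Git repository (repository URL, path, and names are placeholders):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app.git   # placeholder repository
    targetRevision: main
    path: k8s/                                       # directory containing manifests or a Helm chart
  destination:
    server: https://kubernetes.default.svc
    namespace: dev
  syncPolicy:
    automated:
      prune: true        # remove resources deleted from Git
      selfHeal: true     # revert manual drift in the cluster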
How to handle a CrashLoopBackOff error?
Error Message:
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-5c78f8d6f5-xyz12 0/1 CrashLoopBackOff 5 3m
Cause:
Application inside the container is crashing repeatedly.
Missing dependencies, incorrect configuration, or resource limitations.
Fix:
Check logs for error messages:
kubectl logs my-app-5c78f8d6f5-xyz12
Describe the pod for more details:
kubectl describe pod my-app-5c78f8d6f5-xyz12
Fix application errors or adjust resource limits.
How to fix an ImagePullBackOff error?
Error Message:
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-5c78f8d6f5-xyz12 0/1 ImagePullBackOff 0 3m
Cause:
- Image name or tag is incorrect.
- Image doesn’t exist in the container registry.
- Authentication issues with a private registry.
Fix:
- Check pod description:
kubectl describe pod my-app-5c78f8d6f5-xyz12
- Verify the image name and tag in your Deployment YAML:
containers:
- name: my-app
image: myregistry.com/my-app:latest
- If using a private registry, ensure authentication is set up:
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.com \
  --docker-username=myuser \
  --docker-password=mypassword
How to fix a Pod stuck in the “Pending” state?
Error Message:
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-5c78f8d6f5-xyz12 0/1 Pending 0 5m
Cause:
- No available nodes with required resources.
- Taints or tolerations preventing scheduling.
- Persistent Volume (PV) not bound to the pod.
Fix:
- Check pod events:
kubectl describe pod my-app-5c78f8d6f5-xyz12
- Check available nodes:
kubectl get nodes
- Verify Persistent Volume Claims:
kubectl get pvc
How to fix a Node in NotReady state?
Error Message:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
node-1 NotReady <none> 50m v1.27.2
Cause:
- Network issues.
- Kubelet is down.
- Disk pressure or insufficient resources.
Fix:
- Describe node status:
kubectl describe node node-1
- Check Kubelet logs:
journalctl -u kubelet -n 100
- Restart Kubelet:
systemctl restart kubelet
- Ensure the node has enough disk space:
df -h
How to fix a “Service Not Accessible” error?
Error Message:
curl: (7) Failed to connect to my-service port 80: Connection refused
Cause:
- Service is not correctly exposed.
- No running pods backing the service.
- Network policies blocking traffic.
Fix:
- Check service details:
kubectl get svc my-service
kubectl describe svc my-service
- Verify that pods are running:
kubectl get pods -o wide
- If using NodePort, ensure the port is open in the firewall.
How to fix “OOMKilled” (Out of Memory)?
Error Message:
kubectl get pod my-app-xyz12 -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'
OOMKilled
Cause:
- Container exceeded memory limits.
Fix:
- Increase memory limits in Deployment YAML:
resources:
  limits:
    memory: "512Mi"
  requests:
    memory: "256Mi"
- Monitor memory usage:
kubectl top pod my-app-xyz12
What do you know about the kubeconfig file in Kubernetes?
A file used to configure access to a cluster is called a kubeconfig file. This is the generic way of referring to a configuration file. This doesn’t mean the file name is kubeconfig.
K8s components like kubectl, kubelet, or kube-controller-manager use the kubeconfig file to interact with the K8s API.
The default location of the kubeconfig file is ~/.kube/config. There are other ways to specify the kubeconfig location, such as the KUBECONFIG environment variable or the kubectl --kubeconfig parameter.
The kubeconfig file is a YAML file that contains groups of clusters, users, and contexts.
- A cluster is a K8s cluster.
- A user is a credential used to interact with the K8s API.
- A context is a combination of a cluster and a user.
Below is the basic template of a kubeconfig file for a kind cluster.
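(The values below are placeholders; a real kubeconfig generated by kind contains actual certificate data and addresses.)

apiVersion: v1
kind: Config
clusters:
- name: kind-kind
  cluster:
    certificate-authority-data: <base64 CA certificate>
    server: https://127.0.0.1:6443
users:
- name: kind-kind
  user:
    client-certificate-data: <base64 client certificate>
    client-key-data: <base64 client key>
contexts:
- name: kind-kind
  context:
    cluster: kind-kind
    user: kind-kind
    namespace: default
current-context: kind-kind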
The clusters section lists all clusters that you have already connected to.
- certificate-authority contains a certificate for the certificate authority (CA) that signed all internal Kubernetes certificates. This can be a file path or a Base64 string of the certificate’s Privacy Enhanced Mail (PEM) format.
- server is the address of the Kubernetes API server.
The users section lists all users already used to connect to a cluster. There are some possible keys for the user:
- client-certificate/client-certificate-data contains a certificate for the user signed by the Kubernetes CA. This can be a file path or a Base64 string in the certificate PEM format.
- client-key/client-key-data contains the key that signed the client certificate. This can be a file path or a Base64 string in the key PEM format.
- token contains a token for this user when there is no certificate.
The context section links a user and a cluster and can set a default namespace. The context name is arbitrary, but the user and cluster must be predefined in the kubeconfig file. If the namespace doesn’t exist, commands will fail with an error.
What are Selectors & Labels in Kubernetes?
Services use selectors and labels to identify the Pods they should target. Labels are key-value pairs attached to Pods, and selectors define which Pods the Service should include. For example, a Service with a selector might target all Pods with the label “app=web.”
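A minimal sketch of that pairing (labels and names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: web-1
  labels:
    app: web               # label attached to the Pod
spec:
  containers:
  - name: web
    image: nginx:1.25
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web               # the Service targets every Pod carrying this label
  ports:
  - port: 80
    targetPort: 80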
Can you explain Volumes in Kubernetes?
Volumes are a way to provide persistent storage to containers within Pods. They enable data to be shared and preserved across container restarts, rescheduling, and even Pod failures. Volumes enhance the flexibility and reliability of containerized applications.
Types of Volumes: Kubernetes supports various types of volumes to accommodate different storage needs:
- EmptyDir: A temporary storage that’s created when a Pod is assigned to a node and deleted when the Pod is removed or rescheduled.
- HostPath: Mounts a file or directory from the host machine’s filesystem into the container. Useful for testing and development, but not recommended for production due to portability and security concerns.
- PersistentVolumeClaim (PVC): An abstract way to request and manage a persistent storage volume. PVCs can dynamically provision storage based on a predefined storage class.
- ConfigMap and Secret Volumes: Special volumes that allow you to inject ConfigMap or Secret data as files into containers.
- NFS: Network File System volumes allow you to use network-attached storage in your containers.
- CSI (Container Storage Interface) Volumes: A plugin framework that allows storage providers to develop volume plugins without having to modify Kubernetes core code.
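As an illustration of the PVC volume type, a Pod mounting an existing claim (claim name and image are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
  - name: postgres
    image: postgres:16            # illustrative image
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc         # must reference an existing PVC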
Can you explain the purpose of namespaces in Kubernetes?
In Kubernetes, namespaces are a way to organize and partition resources within a cluster. They provide a way to create multiple virtual clusters within the same physical cluster, allowing you to separate and manage resources for different teams, projects, or environments.
Namespaces serve several purposes:
- Resource Isolation: Namespaces create isolated environments, so resources in one namespace are distinct from those in another. This helps prevent resource conflicts and accidental interference.
- Resource Management: Namespaces help organize and manage resources more effectively, especially in large and complex deployments.
- Access Control: Namespaces enable fine-grained access control. You can grant different teams or users access only to specific namespaces and the resources within them.
- Tenant Separation: If you’re using Kubernetes to offer services to multiple customers or tenants, namespaces help isolate their resources.
Certain resources, like Nodes and PersistentVolumes, are not tied to a specific namespace and are accessible from all namespaces. Other resources, such as Pods, Services, ConfigMaps, and Secrets, belong to a specific namespace.
Namespaces allow you to implement Role-Based Access Control (RBAC) to manage who can access or modify resources within a namespace. Additionally, you can set resource quotas and limits on namespaces to prevent resource overuse.
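A sketch combining a namespace with a resource quota (names and limits are illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"

kubectl get resourcequota -n team-a then shows current usage against these limits.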
What are the Pod Security Policies in Kubernetes?
Pod Security Policies (PSPs) were a security feature in Kubernetes that defined a set of conditions a pod must meet to be accepted into the cluster. PSPs controlled aspects like the user a pod runs as, the use of privileged containers, and access to the host’s network and storage. Enforcing such policies helps maintain security standards and prevent potential vulnerabilities. Note that PSPs were deprecated in v1.21 and removed in v1.25; their role is now filled by Pod Security Admission, which enforces the Pod Security Standards.
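With Pod Security Admission, the equivalent control is applied by labeling a namespace with the desired Pod Security Standard, for example (namespace name is illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: restricted-apps
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject pods that violate the Restricted standard
    pod-security.kubernetes.io/warn: restricted      # additionally warn on violations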
How are cluster health checks maintained?
Maintaining the health of a Kubernetes cluster involves regular monitoring and health checks. Kubernetes provides built-in mechanisms like liveness and readiness probes to check the health of individual pods. Tools like Prometheus and Grafana can be used to monitor the overall cluster health, providing insights into resource usage, performance metrics, and potential issues.
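A sketch of liveness and readiness probes on a container (paths, port, and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: nginx:1.25
    livenessProbe:               # container is restarted if this probe keeps failing
      httpGet:
        path: /healthz
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:              # Service traffic is withheld while this probe fails
      httpGet:
        path: /ready
        port: 80
      periodSeconds: 5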
What is the role of CRDs in Kubernetes?
Custom Resource Definitions (CRDs) enable you to extend Kubernetes’ functionality by defining your own custom resources. CRDs allow you to create and manage new types of resources beyond the built-in Kubernetes objects. This extensibility is useful for implementing custom controllers and operators to automate complex workflows and integrations.
CRDs allow you to define your own custom resources, extending the Kubernetes API to fit your specific needs.
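A minimal sketch of a CRD defining a hypothetical Backup resource:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com            # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string           # e.g. a cron expression

Once applied, kubectl get backups works like any built-in resource; a custom controller then gives Backup objects their behavior.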
What do you mean by Multi-Tenancy in Kubernetes?
Multi-tenancy in Kubernetes involves running multiple tenants (teams, applications, or customers) on a shared cluster while ensuring isolation and security. This can be achieved using namespaces, network policies, and resource quotas to segregate resources and control access. Implementing multi-tenancy enables efficient resource utilization and simplifies management.
What do you understand by Service Mesh in Kubernetes?
A service mesh is a dedicated infrastructure layer for managing service-to-service communication within a Kubernetes cluster. Tools like Istio, Linkerd, and Consul provide features such as traffic management, security, and observability. Implementing a service mesh enhances the reliability, security, and observability of microservices-based applications.
It is a mesh of proxy components that run alongside your application (most likely as a sidecar in Kubernetes) and offload a lot of the networking responsibilities to the proxy. Like Kubernetes, a service mesh has a control plane and a data plane. At a high level, the control plane exposes an API and coordinates the lifecycle of proxies in the data plane. The data plane in turn manages network calls between services.
The key thing to know here is that the proxy components in a service mesh facilitate communication between services. This is in contrast with other networking components like ingress or API gateways that facilitate networking calls from outside the cluster to internal services. The two most popular implementations of service meshes are Istio (which uses the Envoy proxy underneath) and Linkerd (which uses linkerd-proxy).
If Kubernetes already provides automatic service discovery and routing via kube-proxy, you may be wondering why a service mesh is needed. It may even look like a bad design choice to add a proxy per service (or node depending on the implementation) which would add latency and maintenance burden.
To understand the benefits of a service mesh, we have to look at the complexities of running a lot of services at scale. As you add in more services, managing how those services talk to one another seamlessly starts to become challenging. In fact, there’s a lot of “stuff” that needs to happen to make sure everything “works”. These things include:
- Security: encrypting messages in transit (e.g., mTLS), access control (i.e. service-to-service authorization)
- Traffic Management: load balancing, routing, retries, timeouts, traffic splitting, etc
- Observability: tracking latencies/errors/saturation, visualizing service topology, etc
These are features that all your services can benefit from and share, so offloading these capabilities to the proxy layer that sits in between services helps to decouple these “operational” concerns from your business logic.
As a developer, this decoupling of the operational logic and service logic is where you’ll see the most benefit. Most commonly, a service mesh obviates the need to generate self-signed certs and support TLS between services at the code level. Instead you can offload that logic to the service mesh and have that behavior coded at the mesh layer.
What do you understand by AutoScaling in Kubernetes?
Auto-scaling in Kubernetes involves dynamically adjusting the number of pod replicas based on performance metrics like CPU and memory usage. The Horizontal Pod Autoscaler (HPA) automatically scales applications based on these metrics, ensuring optimal resource utilization. Implementing auto-scaling helps maintain performance and handle varying workloads efficiently.
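A sketch of an HPA that scales a Deployment on average CPU utilization (names and thresholds are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale out when average CPU use crosses 70% of requests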
What do you understand by Operator Pattern in Kubernetes?
One of the best parts of the overall Kubernetes design is its extensibility.
Kubernetes allows custom resources to be defined and added via the Kubernetes API. To control these resources via the familiar controller pattern, Kubernetes also exposes custom controllers. In turn, operator pattern combines custom resources and controllers in an operator.
Operators are useful when the existing Kubernetes resources and controllers cannot quite support your application behaviors robustly. For example, you may have a complex application with the following set of features:
- Generates secrets via some distributed key generation protocol.
- Uses consensus algorithm to elect a new leader in case of failover.
- Utilizes nonstandard protocols to communicate to other services.
- Upgrading requires a series of interdependent processes.
Native Kubernetes resources and APIs do not provide sufficient ways to accomplish the above. So in this case, you can elect to define your resources and controller actions (e.g., key generation, leader election, etc) to accomplish what you need.
How does Resource Management works in Kubernetes?
Kubernetes allows you to specify the resources that your application needs for each container, under the resources section. Most commonly, you will be setting requests and limits for CPU and memory (RAM).
The names are fairly self-explanatory, but the details have subtle consequences for your applications:
- Requests: Requests are used by kube-scheduler to determine which node to schedule your pod onto. If you request 2 CPUs and 512 MiB of memory, kube-scheduler will look for nodes with that much free capacity. When there is excess capacity on the node, the application can use more than the requested resources.
- Limits: Limits are used by the kubelet and the container runtime to enforce resource limits. If your application uses more CPU than the limit, it’ll start to be throttled. Likewise, when the container consumes more memory than the limit, it’ll be terminated by the system kernel with an out-of-memory (OOM) error. (A short example follows at the end of this section.)
There are a few more subtle yet important points you need to understand with resources:
- When doing capacity planning, it’s important to understand that not all the CPU and memory on the underlying nodes can be used for your application. Each node needs some reserved capacity to run its operating system, Kubernetes components, and other DaemonSets like monitoring or security agents.
- We can set the application’s quality of service (QoS) by combining requests and limits. For example, if we need guaranteed QoS, we can set requests equal to the limit. On the other hand, for burstable applications, we can set request resources based on average load and give a higher limit for peak load.
- Finding the “ideal” request and limits requires some experimentation. You don’t want to request too many resources and make your application hard to schedule or waste resources. On the other hand, setting the limits too low, you risk your application getting throttled or killed.
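A sketch of the container resources section illustrating both QoS patterns mentioned above (values are illustrative):

# Guaranteed QoS: requests equal limits
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

# Burstable QoS: request the average load, allow a higher limit for peaks
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "1Gi"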
How does the Scheduling works in Kubernetes?
Kubernetes has more fine-grained options to help you schedule pods to different nodes. You may wonder why this would matter: shouldn’t Kubernetes just schedule pods to available nodes?
Consider the following use cases:
- You have specific hardware (e.g., instances with GPUs, ARM or Windows nodes, different instance types, or discounted nodes like spot or preemptibles) that you want certain pods to take advantage of.
- You want to spread your pods across multiple nodes in different zones or topologies for high availability. Alternatively, you may want to co-locate certain pods to minimize network latency.
- You need to ensure that certain pods are not scheduled onto the same hardware for compliance or security reasons.
- You want to prioritize scheduling certain pods over others to carry more critical tasks when resources are constrained.
Kubernetes lets you specify the behavior you want to match your use case. Let’s take a look at some ways to influence scheduling decisions:
- The simplest way is to use `nodeSelector`. You can use node labels to tell Kubernetes to only schedule pods onto nodes with those labels.
- If you need more granular control over node selection behavior, you can use the nodeAffinity field. You could set soft rules via preferredDuringSchedulingIgnoredDuringExecution or hard rules via requiredDuringSchedulingIgnoredDuringExecution to evaluate expressions. Common examples include selecting an availability zone or other node labels.
- Alternatively, you can also set scheduling behavior based on pod labels (pod affinity and anti-affinity). This can be useful to either co-locate certain pods or make sure pods are not scheduled onto the same node. The syntax works similarly to nodeAffinity.
- If you want to prevent pods from being scheduled onto certain nodes, you can use taints. Only pods with tolerations that match the taints are allowed to be scheduled onto those nodes. This can be useful to reserve nodes with specific hardware for critical workloads. For example, you may taint nodes with GPU support so that only your machine learning workloads are scheduled there and other pods do not take up those resources.
- You can also define topology spread constraints to fine-tune high-availability behavior and resource utilization, such as max skew (i.e., how unevenly pods are distributed) and topology domains.
- Finally, a PriorityClass can be defined to tell kube-scheduler which pods it should prioritize in scheduling. When pods cannot be scheduled, it can also evict lower-priority pods to make room for more critical pods. A short example combining some of these options follows this list.
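A sketch of a Pod spec combining a node selector and a toleration for a dedicated GPU node pool (labels, taint key, and image are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  nodeSelector:
    accelerator: gpu              # schedule only onto nodes carrying this label
  tolerations:
  - key: "dedicated"              # matches a taint such as dedicated=gpu:NoSchedule
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: python:3.12            # illustrative image
    command: ["python", "-c", "print('training...')"]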
How does Capacity Planning works in Kubernetes?
Capacity planning for Kubernetes is a critical step to running production workloads on clusters optimized for performance and cost. Given too few resources, Kubernetes may start to throttle CPU or kill pods with out-of-memory (OOM) errors. On the other hand, if the pods demand too much, Kubernetes will struggle to allocate new workloads and waste idle resources.
Unfortunately, capacity planning for Kubernetes is not simple. Allocatable resources depend on underlying node type as well as reserved system and Kubernetes components (e.g. OS, kubelet, monitoring agents). Also, the pods require some fine-tuning of resource requests and limits for optimal performance.
One of the first things to understand is that not all the CPU and memory on the Kubernetes nodes can be used for your application. The available resources in each node are divided in the following way:
- Resources reserved for the underlying VM (e.g. operating system, system daemons like sshd, udev)
- Resources needed to run Kubernetes (e.g. kubelet, container runtime, kube-proxy)
- Resources for other Kubernetes-related add-ons (e.g. monitoring agents, node problem detector, CNI plugins)
- Resources available for your applications
- Capacity determined by the eviction threshold to prevent system OOMs
Key Considerations for Effective Capacity Planning
To build a resilient and cost-efficient Kubernetes cluster, consider the following practices:
- Right-size pod requests and limits: Use monitoring data to tune CPU/memory requests to actual usage patterns. Avoid both under-requesting (risking throttling or eviction) and over-requesting (wasting resources).
- Node sizing and pool strategy: Use appropriately sized nodes for different workloads. For instance, batch jobs might use spot instances or burstable instances, while critical workloads get placed on high-memory or high-CPU nodes.
- Use vertical and horizontal autoscaling:
  - Horizontal Pod Autoscaler (HPA) helps scale pods based on CPU/memory/custom metrics.
  - Vertical Pod Autoscaler (VPA) adjusts pod resource requests based on observed usage.
- Monitor and tune system reservations: Kubernetes uses the kube-reserved and system-reserved flags to define system overhead. These can be tuned to prevent unexpected interference from add-ons or OS processes (see the sketch after this list).
- Use capacity-aware schedulers: Tools like Karpenter, Cluster Autoscaler, and descheduler can help dynamically optimize node usage based on workload demands and bin-packing efficiency.
- Simulate and test failure scenarios: Ensure that your clusters can handle pod evictions, node failures, or autoscaling events without degrading user experience.
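A hedged sketch of tuning system reservations via the kubelet configuration file (the values are illustrative and depend on node size):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:                  # capacity set aside for Kubernetes components (kubelet, container runtime)
  cpu: "250m"
  memory: "512Mi"
systemReserved:                # capacity set aside for the OS and system daemons
  cpu: "250m"
  memory: "512Mi"
evictionHard:                  # eviction threshold; also reduces allocatable memory
  memory.available: "200Mi"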