Introduction: Why Container Orchestration?

Problem Statement:
As microservices-based applications scale, managing containers across multiple environments manually becomes inefficient and error-prone.

Solution:
Container orchestration automates the deployment, scaling, networking, and lifecycle management of containers.

Key Benefits of Kubernetes Orchestration:

- Automated deployment, rollouts, and rollbacks
- Horizontal scaling based on demand
- Self-healing: failed containers are restarted or rescheduled
- Built-in service discovery and load balancing
- Declarative configuration via YAML manifests

Virtual Machines vs Containers vs Kubernetes

| Virtual Machines | Docker Containers | Kubernetes |
| --- | --- | --- |
| Hardware-level virtualization | OS-level virtualization | Container orchestration |
| Heavyweight | Lightweight and fast | Automates container ops |
| Boot time: minutes | Boot time: seconds | Self-healing, scalable |

Key Insight:
Containers solve the portability problem. Kubernetes solves the scalability and reliability problem of containers in production.


Storage in Kubernetes (Dynamic & CSI)

Problem Statement:
How do we abstract and dynamically provision storage in Kubernetes without being tied to a specific cloud or on-premise provider?

Solution:

PersistentVolumeClaims (PVCs), StorageClasses, and the Container Storage Interface (CSI) decouple storage requests from the underlying provider: the application asks for storage through a PVC, and the StorageClass's CSI driver dynamically provisions a matching PersistentVolume (PV).
Flow:
App → PVC → StorageClass + CSI → PV
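
A minimal sketch of this flow, assuming the AWS EBS CSI driver; the class name, provisioner parameters, and sizes are illustrative:

```yaml
# StorageClass backed by a CSI driver (provisioner name is an example;
# substitute your cluster's driver).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# The app requests storage through a PVC; Kubernetes dynamically provisions
# a matching PV via the StorageClass above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
```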

Reference: https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/


Kubernetes Architecture

Control Plane (Master Node):

- kube-apiserver: the front door of the cluster; every request goes through it
- etcd: the distributed key-value store holding the cluster's desired and current state
- kube-scheduler: assigns pods to nodes
- kube-controller-manager: runs the controllers that reconcile actual state toward desired state
- cloud-controller-manager: integrates with the underlying cloud provider (where applicable)

Together, these components form the control plane, which acts as the brain and command center of the Kubernetes cluster.

Worker Node (Data Plane):

- kubelet: the node agent that starts pods and reports their status
- kube-proxy: programs the node's networking rules for Services
- Container runtime (e.g., containerd): actually runs the containers

Worker nodes, also known as worker machines, are the heart of a Kubernetes cluster. They are responsible for running containers and executing the actual workloads of your applications.

Architecture Flow Example:

```
kubectl run mypod --image=nginx -n dev
```

Triggers → API Server → etcd → Scheduler → Kubelet (on the chosen node) → Container Runtime

Instruction Flow (From YAML to Running Pod)

  1. YAML applied → API Server receives the desired state
  2. State saved in etcd
  3. Scheduler places pod on a node
  4. Kubelet receives pod spec and starts it using container runtime
  5. Kube-Proxy configures networking
  6. Metrics and HPA may scale resources as needed
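
For example, applying a minimal Pod manifest (names and image are illustrative) triggers exactly this flow:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: dev
spec:
  containers:
    - name: app
      image: nginx:1.25    # illustrative image
```

Running `kubectl apply -f pod.yaml` hands this desired state to the API server; steps 2–6 then happen automatically.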

Pods

Pods are fundamental building blocks in Kubernetes that group one or more containers together and provide a shared environment for them to run within the same network and storage context.

- Pods let you colocate containers that need to work closely together within the same network namespace.
- Containers in a Pod can communicate using localhost and share the same IP address and port space.
- Containers within a Pod share the same storage volumes, which lets them easily exchange data and files.
- Volumes are attached to the Pod and can be used by any of the containers within it.
- Kubernetes schedules Pods as the smallest deployable unit.
- If you want to scale or manage your application, you work with Pod replicas, not individual containers.
- A Pod can include init containers, which run to completion before the main application containers start.
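
As a sketch of these ideas, the hedged example below runs an init container that prepares a file on a shared volume before the main container starts (names and images are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-init
spec:
  initContainers:
    - name: fetch-config           # runs to completion before the main container
      image: busybox:1.36
      command: ["sh", "-c", "echo 'hello' > /work/index.html"]
      volumeMounts:
        - name: shared
          mountPath: /work
  containers:
    - name: web
      image: nginx:1.25
      volumeMounts:
        - name: shared             # same volume, visible to both containers
          mountPath: /usr/share/nginx/html
  volumes:
    - name: shared
      emptyDir: {}                 # scratch volume that lives as long as the Pod
```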


Kubernetes High Availability & Failure Scenarios

| Component | Failure Impact | Recovery |
| --- | --- | --- |
| API Server | Cluster becomes unmanageable | Restart or HA deployment |
| etcd | State loss, no new scheduling | Restore from backup, use HA etcd |
| Scheduler | No new pods scheduled | Restart scheduler |
| Controller Manager | Auto-scaling and replication broken | Restart or recover HA |
| Kubelet | Node disconnected, unmonitored pods | Restart kubelet or reboot node |
| Kube-Proxy | Service communication broken | Restart kube-proxy |
| CoreDNS | DNS lookup failure for services | Restart CoreDNS |

Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/


Kubernetes Services

In Kubernetes, Services are a fundamental concept that enables communication and load balancing between different sets of Pods, making your applications easily discoverable and resilient.

Why Do We Need Kubernetes Services?

Pods are ephemeral: they are created, destroyed, and rescheduled, and each new Pod gets a new IP address. A Service provides a stable virtual IP and DNS name in front of a changing set of Pods and load-balances traffic across them.

Types of Services:

ClusterIP: The default service type. It provides a stable internal IP for access within the cluster.

NodePort: Opens a port (30000–32767) on each node, allowing external access to services. Make sure to configure security groups accordingly.

LoadBalancer: Provisions an external load balancer (on supported cloud providers) that distributes incoming traffic across the Service's pods, providing external access and high availability.

Ingress: Strictly speaking not a Service type, but the standard way to expose HTTP(S) routes with host- and path-based rules through an ingress controller.
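
A minimal ClusterIP Service sketch; the name, labels, and ports are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP          # default type; change to NodePort/LoadBalancer as needed
  selector:
    app: web               # targets all pods labeled app=web
  ports:
    - port: 80             # port exposed by the Service
      targetPort: 8080     # port the container listens on
```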


Network Policies (Ingress & Egress)

Problem Statement:
How do we secure communication between microservices in a Kubernetes cluster?

Use Case: 3-Tier Microservice Architecture (frontend → backend → database)

Ingress Policy: allow traffic into the backend tier only from frontend pods; deny all other inbound traffic.

Egress Policy: allow the backend tier to open connections only to the database tier, so a compromised pod cannot reach arbitrary destinations.
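
A hedged sketch of the backend's NetworkPolicy under these assumptions (the `tier` labels are illustrative; a real policy would typically also allow DNS egress):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
spec:
  podSelector:
    matchLabels:
      tier: backend          # policy applies to backend pods
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend # only frontend pods may call the backend
  egress:
    - to:
        - podSelector:
            matchLabels:
              tier: database # backend may only reach the database tier
```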


Secrets & ConfigMaps

| Resource | Purpose | Security Level |
| --- | --- | --- |
| ConfigMap | Store non-sensitive config | Plain text in etcd |
| Secret | Store sensitive data | Base64-encoded (not encrypted by default); enable encryption at rest for stronger protection |

Practical Use Case: keep application settings (log level, feature flags) in a ConfigMap and credentials (database password, API keys) in a Secret, then inject both into the pod as environment variables or mounted files.
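
A short sketch of that pattern; names and values are illustrative, and the secret value is a placeholder:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  DB_PASSWORD: "change-me"   # placeholder only; never commit real credentials
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.25
      envFrom:
        - configMapRef:
            name: app-config      # injected as environment variables
        - secretRef:
            name: db-credentials  # injected as environment variables
```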


Kubernetes CI/CD Integration (Brief Outline)

Problem Statement:

How do we automate builds, tests, and deployments on Kubernetes?

Approach:

  1. GitOps (ArgoCD, FluxCD)
  2. Pipelines (Jenkins X, GitHub Actions, GitLab CI)
  3. Helm or Kustomize for manifest management
  4. Canary/Blue-Green deployments

How to handle a CrashLoopBackOff Error?

Error Message:

```
kubectl get pods
NAME                      READY   STATUS             RESTARTS   AGE
my-app-5c78f8d6f5-xyz12   0/1     CrashLoopBackOff   5          3m
```

Cause:

- The application inside the container is crashing repeatedly.
- Missing dependencies, incorrect configuration, or resource limitations.
Fix:

Check logs for error messages:

```
kubectl logs my-app-5c78f8d6f5-xyz12
kubectl logs my-app-5c78f8d6f5-xyz12 --previous   # logs from the previous, crashed run
```

Describe the pod for more details:

```
kubectl describe pod my-app-5c78f8d6f5-xyz12
```

Fix application errors or adjust resource limits.


How to fix an ImagePullBackOff Error?

Error Message:

```
kubectl get pods
NAME                      READY   STATUS             RESTARTS   AGE
my-app-5c78f8d6f5-xyz12   0/1     ImagePullBackOff   0          3m
```

Cause:

- The image name or tag is wrong, or the image does not exist in the registry.
- The registry requires authentication and no image pull secret is configured.

Fix:

Describe the pod and verify the image reference:

```
kubectl describe pod my-app-5c78f8d6f5-xyz12
```

```yaml
containers:
  - name: my-app
    image: myregistry.com/my-app:latest
```

If the registry is private, create an image pull secret:

```
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.com \
  --docker-username=myuser \
  --docker-password=mypassword
```
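
For the secret to take effect, reference it from the pod spec. A minimal fragment, assuming the `regcred` secret created above:

```yaml
spec:
  imagePullSecrets:
    - name: regcred      # must match the secret created above
  containers:
    - name: my-app
      image: myregistry.com/my-app:latest
```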


How to fix a Pod Stuck in “Pending” State?

Error Message:

```
kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
my-app-5c78f8d6f5-xyz12   0/1     Pending   0          5m
```

Cause:

- No node has enough free CPU or memory to satisfy the pod's requests.
- Scheduling constraints (nodeSelector, affinity, taints) cannot be satisfied.
- A referenced PersistentVolumeClaim is still unbound.

Fix:

```
kubectl describe pod my-app-5c78f8d6f5-xyz12   # check the Events section for the scheduling failure reason
kubectl get nodes                              # verify nodes are Ready and have capacity
kubectl get pvc                                # verify referenced claims are Bound
```

How to fix Node Not Ready?

Error Message:

```
kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
node-1   NotReady   <none>   50m   v1.27.2
```

Cause:

- The kubelet on the node has stopped or cannot reach the API server.
- The CNI/network plugin is misconfigured.
- The node is under disk or memory pressure.

Fix:

```
kubectl describe node node-1    # check the Conditions section (MemoryPressure, DiskPressure, Ready)
journalctl -u kubelet -n 100    # inspect recent kubelet logs on the node
systemctl restart kubelet       # restart the kubelet if it has crashed or hung

df -h                           # check for disk pressure on the node
```


How to fix Service Not Accessible error?

Error Message:

curl: (7) Failed to connect to my-service port 80: Connection refused

Cause:

- The Service selector does not match the pods' labels, so the Service has no endpoints.
- The Service targetPort does not match the port the container listens on.
- The backing pods are not in a Ready state.

Fix:

```
kubectl get svc my-service
kubectl describe svc my-service    # verify the selector and targetPort
kubectl get endpoints my-service   # no endpoints means the selector matches no Ready pods
kubectl get pods -o wide           # confirm pod labels and readiness
```

How to fix “OOMKilled” (Out of Memory)?

Error Message:

```
kubectl get pod my-app-xyz12 -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'
OOMKilled
```

Cause:

The container's memory usage exceeded its configured memory limit, so the kernel terminated it (OOM kill).

Fix:

Give the container realistic memory requests and limits, then monitor actual usage:

```yaml
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
```

```
kubectl top pod my-app-xyz12    # requires metrics-server
```


What do you know about the kubeconfig file in Kubernetes?

A file used to configure access to a cluster is called a kubeconfig file. This is a generic way of referring to the configuration file; it does not mean the file name is literally kubeconfig.

K8s components like kubectl, kubelet, or kube-controller-manager use the kubeconfig file to interact with the K8s API.

The default location of the kubeconfig file is ~/.kube/config. There are other ways to specify the kubeconfig location, such as the KUBECONFIG environment variable or the `kubectl --kubeconfig` flag.

The kubeconfig file is a YAML file that contains groups of clusters, users, and contexts.

The clusters section lists the clusters you can connect to.

The users section lists the credentials used to connect to a cluster. Possible keys for a user include client-certificate and client-key, token, username and password, or an exec plugin.

The context section links a user and a cluster and can set a default namespace. The context name is arbitrary, but the user and cluster must be predefined in the kubeconfig file. If the namespace doesn’t exist, commands will fail with an error.
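
A minimal kubeconfig sketch showing how the three sections fit together; the server address, file paths, and names are illustrative:

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: dev-cluster
    cluster:
      server: https://203.0.113.10:6443
      certificate-authority: /etc/kubernetes/pki/ca.crt
users:
  - name: dev-user
    user:
      client-certificate: /home/dev/.kube/dev-user.crt
      client-key: /home/dev/.kube/dev-user.key
contexts:
  - name: dev-context
    context:
      cluster: dev-cluster      # must reference a cluster defined above
      user: dev-user            # must reference a user defined above
      namespace: dev            # optional default namespace
current-context: dev-context
```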


What are Selectors & Labels in Kubernetes?

Services use selectors and labels to identify the Pods they should target. Labels are key-value pairs attached to Pods, and selectors define which Pods the Service should include. For example, a Service with a selector might target all Pods with the label “app=web.”
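
A small sketch of the mechanism (names are illustrative): the Deployment's selector must match the labels on its pod template, and a Service selecting `app: web` would target the same pods.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web          # must match the pod template labels below
  template:
    metadata:
      labels:
        app: web        # the label that Services and selectors match on
    spec:
      containers:
        - name: web
          image: nginx:1.25
```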


Can you explain Volumes in Kubernetes?

Volumes are a way to provide persistent storage to containers within Pods. They enable data to be shared and preserved across container restarts, rescheduling, and even Pod failures. Volumes enhance the flexibility and reliability of containerized applications.

  1. Types of Volumes: Kubernetes supports various types of volumes to accommodate different storage needs, for example:

- emptyDir: scratch space that lives and dies with the Pod
- hostPath: mounts a path from the node's filesystem
- persistentVolumeClaim: attaches durable storage provisioned through a PV/PVC
- configMap / secret: expose configuration data and credentials as files
- CSI volumes: provider-specific storage (EBS, Azure Disk, NFS, etc.) via CSI drivers


Can you explain the purpose of namespaces in Kubernetes?

In Kubernetes, namespaces are a way to organize and partition resources within a cluster. They provide a way to create multiple virtual clusters within the same physical cluster, allowing you to separate and manage resources for different teams, projects, or environments.

Namespaces serve several purposes:

- Isolation: separate teams, projects, or environments (dev, staging, prod) within one cluster
- Access control: scope RBAC roles and bindings to a namespace
- Resource governance: apply quotas and limits per namespace
- Name scoping: the same resource name can exist in different namespaces without conflict

Certain resources, like Nodes and PersistentVolumes, are not tied to a specific namespace and are accessible from all namespaces. Other resources, such as Pods, Services, ConfigMaps, and Secrets, belong to a specific namespace.

Namespaces allow you to implement Role-Based Access Control (RBAC) to manage who can access or modify resources within a namespace. Additionally, you can set resource quotas and limits on namespaces to prevent resource overuse.
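
A short sketch combining a namespace with a ResourceQuota; names and values are illustrative:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a          # quota applies only within this namespace
spec:
  hard:
    pods: "20"
    requests.cpu: "4"
    requests.memory: 8Gi
```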


What are the Pod Security Policies in Kubernetes?

Pod Security Policies (PSPs) were a security feature in Kubernetes that defined a set of conditions a pod must meet to be accepted into the cluster. PSPs controlled aspects like the user a pod runs as, the use of privileged containers, and access to the host's network and storage. Note that PSPs were deprecated in Kubernetes v1.21 and removed in v1.25; their role is now filled by Pod Security Admission and the Pod Security Standards.


How are Cluster Health Checks maintained?

Maintaining the health of a Kubernetes cluster involves regular monitoring and health checks. Kubernetes provides built-in mechanisms like liveness and readiness probes to check the health of individual pods. Tools like Prometheus and Grafana can be used to monitor the overall cluster health, providing insights into resource usage, performance metrics, and potential issues.
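
A hedged sketch of liveness and readiness probes on a container; the paths, port, and timings are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
    - name: app
      image: nginx:1.25
      livenessProbe:              # restart the container if this fails
        httpGet:
          path: /healthz
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 15
      readinessProbe:             # remove from Service endpoints if this fails
        httpGet:
          path: /ready
          port: 80
        periodSeconds: 5
```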


What is the role of CRDs in Kubernetes?

Custom Resource Definitions (CRDs) enable you to extend Kubernetes’ functionality by defining your own custom resources. CRDs allow you to create and manage new types of resources beyond the built-in Kubernetes objects. This extensibility is useful for implementing custom controllers and operators to automate complex workflows and integrations.

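A minimal CRD sketch defining a custom `Backup` resource; the group, kind, and schema are illustrative:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com      # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Backup
    plural: backups
    singular: backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string   # e.g. a cron expression
```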


What do you mean by Multi-Tenancy in Kubernetes?

Multi-tenancy in Kubernetes involves running multiple tenants (teams, applications, or customers) on a shared cluster while ensuring isolation and security. This can be achieved using namespaces, network policies, and resource quotas to segregate resources and control access. Implementing multi-tenancy enables efficient resource utilization and simplifies management.


What do you understand by Service Mesh in Kubernetes?

A service mesh is a dedicated infrastructure layer for managing service-to-service communication within a Kubernetes cluster. Tools like Istio, Linkerd, and Consul provide features such as traffic management, security, and observability. Implementing a service mesh enhances the reliability, security, and observability of microservices-based applications.

It is a mesh of proxy components that run alongside your application (most likely as a sidecar in Kubernetes) and offload much of the networking responsibility to the proxy. Like Kubernetes, a service mesh has a control plane and a data plane. At a high level, the control plane exposes an API and coordinates the lifecycle of proxies in the data plane. The data plane, in turn, manages network calls between services.

The key thing to know here is that the proxy components in a service mesh facilitate communication between services. This is in contrast with other networking components like ingress or API gateways, which facilitate networking calls from outside the cluster to internal services. The two most popular implementations of service meshes are Istio (which uses the Envoy proxy underneath) and Linkerd (which uses linkerd-proxy).

If Kubernetes already provides automatic service discovery and routing via kube-proxy, you may be wondering why a service mesh is needed. It may even look like a bad design choice to add a proxy per service (or node depending on the implementation) which would add latency and maintenance burden.

To understand the benefits of a service mesh, we have to look at the complexities of running a lot of services at scale. As you add more services, managing how those services talk to one another seamlessly starts to become challenging. In fact, there's a lot of “stuff” that needs to happen to make sure everything “works”. These things include:

- Encrypting traffic between services (mutual TLS) and rotating certificates
- Retries, timeouts, and circuit breaking
- Fine-grained traffic shifting for canary and blue-green rollouts
- Observability: uniform metrics, access logs, and distributed traces

These are features that all your services can benefit from and share, so offloading these capabilities to the proxy layer that sits in between services helps to decouple these “operational” concerns from your business logic.

As a developer, this decoupling of the operational logic and service logic is where you’ll see the most benefit. Most commonly, a service mesh obviates the need to generate self-signed certs and support TLS between services at the code level. Instead you can offload that logic to the service mesh and have that behavior coded at the mesh layer.
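
As an illustration of the traffic-management side, here is a hedged sketch of an Istio VirtualService splitting traffic between two versions of a service; the host and subset names are assumptions, and the subsets would be defined in a companion DestinationRule:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web
spec:
  hosts:
    - web                  # the in-mesh service to route
  http:
    - route:
        - destination:
            host: web
            subset: v1     # defined in a DestinationRule (not shown)
          weight: 90
        - destination:
            host: web
            subset: v2
          weight: 10       # 10% canary traffic
```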


What do you understand by AutoScaling in Kubernetes?

Auto-scaling in Kubernetes involves dynamically adjusting the number of pod replicas based on performance metrics like CPU and memory usage. The Horizontal Pod Autoscaler (HPA) automatically scales applications based on these metrics, ensuring optimal resource utilization. Implementing auto-scaling helps maintain performance and handle varying workloads efficiently.
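
A minimal HPA sketch against the autoscaling/v2 API; the Deployment name and thresholds are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # the workload to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```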


What do you understand by Operator Pattern in Kubernetes?

One of the best parts of the overall Kubernetes design is its extensibility.

Kubernetes allows custom resources to be defined and added via the Kubernetes API. To control these resources via the familiar controller pattern, Kubernetes also exposes custom controllers. In turn, the operator pattern combines custom resources and custom controllers into an operator.

Operators are useful when the existing Kubernetes resources and controllers cannot quite support your application behaviors robustly. For example, you may have a complex application with the following set of features:

- Needs encryption keys or certificates generated and rotated automatically
- Runs replicas that must elect a leader among themselves
- Requires coordinated backups and version upgrades

Native Kubernetes resources and APIs do not provide sufficient ways to accomplish the above. So in this case, you can elect to define your own resources and controller actions (e.g., key generation, leader election, etc.) to accomplish what you need.


How does Resource Management work in Kubernetes?

Kubernetes allows you to specify the resources that your application needs for each container, under the `resources` section. Most commonly, you will be setting requests and limits for CPU and memory (RAM).

The names are fairly self-explanatory, but the details have subtle consequences for your applications:

- requests: the amount the scheduler reserves for the container; a pod is only placed on a node with enough unreserved capacity
- limits: the hard cap; a container exceeding its CPU limit is throttled, and one exceeding its memory limit is OOM-killed

There are a few more subtle yet important points you need to understand with resources:

  1. When doing capacity planning, it’s important to understand that not all the CPU and memory on the underlying nodes can be used for your application. Each node needs some reserved capacity to run its operating system, Kubernetes components, and other DaemonSets like monitoring or security agents.
  2. We can set the application's quality of service (QoS) by combining requests and limits. For example, if we need guaranteed QoS, we can set requests equal to the limits. On the other hand, for burstable applications, we can set requested resources based on average load and give a higher limit for peak load (see the sketch after this list).
  3. Finding the “ideal” request and limits requires some experimentation. You don’t want to request too many resources and make your application hard to schedule or waste resources. On the other hand, setting the limits too low, you risk your application getting throttled or killed.
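
A hedged sketch of the burstable pattern from point 2; the values are illustrative:

```yaml
# Burstable QoS: requests sized for average load, limits sized for peaks.
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
```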

How does Scheduling work in Kubernetes?

Kubernetes has more fine-grained options to help you schedule pods to different nodes. You may wonder why this would matter: shouldn't Kubernetes just schedule pods to available nodes?

Consider the following use cases:

- Machine-learning workloads that must land on nodes with GPUs
- Spreading replicas across availability zones for high availability
- Keeping noisy or critical workloads isolated on dedicated nodes

Kubernetes lets you specify the behavior you want to match your use case. Let’s take a look at some ways to influence scheduling decisions:

  1. The simplest way is to use `nodeSelector`. You can use node labels to tell Kubernetes to only schedule pods onto nodes with those labels.
  2. If you need more granular control over node selection behavior, you can use the nodeAffinity field. You could set soft rules via preferredDuringSchedulingIgnoredDuringExecution or hard rules via requiredDuringSchedulingIgnoredDuringExecution to evaluate expressions. Common examples include selecting an availability zone or labels.
  3. Alternatively, you can also set scheduling behavior based on pod labels. This can be useful to either co-locate certain pods or make sure pods are not scheduled onto the same node. The syntax works similarly as nodeAffinity.
  4. If you want to prevent pods from being scheduled onto certain nodes, you can use taints. Only pods with tolerations that match the taint labels are allowed to be scheduled onto those nodes. This can be useful to isolate critical workloads on nodes with specific hardware. For example, you may have a taint on nodes with GPU support so only your machine learning workloads are scheduled there and other pods are not taking up resources (see the sketch after this list).
  5. You can also define topology spread constraints to fine-tune high-availability behavior and resource utilization such as max skew (i.e. how unevenly are pods distributed) and topology domains.
  6. Finally, PriorityClass can be defined to tell kube-scheduler which pods it should prioritize in scheduling. When pods cannot be scheduled, it can also evict lower priority pods to make room for more critical pods.
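
A short sketch combining points 1 and 4: a pod that targets GPU nodes via `nodeSelector` and tolerates a matching taint (the label and taint keys are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job
spec:
  nodeSelector:
    hardware: gpu            # schedule only onto nodes labeled hardware=gpu
  tolerations:
    - key: "gpu"             # tolerate the taint that keeps other pods away
      operator: "Exists"
      effect: "NoSchedule"
  containers:
    - name: trainer
      image: myregistry.com/trainer:latest   # illustrative image
```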

How does Capacity Planning work in Kubernetes?

Capacity planning for Kubernetes is a critical step to running production workloads on clusters optimized for performance and cost. Given too few resources, Kubernetes may start to throttle CPU or kill pods with out-of-memory (OOM) errors. On the other hand, if the pods demand too much, Kubernetes will struggle to allocate new workloads and waste idle resources.

Unfortunately, capacity planning for Kubernetes is not simple. Allocatable resources depend on underlying node type as well as reserved system and Kubernetes components (e.g. OS, kubelet, monitoring agents). Also, the pods require some fine-tuning of resource requests and limits for optimal performance.

One of the first things to understand is that not all the CPU and memory on the Kubernetes nodes can be used for your application. The available resources in each node are divided in the following way:

  1. Resources reserved for the underlying VM (e.g. operating system, system daemons like sshd, udev)
  2. Resources needed to run Kubernetes (e.g., kubelet, container runtime, kube-proxy)
  3. Resources for other Kubernetes-related add-ons (e.g., monitoring agents, node problem detector, CNI plugins)
  4. Resources available for your applications
  5. Capacity determined by the eviction threshold to prevent system OOMs
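
You can see this split on a real node by comparing its Capacity and Allocatable figures (the node name is illustrative):

```
kubectl describe node node-1
# Compare the Capacity and Allocatable sections: the difference is what is
# reserved for the OS, Kubernetes components, and the eviction threshold.
```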

Key Considerations for Effective Capacity Planning

To build a resilient and cost-efficient Kubernetes cluster, consider the following practices:

- Base requests and limits on observed usage (e.g., from metrics-server or Prometheus) rather than guesses
- Leave headroom for system daemons, Kubernetes components, and eviction thresholds when sizing nodes
- Use the Horizontal Pod Autoscaler and Cluster Autoscaler to absorb changes in load
- Revisit capacity periodically as workloads and traffic patterns evolve



