Improving the Validation of Static Pods in Kubernetes
Recently I worked on a feature in Kubernetes to better enforce validations for static pods. Static pods aren't allowed to reference other API objects, since these pods are managed by the kubelet and don't go through the apiserver. Also, we need static pods to run before we can spin up the pods for the kube-apiserver itself. At that point, when the apiserver is not even up, it doesn't make sense to reference an API object.
Currently, when a static pod with API references is created, the corresponding mirror pod does not get created, but the static pod's container is still started on the node where the corresponding kubelet is running. This pod doesn't really work, since the API references it specifies can't be resolved. We need to tighten the validation so that the static pod's container is not silently started and left running on the node.
These are my devlogs from working on this issue.
Notes from initial research
These are my notes from my initial research on static pods, based on reading the Kubernetes documentation.
- Static pods are created by the kubelet daemon on each specific node and don't go through the kube-apiserver. Instead of the control plane (read: kube-apiserver), the kubelet watches the static pods and restarts them if they fail.
- Static pods are bound to the kubelet on a specific node.
- The kubelet creates a mirror Pod on the kube-apiserver for each static pod. So if you exec into the node the kubelet is running on and run `crictl ps`, you'd be able to see the container of the static pod running natively in the cluster.
- But in order to be able to query this from kubectl, the kubelet creates a "mirror pod" in the kube-apiserver. Each static pod has a corresponding mirror pod. The mirror pod is only for reading, and the apiserver doesn't allow you to configure or update the static pod by updating the mirror pod. The name of the mirror pod has a suffix of `-nodename`.
- Since the static pods are created before the apiserver is up, the spec of static pods cannot refer to other API objects like the ServiceAccount, ConfigMap, Secrets or Volumes.
- Static pods don't support ephemeral containers.
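To make the mirror pod naming from the notes above concrete, here is a minimal Go sketch. The helper name `mirrorPodName` and the example pod and node names are hypothetical and only illustrate the `-nodename` suffix; this is not the kubelet's actual code.

```go
package main

import "fmt"

// mirrorPodName is a hypothetical helper (not the kubelet's real function).
// It appends the node name to the static pod's name, which gives the
// "-nodename" suffix described in the notes.
func mirrorPodName(podName, nodeName string) string {
	return fmt.Sprintf("%s-%s", podName, nodeName)
}

func main() {
	// Example values are made up for illustration.
	fmt.Println(mirrorPodName("etcd", "kind-control-plane"))
}
```

With these example values, the mirror pod would show up in kubectl as `etcd-kind-control-plane`.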
Logs
- pkg/kubelet/config/common.go seems to contain the functions related to processing and validating static pods.
- Specifically, the `tryDecodeSinglePod` function seems to be for static pods, because pohly had added static-pod-specific logic to the function in commit 1ec7231. The commit history also suggests the same, since logic related to validating the names of static pods was added in this function. There is also a generatePodName function, which generates the pod names for static pods (which, as mentioned above, include the name of the node the pod is scheduled to). All of this tells me that these functions deal with static pods specifically. I plan to ping @liggitt about this in my PR. TODO: Confirm this with @liggitt.
- Looking at the function itself, it is easy to mistake it for a generic pod function that is not specific to static pods. The `tryDecodeSinglePod` function's comment says that it takes data as an array of bytes and tries to extract Pod config information from it. It returns a v1.Pod object.
- Searching for where this function is called from, we can see that it's called from `pkg/kubelet/config/file.go` and `pkg/kubelet/config/http.go`. The file.go file makes sense, since we're reading static pod configs from files. I'm not entirely sure whether the http.go file is hit when we send a request to the kube-apiserver to create a pod, for example when we apply a manifest. I will ask Jordan about this as well.
- Apart from these two files, `tryDecodeSinglePod` is called from its tests. The tests are interesting because the Pod spec is first constructed with the API types and then converted into data bytes with `runtime.Encode(clientscheme.Codecs.LegacyCodec(v1.SchemeGroupVersion), pod)`. TODO: Come back to this and understand what `clientscheme.codec` is and what `v1.SchemeGroupVersion` means here.
- The initial approach I had taken for the PR was to write a new pod admit handler for validating static pods. This admit handler was then added to the list of admit handlers the kubelet would process for each pod during pod admission.
- The Kubelet type has an admitHandlers variable, which is a list of `lifecycle.PodAdmitHandlers`. The PodAdmitHandler is just an interface that defines an `Admit` function, which takes pod attributes and returns a PodAdmitResult. The PodAdmitResult type has a boolean that stores whether the pod was admitted or not, and a reason and a message string for when the pod was not admitted.
- By writing a new admit handler for static pods, I was able to add the logic for denying admission for static pods in the `Admit` function of the new `StaticPodAdmitHandler`. This would return false if the static pod references API objects.
- After the initial code review from @liggitt, I learnt that the same logic is present in the noderestriction admission controller's admitPodCreate function.
- After spending some time understanding what was going on, I refactored the `admitPodCreate` function as Jordan suggested, by extracting the common logic for rejecting pod admission into a `HasAPIReferences` function in the pod utils file, pkg/api/pod/util.go. This common function would be used by the noderestriction admission controller, as well as by the `tryDecodeSinglePod` function we saw above, to process static pods. This way, the extra API-reference-based admission denial logic would only be executed when we're dealing with static pods and wouldn't be run for every single pod, unlike the admit handler approach.
- Once I implemented this, I got the following error when copying a manifest file into `/etc/kubernetes/manifests` to try to create an invalid static pod. The following was seen in the kubelet logs:
May 20 20:35:48 staticpodcluster-control-plane kubelet[720]: E0520 20:35:48.857229 720 file.go:108] "Unable to process watch event" err="can't process config file \"/etc/kubernetes/manifests/test.yaml\": can not create static pods that reference serviceaccounts"
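The admit handler approach described earlier in these logs can be sketched roughly as follows. All types here are simplified local stand-ins written for this post, not the real `lifecycle` or `v1` types; the `Static` flag, the reason string and the message text are assumptions made purely for illustration.

```go
package main

import "fmt"

// Simplified stand-ins for the real Kubernetes types (assumed shapes).
type PodSpec struct {
	ServiceAccountName string
}

type Pod struct {
	Name   string
	Static bool // hypothetical flag; the kubelet detects static pods differently
	Spec   PodSpec
}

type PodAdmitAttributes struct {
	Pod *Pod
}

// PodAdmitResult mirrors the description above: a boolean plus a reason
// and a message that are filled in when the pod is rejected.
type PodAdmitResult struct {
	Admit   bool
	Reason  string
	Message string
}

type PodAdmitHandler interface {
	Admit(attrs *PodAdmitAttributes) PodAdmitResult
}

// staticPodAdmitHandler rejects static pods that reference a ServiceAccount.
type staticPodAdmitHandler struct{}

func (staticPodAdmitHandler) Admit(attrs *PodAdmitAttributes) PodAdmitResult {
	pod := attrs.Pod
	if pod.Static && pod.Spec.ServiceAccountName != "" {
		return PodAdmitResult{
			Admit:   false,
			Reason:  "StaticPodAPIReference", // made-up reason string
			Message: "static pods may not reference API objects",
		}
	}
	return PodAdmitResult{Admit: true}
}

// admitAll runs every handler for a pod; the first rejection wins.
func admitAll(handlers []PodAdmitHandler, pod *Pod) PodAdmitResult {
	for _, h := range handlers {
		if res := h.Admit(&PodAdmitAttributes{Pod: pod}); !res.Admit {
			return res
		}
	}
	return PodAdmitResult{Admit: true}
}

func main() {
	handlers := []PodAdmitHandler{staticPodAdmitHandler{}}
	bad := &Pod{Name: "test", Static: true, Spec: PodSpec{ServiceAccountName: "builder"}}
	fmt.Printf("%+v\n", admitAll(handlers, bad))
}
```

The sketch also shows the drawback mentioned above: every pod passes through every handler in the list, even though this particular check only matters for static pods.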
After a few rounds of reviews regarding the different API objects and the overarching switch case statements, the PR was merged into k/k master on June 27th.
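For readers who want a feel for the shared check, here is a minimal sketch of a helper that walks a pod spec with a switch and reports the first kind of API object it references. The types, the lower-case `hasAPIReferences` name, its signature and the set of checked fields are assumptions for illustration and do not reproduce the merged code in pkg/api/pod/util.go; only the error wording is taken from the kubelet log above.

```go
package main

import "fmt"

// Simplified stand-ins for the real API types (assumed shapes).
type Volume struct {
	Name      string
	ConfigMap string // name of a referenced ConfigMap, empty if unused
	Secret    string // name of a referenced Secret, empty if unused
	HostPath  string // a node-local path; not an API reference
}

type PodSpec struct {
	ServiceAccountName string
	Volumes            []Volume
}

// hasAPIReferences is a hypothetical helper. It reports whether the spec
// refers to another API object and, if so, which kind.
func hasAPIReferences(spec PodSpec) (bool, string) {
	if spec.ServiceAccountName != "" {
		return true, "serviceaccounts"
	}
	for _, v := range spec.Volumes {
		switch {
		case v.ConfigMap != "":
			return true, "configmaps"
		case v.Secret != "":
			return true, "secrets"
		}
	}
	return false, ""
}

// validateStaticPod turns a detected reference into an error, using the
// wording seen in the kubelet log.
func validateStaticPod(spec PodSpec) error {
	if found, kind := hasAPIReferences(spec); found {
		return fmt.Errorf("can not create static pods that reference %s", kind)
	}
	return nil
}

func main() {
	fmt.Println(validateStaticPod(PodSpec{ServiceAccountName: "builder"}))
	fmt.Println(validateStaticPod(PodSpec{
		Volumes: []Volume{{Name: "logs", HostPath: "/var/log"}},
	}))
}
```

Because the helper only looks at the spec, it can be called both from an admission plugin and from the code path that decodes static pod manifests, which is the point of extracting it.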
Interesting links
- PR from @pohly adding logic to reject static pods that reference ResourceClaims (DRA): This PR was really helpful for understanding how the existing pod admission denial works.
- NodeRestriction Admission Controller: I need to learn more about admission controllers and how they work. I always thought that admission control was something done manually by cluster admins based on their needs. But I learnt that admission controllers are code that lives in the apiserver and can be turned on or off.
- Original issue: https://github.com/kubernetes/kubernetes/issues/103587
- My PR: https://github.com/kubernetes/kubernetes/pull/131837