Multi-tenancy in a Kubernetes cluster


The adoption of containerization and Kubernetes is increasing rapidly. Fast provisioning, lightweight workloads, autoscaling, and serverless architectures are some of the major benefits of running your application in a Kubernetes cluster. However, they also bring new problems to solve. One issue is that, since resources are shared, you must ensure fair utilization and prevent one compromised tenant from impacting others. The other, far more concerning issue is security: how to achieve secure isolation of resources between tenants.

What is multi-tenancy?

Multi-tenancy is an architectural paradigm in which a single instance of an application serves multiple customers. Each customer is called a tenant of the application and can be an individual or an organization. Multi-tenancy is a compelling value proposition when your system supports a large number of customers, as it avoids maintaining a separate system for each one. To prevent one tenant from impacting another, well-defined resource isolation is provided for each tenant. In a multi-tenant cluster, applications from different tenants are deployed side by side, and it is the provider's responsibility to ensure tenants are isolated from each other.

We faced this issue while migrating our platform to Kubernetes. A Kubernetes cluster consists of various layers of resources, e.g. node, namespace, pod, and container, so isolation can be achieved at multiple levels. The default isolation suggested for Kubernetes is to separate each tenant into a different namespace. However, this brings up important security considerations. Being a multi-tenant platform, it was critical for our business to provide security and isolation. After much consideration, we decided to go for node-level isolation. In this article, I will discuss namespace-level isolation and node-level isolation along with their pros and cons.

Namespace-based isolation

Most Kubernetes resources live in a namespace. If you don't specify a namespace for your resource, it goes into the default namespace. A namespace is a logical entity for representing and managing cluster resources; you can think of it as a virtual cluster within the cluster itself. Namespace-based isolation is one approach to multi-tenancy: the idea is to run each tenant in a different namespace.

You can create any number of namespaces within your cluster. A namespace can be created like this:

apiVersion: v1
kind: Namespace
metadata:
  name: tenant1
  labels:
    name: tenant1

kubectl create -f namespace.yaml

Next, define an RBAC Role that grants access only to resources within the tenant's namespace:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tenant1-role
  namespace: tenant1
rules:
  - apiGroups: [""]
    resources: ["pods", "secrets"]
    verbs: ["get", "list", "watch"]

kubectl create -f role.yaml

Then bind the role to the tenant's service account with a RoleBinding:
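The RoleBinding below references a service account named tenant1-account, which must exist in the tenant's namespace. A minimal manifest for it might look like this:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tenant1-account   # referenced by the RoleBinding below
  namespace: tenant1
```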

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tenant1-role-binding
  namespace: tenant1
subjects:
  - kind: ServiceAccount
    name: tenant1-account
    namespace: tenant1
roleRef:
  kind: Role
  name: tenant1-role
  apiGroup: rbac.authorization.k8s.io

kubectl create -f rolebinding.yaml

Next, you need to create network policies to ensure cross-namespace communication is blocked. By default, traffic is allowed between pods within the cluster; however, you can define a policy to block all traffic and then explicitly allow specific communication. Note that support for these policies depends on the network plugin used by your cluster provider.

This policy blocks all ingress traffic to pods in the tenant's namespace.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: tenant1
spec:
  podSelector: {}
  policyTypes:
  - Ingress

kubectl create -f defaultpolicy.yaml

Then you can add another network policy to explicitly allow traffic within the namespace. Note that network policies are additive in nature.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: within-namespace
  namespace: tenant1
spec:
  podSelector: {}
  ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            name: tenant1

kubectl create -f namespacepolicy.yaml

You can specify egress policies in the same manner.
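For example, an egress rule restricting outbound traffic to pods in the same namespace might look like this (the policy name here is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-within-namespace   # illustrative name
  namespace: tenant1
spec:
  podSelector: {}                 # applies to all pods in the namespace
  policyTypes:
  - Egress
  egress:
    - to:
      - namespaceSelector:
          matchLabels:
            name: tenant1         # only allow traffic to the tenant's own namespace
```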

By doing all this, we ensure that each tenant's resources are separated. However, there are still some issues. If one tenant creates too many resources, other tenants can be slowed down or left without enough compute. Kubernetes supports limit ranges and resource quotas that can be applied per container or across a namespace. For example, this resource quota ensures that total usage within the namespace cannot exceed the specified limits.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi

kubectl apply -f resourcequota.yaml --namespace=tenant1
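A resource quota caps aggregate usage, while a LimitRange constrains individual containers. A sketch of per-container defaults and caps for the same namespace (the name and values here are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits   # illustrative name
  namespace: tenant1
spec:
  limits:
  - type: Container
    defaultRequest:   # request applied when a container specifies none
      cpu: 250m
      memory: 128Mi
    default:          # limit applied when a container specifies none
      cpu: 500m
      memory: 256Mi
    max:              # hard cap for any single container
      cpu: "1"
      memory: 512Mi
```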

Is this enough?

The question is whether all of this is enough to rely on. The fact is that the kernel is still shared, and vulnerabilities can allow a user to gain access to the node or to other containers on it; some of the recent Docker vulnerabilities illustrate this. Such a breach can cause significant damage to the business and must be avoided in a multi-tenant platform. Pod security policies can provide some of this isolation and hardening, but they do not seem sufficient to counter the risk. If you plan to run your own code in the cluster rather than customer code, or if you are supporting internal systems or multiple teams within one organization, you can choose this strategy. That wasn't the case for us, so we decided on node-level isolation.
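Whichever strategy you pick, hardening individual pods reduces the blast radius of a container escape. A minimal sketch of a restrictive per-container security context (the pod name, image, and chosen settings are illustrative, not requirements):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod        # illustrative name
  namespace: tenant1
spec:
  containers:
  - name: app
    image: custom-image     # illustrative image
    securityContext:
      runAsNonRoot: true              # refuse to start if the image runs as root
      allowPrivilegeEscalation: false # block processes from gaining extra privileges
      readOnlyRootFilesystem: true    # container cannot write to its own filesystem
      capabilities:
        drop: ["ALL"]                 # drop all Linux capabilities
```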

Node-based isolation

Not all resources in Kubernetes are bound to a namespace. Low-level resources such as nodes and persistent volumes are shared across namespaces. The idea here is to ensure that pods of different tenants are scheduled on different nodes, so that the kernel, volume mounts, and host are shared only by containers of the same tenant. In this case, we label each node with tenant information using the command below.

kubectl label nodes worker1 tenant=tenant1

Now, you can specify that a pod should be scheduled only to nodes with this label.

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: some-container
    image: custom-image
    imagePullPolicy: Always
  nodeSelector:
    tenant: tenant1

This way, you can ensure that your application is scheduled on the desired nodes. Additionally, you will need to build extra infrastructure to watch the nodes in the cluster and automatically identify when new nodes need to be pooled in. This solution does achieve multi-tenancy, but it can lead to lower resource utilization. We found this approach to suit our purpose, given the lack of a proper built-in solution in Kubernetes today. One issue remains in spite of all this: the master nodes are still shared. In our case, we did not have much choice there since we were using a hosted cluster, but I think Kubernetes should provide more isolation in this regard going forward. Even so, this served our purpose well.
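Note that a nodeSelector only pins tenant1's pods to tenant1's nodes; it does not stop other tenants' pods from landing there. One way to close that gap, sketched here under the assumption of the same tenant label, is to also taint the node and add a matching toleration:

```yaml
# First taint the node so pods without a matching toleration are repelled:
#   kubectl taint nodes worker1 tenant=tenant1:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: some-container
    image: custom-image     # illustrative image
  nodeSelector:
    tenant: tenant1         # attract the pod to tenant1's nodes
  tolerations:
  - key: "tenant"
    operator: "Equal"
    value: "tenant1"
    effect: "NoSchedule"    # tolerate the taint so this pod can schedule there
```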

Why not have a cluster per tenant?

A cluster per tenant is not really an option comparable to the two above. However, if you have only a few tenants or teams to manage, you may find it feasible to create a separate cluster for each of them. If this strategy works for you, by all means go for it: it avoids much of the headache of the previously described scenarios.

The road ahead

As can be seen, both approaches have their pros and cons. Namespace isolation provides better resource utilization but leads to potential security issues; node isolation, on the other hand, leads to inefficient resource utilization. There is a gap between today's multi-tenancy requirements and the solutions we have, and various other solutions are ramping up to fill it. A couple of solutions that look promising are:

gVisor: gVisor is a user-space kernel built by Google and used in many Google projects. It implements a substantial portion of the Linux system call interface, including file systems, pipes, signals, etc. Its runsc runtime provides an isolation boundary between the application and the host kernel. gVisor thus offers a lightweight alternative to virtual machines while maintaining a clear separation of resources. Work is ongoing to make it integrate with Kubernetes.
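On clusters where a node's container runtime is configured with the runsc handler, such a sandbox can be selected per pod via a RuntimeClass. A sketch (the resource names and image are illustrative, and the node.k8s.io API has only stabilized in recent Kubernetes versions):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor             # illustrative name
handler: runsc             # must match the runtime handler configured on the node
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-pod      # illustrative name
spec:
  runtimeClassName: gvisor # run this pod under the gVisor sandbox
  containers:
  - name: app
    image: custom-image    # illustrative image
```

Kata Containers, described next, can be selected the same way, with the handler pointing at the Kata runtime instead.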

Kata Containers: Another interesting project under development is Kata Containers. Kata containers are lightweight virtual machines that run on a dedicated kernel, providing isolation of network, memory, I/O, etc. They are built on the Open Container Initiative (OCI) standards and can therefore be supported directly by Kubernetes. However, the project is still in early development.
