Bare metal load balancer for Kubernetes cluster


Once an application is deployed in a Kubernetes cluster, we usually want to make it reachable. We can define the visibility of an application, i.e., whether it should be reachable externally, from outside the cluster, or only internally, from within it. Whichever option we choose, we always have to define a Service.

In Kubernetes, a Service is an abstraction which defines a logical set of Pods and a policy by which to access them (sometimes this pattern is called a micro-service). The set of Pods targeted by a Service is usually determined by a selector. For example, consider a stateless image-processing backend which is running with 3 replicas. Those replicas are fungible—frontends do not care which backend they use. While the actual Pods that compose the backend set may change, the frontend clients should not need to be aware of that, nor should they need to keep track of the set of backends themselves. The Service abstraction enables this decoupling.
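As a sketch of the abstraction described above, a Service selecting the image-processing backend Pods by label might look like this (all names and ports are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: image-processor        # hypothetical Service name
spec:
  selector:
    app: image-processor       # matches the label set on the backend Pods
  ports:
  - port: 80                   # port the Service exposes inside the cluster
    targetPort: 8080           # port the backend containers listen on
```

Frontends connect to the stable Service name and port; the set of Pods behind the selector can change freely.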

Kubernetes gives us control over how a Service is published using different Service types. Currently, the following types are available:

  • ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default that is used if we don't explicitly specify a type for a Service. You can expose the service to the public with an Ingress or Gateway API.
  • NodePort: Exposes the Service on each Node's IP at a static port (the NodePort). To make the node port available, Kubernetes sets up a cluster IP address, the same as if we had requested a Service of type: ClusterIP.
  • LoadBalancer: Exposes the Service externally using a cloud provider's load balancer.
  • ExternalName: Maps the Service to the contents of the externalName field (e.g., by returning a CNAME record with its value). No proxying of any kind is set up.

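For example, a minimal NodePort Service (names and ports are hypothetical) could be declared as:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport           # hypothetical Service name
spec:
  type: NodePort
  selector:
    app: web                   # hypothetical Pod label
  ports:
  - port: 80                   # cluster-internal port (the implied ClusterIP)
    targetPort: 8080           # container port
    nodePort: 30080            # static port opened on every node (30000-32767 by default)
```

The application then becomes reachable on `<any-node-ip>:30080` from outside the cluster.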
On bare-metal deployments, we can use NodePort to expose a Service to the external world. Then we can set up a rule in the router to forward traffic to the designated IP address of the node. However, this approach has a flaw: once the node goes down, its IP address becomes unreachable, and we have to change the forwarding rule manually to point to another node. We can approach this issue in different ways: we could either update IP addresses dynamically on the router or share the same IP address amongst multiple nodes. Both methods are feasible, and both are implemented by MetalLB.


MetalLB, bare metal load-balancer for Kubernetes

Kubernetes does not implement network load balancers for bare-metal clusters. The implementations of network load balancers that Kubernetes ships are all tied to various IaaS platforms (e.g., GCP, AWS, Azure). If we're not running on a supported IaaS platform, a Service of type LoadBalancer will remain in the "pending" state indefinitely when created. MetalLB offers a network load balancer implementation that integrates with standard network equipment, so that external services on bare-metal clusters just work.

Note that MetalLB is a young project. It should be treated as a beta system.

MetalLB hooks into the Kubernetes cluster and allows us to create services of the type LoadBalancer in the cluster. It has two features that work together to provide this service: address allocation, and external announcement. After MetalLB has assigned an external IP address to a service, it needs to make the network beyond the cluster aware that the IP “lives”. MetalLB uses standard networking or routing protocols to achieve this, depending on which mode is used: ARP, NDP, or BGP.

In layer 2 mode, one machine in the cluster takes ownership of the service and uses standard address discovery protocols (ARP for IPv4, NDP for IPv6) to make those IPs reachable on the local network. From the LAN’s point of view, the announcing machine simply has multiple IP addresses.

In BGP mode, all machines in the cluster establish BGP peering sessions with nearby routers that you control and tell those routers how to forward traffic to the service IPs. Using BGP allows for true load balancing across multiple nodes, and fine-grained traffic control thanks to BGP’s policy mechanisms.
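As a sketch of the BGP setup (peer address and ASNs are hypothetical; check your router's configuration), MetalLB is told about a peer with a BGPPeer resource and told what to announce with a BGPAdvertisement:

```yaml
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: home-router            # hypothetical peer name
  namespace: metallb
spec:
  myASN: 64500                 # ASN the cluster nodes announce from (private range)
  peerASN: 64501               # ASN of the router we peer with
  peerAddress: 192.168.1.1     # hypothetical router address
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-advertisement      # with no pool list, all pools are advertised
  namespace: metallb
```

This is not needed for the layer 2 setup described below; it only illustrates the alternative mode.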

In my opinion, the easiest approach is to use layer 2 mode, as it doesn't require any special features in the home router, just the correct port forwarding rule. We will not get a true load-balancing effect, but at least the cluster will be resilient.

Installation and configuration

I will skip setting up the forwarding rule, as this differs between routers, and focus on the installation and configuration part.
First, we have to install MetalLB in the cluster. The most convenient way of doing that is using its Helm chart. Let's add the new repository by issuing the following command.

❯ helm repo add metallb https://metallb.github.io/metallb

Once the new repository is added, we can install MetalLB by running the command.

❯ helm install metallb metallb/metallb --create-namespace --namespace metallb

Now new Pods will be created in the cluster. By default, multiple speakers are deployed, one on each node, along with a single controller.

❯ kubectl get pods
NAME                                  READY   STATUS    RESTARTS      AGE
metallb-controller-7f6b8b7fdd-dxw4s   1/1     Running   1 (34d ago)   34d
metallb-speaker-h9ncd                 1/1     Running   0             33d
metallb-speaker-jzmct                 1/1     Running   0             33d
metallb-speaker-mbcv8                 1/1     Running   0             34d
metallb-speaker-r6hkh                 1/1     Running   0             34d
metallb-speaker-w65t7                 1/1     Running   0             34d
metallb-speaker-zndz8                 1/1     Running   0             34d

Once they are up and running, we can start with the configuration. We have to define IPAddressPool that can be used by LoadBalancer.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: main-pool
  namespace: metallb
spec:
  addresses:
  - 192.168.1.240/32  # example range; replace with an address range from your network

We can define a single range, multiple ranges, CIDRs, or a mixture of all of these. In my case, I defined a single range with just a single IP address. To finalize the configuration, one last step is needed: we have to create an L2Advertisement to make the router aware that the IP addresses from the pool are actually alive.

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-main-pool
  namespace: metallb
spec:
  ipAddressPools:
  - main-pool

When creating L2Advertisement we have to point to pools we want to advertise. In my case, it is just a single pool with one IP address.
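For reference, a pool is not limited to one notation; a single IPAddressPool can mix plain ranges and CIDR blocks (all values below are hypothetical):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mixed-pool                 # hypothetical pool name
  namespace: metallb
spec:
  addresses:
  - 192.168.1.240-192.168.1.250    # a plain range
  - 192.168.2.0/28                 # a CIDR block
```

Whatever addresses you choose, they must be free on your LAN and outside the router's DHCP range, or you risk address conflicts.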

Using LoadBalancer type

Now we are ready to create a Service with type: LoadBalancer. I deployed NGINX as a reverse proxy and Ingress controller, and it is bound to the IP address defined in the pool.

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  allocateLoadBalancerNodePorts: true
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: http
    name: http
    nodePort: 30384
    port: 80
    protocol: TCP
    targetPort: http
  - appProtocol: https
    name: https
    nodePort: 32623
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  sessionAffinity: None
  type: LoadBalancer

When listing Services we can see their types as well as assigned IPs.

❯ kubectl get service
NAME                                 TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer                              80:30384/TCP,443:32623/TCP   34d
ingress-nginx-controller-admission   ClusterIP                   <none>        443/TCP                      34d

ingress-nginx-controller has an external IP address assigned, while ingress-nginx-controller-admission doesn't have one at all; it only has a cluster IP.

Keep in mind that if you want multiple Services using type: LoadBalancer, the pool must contain enough IP addresses. If it doesn't, the Service will be stuck in the pending state, waiting for an IP assignment.
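If you do run several pools, MetalLB lets a Service request an address from a specific pool via the metallb.universe.tf/address-pool annotation; a sketch with hypothetical names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: second-app                               # hypothetical Service name
  annotations:
    metallb.universe.tf/address-pool: main-pool  # ask MetalLB for an IP from this pool
spec:
  type: LoadBalancer
  selector:
    app: second-app                              # hypothetical Pod label
  ports:
  - port: 80
    targetPort: 8080
```

Without the annotation, MetalLB simply picks a free address from any compatible pool.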

Happy building your own cluster! Cheers!