Fixing An LVM Issue In Kubernetes On AWS

March 9, 2018 | Engineering

We have been migrating our legacy infrastructure to Kubernetes. The ability to deploy in seconds makes a huge difference in the number of lean experiments we can take on. Kubernetes lets us keep the simplicity of Docker containers while operating at a scale that can handle 1 billion widget events, along with the data processing that goes with them.

While Kubernetes is awesome, it is still relatively new, and there are plenty of places to contribute. My first commit started when our master node went down on AWS and didn’t come back up. It should have: the nodes on AWS are set up to restart without any problem. After digging into the system logs, I saw this glaring error:

The disk drive for /mnt/ephemeral is not ready yet or not present.
keys:Continue to wait, or Press S to skip mounting or M for manual recovery

Digging into the source, I found that on AWS, Kubernetes scripts create an LVM volume to store the data.

lvcreate -l 100%FREE --thinpool pool-ephemeral vg-ephemeral

Running the code seemed to work. More than that, the master worked on startup. What was going on? I logged into an existing master and looked for the logical volume. It wasn’t there! The directory was there, but the volume was not.
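A check along these lines can tell "the directory exists" apart from "a volume is actually mounted on it". This is only a sketch: the helper name `is_mounted` is made up for illustration, and it assumes a Linux-style mounts table (`/proc/mounts` by default) where the second field is the mount point.

```shell
# is_mounted DIR [MOUNTS_FILE]
# Succeeds if DIR appears as a mount point in the mounts table.
# MOUNTS_FILE defaults to /proc/mounts (assumption: Linux host).
is_mounted() {
  awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# The directory can exist even when nothing is mounted on it.
if [ -d /mnt/ephemeral ] && ! is_mounted /mnt/ephemeral; then
  echo "/mnt/ephemeral exists but no volume is mounted there"
fi
```

On the host itself, running `lvs vg-ephemeral` as root lists the logical volumes in the volume group, which is another way to confirm whether the volume was ever created.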

Going back to the system logs (from the first boot), I found this error from ‘lvcreate’:

Insufficient free space: 3905 extents needed, but only 3897 available

Apparently, this is a general problem with lvcreate: you can’t use 100%FREE with a thin pool! It will fail. You can see more details in the ticket.
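The numbers in the error message give a feel for how small the gap is. A thin pool needs room for its metadata volume in addition to the data volume, so a likely reading is that asking for 100%FREE as data leaves nothing for the metadata. The arithmetic below uses the extent counts from the log; the 4 MiB extent size is LVM's default and is an assumption about this particular volume group.

```shell
# Extent counts copied from the lvcreate error message.
needed=3905
available=3897
extent_mib=4   # assumption: LVM's default extent size of 4 MiB

shortfall=$((needed - available))
shortfall_mib=$((shortfall * extent_mib))
echo "short by ${shortfall} extents (${shortfall_mib} MiB)"
```

Under that assumption, the request overshoots the volume group by 8 extents, or 32 MiB.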

Well, there was no need to use a thin pool for this, since we are not overprovisioning the disk in Kubernetes. That became my first pull request! I signed the Contributor License Agreement and the rest is history.
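A plain logical volume does not need a metadata volume, so 100%FREE works for it. The snippet below is a sketch of that idea, not the exact change from the pull request: the volume group name comes from the command shown earlier, while the volume name `ephemeral` and the `DRY_RUN` switch are assumptions added for illustration.

```shell
# Sketch: create a regular (non-thin) logical volume that fills the volume group.
VG="vg-ephemeral"   # volume group name from the original command
LV="ephemeral"      # assumption: illustrative logical volume name

cmd="lvcreate -l 100%FREE -n ${LV} ${VG}"

# By default only print the command. Set DRY_RUN=0 to run it
# (requires root and an existing volume group).
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$cmd"
else
  $cmd
fi
```

With the default dry run, this prints the command so it can be reviewed before being run on a real node.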

We now have 4 production Kubernetes clusters with dozens of pods (and restartable master nodes) and are looking forward to the stability and high availability work coming in “Ubernetes”. If you’re interested in solving problems like this we would love to have you join our team!

