We have been migrating our legacy infrastructure to Kubernetes. Being able to deploy in seconds makes a huge difference to the number of lean experiments we can take on. Kubernetes lets us keep the simplicity of Docker containers at a scale that can handle 1 billion widget events and the data processing that goes with them.
While Kubernetes is awesome, it is still relatively new and there are plenty of places to contribute. It all started when our master node went down on AWS and didn’t come back up. It should have: the nodes on AWS are set up to restart without a problem. This led to my first commit. After digging into the system logs I saw this glaring error:
The disk drive for /mnt/ephemeral is not ready yet or not present.
keys:Continue to wait, or Press S to skip mounting or M for manual recovery
Digging into the source, I found that on AWS, the Kubernetes scripts create an LVM volume to store the data:
lvcreate -l 100%FREE --thinpool pool-ephemeral vg-ephemeral
Running the code seemed to work. More than that, the master worked on startup. What was going on? I logged into an existing master and looked for the logical volume. It wasn’t there! The directory was there, but the volume was not.
Going back to the system logs (from first boot), we see this error on ‘lvcreate’:
Insufficient free space: 3905 extents needed, but only 3897 available
Apparently, this is a problem with lvcreate in general: you can’t use 100%FREE with a thin pool! It will fail. You can see more details in the ticket.
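To put the error message in perspective, here is a quick sketch of the arithmetic. The extent counts come straight from the log line above; the 4 MiB extent size is an assumption (it is the LVM default, but I did not record the actual value on that volume group). My understanding is that a thin pool needs room for its metadata on top of the data volume, so asking for every free extent for data leaves nothing for it.

```shell
#!/bin/sh
# Extent counts copied from the lvcreate error message above.
needed=3905
available=3897

# Assumption: 4 MiB per extent (the LVM default). The real size is not recorded here.
extent_mib=4

# How many extents the thin pool request overshoots the volume group by.
shortfall=$((needed - available))
echo "short by ${shortfall} extents ($((shortfall * extent_mib)) MiB)"
```

Under that assumption the request overshoots by only a few dozen megabytes, which is why it is so easy to miss.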
Well, there was no need to use a thin pool for this, since we are not overprovisioning the disk in Kubernetes. That became my first pull request! I signed the Contributor License Agreement and the rest is history.
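For illustration, a plain (non-thin) logical volume can take all the free space without trouble. The sketch below is not the exact change that was merged: the helper name plain_lv_cmd and the volume name "ephemeral" are made up for this example, and the function only prints the command rather than running it, since lvcreate needs root and a real volume group.

```shell
#!/bin/sh
# Hypothetical helper: prints an lvcreate command for a plain linear volume
# that uses all remaining free extents in the given volume group.
# It does not run lvcreate; copy the output and run it as root if appropriate.
plain_lv_cmd() {
  vg="$1"   # volume group name, e.g. vg-ephemeral
  lv="$2"   # logical volume name (illustrative, not from the original scripts)
  echo "lvcreate -l 100%FREE -n ${lv} ${vg}"
}

plain_lv_cmd vg-ephemeral ephemeral
```

Without --thinpool there is no separate metadata volume to carve out, so 100%FREE can be satisfied.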
We now have 4 production Kubernetes clusters with dozens of pods (and restartable master nodes) and are looking forward to the stability and high availability work coming in “Ubernetes”. If you’re interested in solving problems like this we would love to have you join our team!
