Creating VHD disk for MS Azure

Azure is the cloud service offering from Microsoft. One of the many options it gives is to create a disk image offline, to your very own taste, then upload it and run a VM from it. At this point Azure seems unexpectedly friendly to open-source enthusiasts: we know that the disk has to be in VHD format, and we have the qemu-img utility, which can convert any other popular disk format into VHD, so it should be easy, eh? As usual with Microsoft, it’s not.

The Problem

The main obstacle is that Azure does not take just any VHD. The image has to meet two additional requirements:

(1) Its size must be a whole number of megabytes (why this restriction exists Microsoft never made clear), and
(2) Its size must translate to a whole number of cylinders.

While (1) is clearly outlined by Microsoft in Azure’s documentation, they never mention (2). Even worse, because they overlooked (2), their advice on how to deal with (1) is wrong (or, more precisely, works only in a few rare situations, which still makes it rather useless).

So, why did Microsoft overlook such an important requirement? Probably because the VHD format dates back to the time when they launched Virtual PC. More often than not it was used to run virtualised copies of actual machines, and an actual machine would always come from an actual hard disk, which in turn means that the requirement to have a whole number of cylinders in the definition of the disk would, naturally, always be satisfied. Alas, this is not the case when installing onto a file which emulates the disk (which is a typical scenario when installing an OS directly in a VM); its size would rarely translate into a whole number of cylinders, a fact that, after all, most hypervisors have learned to live with. Most, but not Microsoft’s.

The Remedy

A brief word on disks and their geometry: since the early days of computing, each hard disk has been described by 3 parameters: the number of disk surfaces (or heads, H) it has; the number of tracks on each surface (or cylinders, C); and the number of sectors S into which each track is split. By convention each sector is 512 bytes long (with some exceptions in recent years) and each track is divided into 63 sectors. The number of heads may be up to 16 which, while rarely the case in physical disks, is now used as a de-facto standard for disk images. With S and H locked at the described values, the only way to get a larger disk is to put more cylinders in it. While in the world of physical hardware the size of a disk is determined by multiplying C, H, S and 512 (bytes), in the world of virtual hardware the size of the disk is divided by 512, S and H, and the resulting value of C is then rounded up to the next integer. This results in a virtual disk description that is marginally larger (by a few sectors) than the actual size of the virtual disk: just enough to confuse Hyper-V, Microsoft’s hypervisor.
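The rounding described above can be sketched in a few lines of shell. This is only an illustration under the geometry assumed in the text (512-byte sectors, 63 sectors per track, 16 heads); the helper name cylinders_for_bytes is made up for this example and is not part of any tool.

```shell
#!/bin/sh
# Fixed geometry assumed from the text: 512-byte sectors, 63 sectors/track, 16 heads.
BYTES_PER_CYL=$((512 * 63 * 16))   # bytes covered by one cylinder

# Hypothetical helper: smallest whole number of cylinders that covers a given size in bytes.
cylinders_for_bytes() {
    echo $(( ($1 + BYTES_PER_CYL - 1) / BYTES_PER_CYL ))
}

size=$((100 * 1024 * 1024))              # an arbitrary 100 MB image
cyl=$(cylinders_for_bytes "$size")
described=$((cyl * BYTES_PER_CYL))       # size implied by the rounded-up geometry

echo "actual size:    $size bytes"
echo "cylinders:      $cyl"
echo "described size: $described bytes"
echo "excess:         $(( (described - size) / 512 )) sectors"
```

For a 100 MB image the geometry describes somewhat more space than the file really holds, which is exactly the mismatch the text refers to.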

Solving the problem requires some math: knowing the desired size of the virtual disk, we need to create a disk which is slightly larger so that both requirements (1) and (2) are met.

First, let’s look at calculating the minimum required number of cylinders. Since we want M megabytes, for starters we convert them into good old bytes: B = M * 1024 * 1024 (don’t trust marketing: in real computing it is always 1024 and never 1000). We next divide these bytes by 512 (bytes per sector), 63 (sectors per track) and 16 (heads). Unfortunately, the resulting C in most cases will not be an integer, so we need to bump it up a little. But how little?

(3) C = M * 128 / 63
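To see how rarely (3) yields an integer, here is a small sketch that evaluates it with integer arithmetic and prints the leftover fraction. The function names and the sample sizes are chosen purely for illustration.

```shell
#!/bin/sh
# Whole part and remainder of C = M * 128 / 63 for a size M given in megabytes.
cyl_whole()     { echo $(( $1 * 128 / 63 )); }
cyl_remainder() { echo $(( $1 * 128 % 63 )); }

for m in 63 100 1024; do
    echo "M=$m MB -> C = $(cyl_whole "$m") + $(cyl_remainder "$m")/63 cylinders"
done
```

Only sizes that are multiples of 63 MB leave a zero remainder.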

We now need to make sure that the whole number of cylinders will yield a whole number of megabytes. A single cylinder has 63 sectors of 512 bytes each, multiplied by 16 heads; one megabyte has 1024 * 1024 bytes. Dividing the second by the first we get 128/63 cylinders per megabyte. Since 128 and 63 have no common factor, to keep both the number of cylinders and the number of megabytes integers we actually need to stick to batches of 128 cylinders, each of which gives us 63 MB of disk space. In other words, we now know that the actual number of cylinders C’ must be divisible by 128 without a remainder. Rounding up an arbitrary N to an integer N’ that is divisible by an integer Z means that:

(4) N' = Z * (1 + int(N / Z))

Substituting (3) into (4) and setting Z=128, we get

(5) C' = 128 * (1 + int(M / 63))

Now calculating the size in megabytes M’ of the virtual disk which corresponds to this geometry is easy:

(6) M' = C' * 16 * 63 * 512 / (1024 * 1024)

The grand finale is to substitute (5) into (6):

(7) M' = 63 * (1 + int(M / 63))
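Formulas (5) and (7) translate directly into shell arithmetic. The sketch below uses made-up function names. The round_up_mb_tight variant is not from the derivation above: it is a ceiling-style alternative, added here as an assumption, that leaves sizes which are already multiples of 63 MB unchanged, whereas (7) as written always adds one more 63 MB batch in that case. Both results satisfy requirements (1) and (2) arithmetically.

```shell
#!/bin/sh
# Formula (7) as written: M' = 63 * (1 + int(M / 63)).
round_up_mb() {
    echo $(( 63 * (1 + $1 / 63) ))
}

# Assumption, not in the original derivation: ceiling variant that keeps
# exact multiples of 63 MB as they are.
round_up_mb_tight() {
    echo $(( 63 * (($1 + 62) / 63) ))
}

# Cylinder count for an adjusted size M' (a multiple of 63): C' = M' * 128 / 63.
cylinders_for_mb() {
    echo $(( $1 * 128 / 63 ))
}

for m in 63 100 1024; do
    adj=$(round_up_mb "$m")
    echo "wanted $m MB -> create $adj MB ($(cylinders_for_mb "$adj") cylinders)"
done
```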

Bonus Track

One more detail, however, remains: automating the virtual disk creation when using KVM. The most popular tool for unattended installations using KVM is virt-install. For our purposes, however, virt-install has a single, but huge deficiency: it expects the size of the virtual disk to be specified in gigabytes only. Since the size we need will almost never match a whole number of gigabytes, this option becomes useless. We cannot, therefore, rely on virt-install to create our virtual disk; rather, we need to create it beforehand ourselves. Luckily, virt-install will use an existing disk if we supply one. To create a sparse disk with a size of “m” megabytes, just call dd with the proper options:

dd if=/dev/zero of=disk.img bs=1 count=0 seek=mM

Being a sparse disk, it will show itself as being “m” megabytes in size, but will, in reality, occupy 0 bytes on the disk in a single inode; the disk will expand as KVM starts writing to it.
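Putting the size formula and the dd call together might look like the sketch below. The function name make_sparse_disk and the temporary path are placeholders for this example. The seek value is spelled out in bytes, which is equivalent to the “mM” form used above but does not rely on dd understanding size suffixes.

```shell
#!/bin/sh
# Hypothetical helper: create a sparse raw image of the given size in megabytes.
# Usage: make_sparse_disk PATH MEGABYTES
make_sparse_disk() {
    dd if=/dev/zero of="$1" bs=1 count=0 seek=$(( $2 * 1024 * 1024 )) 2>/dev/null
}

wanted=100                                # desired size in MB
size=$(( 63 * (1 + wanted / 63) ))        # adjusted size per formula (7)

tmpdir=$(mktemp -d)
img="$tmpdir/disk.img"
make_sparse_disk "$img" "$size"

echo "apparent size: $(( $(wc -c < "$img") )) bytes"
echo "allocated:     $(du -k "$img" | cut -f1) KB"   # typically 0 where sparse files are supported

rm -rf "$tmpdir"
```

The resulting raw image can then be handed to virt-install as an existing disk and later converted with qemu-img, for example `qemu-img convert -f raw -O vpc disk.img disk.vhd`. Whether the default conversion options are sufficient for Azure is not something this sketch verifies.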

Read the next story on how to upload a dynamic VHD using the MS Azure API.
