Often, when I’m working with my clients, they request that I upload a disk image for them to use in Azure. I usually discourage this practice, because it makes standing up VMs more challenging, and the uploaded image very quickly goes out of date. A preferred deployment method is to use the standard Azure Marketplace images (of which there are thousands), which are usually kept pretty current, and use a good configuration management system (such as Azure Automation/PowerShell DSC, Chef, Puppet, or other supported tools) to ensure that they are brought into compliance with corporate requirements. That’s an interesting subject, but not for this post.
tl;dr – The Azure PowerShell cmdlets Add-AzureVhd and Add-AzureRmVhd check that the file you’re trying to upload is a proper Azure-compatible VHD, calculate an MD5 hash to validate the file transfer, and detect empty blocks and skip uploading them, which saves time on the transfer and prevents unnecessary work.
However, sometimes the Marketplace-image approach doesn’t work out. Whether it’s an application that’s a real challenge to configure, or the client doesn’t have the time, resources, or culture to create a configuration management infrastructure, it may be unworkable. In those situations, uploading an image is required. Whether you’re coming from VMware, a physical infrastructure, or Hyper-V, you can upload an image – there are tools to create the required VHD from platforms other than Hyper-V.
The VHD needs to be uploaded to Azure Storage. Uploading very large (multi-gigabyte) files via a web browser is neither efficient nor reliable, so this can’t be done natively through the browser. Usually, a GUI is requested, but the most efficient way to upload is via PowerShell. The Azure PowerShell module provides two cmdlets for uploading images: Add-AzureVhd and Add-AzureRmVhd. Unlike third-party tools, these cmdlets check that the file you’re uploading meets the requirements for Azure:
- VHD, not VHDX – while Server 2012 R2 and above strongly encourage the use of VHDX for a number of excellent reasons, Azure doesn’t support VHDX.
- Disk Type – There are some good reasons to use dynamic disks in production, but not too many reasons to use differencing disks. Regardless, Azure doesn’t support dynamic or differencing disks, so the tool will convert your disk to a fixed disk if necessary. It’s important to note, however, that empty blocks in a fixed disk aren’t counted toward your storage consumption if you’re using Standard Storage.
- Empty Blocks – if the disk you’re uploading is fixed, the cmdlet will check the disk for unused space, and won’t transfer that (null) data.
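The first two requirements can be approximated locally before committing to a long upload. The following is only a sketch and is not how the Azure cmdlets are implemented: it reads the 512-byte footer that the VHD format places at the end of the file, looks for the "conectix" cookie, and decodes the disk type field (2 = fixed, 3 = dynamic, 4 = differencing). The function name Get-VhdFooterInfo is made up for this illustration.

```powershell
# Hypothetical helper (not part of any Azure module): inspect a VHD footer.
function Get-VhdFooterInfo {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # Resolve to a full path, because .NET and PowerShell can disagree
    # about the current directory.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        if ($stream.Length -lt 512) {
            throw "File is too small to contain a VHD footer: $fullPath"
        }

        # The footer occupies the last 512 bytes of the file.
        $footer = New-Object byte[] 512
        [void]$stream.Seek(-512, [System.IO.SeekOrigin]::End)
        $read = 0
        while ($read -lt 512) {
            $n = $stream.Read($footer, $read, 512 - $read)
            if ($n -eq 0) { break }
            $read += $n
        }
    }
    finally {
        $stream.Dispose()
    }

    # Bytes 0-7 hold the ASCII cookie "conectix".
    $cookie = [System.Text.Encoding]::ASCII.GetString($footer, 0, 8)

    # Bytes 60-63 hold the disk type as a big-endian 32-bit integer.
    $typeCode = ([int]$footer[60] -shl 24) -bor ([int]$footer[61] -shl 16) -bor
                ([int]$footer[62] -shl 8) -bor [int]$footer[63]

    $typeName = switch ($typeCode) {
        2 { 'Fixed' }
        3 { 'Dynamic' }
        4 { 'Differencing' }
        default { 'Unknown' }
    }

    [pscustomobject]@{
        IsVhd    = ($cookie -ceq 'conectix')
        DiskType = $typeName
    }
}
```

Calling Get-VhdFooterInfo -Path C:\ClusterStorage\Volume2\VHDs\Server.vhd returns an object with IsVhd and DiskType properties. A VHDX file would be expected to fail the cookie check, since it uses a different on-disk layout.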
Additionally, the cmdlets will calculate an MD5 hash of the file to ensure that the file is successfully transferred. This is important, as the transfer necessarily traverses the Internet and may, for a variety of reasons, miss some data. By comparing the MD5 hash of the uploaded file and the original file, it is possible to determine whether the upload has been successful.
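The same kind of comparison can be done by hand. The sketch below computes a local file's MD5 and encodes it as base64, which is the encoding Azure Blob Storage uses for a blob's Content-MD5 property. Whether a given upload tool actually populates that property is an assumption to verify, and the function names Get-Md5Base64 and Test-Md5Match are invented for this example.

```powershell
# Hypothetical helper: MD5 of a local file, base64-encoded.
function Get-Md5Base64 {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        # ComputeHash reads the stream in chunks, so a large VHD
        # is not loaded into memory all at once.
        $hashBytes = $md5.ComputeHash($stream)
    }
    finally {
        $stream.Dispose()
        $md5.Dispose()
    }

    [System.Convert]::ToBase64String($hashBytes)
}

# Hypothetical helper: compare the local hash with a value obtained elsewhere,
# for example the Content-MD5 reported for the uploaded blob (assumption).
function Test-Md5Match {
    param(
        [Parameter(Mandatory)]
        [string]$Path,

        [Parameter(Mandatory)]
        [string]$ExpectedBase64Md5
    )

    (Get-Md5Base64 -Path $Path) -ceq $ExpectedBase64Md5
}
```

Test-Md5Match returns $true only when the two base64 strings are identical, so a mismatch is a signal to re-upload rather than to attach the disk to a VM.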
GUI tools make it easy to commit simple errors – for example, uploading as a block blob rather than a page blob, or uploading a dynamic VHD – and you can lose a lot of time waiting for the upload to finish before discovering that it failed. Scanning for empty data blocks saves additional time.
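To get a feel for how much of a fixed disk is empty, a file can be scanned block by block for all-zero regions. This is only an illustration of the idea, not the cmdlet's actual algorithm: the function name Measure-EmptyBlock and the default block size are assumptions, and a byte-by-byte loop in PowerShell will be slow on a large VHD.

```powershell
# Hypothetical helper: count blocks that contain only zero bytes.
function Measure-EmptyBlock {
    param(
        [Parameter(Mandatory)]
        [string]$Path,

        # Arbitrary block size chosen for this sketch.
        [int]$BlockSize = 1MB
    )

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $buffer = New-Object byte[] $BlockSize
    $total = 0
    $empty = 0

    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        # Each successful Read is treated as one block; for a local file
        # this normally fills the buffer except at the end of the file.
        while (($read = $stream.Read($buffer, 0, $buffer.Length)) -gt 0) {
            $total++
            $isEmpty = $true
            for ($i = 0; $i -lt $read; $i++) {
                if ($buffer[$i] -ne 0) {
                    $isEmpty = $false
                    break
                }
            }
            if ($isEmpty) { $empty++ }
        }
    }
    finally {
        $stream.Dispose()
    }

    [pscustomobject]@{
        TotalBlocks = $total
        EmptyBlocks = $empty
    }
}
```

The ratio of EmptyBlocks to TotalBlocks gives a rough estimate of how much of the file an upload that skips empty blocks could avoid sending.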
Using the cmdlet is pretty simple:
# Resource group that contains the destination storage account
$rgName = "My-New-Resource-Group"
# Local path of the VHD to upload
$myVHD = "C:\ClusterStorage\Volume2\VHDs\Server.vhd"
Add-AzureRmVhd -ResourceGroupName $rgName -Destination "https://coolazurestorage.blob.core.windows.net/vhds/Server.vhd" -LocalFilePath $myVHD
MD5 hash is being calculated for the file C:\ClusterStorage\Volume2\VHDs\Server.vhd
MD5 hash calculation is completed.
Elapsed time for the operation: 00:26:44
Creating new page blob of size 73274491392…
Detecting the empty data blocks in the local file.
Detecting the empty data blocks completed.
Mike Morgan
You described how to upload, but completely ignored the primary subject; how to make VHD uploads faster. Currently uploading a VHD runs at about 14.58 MB/s. And this is from a VM running in Azure. Uploading a 300GB file takes nearly six hours. Regardless of what they claim, they DO send the zeros in the file. The VHD file contains only 4GB of data, the rest was empty.