Quick Start Guide - Iterative MapReduce - KMeans Clustering
This guide walks you through running a couple of Twister4Azure samples using the Azure local development fabric. In the next step, we’ll explore how to run them in Azure Cloud.
Prerequisites
- Visual Studio 2010
- Azure SDK. The latest version of the Twister4Azure source code supports version 1.7 of the Azure SDK.
- Download the latest release of Twister4Azure and unzip it.
Running KMeans Clustering Sample Locally
- Start Visual Studio 2010 as administrator and open the “Twister4Azure” solution. Run it (F5).
- Create a blob container named “kminput”. Download and unzip kminput.zip. Upload the contents of the unzipped “kminput” folder to the “kminput” container. You can use a freely available third-party Azure storage client (e.g. CloudBerry Explorer for Azure Blob Storage) to do this.
- Create a blob container named “kmcenters”. Download and unzip kmcenters.zip. Upload the contents of the unzipped “kmcenters” folder to the “kmcenters” container.
- Open the “SampleClients” solution in a different Visual Studio instance. Make sure the “KMeansMRClient” project is selected as the startup project. Go to the project properties of the “KMeansMRClient” project and open the Debug tab.
Specify the following in the “Command line arguments” text area. The arguments for the KMeans client are:
<jobID (must be unique across runs)> <inputContainerName> <outContainerName> <NumberOfReduceTasks> <cluster centers> <vector length>
E.g.: test1 kminput kmoutput 1 kmcenters/centers_400_20 20
Run the client (F5). You’ll be able to monitor the computation using the Twister4Azure monitoring console, which will open in a new browser window.
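The six-argument string for the KMeans client is easy to get wrong, for example by swapping two values or mistyping a container name. The following is a minimal Python sketch that assembles and sanity-checks the string before you paste it into the “Command line arguments” text area. The helper name build_kmeans_args is hypothetical and not part of Twister4Azure. The container-name check follows the general Azure blob container naming rules. The requirement that both numeric arguments be positive integers is an assumption, not something this guide states.

```python
import re

# Azure blob container naming rule: 3-63 characters, lowercase letters,
# digits and single hyphens, starting and ending with a letter or digit.
_CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def build_kmeans_args(job_id, input_container, out_container,
                      num_reduce_tasks, cluster_centers, vector_length):
    """Return the space-separated argument string for the KMeans client.

    Hypothetical helper for illustration; not part of Twister4Azure.
    """
    for name in (input_container, out_container):
        if not _CONTAINER_RE.fullmatch(name):
            raise ValueError(f"invalid container name: {name!r}")
    # Arguments are split on whitespace, so these values must not contain any.
    for label, text in (("jobID", job_id),
                        ("cluster centers", cluster_centers)):
        if not text or any(ch.isspace() for ch in text):
            raise ValueError(f"{label} must be non-empty without whitespace")
    # Assumption: both numeric arguments must be positive integers.
    for label, number in (("NumberOfReduceTasks", num_reduce_tasks),
                          ("vector length", vector_length)):
        if not isinstance(number, int) or number < 1:
            raise ValueError(f"{label} must be a positive integer")
    return " ".join([job_id, input_container, out_container,
                     str(num_reduce_tasks), cluster_centers,
                     str(vector_length)])


if __name__ == "__main__":
    # Reproduces the example argument string given in this guide.
    print(build_kmeans_args("test1", "kminput", "kmoutput", 1,
                            "kmcenters/centers_400_20", 20))
```

The helper cannot verify that the job ID is unique across runs; that still has to be ensured by hand, for instance by appending a counter or timestamp to the ID.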
Running KMeans Clustering Sample in Azure Cloud