2010-09-27

MapReduce in C# using Task Parallel Library

Back in August I started playing with a C# implementation of Google's MapReduce algorithm. The implementation was based on something Stephan Brenner did, although I completely refactored it.

Today I added a little logic to split up the execution of Map and Reduce in this implementation using the Task Parallel Library in .NET 4.0. Check out the source code for MapReduce in C# on GitHub. Below is an excerpt from the tests showing how to use the library.

Counting Words in Files


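using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Map: emit a (word, 1) pair for every word in the document text.
public static List<KeyValuePair<string, int>> Map(FileInfo document, string text)
{
    // RemoveEmptyEntries keeps consecutive separators (e.g. ". " or "\r\n")
    // from producing empty strings that would be counted as words.
    var separators = new[] { '\n', ' ', '.', ',', '\r' };
    var items = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    return items.Select(item => new KeyValuePair<string, int>(item, 1)).ToList();
}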


// Reduce: sum all the counts emitted for one word into a single-element list.
public static List<int> Reduce(string word, List<int> wordCounts)
{
    if (wordCounts == null) return null;

    var result = new List<int> { 0 };

    foreach (var value in wordCounts)
    {
        result[0] += value;
    }

    return result;
}

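public static void Main()
{
    // 'di' was not defined in the original excerpt. The current directory
    // is used here as a placeholder; point it at the folder to search.
    var di = new DirectoryInfo(".");

    // Build the input: one entry per file, keyed by FileInfo, with the file's text as the value.
    var fileSearchData = new Dictionary<FileInfo, string>();
    foreach (var f in di.GetFiles())
    {
        fileSearchData.Add(f, File.ReadAllText(f.FullName));
    }

    var output = MapReduce.Execute(Map, Reduce, fileSearchData);

    // Print how many times "needle" occurs across all files
    // (this assumes the word appears in at least one of them).
    Console.WriteLine(output["needle"][0]);
}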
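
For anyone curious how the Map and Reduce steps can be split up with the Task Parallel Library, here is a minimal sketch. It is not the code from the GitHub project: the class name MapReduceSketch and the generic signature of Execute are assumptions, inferred only from how Map, Reduce and Execute are called in the excerpt above. It maps every input pair with Parallel.ForEach, groups the intermediate values by key, then reduces each group with a second Parallel.ForEach.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch, not the library's actual MapReduce class.
public static class MapReduceSketch
{
    public static Dictionary<TKey, List<TValue>> Execute<TInKey, TInValue, TKey, TValue>(
        Func<TInKey, TInValue, List<KeyValuePair<TKey, TValue>>> map,
        Func<TKey, List<TValue>, List<TValue>> reduce,
        IDictionary<TInKey, TInValue> input)
    {
        // Map phase: every input pair is mapped independently, in parallel.
        // ConcurrentBag is safe to add to from multiple threads.
        var mapped = new ConcurrentBag<KeyValuePair<TKey, TValue>>();
        Parallel.ForEach(input, pair =>
        {
            foreach (var kv in map(pair.Key, pair.Value))
            {
                mapped.Add(kv);
            }
        });

        // Shuffle: collect the intermediate values by key (sequential here).
        var groups = mapped
            .GroupBy(kv => kv.Key, kv => kv.Value)
            .ToList();

        // Reduce phase: each key's values are reduced independently, in parallel.
        var results = new ConcurrentDictionary<TKey, List<TValue>>();
        Parallel.ForEach(groups, group =>
        {
            results[group.Key] = reduce(group.Key, group.ToList());
        });

        return results.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

With this shape, the Map and Reduce functions above could be passed in unchanged. The grouping step stays sequential in the sketch because it is cheap compared with reading and splitting the files; whether that holds for other workloads would need measuring.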


Let me know if you have any questions or want to share any implementation concerns.