MapReduce in C# using Task Parallel Library
Back in August I started playing with a C# implementation of Google's MapReduce algorithm. The implementation was based on something Stephan Brenner did, although I completely refactored it.
Today I added a little logic that uses the Task Parallel Library in .NET 4.0 to split up the execution of the Map and Reduce steps in this implementation. Check out the source code for MapReduce in C# on GitHub. Below is an excerpt from the tests showing how to use the library.
Counting Words in Files
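// Needs: using System; using System.Collections.Generic; using System.IO; using System.Linq;

// Map: emit a (word, 1) pair for every word in the document text.
public static List<KeyValuePair<string, int>> Map(FileInfo document, string text)
{
    // Drop empty entries so consecutive separators don't produce "" as a word.
    var separators = new[] { '\n', ' ', '.', ',', '\r' };
    var items = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    return items.Select(item => new KeyValuePair<string, int>(item, 1)).ToList();
}

// Reduce: sum all the counts emitted for a single word.
public static List<int> Reduce(string word, List<int> wordCounts)
{
    if (wordCounts == null) return null;
    var result = new List<int> { 0 };
    foreach (var value in wordCounts)
    {
        result[0] += value;
    }
    return result;
}

public static void Main()
{
    // Assumption: the original excerpt never declares di; the current directory
    // is used here as a placeholder for the folder of files to search.
    var di = new DirectoryInfo(".");
    var fileSearchData = new Dictionary<FileInfo, string>();
    di.GetFiles().ToList().ForEach(f => fileSearchData.Add(f, File.ReadAllText(f.FullName)));
    var output = MapReduce.Execute(Map, Reduce, fileSearchData);
    // Print how many times the word "needle" appears across all files.
    Console.WriteLine(output["needle"][0]);
}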
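For anyone wondering what splitting Map and Reduce across the Task Parallel Library can look like, here is a minimal sketch. It is not the code from the GitHub repository: the class name MapReduceSketch, the generic signature of Execute, and the use of Parallel.ForEach with concurrent collections are all my assumptions, chosen only so that it accepts Map and Reduce functions shaped like the ones above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch, not the library's actual implementation.
public static class MapReduceSketch
{
    public static Dictionary<K2, List<V3>> Execute<K1, V1, K2, V2, V3>(
        Func<K1, V1, List<KeyValuePair<K2, V2>>> map,
        Func<K2, List<V2>, List<V3>> reduce,
        IDictionary<K1, V1> input)
    {
        // Map phase: call map on every input pair in parallel and
        // group the emitted values by their key.
        var intermediate = new ConcurrentDictionary<K2, ConcurrentBag<V2>>();
        Parallel.ForEach(input, pair =>
        {
            foreach (var kv in map(pair.Key, pair.Value))
            {
                intermediate.GetOrAdd(kv.Key, _ => new ConcurrentBag<V2>()).Add(kv.Value);
            }
        });

        // Reduce phase: call reduce once per key, again in parallel.
        var results = new ConcurrentDictionary<K2, List<V3>>();
        Parallel.ForEach(intermediate, group =>
        {
            results[group.Key] = reduce(group.Key, group.Value.ToList());
        });

        // Copy into a plain dictionary for the caller.
        return new Dictionary<K2, List<V3>>(results);
    }
}
```

The two Parallel.ForEach calls keep the phases separate: every Map call finishes before any Reduce call starts, so each Reduce sees the complete list of values for its key. The order of values within that list is not guaranteed, which is fine for a sum but would matter for an order-sensitive reducer.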
Let me know if you have any questions or want to share any implementation concerns.