Back in August I started playing with a C# implementation of Google's MapReduce algorithm. The implementation was based on something Stephan Brenner did, although I completely refactored it.
Today I added a little bit of logic to split up the execution of the Map and Reduce steps in this implementation using the Task Parallel Library in .NET 4.0. Check out the source code for MapReduce in C# on GitHub. Below is an excerpt from the tests showing how to use the library.
Counting Words in Files
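using System; using System.Collections.Generic; using System.IO; using System.Linq;

// Map: emit a (word, 1) pair for every word in the document's text.
public static List<KeyValuePair<string, int>> Map(FileInfo document, string text)
{
    // Split on whitespace and punctuation, dropping the empty entries left by adjacent separators.
    var separators = new[] { '\n', ' ', '.', ',', '\r' };
    var items = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    return items.Select(item => new KeyValuePair<string, int>(item, 1)).ToList();
}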
// Reduce: sum the per-occurrence counts emitted for a single word.
public static List<int> Reduce(string word, List<int> wordCounts)
{
    if (wordCounts == null) return null;
    var result = new List<int> { 0 };
    foreach (var value in wordCounts)
    {
        result[0] += value;
    }
    return result;
}
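public static void Main()
{
    // Assumption: the excerpt never defines `di`; the current directory is a stand-in.
    var di = new DirectoryInfo(".");
    // Read every file in the directory into memory, keyed by its FileInfo.
    var fileSearchData = new Dictionary<FileInfo, string>();
    di.GetFiles().ToList().ForEach(f => fileSearchData.Add(f, File.ReadAllText(f.FullName)));
    var output = MapReduce.Execute(Map, Reduce, fileSearchData);
    // Print how many times the word "needle" appeared across all files.
    Console.WriteLine(output["needle"][0]);
}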
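For anyone curious how the Map and Reduce steps could be split up with the Task Parallel Library, here is a minimal sketch. It is not the code from the GitHub repository: the class name MapReduceSketch, the generic signature, and the choice of ConcurrentBag and ConcurrentDictionary are all assumptions made for illustration. It runs Map over every input pair in parallel, groups the emitted values by key, then runs Reduce once per key in parallel.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch; the real MapReduce class in the library may differ.
public static class MapReduceSketch
{
    public static Dictionary<K2, List<V3>> Execute<K1, V1, K2, V2, V3>(
        Func<K1, V1, List<KeyValuePair<K2, V2>>> map,
        Func<K2, List<V2>, List<V3>> reduce,
        IDictionary<K1, V1> input)
    {
        // Map phase: call map on each input pair in parallel and collect
        // every emitted pair in a thread-safe bag.
        var intermediate = new ConcurrentBag<KeyValuePair<K2, V2>>();
        Parallel.ForEach(input, pair =>
        {
            foreach (var kv in map(pair.Key, pair.Value))
            {
                intermediate.Add(kv);
            }
        });

        // Shuffle phase: group the emitted values by key (sequential).
        var groups = intermediate
            .GroupBy(kv => kv.Key, kv => kv.Value)
            .ToDictionary(g => g.Key, g => g.ToList());

        // Reduce phase: each key is independent, so reduce them in parallel.
        var output = new ConcurrentDictionary<K2, List<V3>>();
        Parallel.ForEach(groups, group =>
        {
            output[group.Key] = reduce(group.Key, group.Value);
        });

        // Copy into a plain dictionary for the caller.
        return new Dictionary<K2, List<V3>>(output);
    }
}
```

With a signature like this, the word count above would be invoked the same way as in Main, passing Map, Reduce, and the dictionary of file contents. The order in which values reach Reduce is not deterministic here, which is fine for summing but would matter for an order-sensitive reducer.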
Let me know if you have any questions or want to share any implementation concerns.