MapReduce in C# using Task Parallel Library

Back in August I started playing with a C# implementation of Google's MapReduce algorithm. The implementation was based on something Stephan Brenner did, although I completely refactored it.

Today I added a little bit of logic to split the execution of the Map and Reduce steps across parallel tasks, using the Task Parallel Library in .NET 4.0. Check out the source code for MapReduce in C# on GitHub. Below is an excerpt from the tests showing how to use the library.

Counting Words in Files

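using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static List<KeyValuePair<string, int>> Map(FileInfo document, string text)
{
    // Split on whitespace and punctuation, dropping the empty tokens left by adjacent separators.
    var items = text.Split(new[] { '\n', ' ', '.', ',', '\r' }, StringSplitOptions.RemoveEmptyEntries);
    return items.Select(item => new KeyValuePair<string, int>(item, 1)).ToList();
}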

public static List<int> Reduce(string word, List<int> wordCounts)
{
    if (wordCounts == null) return null;

    // Sum the per-occurrence counts into a single-element result list.
    var result = new List<int> { 0 };

    foreach (var value in wordCounts)
        result[0] += value;

    return result;
}

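public static void Main()
{
    // Assumption: the excerpt does not show how di is defined; here it is the current directory.
    var di = new DirectoryInfo(".");

    // Load the text of every file in the directory, keyed by its FileInfo.
    var fileSearchData = new Dictionary<FileInfo, string>();
    di.GetFiles().ToList().ForEach(f => fileSearchData.Add(f, File.ReadAllText(f.FullName)));

    var output = MapReduce.Execute(Map, Reduce, fileSearchData);
}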
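
If you're curious how the Map and Reduce steps can be spread across tasks, here is a rough sketch of the idea. To be clear, this is not the code from the GitHub repo: the MapReduceSketch class, its Execute signature, and the Dictionary return type are all made up for illustration. It uses Parallel.ForEach from the Task Parallel Library for both phases, with a plain grouping step in between.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch; not the library's actual MapReduce class.
public static class MapReduceSketch
{
    public static Dictionary<TKey2, List<TResult>> Execute<TKey1, TValue1, TKey2, TValue2, TResult>(
        Func<TKey1, TValue1, List<KeyValuePair<TKey2, TValue2>>> map,
        Func<TKey2, List<TValue2>, List<TResult>> reduce,
        IDictionary<TKey1, TValue1> input)
    {
        // Map phase: call map on every input pair in parallel and
        // collect the intermediate pairs in a thread-safe bag.
        var mapped = new ConcurrentBag<KeyValuePair<TKey2, TValue2>>();
        Parallel.ForEach(input, pair =>
        {
            foreach (var item in map(pair.Key, pair.Value))
                mapped.Add(item);
        });

        // Grouping phase (sequential): gather all values that share a key.
        var groups = mapped
            .GroupBy(kv => kv.Key, kv => kv.Value)
            .ToDictionary(g => g.Key, g => g.ToList());

        // Reduce phase: call reduce on each key's values in parallel.
        var results = new ConcurrentDictionary<TKey2, List<TResult>>();
        Parallel.ForEach(groups, group =>
        {
            results[group.Key] = reduce(group.Key, group.Value);
        });

        return new Dictionary<TKey2, List<TResult>>(results);
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory "documents" stand in for files so the sketch runs anywhere.
        var docs = new Dictionary<string, string>
        {
            { "a.txt", "the quick fox" },
            { "b.txt", "the lazy dog" }
        };

        var counts = MapReduceSketch.Execute(
            (string name, string text) =>
                text.Split(' ').Select(w => new KeyValuePair<string, int>(w, 1)).ToList(),
            (string word, List<int> ones) => new List<int> { ones.Sum() },
            docs);

        foreach (var entry in counts.OrderBy(e => e.Key))
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value[0]);
    }
}
```

The map and reduce delegates have the same shape as the Map and Reduce methods above, so the word-count functions would slot in directly. Because the map tasks finish in no particular order, the sketch only relies on the grouped totals, never on the order of the intermediate pairs.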


Let me know if you have any questions or want to share any implementation concerns.


