MapReduce in C# using Task Parallel Library

September 27, 2010

Back in August I started playing with a C# implementation of Google's MapReduce programming model. The implementation was based on work by Stephan Brenner, although I have since completely refactored it.

Today I added some logic that uses the Task Parallel Library (TPL) in .NET 4.0 to split the execution of the Map and Reduce steps into tasks that can run in parallel. Check out the source code for MapReduce in C# on GitHub. Below is an excerpt from the tests showing how to use the library.

Counting Words in Files


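using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Class name is illustrative; the excerpt only shows the methods.
public static class WordCountExample
{
    // Map: emit a (word, 1) pair for every word in the document.
    public static List<KeyValuePair<string, int>> Map(FileInfo document, string text)
    {
        // RemoveEmptyEntries drops the empty strings that consecutive separators
        // (e.g. ". " or "\r\n") would otherwise produce and count as words.
        var items = text.Split(new[] { '\n', '\r', ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);
        return items.Select(item => new KeyValuePair<string, int>(item, 1)).ToList();
    }

    // Reduce: sum all partial counts emitted for one word.
    public static List<int> Reduce(string word, List<int> wordCounts)
    {
        if (wordCounts == null) return null;

        return new List<int> { wordCounts.Sum() };
    }

    public static void Main(string[] args)
    {
        // The excerpt never declares `di`; taking the directory from the
        // first argument (or the current directory) is an assumption.
        var di = new DirectoryInfo(args.Length > 0 ? args[0] : ".");

        // Input for MapReduce: file -> full text of that file.
        var fileSearchData = new Dictionary<FileInfo, string>();
        foreach (var f in di.GetFiles())
        {
            fileSearchData.Add(f, File.ReadAllText(f.FullName));
        }

        var output = MapReduce.Execute(Map, Reduce, fileSearchData);

        // Total number of occurrences of "needle" across all files.
        Console.WriteLine(output["needle"][0]);
    }
}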
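The excerpt does not show what happens inside MapReduce.Execute. As a minimal sketch, assuming a signature inferred from the call above (a map delegate, a reduce delegate and an input dictionary, returning a dictionary of reduced lists), the Map and Reduce phases could be split across the TPL with Parallel.ForEach and thread-safe collections. MapReduceSketch and its generic parameter names are hypothetical; this is not the code on GitHub.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch of a TPL-based Execute; not the library's actual class.
public static class MapReduceSketch
{
    public static Dictionary<TKey2, List<TValue3>> Execute<TKey1, TValue1, TKey2, TValue2, TValue3>(
        Func<TKey1, TValue1, List<KeyValuePair<TKey2, TValue2>>> map,
        Func<TKey2, List<TValue2>, List<TValue3>> reduce,
        IDictionary<TKey1, TValue1> input)
    {
        // Map phase: each input pair is mapped on a TPL worker thread and the
        // intermediate pairs are collected in a thread-safe bag.
        var intermediate = new ConcurrentBag<KeyValuePair<TKey2, TValue2>>();
        Parallel.ForEach(input, pair =>
        {
            foreach (var kv in map(pair.Key, pair.Value))
            {
                intermediate.Add(kv);
            }
        });

        // Shuffle: group intermediate values by key (done sequentially here).
        var groups = intermediate
            .GroupBy(kv => kv.Key, kv => kv.Value)
            .ToList();

        // Reduce phase: each key's values are reduced on a TPL worker thread.
        var results = new ConcurrentDictionary<TKey2, List<TValue3>>();
        Parallel.ForEach(groups, group =>
        {
            results[group.Key] = reduce(group.Key, group.ToList());
        });

        return results.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

In this sketch only the individual map and reduce calls run in parallel; grouping by key stays sequential. Because Map and Reduce may run on several threads at once, they should not share mutable state. The word-count functions above only touch their own arguments, so they qualify.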

Let me know if you have any questions or want to share any implementation concerns.

Encoding Kockerbeck