TLDR - Map Reduce (2004)

paper

MapReduce is a programming model and an associated implementation for processing and generating large data sets. It was inspired by the map and reduce primitives found in Lisp and other functional programming languages.
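For comparison, here is the word-count idea written with Python's built-in map and functools.reduce, the functional primitives the model takes its name from. This is only a single-machine illustration; word_counts is a hypothetical name, not part of any MapReduce API.

```python
from functools import reduce

# Functional-style word count on one machine: map each word to a (word, 1)
# pair, then fold the pairs into a dict of totals with reduce.
def word_counts(text: str) -> dict[str, int]:
    pairs = map(lambda w: (w, 1), text.split())

    def add(acc: dict[str, int], pair: tuple[str, int]) -> dict[str, int]:
        word, n = pair
        acc[word] = acc.get(word, 0) + n
        return acc

    return reduce(add, pairs, {})

print(word_counts("the cat saw the dog"))
# {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
```

MapReduce keeps this shape but runs many map calls and many reduce calls in parallel on different machines.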

Programming Model

from typing import Iterator

# emit_intermediate() and emit() are supplied by the MapReduce library; they are not defined here.

def map(key: str, value: str):
	# key: document name
	# value: document contents
	for word in value.split():
		emit_intermediate(word, "1")

def reduce(key: str, values: Iterator[str]):
	# key: a word emitted by map
	# values: every intermediate count emitted for that word, across all documents
	result = 0
	for v in values:
		result += int(v)
	# total number of occurrences of this word
	emit(str(result))
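The user only writes map and reduce; the library handles grouping intermediate values by key. A minimal in-memory driver, assuming hypothetical names (wc_map, wc_reduce, run_local) and using generators instead of emit calls, shows what the framework does between the two functions on a single machine.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical word-count functions in the style of the example above,
# yielding pairs instead of calling emit_intermediate()/emit().
def wc_map(key: str, value: str) -> Iterator[tuple[str, int]]:
    for word in value.split():
        yield word, 1

def wc_reduce(key: str, values: Iterable[int]) -> Iterator[tuple[str, int]]:
    yield key, sum(values)

def run_local(inputs: dict[str, str], map_fn: Callable, reduce_fn: Callable) -> dict:
    # Map phase: call map_fn on every (document name, contents) pair and
    # group the intermediate values by key.
    groups: dict[str, list] = defaultdict(list)
    for name, contents in inputs.items():
        for k, v in map_fn(name, contents):
            groups[k].append(v)
    # Reduce phase: call reduce_fn once per distinct key, in sorted key order.
    output = {}
    for k in sorted(groups):
        for out_k, out_v in reduce_fn(k, groups[k]):
            output[out_k] = out_v
    return output

docs = {"a.txt": "the cat", "b.txt": "the dog"}
print(run_local(docs, wc_map, wc_reduce))
# {'cat': 1, 'dog': 1, 'the': 2}
```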

How it works
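In the paper's execution flow, the input is split into M pieces, and each map task writes its intermediate pairs into R regions chosen by a partitioning function (by default hash(key) mod R). Each reduce task then reads its region from every map task's output, sorts by key so equal keys are adjacent, and calls reduce once per key. The single-process sketch below follows that flow; the values M = 3 and R = 3, zlib.crc32 as the hash, and all function names are assumptions, and there is no master, networking, or disk I/O.

```python
import zlib
from itertools import groupby

R = 3  # number of reduce tasks; an arbitrary value for this sketch

Pair = tuple[str, int]

def partition(key: str, r: int = R) -> int:
    # Default partitioning from the paper: hash(key) mod R.
    # zlib.crc32 stands in for "hash" (our choice) because Python's built-in
    # hash() for strings is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % r

def map_task(split: str) -> list[list[Pair]]:
    # One map task: emit (word, 1) per word and route it to one of R regions,
    # standing in for the R intermediate files on the map worker's local disk.
    regions: list[list[Pair]] = [[] for _ in range(R)]
    for word in split.split():
        regions[partition(word)].append((word, 1))
    return regions

def reduce_task(r: int, map_outputs: list[list[list[Pair]]]) -> dict[str, int]:
    # One reduce task: read region r from every map task's output, sort by key
    # so all occurrences of a key are adjacent, then reduce each group.
    pairs = sorted(p for regions in map_outputs for p in regions[r])
    return {word: sum(n for _, n in group)
            for word, group in groupby(pairs, key=lambda p: p[0])}

splits = ["the cat sat", "the dog sat", "a cat"]  # M = 3 input splits
map_outputs = [map_task(s) for s in splits]
results = [reduce_task(r, map_outputs) for r in range(R)]  # one output per reduce task

merged = {k: v for part in results for k, v in part.items()}
print(sorted(merged.items()))
# [('a', 1), ('cat', 2), ('dog', 1), ('sat', 2), ('the', 2)]
```

Because every occurrence of a key lands in the same region, each key is reduced by exactly one reduce task, and the R outputs never overlap.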

Fault Tolerance:

Worker Failures
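The paper's rules for worker failures: the master pings every worker periodically and marks a worker as failed if it does not answer within some time. Completed map tasks on that worker are reset to idle and re-executed, because their output was stored on the failed machine's local disk; in-progress map and reduce tasks are reset as well. Completed reduce tasks are not re-executed, since their output is already in the global file system. The sketch below models only the reset step; Task, State, and handle_worker_failure are hypothetical names, and ping/timeout detection is omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: State = State.IDLE
    worker: Optional[str] = None  # worker currently holding the task

def handle_worker_failure(tasks: list[Task], failed: str) -> None:
    # Called once the master has decided `failed` is dead (detection not shown).
    for t in tasks:
        if t.worker != failed:
            continue
        lost_map_output = t.kind == "map" and t.state is State.COMPLETED
        if t.state is State.IN_PROGRESS or lost_map_output:
            # In-progress tasks of either kind, and completed map tasks whose
            # output lived on the failed worker's disk, become schedulable again.
            t.state = State.IDLE
            t.worker = None
        # Completed reduce tasks are left as-is: their output is in the global file system.

tasks = [
    Task("map", State.COMPLETED, "w1"),
    Task("reduce", State.COMPLETED, "w1"),
    Task("reduce", State.IN_PROGRESS, "w1"),
    Task("map", State.IN_PROGRESS, "w2"),
]
handle_worker_failure(tasks, "w1")
print([t.state.value for t in tasks])
# ['idle', 'completed', 'idle', 'in-progress']
```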

Master Failure

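MapReduce is just a programming model, backed by a master-worker architecture in which a single master assigns map and reduce tasks to idle workers and tracks their state. The paper describes several applications (distributed grep, URL access frequency counts, reverse web-link graph, inverted index, distributed sort) and refinements to the basic model, such as combiner functions, custom partitioning functions, and skipping bad records.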
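One refinement from the paper, the combiner function, fits word count directly: a map task emits many ("the", 1) pairs, and a combiner partially merges them on the map worker before they are sent over the network. The combiner usually runs the same code as reduce; the difference is that its output goes to an intermediate file for the reducers instead of the final output. A minimal sketch, with hypothetical function names:

```python
from collections import defaultdict

def wc_map(split: str) -> list[tuple[str, int]]:
    # Plain word-count map: one (word, 1) pair per occurrence.
    return [(w, 1) for w in split.split()]

def combine(pairs: list[tuple[str, int]]) -> list[tuple[str, int]]:
    # Same summing logic as the word-count reduce, but applied only to
    # this one map task's local output.
    partial: dict[str, int] = defaultdict(int)
    for word, n in pairs:
        partial[word] += n
    return sorted(partial.items())

raw = wc_map("the cat saw the other cat and the dog")
combined = combine(raw)
print(len(raw), len(combined))  # 9 6
print(combined)
# [('and', 1), ('cat', 2), ('dog', 1), ('other', 1), ('saw', 1), ('the', 3)]
```

The reducers still sum whatever arrives, so the final counts are unchanged; only the number of intermediate pairs shipped drops.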

Advantages:

Tags: #distributed systems