17 May 2016

Engine Paradigms & Systems

Paradigm | System | Explanation
MapReduce | Hadoop | Small, recoverable tasks; sequential user code inside the map and reduce functions
Dryad/Nephele | Tez | Extends the MapReduce model to DAGs; backtracking-based recovery
PACTs | Flink | Embedded query-processing runtime in a DAG engine; extends DAGs to cyclic graphs; incremental construction of graphs
RDDs | Spark | Functional implementation of Dryad-style recovery (RDDs); restricted to coarse-grained transformations; direct execution of the API
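To make the MapReduce row concrete, here is a minimal single-process sketch of the paradigm: user code is confined to a map function and a reduce function, with a grouping (shuffle) step in between. The function names are hypothetical. A real Hadoop job distributes these phases across many tasks and sorts map output by key, which this toy version only imitates by sorting keys in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Toy MapReduce (hypothetical helper): map, group by key, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each record is processed independently of all others.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect every value emitted for the same key.
            groups[key].append(value)
    # Reduce phase: sequential user code runs once per key, in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line: str) -> Iterable[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word, case-folded.
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(word: str, counts: List[int]) -> int:
    # Add up all partial counts for one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick fox", "The lazy dog", "the end"]
    print(map_reduce(lines, word_count_mapper, sum_reducer))
```

Because each map call depends only on its own input record, a failed map task can be re-run on its input split without affecting other tasks, which is what "small, recoverable tasks" refers to in the table.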
Engine Comparison
Aspect | Hadoop | Tez | Spark | Flink
API | MapReduce on k/v pairs | k/v pair readers/writers | transformations on k/v pair collections | iterative transformations on collections
Paradigm | MapReduce | DAG | RDD | cyclic dataflows
Optimization | none | none | optimization of SQL queries | optimization in all APIs
Execution | batch sorting | batch sorting and partitioning | batch with memory pinning | stream with out-of-core algorithms
Courtesy of Apache Flink
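The RDD entries rest on lineage-based recovery: because Spark restricts users to coarse-grained transformations, a lost partition can be recomputed by replaying the operations that produced it rather than restored from a replica. The following is a minimal single-process sketch of that idea, assuming only narrow (partition i depends on parent partition i) dependencies and caching every partition for simplicity. `LineageDataset` and its methods are hypothetical names, not Spark's API.

```python
from typing import Callable, Dict, List, Optional

Partition = List[int]


class LineageDataset:
    """Hypothetical RDD-like dataset for illustration (not Spark's API)."""

    def __init__(
        self,
        num_partitions: int,
        source: Optional[List[Partition]] = None,
        parent: Optional["LineageDataset"] = None,
        fn: Optional[Callable[[Partition], Partition]] = None,
    ) -> None:
        self.num_partitions = num_partitions
        self.source = source  # stable input, e.g. file splits
        self.parent = parent  # lineage: which dataset this was derived from
        self.fn = fn  # lineage: the coarse-grained step that derived it
        self.cache: Dict[int, Partition] = {}  # materialized partitions
        self.computations = 0  # how many times fn has been applied

    @classmethod
    def from_partitions(cls, partitions: List[Partition]) -> "LineageDataset":
        return cls(len(partitions), source=partitions)

    def map(self, f: Callable[[int], int]) -> "LineageDataset":
        # Coarse-grained: the same function is applied to every element.
        return LineageDataset(
            self.num_partitions, parent=self, fn=lambda part: [f(x) for x in part]
        )

    def filter(self, pred: Callable[[int], bool]) -> "LineageDataset":
        return LineageDataset(
            self.num_partitions,
            parent=self,
            fn=lambda part: [x for x in part if pred(x)],
        )

    def partition(self, i: int) -> Partition:
        if i not in self.cache:
            if self.source is not None:
                self.cache[i] = list(self.source[i])
            else:
                if self.parent is None or self.fn is None:
                    raise ValueError("derived dataset without lineage")
                # Narrow dependency: rebuild partition i from parent partition i.
                self.cache[i] = self.fn(self.parent.partition(i))
                self.computations += 1
        return self.cache[i]

    def lose_partition(self, i: int) -> None:
        """Simulate a worker failure that drops one materialized partition."""
        self.cache.pop(i, None)

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


if __name__ == "__main__":
    data = LineageDataset.from_partitions([[1, 2], [3, 4]])
    evens = data.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(evens.collect())
    evens.lose_partition(1)  # the lost partition is recomputed on demand
    print(evens.collect())
```

Only the lost partition is recomputed, and only the last step of its lineage is replayed because the parent partition is still cached; the restriction to coarse-grained transformations is what makes such a compact lineage record sufficient.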