Paradigm | System | Explanation |
---|---|---|
MapReduce | Hadoop | Small, recoverable tasks; sequential code inside the map and reduce functions |
Dryad/Nephele | Tez | Extends the MapReduce model to DAGs; backtracking-based recovery |
PACTs | Flink | Embeds a query-processing runtime in a DAG engine; extends DAGs to cyclic graphs; incremental construction of graphs |
RDDs | Spark | Functional implementation of Dryad recovery (RDDs); restricted to coarse-grained transformations; direct execution of the API |
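
To make the MapReduce row concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code: the function names (`map_words`, `shuffle`, `reduce_counts`, `run_mapreduce`) are hypothetical and chosen only for illustration, and a real engine would schedule each map and reduce task on separate workers and re-run only the tasks that fail.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map task: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce task: sequentially combine all values collected for one key."""
    return key, sum(values)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    # Each map call is a small, independent task that could be re-run alone.
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_mapreduce(["spark flink tez", "flink hadoop flink"]))
```

The user only writes the two sequential functions (`map_words` and `reduce_counts`); grouping and task scheduling belong to the framework, which is what keeps each task small and independently recoverable.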

Engine Comparison | Hadoop | Tez | Spark | Flink |
---|---|---|---|---|
API | MapReduce on k/v pairs | k/v pair readers/writers | Transformations on k/v pair collections | Iterative transformations on collections |
Paradigm | MapReduce | DAG | RDD | Cyclic dataflows |
Optimization | None | None | Optimization of SQL queries | Optimization in all APIs |
Execution | Batch sorting | Batch sorting and partitioning | Batch with memory pinning | Stream with out-of-core algorithms |

Courtesy of Apache Flink
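
The Execution row contrasts batch engines with a streaming one. The sketch below imitates the two styles in plain Python as an illustration under simplifying assumptions; it does not show how any of these engines is actually implemented, and the names `run_batch` and `run_pipelined` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Record = int
Operator = Callable[[Record], Record]


def run_batch(records: Iterable[Record], ops: List[Operator]) -> List[Record]:
    """Batch style: every stage materializes its full output before the next."""
    current = list(records)
    for op in ops:
        current = [op(r) for r in current]  # complete intermediate result
    return current


def run_pipelined(records: Iterable[Record], ops: List[Operator]) -> Iterator[Record]:
    """Pipelined style: each record flows through all operators in turn."""
    for r in records:
        for op in ops:
            r = op(r)
        yield r  # available downstream before the input is exhausted


if __name__ == "__main__":
    ops = [lambda x: x + 1, lambda x: x * 2]
    print(run_batch(range(3), ops))
    print(list(run_pipelined(range(3), ops)))
```

Both functions compute the same result on a finite input. The difference is when results become available: the pipelined version can emit its first record without reading the rest of the input, so it also works on inputs too large to hold in memory at once.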