Mabble Rabble: Big Data Graph Processing

7 September 2014

Big Data Graph Processing

The web, with its many hyperlinked documents, is a massive graph of interlinked pages, and processing those links brings the usual big data complexities. Graph processing is essential to many use cases, from contextual ads to social network analysis to linked data. Processing such graphs at scale remains a challenge whatever form the data takes. Still, graph theory and network science have contributed many advances to Big Data, and functional programming approaches have made for more robust solutions.

Workloads fall into two broad classes. OLTP workloads are low-latency and access small portions of a graph. OLAP workloads are batch-oriented and access large portions of a graph. A graph can be stored in a dedicated graph database, in a column store such as Accumulo or Cassandra, or even directly on HDFS. Real-time processing of graphs is a further challenge.

In general, standard NoSQL stores can cope with limited lookups and a small number of traversals at scale. Complex traversals over the Web of Data require alternative, and sometimes combined, approaches to scalable, distributed batch processing. Below are some framework options for big data graph processing.

Giraph
Cassovary
Drill
Impala
JUNG
SNAP
Shark
Hama
GraphX
Titan / Faunus
GraphLab / GraphChi
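
To make the OLTP versus OLAP distinction above a little more concrete, here is a minimal sketch in plain Python. It does not use the API of any framework in the list; the toy graph `LINKS` and the helper names `neighbors_within` and `pagerank` are made up for illustration. The first function touches only a small neighbourhood of the graph, the way an OLTP-style lookup would, while the second reads every vertex and edge on each pass, the way an OLAP-style batch job would.

```python
from collections import defaultdict

# Hypothetical toy web graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def neighbors_within(graph, start, hops):
    """OLTP-style query: visit only vertices reachable within `hops` links."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Expand one hop, skipping vertices already visited.
        frontier = {n for v in frontier for n in graph.get(v, [])} - seen
        seen |= frontier
    return seen - {start}


def pagerank(graph, damping=0.85, iterations=50):
    """OLAP-style job: each iteration scans every vertex and every edge."""
    vertices = set(graph) | {n for targets in graph.values() for n in targets}
    count = len(vertices)
    rank = {v: 1.0 / count for v in vertices}
    for _ in range(iterations):
        incoming = defaultdict(float)
        for v in vertices:
            targets = graph.get(v, [])
            if targets:
                # Split this vertex's rank evenly across its out-links.
                share = rank[v] / len(targets)
                for t in targets:
                    incoming[t] += share
            else:
                # Dangling vertex: spread its rank over all vertices.
                for t in vertices:
                    incoming[t] += rank[v] / count
        rank = {
            v: (1.0 - damping) / count + damping * incoming[v]
            for v in vertices
        }
    return rank


if __name__ == "__main__":
    print(neighbors_within(LINKS, "d", 2))
    print(pagerank(LINKS))
```

On a graph that fits in memory both calls are cheap. The point is the access pattern: the first scales with the size of the neighbourhood, the second with the size of the whole graph, which is why the batch case is the one that tends to need a distributed framework.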