Mabble Rabble: Scala Data Tools

26 October 2016

Scala Data Tools

Below is a list of the general mathematics and machine learning data tools that have emerged in Scala, aside from the Hadoop and Scala APIs for databases.

Algebird: Twitter’s API for abstract algebra that can be used with almost any Big Data API.
Factorie: A toolkit for deployable probabilistic modeling, with a succinct language for creating relational factor graphs, estimating parameters, and performing inference.
Figaro: A toolkit for probabilistic programming.
H2O: A high-performance, in-memory distributed compute engine for data analytics. Written in Java with Scala and R APIs.
Relate: A thin database access layer focused on performance.
ScalaNLP: A suite of machine learning and numerical computing libraries. It is an umbrella project for several libraries, including Breeze, for machine learning and numerical computing, and Epic, for statistical parsing and structured prediction.
ScalaStorm: A Scala API for Storm.
Scalding: Twitter’s Scala API around Cascading that popularized Scala as a language for Hadoop programming.
Scoobi: A Scala abstraction layer on top of MapReduce with an API that’s similar to Scalding’s and Spark’s.
Slick: A database access layer developed by Typesafe.
Spark: The emerging standard for distributed computation in Hadoop environments, as well as in Mesos clusters and on single machines (“local” mode).
Spire: A numerics library that is intended to be generic, fast, and precise.
Summingbird: Twitter’s API that abstracts computation over Scalding (batch mode) and Storm (event streaming).
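
To make the "abstract algebra" in the Algebird entry a little more concrete, here is a minimal sketch of the underlying idea in plain Scala. It does not use Algebird and is not Algebird's API: the `Monoid` trait, the `MonoidSketch` object, and the `sumAll` helper are hypothetical names invented for this illustration. The point is only that once a type has an associative combine operation and an identity value, partial results from separate data partitions can be merged in any grouping, which is what makes the approach fit well with distributed computation.

```scala
// Hypothetical, library-free sketch of the monoid idea. This is NOT Algebird's API.
trait Monoid[A] {
  def zero: A                 // identity element: plus(zero, x) == x
  def plus(x: A, y: A): A     // associative combine operation
}

object MonoidSketch {
  // Integers form a monoid under addition.
  implicit val intSum: Monoid[Int] = new Monoid[Int] {
    val zero = 0
    def plus(x: Int, y: Int): Int = x + y
  }

  // Maps form a monoid when their values do: shared keys have their values combined.
  implicit def mapMonoid[K, V](implicit v: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      val zero: Map[K, V] = Map.empty
      def plus(x: Map[K, V], y: Map[K, V]): Map[K, V] =
        y.foldLeft(x) { case (acc, (k, value)) =>
          acc.updated(k, acc.get(k).map(v.plus(_, value)).getOrElse(value))
        }
    }

  // Combine any sequence of values, starting from the identity element.
  def sumAll[A](xs: Seq[A])(implicit m: Monoid[A]): A =
    xs.foldLeft(m.zero)(m.plus)

  def main(args: Array[String]): Unit = {
    // Pretend each map is a word count computed on a separate partition.
    val partitions = Seq(Map("a" -> 1, "b" -> 2), Map("b" -> 3, "c" -> 4))
    println(sumAll(partitions))
  }
}
```

Running `main` merges the two per-partition counts into one map in which the shared key `b` carries the sum of both values. The same `sumAll` works unchanged for any type that has a `Monoid` instance in scope.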