Mabble Rabble: September 2016

30 September 2016

24 September 2016

22 September 2016

Coursera Specializations for BigData

Machine Learning
Data Science
Data Science at Scale
Data Mining
Cloud Computing
Big Data
Recommender Systems
Functional Programming in Scala
Applied Data Science with Python
Data Structures and Algorithms
Business Analytics

15 September 2016

14 September 2016

HTTP 2.0

http2
HTTP/2
getting ready for http2
hpbn
ietf
http2 a new excerpt

13 September 2016

Every programming language has its strengths and weaknesses. Once one understands the theory of programming languages, it becomes easier to adapt and, with a bit of practice, to learn new ones quickly. Some programming languages will always be more popular in industry, while others are used more in academia. With a bit of pragmatism one can work out when to use which language and treat each as a dispensable tool. As new programming constructs and dialects evolve out of research to cope with the growing complexity of systems and applications, it becomes equally important to keep one's language skills up to date. The list below covers languages that are currently popular in industry and those that could be next on a programmer's learning radar.

Popular Programming Languages in Industry:

Scala
Groovy
Clojure
Objective-C
C++

Evolving Programming Language Trends:

Rust (Mozilla)
Go (Google)
Swift (Apple)
Julia
Erlang
Haskell
Hack (Facebook)
Kotlin
Ceylon
Dart (Google)
Prolog
Idris
Elixir
Lua
Wolfram
Elm

Code Katas

MapR vs Cloudera vs Hortonworks

Distributions Compared

Cloudera
MapR
Hortonworks
Pivotal HD

dezyre
curiousinsight
Four factors for comparing the top Hadoop distributions
comparing hadoop distributions

Certifications Compared

MapR has more accessible free courseware and a less complex learning pathway, although they provide more customizations to their platform. Cloudera pathways are more rigorous and more expensive, but their certifications are recognized as a pedigree in the big data space. Cloudera also has significant customizations in its commercial product offerings, which means a more stable platform. Hortonworks provides flexibility across the developer, administrator, and data analyst roles. They also cover mostly open source stacks, which can mean a less stable product offering. They provide full self-paced training too, but at a premium price, as material from their essential courses may not be sufficient for certification study. If one wants to focus on open source, choose the Hortonworks pathway. If one wants more rigor and a data scientist pathway, choose Cloudera for the CCP exam. MapR offers a developer pathway somewhere in between that is also easier on the pocket. Ultimately, though, the employer dictates the appropriate certification, based on the workplace and the Hadoop distribution it needs to use or support. In the end, it comes down to requirements and the value one places on attaining such certifications.

Quick Vocabulary Lesson

Kafka (publish/subscribe messaging system)
Mahout (machine learning)
Hive (map data to structures and use SQL-like queries)
Pig (data transformation language for big data)
ZooKeeper (coordination service for distributed systems such as Hadoop)
Sqoop (extract external sources and load to Hadoop)
Storm (real-time ETL)
Oozie (workflow scheduler)
Avro (data serialization like JSON)
Flume (ingest unstructured data)
Nutch (crawler)
Ambari (provisioning, managing, and monitoring Hadoop)
Chukwa (data collection)
Tez (data-flow framework)
Hama (big data analytics)

Columnar (HBase, Cassandra)
KeyValue (Riak, Redis)
Document (MongoDB, CouchDB)
Graph (Neo4J, Titan)

11 September 2016

10 September 2016

PoolParty Academy

PoolParty appears to provide various self-study certifications for the semantic web and linked data, which are available on their website under the academy section.

Semantic Web Associate
Knowledge Engineering Specialist
Semantic Integration Expert

Active Programming Languages

the 9 best languages for crunching data
ten top languages for crunching big data
List of markup languages
big list 256 programming
TIOBE
List of programming languages
PYPL
redmonk programming languages

7 September 2016

Timeseries Platforms

InfluxData
Prometheus
OpenTSDB
Graphite + StatsD + CollectD
Druid
Elastic
Axibase
KairosDB
RRDTool

6 September 2016

Delta Architecture

Delta

Lambda Architecture

Lambda architecture is essentially composed of three layers: the batch layer, the speed layer, and the serving layer.
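
As a rough illustration of how the three layers fit together, here is a minimal in-memory sketch in Python. The page-view events, the function names, and the choice of a simple count as the "view" are all assumptions made for the example; a real deployment would use a distributed batch engine and a stream processor rather than plain functions.

```python
from collections import Counter

def batch_layer(master_dataset):
    """Recompute a complete view from the full, immutable master dataset."""
    return Counter(event["page"] for event in master_dataset)

def speed_layer(recent_events):
    """Incrementally build a view over events the last batch run has not seen."""
    view = Counter()
    for event in recent_events:
        view[event["page"]] += 1
    return view

def serving_layer(batch_view, realtime_view, page):
    """Answer a query by merging the batch view with the real-time view."""
    return batch_view[page] + realtime_view[page]

# Hypothetical sample data: events already covered by the batch run,
# plus events that arrived afterwards.
master_dataset = [{"page": "/home"}, {"page": "/home"}, {"page": "/about"}]
recent_events = [{"page": "/home"}, {"page": "/contact"}]

batch_view = batch_layer(master_dataset)
realtime_view = speed_layer(recent_events)

print(serving_layer(batch_view, realtime_view, "/home"))  # 3
```

The point of the split is that the batch view can be slow but complete, while the speed view only has to cover the gap since the last batch run.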

Kappa Architecture

Kappa architecture is essentially composed of the speed layer and serving layer. The batch layer becomes a subset of the speed layer.
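
One way to picture this is a single append-only log with stream processors reading from it, where "batch" recomputation is just a replay of the log from the beginning. The sketch below is a simplified in-memory Python version; the `StreamProcessor` class, the event fields, and the counting logic are hypothetical names chosen for the example, not part of any particular framework.

```python
from collections import Counter

class StreamProcessor:
    """Consumes an append-only log and maintains a materialised view."""

    def __init__(self, key_fn):
        self.key_fn = key_fn   # how to derive the view key from an event
        self.view = Counter()  # the view handed to the serving layer
        self.offset = 0        # position of the next unread log entry

    def poll(self, log):
        """Process every entry appended since the last poll."""
        for event in log[self.offset:]:
            self.view[self.key_fn(event)] += 1
        self.offset = len(log)

# Hypothetical event log; in practice this would be a durable, replayable log.
log = [{"page": "/home", "user": "a"}, {"page": "/about", "user": "b"}]

by_page = StreamProcessor(lambda e: e["page"])
by_page.poll(log)

log.append({"page": "/home", "user": "b"})
by_page.poll(log)  # only the newly appended entry is processed

# Reprocessing with new logic: start a fresh processor and replay from offset 0.
by_user = StreamProcessor(lambda e: e["user"])
by_user.poll(log)

print(by_page.view["/home"], by_user.view["b"])  # 2 2
```

Because the same code path handles both live events and full replays, there is only one processing layer to maintain.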

Apache Oryx

Oryx

5 September 2016

SKOS

SKOS is a very common data model for representing knowledge in the form of thesauri or controlled vocabularies, which can provide interlinked knowledge graphs as a form of linked data. SKOS is a lightweight and flexible representation format, itself defined as an OWL ontology, and is available in various RDF syntaxes. OWL, on the other hand, is an ontology language. It is possible to convert from SKOS to OWL and even to combine them. The links below provide some related tools and libraries for working with SKOS models.

JSKOS
SKOSAPI
OWLAPI
SKOSEd
OpenSKOS
TemTres
THManager
PoolParty
TopBraid
Thesaurus Master
Lexaurus
Fluent Editor
Intelligent Topic Manager
SKOS2OWL
Protege
SKOSIFY
Poolparty Consistency Checker
KEA
SKOSMOS
SILK

W3C SKOS
SKOS: A Guide for Information Professionals
SKOS Taxonomy
The Accidental Taxonomist
Knowledge Engineering with Semantic Web Technologies
LinkedData Engineering
PoolParty Academy
Gate
Ontotext
Knowledge Extraction
Taxonomy Warehouse
Synaptica
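
To make the SKOS data model a little more concrete, the following Python sketch builds a tiny thesaurus and writes it out as Turtle using only the standard library. The `skos:Concept`, `skos:prefLabel`, and `skos:broader` terms and the SKOS namespace come from the W3C vocabulary; the `example.org` namespace, the sample concepts, and the helper names are made up for illustration, and the code assumes labels contain no quote characters.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
EX_NS = "http://example.org/thesaurus/"  # hypothetical namespace

# concept id -> (preferred English label, broader concept id or None)
concepts = {
    "animals": ("Animals", None),
    "mammals": ("Mammals", "animals"),
    "cats": ("Cats", "mammals"),
}

def to_turtle(concepts):
    """Serialise the concepts as Turtle (labels are assumed to need no escaping)."""
    lines = [f"@prefix skos: <{SKOS_NS}> .", f"@prefix ex: <{EX_NS}> .", ""]
    for cid, (label, broader) in concepts.items():
        statements = ["a skos:Concept", f'skos:prefLabel "{label}"@en']
        if broader is not None:
            statements.append(f"skos:broader ex:{broader}")
        lines.append(f"ex:{cid} " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)

def broader_chain(concepts, cid):
    """Follow skos:broader links from a concept up to its top concept."""
    chain = []
    parent = concepts[cid][1]
    while parent is not None:
        chain.append(parent)
        parent = concepts[parent][1]
    return chain

print(to_turtle(concepts))
print(broader_chain(concepts, "cats"))  # ['mammals', 'animals']
```

For anything beyond a toy example, one of the libraries listed above would be a better fit than hand-written serialisation.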

3 September 2016

Machine Learning Mindmap

Scikitlearn

Weka

2 September 2016

KeyValue Databases

Hazelcast
Infinispan
LevelDB
MapDB
Memcached
Redis
Riak
RocksDB
Voldemort
Dynamo

1 September 2016

Distributed SQL Query Engines for BigData

Presto
Drill
Hive
Pig
Impala
HAWQ
Tajo
Spark SQL
BigQuery
Ignite
Kylin
Phoenix
Jaql
MRQL
