13 September 2016

Pragmatic Programmer

Every programming language has its strengths and weaknesses. Once one understands the theory of programming languages, it becomes easier to adapt and, with a bit of practice, to learn new ones quickly. Some programming languages will always be more popular in industry, while others are used more in academia. With a bit of pragmatism, one can work out when to use which language and treat each as a dispensable tool. As new programming constructs and dialects evolve out of research to cope with the growing complexity of systems and applications, it becomes equally important to keep one's language skills up to date. The list below shows the programming languages currently most used in industry and those that could be next on a programmer's radar for learning.

Popular Programming Languages in Industry:

R

Evolving Programming Language Trends:

D

MapR vs Cloudera vs Hortonworks

Distributions Compared

Cloudera
MapR
Hortonworks
Pivotal HD



dezyre
curiousinsight
Four factors for comparing the top Hadoop distributions
comparing hadoop distributions

Certifications Compared

MapR has more accessible free courseware and a less complex learning pathway, although its platform is more heavily customized. Cloudera's pathways are more rigorous and more expensive, but its certifications are recognized as a pedigree in the big data space. Cloudera also customizes its commercial product offerings significantly, which means a more stable platform. Hortonworks offers flexibility across the developer, administrator, and data analyst tracks. It covers mostly open source stacks, which also means the product offering can be less stable. It provides full self-paced training as well, but at a premium price, and the material from its essential courses may not be sufficient for certification study. If one wants to focus on open source, choose the Hortonworks pathway. If one wants more rigor and a data scientist pathway, choose Cloudera and the CCP exam. MapR offers a developer pathway somewhere in between, which is also easier on the pocket. Ultimately, though, the employer dictates the appropriate certification through the Hadoop distribution the workplace has to use and support. In the end, it comes down to requirements and the value one places on attaining certifications.

Quick Vocabulary Lesson

Kafka (publish/subscribe messaging system)
Mahout (machine learning)
Hive (map data to structures and use SQL-like queries)
Pig (data transformation language for big data)
ZooKeeper (coordination service for distributed systems such as Hadoop: configuration, naming, synchronization)
Sqoop (extract external sources and load to Hadoop)
Storm (real-time stream processing, e.g. real-time ETL)
Oozie (workflow scheduler)
Avro (data serialization with schemas defined in JSON)
Flume (ingest unstructured data)
Nutch (crawler)
Ambari (provisioning, managing, and monitoring Hadoop)
Chukwa (data collection)
Tez (data-flow framework)
Hama (big data analytics)

Column-family (HBase, Cassandra)
Key-value (Riak, Redis)
Document (MongoDB, CouchDB)
Graph (Neo4j, Titan)

6 September 2016

Delta Architecture

Delta

Lambda Architecture

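Lambda architecture is essentially composed of three layers: the batch layer, which recomputes views over the full dataset; the speed layer, which processes recent data with low latency; and the serving layer, which answers queries from the results of both.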
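
A minimal sketch of how the three layers fit together, assuming a toy page-view counting workload. The function names (`batch_view`, `update_realtime_view`, `serve`) and the sample events are hypothetical and only illustrate the idea; a real deployment would use a distributed batch engine and a stream processor.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute a view from the full, immutable master dataset."""
    return Counter(event["page"] for event in master_dataset)

def update_realtime_view(realtime_view, event):
    """Speed layer: fold in one recent event not yet covered by a batch run."""
    realtime_view[event["page"]] += 1
    return realtime_view

def serve(batch, realtime, page):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[page] + realtime[page]

# Hypothetical sample data: events already in the master dataset,
# and events that arrived after the last batch run.
master_dataset = [{"page": "home"}, {"page": "home"}, {"page": "about"}]
recent_events = [{"page": "home"}, {"page": "contact"}]

batch = batch_view(master_dataset)
realtime = Counter()
for event in recent_events:
    update_realtime_view(realtime, event)

print(serve(batch, realtime, "home"))     # 3
print(serve(batch, realtime, "contact"))  # 1
```

Once the next batch run has absorbed the recent events, the real-time view for that period can be discarded, which is why the speed layer only needs to hold a small window of data.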

Kappa Architecture

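Kappa architecture is essentially composed of the speed layer and the serving layer. There is no separate batch layer: batch processing becomes a special case of stream processing, typically done by replaying the retained event log through the same streaming code.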
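
A minimal sketch of the Kappa idea, assuming an append-only event log held in a Python list. The names `run_stream_job` and `key_fn` are hypothetical. The point is that "batch" reprocessing is just the same stream job replayed from offset 0, here with changed logic.

```python
from collections import Counter

def run_stream_job(log, key_fn, from_offset=0):
    """Consume the append-only log from an offset and build a view."""
    view = Counter()
    for event in log[from_offset:]:
        view[key_fn(event)] += 1
    return view

# Hypothetical retained event log.
log = [{"page": "Home"}, {"page": "home"}, {"page": "about"}]

# Version 1 of the job counts page names exactly as logged.
view_v1 = run_stream_job(log, key_fn=lambda e: e["page"])

# Version 2 normalizes case. Instead of running a separate batch job,
# the whole log is replayed through the new logic to build a fresh view.
view_v2 = run_stream_job(log, key_fn=lambda e: e["page"].lower())

print(view_v1["home"])  # 1
print(view_v2["home"])  # 2
```

Once the replayed view has caught up with the head of the log, the serving layer can switch to it and the old view can be dropped.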

Apache Oryx

Oryx

5 September 2016

SKOS

SKOS is a very common data model for representing knowledge in the form of thesauri or controlled vocabularies, which can form interlinked knowledge graphs published as linked data. SKOS is lightweight and flexible; its vocabulary is itself defined as an OWL ontology, and SKOS data can be written in various RDF syntaxes. OWL, on the other hand, is an ontology language. It is possible to convert from SKOS to OWL and even to combine them. The links below point to some related tools and libraries for working with SKOS models.

JSKOS
SKOSAPI
OWLAPI
SKOSEd
OpenSKOS
TemaTres
THManager
PoolParty
TopBraid
Thesaurus Master
Lexaurus
Fluent Editor
Intelligent Topic Manager
SKOS2OWL
Protege
SKOSIFY
PoolParty Consistency Checker
KEA
SKOSMOS
SILK
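
To make the SKOS data model described above more concrete, here is a minimal sketch that stores a tiny thesaurus as subject-predicate-object triples in plain Python, without any of the libraries listed above. The `http://example.org/animals/` namespace, the three concepts, and the helper functions are hypothetical; only the SKOS namespace and the `prefLabel` and `broader` properties come from the SKOS vocabulary itself.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/animals/"  # hypothetical namespace for this sketch

# A tiny thesaurus: three concepts linked by skos:broader.
triples = {
    (EX + "animal", SKOS + "prefLabel", "Animal"),
    (EX + "mammal", SKOS + "prefLabel", "Mammal"),
    (EX + "cat", SKOS + "prefLabel", "Cat"),
    (EX + "mammal", SKOS + "broader", EX + "animal"),
    (EX + "cat", SKOS + "broader", EX + "mammal"),
}

def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def broader_transitive(concept):
    """Follow skos:broader links upward, collecting every ancestor concept."""
    seen = set()
    stack = [concept]
    while stack:
        for parent in objects(stack.pop(), SKOS + "broader"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def label(concept):
    """Return the preferred label of a concept (one per concept here)."""
    return next(iter(objects(concept, SKOS + "prefLabel")))

ancestors = sorted(label(c) for c in broader_transitive(EX + "cat"))
print(ancestors)  # ['Animal', 'Mammal']
```

In practice an RDF library would load and query such triples from a Turtle or RDF/XML file; the hand-rolled set above only shows the shape of the data.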

W3C SKOS
SKOS: A Guide for Information Professionals
SKOS Taxonomy
The Accidental Taxonomist
Knowledge Engineering with Semantic Web Technologies
LinkedData Engineering
PoolParty Academy
Gate
Ontotext
Knowledge Extraction
Taxonomy Warehouse
Synaptica