Learn MapR
MapR Resources
30 September 2016
24 September 2016
Spark vs Flink
Labels: big data, data science, distributed systems, event-driven, flink, hadoop, Java, machine learning, scala, spark
22 September 2016
Coursera Specializations for BigData
Machine Learning
Data Science
Data Science at Scale
Data Mining
Cloud Computing
Big Data
Recommender Systems
Functional Programming in Scala
Applied Data Science with Python
Data Structures and Algorithms
Business Analytics
Labels: big data, data science, machine learning, python, R, recommender, scala, text analytics
15 September 2016
14 September 2016
13 September 2016
Pragmatic Programmer
Every programming language has its strengths and weaknesses. Once one understands the theory of programming languages, it becomes easier to adapt and, with a bit of practice, to pick up new ones quickly. Some languages will always be more popular in industry, while others see more use in academia. With a bit of pragmatism one can work out when to use which language and treat it as a dispensable tool. As new programming constructs and dialects evolve out of research to cope with the growing and constantly changing complexity of systems and applications, it becomes equally important to keep one's language skills up to date. The lists below cover the languages currently most used in industry and those that could be next on a programmer's radar for learning.
Popular Programming Languages in Industry:
Evolving Programming Language Trends:
MapR vs Cloudera vs Hortonworks
Distributions Compared
Cloudera
MapR
Hortonworks
Pivotal HD
dezyre
curiousinsight
Four factors for comparing the top Hadoop distributions
comparing hadoop distributions
Certifications Compared
MapR has more accessible free courseware and a less complex learning pathway, although its platform carries more customizations. Cloudera's pathways are more rigorous and more expensive, but its certifications are recognized as a pedigree in the big data space. Cloudera also makes significant customizations to its commercial product offerings, which means a more stable platform. Hortonworks provides flexibility across the developer, administrator, and data analyst tracks. It covers mostly open source stacks, which can also mean a less stable product offering. It provides full self-paced training as well, but at a premium price, since the material from its essential courses may not be sufficient for certification study. If one wants to focus on open source, choose the Hortonworks pathway. If one wants more rigor and a data scientist pathway, choose Cloudera and the CCP exam. MapR offers a developer pathway somewhere in between that is also easier on the pocket. Ultimately, though, the employer dictates the appropriate certification through the Hadoop distribution it needs to use or support. In the end, it comes down to requirements and the value one places on attaining such certifications.
Quick Vocabulary Lesson
Kafka (publish/subscribe messaging system)
Mahout (machine learning)
Hive (map data to structures and use SQL-like queries)
Pig (data transformation language for big data)
ZooKeeper (coordination service used to manage and administer Hadoop)
Sqoop (extract external sources and load to Hadoop)
Storm (real-time ETL)
Oozie (workflow scheduler)
Avro (data serialization like JSON)
Flume (ingest unstructured data)
Nutch (crawler)
Ambari (provisioning, managing, and monitoring Hadoop)
Chukwa (data collection)
Tez (data-flow framework)
Hama (big data analytics)
Columnar (HBase, Cassandra)
KeyValue (Riak, Redis)
Document (MongoDB, CouchDB)
Graph (Neo4J, Titan)
11 September 2016
Kafka Ecosystem
Kafka Clients
Burrow
Running a Multi-Broker Apache Kafka Cluster on a Single Node
Apache Kafka Training Deck and Tutorial
Streaming Architecture Using Apache Kafka MapR Streams
Learning Apache Kafka
Apache Kafka
Apache Kafka Cookbook
Kafka The Definitive Guide
Kafka Documentation
Confluent
10 September 2016
PoolParty Academy
PoolParty appears to provide various self-study certifications for semantic web and linked data, which are available on its website under the academy section.
- Semantic Web Associate
- Knowledge Engineering Specialist
- Semantic Integration Expert
Labels: big data, data science, databases, intelligent web, Java, linked data, metadata, natural language processing, semantic web, text analytics
7 September 2016
6 September 2016
Lambda Architecture
Lambda architecture essentially is composed of the batch layer, speed layer, and the serving layer.
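As a rough illustration of how the three layers fit together, here is a minimal in-memory sketch in Python. The page-view events, the function names, and the counting workload are all assumptions made up for this example; a real deployment would use a distributed store and processing engine for each layer.

```python
from collections import Counter

# Hypothetical page-view events (illustrative data only).
master_dataset = ["home", "about", "home", "pricing"]  # covered by the last batch run
recent_events = ["home", "pricing"]                    # arrived after the last batch run


def batch_layer(events):
    """Recompute a complete view from the immutable master dataset."""
    return Counter(events)


def speed_layer(view, event):
    """Incrementally update the real-time view with one new event."""
    view[event] += 1
    return view


def serving_layer(batch_view, realtime_view, key):
    """Answer a query by merging the batch view with the real-time view."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


batch_view = batch_layer(master_dataset)
realtime_view = Counter()
for event in recent_events:
    speed_layer(realtime_view, event)

print(serving_layer(batch_view, realtime_view, "home"))  # 3
```

The point of the split is that the batch view is complete but stale, the real-time view is fresh but covers only recent events, and a query needs both to be correct.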
Kappa Architecture
Kappa architecture is essentially composed of the speed layer and serving layer. The batch layer becomes a subset of the speed layer.
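One way to picture this is a single streaming job over a retained, append-only log, where "batch" reprocessing is just a replay from the beginning. The sketch below is a minimal in-memory version in Python; the log contents and the `stream_job` function are hypothetical names chosen for illustration, with a plain list standing in for a retained stream.

```python
from collections import Counter

# Hypothetical append-only log standing in for a retained event stream.
event_log = ["home", "about", "home", "pricing", "home"]


def stream_job(log, from_offset=0, view=None):
    """Fold events into a serving view, starting at the given log offset."""
    view = Counter() if view is None else view
    for event in log[from_offset:]:
        view[event] += 1
    return view


# Normal operation: process events as they arrive.
live_view = stream_job(event_log[:3])            # the first three events
live_view = stream_job(event_log, 3, live_view)  # continue from offset 3

# Reprocessing: replay the same log from offset 0 with the same code path.
rebuilt_view = stream_job(event_log)

print(live_view == rebuilt_view)  # True
```

Because there is only one code path, a change to the processing logic is rolled out by replaying the log into a new view and then switching queries over to it.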
5 September 2016
SKOS
SKOS is a very common data model for representing knowledge in the form of thesauri or controlled vocabularies, which can be interlinked into knowledge graphs as a form of linked data. SKOS is itself a lightweight and flexible vocabulary, defined as an OWL ontology and available in various RDF syntaxes. OWL, on the other hand, is an ontology language. It is possible to convert from SKOS to OWL and even to combine them. The links below provide some related tools and libraries for working with SKOS models.
JSKOS
SKOSAPI
OWLAPI
SKOSEd
OpenSKOS
TemaTres
THManager
PoolParty
TopBraid
Thesaurus Master
Lexaurus
Fluent Editor
Intelligent Topic Manager
SKOS2OWL
Protege
SKOSIFY
PoolParty Consistency Checker
KEA
SKOSMOS
SILK
W3C SKOS
SKOS: A Guide for Information Professionals
SKOS Taxonomy
The Accidental Taxonomist
Knowledge Engineering with Semantic Web Technologies
LinkedData Engineering
PoolParty Academy
Gate
Ontotext
Knowledge Extraction
Taxonomy Warehouse
Synaptica
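To make the SKOS data model a little more concrete, the following Python sketch builds two concepts linked by `skos:broader` and prints them as N-Triples. It uses only the standard library; the `http://example.org/thesaurus/` namespace, the concept names, and the helper functions are hypothetical and exist only for this illustration. A real project would more likely use one of the libraries listed above or a general RDF toolkit.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/thesaurus/"  # hypothetical namespace for this sketch


def concept(local_name, label, broader=None):
    """Return the triples describing one skos:Concept with an English label."""
    subject = EX + local_name
    triples = [
        (subject, RDF_TYPE, SKOS + "Concept"),
        (subject, SKOS + "prefLabel", (label, "en")),
    ]
    if broader is not None:
        triples.append((subject, SKOS + "broader", EX + broader))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples (assumes labels need no escaping)."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, tuple):          # (text, language) literal
            obj = '"%s"@%s' % o
        else:                             # IRI object
            obj = "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return "\n".join(lines)


triples = concept("databases", "Databases") + concept(
    "graph-databases", "Graph databases", broader="databases"
)
print(to_ntriples(triples))
```

The `skos:broader` link is what turns a flat list of labelled concepts into a hierarchy that thesaurus tools can browse.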
Labels: data science, linked data, metadata, natural language processing, rdf, semantic web, sparql, text analytics
3 September 2016
2 September 2016