Mabble Rabble: May 2016

26 May 2016

Knowledge Bases and Upper Mappings

Knowledge bases capture cognitive abilities and representations so that Intelligent Agents can reason over concepts and things. Upper Mappings also add structure to custom domain ontologies. By standing on the shoulders of giants, one can draw semantic enrichment from various data sources to design a customized knowledge base. The list below provides some ways of curating with upper mappings, knowledge bases, and commonsense reasoning about the open world.

23 May 2016

Employment Checks

When an employer uses third-party KYC checks for compliance, why are mistakes made in data entry? Isn't validation and verification part of compliance? It is even more shocking when candidates are checked against federal databases using misspelt variants of their names, especially when the candidate has provided correct information. In most circumstances, a third party showing such a lack of due diligence should be legally liable for a penalty, and the employer that uses it to conduct such compliance checks should be held responsible as well. Intelligent Agents are necessary in all aspects of data entry, both to protect the personal information of individuals and to avoid data discrepancies in which records get linked incorrectly through human error. Why must we be forced to trust another human to process our personal information, especially someone we know nothing about and on whom we have not run our own background checks to assure ourselves of their integrity? Sharing personal information is a risky affair, especially when a candidate has no idea which third parties are used for such processing and so cannot develop any level of assurance or trust, and even such trust is questionable at best.

21 May 2016

Open Source Data Science Masters

One doesn't need a PhD to be a Data Scientist. Many have moved from Software Engineer or Data Analyst roles into Data Scientist roles, while others have taught themselves on the job. Many move away from the Data Scientist role in favor of the more illustrious Big Data Engineer, wearing numerous hats as they transition into a more satisfying occupation. It is an occupational hazard, though, if a Data Scientist ends up asking a Big Data Engineer what unit testing is or how to search for data sources, in which case the odd frown and possibly a questioning glance over merits would be well deserved. The link below provides some relevant tracks for self-training online in data science.

data science masters

17 May 2016

Engine Paradigms & Systems

Paradigm	System	Explanation
MapReduce	Hadoop	Small recoverable code tasks, sequential tasks inside map and reduce functions
Dryad/Nephele	Tez	Extends the MapReduce model to a DAG model, backtracking-based recovery
PACTs	Flink	Embedded query processing runtime in a DAG engine, extends DAGs to cyclic graphs, incremental construction of graphs
RDDs	Spark	Functional implementation of Dryad recovery (RDDs), restricted to coarse-grained transformations, direct execution of API

Engine Comparison	Hadoop	Tez	Spark	Flink
API	MapReduce on k/v pairs	k/v pair readers/writers	transformations on k/v pair collections	iterative transformations on collections
Paradigm	MapReduce	DAG	RDD	Cyclic Dataflows
Optimization	none	none	optimization of SQL queries	optimization in all APIs
Execution	batch sorting	batch sorting and partitioning	batch with memory pinning	stream with out-of-core algorithms

Courtesy of Apache Flink
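
As a rough illustration of the MapReduce row in the paradigm table above, the sketch below runs a word count through separate map, shuffle, and reduce steps in plain Python. It is a single-process toy under stated assumptions, not how Hadoop is implemented: there is no distribution, sorting on disk, or task recovery, and every function name here (map_phase, shuffle, reduce_phase, word_mapper, count_reducer, word_count) is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterable[Pair]]) -> Iterator[Pair]:
    """Apply the user-supplied mapper to each record, emitting k/v pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group all emitted values by key (a real engine does this across nodes)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Apply the user-supplied reducer once per key."""
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical user code: sequential logic lives inside the two functions.
def word_mapper(line: str) -> Iterator[Pair]:
    for word in line.lower().split():
        yield word, 1


def count_reducer(key: str, values: List[int]) -> int:
    return sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)


if __name__ == "__main__":
    print(word_count(["map reduce map", "reduce shuffle"]))
```

Each mapper and reducer call is small and independent of the others, which is what makes the tasks cheap to rerun after a failure.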

Graph Comparison

Analytical
Type	Backend	Supported Frameworks	Context of Use
Giraph	Hadoop/HDFS	Spark/Hadoop	Data Processing for Analytics
GraphX	Titan, Neo4j, HDFS	Spark	Data Processing for Analytics (in-memory)
GraphLab	Hadoop/HDFS	Spark/Hadoop	Data Processing for Analytics, using PowerGraph and GAS models

Operational
Type	Backend	Supported Frameworks	Context of Use
Cayley	MongoDB or LevelDB	Custom Implementation in Go	Knowledge Graph
Titan	Cassandra, HBase, HDFS	TinkerPop & RDF SPARQL	Massive Knowledge Graphs OLAP/OLTP (now part of DataStax)
Neo4j	Custom	TinkerPop	Data Visualization, Web Browsing, Portfolio Analytics, Gene Sequencing, Mobile Social Application
OrientDB	Custom	TinkerPop & RDF SPARQL	Embedded and Standalone, Knowledge Graph, Multimodel (Document + Graph)

Semantic
Type	Backend	Supported Frameworks	Context of Use
Blazegraph and MapGraph	Custom	Sesame RDF SPARQL TinkerPop	Massive Knowledge Graphs on GPU, includes support for Semantic Web Standards of W3C (used by Wikidata, a Wikimedia project)
Stardog	Custom	RDF SPARQL	Semantic data use cases in the cloud (third-party)
Ontotext GraphDB	Custom	Sesame Jena RDF SPARQL	Optimized as a Semantic Graph Database based on Semantic Web Standards of W3C (used by BBC, Euromoney, Financial Times, etc.)
Virtuoso	Custom/Hybrid	Sesame Jena RDF SPARQL	Optimized as a Semantic Graph Database based on Semantic Web Standards of W3C (used by DBpedia)
AllegroGraph	Custom	Sesame RDF SPARQL	Optimized as a Semantic Graph Database based on Semantic Web Standards of W3C
OpenCog	Custom	Semantic Knowledge	Massive Artificial General Intelligence Graph Knowledge Base
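
The semantic stores in the table above share the RDF data model: facts are held as subject/predicate/object triples and queried by matching patterns that contain variables, which is the core idea behind SPARQL basic graph patterns. The sketch below is a toy in-memory version of that idea in plain Python, assuming nothing about any listed product. The function names (is_variable, match, query), the ex: identifiers, and the sample triples are all made up for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Hypothetical sample data, loosely echoing the engine table earlier in the post.
TRIPLES: List[Triple] = [
    ("ex:Flink", "rdf:type", "ex:Engine"),
    ("ex:Spark", "rdf:type", "ex:Engine"),
    ("ex:Flink", "ex:paradigm", "ex:CyclicDataflow"),
    ("ex:Spark", "ex:paradigm", "ex:RDD"),
]


def is_variable(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Try to unify one pattern with one triple under existing bindings."""
    extended = dict(bindings)
    for pattern_term, value in zip(pattern, triple):
        if is_variable(pattern_term):
            # A variable must keep the value it was bound to earlier.
            if extended.get(pattern_term, value) != value:
                return None
            extended[pattern_term] = value
        elif pattern_term != value:
            return None
    return extended


def query(triples: Sequence[Triple], patterns: Sequence[Triple]) -> List[Bindings]:
    """Return every set of bindings that satisfies all patterns (a naive join)."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        next_solutions: List[Bindings] = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    # Roughly: SELECT ?e ?p WHERE { ?e rdf:type ex:Engine . ?e ex:paradigm ?p }
    for row in query(TRIPLES, [("?e", "rdf:type", "ex:Engine"),
                               ("?e", "ex:paradigm", "?p")]):
        print(row)
```

Real triple stores replace the nested loops with indexes over subject, predicate, and object and a query optimizer, but the pattern-plus-bindings model is the same.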

wikidata graph comparison

OLTP/Graph Databases
OLTP/Analytical Databases
Graph Database as a Service
Native Semantic Graph Databases
Graph Query / Interfaces

16 May 2016

Streaming

Below is a curated list of stream processing frameworks, applications, and beyond.

awesome-streaming
