26 May 2016

Knowledge Bases and Upper Mappings

Knowledge bases harness cognitive representations so that Intelligent Agents can reason over concepts and things. Upper Mappings add further structure to custom domain ontologies. By standing on the shoulders of giants, one can add semantic enrichment from various data sources to design a customized knowledge base. The list below provides some ways of curating with upper mappings, knowledge bases, and commonsense reasoning about the open world.

23 May 2016

Employment Checks

When an employer uses third-party KYC checks for compliance, why do those checks contain data entry mistakes? Isn't validation and verification part of compliance? It is even more shocking when candidates are checked against federal databases using misspelled variants of their names, especially when the candidate has provided the correct information. In most circumstances, a third party showing such a lack of due diligence should be legally liable and penalized, along with the employer that uses it to conduct the compliance checks. Intelligent Agents are necessary in all aspects of data entry, both to protect individuals' personal information and to avoid data discrepancies in which records get linked incorrectly through human error. Why must we be forced to trust another human to process our personal information, especially someone we know nothing about and have not been able to run our own background checks on to assure ourselves of their integrity? Sharing personal information is a risky affair, especially when a candidate has no idea which third parties handle the processing and so cannot develop any level of assurance, and even then such trust is questionable at best.

21 May 2016

Open Source Data Science Masters

One doesn't need a PhD to be a Data Scientist. Many have moved into Data Scientist roles from Software Engineering or Data Analyst positions, while others are self-taught on the job. Many also move away from the Data Scientist role in favor of the more illustrious Big Data Engineer, wearing numerous hats as they transition into a more satisfying occupation. It is an occupational hazard, though, if a Data Scientist ends up asking a Big Data Engineer what unit testing is or how to search for data sources, in which case the odd frown and possibly a questioning glance over their merits would be well deserved. The link below provides some relevant tracks for online self-training in data science.

17 May 2016

Engine Paradigms & Systems

| Paradigm | System | Explanation |
|---|---|---|
| MapReduce | Hadoop | Small recoverable code tasks, sequential tasks inside map and reduce functions |
| Dryad/Nephele | Tez | Extends the MapReduce model to a DAG model, backtracking-based recovery |
| PACTs | Flink | Embedded query processing runtime in a DAG engine, extends DAGs to cyclic graphs, incremental construction of graphs |
| RDDs | Spark | Functional implementation of Dryad recovery (RDDs), restricted to coarse-grained transformations, direct execution of API |

Engine Comparison

| | Hadoop | Tez | Spark | Flink |
|---|---|---|---|---|
| API | mapreduce on k/v pairs | k/v pairs readers/writers | transformation on k/v pair collections | iterative transformation on collections |
| Paradigm | mapreduce | DAG | RDD | Cyclic Dataflows |
| Optimization | none | none | optimization of SQL queries | Optimization in all APIs |
| Execution | batch sorting | batch sorting and partitioning | batch with memory pinning | stream with out-of-core algorithms |

Courtesy of Apache Flink
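
To make the "mapreduce on k/v pairs" entry in the comparison concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and the sorted grouping step only stands in for the batch sort a real engine would do across machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Run the mapper over each (key, value) record and yield what it emits."""
    for key, value in records:
        yield from mapper(key, value)


def shuffle(pairs):
    """Group emitted values by key and sort by key (toy stand-in for batch sorting)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single (key, result) pair."""
    return [(key, reducer(key, values)) for key, values in groups]


def word_mapper(_line_no, line):
    # Hypothetical mapper: emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(_word, counts):
    # Hypothetical reducer: total the counts for one word.
    return sum(counts)


def word_count(lines):
    records = enumerate(lines)  # (line number, line text) k/v pairs
    return reduce_phase(shuffle(map_phase(records, word_mapper)), sum_reducer)


if __name__ == "__main__":
    print(word_count(["to be or not to be", "to do"]))
```

Each phase here is a small, restartable function over key/value pairs, which is the property the paradigm table credits to MapReduce. The DAG, RDD, and dataflow engines generalize the fixed map, shuffle, reduce chain into longer graphs of such steps.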

Graph Comparison

Analytical

| Type | Backend | Supported Frameworks | Context of Use |
|---|---|---|---|
| Giraph | Hadoop/HDFS | Spark/Hadoop | Data Processing for Analytics |
| GraphX | Titan, Neo4J, HDFS | Spark | Data Processing for Analytics (in-memory) |
| GraphLab | Hadoop/HDFS | Spark/Hadoop | Data Processing for Analytics, using PowerGraph and GAS models |

Operational

| Type | Backend | Supported Frameworks | Context of Use |
|---|---|---|---|
| Cayley | MongoDB or LevelDB | Custom Implementation in Go | Knowledge Graph |
| Titan | Cassandra, HBase, HDFS | Tinkerpop & RDF, SPARQL | Massive Knowledge Graphs OLAP/OLTP (now part of Datastax) |
| Neo4J | Custom | Tinkerpop | Data Visualization, Web Browsing, Portfolio Analytics, Gene Sequencing, Mobile Social Application |
| OrientDB | Custom | Tinkerpop & RDF, SPARQL | Embedded and Standalone, Knowledge Graph, Multimodel (Document + Graph) |

Semantic

| Type | Backend | Supported Frameworks | Context of Use |
|---|---|---|---|
| Blazegraph and MapGraph | Custom | Sesame, RDF, SPARQL, Tinkerpop | Massive Knowledge Graphs on GPU, includes support for Semantic Web Standards of W3C (used by Wikidata, a Wikimedia project) |
| Stardog | Custom | RDF, SPARQL | Semantic data use case in the cloud (third-party) |
| OntoText GraphDB | Custom | Sesame, Jena, RDF, SPARQL | Optimized as a Semantic Graph Database based on Semantic Web Standards of W3C (used by BBC, Euromoney, Financial Times, etc.) |
| Virtuoso | Custom/Hybrid | Sesame, Jena, RDF, SPARQL | Optimized as a Semantic Graph Database based on Semantic Web Standards of W3C (used by DBpedia) |
| Allegrograph | Custom | Sesame, RDF, SPARQL | Optimized as a Semantic Graph Database based on Semantic Web Standards of W3C |
| OpenCog | Custom | Semantic Knowledge | Massive Artificial General Intelligence Graph Knowledge Base |
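
Most of the semantic stores above share the RDF data model (subject, predicate, object triples) and SPARQL-style pattern queries. The toy sketch below mimics that idea in plain Python so the shape of such a query is visible. It is not any vendor's API and not real SPARQL: the `query` and `match_pattern` helpers, the `?name` variable convention, and the sample triples (loosely taken from the tables above) are all assumptions for illustration.

```python
def is_var(term):
    """Terms starting with '?' are treated as query variables."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(triples, pattern, binding=None):
    """Return every extension of `binding` that makes `pattern` match a triple."""
    binding = binding or {}
    results = []
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(p_term, t_term) != t_term:
                    matched = False
                    break
            elif p_term != t_term:
                matched = False
                break
        if matched:
            results.append(candidate)
    return results


def query(triples, patterns):
    """Join several triple patterns, carrying variable bindings across them."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for extended in match_pattern(triples, pattern, binding)
        ]
    return bindings


# Illustrative data only, not a real dataset.
TRIPLES = [
    ("Blazegraph", "supports", "SPARQL"),
    ("Blazegraph", "supports", "Tinkerpop"),
    ("Stardog", "supports", "SPARQL"),
    ("Neo4J", "supports", "Tinkerpop"),
    ("Blazegraph", "category", "Semantic"),
    ("Stardog", "category", "Semantic"),
    ("Neo4J", "category", "Operational"),
]

if __name__ == "__main__":
    # Which semantic stores support SPARQL?
    print(query(TRIPLES, [("?db", "supports", "SPARQL"),
                          ("?db", "category", "Semantic")]))
```

A real triple store adds indexes, inference, and a query optimizer on top of this, but the core operation is the same: match patterns against triples and join on shared variables.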

OLTP/Graph Databases
OLTP/Analytical Databases
Graph Database as a Service
Native Semantic Graph Databases
Graph Query / Interfaces

16 May 2016

Streaming

Below is a curated list of stream processing frameworks, applications, and beyond.

awesome-streaming