Knowledge bases can harness cognitive abilities and representations so that Intelligent Agents can reason over concepts and things. Upper mappings, which link a custom domain ontology to a more general upper ontology, also give that domain ontology added structure. By standing on the shoulders of giants, one can semantically enrich a knowledge base with data from various sources and design one tailored to a specific need. The list below provides some ways of curating with upper mappings, knowledge bases, and commonsense reasoning about the open world.
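To make upper mappings a little more concrete, here is a minimal sketch, assuming a toy in-memory mapping rather than any particular ontology language or triple store. The class names (`Employee`, `Candidate`, `Person`, `Agent`, `Thing`) and the helper functions are hypothetical; each domain class points at a single parent, and inference simply walks those `subClassOf` links upward.

```python
# Minimal sketch: domain classes mapped onto a HYPOTHETICAL upper ontology,
# with superclasses inferred by following subClassOf links transitively.
# All class names are illustrative assumptions, not a real ontology.

SUBCLASS_OF = {
    # domain ontology -> upper ontology mappings (hypothetical)
    "Employee": "Person",
    "Candidate": "Person",
    "Person": "Agent",
    "Organization": "Agent",
    "Agent": "Thing",
}


def superclasses(cls: str) -> list[str]:
    """Return every ancestor of cls, nearest first, by walking subClassOf edges."""
    ancestors = []
    seen = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in seen:  # guard against an accidental cycle in the mappings
            break
        seen.add(cls)
        ancestors.append(cls)
    return ancestors


def is_a(cls: str, target: str) -> bool:
    """True if cls equals target or inherits from it through the mappings."""
    return cls == target or target in superclasses(cls)
```

For example, `superclasses("Candidate")` returns `['Person', 'Agent', 'Thing']`, so an agent that only knows rules about `Agent` can still reason about a candidate. Real upper ontologies allow multiple parents and richer axioms; a single-parent dictionary just keeps the idea visible.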
26 May 2016
23 May 2016
Employment Checks
When an employer uses third-party KYC (know your customer) checks for compliance, why do those third parties make mistakes in data entry? Isn't validation and verification part of compliance? It is even more shocking when candidates are checked against federal databases under misspelled variants of their names, especially when the candidate has provided the correct information. In most circumstances, such a lack of due diligence on the part of the third party should carry legal liability and penalties, and the employer that uses them to conduct the checks should be held responsible as well. Intelligent Agents are needed in every aspect of data entry, both to protect individuals' personal information and to avoid data discrepancies in which records get linked incorrectly through human error. Why must we be forced to trust another human to process our personal information, especially someone we know nothing about and on whom we have not run our own background check to assure ourselves of their integrity? Sharing personal information is a risky affair, particularly when a candidate has no idea which third parties are used for such processing and so cannot develop any level of assurance or trust; even then, such trust is questionable at best.
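Part of the validation the post asks for can be automated. The sketch below, assuming Python's standard `difflib` and `unicodedata` modules are an acceptable stand-in for a real identity-matching service, compares the name a candidate supplied with the name typed into a screening request before any database search is run. The function names and the 0.85 similarity threshold are illustrative assumptions, not an industry standard.

```python
# Minimal sketch: flag data-entry discrepancies between the name a candidate
# supplied and the name entered for a background check. The 0.85 threshold
# and the function names are ASSUMPTIONS chosen for illustration.
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Casefold, strip accents, and collapse whitespace so formatting
    differences alone do not count as a discrepancy."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def check_entry(supplied: str, entered: str, threshold: float = 0.85) -> str:
    """Return 'match', 'possible typo', or 'mismatch' for a data-entry pair."""
    a, b = normalize(supplied), normalize(entered)
    if a == b:
        return "match"
    ratio = SequenceMatcher(None, a, b).ratio()
    return "possible typo" if ratio >= threshold else "mismatch"
```

For example, `check_entry("Catherine Smith", "Katherine Smith")` returns `"possible typo"`. Routing such cases back to the candidate for confirmation, rather than submitting the misspelled variant to a federal database, is the design choice the threshold is meant to support.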
Labels: artificial intelligence, big data, data science, databases, human resources, machine learning
21 May 2016
Open Source Data Science Masters
One doesn't need a PhD to be a Data Scientist. Many have moved into Data Scientist roles from Software Engineering or Data Analyst positions, while others have taught themselves on the job. Many also move away from the Data Scientist role in favor of the more illustrious Big Data Engineer, taking on numerous hats as they transition into a more satisfying occupation. It is, however, an occupational hazard when a Data Scientist ends up asking a Big Data Engineer what unit testing is or how to search for data sources, in which case the odd frown and perhaps a questioning glance over their merits would be well deserved. The link below provides some relevant tracks for self-training in data science online.
17 May 2016
Engine Paradigms & Systems
Paradigm | System | Explanation |
---|---|---|
MapReduce | Hadoop | Small recoverable code tasks; sequential tasks inside map and reduce functions |
Dryad/Nephele | Tez | Extends the MapReduce model to the DAG model; backtracking-based recovery |
PACTs | Flink | Query processing runtime embedded in a DAG engine; extends DAGs to cyclic graphs; incremental construction of graphs |
RDDs | Spark | Functional implementation of Dryad-style recovery (RDDs); restricted to coarse-grained transformations; direct execution of the API |
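The MapReduce row can be illustrated with the classic word count. This is a minimal single-process sketch of the model, not Hadoop's Java API; the function names are hypothetical and the shuffle between the map and reduce phases is simulated with a dictionary.

```python
# Minimal single-process sketch of the MapReduce model: map emits key/value
# pairs, a shuffle groups them by key, and reduce folds each group.
# Real Hadoop distributes these phases across nodes; this sketch does not.
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Sum all partial counts for one key."""
    return word, sum(counts)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    grouped: dict[str, list[int]] = defaultdict(list)
    for record in records:
        # Map phase: each record is processed independently.
        for key, value in map_fn(record):
            # Shuffle: group intermediate values by key.
            grouped[key].append(value)
    # Reduce phase: one reduce_fn call per distinct key, in sorted key order
    # (Hadoop likewise presents keys to each reducer sorted).
    return dict(reduce_fn(k, grouped[k]) for k in sorted(grouped))
```

For example, `map_reduce(["the quick fox", "the lazy dog"])` returns `{'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}`. Because each map call and each reduce call is a small, independent task, a failed task can simply be rerun, which is the "small recoverable code tasks" property in the table.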
Engine Comparison | Hadoop | Tez | Spark | Flink |
---|---|---|---|---|
API | MapReduce on k/v pairs | k/v pair readers/writers | Transformations on k/v pair collections | Iterative transformations on collections |
Paradigm | MapReduce | DAG | RDD | Cyclic dataflows |
Optimization | None | None | Optimization of SQL queries | Optimization in all APIs |
Execution | Batch sorting | Batch sorting and partitioning | Batch with memory pinning | Stream with out-of-core algorithms |
Courtesy of Apache Flink
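The DAG and RDD paradigms differ from plain MapReduce mainly in how work is scheduled and recovered. The sketch below, assuming Python 3.9+ for the standard `graphlib` module, runs hypothetical stages in dependency order and then rebuilds lost outputs from their parents, loosely mirroring lineage-based recovery. The stage names and functions are invented for illustration; this is not the Tez or Spark API.

```python
# Minimal sketch of DAG-style execution with lineage-style recovery.
# Stage names and functions are HYPOTHETICAL; everything runs sequentially
# in one process, unlike a real Tez or Spark cluster.
from graphlib import TopologicalSorter

# stage name -> (upstream stage names, function of the upstream outputs)
STAGES = {
    "read": ((), lambda: [3, 1, 4, 1, 5, 9, 2, 6]),
    "evens": (("read",), lambda xs: [x for x in xs if x % 2 == 0]),
    "odds": (("read",), lambda xs: [x for x in xs if x % 2 == 1]),
    "join": (("evens", "odds"), lambda e, o: {"evens": sum(e), "odds": sum(o)}),
}


def run_dag(stages):
    """Run every stage once, each only after all of its upstream stages."""
    graph = {name: set(deps) for name, (deps, _) in stages.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = stages[name]
        results[name] = fn(*(results[d] for d in deps))
    return results


def recover(stages, results, name):
    """Lineage-style recovery: recompute a missing output from its parents,
    recursively recomputing any parents that are missing too."""
    if name not in results:
        deps, fn = stages[name]
        results[name] = fn(*(recover(stages, results, d) for d in deps))
    return results[name]
```

After `out = run_dag(STAGES)`, `out["join"]` is `{'evens': 12, 'odds': 19}`. If `out["read"]` and `out["evens"]` are deleted to simulate lost partitions, `recover(STAGES, out, "evens")` recomputes both and returns `[4, 2, 6]`, since each stage records how it was derived rather than relying on checkpointed copies.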
Graph Comparison
Analytical
Type | Backend | Supported Frameworks | Context of Use |
---|---|---|---|
Giraph | Hadoop/HDFS | Spark/Hadoop | Data processing for analytics |
GraphX | Titan, Neo4j, HDFS | Spark | Data processing for analytics (in-memory) |
GraphLab | Hadoop/HDFS | Spark/Hadoop | Data processing for analytics, using the PowerGraph and GAS models |
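The GAS (Gather-Apply-Scatter) model mentioned for GraphLab expresses a graph algorithm as a per-vertex program. Below is a minimal single-machine sketch computing PageRank in that style; it is not the GraphLab or PowerGraph API, and the damping factor of 0.85 and the fixed iteration count are assumptions.

```python
# Minimal sketch of a Gather-Apply-Scatter vertex program computing PageRank.
# Not the GraphLab/PowerGraph API; damping and iteration count are ASSUMPTIONS.


def gas_pagerank(edges, damping=0.85, iterations=30):
    vertices = {v for edge in edges for v in edge}
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    in_nbrs = {v: [] for v in vertices}
    for src, dst in edges:
        out_degree[src] += 1
        in_nbrs[dst].append(src)

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        new_rank = {}
        for v in vertices:
            # Gather: sum the contributions arriving along incoming edges.
            acc = sum(rank[u] / out_degree[u] for u in in_nbrs[v])
            # Apply: update the vertex value from the gathered sum.
            new_rank[v] = (1 - damping) / n + damping * acc
        # Scatter: a real engine would activate only neighbours whose inputs
        # changed; this sketch simply reruns every vertex each iteration.
        rank = new_rank
    return rank
```

On a three-vertex cycle such as `[("a", "b"), ("b", "c"), ("c", "a")]` every vertex keeps a rank of 1/3. Splitting the work into gather, apply, and scatter is what lets PowerGraph-style engines spread a single high-degree vertex's edges across machines.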
Operational
Type | Backend | Supported Frameworks | Context of Use |
---|---|---|---|
Cayley | MongoDB or LevelDB | Custom implementation in Go | Knowledge graph |
Titan | Cassandra, HBase, HDFS | TinkerPop & RDF SPARQL | Massive knowledge graphs, OLAP/OLTP (now part of DataStax) |
Neo4j | Custom | TinkerPop | Data visualization, web browsing, portfolio analytics, gene sequencing, mobile social applications |
OrientDB | Custom | TinkerPop & RDF SPARQL | Embedded and standalone, knowledge graph, multi-model (document + graph) |
Semantic
Type | Backend | Supported Frameworks | Context of Use |
---|---|---|---|
Blazegraph and MapGraph | Custom | Sesame, RDF SPARQL, TinkerPop | Massive knowledge graphs on GPU; supports the W3C Semantic Web standards (used by Wikidata, a Wikimedia project) |
Stardog | Custom | RDF SPARQL | Semantic data use cases in the cloud (third-party) |
Ontotext GraphDB | Custom | Sesame, Jena, RDF SPARQL | Semantic graph database optimized around the W3C Semantic Web standards (used by BBC, Euromoney, Financial Times, etc.) |
Virtuoso | Custom/Hybrid | Sesame, Jena, RDF SPARQL | Semantic graph database optimized around the W3C Semantic Web standards (used by DBpedia) |
AllegroGraph | Custom | Sesame, RDF SPARQL | Semantic graph database optimized around the W3C Semantic Web standards |
OpenCog | Custom | Semantic knowledge | Massive artificial general intelligence graph knowledge base |
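The semantic stores above all answer SPARQL queries, whose core is the basic graph pattern: a set of triple patterns with variables whose bindings must agree. The sketch below evaluates such a pattern by naive matching and joining over an in-memory set; the triples and the `ex:` prefix are hypothetical, and real engines rely on indexes and query planning instead of full scans.

```python
# Minimal sketch of evaluating a SPARQL basic graph pattern by naive
# matching and joining. Triples and the "ex:" prefix are HYPOTHETICAL.

TRIPLES = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:acme", "rdf:type", "ex:Organization"),
}


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind a fresh variable, or check it agrees with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, triples=TRIPLES):
    bindings = [{}]
    for pattern in patterns:
        # Join: extend every partial solution with every compatible triple.
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings
```

The pattern `[("?p", "rdf:type", "ex:Person"), ("?p", "ex:worksFor", "?org")]` corresponds to `SELECT ?p ?org WHERE { ?p rdf:type ex:Person . ?p ex:worksFor ?org }` and yields a single solution binding `?p` to `ex:alice` and `?org` to `ex:acme`; `ex:bob` is dropped because no triple satisfies the second pattern for him.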
OLTP/Graph Databases
OLTP/Analytical Databases
Graph Database as a Service
Native Semantic Graph Databases
Graph Query / Interfaces
16 May 2016
Streaming
Below is a curated list of stream processing frameworks, applications, and beyond.
awesome-streaming