Mabble Rabble: 2019

31 December 2019

Limited machine learning algorithms and limited ways to optimize them for use cases
Limited flexibility of supporting data features
Limited sub-selection of model results owing to the limited choice of models
Ugly visualization for performance and comparisons
The blender option for ensemble methods and choices is limited
Too limited to do anything complex
Doesn't replace the value of handcrafted models and the value of feature engineering process
Why automate feature engineering, especially when the process is what allows one to build a generalizable model with a better understanding of the business cases?
Limited ways to evaluate the choice of model
Limited ways of import/export model for build/deployment
Tight coupling to a third-party way of doing things
Doesn't follow the formal data science method
In fact, doesn't even replace 99% of the work
GDPR and governance issues when processing data through the third-party models
Benchmarks on models are not available for public peer review
Cost far exceeds the benefit
Many of the models provided have less than optimal outcomes - low confidence scores
Not good for complicated analysis, best to keep use cases as simple as possible
Productivity gain is only with very limited and simple cases
Invariably, most business cases have noisy data and require custom models where it will be unworkable and useless to work against complexity and ambiguity
One needs to learn another third-party tool and be willing to trust the model solution blindly
Quick wins and successes are not the answer nor the solution if they do not solve the business case
The reality is there is no free or easy lunch for most business cases and one has to put in the time and effort
No need to cheat one's way through the process
Unconvincing autopilot feature - there is no such thing as autopilot in machine learning; in fact, there are no real community standards or even patterns defined, so how can one jump decades in progress?
Solution is useful to people who can't code, don't like to code, don't understand the data science method, and prefer simple drag and drop options
Data input is treated in most cases as a table, not workable for noisy unstructured data
Often will end up with overfitted models when the whole point of machine learning is to build generalizable models
No flexible options for transfer learning on unseen data
And, that isn't even the end of it...

DataRobot

30 December 2019

Some More Awesome Knowledge Graphs

Awesome Knowledge Graphs

Foundations of Computational Agents

27 December 2019

21 December 2019

There are some glaring practices in the recruitment industry. Some agencies actively practice reverse discrimination by only hiring women for roles such as nurse, personal assistant, receptionist, secretary, catwalk model, and others. However, if the tables were turned and an agency were to only hire men, there would likely be an uproar. This could equally be construed as sexism on the part of the organizations that these recruitment agencies are working with to fill such roles. In a society where there are now apparently 100+ genders, how does one even know whether a person is a man or a woman anymore? Would a transwoman/transsexual be rejected by such an agency for employment? And, should one even care what gender they are? Institutional racism is also a major issue where the recruiter tends to pick candidates based on their inherent bias. Often, under the covers, the practice continues in some companies in the form of cultural fit as an umbrella term. In fact, it doesn't just stop at gender and race. In many cases, it emerges that there are all forms of hypocrisy that make recruitment a very humanly flawed discipline. Also, how can a recruiter profile a candidate purely on the basis of a phone and email conversation? And, it is practically impossible for a recruiter to be fully aware of all the skills listed on a person's resume. Such practices have a very dampening effect on the economic demand and supply for jobs and candidates in the market. AI certainly is the way to go to avoid such recruitment risks by retargeting on the things that really matter in employment, which is more around fairness for qualifications, practical experience, and skills to conduct one's job rather than passing judgement on a candidate's likability factor. However, an important aspect here is that AI should not only be approached via probabilistic methods. Just imagine what would happen if one were to probabilistically cluster bias based on someone's name in order to identify their ethnicity.
Can you really tell someone's background based on their name alone? What if they are of mixed background? Won't they be an outlier? AI combines both logical reasoning and probabilistic methods.

10 December 2019

Reality of DevOps

There are some typical attitudes that emerge among devops engineers in many organizations. Partly, it is a result of dealing with a lot of clueless employees. And, partly, it is because they seem to think they are the most important group in the whole organization and so develop an air of superiority while giving other business groups the impression that everything is way too complicated for people to understand. Let's face it, devops is not exactly rocket science. It is a merger of two interdependent functions - development and operations. The following are some typical examples of what happens when people in the workplace interact with devops.

They make changes without informing others, and without providing sufficient advance notification of a planned outage
They will tell you something is too complicated, when it's a simple case of drag/drop or window click
They will exaggerate the time it will take them to get something done
They will rarely get it done within the agreed timescales, so expect unforeseen/unplanned delays
They will expect you to log a ticket for everything even if logging a ticket takes more time than to do the work
They will reject everything you say, then they will say exactly what you had said as if to make out what they said was quite deep
They will use methods and tools only they can understand and with little to no shared documentation
They will reject the way you had done something, then approach it exactly the same way
They will look down at you for everything even while sitting in a meeting, explaining anything is a complete chore for them, but they will expect you to have the highest level of communication skills and patience with them
When you explain something simple to them expect to explain it in French, English, Swahili, plenty of other languages, and still expect them to not understand any of it, even while everyone else in the room likely understood it in full
Sometimes your login accesses might be randomly revoked and then randomly start working again, this isn't magic but the devops are likely doing their usual unplanned maintenance or config changes
When you can't ssh into a server because the devops have randomly destroyed your instance, during their regular cleanup sessions, and you have to start all over again with your work
When they never acknowledge their mistakes that lead to critical outage issues in production
When the wrong model or code is deployed into production
Using too many rigid processes and creating barriers between themselves and the user
Using metaphors of oversimplification
Not understanding the value of automation
Poor methods for testing what they deliver to business
Lack of a formal architecture evaluation
Badly managed incident management
Lack of discipline for conducting effective, sensible, and responsive postmortem evaluations
Misuse of metrics that don't display the full picture and cost businesses
They will tell you it will take at least a month to spin up and set up instances in the cloud when it should probably only take them anywhere from less than 2 minutes to maybe 2 days if they need to cluster and set up security groups
They will block you from doing anything yourself, or even to assist them to speed things up
Each member of a devops team works almost as if in a silo, in their own bubble of ego
Expect them never to be available when you have an outage situation so they can feel an air of importance

MiniKube

Ludwig

Horovod

9 December 2019

Blynk

4 December 2019

Segmentation Golden Rules

pragmatic segmenter

19 November 2019

KBPedia

9 November 2019

Code Search

ack

the platinum searcher

the silver searcher

Manticore

Pubchem

7 November 2019

Clarion

Awesome Artificial General Intelligence

Awesome Artificial General Intelligence
Cognitive Architectures

ACT-R

Soar

Impala AGI

6 November 2019

LIDA

LIDA
Models of Consciousness (Scholarpedia)
Models of Consciousness (Wikipedia)
Neuroscience
Neuroscience Online
Harvard Neuroscience Online

BICA Table

BICA Table of Cognitive Architectures
BICA Society

5 November 2019

What is AI

What is AI?	Thinking	Acting
Humanly	Cognitive Science	Turing Test, Behaviorism
Rationally	Laws of Thought	Doing The Right Thing

3 November 2019

Machine Super Intelligence

2 November 2019

Ladder Ontologies

Asocial Ontologies
Social Ontologies
Cultural Ontologies
Oral Linguistic Ontologies
Literate Ontologies
Civilization-scale Ontologies

1 November 2019

Philosophy Ontology

Inpho Project

Java Demise

The speed with which new versions are being released every year spells the end of Java in the practical business world in the foreseeable future. There are two releases each year (one every 6 months), which is significant. The biggest hurdle for businesses is maintenance and resources. Many products are still dependent on Java 8, while commercial licenses have been required for upgrades since 2019. The other hurdle is technical debt and backwards compatibility constraints, especially when the product is implemented in Java and then sold to customers. In a very short span of time there have been quite a few changes to the language and an ample set of versions. One can say that the Java release cycle has sped up so much that the majority of the community, for all practical intents and purposes, will not be able to keep up. What this also means is that the ecosystem of tools and libraries takes a while to upgrade, making it a management frustration for the engineering and support teams. The Java ecosystem is huge, and the fallback on lots of boilerplate code, the formal testing processes arising from a lack of design patterns baked into the language, and dependency hell are massive hurdles with the language. It seems like gradually more and more organizations will distance themselves from Java in order to keep maintenance costs down, meet customer expectations and demand for new product features, as well as to reduce complexity, especially in mobile and cloud environments. Another likely reason is the Oracle ownership of the language and the expectations set in terms of the license for the end user. Unfortunately, there is a love-hate relationship with the language in the community.
Even if the language were to reduce in interest in the community, it would still appear as the underdog from under the covers and rear its ugly head as a dependency for other languages like Groovy, Kotlin, and several open source Microservices and Big Data platforms.

XLNet

31 October 2019

StratML

European Council Plan Against Disinformation

30 October 2019

Office Search

office search
hubblehq
complete office search
prime office search
workspaces
flexioffices
free office finder
office genie
coworker
croissant
included.co
instant offices
pickspace
share desk
happy desk
desksnear.me
desktime
coworking
desk pass
copass
coworking wiki
desk surfing
near desk
liquid space
gocowo
find workspaces
sneed
ofixu
qdesq
share your office
spacelist
42 floors
office list
coffices
breather
peerspace
bizly
spacewhiz
beewake
awfis
all office centers
splacer
lexc
desk camping
worksnug
seats 2 meet
coworking.coffee
commercial cafe
preferred office network
heydesk
office freedom
flexas
kontor
labs
spacesworks
thebrew
cowork

Hugo

Apollo

Grandstack

28 October 2019

Curse of Dimensionality

As you increase the number of input features, the number of possible input combinations can grow exponentially. As the combinations grow, each training sample covers a smaller percentage of the possibilities. As a result, when you add features you need to increase the size of your training set, possibly exponentially. As the number of dimensions goes up, the model must train on significantly more data in order to learn an accurate representation of the input space.
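The growth described above can be put into numbers with a small sketch. It assumes, purely for illustration, that every feature takes the same number of discrete values; the function name `coverage` and the figures used are hypothetical and not taken from any library.

```python
def coverage(n_features: int, n_values: int, n_samples: int) -> float:
    """Upper bound on the fraction of distinct input combinations that
    n_samples training examples could cover (assumes each feature takes
    n_values discrete values and that no two samples coincide)."""
    combinations = n_values ** n_features  # grows exponentially with n_features
    return min(1.0, n_samples / combinations)


if __name__ == "__main__":
    # Fixed training set of one million samples, ten values per feature.
    for n_features in (1, 2, 5, 10, 20):
        print(n_features, 10 ** n_features, coverage(n_features, 10, 1_000_000))
```

With the training set held fixed, each added feature divides the best-case coverage by the number of values that feature can take, which is why the required data grows so quickly.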

26 October 2019

Corpus Approaches To Social Science

CASS Projects

Annotation Services

Figure Eight
Hypothes.is
Open Annotations
Annotatorjs
OpenAnnotate
Prodi.gy
Doccano
Brat
Tagtog
X-Lisa
LightTag
DataTurks
Supervise.ly
AnnotatedStar
Folia
Annotable
Diigo
Zap
Lionbridge
Gate
UIMA
RSTWeb
LabelIMG
VGG Image Annotator
LabelBox
LabelMe
ImageTagger
RectLabel
Diffgram
Fast Annotation Tool

Further Annotation Tools

25 October 2019

Obituaries

After Me
PBInfo
Funeral Guide
Tributes
Ancestry

QL2

data world

data.world

It's your skills

It's your skills
skills project

ESCO-EURES

EURES
ESCO

Datahub Collections

People Data Labs

People Data Labs
pipl

Pasta Types

guide to pasta shapes

24 October 2019

Sentence Pairs

ManyThings
Tatoeba

23 October 2019

Recommendations

Deep Learning Based Recommender System
Recsys
Recommendation Systems Paperlist
youtube recommending next video

21 October 2019

BigSi

BigSi
Bigsi on biorxiv
EBI

Neural Information Retrieval

Neural Information Retrieval Review
Deep Learning Relevance
Neural Information Retrieval

Description Logics

description logic
description logic primer
description logics
complexity of reasoning
foundations of description logics
ontologies and the semantic web
description logics
list of reasoners

Alternative Sequences

Attention - memory added to other networks to guide focus
Transformers - networks that use attention exclusively instead of recurrent and convolutional layers
Temporal Convolutional Networks - CNN designed for sequences
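
To make the attention entry above more concrete, here is a minimal pure-Python sketch of scaled dot-product attention, the building block used in Transformers. It handles a single head with no masking or learned projections, and the function names are illustrative rather than any framework's API.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys and values are lists of equal-length float vectors;
    keys and values must contain the same number of rows.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    print(attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0], [0.0]]))
```

A query that matches one key much more strongly than the others pulls the output towards that key's value, which is the "guide focus" behaviour described in the list.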

Attention Is All You Need

Scholarly Data

ScholarlyData
Awesome Scholarly Data Analysis

17 October 2019

Motion Transfer

Everybody Dance Now

Arxiv Sanity

OpenIE

16 October 2019

ALCOMO

Human Activity Recognition

AnyBURL

Join-T

PatyBred

WebisLOD

WebIsLOD

DBkWik

T2D

Winter

ACL Anthology

ICML Papers

ICML 2019
ICML 2018
ICML 2017
ICML 2016

MarineTLO

15 October 2019

AllenNLP Interpret

Pulumi

Firecracker

Firecracker MicroVM

Enthymemes and Argument Mining

Finding Enthymemes in Real-World Texts
Argument Mining Using Argumentation Scheme Structures
Argumentative Approaches to Reasoning with Maximal Consistency
Dave the Debater
Argument Mining Papers (Filtered)

Workshop Synthesis:
EMNLP 2019
EMNLP 2018
EMNLP 2017
EMNLP 2016
EMNLP 2015
EMNLP 2014

Tutorials:
Computational Argumentation
Argument Mining
Unsupervised Corpus Wide Claim Detection
Argument Mining
Argumentation Mining (Synthesis Series)

14 October 2019

Bounding Box Annotation Tools

RectLabel
Labelbox
Simple Image Annotator
Sloth
CVAT

Are We There Yet With Rust

Are We Web Yet

13 October 2019

Platform Marketer

Marketing Technology Landscape

Fake News Detection Resources

12 October 2019

Prime Spirals

Binary Code Analysis

Dive into Deep Learning with MXNet

MXNet

BibFrame

Netron

11 October 2019

Opensource Firewall

Untangle
Pfsense
Netfilter/iptables

9 October 2019

TimescaleDB

Stackshare

8 October 2019

Cloud Machine Learning

Google CoLaboratory
Paperspace
Crestle
Floydhub
LeaderGPU
AWS

6 October 2019

Papers with code

SentenceBert

SciBert

KBert

5 October 2019

FastBert

ESC 50

ESC-50

2 October 2019

News Manual

The News Manual

1 October 2019

Marketing Mix

Packaging
Partnership
Passion
Penetration
People
Perception
Personality
Persuasion
Phrases
Physical
Place
Placement
Planning
Popularity
Population
Positioning
Positiveness
Power
Pragmatism
Preference
Price
Privacy
Process
Product
Productivity
Professionalism
Profit
Promotion
Prospect
Publicity
Purchase
Push-Pull
Picture
Part
Pilot
Persona
Peers
Pass-Along-Value
Party
Pandemic
Pandemonium
Pain
Placebo
Planting
Playfulness
Pleasure
Plot
Politics
Praise
Prediction
Premeditation
Press
Pressure
Preview
Principle
Prominence
Promise
Proof
Properties
Prosperous
Protection
Purple Cow
Purpose
Production

Medical Codes

ICD-10 - Diagnosis
CPT - Procedures
LOINC - Laboratory
RxNorm - Medications
ICF - Disabilities
CDT - Dentistry Procedures
DSM-IV-TR - Psychiatric Illnesses
NDC - Drugs
DRG - Diagnosis Group
HCPCS - Procedures

Survey of Embeddings Use Cases for Clinical Healthcare

W3C Events 2019

30 September 2019

Programming Paradigms

There are various programming paradigms that have come about in computer science. However, none has replicated the abstractions of philosophic logic in their entirety to leverage the full capacity for artificial intelligence. Building programs that replicate philosophic constructs might be a way of leveraging better abstractions for artificial intelligence programs as well as reasoning systems. The following are some constructs derived for abstractions and concreteness that could be baked in or extended as part of the evolution of programming languages to converge the mind and the mechanisms of machines:

Class
Object
Concept
Thing
Predicate
Actor
Agent
Subject
Standard
Data
Thought
Type
Being
Action
Intent
Event
Belief
Desire
Message
Axiom
Restrict
Rule
Function
Relation
Attribute
Instance
Policy
Critic
Reward
Agency
Context
Domain
Range
Observe
Process

Stanford Encyclopedia of Philosophy

28 September 2019

Jamstack

Static Site Generators

StaticGen

27 September 2019

Python AWS Frameworks

Chalice
Boto

24 September 2019

CoreDNS

19 September 2019

Encoded Archival Description

EAD
EAC-CPF

Open Images

18 September 2019

Classification (Binary, Multi-Class, Multi-Label)

Binary Classifier - choose single category for an object from two categories

Multi-Class Classifier - choose single category for an object from multiple categories

Multi-Label Classifier - choose as many categories as applicable for the same object
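
The difference between the three classifier types shows up mainly in how model scores are turned into a decision. The sketch below assumes the scores already exist (for example, probabilities from some model); the function names, labels and the 0.5 threshold are illustrative assumptions only.

```python
def binary_predict(score: float, threshold: float = 0.5) -> str:
    """Binary: exactly one of two categories, decided by a threshold."""
    return "positive" if score >= threshold else "negative"


def multiclass_predict(scores: dict) -> str:
    """Multi-class: exactly one category, the highest scoring of many."""
    return max(scores, key=scores.get)


def multilabel_predict(scores: dict, threshold: float = 0.5) -> list:
    """Multi-label: every category whose own score passes the threshold."""
    return sorted(label for label, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    print(binary_predict(0.7))
    print(multiclass_predict({"cat": 0.2, "dog": 0.7, "bird": 0.1}))
    print(multilabel_predict({"sports": 0.9, "politics": 0.6, "tech": 0.1}))
```

In the multi-class case the categories compete against each other, whereas in the multi-label case each category is judged independently, so the result can be empty or contain several labels.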

17 September 2019

Distributed Tracing

OpenTelemetry
TraceContext
Jaeger

SVCCA

16 September 2019

Google ML Kit

13 September 2019

ACM SAC 2020 Technical Tracks

10 September 2019

Tensorflow Probability

Tensorflow Probability Framework

7 September 2019

Turi Create

2 September 2019

Ballerina

Data Visualization

1 September 2019

Weaviate

30 August 2019

PyTorch BigGraph

Computer Science Conferences

28 August 2019

News Event Exploration

Event Registry
GDELT
STICS
EMM
MediaCloud

Europe Media Monitor

EMM

26 August 2019

Language Metadata Table

21 August 2019

PyTorch NLP

PyTorch-NLP

NLP Architect

Magnitude

19 August 2019

Omega

Applications of Data Science

Anomaly Detection
Assistive Services
Auto-Insurance Risk Prediction
Automated Closed Captioning
Automated Image Captioning
Automated Investing
Autonomous Ships
Brain Mapping
Caller Identification
Cancer Diagnosis/Treatment
Carbon Emissions Reduction
Classifying Handwriting
Computer Vision
Credit Scoring
Crime: Predicting Locations
Crime: Predicting Recidivism
Crime: Predicting Policing
Crime: Prevention
CRISPR Gene Editing
Crop-Yield Improvement
Customer Churn
Customer Experience
Customer Retention
Customer Satisfaction
Customer Service
Customer Service Agents
Customized Diets
Cybersecurity
Data Mining
Data Visualization
Detecting New Viruses
Diagnosing Breast Cancer
Diagnosing Heart Disease
Diagnostic Medicine
Disaster-Victim Identification
Drones
Dynamic Driving Routes
Dynamic Pricing
Electronic Health Records
Emotion Detection
Energy-Consumption Reduction
Facial Recognition
Fitness Tracking
Fraud Detection
Game Playing
Genomics and Healthcare
Geographic Information Systems
GPS Systems
Health Outcome Improvement
Hospital Readmission Reduction
Human Genome Sequencing
Identity-Theft Prevention
Immunotherapy
Insurance Pricing
Intelligent Assistants
Internet of Things and Medical Device Monitoring
Internet of Things and Weather Forecasting
Inventory Control
Language Translation
Location-Based Services
Loyalty Programs
Malware Detection
Mapping
Marketing
Marketing Analytics
Music Generation
Natural Language Translation
New Pharmaceuticals
Opioid Abuse Prevention
Personal Assistants
Personalized Medicine
Personalized Shopping
Phishing Elimination
Pollution Reduction
Precision Medicine
Predicting Cancer Survival
Predicting Disease Outbreaks
Predicting Health Outcomes
Predicting Student Enrollments
Predicting Weather-Sensitive Product Sales
Predictive Analytics
Preventative Medicine
Preventing Disease Outbreaks
Reading Sign Language
Real-Estate Valuation
Recommendation Systems
Reducing Overbooking
Ride Sharing
Risk Minimization
Robo Financial Advisors
Security Enhancements
Self-Driving Cars
Sentiment Analysis
Sharing Economy
Similarity Detection
Smart Cities
Smart Homes
Smart Meters
Smart Thermostats
Smart Traffic Control
Social Analytics
Social Graph Analytics
Spam Detection
Spatial Data Analysis
Sports Recruiting and Coaching
Stock Market Forecasting
Student Performance Assessment
Summarizing Text
Telemedicine
Terrorist Attack Prevention
Theft Prevention
Travel Recommendations
Trend Spotting
Visual Product Search
Voice Recognition
Voice Search
Weather Forecasting

Dweet.io

PubNub

18 August 2019

Types of Data Discovery

CDR
Emails
ERP
Social Media
Web Logs
Server Logs
System Logs
HTML Pages
Sales
Photos
Videos
Audios
Tabulated
CRM
Transactions
XDR
Sensor Data
Call Center
Knowledge Bases
Google Search
Google Trends
News
Sanctions Data
Profile Data

17 August 2019

Himalayan Backpacker's Heaven

15 August 2019

Wake Word Voice

Yet Another Wake Word Engine

Wake Up Word Speech Recognition
Choosing A Wake Word
Using Wake-Up Word To Filter Out Background Speech
PocketSphinx
Porcupine
Mycroft-Precise
Help Us Improve Precise
Snips.Ai
Federated Learning for Wake Word
Customize Your Voice Assistant With Personal Wake-Word
Alexa Wake-Word Techniques
Visual Wake-Word Dataset
Rhasspy
Snowboy
Revisiting Wake-Word Accuracy and Privacy - Sensory
ExpressIf
How to do Real-Time Trigger Word Detection
On Convolutional LSTM for Joint Wake-Word Detection
Matrix Wake Word Sphinx
GassistPi
Direct Modelling of Raw Audio for Wake-Word Detection
Without Wake-Word
Amazon Alexa
Offline Voice Recognition
Sequence Models for Trigger Words
Arxiv Wake Word
Donut CTC Query-By-Example - Keyword Spotting
How To Easily Command Your App With Hotword Detection
Hotword Cleaner
Challenges To Open Voice Interfaces
DSP Illustrated
Houndify Wake Word
Wake Word Benchmark
Alexa Dataset Wake Word
Detecting Wake Words In Speech
Alexa Wake Words
Custom Alexa Wake Word Generation Dataset
Trigger Word Detection

Common Voice

8 August 2019

Quantum AI for Psychic Abilities

The 3 T's are already an active research area under teleportation, telepathy, and telekinesis. However, the following psychic abilities could also come into the mix in AI:

Thoughtography - imprinting images in one's mind onto physical surfaces
Scrying - able to look into mediums to view and detect suitable information
Second Sight - able to see future and past events or perceive information (precognition)
Retrocognition - supernaturally perceive past events (postcognition)
Remote Viewing - able to see distant or unseen target with extrasensory perception
Pyrokinesis - able to manipulate fire through mind
Psychometry - able to get information about a person or object by touch
Psychic Surgery - able to remove disease or disorder within or over the body with energetic incision to heal the body
Prophecy - able to predict the future
Precognition - able to perceive future events
Mediumship - able to communicate with the spirit world
Levitation - able to float or fly by psychic means
Energy Medicine - able to heal one's own empathic etheric, astral, mental, or spiritual energy
Energy Manipulation - able to manipulate non-physical/physical energy with mind
Dowsing - able to locate water, gravesites, metals, and materials without scientific apparatus
Divination - able to gain insight into a situation
Conjuration - able to materialize physical objects from thin air
Clairvoyance - able to perceive people, objects, locations, or events through extrasensory perception
Clairsentience - able to perceive messages from emotions and feelings
Clairolfactance - able to perceive knowledge through smell
Clairgustance - able to perceive taste without physical contact
Claircognizance - able to perceive knowledge through intrinsic knowledge
Clairaudience - able to perceive knowledge through paranormal auditory means
Chronokinesis - able to alter perception of time causing sense of time to slow down or speed up
Biokinesis - able to change or control the DNA
Automatic Writing - able to draw or write without conscious intent
Aura Reading - able to perceive energy fields around people, places, and objects
Astral Projection - out-of-body experience or the voluntary projection of consciousness
Apportation - able to materialize, disappear, or teleport objects

7 August 2019

Drawbacks of Reinforcement Learning

Reproducibility
Resource Efficiency
Susceptibility to Attacks
Explainability/Accountability

Types of Filtering for Recommendations

Adaptive
Contextual (Context Similarity)
Cognitive (Personality/Behavior)
Content

Bayesian
Relevance Feedback
Evolutionary Computation
Deep Learning

Collaborative (Model vs Memory)

Matrix Factorization
Tensor Factorization
Clustering
SVD
Deep Learning
PCA
Pearson
Bayesian
Markov Decision Processes

Interest/Intent

Intent

Interest

Content Consumption

Impact/Influence

Social Feedback

Likes
Dislikes
Mentions
Shares
Subscribes
Hashtags
Emojis
Reviews
Comments
Trends
Endorsements
Opinions from Person of Influence
Associative Connections (Primary/Secondary)
Six-Degrees of Separation

Item-based
User-based
Personalization
Reinforcement Learning

Reward
Optimization
Exploration/Exploitation
Competitive
Cooperative

Semantic (with a Knowledge Graph)
Demographic
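
As a rough illustration of the memory-based, user-based collaborative filtering with Pearson similarity mentioned in the list above, the sketch below predicts a missing rating from positively correlated users. The ratings, user names and function names are made up for the example, and real systems typically add mean-centering, neighbourhood limits and handling for sparse data.

```python
import math


def pearson(a: dict, b: dict) -> float:
    """Pearson correlation between two users over the items both have rated."""
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def predict(ratings: dict, user: str, item: str):
    """Similarity-weighted average of other users' ratings for the item,
    using only users who correlate positively with the target user."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = pearson(ratings[user], their)
        if sim <= 0:
            continue
        num += sim * their[item]
        den += sim
    return num / den if den else None


# Hypothetical ratings on a 1-5 scale.
RATINGS = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 5, "b": 3, "c": 4, "d": 2},
    "cat": {"a": 1, "b": 5, "c": 2, "d": 5},
}

if __name__ == "__main__":
    print(predict(RATINGS, "ann", "d"))
```

Here "ann" agrees with "bob" and disagrees with "cat", so the prediction for item "d" follows bob's rating rather than cat's.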

Deep Learning Approaches for Recommendations:

Autoencoders
Neural Autoregressive Distribution Estimation
Convolutional Neural Networks
Recurrent Neural Network
Long Short Term Memory
Restricted Boltzmann Machine
Adversarial Network
Attentional Model
Multilayer Perceptron

30 July 2019

Eating Out

Steaks
Burgers
Seafood
Chinese
Thai
Italian
Turkish
Brunches
Outdoors
Food Markets
Rooftop Bars
100 Best
Bakeries
Bar List
Cheap Eats
Sunday Roasts
Chippies
Beefeater
Harvester
Tabletable
Brewersfayre

Medicine Ontologies

UMLS
ATC
MedDRA
MeSH

Bioportal

28 July 2019

Cloud Providers

Most of Azure's cloud service offerings are basically drop-in replacements for Microsoft's biased standalone software tools. For Microsoft, it seems like Azure is an alternative way of locking in the customer via the re-purposed cloud option, which has so far proven to be useful through heavy gimmicky marketing. GCP, on the other hand, provides many alternatives for big data but with very ineffective pricing, a lack of business-critical reliability, security constraints, lots of options to re-invent the wheel with vendor lock-in, still limited SQL use cases, and limited overall services. AWS has proven to have a very effective pricing model as well as catering to a wide range of services to cover business needs, including strong reliability and flexible options for management of services. For most organizations, especially for data science work, AWS is the go-to cloud solution. Azure and GCP still lag behind considerably in reliability, cloud service offerings, and pricing, with the biggest concern being vendor lock-in. In many cases, cloud providers are limited by their mission statements of what they are trying to achieve through their solutions to businesses and their future progressive infrastructure development goals. For Microsoft, Windows is the ultimate success story, which one can see has evolved in parallel with Apple. But, Linux has become the de facto operating system for the cloud and for obvious reasons. Data as a commodity is a valuable asset to most organizations. And, the management of risk in security and compliance is an enduring struggle for many organizations. Especially in meeting GDPR compliance, many organizations will want a transparent data lineage. Can one trust storage and processing of data on GCP? All Google services converge to some degree or another and get indexed by their search engine.
Invariably, the cost and risk of using third-party cloud infrastructure vs in-house infrastructure will always be a concern for companies to weigh up. It seems, in the long run, organizations will take back control of their own data storage and processing needs. The trend is towards portable, smarter, and stackable private cloud ownership, more flexibility in management of infrastructure, and virtualization modes at an affordable cost. While start-ups may find it easier to reduce setup costs by leveraging third-party infrastructure, as companies grow with the market value of their products, they may increase their independence by eventually moving away from third-party cloud dependency to their own in-house converged infrastructure, allowing for greater flexibility to meet consumer expectations and the demands of their product services - enterprise enablement drives creative and profitable growth.

25 July 2019

Deep Learning Datasets

Deep Learning Datasets
Deep Learning Datasets 2
Skymind Datasets
Dataset List
List of Wikipedia Datasets
Tensorflow Datasets

Google Datasearch (depends on how up to date their indexed dataset results are)

Radio Streaming

Shoutcast
Icecast

List of streaming media systems

24 July 2019

Everyday Robots

Robots, over the years, have proven themselves worthy candidates for replacing manual, mundane, labor-intensive work for humans, both for commercial and home use. Not only do robots work more effectively, they are also extremely productive. In general, robots can be applied to most specialist labor, so they can be trained to be good at a particular aspect of work. But, they may not yet be sufficiently capable of doing multiple things through adaptability in multi-class transfer learning. The following highlight some examples of robot use cases.

automotive breakdown repair man/woman
home and office cleaner
rubbish disposal
grocery shopper
home and office security officer/inspector
laundry service
cook (chef)
critic / reviewer
gardener
table setter
mechanical turk
post man/woman
babysitter
mystery shopper
chauffeur
home and office mover
handy man/woman
telephone/broadband installer/repair man/woman
call centre agent
lollipop man/woman
school teacher
office secretary
family mediator
office mediator
crop duster
nursing home nurse
doctor and nurse
nanny
lawyer
accountant
assembly line worker
dentist
data entry clerk
journalist
financial analyst
comedian
musician
artist
telemarketer
paramedic
commercial and defence pilots
public transport worker
rail repair
air traffic controller
land traffic controller
sea traffic controller
metrologist
kitchen porter
crop pickers
police man/woman
fire man/woman
immigration/border controller
politician
director
photographer
creative writer
curator
cheerleader
gamer
construction worker
programmer
logging worker
fisher man/woman
steel worker
street sweeper
refuse collector
carpenter
stunt man/woman
courier
wrestler
boxer
sports man/woman
recycle waste worker
power worker
farmer
roofer
astronaut
army & military officer
bodyguard
slaughterhouse worker
mechanic
metalcrafter
search & rescue
special forces (SAS, Delta Force, Seal, etc)
sanitation worker
land mine remover
miner
bush pilot
lumberjack
librarian
human resources assistant
salesman
editor
dance instructor
bus conductor
tourist guide
stewardess
cashier
store replenisher
data center operator
taxi cab driver
train driver
lorry driver
customer service advisor
electrician
vehicle washer
bed maker
bathroom cleaner
pet walker
oilfield driver
derrick hand
roustabout
offshore diver
rodent killer
insect killer
therapist
architect
actor
backup singer
backup dancer
house builder
waiter
presenter
manager
hacker
stripper (exotic dancer)
sex worker
hairdresser
makeup artist
fashion designer
cameraman
researcher
chemist
pharmacist
landscapist
baker
ship builder
car maker
broadcast technician
hotel helpdesk
store helpdesk
mall helpdesk
site assistant
tailor
tutor
pet trainer
cartoonist
reporter
moderator
painter
plumber
auditor
financial trader
financial broker
financial advisor
compliance advisor
fraud advisor
risk advisor
surveillance agent
social media agent
bricklayer
choreographer
actuary
physiotherapist
tea/coffee maker
pizza maker
burger maker
welder
surveyor
surgeon
glazier
tiler
stonemason
optician
tool maker
artisan
sonographer
radio technician
sports coach
bartender / barmaid
bellboy
paperboy
drain inspector
pet feeder

21 July 2019

Bookkeeper

Apache Bookkeeper

Pulsar

Apache Pulsar

19 July 2019

Sefaria

Library of Jewish Texts
TorahText
Torah Database
HebrewBooks

AI Novelist

The Times List of 50 Greatest Writers Since 1945
Project Gutenberg
Open Library
Smashwords
Goodreads
Wattpad
Wayback Machine
Universal Library

List of digital library projects

13 July 2019

Lucid Pipeline

Most AI solutions can be built as pipelined implementations that run from various sources to sinks over a set of generalizable models. Invariably, a knowledge graph will act as a key layer for evolvable feature engineering that can be translated into ontological vectors and fed into AI models. Using a loose analogy to a distillation process, split the pipeline into a lucid funnel, lucid reactor, lucid ventshaft, and lucid refinery. The following components highlight the key abstractions:

AI/DS Engine Layers:

Disc (frontends - discovery/visualization layer)
Docs (live specs via swagger, etc - documentation layer)
APIs (proxy/gateway services connected with elasticsearch or solr - application layer)
DS (models and semantics - AI layer)
Eval (benchmarks, workbench and metrics - evaluation layer)
Human (optional human in the loop - human/annotation layer)
Tests (load, unit, uat, service, etc - testing layer)
Admin (control for access management, operations workloads, and automation - administration layer)
Funnel (ingestion, pre-process, post-process layer using brokers like Kafka/Kinesis)
Reactor (reactive processes - workflow/transformational layer - via Spark, Beam, Flink, Dask, etc)
Ventshaft (fuzzy matches, distance matches, probabilistic filters, relational matches, clusters, fake filters, fake matches, feature selection filters, component factors, informed searches, uninformed searches, string matches, projection filters, samplings, tree searches, validations, verifications - functional/utility layer)
Refinery (context types, objects, attributes and methods as blueprints - entity/object layer)
Datapiles (indexed data sources as services for document/column/graph stores - data access layer)
Conf (environment configurations for nginx, etc - configuration layer)
Cloud (connected services for AWS/GCP orchestration - infrastructure/platform layer)
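
As a rough illustration of the funnel, reactor, ventshaft, and refinery split, the sketch below chains four plain Python functions. The function names, the Entity shape, the threshold, and the sample terms are all hypothetical; a real deployment would sit on a broker and a processing engine such as those named in the list above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Iterator, List


@dataclass
class Entity:
    """Refinery output: a typed object built from a record (hypothetical shape)."""
    kind: str
    name: str


def funnel(raw_lines: Iterable[str]) -> Iterator[str]:
    """Ingestion and pre-processing: trim whitespace and drop blank lines."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield line


def reactor(lines: Iterable[str]) -> Iterator[str]:
    """Transformation: lower-case and collapse repeated spaces."""
    for line in lines:
        yield " ".join(line.lower().split())


def ventshaft(lines: Iterable[str], known: List[str], threshold: float = 0.8) -> Iterator[str]:
    """Utility filter: keep lines that fuzzy-match a known term.

    Assumes `known` is non-empty; the 0.8 threshold is an arbitrary choice.
    """
    for line in lines:
        scored = [(SequenceMatcher(None, line, k).ratio(), k) for k in known]
        score, best = max(scored)
        if score >= threshold:
            yield best  # emit the canonical term, not the noisy input


def refinery(names: Iterable[str], kind: str) -> List[Entity]:
    """Entity layer: wrap matched names in typed objects."""
    return [Entity(kind=kind, name=n) for n in names]


def run_pipeline(raw_lines: Iterable[str], known: List[str], kind: str = "wine") -> List[Entity]:
    """Source to sink: funnel -> reactor -> ventshaft -> refinery."""
    return refinery(ventshaft(reactor(funnel(raw_lines)), known), kind)
```

Calling run_pipeline(["  Sauterne ", "", "TOKAJI", "lager"], ["sauternes", "tokaji"]) would keep the two wine terms, corrected to their canonical spelling, and drop the blank line and the unmatched term. Because every stage is a generator, records stream through one at a time, which loosely mirrors how a broker-fed pipeline behaves.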

7 July 2019

GPT-2

5 July 2019

Various GANs

The GAN Zoo

2 July 2019

Ethics in NLP

Ethics in NLP
ACL Wiki Ethics in NLP

1 July 2019

Sweet Wine

Top 100
Vivino's
RobbReport
Winemag
Winesearcher
Local Wine Events
Wine Ontology
ProductOntology Wine

30 June 2019

Bitcoin Development

Bitcoin Developer's Guide

Sweet Beer

Sweet Stouts
Stouts
Lagers
Pale Ales
Belgians
Beers of World
Best Fruit Beers
Dessert Beers
World Beer Awards
International Beer Challenge
Brewing Awards
Beer Ranker
ProductOntology Beer

28 June 2019

Ethereum

Web3.py
Web3.js
Web3.java
Web3.scala

Web3.io

Fever

Fact Extraction and Verification

NLP Progress Track

NLP Progress Track
NLP Progress

26 June 2019

Deep Averaging Network

Deep Averaging Network
Deep Adversarial Averaging Network

21 June 2019

Central Securities Depository

Deep Learning Benchmarking Datasets

Deep Learning Datasets

20 June 2019

W3C Community Groups

W3C Community

UMap

Umap

19 June 2019

Machine Translation Parallel Corpora

Opus
Moses
Clarin
Europarl
Eur-lex
SeedLing
LanguageGrid

JRC-Acquis (EU Law)

ML in Go

Gorgonia
GoNum
GoCV

PermId

18 June 2019

Lucene4IR

Traditional KG Refinement

Traditional KG Refinement
Which KG
Linked Data Quality

17 June 2019

Top Google Scholar Publications

Scholar Publications

Awesome Sentence Embedding

16 June 2019

Graph Embeddings

KGlove
RDF2Vec
Node2Vec
Graph2Vec

Word in Context

WiC Dataset

Stanford Contextual Word Similarity

OpenAI

OpenAI Progress Papers

13 June 2019

PoolParty NLP

Natural Language Processing with PoolParty

Awesome Prolog

Awesome-Prolog

12 June 2019

Tensorflow

PyTorch

Cross-Lingual Word Embeddings

Cross-lingual Word Embedding
Massively Multilingual Word Embedding
MUSE

11 June 2019

Sports Extended Reality

Moment is everything (pre-event, event, post-event)
Nothing is actually live
Flat images and sense of presence
20/80 rule
Time matters

Attributes of a sports experience:

Hawkeye cameras
Replays
Live scores
Clock graphics
Broadcast video + audio (including commentary)
Live text updates
Fast cuts
Music
High-paced graphics
Social media interaction (optional)

Live is a belief, an illusion, in sports. There are three stages of live sports: Live, Live Live, and Live-To-Record (or Live-To-Tape). An important aspect is the focus on driving a story (narrative) and giving consumers a sense of control.

10 June 2019

Neuroscience Online

Neuroscience Online
Brain and Cognitive Sciences
Awesome Neuroscience

Deep Speech Processing

Tacotron
Tacotron 2
DeepSpeech
DeepSpeech 2
Wavenet
EdgeSpeechNets

Deep Audio Signal Processing (methods survey)

7 June 2019

Metacert

Ethnea Ethnicity Prediction

Ethnea

GeoPy

Nel Entity Linking

Nel

Chicksexer

Open Event Data Alliance

Open Event Data

Scrapely

Ahmia

ScrapingHub Splash

Splash

6 June 2019

Wikipedia List of ML Datasets

List of datasets for machine learning research

5 June 2019

Spark NLP

Spark NLP

Flink NLP or Beam NLP might be an alternative approach with Tensorflow

ReactVR

React-360

27 May 2019

Awesome Knowledge Graphs

Vapor

25 May 2019

Realm

Kitura

AllenNLP Models

24 May 2019

Recruitment Taxonomies and Vocabularies

ISCO
ESCO
O*Net
Nesta

23 May 2019

Financial Regulation Ontologies

Financial Regulation Ontologies

22 May 2019

Chatbot Libraries

DeepPavlov
Rasa
Lex
DialogFlow

21 May 2019

Qiskit

QuTiP

20 May 2019

GOT Countdown

16 May 2019

Lexicon Model for Ontologies

OntoLex

Industrial Data Science

Over the years, Data Science has emerged to play a pivotal role in many industry sectors, providing avenues for analytical growth and insights towards more effective products and services. However, several aspects of the field are riddled with misconceptions and ineffective practices. Traditional Data Science was about Data Warehouses, a relational way of thinking, Business Intelligence, and overfitted models. In the current landscape, Artificial Intelligence, as a discipline, is more about an out-of-the-box style of thinking, and it is having an impact on Data Science practice. Data Engineering and Data Science functions tend to merge into one in AI practice. Relational Algebra is replaced with semantics and context via Knowledge Graphs, which form the important metadata layer for a Linked Data Lake. While traditional Data Science relied fully on statistical methods, the new approaches combine Machine Learning with Knowledge Representation and Reasoning in a hybrid model for better Transfer Learning and generalizability. Deep Learning, which is a purely statistical method and a sub-field of Neural Networks, is by its very nature implemented as a set of distributed and probabilistic graphical models. It makes very little sense to split teams between Data Engineering and Data Science, as the person building the model also has to think about scalability and performance. Invariably, splitting teams means duplication of work, communication issues, and degradation of output in production (when work is passed from Data Science to Data Engineering). In many AI domains, there is an inclination towards open-box thinking about problems. In AI, only 30% of the effort is Machine Learning, while the remaining 70% is Computer Science principles and theory. Evidence of this can be seen in the Norvig book, which is often the basis of taught AI 101 courses.
Advanced university courses often neglect to cover the entire Data Science method, stressing only Machine Learning and statistical methods at the exploratory stage while leaving out the rest of the Computer Science concepts. As a result, we see many Data Scientists with PhDs who are ill-equipped to tackle practical business cases: productionizing their models against small and large datasets, with appropriate Feature Engineering for semantics and the associated pipelining. Furthermore, many institutions skip Feature Engineering entirely, even though it is really 70% of the Data Science method and possibly the most important stage of the process. Invariably, this Feature Engineering step is partially transferred over to the Data Engineering function. One has to wonder why the Data Scientist, even with a PhD, does only 30% of the work in the Data Science method while passing the remainder of the hard work to the Data Engineer as part of the formal ETL process. The whole point of a Knowledge Graph is to add the value of semantics and context to your data, moving it towards information and knowledge. This matters not only for Feature Engineering but also as a feedback mechanism, where one can cyclically improve the model's learning while allowing the model to improve the semantics in a semi-supervised manner. The Knowledge Graph also enables natural language queries, making the data available to the entire organization. There is no longer a need to hire specialists who understand SQL in order to produce Business Intelligence reports for the business. The whole point is to make data available and accessible to the entire organization while also increasing efficiency and enabling a manageable way of attaining trust through centralized governance and provenance of the data.
This enables data to adapt to the organization's needs, rather than the organization having to adjust its resources to the needs of working with the data. There needs to be a shift in the way many organizations build Data Science teams, in how the subject matter is taught at universities, and in how they architect AI transformation solutions. Although Deep Learning is good at representation learning, it initially requires a large amount of training data. Where a large amount of training data is lacking, one can rely on semantic Knowledge Graphs, human input, and clustering techniques to get further with Data Science executions, which in the long term will have far greater benefits for an organization. Many organizations seem to ignore the value of metadata at the start, and as the data grows this adds to the complexity and the many challenges of integration. Why must we always push for only statistical methods when much of the direct value can be attained through inference over semantic metadata, or through a combination of both approaches? Probability is by nature unintuitive for humans. When does the average human ever think in statistics as they go about their daily lives, from traveling to work to buying groceries at a supermarket to talking to a colleague on the phone? Hardly ever. And yet an average human is still smarter in many respects, across domains of understanding and in adaptability through transfer learning and semantic associations, than the most sophisticated Machine Learning algorithm, which can only be trained to be good at a particular task. However, when the human Data Scientist arrives at work, they reduce the scope of the business problem-solution case to merely statistically derived methods.
If AI is to move forward, we must think beyond statistical methods when working through complex business cases and flexible semantics, and take more inspiration from the human mind for all the things we take for granted in our daily lives that machines still find significantly complex to understand, adapt to, and learn.
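
To make the point about inference over semantic metadata concrete, here is a minimal sketch of a triple set with two hand-written rules, answering a query with no statistics involved. The triples, the predicate names (subClassOf, type), and the helper functions are illustrative assumptions and do not refer to any particular graph store or reasoner.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metadata triples: (subject, predicate, object).
TRIPLES: Set[Triple] = {
    ("Stout", "subClassOf", "Beer"),
    ("Beer", "subClassOf", "Beverage"),
    ("sweet_stout_1", "type", "Stout"),
}


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply two rules until nothing new is derived:

    1. subClassOf is transitive.
    2. An instance of a class is also an instance of its superclasses.
    """
    closed = set(triples)
    while True:
        new = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                # Only chain through a subClassOf edge that starts where the first ends.
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    new.add((a, "subClassOf", d))
                elif p1 == "type":
                    new.add((a, "type", d))
        if new <= closed:
            return closed
        closed |= new


def instances_of(cls: str, triples: Set[Triple]) -> Set[str]:
    """Return every subject that is, directly or by inference, an instance of cls."""
    return {s for (s, p, o) in infer(triples) if p == "type" and o == cls}
```

Asking instances_of("Beverage", TRIPLES) finds sweet_stout_1 even though no triple states that fact directly. The nested loop is quadratic per pass and is only meant to show the idea; a production reasoner would index the triples.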

13 May 2019

Open BioMedical Ontologies

OBO

Tomplay

10 May 2019

AR Tools

ARKit
ARCore
ARgon
Wikitude
Vuforia
Maxst
DeepAR
EasyAR
ARToolkit
XZimg

VR Tools

Google Tiltbrush
Medium
Unbound
Quill
Google Blocks
VRTK

7 May 2019

Contextual Signals of Measurement

Interest
Intent
Impact
Influence

Holdings Channel

Dividend

6 May 2019

Five Rules of Computer Input Popularity

Cheap
Reliable
Comfortable
Software that makes use of it
Acceptable user error rate

Complexity Explorer

3 May 2019

Top Beaches UK

2 May 2019

Go Libraries

Go Libraries for ML and NLP
Go NLP
GO ML
Awesome Go

1 May 2019

Quantum Computing

Awesome Quantum Computing
Awesome Quantum Computing 2

Solid

Solid
w3c Solid

29 April 2019

Legal Ontologies

Legal Norms = LegalRuleML, NRV
Policies = ODRL, LDR
Licenses = CC, L4LOD
Legal Doc Index = Eurovoc, ELI
Privacy GDPR = GDPRtEXT
Tenders and Procurement = LOTED2, PPROC

OpenLaws
Lynx

26 April 2019

Blockchain

HyperLedger - B2B
Ethereum - B2C

25 April 2019

Serverless Framework

18 April 2019

ETF

Just ETF

8 April 2019

Anago

Flair

7 April 2019

Auth0

Initializr

5 April 2019

OpenNMT

Marian

OpenGraph

29 March 2019

AI Ethics

Many people claim to be AI ethics experts, but the field is only just emerging. In the mainstream, the topic has only been around for a short while. So how can someone be an expert in it when there are still many unanswered research questions in the area?

How can one focus on ethics without also focusing on morals? Are morals not the basis of ethics?
Does ethics in AI intrinsically have a universal equivalence? That is, something codified as ethical in the West may not be sufficiently compatible with the East. Which attributes of ethics hold for all cases (universal quantification) and which hold only for some (existential quantification)?
How can one control the abuse and falsely manipulated justification of AI ethics, e.g., someone using AI ethics to drive political or cultural change or influence in a society or organization?
How do you make sure that the people in control of ethics, who by their own account call themselves ethics experts, are in fact ethical? Is AI only as ethical as the human who programmed it? Can the codification of AI ethics be programmed to mutate as defined by the environment and the changing norms of society, thereby allowing the AI agent to question ethical and moral dilemmas for or against humans?
If one builds a moral reasoner in Horn clauses, can such reasoning then genetically mutate for ethics, on a case-by-case basis, to condition an AI agent? Can AI agents be influenced by other AI agents, as in a multiagent distributed system, through argumentation via game theory and reinforcement policies, towards mediation and consensus?
Can ethics and morals be defined in a semantically equivalent language?
If one defines Horn clauses for moral reasoning and a set of ethical rules, can such moral and ethical conundrums be further defined using Markov decision processes, in the form of a neural network, covering any and all states well enough for a global search space that can be further reasoned over with transfer learning?
How do you resolve human bias in a so-called AI ethics expert?
Who defines what is ethical and moral for AI? Is there an agreed gold standard of measure?

In general, a moral person wants to do the right thing, with a moral impulse that drives the best intentions. Morals define our principles, while ethics tend to be more practical: a set of codified rules that define our actions and behaviors. Although the two concepts are similar, they are neither interchangeable nor aligned in every case. Ethics are not always moral, and a moral action can also be unethical.

AI Ethics Lab

19 March 2019

Audio Event Classification Datasets

AudioSet
UrbanSounds
Mivia
Spoken Digit
FMA
Million Song
Freesound
DCase Acoustic Scene
Voxceleb
Spoken Wikipedia
LibriSpeech
Flickr Audio Captions

17 March 2019

Endler Types

Swamp River Aquatics

Endler Types

12 February 2019

DeepHack

26 January 2019

Corpus Tools

Sketch and NoSketch Engine

Sketch-NoSketch

22 January 2019

Huginn

19 January 2019

Cocktails

List of IBA official cocktails
List of Cocktails
Mixolopedia

17 January 2019

NLP Games with Purpose

Games with a purpose are essentially games applied to annotation in NLP to make the process fun for the oracle (annotator), often in a crowdsourced manner. A few examples are listed below:

Phrase Detective
Sentiment Quiz
Guess What
ESP Game

Active Learning

Active Learning Approaches:

Pool-based Sampling
Membership Query Synthesis
Stream-based Selective Sampling
Uncertainty Sampling
Query-by-Committee
Expected Model Change
Expected Error Reduction
Variance Reduction
Density-Weighted Methods
Query from Diverse Subspaces
Exponential Gradient Exploration
Balance Exploration and Exploitation
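
As one concrete instance from the list above, the following is a minimal sketch of pool-based uncertainty sampling using predictive entropy as the uncertainty score. The function names, the batch size, and the stand-in probability table are assumptions made for illustration; in practice predict_proba would come from a trained classifier.

```python
import math
from typing import Callable, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(
    pool: Sequence[str],
    predict_proba: Callable[[str], Sequence[float]],
    batch_size: int = 2,
) -> List[str]:
    """Rank the unlabeled pool by predictive entropy and return the
    most uncertain items for the oracle (annotator) to label."""
    ranked = sorted(pool, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return list(ranked[:batch_size])


# Stand-in for a trained sentiment classifier: hypothetical fixed probabilities.
FAKE_MODEL = {
    "great film": [0.95, 0.05],
    "not bad": [0.55, 0.45],
    "it was fine I guess": [0.50, 0.50],
    "terrible": [0.02, 0.98],
}


def fake_predict_proba(text: str) -> Sequence[float]:
    """Look up the hypothetical class probabilities for a text."""
    return FAKE_MODEL[text]
```

With this table, select_queries(list(FAKE_MODEL), fake_predict_proba) picks the two items whose probabilities are closest to an even split, since those carry the highest entropy. After the oracle labels them, the model would be retrained and the selection repeated on the remaining pool.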

The Language Grid

16 January 2019

TimeML

TimeML
TimeBank
TARSQI
TempEval

13 January 2019

Data Science Methods

Generalizable Method (your mileage may vary, given business case and time constraints):

Identify and understand business case (as a use case or story) - in most cases you are not provided a translated use case or story so it is really about understanding the problem
Explore and prototype including background research (exploratory stage)
Identify cases for reuse
Identify whether this story even requires a model
Identify relevant datasets - curation
Visualize the data (how sparse/dense/dirty it is, multiple open source tools available for refinement steps for features, identify additional effort necessary for model build)
Identify the relevant variances and biases (will the model steps lead to an overfitting or underfitting - the objective is to build a generalizable model)
Feature Selection/Extraction (may use other ML or natural computation techniques here)
Feature engineering (this may also include curation/enrichment of metadata)
Feature re-engineering (this may also include curation/enrichment of metadata)
Identify the simplest solution that is possible
Identify the reasoning of using a complex solution
Custom model to solve the business case (do not just copy model out of a research paper - this is what the exploratory stage was for)
Evaluation and Benchmarking (formal tests may/may not be used, depends on business case)
How well does the model cope with small data and large data - identify sufficiency at average and worst time/cost
Re-Tune/Rinse/Repeat
Incrementally improve the model
Incrementally optimize/scale the model (scale only when necessary)
One simple one complex - one that is sub-par, and one that is riskier
Evaluation and Documentation
Pipeline the Solution in Dev-mode (Identify bottlenecks with the model - dry run/end-2-end for integration - at this stage a repeatable build/test/deploy/evaluate cycle may be used - DS/DE)
A/B/N/Bandit Testing in Stage (generally this stage is covered by the product team, alongside automated acceptance tests, if they know the techniques, or DS/DE maybe involved)
Release/Integrate for Production (depends whether this is a B2B or B2C case, or beta mode)
Storytelling (how well does the model answer/solve the question or problem statement - 'through the looking glass' - refers to both dev, stage/prod cases)
User/Stakeholder/Client Feedback (Rinse & Repeat, depending on B2B or B2C cases)
Incremental Analysis and Review of Models
Rinse & Repeat (some of the steps above repeated multiple times before production release)
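
The bandit testing step in the list above can be illustrated with an epsilon-greedy sketch that routes traffic between model variants. This is one simple bandit strategy among several, chosen here for brevity; the class name, the arm names, and the default epsilon are assumptions, and rewards would normally come from real user outcomes.

```python
import random
from typing import Dict, Iterable, Optional


class EpsilonGreedy:
    """Epsilon-greedy bandit over named model variants."""

    def __init__(
        self,
        arms: Iterable[str],
        epsilon: float = 0.1,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        arm_list = list(arms)
        self.counts: Dict[str, int] = {a: 0 for a in arm_list}
        self.values: Dict[str, float] = {a: 0.0 for a in arm_list}

    def select(self) -> str:
        """Explore a random arm with probability epsilon, else exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))
        return max(self.values, key=lambda a: self.values[a])

    def update(self, arm: str, reward: float) -> None:
        """Fold one observed reward into the running mean for this arm."""
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n
```

A serving loop would call select() for each request, show that variant, and then call update() with the observed outcome (for example 1.0 for a conversion and 0.0 otherwise). Unlike a fixed A/B split, traffic shifts towards the better-performing variant while the test is still running.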

Process Flow:

R&D → Dev/UI/UX → Prod

Generally, with a heavily R&D/backend-focused team, the features and functionality tend to be dictated by the forward flow (a bottom-up approach); most AI projects at startups tend to be built that way. The frontend then becomes a thin client, a view to the world for assimilating the backend efforts. The typical pattern is an informational dashboard for storytelling, 'through the looking glass'. This is because, in a top-down approach, many of the backend efforts would get lost in translation (equally, in some business cases top-down may work better).

Data → Information → Knowledge

State-of-the-art in general may not be state-of-the-art for your business case, and may in fact lead to sub-optimal results and more effort. It is all very subjective: it depends on the data, the associated features for training a model, and the business case you are trying to solve. Work towards the least-effort, most efficient or sensible outcome.
