31 December 2019

NLP for Hackers

NLP for Hackers Libraries Roundup







Why DataRobot Sucks

  • Limited machine learning algorithms and limited ways to optimize them for use cases
  • Limited flexibility of supporting data features
  • Limited sub-selection of model results owing to the limited choice of models
  • Ugly visualization for performance and comparisons
  • The blender option for ensemble methods and choices is limited
  • Too limited to do anything complex
  • Doesn't replace the value of handcrafted models and the value of feature engineering process
  • Why automate feature engineering, especially when that process is what allows one to build a generalizable model with a better understanding of the business case?
  • Limited ways to evaluate the choice of model
  • Limited ways of import/export model for build/deployment
  • Tight coupling to a third-party way of doing things
  • Doesn't follow the formal data science method
  • In fact, doesn't even replace 99% of the work
  • GDPR and governance issues when processing data through the third-party models
  • Benchmarks on models are not available for public peer review
  • Cost far exceeds the benefit
  • Many of the models provided have less than optimal outcomes - low confidence scores
  • Not good for complicated analysis, best to keep use cases as simple as possible
  • Productivity gain is only with very limited and simple cases
  • Invariably, most business cases have noisy data and require custom models, making an automated tool unworkable and useless against that complexity and ambiguity
  • One needs to learn another third-party tool and be willing to trust the model solution blindly
  • Quick wins and successes are not the answer nor the solution if they do not solve the business case
  • The reality is there is no free or easy lunch for most business cases and one has to put in the time and effort
  • No need to cheat one's way through the process
  • Unconvincing autopilot feature: there is no such thing as autopilot in machine learning; in fact, there are no real community standards or even defined patterns, so how can one jump decades ahead in progress?
  • Solution is useful to people who can't code, don't like to code, don't understand the data science method, and prefer simple drag-and-drop options
  • Data input is treated in most cases as a table, not workable for noisy unstructured data
  • Often will end up with overfitted models when the whole point of machine learning is to build generalizable models
  • No flexible options for transfer learning on unseen data
  • And, that isn't even the end of it...

21 December 2019

Recruitment Agencies

There are some glaring practices in the recruitment industry. Some agencies actively practice reverse discrimination by only hiring women for roles such as nurse, personal assistant, receptionist, secretary, catwalk model, and others. However, if the tables were turned and an agency were to hire only men, there would likely be an uproar. This could equally be construed as sexism on the part of the organizations these recruitment agencies work with to fill such roles. In a society where there are now apparently 100+ genders, how does one even know whether a person is a man or a woman anymore? Would a transwoman/transsexual be rejected by such an agency for employment? And should one even care what gender they are? Institutional racism is also a major issue, where the recruiter tends to pick candidates based on inherent bias. Often the practice continues under the covers in some companies in the form of cultural fit as an umbrella term. In fact, it doesn't stop at gender and race. In many cases, all forms of hypocrisy emerge that make recruitment a very humanly flawed discipline. Also, how can a recruiter profile a candidate purely on the basis of a phone and email conversation? And it is practically impossible for a recruiter to be fully aware of all the skills listed on a person's resume. Such practices have a very dampening effect on the economic demand and supply for jobs and candidates in the market. AI is certainly the way to go to avoid such recruitment risks, by retargeting on the things that really matter in employment: fairness around qualifications, practical experience, and the skills to conduct one's job, rather than passing judgement on a likability factor. However, an important aspect here is that AI should not be approached only via probabilistic methods. Just imagine what would happen if one were to probabilistically cluster bias based on someone's name in order to identify their ethnicity.
Can you really tell someone's background from their name alone? What if they are of mixed background? Won't they be an outlier? AI combines both logical reasoning and probabilistic methods.

10 December 2019

Reality of DevOps

There are some typical attitudes that emerge among devops engineers in many organizations. Partly, it is a result of dealing with a lot of clueless employees. And partly, it is because they seem to think they are the most important group in the whole organization, so they develop an air of superiority while giving other business groups the impression that everything is far too complicated for people to understand. Let's face it, devops is not exactly rocket science. It is a merger of two interdependent functions - development and operations. The following are some typical examples of what happens when people in the workplace interact with devops.

  • They make changes without informing others, and without providing sufficient advanced notification of a planned outage
  • They will tell you something is too complicated, when it's a simple case of drag/drop or window click
  • They will over-exaggerate the time it will take them to get something done
  • They will rarely get it done within the agreed timescales but expect unforeseen/unplanned delays
  • They will expect you to log a ticket for everything even if logging a ticket takes more time than to do the work
  • They will reject everything you say, then they will say exactly what you had said as if to make out what they said was quite deep
  • They will use methods and tools only they can understand and with little to no shared documentation
  • They will reject the way you had done something, then approach it exactly the same way
  • They will look down on you for everything; even while sitting in a meeting, explaining anything is a complete chore for them, yet they will expect you to have the highest level of communication skills and patience with them
  • When you explain something simple to them, expect to explain it in French, English, Swahili, and plenty of other languages, and still expect them not to understand any of it, even while everyone else in the room likely understood it in full
  • Sometimes your login accesses might be randomly revoked and then randomly start working again, this isn't magic but the devops are likely doing their usual unplanned maintenance or config changes
  • When you can't ssh into a server because the devops have randomly destroyed your instance, during their regular cleanup sessions, and you have to start all over again with your work
  • When they never acknowledge their mistakes that lead to critical outage issues in production
  • When the wrong model or code is deployed into production
  • Using too many rigid processes and creating barriers between themselves and the user
  • Using metaphors of oversimplification 
  • Not understanding the value of automation
  • Poor methods for testing what they deliver to business
  • Lack of a formal architecture evaluation
  • Badly managed incident management
  • Lack of discipline for conducting effective, sensible, and responsive postmortem evaluations
  • Misuse of metrics that don't display the full picture and cost businesses
  • They will tell you it will take at least a month to spin up and set up instances in the cloud, when it should probably take them anywhere from 2 minutes to maybe 2 days if they need to cluster and set up security groups
  • They will block you from doing anything yourself, or even to assist them to speed things up
  • Each member of a devops team almost works in a silo of their own bubble of ego
  • Expect them never to be available when you have an outage situation so they can feel an air of importance

GCP Training

GCP Free Labs

AWS Training

AWS Training & Certs

Azure Training

Azure Learning Paths







5 November 2019

What is AI

What is AI?

              Thinking             Acting
Humanly       Cognitive Science    Turing Test, Behaviorism
Rationally    Laws of Thought      Doing The Right Thing

2 November 2019

Ladder Ontologies

  • Asocial Ontologies
  • Social Ontologies
  • Cultural Ontologies
  • Oral Linguistic Ontologies
  • Literate Ontologies
  • Civilization-scale Ontologies

1 November 2019

Philosophy Ontology

Inpho Project

Java Demise

The speed with which new versions are being released spells the end of Java in the practical business world in the foreseeable future. There are two releases each year (every 6 months), which is significant. The biggest hurdle for businesses is maintenance and resources. Many products are still dependent on Java 8, and since 2019 commercial licenses have been required for upgrades. The other hurdles are technical debt and backwards-compatibility constraints, especially when a product is implemented in Java and then sold to customers. In a very short span of time there have been quite a few changes to the language and an ample set of versions. One can say that the Java release cycle has exploded in speed such that the majority of the community, for all practical intents and purposes, will not be able to keep up. What this also means is that the ecosystem of tools and libraries takes a while to upgrade, making it a frustration to manage for the engineering and support teams. The Java ecosystem is huge, and the fallback mechanism of boilerplate code, the formal testing processes stemming from the lack of design patterns baked into the language, and dependency hell are massive hurdles. It seems that gradually more and more organizations will distance themselves from Java in order to keep maintenance costs down, meet customer expectations and demand for new product features, and reduce complexity, especially in mobile and cloud environments. Another likely reason is Oracle's ownership of the language and the expectations set by its end-user license. Unfortunately, there is a love-hate relationship with the language in the community.
Even if interest in the language were to wane in the community, it would still lurk as the underdog under the covers and rear its head as a dependency for other languages like Groovy and Kotlin, and for several open source Microservices and Big Data platforms.



28 October 2019

Curse of Dimensionality

As you increase the number of input features, the number of input combinations can grow exponentially. As the combinations grow, each training sample covers a smaller percentage of the possibilities. The result is that as you add features, you need to increase the size of your training set, possibly exponentially. As the number of dimensions goes up, the model must train on significantly more data in order to learn an accurate representation of the input space.
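The growth described above can be made concrete with a tiny sketch (the function name and bin count are my own choices): if each feature is discretized into a fixed number of bins, the number of distinct cells a training set must cover grows as a power of the dimension.

```python
# Illustrative sketch of the curse of dimensionality: number of grid cells
# to cover when each feature is split into `bins_per_feature` bins.
def cells_to_cover(dims, bins_per_feature=10):
    """Distinct cells in the input space at a fixed per-feature resolution."""
    return bins_per_feature ** dims

for d in (1, 2, 3, 5, 10):
    # Coverage needed grows exponentially with the number of features.
    print(f"{d} features -> {cells_to_cover(d):,} cells")
```

With 10 bins per feature, 3 features already imply 1,000 cells, and 10 features imply 10 billion, which is why each added feature demands far more training data for the same coverage.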

16 October 2019



Human Activity Recognition

Human Activity Recognition















ACL Anthology

ACL Anthology

ICML Papers

ICML 2019
ICML 2018
ICML 2017
ICML 2016



1 October 2019

Marketing Mix

  • Packaging
  • Partnership
  • Passion
  • Penetration
  • People
  • Perception
  • Personality
  • Persuasion
  • Phrases
  • Physical
  • Place
  • Placement
  • Planning
  • Popularity
  • Population
  • Positioning
  • Positiveness
  • Power
  • Pragmatism
  • Preference
  • Price
  • Privacy
  • Process
  • Product
  • Productivity
  • Professionalism
  • Profit
  • Promotion
  • Prospect
  • Publicity
  • Purchase
  • Push-Pull
  • Picture
  • Part
  • Pilot
  • Persona
  • Peers
  • Pass-Along-Value
  • Party
  • Pandemic
  • Pain
  • Placebo
  • Planting
  • Playfulness
  • Pleasure
  • Plot
  • Politics
  • Porn
  • Praise
  • Prediction
  • Premeditation
  • Press
  • Pressure
  • Preview
  • Priest
  • Prince
  • Principle
  • Prominence
  • Promise
  • Proof
  • Properties
  • Prosperous
  • Protection
  • Purple Cow
  • Purpose
  • Production

Medical Codes

ICD-10 - Diagnoses
CPT - Procedures
LOINC - Laboratory
RxNorm - Medications
ICF - Disabilities
CDT - Dentistry Procedures
DSM-IV-TR - Psychiatric Illnesses
NDC - Drugs
DRG - Diagnosis-Related Groups
HCPCS - Procedures

Survey of Embeddings Use Cases for Clinical Healthcare 

W3C Events 2019

W3C Events 2019

30 September 2019

Programming Paradigms

There are various programming paradigms that have come about in computer science. However, none has replicated the abstractions of philosophic logic in its entirety to leverage the full capacity for artificial intelligence. Building programs that replicate philosophic constructs might be a way of leveraging better abstractions for artificial intelligence programs as well as reasoning systems. The following are some constructs, spanning abstraction and concreteness, that could be baked in or extended as part of the evolution of programming languages to converge the mind and the mechanisms of machines:
  • Class
  • Object
  • Concept
  • Thing
  • Predicate
  • Actor
  • Agent
  • Subject
  • Notion
  • Standard
  • Data
  • Thought
  • Idea
  • Category
  • Being
  • Action
  • Intent
  • Event
  • Belief
  • Desire
  • Message
  • Observer
  • Axiom
  • Restriction
  • Rule
  • Function
  • Relation
  • Attribute
  • Instance
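A few of the constructs above (Agent, Belief, Intent, Rule) can be sketched as first-class program elements. This is a minimal, hypothetical illustration, not an existing language feature; all names are my own.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Belief:
    predicate: str        # e.g. "is_raining"
    truth: bool = True

@dataclass
class Intent:
    goal: str             # e.g. "stay_dry"

@dataclass
class Rule:
    condition: str        # belief predicate that triggers the rule
    action: str           # action taken when the condition holds

@dataclass
class Agent:
    beliefs: set = field(default_factory=set)
    intents: list = field(default_factory=list)
    rules: list = field(default_factory=list)

    def perceive(self, belief: Belief):
        self.beliefs.add(belief)

    def act(self):
        """Fire every rule whose condition matches a held, true belief."""
        held = {b.predicate for b in self.beliefs if b.truth}
        return [r.action for r in self.rules if r.condition in held]

agent = Agent(rules=[Rule("is_raining", "open_umbrella")])
agent.perceive(Belief("is_raining"))
print(agent.act())  # the matching rule fires
```

The point is that beliefs, intents, and rules become typed values an interpreter can reason over, rather than conventions buried in application code.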

18 September 2019

Classification (Binary, Multi-Class, Multi-Label)

Binary Classifier - choose single category for an object from two categories

Multi-Class Classifier - choose single category for an object from multiple categories

Multi-Label Classifier - choose as many categories as applicable for the same object
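The three definitions above differ only in the shape of the decision. A minimal sketch (no real model, labels and thresholds are illustrative):

```python
# Binary: choose one of exactly two categories.
def binary_predict(score, threshold=0.5):
    return "spam" if score >= threshold else "not_spam"

# Multi-class: choose the single highest-scoring category among many.
def multiclass_predict(scores):
    return max(scores, key=scores.get)

# Multi-label: keep every category whose score clears the threshold.
def multilabel_predict(scores, threshold=0.5):
    return sorted(label for label, s in scores.items() if s >= threshold)

print(binary_predict(0.9))                                        # spam
print(multiclass_predict({"cat": 0.2, "dog": 0.7, "fox": 0.1}))   # dog
print(multilabel_predict({"sports": 0.8, "politics": 0.6, "art": 0.1}))
```

Note that multi-class returns exactly one label, while multi-label may return zero, one, or several for the same object.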

19 August 2019



Applications of Data Science

  • Anomaly Detection
  • Assistive Services
  • Auto-Insurance Risk Prediction
  • Automated Closed Captioning
  • Automated Image Captioning
  • Automated Investing
  • Autonomous Ships
  • Brain Mapping
  • Caller Identification
  • Cancer Diagnosis/Treatment
  • Carbon Emissions Reduction
  • Classifying Handwriting
  • Computer Vision
  • Credit Scoring
  • Crime: Predicting Locations
  • Crime: Predicting Recidivism
  • Crime: Predicting Policing
  • Crime: Prevention
  • CRISPR Gene Editing
  • Crop-Yield Improvement
  • Customer Churn
  • Customer Experience
  • Customer Retention
  • Customer Satisfaction
  • Customer Service
  • Customer Service Agents
  • Customized Diets
  • Cybersecurity
  • Data Mining
  • Data Visualization
  • Detecting New Viruses
  • Diagnosing Breast Cancer
  • Diagnosing Heart Disease
  • Diagnostic Medicine
  • Disaster-Victim Identification
  • Drones
  • Dynamic Driving Routes
  • Dynamic Pricing
  • Electronic Health Records
  • Emotion Detection
  • Energy-Consumption Reduction
  • Facial Recognition
  • Fitness Tracking
  • Fraud Detection
  • Game Playing
  • Genomics and Healthcare
  • Geographic Information Systems
  • GPS Systems
  • Health Outcome Improvement
  • Hospital Readmission Reduction
  • Human Genome Sequencing
  • Identity-Theft Prevention
  • Immunotherapy
  • Insurance Pricing
  • Intelligent Assistants
  • Internet of Things and Medical Device Monitoring
  • Internet of Things and Weather Forecasting
  • Inventory Control
  • Language Translation
  • Location-Based Services
  • Loyalty Programs
  • Malware Detection
  • Mapping
  • Marketing
  • Marketing Analytics
  • Music Generation
  • Natural Language Translation
  • New Pharmaceuticals
  • Opioid Abuse Prevention
  • Personal Assistants
  • Personalized Medicine
  • Personalized Shopping
  • Phishing Elimination
  • Pollution Reduction
  • Precision Medicine
  • Predicting Cancer Survival
  • Predicting Disease Outbreaks
  • Predicting Health Outcomes
  • Predicting Student Enrollments
  • Predicting Weather-Sensitive Product Sales
  • Predictive Analytics
  • Preventative Medicine
  • Preventing Disease Outbreaks
  • Reading Sign Language
  • Real-Estate Valuation
  • Recommendation Systems
  • Reducing Overbooking
  • Ride Sharing
  • Risk Minimization
  • Robo Financial Advisors
  • Security Enhancements
  • Self-Driving Cars
  • Sentiment Analysis
  • Sharing Economy
  • Similarity Detection
  • Smart Cities
  • Smart Homes
  • Smart Meters
  • Smart Thermostats
  • Smart Traffic Control
  • Social Analytics
  • Social Graph Analytics
  • Spam Detection
  • Spatial Data Analysis
  • Sports Recruiting and Coaching
  • Stock Market Forecasting
  • Student Performance Assessment
  • Summarizing Text
  • Telemedicine
  • Terrorist Attack Prevention
  • Theft Prevention
  • Travel Recommendations
  • Trend Spotting
  • Visual Product Search
  • Voice Recognition
  • Voice Search
  • Weather Forecasting





18 August 2019

Types of Data Discovery

  • CDR
  • Emails
  • ERP
  • Social Media
  • Web Logs
  • Server Logs
  • System Logs
  • HTML Pages
  • Sales
  • Photos
  • Videos
  • Audios
  • Tabulated
  • CRM
  • Transactions
  • XDR
  • Sensor Data
  • Call Center
  • Knowledge Bases
  • Google Search
  • Google Trends
  • News
  • Sanctions Data
  • Profile Data

8 August 2019

Quantum AI for Psychic Abilities

The 3 T's (teleportation, telepathy, and telekinesis) are already an active research area. However, the following psychic abilities could also come into the mix in AI:
  • Thoughtography - imprinting images in one's mind onto physical surfaces
  • Scrying - able to look into mediums to view and detect suitable information
  • Second Sight - able to see future and past events or perceive information (precognition)
  • Retrocognition - supernaturally perceive past events (postcognition)
  • Remote Viewing - able to see distant or unseen target with extrasensory perception
  • Pyrokinesis - able to manipulate fire through mind
  • Psychometry - able to get information about a person or object by touch
  • Psychic Surgery - able to remove disease or disorder within or over the body with energetic incision to heal the body
  • Prophecy - able to predict the future
  • Precognition - able to perceive future events
  • Mediumship - able to communicate with the spirit world
  • Levitation - able to float or fly by psychic means
  • Energy Medicine - able to heal one's own empathic etheric, astral, mental, or spiritual energy
  • Energy Manipulation - able to manipulate non-physical/physical energy with mind
  • Dowsing - able to locate water, gravesites, metals, and materials without scientific apparatus
  • Divination - able to gain insight into a situation
  • Conjuration - able to materialize physical objects from thin air
  • Clairvoyance - able to perceive people, objects, locations, or events through extrasensory perception
  • Clairsentience - able to perceive messages from emotions and feelings
  • Clairolfactance - able to perceive knowledge through smell
  • Clairgustance - able to perceive taste without physical contact
  • Claircognizance - able to perceive knowledge through intrinsic knowledge
  • Clairaudience - able to perceive knowledge through paranormal auditory means
  • Chronokinesis - able to alter perception of time causing sense of time to slow down or speed up
  • Biokinesis - able to change or control the DNA
  • Automatic Writing - able to draw or write without conscious intent
  • Aura Reading - able to perceive energy fields around people, places, and objects
  • Astral Projection - out-of-body experience or the voluntary projection of consciousness
  • Apportation - able to materialize, disappear, or teleport objects

7 August 2019

Drawbacks of Reinforcement Learning

  • Reproducibility
  • Resource Efficiency
  • Susceptibility to Attacks
  • Explainability/Accountability

Types of Filtering for Recommendations

  • Adaptive
  • Contextual (Context Similarity)
  • Cognitive (Personality/Behavior)
  • Content
    • Bayesian
    • Relevance Feedback
    • Evolutionary Computation
    • Deep Learning
  • Collaborative (Model vs Memory)
    • Matrix Factorization
    • Tensor Factorization
    • Clustering
    • SVD
    • Deep Learning
    • PCA
    • Pearson
    • Bayesian
    • Markov Decision Processes
  • Interest/Intent
    • Intent 
      • Search
    • Interest 
      • Content Consumption
  • Impact/Influence
    • Social Feedback  
      • Likes
      • Dislikes
      • Mentions
      • Shares
      • Subscribes
      • Hashtags
      • Emojis
      • Reviews
      • Comments
      • Trends
      • Endorsements
      • Opinions from Person of Influence
      • Associative Connections (Primary/Secondary)
      • Six-Degrees of Separation
  • Item-based
  • User-based
  • Personalization
  • Reinforcement Learning 
    • Reward
    • Optimization
    • Exploration/Exploitation
    • Competitive
    • Cooperative
  • Semantic (with a Knowledge Graph)
  • Demographic

Deep Learning Approaches for Recommendations:
  • Autoencoders
  • Neural Autoregressive Distribution Estimate
  • Convolutional Neural Networks
  • Recurrent Neural Network
  • Long Short Term Memory
  • Restricted Boltzmann Machine
  • Adversarial Network
  • Attentional Model
  • Multilayer Perceptron

28 July 2019

Cloud Providers

Most of Azure's cloud service offerings are basically drop-in replacements for Microsoft's own standalone software tools. For Microsoft, it seems Azure is an alternative way of locking in the customer via a re-purposed cloud option, which has so far proven useful through heavy, gimmicky marketing. GCP, on the other hand, provides many alternatives for big data, but with very ineffective pricing, a lack of business-critical reliability, security constraints, lots of options to re-invent the wheel with vendor lock-in, still-limited SQL use cases, and an overall limited set of services. AWS has proven to have a very effective pricing model as well as a wide range of services to cover business needs, including strong reliability and flexible options for service management. For most organizations, especially for data science work, AWS is the go-to cloud solution. Azure and GCP still lag behind considerably in reliability and cloud service offerings, have ineffective pricing, and carry the biggest concern of all: vendor lock-in. In many cases, cloud providers are limited by their mission statements, by what they are trying to achieve through their solutions to businesses, and by their future infrastructure development goals. For Microsoft, Windows is the ultimate success story, one that evolved in parallel with Apple. But Linux has become the de facto operating system for the cloud, and for obvious reasons. Data as a commodity is a valuable asset to most organizations. And the management of risk in security and compliance is an enduring struggle for many. Especially in meeting GDPR compliance, many organizations will want transparent data lineage. Can one trust the storage and processing of data on GCP? All Google services converge to some degree or another and get indexed by their search engine.
Invariably, the cost and risk of using third-party cloud infrastructure versus in-house infrastructure will always be a concern for companies to weigh. It seems that, in the long run, organizations will take back control of their own data storage and processing needs. The trend is towards portable, smarter, and stackable private cloud ownership, more flexibility in managing infrastructure, and virtualization modes at an affordable cost. Start-ups may find it easier to reduce setup costs by leveraging third-party infrastructure. But as companies grow the market value of their products, they may increase their independence by eventually moving away from third-party cloud dependency to their own in-house converged infrastructure, allowing greater flexibility to meet consumer expectations and the demands of their product services - enterprise enablement drives creative and profitable growth.

24 July 2019

Everyday Robots

Robots, over the years, have proven themselves worthy candidates for replacing mundane, labor-intensive manual work for humans, both commercially and at home. Not only do robots work more effectively, they are also extremely productive. In general, robots can be applied to most specialist labor, as they can be trained to be good at a particular aspect of work. But they may not yet be sufficiently capable of doing multiple things through adaptability in multi-class transfer learning. The following highlight some examples of robot use cases.

  • automotive breakdown repair man/woman
  • home and office cleaner
  • rubbish disposal
  • grocery shopper
  • home and office security officer/inspector
  • laundry service
  • cook (chef)
  • critic / reviewer
  • gardener
  • table setter
  • mechanical turk
  • post man/woman
  • babysitter
  • mystery shopper
  • chauffeur
  • home and office mover
  • handy man/woman
  • telephone/broadband installer/repair man/woman
  • call centre agent
  • lollypop man/woman
  • school teacher
  • office secretary
  • family mediator
  • office mediator
  • crop duster
  • nursing home nurse
  • doctor and nurse
  • nanny
  • lawyer
  • accountant
  • assembly line worker
  • dentist
  • data entry clerk
  • journalist
  • financial analyst
  • comedian
  • musician
  • artist
  • telemarketer
  • paramedic
  • commercial and defence pilots
  • public transport worker
  • rail repair
  • air traffic controller
  • land traffic controller
  • sea traffic controller
  • metrologist
  • kitchen porter
  • crop pickers
  • police man/woman
  • fire man/woman
  • immigration/border controller
  • politician
  • director
  • photographer
  • creative writer
  • curator
  • cheerleader
  • gamer
  • construction worker
  • programmer
  • logging worker
  • fisher man/woman
  • steel worker
  • street sweeper
  • refuse collector
  • carpenter
  • stunt man/woman
  • courier
  • wrestler
  • boxer
  • sports man/woman
  • recycle waste worker
  • power worker
  • farmer
  • roofer
  • astronaut
  • army & military officer
  • bodyguard
  • slaughterhouse worker
  • mechanic
  • metalcrafter
  • search & rescue
  • special forces (SAS, Delta Force, Seal, etc)
  • sanitation worker
  • land mine remover
  • miner
  • bush pilot
  • lumberjack
  • librarian
  • human resources assistant
  • salesman
  • editor
  • dance instructor
  • bus conductor
  • tourist guide
  • stewardess
  • cashier
  • store replenisher
  • data center operator
  • taxi cab driver
  • train driver
  • lorry driver
  • customer service advisor
  • electrician
  • vehicle washer
  • bed maker
  • bathroom cleaner
  • pet walker
  • oilfield driver
  • derrick hand
  • roustabout
  • offshore diver
  • rodent killer
  • insect killer
  • therapist
  • architect
  • actor
  • backup singer
  • backup dancer
  • house builder
  • waiter
  • presenter
  • manager
  • hacker
  • stripper (exotic dancer)
  • sex worker
  • hairdresser
  • makeup artist
  • fashion designer
  • cameraman
  • researcher
  • chemist
  • pharmacist
  • landscapist
  • baker
  • ship builder
  • car maker
  • broadcast technician
  • hotel helpdesk
  • store helpdesk
  • mall helpdesk
  • site assistant
  • tailor
  • tutor
  • pet trainer
  • cartoonist
  • reporter
  • moderator
  • painter
  • plumber
  • auditor
  • financial trader
  • financial broker
  • financial advisor
  • compliance advisor
  • fraud advisor
  • risk advisor
  • surveillance agent
  • social media agent
  • bricklayer
  • choreographer
  • actuarian
  • physiotherapist
  • tea/coffee maker
  • pizza maker
  • burger maker
  • welder
  • surveyor
  • surgeon
  • glazier
  • tiler
  • stonemason
  • optician
  • tool maker
  • artisan
  • sonographer
  • radio technician
  • sports coach
  • bartender / barmaid
  • bellboy
  • paperboy
  • drain inspector
  • pet feeder

13 July 2019

Lucid Pipeline

Most AI solutions can be built as pipelined implementations, with various sources and sinks fed from a set of generalizable models. Invariably, a knowledge graph will act as a key layer for evolvable feature engineering that can be translated into ontological vectors and fed into AI models. Split the pipeline into a lucid funnel, lucid reactor, lucid ventshaft, and lucid refinery, using a loose analogy to a distillation process. The following components highlight the key abstractions:

AI/DS Engine Layers:

  • Disc (frontends - discovery/visualization layer)
  • Docs (live specs via swagger, etc - documentation layer)
  • APIs (proxy/gateway services connected with elasticsearch or solr - application layer)
  • DS (models and semantics - AI layer)
  • Eval (benchmarks, workbench and metrics - evaluation layer)
  • Human (optional human in the loop - human/annotation layer)
  • Tests (load, unit, uat, service, etc - testing layer)
  • Funnel (ingestion, pre-process, post-process layer using brokers like Kafka/Kinesis)
  • Reactor (reactive processes - workflow/transformational layer - via Spark, Beam, Flink, Dask, etc)
  • Ventshaft (filter, match, fuzzy, distance, probabilistic, relational - functional layer)
  • Refinery (context types, objects, attributes and methods as blueprints - entity/object layer)
  • Datapiles (indexed data sources as services for document/column/graph stores - data access layer)
  • Conf (environment configurations for nginx, etc - configuration layer)
  • Cloud (connected services for AWS/GCP orchestration - infrastructure/platform layer)
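The funnel/reactor/ventshaft/refinery flow can be sketched as composable stages, each a plain function applied in order. This is a hypothetical toy, with stand-in logic for what would really be Kafka ingestion, Spark/Beam transforms, and so on; all names and predicates are my own.

```python
from functools import reduce

def funnel(records):
    """Ingestion/pre-process: drop empty records, normalize whitespace."""
    return [r.strip() for r in records if r and r.strip()]

def reactor(records):
    """Transformation layer: lowercase (stand-in for Spark/Beam work)."""
    return [r.lower() for r in records]

def ventshaft(records):
    """Filter/match layer: keep records containing the token 'ai'."""
    return [r for r in records if "ai" in r.split()]

def refinery(records):
    """Entity/object layer: shape records into typed blueprints."""
    return [{"text": r, "length": len(r)} for r in records]

def run_pipeline(records, stages=(funnel, reactor, ventshaft, refinery)):
    # Thread the data through each layer in order, source to sink.
    return reduce(lambda data, stage: stage(data), stages, records)

print(run_pipeline(["  AI rocks ", "", "plain data"]))
```

Keeping each layer a pure function makes the stages swappable, which is the point of the abstraction: the same refinery can sit behind a different funnel or reactor without rework.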

11 June 2019

Sports Extended Reality

  • Moment is everything (pre-event, event, post-event)
  • Nothing is actually live
  • Flat images and sense of presence
  • 20/80 rule
  • Time matters
Attributes of a sports experience:
  • Hawkeye cameras
  • Replays
  • Live scores
  • Clock graphics
  • Broadcast video + audio (including commentary)
  • Live text updates
  • Fast cuts
  • Music
  • High-paced graphics
  • Social media interaction (optional)
Live is a belief, an illusion, in sports. There are three stages of live sports: Live, Live-Live, and Live-To-Record (Live-To-Tape). An important aspect is the focus on driving a story (narrative) and providing consumers a sense of control.

16 May 2019

Lexicon Model for Ontologies


Industrial Data Science

Over the years, Data Science as a field has emerged to play a major pivotal role for many industry sectors providing avenues for analytical growth and insights towards more effective products and services. However, there are several glaring aspects of the field that is riddled with misconceptions and ineffective practices. Traditional Data Science was about Data Warehouses, relational way of thinking, Business Intelligence, and overfitted models. However,  in the current landscape, Artificial Intelligence, as a discipline, is more about out-of-the-box style of thinking and is having an impact to Data Science practice. Data Engineering and Data Science functions tend to merge as one in AI practice. Relational Algebra is replaced with semantics and context via Knowledge Graphs that form the important metadata layer for a Linked Data Lake. While traditional Data Science relied fully on statistical methods, the new approaches rely on combining Machine Learning and Knowledge Representation and Reasoning approaches in a hybrid model for better Transfer Learning and generalizability. Deep Learning, which is a pure statistical method and a sub-field of Neural Networks, is by its very nature implemented as a set of distributed and probabilistic graphical models. It makes very little sense to split teams between Data Engineering and Data Science as the person building the model also has to think about scalability and performance. Invariably, splitting teams means duplication of work, communication issues, and degradation of output in production (when passed from Data Science to Data Engineering). In many AI domains, there is an inclination towards open box thinking about problems. In AI, only 30% of the effort is Machine Learning while the rest 70% is Computer Science principles and theory. An evidence of this can be seen in the Norvig book which is often the basis of many taught AI 101 courses. 
Often at universities, advanced courses neglect to cover the entire Data Science method and stress only Machine Learning and statistical methods at the exploratory stage, while forgetting the rest of the Computer Science concepts. As a result, we see many Data Scientists with PhDs who are ill-equipped to tackle practical business cases: productionizing their models against small and large datasets, performing appropriate Feature Engineering for semantics, and building the associated pipelines. Furthermore, at many institutions Feature Engineering is skipped entirely, even though it is really 70% of the Data Science method and possibly the most important stage of the process. Invariably, this Feature Engineering step is partially transferred over to the Data Engineering function. One has to wonder why the Data Scientist does only 30% of the work of the Data Science method, even after holding a PhD, while passing the remainder of the hard work to the Data Engineer as part of the formal ETL process. The whole point of a Knowledge Graph is to add the value of semantics and context to your data, moving towards information and knowledge. This becomes very important not only for Feature Engineering but as a feedback mechanism, where one can cyclically improve the model's learning while allowing the model to improve the semantics in a semi-supervised manner. The Knowledge Graph also enables natural language queries, making the data available to the entire organization. There is no longer a need to hire specialists who understand SQL in order to produce Business Intelligence reports for the business. The whole point is to make data available and accessible to the entire organization while increasing efficiencies and enabling a manageable way of attaining trust through centralized governance and provenance of the data.
This enables data to adapt to the organizational needs, rather than the organization having to adjust resources to the needs of working with the data. There needs to be a shift in the way many organizations build Data Science teams, how the subject matter is taught at universities, and how AI transformational solutions are architected. Although Deep Learning is good at representation learning, it initially requires a large amount of training data. Where large amounts of training data are lacking, one can rely on semantic Knowledge Graphs, human input, and clustering techniques to get further with Data Science executions, which in the long term will have far greater benefits for an organization. Many organizations ignore the value of metadata at the start, and the growth of data then adds to the complexity and the many challenges of integration. Why must we always push for statistical methods alone, when much of the direct value can be attained through inference over semantic metadata, or a combination of both approaches? Probability is, by nature, unintuitive for humans. When does the average human ever think in statistics as they go about their daily lives, from travelling to work, to buying groceries at a supermarket, to talking to a colleague on the phone? Hardly ever. And yet an average human is still smarter in many respects, across domains of understanding and adaptability, through transfer learning and semantic associations, than the most sophisticated Machine Learning algorithm that can be trained to be good at a particular task. However, when the human Data Scientist arrives at work, they reduce the scope of the business problem-solution case to mere statistically derived methods.
If AI is to move forward, we must think beyond statistical methods when working through complex business cases, embrace flexible semantics, and take more inspiration from the human mind, for all the things we take for granted in our daily lives that machines still find significantly complex to understand, adapt to, and learn.
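
As a toy illustration of the Knowledge Graph idea above, a triple store with a single transitive inference rule can be sketched in plain Python. The facts and predicate names here are hypothetical, purely for illustration; a real deployment would use RDF/SPARQL tooling rather than this minimal sketch:

```python
def query(triples, s=None, p=None, o=None):
    """Pattern-match (subject, predicate, object) triples; None is a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

def transitive_closure(triples, pred="isA"):
    """Infer new facts: if (a isA b) and (b isA c), then (a isA c)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, _, b in query(facts, p=pred):
            for _, _, c in query(facts, s=b, p=pred):
                if (a, pred, c) not in facts:
                    facts.add((a, pred, c))
                    changed = True
    return facts

facts = transitive_closure({
    ("cat", "isA", "mammal"),
    ("mammal", "isA", "animal"),
})
print(("cat", "isA", "animal") in facts)  # → True
```

The point of the sketch is that inference over semantic metadata derives new facts without any statistical training, which is the hybrid-model argument made above.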

29 April 2019

29 March 2019

AI Ethics

There are a lot of people claiming to be AI ethics experts, but the field is only just emerging. In the mainstream, the topic has only been around for a short while. So how can someone be an expert in it when there are still many unanswered research questions in the area?

  • How can one focus on ethics without also focusing on morals, given that morals are the basis of ethics?
  • Does ethics in AI intrinsically have a universal equivalence? i.e. something codified as ethical in the West may not be sufficiently compatible with the East. Which attributes of ethics hold universally (for-all) versus only existentially (there-exists some context in which they hold)?
  • How can one control the abuse and falsely manipulated justification of AI ethics? e.g. someone trying to drive political/cultural change or influence in a society/organization using AI ethics?
  • How do you make sure that the people in control of ethics, who by their own account call themselves ethics experts, are in fact ethical? Is AI only as ethical as the human who programmed it? Can the codification of AI ethics be programmed to mutate as defined by the environment and the changing norms of society, thereby allowing the AI agent to question ethical and moral dilemmas for/against humans?
  • If one builds a moral reasoner in Horn clauses, can such reasoning then genetically mutate for ethics, on a case-by-case basis, for conditioning of an AI agent? Can AI agents be influenced by other AI agents, as in a multiagent distributed system: argumentation via game theory and reinforcement policies, towards mediation and consensus?
  • Can ethics and morals be defined in a semantically equivalent language?
  • If one defines Horn clauses for moral reasoning and a set of ethical rules, can such moral/ethical conundrums be further defined using Markov decision processes, in the form of a neural network, for any and all states, as good-enough coverage of a global search space that can be further reasoned over with transfer learning?
  • How do you resolve human bias in a so called AI ethics expert?
  • Who defines what is ethical and moral for AI? Is there an agreed gold standard of measure?
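
As a toy illustration of the Horn-clause idea raised in the questions above, a minimal forward-chaining reasoner can be sketched in Python. The rules and facts are entirely hypothetical, chosen only to show the mechanics; they are not a proposed moral codification:

```python
# Each rule is (body, head): if every fact in the body holds, the head is derived.
rules = [
    ({"causes_harm"}, "immoral"),
    ({"immoral", "codified_prohibition"}, "unethical"),
    ({"tells_truth"}, "moral"),
]

def derive(facts, rules):
    """Forward-chain: apply rules until no new facts are derived (a fixpoint)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= facts and head not in facts:
                facts.add(head)
                changed = True
    return facts

print(derive({"causes_harm", "codified_prohibition"}, rules))
# derives both "immoral" and "unethical" from the two starting facts
```

Mutating such a rule set "genetically", as the question asks, would amount to perturbing the `rules` list and re-deriving; whether the resulting conclusions remain ethical is exactly the open question.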

In general, a moral person wants to do the right thing, with a moral impulse that drives the best intentions. Morals define our principles, while ethics tend to be more practical: a set of codified rules that define our actions and behaviors. Although the two concepts are similar, they are not interchangeable, nor aligned in every case. An ethical action is not always moral, and a moral action can be unethical.

AI Ethics Lab 

17 January 2019

NLP Games with Purpose

Games with a purpose are games applied to annotation tasks in NLP to make the process fun for the oracle (annotator), often in a crowdsourced manner. A few examples in this context are listed below:

  • Phrase Detective
  • Sentiment Quiz
  • Guess What
  • ESP Game

Active Learning

Active Learning Approaches:
  • Pool-based Sampling
  • Member Query Synthesis
  • Stream-based Selective Sampling
  • Uncertainty Sampling
  • Query-by-Committee
  • Expected Model Change
  • Expected Error Reduction
  • Variance Reduction
  • Density-Weighted Methods
  • Query from Diverse Subspaces
  • Exponential Gradient Exploration
  • Balance Exploration and Exploitation
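
As a concrete illustration, pool-based sampling combined with uncertainty sampling, two of the most common approaches above, can be sketched with plain NumPy. The class probabilities are assumed to come from some already-trained model; the numbers here are illustrative:

```python
import numpy as np

def least_confident(probs, k=2):
    """Pool-based uncertainty sampling: pick the k pool items whose
    top predicted class probability is lowest (least confident)."""
    uncertainty = 1.0 - probs.max(axis=1)
    return np.argsort(-uncertainty)[:k]

# Predicted class probabilities for a pool of 4 unlabelled items.
pool_probs = np.array([
    [0.95, 0.05],   # confident
    [0.55, 0.45],   # uncertain
    [0.50, 0.50],   # most uncertain
    [0.80, 0.20],
])
print(least_confident(pool_probs))  # → [2 1]: the two most uncertain items
```

The selected items would then be sent to the oracle (human annotator) for labelling, the model retrained, and the loop repeated.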

The Language Grid


13 January 2019

Data Science Methods

Generalizable Method (your mileage may vary, given business case and time constraints):
  • Identify and understand business case (as a use case or story) - in most cases you are not provided a translated use case or story so it is really about understanding the problem 
  • Explore and prototype including background research (exploratory stage)
  • Identify cases for reuse
  • Identify whether this story even requires a model
  • Identify relevant datasets - curation
  • Visualize the data (how sparse/dense/dirty it is, multiple open source tools available for refinement steps for features, identify additional effort necessary for model build)
  • Identify the relevant variances and biases (will the model steps lead to an overfitting or underfitting - the objective is to build a generalizable model)
  • Feature Selection/Extraction (may use other ML or natural computation techniques here)
  • Feature engineering (this may also include curation/enrichment of metadata)
  • Feature re-engineering (this may also include curation/enrichment of metadata)
  • Identify the simplest solution that is possible
  • Identify the reasoning of using a complex solution
  • Custom model to solve the business case (do not just copy model out of a research paper - this is what the exploratory stage was for)
  • Evaluation and Benchmarking (formal tests may/may not be used, depends on business case)
  • How well does the model scale against small data and large data - identify sufficiency at average and worst-case time/cost
  • Re-Tune/Rinse/Repeat
  • Incrementally improve the model
  • Incrementally optimize/scale the model (scale only when necessary)
  • Keep one simple and one complex candidate - one that is sub-par, and one that is riskier
  • Evaluation and Documentation
  • Pipeline the Solution in Dev-mode (Identify bottlenecks with the model - dry run/end-2-end for integration - at this stage a repeatable build/test/deploy/evaluate cycle may be used - DS/DE)
  • A/B/N/Bandit Testing in Stage (generally this stage is covered by the product team, alongside automated acceptance tests, if they know the techniques, or DS/DE maybe involved)
  • Release/Integrate for Production (depends whether this is a B2B or B2C case, or beta mode)
  • Storytelling (how well does the model answer/solve the question or problem statement - 'through the looking glass' - refers to both dev and stage/prod cases)
  • User/Stakeholder/Client Feedback (Rinse & Repeat, depending on B2B or B2C cases)
  • Incremental Analysis and Review of Models
  • Rinse & Repeat (some of the steps above repeated multiple times before production release)
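
The simple-versus-complex candidate step in the method above can be sketched with scikit-learn. The dataset and the two models here are illustrative stand-ins, not a prescription for any particular business case:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# One simple (interpretable, possibly sub-par) and one complex (riskier) model.
candidates = {
    "simple": LogisticRegression(max_iter=5000),
    "complex": GradientBoostingClassifier(),
}

# Evaluate both under the same cross-validation, then weigh the accuracy
# gap against the added complexity, cost, and risk.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

If the complex model does not clearly beat the simple one on the business metric, the simple model wins by the least-effort principle stated at the end of this section.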

Process Flow:

R&D → Dev/UI/UX → Prod

Generally, with a heavy R&D/backend-focused team, the features and functionality tend to be dictated by the forward flow (a bottom-up approach); most AI projects at startups tend to be built that way. The frontend then becomes a thin client, a view to the world for assimilation of the backend efforts; the typical pattern is an informational dashboard for storytelling - 'through the looking glass'. This is because, in a top-down approach, many of the backend efforts would get lost in translation (equally, in some business cases top-down may work better).

Data → Information → Knowledge

State-of-the-art may not imply state-of-the-art for your business case, and may in fact lead to sub-optimal results and more effort. It is all very subjective: it depends on the data, the associated features for training a model, and the business case you are trying to solve. Work towards the least effort that yields an efficient or sensible outcome.