1 October 2019

Marketing Mix

  • Packaging
  • People
  • Place
  • Popularity
  • Positioning
  • Price
  • Product
  • Phrases
  • Promotion
  • Prospect
  • Process
  • Physical
  • Penetration
  • Population
  • Passion
  • Personality
  • Pragmatism
  • Purchase
  • Preference
  • Persuasion
  • Push-Pull
  • Publicity
  • Planning
  • Profit
  • Productivity
  • Partnership
  • Power
  • Perception
  • Positiveness
  • Professionalism

Medical Codes

ICD-10 - Diagnosis
CPT - Procedures
LOINC - Laboratory
RxNorm - Medications
ICF - Disabilities
CDT - Dentistry Procedures
DSM-IV-TR - Psychiatric Illnesses
NDC - Drugs
DRG - Diagnosis Group
HCPCS - Procedures

Survey of Embeddings Use Cases for Clinical Healthcare 

W3C Events 2019

30 September 2019

Programming Paradigms

Various programming paradigms have emerged in computer science. However, none has replicated the abstractions of philosophical logic in their entirety to realize the full capacity of artificial intelligence. Building programs that replicate philosophical constructs might yield better abstractions for artificial intelligence programs as well as reasoning systems. The following constructs, both abstract and concrete, could be baked in or added as programming languages evolve to bring together the mind and the mechanisms of machines:
  • Class
  • Object
  • Concept
  • Thing
  • Predicate
  • Actor
  • Agent
  • Subject
  • Notion
  • Standard
  • Data
  • Thought
  • Idea
  • Category
  • Being
  • Action
  • Intent
  • Event
  • Belief
  • Desire
  • Message
  • Observer
  • Axiom
  • Restrict
  • Rule
  • Function
  • Relation
  • Attribute
  • Instance

18 September 2019

Classification (Binary, Multi-Class, Multi-Label)

Binary Classifier - chooses a single category for an object from two categories

Multi-Class Classifier - chooses a single category for an object from more than two categories

Multi-Label Classifier - chooses as many categories as applicable for the same object
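
The three definitions above can be sketched in a few lines. This is only an illustration: the scores are assumed to come from some already trained model, and the function names, labels, and 0.5 threshold are hypothetical choices rather than part of any particular library.

```python
from typing import Dict, List


def binary_classify(score: float, threshold: float = 0.5) -> str:
    """Binary: pick exactly one of two categories."""
    return "spam" if score >= threshold else "not_spam"


def multiclass_classify(scores: Dict[str, float]) -> str:
    """Multi-class: pick the single highest-scoring category."""
    return max(scores, key=scores.get)


def multilabel_classify(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label: keep every category whose score clears the threshold."""
    return sorted(label for label, s in scores.items() if s >= threshold)


# Hypothetical per-category scores for one document.
topic_scores = {"sports": 0.7, "politics": 0.6, "weather": 0.1}

binary_label = binary_classify(0.8)
single_topic = multiclass_classify(topic_scores)
all_topics = multilabel_classify(topic_scores)
print(binary_label, single_topic, all_topics)
```

The same score dictionary gives one answer in the multi-class case and possibly several (or none) in the multi-label case, which is the practical difference between the two.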

19 August 2019



Applications of Data Science

  • Anomaly Detection
  • Assistive Services
  • Auto-Insurance Risk Prediction
  • Automated Closed Captioning
  • Automated Image Captioning
  • Automated Investing
  • Autonomous Ships
  • Brain Mapping
  • Caller Identification
  • Cancer Diagnosis/Treatment
  • Carbon Emissions Reduction
  • Classifying Handwriting
  • Computer Vision
  • Credit Scoring
  • Crime: Predicting Locations
  • Crime: Predicting Recidivism
  • Crime: Predictive Policing
  • Crime: Prevention
  • CRISPR Gene Editing
  • Crop-Yield Improvement
  • Customer Churn
  • Customer Experience
  • Customer Retention
  • Customer Satisfaction
  • Customer Service
  • Customer Service Agents
  • Customized Diets
  • Cybersecurity
  • Data Mining
  • Data Visualization
  • Detecting New Viruses
  • Diagnosing Breast Cancer
  • Diagnosing Heart Disease
  • Diagnostic Medicine
  • Disaster-Victim Identification
  • Drones
  • Dynamic Driving Routes
  • Dynamic Pricing
  • Electronic Health Records
  • Emotion Detection
  • Energy-Consumption Reduction
  • Facial Recognition
  • Fitness Tracking
  • Fraud Detection
  • Game Playing
  • Genomics and Healthcare
  • Geographic Information Systems
  • GPS Systems
  • Health Outcome Improvement
  • Hospital Readmission Reduction
  • Human Genome Sequencing
  • Identity-Theft Prevention
  • Immunotherapy
  • Insurance Pricing
  • Intelligent Assistants
  • Internet of Things and Medical Device Monitoring
  • Internet of Things and Weather Forecasting
  • Inventory Control
  • Language Translation
  • Location-Based Services
  • Loyalty Programs
  • Malware Detection
  • Mapping
  • Marketing
  • Marketing Analytics
  • Music Generation
  • Natural Language Translation
  • New Pharmaceuticals
  • Opioid Abuse Prevention
  • Personal Assistants
  • Personalized Medicine
  • Personalized Shopping
  • Phishing Elimination
  • Pollution Reduction
  • Precision Medicine
  • Predicting Cancer Survival
  • Predicting Disease Outbreaks
  • Predicting Health Outcomes
  • Predicting Student Enrollments
  • Predicting Weather-Sensitive Product Sales
  • Predictive Analytics
  • Preventative Medicine
  • Preventing Disease Outbreaks
  • Reading Sign Language
  • Real-Estate Valuation
  • Recommendation Systems
  • Reducing Overbooking
  • Ride Sharing
  • Risk Minimization
  • Robo Financial Advisors
  • Security Enhancements
  • Self-Driving Cars
  • Sentiment Analysis
  • Sharing Economy
  • Similarity Detection
  • Smart Cities
  • Smart Homes
  • Smart Meters
  • Smart Thermostats
  • Smart Traffic Control
  • Social Analytics
  • Social Graph Analytics
  • Spam Detection
  • Spatial Data Analysis
  • Sports Recruiting and Coaching
  • Stock Market Forecasting
  • Student Performance Assessment
  • Summarizing Text
  • Telemedicine
  • Terrorist Attack Prevention
  • Theft Prevention
  • Travel Recommendations
  • Trend Spotting
  • Visual Product Search
  • Voice Recognition
  • Voice Search
  • Weather Forecasting





18 August 2019

Types of Data Discovery

  • CDR
  • Emails
  • ERP
  • Social Media
  • Web Logs
  • Server Logs
  • System Logs
  • HTML Pages
  • Sales
  • Photos
  • Videos
  • Audios
  • Tabulated
  • CRM
  • Transactions
  • XDR
  • Sensor Data
  • Call Center
  • Knowledge Bases
  • Google Search
  • Google Trends
  • News
  • Sanctions Data
  • Profile Data

8 August 2019

Quantum AI for Psychic Abilities

The 3 T's (teleportation, telepathy, and telekinesis) are already an active research area. However, the following psychic abilities could also come into the mix in AI:
  • Thoughtography - imprinting images in one's mind onto physical surfaces
  • Scrying - able to look into mediums to view and detect suitable information
  • Second Sight - able to see future and past events or perceive information (precognition)
  • Retrocognition - supernaturally perceive past events (postcognition)
  • Remote Viewing - able to see distant or unseen target with extrasensory perception
  • Pyrokinesis - able to manipulate fire through mind
  • Psychometry - able to get information about a person or object by touch
  • Psychic Surgery - able to remove disease or disorder within or over the body with energetic incision to heal the body
  • Prophecy - able to predict the future
  • Precognition - able to perceive future events
  • Mediumship - able to communicate with the spirit world
  • Levitation - able to float or fly by psychic means
  • Energy Medicine - able to heal one's own empathic etheric, astral, mental, or spiritual energy
  • Energy Manipulation - able to manipulate non-physical/physical energy with mind
  • Dowsing - able to locate water, gravesites, metals, and materials without scientific apparatus
  • Divination - able to gain insight into a situation
  • Conjuration - able to materialize physical objects from thin air
  • Clairvoyance - able to perceive people, objects, locations, or events through extrasensory perception
  • Clairsentience - able to perceive messages from emotions and feelings
  • Clairolfactance - able to perceive knowledge through smell
  • Clairgustance - able to perceive taste without physical contact
  • Claircognizance - able to perceive knowledge through intrinsic knowledge
  • Clairaudience - able to perceive knowledge through paranormal auditory means
  • Chronokinesis - able to alter perception of time causing sense of time to slow down or speed up
  • Biokinesis - able to change or control the DNA
  • Automatic Writing - able to draw or write without conscious intent
  • Aura Reading - able to perceive energy fields around people, places, and objects
  • Astral Projection - out-of-body experience or the voluntary projection of consciousness
  • Apportation - able to materialize, disappear, or teleport objects

7 August 2019

Drawbacks of Reinforcement Learning

  • Reproducibility
  • Resource Efficiency
  • Susceptibility to Attacks
  • Explainability/Accountability

Types of Filtering for Recommendations

  • Adaptive
  • Contextual (Context Similarity)
  • Cognitive (Personality/Behavior)
  • Content
    • Bayesian
    • Relevance Feedback
    • Evolutionary Computation
    • Deep Learning
  • Collaborative (Model vs Memory)
    • Matrix Factorization
    • Tensor Factorization
    • Clustering
    • SVD
    • Deep Learning
    • PCA
    • Pearson
    • Bayesian
    • Markov Decision Processes
  • Interest/Intent
    • Intent 
      • Search
    • Interest 
      • Content Consumption
  • Impact/Influence
    • Social Feedback  
      • Likes
      • Dislikes
      • Mentions
      • Shares
      • Subscribes
      • Hashtags
      • Emojis
      • Reviews
      • Comments
      • Trends
      • Endorsements
      • Opinions from Person of Influence
      • Associative Connections (Primary/Secondary)
      • Six-Degrees of Separation
  • Item-based
  • User-based
  • Personalization
  • Reinforcement Learning 
    • Reward
    • Optimization
    • Exploration/Exploitation
    • Competitive
    • Cooperative
  • Semantic (with a Knowledge Graph)
  • Demographic
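
As one concrete instance of the memory-based, user-based collaborative branch with Pearson correlation listed above, here is a minimal sketch. The ratings, user names, and the choice to use only positively correlated neighbours are illustrative assumptions, not a reference implementation.

```python
from math import sqrt
from typing import Dict

Ratings = Dict[str, Dict[str, float]]


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, item: str) -> float:
    """Similarity-weighted average rating from positively correlated neighbours."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = pearson(ratings[user], theirs)
        if sim > 0:
            num += sim * theirs[item]
            den += sim
    return num / den if den else 0.0


# Hypothetical ratings on a 1-5 scale.
ratings: Ratings = {
    "ann": {"film_a": 5, "film_b": 3, "film_c": 4},
    "bob": {"film_a": 4, "film_b": 2, "film_c": 3, "film_d": 5},
    "cat": {"film_a": 1, "film_b": 5, "film_c": 2, "film_d": 1},
}

predicted = predict(ratings, "ann", "film_d")
print(round(predicted, 2))
```

Here "ann" rates films in the same pattern as "bob" and the opposite pattern to "cat", so only "bob" contributes to the prediction for the unseen item.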

Deep Learning Approaches for Recommendations:
  • Autoencoders
  • Neural Autoregressive Distribution Estimation
  • Convolutional Neural Networks
  • Recurrent Neural Network
  • Long Short-Term Memory
  • Restricted Boltzmann Machine
  • Adversarial Network
  • Attentional Model
  • Multilayer Perceptron

28 July 2019

Cloud Providers

Most of Azure's cloud service offerings are essentially drop-in replacements for Microsoft's proprietary standalone software tools. For Microsoft, Azure seems to be another route to vendor lock-in, repurposed as a cloud option, which has so far proven effective through heavy, gimmicky marketing. GCP, on the other hand, provides many alternatives for big data, but with ineffective pricing, a lack of business-critical reliability, security constraints, many options that re-invent the wheel with vendor lock-in, limited support for SQL use cases, and a limited range of services overall. AWS has proven to have a very effective pricing model and caters to a wide range of services that cover business needs, with strong reliability and flexible options for managing services. For most organizations, especially for data science work, AWS is the go-to cloud solution. Azure and GCP still lag considerably behind in reliability, breadth of cloud services, and pricing, with vendor lock-in being the biggest concern. In many cases, cloud providers are limited by their mission statements: what they are trying to achieve through their solutions for businesses, and their future infrastructure development goals. For Microsoft, Windows is the ultimate success story, which one can see has evolved in parallel with Apple. But Linux has become the de facto operating system for the cloud, and for obvious reasons. Data as a commodity is a valuable asset to most organizations, and managing risk in security and compliance is an enduring struggle for many of them. In meeting GDPR compliance especially, many organizations will want transparent data lineage. Can one trust the storage and processing of data on GCP? All Google services converge to some degree and get indexed by its search engine.
Invariably, the cost and risk of using third-party cloud infrastructure versus in-house infrastructure will always be a concern for companies to weigh up. In the long run, it seems, organizations will take back control of their own data storage and processing needs. The trend is towards portable, smarter, and stackable private clouds, with more flexibility in managing infrastructure and virtualization at an affordable cost. Start-ups may find it easier to reduce setup costs by leveraging third-party infrastructure. But as companies grow and the market value of their products rises, they may increase their independence by eventually moving away from third-party cloud dependency to their own in-house converged infrastructure, allowing greater flexibility to meet consumer expectations and the demands of their products and services. Enterprise enablement drives creative and profitable growth.

24 July 2019

Everyday Robots

Over the years, robots have proven themselves worthy candidates for replacing mundane, labor-intensive manual work for humans, both commercially and at home. Not only do robots work more effectively, they are also extremely productive. In general, robots can be applied to most specialist labor, since they can be trained to be good at a particular aspect of work. But they may not yet be capable of doing multiple things adaptively through multi-class transfer learning. The following are some examples of robot use cases.

  • automotive breakdown repair man/woman
  • home and office cleaner
  • rubbish disposal
  • grocery shopper
  • home and office security officer/inspector
  • laundry service
  • cook (chef)
  • critic / reviewer
  • gardener
  • table setter
  • mechanical turk
  • post man/woman
  • babysitter
  • mystery shopper
  • chauffeur
  • home and office mover
  • handy man/woman
  • telephone/broadband installer/repair man/woman
  • call centre agent
  • lollipop man/woman
  • school teacher
  • office secretary
  • family mediator
  • office mediator
  • crop duster
  • nursing home nurse
  • doctor and nurse
  • nanny
  • lawyer
  • accountant
  • assembly line worker
  • dentist
  • data entry clerk
  • journalist
  • financial analyst
  • comedian
  • musician
  • artist
  • telemarketer
  • paramedic
  • commercial and defence pilots
  • public transport worker
  • rail repair
  • air traffic controller
  • land traffic controller
  • sea traffic controller
  • metrologist
  • kitchen porter
  • crop pickers
  • police man/woman
  • fire man/woman
  • immigration/border controller
  • politician
  • director
  • photographer
  • creative writer
  • curator
  • cheerleader
  • gamer
  • construction worker
  • programmer
  • logging worker
  • fisher man/woman
  • steel worker
  • street sweeper
  • refuse collector
  • carpenter
  • stunt man/woman
  • courier
  • wrestler
  • boxer
  • sports man/woman
  • recycle waste worker
  • power worker
  • farmer
  • roofer
  • astronaut
  • army & military officer
  • bodyguard
  • slaughterhouse worker
  • mechanic
  • metalcrafter
  • search & rescue
  • special forces (SAS, Delta Force, SEALs, etc.)
  • sanitation worker
  • land mine remover
  • miner
  • bush pilot
  • lumberjack
  • librarian
  • human resources assistant
  • salesman
  • editor
  • dance instructor
  • bus conductor
  • tourist guide
  • stewardess
  • cashier
  • store replenisher
  • data center operator
  • taxi cab driver
  • train driver
  • lorry driver
  • customer service advisor
  • electrician
  • vehicle washer
  • bed maker
  • bathroom cleaner
  • pet walker
  • oilfield driver
  • derrick hand
  • roustabout
  • offshore diver
  • rodent killer
  • insect killer
  • therapist
  • architect
  • actor
  • backup singer
  • backup dancer
  • house builder
  • waiter
  • presenter
  • manager
  • hacker
  • stripper (exotic dancer)
  • sex worker
  • hairdresser
  • makeup artist
  • fashion designer
  • cameraman
  • researcher
  • chemist
  • pharmacist
  • landscapist
  • baker
  • ship builder
  • car maker
  • broadcast technician
  • hotel helpdesk
  • store helpdesk
  • mall helpdesk
  • site assistant
  • tailor
  • tutor
  • pet trainer
  • cartoonist
  • reporter
  • moderator
  • painter
  • plumber
  • auditor
  • financial trader
  • financial broker
  • financial advisor
  • compliance advisor
  • fraud advisor
  • risk advisor
  • surveillance agent
  • social media agent
  • bricklayer
  • choreographer
  • actuary
  • physiotherapist
  • tea/coffee maker
  • pizza maker
  • burger maker
  • welder
  • surveyor
  • surgeon
  • glazier
  • tiler
  • stonemason
  • optician
  • tool maker
  • artisan
  • sonographer
  • radio technician
  • sports coach
  • bartender / barmaid
  • bellboy
  • paperboy
  • drain inspector
  • pet feeder

13 July 2019

Lucid Pipeline

Most AI solutions can be built as pipelined implementations, with various sources and sinks, from a set of generalizable models. Invariably, a knowledge graph will act as a key layer for evolvable feature engineering that can be translated into ontological vectors and fed into AI models. The pipeline can be split into a lucid funnel, lucid reactor, lucid ventshaft, and lucid refinery, using a loose analogy of a distillation process. The following components highlight the key abstractions:

AI/DS Engine Layers:

  • Disc (frontends - discovery/visualization layer)
  • Docs (live specs via Swagger, etc - documentation layer)
  • APIs (proxy/gateway services connected with Elasticsearch or Solr - application layer)
  • DS (models and semantics - AI layer)
  • Eval (benchmarks, workbench and metrics - evaluation layer)
  • Human (optional human in the loop - human/annotation layer)
  • Tests (load, unit, UAT, service, etc - testing layer)
  • Funnel (ingestion, pre-process, post-process layer using brokers like Kafka/Kinesis)
  • Reactor (reactive processes - workflow/transformational layer - via Spark, Beam, Flink, Dask, etc)
  • Ventshaft (filter, match, fuzzy, distance, probabilistic, relational - functional layer)
  • Refinery (context types, objects, attributes and methods as blueprints - entity/object layer)
  • Datapiles (indexed data sources as services for document/column/graph stores - data access layer)
  • Conf (environment configurations for nginx, etc - configuration layer)
  • Cloud (connected services for AWS/GCP orchestration - infrastructure/platform layer)
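
To make the funnel, reactor, ventshaft, and refinery stages more concrete, here is a toy sketch that chains them as plain Python generators. Everything in it is a hypothetical stand-in: the function bodies, the `Entity` shape, and the sample records are assumptions for illustration, and a real build would sit on brokers and engines such as Kafka or Spark as noted in the list above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Entity:
    """Refinery output: a typed object with attributes (hypothetical shape)."""
    kind: str
    name: str


def funnel(raw_lines: Iterable[str]) -> Iterator[str]:
    """Ingestion and pre-processing: trim whitespace, drop empty records."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield line


def reactor(records: Iterable[str]) -> Iterator[str]:
    """Transformation: normalise case (stand-in for a distributed job)."""
    for record in records:
        yield record.lower()


def ventshaft(records: Iterable[str], keyword: str) -> Iterator[str]:
    """Functional filtering/matching: keep records containing the keyword."""
    return (r for r in records if keyword in r)


def refinery(records: Iterable[str], kind: str) -> List[Entity]:
    """Entity layer: wrap the surviving records as typed objects."""
    return [Entity(kind=kind, name=r) for r in records]


# Hypothetical raw input.
raw = ["  Aspirin 100mg ", "", "IBUPROFEN 200mg", "Bandage"]
entities = refinery(ventshaft(reactor(funnel(raw)), "mg"), kind="medication")
print(entities)
```

Because each stage only consumes and yields an iterable, stages can be swapped or tested in isolation, which is the property the layered split is aiming for.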

11 June 2019

Sports Extended Reality

  • Moment is everything (pre-event, event, post-event)
  • Nothing is actually live
  • Flat images and sense of presence
  • 20/80 rule
  • Time matters
Attributes of a sports experience:
  • Hawkeye cameras
  • Replays
  • Live scores
  • Clock graphics
  • Broadcast video + audio (including commentary)
  • Live text updates
  • Fast cuts
  • Music
  • High-paced graphics
  • Social media interaction (optional)
Live is a belief, an illusion in sports. There are three stages of live sports: Live, Live Live, and Live-To-Record (or Live-To-Tape). An important aspect is the focus on driving a story (narrative) and providing consumers with a sense of control.

16 May 2019

Lexicon Model for Ontologies


Industrial Data Science

Over the years, Data Science has emerged to play a pivotal role in many industry sectors, providing avenues for analytical growth and insights towards more effective products and services. However, several glaring aspects of the field are riddled with misconceptions and ineffective practices. Traditional Data Science was about Data Warehouses, a relational way of thinking, Business Intelligence, and overfitted models. In the current landscape, however, Artificial Intelligence as a discipline is more about an out-of-the-box style of thinking and is having an impact on Data Science practice. Data Engineering and Data Science functions tend to merge into one in AI practice. Relational Algebra is replaced with semantics and context via Knowledge Graphs, which form the important metadata layer for a Linked Data Lake. While traditional Data Science relied fully on statistical methods, the new approaches combine Machine Learning with Knowledge Representation and Reasoning in a hybrid model for better Transfer Learning and generalizability. Deep Learning, which is a purely statistical method and a sub-field of Neural Networks, is by its very nature implemented as a set of distributed and probabilistic graphical models. It makes very little sense to split teams between Data Engineering and Data Science, as the person building the model also has to think about scalability and performance. Invariably, splitting teams means duplication of work, communication issues, and degradation of output in production (when passed from Data Science to Data Engineering). In many AI domains, there is an inclination towards open-box thinking about problems. In AI, only 30% of the effort is Machine Learning, while the remaining 70% is Computer Science principles and theory. Evidence of this can be seen in the Norvig book, which is often the basis of many taught AI 101 courses.
Often at universities, advanced courses neglect to cover the entire Data Science method and stress only Machine Learning and statistical methods, at the exploratory stage, while forgetting the rest of the Computer Science concepts. As a result, we see many Data Scientists with PhDs who are ill-equipped to tackle practical business cases: productionizing their models against small and large datasets, with appropriate Feature Engineering for semantics, and the associated pipelining. Furthermore, at many institutions Feature Engineering is skipped entirely, even though it is really 70% of the Data Science method and possibly the most important stage of the process. Invariably, this Feature Engineering step is partially transferred over to the Data Engineering function. One has to wonder why the Data Scientist is doing only 30% of the work in the Data Science method, even after holding a PhD, while passing the remainder of the hard work to the Data Engineer as part of the formal ETL process. The whole point of a Knowledge Graph is to add the value of semantics and context to your data, moving towards information and knowledge. This becomes very important not only for Feature Engineering but also as a feedback mechanism, where one can cyclically improve the model's learning while allowing the model to improve on the semantics in a semi-supervised manner. The Knowledge Graph also enables natural language queries, making the data available to the entire organization. There is no longer a need to hire specialists who understand SQL in order to produce Business Intelligence reports for the business. The whole point is to make data available and accessible to the entire organization while also increasing efficiency and enabling a manageable way of attaining trust through centralized governance and provenance of the data.
This enables data to adapt to the organization's needs, rather than the organization having to adjust resources to the needs of working with the data. There needs to be a shift in the way many organizations build Data Science teams, in how the subject matter is taught at universities, and in how they architect AI transformation solutions. Although Deep Learning is good at representation learning, it initially requires a large amount of training data. Where a large amount of training data is lacking, one can rely on semantic Knowledge Graphs, human input, and clustering techniques to get further with Data Science executions, which in the long term will have far greater benefits for an organization. Many organizations seem to ignore the value of metadata at the start, and as the data grows this adds to the complexity and the many challenges of integration. Why must we always push for only statistical methods if much of the direct value can be attained through inference over semantic metadata, or a combination of both approaches? Probability is by nature unintuitive for humans. When does the average human ever think in statistics as they go about their daily lives, from traveling to work to buying groceries at a supermarket to talking to a colleague on the phone? Hardly ever. And yet an average human is still smarter in many respects, across domains of understanding and adaptability through transfer learning and semantic associations, than the most sophisticated Machine Learning algorithm that can be trained to be good at a particular task. However, when the human Data Scientist arrives at work, they reduce the scope of the business problem-solution case to merely statistically derived methods.
If AI is to move forward, we must think beyond statistical methods when working through complex business cases and flexible semantics, and take more inspiration from the human mind for all the things we already take for granted in our daily lives that machines still find significantly complex to understand, adapt to, and learn.

29 April 2019

29 March 2019

AI Ethics

There are a lot of people claiming to be AI ethics experts, but the field is only just emerging. In the mainstream, the topic has only been around for a short while. So how can someone be an expert in it when there are still lots of unanswered research questions in the area?

  • How can one focus on ethics without also focusing on morals? Morals are the basis of ethics?
  • Does ethics in AI intrinsically have a universal equivalence? For example, something that is codified as ethical in the West may not be sufficiently compatible with the East. Which attributes of ethics hold under universal quantification (for-all) versus existential quantification (there-exists)?
  • How can one control the abuse and falsely manipulated justification of AI ethics? For example, someone trying to drive political/cultural change/influence in a society/organization using AI ethics?
  • How do you make sure that the people in control of ethics, who by their own account call themselves ethics experts, are in fact ethical? Is AI only as ethical as the human who programmed it? Can the codification of AI ethics be programmed to mutate as defined by the environment and the changing norms of society, thereby allowing the AI agent to question ethical and moral dilemmas for/against humans?
  • If one builds a moral reasoner in Horn clauses, can such reasoning then genetically mutate for ethics, on a case-by-case basis, for conditioning of an AI agent? Can AI agents be influenced by other AI agents, as in a multiagent distributed system - argumentation via game theory, reinforcement policies, towards mediation and consensus?
  • Can ethics and morals be defined in a semantically equivalent language?
  • If one defines Horn clauses for moral reasoning and a set of ethical rules, can such moral/ethical conundrums be further defined using Markov decision processes, in the form of a neural network, for any and all states as good enough coverage of a global search space that can be further reasoned over with transfer learning?
  • How do you resolve human bias in a so-called AI ethics expert?
  • Who defines what is ethical and moral for AI? Is there an agreed gold standard of measure?

In general, a moral person wants to do the right thing, with a moral impulse that drives the best intentions. Morals define our principles, while ethics tend to be more practical: a set of codified rules that define our actions and behaviors. Although the two concepts are similar, they are neither interchangeable nor aligned in every case. Ethics are not always moral, and a moral action can also be unethical.

AI Ethics Lab 

17 January 2019

NLP Games with Purpose

Games with a purpose are essentially types of games applied to annotations in NLP to make the process fun for the oracle (annotator), often in a crowdsourced manner. A few examples in context are listed below:

  • Phrase Detectives
  • Sentiment Quiz
  • Guess What
  • ESP Game

Active Learning

Active Learning Approaches:
  • Pool-based Sampling
  • Membership Query Synthesis
  • Stream-based Selective Sampling
  • Uncertainty Sampling
  • Query-by-Committee
  • Expected Model Change
  • Expected Error Reduction
  • Variance Reduction
  • Density-Weighted Methods
  • Query from Diverse Subspaces
  • Exponential Gradient Exploration
  • Balance Exploration and Exploitation
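
Of the approaches above, pool-based uncertainty sampling is perhaps the simplest to illustrate: score every unlabeled item in the pool and send the least certain ones to the oracle. The sketch below assumes a binary model that returns a positive-class probability; `toy_model`, the pool values, and the batch size are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence


def uncertainty(p_positive: float) -> float:
    """Least-confidence score for a binary model; highest when p is 0.5."""
    return 1.0 - max(p_positive, 1.0 - p_positive)


def select_queries(
    pool: Sequence[float],
    predict_proba: Callable[[float], float],
    batch_size: int,
) -> List[float]:
    """Return the pool items the model is least sure about."""
    ranked = sorted(pool, key=lambda x: uncertainty(predict_proba(x)), reverse=True)
    return ranked[:batch_size]


def toy_model(x: float) -> float:
    """Hypothetical stand-in model: probability grows linearly with x."""
    return min(1.0, max(0.0, x / 10.0))


# Hypothetical unlabeled pool of one-dimensional examples.
pool = [1.0, 4.0, 5.5, 9.0, 6.5]
queries = select_queries(pool, toy_model, batch_size=2)
print(queries)
```

In a full loop, the selected items would be labeled by the annotator, added to the training set, and the model retrained before the next round of selection.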

The Language Grid

13 January 2019

Data Science Methods

Generalizable Method (your mileage may vary, given business case and time constraints):
  • Identify and understand business case (use story) - in most cases you are not provided a use story so it is really about understanding the problem
  • Explore and prototype including background research (exploratory stage)
  • Identify cases for reuse
  • Identify whether this story even requires a model
  • Identify relevant datasets - curation
  • Visualize the data (how sparse/dense/dirty it is, multiple open source tools available for refinement steps for features, identify additional effort necessary for model build)
  • Identify the relevant variances and biases (will the model steps lead to an overfitting or underfitting - the objective is to build a generalizable model)
  • Feature Selection/Extraction (may use other ML or natural computation techniques here)
  • Feature engineering (this may also include curation/enrichment of metadata)
  • Feature re-engineering (this may also include curation/enrichment of metadata)
  • Identify the simplest solution that is possible
  • Identify the reasoning of using a complex solution
  • Custom model to solve the business case (do not just copy model out of a research paper - this is what the exploratory stage was for)
  • Evaluation and Benchmarking (formal tests may/may not be used, depends on business case)
  • How well does the model cope with small data and large data - identify sufficiency at average and worst time/cost
  • Re-Tune/Rinse/Repeat
  • Incrementally improve the model
  • Incrementally optimize/scale the model (scale only when necessary)
  • Keep one simple model and one complex model - one that is sub-par, and one that is riskier
  • Evaluation and Documentation
  • Pipeline the Solution in Dev-mode (Identify bottlenecks with the model - dry run/end-2-end for integration - at this stage a repeatable build/test/deploy/evaluate cycle may be used - DS/DE)
  • A/B/N/Bandit Testing in Stage (generally this stage is covered by the product team, alongside automated acceptance tests, if they know the techniques, or DS/DE may be involved)
  • Release/Integrate for Production (depends whether this is a B2B or B2C case, or beta mode)
  • Storytelling (how well does the model answer/solve the question or problem statement - 'through the looking glass' - refers to both dev, stage/prod cases)
  • User/Stakeholder/Client Feedback (Rinse & Repeat, depending on B2B or B2C cases)
  • Incremental Analysis and Review of Models
  • Rinse & Repeat (some of the steps above repeated multiple times before production release)

Process Flow:

R&D → Dev/UI/UX → Prod

Generally, with a heavy R&D/backend-focused team, the features and functionality tend to be dictated by the forward flow (bottom-up approach); most AI projects at startups tend to be built that way. The frontend then becomes a thin client, a view to the world for assimilating the backend efforts; the typical pattern tends to be an informational dashboard for storytelling - 'through the looking glass'. This is because, in a top-down approach, many of the backend efforts would get lost in translation (equally, in some business cases it may work better).

Data → Information → Knowledge

State-of-the-art may not imply state-of-the-art for your business case, and may in fact lead to sub-optimal results and more effort. It is all very subjective and depends on the data, the associated features for training a model, and the business case you are trying to solve. Work towards the least effort and the most efficient or sensible outcome.

8 January 2019

NLP High-Performance Computing

There are two primary approaches for working towards high-performance computing in NLP domains:
  • Add GPUs to server
  • Connect CPUs on multiple servers
Scaled-out approaches generally work towards maximizing constant RAM utilization, where they are able to automatically traverse the computational graph to allocate resources and optimize throughput. In many cases, particularly for deep learning models, heavy acceleration of parallelized matrix multiplications makes a big difference. In neural networks, backpropagation is more computationally expensive than forward activation. Once the model is trained, the weights and structure can be exported to any hardware for model prediction, which needs only a forward (inference) pass.

Approximate Nearest Neighbor Matching

BallTree (NMSLib)
Brute Force (BLAS)
Brute Force (NMSLib)
KeyedVectors (Gensim)
SW-Graph (NMSLib)
KGraph (NMSLib)
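
For orientation, the brute-force entries above are the exact baseline that approximate indexes trade accuracy against for speed. The sketch below shows that baseline in plain Python with cosine similarity. It does not use NMSLib, Gensim, or BLAS, and the three-dimensional vectors and word labels are made-up values for illustration only.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Exact search: score every stored vector, then keep the top k."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings.
index = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}
top = nearest([0.88, 0.82, 0.1], index, k=2)
print([name for name, _ in top])
```

The cost is linear in the number of stored vectors per query, which is why graph-based or tree-based indexes become attractive once the collection grows large.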

ANN Benchmarks

Chatbot Prizes

Loebner Prize
Alexa Prize
Winograd Schema
Marcus Test
Lovelace Test

SentencePiece

Sentence Segmentation

NLTK - Punkt



DeepMind QA


Visual Question Answering
