31 August 2016

Artificial Intelligence for Retail

The list below outlines some areas for feature engineering that could be addressed with various machine learning and deep learning techniques. There are also options here to build significant robots or drones.

In-Store Analytics (conversion of customers when in the store)
  • sentiments of customers
  • product vs purchase stock history
  • order fulfillment
  • stock and inventory monitoring
  • loyalty promotions
  • personalization
  • helping customers find bargains
  • identifying customer shopping basket history
  • maximization of conversion
  • shelving analysis (which products get bought more when placed next to which other products)
  • supply-chain on-demand by product (product just bought, re-shelve, check stock availability)
  • curiosity shopping conversion
  • price analytics
  • offer tracking
  • streaming offers for loyalty customers
  • e-receipts
  • discount tiers (the more a customer buys, the more discounts they get)
  • targeting age group buying habits
  • semantic search relevance
  • customer agents
  • cashier agents
  • warehouse agents
  • more-like-this suggestions
  • track customer experience
  • deep insights on product recommendations (I like this heel, this color, this buckle, perfect!)
  • nutritionist/wellness agents (for health-conscious customers)
  • product tabs
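
As a minimal sketch of the shelving analysis idea in the list above (which products tend to be bought together), the following counts how often each pair of products appears in the same basket. The basket data and the function name pair_counts are hypothetical and chosen purely for illustration; a real analysis would also need to account for how popular each product is on its own.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each unordered product pair appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicate items and gives each pair a stable order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical baskets, for illustration only
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["butter", "jam"],
    ["bread", "milk"],
]

counts = pair_counts(baskets)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are candidates for adjacent shelving or joint promotions.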

Out-Store Analytics (People passing outside the store)
  • competitor insights (is the product cheaper at Y retailer, is the product available at X retailer)
  • insight on window dressing (engage the right window dressing to attract customers)
  • insights on social media/viral marketing
  • promotions
  • augmented reality (customers can check product availability at X retailer, price, promotions, etc.)
  • lead generation
  • product trends
  • reviews
  • social sentiment about the store/identify why the customer does not enter the store
  • advertising effectiveness (advertising-to-conversion ratio)
  • local store optimization (location)

Types of customers:
  • Curiosity/Bargain Shoppers
  • Spendthrift/Impulse Shoppers (heavy shopping one day, no shopping the next - mood swingers)
  • Loyalty/Informed Shoppers
  • Indecisive Shoppers
  • Wanderers
  • Complainers
  • Green Shoppers (Vegans, Weight Watchers, Calorie/Nutrition, Organic, Free Range, Gluten-Free, Religious)

Other Areas:
  • Time of Day (night, morning, afternoon, weekday, weekend, bank holiday)
  • Product Categorization and Labeling
  • Teens
  • Professionals
  • Parents
  • Pensioners
  • Students
  • Tourists
  • Singles
  • Kids

Core Areas of Retail Analytics:
  • In-Store (local user experience)
  • Out-Store (local user experience)
  • Ecommerce (online user experience)
  • Home Delivery (remote user experience)
  • Supply-Chain & Logistics (inventory/product/stock/transportation/warehousing)
  • Pricing/Loyalty (core sales/loyalty user experience/competition)
  • Contextual Advertising (core marketing)
  • Social Media/Rumor Mill (Sentiment Analysis - Reviews/Brand/Product/Event/Experience)
  • Recruitment (staffing)
  • Geographical (Locational/Regulatory Compliance)
  • Security (Local/Online/Remote)
  • CRM (core 360)

Key In-Store Questions for Analytics:
  • Shrinkage - shoplifters/theft/weak links
  • Managing the Moment - achieving customer needs in real-time
  • Measure Customer In-store experience (wants/needs/desires)
  • What drives sales conversion
  • What is not in the basket
  • Feedback on Promotions/Effectiveness of Promotions
  • Complex Data Insights/Summaries
  • Information on ROI - Return on Investment
  • Optimization for Product Mix - What moves the Shopper to Purchase
  • Shopper vs Buyer

Pricing and Semantic Publishing Pipeline


Microservices Subsystems for Data Protection


Apache Spark Architecture



CV and JobSeeker Profile Enrichment Pipeline


Entity Extraction Enrichment Pipeline


Generalization of Machine Learning Pipeline

Open-Domain Question/Answering Pipeline


Data Science Projects

Data Source | Description
Land Registry 10 Years Data | Build a story visualization of sold property prices and a timeline of trends across the UK
Marvel API | Using the Marvel API and social media, collect, mine and build a comical visualization story for characters
TFL Data Feeds | Track TFL data across London
Local Urban Data | WhatsOn, Congestion, Events, Hubbub, GeoLocation
Social Media, Blogs, News, Reviews | Product or brand tracking/engagement on the web
GitHub, Twitter, Meetups, Quora, Stack Overflow, Mailing Lists, StackShare | Monitor/track technology trends (BigData, ML, Batch/Stream Processing, etc.)
Social Media, Blogs, News, Alerts | Monitor and visualize political risk, events, and trends with a story timeline
Google N-Grams, Gutenberg, Wiktionary, WordNet, etc. | Spelling checker using word2vec/GloVe
Single and Multi-Documents (News Feeds, Journals, Business Documents, etc.) | Information extraction (Summary, Topic Tags, Language Detection, Author, etc.)
Santander | Measuring customer satisfaction
HomeDepot | Search relevance of search terms
Companies House, Social Media, Corporate Sites, Compliance, AngelList | Track companies with partners, creditors, suppliers, sponsors, buyers
Walmart | Use historical data to predict store sales
Historical Stock Prices, News | Monitor and track stock prices and news for forecasting
WorldBank Datasets & Indicators, UK Office for National Statistics, US Census Data, IMF Data, Census Hub, and others | Track and visualize census data across regions
World University Rankings | Find the best universities of the world
World Food Facts | Find the nutrition facts in foods
Reddit Comments | Storytelling and visualization of contextualized comments on Reddit
Handwriting and Digits | Training a computer to detect handwriting
Faces | Training a computer to detect facial expressions
Twitter and others | Building a profile of how people view the EU
Cats and Dogs Dataset | Distinguish dogs from cats
Any music/video stream | Write a stream sampler that takes a random (representative) sample of size k from a stream of values of unknown and possibly very large length. The sampler should work with two kinds of input: values piped directly into the process (stdin), and values generated using a good random source.
Expedia Hotels | Which hotel type will an Expedia customer book; learning to rank hotels
Amazon Fine Foods | Analyze reviews: what does the product-reviewer graph look like? What words tend to indicate positive and negative reviews? What types of food products get reviewed the most? How does the review score distribution vary across reviewers? What makes a review helpful?
NIPS 2015 | Analyze and explore research papers and citations
Data Curation/Scraping + DBpedia | Ontology engineering of a few custom/domain contexts, scraping, building a commonsense graph/reasoning
Anomaly Detection (Spam, Fraud, Fault, Network) | Monitor/track/identify anomalies from data
Domain Data | Monitor/track domain websites
Images/Videos/Music/Shows/News Feeds/Twitter/Facebook/Reviews | Develop semantic recommendations (processing multiple types of streaming)
FAQ sources | Build a FAQ graph and recommendation for technology
Recipes, Barcodes, etc. | Mining ingredients for: wellness, nutrition, religion, quantified self, fitness and health
Museum, gallery, and library (WorldCat) datasets, catalogs, Library of Congress, etc. | Mining and visualization of connected archives
Relevant contextual dataset | Real-time topic extraction in NLP using LDA, to drive recommendations
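
The stream sampler project listed above is commonly approached with reservoir sampling (Algorithm R), which keeps a uniform random sample of size k without knowing the length of the stream in advance. The sketch below is one possible take under stated assumptions: values arrive one per line on stdin, and Python's random.SystemRandom stands in for "a good random source". The function names reservoir_sample, stdin_values and random_values are illustrative, not part of any project specification.

```python
import random
import sys


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable of unknown length."""
    if rng is None:
        rng = random.SystemRandom()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1)
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def stdin_values():
    """Yield values piped into the process, assuming one value per line."""
    for line in sys.stdin:
        yield line.rstrip("\n")


def random_values(n):
    """Yield n floats from the operating system's random source."""
    rng = random.SystemRandom()
    for _ in range(n):
        yield rng.random()


# Demo with generated values; the stream is consumed lazily, one item at a time
sample = reservoir_sample(random_values(1000), k=5)
print(sample)
```

To sample piped input instead, pass stdin_values() as the stream, for example reservoir_sample(stdin_values(), k=5). Memory use stays proportional to k regardless of stream length.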

Public Data Sources

24 August 2016

NER Projects

Named Entity Recognizers perform a form of information extraction that focuses on named entities, classifying them into predefined categories and optionally using entity linking. Annotation is a fundamental aspect of this classification. Quality is usually measured with precision, recall and the F1 score (the harmonic mean of precision and recall). Systems are also often evaluated against a gold standard: a benchmark that is available under reasonable conditions, or the most accurate test possible without restrictions, which is treated as the ground truth. A number of open source and commercial projects are available for NER. One can also use the semantic web, in the form of a thesaurus server, to apply SKOS schemes as a way of classifying or annotating terms with embedded URIs. Applications of PoolParty or Apache Stanbol provide further examples.
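
To make the quality measures concrete, here is a minimal sketch of entity-level precision, recall and F1. It assumes exact-match scoring, where a predicted entity counts as correct only if both its span and its type match the gold standard; other scoring schemes exist. The (start, end, type) tuple layout and the example annotations are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entities against gold entities using exact matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations as (start_token, end_token, entity_type)
gold = {(0, 2, "PERSON"), (5, 6, "ORG"), (9, 10, "LOC")}
predicted = {(0, 2, "PERSON"), (5, 6, "LOC"), (9, 10, "LOC"), (12, 13, "ORG")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here two of the four predictions match the gold standard, so precision is 2/4, recall is 2/3, and F1 is 4/7.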

20 August 2016

Words and Vectors

Word embeddings have become an active research area in Natural Language Processing, driven by deep learning techniques for deriving vectors that capture meaning. Word2Vec is a widely used embedding technique whose vectors are often used to cluster or compare words. Its input is a text corpus and its output is a set of feature vectors for words. Many libraries provide implementations of word embeddings, including Gensim, DL4J, Spark, and others. The following are some variations on the same Word2Vec approach.

Doc2Vec (aka Paragraph2Vec, Sentence2Vec, Text2Vec)
Phrase2Vec
Sequence2Vec
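
As a rough illustration of how Word2Vec consumes a text corpus, the sketch below generates the (center, context) training pairs used by the skip-gram variant. It covers only this data preparation step, not the neural network training that produces the vectors. A fixed window is assumed for simplicity, whereas real implementations may vary the window or subsample frequent words; the corpus and the function name skipgram_pairs are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical one-sentence corpus, tokenized on whitespace
corpus = "the quick brown fox jumps".split()

for pair in skipgram_pairs(corpus, window=1):
    print(pair)
```

A skip-gram model is then trained to predict the context word from the center word, and the learned weights become the word vectors.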

16 August 2016

Open Semantic Search

Open Semantic Search appears to be a new open source project in semantic search, and the range of features it aims to cover looks quite useful. It is still a very new project with much left to implement, but it is worth tracking.

Popular BigData and Machine Learning Libraries

Machine Learning libraries and frameworks are constantly evolving, but no single tool fits every solution. As more and more libraries appear, the choice may grow to the point where many are abandoned or reworked for the cloud in order to meet the data processing requirements of scaling out. Some libraries already have a massive following in industry; a few examples are listed below. Languages like Python, Java, Scala, and C++ are the most suited to this kind of work, although languages like Go are not far behind. Most of these libraries are closely tied to progress in academic research, which also gives an indication of which new approaches can be used now and what may be possible in the future.

TensorFlow
DL4J
DataFlow
Flink
Spark
Theano
scikit-learn
GraphLab
Mahout
Spring XD