14 December 2018

Document Similarity Measures

String Matching
  • Edit Distance
    • Levenshtein
    • Smith-Waterman
    • Affine
  • Alignment
    • Jaro-Winkler
    • Soft-TFIDF
    • Monge-Elkan
  • Phonetic
    • Soundex
    • Translation
Distance Matching
    • Euclidean
    • Manhattan
    • Minkowski
  • Text Analytics
    • Jaccard
    • TFIDF
    • Cosine Similarity
Relational Matching
  • Set Based
    • Dice
    • Tanimoto (Jaccard)
    • Common Neighbors
    • Adar Weighted
  • Aggregates
    • Average values
    • Max/Min values
    • Medians
    • Frequency (Mode)
Other Matching
    • Numeric distance
    • Boolean equality
    • Fuzzy matching
    • Domain specific
  • Gazetteers
    • Lexical matching
    • Named Entities (NER)
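
A minimal sketch of three of the measures listed above (Levenshtein edit distance, Jaccard set similarity, and cosine similarity over raw term counts), using only the Python standard library. The function names are illustrative, and the cosine here uses plain term frequencies rather than TFIDF weights, which is a simplifying assumption.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc1 = "the quick brown fox".split()
doc2 = "the quick red fox".split()
print(levenshtein("kitten", "sitting"))
print(jaccard(set(doc1), set(doc2)))
print(cosine(Counter(doc1), Counter(doc2)))
```

Edit distance works on characters, while the Jaccard and cosine variants here work on whitespace tokens; a real pipeline would likely normalize and tokenize more carefully.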

12 December 2018

Argument Mining

Argument Interchange Format (AIF)
Argument Markup Language (AML)
SALT - Rhetorical Ontology (SRO)
Ethical Reasoners
Argument Mappings
Solvers
Factor Graphs for Inference
Ensemble approaches
SWAN: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4536833/
Discourse Elements: https://sparontologies.github.io/deo/current/deo.html
ORB: https://www.w3.org/TR/hcls-orb/
SPAR: http://www.sparontologies.net/ontologies

DebateGraph: https://debategraph.org/Stream.aspx?nid=61932&vt=ngraph&dc=focus

Bias Lexicons: https://www.cs.cornell.edu/~cristian/Biased_language.html | https://web.stanford.edu/~cgpotts/data.html
Argument Corpus: https://nlds.soe.ucsc.edu/iac

Argument Mining
Argument Mining2
Argument Mining3
Argument Mining Workshops
ArgumenText
Multi-Task Learning for Argumentation Mining in Low-Resource Settings
Argument Mining A Data-Driven Analysis
Role of Argumentation in the Rhetorical Analysis
Unsupervised Corpus-Wide Claim Detection
Opinion based Argument Mining
Argument Mining Implementation
Argumentation

Companies Tackling Fake Content

Veriflix
Fabula
Factmata
Distil Networks
Digital Shadows
Perimeterx
Crisp Thinking
Rappler
UserFeeds
Google FactCheck
Meedan
CrowdTangle
Snopes
Newswhip
Le Decodex
Pheme
Contratobook
Ananas
Talos
New Knowledge
Axios
WikiTribune
StopFake
Adverif
knowhere
nwzer
deepnews.ai
robhat labs
civil
storyzy
newsguard
cheq
bs detector
trustednews
surfsafe
cyabra
captainfact
astroscreen

http://bit.ly/2PifdKV



Journalism Credibility

Entity Linking

Babelfy
DBPedia Spotlight
Fred
AIDA
Dexter
Freme
Nerd
xLisa
Tagme
Kea
EntityClassifier
Fox+

Gerbil Benchmarking Framework

Snorkel

DeepType

DeepDive

7 October 2018

Types of Deep Learning

Type | Group
Attentional Interface | Attention-Memory
Memory-Attention Networks | Attention-Memory
One-Shot Associative Memory | Attention-Memory
KeyValue Memory Networks | Attention-Memory
Compositional Attention Network | Attention-Memory
Deep Memory Network | Attention-Memory
Structured Attention Network | Attention-Memory
Hyperbolic Attention Network | Attention-Memory
Multi-Cast Attention Network | Attention-Memory
Bi-Directional Attention Flow | Attention-Memory
Variational Autoencoder | Autoencoder
Autoencoder | Autoencoder
Denoising Autoencoder | Autoencoder
Sparse Autoencoder | Autoencoder
Contractive Autoencoder | Autoencoder
Feedforward | Basic
Perceptron | Basic
Multilayer Perceptron | Basic
Deep Convolutional Network | CNN
Convolutional Deep Belief Network | CNN
Convolutional GAN | CNN
DeConvolutional Network | CNN
Deep Convolutional Inverse Graphics Network | CNN
Geometric Deep Learning | CNN
Convolutional Kernel Networks | CNN
Convolutional Autoencoder | CNN
Hierarchical Convolutional Deep Maxout Network | CNN
Deep Belief Network | DBN
Continuous DQN | DQN
Deep Q Network | DQN
Dueling DQN | DQN
Episodic-Memory DQN | DQN
Bidirectional LSTM | LSTM
Convolutional LSTM | LSTM
Grid LSTM | LSTM
Long Short Term Memory | LSTM
Peephole LSTM | LSTM
Phrasal LSTM | LSTM
Hierarchical LSTM | LSTM
Gated Recurrent Unit | LSTM
Adaptive Resonance Theory | Modular
Maximum Entropy | Modular
Counterpropagation | Modular
Spline | Modular
Gaussian | Modular
Neocognitron | Neural
Neural Programmer | Neural
Neural Turing Machine | Neural
Neuro-Fuzzy | Neural
Neuroevolution | Neural
Neural Associative Memory | Neural
Neural Hawkes Process Memory | Neural
Sequence-2-Sequence | Other
Deep Feedforward | Other
Deep Neural Network | Other
Helmholtz Machine | Other
Hopfield Network | Other
Kohonen Network | Other
Compound Hierarchical Deep Model | Other
Dense Associative Memory | Other
Hierarchical Temporal Memory | Other
Large Memory Storage and Retrieval Network | Other
Generative Adversarial Network | Other
Associative Neural Network | Other
Adaptive Computation Time | Other
Deep Coding Network | Other
Deep Deterministic Policy Gradient | Other
Deep Predictive Coding Network | Other
Deep Reservoir Computing | Other
Deep Residual Network | Other
Deep Stacking Network | Other
Diffusion Network | Other
Echo state Network | Other
Elman Jordan Network | Other
Extreme Learning Machine | Other
Instantaneously Trained Neural Network | Other
Learning Vector Quantization | Other
Liquid State Machines | Other
Spiking Neural Network | Other
Tensor Deep Stacking Network | Other
Radial Basis Function | Other
Recursive Neural Network | Other
Markov Chain | Probabilistic
Deep Bayesian Neural Network | Probabilistic
Deep Markov Model | Probabilistic
Stochastic Neural Network | Probabilistic
Spike and Slab RBM | RBM
Boltzmann Machine | RBM
Restricted Boltzmann Machine | RBM
Bidirectional RNN | RNN
Clockwork RNN | RNN
Continuous Time RNN | RNN
Dilated RNN | RNN
Hierarchical RNN | RNN
Recurrent Neural Network | RNN
Second Order RNN | RNN
Multi-Time Scales RNN | RNN
Recurrent Multilayer Perceptron | RNN
Deep Kernel Machine | SVM
Support Vector Machine | SVM
Shallow Neural Networks | ThoughtVectors/WordVectors

*Shallow = one hidden layer in NN
*Deep = more than one hidden layer in NN

7 July 2018

Mailer Campaign Uplift Modeling

Profit(C) = ExpectedProfit(C) x [P(B | V) - P(B | C)] - AdCost(C)

  • P(B | C) - probability of buying given control without ad campaign (Naive Bayes)
  • ExpectedProfit(C) - profit to make from customer if they decide to buy (Regression)
  • P(B | V) - probability of buying given variant of ad campaign (Naive Bayes)
  • AdCost(C) - cost to mail campaign to customer as a constant
  • likely to take into account market or customer segmentation
  • regression could be either logistic or linear
  • total profit depends on how much the customer buys under the control, under the ad campaign, or both
  • the ad campaign can be optimized against the customer conversion ratio
  • customer data on average spend can feed the expected profit estimate
  • there are other ways to model the same contextual measures of profit
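
The formula above can be written out directly. This is a minimal sketch: the function names and the example numbers are made up for illustration, and in practice the two probabilities and the expected profit would come from fitted models rather than constants.

```python
def campaign_profit(expected_profit: float,
                    p_buy_variant: float,
                    p_buy_control: float,
                    ad_cost: float) -> float:
    """Profit(C) = ExpectedProfit(C) x [P(B | V) - P(B | C)] - AdCost(C)."""
    uplift = p_buy_variant - p_buy_control  # extra purchase probability due to the mailer
    return expected_profit * uplift - ad_cost


def should_mail(expected_profit: float,
                p_buy_variant: float,
                p_buy_control: float,
                ad_cost: float) -> bool:
    """Mail only when the expected incremental profit is positive."""
    return campaign_profit(expected_profit, p_buy_variant, p_buy_control, ad_cost) > 0


# Hypothetical customer: 100.0 expected profit per purchase, the mailer lifts
# the purchase probability from 0.07 to 0.12, and mailing costs 2.0.
print(campaign_profit(100.0, 0.12, 0.07, 2.0))  # roughly 3.0, up to float rounding
print(should_mail(40.0, 0.10, 0.08, 2.0))       # uplift too small to cover the cost
```

A customer who would buy anyway (high P(B | C)) contributes little uplift, so the decision rule favors customers the mailer actually persuades rather than those with the highest purchase probability.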

3 July 2018

Test-Driven Machine Learning

TDD -> Kent Beck
BDD -> Dan North
Refactoring -> Martin Fowler
Agile -> James Shore

Random processes in machine learning need to be measured and controlled, and a few simple testing strategies make this possible.
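
Two such strategies, sketched with the standard library only: fix the seed to control a random process, and assert a statistical tolerance to measure it. The `sample_mean` routine is a hypothetical stand-in for any stochastic step (shuffling, initialization, sampling), and the tolerance of 0.05 is an assumption chosen to be loose for this sample size.

```python
import random
import statistics


def sample_mean(n: int, rng: random.Random) -> float:
    """Hypothetical stochastic routine: mean of n uniform draws on [0, 1)."""
    return statistics.fmean(rng.random() for _ in range(n))


def test_reproducible_with_fixed_seed():
    # Control: the same seed must produce exactly the same result.
    first = sample_mean(1000, random.Random(42))
    second = sample_mean(1000, random.Random(42))
    assert first == second


def test_mean_within_tolerance():
    # Measure: across many seeds the estimate should stay near the true mean 0.5.
    for seed in range(20):
        estimate = sample_mean(5000, random.Random(seed))
        assert abs(estimate - 0.5) < 0.05


test_reproducible_with_fixed_seed()
test_mean_within_tolerance()
```

Passing the generator in as an argument, instead of relying on the global random state, is what makes both tests possible.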

24 June 2018

Probabilistic Reasoning

Factorie (Scala)
Figaro (Scala)
PyMC4 (Python)
PyMC3 (Python)
Probability (Python)
BayesLoop (Python)
Tweety (Java)
Dimple (Java)
Chimple (Java)
WebPPL (JavaScript)

Probabilistic Programming and Bayesian Methods for Hackers
The Design and Implementation of Probabilistic Programming Languages

Natural Computation

  • Cellular Automata
  • Evolutionary Computation
  • Swarm Intelligence
  • Artificial Immune Systems
  • Artificial Life
  • Quantum Computing
  • Systems Biology
  • Synthetic Biology
  • Cellular Computing
  • DNA Computing
  • Amorphous Computing
  • Membrane Computing
  • Neural Computation

18 June 2018

Conversational Dialogs

Machine Translation

Mining Knowledge Graphs from Text

Entity Linking and Disambiguation

Natural Language Understanding

Natural Language Generation

16 June 2018

Generative Models

  • Hidden Markov Model
  • Gaussian Mixture Model
  • Naive Bayes
  • Latent Dirichlet Allocation
  • Restricted Boltzmann Machines
  • Generative Adversarial Networks
  • Variational Autoencoder
  • Probabilistic Context Free Grammar
  • Generative Long-Short-Term-Memory
  • Helmholtz Machine

4 June 2018

Markov Chain Monte Carlo Sampling

  • Metropolis-Hastings
  • Gibbs Sampling
  • Slice Sampling
  • Reversible-Jump
  • Multiple-Try Metropolis
  • Langevin Rule
  • Hamiltonian
  • Simulated Tempering
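
As an illustration of the first entry, here is a minimal random-walk Metropolis-Hastings sketch. The target (a standard normal), the step size, and the function names are assumptions picked for the example; with a symmetric Gaussian proposal the Hastings correction cancels, so the acceptance ratio reduces to the ratio of target densities.

```python
import math
import random


def log_std_normal(x: float) -> float:
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis_hastings(log_density, start, steps, step_size, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    samples = []
    x = start
    log_p = log_density(x)
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


chain = metropolis_hastings(log_std_normal, 0.0, 20000, 1.0, random.Random(0))
mean = sum(chain) / len(chain)
variance = sum((s - mean) ** 2 for s in chain) / len(chain)
print(mean, variance)  # should land near 0 and 1 for this target
```

Working in log space avoids underflow for targets with very small densities; only the difference of log densities is ever exponentiated.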

Demand Forecasting

Macro-environmental factors can inform demand forecasting. The factor types are listed below; they are commonly grouped under frameworks such as PEST, PESTEL, PESTLE, SLEPT, STEPE, STEEPLE, STEEPLED, DESTEP, SPELIT, and STEER. B2B marketplaces tend to be affected more by social factors, while defense contractors tend to be affected more by political factors. Factors that are more frequent or volatile may carry higher importance. Conglomerates may divide factors by departmental assessment or by geographical location. These models can also be connected to micro-environmental and internal factors. Additionally, SWOT analysis may be used: Strengths, Weaknesses, Opportunities, Threats.
  • Political
  • Social
  • Economic
  • Technological
  • Legal
  • Environmental
  • Demographics
  • Regulatory
  • Inter-cultural
  • Ethical
  • Educational
  • Physical
  • Religious
  • Security
  • Competition
  • Ecological
  • Geographical
  • Historical
  • Organizational
  • Temporal

6 May 2018

Common Deep Learning Recipe

  • Specification of Dataset
  • Cost Function
  • Optimization Procedure
  • Model

30 April 2018

Structured Prediction

  • Graphical Models
    • Bayesian Networks 
    • Markov Networks
  • Inference Methods
    • Message Passing 
    • Integer Programs
    • Dynamic Programming
    • Variational Methods
  • Classical Discriminative Learning
    • Structured SVM 
    • Structured Perceptron
    • Conditional Random Fields
  • Non-Linear Approaches
    • Structured Random Forests
    • Deep Structured Prediction
  • More Complex Structures
    • Hierarchical Classification
    • Sequence Prediction/Generation
  • Application Areas
    • Computer Vision
    • Speech Recognition 
    • Natural Language Processing
    • Bioinformatics

20 April 2018

Consumer Protection

A few areas of consumer protection that provide indicators for measuring consumer rights, fair trading practices, competition, and accurate information in the marketplace:

  • Access
  • Complaints Handling
  • Dispute Resolution and Redress
  • Economic Interests
  • Education and Awareness
  • Empowerment Index
  • Protection Index
  • Fraud Detection
  • Governance and Participation
  • Information and Transparency
  • Verifiable Practices and Standards
  • Privacy and Data Security
  • Safety and Reliability
  • Product and Service Reviews

Identity and Access Management

Tools:
  • OpenAM
  • OpenSSO
  • Shibboleth
  • OpenDJ
  • OpenIDM

A Few Machine Learning Use Cases in IAM:
  • Provisioning accounts and permissions management
  • Dynamic risk scoring
  • Identification of Friend or Foe
  • Fraud and Threat patterns via detection of anomalies
  • Feature Engineering (attributes, subjects, resources, environments, roles, entitlements)
  • Rule profiling using decision functions
  • Clustering to identify threshold patterns, excess, shared identity attributes, overlaps
  • Potential for use with blockchain for digital identity and trust
  • Deep identification with biometrics and fingerprints
  • Mining for visibility of IAM and Security Information and Event Management

18 April 2018

Consumer Behavior

Consumer spending behavior is directly correlated with household income, which dictates disposable income. One can build a user profile of consumers with a set of attributes that can be contextualized towards specific market trends. Different regions have their own taxation, but to map user behavior in full one invariably has to look at an entire calendar period: day, week, month, year. In the UK this would correspond to the April-to-April tax year. Doing so gives a clearer set of patterns across bank holidays, weekends, weekdays, seasons, social events, and other periods, from which to glean specific contextual behaviors. Once an anonymized user is mapped for Y1, the following years Y2, Y3, ..., Yn can be mapped to discover historical trends. Machine learning approaches like clustering can provide a means of visualizing complex networks to identify churn, segmentation, and intents for conversion. Additionally, semantic enrichment can provide further context for answering specific data science questions and for end-to-end predictive storytelling. From a big data standpoint it would help to process in both batch and streaming modes. However, one would have to take into account the difference between the processing time and the event time of recorded behavior, and maximize in-memory computation. The key indicators below could be analyzed for consumer behavior.
  • Economic conditions
  • Group/Social Influence
  • Historical Trends
  • Location-Awareness
  • Marketing Campaigns
  • Personal Preferences
  • Purchase Power
Additionally, the following could add further value in a cyclical process to identify, discover, and understand:
  • Consumer Habits
  • Conversion Targets
  • Product Choices
  • Consumer Reviews and Ratings
  • Consumer Sentiments
  • Identifying and Predicting Churn, Segmentation, Price Optimization
  • Profiling for insights, forecasting, personalized promotions/offers/discounts
  • Consumer Experience
  • Consumer Price Index
  • Consumer Satisfaction Index
  • Consumer Protection
  • Market Trends
  • Consumer Interests - unconscious consumption
  • Consumer Intents - conscious search

4 April 2018

Feature Structure Goals in Spark

Classification & Regression
End Goal:
  • Column of type Double to represent Label
  • Column of type Vector (Sparse or Dense)
Recommendations
End Goal:
  • Column of Users
  • Column of Items
  • Column of Ratings
Unsupervised Learning
End Goal:
  • Column of Type Vector (Sparse or Dense)
Graph Analytics
End Goal:
  • DataFrame of Vertices
  • DataFrame of Edges

Gremlin Guide

5 March 2018

Beam Capability Matrix

Types of RDF Storage

Native
  • Main Memory-based
  • Disk-based
Non-native
  • RDBMS
    • Schema-based
      • Vertical partitioning
      • Hierarchical property table
      • Property table
    • Schema-free
      • Triple table
  • NoSQL
    • Key-value
    • Column Family
    • Document store
    • Graph database
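
To make two of the RDBMS layouts concrete, here is a minimal sketch using Python's built-in sqlite3 module: a schema-free triple table, and a vertically partitioned layout with one two-column table per predicate. The sample triples, the generated table names (p0, p1), and the helper function names are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-free layout: every statement is one row of a single triple table.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
])


def vertical_partition(conn):
    """Schema-based layout: one (subject, object) table per predicate."""
    rows = conn.execute("SELECT DISTINCT predicate FROM triples ORDER BY predicate")
    tables = {}
    for i, (predicate,) in enumerate(rows.fetchall()):
        table = f"p{i}"  # generated name, avoids quoting predicate IRIs
        conn.execute(f"CREATE TABLE {table} (subject TEXT, object TEXT)")
        conn.execute(
            f"INSERT INTO {table} SELECT subject, object FROM triples WHERE predicate = ?",
            (predicate,),
        )
        tables[predicate] = table
    return tables


def names_known_by_triples(conn, person):
    """Triple table: the join has to filter on the predicate column twice."""
    query = """
        SELECT n.object FROM triples k
        JOIN triples n ON n.subject = k.object
        WHERE k.subject = ? AND k.predicate = 'foaf:knows' AND n.predicate = 'foaf:name'
    """
    return [row[0] for row in conn.execute(query, (person,))]


def names_known_by_partitioned(conn, tables, person):
    """Vertical partitioning: the predicate is implied by the table chosen."""
    knows, name = tables["foaf:knows"], tables["foaf:name"]
    query = f"""
        SELECT n.object FROM {knows} k
        JOIN {name} n ON n.subject = k.object
        WHERE k.subject = ?
    """
    return [row[0] for row in conn.execute(query, (person,))]


tables = vertical_partition(conn)
print(names_known_by_triples(conn, "ex:alice"))
print(names_known_by_partitioned(conn, tables, "ex:alice"))
```

Both queries answer the same question; the triple table stays flexible when new predicates appear, while the partitioned layout trades that flexibility for narrower tables and simpler joins.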