22 January 2019
Huginn
Labels: artificial intelligence, big data, data science, intelligent web, multiagents, webcrawler
17 January 2019
NLP Games with Purpose
Games with a purpose apply game mechanics to NLP annotation tasks to make the process fun for the oracle (the annotator), often in a crowdsourced manner. A few examples in this context are listed below (a sketch of aggregating the collected labels follows the list):
- Phrase Detectives
- Sentiment Quiz
- Guess What
- ESP Game
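Whatever the game mechanics, the labels collected from the players still have to be aggregated into a single annotation per item. A minimal majority-vote sketch is shown below; the function, item ids and labels are illustrative and not taken from any of the projects above.

```python
from collections import Counter

def aggregate_annotations(votes_per_item):
    """Majority-vote aggregation of labels collected from multiple annotators.

    votes_per_item: dict mapping an item id to the list of labels proposed
    by different players/annotators. Returns {item_id: (label, agreement)}.
    """
    aggregated = {}
    for item_id, labels in votes_per_item.items():
        label, count = Counter(labels).most_common(1)[0]
        aggregated[item_id] = (label, count / len(labels))
    return aggregated

# Hypothetical example: three players proposed an antecedent for one phrase.
votes = {"phrase-42": ["antecedent-A", "antecedent-A", "antecedent-B"]}
print(aggregate_annotations(votes))  # {'phrase-42': ('antecedent-A', 0.666...)}
```

The real games typically use richer scoring (player agreement, gold items), but the output is the same kind of aggregated annotation.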
Active Learning
Active Learning Approaches (a minimal pool-based uncertainty-sampling sketch follows this list):
- Pool-based Sampling
- Membership Query Synthesis
- Stream-based Selective Sampling
- Uncertainty Sampling
- Query-by-Committee
- Expected Model Change
- Expected Error Reduction
- Variance Reduction
- Density-Weighted Methods
- Query from Diverse Subspaces
- Exponential Gradient Exploration
- Balance Exploration and Exploitation
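As an illustration of the first and fourth approaches above, here is a minimal pool-based uncertainty-sampling loop sketched with scikit-learn. The dataset, seed set and query budget are invented for illustration, and the oracle is simulated by the held-back labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy pool; in practice the pool is unlabelled data and the oracle is a human annotator.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
labelled = [int(i) for i in np.where(y == 0)[0][:5]] + [int(i) for i in np.where(y == 1)[0][:5]]
pool = [i for i in range(len(X)) if i not in labelled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):                         # query budget of 20 annotations
    model.fit(X[labelled], y[labelled])
    probs = model.predict_proba(X[pool])
    uncertainty = 1.0 - probs.max(axis=1)   # least-confident prediction is the most informative
    query = pool[int(np.argmax(uncertainty))]
    labelled.append(query)                  # the oracle's label y[query] becomes available
    pool.remove(query)

print("labelled set size:", len(labelled))
```

In a real setting the queried instance would be routed to an annotator (or to a game with a purpose) instead of being read from y.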
13 January 2019
Data Science Methods
Generalizable Method (your mileage may vary, given business case and time constraints):
Process Flow:
R&D → Dev/UI/UX → Prod
Generally, with a heavily R&D/backend-focused team, the features and functionality tend to be dictated by the forward flow (a bottom-up approach); most AI projects at startups tend to be built that way. The frontend then becomes a thin client, a view to the world that assimilates the backend efforts; the typical pattern is an informational dashboard for storytelling - 'through the looking glass'. This is because, in a top-down approach, many of the backend efforts would get lost in translation (though in some business cases top-down may work better).
Data → Information → Knowledge
State-of-the-art may not imply state-of-the-art for your business case and may in fact lead to sub-optimal results and more effort. It is all very subjective: it depends on the data, the features available for training a model, and the business case you are trying to solve. Work towards the outcome that takes the least effort and is mostly efficient, or at least sensible.
- Identify and understand the business case (as a use case or story) - in most cases you are not handed a translated use case or story, so it is really about understanding the problem
- Explore and prototype including background research (exploratory stage)
- Identify cases for reuse
- Identify whether this story even requires a model
- Identify relevant datasets - curation
- Visualize the data (how sparse/dense/dirty it is, multiple open source tools available for refinement steps for features, identify additional effort necessary for model build)
- Identify the relevant variances and biases (will the model steps lead to overfitting or underfitting - the objective is to build a generalizable model)
- Feature Selection/Extraction (may use other ML or natural computation techniques here)
- Feature engineering (this may also include curation/enrichment of metadata)
- Feature re-engineering (this may also include curation/enrichment of metadata)
- Identify the simplest possible solution
- Identify the reasoning for using a more complex solution
- Custom model to solve the business case (do not just copy model out of a research paper - this is what the exploratory stage was for)
- Evaluation and Benchmarking (formal tests may/may not be used, depends on business case)
- How well does the model scale against small and large data - identify sufficiency at average and worst-case time/cost
- Re-Tune/Rinse/Repeat
- Incrementally improve the model
- Incrementally optimize/scale the model (scale only when necessary)
- One simple and one complex candidate - one that is sub-par, and one that is riskier (see the sketch after this list)
- Evaluation and Documentation
- Pipeline the Solution in Dev-mode (Identify bottlenecks with the model - dry run/end-2-end for integration - at this stage a repeatable build/test/deploy/evaluate cycle may be used - DS/DE)
- A/B/N/Bandit Testing in Stage (generally this stage is covered by the product team, alongside automated acceptance tests, if they know the techniques, or DS/DE may be involved)
- Release/Integrate for Production (depends whether this is a B2B or B2C case, or beta mode)
- Storytelling (how well does the model answer/solve the question or problem statement - 'through the looking glass' - refers to both dev and stage/prod cases)
- User/Stakeholder/Client Feedback (Rinse & Repeat, depending on B2B or B2C cases)
- Incremental Analysis and Review of Models
- Rinse & Repeat (some of the steps above repeated multiple times before production release)
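As a minimal illustration of the 'one simple, one complex' and benchmarking steps, the sketch below cross-validates a trivial baseline, a simple candidate and a more complex candidate with the same split and metric. The dataset, models and scoring choice are placeholders for whatever the business case dictates.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data; in practice this is the curated feature matrix for the business case.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = {
    "majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "simple (logistic regression)": LogisticRegression(max_iter=1000),
    "complex (gradient boosting)": GradientBoostingClassifier(random_state=0),
}

# Benchmark every candidate with the same cross-validation split and metric.
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the complex candidate does not clearly beat the simple one on the agreed metric, the simpler, least-effort model is usually the sensible release.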
8 January 2019
NLP High-Performance Computing
There are two primary approaches for working towards high-performance computing in NLP domains:
- Add GPUs to a server (scale up)
- Connect CPUs across multiple servers (scale out)
Scaled-out approaches generally work towards maximizing consistent RAM utilization, automatically traversing the computational graph to allocate resources and optimize throughput. In many cases, particularly for deep learning models, the heavy acceleration of parallelized matrix multiplications makes a big difference. In neural networks, backpropagation is more computationally expensive than the forward activation pass. Once the model is trained, the weights and structure can be exported to any hardware for model prediction, i.e. a forward (inference) pass.
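A minimal PyTorch sketch of the forward-versus-backward cost point, and of exporting trained weights for inference-only use, is shown below. The layer sizes and batch are arbitrary, the timings are only indicative, and a GPU is used only if one is available.

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Arbitrary toy model and batch; real NLP models are far larger.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)
x = torch.randn(256, 1024, device=device)
target = torch.randint(0, 10, (256,), device=device)
loss_fn = nn.CrossEntropyLoss()
model(x)  # warm-up pass so timings exclude one-off initialisation

def timed(fn):
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    fn()
    if device.type == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

forward_time = timed(lambda: model(x))
backward_time = timed(lambda: loss_fn(model(x), target).backward())
print(f"forward only: {forward_time:.4f}s, forward + backward: {backward_time:.4f}s")

# Once trained, the weights can be saved and reloaded elsewhere for inference only.
torch.save(model.state_dict(), "model.pt")
with torch.no_grad():
    predictions = model(x).argmax(dim=1)
```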
Labels: big data, Cloud, data science, deep learning, machine learning, natural language processing
Approximate Nearest Neighbor Matching
Annoy-Hamming
BallTree (NMSLib)
Brute Force (BLAS)
Brute Force (NMSLib)
DolphinnPy
RPForest
Datasketch
MIH
Panns
Falconn
FLANN
HNSW (NMSLib)
Kdtree
NearPy
KeyedVectors (Gensim)
FAISS-IVF
SW-Graph (NMSLib)
KGraph (NMSLib)
ANN Benchmarks
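As a minimal example of one of the libraries listed above, the sketch below builds and queries an Annoy index over random vectors; the dimensionality, metric and tree count are arbitrary choices for illustration.

```python
import random
from annoy import AnnoyIndex

dim = 64                             # arbitrary embedding dimensionality
index = AnnoyIndex(dim, "angular")   # angular ~ cosine; "hamming" gives the Annoy-Hamming variant

# Index 1,000 random vectors; in practice these would be word or document embeddings.
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(dim)])

index.build(10)                      # 10 trees: more trees -> better recall, larger index
index.save("vectors.ann")            # the saved index can be memory-mapped by other processes

query = [random.gauss(0, 1) for _ in range(dim)]
neighbours, distances = index.get_nns_by_vector(query, 10, include_distances=True)
print(neighbours, distances)
```

The ANN Benchmarks project listed above compares these libraries on recall versus queries per second, which is usually the trade-off that matters in production.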