15 December 2015

Automatic Summarization

Automatic Summarization is a valuable aspect of Information Extraction in Natural Language Processing. It is applied within Information Retrieval, news summaries, building research abstracts, and within various knowledge exploration contexts. Automatic Summarization can be applied either over single or multiple documents. There is even aspect of building extractions over simple verses rich textual documents. The following extrapolate the various aspects concerning Automatic Summarization processes that are under active research and utilized for development within the various textual domain contexts.

Summarization Types:
extractive
abstractive
single document
multi-document
indicative
informative
keyword
headline
generic
query-focused
update
main point
key point
outline
descriptive

Summary Sentence Approaches:
revision
ordering
fusion
compression
sentence selection vs summary selection

Unsupervised Methods:
word frequency
word probability
tf*idf weighting
log-likelihood ratio for topic signatures
sentence clustering
graph based methods for sentence ranks

Semantics and Discourse:
lexical chaining
latent semantic analysis
coreference
rhetorical structure
discourse-driven graph representation

Summary Generation Methods:
compression
rule-based compression
statistical compression
headline
fusion
context dependent revision
ordering of information

Various Genre and Domains:
medical
journal article
news
email
web
speech
conversation log
financial data
book
social media
legal

Evaluation:
precision
recall
utility
manual
automatic
pyramid
linguistic quality
accuracy

14 December 2015

Question/Answering Approaches In Perspective

Question/Answering has become a hot top in recent years as it can be applied in a variety of domain contexts for data mining for knowledge discovery and as an application of natural language processing. However, one of the core underpinnings has always been about matching a question to an answer and a reformulation. In simple terms, one could apply a decision tree style approach or formalize a keyphrase matching over a set of rules. In recent years, there has been much growth towards applications of probabilistic techniques over rules-based systems. A hybrid approach in artificial intelligence has proven to be an optimal solution in many contexts. And, including semantic constructs through ontologies allows an agent to understand and reason over domain knowledge through inference and deduction. Furthermore, one can take such an intelligent metaphor of understanding a step further into the BDI context of multi-agent systems and mediations for argumentation and game theory. Deep Learning has also provided some robust alternatives. The below is a listing of some proposed ideas on how potentially effective question/answering strategies could be achieved for open/closed-domain understanding. In every case, a semantic ontological understanding becomes important for a somewhat guided way of reasoning about the open world. One can view question/answering as almost like a data funnel or pipeline of question to answer matching through a series of filtration steps in form of Sentiment Analysis, Sentence Comprehension as form of thought chains or tokens, Machine Learning for Classification and Clustering, as well as aspects of semantic domain concepts. In such respects, one can formulate a respective knowledge graph from a generalized view of the open world and gradually apply layers on top of specialized curated domain ontologies to provide for a Commonsense Reasoning, analogous to a human. DBPedia is a starting point to the open world and the entire web is another. A separate lexical store could also be used such as wordnet, sentiwordnet, and wiktionary. Alternative examples to further build on the knowledgebase include: Yago-Sumo, UMBELSenticNet, OMCS, and ConceptNet. One could even build a graph of the various curated FAQ sites for a connected knowledge source. However, one day the Web of Data would itself provide a gigantic linked data graph of queryable knowledge via metadata. Today such options are in form of Schema.org and others. In future as research evolves, cognitive agents will be more self-aware of their world with granular and more efficient ways of understanding without much guidance. Another aspect of practical note here is the desirability for a feedback loop between short-term and long-term retention of knowledge cues to avoid excessive repeated backtracking for inference on similar question patterns in context.

Description StepsAgent Belief
QA Semantic Domain Ontologies/NLP + BDI Multiagent Ensemble Classifiers (potential for Deep Learning)Multiple BDI
QA Semantic Domain Ontologies/NLP + BDI Multiagent Belief Networks using Radial Basis Functions (Autoencoders vs Argumentation)Multiple BDI
QA Semantic Domain Ontologies/NLP + BDI Multiagent Reinforcement Learning/Q LearningMultiple BDI
QA Semantic Domain Ontologies/NLP + Predicate Calculus for Deductive InferenceSingle
QA Semantic Domain Ontologies/NLP + Basic Commonsense ReasoningSingle
QA Semantic Domain Ontologies/NLP + Deep Learning (DBN/Autoencoders)Single
QA Semantic Domain Ontologies/NLP + LDA/LSA/Search DrivenSingle
QA Semantic Domain Ontologies/NLP + Predicate Calculus for Deductive Inference + Commonsense ReasoningSingle
QA Semantic Domain Ontologies/NLP + Groovy/Prolog RulesSingle
QA Semantic Domain Ontologies/NLP + Bayesian NetworksSingle
QA TopicMap/NLP + DeepLearning (Recursive Neural Tensor Network)Single
QA Semantic Domain Ontologies/NLP + QATopicMap + Self-Organizing MapSingle
QA Semantic Domain Ontologies/NLP + Connected Memory/Neuroscience (Associative Memory/HebianLearning)Single
QA Semantic Domain Ontologies/NLP + Machine Learning/Clustering in a DataGrid like GridGainSingle