23 March 2014

Keyphrase Extraction With Maui And Kea

Keyphrase extraction identifies the most frequent or most important phrases in a document, in the context of the application, so that they can be used for indexing. The extracted phrases are useful to search engines indexing document collections, to advertising, and in many other domains. Carrot2 is often used as an embedded clustering service within Solr search. From a semantic web point of view, however, it helps to work with a vocabulary, so that extraction is guided by a defined dictionary of terms and the results carry richer context. Kea is a keyphrase extraction library that makes use of SKOS vocabularies for exactly this kind of rich extraction. Maui extends it as an indexer, with machine learning integrated via Weka. The combination is a good fit when one wants to extract phrases for a specific context and then disambiguate them against a substantial custom controlled vocabulary across a large set of document sources. Maui can then act as the automatic indexer for topical tagging and for assigning keyphrases, keywords, descriptors, and specific terms.
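To make the idea of vocabulary-guided extraction concrete, here is a minimal sketch in Python. It is not the Kea or Maui API, and it leaves out the machine learning step entirely. It only shows the general technique of matching candidate phrases against a controlled vocabulary and mapping alternative labels to a preferred label. The `VOCABULARY` dictionary, the function names, and the ranking rule (frequency, then first occurrence) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary: preferred label -> alternative labels.
# This loosely mirrors the SKOS prefLabel / altLabel distinction.
VOCABULARY = {
    "machine learning": ["statistical learning", "ml"],
    "search engine": ["search engines"],
    "controlled vocabulary": ["thesaurus", "controlled vocabularies"],
}


def build_label_index(vocabulary):
    """Map every lowercase label (preferred or alternative) to its preferred label."""
    index = {}
    for preferred, alternatives in vocabulary.items():
        index[preferred.lower()] = preferred
        for alt in alternatives:
            index[alt.lower()] = preferred
    return index


def extract_keyphrases(text, vocabulary, max_words=3, top_n=5):
    """Return (preferred label, count) pairs for vocabulary terms found in text.

    Simplified ranking (an assumption, not Kea's model): higher count first,
    ties broken by the earliest position at which the term appears.
    """
    index = build_label_index(vocabulary)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    first_seen = {}
    # Slide a window of 1..max_words tokens over the text to build candidates.
    for size in range(1, max_words + 1):
        for start in range(len(tokens) - size + 1):
            candidate = " ".join(tokens[start:start + size])
            preferred = index.get(candidate)
            if preferred is None:
                continue  # candidate is not in the vocabulary
            counts[preferred] += 1
            first_seen[preferred] = min(first_seen.get(preferred, start), start)
    ranked = sorted(counts, key=lambda term: (-counts[term], first_seen[term]))
    return [(term, counts[term]) for term in ranked[:top_n]]
```

Calling `extract_keyphrases("A thesaurus helps search engines.", VOCABULARY)` maps "thesaurus" to the preferred label "controlled vocabulary", which is the kind of normalization a controlled vocabulary provides. Free-text phrases that are not in the vocabulary are simply ignored in this sketch.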