31 March 2021

Three Approaches to Word Similarity Measures

  • Geometric/spatial: evaluates the relative positions of two words in a semantic space defined by context vectors 
  • Set-based: relies on analysis of the overlap between the sets of contexts in which the words occur 
  • Probabilistic: uses probabilistic models and measures proposed in information theory
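The three approaches can be sketched over a toy pair of context-count vectors (the words and counts below are made up for illustration): cosine similarity for the geometric view, Jaccard overlap for the set view, and a Jensen-Shannon-based score as one possible information-theoretic measure.

```python
from collections import Counter
import math

# Hypothetical co-occurrence contexts for two words.
contexts_a = Counter({"drink": 4, "cup": 3, "hot": 2})
contexts_b = Counter({"drink": 3, "cup": 2, "leaf": 1})

# 1. Geometric/spatial: cosine of the angle between the context vectors.
def cosine(a, b):
    vocab = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in vocab)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# 2. Set-based: Jaccard overlap of the sets of contexts.
def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# 3. Probabilistic: Jensen-Shannon divergence between the context
#    distributions, turned into a similarity score (base-2 logs, so the
#    divergence lies in [0, 1] and similarity = 1 - divergence).
def js_similarity(a, b):
    vocab = set(a) | set(b)
    ta, tb = sum(a.values()), sum(b.values())
    p = {w: a[w] / ta for w in vocab}
    q = {w: b[w] / tb for w in vocab}
    m = {w: (p[w] + q[w]) / 2 for w in vocab}

    def kl(x, y):
        return sum(x[w] * math.log2(x[w] / y[w]) for w in vocab if x[w] > 0)

    return 1 - (kl(p, m) + kl(q, m)) / 2
```

All three return values in [0, 1] here, but they answer subtly different questions: cosine weighs how often contexts co-occur, Jaccard only whether they co-occur, and the probabilistic score compares the full context distributions.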

17 March 2021

TDD and BDD for Ontology Modelling

The end goal of most ontologies is to capture a semantic representation, in a specific context of generalization, that follows the open-world assumption. However, such model building can proceed in different ways. One approach is to apply test-driven development and behavior-driven development techniques to building domain-level ontologies, with constraint-based testing applied as part of the process. The process steps are elaborated below.

  • Create a series of high-level question/answering requirements, which can be defined in the form of specification by example
  • Create SHACL/ShEx tests granular to individual specification examples in context. Each SHACL/ShEx validation essentially tests a 'ForSome' case of predicate logic per defined question, where a subset of domains/ranges can be tested.
  • Create BDD-based acceptance tests and programmatic unit tests that can test logic constraints
  • At this stage all tests fail. In order to make them pass, implement the 'ForSome' closed-world assumption defined in the SHACL/ShEx validation, i.e. implement the representation so a SPARQL query can answer the given contextual question for the subset cases. Then make the tests pass.
  • Keep repeating the test-implement-refactor cycle until all tests pass for the given set of constraints, incrementally refactoring the representation ontology. The refactoring is mostly about building working generalizations that transform the closed-world assumption of asserted facts into the partial open-world assumption of unknowns for the entire set.
  • Finally, when all tests pass, refactor the entire ontology solution so it conforms to the open-world assumption for the entire set, i.e. 'ForAll, there exists', which can further be tested using SPARQL against the subsumption hypothesis.
  • If the ontology needs to be integrated with other ontologies, build a set of specifications by example for the integration and implement a set of integration tests in a similar manner.
  • Furthermore, in any given question/answer case, identify topical keywords that provide bounded constraints for a separate ontology initiative. It may be helpful here to apply natural language processing techniques in order to utilize entity linkage for reuse.
  • All tests and implementations should be engineered to follow best practices for maintainability, extensibility, and readability. The tests can be wired through continuous integration and a maintainable live-documentation process.
  • Expose the ontology as a SPARQL API endpoint
  • Apply a release and versioning process to your ontologies that complies with the W3C process
  • It is easier to go from a set of abstractions under a closed-world assumption to an open-world assumption than the other way around. A similar metaphor is going from relational to graph versus graph to relational. 
  • Focus on making ontologies accessible to users
  • OWA is all about incomplete information and the ability to infer new information; constraint-based testing may not exhaustively cover the search space, but one can still test against a subsumption hypothesis

5 March 2021

Allotrope Framework

Allotrope is used as a semantic framework in the context of scientific data. Or is it really? The framework has received awards, but on deeper inspection it looks like a hacked-out solution: not only does it increase complexity, it also enforces tight coupling of semantics, lacks consistency, is laced with bugs, lacks sufficient test coverage, and impedes the data-lineage steps necessary for integrating with machine learning components for better explainability. The framework looks like a work in progress and lacks clarity and accessibility of documentation. In fact, the documentation is inaccessible to the public unless one becomes a member. It also only supports Java and C#, with no real equivalent Python API; an eventual Python API does appear to be in the works, but only time will tell when such a solution will be available for use. Although the use of HDF5 as a reference data format is a good choice, the implementation as a whole, as a semantic execution platform, is not.
Worst of all, most of the testing is enforced via SHACL validation that reverse-engineers an open-world assumption into a closed-world assumption, especially problematic as data can take on multiple dimensions of complexity. This leads into the vicious cycle of unmaintainable, unreadable recursive validation cases with no clear-cut way to do requirement elicitation and subsumption testing. Enforcing what and how one validates semantic data is questionable at best. The framework tries to tackle data complexity with more complexity rather than keeping things simple, accessible, and reusable. After all, it is supposed to be an effort in using standards, yet it applies them in a very opinionated fashion. That is another case in point: the entire framework lacks reusability, with a lot of duplicated work and reinvention of the wheel. Data and design patterns are half-hearted and not well baked into the framework. 
There must be better ways of doing things than to use frameworks that impede productivity and only increase the frustration of working with semantic data complexity and constraints, where inevitably the solution becomes the problem.