16 May 2021

What is a Knowledge Graph

A knowledge graph has a few key characteristics:

  • It is a type of knowledge base containing semantically integrated and interconnected graph-structured data
  • The representation is in a machine-readable standard that follows an open-world assumption
  • The representation forms an abstraction of a connected graph of relationships 
  • It supports reasoning over the connected representation, under an open-world assumption, in order to infer new relationships
  • The queryable representation can evolve iteratively through machine inference, so it is dynamic rather than static
  • The ability to semantically infer over machine-readable data allows the abstract representation of data to extend from information into knowledge
  • The semantic context is embedded throughout the knowledge representation, is namespaced, and tends to be defined as interlinked URI resources
  • The representation can be stored in any NoSQL database that supports the serialized data format, but for performance reasons it tends to be stored in a native graph-like triplestore or quadstore
  • A property graph without the machine-readable representation and inference layer is not a knowledge graph
  • Without a machine-readable representation and an inference layer, the representation is static: it is merely data, likely defined under a closed-world assumption, that can be queried and managed in a database, leaving most of the inference to the person running the analytics
  • With an additional inference layer and a rich machine-readable representation, the database can take a performance hit in both reads and writes. In most cases writes perform worse than reads, so writes tend to run in the background as a periodic bulk operation and only reads are made available to the end user. An example of this can be seen between Wikipedia, which is available for reads and writes, and DBpedia, a knowledge graph that is bulk loaded every so often and made available read-only
  • Many semantic graph stores still work on a client-server basis with the vertical-scaling approach of the pre-2000s era, which may pose an issue for heavy reads and writes because everything relies on one core server. To avoid downtime, make bulk writes offline on a hot-swappable replicated instance, then swap the instances once the writes are complete
  • In a knowledge graph, only metadata is stored (data about data which is in a connected and machine-readable form) and not the changing data itself (keep the data separate from the metadata)
  • A knowledge graph is generally intended for searchability, queryability, discoverability, and findability use cases and not for heavy transactional use cases
  • If one wants heavy reads and writes, opt for a transactional option. In most cases such databases may not support inference or provide semantic compliance, but they may provide sharding, horizontal scaling, and static property graphs whose serialized data representations may be compliant with the Apache TinkerPop stack. Alternatively, an in-memory analytical graph may also be an option to avoid heavy I/O performance hits (bear in mind that without inference and a machine-readable representation it is no longer a knowledge graph but merely a static, connected, graph-structured data representation stored in a database)
  • A knowledge graph is a special case of a knowledge base, defined by the graph representation it holds in its data abstraction in the form of subject-predicate-object triples. However, machine-readable knowledge does not necessarily have to be stored in the form of triples: the W3C has defined a specific set of standards for working with semantic data, but that is not necessarily the only way
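
The triple representation, the inference layer, and the open-world assumption from the list above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a W3C-compliant implementation: the EX namespace, the example resources, and the helper names infer and ask are hypothetical, and only two RDFS-style rules (subclass transitivity and type propagation) are applied.

```python
# Minimal sketch: triples, forward-chaining inference, open-world lookup.
# The rdf:type and rdfs:subClassOf URIs are the standard W3C ones;
# everything under EX is a made-up namespace used only for illustration.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"


def infer(triples):
    """Return the input triples plus those derived by two RDFS-style rules.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)       => (x type B)
    The rules are re-applied until no new triple appears (a fixpoint).
    """
    closed = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in closed if p == RDFS_SUBCLASS}
        new = set()
        for a, b in subclass:
            for s, p, o in closed:
                if p == RDFS_SUBCLASS and s == b:
                    new.add((a, RDFS_SUBCLASS, o))  # rule 1
                if p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))       # rule 2
        if new <= closed:
            return closed
        closed |= new


def ask(triples, triple):
    """Open-world lookup: a missing triple is 'unknown', not false."""
    return "known" if triple in triples else "unknown"


# Explicitly asserted subject-predicate-object triples (hypothetical data).
asserted = {
    (EX + "Dog", RDFS_SUBCLASS, EX + "Mammal"),
    (EX + "Mammal", RDFS_SUBCLASS, EX + "Animal"),
    (EX + "rex", RDF_TYPE, EX + "Dog"),
}
graph = infer(asserted)

rex_is_animal = (EX + "rex", RDF_TYPE, EX + "Animal")
rex_is_pet = (EX + "rex", RDF_TYPE, EX + "Pet")

print(ask(asserted, rex_is_animal))  # unknown: never asserted explicitly
print(ask(graph, rex_is_animal))     # known: derived by the inference step
print(ask(graph, rex_is_pet))        # unknown: neither asserted nor derivable
```

Under a closed-world reading the last lookup would be treated as false. Here it stays "unknown", which leaves room for the triple to be asserted or inferred later. The static set `asserted` plays the role of a plain graph-structured dataset, while `graph` is what the inference layer adds on top of it.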