Deep Full Relation Extraction

Relation extraction at the linguistic level is the task of deciding whether two entities in a given text, as recognized by a named entity recognizer (NER), stand in a particular relation. Examples are a person is-boss-of a company, or a gene induces a disease. Our algorithms use powerful language-independent components to pre-analyze typical patterns in a large background corpus. This allows them to learn to recognize relations with high quality from relatively few training examples. We offer a wide variety of pre-trained, life-sciences-oriented relation types, such as:

  • Protein up-/down-regulates Gene
  • several ADMET-related entity types and their corresponding relation types
  • drug induces disease
  • and many others
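
To make the task definition above concrete, the following minimal sketch shows how candidate entity pairs for one relation type might be enumerated before a classifier decides whether the relation actually holds. It is an illustration under stated assumptions, not our implementation: the `Entity` class, the `RELATION_SIGNATURES` table and the `candidate_pairs` function are hypothetical names, and the entities are placeholders standing in for real NER output.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entity:
    """An entity mention as a NER step might deliver it (hypothetical shape)."""
    text: str
    type: str  # e.g. "Drug", "Disease", "Person", "Company"


# Hypothetical relation signatures: relation name -> (subject type, object type).
RELATION_SIGNATURES = {
    "induces": ("Drug", "Disease"),
    "is-boss-of": ("Person", "Company"),
}


def candidate_pairs(entities, relation):
    """Return all ordered entity pairs whose types fit the relation signature.

    Each pair is only a candidate: a trained classifier would still have to
    decide whether the relation holds in the given text.
    """
    subj_type, obj_type = RELATION_SIGNATURES[relation]
    return [
        (a, b)
        for a, b in permutations(entities, 2)
        if a.type == subj_type and b.type == obj_type
    ]


# Placeholder entities, as if recognized in a single sentence.
entities = [
    Entity("drug A", "Drug"),
    Entity("disease X", "Disease"),
    Entity("drug B", "Drug"),
]
pairs = candidate_pairs(entities, "induces")
for subj, obj in pairs:
    print(f"candidate: {subj.text} induces {obj.text}?")
```

Restricting candidates by entity type keeps the number of pairs small, so the classifier only sees pairs for which the relation is meaningful at all.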

Document Clustering & Cluster Labeling

Using semantic matching, documents are written into our high-performance graph database and connected to one another, which makes retrieval extremely fast. Because of the semantic matching, document similarity remains reliable even for short texts. We have implemented key-phrase extraction that works on both individual documents and document clusters, so that each cluster receives a label that is meaningful and helpful to the user. Key phrases matter especially for isolating languages such as Chinese, where simple keywords proved insufficient.

Document Similarity

We use language-neutral keyword extraction together with our thesaurus to determine the similarity of documents. As a result, even two documents that do not share a single word can be identified as similar (e.g. one talking about a house and the other about a building).
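
The house/building example can be sketched in a few lines. This is a toy illustration only: the `THESAURUS` table, the concept identifiers and both functions are hypothetical, and the Jaccard overlap used here is an assumed stand-in, since the actual similarity measure is not specified above.

```python
# Toy thesaurus (hypothetical): surface word -> concept identifier.
THESAURUS = {
    "house": "C_BUILDING",
    "building": "C_BUILDING",
    "roof": "C_ROOF",
    "rooftop": "C_ROOF",
    "car": "C_VEHICLE",
    "engine": "C_ENGINE",
}


def concepts(keywords):
    """Map extracted keywords to thesaurus concepts.

    Words missing from the thesaurus are kept as their own concept.
    """
    return {THESAURUS.get(word.lower(), word.lower()) for word in keywords}


def similarity(keywords_a, keywords_b):
    """Jaccard overlap of the two concept sets (assumed measure)."""
    a, b = concepts(keywords_a), concepts(keywords_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Keywords as a keyword-extraction step might return them for three documents.
doc1 = ["house", "roof"]
doc2 = ["building", "rooftop"]
doc3 = ["car", "engine"]

print(set(doc1) & set(doc2))   # no shared words
print(similarity(doc1, doc2))  # yet the concept sets are identical
print(similarity(doc1, doc3))  # unrelated documents
```

Comparing concepts instead of surface words is what lets two documents without a single common word still score as similar.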