Publications & Patents

Selected publications, patents & patent applications.

Is singular value decomposition useful for word similarity extraction?

This study compares simple hash-based methods for computing distributional similarity with the more complex singular value decomposition (SVD) approach. It concludes that the simpler methods both produce better quality and can be computed in a fraction of the time required for SVD. (2011)

A comparison of co-occurrence and similarity measures as simulations of context

As the title says, several possible measures of word co-occurrence and distributional similarity are systematically compared with each other on large data. The quality differences were found to be quite strong. (2008)
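To make the kind of comparison described above concrete, here is a minimal sketch of two widely used co-occurrence measures, pointwise mutual information (PMI) and the Dice coefficient, computed over sentence-level co-occurrences. The choice of these two measures, the whitespace tokenisation, and all function names are assumptions for illustration; the paper itself may use different measures and preprocessing.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count in how many sentences each word and each word pair occurs."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        # Assumption: naive lowercase whitespace tokenisation, one count per sentence.
        words = set(sentence.lower().split())
        word_freq.update(words)
        pair_freq.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_freq, pair_freq


def pmi(a, b, word_freq, pair_freq, n_sentences):
    """Pointwise mutual information (base 2) of two words co-occurring in a sentence."""
    joint = pair_freq[frozenset((a, b))]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_sentences / (word_freq[a] * word_freq[b]))


def dice(a, b, word_freq, pair_freq):
    """Dice coefficient: 2 * joint frequency / sum of the individual frequencies."""
    total = word_freq[a] + word_freq[b]
    if total == 0:
        return 0.0
    return 2 * pair_freq[frozenset((a, b))] / total


sentences = ["the cat sat", "the cat ran", "a dog ran"]
word_freq, pair_freq = cooccurrence_counts(sentences)
for a, b in [("the", "cat"), ("cat", "ran"), ("cat", "dog")]:
    print(a, b, pmi(a, b, word_freq, pair_freq, len(sentences)), dice(a, b, word_freq, pair_freq))
```

Even on this toy input the two measures rank word pairs on different scales, which is why a systematic comparison on large data is needed.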

A private living lab for requirements based evaluation.

We participated in an effort to build a user-centric and realistic evaluation for email-related NLP technologies. Special attention was paid to cost-effectiveness as well as data protection. (2013)

Association of Information Entities Along a Time Line

How can one work effectively with different, interrelated time lines, especially when navigating them? This patent application covers some of our UI experiments, mostly on tablets, in which users interactively move informational entities on the screen to filter time lines. (2013)

Automatic Acquisition of Paradigmatic Relations Using Iterated Co-occurrences.

Early work on automatic learning of meaningful relations between words. It was superseded by more recent work on distributional semantics, but it is nonetheless interesting. (2004)

Automatic Association of Informational Entities

One of the techniques that we use to build up a knowledge graph. The Cognitive Workbench actually uses multiple techniques depending on the type and structure of the incoming information. (2013)

Automatic Crowd Sourcing for Machine Learning in Information Extraction

A technology to crowdsource semantics from unstructured data gathered from a large number of mobile devices. Notably, it covers the privacy aspect, i.e. the extracted data does not allow for guessing the identity, name or any personal details of the user. The technology is currently not part of our publicly available products. (2013)

Compact Visualisation of Search Strings for the Selection of Related Information Sources

In this patent application, we cover the Magneto technology for differential semantic tag clouds. (2013)

Comparison of Structured vs. Unstructured Data for Industrial Quality Analysis.

This comparison evaluates the exploitation of unstructured data in industrial quality analysis methods. It shows that, for some tasks, textual resources provide far more, and more detailed, information than established data mining methods on structured data.

Hänig, C., Schierle, M. and Trabold, D.: Comparison of Structured vs. Unstructured Data for Industrial Quality Analysis. In: Proceedings of the World Congress on Engineering and Computer Science 2010 Vol I (WCECS 2010), IAENG, 2010

(Best Paper Award)

Distinct processing of function verb categories in the human brain.

In this study, we investigated differences between “heavy” (“take a computer”) and “light” verbs (“take a shower”).

Brain Research, Volume 1249, 16 January 2009, Pages 173–180

Elements of Knowledge-free and Unsupervised lexical acquisition

Theoretical and practical background for some of our core NLP technology. The goal was to improve our understanding of language-independent algorithms that produce language-specific knowledge, which can then be used in more specific solutions. Areas covered include lexical knowledge, lexical ambiguity, morphology, and syntax. (2007)

Gland Segmentation in Colon Histology Images: The GlaS Challenge Contest

In this paper we present the specifics of the ExB algorithm, which took second place in the Gland Segmentation Challenge (GlaS) organised at MICCAI 2015. Our method is based on a Multi-Path Convolutional Neural Network architecture for image segmentation. A major innovation of our model is the specialized border identification network, which improves accuracy at the borders of glands and substantially improves the overall segmentation accuracy.

Sirinukunwattana, Korsuk, Josien PW Pluim, Hao Chen, Xiaojuan Qi, Pheng-Ann Heng, Yun Bo Guo, Li Yang Wang, Bogdan J Matuszewski, Elia Bruni, Urko Sanchez, Anton Böhm, Olaf Ronneberger, Bassem Ben Cheikh, Daniel Racoceanu, Philipp Kainz, Michael Pfeiffer, Martin Urschler, David RJ Snead, Nasir M Rajpoot (2016): Gland Segmentation in Colon Histology Images: The GlaS Challenge Contest

Grammar or Serial Order? Discrete Combinatorial Brain Mechanisms Reflected by the Syntactic Mismatch Negativity

Neuro-scientific research into how the brain builds up syntactic representations. This is fundamental research following a bottom-up approach to understanding how the human brain understands a sentence.

June 2007, Vol. 19, No. 6, Pages 971-980 doi:10.1162/jocn.2007.19.6.971

Improvements in Unsupervised Co-Occurrence Based Parsing.

This work extends our previous unsupervised parsing model by head detection and phrase type clustering and significantly improves the capability to parse sentences without labelled training data.

Hänig, C.: Improvements in Unsupervised Co-Occurrence Based Parsing. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL 2010), Association for Computational Linguistics, 2010

Knowledge-free Verb Detection through Tag Sequence Alignment.

This work presents an algorithm that is able to detect verbs relying on completely unsupervised language processing methods. Being able to recognize action clues in textual resources enables our applications to apply methods for deeper language understanding (such as relation extraction) in a completely unsupervised manner.

Hänig, C.: Knowledge-free Verb Detection through Tag Sequence Alignment. In: Proceedings of the 18th International Nordic Conference of Computational Linguistics (NODALIDA 2011), Riga, Latvia, Northern European Association for Language Technology (NEALT), 2011

Language-independent methods for compiling monolingual lexical data

Early description of a large-scale language resources project and the technologies required for it. (2004)

Multilingual Single-document Summarization and Multilingual Multi-document Summarization

This work presents our state-of-the-art multilingual text summarizer, capable of single-document as well as multi-document summarization. The algorithm is based on repeated application of TextRank on a sentence similarity graph, a bag-of-words model for sentence similarity, and a number of linguistic pre- and post-processing steps using standard NLP tools. We submitted this algorithm for two different tasks of the MultiLing 2015 summarization challenge: Multilingual Single-document Summarization and Multilingual Multi-document Summarization.

Thomas, Stefan, Christian Beutenmüller, Xose de la Puente, Robert Remus, and Stefan Bordag (2015): 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, page 260
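The core idea named above, TextRank on a sentence similarity graph with bag-of-words similarity, can be sketched as follows. This is a minimal illustration and not the submitted system: it runs TextRank once instead of repeatedly, omits all linguistic pre- and post-processing, and assumes cosine similarity over raw word counts, a damping factor of 0.85, and a fixed number of iterations. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank on the sentence similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Assumption: naive lowercase whitespace tokenisation.
    bags = [Counter(s.lower().split()) for s in sentences]
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, proportional to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "the cat sat on the mat",
        "the cat lay on the mat",
        "stocks fell sharply today",
    ]
    print(summarize(doc, k=2))
```

Sentences that share vocabulary with many others accumulate score, while an isolated sentence keeps only the baseline share, so it is the last to enter the summary.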

Neural Network Classification of Word Evoked Neuromagnetic Brain Activity

Early experiments in automatic classification of brain activity using neural networks. Single words were presented visually to human subjects while their neuromagnetic brain activity was recorded. The recordings were then fed into an artificial neural network to classify which of the presented words had actually been processed by the brain.

Emergent Neural Computational Architectures Based on Neuroscience
Lecture Notes in Computer Science Volume 2036, 2001, pp 311-319

Online Analysis and Display of Correlated Information

This patent application covers some of our technologies for analysing a document that the user is viewing or editing, computing relevant contextual information, and showing it to the user at the right time. (2013)

PACE Corpus: a Multilingual Corpus of Polarity-Annotated Textual Data from the Domains Automotive and Cellphone.

This work points out the challenges of analyzing polarity within a specific domain and of dealing with user-generated textual resources. Two comprehensively annotated corpora (English and German) consisting of user-generated data were made publicly available as gold standard data sets for experiments and evaluations.

C. Hänig, A. Niekler and C. Wünsch: PACE Corpus: a Multilingual Corpus of Polarity-Annotated Textual Data from the Domains Automotive and Cellphone. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), European Language Resources Association (ELRA), 2014

Relation Extraction based on Unsupervised Syntactic Parsing.

Hänig, C. and Schierle, M.: Relation Extraction based on Unsupervised Syntactic Parsing. In: Proceedings of the conference on Text Mining Services (TMS 2009), 2009

Representation of the verb’s argument-structure in the human brain

A verb’s argument structure defines the number and relationships of participants needed for a complete event. One-argument (intransitive) verbs require only a subject to make a complete sentence, while two- and three-argument verbs (transitives and ditransitives) normally take direct and indirect objects. In this MEG study, we scrutinised the neuro-magnetic brain response to different argument structures.

BMC Neuroscience 2008, 9:69 doi:10.1186/1471-2202-9-69

Resource Efficient Document Search

This patent application describes how we can efficiently search huge document collections on a resource-constrained system such as a mobile phone or tablet. The applied techniques also improve performance in server-based implementations. (2014)

Significant Advances in Medical Image Analysis

In the past four years, Deep Learning (DL) has dramatically re-entered the computer vision scene, completely shifting the design paradigm of the previous 20 years. Whereas error rates in image analysis had been more or less stagnant before, since 2012 DL has kept halving them each year, in some recent cases even achieving super-human performance. All typical tasks, such as classification, detection and segmentation, have benefited across related applications such as traffic sign recognition, natural image analysis and automatic captioning. These developments move computer vision from a scientific playground to a productizable technology.

Bruni, Elia, and Stefan Bordag (2016): Significant Advances in Medical Image Analysis. Bildverarbeitung für die Medizin 2016, 1-1

The representation of the verb’s argument structure as disclosed by fMRI

Bottom-up, neuro-scientific research into the verb’s argument structure: “John gave Jim a book.” How does the brain know and represent the parts of the sentence that turn the verb “gave” into a story?

BMC Neuroscience 2009, 10:3 doi:10.1186/1471-2202-10-3

Towards Well-grounded Phrase-level Polarity Analysis.

Our approach to sentiment analysis shows that the polarity of a phrase can be composed from the polarities of its words. Our polarity model is language-independent and can thus be easily adapted to new languages and domains.

Robert Remus and Christian Hänig: Towards Well-grounded Phrase-level Polarity Analysis. In: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), Springer, 2011

UnsuParse: Unsupervised Parsing with unsupervised Part of Speech tagging.

The approach presented in this paper applies unsupervised syntactic parsing and language-independent statistical relation extraction to noisy data. A taxonomy is used to abstract away from the language-dependent word level to language-independent concepts; this approach can thus be adapted to new languages and domains without huge manual effort in the relation extraction step.

Hänig, C., Bordag, S. and Quasthoff, U.: UnsuParse: Unsupervised Parsing with unsupervised Part of Speech tagging. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), 2008

Unsupervised and knowledge-free morpheme segmentation and analysis

This work describes an algorithm that can analyse the morphological structure of words without knowing anything about the language in advance. It is thus a completely unsupervised approach, and it achieved decent results in the Morpho Challenge competition. (2008)

Web services for language resources and language technology applications

Early work on viewing language resources and language technology as a potential web service source. (2004)

Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation.

A graph-based word clustering approach shows that it is feasible for a completely unsupervised algorithm to determine the various meanings of words. For example, in a newspaper corpus, the word “space” would yield several meanings, one being outer space, where spacecraft fly, and another being rentable office space. (2006)