Document Structure Generalizer

A component that analyzes a collection of documents and extracts their typical structure. If the collection contains several different types of documents, it produces a list of those types, each with its typical structure. It makes no assumptions about which elements are used to structure the documents, and it works on plain text files as well as on various kinds of XML and HTML. We currently have a prototype that is under active development and already shows promising results.
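
The description does not say how the prototype works internally. The following is only a minimal sketch of the general idea, under several assumptions of our own. Each document is reduced to a sequence of structural tokens: start tags for XML/HTML, and "heading"/"paragraph" blocks for plain text. Documents are then grouped by the Jaccard similarity of those tokens and their adjacent pairs, and each group's typical structure is taken to be the elements found in at least half of its documents. The function names, the 0.5 thresholds and the heading heuristic are all hypothetical and are not the prototype's actual method.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records start-tag names of an HTML/XML document in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; self-closing tags also end up here.
        self.tags.append(tag)


def skeleton(text):
    """Reduce one document to a sequence of structural tokens (hypothetical scheme)."""
    if text.lstrip().startswith("<"):
        collector = _TagCollector()
        collector.feed(text)
        collector.close()
        return collector.tags
    tokens = []
    # Plain text: blocks separated by blank lines become "heading" or "paragraph".
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if not lines:
            continue
        # Hypothetical heuristic: a short single line without final punctuation is a heading.
        is_heading = (len(lines) == 1 and len(lines[0]) < 60
                      and not lines[0].endswith((".", "!", "?", ":")))
        tokens.append("heading" if is_heading else "paragraph")
    return tokens


def features(tokens):
    """Element names plus adjacent pairs, so element order matters somewhat."""
    return set(tokens) | set(zip(tokens, tokens[1:]))


def jaccard(a, b):
    """Jaccard similarity of two feature sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def typical_structure(skeletons, min_share=0.5):
    """Elements present in at least min_share of the skeletons, ordered by mean first position."""
    counts = Counter()
    first_pos = {}
    for tokens in skeletons:
        seen = {}
        for i, tok in enumerate(tokens):
            seen.setdefault(tok, i)  # keep only the first occurrence of each element
        counts.update(seen.keys())
        for tok, i in seen.items():
            first_pos.setdefault(tok, []).append(i)
    common = [t for t, c in counts.items() if c / len(skeletons) >= min_share]
    return sorted(common, key=lambda t: sum(first_pos[t]) / len(first_pos[t]))


def generalize(documents, threshold=0.5):
    """Greedy leader clustering: returns a list of (typical structure, member indices)."""
    clusters = []
    for idx, text in enumerate(documents):
        tokens = skeleton(text)
        feats = features(tokens)
        for cluster in clusters:
            if jaccard(feats, cluster["leader"]) >= threshold:
                cluster["members"].append(idx)
                cluster["skeletons"].append(tokens)
                break
        else:
            clusters.append({"leader": feats, "members": [idx], "skeletons": [tokens]})
    return [(typical_structure(c["skeletons"]), c["members"]) for c in clusters]


if __name__ == "__main__":
    docs = [
        "<html><body><h1>A</h1><p>x</p><p>y</p></body></html>",
        "<html><body><h1>B</h1><p>z</p></body></html>",
        "Title\n\nSome text here.\n\nMore text.",
        "Another Title\n\nBody text.",
    ]
    for structure, members in generalize(docs):
        print(members, structure)
    # [0, 1] ['html', 'body', 'h1', 'p']
    # [2, 3] ['heading', 'paragraph']
```

Comparing each document only against the first member of every cluster keeps the sketch short, but it makes the grouping depend on input order, and collapsing repeated elements loses information such as "several paragraphs". A fuller approach would more likely use a proper clustering method and an order-aware alignment of the skeletons.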
