THESEUS Basic Technologies
There is more knowledge on the Internet than in all of the archives and databases in human history. The THESEUS program is developing the basic technologies and standards necessary to make this knowledge more widely available in the future.
THESEUS focuses on the development of semantic technologies for capturing the meaning of information. This allows computer programs to automatically analyze the content of texts, images, and audio and video recordings, to link them, and to draw logical conclusions.
To make this evolution of the Internet possible, technical standards and innovative basic technologies must be developed. The partners in the THESEUS consortium are creating and testing these technologies in six application scenarios. The researchers are also exploring how the technologies could be used, as soon as possible, to create new tools, services and business models on the Internet.
An overview of these basic technologies:
1. Generating metadata automatically
THESEUS researchers are developing new metadata-based methods for interpreting media content. Metadata contain information about an item included in a database, for example the name of an author or the period in which a film takes place. These new methods make it possible to create metadata for various types of media content, including texts, photographs, and audio and video files. One focus is to find ways of generating metadata automatically. Another is the use of semantic technologies for understanding the context of media content. The goal is to take similar content from a variety of sources and combine it into a single grouping that gives the user as much information as possible.
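The idea of grouping similar content from several sources can be illustrated with a minimal sketch. It is not the THESEUS method: matching by normalized title is an assumption chosen for simplicity, and the function names and the source labels `archive_a` and `archive_b` are hypothetical.

```python
from collections import defaultdict


def normalize(title):
    """Lowercase and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())


def group_records(records):
    """Merge metadata records that share a normalized title.

    Each record is a dict with a 'title' key plus arbitrary metadata fields.
    Values for the same field from different sources are collected in a set.
    """
    groups = defaultdict(lambda: defaultdict(set))
    for record in records:
        key = normalize(record["title"])
        for field, value in record.items():
            if field != "title":
                groups[key][field].add(value)
    return {key: dict(fields) for key, fields in groups.items()}


# Two hypothetical sources describe the same film with different fields.
records = [
    {"title": "Metropolis", "director": "Fritz Lang", "source": "archive_a"},
    {"title": "metropolis ", "year": 1927, "source": "archive_b"},
]
merged = group_records(records)
```

Here both records end up in one group, so the user sees the director from one source and the year from the other. A real system would need a far more robust notion of similarity than an exact match on a cleaned-up title.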
2. Rapid processing of multimedia documents
To avoid unnecessary delays as users wait for the results of a search of a complex multimedia database, THESEUS is developing highly efficient algorithms for producing metadata. They make it possible to search images and videos quickly, even in databases containing several hundred thousand items. To enhance image searches, researchers are also developing image recognition systems that will allow computers to identify objects in a photograph or video. Another focus area is data compression, particularly for image files.
3. Innovative ontology management
THESEUS is exploring semantic technologies based on so-called ontologies, which enable computers to “understand” the meaning of content. Ontologies are formal knowledge models that conceptually represent the knowledge within a given subject area and make it possible to process that knowledge automatically at the level of meaning – something that, thus far, only human beings have been able to do. The THESEUS working group on ontology management is developing methods for improving the design and development of ontologies and enhancing automatic reasoning through their use.
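To make the notion of automatic reasoning over an ontology more concrete, the following sketch models a tiny concept hierarchy and derives facts that are not stated explicitly. The concepts, the `SUBCLASS_OF` table and the function names are hypothetical illustrations, and real ontology languages support far richer relations than this single subclass link.

```python
# Hypothetical toy ontology: each concept maps to its direct parent concepts.
SUBCLASS_OF = {
    "documentary": {"film"},
    "film": {"video"},
    "video": {"media_item"},
    "photograph": {"image"},
    "image": {"media_item"},
}


def ancestors(concept, subclass_of=SUBCLASS_OF):
    """Return every concept reachable by following subclass links upward."""
    found = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_a(concept, candidate, subclass_of=SUBCLASS_OF):
    """True if `concept` equals `candidate` or is a (transitive) subclass of it."""
    return concept == candidate or candidate in ancestors(concept, subclass_of)
```

The table never states that a documentary is a media item, yet `is_a("documentary", "media_item")` returns `True` because the relation follows from the chain of subclass links. This is the simplest form of drawing a conclusion from a formal knowledge model.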
4. Machine learning
THESEUS is also working on intelligent data analysis processes that facilitate automatic recognition of data relationships and interconnections so that they can be modeled and structured, much as is done with the help of ontologies. These methods are being applied to texts, images and audio and video data, and they help identify relationships between different types of data.
5. Situation-sensitive dialogue processing
Before a computer can act on behalf of a user, it needs to understand what the user wants. To facilitate this dialogue between humans and machines, researchers in the THESEUS program are developing new functions that can be deployed in different applications. Innovative algorithms make it possible to create multimodal user interfaces that can be controlled using speech, gestures and other inputs. Such interfaces allow users to formulate their queries intuitively and refine them through spoken dialogue with the system. A special component within the computer serves as an interface between the multimodal user interface and the various sources of metadata. It transforms a spoken query into the semantically appropriate data record required by the system for running a search.
6. Innovative user interfaces
THESEUS is also developing new graphical user interfaces to make it easier to identify the relationships between data, metadata and documents. For example, the results of a query can be presented in the form of a knowledge network, which shows how the search results are related to the search term as well as to one another. This provides users with a clear overview of the topic at hand and helps them to find the information they need more quickly.
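A knowledge network of this kind can be thought of as a graph whose nodes are the search term and the results. The sketch below is one possible way to derive its edges, under the assumption that each result carries a set of metadata tags; the function name, the document identifiers and the tags are hypothetical.

```python
from itertools import combinations


def build_network(search_term, results):
    """Build the edges of a small knowledge network.

    `results` maps a result identifier to its set of metadata tags.
    A result is linked to the search term if the term is one of its tags,
    and two results are linked if they share at least one tag.
    """
    edges = set()
    for name, tags in results.items():
        if search_term in tags:
            edges.add((search_term, name))
    for a, b in combinations(sorted(results), 2):
        if results[a] & results[b]:
            edges.add((a, b))
    return edges


# Hypothetical search results with their metadata tags.
results = {
    "doc1": {"berlin", "film"},
    "doc2": {"berlin", "architecture"},
    "doc3": {"film", "music"},
}
network = build_network("berlin", results)
```

In this example `doc1` and `doc2` are linked to the search term and to each other, while `doc3` is reachable only through `doc1`, with which it shares the tag `film`. A graphical interface could draw these edges so that users see such indirect relationships at a glance.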
7. Evaluating basic technologies
Experts are assessing the quality of the basic technologies developed within the framework of THESEUS. New technologies for speech and image recognition and for the automatic classification of metadata are being tested to determine their reliability, functionality and suitability, in an effort to ensure that the research meets quality standards. The results of this evaluation are also taken into account in the research and development process, helping to further optimize end results.