7 September 2021

AI Gives Search Engines Better Text Comprehension

DATEV conducts research on semantic search with its project SEMIARID.

What major search engines already handle relatively well these days has so far proven difficult within companies and communities: returning relevant results quickly and precisely for natural-language queries by taking their context into account. The problem is especially apparent in subject-specific domains, because the models behind such search engines need large amounts of data to be as accurate as possible. A manageable specialist environment, however, offers no such abundance of data. In the research project SEMIARID, supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy, DATEV experts and their partners are looking for new ways to teach search engines to better understand semantic relations with the help of artificial intelligence (AI).

Project SEMIARID is Being Supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy

The search engines that already enable very efficient searches today are based on so-called transformer networks. These belong to deep learning, which is regarded as the premier discipline in the field of AI. General-purpose search engines used by millions of people have enough data to build such transformer networks. By comparison, the data available for search within a company is significantly smaller and, in many cases, cannot be used for statistical evaluation because of strict data protection and confidentiality requirements. Because transformer networks cannot be trained under these conditions, keyword-based methods still dominate this field. These methods, however, cannot recognize complex linguistic relationships, so they only deliver good results when the search term and the target information match textually.
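To illustrate why purely keyword-based retrieval depends on literal overlap, here is a minimal, hypothetical Python sketch: it scores a document by the number of words it shares with the query, so a paraphrased request with the same meaning scores zero. The function names and example texts are illustrative assumptions and are not part of SEMIARID or any DATEV product.

```python
import re

def tokenize(text: str) -> set[str]:
    # Hypothetical helper: lowercase the text and split it into word tokens
    return set(re.findall(r"\w+", text.lower()))

def keyword_score(query: str, document: str) -> int:
    # Hypothetical scorer: count query words that appear verbatim in the document
    return len(tokenize(query) & tokenize(document))

doc = "Deadline for filing the annual income tax return"
print(keyword_score("tax return deadline", doc))  # 3: every query word matches literally
print(keyword_score("when must I submit my yearly declaration", doc))  # 0: same intent, no shared words
```

A semantic search system would aim to rank the document as relevant for both queries, which is exactly what simple term matching like this cannot do.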

Better Contextual Comprehension Even With Little Training Data

To solve this problem, DATEV partnered with Intrafind Software AG and the Deggendorf Institute of Technology and launched the research initiative SEMIARID. In this three-year project, supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy, the partners are developing a search engine technology that meets high data protection and data security standards while still recognizing the meaning of a search request and delivering highly accurate search results.

The project builds on transformer networks. However, these will be adapted through specific modifications and extensions so that they also work with smaller document collections.

In addition, available expert knowledge will be incorporated into the AI to further reduce the amount of training data required. The resulting improvements will also benefit DATEV users: they will feed into DATEV search applications such as LEXinform, the database for technical and service information, and the online platform SmartExperts.