Testing tools for AI systems

The media ubiquity of OpenAI's AI application ChatGPT shows that artificial intelligence has reached an impressive level of maturity. The chatbot, trained on data and texts from across the Internet, responds to questions with answers that are difficult, if not impossible, to distinguish from texts written by humans. But how is the quality of AI systems actually tested?

The ScrutinAI tool makes it possible to detect errors in AI models or training data and to analyze their causes. In this example, an AI model for detecting anomalies and diseases in CT images is examined. (Image: Fraunhofer IAIS)

ChatGPT has triggered a new wave of hype around artificial intelligence, and the technology's possibilities are impressive. At the same time, quality assurance and control of AI systems are becoming increasingly important, especially when the systems take on critical tasks. Chatbot results are based on huge amounts of text data from the Internet; systems such as ChatGPT merely calculate the most probable answer to a question and present it as fact. But what testing tools exist to measure the quality of the texts that ChatGPT, for example, generates?
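A minimal sketch can illustrate this "most probable answer" principle. The example below uses the open source transformers library and the public gpt2 checkpoint; the models and decoding settings behind ChatGPT itself are not public, so this is an assumption-laden stand-in, not ChatGPT's actual method.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits[0, -1]  # scores for the next token

# The model assigns a probability to every token in its vocabulary and,
# under greedy decoding, simply emits the most probable one, whether or
# not the continuation is factually correct.
probs = torch.softmax(logits, dim=-1)
top_id = int(torch.argmax(probs))
print(tokenizer.decode(top_id), float(probs[top_id]))

The model never checks its answer against reality; it only maximizes probability, which is precisely why independent quality testing matters.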

AI test catalog

ChatGPT has increased the prominence of AI, but the technology is by no means limited to this one tool. From voice assistants to the screening of job applications to autonomous driving, artificial intelligence is used everywhere as a key technology of the future. This makes it all the more important to design AI applications so that they act reliably and securely and handle data transparently. That is a necessary prerequisite for AI to be used in sensitive areas and for users to place lasting trust in the technology. For this reason, the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS has developed an AI test catalog. It provides companies with a practice-oriented guide for making their AI systems trustworthy. In around 160 pages, the catalog describes how AI applications can be systematically evaluated with regard to risks, formulates suggested test criteria for measuring the quality of the systems, and proposes measures to mitigate AI risks.
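To make the idea of a test criterion concrete, here is a short sketch of what an automated check in this spirit might look like: requiring a minimum accuracy for every subgroup of the data, so that no group is served noticeably worse. The threshold, group labels, and function names are illustrative assumptions, not contents of the catalog.

from collections import defaultdict

def accuracy_per_group(y_true, y_pred, groups):
    # Accuracy broken down by a domain-relevant attribute (e.g. an age band).
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / totals[g] for g in totals}

def passes_criterion(y_true, y_pred, groups, min_accuracy=0.90):
    # Hypothetical criterion: every group must reach the minimum accuracy.
    scores = accuracy_per_group(y_true, y_pred, groups)
    return all(acc >= min_accuracy for acc in scores.values()), scores

ok, scores = passes_criterion(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 0, 1, 0, 1],
    groups=["a", "a", "b", "b", "a", "b"],
)
print(ok, scores)  # False: group "b" falls below the 0.90 threshold

A real assessment along the catalog would combine many such criteria across risk dimensions; this snippet only shows the measurable, pass-or-fail character such criteria can take.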

Testing tools in use

At Hannover Messe 2023 (April 17 to 21, Fraunhofer joint booth, Hall 16, Booth A12), researchers from Fraunhofer IAIS will also present various testing tools and procedures that can be used to systematically examine AI systems for vulnerabilities along their entire lifecycle and to safeguard against AI risks. The tools support developers and testing institutes in systematically evaluating the quality of AI systems and thus ensuring their trustworthiness. One example is the tool "ScrutinAI", which enables testers to systematically search for weak points in neural networks and thus assess the quality of AI applications. A concrete example is an AI application that detects anomalies and diseases in CT images. The question is whether all types of anomalies are detected equally well, or whether some are detected better than others. This analysis helps testers assess whether an AI application is suitable for its intended context of use. Developers benefit as well: they can identify shortcomings in their AI systems at an early stage and take targeted countermeasures, such as enriching the training data with specific examples.
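The kind of per-class analysis described here can be sketched in a few lines: computing recall separately for each anomaly type on a labeled test set and flagging types the model detects poorly. The class names, data, and threshold below are invented for illustration; ScrutinAI itself provides such analyses interactively rather than through this code.

from collections import Counter

def recall_per_class(y_true, y_pred):
    # Recall for each ground-truth class in a labeled test set.
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}

# Hypothetical ground-truth labels and model predictions on CT findings
y_true = ["fracture", "bleeding", "fracture", "bleeding", "tumor", "tumor"]
y_pred = ["fracture", "bleeding", "fracture", "fracture", "tumor", "bleeding"]

for cls, rec in sorted(recall_per_class(y_true, y_pred).items()):
    flag = "" if rec >= 0.8 else "  <- underperforms; add training examples?"
    print(f"{cls:10s} recall={rec:.2f}{flag}")

A result like this would tell a developer exactly which anomaly types need more or better training data before the application can be trusted in its intended context.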

Source and further information: Fraunhofer IAIS

