ZHAW researchers develop AI solution for cleansing machine data
Researchers at the ZHAW School of Engineering have developed an innovative framework that detects anomalies and defects in machines more efficiently, even when training data is contaminated. This development addresses a central challenge in AI research: precise error detection without being able to fall back on error-free training data.
Detecting unusual or abnormal patterns in industrial data is one of the most common tasks of AI algorithms in commercial applications. It enables the early detection of degradation, defects and errors in production and allows these problems to be rectified in good time, thus saving costs and reducing downtime.
Anomaly detection in machines is usually based on "learning from normality". This means that AI algorithms are trained on data from perfectly functioning machines in order to later recognize deviations in operating data. In practice, however, completely error-free data is often unavailable, which significantly impairs the effectiveness of the models. When trained on contaminated data, the models can no longer reliably distinguish between normal and faulty operating conditions, a challenge that research has so far barely solved.
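To illustrate the "learning from normality" idea in its simplest possible form, here is a minimal sketch: a model of healthy operation is fitted to clean sensor readings (here just a mean and standard deviation), and operating data is flagged when it deviates strongly. The data values and the z-score threshold are illustrative assumptions, not part of the ZHAW framework, which uses trained AI models rather than a Gaussian fit.

```python
import statistics

# "Learning from normality": fit a simple model of healthy operation.
# Here the "model" is just the mean and standard deviation of a
# sensor reading from a perfectly functioning machine (toy values).
healthy = [9.8, 10.1, 10.0, 9.9, 10.2, 10.05, 9.95, 10.1]
mu = statistics.mean(healthy)
sigma = statistics.stdev(healthy)

def is_anomalous(reading, threshold=3.0):
    """Flag a reading whose z-score against the healthy model
    exceeds the threshold (3 standard deviations is a common,
    but here purely illustrative, choice)."""
    return abs(reading - mu) / sigma > threshold

print(is_anomalous(10.05))  # within normal spread -> False
print(is_anomalous(14.7))   # strong deviation -> True
```

The weakness the article describes shows up immediately in this sketch: if the `healthy` list secretly contains faulty readings, `mu` and `sigma` are distorted and genuine faults may pass the threshold.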
Use of AI algorithms without human intervention
"By working with various companies, we have realized that there is a need for AI algorithms that can be used directly and without prior human intervention for data labeling," explains Dr. Lilach Goren Huber from the Smart Maintenance Team at the ZHAW Institute for Data Analysis and Process Design (IDP).
New framework for unsupervised data refinement
To close this gap, the ZHAW researchers have developed a novel framework that automatically evaluates historical, potentially contaminated data and extracts normally functioning data samples in a completely unsupervised manner. The cleaned data can then be used to train anomaly detection algorithms without time-consuming manual sorting.
Simple concept, powerful effect
The framework is based on a central observation: erroneous data samples influence the performance of the AI models more strongly than normal samples do. Based on this principle, each data sample is assigned a score that measures its influence on the training. Samples with a high score are identified as potentially erroneous and removed from the training data. In tests with this refined data, the framework achieved performance comparable to training on manually cleaned datasets.
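The scoring-and-filtering principle can be sketched with a deliberately simplified stand-in for the trained model: score each sample by how much leaving it out shifts the fit (here, the mean of the data), then drop samples whose score exceeds a cutoff. The leave-one-out mean, the data values, and the median-based cutoff are all illustrative assumptions; the actual ZHAW framework measures each sample's influence on the training of an AI model, which this toy does not reproduce.

```python
import statistics

# Historical, potentially contaminated data: mostly healthy readings
# plus two unlabeled faulty samples (14.5 and 15.2).
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.05, 9.95, 10.1, 14.5, 15.2]

def influence_scores(samples):
    """Score each sample by how strongly leaving it out shifts the
    fitted model (here: just the sample mean). Contaminated samples
    pull the fit hardest, so they receive the highest scores."""
    mu_all = statistics.mean(samples)
    return [abs(mu_all - statistics.mean(samples[:i] + samples[i + 1:]))
            for i in range(len(samples))]

scores = influence_scores(data)

# Remove samples whose influence is far above the typical level
# (cutoff at 3x the median score is an arbitrary illustrative choice).
cutoff = 3 * statistics.median(scores)
refined = [x for x, s in zip(data, scores) if s <= cutoff]

print(refined)  # the two faulty samples are filtered out
```

The `refined` list can then serve as training data for a "learning from normality" detector, which is exactly the role the cleaned data plays in the framework described above.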
Successful application and prospects
The ZHAW has tested the method on a variety of machine types, including pumps, valves, fans and engines, with promising results. In most cases, the framework was able to fully compensate for the lack of error-free training data. "Our approach is not only simple and robust, but also universally applicable. It can be combined with any type of data and existing fault detection methods," says Goren Huber.
Source: www.zhaw.ch