Wind Turbine Blade State Anomaly Detection Based on Deep Learning
Zhang Yunfeng 1, Li Min 1, Huang Jiuping 1, Wang Deyu 1, Infantry 1, Wang Qiuqiang 1, Liu Zhaochen 1, Wu Jiarui 1, Yu Jie 1, Fu Xiaolin 2, Liu Jin 2
(1. Guodian Hefeng Wind Power Development Co., Ltd., Shenyang 110000, China; 2. Chengdu Fate Technology Co., Ltd., Chengdu 610000, China)
Abstract: Because of their structural characteristics, operating environment and other adverse factors, wind turbine blades are prone to cracks, wear, icing, imbalance and similar problems. Without an effective means of detecting and repairing abnormal blade states in time, these problems can cause progressive damage to the blade and even seriously threaten the safe operation of the whole wind turbine. Research on state detection and fault diagnosis of wind turbine blades is therefore of great significance. This paper first reviews the state of blade state detection technology in China and abroad, then proposes a concrete implementation scheme for blade state detection based on deep learning models, and discusses the feasibility and effectiveness of the scheme.
Keywords: wind turbine; blade detection; deep learning; variational auto-encoder; clustering
0 Introduction
Project Background
As one of the core components of a wind turbine, the blade is exposed during operation to adverse natural and man-made factors, such as weather, and cannot avoid problems such as cracks, abrasion, icing and imbalance. If abnormal states are not detected and handled in time, they lead to progressive damage to the blade and seriously threaten the normal operation of the whole turbine, which is why state detection and fault diagnosis of wind turbine blades are of great significance. Blade detection can prevent blade failures during operation, reduce unnecessary losses from sudden accidents and the generation losses caused by shutdown for maintenance, and lower blade maintenance costs, all of which directly affect the reliability, stability and overall economic benefit of the unit.
Blade State Detection Technology at Home and Abroad
The blade is a key component of the wind turbine. Because of the complexity of the loads it carries, damage, once it occurs, can propagate very quickly and, if not found in time, may lead to failure of the whole structure with severe consequences. Abroad, J. B. Ruan and other researchers have studied blade structural health monitoring based on acoustic emission extensively: the blade is monitored by a sensor system, and the stiffness degradation and nonlinear strain response observed in the collected signals are used to identify damage. To obtain the stress distribution on the blade surface, G. Dutton et al. applied thermal imaging to non-destructive testing of blades with good results. In China, Zhao Guangxin used wavelet analysis to process blade acoustic emission signals for fault identification and diagnosis, distinguishing the acoustic emission signatures of different damage types through denoising and reconstruction. Li Luping combined vibration monitoring with a BP neural network to obtain a quantitative relationship between blade damage and the blade's vibration characteristics.
Current application of artificial intelligence technology in the field of wind turbine blade state detection
Online detection and fault diagnosis of the operating state of wind turbine blades can effectively prevent sudden, major random failures and provide a basis for scheduling blade shutdown and maintenance. With the maturing and wide adoption of artificial intelligence in recent years, its application to fault detection and diagnosis of wind turbine blades has become a focus of research on structural fault detection and diagnosis. Techniques currently used for structural damage pattern recognition include deep learning, machine learning, wavelet analysis and expert systems, all of which are widely applied to structural damage diagnosis. In practice, deep learning is used to learn the normal operating state of the wind turbine, and several techniques are combined to detect and diagnose blade faults. The application of deep learning is moving wind turbine blade damage identification toward a simpler and more intelligent process.
1 Deep learning model
1.1 AutoEncoder (AE)
The AutoEncoder (AE) is an unsupervised learning model. Autoencoders have attracted growing attention mainly because of their feature extraction ability in deep learning: in essence, a neural network produces a low-dimensional representation of a high-dimensional input. The AutoEncoder is similar to principal component analysis (PCA), but when it uses a nonlinear activation function it overcomes PCA's restriction to linear mappings. A schematic of the AutoEncoder is shown in Figure 1.
Figure 1 AutoEncoder schematic
An AutoEncoder consists of two parts. The Encoder encodes the input data into a coded representation (the encoding stage), which is a compressed representation of the given data; the Decoder then reconstructs the original input from this encoding. Requiring the Decoder to reconstruct the input forces the AutoEncoder to retain the most informative features during training. An AutoEncoder can therefore be viewed as a special three-layer neural network, with an input layer, a hidden layer and a reconstruction layer, whose goal is to make the output of the reconstruction layer as close as possible to the input.
Assume the input data is X. The encoder F encodes X into H, and the decoder G then decodes H into the reconstruction R. The encoding and decoding processes can be expressed as:
$$H = F(X) = S_f(W_f X + B_h)$$
$$R = G(H) = S_g(W_g H + B_r)$$
where $S_f$ and $S_g$ are the activation functions of the encoder and decoder, respectively. Many activation functions exist; this paper uses the ReLU activation function, $f(x) = \max(0, x)$, whose range is $[0, \infty)$, for both the encoder and the decoder. $W_f$ and $W_g$ are the weights of the encoder and decoder, and $B_h$ and $B_r$ are the biases of the hidden (representation) layer and the reconstruction layer, respectively. The ReLU activation function is shown in Figure 2.
Figure 2 ReLU activation function
AutoEncoder training aims to make the output R of the reconstruction layer as close as possible to the input X. The difference between the two forms the reconstruction error, which measures how close they are. Of the many common reconstruction error functions, this paper uses the squared error: $L(X, R) = \|X - R\|^2$.
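As a minimal sketch of the encoding, decoding and reconstruction-error formulas above, the NumPy code below runs an untrained AutoEncoder forward pass with ReLU on both layers. The hidden size, the random weights and the one-sample-per-row layout are illustrative assumptions, not the configuration used in this paper; training (adjusting W_f, B_h, W_g, B_r to minimize the error) is omitted.

```python
import numpy as np

def relu(x):
    # S_f and S_g in the text: f(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

def encode(X, W_f, B_h):
    # H = S_f(W_f X + B_h); X holds one sample per row, hence X @ W_f.T
    return relu(X @ W_f.T + B_h)

def decode(H, W_g, B_r):
    # R = S_g(W_g H + B_r)
    return relu(H @ W_g.T + B_r)

def reconstruction_error(X, R):
    # L(X, R) = ||X - R||^2, computed per sample (row)
    return np.sum((X - R) ** 2, axis=1)

rng = np.random.default_rng(0)
n_features, n_hidden = 10, 4   # 10 features as in the dataset; hidden size is an assumption
W_f = rng.normal(scale=0.1, size=(n_hidden, n_features))
B_h = np.zeros(n_hidden)
W_g = rng.normal(scale=0.1, size=(n_features, n_hidden))
B_r = np.zeros(n_features)

X = rng.normal(size=(5, n_features))   # 5 dummy samples
H = encode(X, W_f, B_h)
R = decode(H, W_g, B_r)
err = reconstruction_error(X, R)
print(H.shape, R.shape, err.shape)     # (5, 4) (5, 10) (5,)
```

Note that a ReLU on the reconstruction layer can only output non-negative values, so it cannot reproduce the negative entries of standardized data; many implementations use a linear output layer for this reason.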
Figure 3 Principle of AE reconstruction error
1.2 Variational AutoEncoder (VAE)
The Variational AutoEncoder (VAE) is an important class of generative model, proposed by Diederik P. Kingma and Max Welling in 2013 [1]; Carl Doersch wrote a tutorial on VAEs in 2016 [2]. Its structure is similar to that of an autoencoder and also consists of an encoder and a decoder. The biggest difference from the AutoEncoder is that constraints are imposed on the encoding process so that the latent vectors it produces approximately follow a standard normal distribution. To generate new data, one simply samples a random latent vector from the standard normal distribution and decodes it with the decoder, without first encoding any original input.
Figure 4 Principle of VAE
Figure 4 shows the graphical model of the VAE. The observed data is X, and X' is generated from the latent variable Z. The mapping Z → X' is the generative model $p_\theta(x|z)$, which plays the role of the decoder in an autoencoder; the mapping X → Z is the recognition model $q_\phi(z|x)$, which plays the role of the encoder. The VAE needs to measure how similar the approximate posterior $q_\phi(z|x)$ is to the true posterior $p_\theta(z|x)$ of the latent variable, and the KL divergence is used for this.
The following are the specific calculation steps for VAE:
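The parameters of the generative model (decoder) $p_\theta(x|z)$ are estimated by maximum likelihood, i.e., by maximizing the log-likelihood of the N training samples:
$$\log p_\theta\left(x^{(1)}, \ldots, x^{(N)}\right) = \sum_{i=1}^{N} \log p_\theta\left(x^{(i)}\right)$$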
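The recognition model (encoder) $q_\phi(z|x^{(i)})$ of the VAE is used to approximate the true posterior $p_\theta(z|x^{(i)})$. The similarity of the two distributions is generally measured by the KL divergence:
$$D_{KL}\left(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|x^{(i)})\right) = \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log q_\phi(z|x^{(i)}) - \log p_\theta(z|x^{(i)})\right]$$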
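It follows that:
$$\log p_\theta(x^{(i)}) = D_{KL}\left(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z|x^{(i)})\right) + \mathcal{L}(\theta, \phi; x^{(i)})$$
where:
$$\mathcal{L}(\theta, \phi; x^{(i)}) = \mathbb{E}_{q_\phi(z|x^{(i)})}\left[-\log q_\phi(z|x^{(i)}) + \log p_\theta(x^{(i)}, z)\right] = -D_{KL}\left(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)\right) + \mathbb{E}_{q_\phi(z|x^{(i)})}\left[\log p_\theta(x^{(i)}|z)\right]$$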
Since the KL divergence is non-negative and equals 0 only when the two distributions coincide, $\log p_\theta(x^{(i)}) \ge \mathcal{L}(\theta, \phi; x^{(i)})$, where $\mathcal{L}(\theta, \phi; x^{(i)})$ is called the variational lower bound of the log-likelihood.
Directly optimizing $\log p_\theta(x^{(i)})$ is not feasible, so the lower bound $\mathcal{L}(\theta, \phi; x^{(i)})$ is optimized instead. Correspondingly, maximizing the log-likelihood is transformed into maximizing the lower bound:
$$\max_{\theta, \phi} \sum_{i=1}^{N} \mathcal{L}(\theta, \phi; x^{(i)})$$
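The expectation in $\mathcal{L}(\theta, \phi; x^{(i)})$ is estimated with the Monte Carlo method using L samples:
$$\tilde{\mathcal{L}}(\theta, \phi; x^{(i)}) = -D_{KL}\left(q_\phi(z|x^{(i)}) \,\|\, p_\theta(z)\right) + \frac{1}{L}\sum_{l=1}^{L} \log p_\theta\left(x^{(i)}|z^{(i,l)}\right)$$
where, so that the estimate can be differentiated with respect to $\phi$, samples from the recognition model $q_\phi(z|x)$ are written as a differentiable function $g_\phi(\varepsilon, x)$ of the input and a noise variable: $z^{(i,l)} = g_\phi(\varepsilon^{(i,l)}, x^{(i)})$, with noise $\varepsilon^{(i,l)} \sim p(\varepsilon)$.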
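The reparameterization z = g_phi(eps, x) can be sketched in NumPy as follows, using the Gaussian form z = mu + sigma * eps adopted in the assumptions of this section. It assumes (as is common, but not stated in this paper) that the encoder outputs mu and log sigma^2 for each sample. All randomness sits in eps, so z is a deterministic, differentiable function of mu and sigma; the empirical mean and standard deviation of many samples should approach mu and sigma.

```python
import numpy as np

def reparameterize(mu, log_var, rng, n_samples=1):
    # z = g_phi(eps, x) = mu + sigma * eps, with eps ~ N(0, I)
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples,) + np.shape(mu))
    return mu + sigma * eps   # shape: (n_samples, *mu.shape)

rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0])
log_var = np.log(np.array([0.25, 4.0]))   # sigma = [0.5, 2.0]
Z = reparameterize(mu, log_var, rng, n_samples=100_000)
print(Z.mean(axis=0), Z.std(axis=0))      # close to mu and sigma
```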
Based on the derivation above, the VAE algorithm can be run only once the distributions $p(\varepsilon)$, $p_\theta(x|z)$, $q_\phi(z|x)$ and $p_\theta(z)$ and the differentiable function $g_\phi(\varepsilon, x)$ are specified. In practice these cannot be obtained directly from the data, so the following assumptions are made:
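$$p(\varepsilon) = \mathcal{N}(\varepsilon; 0, I)$$
$$q_\phi(z|x^{(i)}) = \mathcal{N}\left(z; \mu^{(i)}, \sigma^{2(i)} I\right)$$
$$p_\theta(z) = \mathcal{N}(z; 0, I)$$
$$z^{(i,l)} = g_\phi(\varepsilon^{(i,l)}, x^{(i)}) = \mu^{(i)} + \sigma^{(i)} \odot \varepsilon^{(i,l)}$$
$$p_\theta(x^{(i)}|z) = \mathcal{N}\left(x^{(i)}; \mu'^{(i)}, \sigma'^{2(i)} I\right)$$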
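Under these assumptions, with J the dimension of z and D the dimension of x, it can be derived that:
$$\mathcal{L}(\theta, \phi; x^{(i)}) \simeq \frac{1}{2}\sum_{j=1}^{J}\left(1 + \log\left(\sigma_j^{(i)}\right)^2 - \left(\mu_j^{(i)}\right)^2 - \left(\sigma_j^{(i)}\right)^2\right) + \frac{1}{L}\sum_{l=1}^{L}\log p_\theta\left(x^{(i)}|z^{(i,l)}\right)$$
$$\log p_\theta\left(x^{(i)}|z^{(i,l)}\right) = -\frac{1}{2}\sum_{d=1}^{D}\left(\log\left(2\pi\sigma'^{2}_{d}\right) + \frac{\left(x_d^{(i)} - \mu'_d\right)^2}{\sigma'^{2}_{d}}\right)$$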
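The closed-form bound under these Gaussian assumptions can be sketched in NumPy as below. It assumes, as an illustration only, that the decoder outputs mu' and log sigma'^2 for each sampled z; the function names are hypothetical. Training would minimize the negative of `elbo_estimate`.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # D_KL(N(mu, diag(sigma^2)) || N(0, I)) = -1/2 * sum_j (1 + log sigma_j^2 - mu_j^2 - sigma_j^2)
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

def gaussian_log_likelihood(x, mu_x, log_var_x):
    # log N(x; mu', diag(sigma'^2)) = -1/2 * sum_d (log(2 pi sigma'_d^2) + (x_d - mu'_d)^2 / sigma'_d^2)
    return -0.5 * np.sum(np.log(2.0 * np.pi) + log_var_x
                         + (x - mu_x) ** 2 / np.exp(log_var_x), axis=-1)

def elbo_estimate(x, mu, log_var, decoded):
    # Monte Carlo estimate of the lower bound; `decoded` holds one (mu', log sigma'^2)
    # pair per reparameterized sample z^(i,l), so L = len(decoded)
    recon = np.mean([gaussian_log_likelihood(x, m, v) for m, v in decoded], axis=0)
    return -kl_to_standard_normal(mu, log_var) + recon

kl0 = kl_to_standard_normal(np.zeros(3), np.zeros(3))          # q equals the prior
kl1 = kl_to_standard_normal(np.array([1.0]), np.array([0.0]))  # mean shifted by 1
ll = gaussian_log_likelihood(np.zeros(2), np.zeros(2), np.zeros(2))
bound = elbo_estimate(np.zeros(2), np.zeros(3), np.zeros(3), [(np.zeros(2), np.zeros(2))])
```

When q equals the prior the KL term vanishes, and shifting the mean by one standard deviation in one dimension costs 0.5 nats.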
2 Clustering
Clustering divides physical or abstract objects into clusters according to their similarity, so that objects within the same cluster are highly similar while objects in different clusters are highly dissimilar. Cluster analysis is an unsupervised learning method that groups existing unlabeled data without any prior information, which makes it well suited to classifying unlabeled data.
There are many kinds of clustering algorithms, such as partition-based algorithms (e.g., K-Means), hierarchy-based algorithms (e.g., BIRCH) and density-based algorithms (e.g., DBSCAN). Given the characteristics and distribution of the data in this paper, the K-Means algorithm is used.
2.1 K-Means
K-Means is an unsupervised clustering algorithm. Its basic idea is as follows: given a data set of N samples and the number of clusters K, K samples are randomly selected as the initial cluster centers, and the clusters are then refined iteratively according to a similarity measure. Peng Kai, Wang Wei et al. [3] proposed a cosine distance metric learning algorithm and combined it with a pseudo-nearest neighbor classification algorithm for text classification; their experiments show that the algorithm effectively improves classification accuracy. In this paper, cosine similarity is used to measure the distance from each unassigned sample to each cluster center, and each sample is assigned to the cluster with the nearest center. For each cluster, the center is then moved to the mean of all samples in the cluster and the samples are reassigned; this repeats until the within-cluster sum of squared errors is minimal and the assignments no longer change. If the clusters are $(C_1, C_2, \ldots, C_K)$, the objective is to minimize the squared error E:
$$E = \sum_{i=1}^{K} \sum_{x \in C_i} \left\| x - \mu_i \right\|^2$$
where $\mu_i$ is the center (mean vector) of cluster $C_i$:
$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$
The K-Means algorithm flow is shown in Figure 5 below.
Figure 5 K-Means flow chart
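The procedure described in Section 2.1 (random initial centers, cosine-similarity assignment, mean update, stop when assignments no longer change) can be sketched in NumPy as follows. The function name, `max_iter` and the toy data are illustrative assumptions, and the sketch assumes no sample or center is an all-zero vector, since cosine similarity is undefined there. Because assignment uses cosine similarity while the update uses the arithmetic mean, the loop is not guaranteed to decrease E at every step; the sketch simply reports E at the end.

```python
import numpy as np

def kmeans_cosine(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # pick k distinct samples at random as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-length samples
    labels = None
    for _ in range(max_iter):
        Cn = centers / np.linalg.norm(centers, axis=1, keepdims=True)
        # highest cosine similarity = "nearest" center
        new_labels = np.argmax(Xn @ Cn.T, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break   # assignments no longer change
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)   # move center to the cluster mean
    # within-cluster squared error E from the objective above
    E = float(np.sum((X - centers[labels]) ** 2))
    return labels, centers, E

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([5.0, 0.0], 0.3, size=(50, 2)),    # toy cluster along the x-axis
               rng.normal([0.0, 5.0], 0.3, size=(50, 2))])   # toy cluster along the y-axis
labels, centers, E = kmeans_cosine(X, k=2)
```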
3 Experimental verification
3.1 Experimental process
3.2 Experimental data
This experiment uses a simulated dataset of 300,000 samples with 10 features: B1Mx, B2Mx, B3Mx, B1My, B2My, B3My, TowAccFA, TowAccSS, NacAccNod and NacAccRol. Basic information about the data is given in Table 1.
Table 1 Data description information
3.3 Data processing
Because the features in the original dataset have different units and different distributions, the data must be preprocessed before modeling, here by standardization. Standardization transforms the original data so that each feature has a mean of 0 and a standard deviation of 1. The transformation is:
$$x^* = \frac{x - \mu}{\sigma}$$
where μ is the mean and σ the standard deviation of all samples of the feature. After standardization, all features are on the same scale. The distribution of the processed data is shown in Figure 6.
Figure 6 Data distribution after data processing
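The z-score transformation above can be sketched per feature in NumPy as follows. The sketch makes two assumptions the paper does not state: constant (zero-variance) columns are left unscaled instead of dividing by zero, and the μ and σ computed on the training data are reused for the test data.

```python
import numpy as np

def standardize(X, mu=None, sigma=None):
    # x* = (x - mu) / sigma, column-wise (one column per feature)
    if mu is None:
        mu = X.mean(axis=0)
    if sigma is None:
        sigma = X.std(axis=0)
        # safeguard: avoid division by zero for constant features
        sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / sigma, mu, sigma

X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
Z_train, mu, sigma = standardize(X_train)
# reuse the training statistics for new data
Z_test, _, _ = standardize(np.array([[2.0, 400.0]]), mu, sigma)
```

Both columns of `Z_train` end up with mean 0 and standard deviation 1, even though the raw columns differ in scale by a factor of 100.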
3.4 Experimental results
Training Results
The training results of AE and VAE are shown in the figures below, which compare the two networks' loss curves, goodness of fit (R²), mean squared error (MSE), and predicted versus true values.
First, the loss curves show that the VAE network reaches a smaller loss and converges faster than the AE network. Second, the VAE network has a higher average goodness of fit (R²) than the AE network: about 98% versus about 95%. Finally, the MSE of the VAE network on the original data is smaller and more stable than that of the AE network.
Figure 7 Loss curves and R-squared curves of AE and VAE
Figure 8 MSE curves of AE and VAE
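The R² and MSE values reported above can be computed from the original data and the network's reconstruction as in the sketch below. Treating the reconstruction as the "prediction" and averaging R² per feature (column) are assumptions; the paper does not say how the averages were taken.

```python
import numpy as np

def mse(y_true, y_pred):
    # mean squared error over all entries
    return float(np.mean((y_true - y_pred) ** 2))

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot, computed per feature (column) and then averaged
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(mse(y, y_hat), r2_score(y, y_hat))   # approximately 0.025 and 0.98
```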
Figures 9 and 10 show the predicted versus true values of the two networks. The curves show that both networks fit the original data well, but the VAE fits better than the AE and reproduces the original data almost completely.
Figure 9 Fitted and true values of AE
Figure 10 Fitted and true values of VAE
Test Results
To evaluate the trained models further, they are tested on a separate test dataset.
The test dataset contains 25,000 samples. Based on the time-frequency characteristics of the abnormal data, the first 10,000 samples are labeled normal and the last 15,000 are labeled abnormal (covering two kinds of anomaly). The data are passed through the AE and VAE networks, and the resulting error curves are used to separate normal from abnormal data. Comparing the two networks' scores on the test data shows that both can distinguish normal from abnormal data, but the VAE's error scores separate the two more clearly, making the judgment more intuitive and allowing anomalies to be identified effectively by visual inspection.
Figure 11 Abnormal data distribution
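The paper separates normal from abnormal samples by inspecting the error curves and does not state a numeric decision rule. One common way to turn reconstruction errors into labels, shown here purely as a hypothetical sketch, is to use a high percentile of the errors on normal (training) data as the threshold. The 99th percentile and the function names are assumptions.

```python
import numpy as np

def fit_threshold(normal_errors, q=99.0):
    # hypothetical rule: the q-th percentile of reconstruction errors on normal data
    return float(np.percentile(normal_errors, q))

def flag_anomalies(errors, threshold):
    # 1 = abnormal (error above the threshold), 0 = normal
    return (np.asarray(errors) > threshold).astype(int)

train_errors = np.arange(1.0, 101.0)   # toy errors on normal data: 1, 2, ..., 100
thr = fit_threshold(train_errors)      # 99th percentile, linear interpolation
flags = flag_anomalies([50.0, 99.0, 150.0], thr)
```

With NumPy's default linear interpolation the threshold here is 99.01, so only the error of 150 is flagged.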
The test results are evaluated with a confusion matrix against the manually assigned normal/abnormal labels. The AE achieves an accuracy of 83% and the VAE 94%, as shown in Table 2.
Table 2 Confusion matrices of AE and VAE
| True \ Predicted | AE: 0 | AE: 1 | VAE: 0 | VAE: 1 |
| 0 | 83% | 17% | 94% | 6% |
| 1 | 17% | 83% | 6% | 94% |
Figure 12 Confusion matrices of AE and VAE
In summary, both AE and VAE were used to detect abnormal wind turbine data, and both network structures can detect it. The VAE, however, fits the original data distribution more closely and detects anomalies better, with an overall accuracy of 94%.
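Table 2 can be computed from true and predicted labels as in the sketch below. The layout (rows = true label, columns = predicted label) and the per-row normalization are assumptions about how Table 2 was produced, and the toy labels are not the paper's data.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    # rows: true label, columns: predicted label (counts)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def row_normalize(cm):
    # each row sums to 1, i.e. percentages per true class
    return cm / cm.sum(axis=1, keepdims=True)

def accuracy(cm):
    # correctly classified samples (diagonal) over all samples
    return float(np.trace(cm) / cm.sum())

y_true = [0, 0, 0, 0, 1, 1, 1, 1]   # toy labels, not the paper's data
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
```

If the rows of Table 2 are normalized per true class, then because both diagonal entries of each matrix are equal, the overall accuracy equals that value (83% for AE, 94% for VAE) regardless of the 10,000/15,000 split between normal and abnormal samples.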
3.5 Mini Batch K-Means clustering
Because the AE and VAE networks are trained without supervision, they can only detect abnormal values; they cannot tell what type of anomaly occurred. Detecting abnormal data alone is not enough: without knowing the type of anomaly, a suitable remedy cannot be chosen promptly and accurately. Since the data carry no true labels, a clustering algorithm is used to group the unknown abnormal data.
Traditional K-Means computes the distance from every sample point to every centroid, which is very time-consuming for the large sample size in this paper. Mini Batch K-Means is therefore used: each iteration runs the K-Means updates on a random subset of the samples, which avoids the cost of processing the full sample set and greatly speeds up convergence. The price is some loss of clustering accuracy, which is generally within an acceptable range. To improve accuracy, the Mini Batch K-Means algorithm is run 10 times with different random sample sets, and the best resulting clustering is selected. Part of the classification result is shown in Figure 13.
Figure 13 Mini Batch K-Means classification result
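A minimal NumPy sketch of Mini Batch K-Means and of the "run 10 times, keep the best" strategy described above follows. It uses the commonly used mini-batch update in which each center moves toward its assigned samples with a learning rate of 1 / (number of samples assigned to it so far). Euclidean distance, the batch size, the iteration count and selecting the run with the lowest within-cluster squared error are all assumptions; the paper does not give these settings.

```python
import numpy as np

def mini_batch_kmeans(X, k, batch_size=256, n_iter=100, seed=0):
    # batch_size must not exceed len(X) (batches are drawn without replacement)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # assign each batch sample to its nearest center (squared Euclidean distance)
        d = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        # per-center step with learning rate 1 / (samples assigned so far)
        for x, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = (1.0 - eta) * centers[j] + eta * x
    # final labels and within-cluster squared error on the full data set
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    inertia = float(d[np.arange(len(X)), labels].sum())
    return labels, centers, inertia

def best_of_runs(X, k, n_runs=10, **kwargs):
    # repeat with different random samples and keep the lowest-error clustering
    runs = [mini_batch_kmeans(X, k, seed=s, **kwargs) for s in range(n_runs)]
    return min(runs, key=lambda r: r[2])

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),    # toy group 1
               rng.normal(10.0, 1.0, size=(500, 2))])  # toy group 2
labels, centers, inertia = best_of_runs(X, k=2, n_runs=10, batch_size=64, n_iter=50)
```

For production use, scikit-learn's `sklearn.cluster.MiniBatchKMeans` implements the same idea.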
The experimental results show that Mini Batch K-Means separates the abnormal data screened by the VAE network well, and the best result divides it into two categories.
4 Conclusions
Deep-learning-based blade state anomaly detection can identify abnormal blade states. The method first uses the variational autoencoder to judge whether the blade state is abnormal, and then clusters the abnormal blade data with the K-Means algorithm to identify specific blade faults. It enables state detection and fault diagnosis of wind turbine blades and can detect blade faults during operation, reducing unnecessary losses from unexpected accidents and generation losses from maintenance downtime, and lowering blade maintenance costs.
References
[1] Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2): 1-305.
[2] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv:1606.05908.
[3] Peng, K., Wang, W., et al. (2013). Pseudo-nearest neighbor text classification algorithm based on cosine distance metric learning. Computer Engineering and Design, 34(6): 2200-2211.