Turbine data cleaning based on deep LSTM
XU Xiaogang, WANG Zhixiang, WANG Huijie
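Abstract:
Steam turbine operation generates large volumes of diverse data. To meet the high data-quality requirements of big-data-driven methods and simulation modeling, efficient data cleaning is essential. Exploiting the strong nonlinear fitting capability of long short-term memory (LSTM) layers for time-series data, a semi-supervised data cleaning model for steam turbines is built. The model takes three boundary conditions of the unit as inputs and predicts the data to be cleaned. Outliers are removed according to the residual between the predicted and measured values, and the model's predicted values are then used to fill the removed points, which preserves data completeness. The model is applied to clean data from a 650 MW unit at a power plant. To overcome the difficulty that sample imbalance poses for choosing an evaluation metric, the accuracy metric is modified, and the improved accuracy is used to measure cleaning performance. The results show that the deep LSTM data cleaning model achieves a higher improved accuracy than three other common cleaning methods, effectively identifies whether data are abnormal, and can fill data with predicted values so that the data volume is the same before and after cleaning.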
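The residual-based removal and prediction-based filling described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the predictions are assumed to come from an already trained model, and the function name `clean_by_residual`, the threshold factor `k`, and the median/MAD threshold rule are all assumptions, since the abstract does not state how the residual threshold is chosen.

```python
import numpy as np

def clean_by_residual(actual, predicted, k=3.0):
    """Flag points whose residual is unusually large and fill them with predictions.

    Hypothetical helper: the threshold rule (median +/- k * scaled MAD) is an
    assumption made for this sketch, not the rule used in the paper.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    # Residual between measured values and model predictions.
    residual = actual - predicted

    # Robust estimate of the typical residual spread (scaled median absolute deviation).
    center = np.median(residual)
    scale = 1.4826 * np.median(np.abs(residual - center))

    # A point is treated as an outlier when its residual deviates too far from the center.
    is_outlier = np.abs(residual - center) > k * scale

    # Replace outliers with predicted values so the series keeps its original length.
    cleaned = np.where(is_outlier, predicted, actual)
    return cleaned, is_outlier

# Toy data for illustration only (not from the 650 MW unit).
actual = [10.0, 10.2, 9.9, 50.0, 10.1]
predicted = [10.1, 10.1, 10.0, 10.0, 10.0]
cleaned, is_outlier = clean_by_residual(actual, predicted)
```

Because flagged points are replaced rather than dropped, `cleaned` has the same number of samples as `actual`, which matches the abstract's point that the data volume is unchanged by cleaning.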
Keywords: long short-term memory network; deep learning; data cleaning; outlier; steam turbine
Foundation: Fundamental Research Funds for the Central Universities (2019MS094)
DOI: 10.19666/j.rlfd.202210213