
nuptsww

Rank: 木虫 (somewhat well known)

[Help] Please help translate a passage from a paper

(Please do not reply with Google or Baidu machine translations.)
The practices of data-driven management and decision making have been pervasive and widely used in today’s industrial, business and governmental applications after initial successes of big data techniques in internet business. The data quality is regarded as a significant issue of industrial process, market success and decision-making activities.
However, more than 41% of the relevant projects would fail if only the original data were used due to the poor or insufficient quality of raw data according to a study by the Meta Group. Missing data which means that electronic data during some period is lost or hidden by uncontrollable factors is one of the major potential flaws in raw data and could result in severe failure. Therefore, the engineers have to sacrifice much time to retrieve this kind of data for further analysis. As a consequence, (semi-)automatic missing data prediction methods have been proposed.
A large collection of data mining and statistical methods have been proposed to improve data quality due to missing data. For example, Ma’s team proposed a good method for missing data prediction. The algorithm focused on recommender systems using improved collaborative filtering method which outperformed the traditional collaborative filtering method. Nogueira et al. solved a practical problem based on the Fast Fuzzy Clustering Algorithm in real world: the prediction of bankruptcy, in which the used data set has missing values. Lei and Wang present a method for pre-processing the missing observed data by adopting the multiple imputation technique for Macau air pollution index (API) prediction using the Adaptive Neuro-Fuzzy Inference System (ANFIS). The API forecasting performance after missing data pre-processing is better than the conventional case without pre-processing.
In power grid systems, data missing happens so frequently due to the harsh working condition of sensors that classic methods often fail to handle. Expensive critical equipment such as main power transformers are monitored by multiple sensors. Unfortunately, these sensors are not as reliable as the equipment in the harsh open air working condition under the workload of 7*24 hours. Moreover, sensors in remote rural areas such as mountains are usually maintained at an even worse level by workers who received less training than workers in city. Thus, it is normal and inevitable for the sensor system to produce flaw data sets, which lost or hidden some necessary information. These losses affect the data quality so badly that classic data mining and statistical methods alone cannot process these data properly.

baoshanqiu

Rank: 至尊木虫 (renowned writer)

[Answer] Reply to the help request

数据驱动管理和决策的运用在互联网业务的大数据技术中取得最初成功以后,已经得到普及,现在广泛用于工业、商业和政府部门。数据的质量被认为是工业生产过程、市场成功以及决策活动的关键。
然而根据Meta集团的研究,如果直接使用原始数据,有超过41%的相关项目会因原始数据质量差或不足而失败。缺失数据是指因某一时期电子数据丢失或因无法控制的因素造成隐匿,是原始数据最主要的潜在缺陷之一,可造成严重失效。因此工程师必须花大量时间去恢复这样的数据作进一步分析。结果就提出了(半)自动化的缺失数据预测方法。
为改善因数据缺失而受损的数据质量,人们已提出了大量数据挖掘和统计学方法。如Ma的团队提出了一种预测缺失数据的好方法。该算法面向推荐系统,采用改进的协同过滤方法,其性能优于传统的协同过滤方法。Nogueira等利用快速模糊聚类算法解决了现实生活中的一个实际问题:预测破产,其中使用的数据集有缺失值。Lei 和 Wang报告了一种采用多重填补技术预处理缺失观测数据的方法,利用自适应神经模糊推理系统(ANFIS)对澳门空气污染指数(API)进行预测。经缺失数据预处理后,API预测性能优于未经预处理的常规情形。
在电网中,由于传感器处于恶劣的工作条件而造成数据缺失屡见不鲜,惯常的处理方法难以奏效。像主变压器这类昂贵的关键设备是由多个传感器监控的。不幸的是,在恶劣的露天工作条件和每周7天、每天24小时的工作负荷下,这些传感器并没有该设备那么可靠。此外,在偏远的农村地区比如山区这些传感器得到的维护甚至更差,那里参与维护的工人接受的训练比城里的工人少。因此这样的传感器系统产生丢失或隐藏了一些必要信息的缺陷数据集是正常的、不可避免的。这些缺失严重影响数据的质量,单纯用传统的数据挖掘和统计方法不能恰当地处理这些数据。
Floor 2, 2015-06-24 04:58:06