| Views: 734 | Replies: 1 |
[Help]
Abstract translation (communications, computer science, machine learning)
The main access control mechanisms at present are DAC (Discretionary Access Control), MAC (Mandatory Access Control), and RBAC (Role-Based Access Control). This paper proposes a new approach: using machine learning algorithms to build a model that configures access permissions automatically. In recent years, more and more researchers in the machine learning field have focused on processing the raw data set, because if feature engineering can extract more of the data and features hidden in the raw data set, the same machine learning algorithms will achieve better results. Starting from the original data set, this paper generates many new combinations of data sets and feature sets and introduces several machine learning algorithms: logistic regression, gradient boosting decision tree, and random forest. Applying these three algorithms to the data-set/feature-set combinations produces many classifier models. Finally, on the basis of these typical classifiers, several commonly used ensemble learning algorithms are studied, and two of them are used to combine the classifiers.

Specifically, the main contributions of this paper are as follows:

(1) Four new data sets and five new feature sets are generated from the original data set. Several mathematical transformations are introduced and selectively applied to the data sets and feature sets. In particular, during the construction of the greedy data set, a greedy forward feature selection algorithm is used to choose the optimal subset from the large collection of features.

(2) Logistic regression, gradient boosting decision tree, and random forest are introduced and trained on different training sets, and 14 typical classifier models are selected (five logistic regression, four gradient boosting decision tree, and five random forest models). The AUC (Area Under Curve) scores of the logistic regression models range from 0.9109 to 0.9196, those of the gradient boosting decision tree models from 0.8756 to 0.9079, and those of the random forest models from 0.8782 to 0.9047. The three algorithms are also trained on three data sets separately, and the performance of each model on the three data sets is compared, demonstrating that feature engineering is essential for a single classification model. Logistic regression performs well on training sets containing the greedy data set, while gradient boosting decision tree and random forest perform well on training sets containing the tuples data set; overall, logistic regression performs better on certain training sets.

(3) On the basis of the above classifiers, this paper introduces the voting and stacked generalization ensemble learning algorithms and uses them to combine the 14 typical classifier models. The voting ensemble reaches an AUC of 0.9244, 0.0048 higher than the best of the 14 individual classifiers, and the stacked generalization ensemble reaches an AUC of 0.9247, an improvement of 0.0051. The experiments show that ensemble learning improves the classification ability of the final model.
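The greedy forward feature selection the abstract describes in contribution (1) can be sketched as follows. This is a minimal illustration, assuming an sklearn-style classifier and AUC scoring on a held-out validation split; the synthetic data and the function name `forward_select` are illustrative, not from the paper.

```python
# Greedy forward feature selection: repeatedly add the single feature
# that most improves validation AUC, stopping when no feature helps.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, n_informative=4,
                           random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

def forward_select(X_tr, y_tr, X_va, y_va):
    """Return the greedily chosen feature subset and its validation AUC."""
    remaining = list(range(X_tr.shape[1]))
    selected, best_auc = [], 0.0
    improved = True
    while improved and remaining:
        improved, best_feat = False, None
        for f in remaining:
            cols = selected + [f]
            model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
            auc = roc_auc_score(y_va, model.predict_proba(X_va[:, cols])[:, 1])
            if auc > best_auc:
                best_auc, best_feat, improved = auc, f, True
        if improved:
            selected.append(best_feat)
            remaining.remove(best_feat)
    return selected, best_auc

subset, auc = forward_select(X_tr, y_tr, X_va, y_va)
print(subset, round(auc, 3))
```

The stopping rule here (quit as soon as no candidate raises the AUC) is one common variant; the thesis may instead fix the subset size in advance.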
灿灿妹
[Answer] Helpful reply
lionel0822 (RXMCDM代发): coins +40, thanks for the help! 2015-10-04 19:23:12
At present, the main access control mechanisms are DAC (Discretionary Access Control), MAC (Mandatory Access Control), and RBAC (Role-Based Access Control). This paper proposes a new method that uses machine learning algorithms to build a model for the automatic configuration of access permissions. In recent years, more and more researchers in the machine learning field have focused on processing the original data set, because if feature engineering can extract more of the data and features hidden in the original data set, the same machine learning algorithms will achieve better results. In this paper, many new combinations of data sets and feature sets were generated from the original data set, and several machine learning algorithms were introduced: logistic regression, gradient boosting decision tree, and random forest. Many classifier models were produced by applying these three algorithms to the data-set/feature-set combinations. Finally, on the basis of these typical classifiers, several commonly used ensemble learning algorithms were studied, and two of them were used to combine the classifiers.

Specifically, the main contributions of this paper are as follows:

(1) Four new data sets and five new feature sets were generated from the original data set. Several mathematical transformations were introduced and selectively applied to the data sets and feature sets. In particular, during the construction of the greedy data set, a greedy forward feature selection algorithm was used to choose the optimal subset from the large collection of features.

(2) Logistic regression, gradient boosting decision tree, and random forest were introduced and trained on different training sets, and 14 typical classifier models were selected (five logistic regression, four gradient boosting decision tree, and five random forest models). The AUC (Area Under Curve) scores of the logistic regression models ranged from 0.9109 to 0.9196, those of the gradient boosting decision tree models from 0.8756 to 0.9079, and those of the random forest models from 0.8782 to 0.9047. The three algorithms were also trained on three data sets separately, and the performance of each model on the three data sets was compared, demonstrating that feature engineering is essential for a single classification model. Logistic regression performed well on training sets containing the greedy data set, while gradient boosting decision tree and random forest performed well on training sets containing the tuples data set; overall, logistic regression performed better on certain training sets.

(3) On the basis of the above classifiers, this paper introduced the voting and stacked generalization ensemble learning algorithms and used them to combine the 14 typical classifier models. The voting ensemble reached an AUC of 0.9244, 0.0048 higher than the best of the 14 individual classifiers, and the stacked generalization ensemble reached an AUC of 0.9247, an improvement of 0.0051. The experiments show that ensemble learning improves the classification ability of the final model.
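The two ensembling schemes named in contribution (3), voting and stacked generalization, can be sketched with scikit-learn's `VotingClassifier` and `StackingClassifier`. This is a toy illustration on synthetic data, not the paper's setup: the three base learners here stand in for the 14 trained classifiers, and the AUC values it prints have no relation to the numbers reported in the abstract.

```python
# Soft voting averages the base models' predicted probabilities;
# stacked generalization trains a meta-learner on their outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# The three algorithm families from contribution (2).
base = [("lr", LogisticRegression(max_iter=1000)),
        ("gbdt", GradientBoostingClassifier(random_state=1)),
        ("rf", RandomForestClassifier(random_state=1))]

voting = VotingClassifier(base, voting="soft").fit(X_tr, y_tr)
stacking = StackingClassifier(
    base, final_estimator=LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

auc_vote = roc_auc_score(y_te, voting.predict_proba(X_te)[:, 1])
auc_stack = roc_auc_score(y_te, stacking.predict_proba(X_te)[:, 1])
print(round(auc_vote, 3), round(auc_stack, 3))
```

`StackingClassifier` fits the meta-learner on cross-validated predictions of the base models, which matches the usual definition of stacked generalization; the thesis's exact cross-validation scheme is not specified in the abstract.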

#2 · 2015-04-10 00:43:01