alwens · Die-hard Member (Official Writer) · Veteran Member
[Discussion]
Suggestions on a manuscript revision (adding a Y-randomization test)
I recently submitted a paper to J. Comput.-Aided Mol. Des. A reviewer raised a question about how the fitted models were validated and asked for a Y-randomization test to make the results more convincing. In my earlier papers, internal cross-validation plus validation on an external test set had been enough, and I had never actually run a Y-randomization test. I looked through common statistics packages such as Origin and SPSS and could not find a matching module or function. Since the principle is simple, I wrote an R script myself. I am sharing the question, my response, and the script here in the hope that they are a useful reference for colleagues.

Question: I advise the authors to perform additional validation for the models developed. For example, conduct the Y-randomization test (scramble stability test). Eighteen compounds in the test set is a low number for a training set of 108 molecules.

Response: We thank the reviewer for this advice and have included the Y-randomization test results in Section 3.4, Results and Discussion. Apart from the concern about generalizability, the high internal validation performance of our xxxx models might be the result of chance correlation. To address this, the three models were validated with a Y-randomization test of the response (in this work, the experimental activity values). The test consists of repeating the calculation procedure several times after randomly shuffling the Y vector. If the models obtained from the shuffled data all have relatively high q² and r² values, the original result is attributable to chance correlation, which implies that the current modeling method cannot yield an acceptable model from the available data set. This was not the case for the data set and methodology used in this work. Ten random shuffles of the Y vector were performed; the results are shown in Table 4. The low q² and r² values show that the good results of our original models are not due to chance correlation or structural dependency of the training set.

Table 4. Y-randomization results of the three models.
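            Model1              Model2              Model3
Iteration   r²        q²        r²        q²        r²        q²
1           0.06087   0.02613   0.03359   0.00700   0.00004   0.27470
2           0.00050   0.06217   0.00011   0.05587   0.00675   0.01233
3           0.00287   0.07728   0.01106   0.00452   0.00673   0.01450
4           0.00309   0.06925   0.01152   0.00495   0.03866   0.00979
5           0.02495   0.00113   0.00021   0.37390   0.00426   0.04193
6           0.00080   0.27630   0.00003   0.43000   0.00200   0.15350
7           0.00424   0.04414   0.00728   0.01734   0.03321   0.00651
8           0.02441   0.00040   0.01375   0.00040   0.02045   0.00008
9           0.00199   0.09985   0.01244   0.00248   0.03296   0.00430
10          0.00795   0.01594   0.00122   0.2014    0.01232   0.00541

Script:

    # Y-randomization with leave-one-out (LOO) cross-validation
    d <- read.table('数据文件1', header = TRUE)   # tab-delimited data file
    TIMES <- 100                                   # number of random shuffles
    for (j in 1:TIMES) {
      x <- d$模型1                 # values predicted by model 1
      y <- sample(d$试验值)        # experimental values, randomly shuffled
      print(summary(lm(y ~ x)))    # fit statistics (r2) for the shuffled response
      vp <- numeric(nrow(d))       # LOO predictions, one per compound
      for (i in 1:nrow(d)) {
        x0 <- x[-i]
        y0 <- y[-i]
        # refit without compound i, then predict only the left-out compound
        vp[i] <- predict(lm(y0 ~ x0), data.frame(x0 = x[i]))
      }
      print(summary(lm(y ~ vp)))   # R-squared of y against the LOO predictions (q2)
    }

Note: "数据文件1" (data file 1) is a tab-delimited text file that holds the experimental values (column 试验值) and the model-predicted values (column 模型1).

[ Last edited by alwens on 2006-9-6 at 15:42 ]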
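For anyone who wants the numbers in a table rather than as printed lm summaries, here is a minimal sketch of the same idea wrapped in a function. It is not the script from the post. The function name y_randomization, its arguments, and the synthetic data are made up for illustration. It also assumes q² is defined as 1 - PRESS/SS_total, which differs from the R-squared of y against the leave-one-out predictions used in the script and can come out negative for shuffled data.

```r
# Hypothetical helper (not from the original post): returns one row of
# r2 and q2 per random shuffle of the response vector.
y_randomization <- function(x, y, times = 10, seed = 1) {
  set.seed(seed)                        # reproducible shuffles
  n <- length(y)
  out <- data.frame(iteration = seq_len(times), r2 = NA_real_, q2 = NA_real_)
  for (j in seq_len(times)) {
    ys <- sample(y)                     # scramble the response vector
    out$r2[j] <- summary(lm(ys ~ x))$r.squared
    loo <- numeric(n)                   # leave-one-out predictions
    for (i in seq_len(n)) {
      fit <- lm(yy ~ xx, data = data.frame(xx = x[-i], yy = ys[-i]))
      loo[i] <- predict(fit, newdata = data.frame(xx = x[i]))
    }
    # Assumed q2 definition: 1 - PRESS / total sum of squares
    out$q2[j] <- 1 - sum((ys - loo)^2) / sum((ys - mean(ys))^2)
  }
  out
}

# Synthetic stand-in data (an assumption, not the data from the paper):
# 108 points with a strong linear relation between x and y.
set.seed(42)
x <- rnorm(108)
y <- 2 * x + rnorm(108, sd = 0.3)
res <- y_randomization(x, y, times = 10)
print(round(res, 5))
```

With real data, x would be the model-predicted column and y the experimental column from the data file. If the shuffled r² and q² values stay far below those of the unshuffled model, chance correlation is an unlikely explanation for the original fit.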

goldjay

Floor 2, 2006-09-06 16:17:30
1
Floor 3, 2006-09-06 18:02:57
1
Floor 4, 2006-09-07 10:51:06












