
Views: 728  |  Replies: 4

liuxus20030704

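Silver Worm (Somewhat Well Known)

[Discussion] [Help/Discussion] How do I remove anomalous data from photosynthesis measurements? Thanks, everyone! (2 participants so far)

After measuring the photosynthetic characteristics and chlorophyll fluorescence characteristics of grape leaves, I feel that some of the data are anomalous, but I have no way to judge this accurately. What methods are there for rejecting data?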

allengrass

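New Worm (Newcomer)


小木虫 (coins +0.5): A small reward; thanks for replying and joining the discussion
I once asked a teacher about this. He told me that if a value falls outside the mean ± 2 standard deviations, it can be judged an outlier and considered for deletion. Before deleting anything, check your original records as far as possible to see whether those values were obtained under normal experimental conditions.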
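
The mean ± 2 SD rule described in this reply can be sketched in a few lines of Python. This is only an illustration: the function name `flag_beyond_2sd` and the readings are made up, and the sample SD (N-1 denominator) is an assumption, since the reply does not say which SD was meant.

```python
import statistics

def flag_beyond_2sd(values, k=2.0):
    """Return the values lying outside mean ± k * SD (sample SD, N-1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

# Hypothetical replicate readings, invented for illustration only.
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 18.5]
print(flag_beyond_2sd(readings))
```

A flagged value is only a candidate for removal; as the reply says, the original records should be checked first.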
Post #5, 2010-04-13 18:51:37

boouy

Silver Worm (Regular Writer)

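liuxus20030704 (coins +1): Thanks for participating
liuxus20030704 (coins +2, VIP +0): This was very helpful, but I think there should be some quantitative criterion for "clearly affects the standard deviation". Could you tell me what it is? liuxu@nwsuaf.edu.cn 12-29 09:18
Make a scatter plot in Excel and remove the outlier points.
Alternatively, run a standard deviation analysis on the replicate data for each measurement point and remove the data points that clearly affect the standard deviation.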
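
One possible way to put a number on "clearly affects the standard deviation" is to recompute the SD with each replicate left out in turn and compare it with the SD of all replicates. This is a rough sketch and not a formal test: the function name `sd_without_each` and the data are hypothetical, and how small a ratio counts as "clearly affecting" the SD remains a judgment call.

```python
import statistics

def sd_without_each(values):
    """For each point, return (value, SD of the remaining points / SD of all points).

    Needs at least three values so the remaining points still have an SD.
    """
    full_sd = statistics.stdev(values)
    ratios = []
    for i, v in enumerate(values):
        rest = values[:i] + values[i + 1:]
        ratios.append((v, statistics.stdev(rest) / full_sd))
    return ratios

# Hypothetical replicates for one measurement point, invented for illustration.
replicates = [12.1, 11.8, 12.4, 12.0, 18.5]
for value, ratio in sd_without_each(replicates):
    print(value, round(ratio, 3))
```

A ratio far below 1 means that dropping that single point shrinks the SD sharply, which is what the scatter plot would show visually.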
Post #2, 2009-12-29 00:00:14

boouy

Silver Worm (Regular Writer)


liuxus20030704 (coins +1, VIP +0): 12-31 11:58
You may refer to this link:
http://www.graphpad.com/articles/outlier.htm

Hope it helps!
--------------------------------------------------------------------------------

Detecting Outliers

By Dr. Harvey Motulsky
President, GraphPad Software
hmotulsky@graphpad.com
All contents are copyright © 1995-2002 by GraphPad Software, Inc. All rights reserved.

Outliers make statistical analyses difficult.

When analyzing data, you'll sometimes find that one value is far from the others. Such a value is called an "outlier", a term that is usually not defined rigorously. When you encounter an outlier, you may be tempted to delete it from the analyses. First, ask yourself these questions:

Was the value entered into the computer correctly? If there was an error in data entry, fix it.
Were there any experimental problems with that value? For example, if you noted that one tube looked funny, you have justification to exclude the value resulting from that tube without needing to perform any calculations.
Is the outlier caused by biological diversity? If each value comes from a different person or animal, the outlier may be a correct value. It is an outlier not because of an experimental mistake, but rather because that individual may be different from the others. This may be the most exciting finding in your data!
After answering no to those three questions, you have to decide what to do with the outlier. There are two possibilities.

One possibility is that the outlier was due to chance. In this case, you should keep the value in your analyses. The value came from the same population as the other values, so should be included.
The other possibility is that the outlier was due to a mistake - bad pipetting, voltage spike, holes in filters, etc. Since including an erroneous value in your analyses will give invalid results, you should remove it. In other words, the value comes from a different population than the others and is misleading.
The problem, of course, is that you can never be sure which of these possibilities is correct.

Clearly, no mathematical calculation will tell you for sure whether the outlier came from the same or different population than the others. But statistical calculations can answer this question: If the values really were all sampled from a Gaussian distribution, what is the chance that you'd find one value as far from the others as you observed? If this probability is small, then you will conclude that the outlier is likely to be an erroneous value, and you have justification to exclude it from your analyses.

Statisticians have devised several methods for detecting outliers. All the methods first quantify how far the outlier is from the other values. This can be the difference between the outlier and the mean of all points, the difference between the outlier and the mean of the remaining values, or the difference between the outlier and the next closest value. Next, standardize this value by dividing by some measure of scatter, such as the SD of all values, the SD of the remaining values, or the range of the data. Finally, compute a P value answering this question: If all the values were really sampled from a Gaussian population, what is the chance of randomly obtaining an outlier so far from the other values? If the P value is small, you conclude that the deviation of the outlier from the other values is statistically significant.

The Grubbs' method for assessing outliers is particularly easy to understand. This method is also called the ESD method (extreme studentized deviate). A separate document explains the logic of Grubbs' test and how to perform it. Download an Excel worksheet that performs the calculations (requires an unzipping program and Excel 5 or later).

The most that Grubbs' test (or any outlier test) can do is tell you that a value is unlikely to have come from the same Gaussian population as the other values in the group. You then need to decide what to do with that value. I would recommend removing significant outliers from your calculations in situations where experimental mistakes are common, so long as biological variability is not a possibility and you document your decision. Others feel that you should never remove an outlier unless you noticed an experimental problem.

Grubbs' Test for Detecting Outliers

Statisticians have devised several ways to detect outliers. Grubbs' test is particularly easy to follow. This method is also called the ESD method (extreme studentized deviate).

The first step is to quantify how far the outlier is from the others. Calculate the ratio Z as the difference between the outlier and the mean divided by the SD. If Z is large, the value is far from the others. Note that you calculate the mean and SD from all values, including the outlier.

$$Z = \frac{|\text{mean} - \text{value}|}{SD}$$

Since 5% of the values in a Gaussian population are more than 1.96 standard deviations from the mean, your first thought might be to conclude that the outlier comes from a different population if Z is greater than 1.96. This approach only works if you know the population mean and SD from other data. Although this is rarely the case in experimental science, it is often the case in quality control. You know the overall mean and SD from historical data, and want to know whether the latest value matches the others. This is the basis for quality control charts.
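
The quality-control case, where the mean and SD are already known from historical data, can be sketched as follows. The function name `outside_control_limits` and the numbers in the usage lines are hypothetical; only the 1.96 cutoff comes from the text.

```python
def outside_control_limits(value, historical_mean, historical_sd, z_limit=1.96):
    """True if the new value is more than z_limit SDs from the historical mean.

    The mean and SD must come from earlier data, not from the sample
    that contains the value being checked.
    """
    z = abs(value - historical_mean) / historical_sd
    return z > z_limit

# Hypothetical historical mean 10.0 and SD 0.2.
print(outside_control_limits(10.5, 10.0, 0.2))  # Z = 2.5
print(outside_control_limits(10.3, 10.0, 0.2))  # Z = 1.5
```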

When analyzing experimental data, you don't know the SD of the population. Instead, you calculate the SD from the data. The presence of an outlier increases the calculated SD. Since the presence of an outlier increases both the numerator (difference between the value and the mean) and denominator (SD of all values), Z does not get very large. In fact, no matter how the data are distributed, Z cannot get larger than $(N-1)/\sqrt{N}$, where N is the number of values. For example, if N=3, Z cannot be larger than 1.155 for any set of values.
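
The ceiling on Z is easy to check numerically. When the SD is the sample SD computed from the same data, the largest possible Z is (N-1)/sqrt(N). The sketch below uses made-up data sets and hypothetical helper names `max_z` and `z_upper_bound`. It shows that even an absurdly extreme third value cannot push Z past the bound for N=3.

```python
import math
import statistics

def max_z(values):
    """Largest |value - mean| / SD, with mean and sample SD taken from all values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return max(abs(v - mean) for v in values) / sd

def z_upper_bound(n):
    """Upper limit of Z for a sample of size n: (n - 1) / sqrt(n)."""
    return (n - 1) / math.sqrt(n)

# Invented three-value samples, including two with a wildly extreme point.
samples = ([1.0, 1.0, 1000.0], [0.0, 0.0, 1e9], [1.0, 2.0, 3.0])
for data in samples:
    print(round(max_z(data), 4), "<=", round(z_upper_bound(len(data)), 4))
```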

Grubbs and others have tabulated critical values for Z, which are given below. The critical value increases with sample size, as expected.

If your calculated value of Z is greater than the critical value in the table, then the P value is less than 0.05. This means that there is less than a 5% chance that you'd encounter an outlier so far from the others (in either direction) by chance alone, if all the data were really sampled from a single Gaussian distribution. Note that the method only works for testing the most extreme value in the sample (if in doubt, calculate Z for all values, but only calculate a P value for Grubbs' test from the largest value of Z).
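
The test as described can be sketched in Python. The critical values are copied from this article's table for N = 3 to 10 only, to keep the example short. The function name `grubbs_test` and the readings are hypothetical, and the sketch tests only the single most extreme value, as the text requires.

```python
import statistics

# Critical Z (P < 0.05) for N = 3..10, copied from the article's table.
CRITICAL_Z = {3: 1.15, 4: 1.48, 5: 1.71, 6: 1.89,
              7: 2.02, 8: 2.13, 9: 2.21, 10: 2.29}

def grubbs_test(values):
    """Return (suspect, Z, is_outlier) for the most extreme value in the sample."""
    n = len(values)
    if n not in CRITICAL_Z:
        raise ValueError("this sketch only carries critical values for N = 3..10")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # SD of all values, including the suspect
    if sd == 0:
        raise ValueError("all values are identical; Z is undefined")
    suspect = max(values, key=lambda v: abs(v - mean))
    z = abs(suspect - mean) / sd
    return suspect, z, z > CRITICAL_Z[n]

# Hypothetical replicate readings, invented for illustration only.
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 18.5]
print(grubbs_test(readings))
```

Running the function again on the remaining values after removing one outlier would not be valid with the same table.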

Once you've identified an outlier, you may choose to exclude that value from your analyses. Or you may choose to keep the outlier, but use robust analysis techniques that do not assume that data are sampled from Gaussian populations.

If you decide to remove the outlier, you then may be tempted to run Grubbs' test again to see if there is a second outlier in your data. If you do this, you cannot use the same table. Rosner has extended the method to detecting several outliers in one sample. See the first reference below for details.


References:

How to Detect and Handle Outliers, by B. Iglewicz and D.C. Hoaglin

Outliers in Statistical Data (3rd edition), by V. Barnett and T. Lewis

Critical values for Z. Calculate Z as shown above. Look up the critical value of Z in the table below, where N is the number of values in the group. If your value of Z is higher than the tabulated value, the P value is less than 0.05.

N      Critical Z      N      Critical Z
3      1.15            27     2.86
4      1.48            28     2.88
5      1.71            29     2.89
6      1.89            30     2.91
7      2.02            31     2.92
8      2.13            32     2.94
9      2.21            33     2.95
10     2.29            34     2.97
11     2.34            35     2.98
12     2.41            36     2.99
13     2.46            37     3.00
14     2.51            38     3.01
15     2.55            39     3.03
16     2.59            40     3.04
17     2.62            50     3.13
18     2.65            60     3.20
19     2.68            70     3.26
20     2.71            80     3.31
21     2.73            90     3.35
22     2.76            100    3.38
23     2.78            110    3.42
24     2.80            120    3.44
25     2.82            130    3.47
26     2.84            140    3.49


Computing an approximate P value

You can also calculate an approximate P value as follows.

1. Calculate $T = \sqrt{\dfrac{N(N-2)Z^2}{(N-1)^2 - NZ^2}}$. N is the number of values in the sample, and Z is calculated for the suspected outlier as shown above.
2. Using StatMate (or another program), determine the P value corresponding with that value of T. Look up the two-tailed P value for the Student t distribution with N-2 degrees of freedom.
3. Multiply the P value you obtain in step 2 by N. The result is an approximate P value for the outlier test. This P value is the chance of observing one point so far from the others if the data were all sampled from a Gaussian distribution. If Z is large, this P value will be very accurate. With smaller values of Z, the calculated P value may be too large.
GraphPad QuickCalcs: Try GraphPad's Free online calculator for detecting outliers.
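
The approximate P value can also be computed without StatMate. The sketch below assumes that T is obtained from Z as T = sqrt(N(N-2)Z² / ((N-1)² - NZ²)), which is the standard relation between Grubbs' statistic and the t distribution. To stay within the Python standard library, it gets the two-tailed t probability by Simpson's-rule integration of the t density, which is a choice made for this example and not part of the article. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    """Two-tailed P for Student's t, via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)

def grubbs_approx_p(n, z):
    """Approximate P value for the outlier test: N times the two-tailed t probability."""
    denom = (n - 1) ** 2 - n * z * z
    if denom <= 0:
        return 0.0  # Z is at its maximum possible value, so T is unbounded
    t = math.sqrt(n * (n - 2) * z * z / denom)
    return min(1.0, n * two_tailed_p(t, n - 2))

print(grubbs_approx_p(10, 2.29))
```

As a sanity check, feeding in N=10 with the tabulated critical Z of 2.29 should give a P value of roughly 0.05.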

--------------------------------------------------------------------------------

Post #3, 2009-12-30 00:47:00

louyiceng

Honorary Moderator (Well-Known Author)

Outstanding Moderator


liuxus20030704 (coins +1): Thanks for participating
Quoted reply:
Originally posted by boouy at 2009-12-30 00:47:00:
You may refer to this link:
http://www.graphpad.com/articles/outlier.htm

Hope it helps!
--------------------------------------------------------------------------------

Detecting Ou ...

Thanks.
Post #4, 2010-04-13 09:19:23