[Answer] Reply to help request
yuxintian: coins +50, translation EPI +1, 2013-03-21 14:37:36
At present, most supervised tagging methods perform well when a large-scale corpus is available, but in real-world applications
annotated corpora are both difficult to obtain and hard to reuse across domains. In this paper, we present a prototype-model extension algorithm based on the A-method.
First, the original small-scale training data is used to train an ensemble tagger with a certain accuracy.
Second, the A-algorithm is used to expand the training data automatically: the candidate examples among the untagged data are predicted, and
those whose score exceeds a certain threshold are added to the training set.
Finally, noise is pruned according to the constraints that hold in the training data, and the classifier is retrained
iteratively on the expanded training data until the iteration converges to a stable state.
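The three steps above read like a threshold-based self-training loop, so a minimal Python sketch may help when checking the wording. It is only an illustration under stated assumptions: the abstract does not say what the A-method or the constraints are, so a nearest-prototype classifier stands in for the tagger, and `train_prototypes`, `predict`, `self_train`, the `constraint` hook, and the threshold value are all hypothetical names and choices, not the paper's actual method.

```python
import math
from collections import defaultdict


def train_prototypes(labeled):
    """Compute one prototype (mean vector) per label from (vector, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for x, y in labeled:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(prototypes, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {y: math.exp(-math.dist(x, p)) for y, p in prototypes.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10,
               constraint=lambda x, label: True):
    """Grow the training set from unlabeled data until no new example qualifies.

    `constraint` is a hypothetical stand-in for the noise-pruning step: a
    candidate is kept only if it returns True. The default accepts everything.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        prototypes = train_prototypes(labeled)      # step 1: (re)train
        accepted, rest = [], []
        for x in pool:                              # step 2: predict candidates
            label, conf = predict(prototypes, x)
            if conf > threshold and constraint(x, label):  # threshold + step 3
                accepted.append((x, label))
            else:
                rest.append(x)
        if not accepted:                            # stable: nothing new to add
            break
        labeled.extend(accepted)
        pool = rest
    return train_prototypes(labeled), labeled, pool
```

For example, calling `self_train` with two seed examples and three untagged points adds the two points that lie close to a seed and leaves an ambiguous midpoint in the pool, because its confidence never exceeds the threshold.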