
Pattern Recognition Course Notes: Semi-supervised Learning


For personal notes only.



A pattern recognition problem


  • goal
    There is a lot of data online that is only weakly “labeled”, e.g. tweets tagged with hashtags (#).
    Can we use this (essentially unlabeled) data to improve our classifier?


  • labeled data
  • unlabeled data

Some applications

  • image classification (easy to obtain images, e.g. from Flickr)
  • protein function prediction
  • document classification
  • part of speech tagging

Semi-supervised classification and related settings

  • similar settings with a continuous outcome measure (semi-supervised regression)
  • using some labels to improve a clustering solution (constrained clustering)
  • measuring how well the unlabeled data can help to improve the classifier

Content


Self-learning

One of the earliest studies on SSL (Hartley & Rao 1968):
• Maximum likelihood, trying all possible labelings (!)
(the problem with treating unlabeled data this way is the combinatorial explosion of possible labelings)

A more feasible suggestion (McLachlan 1975):
• Start with the supervised solution
• Label the unlabeled objects using this classifier
• Retrain the classifier, treating the predicted labels as true labels (and repeat)

Also known as self-training, self-labeling or pseudo-labeling
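As a concrete illustration (my own sketch, not from the lecture), here is a minimal self-training loop; it assumes scikit-learn is available, uses logistic regression as the base classifier, and the confidence threshold of 0.95 is an arbitrary choice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, threshold=0.95, max_rounds=10):
    """Start from the supervised solution, then repeatedly pseudo-label
    confident unlabeled points and retrain on the enlarged set."""
    X, y, pool = X_l.copy(), y_l.copy(), X_u.copy()
    clf = LogisticRegression().fit(X, y)           # supervised solution
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break                                  # nothing confident left
        # Treat the confident predictions as if they were true labels.
        X = np.vstack([X, pool[confident]])
        y = np.concatenate([y, clf.classes_[proba[confident].argmax(axis=1)]])
        pool = pool[~confident]
        clf = LogisticRegression().fit(X, y)       # retrain
    return clf
```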

Self-learning ≈ Expectation Maximization

  • Linear Discriminant Analysis (LDA)
    $p(X,y;\theta)=\prod_{i=1}^{L}\left[\pi_0 N(x_i;\mu_0,\Sigma)\right]^{1-y_i}\left[\pi_1 N(x_i;\mu_1,\Sigma)\right]^{y_i}$
    Both classes share the covariance matrix $\Sigma$; $N(x_i;\mu_c,\Sigma)$ is the Gaussian density for class $c$.
  • LDA + unlabeled data, with unknown labels $h$ for the unlabeled points:

$p(X,y,X_u,h;\theta)=\prod_{i=1}^{L}\left[\pi_0 N(x_i;\mu_0,\Sigma)\right]^{1-y_i}\left[\pi_1 N(x_i;\mu_1,\Sigma)\right]^{y_i} \times \prod_{i=1}^{u}\left[\pi_0 N(x_i;\mu_0,\Sigma)\right]^{1-h_i}\left[\pi_1 N(x_i;\mu_1,\Sigma)\right]^{h_i}$
But we do not know h… Integrate it out!
$p(X,y,X_u;\theta)=\int_h p(X,y,X_u,h;\theta)\,dh$
LDA + unlabeled data:

$\prod_{i=1}^{L}\left[\pi_0 N(x_i;\mu_0,\Sigma)\right]^{1-y_i}\left[\pi_1 N(x_i;\mu_1,\Sigma)\right]^{y_i} \times \prod_{i=1}^{u}\sum_{c=0}^{1}\pi_c N(x_i;\mu_c,\Sigma)$

This is LDA combined with a Gaussian mixture that shares the same parameters.

EM algorithm

• The log of a sum makes direct optimization difficult
• Change the goal: find a local maximum of this function instead
EM algorithm: finding a lower bound

The idea is to construct a lower bound that touches the objective function exactly at the current parameter estimate, i.e. the tightest lower bound we can get.
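To make the E- and M-steps concrete for the two-class, shared-covariance model above, here is a minimal NumPy/SciPy sketch (my own illustration; `em_lda` and its interface are made up for these notes, and `y` is assumed to be coded 0/1):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_lda(X_l, y, X_u, n_iter=50):
    """EM for LDA + unlabeled data (two classes, shared covariance)."""
    X_all = np.vstack([X_l, X_u])
    # Initialize from the labeled data alone (the supervised solution).
    pi = np.array([np.mean(y == 0), np.mean(y == 1)])
    mu = np.array([X_l[y == 0].mean(axis=0), X_l[y == 1].mean(axis=0)])
    Sigma = np.cov(X_l.T)
    for _ in range(n_iter):
        # E-step: posterior responsibilities h_i for the unlabeled points.
        dens = np.column_stack([pi[c] * multivariate_normal.pdf(X_u, mu[c], Sigma)
                                for c in (0, 1)])
        h = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates; labeled points keep hard 0/1 weights.
        w = np.vstack([np.column_stack([1 - y, y]), h])   # shape (L+u, 2)
        Nc = w.sum(axis=0)
        pi = Nc / Nc.sum()
        mu = (w.T @ X_all) / Nc[:, None]
        diff0, diff1 = X_all - mu[0], X_all - mu[1]
        Sigma = ((w[:, 0, None] * diff0).T @ diff0
                 + (w[:, 1, None] * diff1).T @ diff1) / w.sum()
    return pi, mu, Sigma
```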


Jensen’s inequality

If $f(x)$ is concave, then $f(E[X]) \geq E[f(X)]$.

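For intuition, a small numerical check (an example added to these notes, not from the lecture): take the concave $f(x)=\log x$ and let $X$ be $1$ or $9$ with equal probability. Then $f(E[X])=\log 5 \approx 1.61$, while $E[f(X)]=\tfrac{1}{2}(\log 1+\log 9)\approx 1.10$, so indeed $f(E[X]) \geq E[f(X)]$. EM applies exactly this with $f=\log$ to move the log outside the sum over the hidden labels.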

Does unlabeled data help?

Consider the dependence structure (from the lecture diagram): parameters $\theta_X$ generate $X$, and $Y$ depends on $X$ through separate parameters $\theta_{Y|X}$:

$\theta_X \rightarrow X$, $X \rightarrow Y$, $\theta_{Y|X} \rightarrow Y$

Unlabeled data only carries information about $\theta_X$, so if what we need for classification is $\theta_{Y|X}$ and the two parameter sets are unrelated, the unlabeled data by itself cannot help.

Self-learning and EM conclusions

• For generative models:
  • Integrate out the missing variables
  • The resulting difficult optimization problem can often be “solved” efficiently using expectation maximization
  • Only guaranteed to improve performance asymptotically, and only if the model is correct
• Self-learning is a closely related technique that is applicable to any classifier
• Related: co-training (multi-view learning)
  • Use labels predicted by the other view(s) as newly labeled objects (see the sketch below)
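For illustration, a minimal co-training sketch (my own code; the two-view split and the `GaussianNB` base learner are assumptions, and for brevity only view 1 teaches here, whereas full co-training alternates between the views):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, X1_u, X2_u, rounds=5, k=10):
    """Two classifiers, one per feature view; confident predictions of one
    view become newly labeled objects for both views' training sets."""
    X1_l, X2_l, y_l = X1.copy(), X2.copy(), y.copy()
    for _ in range(rounds):
        c1 = GaussianNB().fit(X1_l, y_l)
        c2 = GaussianNB().fit(X2_l, y_l)
        if len(X1_u) == 0:
            break
        # View 1 labels the k unlabeled points it is most confident about.
        conf = c1.predict_proba(X1_u).max(axis=1)
        idx = np.argsort(conf)[-k:]
        pseudo = c1.predict(X1_u[idx])
        X1_l = np.vstack([X1_l, X1_u[idx]])
        X2_l = np.vstack([X2_l, X2_u[idx]])
        y_l = np.concatenate([y_l, pseudo])
        keep = np.setdiff1d(np.arange(len(X1_u)), idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return c1, c2
```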

Low-density assumption

Low-density assumption conclusions
• A “natural” extension of the SVM
• Local minima may be a problem
• Lots of work on optimization
• In my experience: quite sensitive to parameter settings
• Other low-density approaches:
  • Entropy Regularization (Grandvalet & Bengio 2005), sketched below
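The entropy-minimization idea can be written as a loss in a few lines; the following NumPy sketch is my own illustration (the weight `lam` and the array layout are assumptions), not the authors' code:

```python
import numpy as np

def entropy_regularized_loss(p_labeled, y, p_unlabeled, lam=0.5):
    """Cross-entropy on labeled predictions plus the average prediction
    entropy on unlabeled ones; minimizing the entropy term encourages
    confident predictions, pushing the decision boundary into
    low-density regions away from the unlabeled points."""
    eps = 1e-12  # guards against log(0)
    ce = -np.mean(np.log(p_labeled[np.arange(len(y)), y] + eps))
    ent = -np.mean(np.sum(p_unlabeled * np.log(p_unlabeled + eps), axis=1))
    return ce + lam * ent
```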

Manifold assumption


  • manifold regularization
  • consistency regularization: penalize disagreement between the model $f(\cdot\,;w)$ and a second (e.g. teacher) model $g(\cdot\,;w^t)$ evaluated on a perturbed input $x'$ (a code sketch follows):

$\Vert f(x;w) - g(x';w^t) \Vert^2$
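A minimal sketch of this consistency term (my own illustration): here the teacher $g$ is simplified to the same prediction function $f$, and $x'$ is generated by Gaussian input noise; both choices are assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def consistency_penalty(predict, X_u, noise_std=0.1):
    """Mean squared disagreement between predictions on unlabeled points
    and on randomly perturbed copies of those points."""
    X_perturbed = X_u + noise_std * rng.standard_normal(X_u.shape)
    return np.mean(np.sum((predict(X_u) - predict(X_perturbed)) ** 2, axis=-1))
```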

Semi-Supervised Conclusion

• Unlabeled data is often available
• Semi-supervised learning attempts to use it to improve the classifier
• Often worthwhile, but it does not come for free:
  • Modeling time
  • Computational cost
• Remember: an unlabeled object is less valuable than a labeled one
  • Labeling a few more objects can be more effective
• Remember the goal: transductive or inductive?

