Computational Learning Theory
CS 486/686: Introduction to Artificial Intelligence
Overview
- Introduction to Computational Learning Theory
- PAC Learning Theory
Thanks to T. Mitchell
Computational Learning Theory
- Are there general laws for inductive learning?
- Theory to relate
- Probability of successful learning
- Number of training examples
- Complexity of hypothesis space
- Accuracy to which f is approximated
- Manner in which training examples are presented
Computational Learning Theory
- Sample complexity
- How many training examples are needed to learn the
target function, f?
- Computational complexity
- How much computational effort is needed to learn the
target function, f?
- Mistake bound
- How many training examples will a learner misclassify
before learning the target function, f?
Sample Complexity
- How many examples are sufficient to learn f?
- If the learner proposes instances as queries to a teacher
- learner proposes x, teacher provides f(x)
- If the teacher provides training examples
- teacher provides a sequence of examples of the form (x, f(x))
- If some random process proposes instances
- instance x generated randomly, teacher provides f(x)
Function Approximation
- Instances: X = <x1, ..., x20>, each xi Boolean, so |X| = 2^20
- Hypotheses: H = {h: X -> {0,1}}, so |H| = 2^|X|
- How many labelled examples are needed in order to determine which of the hypotheses is the correct one?
- All 2^20 instances in X must be labelled!
- There is no free lunch!
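A quick numeric check of these sizes (a minimal sketch; the variable names are illustrative, not from the slides):

```python
import math

# Size of the instance space and hypothesis space for 20 Boolean attributes.
n_attributes = 20

# Each instance is one assignment of 0/1 to the 20 attributes.
num_instances = 2 ** n_attributes                 # |X| = 2^20 = 1,048,576

# A hypothesis assigns a label in {0, 1} to every instance, so |H| = 2^|X|.
# That number is far too large to print, so report its order of magnitude.
log10_num_hypotheses = num_instances * math.log10(2)

print(f"|X| = {num_instances:,}")
print(f"|H| = 2^{num_instances:,} ≈ 10^{int(log10_num_hypotheses):,}")
```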
Sample Complexity
- Given
- set of instances X
- set of hypotheses H
- set of possible target concepts F
- training instances generated by a fixed unknown probability distribution D over X
- Learner observes a sequence D of training examples (x, f(x))
- instances are drawn from distribution D
- teacher provides f(x) for each instance
- Learner must output a hypothesis h estimating f
- h is evaluated by its performance on future instances drawn according to D
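A minimal sketch of this setting, assuming a toy instance space with four Boolean attributes, a uniform distribution D, and an illustrative target function f (none of these specifics are from the slides):

```python
import random

N_ATTRS = 4   # toy instance space: 4 Boolean attributes, so |X| = 16

def f(x):
    # Hypothetical target function: 1 iff the first two attributes agree.
    return int(x[0] == x[1])

def draw_examples(m, rng):
    """Draw m instances x from the (uniform) distribution D and have the
    teacher label each one, giving the training sequence of pairs (x, f(x))."""
    examples = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(N_ATTRS))
        examples.append((x, f(x)))
    return examples

rng = random.Random(0)
for x, label in draw_examples(10, rng):
    print(x, label)
```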
Training Error vs True Error
- Training error of h wrt target function f
- How often h(x) ≠ f(x) over the training instances D
- error_D(h) = Pr_{x in D}[f(x) ≠ h(x)] = #(f(x) ≠ h(x)) / |D|
- Note: a consistent h will have training error_D(h) = 0
- True error of h wrt target function f
- How often h(x) ≠ f(x) over future instances drawn at random from the distribution D
- error_D(h) = Pr_{x in D}[f(x) ≠ h(x)], where D here is the distribution over X
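A small sketch of the distinction, again with an illustrative target f, hypothesis h, and uniform distribution D:

```python
import random

def f(x):
    # Hypothetical target function over 4 Boolean attributes.
    return int(x[0] == x[1])

def h(x):
    # A hypothesis that only looks at the first attribute.
    return x[0]

def draw(m, rng):
    """Draw m instances from the (uniform) distribution D."""
    return [tuple(rng.randint(0, 1) for _ in range(4)) for _ in range(m)]

def error_rate(hyp, xs):
    """Fraction of instances on which hyp disagrees with the target f."""
    return sum(hyp(x) != f(x) for x in xs) / len(xs)

rng = random.Random(0)
train = draw(10, rng)

print("training error:", error_rate(h, train))          # measured on D
# True error is defined over the distribution itself; estimate it with a
# large fresh sample drawn from the same distribution.
print("estimated true error:", error_rate(h, draw(100_000, rng)))
```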
Version Spaces
- A hypothesis h is consistent with a set of training examples D of target function f if and only if h(x) = f(x) for each training example (x, f(x)) in D.
- A version space, VS_{H,D}, is the set of all hypotheses from H that are consistent with all the training examples in D
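A minimal sketch of a version space on a tiny problem, assuming three Boolean attributes so that H (all Boolean labellings of X) can be enumerated; the target f and the choice of training instances are illustrative:

```python
from itertools import product

# Tiny instance space: 3 Boolean attributes, so |X| = 8 and |H| = 2^8 = 256.
instances = list(product([0, 1], repeat=3))                  # X
hypotheses = list(product([0, 1], repeat=len(instances)))    # H, as label tuples

def f(x):
    # Hypothetical target: majority vote of the three attributes.
    return int(sum(x) >= 2)

# Training data D: a few labelled instances.
D = [(x, f(x)) for x in instances[:4]]

def consistent(h_labels, data):
    """h is consistent with D iff h(x) = f(x) for every (x, f(x)) in D."""
    return all(h_labels[instances.index(x)] == y for x, y in data)

version_space = [h for h in hypotheses if consistent(h, D)]
print(f"|VS_H,D| = {len(version_space)} out of |H| = {len(hypotheses)}")
# 4 of the 8 instances are labelled, so 2^4 = 16 hypotheses remain consistent.
```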
How many examples will ε-exhaust the VS?
- The version space VS_{H,D} is ε-exhausted (wrt f and D) if every hypothesis h in VS_{H,D} has true error less than ε
- Theorem [Haussler, 1988]. If
- H is finite and
- D is a sequence of m ≥ 1 independent random examples of target function f
- Then for any 0 ≤ ε ≤ 1, the probability that VS_{H,D} is not ε-exhausted (wrt f) is less than |H|e^(-εm)
Proof
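A sketch of the standard union-bound argument behind the |H|e^(-εm) bound (the usual proof of Haussler's theorem, not reproduced from the original slide):

```latex
% Sketch of the standard union-bound argument (not from the original slide).
\begin{align*}
&\text{Fix any } h \in H \text{ with true error } error_D(h) \ge \epsilon. \\
&\Pr[\,h \text{ is consistent with one random example}\,] \le 1 - \epsilon \\
&\Pr[\,h \text{ is consistent with all } m \text{ independent examples}\,] \le (1 - \epsilon)^m \\
&\text{At most } |H| \text{ hypotheses can have true error} \ge \epsilon
  \text{, so by the union bound} \\
&\Pr[\,\text{some such } h \text{ remains in } VS_{H,D}\,]
  \le |H|(1 - \epsilon)^m \le |H|\, e^{-\epsilon m}, \\
&\text{using } 1 - \epsilon \le e^{-\epsilon} \text{ for } 0 \le \epsilon \le 1.
\end{align*}
```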
How to interpret this result
- Suppose we want this probability to be at most δ
- How many training examples are needed?
- |H|e^(-εm) ≤ δ holds whenever m ≥ (1/ε)(ln|H| + ln(1/δ))
- If error_D(h) = 0 on the training data (h is consistent with D), then with probability at least (1-δ) the true error of h is less than ε
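As a quick numeric illustration of this bound (the ε, δ, and |H| values below are chosen only as examples):

```python
import math

# m >= (1/eps) * (ln|H| + ln(1/delta)) guarantees that, with probability at
# least 1 - delta, every consistent hypothesis has true error below eps.
# The eps, delta, and |H| values below are illustrative only.

def sample_complexity(ln_H, eps, delta):
    """Smallest integer m satisfying the bound, given ln|H|."""
    return math.ceil((ln_H + math.log(1.0 / delta)) / eps)

# |H| = 2^(2^20), the unrestricted hypothesis space from the earlier slide.
ln_H_full = (2 ** 20) * math.log(2)
print(sample_complexity(ln_H_full, eps=0.1, delta=0.05))   # about 7.3 million

# A much smaller hypothesis space, |H| = 1000.
print(sample_complexity(math.log(1000), eps=0.1, delta=0.05))   # 100
```

The comparison shows why the logarithmic dependence on |H| matters: restricting the hypothesis space drastically reduces the number of examples the bound requires.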