
Computational Learning Theory, Slides of Artificial Intelligence

This document explains sample complexity, computational complexity, and mistake bounds.



Computational Learning Theory

CS 486/686: Introduction to Artificial Intelligence

Overview

  • Introduction to Computational Learning Theory
  • PAC Learning Theory

Thanks to T. Mitchell

Computational Learning Theory

  • Are there general laws for inductive learning?
  • Theory to relate
    • Probability of successful learning
    • Number of training examples
    • Complexity of hypothesis space
    • Accuracy to which f is approximated
    • Manner in which training examples are presented

Computational Learning Theory

  • Sample complexity
    • How many training examples are needed to learn the target function, f?
  • Computational complexity
    • How much computational effort is needed to learn the target function, f?
  • Mistake bound
    • How many training examples will a learner misclassify before learning the target function, f?

Sample Complexity

  • How many examples are sufficient to learn f?
    • If the learner proposes instances as queries to a teacher
      • learner proposes x, teacher provides f(x)
    • If the teacher provides training examples
      • teacher provides a sequence of examples of the form (x, f(x))
    • If some random process proposes instances
      • instance x is generated randomly, teacher provides f(x)

Function Approximation

How many labelled examples are needed in order to determine which of the hypotheses is the correct one?

  • X = <x1, ..., x20>, where each xi is Boolean, so |X| = 2^20
  • H = {h : X -> {0,1}}, so |H| = 2^|X|

All 2^20 instances in X must be labelled! There is no free lunch!
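To get a feel for these numbers, here is a minimal Python sketch (the variable names are just for illustration) that computes |X| and the order of magnitude of |H| for 20 Boolean attributes:

    import math

    # Instance space: vectors of 20 Boolean attributes.
    n_attributes = 20
    size_X = 2 ** n_attributes            # |X| = 2^20 = 1,048,576 instances

    # Hypothesis space: every possible Boolean labelling of X, so |H| = 2^|X|.
    # The number itself is astronomically large; report its order of magnitude.
    log10_size_H = size_X * math.log10(2)
    print(f"|X| = {size_X:,}")
    print(f"|H| = 2^|X|, roughly 10^{int(log10_size_H):,}")   # about 10^315,652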

Sample Complexity

  • Given
    • set of instances X
    • set of hypotheses H
    • set of possible target concepts F
    • training instances generated by a fixed unknown probability distribution D over X
  • Learner observes a sequence D of training examples (x, f(x))
    • instances are drawn from the distribution D
    • teacher provides f(x) for each instance
  • Learner must output a hypothesis h estimating f
    • h is evaluated by its performance on future instances drawn according to D
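Under some toy assumptions (a uniform distribution over 3 Boolean attributes, a made-up target and hypothesis), the protocol on this slide can be simulated in a few lines of Python; all names here are illustrative:

    import random

    random.seed(0)

    # The unknown distribution D over X: here, 3 uniform Boolean attributes.
    def draw_instance():
        return tuple(random.randint(0, 1) for _ in range(3))

    # The target concept f (unknown to the learner); the teacher supplies f(x).
    f = lambda x: x[0] & x[1]

    # The learner observes a sequence of labelled examples drawn from D ...
    train = [(x, f(x)) for x in (draw_instance() for _ in range(20))]

    # ... and outputs some hypothesis h (a stand-in, not a real learning algorithm).
    h = lambda x: x[0]

    # h is evaluated by its performance on future instances drawn from the same D.
    test = [draw_instance() for _ in range(10000)]
    print(sum(h(x) != f(x) for x in test) / len(test))   # close to 0.25 for this h and f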

Training Error vs True Error

  • Training error of h wrt target function f
    • How often h(x) ≠ f(x) over the training instances in D
    • error_D(h) = Pr_{x in D}[f(x) ≠ h(x)] = #(f(x) ≠ h(x)) / |D|, where D here is the training set
    • Note: a consistent h will have error_D(h) = 0
  • True error of h wrt target function f
    • How often h(x) ≠ f(x) over future instances drawn at random from D
    • error_D(h) = Pr_{x ~ D}[f(x) ≠ h(x)], where D here is the distribution
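As a concrete illustration of the training-error formula on this slide, here is a minimal Python sketch (the helper name and toy data are purely illustrative):

    def training_error(h, D):
        """error_D(h) = #(f(x) != h(x)) / |D| over a training set D of (x, f(x)) pairs."""
        mistakes = sum(1 for x, f_x in D if h(x) != f_x)
        return mistakes / len(D)

    # Toy example: the target labels an instance 1 iff its first attribute is true.
    D = [((True, False), 1), ((False, True), 0), ((True, True), 1)]
    h = lambda x: 1 if x[0] else 0      # a hypothesis consistent with D
    print(training_error(h, D))         # 0.0, so error_D(h) = 0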

Version Spaces

  • A hypothesis h is consistent with a set of training examples D of target function f if and only if h(x) = f(x) for every training example (x, f(x)) in D.
  • The version space, VS_H,D, is the set of all hypotheses from H that are consistent with all training examples in D.
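A brute-force way to see what a version space is: enumerate a tiny hypothesis space and keep only the hypotheses consistent with D. This Python sketch (toy setting, illustrative names) uses 3 Boolean attributes, so |X| = 2^3 = 8 and |H| = 2^8 = 256:

    from itertools import product

    # Every hypothesis is an arbitrary labelling (truth table) over X.
    X = list(product([0, 1], repeat=3))
    H = [dict(zip(X, labels)) for labels in product([0, 1], repeat=len(X))]

    # Training examples D of a target f (here f(x) = x[0] AND x[1]).
    D = [((1, 1, 0), 1), ((0, 1, 1), 0), ((1, 0, 0), 0)]

    # VS_H,D: all hypotheses consistent with every training example.
    VS = [h for h in H if all(h[x] == f_x for x, f_x in D)]
    print(len(VS))   # 32 = 2^(8-3): each example pins down h on one more instance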

How many examples will ε-exhaust the VS?

  • Theorem [Haussler, 1988]. If
    • H is finite, and
    • D is a sequence of m ≥ 1 independent random examples of target function f,
  • then for any 0 ≤ ε ≤ 1, the probability that VS_H,D is not ε-exhausted (wrt f) is less than |H|e^(-εm)


Proof

  • Fix any hypothesis h in H whose true error is greater than ε. The probability that h is consistent with one random example is at most (1 - ε), so the probability that it is consistent with all m independent examples is at most (1 - ε)^m ≤ e^(-εm).
  • By the union bound over the (at most |H|) such hypotheses, the probability that any hypothesis with true error greater than ε remains in VS_H,D is less than |H|e^(-εm).

How to interpret this result

  • Suppose we want this probability to be at most δ
    • How many training examples are needed? Solving |H|e^(-εm) ≤ δ for m gives m ≥ (1/ε)(ln|H| + ln(1/δ)).
    • If error_D(h) = 0 on the training set, then with probability at least (1 - δ) the true error of h is less than ε.
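To see what this bound means numerically, here is a minimal Python sketch (the function name is just for illustration) that evaluates m ≥ (1/ε)(ln|H| + ln(1/δ)):

    import math

    def sample_bound(size_H, eps, delta):
        # Smallest m satisfying |H| * exp(-eps * m) <= delta.
        return math.ceil((math.log(size_H) + math.log(1.0 / delta)) / eps)

    # Example: |H| = 2^20 hypotheses, want true error < 0.1 with probability >= 0.95.
    print(sample_bound(2 ** 20, eps=0.1, delta=0.05))   # 169 examples suffice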