Linear Discrimant Model

Lecture 3

Linear Discrimant Model

(update 21 Februari 2012)

Learning a Class from Examples (Alpaydin 2009) 

Class C of a “family car” Prediction: Is car x a family car?  Knowledge extraction: What do people expect from a family car? 



Output: Positive (+) and negative (–) examples



Input representation: x1: price, x2 : engine power

Training set X X  {xt ,r t }tN1

 1 if x is positive r  0 if x is negative

 x1  x  x2 

p1  price  p2  AND e1  engine power  e2 

 1 if h says x is positive h( x)   0 if h says x is negative

Error of h on H E (h| X )  1hxt   r t  N

t 1

most specific hypothesis, S

most general hypothesis, G h H, between S and G is consistent and make up the version space (Mitchell, 1997)



Choose h with largest margin





Vapnik–Chervonenkis dimension (VC Dimension) is a measure of the capacity of a statistical classification algorithm, defined as the cardinality of the largest set of points that the algorithm can shatter. It is a core concept in Vapnik–Chervonenkis theory, and was originally defined by Vladimir Vapnik and Alexey Chervonenkis.





A classification model f with some parameter vector θ is said to shatter a set of data points (x1,x2,…,xN) if, for all assignments of labels to those points, there exists a θ such that the model f makes no errors when evaluating that set of data points. Example : VC Dimension for linear classifier

3 points shattered

4 points impossible





N points can be labeled in 2N ways as +/– H shatters N if there exists h  H consistent for any of these: VC(H ) = N

An axis-aligned rectangle shatters 4 points only !

Use the simpler one because 







Simpler to use (lower computational complexity) Easier to train (lower space complexity) Easier to explain (more interpretable) Generalizes better (lower variance - Occam’s razor)

simpler explanations are more plausible and any unnecessary complexity should be shaved off.

Andaikan diberikan data training yang linearly separable menjadi dua kelas, yaitu A dan B. Terdapat banyak sekali hyperplane yang memisahkan kedua kelas dari data. Mana yang dipilih?

Bagaimana menemukan hyperplane terbaik yang memisahkan kedua himpunan dengan margin terbesar? Margin: jarak hyperplane ke titik terdekat dari kedua himpunan Dalam 2 dimensi, hyperplane  garis Dalam 3 dimensi, hyperplane  bidang



   

   

Persamaan hyperplane (garis) g: w1x1+w2x2+b=0 Agar g memisahkan kelas A dan B, maka dapat dipilih w1,w2 dan b sehingga w1x1(i)+w2x2(i)+b>0 utk (x1(i),x2(i)) A w1x1(i)+w2x2(i)+b<0 utk (x1(i),x2(i)) B Andaikan (x1+,x2+) dan (x1-,x2-) masing-masing titik terdekat dari kelas A dan B terhadap g.Tanpa mengurangi keumuman, dapat dipilih: w1x1++w2x2++b=1 dan w1x1-+w2x2-+b=-1 dan w1x1(i)+w2x2(i)+b  1 utk (x1(i),y1(i)) A w1x1(i)+w2x2(i)+b  -1 utk (x1(i),y1(i)) B

Maka jarak garis g ke titik (x1+,x2+) dan (x1-,x2-) adalah

J

| w1 x1  w2 x2  b | w  w2 2 1

2



1 w  w2 2 1

2



| w1 x1  w2 x2  b | w12  w2 2

Definisikan:

yi  f ( x , x )  sign( x , x ) (i ) 1

(i ) 2

(i ) 1

(i ) 2

 1, untuk ( x1(i ) , x2(i ) )  A  (i ) (i )  1, untuk ( x , x 1 2 )B 

maka (w1x1(i)+w2x2(i)+b)yi  1, untuk I = 1, 2, …, N



Masalah Penentuan Hyperplane terbaik: Arg Max w ,b

1 w12  w22

s.t. ( w1 x1( i )  w2 x2( i )  b) yi  1,

i  1, 2,.., N

Ekivalen dengan



1 Arg Min w12  w22 w ,b 2 s.t.



( w1 x1( i )  w2 x2( i )  b) yi  1,

i  1, 2,.., N



   

   

Persamaan hyperplane g: wTx+b=0 Agar g memisahkan kelas A dan B, maka dapat dipilih w1,w2 dan b sehingga wTx (i)+b>0 utk x (i)  A wTx (i)+b<0 utk x (i)  B Andaikan x+ dan x- masing-masing titik terdekat dari kelas A dan B terhadap g.Tanpa mengurangi keumuman, dapat dipilih: wTx ++b =1 dan wTx -+b = -1 dan wTx (i)+b  1 utk x(i)  A wTx (i)+b  -1 utk x(i)  B

Maka jarak g ke titik x+,x+ dan x- adalah

J

| wT x   b | wT w



1 | wT x   b |   w2 wT w wT w 1

Definisikan:

yi  f ( x )  sign( x ) (i )

(i )

 1, untuk x (i )  A  (i ) 1, untuk x  B

maka (wTx (i)+b)yi  1, untuk i = 1, 2, …, N



Masalah Penentuan Hyperplane terbaik: 1 Arg Max w ,b w

2

s.t. ( wT x ( i )  b) yi  1,

i  1, 2,.., N

Ekivalen dengan

1 T Arg Min w w w ,b 2 s.t. ( wT x ( i )  b) yi  1,

i  1, 2,.., N

Solusi x yang memaksimumkan/ meminimumkan fungsi f(x) yang memenuhi kendala g(x) = 0 diperoleh dari solusi persamaan f(x) = g(x) Contoh: Carilah nilai maksimum/minimum untuk fungsi f(x,y) = x2 +y2 yang memenuhi x-y = 1 Titik kritis diperoleh dari 2x =  2y = - x-y = 1 Diperoleh x = ½, y=-½,  = 1

atau 2x-  = 0 2y + = 0 x-y = 1

Cari nilai maksimum/minimum f(x,y,z) = x + 2y +3z yang memenuhi x2 + y2 = 2 dan y +z = 1 g1(x,y,z) = x2 + y2 -2 =0 g2(x,y,z) = y + z – 1 = 0 Solusi masalah maks/minimum diperoleh dari: f(x,y,z) = 1g1(x,y,z) + 2g2(x,y,z)

Solusi x yang memaksimumkan/meminimumkan fungsi f(x) yang memenuhi kendala g(x) = 0 diperoleh dari solusi persamaan f(x) = g(x) Versi lain: L(x, ) = f(x)+g(x) Solusi masalah maksimum/minimum diperoleh dari L(x, ) = 0 L dikenal sebagai Lagrangian

Solusi masalah optimasi (primal) Min f(x), x s.t. g(x) 0 dan h(x)=0 Dual Problem (, ) = inf

x

Max (, ) s.t.   0 L(x, , )

Feasible Domain D={x  |g(x)0, h(x)=0} Lagrangian L(x, , ) = f(x)+g(x)+ h(x) Untuk setiap titik feasible x, (, )  L(x, , )  f(x) Duality Gap = f(x) - (, )

Dengan memaksimumkan (, ) terhadap  dan , akan meminimumkan duality gap.

Khususnya, Jika g dan h fungsi Affine, yaitu g(x) = Ax – b ( A matriks, b vektor) maka duality gap menjadi 0. Artinya, solusi masalah primal ekivalen dengan solusi masalah dual.

Solusi masalah optimasi (primal) Min x2+y2, s.t. x-y 1

Lagrangian L(x,y, ) = x2+y2+(x-y-1)

Untuk suatu nilai  yang diberikan, Dual Problem agar L minimum Max () = ¼2+ ¼2+(-/2-/2-1) 2x +  = 0 = -2/2 -  2y -  = 0 s.t.   0 Diperoleh =0, x = 0 dan y = 0 Ini berarti constraint tidak aktif!!!

Solusi masalah optimasi (primal) Min x2+y2, s.t. x-y 1

Lagrangian L(x,y, ) = x2+y2-(x-y-1)

Agar L minimum 2x -  = 0 2y +  = 0

Dual Problem Max (, ) = ¼2+ ¼2-(/2+/2-1) = -2/2 +  s.t.   0

Diperoleh =1, x = 1/2 dan y = -1/2 Ini berarti constraint aktif, artinya nilai minimum tercapai pada batas constraint.



1 Arg Min w12  w22 w ,b 2 s.t.



( w1 x1( i )  w2 x2( i )  b) yi  1,

i  1, 2,.., N

N 1 2 2 (i ) (i )  L( w1 , w2 , b,  )  ( w1  w2 )   i ( w1 x1  w2 x2  b) yi  1 2 i 1

Nilai minimum L diperoleh dari : L(w1 , w2 , b,  )  0 yaitu N

w1    i x1(i ) yi  0 i 1

N

w2    i x2( i ) yi  0 i 1

N

 y i 1

i

i

0

Substitusi ke Lagrangian: 2 2 N N  1      (i ) (i ) ( )      i x1 yi      i x2 yi   2   i 1   i 1   N   N      (i ) (i ) (i ) (i )    i     i x1 yi  x1     i x2 yi  x2  b  y j  1 j 1   i 1    i 1   N

diperoleh

2 2 N N  1      (i ) (i ) ( )      i x1 yi      i x2 yi   2   i 1   i 1   N   N     (i )  (i ) (i ) (i )    j     i x1 yi  x1     i x2 yi  x2  y j  1 i 1   i 1    i 1   N

2 2 N N  1      (i ) (i ) ( )      i x1 yi      i x2 yi   2   i 1   i 1   N   N     (i )  (i ) (i ) (i )    j     i x1 yi  x1     i x2 yi  x2  y j  1 j 1   i 1    i 1   N

Max ( ) 

s.t.  i  0, i  1, 2,..., N Studi Kasus Cari hyperplane classifier terbaik untuk data training P1(1,0), P2(0,1), P3(2,2), dan Q1(-1,0), Q2(0,-1),







(1) Generate Dual Problem untuk dataset tersebut (2) Tentukan nilai alpha[i], i=1,2,..,5 yang optimal (Hint: Gunakan gradient ascent dgn tambahan : if alpha < 0 then alpha = 0) (3) Tentukan nilai w1, w2 dan b

Linear Discrimant Model

Recommend Documents