Estimating Probability Density Functions with EM
Sources: Forsyth & Ponce, Chap. 7; Stanford Vision & Modeling
Probability Density Estimation
• Parametric Representations
• Non-Parametric Representations
• Mixture Models
Non-parametric estimation methods
• No assumptions at all about the distribution
• The estimate depends entirely on the DATA
• A simple way to do this: the histogram
Histograms
Discretize the data, then represent the counts as bars:
Histograms
• Computationally demanding, but very widely used
• Applicable to densities of arbitrary shape
Histograms
Problems:
• Higher-dimensional spaces:
  - exponential number of bins
  - exponential amount of training data
  - the curse of dimensionality
• Bin size? Too few bins: too coarse. Too many bins: too fine (noisy).
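A minimal sketch of the histogram estimate in numpy (the helper name is ours, not from the slides):

```python
import numpy as np

# A minimal sketch of a histogram density estimate in 1-D.
# `bins` exposes the trade-off above: too few bins are coarse, too many noisy.
def histogram_density(samples, bins=20):
    counts, edges = np.histogram(samples, bins=bins)
    widths = np.diff(edges)
    # Normalize so the bars integrate to 1: count / (N * bin width).
    return counts / (samples.size * widths), edges

rng = np.random.default_rng(0)
density, edges = histogram_density(rng.normal(size=1000), bins=30)
```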
A principled approach:
• x is drawn from an unknown density p(x)
• The probability that x falls inside a region R of volume V is
  P = \int_R p(x')\,dx' \approx p(x)\,V
• With N samples, K of which fall inside R: P \approx \frac{K}{N}
• Therefore
  p(x) \approx \frac{K}{NV}
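As a worked example (numbers are ours): if N = 100 samples are drawn and a region of volume V = 0.5 around x contains K = 10 of them, then p(x) \approx 10 / (100 \cdot 0.5) = 0.2.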
A principled approach: p(x) \approx \frac{K}{NV} leads to two strategies:
• Fix V and determine K → kernel-based methods
• Fix K and determine V → K-nearest-neighbor
Kernel-based methods:
p(x) \approx \frac{K}{NV}
Parzen window (a unit hypercube kernel):
H(u) = \begin{cases} 1 & |u_j| < 1/2 \text{ for every component } j \\ 0 & \text{otherwise} \end{cases}
Count the samples falling inside a hypercube of side h centered at x:
K = \sum_{n=1}^{N} H\left(\frac{x - x_n}{h}\right)
p(x) = \frac{1}{N h^d} \sum_{n=1}^{N} H\left(\frac{x - x_n}{h}\right)
Kernel-based methods, Gaussian window:
p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{(2\pi h^2)^{d/2}} \exp\left(-\frac{\|x - x_n\|^2}{2h^2}\right)
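A minimal sketch of both windows (function names are ours; the window width h must be chosen by the user):

```python
import numpy as np

# A minimal sketch of kernel-based density estimation in d dimensions.
def parzen_density(x, data, h):
    N, d = data.shape
    u = (x - data) / h                        # scaled offsets, shape (N, d)
    inside = np.all(np.abs(u) < 0.5, axis=1)  # H(u): 1 inside the unit hypercube
    return inside.sum() / (N * h**d)          # p(x) ~ K / (N h^d)

def gaussian_density(x, data, h):
    N, d = data.shape
    sq = np.sum((x - data) ** 2, axis=1)      # ||x - x_n||^2 for every sample
    norm = (2 * np.pi * h**2) ** (d / 2)
    return np.mean(np.exp(-sq / (2 * h**2)) / norm)
```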
K-nearest-neighbor:
p(x) \approx \frac{K}{NV}
Grow the volume V around x until it contains K points, then apply the estimate above.
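A minimal 1-D sketch (names are ours); in d dimensions V would instead be the volume of the ball whose radius is the K-th nearest-neighbour distance:

```python
import numpy as np

# Grow the interval around x until it holds K points, then p(x) ~ K / (N V).
def knn_density(x, data, K):
    N = data.size
    dists = np.sort(np.abs(data - x))
    radius = dists[K - 1]          # distance to the K-th nearest sample
    V = 2 * radius                 # interval volume in 1-D
    return K / (N * V)
```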
K-nearest-neighbor, Bayesian classification:
Draw a volume V around x containing K points, K_k of which belong to class C_k (with N_k samples of class C_k overall):
p(x \mid C_k) = \frac{K_k}{N_k V}
p(x) = \frac{K}{NV}
P(C_k) = \frac{N_k}{N}
Bayes' rule then gives
P(C_k \mid x) = \frac{p(x \mid C_k)\,P(C_k)}{p(x)} = \frac{K_k}{K}
"the k-nearest-neighbour classification rule"
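A minimal sketch of this rule (names are ours): since P(C_k | x) \approx K_k / K, predict the majority class among the K neighbours.

```python
import numpy as np

# The k-nearest-neighbour classification rule.
def knn_classify(x, data, labels, K):
    dists = np.linalg.norm(data - x, axis=1)
    nearest = labels[np.argsort(dists)[:K]]   # labels of the K closest samples
    classes, counts = np.unique(nearest, return_counts=True)
    return classes[np.argmax(counts)]         # class with the largest K_k
```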
Probability Density Estimation
• Parametric Representations
• Non-Parametric Representations
• Mixture Models
Mixture Models:
• Gaussians: easy, low memory, fast, good analytic properties
• Non-parametric: general, but memory intensive and slow
Mixture models aim to combine the generality of non-parametric methods with the efficiency of Gaussians.
Mixture Models
Mixture of Gaussians: [figure: density p(x) over x, formed as a sum of individual Gaussians]
Advantage: can approximate densities of arbitrary shape.
Mixture of Gaussians, generative model: [figure: a latent variable z picks component j ∈ {1, 2, 3} with probability P(j); the data point is then drawn from p(x|j)]
Mixture of Gaussians:
p(x) = \sum_{j=1}^{M} p(x \mid j)\,P(j)
p(x \mid j) = \frac{1}{(2\pi\sigma_j^2)^{d/2}} \exp\left(-\frac{\|x - \mu_j\|^2}{2\sigma_j^2}\right)
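A minimal sketch of this model, assuming isotropic components with scalar standard deviations (all names are ours; `priors`, `means`, `sigmas` have shapes (M,), (M, d), (M,)):

```python
import numpy as np

# Generative model: pick component j with probability P(j),
# then draw x from the isotropic Gaussian p(x | j).
def sample_mixture(rng, priors, means, sigmas, n):
    d = means.shape[1]
    js = rng.choice(len(priors), size=n, p=priors)   # latent component labels
    return means[js] + sigmas[js][:, None] * rng.standard_normal((n, d))

# Density: p(x) = sum_j p(x | j) P(j).
def mixture_density(x, priors, means, sigmas):
    d = means.shape[1]
    sq = np.sum((x - means) ** 2, axis=1)            # ||x - mu_j||^2 per component
    comps = np.exp(-sq / (2 * sigmas**2)) / (2 * np.pi * sigmas**2) ** (d / 2)
    return float(np.dot(priors, comps))
```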
Mixture of Gaussians, maximum likelihood:
E = -\ln L = -\sum_{n=1}^{N} \ln p(x_n)
Minimize E by setting the gradient to zero [figure: error surface E over μ_k with a minimum where ∂E/∂μ_k = 0]:
\frac{\partial E}{\partial \mu_k} = 0 \;\Rightarrow\; \mu_j = \frac{\sum_{n=1}^{N} P(j \mid x_n)\,x_n}{\sum_{n=1}^{N} P(j \mid x_n)}
Each mean is a weighted average of the data.
Mixture of Gaussians: the weights are the posteriors
P(j \mid x_n) = \frac{p(x_n \mid j)\,P(j)}{\sum_{k=1}^{M} p(x_n \mid k)\,P(k)}
with the component likelihood
p(x_n \mid j) = \frac{1}{(2\pi\sigma_j^2)^{d/2}} \exp\left(-\frac{\|x_n - \mu_j\|^2}{2\sigma_j^2}\right)
Note that the update for \mu_j depends on P(j \mid x_n), which itself depends on \mu_j.
Mixture of Gaussians, maximum likelihood:
E = -\ln L = -\sum_{n=1}^{N} \ln p(x_n)
There is no closed-form solution for \frac{\partial E}{\partial \mu_k} = 0! [figure: error surface E over μ_k]
Mixture of Gaussians, maximum likelihood: one option is gradient descent on E. [figure: gradient-descent steps on the error surface]
Mixture of Gaussians, gradient-descent optimization:
\frac{\partial E}{\partial \mu_k} = f(\mu_1,\ldots,\mu_M,\; \sigma_1,\ldots,\sigma_M,\; \alpha_1,\ldots,\alpha_M)
• Complex gradient function (highly nonlinear, coupled equations)
• Optimizing one Gaussian depends on the entire rest of the mixture
Mixture of Gaussians, a different strategy:
[figure: observed data points along the x axis and the resulting density p(x)]
Mixture of Gaussians, hidden variable:
Introduce a hidden variable y marking which component generated each point.
[figure: observed data x; unobserved labels y ∈ {1, 2} assigning each sample to a Gaussian]
A popular example of a chicken-and-egg problem:
• Suppose we knew the labels y: then maximum likelihood for Gaussian #1 and for Gaussian #2 is easy (fit each to its own points).
• Suppose we knew the two Gaussians: then the posteriors P(y=1|x) and P(y=2|x) are easy to compute.
[figure: labeled data and the corresponding fitted densities]
Chicken-and-egg problem: in reality we know neither the labels nor the Gaussians.
Strategy: pretend we know one of them and solve for the other.
[figure: unlabeled data points along the x axis]
Clustering: guess the labels, fit, and check whether the guess was right. [figure]
K-means clustering / Basic Isodata
Procedure (Basic Isodata):
1. Choose some initial values for the means μ_1, ..., μ_M.
Loop:
2. Classify the N samples by assigning each to the class of the closest mean.
3. Recompute each mean as the average of the samples in its class.
4. If any mean changed value, go to Loop; otherwise, stop.
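A minimal sketch of this procedure (names are ours; the sketch assumes no cluster ever becomes empty):

```python
import numpy as np

# Basic Isodata / K-means, following the numbered steps above.
def isodata(data, means, max_iter=100):
    for _ in range(max_iter):
        # Step 2: assign each sample to the class of the closest mean.
        dists = np.linalg.norm(data[:, None, :] - means[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Step 3: recompute each mean as the average of the samples in its class.
        new_means = np.array([data[labels == j].mean(axis=0)
                              for j in range(len(means))])
        # Step 4: stop when no mean changed value.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```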
Isodata, initialization: [figure: initial means μ_1 and μ_2 placed in the data]
Isodata, convergence: [figure: μ_1 and μ_2 after the means have stopped moving]
Isodata: some problems
Guessed eggs / computed chickens: [figure: hard label guesses versus the maximum-likelihood fits for Gaussian #1 and Gaussian #2; "this is where we are" marked on the density]
A good Gaussian approximation: [figure: fitted density p(x)]
• But not optimal!
• Problem: highly overlapping Gaussians
Expectation Maximization (EM)
• EM is a general formulation for "chicken-and-egg" problems like this one (mixtures of Gaussians, mixtures of experts, neural nets, HMMs, Bayes nets, ...)
• Isodata is a specific instance of EM
• General EM for mixtures of Gaussians is called soft clustering
• It can converge to a maximum-likelihood solution
Remember these formulas?
\mu_j = \frac{\sum_{n=1}^{N} P(j \mid x_n)\,x_n}{\sum_{n=1}^{N} P(j \mid x_n)}
P(j \mid x_n) = \frac{p(x_n \mid j)\,P(j)}{\sum_{k=1}^{M} p(x_n \mid k)\,P(k)}
p(x_n \mid j) = \frac{1}{(2\pi\sigma_j^2)^{d/2}} \exp\left(-\frac{\|x_n - \mu_j\|^2}{2\sigma_j^2}\right)
Soft chicken-and-egg problem:
[figure: data points carrying soft responsibilities P(1|x) such as 0.99, 0.7, 0.5, 0.3, 0.1, 0.001 instead of hard labels]
Step 1: suppose we know the posteriors; each mean becomes a weighted mean of the data:
\mu_j = \frac{\sum_{n=1}^{N} P(j \mid x_n)\,x_n}{\sum_{n=1}^{N} P(j \mid x_n)}
Step 2: recompute the posteriors from the new means, and repeat.
The steps of the EM procedure:
1. Choose some initial values for the means μ_1, ..., μ_M.
E-step:
2. Compute the posteriors P(j | x_n) for each class and each sample.
M-step:
3. Recompute each mean as the weighted average of its class:
\mu_j = \frac{\sum_{n=1}^{N} P(j \mid x_n)\,x_n}{\sum_{n=1}^{N} P(j \mid x_n)}
4. If any mean changed value, go to the E-step; otherwise, stop.
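A minimal sketch of this loop for an isotropic mixture with fixed, equal σ and equal priors, so that only the means are re-estimated ("soft" K-means); these simplifications and the names are ours:

```python
import numpy as np

def em_means(data, means, sigma=1.0, max_iter=100):
    for _ in range(max_iter):
        # E-step: posteriors P(j | x_n); with equal priors they are
        # proportional to exp(-||x_n - mu_j||^2 / (2 sigma^2)).
        sq = np.sum((data[:, None, :] - means[None, :, :]) ** 2, axis=2)
        resp = np.exp(-sq / (2 * sigma**2))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each mean is the responsibility-weighted average of the data.
        new_means = (resp.T @ data) / resp.sum(axis=0)[:, None]
        # Stop when no mean changes value.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, resp
```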
EM and the Gaussian mixture: each iteration maximizes the expected complete-data log-likelihood,
\theta^{(i)} = \arg\max_{\theta} Q(\theta, \theta^{(i-1)})
which yields closed-form updates for all parameters:
\mu_j^{(i)} = \frac{\sum_{n=1}^{N} p(j \mid x_n, \theta^{(i-1)})\,x_n}{\sum_{n=1}^{N} p(j \mid x_n, \theta^{(i-1)})}
\Sigma_j^{(i)} = \frac{\sum_{n=1}^{N} p(j \mid x_n, \theta^{(i-1)})\,(x_n - \mu_j^{(i)})(x_n - \mu_j^{(i)})^T}{\sum_{n=1}^{N} p(j \mid x_n, \theta^{(i-1)})}
\alpha_j^{(i)} = \frac{1}{N} \sum_{n=1}^{N} p(j \mid x_n, \theta^{(i-1)})
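A minimal sketch of one full M-step implementing these three updates, given the responsibilities from the E-step (names are ours):

```python
import numpy as np

# One M-step: resp[n, j] = p(j | x_n, theta^(i-1)) from the E-step.
def m_step(data, resp):
    N, d = data.shape
    M = resp.shape[1]
    Nj = resp.sum(axis=0)                      # effective count per component
    means = (resp.T @ data) / Nj[:, None]      # mu_j update
    covs = np.empty((M, d, d))
    for j in range(M):
        diff = data - means[j]                 # rows are (x_n - mu_j)
        covs[j] = (resp[:, j, None] * diff).T @ diff / Nj[j]  # Sigma_j update
    alphas = Nj / N                            # alpha_j = (1/N) sum_n p(j|x_n)
    return means, covs, alphas
```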
EM examples:
[figures: training samples → initialization → end result of EM → density isocontours]
EM examples, applications:
• Color segmentation
• Layered motion (Yair Weiss)