Databases + Automatically Learned Speech Units + Projects
Jan Černocký, ÚPGM FIT VUT Brno
Collecting large speech databases
Why? ⇒ Recognizers must be trained on annotated speech data.
Requirements
• The recording environment must match real conditions (car, living room, telephone line).
• Coverage of different speaker categories (gender, dialect, age).
Recorded items
• numerals, keywords, commands
• digit strings, numbers
• names (cities, people)
• spelling
• phonetically balanced words/sentences (Czech examples):
  Odzbrojením, které je klíčovým bodem dohody, se neobtěžují.
  Muž totiž nepřerušil klasickou onkologickou léčbu.
  Mám neseriózního jednání dost, poznamenala.
  Je ženatý, má tři děti a je vynikajícím hráčem bridže.
  Určitě neuhodnete, z čeho to je, prohlásil sebevědomě.
  Bridž totiž hrají dvě dvojice proti sobě.
  Aprílové počasí provázelo včerejší program mítinku.
  Mezi sólisty nové inscenace se objeví řada hostů.
  Řekl to ve čtvrtek člen vedení belgické strany zelených.
  Po městě jezděte tramvají, ta je ekologická.
DB - projects - all jointly with ČVUT Praha
• Číslovky (Numerals, 1999) - 1227 speakers, telephone, ca 7 min/speaker, only numerals, numbers, number strings + special characters. Funded by Siemens AG R&D Munich. Research and teaching rights.
• SpeechDat-East (1999-2000) - 1052 speakers, telephone, ca 15 min/speaker, all item types. Funded by the EU - 4th Framework Programme, INCO Copernicus, in cooperation with Matra⇒Lernout&Hauspie⇒ScanSoft. Full rights.
• SpeeCon (2003) - 600 speakers (50 of them children), ca 40 min/speaker, environments Office, Entertainment, Public, Car. 4-channel recording, the SpeeCon "box", notebook, 2×VXPocket2.
• TEMIC2 (now!) - 600 speakers, ca 40 min/speaker, various cars, 2 channels. Recorded on DAT, then transfer (přepis). Speakers from Eastern Moravia and Silesia needed! We pay 200,- for a session of at most one hour :-)
• Collection of multimodal meeting data - see Petr Jenderka.
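To get a feel for the scale of these collections, the per-speaker durations can be multiplied out. The following is a minimal sketch, assuming every speaker contributes exactly the approximate ("ca") duration quoted in the list; the totals are rough estimates derived here, not figures reported by the projects, and the function name `total_hours` is only illustrative.

```python
# Rough total audio per database: speakers x approximate minutes per speaker.
# Inputs are the "ca" figures from the project list; outputs are estimates only.
DATABASES = {
    "Cislovky (1999)": (1227, 7),
    "SpeechDat-East": (1052, 15),
    "SpeeCon": (600, 40),
    "TEMIC2": (600, 40),
}


def total_hours(speakers: int, minutes_per_speaker: float) -> float:
    """Return the estimated amount of recorded speech in hours."""
    return speakers * minutes_per_speaker / 60.0


if __name__ == "__main__":
    for name, (speakers, minutes) in DATABASES.items():
        print(f"{name}: about {total_hours(speakers, minutes):.0f} h")
```

Even under this crude assumption the collections come out at hundreds of hours each, which is what makes manual annotation the dominant cost.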
DB - annotation: write down what the speaker actually said.
DB - annotation - checking
[Flowchart, reconstructed from the figure labels]
• annotator → annotation batch → generation of pronunciation dictionary → pronunciation dictionary
• pronunciation dictionary + reference pronunciation dictionary → comparison with the reference dictionary → difference dictionary
• difference dictionary → proof-reading → errors?
• errors? yes: generation of log-file with positions of errors → ask annotator to correct and re-submit the batch
• errors? no: PASSED; correct wordforms and pronunciations → update of the reference dictionary
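The comparison and update steps of this checking loop can be sketched as follows. This is an illustration only, assuming a dictionary is a mapping from a word form to a set of pronunciation strings; the data format and the function names `difference_dictionary` and `update_reference` are hypothetical and not taken from the actual annotation tools.

```python
# Sketch of the dictionary-checking loop: find batch entries the reference
# does not know yet, and merge them in once proof-reading has approved them.
# The dict-of-sets format is an assumption made for this example.
from typing import Dict, Set

PronDict = Dict[str, Set[str]]


def difference_dictionary(batch: PronDict, reference: PronDict) -> PronDict:
    """Return word/pronunciation pairs present in `batch` but not in `reference`."""
    diff: PronDict = {}
    for word, prons in batch.items():
        unknown = prons - reference.get(word, set())
        if unknown:
            diff[word] = unknown
    return diff


def update_reference(reference: PronDict, approved: PronDict) -> None:
    """Merge proof-read entries into the reference dictionary (in place)."""
    for word, prons in approved.items():
        reference.setdefault(word, set()).update(prons)


if __name__ == "__main__":
    reference = {"brno": {"b r n o"}}
    batch = {"brno": {"b r n o"}, "praha": {"p r a h a"}, "brrno": {"b r r n o"}}
    diff = difference_dictionary(batch, reference)
    # Only the new entries need proof-reading; a typo such as "brrno" would be
    # sent back to the annotator instead of being merged.
    print(sorted(diff))
```

The point of the difference dictionary is that proof-reading effort scales with the number of new word forms, not with the size of the batch.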
Automatically learned speech units
Why?
• DB projects have budgets in the millions; it would be simpler to plug a computer into a radio and just record.
• But training classical speech units (phonemes, triphones) requires annotations, which cost the most money and are the biggest pain...
• Some applications (coding, language identification) need no link to the text at all.
• ⇒ data-driven methods for learning the units.
A proposal for how to do it (Černocký's PhD)
[Flowchart, reconstructed from the figure labels]
• raw data (samples) → LPCC parametrization → LPCC vectors
• LPCC vectors → temporal decomposition → events
• events → vector quantization → symbols
• symbols → multigrams → sequences of symbols
• sequences of symbols → conversion to transcriptions → initial transcriptions
• initial transcriptions → initial HMM training → initial models
• loop: HMM segmentation → new transcriptions → HMM parameter reestimation → new parameters → termination? (no: repeat the loop; yes: stop)
• outputs: dictionary of units, set of models, transcriptions
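Two of the steps in this scheme, vector quantization and the multigram segmentation, can be illustrated with a small sketch. It is not the implementation from the thesis: it assumes a codebook is already available, that multigram probabilities are already known (in practice both would be estimated from data), and the names `quantize`, `segment`, `unit_probs` and `max_len` are hypothetical. The segmentation picks the most likely split of the symbol string into variable-length units by dynamic programming.

```python
# Sketch: vectors -> symbols (nearest codeword), then symbols -> variable-length
# units (most likely segmentation under given unit probabilities).
# Codebook and unit probabilities are assumed to be given; both are toy values.
import math
from typing import Dict, List, Sequence, Tuple

Unit = Tuple[int, ...]


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each vector to the index of its nearest codeword (squared Euclidean)."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))
            for v in vectors]


def segment(symbols: Sequence[int], unit_probs: Dict[Unit, float],
            max_len: int = 3) -> List[Unit]:
    """Return the maximum-likelihood segmentation into units of length <= max_len."""
    n = len(symbols)
    best = [-math.inf] * (n + 1)   # best log-probability of symbols[:end]
    back = [0] * (n + 1)           # length of the last unit on the best path
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            unit = tuple(symbols[end - length:end])
            prob = unit_probs.get(unit)
            if prob is None or best[end - length] == -math.inf:
                continue
            score = best[end - length] + math.log(prob)
            if score > best[end]:
                best[end], back[end] = score, length
    if n > 0 and best[n] == -math.inf:
        raise ValueError("no segmentation possible with the given units")
    units: List[Unit] = []
    end = n
    while end > 0:                 # follow back-pointers from the end
        units.append(tuple(symbols[end - back[end]:end]))
        end -= back[end]
    units.reverse()
    return units


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
    vectors = [(0.1, 0.0), (0.9, 1.2), (0.2, 0.1), (1.1, 0.8), (2.9, 3.1)]
    symbols = quantize(vectors, codebook)
    probs = {(0,): 0.2, (1,): 0.2, (2,): 0.2, (0, 1): 0.4}
    print(symbols, segment(symbols, probs))
```

The frequently recurring pair is kept together as one unit because a single unit with probability 0.4 is more likely than two units with probability 0.2 each; this is the sense in which the data, rather than a phonetic transcription, determine the units.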
Application I. – VLBR (very low bit-rate) coding
[Block diagram, reconstructed from the figure labels]
• CODER: input speech → HMM recognizer (using models of coding units) → index of coding unit; determination of synthesis unit → selection of representative; pitch, energy, timing
• DECODER: index of coding unit → determination of synthesis unit → selection of representative (from the dictionary of representatives of each synthesis unit) → synthesis → output speech
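Why such a coder reaches very low bit rates can be seen from a back-of-the-envelope calculation: only the index of each coding unit has to be sent, plus some prosodic side information. The sketch below covers the index stream only and assumes a fixed-length binary code; the example figures (64 units, 10 units per second) are hypothetical and not taken from the slides, and the cost of representative selection, pitch, energy and timing is not counted.

```python
# Bit rate of the unit-index stream under a fixed-length code.
# All numeric values used in the example are illustrative assumptions.


def bits_per_unit(num_units: int) -> int:
    """Smallest number of bits that can distinguish `num_units` indices."""
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    return (num_units - 1).bit_length()


def index_bit_rate(num_units: int, units_per_second: float) -> float:
    """Bits per second needed to transmit the unit indices alone."""
    return bits_per_unit(num_units) * units_per_second


if __name__ == "__main__":
    # Hypothetical setting: 64 coding units, 10 units per second on average.
    print(index_bit_rate(64, 10.0), "bit/s for the indices")
```

A variable-length (entropy) code could lower this further, at the price of a more complicated bit stream.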
Continuation
• Petr Motlíček - minimizing the transitions between units - diphone-like units.
• Igor Szoke - HNM synthesis, EHMM for a more coherent unit-training procedure.
http://www.fit.vutbr.cz/~szoke/speech/index.html
Problem: nobody gives us money for this :-(
EU projects – M4
Multimodal Meeting Manager - EU IST 5th Framework Programme (PCRD), 10 partners from Europe and the USA
• Development of a "smart" meeting room, collection and annotation of a multimodal meetings database.
• Analysis and processing of the audio and video streams.
• Integration and structuring using the output of the various recognizers.
• Demonstrator.
Speech@Brno tasks:
• Down-scaled meeting room with hyperbolic mirror - data collection and annotation.
• LVCSR, phoneme recognition, feature extraction.
http://www.m4project.org
EU projects – AMI
Augmented MultiParty Interaction - EU IST 6th Framework Programme (PCRD), 16 partners from Europe and the USA, including industrial ones (e.g. Philips Smart Display) and W3C
• Multimodal input interface
• Integration of modalities and coordination among modalities
• Meeting dynamics and human-human interaction modelling
• Content abstraction (multimodal information indexing, summarising, and retrieval)
• Technology transfer
• Training activities, including an international exchange programme.
Speech@Brno task:
• Keyword detection / acoustic event spotting (with supporting technologies)
http://www.amiproject.org