Newstin Real-time Web Content Categorization Presentation to WebExpo 2008
October 18, 2008
Company Background
Newstin a.s.: founded in 1998 as I2S in Prague
Team of 30 employees: 26 engineers, 14 nations
Since 2005: real-time semantic content categorization; multiple patent filings on cross-language solution
Past activities: business & government projects in information management and security; partnership with Business Objects/SAP
RedHerring Europe 100 Winner Award
What is Newstin?
Patented technology
Largest news database and catalog of news in the world
150,000+ information sources in 11 languages
250,000+ articles daily, fully processed into 1,000,000+ categories
Editions: US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese, Brazilian, Czech, Russian, Arabic, Chinese
Japanese, Korean, Turkish coming in Q4 2008
Newstin.com
Popular user applications
Business Intelligence
Enterprise content organization
What is Newstin? (Details)
Newstin is an innovative technology that takes a completely new approach to content organization. Newstin technology and its service-oriented architecture are the foundation of a unique system that features fully scalable, real-time semantic, multi-language and cross-language document categorization. Newstin's patented technology has the potential to become the core platform for organizing any unstructured textual data, including data from all sources on the Internet and potentially the hidden Web.
Newstin is a powerful engine that harnesses a variety of cutting-edge technologies and combines linguistic processing with semantic analysis, multilevel content categorization and cross-language taxonomy structures. Applications of Newstin technology use an inherent capability to take context into account in addition to conventional keyword approaches.
Newstin is the largest news database/catalogue in the world, currently comprising 40 million documents and 2.2 billion metadata items, and it is constantly growing. The Newstin article collection is continuously updated from over 160,000 global, weighted sources selected from a pool of over 3 million preprocessed sources in 12 languages. Each day, 200,000+ articles are fully processed into 1.1 million categories in 15 supported editions: US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese, Brazilian, Czech, Russian, Arabic, Chinese and Korean, with more languages and editions coming soon.
Newstin is a complex system incorporating content retrieval, metadata processing, analysis and visualization. The extensive operation behind Newstin makes it a strong platform for SaaS solutions. Newstin is a bi-directional application in its own right: by imposing order on unstructured data, Newstin leverages its own extensive metadata collection for business intelligence and enterprise performance management. Content must be organized first to maximize knowledge-mining capability.
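The idea of using context in addition to keywords, combined with multilevel categorization, can be illustrated with a toy sketch. This is not Newstin's patented method, which the slides do not describe. The rule table, the category paths, the scoring scheme and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical rules: each category path has trigger keywords and context terms.
RULES: Dict[str, Dict[str, Set[str]]] = {
    "Nature/Animals/Jaguar": {
        "keywords": {"jaguar"},
        "context": {"jungle", "cat", "habitat"},
    },
    "Business/Automotive/Jaguar": {
        "keywords": {"jaguar"},
        "context": {"engine", "sedan", "dealer"},
    },
}


def score(text: str, rule: Dict[str, Set[str]]) -> int:
    """Return 0 without a keyword hit, else 1 plus the number of context terms found."""
    words = set(text.lower().split())
    if not words & rule["keywords"]:
        return 0
    return 1 + len(words & rule["context"])


def categorize(text: str) -> Optional[str]:
    """Pick the best-scoring category path, or None if no keyword matches."""
    best = max(RULES, key=lambda path: score(text, RULES[path]))
    return best if score(text, RULES[best]) > 0 else None


def levels(path: str) -> List[str]:
    """Expand a category path into every taxonomy level it belongs to."""
    parts = path.split("/")
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


category = categorize("The new Jaguar sedan has a quieter engine")
print(category)
print(levels(category))
```

The keyword "jaguar" alone cannot separate the two categories; the surrounding context terms decide. A production system would presumably rely on linguistic processing rather than whitespace tokenization.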
Web Content Chaos An inspiration for Newstin to develop a solution for organizing web content
Semantic Web 2.0 Organization A portion of Newstin’s taxonomy structure – a step toward organizing web content
Live Demonstration – Newstin.com
Live Demonstration – NewstinMap
Live Demonstration - Connecting VIP
Live Demonstration – BI Example
Live Demonstration – BI Example
Live Demonstration - EmergingStories
B2B: Online Categorization
[Diagram labels: Firewall, Enterprise, Newstin Categorization Engine, Unstructured Data, Intranet, Metadata, SaaS]
Semantic Organization
Contextual Search
Visual Navigation
Cross-language
Mash up internal/external
Semantic / Web 2.0 Capability to Enterprise Market
Standard for Tagging
Product synergy / enhancement
Competitive advantage
Cross-language Information Retrieval Newstin makes it possible to reach a particular topic in all supported languages through original definitions
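One way to picture cross-language retrieval without translation is a taxonomy in which each topic has a language-independent identifier and a separate definition in each language. The sketch below is only an illustration under that assumption. The `Category` and `Article` classes, the term-matching rule and the sample data are hypothetical and are not taken from Newstin's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Category:
    """A language-independent topic with per-language definitions (hypothetical)."""
    category_id: str
    definitions: Dict[str, Set[str]]  # language code -> defining terms


@dataclass
class Article:
    title: str
    language: str
    categories: Set[str] = field(default_factory=set)


def categorize(article: Article, taxonomy: List[Category]) -> Article:
    """Tag an article using only the definitions written in its own language."""
    words = set(article.title.lower().split())
    for category in taxonomy:
        terms = category.definitions.get(article.language, set())
        if words & terms:
            article.categories.add(category.category_id)
    return article


def find_by_topic(category_id: str, articles: List[Article]) -> List[Article]:
    """Return articles in any language that carry the given category ID."""
    return [a for a in articles if category_id in a.categories]


taxonomy = [
    Category("climate_change", {
        "en": {"climate", "warming"},
        "cs": {"klima", "oteplování"},
        "de": {"klimawandel"},
    })
]

articles = [
    categorize(Article("Global warming talks resume", "en"), taxonomy),
    categorize(Article("Nové údaje o oteplování planety", "cs"), taxonomy),
    categorize(Article("Fußball heute", "de"), taxonomy),
]

hits = find_by_topic("climate_change", articles)
print([(a.language, a.title) for a in hits])
```

Because every article is tagged with the same category ID regardless of its language, a single topic lookup returns the English and Czech articles together, and no text is translated along the way.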
Life Cycle Newstin is a comprehensive information system
Presentation Summary (translated from Czech)
Main topic: Real-time web content categorization
Newstin a.s. is a Czech technology company based in Prague, employing 30 engineers from 15 countries. Over 3.5 years it has built a unique technology for organizing text documents in real time using semantic and linguistic technologies. The key, patented component of Newstin technology is its cross-lingual solution, which links Internet content in different languages without using translation. Newstin has created the largest current database of online news articles in 11 world languages, including Czech, containing 37 million articles from the past 9 months and 2 billion metadata items. Newstin servers currently process 250,000 unique articles per day from the 160,000 most important sources worldwide. Further uses of Newstin technology lie in media analysis and the organization of enterprise data.
Real-time Web Content Categorization
Thank you. Julius Rusnak CTO Newstin a.s. Lomnickeho 9 140 00 Prague Czech Republic