Deep Information and Extraction Tool
Thomas Martinuzzo, Jr. Eng.

What is DIET?
DIET is an information extraction and manipulation tool.
DIET can extract information from the DEEP web by understanding page structures.
Surface web: 20 billion pages indexed by search engines
DEEP web: more than 600 billion pages
« The 60 largest Deep Web sources contain 84 billion pages of content. That's about 750 terabytes of information, sufficient by themselves to exceed the size of the surface Web by 40 times. » Brightplanet.com
Pic from Maxumowners.org

DIET Features & Benefits
Uses artificial intelligence to build wrappers automatically.
Little to no user intervention.
Users can easily extract and manipulate information.

Car website
Characteristics: list of cars by name with description, date, price, picture … Over 100 pages of data!
Problem: no local search engine.
But … I am looking for an Acura MDX 2005 or something like that!
…

Job website
Characteristics: list of jobs by title with a short description, salary, and city. Over 800 jobs. Local search engine. Sort capabilities.
Problem: only 10 jobs are shown per page. Unable to search by salary range. Unable to sort by city.
But … I want to see all jobs over $75,000 on a single page and save it for future consultation.
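
The job-site wish above (every job over $75,000 on one page, sortable by city) comes down to a filter and a sort once the records have been gathered from all result pages. The sketch below is only an illustration in Python, not DIET's own Java code: the `Job` record, the `jobs_over` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical record for one extracted job listing."""
    title: str
    city: str
    salary: int  # annual salary in dollars


def jobs_over(jobs, min_salary):
    """Return the jobs paying more than min_salary, sorted by city, then title."""
    selected = [job for job in jobs if job.salary > min_salary]
    return sorted(selected, key=lambda job: (job.city, job.title))


# Made-up sample rows standing in for listings collected from every result page.
SAMPLE_JOBS = [
    Job("Java developer", "Montreal", 82000),
    Job("Technical writer", "Quebec City", 61000),
    Job("Data architect", "Laval", 95000),
    Job("Support analyst", "Montreal", 48000),
]

if __name__ == "__main__":
    for job in jobs_over(SAMPLE_JOBS, 75000):
        print(f"{job.city}: {job.title} (${job.salary:,})")
```

Because the filter runs over the full set of records rather than one page of 10, the salary range and the sort key become free choices of the user instead of limits of the site.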

DIET Technologies
DIET Core Web Services
  Access only by certified clients
DIET Web Application
  Users and service managers
  Web-based application (JSP/Servlet/JavaServer Faces/JavaBeans)
Based on Java EE 5/GlassFish/MySQL technology

Univalor Website
List of new technologies grouped by domain.
Simple search engine available.

Using DIET
We want to extract and then manipulate all available technologies.
Give the Univalor technologies URL to DIET:
http://www.univalor.ca/companies_available_technologies.asp

Wrappers are generated
DIET creates a wrapper by learning the structure of the Univalor web pages.
DIET extracts data through the wrapper.
DIET displays the results.
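
To make the idea of a wrapper concrete: a wrapper is a rule that maps the repeated structure of a page to records. DIET learns that rule automatically. The sketch below instead hard-codes one by hand, in Python, purely as an illustration. The sample markup, the `RECORD_PATTERN` rule, and the `extract_records` helper are assumptions made up for this example and do not reflect the real Univalor pages or DIET's internals.

```python
import re

# Hypothetical markup, invented for illustration (not the real Univalor page).
SAMPLE_PAGE = """
<ul>
  <li class="tech"><b>Optical sensor</b> <i>Engineering</i></li>
  <li class="tech"><b>Protein assay</b> <i>Life sciences</i></li>
</ul>
"""

# The hand-written "wrapper": one pattern describing the repeated record layout.
RECORD_PATTERN = re.compile(
    r'<li class="tech"><b>(?P<name>[^<]+)</b>\s*<i>(?P<domain>[^<]+)</i></li>'
)


def extract_records(page, pattern=RECORD_PATTERN):
    """Apply the wrapper to a page and return one dict per matched record."""
    return [match.groupdict() for match in pattern.finditer(page)]


if __name__ == "__main__":
    for record in extract_records(SAMPLE_PAGE):
        print(record["domain"], "-", record["name"])
```

A learned wrapper would replace the fixed pattern with a rule inferred from the page itself, which is what removes the need for user intervention.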

Manipulate information with DIET
Once the information has been extracted, it can be manipulated.

Plug-in opportunity
DIET Core Web Services can be used by third-party clients.
Internet Explorer and Mozilla Firefox integration.
Export capabilities
Extracted information can be exported to multiple storage formats.
And more …
Users can create their own wrappers.
DIET can be the perfect tool for DEEP search.
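
As an illustration of the export capability mentioned above, the sketch below writes the same extracted records to two formats. The slides do not say which formats DIET supports, so CSV and JSON are assumptions chosen for this example, as are the `to_csv` and `to_json` helpers and the sample records. It is written in Python and uses only the standard library.

```python
import csv
import io
import json


def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize a list of dicts to indented JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


# Made-up records standing in for data extracted through a wrapper.
RECORDS = [
    {"name": "Optical sensor", "domain": "Engineering"},
    {"name": "Protein assay", "domain": "Life sciences"},
]

if __name__ == "__main__":
    print(to_csv(RECORDS, ["name", "domain"]))
    print(to_json(RECORDS))
```

Keeping the records as plain dictionaries until the last step means each additional format only needs one more small serializer.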

Research and Development
Samuel Pierre, [email protected]
Commercialization and licensing
Didier Leconte, [email protected]
Thomas Martinuzzo, Jr. [email protected]
Thanks!