jdp15 import.io workshop

jpd15, June 2015. Ignacio Elola (@ignacio_elola). Web data? Extracting data from the web

Upload: ignacio-elola-villar

Post on 10-Aug-2015


TRANSCRIPT

Page 1: jdp15 import.io workshop

jpd15, June 2015

Ignacio Elola @ignacio_elola

Web data? Extracting data from the web

Page 2: jdp15 import.io workshop

who am I?

web data and import.io

example: text analysis with import.io and MonkeyLearn

summary

Page 3: jdp15 import.io workshop

import.io?

the Web as a data source

Page 4: jdp15 import.io workshop

What is import.io?
● Machine reading the web
● Point-and-click UI
● Map the data on a web page
● Algorithms will turn it into structured data
● Real-time through an API
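To make "turn it into structured data" concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not import.io's algorithm or API; the `TableExtractor` class, the `table_to_records` helper and the sample page are all hypothetical names invented for illustration.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every table row as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Use the first row as field names and the other rows as records."""
    parser = TableExtractor()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page content, standing in for a real web page.
SAMPLE_PAGE = """
<table>
  <tr><th>title</th><th>price</th></tr>
  <tr><td>Blue mug</td><td>7.50</td></tr>
  <tr><td>Red mug</td><td>8.00</td></tr>
</table>
"""

records = table_to_records(SAMPLE_PAGE)
print(records)
```

Each record is a plain dictionary keyed by column name, which is the kind of row-oriented output that can then be served as JSON or loaded into a spreadsheet.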

Page 5: jdp15 import.io workshop

What is import.io? (continued)
● Custom Crawlers
● Auto extraction
● Authenticated APIs
● Cloud scaling
● Wide range of integration options

Page 6: jdp15 import.io workshop

Structure the web

Page 7: jdp15 import.io workshop

import.io consists of 4 tools

● Magic
● Extractor
● Crawler
● Connector

Page 8: jdp15 import.io workshop

and completely free...

Page 9: jdp15 import.io workshop

import.io Magic

Page 10: jdp15 import.io workshop

Sometimes we need to train the tool ourselves

Page 11: jdp15 import.io workshop

import.io Extractor

Page 12: jdp15 import.io workshop

import.io Extractor lets you structure a single page of data

Page 13: jdp15 import.io workshop

import.io Extractor lets you structure a single page of data

● Custom XPaths
● Custom Regex
● Updatable in real-time
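As an illustration of what a custom XPath and a custom regex do on a single page, here is a minimal sketch in plain Python. It does not use import.io; the markup, the `extract_products` helper and the `PRICE_RE` pattern are hypothetical. Note that `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so real-world HTML would usually call for a dedicated HTML parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for one web page.
SNIPPET = """
<div>
  <div class="product">
    <span class="name">Blue mug</span>
    <span class="price">Price: 7.50 EUR</span>
  </div>
  <div class="product">
    <span class="name">Red mug</span>
    <span class="price">Price: 8.00 EUR</span>
  </div>
</div>
"""

# The regex pulls the numeric part out of the raw price text.
PRICE_RE = re.compile(r"(\d+\.\d+)")


def extract_products(markup):
    """Select nodes with XPath, then clean one field with a regex."""
    root = ET.fromstring(markup.strip())
    products = []
    for node in root.findall(".//div[@class='product']"):
        name = node.find("./span[@class='name']").text
        raw_price = node.find("./span[@class='price']").text
        match = PRICE_RE.search(raw_price)
        price = float(match.group(1)) if match else None
        products.append({"name": name, "price": price})
    return products


products = extract_products(SNIPPET)
print(products)
```

The XPath decides which elements become rows and columns; the regex then narrows a column down to the part of the text that is actually wanted.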

Page 14: jdp15 import.io workshop

Sometimes we need to extract data from a lot of URLs

Page 15: jdp15 import.io workshop

Sometimes we need to extract data from a lot of URLs

import.io Crawler

Page 16: jdp15 import.io workshop

Sometimes we need to extract data from a lot of URLs

import.io Crawler
import.io Extractor (bulk queries)
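The bulk-query idea, running one extractor over a list of URLs that are already known, can be sketched as follows. This is a standard-library illustration rather than import.io's bulk feature: `FAKE_PAGES`, `fetch`, `extract_title` and `bulk_extract` are hypothetical, and the in-memory dictionary stands in for real HTTP requests so the example runs offline.

```python
import re

# Hypothetical stand-in for the web: URL -> page body.
FAKE_PAGES = {
    "http://example.com/item/1": "<h1>Blue mug</h1>",
    "http://example.com/item/2": "<h1>Red mug</h1>",
}

TITLE_RE = re.compile(r"<h1>(.*?)</h1>")


def fetch(url):
    """Return the page body, or None if the URL is unknown."""
    return FAKE_PAGES.get(url)


def extract_title(html):
    """The single-page extractor: pull the first <h1> text."""
    match = TITLE_RE.search(html)
    return match.group(1) if match else None


def bulk_extract(urls):
    """Apply the same extractor to every URL, recording failures per URL."""
    results = []
    for url in urls:
        html = fetch(url)
        if html is None:
            results.append({"url": url, "title": None, "error": "not found"})
        else:
            results.append({"url": url, "title": extract_title(html), "error": None})
    return results


rows = bulk_extract([
    "http://example.com/item/1",
    "http://example.com/item/2",
    "http://example.com/item/3",
])
for row in rows:
    print(row)
```

Recording an error per URL instead of raising keeps one missing page from aborting the whole batch.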

Page 17: jdp15 import.io workshop

Sometimes we need to extract data from a lot of URLs we don’t know

import.io Crawler

Page 18: jdp15 import.io workshop

The import.io Crawler relies on minimum input and gives you maximum output
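When the URLs are not known in advance, a crawler discovers them by following links from a starting page. The sketch below shows that general technique (a breadth-first crawl) with the standard library only. It is not how the import.io Crawler is implemented; `FAKE_SITE`, `LinkCollector` and `crawl` are hypothetical names, and the in-memory site replaces real network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical site: URL -> page body.
FAKE_SITE = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.com/b": '<a href="/">home</a>',
    "http://example.com/c": "no links here",
}


def crawl(start_url, max_pages=10):
    """Breadth-first crawl: visit pages in the order they are discovered."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = FAKE_SITE.get(url)
        if html is None:
            continue  # unknown page, skip it
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


pages = crawl("http://example.com/")
print(pages)
```

The only input is a start URL and a page limit; the `seen` set prevents the crawl from looping when pages link back to each other.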

Page 19: jdp15 import.io workshop

Sometimes we need to interact with the website

Page 20: jdp15 import.io workshop

The import.io Connector performs page interactions, such as searches, and extracts the resulting data.
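A search interaction followed by extraction can be sketched in plain Python as below. This only illustrates the pattern and makes no claim about how the Connector records or replays interactions: the endpoint, the `q` parameter, `CATALOGUE` and the `fake_site` function are all hypothetical stand-ins for a real website.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

SEARCH_ENDPOINT = "http://example.com/search"  # hypothetical endpoint
CATALOGUE = ["Blue mug", "Red mug", "Blue plate"]  # hypothetical site data


def build_search_url(term):
    """Turn the user's search input into a query URL."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": term})


def fake_site(url):
    """Stand-in for the website: render a results page for the q parameter."""
    term = parse_qs(urlparse(url).query).get("q", [""])[0].lower()
    hits = [item for item in CATALOGUE if term in item.lower()]
    return "<ul>" + "".join(f"<li>{hit}</li>" for hit in hits) + "</ul>"


def search_and_extract(term):
    """Perform the search interaction, then extract the result list."""
    html = fake_site(build_search_url(term))
    return re.findall(r"<li>(.*?)</li>", html)


print(build_search_url("blue mug"))
print(search_and_extract("blue"))
```

The input (a search term) becomes a parameter of the query, so the same extraction can be reused for any term.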

Page 21: jdp15 import.io workshop

Example: analyzing newspapers with import.io and MonkeyLearn

Page 22: jdp15 import.io workshop

Example: analyzing newspapers with import.io and MonkeyLearn

https://github.com/ignacioelola/web-text-analyzer
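For readers who want a feel for the analysis step before opening the repository, here is a deliberately simple stand-in. It is not the code in that repository and it does not call MonkeyLearn: it swaps the text-analysis service for a plain word-frequency count, and the headlines, `STOPWORDS` and `top_terms` are hypothetical examples.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "on", "for"}

# Hypothetical rows, shaped like the output of a headline extractor.
HEADLINES = [
    {"headline": "Elections dominate the front page"},
    {"headline": "Markets react to the elections"},
    {"headline": "Rain expected on election weekend"},
]


def top_terms(rows, field="headline", n=3):
    """Count the most frequent non-stopword terms in one text field."""
    counts = Counter()
    for row in rows:
        words = re.findall(r"[a-z]+", row[field].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


print(top_terms(HEADLINES, n=1))
```

The point is the pipeline shape: extracted rows go in, and a per-term or per-document summary comes out, whichever analysis service fills the middle step.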

Page 23: jdp15 import.io workshop

Example: analyzing newspapers with import.io and MonkeyLearn

Page 24: jdp15 import.io workshop

Example: analyzing newspapers with import.io and MonkeyLearn

Page 25: jdp15 import.io workshop

Example: analyzing newspapers with import.io and MonkeyLearn

Page 26: jdp15 import.io workshop

Thanks!

Q & A