Scraping by examples

Alexandre Gomes Scraping by examples Friday, May 20, 2011

Uploaded by: alexandre-gomes

Posted on 02-Nov-2014

DESCRIPTION

Learn how to scrape web pages in Ruby and JavaScript (other languages coming soon).

TRANSCRIPT

Page 1: Scraping by examples

Alexandre Gomes

Scraping by examples

Friday, May 20, 2011

Page 2: Scraping by examples

http://creativecommons.org/licenses/by-nc/3.0/br/

Page 4: Scraping by examples

Summary of the 2010 Census (Resumo do Censo 2010)

Page 5: Scraping by examples

Summary of the 2010 Census (Resumo do Censo 2010)

Page 8: Scraping by examples

What is the relationship between literacy rates and the proportion of women?

Page 9: Scraping by examples

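Example

$$\frac{\text{women in the region}}{\text{total people in the region}} = \frac{7{,}859{,}539}{7{,}859{,}539 + 8{,}004{,}915} = 0.49\ldots$$

$$\frac{\text{literate people}^{*}\text{ in the region}}{\text{total people}^{*}\text{ in the region}} = \frac{11{,}326{,}492}{12{,}670{,}041} = 0.89\ldots$$

* aged 10 years or older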
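The arithmetic of this example can be reproduced in a few lines of Ruby. This is a minimal sketch: the helper name `proportion` is hypothetical, and the counts are the ones shown on the slide (women and the rest of the region's population, then literate people and all people aged 10 or older).

```ruby
# Hypothetical helper: share of `part` in `total`, as a Float.
def proportion(part, total)
  part.to_f / total
end

women    = 7_859_539
men      = 8_004_915   # the other term of the total population
literate = 11_326_492  # literate people aged 10 or older
people   = 12_670_041  # all people aged 10 or older

female_share  = proportion(women, women + men)
literacy_rate = proportion(literate, people)

puts format("women: %.4f, literate: %.4f", female_share, literacy_rate)
```

Converting to Float before dividing matters here: with two Integers, Ruby's `/` would truncate both ratios to 0.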

Page 10: Scraping by examples

And in the other regions?

Page 11: Scraping by examples

Scraping by Examples

Page 12: Scraping by examples

Nokogiri 鋸

Page 13: Scraping by examples

#1 Access the page that contains the desired data

Page 14: Scraping by examples

test

Page 15: Scraping by examples

test

code

Page 16: Scraping by examples

$ rspec spec/ibge_censo2010_spec.rb:8
Run filtered using {:line_number=>8}

IBGECenso2010
  should open page with "Razão de sexo, população de homens e mulheres"

Finished in 44.4 seconds
1 example, 0 failures
$
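The spec source itself was not captured in this transcript. As a minimal sketch of what step #1 checks, the page can be downloaded with Ruby's standard `Net::HTTP` and the HTML searched for the expected table title. The helper names `fetch_page` and `page_has_title?` are hypothetical, the census page URL is not given in the slides, and the sketch assumes the page is served as UTF-8 (an older page may use ISO-8859-1 and would then need transcoding).

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: downloads a page and returns its body.
# Net::HTTP may return the body as binary, so the encoding is set
# explicitly (assumed UTF-8) before any text comparison.
def fetch_page(url)
  Net::HTTP.get(URI(url)).force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: the condition the spec expresses, i.e. whether
# the downloaded HTML contains the expected table title.
def page_has_title?(html, title)
  html.include?(title)
end

TITLE = "Razão de sexo, população de homens e mulheres"
```

With a real URL the check reads `page_has_title?(fetch_page(url), TITLE)`. Keeping the download separate from the check lets the check itself be tested without network access, which may help with slow runs such as the 44-second one above.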

Page 17: Scraping by examples

#2 Retrieve the desired data

Page 18: Scraping by examples

First, understand the structure of the page

Page 19: Scraping by examples

<table>
  <thead>...</thead>
  <tfoot>
    <tr>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
  </tfoot>
  <tbody>...</tbody>
</table>

Study the path to the data in the DOM tree.

Page 20: Scraping by examples

Look for IDs and CSS classes that may be useful.

Page 22: Scraping by examples

class="td_numeros"
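The slides do this selection with Nokogiri, where `Nokogiri::HTML(html).css('.td_numeros')` returns every element carrying that class. To keep the example free of third-party gems, the sketch below swaps in Ruby's REXML library with an equivalent XPath query. REXML only accepts well-formed markup, so it suits this hand-written fragment but not a real, messy HTML page; the sample cells are hypothetical, reusing two numbers from the worked example.

```ruby
require 'rexml/document'

# Hypothetical, well-formed fragment shaped like the census table footer.
SAMPLE = <<~HTML
  <table>
    <tfoot>
      <tr>
        <td class="td_numeros">7.859.539</td>
        <td class="td_numeros">8.004.915</td>
        <td class="outra">ignored</td>
      </tr>
    </tfoot>
  </table>
HTML

# Returns the stripped text of every <td> whose class attribute is exactly
# "td_numeros" (unlike a CSS class selector, this misses multi-class cells).
def numeric_cells(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//td[@class='td_numeros']").map { |td| td.text.to_s.strip }
end

p numeric_cells(SAMPLE)
```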

Page 25: Scraping by examples

".td_numeros"

Page 26: Scraping by examples

".td_numeros"

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ]
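Each of the 18 cells holds a number written with Brazilian thousands separators (for example `7.859.539`), so the text has to be converted before any arithmetic. A minimal sketch, with a hypothetical helper name and illustrative cell texts:

```ruby
# Hypothetical helper: converts "7.859.539" (dots as thousands separators)
# into the Integer 7859539. Integer() raises ArgumentError on anything that
# is not a plain number, so a change in the page layout fails loudly instead
# of silently producing 0 as String#to_i would.
def parse_br_integer(text)
  Integer(text.strip.delete('.'), 10)
end

cells = ["7.859.539", "8.004.915"]  # illustrative cell texts
dados = cells.map { |cell| parse_br_integer(cell) }
p dados
```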

Page 27: Scraping by examples

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ]

1st piece of data we need (the numerator of the formula).

Page 28: Scraping by examples

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ]

2nd piece of data we need (used to compute the denominator of the formula).

Page 29: Scraping by examples

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ]

$$\frac{\text{women in region N}}{\text{total people in region N}} = \frac{\texttt{dados[5]}}{\texttt{dados[4]} + \texttt{dados[5]}}$$
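Translated into Ruby, the formula becomes a one-line calculation over the parsed array. A minimal sketch, assuming (as the formula implies) that `dados` holds the parsed integers in page order, with the number of women at index 5 and the other term of the total at index 4; the method name and the array contents are illustrative.

```ruby
# Hypothetical method: share of women in a region, following
# dados[5] / (dados[4] + dados[5]). Array#fetch raises IndexError if the
# page yielded fewer cells than expected.
def female_share(dados)
  women  = dados.fetch(5)
  others = dados.fetch(4)
  women.to_f / (others + women)
end

dados = Array.new(18, 0)  # 18 cells, indices 0..17
dados[4] = 8_004_915      # illustrative values from the worked example
dados[5] = 7_859_539
puts female_share(dados).round(4)
```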

Page 30: Scraping by examples

test

Page 31: Scraping by examples

code

Page 32: Scraping by examples

$ rspec spec

IBGECenso2010
  razao de sexo
    should open page with "Razão de sexo, população de homens e mulheres"
    should get number of women

Finished in 1.78 seconds
2 examples, 0 failures

Page 33: Scraping by examples

test

Page 34: Scraping by examples

code

Page 35: Scraping by examples

#3 Retrieve the rest of the desired data
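Once both ratios are available for each region, they still have to be turned into the whole percentages a chart expects. The values plotted later in the talk (49 and 89 for the first region) appear to be truncated rather than rounded, since 0.4954 would round to 50. A minimal sketch, in which the helper name and the hash layout are hypothetical and only region N's figures come from the slides:

```ruby
# Hypothetical helper: turns a ratio in [0, 1] into a whole percentage
# by truncation (floor), which matches the 49 plotted for the first region.
def to_percent(ratio)
  (ratio * 100).floor
end

# Hypothetical layout: region => [share of women, literacy rate (10 or older)].
# The other regions would be scraped the same way from their own cells.
ratios = {
  "N" => [7_859_539.0 / (7_859_539 + 8_004_915), 11_326_492.0 / 12_670_041]
}

percentages = ratios.transform_values { |pair| pair.map { |r| to_percent(r) } }
p percentages
```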

Page 37: Scraping by examples

...

Page 38: Scraping by examples

#4 Presenting the scraped data on the Web

Page 39: Scraping by examples

application.rb

(...)

Page 40: Scraping by examples

application.rb

(...)

Page 41: Scraping by examples

index.erb

(...)
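The contents of `application.rb` and `index.erb` were not captured. As a dependency-free sketch of the view side only, Ruby's standard ERB library can render such a template into an HTML table; the template text, the method name and the data layout are hypothetical, and the web framework serving the page (not named in the slides) is left out.

```ruby
require 'erb'

# Hypothetical template: one row per region with both percentages.
TABLE_TEMPLATE = <<~HTML
  <table>
    <tr><th>Region</th><th>Women (%)</th><th>Literate (%)</th></tr>
  <% percentages.each do |region, (women, literate)| %>
    <tr><td><%= ERB::Util.html_escape(region) %></td><td><%= women %></td><td><%= literate %></td></tr>
  <% end %>
  </table>
HTML

# Renders the template; ERB reads `percentages` from this method's binding.
def render_table(percentages)
  ERB.new(TABLE_TEMPLATE).result(binding)
end

puts render_table({ "N" => [49, 89] })
```

Escaping the region name with `ERB::Util.html_escape` keeps scraped text from being interpreted as markup when it is echoed back into the page.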

Page 42: Scraping by examples

http://datavisualization.ch/tools/13-javascript-libraries-for-visualizations

Page 43: Scraping by examples

The charm of mashups lies in distinctive data visualization

http://datavisualization.ch/tools/13-javascript-libraries-for-visualizations

Page 44: Scraping by examples

#5 A (still crude) visualization of the scraped data

Page 46: Scraping by examples

#6 A distinctive visualization of the information

Page 47: Scraping by examples

?

Page 48: Scraping by examples

Now, the same thing, using only JavaScript

Page 49: Scraping by examples

#1 Access the page that contains the desired data

Page 50: Scraping by examples

test

Page 51: Scraping by examples

code

Page 53: Scraping by examples

#2 Retrieve the desired data

Page 54: Scraping by examples

test

Page 55: Scraping by examples

code

Page 56: Scraping by examples

#3 Retrieve the rest of the desired data

Page 57: Scraping by examples

...

Page 58: Scraping by examples

#4 Presenting the scraped data on the Web

Page 59: Scraping by examples

index.html

Page 60: Scraping by examples

index.html

Page 61: Scraping by examples

index.html

Page 62: Scraping by examples

index.html

Page 63: Scraping by examples

index.html

(...)

Page 64: Scraping by examples

index.html

(...)

Page 65: Scraping by examples

index.html

(...)

Page 66: Scraping by examples

index.html

(...)

Page 67: Scraping by examples

http://chart.apis.google.com/chart?chxt=y&chbh=a&chs=500x300&cht=bvg&chco=A2C180,3D7930&chd=t:49,51,51,50,50|89,82,94,95,93&chdl=Women|Literates&chp=0.033
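The chart above is a Google Image Charts URL in which every setting is a query parameter: `cht=bvg` asks for grouped vertical bars, `chs` sets the size in pixels, `chco` the series colors, `chdl` the legend labels, and `chd=t:` carries the data in text encoding, with `|` separating the two series. A minimal Ruby sketch that assembles the same URL from the two percentage lists (the method name is hypothetical; the remaining parameters are copied verbatim from the slide):

```ruby
# Hypothetical helper: rebuilds the chart URL from the slide, taking the
# two data series (women, literates) as arrays of whole percentages.
def chart_url(women, literates)
  params = [
    "chxt=y",
    "chbh=a",
    "chs=500x300",
    "cht=bvg",
    "chco=A2C180,3D7930",
    "chd=t:#{women.join(',')}|#{literates.join(',')}",
    "chdl=Women|Literates",
    "chp=0.033"
  ]
  "http://chart.apis.google.com/chart?" + params.join("&")
end

puts chart_url([49, 51, 51, 50, 50], [89, 82, 94, 95, 93])
```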

Page 68: Scraping by examples

Code available at...

Page 69: Scraping by examples

Q&A

Page 70: Scraping by examples

http://tinyurl.com/AvaliacaoSOO14
