Java and Data Science in Digital Advertising
Fabiane Bizinella Nardon (@fabianenardon)
Chief Data Scientist at TailTarget

“The best minds of my generation are thinking about how to make people click ads. That sucks.”
Jeff Hammerbacher

Modern Digital Advertising
Lots of Data +
Metrics (More or Less Easy to Measure) +
Speed

Digital Advertising
Advertiser
Direct Orders

Modern Digital Advertising
Advertiser
Ad Network
Ad Exchange, DSP, SSP
Programmatic Media

Real Time Bidding
Ad Exchange
DSP DSP DSP
Less than 300 ms!
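
With a 300 ms budget, a bidder is better off giving up on slow work than answering late. A minimal sketch of enforcing such a deadline with `CompletableFuture`, assuming a hypothetical bid computation passed in as a `Supplier` and a "no bid" fallback of 0.0 (neither name nor value comes from the talk):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: none of these names come from the talk.
final class RtbDeadline {
    static final double NO_BID = 0.0;

    // Runs the bid computation asynchronously and falls back to NO_BID
    // if it has not finished within the given budget.
    static double bidWithin(Supplier<Double> computeBid, long budgetMillis) {
        return CompletableFuture
                .supplyAsync(computeBid)
                .completeOnTimeout(NO_BID, budgetMillis, TimeUnit.MILLISECONDS)
                .join();
    }
}
```

`completeOnTimeout` requires Java 9 or later. The slow computation is not cancelled here; it simply stops mattering once the deadline passes.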

Scalability
Data Science

Scalable Architectures
Design for at least 3 instances
Assume that something will fail
Isolate your services

Architecting for Scalability
SHARED NOTHING

Shared Nothing
App Server 1 App Server 2 App Server 3
Cache 1 Cache 2 Cache 3
DB 1 DB 2 DB 3
Cache 4
- No server-side sessions
- No sticky sessions

Shared Nothing
Play Framework Play Framework Play Framework
Redis Shard 1/Replica 1
Redis Shard 1/Replica 2
Redis Shard 2/Replica 1
Redis Shard 2/Replica 2
MongoDB Master
MongoDB Slave 1
MongoDB Slave 2
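
With stateless app servers, any server can handle any request as long as it can locate the right data shard from the key alone. A minimal sketch of that routing, assuming simple hash-modulo placement (the talk does not say how keys were mapped to Redis shards); `ShardRouter` and `shardFor` are hypothetical names:

```java
// Hypothetical sketch: hash-modulo placement is an assumption, not the talk's method.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    // Maps a key to a shard index in [0, shardCount); floorMod keeps the
    // result non-negative even when hashCode() is negative.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }
}
```

Hash-modulo placement remaps most keys when the shard count changes; consistent hashing is the usual alternative when shards are added often.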

Scaling
Use the power of the cloud
but use your DATA!

Traffic Prediction: Why?
It can take 10 to 20 minutes to bring a new machine online. Can you afford to wait that long?
Avoid false traffic drops

Traffic Prediction
[Chart: requests over time, x-axis 1 to 61, y-axis roughly 700 to 1,600 requests. Successive slides add a linear trend line, "Linear (Requests)", and then a "Predicted Traffic" series.]
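
Because a new machine can take 10 to 20 minutes to come online, the capacity decision has to be made on a forecast rather than on current load. A minimal sketch of a linear trend forecast, assuming equally spaced request counts and ordinary least squares as a stand-in for the chart's "Linear (Requests)" line; the talk does not say which model produced the "Predicted Traffic" series, and `TrafficForecast` is a hypothetical name:

```java
// Hypothetical sketch: plain least squares over equally spaced samples.
final class TrafficForecast {
    // Fits y = slope * x + intercept by ordinary least squares,
    // with x = 0, 1, 2, ... as the sample index.
    static double[] fitLine(double[] requests) {
        int n = requests.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int x = 0; x < n; x++) {
            sumX += x;
            sumY += requests[x];
            sumXY += x * requests[x];
            sumXX += (double) x * x;
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / n;
        return new double[] {slope, intercept};
    }

    // Extrapolates the fitted line stepsAhead samples past the last one.
    static double predict(double[] requests, int stepsAhead) {
        double[] line = fitLine(requests);
        return line[0] * (requests.length - 1 + stepsAhead) + line[1];
    }
}
```

If samples are taken once a minute, `predict(requests, 20)` estimates the load at the point where a machine started now would be ready.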

Ad Exchange
DSP DSP DSP

Data Science
BIG DATA +
STATISTICS +
ARTIFICIAL INTELLIGENCE

Data Science and Java

Detecting Behaviors
Ad Exchange
DSP DSP DSP

Detecting Behaviors
[Chart: volume per interest category (POLITICS, SOCCER, JOBS, FINANCE) from 4-Jan to 26-Apr, y-axis 0 to 12,000,000, with an annotation labeled "Taxes".]

Detecting Behaviors
Curly Hair
Straight Hair

Detecting Behaviors
Brazilian Java Developers
http://www.tailtarget.com/blog/getting-to-know-the-java-developer-using-data-science/

Does behavioral data work?

Detecting Behavior
Classifiers
Recommendation Systems
Clustering
Deep Learning

Detecting Behavior
SOCCER
User Profile
uid=123 : [SOCCER: 1 view, MALE: 1 view]

Detecting Behavior
SOCCER
User Profile
uid=123 : [SOCCER: 10 views, MALE: 20 views]
uid=123 : [SOCCER, MALE]
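
The profile on these slides is a per-user counter of views per segment, which is later reduced to a plain list of segments. A minimal sketch of that idea, assuming a simple minimum-views threshold decides when a segment sticks to the user (the slides do not state the actual rule); `UserProfile` and its methods are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the threshold rule is an assumption.
final class UserProfile {
    private final Map<String, Integer> views = new HashMap<>();

    // Records one page view for every segment the page was classified into.
    void recordView(String... segments) {
        for (String segment : segments) {
            views.merge(segment, 1, Integer::sum);
        }
    }

    // Returns the segments with at least minViews views, in sorted order.
    Set<String> segments(int minViews) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : views.entrySet()) {
            if (entry.getValue() >= minViews) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Requiring several views before assigning a segment keeps a single stray page view from defining the user.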

Implementing a Classifier
Supervised Learning

1. DATA PREPARATION

| Segment | Country | Seed |
|---|---|---|
| SOCCER | BRAZIL | http://esportes.terra.com.br/futebol/ ; http://placar.abril.com.br/materia/ ; http://esporte.uol.com.br/futebol/ |
| FASHION | BRAZIL | http://www.fhits.com.br ; http://www.dolcemoda.com.br |
| POLITICS | BRAZIL | http://eleicoes.uol.com.br ; http://noticias.terra.com.br/brasil/politica/ ; http://ultimosegundo.ig.com.br/politica/ |
| FAMILY | BRAZIL | http://guiadobebe.uol.com.br/ ; http://mdemulher.abril.com.br/familia/ |

1. DATA PREPARATION (CRAWLER)

| Segment | Country | Pages (3,000) |
|---|---|---|
| SOCCER | BRAZIL | 1. http://esportes.terra.com.br/futebol/internacional/mourinho-diz-que-rafa-benitez-destruiu-a-inter-de-milao-em-2010,3652d311ab33c95fb07b01f6ee27168ew7uvRCRD.html 2. http://esportes.terra.com.br/futebol/brasileiro-serie-a/pato-que-voltar-a-selecao-e-lamenta-nao-enfrentar-corinthians,ac2efed99a4ed4ed83198389e01961f7sih9RCRD.html |
| FASHION | BRAZIL | (…) |
| POLITICS | BRAZIL | (…) |
| FAMILY | BRAZIL | (…) |

2. SUPERVISED LEARNING
The labeled pages (3,000) for the segments SOCCER, FASHION, POLITICS, and FAMILY (country: BRAZIL) are split into:
80% TRAINING
20% TEST
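
The 80% training / 20% test split can be sketched as a shuffle followed by a cut. This is a minimal illustration, assuming a plain random split with a fixed seed for reproducibility; the talk does not say whether the split was stratified by segment, and `TrainTestSplit` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: an unstratified random split.
final class TrainTestSplit {
    // Shuffles a copy of the data and returns [training, test],
    // where training holds trainFraction of the items.
    static <T> List<List<T>> split(List<T> data, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<T> train = new ArrayList<>(shuffled.subList(0, cut));
        List<T> test = new ArrayList<>(shuffled.subList(cut, shuffled.size()));
        return List.of(train, test);
    }
}
```

Shuffling first matters because crawled pages usually arrive grouped by site or segment; cutting the raw list would put whole segments into only one of the two sets.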

3. FINDING THE MODEL
MODEL A (TRAINING, TEST): 70%
MODEL B (TRAINING, TEST): 85%
MODEL C (TRAINING, TEST): 98%

4. TESTING THE MODEL
CONFUSION MATRIX

| | SOCCER | FASHION | POLITICS | FAMILY |
|---|---|---|---|---|
| SOCCER | 997 | 0 | 1 | 2 |
| FASHION | 1 | 999 | 0 | 0 |
| POLITICS | 3 | 1 | 995 | 1 |
| FAMILY | 2 | 1 | 0 | 997 |
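
A confusion matrix is usually summarized as overall accuracy plus per-class rates. A minimal sketch of those calculations; the slide does not say whether rows are actual classes and columns are predictions, so the method names stay neutral, and `ConfusionMatrixStats` is a hypothetical name:

```java
// Hypothetical sketch: summary statistics for a square confusion matrix.
final class ConfusionMatrixStats {
    // Fraction of all examples that lie on the diagonal.
    static double accuracy(int[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    // Diagonal cell divided by its row sum: recall if rows are actual classes.
    static double rowRate(int[][] m, int cls) {
        long rowSum = 0;
        for (int v : m[cls]) {
            rowSum += v;
        }
        return (double) m[cls][cls] / rowSum;
    }

    // Diagonal cell divided by its column sum: precision if columns are predictions.
    static double columnRate(int[][] m, int cls) {
        long colSum = 0;
        for (int[] row : m) {
            colSum += row[cls];
        }
        return (double) m[cls][cls] / colSum;
    }
}
```

For the matrix on the slide, each row sums to 1,000 and the diagonal sums to 3,988, so `accuracy` comes out at 3,988 / 4,000 = 0.997.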

5. MODEL IN ACTION
DATA SCIENCE
Algorithm
Page -> MODEL C -> TOPIC: TV
RETRAINING

APPLYING THE MODEL
DATA SCIENCE
TAILTARGET
Language detection (com.cybozu.labs.langdetect): Portuguese
MODEL: SOCCER

COUNTRY DETECTION
Colombia: 8,999
Argentina: 100
Mexico: 87
Venezuela: 50
USA: 45

Often you don't need to process all of your data

Users by Interest (ALL)
SOCCER
JOBS
FINANCE
POLITICS

Users by Interest (1%)
SOCCER
JOBS
FINANCE
POLITICS

Users by Interest: ALL vs. 1%

How to select a good sample
The size has to be representative
The distribution has to be homogeneous

Sample Size
LARGER SAMPLE = HIGHER ACCURACY

Sample Size
Find the minimum sample size for your problem.
If the database has fewer items than the minimum, use all the data.

Sample Distribution (e.g. Redis)
SHARD 1 SHARD 2 SHARD 3
RANDOMKEY RANDOMKEY RANDOMKEY
ITEMS PER SHARD = SAMPLE SIZE / NUMBER OF SHARDS

Ta = sample size
Tt = total size
Na = number of items (of interest) in the sample
Nt = number of items (of interest) in the total
Nt = Na * Tt / Ta
Example:
Ta = 1,000
Tt = 100,000
Na = 400 women
Nt = 400 * 100,000 / 1,000 = 40,000
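
The two formulas on this slide can be sketched as small helpers. This is only an illustration of the arithmetic and makes no Redis calls; rounding the per-shard quota up is an assumption (the slide just divides), and `SampleMath` is a hypothetical name:

```java
// Hypothetical sketch of the slide's arithmetic.
final class SampleMath {
    // ITEMS PER SHARD = SAMPLE SIZE / NUMBER OF SHARDS, rounded up so the
    // combined sample is never smaller than requested.
    static int itemsPerShard(int sampleSize, int shardCount) {
        return (sampleSize + shardCount - 1) / shardCount;
    }

    // Nt = Na * Tt / Ta: scales a count observed in the sample to the total.
    static long extrapolate(long countInSample, long totalSize, long sampleSize) {
        return Math.round((double) countInSample * totalSize / sampleSize);
    }
}
```

With the slide's numbers, `extrapolate(400, 100_000, 1_000)` returns 40,000.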

And when you don't know how many items you have?
Reservoir Sampling
Reservoir slots: 1 2 3 4 5
Reservoir contents: A B C D E
Next item: F
Random (0..1): 0.7
K = Ss / i (Ss = sample size, i = position of the current item)
K = 5 / 6 = 0.83
If K > Random => SWAP!
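
The rule on this slide (keep item i with probability Ss / i) is the core of single-pass reservoir sampling. A minimal sketch, assuming the replaced slot is picked uniformly at random, which the slide does not show; `ReservoirSampler` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of single-pass reservoir sampling.
final class ReservoirSampler {
    // Returns up to sampleSize items chosen uniformly from a stream of unknown length.
    static <T> List<T> sample(Iterator<T> stream, int sampleSize, Random random) {
        List<T> reservoir = new ArrayList<>(sampleSize);
        long seen = 0; // the slide's i: position of the current item
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < sampleSize) {
                reservoir.add(item); // fill the reservoir first
            } else if (random.nextDouble() < (double) sampleSize / seen) {
                // K = Ss / i exceeded the random draw: replace a random slot
                reservoir.set(random.nextInt(sampleSize), item);
            }
        }
        return reservoir;
    }
}
```

The stream is read once and only `sampleSize` items are held in memory, so the total count never has to be known in advance.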

Distributed Reservoir Sampling
1 2 3 4 5
A B C D E F G H I J K L M N O P Q R S T U V X Y W Z

Distributed Reservoir Sampling
1 2 3 4 5
A:0.1 B:0.3 C:0.2 D:0.7 E:0.9 F:0.11 G:0.4 H:0.6 I:0.76
J:0.8 K:0.2 L:0.54 M:0.4 N:0.21 O:0.33 P:0.56 Q:0.32 R:0.23
S:0.21 T:0.32 U:0.22 V:0.7 X:0.12 Y:0.23 W:0.3 Z:0.76

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Keeps the SAMPLE_SIZE items with the highest random scores seen so far.
private final SortedMap<Double, MyObject> reservoir = new TreeMap<>();
...
if (reservoir.size() < SAMPLE_SIZE) {
    reservoir.put(score, myObject);
} else if (score > reservoir.firstKey()) {
    // Evict the lowest-scoring item to make room for the new one.
    reservoir.remove(reservoir.firstKey());
    reservoir.put(score, myObject);
}
```

Distributed Reservoir Sampling
1 2 3 4 5
A:0.1 B:0.3 C:0.2 D:0.7 E:0.9 F:0.11 G:0.4 H:0.6 I:0.76
J:0.8 K:0.2 L:0.54 M:0.4 N:0.21 O:0.33 P:0.56 Q:0.32 R:0.23
S:0.21 T:0.32 U:0.22 V:0.7 X:0.12 Y:0.23 W:0.3 Z:0.76
H:0.6 D:0.7 E:0.9 F:0.11 I:0.76
R:0.23 Q:0.32 O:0.33 L:0.54 P:0.56
S:0.21 U:0.22 Y:0.23 T:0.32 Z:0.76
COMBINER
O L P I Z
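
The combiner step can be sketched as a merge of the per-partition candidates that again keeps only the highest scores. This is a minimal illustration, assuming each partition sends (item, score) pairs and that higher scores win, in line with the `score > reservoir.firstKey()` test in the snippet from the talk; `ReservoirCombiner` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of the merge step for distributed reservoir sampling.
final class ReservoirCombiner {
    // Merges per-partition candidates (item -> random score) and keeps the
    // sampleSize items with the highest scores overall.
    static List<String> combine(List<Map<String, Double>> partials, int sampleSize) {
        // Min-heap on score: the head is always the weakest kept candidate.
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map<String, Double> partial : partials) {
            for (Map.Entry<String, Double> candidate : partial.entrySet()) {
                heap.add(candidate);
                if (heap.size() > sampleSize) {
                    heap.poll(); // drop the lowest score
                }
            }
        }
        List<String> sample = new ArrayList<>();
        for (Map.Entry<String, Double> kept : heap) {
            sample.add(kept.getKey());
        }
        return sample;
    }
}
```

Because every item's score is an independent random draw, taking the overall top scores gives the same result no matter how the items were partitioned. The returned list is in no particular order.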

Distributed Reservoir Sampling
Apache Crunch:

```java
import org.apache.crunch.lib.Sample;

// Method signature, as shown on the slide:
// Sample.reservoirSample(PCollection<T> input, int sampleSize)
```

Why Java?
Machine learning libraries
NoSQL drivers
Performance
Connectors for distributed frameworks