uerj201212
TRANSCRIPT
marvin@goldenheart ~ $ ssh root@deepthought
****WELCOME TO 1 OF YOUR 38,157,987 SERVERS. TRY THE VEAL. IT'S THE BEST IN THIS FARM.****
root@deepthought ~ $ tail -f /var/log.txt
HOW DO WE ACCESS THE LOGS?
RFC 3164: SYSLOG
<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8
<priority = facility*8+severity><date/time><host><process><message>
KEY: VALUE
<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8
message
facility AUTH
severity CRITICAL
host mymachine
process su
date 20121011
time 221415
text su, root, failed, for, lonvick, on, /dev/pts/8
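The key/value breakdown above can be sketched as a small parser. This is only an illustration: the class name `SyslogParser`, the regular expression, and the short facility names are assumptions, not something the slides show. RFC 3164 timestamps carry no year, so the timestamp is kept as raw text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits an RFC 3164 line into key/value fields.
class SyslogParser {
    // Conventional short names, indexed by numeric code (priority = facility*8 + severity).
    static final String[] FACILITIES = {
        "KERN", "USER", "MAIL", "DAEMON", "AUTH", "SYSLOG", "LPR", "NEWS",
        "UUCP", "CRON", "AUTHPRIV", "FTP", "NTP", "AUDIT", "ALERT", "CLOCK",
        "LOCAL0", "LOCAL1", "LOCAL2", "LOCAL3", "LOCAL4", "LOCAL5", "LOCAL6", "LOCAL7"};
    static final String[] SEVERITIES = {
        "EMERGENCY", "ALERT", "CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG"};

    // <PRI>Mmm dd hh:mm:ss host process[pid]: message
    static final Pattern LINE = Pattern.compile(
        "^<(\\d{1,3})>(\\w{3} [ \\d]\\d \\d{2}:\\d{2}:\\d{2}) (\\S+) ([^:\\[\\s]+)(?:\\[\\d+\\])?: (.*)$");

    static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an RFC 3164 line: " + line);
        }
        int pri = Integer.parseInt(m.group(1));
        if (pri > 191) {
            throw new IllegalArgumentException("priority out of range: " + pri);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("facility", FACILITIES[pri / 8]);
        fields.put("severity", SEVERITIES[pri % 8]);
        fields.put("timestamp", m.group(2));
        fields.put("host", m.group(3));
        fields.put("process", m.group(4));
        fields.put("text", m.group(5));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"));
    }
}
```

For priority 34 this yields facility 4 (AUTH) and severity 2 (CRITICAL), matching the breakdown above.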
<TERM, DOC*>
ABACATE ➜ 12
ABACAXI ➜ 1, 3, 9
BANANA ➜ 2, 3, 42
CAJU ➜ 3, 11, 42, 50
DAMASCO ➜ 17, 18, 19
...
(cajá~ || bana*) -damasco
Records that contain something similar to cajá (perhaps caju) or something that starts with
bana, but that definitely do not contain
damasco.
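A query like this can be evaluated against a <TERM, DOC*> index roughly as follows. This is a minimal sketch, not the engine's implementation: the class `InvertedIndex`, its method names, and the choice of Levenshtein distance 1 for the `~` operator are all assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory <TERM, DOC*> index with prefix, fuzzy and exact lookups.
class InvertedIndex {
    private final TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();

    void add(String term, int... docs) {
        TreeSet<Integer> set = postings.computeIfAbsent(term, t -> new TreeSet<>());
        for (int d : docs) set.add(d);
    }

    // bana* : documents of every term that starts with the prefix.
    Set<Integer> prefix(String prefix) {
        Set<Integer> out = new TreeSet<>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // sorted keys: no more matches
            out.addAll(e.getValue());
        }
        return out;
    }

    // cajá~ : documents of every term within maxEdits edits of the given term.
    Set<Integer> fuzzy(String term, int maxEdits) {
        Set<Integer> out = new TreeSet<>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            if (editDistance(term, e.getKey()) <= maxEdits) out.addAll(e.getValue());
        }
        return out;
    }

    Set<Integer> exact(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    // Classic Levenshtein distance with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add("abacate", 12);
        idx.add("abacaxi", 1, 3, 9);
        idx.add("banana", 2, 3, 42);
        idx.add("caju", 3, 11, 42, 50);
        idx.add("damasco", 17, 18, 19);

        // (cajá~ || bana*) -damasco
        Set<Integer> hits = new TreeSet<>(idx.fuzzy("caj\u00e1", 1));
        hits.addAll(idx.prefix("bana"));
        hits.removeAll(idx.exact("damasco"));
        System.out.println(hits); // [2, 3, 11, 42, 50]
    }
}
```

The fuzzy lookup scans every term, which is fine for a sketch; a real index would need something smarter than a linear scan.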
QUERIES
SORTING SELECTION
How can we find the 1,000 smallest integers out of 100M+ in a reasonable time?
introSelect(data, 1000) ➜ 300ms
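public void select(int[] data, int begin, int end, int m) {
    // Nothing to do for an empty range or a non-positive m.
    if (begin >= end || m <= 0) return;
    m = Math.min(m, end - begin);
    for (int depth = 0; depth < MAX_DEPTH; depth++) {
        // Partition around the middle element; pivot is its final index.
        int pivot = partition(data, begin, end, begin + (end - begin) / 2);
        int d = pivot - begin + 1; // number of elements in [begin, pivot]
        if (d == m) {
            return; // the first m elements are the m smallest
        } else if (m < d) {
            end = pivot; // keep looking on the left side
        } else {
            m -= d; // the left side is settled
            begin = pivot + 1;
        }
    }
    // fall back to heap-based selection when too deep
    heap.select(data, begin, end, m);
}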
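The slide calls `partition` without showing it. One way it could look is a Lomuto partition over the half-open range [begin, end), sketched below. The class name `QuickSelectSketch` and the partition scheme are assumptions. The `select` here repeats the loop from the slide without the depth limit and the heap fallback, so it is plain quickselect rather than introselect.

```java
import java.util.Arrays;

// Hypothetical stand-in for the helpers the slide does not show.
class QuickSelectSketch {
    // Lomuto partition over data[begin, end): moves data[pivotIndex] to its
    // final sorted position and returns that position.
    static int partition(int[] data, int begin, int end, int pivotIndex) {
        int pivotValue = data[pivotIndex];
        swap(data, pivotIndex, end - 1); // park the pivot at the end
        int store = begin;
        for (int i = begin; i < end - 1; i++) {
            if (data[i] < pivotValue) swap(data, i, store++);
        }
        swap(data, store, end - 1); // pivot into its final position
        return store;
    }

    // After the call, the first m slots of the original range hold its m
    // smallest values, in no particular order.
    static void select(int[] data, int begin, int end, int m) {
        if (begin >= end || m <= 0) return;
        m = Math.min(m, end - begin);
        while (true) {
            int pivot = partition(data, begin, end, begin + (end - begin) / 2);
            int d = pivot - begin + 1;
            if (d == m) return;
            if (m < d) {
                end = pivot;
            } else {
                m -= d;
                begin = pivot + 1;
            }
        }
    }

    private static void swap(int[] data, int i, int j) {
        int t = data[i];
        data[i] = data[j];
        data[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {9, 1, 8, 2, 7, 3, 6, 4, 5, 0};
        select(data, 0, data.length, 3);
        int[] smallest = Arrays.copyOf(data, 3);
        Arrays.sort(smallest);
        System.out.println(Arrays.toString(smallest)); // [0, 1, 2]
    }
}
```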
<TERM, <DOC, POSITION*>*>
ABACATE ➜ <12, 1, 2>
ABACAXI ➜ <1, 1>, <3, 2>, <9, 1>
BANANA ➜ <2, 1, 5>, <3, 2>, <10, 1>
...
[Diagram: INDEX (search, ids+positions); STORAGE (ids+positions, messages); Thread #1, Thread #2, ..., Thread #n]
SEARCHING THE HISTORY
http => avg(cputime#) - avg(systemtime#)
<aggregationQuery>
  <search>
  <subtraction>
    <aggregation>
      <parseNumber>
        <property>
    <aggregation>
      <parseNumber>
        <property>
type:http status:404
=> :count
=> :avglast(5) c5, :avglast(60) c60
=> c5 > 1.25*c60 as alarm
=> alarm != alarm:prev as changed
=> alarm by host every minute if changed
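One possible reading of this pipeline, per host: count the matches each minute, compare the average of the last 5 counts with the average of the last 60, and emit only when the alarm state flips. The sketch below follows that reading. The class `SpikeAlarm`, the method `onMinute`, and the interpretation of `avglast(n)` as the mean of the last n per-minute counts are assumptions about the query language, not something the slides define.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// Hypothetical per-host evaluator; one instance would be kept per host.
class SpikeAlarm {
    private final Deque<Long> counts = new ArrayDeque<>(); // oldest first, at most 60
    private boolean prevAlarm = false;

    // Feed the :count for one minute. Returns the new alarm state only
    // when it differs from the previous minute ("if changed").
    Optional<Boolean> onMinute(long count) {
        counts.addLast(count);
        if (counts.size() > 60) counts.removeFirst();
        double c5 = avgLast(5);
        double c60 = avgLast(60);
        boolean alarm = c5 > 1.25 * c60;       // c5 > 1.25*c60 as alarm
        boolean changed = alarm != prevAlarm;  // alarm != alarm:prev as changed
        prevAlarm = alarm;
        return changed ? Optional.of(alarm) : Optional.empty();
    }

    // Mean of the most recent n counts (or of all counts if fewer exist).
    private double avgLast(int n) {
        int skip = Math.max(0, counts.size() - n);
        long sum = 0;
        int i = 0;
        for (long c : counts) {
            if (i++ >= skip) sum += c;
        }
        return (double) sum / (counts.size() - skip);
    }

    public static void main(String[] args) {
        SpikeAlarm alarm = new SpikeAlarm();
        for (int i = 0; i < 60; i++) alarm.onMinute(10); // steady traffic
        System.out.println(alarm.onMinute(100));         // Optional[true]
    }
}
```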
DISTINCT COUNT
CARDINALITY
http => dcount(session_id) every hour
Using a HashSet<T> would mean keeping every session_id in memory.
HYPERLOGLOG SKETCH
hash(input)
observe bit patterns
accumulate estimators
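The three steps above can be sketched in a few lines. This is a simplified HyperLogLog, not the engine's code: it spends one byte per register instead of packing bits, and the hash (FNV-1a followed by the MurmurHash3 64-bit finalizer) is an assumption chosen only to stay within the standard library.

```java
import java.nio.charset.StandardCharsets;

// Simplified HyperLogLog: hash the input, look at the bit pattern, keep a maximum per register.
class HyperLogLog {
    private final int p;            // number of index bits
    private final int m;            // number of registers, m = 2^p
    private final byte[] registers; // one byte per register, for simplicity

    HyperLogLog(int p) {
        if (p < 7 || p > 18) throw new IllegalArgumentException("p must be in [7, 18]");
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    void add(String value) {
        long h = hash64(value);
        int idx = (int) (h >>> (64 - p)); // the first p bits pick a register
        long rest = h << p;               // the remaining bits carry the pattern
        // rank = position of the first 1 bit in the remaining bits
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    double estimate() {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / m); // bias constant, valid for m >= 128
        double e = alpha * m * (double) m / sum;
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros); // small-range correction (linear counting)
        }
        return e;
    }

    static long hash64(String s) {
        long h = 0xcbf29ce484222325L; // FNV-1a offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;      // FNV-1a prime
        }
        h ^= h >>> 33;                // MurmurHash3 fmix64 finalizer
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    public static void main(String[] args) {
        HyperLogLog hll = new HyperLogLog(14);
        for (int i = 0; i < 100_000; i++) hll.add("session-" + i);
        System.out.println(Math.round(hll.estimate())); // close to 100000
    }
}
```

Memory use depends only on the number of registers, not on how many distinct values are added, and adding the same value twice never changes the registers.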
"m" estimators of 5 bits
Std error: 104% / sqrt(m)
m = 2^16 ➜ 40 KiB ➜ ~0.41%
MULTICAST
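// JGroups: join the cluster and print every message received from any member.
JChannel channel = new JChannel();
channel.setReceiver(new ReceiverAdapter() {
    public void receive(Message msg) {
        System.out.println(msg.getSrc() + ": " + msg.getObject());
    }
});
channel.connect("meuCanalDeChat");

// Send each line typed on stdin to all members (null destination); stop at end of input.
BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = reader.readLine()) != null) {
    channel.send(null, line);
}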
SEARCH
last 10 "http_status:404" before {id:84324814}
[Diagram: user; three engines, each returning 10 results; mergesort, take 10; 10 results back to the user]
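The merge step can be sketched as follows, assuming each engine returns its matches as ids sorted newest first. The slide says "mergesort, take 10"; this sketch uses a heap-based k-way merge instead, which is the merge phase of a merge sort generalized to several inputs. The class `TopKMerge` and the use of plain ids as results are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical coordinator step: combine per-engine top-k lists into a global top-k.
class TopKMerge {
    // Each inner list must already be sorted by id, largest (newest) first.
    static List<Long> mergeTopK(List<List<Long>> perEngine, int k) {
        // Heap entries are {engine index, offset in that engine's list},
        // ordered so that the entry pointing at the largest id comes out first.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Long.compare(perEngine.get(b[0]).get(b[1]),
                                   perEngine.get(a[0]).get(a[1])));
        for (int e = 0; e < perEngine.size(); e++) {
            if (!perEngine.get(e).isEmpty()) heap.add(new int[] {e, 0});
        }
        List<Long> out = new ArrayList<>();
        while (out.size() < k && !heap.isEmpty()) {
            int[] top = heap.poll();
            List<Long> list = perEngine.get(top[0]);
            out.add(list.get(top[1]));
            // Advance within the engine the result came from.
            if (top[1] + 1 < list.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Long>> perEngine = Arrays.asList(
            Arrays.asList(9L, 5L, 1L),
            Arrays.asList(8L, 7L, 2L),
            Arrays.asList(6L, 4L, 3L));
        System.out.println(mergeTopK(perEngine, 4)); // [9, 8, 7, 6]
    }
}
```

Asking every engine for 10 results is enough because no engine can contribute more than 10 of the final 10.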
AGGREGATION
engine engine engine
(sum(time) + sum(time) + sum(time)) / (count(time) + count(time) + count(time))
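The point of the formula is that an average cannot be merged from per-engine averages, but it can be rebuilt from per-engine sums and counts. A minimal sketch, with `PartialAvg` as a hypothetical name for what each engine would send back:

```java
// Hypothetical partial aggregate: each engine returns a (sum, count) pair.
class PartialAvg {
    final double sum;
    final long count;

    PartialAvg(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Built locally on one engine from its own matching values.
    static PartialAvg of(double... values) {
        double s = 0;
        for (double v : values) s += v;
        return new PartialAvg(s, values.length);
    }

    // Merging is associative and commutative, so engines can be combined in any order.
    PartialAvg merge(PartialAvg other) {
        return new PartialAvg(sum + other.sum, count + other.count);
    }

    double value() {
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        PartialAvg total = PartialAvg.of(1, 2, 3)
            .merge(PartialAvg.of(10))
            .merge(PartialAvg.of(4, 6));
        // Sum 26 over 6 values; averaging the three per-engine averages
        // (2, 10 and 5) would give a different, wrong answer.
        System.out.println(total.value());
    }
}
```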