Web crawl with Elixir
Post on 05-Apr-2017
Who am I?
• Jechol Lee ([email protected])
• Software engineer at Skelterlabs
• Loves Elixir, Elm, Ruby
• We are HIRING!
[Diagram labels: Site A; SQL; ES; SaveQueue; Page Task; Item Task; product; req; progress; Sup; Sup(site A); Sup(site B); Sup(simple_one_for_one); Task.Sup; start_link]
Network Requests Overflow
Can't depend on random processing order.
[Diagram labels: page and item requests interleaved in no fixed order: Page 1 item, Page 2 item, Page 4, Page 3 item, Page 1 item, Page 2 item, Page 3 item]
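The later slides label the WebProxy "PriorityQueue+TokenBucket", which suggests making the processing order explicit instead of relying on arrival order. A minimal sketch of a priority queue follows, assuming (this is not stated on the slide) that item requests should be served before page requests. The module name `PriorityQueueSketch` and the priority numbers are hypothetical.

```elixir
defmodule PriorityQueueSketch do
  # Hypothetical helper, not the speaker's code.
  # Lower number = higher priority; `seq` keeps FIFO order within one priority.

  def new, do: {:gb_sets.new(), 0}

  # Store {priority, seq, item}; Erlang term ordering sorts by priority, then seq.
  def push({set, seq}, priority, item) do
    {:gb_sets.add({priority, seq, item}, set), seq + 1}
  end

  # Return the highest-priority item, or :empty when nothing is queued.
  def pop({set, seq}) do
    if :gb_sets.is_empty(set) do
      :empty
    else
      {{_priority, _seq, item}, rest} = :gb_sets.take_smallest(set)
      {:ok, item, {rest, seq}}
    end
  end
end
```

With this sketch, pushing a page request at priority 2 and two item requests at priority 1 pops both items first, in the order they were pushed, and the page last.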
Rate limit by TokenBucket
[Diagram labels: twotap.com; c21stores.com; HTTP; WebProxy (PriorityQueue + TokenBucket); SPAWN; HTML Parser (C21) ×2; HTMLCrawler (C21); JSONCrawler (Lego)]
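A token bucket allows short bursts up to a fixed capacity and then limits requests to a steady refill rate. The sketch below is one way to write it as a pure data structure; it is not the speaker's implementation, and the module name `TokenBucketSketch`, its fields, and the explicit `now_ms` argument are assumptions made for illustration.

```elixir
defmodule TokenBucketSketch do
  # Hypothetical token bucket. Time is passed in explicitly (milliseconds),
  # so the logic stays deterministic and easy to test.
  defstruct capacity: 1, tokens: 1.0, refill_per_ms: 0.001, last_ms: 0

  # Build a full bucket; `rate_per_sec` tokens are added back every second.
  def new(capacity, rate_per_sec, now_ms) do
    %__MODULE__{
      capacity: capacity,
      tokens: capacity * 1.0,
      refill_per_ms: rate_per_sec / 1000,
      last_ms: now_ms
    }
  end

  # Try to take one token at time `now_ms`.
  # Returns {:ok, bucket} if a request may be sent now, {:wait, bucket} otherwise.
  def take(%__MODULE__{} = bucket, now_ms) do
    now_ms = max(now_ms, bucket.last_ms)
    elapsed = now_ms - bucket.last_ms

    # Refill for the elapsed time, but never above capacity.
    tokens = min(bucket.capacity * 1.0, bucket.tokens + elapsed * bucket.refill_per_ms)
    bucket = %{bucket | tokens: tokens, last_ms: now_ms}

    if tokens >= 1.0 do
      {:ok, %{bucket | tokens: tokens - 1.0}}
    else
      {:wait, bucket}
    end
  end
end
```

In a long-running process the caller would typically pass `System.monotonic_time(:millisecond)` as `now_ms` and keep the bucket in the process state, one bucket per target site.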
Fault-tolerance by Supervision Tree
[Diagram labels: Supervisor; Supervisor (C21); TaskSupervisor; ErrorMonitor; WebProxy (PriorityQueue + TokenBucket); HTML Parser (C21) ×2; SaveQueue; HTMLCrawler (C21); JSONCrawler (Lego)]
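A supervision tree of this shape might be declared roughly as follows. This is a sketch, assuming a recent Elixir version with child specs; the module names under `CrawlerSketch` are hypothetical, and an `Agent` stands in for the SaveQueue.

```elixir
defmodule CrawlerSketch.SaveQueue do
  # Stand-in for the SaveQueue box: an Agent holding a list of items.
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> [] end, name: __MODULE__)
  def push(item), do: Agent.update(__MODULE__, &[item | &1])
  def items, do: Agent.get(__MODULE__, &Enum.reverse/1)
end

defmodule CrawlerSketch.SiteSupervisor do
  # One supervisor per site; children restart independently.
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [
      CrawlerSketch.SaveQueue,
      # Parser tasks run under this Task.Supervisor.
      {Task.Supervisor, name: CrawlerSketch.TaskSupervisor}
    ]

    # :one_for_one: a crashed child is restarted without touching its siblings.
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

After `CrawlerSketch.SiteSupervisor.start_link([])`, a parser started with `Task.Supervisor.async_nolink(CrawlerSketch.TaskSupervisor, fun)` can crash without taking down the SaveQueue or the calling process; the caller sees the failure as an `{:exit, reason}` result from `Task.yield/2`.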
Tree for Multiple Crawlers
[Diagram labels: Supervisor (GNC); Supervisor (JCPenney); Supervisor; Supervisor (C21); TaskSupervisor; ErrorMonitor; WebProxy (PriorityQueue + TokenBucket); HTML Parser (C21) ×2; SaveQueue; HTMLCrawler (C21); JSONCrawler (Lego); JSONCrawler]
Final
[Diagram labels: Supervisor (GNC); Supervisor (JCPenney); Supervisor; Supervisor (C21); TaskSupervisor; ErrorMonitor; sentry.io; MONITOR; {:DOWN, :page_not_found}; twotap.com; c21stores.com; HTTP; WebProxy (PriorityQueue + TokenBucket); SPAWN; HTML Parser (C21) ×2; PRODUCT; DEMAND; SaveQueue; SQL; ES; HTMLCrawler (C21); JSONCrawler (Lego); JSONCrawler]
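The MONITOR arrow and the `{:DOWN, :page_not_found}` label correspond to process monitoring: a monitoring process receives a `{:DOWN, ref, :process, pid, reason}` message when a monitored process exits. A minimal sketch of an error monitor built on `Process.monitor/1` follows. The module name `ErrorMonitorSketch` and the `report_fun` callback are hypothetical; the callback stands in for whatever forwards errors to sentry.io, which the slides do not show.

```elixir
defmodule ErrorMonitorSketch do
  # Hypothetical error monitor: watches processes and reports abnormal exits.
  use GenServer

  # `report_fun` is a 2-arity function (pid, reason) supplied by the caller.
  def start_link(report_fun), do: GenServer.start_link(__MODULE__, report_fun)

  def watch(monitor, pid), do: GenServer.call(monitor, {:watch, pid})

  @impl true
  def init(report_fun), do: {:ok, report_fun}

  @impl true
  def handle_call({:watch, pid}, _from, report_fun) do
    # From now on this GenServer gets a :DOWN message when `pid` exits.
    Process.monitor(pid)
    {:reply, :ok, report_fun}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, pid, reason}, report_fun) do
    # Normal exits are not errors; everything else is reported.
    if reason != :normal, do: report_fun.(pid, reason)
    {:noreply, report_fun}
  end
end
```

Unlike a link, a monitor is one-directional, so the monitor itself keeps running when a watched crawler exits with a reason such as `:page_not_found`.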
GenServer vs Task
• Tasks don't provide services, so there is no handle_call, etc.
• They just run a function and exit.
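The contrast can be sketched as follows: a GenServer stays alive and answers requests through callbacks such as `handle_call`, while a Task computes one result and finishes. The `CounterSketch` module is a hypothetical example, not part of the crawler.

```elixir
defmodule CounterSketch do
  # A GenServer provides a service: it keeps state and answers calls.
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, n), do: {:reply, n + 1, n + 1}
end

# The GenServer can be called repeatedly and remembers its state.
{:ok, counter} = CounterSketch.start_link(0)
1 = CounterSketch.increment(counter)
2 = CounterSketch.increment(counter)

# A Task has no callbacks: it runs the function once and hands back the result.
task = Task.async(fn -> 21 * 2 end)
42 = Task.await(task)
```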
Task.async vs Task.Supervisor.async
• Only the latter builds a supervision relationship, so that the task is visible under its supervisor in observer.
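The difference can be checked from code as well as in observer. In this sketch, which uses only the standard `Task` and `Task.Supervisor` modules, the supervised task appears among the supervisor's children while the plain task does not. The variable names are illustrative.

```elixir
{:ok, sup} = Task.Supervisor.start_link()

# Plain task: linked to the caller only, no supervisor involved.
plain = Task.async(fn -> :plain_result end)

# Supervised task: started as a child of `sup`. It waits for a message
# so that it is still alive when the children are listed.
supervised =
  Task.Supervisor.async(sup, fn ->
    receive do
      :finish -> :supervised_result
    end
  end)

children = Task.Supervisor.children(sup)
true = supervised.pid in children
false = plain.pid in children

# Let the supervised task finish, then collect both results.
send(supervised.pid, :finish)
:plain_result = Task.await(plain)
:supervised_result = Task.await(supervised)
```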