Middleware Service for Search, Ads and Recommender
Yen-Yu Chen
Web Application: Common Design Patterns
[Diagram: Client/Browser; FE: Apache/Nginx; BE: Other Services, Memcached, Redis, DB/MySQL]
The Situations
Business
Promotions
Black lists
Special rules
Product
UX changes
Frequent releases
Low latency
A/B test
Engineering
Reuse of code modules
Isolation of modules
Configuration driven
Service/Operation
Live updates/No-downtime deployments
Monitoring
Fewer machines
The Situations
Research/Science
2nd/3rd Phase ranking
Feature extraction/calculation
Apply machine learned models
Many more A/B tests
Complicated tracking/logging
Replay ability/offline simulation
Whole page awareness
Middleware Serving Container
[Diagram: Client/Browser; FE: Apache/Nginx; MIDDLEWARE: Container; BE: Other Services, Memcached, Redis, DB/MySQL]
The Middleware Serving Container
Let the WD focus on UX
Agile software development and deployment
Provide horizontal capabilities
Execution model
Communication mechanisms
Data marshalling
Engineers focus on application logic
Parallel execution within a single request
A production playground for Research/Science
Architecture
[Diagram: Boost.Asio; protocols: HTTP 1.1, SPDY, Standard In/Out; Application Handler, Admin Handler; Processors; Client Library; two Thread Pools]
Search Engine
[Diagram: Boost.Asio; HTTP 1.1; Search Handler, Admin Handler; processors: XML Formatter, Spell Check, Query Parser, Inverted Index; two Thread Pools]
Execution Model
Processor: logical unit of processing (module)
Workflow: directed acyclic graph stitched with processors
[Diagram: example processors: User Profile, Model B, Model A, Inverted Index, Cache]
The Workflow
[Diagram: control-flow workflow from START to END; Cache with Cache Hit and Cache Miss paths; BRANCH on Known User / Unknown User; processors: User Profile, Model A, Model B, Inverted Index]
Control flow vs. U shape
Control flow:
1. Different types of processors: BRANCH, FORK, JOIN etc.
2. Hard to describe in configuration
3. Early exit makes the workflow complicated
4. Code path might be complicated
U shape:
1. Only one kind of processor
2. Configuration is simply a chain of processors
3. Easy to exit early
4. Fixed/limited code path: easy to test and debug
5. Natural for cache layers
6. Keeps application logic together
7. Easy to split into different containers
[Diagram: U-shape chain over the processors User Profile, Model B, Model A, Inverted Index, Cache]
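The claim that a U-shape configuration is just a chain of processors can be made concrete with a small sketch. Nothing here comes from the actual container: the comma-separated configuration string and the functions `parse_chain` and `u_shape_order` are assumptions for illustration only. The sketch shows the visit order implied by the U shape: down the chain, then back up in reverse.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical configuration format: comma-separated processor names,
// outermost processor first. The real container's format is not shown
// in the slides.
std::vector<std::string> parse_chain(const std::string& config) {
    std::vector<std::string> chain;
    std::istringstream in(config);
    std::string name;
    while (std::getline(in, name, ',')) {
        // Trim surrounding spaces so "Cache, UserProfile" also works.
        const std::string::size_type first = name.find_first_not_of(' ');
        if (first == std::string::npos) continue;  // skip empty entries
        const std::string::size_type last = name.find_last_not_of(' ');
        chain.push_back(name.substr(first, last - first + 1));
    }
    return chain;
}

// Visit order of one U-shaped pass: every processor sees the query on the
// way down and the result on the way up, in reverse order.
std::vector<std::string> u_shape_order(const std::vector<std::string>& chain) {
    std::vector<std::string> order;
    for (std::size_t i = 0; i < chain.size(); ++i)
        order.push_back("down:" + chain[i]);
    for (std::size_t i = chain.size(); i > 0; --i)
        order.push_back("up:" + chain[i - 1]);
    assert(order.size() == 2 * chain.size());
    return order;
}
```

With this shape, splitting a workflow across two containers amounts to cutting the list in two, which is one way to read the "easy to split" point.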
The Processor
Every processor implements the virtual function “Match”
The container calls the “Match” function in each processor along the workflow
Built as a shared object (dynamically linked library)
The container opens and loads the Processor from a .so file (Java: an OSGi bundle as a jar file)
Supports live updates
Execution interface
Result Match(Query query, Execution execution) {
    // could do something with query: downward part in the U shape
    Result result = execution.match(query);
    // could do something with result: upward part in the U shape
    return result;
}
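A minimal sketch of how a container might drive this interface, with a cache processor that exits early on a hit. It is not the container's real code: `Query` and `Result` are stand-in typedefs, `CacheProcessor`, `IndexProcessor` and `serve` are hypothetical names, and arguments are passed by reference here, whereas the slide passes them by value.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::string Query;                // hypothetical stand-in type
typedef std::vector<std::string> Result;  // hypothetical stand-in type

class Execution;

// Every processor implements the virtual function Match.
class Processor {
public:
    virtual ~Processor() {}
    virtual Result Match(const Query& query, Execution& execution) = 0;
};

// Walks the configured chain: match() hands the query to the next processor.
class Execution {
public:
    explicit Execution(const std::vector<Processor*>& chain)
        : chain_(chain), next_(0) {}

    Result match(const Query& query) {
        if (next_ >= chain_.size()) return Result();  // bottom of the U
        Processor* processor = chain_[next_++];
        assert(processor != 0);
        return processor->Match(query, *this);
    }

private:
    std::vector<Processor*> chain_;
    std::size_t next_;
};

// Cache layer: a hit returns without calling further down the chain.
class CacheProcessor : public Processor {
public:
    Result Match(const Query& query, Execution& execution) override {
        std::map<Query, Result>::const_iterator hit = cache_.find(query);
        if (hit != cache_.end()) return hit->second;  // early exit
        Result result = execution.match(query);       // downward part
        cache_[query] = result;                       // upward part
        return result;
    }

private:
    std::map<Query, Result> cache_;
};

// Placeholder for the innermost processor, for example an inverted index.
class IndexProcessor : public Processor {
public:
    IndexProcessor() : calls(0) {}
    Result Match(const Query& query, Execution& execution) override {
        ++calls;
        Result result = execution.match(query);  // nothing below: empty
        result.push_back("doc-for-" + query);
        return result;
    }
    int calls;  // how many times the index was actually consulted
};

// One Execution object per request.
Result serve(const std::vector<Processor*>& chain, const Query& query) {
    Execution execution(chain);
    return execution.match(query);
}
```

Serving the same query twice through a chain of `CacheProcessor` followed by `IndexProcessor` consults the index only once; the second request never leaves the cache processor.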
Ads Serving Engine
[Diagram: Boost.Asio; HTTP 1.1; Search Handler, Admin Handler; processors: XML Formatter, User Profile, Query Parser, Inverted Index; Client Library; two Thread Pools]
Change for Asynchronous Calls
void Match(Query query, Result result, Execution execution) {
    // do something: downward part in the U shape
    execution.match(query, result, execution);
}
void Deliver(Query query, Result result, Execution execution) {
    // do something: upward part in the U shape
    execution.deliver(query, result, execution);
}
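The split into Match and Deliver lets a processor return without blocking and resume the request later. The sketch below is one possible reading, not the container's implementation: the `Execution` internals, the `pending` queue that stands in for an I/O completion event, and the classes `RemoteProfile` and `Index` are all assumptions, and `match`/`deliver` take two arguments here instead of the three shown on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

typedef std::string Query;                // hypothetical stand-in type
typedef std::vector<std::string> Result;  // hypothetical stand-in type
typedef std::function<void()> Continuation;

class Execution;

class Processor {
public:
    virtual ~Processor() {}
    virtual void Match(const Query& q, Result& r, Execution& e) = 0;
    virtual void Deliver(const Query& q, Result& r, Execution& e) = 0;
};

class Execution {
public:
    explicit Execution(const std::vector<Processor*>& chain)
        : chain_(chain), position_(0), done_(false) {}

    void match(const Query& q, Result& r) {
        assert(!done_);
        if (position_ < chain_.size()) {
            chain_[position_++]->Match(q, r, *this);
        } else {
            deliver(q, r);  // bottom of the U: turn around
        }
    }

    void deliver(const Query& q, Result& r) {
        if (position_ > 0) {
            chain_[--position_]->Deliver(q, r, *this);
        } else {
            done_ = true;  // back at the top: the response can be sent
        }
    }

    bool done() const { return done_; }

private:
    std::vector<Processor*> chain_;
    std::size_t position_;
    bool done_;
};

// Simulated remote call: park a continuation instead of blocking. The
// query, result and execution must outlive the parked continuation.
class RemoteProfile : public Processor {
public:
    explicit RemoteProfile(std::vector<Continuation>& pending)
        : pending_(pending) {}
    void Match(const Query& q, Result& r, Execution& e) override {
        r.push_back("profile:request-sent");
        const Query* qp = &q;
        Result* rp = &r;
        Execution* ep = &e;
        pending_.push_back([qp, rp, ep]() { ep->match(*qp, *rp); });
        // Returns immediately; the thread is free for other requests.
    }
    void Deliver(const Query& q, Result& r, Execution& e) override {
        r.push_back("profile:applied");
        e.deliver(q, r);
    }

private:
    std::vector<Continuation>& pending_;
};

class Index : public Processor {
public:
    void Match(const Query& q, Result& r, Execution& e) override {
        r.push_back("index:" + q);
        e.match(q, r);
    }
    void Deliver(const Query& q, Result& r, Execution& e) override {
        e.deliver(q, r);
    }
};
```

Running the parked continuation (in a real system, when the remote reply arrives) resumes the downward pass and then walks back up through every `Deliver`.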
Thread pools
Separate I/O threads and worker threads into two different pools
Asynchronous calls make sense only when there will be waiting/idling
For example: calling out-of-box (remote) services
Keeping a thread busy without switching tasks is more efficient
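A minimal sketch of the two-pool idea using only the C++ standard library rather than Boost.Asio. The `ThreadPool` class and `handle_request` are hypothetical; the slides do not show how the container implements its pools. CPU-bound steps run on the worker pool, the step that would wait on a remote service runs on the I/O pool, and the continuation hops back to a worker.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t threads) : stopping_(false) {
        assert(threads > 0);
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Drains the queue, then joins every thread.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
            wake_.notify_all();
        }
        for (std::thread& t : workers_) t.join();
    }

    void submit(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
        // Notify while holding the lock so submit() never touches the
        // pool after another thread could have observed the task's effect.
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and fully drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()> > tasks_;
    std::vector<std::thread> workers_;
    bool stopping_;
};

// One request: compute on a worker, wait on an I/O thread, finish on a worker.
void handle_request(ThreadPool& io, ThreadPool& workers,
                    std::atomic<int>& completed) {
    workers.submit([&io, &workers, &completed] {
        // ... CPU-bound work such as feature extraction would go here ...
        io.submit([&workers, &completed] {
            // ... a blocking call to a remote service would go here ...
            workers.submit([&completed] { ++completed; });
        });
    });
}
```

Callers need to wait for in-flight requests to finish before destroying either pool, since tasks in one pool submit to the other.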
Administration & Operation Interface
Two virtual functions of the Processor
Get_status: shows the processor-specific status
Exec_cmd: executes a specific task inside the processor
No-downtime application deployment
Update configuration without code change
Deploy code change from another shared object file
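The two administration functions could look roughly like this. Only the names `Get_status` and `Exec_cmd` come from the slide; the signatures, the string-based command format, and the `CacheProcessor` with its "clear" command are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

class Processor {
public:
    virtual ~Processor() {}
    // Shows the processor-specific status (assumed to be free-form text).
    virtual std::string Get_status() const = 0;
    // Executes a specific task inside the processor (assumed text command).
    virtual std::string Exec_cmd(const std::string& command) = 0;
};

// Hypothetical cache processor exposing counters and a "clear" command.
class CacheProcessor : public Processor {
public:
    CacheProcessor() : hits_(0) {}

    void put(const std::string& key, const std::string& value) {
        assert(!key.empty());
        entries_[key] = value;
    }

    bool get(const std::string& key, std::string& value) {
        std::map<std::string, std::string>::const_iterator it = entries_.find(key);
        if (it == entries_.end()) return false;
        ++hits_;
        value = it->second;
        return true;
    }

    std::string Get_status() const override {
        std::ostringstream out;
        out << "entries=" << entries_.size() << " hits=" << hits_;
        return out.str();
    }

    std::string Exec_cmd(const std::string& command) override {
        if (command == "clear") {
            entries_.clear();
            hits_ = 0;
            return "ok";
        }
        return "unknown command: " + command;
    }

private:
    std::map<std::string, std::string> entries_;
    int hits_;
};
```

An admin handler could then route a status request or a command to the named processor without knowing anything about its internals, so a cache can be flushed on a live system without a restart.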
Visualized Configuration
Production Configuration
Replay & Offline Simulation
Some people do:
Have another set of code to simulate
Some other people do:
Have another setup identical to production system
Prepare the log, copy over to simulation clients
Have multiple clients sending requests and saving results
Copy the result back to your research platform
With the serving container:
Configure it to use the standard I/O interface
Use Hadoop streaming to simulate over hundreds of machines
Must-have for efficient research
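A sketch of what the standard I/O interface makes possible: the same serving code reads logged queries line by line and writes one result line per query, which is the shape a Hadoop streaming mapper expects (lines in on standard input, tab-separated key and value out on standard output). The functions `serve`, `replay` and `replay_log`, and the one-query-per-line log format, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for running one logged query through the workflow.
std::string serve(const std::string& query) {
    assert(!query.empty());
    return "result-for-" + query;
}

// Reads one logged query per line and writes "query<TAB>result" per line.
// Returns the number of queries served.
std::size_t replay(std::istream& in, std::ostream& out) {
    std::size_t served = 0;
    std::string query;
    while (std::getline(in, query)) {
        if (query.empty()) continue;  // skip blank log lines
        out << query << '\t' << serve(query) << '\n';
        ++served;
    }
    return served;
}

// Convenience wrapper for trying the replay on an in-memory log.
std::string replay_log(const std::string& log) {
    std::istringstream in(log);
    std::ostringstream out;
    replay(in, out);
    return out.str();
}
```

In a replay run, the entry point would call `replay(std::cin, std::cout)`, so the production code path is exercised unchanged while the cluster handles distributing the log and collecting the output.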
Recommender Engine
[Diagram: Boost.Asio; HTTP 1.1; Search Handler, Admin Handler; processors: User Profile, Model A, Model B, Inverted Index, Redis Adapter; Client Library; two Thread Pools]
Ecosystem
[Diagram: Client/Browser/App; Frontend; Middleware Serving Container; Back Ends; Hadoop; Data Highway; Models; Indexes; RDBMS; Data]
Thank You!
Have an A1 day (-: