middleware services for search, online ads and recommender

Post on 14-Jul-2015


Middleware Service for Search, Ads and Recommender

Yen-Yu Chen

Web Application: Common Design Patterns

FE

BE

Client/Browser

Apache/Nginx

Other Services

Memcached

Redis

DB/MySQL

The Situations

Business

Promotions

Black lists

Special rules

Product

UX changes

Frequent releases

Low latency

A/B test

Engineering

Reuse of code modules

Isolation of modules

Configuration driven

Service/Operation

Live updates/No down time deployments

Monitoring

Fewer machines

The Situations

Research/Science

2nd/3rd Phase ranking

Feature extraction/calculation

Apply machine learned models

Many more A/B tests

Complicated tracking/logging

Replay ability/offline simulation

Whole page awareness

Container

Middleware Serving Container

FE

BE

Client/Browser

Apache/Nginx

Container

Other Services

Memcached

Redis

DB/MySQL

MIDDLEWARE

The Middleware Serving Container

Let the WD focus on UX

Agile software development and deployment

Provide horizontal capabilities

Execution model

Communication mechanisms

Data marshalling

Engineers focus on application logic

Single request parallel execution ability

A production playground for Research/Science

Architecture

[Diagram labels: Boost.Asio; HTTP 1.1; Standard In/Out; SPDY; Application Handler; Admin Handler; Processor (x6); Client Library; Thread Pool (x2)]

Search Engine

[Diagram labels: Boost.Asio; HTTP 1.1; Search Handler; Admin Handler; XML Formatter; Spell Check; Query Parser; Inverted Index; Thread Pool (x2)]

Execution Model

Processor: logical unit of processing (module)

Workflow: a directed acyclic graph of processors stitched together

User Profile

Model B

Model A

Inverted Index

Cache

The Workflow

[Diagram labels: START; Cache (Cache Hit / Cache Miss); BRANCH (Known User / Unknown User); User Profile; Model A; Model B; Inverted Index; END (x3)]

Control flow vs. U shape

Control flow:

1. Different types of processors: BRANCH, FORK, JOIN, etc.

2. Hard to describe in configuration

3. Early exit makes the workflow complicated

4. Code path might be complicated

U shape:

1. Only one kind of processor

2. Configuration is simple: a chain of processors

3. Easy to exit early

4. Fixed/limited code path: easy to test and debug

5. Natural for cache layers

6. Keeps application logic together

7. Easy to split into different containers

User Profile

Model B

Model A

Inverted Index

Cache
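The point that a U-shape configuration is just a chain of processors can be sketched as follows. This is a minimal illustration, not the container's actual configuration format: the comma-separated syntax, the `Factory` registry, and the names `make_registry` and `build_chain` are all assumptions made for the example.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal processor type; only the name matters here.
struct Processor {
    explicit Processor(std::string n) : name(std::move(n)) {}
    std::string name;
};

using Factory = std::function<std::unique_ptr<Processor>()>;

// Hypothetical registry: maps a name used in configuration to a factory.
std::map<std::string, Factory> make_registry() {
    std::map<std::string, Factory> registry;
    for (const char* name :
         {"cache", "user_profile", "model_a", "model_b", "inverted_index"}) {
        std::string n = name;
        registry[n] = [n] { return std::make_unique<Processor>(n); };
    }
    return registry;
}

// Builds the chain from one configuration line such as
// "cache,user_profile,model_a,inverted_index" (assumed syntax).
std::vector<std::unique_ptr<Processor>> build_chain(
        const std::string& config,
        const std::map<std::string, Factory>& registry) {
    std::vector<std::unique_ptr<Processor>> chain;
    std::istringstream in(config);
    std::string name;
    while (std::getline(in, name, ',')) {
        auto it = registry.find(name);
        if (it == registry.end())
            throw std::invalid_argument("unknown processor: " + name);
        chain.push_back(it->second());
    }
    return chain;
}
```

Because the whole workflow is one ordered list, reordering processors or adding a cache layer becomes a one-line configuration change, with no BRANCH, FORK or JOIN nodes to describe.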

The Processor

Every processor implements the virtual function “Match”

Container calls the “Match” function in each processor along the workflow

Built as a shared object (dynamically linked library)

Container opens and loads the Processor from a .so file (Java: an OSGi bundle as a jar file)

Support live updates
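Loading a processor from a .so file might look roughly like the sketch below on a POSIX system. `dlopen`, `dlsym`, `dlclose` and `dlerror` are the standard POSIX calls; the `Processor` interface, the `create_processor` factory symbol and the `LoadedProcessor` wrapper are assumptions for illustration, since the slides do not show the container's real loading convention.

```cpp
#include <dlfcn.h>  // POSIX dynamic loading: dlopen, dlsym, dlclose, dlerror

#include <iostream>
#include <memory>
#include <string>

// Hypothetical interface shared by the container and every plugin.
struct Processor {
    virtual ~Processor() = default;
    virtual std::string name() const = 0;
};

// Assumed convention: each .so exports this C symbol as its factory.
using CreateFn = Processor* (*)();
constexpr const char* kFactorySymbol = "create_processor";

struct LoadedProcessor {
    void* handle = nullptr;               // keeps the library mapped
    std::unique_ptr<Processor> instance;  // object created by the plugin
    ~LoadedProcessor() {
        instance.reset();  // destroy the object before unloading its code
        if (handle) dlclose(handle);
    }
};

// Returns nullptr if the library or its factory symbol cannot be found.
std::unique_ptr<LoadedProcessor> load_processor(const std::string& path) {
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::cerr << "dlopen failed: " << dlerror() << '\n';
        return nullptr;
    }
    auto create = reinterpret_cast<CreateFn>(dlsym(handle, kFactorySymbol));
    if (!create) {
        std::cerr << "missing symbol " << kFactorySymbol << '\n';
        dlclose(handle);
        return nullptr;
    }
    auto loaded = std::make_unique<LoadedProcessor>();
    loaded->handle = handle;
    loaded->instance.reset(create());
    return loaded;
}
```

Under this assumed convention a plugin would define `extern "C" Processor* create_processor()` and be compiled as a shared object. On some older toolchains the program also needs to be linked with `-ldl`.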

Execution interface

Result Match(Query query, Execution execution) {
    // could do something with query
    // downward part in the U shape
    Result result = execution.match(query);
    // could do something with result
    // upward part in the U shape
    return result;
}
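To make the U shape concrete, here is a small self-contained sketch of how a container could drive the `Match` interface above, with a cache layer that exits early. It assumes `Query` and `Result` are plain strings and passes them by reference; the `Execution` cursor, `CacheProcessor`, `IndexProcessor` and `run` are hypothetical names, not the deck's actual classes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Query = std::string;   // assumption: a query is just text
using Result = std::string;  // assumption: a result is just text

class Execution;

// Every processor implements the same virtual function.
struct Processor {
    virtual ~Processor() = default;
    virtual Result Match(const Query& query, Execution& execution) = 0;
};

// Hypothetical cursor over the chain; match() runs the next processor.
class Execution {
public:
    explicit Execution(const std::vector<Processor*>& chain) : chain_(chain) {}
    Result match(const Query& query) {
        if (next_ == chain_.size()) return Result();  // bottom of the U
        Processor* p = chain_[next_++];
        return p->Match(query, *this);
    }
private:
    const std::vector<Processor*>& chain_;
    std::size_t next_ = 0;
};

// Cache layer: exits early on a hit, stores the result on the way back up.
struct CacheProcessor : Processor {
    std::map<Query, Result> store;
    int hits = 0;
    Result Match(const Query& query, Execution& execution) override {
        auto it = store.find(query);
        if (it != store.end()) { ++hits; return it->second; }  // early exit
        Result result = execution.match(query);  // downward part
        store[query] = result;                   // upward part
        return result;
    }
};

// Stand-in for a back end such as the inverted index.
struct IndexProcessor : Processor {
    int calls = 0;
    Result Match(const Query& query, Execution&) override {
        ++calls;
        return "docs-for:" + query;
    }
};

// Runs one request through the chain with a fresh Execution.
Result run(const std::vector<Processor*>& chain, const Query& query) {
    Execution execution(chain);
    return execution.match(query);
}
```

Running the same query twice through a chain of `{cache, index}` reaches the index only once; the second request returns from the cache without descending further, which is the early-exit property listed for the U shape.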

Ads Serving Engine

[Diagram labels: Boost.Asio; HTTP 1.1; Search Handler; Admin Handler; XML Formatter; User Profile; Query Parser; Inverted Index; Client Library; Thread Pool (x2)]

Change for Asynchronous Calls

void Match(Query query, Result result, Execution execution) {
    // do something
    // downward part in the U shape
    execution.match(query, result, execution);
}

void Deliver(Query query, Result result, Execution execution) {
    // do something
    // upward part in the U shape
    execution.deliver(query, result, execution);
}
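The split into Match (downward) and Deliver (upward) lets a processor return immediately while it waits for a remote call. The sketch below shows one way this could work, under several assumptions: the `Execution` class, its `pending` queue (a stand-in for an I/O completion queue) and the `UserProfile` and `Index` processors are hypothetical, and `execution.match` is simplified to take two arguments.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Query  { std::string text; std::string profile; };
struct Result { std::vector<std::string> trace; bool done = false; };

class Execution;

struct Processor {
    virtual ~Processor() = default;
    virtual void Match(Query& q, Result& r, Execution& e) = 0;
    virtual void Deliver(Query& q, Result& r, Execution& e) = 0;
};

// Hypothetical execution context: walks the chain down, then back up.
class Execution {
public:
    explicit Execution(std::vector<Processor*> chain) : chain_(std::move(chain)) {}
    void match(Query& q, Result& r) {
        if (pos_ < chain_.size()) chain_[pos_++]->Match(q, r, *this);
        else deliver(q, r);  // bottom of the U: turn around
    }
    void deliver(Query& q, Result& r) {
        if (pos_ > 0) chain_[--pos_]->Deliver(q, r, *this);
        else r.done = true;  // back at the top
    }
    std::deque<std::function<void()>> pending;  // stand-in for I/O completions
private:
    std::vector<Processor*> chain_;
    std::size_t pos_ = 0;
};

struct UserProfile : Processor {
    void Match(Query& q, Result& r, Execution& e) override {
        r.trace.push_back("profile:request");
        // Return immediately; the continuation runs when the "reply" arrives.
        e.pending.push_back([&q, &r, &e] {
            q.profile = "known-user";
            e.match(q, r);
        });
    }
    void Deliver(Query& q, Result& r, Execution& e) override {
        r.trace.push_back("profile:deliver");
        e.deliver(q, r);
    }
};

struct Index : Processor {
    void Match(Query& q, Result& r, Execution& e) override {
        r.trace.push_back("index:match(" + q.profile + ")");
        e.match(q, r);
    }
    void Deliver(Query& q, Result& r, Execution& e) override {
        r.trace.push_back("index:deliver");
        e.deliver(q, r);
    }
};

// Drives one request; the loop plays the role of the I/O thread.
Result serve(Query query) {
    UserProfile profile;
    Index index;
    std::vector<Processor*> chain{&profile, &index};
    Execution execution(chain);
    Result result;
    execution.match(query, result);
    while (!execution.pending.empty()) {
        auto task = std::move(execution.pending.front());
        execution.pending.pop_front();
        task();
    }
    return result;
}
```

In this sketch no thread is blocked while the profile lookup is outstanding: `UserProfile::Match` returns as soon as the request is queued, and the rest of the chain resumes from the completion callback.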

User Profile

Thread pools

Separate I/O threads and worker threads into two different pools

Asynchronous calls make sense only when there will be waiting/idling

For example: calling out-of-box (remote) services

Keeping a thread busy without switching tasks is more efficient
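A minimal sketch of the two-pool arrangement, using only the C++ standard library. The `ThreadPool` class, the pool sizes (2 and 4) and `handle_requests` are illustrative assumptions; a real container built on Boost.Asio would more likely use its own I/O event loop for the first pool.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool; the destructor drains the queue and joins.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { loop(); });
    }
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lock(mu_); stopping_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void submit(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(mu_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Handles `requests` fake requests: the I/O pool hands each one off,
// the worker pool does the (here trivial) processing.
int handle_requests(int requests) {
    std::atomic<int> processed{0};
    {
        ThreadPool workers(4);  // CPU-bound processor work
        ThreadPool io(2);       // socket reads/writes, calls to remote services
        for (int i = 0; i < requests; ++i) {
            io.submit([&workers, &processed] {
                // Pretend a request was read from the socket, then hand it off.
                workers.submit([&processed] { ++processed; });
            });
        }
    }  // io is destroyed first (drains and joins), then workers
    return processed.load();
}
```

Keeping the pools separate means a slow remote call occupies only an I/O thread, while worker threads stay busy on processor logic without switching tasks.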

Administration & Operation Interface

Two virtual functions of the Processor

Get_status: to show the processor specific status

Exec_cmd: to execute a specific task inside the processor
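The two administration hooks could be sketched like this. Only the names `Get_status` and `Exec_cmd` come from the slide; their signatures, the `CacheProcessor` example, the `flush` command and the `handle_admin` dispatcher are assumptions for illustration.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical base class showing only the two administration hooks.
struct Processor {
    virtual ~Processor() = default;
    virtual std::string Get_status() const = 0;
    virtual std::string Exec_cmd(const std::string& command) = 0;
};

struct CacheProcessor : Processor {
    std::map<std::string, std::string> entries;

    // Processor-specific status: here, the number of cached entries.
    std::string Get_status() const override {
        std::ostringstream out;
        out << "cache entries=" << entries.size();
        return out.str();
    }
    // Processor-specific task: here, a single "flush" command.
    std::string Exec_cmd(const std::string& command) override {
        if (command == "flush") {
            entries.clear();
            return "OK flushed";
        }
        return "ERROR unknown command: " + command;
    }
};

// What an admin handler might do with a (processor, command) request.
std::string handle_admin(std::map<std::string, Processor*>& processors,
                         const std::string& name,
                         const std::string& command) {
    auto it = processors.find(name);
    if (it == processors.end()) return "ERROR unknown processor: " + name;
    return command == "status" ? it->second->Get_status()
                               : it->second->Exec_cmd(command);
}
```

With such hooks an operator can inspect or flush a cache in a running container through the Admin Handler, without restarting it.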

No down time application deployment

Update configuration without code change

Deploy code change from another shared object file
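One common way to update a running service without downtime is to let each request take an immutable snapshot of the current workflow and to swap in a new snapshot atomically. The sketch below illustrates that idea only; `Workflow` and `WorkflowHolder` are hypothetical names, and the slides do not say how the container actually implements live updates.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Immutable snapshot of a workflow configuration (processor names only).
struct Workflow {
    int version;
    std::vector<std::string> processors;
};

// Hypothetical holder: requests take a snapshot, updates swap in a new one.
class WorkflowHolder {
public:
    explicit WorkflowHolder(Workflow initial)
        : current_(std::make_shared<const Workflow>(std::move(initial))) {}

    // Called at the start of each request. The snapshot stays valid for
    // the whole request even if an update lands in the meantime.
    std::shared_ptr<const Workflow> snapshot() const {
        std::lock_guard<std::mutex> lock(mu_);
        return current_;
    }

    // Called by the admin path when new configuration or code is deployed.
    void update(Workflow next) {
        auto fresh = std::make_shared<const Workflow>(std::move(next));
        std::lock_guard<std::mutex> lock(mu_);
        current_ = std::move(fresh);
    }

private:
    mutable std::mutex mu_;
    std::shared_ptr<const Workflow> current_;
};
```

In-flight requests finish on the old workflow while new requests pick up the new one; the old snapshot is freed when its last request completes.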

Visualized Configuration

Production Configuration

Replay & Offline Simulation

Some people do:

Have another set of code to simulate

Some other people do:

Have another setup identical to production system

Prepare the log, copy over to simulation clients

Have multiple clients sending requests and saving results

Copy the result back to your research platform

With the container: configure it to use the standard I/O interface

Use Hadoop Streaming to simulate across hundreds of machines

Must-have for efficient research

Recommender Engine

[Diagram labels: Boost.Asio; HTTP 1.1; Search Handler; Admin Handler; User Profile; Model A; Model B; Inverted Index; Client Library; Redis Adapter; Thread Pool (x2)]

Ecosystem

[Diagram labels: Client/Browser/App; Frontend; Middleware Serving Container; Hadoop; Data Highway; Back Ends; Models; Indexes; RDBMS; Data]

Thank You!

Have an A1 day (-:
