Social network architecture - Part 3. Big data - Machine learning

<SOCIAL NETWORK ARCHITECTURE> @DEV ZONE

Upload: phu-luong-trong

Post on 16-Jul-2015


Category:

Software



TRANSCRIPT

Page 1: Social network architecture - Part 3. Big data - Machine learning

<SOCIAL NETWORK ARCHITECTURE>@DEV ZONE

Page 2: Social network architecture - Part 3. Big data - Machine learning

Overview architecture

Web Apps

Web Service APIs

Mobile Apps

4. Front-end

SSO

User ranking

1. Core User

User Data Storage

Real-time Notification

News Feed

2. User Activity System

User Activity Storage

3. Others

Real-time Chat

Search System

Suggestion System

3. Big Data System

Big Data Storage

External Apps

Service Data

User

Administrator

Page 3: Social network architecture - Part 3. Big data - Machine learning

BIG Data

Page 4: Social network architecture - Part 3. Big data - Machine learning

Definitions

Dan Ariely defined:

Page 5: Social network architecture - Part 3. Big data - Machine learning

Definitions

• Wiki. Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or other certain advanced methods to extract value from data, and seldom to a particular size of data set.

• Intel. Big data opportunities emerge in organizations generating a median of 300 terabytes of data a week. The most common forms of data analyzed in this way are business transactions stored in relational databases, followed by documents, e-mail, sensor data, blogs, and social media.

• Microsoft. “Big data is the term increasingly used to describe the process of applying serious computing power—the latest in machine learning and artificial intelligence—to seriously massive and often highly complex sets of information.”

• Oracle. Big data is the derivation of value from traditional relational database-driven business decision making, augmented with new sources of unstructured data.

Page 6: Social network architecture - Part 3. Big data - Machine learning

Definitions

• Gartner. Big data is characterized by the increasing size of data, the increasing rate at which it is produced, and the increasing range of formats and representations employed. The original report predated the term “big data” but proposed a three-fold definition encompassing the “three Vs”: Volume, Velocity and Variety. This idea has since become popular and sometimes includes a fourth V: Veracity, to cover questions of trust and uncertainty.

Page 7: Social network architecture - Part 3. Big data - Machine learning
Page 8: Social network architecture - Part 3. Big data - Machine learning

Definitions

IBM defined:

• Capture data

• Manage data

• Analyze data

Page 9: Social network architecture - Part 3. Big data - Machine learning

Big Data Architecture

Page 10: Social network architecture - Part 3. Big data - Machine learning

Data analysis

• Artificial Intelligence - AI

• Machine learning

• Robotics

• Computer vision

Page 11: Social network architecture - Part 3. Big data - Machine learning

Machine learning

Applications:

• Data analysis: stock market, financial market, user action …

• Weather forecast

• Natural Language Processing

• Search engine

Page 12: Social network architecture - Part 3. Big data - Machine learning

Machine learning

Methods:

• Supervised learning

• Unsupervised learning

• Semi-supervised learning

• Reinforcement learning

• Data mining

• Data exploration

[Figure: DATA SET → CLUSTERS; DATA SET → CLASSES]

Page 13: Social network architecture - Part 3. Big data - Machine learning

Machine learning

Supervised learning application

• Classify data set

Supervised learning algorithms

1. Decision tree

2. Neural network

3. Naive Bayes classifier

[Figures: 1. Decision tree; 2. Neural network]

Page 14: Social network architecture - Part 3. Big data - Machine learning

Machine learning

Problems & Solutions

Page 15: Social network architecture - Part 3. Big data - Machine learning

Machine learning

Decision tree

1. Outlook(sunny) → Humidity(high) → NO

2. Outlook(sunny) → Humidity(normal) → YES

3. Outlook(overcast) → YES

4. Outlook(rainy) → Windy(TRUE) → NO

5. Outlook(rainy) → Windy(FALSE) → YES
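The five rules above can be read as nested conditions. The following is only a minimal sketch of that reading: the function name `play` and the string encoding of the attribute values are assumptions made for illustration, not part of the slides.

```python
def play(outlook, humidity=None, windy=None):
    """Apply the five rules of the slide's decision tree.

    `play` and the lowercase string values are hypothetical names
    chosen for this sketch.
    """
    if outlook == "sunny":
        # Rules 1 and 2: under a sunny outlook only humidity matters.
        if humidity not in ("high", "normal"):
            raise ValueError("sunny outlook needs humidity 'high' or 'normal'")
        return "NO" if humidity == "high" else "YES"
    if outlook == "overcast":
        # Rule 3: overcast is always YES.
        return "YES"
    if outlook == "rainy":
        # Rules 4 and 5: under a rainy outlook only wind matters.
        if windy is None:
            raise ValueError("rainy outlook needs windy True or False")
        return "NO" if windy else "YES"
    raise ValueError(f"unknown outlook: {outlook!r}")


print(play("sunny", humidity="high"))
print(play("rainy", windy=False))
```

Each path from the root (Outlook) to a leaf corresponds to exactly one rule, so the tree and the rule list carry the same information.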

Page 16: Social network architecture - Part 3. Big data - Machine learning

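Machine learning

Build root node

• Evaluate attributes: Outlook, Temperature, Humidity, Windy

• $\mathrm{entropy}(X) = -\sum_{x \in X} p(x) \log p(x)$

• $\mathrm{info}([2,3]) = \mathrm{entropy}\left(\frac{2}{5}, \frac{3}{5}\right) = 0.971$

• $\mathrm{info}([4,0]) = \mathrm{entropy}\left(\frac{4}{4}, \frac{0}{4}\right) = 0$

• $\mathrm{info}([3,2]) = \mathrm{entropy}\left(\frac{3}{5}, \frac{2}{5}\right) = 0.971$

• $\mathrm{info}([2,3],[4,0],[3,2]) = \frac{5}{14}\,\mathrm{info}([2,3]) + \frac{4}{14}\,\mathrm{info}([4,0]) + \frac{5}{14}\,\mathrm{info}([3,2]) = 0.693$

→ $\mathrm{Gain}(\text{outlook}) = \mathrm{info}([9,5]) - \mathrm{info}([2,3],[4,0],[3,2]) = 0.247$

→ $\mathrm{Gain}(\text{temp.}) = 0.029$

→ $\mathrm{Gain}(\text{humidity}) = 0.152$

→ $\mathrm{Gain}(\text{windy}) = 0.048$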
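The numbers on this slide can be checked with a few lines of Python. This is a sketch under stated assumptions: the logarithm is taken base 2 (which is what yields 0.971), the helper names `info`, `split_info` and `gain` are chosen here for illustration, and the [yes, no] counts for Temperature, Humidity and Windy are assumed to come from the standard 14-row weather data set, since the slide only lists the counts for Outlook.

```python
from math import log2


def info(counts):
    """Entropy in bits of a class-count list such as [2, 3]."""
    total = sum(counts)
    # Zero counts contribute nothing (p * log p tends to 0 as p tends to 0).
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def split_info(partitions):
    """Weighted average entropy of the subsets produced by a split."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)


def gain(partitions):
    """Information gain: entropy before the split minus entropy after it."""
    parent = [sum(col) for col in zip(*partitions)]  # e.g. [9, 5]
    return info(parent) - split_info(partitions)


# [yes, no] counts per attribute value.
# Outlook is taken from the slide; the other three are assumed.
SPLITS = {
    "outlook": [[2, 3], [4, 0], [3, 2]],
    "temperature": [[2, 2], [4, 2], [3, 1]],
    "humidity": [[3, 4], [6, 1]],
    "windy": [[6, 2], [3, 3]],
}

gains = {name: gain(parts) for name, parts in SPLITS.items()}
best = max(gains, key=gains.get)  # attribute chosen for the root node

for name, value in gains.items():
    print(f"Gain({name}) = {value:.3f}")
print("root node:", best)
```

Outlook has the largest gain, which is why it becomes the root node; the same evaluation would then be repeated on each branch to grow the rest of the tree.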

Page 17: Social network architecture - Part 3. Big data - Machine learning

Machine learning

• Full decision tree

Page 18: Social network architecture - Part 3. Big data - Machine learning

Machine learning

Classifying Steps:

for (i = 0; i < n; i++) {

1. Split the data set into n folds

• Training set: n − 1 folds

• Test set: 1 fold (fold i)

• n = 10 is the usual choice

2. Training → Model

3. Test → Error rate

}
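The loop above describes n-fold cross-validation. A minimal Python sketch follows; the function names, the `(features, label)` row format and the toy majority-class learner are assumptions made for illustration, and any real learner (for example a decision tree) could be passed in as `train` instead.

```python
from collections import Counter


def train_majority(training_set):
    """Toy learner (placeholder): always predict the most common label."""
    label = Counter(y for _, y in training_set).most_common(1)[0][0]
    return lambda x: label


def cross_validate(data, n=10, train=train_majority):
    """Return the overall error rate of n-fold cross-validation.

    `data` is a list of (features, label) pairs.
    """
    # 1. Split the data set into n roughly equal folds.
    folds = [data[i::n] for i in range(n)]
    errors = 0
    for i in range(n):
        test_set = folds[i]
        training_set = [row for j, fold in enumerate(folds) if j != i
                        for row in fold]
        # 2. Training -> model
        model = train(training_set)
        # 3. Test -> count misclassified rows of the held-out fold
        errors += sum(model(x) != y for x, y in test_set)
    return errors / len(data)


# 20 rows: 14 labelled "yes" followed by 6 labelled "no".
data = [(i, "yes") for i in range(14)] + [(i, "no") for i in range(6)]
error_rate = cross_validate(data, n=10)
print(f"error rate: {error_rate:.2f}")
```

Every row is used for testing exactly once and for training n − 1 times, so the error rate is measured on data the model did not see during training.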

Page 19: Social network architecture - Part 3. Big data - Machine learning

Advances

• Build training set

• Reinforcement learning and applications

• Machine learning algorithms

Page 20: Social network architecture - Part 3. Big data - Machine learning

Demo

• Training with data set

• Test model

• Classifying