social network architecture - part 3. big data - machine learning

Post on 16-Jul-2015


<SOCIAL NETWORK ARCHITECTURE>@DEV ZONE

Overview architecture

[Diagram] Layers: 1. Core User, 2. User Activity System, 3. Others, 3. Big Data System, 4. Front-end. Components: Web Apps, Web Service APIs, Mobile Apps, SSO, User, Ranking, User Data Storage, Real-time Notification, News Feed, User Activity Storage, Real-time Chat, Search System, Suggestion System, Big Data Storage, External Apps, Service Data, Administrator.

BIG Data

Definitions

Dani Ariely defined:

Definitions

• Wikipedia. Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.

• Intel. Big data opportunities emerge in organizations generating a median of 300 terabytes of data a week. The most common forms of data analyzed in this way are business transactions stored in relational databases, followed by documents, e-mail, sensor data, blogs, and social media.

• Microsoft. “Big data is the term increasingly used to describe the process of applying serious computing power—the latest in machine learning and artificial intelligence—to seriously massive and often highly complex sets of information.”

• Oracle. Big data is the derivation of value from traditional relational database-driven business decision making, augmented with new sources of unstructured data.


• Gartner. The increasing size of data, the increasing rate at which it is produced, and the increasing range of formats and representations employed. This report predated the term “big data” but proposed a three-fold definition encompassing the “three Vs”: Volume, Velocity and Variety. This idea has since become popular and sometimes includes a fourth V, veracity, to cover questions of trust and uncertainty.


IBM defined:

• Capture data

• Manage data

• Analyze data

Big Data Architecture

Data analysis

• Artificial Intelligence (AI)

• Machine learning

• Robotics

• Computer vision

Machine learning

Applications:

• Data analysis: stock market, financial market, user actions …

• Weather forecast

• Natural Language Processing

• Search engine


Methods:

• Supervised learning

• Unsupervised learning

• Semi-supervised learning

• Reinforcement learning

• Data mining

• Data exploration

[Figure: DATA SET → CLUSTERS; DATA SET → CLASSES]


Supervised learning application:

• Classify a data set

Supervised learning algorithms

1. Decision tree

2. Neural network

3. Naive Bayes classifier
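Naive Bayes is only named here, so a minimal sketch may help. It scores each class by its prior probability times the product of the per-attribute conditional probabilities, assuming the attributes are independent given the class. The data are an assumption: the classic 14-day weather set that the decision-tree example in this deck appears to be based on. The names `train` and `predict` are our own, and no smoothing is applied, so an attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict

# Assumed data: the classic 14-day weather set.
# Columns: outlook, temperature, humidity, windy, play (class label).
DATA = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def train(rows):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # key: (attribute index, class) -> value counts
    for row in rows:
        label = row[-1]
        for i, value in enumerate(row[:-1]):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def predict(model, features):
    """Return (most probable class, normalized class probabilities)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(features):
            # Naive independence assumption: multiply P(value | class) per attribute.
            score *= value_counts[(i, label)][value] / n_label
        scores[label] = score
    norm = sum(scores.values())
    if norm == 0:
        raise ValueError("every class has zero probability; consider Laplace smoothing")
    probs = {label: s / norm for label, s in scores.items()}
    return max(probs, key=probs.get), probs


model = train(DATA)
label, probs = predict(model, ("sunny", "cool", "high", "TRUE"))
print(label, round(probs[label], 3))
```

On the assumed data, the example day (sunny, cool, high humidity, windy) is classified as `no`, with a probability of about 0.795. Practical implementations usually add Laplace smoothing to avoid the zero-count problem.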



Problems & Solutions


Decision tree

1. Outlook = sunny, Humidity = high → No

2. Outlook = sunny, Humidity = normal → Yes

3. Outlook = overcast → Yes

4. Outlook = rainy, Windy = TRUE → No

5. Outlook = rainy, Windy = FALSE → Yes
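The five rules translate directly into nested conditions. This is a minimal sketch, assuming lowercase string values for Outlook and Humidity (only "high" and "normal"), a boolean for Windy, and a "yes"/"no" class; the function name `classify` is our own. Temperature never appears because the tree does not need it.

```python
def classify(outlook: str, humidity: str, windy: bool) -> str:
    """Apply the five decision-tree rules and return "yes" or "no"."""
    if outlook == "sunny":
        # Rules 1 and 2: sunny days are decided by humidity.
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        # Rule 3: overcast is always "yes".
        return "yes"
    if outlook == "rainy":
        # Rules 4 and 5: rainy days are decided by wind.
        return "no" if windy else "yes"
    raise ValueError(f"unknown outlook: {outlook!r}")


for day in [("sunny", "high", False), ("overcast", "high", True), ("rainy", "normal", True)]:
    print(day, "->", classify(*day))
```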

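Build root node

• Evaluate attributes: Outlook, Temperature, Humidity, Windy

• $\mathrm{entropy}(X) = -\sum_{x \in X} p(x)\log_2 p(x)$, with $0 \log_2 0 = 0$

• Outlook splits the 14 training examples into [yes, no] counts of [2,3] (sunny), [4,0] (overcast) and [3,2] (rainy):

• $\mathrm{info}([2,3]) = \mathrm{entropy}\left(\tfrac{2}{5}, \tfrac{3}{5}\right) = 0.971$

• $\mathrm{info}([4,0]) = \mathrm{entropy}\left(\tfrac{4}{4}, \tfrac{0}{4}\right) = 0$

• $\mathrm{info}([3,2]) = \mathrm{entropy}\left(\tfrac{3}{5}, \tfrac{2}{5}\right) = 0.971$

• $\mathrm{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14}\,\mathrm{info}([2,3]) + \tfrac{4}{14}\,\mathrm{info}([4,0]) + \tfrac{5}{14}\,\mathrm{info}([3,2]) = 0.693$

→ $\mathrm{Gain}(\text{outlook}) = \mathrm{info}([9,5]) - \mathrm{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247$

→ $\mathrm{Gain}(\text{temp.}) = 0.029$, $\mathrm{Gain}(\text{humidity}) = 0.152$, $\mathrm{Gain}(\text{windy}) = 0.048$

→ Outlook has the largest gain, so it becomes the root node.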
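The arithmetic above can be checked with a few lines of Python. This sketch uses base-2 logarithms and treats 0·log 0 as 0. The Outlook counts come from the slide; the [yes, no] counts for Temperature, Humidity and Windy are our assumption, taken from the classic 14-day weather data set that these gains appear to be computed from. The function names mirror the slide's notation but are our own.

```python
import math


def entropy(*probs: float) -> float:
    """entropy(p1, p2, ...) = -sum(p * log2 p), skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def info(*count_lists: list) -> float:
    """Weighted average entropy of one or more [yes, no] count lists."""
    total = sum(sum(counts) for counts in count_lists)
    return sum(
        sum(counts) / total * entropy(*(c / sum(counts) for c in counts))
        for counts in count_lists
    )


def gain(*partition: list) -> float:
    """Information gain: info(parent) minus the weighted info of the partition."""
    parent = [sum(col) for col in zip(*partition)]  # e.g. [9, 5]
    return info(parent) - info(*partition)


# [yes, no] counts per attribute value (Outlook from the slide, the rest assumed).
SPLITS = {
    "outlook": ([2, 3], [4, 0], [3, 2]),      # sunny, overcast, rainy
    "temperature": ([2, 2], [4, 2], [3, 1]),  # hot, mild, cool
    "humidity": ([3, 4], [6, 1]),             # high, normal
    "windy": ([6, 2], [3, 3]),                # FALSE, TRUE
}

for name, parts in SPLITS.items():
    print(f"Gain({name}) = {gain(*parts):.3f}")

root = max(SPLITS, key=lambda name: gain(*SPLITS[name]))
print("root node:", root)
```

It prints the four gains from the slide (0.247, 0.029, 0.152, 0.048) and selects outlook as the root node.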


• Full decision tree
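The full tree follows by repeating the root-node step on each branch until every subset is pure. This is a minimal ID3-style sketch, assuming the classic 14-day weather data (our assumption about the data behind the slides) and nested dicts as the tree representation; `id3` and the other names are our own. On this data it reproduces the five rules listed earlier.

```python
import math
from collections import Counter

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
# Assumed data: classic 14-day weather set; the last column is the class (play).
DATA = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    """Information gain of splitting rows on column index col."""
    remainder = 0.0
    for value in {row[col] for row in rows}:
        subset = [row[-1] for row in rows if row[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - remainder


def id3(rows, attributes):
    """Leaf = class label; inner node = {attribute: {value: subtree}}."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:  # pure subset -> leaf
        return labels[0]
    if not attributes:  # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, ATTRIBUTES.index(a)))
    col = ATTRIBUTES.index(best)
    rest = [a for a in attributes if a != best]
    return {
        best: {
            value: id3([row for row in rows if row[col] == value], rest)
            for value in sorted({row[col] for row in rows})
        }
    }


tree = id3(DATA, ATTRIBUTES)
print(tree)
```

Picking the attribute with the largest information gain at every node is the ID3 strategy; later learners such as C4.5 refine it with gain ratio and pruning.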


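Classifying steps (n-fold cross-validation):

for( i=0; i<n; i++ ){

1. Split the data set into n folds

• Training set: n-1 folds

• Test set: 1 fold (fold i)

• n=10 is the usual recommendation (10-fold cross-validation)

2. Training → Model

3. Test → Error rate

}

Overall error rate = average of the n error rates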
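The loop above is n-fold cross-validation. This is a minimal sketch, assuming a generic `train`/`predict` interface (our own names) and contiguous, unshuffled folds so the result is reproducible; in practice the data are usually shuffled or stratified first. To keep it short, the learner is a majority-class baseline rather than a decision tree, run on the class column of the classic weather data (again an assumption).

```python
from collections import Counter
from typing import Callable, List, Sequence


def fold_indices(n_items: int, n_folds: int) -> List[List[int]]:
    """Split range(n_items) into n_folds contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(n_folds):
        size = n_items // n_folds + (1 if i < n_items % n_folds else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(rows: Sequence, labels: Sequence[str], n_folds: int,
                   train: Callable, predict: Callable) -> float:
    """Train on n-1 folds, test on the held-out fold, and average the error rates."""
    if not 2 <= n_folds <= len(rows):
        raise ValueError("n_folds must be between 2 and the number of rows")
    error_rates = []
    for test_idx in fold_indices(len(rows), n_folds):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train(train_rows, train_labels)  # 2. Training -> model
        wrong = sum(predict(model, rows[i]) != labels[i] for i in test_idx)
        error_rates.append(wrong / len(test_idx))  # 3. Test -> error rate
    return sum(error_rates) / len(error_rates)


# Stand-in learner (hypothetical baseline): always predict the most common training label.
def train_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


# Class column ("play") of the classic 14-day weather data (assumed).
PLAY = "no no yes yes yes no yes no yes yes yes yes yes no".split()
ROWS = list(range(len(PLAY)))  # features are ignored by the baseline

print("10-fold error:", cross_validate(ROWS, PLAY, 10, train_majority, predict_majority))
print("leave-one-out error:", cross_validate(ROWS, PLAY, len(PLAY), train_majority, predict_majority))
```

The baseline's error (0.3 with 10 folds, 5/14 ≈ 0.357 with leave-one-out, where n equals the number of examples) is a reference point that a useful classifier should beat.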

Advanced topics

• Building a training set

• Reinforcement learning and its applications

• Machine learning algorithms

Demo

• Training with data set

• Test model

• Classifying
