Social Network Architecture - Part 3. Big Data - Machine Learning
Posted on 16-Jul-2015
<SOCIAL NETWORK ARCHITECTURE>@DEV ZONE
Overview architecture
[Figure: overview architecture diagram. Numbered blocks: 1. Core User, 2. User Activity System, 3. Others, 3. Big Data System, 4. Front-end. Other labels: Web Apps, Web Service APIs, Mobile Apps, SSO, User ranking, User Data Storage, Real-time Notification, News Feed, User Activity Storage, Real-time Chat, Search System, Suggestion System, Big Data Storage, …, External Apps, Service Data, User, Administrator.]
BIG Data
Definitions
Dan Ariely defined:
• Wikipedia. Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
• Intel. Big data opportunities emerge in organizations generating a median of 300 terabytes of data a week. The most common forms of data analyzed in this way are business transactions stored in relational databases, followed by documents, e-mail, sensor data, blogs, and social media.
• Microsoft. “Big data is the term increasingly used to describe the process of applying serious computing power—the latest in machine learning and artificial intelligence—to seriously massive and often highly complex sets of information.”
• Oracle. Big data is the derivation of value from traditional relational database-driven business decision making, augmented with new sources of unstructured data.
• Gartner. The increasing size of data, the increasing rate at which it is produced and the increasing range of formats and representations employed. This report predated the term “big data” but proposed a three-fold definition encompassing the “three Vs”: Volume, Velocity and Variety. This idea has since become popular and sometimes includes a fourth V: veracity, to cover questions of trust and uncertainty.
IBM defined:
• Capture data
• Manage data
• Analyze data
Big Data Architecture
Data analysis
• Artificial Intelligence (AI)
• Machine learning
• Robotics
• Computer vision
Machine learning
Applications:
• Data analysis: stock market, financial market, user actions …
• Weather forecast
• Natural Language Processing
• Search engine
Methods:
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning
• Data mining
• Data exploration
[Figure: DATA SET → CLUSTERS; DATA SET → CLASSES]
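The figure appears to contrast grouping an unlabeled data set into clusters (unsupervised learning) with assigning data to known classes (supervised learning). As one concrete way to produce clusters, here is a minimal 1-D k-means sketch. k-means is our choice for illustration, since the slides name no clustering algorithm; the function name `kmeans_1d`, the data points, and the starting centroids are all hypothetical.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Hypothetical helper: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points; repeat."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]  # made-up unlabeled data
centroids, clusters = kmeans_1d(data, centroids=[0.0, 10.0])
print(centroids)  # [1.5, 8.5]
print(clusters)   # [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
```

No labels are used anywhere: the two groups emerge only from distances between points, which is what separates clustering from classification.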
Supervised learning application:
• Classify a data set
Supervised learning algorithms
1. Decision tree
2. Neural network
3. Naive Bayes classifier
…
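As a sketch of the third algorithm in the list, here is a count-based Naive Bayes classifier. We assume the classic 14-instance weather ("play") data set, because its class counts match the information-gain figures used for the decision tree in these slides; the names `train_naive_bayes` and `predict` are hypothetical. The score for each class is the prior P(class) multiplied by P(value | class) for every attribute, treating attributes as independent given the class, and the scores are then normalized.

```python
from collections import Counter, defaultdict

# Assumed data: the classic weather data set.
# Columns: outlook, temperature, humidity, windy, play
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def train_naive_bayes(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row in rows:
        for i, value in enumerate(row[:-1]):
            value_counts[(i, row[-1])][value] += 1
    return class_counts, value_counts


def predict(model, instance):
    """Return normalized P(class | instance). No smoothing, so a value never
    seen with a class gives that class a zero score."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                                 # prior P(class)
        for i, value in enumerate(instance):
            score *= value_counts[(i, label)][value] / n  # P(value | class)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


model = train_naive_bayes(WEATHER)
probs = predict(model, ("sunny", "cool", "high", "true"))
print(max(probs, key=probs.get), round(probs["no"], 3))  # no 0.795
```

Practical implementations usually add Laplace smoothing to the counts so that one unseen attribute value cannot zero out a whole class.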
Problems & Solutions
Decision tree
1. Outlook = sunny, Humidity = high → NO
2. Outlook = sunny, Humidity = normal → YES
3. Outlook = overcast → YES
4. Outlook = rainy, Windy = TRUE → NO
5. Outlook = rainy, Windy = FALSE → YES
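The five rules can be read as nested if statements, with Outlook tested first. A minimal sketch: the function name `classify`, lowercase string values, and a bool for Windy are our assumptions. Temperature appears in no rule, so the function does not take it.

```python
def classify(outlook, humidity, windy):
    """Hypothetical encoding of the decision-tree rules: Outlook at the root,
    Humidity under 'sunny', Windy (a bool here) under 'rainy'."""
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    if outlook == "rainy":
        return "no" if windy else "yes"
    raise ValueError(f"unknown outlook: {outlook!r}")


print(classify("sunny", "normal", False))  # yes
print(classify("rainy", "high", True))     # no
```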
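Build root node
• Evaluate attributes: Outlook, Temperature, Humidity, Windy
• $\mathrm{entropy}(X) = -\sum_{x \in X} p(x)\log_2 p(x)$, taking $0 \log_2 0 = 0$
• For Outlook, the [yes, no] counts are sunny [2,3], overcast [4,0], rainy [3,2]
• $\mathrm{info}([2,3]) = \mathrm{entropy}\left(\tfrac{2}{5}, \tfrac{3}{5}\right) = 0.971$
• $\mathrm{info}([4,0]) = \mathrm{entropy}\left(\tfrac{4}{4}, \tfrac{0}{4}\right) = 0$
• $\mathrm{info}([3,2]) = \mathrm{entropy}\left(\tfrac{3}{5}, \tfrac{2}{5}\right) = 0.971$
• $\mathrm{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14}\,\mathrm{info}([2,3]) + \tfrac{4}{14}\,\mathrm{info}([4,0]) + \tfrac{5}{14}\,\mathrm{info}([3,2]) = 0.693$
→ $\mathrm{Gain}(\mathrm{outlook}) = \mathrm{info}([9,5]) - \mathrm{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247$, where [9,5] are the yes/no counts over all 14 instances
→ $\mathrm{Gain}(\mathrm{temperature}) = 0.029$, $\mathrm{Gain}(\mathrm{humidity}) = 0.152$, $\mathrm{Gain}(\mathrm{windy}) = 0.048$
→ Outlook has the highest gain, so it becomes the root node.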
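The gain figures above can be checked with a short script. This sketch assumes the classic 14-instance weather data set, whose counts reproduce every number on the slide; the helper names `entropy` and `info_gain` are our own. Information gain is the parent entropy minus the size-weighted entropy of the subsets produced by a split.

```python
from collections import Counter
from math import log2

# Assumed data: the classic weather data set.
# Columns: outlook, temperature, humidity, windy, play
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy")


def entropy(counts):
    """-sum p*log2(p) over class counts; zero counts are skipped (0*log 0 = 0)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def info_gain(rows, attr_index):
    """Parent entropy minus the weighted entropy after splitting on attr_index."""
    parent = entropy(list(Counter(r[-1] for r in rows).values()))
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attr_index], []).append(r[-1])
    weighted = sum(len(labels) / len(rows) * entropy(list(Counter(labels).values()))
                   for labels in subsets.values())
    return parent - weighted


gains = {name: info_gain(WEATHER, i) for i, name in enumerate(ATTRIBUTES)}
for name, g in gains.items():
    print(f"{name}: {g:.3f}")
# outlook: 0.247, temperature: 0.029, humidity: 0.152, windy: 0.048
print(max(gains, key=gains.get))  # outlook
```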
• Full decision tree
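The full tree can be grown by repeating the root-node step on each branch until every leaf is pure. This is the ID3-style procedure; we assume, without the slides saying so, that their tree was built this way, and we again assume the classic weather data set. The name `id3` and the tuple/dict tree representation are hypothetical choices for this sketch.

```python
from collections import Counter
from math import log2

# Assumed data: the classic weather data set.
# Columns: outlook, temperature, humidity, windy, play
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy")


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def id3(rows, attrs):
    """Return a class label at a pure (or attribute-exhausted) node; otherwise
    split on the attribute with the highest information gain.
    Internal nodes are (attribute_name, {value: subtree}) tuples."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]

    def remainder(i):
        """Size-weighted entropy of the subsets after splitting on attribute i."""
        groups = {}
        for r in rows:
            groups.setdefault(r[i], []).append(r[-1])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    # Highest gain == lowest remainder, since the parent entropy is shared.
    best = min(attrs, key=remainder)
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, [a for a in attrs if a != best])
    return (ATTRIBUTES[best], branches)


tree = id3(WEATHER, list(range(len(ATTRIBUTES))))
print(tree)  # Outlook at the root, Humidity under sunny, Windy under rainy
```

On this data the procedure yields the same five rules as the decision-tree slide, and Temperature is never selected.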
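Classifying steps (n-fold cross-validation):
for (i = 0; i < n; i++) {
  1. Split the data set into n folds
    • Training set: n-1 folds
    • Test set: 1 fold (fold i)
  2. Training → Model
  3. Test → Error rate
}
Average the error rates of the n runs; n = 10 is the usual choice.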
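The loop above can be sketched as a small n-fold cross-validation routine. Because the slides do not fix a classifier, this sketch plugs in a hypothetical majority-class stand-in (`train_majority`) and uses only the "play" labels of the classic weather data set, which we assume here. Instances are assigned to fold `j % n` without shuffling, a simplification; real setups usually shuffle and often stratify the folds. The error rate is pooled over all test instances.

```python
from collections import Counter

# Assumed data: "play" labels of the classic 14-instance weather data set,
# in its usual row order. The stand-in classifier ignores attributes.
LABELS = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]


def train_majority(train_labels):
    """Hypothetical stand-in learner: always predict the most common class."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validate(labels, n=10):
    """n-fold cross-validation: instance j goes to fold j % n. For each fold,
    train on the other n-1 folds and count errors on this fold.
    Returns the pooled error rate over all instances."""
    errors = 0
    for i in range(n):
        train = [y for j, y in enumerate(labels) if j % n != i]  # 1. split
        test = [y for j, y in enumerate(labels) if j % n == i]
        model = train_majority(train)                            # 2. training -> model
        errors += sum(1 for y in test if y != model)             # 3. test -> errors
    return errors / len(labels)


print(round(cross_validate(LABELS, n=10), 3))  # 0.357
```

Every training split keeps "yes" as the majority, so the stand-in misclassifies exactly the 5 "no" instances (5/14). Swapping in a real learner only changes `train_majority` and the prediction step.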
Advances
• Build training set
• Reinforcement learning and applications
• Machine learning algorithms
Demo
• Training with data set
• Test model
• Classifying