tech meetup data driven - codemotion

Special Codemotion Tech Meetup: Data Driven Innovation. Antimo Musone, IT Manager. 17 May 2016

Upload: antimo-musone

Post on 16-Apr-2017


Category:

Data & Analytics



TRANSCRIPT


Special Codemotion Tech Meetup: Data Driven Innovation

Antimo Musone, IT Manager, 17 May 2016

#

About Me

Antimo Musone

IT Manager / Architect at EY

Co-Founder, Fifth Ingenum Srls.

Computer Engineering, II Università degli Studi di Napoli

email: [email protected]

#


Index

What is Machine Learning?

Predictive Analytics

Machine Learning Overview

Defining Predictive Analytics

Supervised Learning

Unsupervised Learning

Watson Service

Cortana Suite

#

What is Machine Learning?

#

Machine learning can be described as computing systems that improve with experience. It can also be described as a method of turning data into software. "The goal of machine learning is to program computers to use example data or past experience to solve a given problem." (Introduction to Machine Learning, 2nd Edition, MIT Press)

Whatever term is used, the results remain the same: data scientists have successfully developed methods of creating software models that are trained from huge volumes of data and then used to predict certain patterns, trends, and outcomes.

Machine Learning / Predictive Analytics

Vision Analytics

Recommendation engines

Advertising analysis

Weather forecasting for business planning

Social network analysis

Legal discovery and document archiving

Pricing analysis

Fraud detection

Churn analysis

Equipment monitoring

Location-based tracking and services

Personalized Insurance

Machine learning and predictive analytics are core capabilities that are needed throughout your business.

#

Predictive analytics can be simply defined as a way to scientifically use the past to predict the future, to help drive desired outcomes.


Machine Learning Overview

Formal definition: "The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience." (Tom M. Mitchell)

Another definition: "The goal of machine learning is to program computers to use example data or past experience to solve a given problem." (Introduction to Machine Learning, 2nd Edition, MIT Press)

ML often involves two primary techniques:

Supervised Learning: Finding the mapping between inputs and outputs using correct values to train a model

Unsupervised Learning: Finding patterns in the input data (similar to Density Estimates in Statistics)
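The two techniques above can be contrasted in a few lines of Python. This is only a minimal sketch: the toy data, the 1-nearest-neighbour rule and the histogram-style estimate are illustrative assumptions, not methods named in the slides.

```python
from collections import Counter

# Supervised: every input comes with its correct output (hypothetical toy data).
labelled = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
            (8.0, "high"), (8.5, "high"), (9.0, "high")]

def predict_nearest(x, examples):
    """Return the label of the training input closest to x (1-nearest neighbour)."""
    _, nearest_label = min(examples, key=lambda pair: abs(pair[0] - x))
    return nearest_label

# Unsupervised: only inputs are available, so we look for structure in them.
unlabelled = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

def histogram_density(values, bin_width):
    """Crude density estimate: share of values that fall into each bin."""
    counts = Counter(int(v // bin_width) for v in values)
    return {b * bin_width: c / len(values) for b, c in sorted(counts.items())}

print(predict_nearest(2.4, labelled))                # label learned from known outputs
print(histogram_density(unlabelled, bin_width=5.0))  # pattern found without labels
```

In the supervised half the known labels drive the prediction; in the unsupervised half the only result is a description of where the inputs concentrate.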

#

Supervised Learning

Unsupervised Learning

Machine Learning

Data: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Rules, or Algorithms: Spelling and sounding build words ("about", "Learning", "language"). Words build sentences ("Learning about language.").

Learning, or Abstraction: Any new understanding proceeds from previous knowledge.

Data + Rules/ Algorithms = Machine Learning

#

The techniques underlying machine learning are used by all of us every day. Take language, for example: it is made up of an alphabet (the data) and of rules, such as the sound of a vowel or of a word, or the rules for building a sentence. Understanding comes from prior knowledge.

Data + Rules = Machine Learning

Traditional Programming vs Machine Learning

Traditional Programming: Data + Program -> Computer -> Output

Machine Learning: Data + Output -> Computer -> Program/Algorithms

Program can predict the output!

#

Under traditional programming models, programs and data are processed by the computer to produce a desired output, such as using programs to process data and produce a report.

When working with machine learning, the processing paradigm is altered dramatically. The data and the desired output are reverse-engineered by the computer to produce a new program.

The power of this new program is that it can effectively predict the output, based on the supplied input data. The primary benefit of this approach is that the resulting program has been trained (via massive quantities of learning data) and finely tuned (via feedback data about the desired output), and is now capable of predicting the likelihood of a desired output based on the provided data.
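A small sketch may make the reversal concrete. It assumes a temperature-conversion task and an ordinary least-squares line fit; both are illustrative choices, not taken from the slides. In the first function a person writes the rule; in the second, the rule is recovered from data plus desired outputs.

```python
# Traditional programming: a human writes the rule (the "program").
def fahrenheit_handwritten(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: the rule is recovered from data plus desired outputs.
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

celsius = [-10.0, 0.0, 10.0, 20.0, 30.0]     # data
fahrenheit = [14.0, 32.0, 50.0, 68.0, 86.0]  # desired output

slope, intercept = fit_line(celsius, fahrenheit)

def fahrenheit_learned(c):
    """The 'new program' produced by the fit."""
    return slope * c + intercept

print(slope, intercept)
print(fahrenheit_learned(100.0), fahrenheit_handwritten(100.0))
```

With noise-free examples the learned line matches the handwritten rule; with real data it would only approximate it, which is why the notes speak of predicting a likelihood rather than computing an exact answer.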

ML: No, more like gardening

Gardener = You

Seeds = Algorithms

Nutrients = Data

Plants = Programs

#

A classic example of predictive analytics can be found every day on Amazon.com: every time you search for an item, you are presented with an upsell section on the webpage that offers additional catalog items because "customers who bought this item also bought" those items. This is a great example of using predictive analytics and the psychology of human buying patterns to create a highly effective marketing strategy.
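The "also bought" idea can be sketched by counting which items share a basket. This is a deliberately simple illustration with made-up orders; it is not a description of how Amazon's recommender actually works.

```python
from collections import Counter
from itertools import permutations

# Hypothetical order history: each set is one customer's basket.
orders = [
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card"},
    {"camera", "camera_bag"},
    {"laptop", "mouse"},
    {"sd_card", "card_reader"},
]

def also_bought(orders):
    """For each item, count how often every other item appears in the same basket."""
    co_counts = {}
    for basket in orders:
        for item, other in permutations(basket, 2):
            co_counts.setdefault(item, Counter())[other] += 1
    return co_counts

def recommend(item, co_counts, top_n=1):
    """Return up to top_n items most frequently bought together with item."""
    ranked = co_counts.get(item, Counter()).most_common(top_n)
    return [other for other, _ in ranked]

co_counts = also_bought(orders)
print(recommend("camera", co_counts))
```

Past purchases are the "historical facts"; the ranked co-purchase counts are the prediction of what the next customer is likely to add.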

ML Sample Applications

Web search

Computational biology

Finance

E-commerce

Space exploration

Robotics

Information extraction

Social networks

Debugging

[Your favorite area]

#

Many examples of predictive analytics can be found literally everywhere today in our society:

Spam/junk email filters: These are based on the content, headers, origins, and even user behaviors (for example, always delete emails from this sender).

Mortgage applications: Typically, your mortgage loan and credit worthiness is determined by advanced predictive analytic algorithm engines.

Various forms of pattern recognition: These include optical character recognition (OCR) for routing your daily postal mail, speech recognition on your smart phone, and even facial recognition for advanced security systems.

Life insurance: Examples include calculating mortality rates, life expectancy, premiums, and payouts.

Medical insurance: Insurers attempt to determine future medical expenses based on historical medical claims and similar patient backgrounds.

Liability/property insurance: Companies can analyze coverage risks for automobile and home owners based on demographics.

Credit card fraud detection: This process is based on usage and activity patterns. In the past year, the number of credit card transactions has topped 1 billion. The popularity of contactless payments via near-field communications (NFC) has also increased dramatically over the past year due to smart phone integration.

Airline flights: Airlines calculate fees, schedules, and revenues based on prior air travel patterns and flight data.

Web search page results: Predictive analytics help determine which ads, recommendations, and display sequences to render on the page.

Predictive maintenance: This is used with almost everything we can monitor: planes, trains, elevators, cars, and yes, even data centers.

Health care: Predictive analytics are in widespread use to help determine patient outcomes and future care based on historical data and pattern matching across similar patient data sets.


What is Predictive Analytics?

Wikipedia definition (http://en.wikipedia.org/wiki/Predictive_analytics): Predictive analytics encompasses a variety of techniques from statistics, modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.

Facts

Predictions

Predictive Analytics Techniques

#

Breaking it Down

Predictive analytics encompasses a variety of techniques from statistics, modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.

Machine Learning: Use of computer algorithms to derive complex formulations based on objectives and constraints

Tools and Techniques: Data visualization, segmentation, correlations

Use in Predictive Analytics: Predictive analytics is often applied in the context of datasets that are too large for manual analysis, so data mining techniques are required

Statistics: Focus on learning population characteristics based on samples of data

Tools and Techniques: p-values, confidence intervals, sampling, ANOVA

Use in Predictive Analytics: Underlying theory behind many parametric models; observed facts are a sample from a population including both known/historic and unknown/future events

Modeling: Representations of systems used to understand the underlying dynamics of the system

Tools and Techniques: Symbolic logic, proxies

Use in Predictive Analytics: Complex relationships can be simplified through modeling; these models can then be used to analyze relationships between factors

#

What is a Model?

A model is a simplified representation of observed effects.

Key terms:

Dependent or target variable: the variable of interest

Independent or predictor variable(s): variable(s) used for explanation/prediction

Effect: the (quantitative) impact of an independent variable or combination of independent variables on the dependent variable

Main Effect: the direct effect of a single independent variable on the dependent variable

Interaction Effect: the effect of a combination of multiple independent variables on the dependent variable
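The key terms above can be tied together in one equation. As a sketch, assume a linear model with two independent variables; the symbols are illustrative and do not appear in the slides:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2 + \varepsilon
```

Here y is the dependent (target) variable and x_1, x_2 are the independent (predictor) variables. The coefficients \beta_1 and \beta_2 are the main effects, \beta_{12} is the interaction effect of x_1 and x_2 acting together, \beta_0 is the baseline value, and \varepsilon is whatever the model leaves unexplained.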

#

Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or "predictors").

Two Types of Model

A model is a simplified representation of observed effects.

Statistical: Parametric Models

Effects are well-quantified and can be examined

An equation can be used to represent the model

Emphasis on explanation: What causes the dependent variable to change?

Test hypotheses: p-values, confidence intervals

Machine Learning: Non-parametric Models

Effects may be unquantified (black box)

No representative equation

Model may be stochastic, so results may vary

Emphasis on prediction: What will the value of the next observation be?

Generate hypotheses

#


Types of Learning

Supervised (inductive) learning:

  Training data includes desired outputs

  Dependent variable is known

  May be statistical or non-statistical

Unsupervised learning:

  Training data does not include desired outputs

  No dependent variable

  Non-statistical

Semi-supervised learning:

  Training data includes a few desired outputs

#

Supervised learning is a type of machine learning algorithm that uses known datasets to create a model that can then make predictions. The known data sets are called training data sets and include input data elements along with known response values.

In the case of unsupervised machine learning, the task of making predictions becomes much harder. In this scenario, the machine learning algorithms are not provided with any known outputs from which to generate a new predictive model. The success of the new predictive model depends entirely on the ability to infer and identify patterns, structures, and relationships in the incoming data set.

Machine Learning Problems

              Supervised Learning                 Unsupervised Learning
Discrete      Classification or Categorization    Clustering
Continuous    Regression                          Dimensionality reduction

#

Classification algorithms: These are used to classify data into different categories that can then be used to predict one or more discrete variables, based on the other attributes in the dataset.

Regression algorithms: These are used to predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.

Clustering algorithms: These determine natural groupings and patterns in datasets and are used to predict grouping classifications for a given variable.

One of the most common unsupervised learning algorithms is known as cluster analysis, which is used to find hidden patterns or groupings within data sets. Some common examples of cluster analysis classifications include the following:

Socioeconomic tiers: Income, education, profession, age, number of children, size of city or residence, and so on.

Psychographic data: Personal interests, lifestyle, motivation, values, involvement.

Social network graphs: Groups of people related to you by family, friends, work, schools, professional associations, and so on.

Purchasing patterns: Price range, type of media used, intensity of use, choice of retail outlet, fidelity, buyer or nonbuyer, buying intensity.

The other type of approach to unsupervised machine learning is to use a reward system, rather than any kind of teaching aids, as are commonly used in supervised learning. Positive and negative rewards are used to provide feedback to the predictive model when it has been successful.
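To make cluster analysis concrete, here is a minimal sketch of k-means on one-dimensional data. The choice of k-means, the income figures and the starting centroids are all assumptions made for illustration; the slides do not name a specific clustering algorithm.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data.

    Each round assigns every value to its nearest centroid, then moves
    each centroid to the mean of the values assigned to it.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical yearly incomes (in thousands); no labels are supplied.
incomes = [18, 20, 22, 25, 60, 62, 65, 70]
centroids, clusters = k_means_1d(incomes, centroids=[20, 30])
print(centroids)
print(clusters)
```

No "socioeconomic tier" label is ever given to the algorithm; the two groups emerge from the distances between the values alone, which is the sense in which the learning is unsupervised.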


What is Logistic Regression?

Regression models are a form of supervised learning that attempt to fit linear functions to training data. The most common type of regression, linear regression, should be familiar to most of you as a best-fit line.

Logistic regression is closely related to linear regression, but fits a differently shaped (S-shaped) function by applying a logit link function to a binomial dependent variable.
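The relationship to linear regression can be sketched in code: the model is still a line, w * x + b, but it is passed through the logistic (sigmoid) function so the output reads as a probability. The data, the learning rate and the use of plain gradient descent are illustrative assumptions, not details from the slides.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data: hours studied and exam result (1 = pass, 0 = fail).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(sigmoid(w * 1.0 + b))  # estimated pass probability after 1 hour
print(sigmoid(w * 4.0 + b))  # estimated pass probability after 4 hours
```

Thresholding the predicted probability at 0.5 turns the fitted curve into a two-class classifier, which is how logistic regression is typically used for discrete outcomes.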

#

In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or "predictors").

Machine Learning Example

Predict function F(X) for new examples X

Discrete F(X): Classification

Continuous F(X): Regression

F(X) = Probability(X): Probability estimation

Given examples of a function (X, F(X))

The probability of an event X, denoted F(X), represents the proportion of all events that have X as their outcome, and is typically represented as a decimal 0