data analytics basics & understanding
8/12/2019 Data Analytics Basics & Understanding
http://slidepdf.com/reader/full/data-analytics-basics-understanding 1/37
Introduction to Data Analytics
Prof. Rudra Pradhan
IIT Kharagpur
Preamble
• What is data analytics?
• Why data analytics?
• How is it different from data analysis?
• What are its requirements?
• Course coverage
What is Data Analytics
• Analytics is the discovery and communication of meaningful patterns in
data.
• Especially valuable in areas rich with recorded information, analytics
relies on the simultaneous application of statistics, econometrics, computer programming and operations research to quantify performance.
• Analytics often favors data visualization to communicate insight.
What is Data Analysis
• Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering
useful information, suggesting conclusions, and supporting
decision making.
• Data analysis has multiple facets and approaches,
encompassing diverse techniques under a variety of names, in
different business, science, and social science domains.
Related Issues
• Data mining is a particular data analysis technique that focuses on
modeling and knowledge discovery for predictive rather than purely
descriptive purposes.
• Business intelligence covers data analysis that relies heavily on aggregation, focusing on business information.
Structure of Data Analysis
• Descriptive statistics
• Exploratory data analysis (EDA)
• Confirmatory data analysis (CDA)
EDA focuses on discovering new features in the data, while CDA focuses on
confirming or falsifying existing hypotheses.
Related Issues
• Predictive analytics and text analytics:
PA focuses on the application of statistical or structural models for
predictive forecasting, while TA applies statistical and structural
techniques to extract and classify information from textual sources,
a species of unstructured data.
• Data integration is a precursor to data analysis, and data analysis is
closely linked to data visualization and data dissemination.
• The term data analysis is sometimes used as a synonym for data
modeling.
Analytics Vs. Analysis
• Analytics is a multi-dimensional discipline. There is extensive use of
mathematics and statistics, and of descriptive techniques and predictive
models, to gain valuable knowledge from data (data analysis). The insights
from data are used to recommend action or to guide decision making
rooted in business context.
• Analytics is not so much concerned with individual analyses or analysis
steps, but with the entire methodology. There is a pronounced tendency to
use the term analytics in business settings, e.g. text analytics vs. the more
generic text mining, to emphasize this broader perspective.
• Advanced analytics is a term typically used to describe the technical aspects of analytics, especially predictive modeling and machine learning techniques like
artificial neural networks.
Why Data Analytics
• Marketing optimization
• Portfolio management
• Risk management
• Stock market prediction
• Financial market forecasting
• Digital analytics
Few Questions
• How to set a perfect path?
• Do you need support?
• Do you need criteria?
• Do you need tricks?
• Is it reliable?
Principles of Modelling
Object/System
Why? What are we looking for
Find? What do we want to know
Model: Variables, Parameters
Model Prediction
Valid?
Accepted predictions
Test
Basic Understandings
• Data
• Variables
• Scaling
• Models: S&M
• Tools: statistics, mathematics, econometrics, operations research
• Statistical Modeling
• Mathematical Modeling
• Soft Computing
Modeling Structure
• Theory
• Assumptions
• Objectives
• Constraints
Modelling: it shows the relationships, direct and indirect, and the interrelationships of actions and reactions in terms of cause and effect.
Two types: Descriptive and predictive
Both dynamic and static
Examples of the kind of problems that
may be solved by an Econometrician
1. Testing whether financial markets are weak-form informationally efficient.
2. Testing whether the CAPM or APT represent superior
models for the determination of returns on risky assets.
3. Measuring and forecasting the volatility of bond returns.
4. Explaining the determinants of bond credit ratings used by the ratings agencies.
5. Modelling long-term relationships between prices and exchange rates.
Examples of the kind of problems that
may be solved by an Econometrician (cont'd)
6. Determining the optimal hedge ratio for a spot position in oil.
7. Testing technical trading rules to determine which makes
the most money.
8. Testing the hypothesis that earnings or dividend announcements have no effect on stock prices.
9. Testing whether spot or futures markets react more rapidly
to news.
10. Forecasting the correlation between the returns to the stock indices of two countries.
What are the Special Characteristics
of Financial Data?
• Frequency & quantity of data
Stock market prices are measured every time there is a trade or
somebody posts a new quote.
• Quality
Recorded asset prices are usually those at which the transaction took
place. No possibility for measurement error but financial data are "noisy".
Types of Data and Notation
• There are 3 types of data which econometricians might use for analysis:
1. Time series data
2. Cross-sectional data
3. Panel data, a combination of 1. & 2.
• The data may be quantitative (e.g. exchange rates, stock prices, number of shares outstanding), or qualitative (e.g. day of the week).
• Examples of time series data
Series                            Frequency
GNP or unemployment               monthly, or quarterly
government budget deficit         annually
money supply                      weekly
value of a stock market index     as transactions occur
Types of Data and Notation (cont'd)
• Examples of Problems that Could be Tackled Using a Time Series Regression
- How the value of a country's stock index has varied with that country's
macroeconomic fundamentals.
- How the value of a company's stock price has varied when it announced the
value of its dividend payment.
- The effect on a country's currency of an increase in its interest rate
• Cross-sectional data are data on one or more variables collected at a single
point in time, e.g.
- A poll of usage of internet stock broking services
- Cross-section of stock returns on the New York Stock Exchange
- A sample of bond credit ratings for UK banks
Types of Data and Notation (cont'd)
• Examples of Problems that Could be Tackled Using a Cross-Sectional Regression
- The relationship between company size and the return to investing in its shares
- The relationship between a country's GDP level and the probability that the
government will default on its sovereign debt.
• Panel data has the dimensions of both time series and cross-sections, e.g. the
daily prices of a number of blue chip stocks over two years.
• It is common to denote each observation by the letter t and the total number of
observations by T for time series data, and to denote each observation by the
letter i and the total number of observations by N for cross-sectional data.
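As a rough illustration of this notation, the sketch below stores a tiny panel as a dictionary keyed by (i, t) and slices it into one time series and one cross-section. The tickers "AAA" and "BBB", the prices, and the helper names are invented for the example and do not come from the slides.

```python
# Hypothetical panel: prices of N = 2 stocks over T = 3 days (values invented).
panel = {
    ("AAA", 1): 10.0, ("AAA", 2): 10.5, ("AAA", 3): 10.2,
    ("BBB", 1): 20.0, ("BBB", 2): 19.8, ("BBB", 3): 20.4,
}


def time_series(panel, i):
    """All observations for one unit i, ordered by time t = 1..T."""
    return [panel[key] for key in sorted(panel) if key[0] == i]


def cross_section(panel, t):
    """All observations at a single point in time t, one per unit i."""
    return {key[0]: value for key, value in sorted(panel.items()) if key[1] == t}


series_aaa = time_series(panel, "AAA")    # T = 3 observations of one stock
section_day2 = cross_section(panel, 2)    # N = 2 observations on one day
```

Fixing i and varying t gives a time series; fixing t and varying i gives a cross-section; keeping both indices gives the panel.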
Returns in Financial Modelling
• It is preferable not to work directly with asset prices, so we usually convert the raw prices into a series of returns. There are two ways to do this:
Simple returns: $R_t = \frac{p_t - p_{t-1}}{p_{t-1}} \times 100\%$
or log returns: $R_t = \ln\left(\frac{p_t}{p_{t-1}}\right) \times 100\%$
where $R_t$ denotes the return at time t,
$p_t$ denotes the asset price at time t,
ln denotes the natural logarithm.
• We also ignore any dividend payments, or alternatively assume that the price series have been already adjusted to account for them.
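The two return definitions on this slide can be sketched in a few lines of Python. The price list is invented for illustration, and the function names are hypothetical rather than taken from the course material.

```python
import math


def simple_return(p_prev, p_now):
    """Simple return in percent: (p_t - p_{t-1}) / p_{t-1} * 100."""
    return (p_now - p_prev) / p_prev * 100.0


def log_return(p_prev, p_now):
    """Log return in percent: ln(p_t / p_{t-1}) * 100."""
    return math.log(p_now / p_prev) * 100.0


prices = [100.0, 102.0, 99.0]  # invented prices p_0, p_1, p_2

# Pair each price with its predecessor to get one return per period.
simple_returns = [simple_return(a, b) for a, b in zip(prices, prices[1:])]
log_returns = [log_return(a, b) for a, b in zip(prices, prices[1:])]
```

For small price changes the two measures are close; the log return is never larger than the simple return for the same price move.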
Log Returns
• The log returns are also known as log price relatives, which will be used throughout this
book. There are a number of reasons for this:
1. They have the nice property that they can be interpreted as continuously
compounded returns.
2. Can add them up, e.g. if we want a weekly return and we have calculated daily log returns:
$r_1 = \ln(p_1/p_0) = \ln p_1 - \ln p_0$
$r_2 = \ln(p_2/p_1) = \ln p_2 - \ln p_1$
$r_3 = \ln(p_3/p_2) = \ln p_3 - \ln p_2$
$r_4 = \ln(p_4/p_3) = \ln p_4 - \ln p_3$
$r_5 = \ln(p_5/p_4) = \ln p_5 - \ln p_4$
$r_1 + r_2 + r_3 + r_4 + r_5 = \ln p_5 - \ln p_0 = \ln(p_5/p_0)$
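A minimal check of this additivity property, using six invented daily prices (p_0 to p_5): the sum of the five daily log returns should match the log return computed directly from the first and last price.

```python
import math

daily_prices = [50.0, 51.0, 50.5, 52.0, 51.5, 53.0]  # p_0 .. p_5, invented

# r_t = ln(p_t / p_{t-1}) for t = 1..5
daily_log_returns = [
    math.log(p_now / p_prev)
    for p_prev, p_now in zip(daily_prices, daily_prices[1:])
]

weekly_from_daily = sum(daily_log_returns)                    # r_1 + ... + r_5
weekly_direct = math.log(daily_prices[-1] / daily_prices[0])  # ln(p_5 / p_0)
```

The intermediate prices cancel in the sum, which is why the two weekly figures agree up to floating-point rounding.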
A Disadvantage of using Log Returns
• There is a disadvantage of using the log-returns. The simple return on a
portfolio of assets is a weighted average of the simple returns on the
individual assets:
$R_{pt} = \sum_{i=1}^{N} w_{ip} R_{it}$
• But this does not work for the continuously compounded returns.
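The following sketch illustrates the point with a hypothetical two-asset portfolio. The weights and prices are invented; returns are kept as fractions rather than percentages to keep the arithmetic short.

```python
import math

weights = [0.6, 0.4]            # w_ip, invented portfolio weights summing to 1
prices_start = [100.0, 50.0]    # invented prices at t-1
prices_end = [110.0, 45.0]      # invented prices at t

simple = [(e - s) / s for s, e in zip(prices_start, prices_end)]
logs = [math.log(e / s) for s, e in zip(prices_start, prices_end)]

# Simple returns aggregate linearly across assets: R_pt = sum_i w_ip * R_it.
portfolio_simple = sum(w * r for w, r in zip(weights, simple))

# The portfolio's true log return follows from its simple return ...
portfolio_log = math.log(1.0 + portfolio_simple)

# ... and differs from the weighted average of the assets' log returns.
weighted_logs = sum(w * r for w, r in zip(weights, logs))
```

Here the portfolio's simple return is 2%, while the weighted average of the log returns understates the portfolio's log return, so the weighted-average rule does not carry over.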
Steps involved in the formulation of
econometric models
Economic or Financial Theory (Previous Studies)
Formulation of an Estimable Theoretical Model
Collection of Data
Model Estimation
Is the Model Statistically Adequate?
No: Reformulate Model
Yes: Interpret Model
Use for Analysis
Some Points to Consider When reading papers
in the academic finance literature
1. Does the paper involve the development of a theoretical model or is it
merely a technique looking for an application, or an exercise in data
mining?
2. Is the data of "good quality"? Is it from a reliable source? Is the size of
the sample sufficiently large for asymptotic theory to be invoked?
3. Have the techniques been validly applied? Have diagnostic tests been conducted for violations of any assumptions made in the
estimation
of the model?
Some Points to Consider When reading papers
in the academic finance literature (cont'd)
4. Have the results been interpreted sensibly? Is the strength of the results
exaggerated? Do the results actually address the questions posed by the
authors?
5. Are the conclusions drawn appropriate given the results, or has the
importance of the results of the paper been overstated?
Objectives of Data Analytics
• Data reduction
• Structural simplification
• Analysis of dependence
• Analysis of interdependence
• Prediction & Forecasting
• Hypotheses construction and testing
• Strategy and policy implications
Course Modules
Module 1: Basic Applied Econometrics
Basics, probability distribution, regression analysis, issues and problems of regression analysis
Module 2: Advanced Econometrics
RRM, PDM, SEM
Module 3: Time series Econometrics
Integration and co-integration, VAR modelling, volatility modelling, bootstrapping
Module 4: Optimization Tools
Simple LPP, Integer programming, Goal programming, Simulation, AHP, WLP
Module 5: Soft computing
ANN, FL, GA, SVM
Modelling Structure
• Univariate structure
Central tendency, dispersion, skewness, kurtosis
• Bivariate structure
Covariance, correlation, regression
• Multivariate structure
Correlation, regression, factor analysis, conjoint analysis, cluster analysis, path
analysis, MDS, AHP, SEM
Statistical Modelling: A Basic Framework
Object/System
Research Design/Choice/Creativity
Univariate Modelling
Bivariate Modelling
Multivariate Modelling
Data Analysis
Interpretation and Conclusion
Research Process
Step 1: Define Research Problem
Step 2: Review of Literature
(Review concepts and theories;
Review previous research findings)
Step 3: Formulate Hypotheses
Step 4: Research Design
Step 5: Data Collection
Step 6: Data Analysis
Step 7: Interpretation
Soft Computing: Basics
• Soft computing is a term applied to a field within computer science which
is characterized by the use of inexact solutions to computationally hard
tasks such as the solution of non-deterministic polynomial (NP)-complete
problems, for which there is no known algorithm that can compute an
exact solution in polynomial time.
• Soft computing differs from conventional (hard) computing in that, unlike
hard computing, it is tolerant of imprecision, uncertainty, partial truth, and
approximation. In effect, the role model for soft computing is the human
mind.
Tools of Soft Computing
• Artificial neural networks (ANN)
• Support Vector Machines (SVM)
• Fuzzy logic (FL)
• Evolutionary computation (EC), including:
  – Evolutionary algorithms
    • Genetic algorithms
    • Differential evolution
  – Metaheuristic and Swarm Intelligence
    • Ant colony optimization
    • Particle swarm optimization
• Ideas about probability including:
  – Bayesian network
• Chaos theory
• Wavelet analysis
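To give a flavour of one tool from this list, here is a minimal genetic algorithm sketch on a toy problem ("OneMax": maximize the number of 1 bits in a string). The function name, the parameter values, and the choice of tournament selection with one-point crossover are all assumptions made for illustration; the slides do not prescribe a particular variant.

```python
import random


def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=40,
                      mutation_rate=0.02, seed=0):
    """Maximize fitness over bit strings; returns (best_individual, history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: carry the best over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of 3 random individuals, done twice.
            mum = max(rng.sample(population, 3), key=fitness)
            dad = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover position
            child = mum[:cut] + dad[cut:]
            # Flip each bit independently with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy objective: fitness is simply the number of 1 bits.
best, history = genetic_algorithm(sum)
```

The search is heuristic: it does not guarantee the exact optimum, which matches the tolerance for approximation that characterizes soft computing. Elitism ensures the best fitness never decreases from one generation to the next.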
Uni-variate Statistics
• Central Tendency
• Dispersion
• Skewness
• Kurtosis
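These four summaries can be computed with the Python standard library, as in the sketch below. The data are invented, and the skewness and kurtosis use the population (divide-by-n) standardized moments, which is one of several conventions in use.

```python
import statistics


def describe(values):
    """Univariate summary: central tendency, dispersion, skewness, kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Standardized third and fourth moments (population versions).
    skewness = sum(((x - mean) / sd) ** 3 for x in values) / n
    kurtosis = sum(((x - mean) / sd) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std_dev": sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


summary = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # invented data
```

A positive skewness, as in this sample, indicates a longer right tail; a normal distribution has kurtosis 3 under this convention.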
Bi-variate Statistics
• Covariance
• Correlation
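A short sketch of both measures for two invented series follows. It uses the sample (divide-by-n-minus-1) covariance, which is an assumption; the slides do not say which convention the course adopts.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length series (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented data
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = covariance(x, y)
corr_xy = correlation(x, y)
```

Covariance depends on the units of x and y, whereas correlation is unit-free and always lies between -1 and 1.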
Why Multivariate Modelling
• Applicability: Client fields use these techniques.
• Quantification: Create the habit of looking at the strength of a
relationship, not just the significance.
• Creativity: Make introductory statistics give techniques that let
students express their own interests.
• Empowerment: Move from paradox to understanding.
How to Teach Multivariate Modelling to Intro. Students
Replace algebra with computation, simulation and geometry.
• Simulation:
Confidence intervals via bootstrapping; hypothesis testing via randomization of explanatory variables.
• Geometry:
Regression as projection; ANOVA as
Pythagorean vector decomposition; p-values from subtended angles.
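The two simulation ideas can be sketched with the standard library alone. The code below is an illustrative version under stated assumptions: a percentile bootstrap for the confidence interval, and a randomization test in which the explanatory variable is group membership and the statistic is the difference in means. Function names, sample values and resample counts are invented.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.fmean, n_resamples=2000,
                 level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]


def randomization_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Two-sided p-value for a difference in means, by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomize which values belong to which group
        diff = abs(statistics.fmean(pooled[:n_a])
                   - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


sample = [4.1, 5.0, 5.6, 4.8, 5.3, 6.1, 4.4, 5.2]   # invented data
low, high = bootstrap_ci(sample)

group_a = [1.0, 1.2, 0.9, 1.1, 1.0]                 # invented data
group_b = [3.0, 3.2, 2.9, 3.1, 3.0]
p_value = randomization_p_value(group_a, group_b)
```

Neither function relies on a distributional formula: the interval and the p-value come entirely from resampling the observed data, which is the sense in which computation replaces algebra.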
Data Modelling and Packaged Software
• SPSS
• EVIEWS
• MICROFIT
• GAUSS
• LIMDEP
• MATLAB
• AMOS
• MINITAB
• STATISTICA
• RATS
• SYSTAT
• STATA
• LISREL
• SAS
• TSP
• SHAZAM
• DEA