quantitative analysis report

Upload: manoj-hariharan

Post on 14-Apr-2018


  • 7/30/2019 Quantitative Analysis Report


    STATISTICAL INFERENCES AND REGRESSION ANALYSIS IN CRICKET

    SUBMITTED BY

    GAGANDEEP SINGH - 12PGP015

    MANOJ H - 12PGP026

    NIKESH AGARWAL - 12PGP030

    SOURAV MONDAL - 12PGP042

    VIJAYKRISHNAN G - 12PGP016


    ABSTRACT

    Cricket is a sport that makes extensive use of statistical tools to represent and analyze data. In this project, we set out to find whether the impact of the toss on match results differs between day and day-night matches. For this inference, we used a two-population hypothesis test to compare the means of the day and day-night populations. The findings showed that the toss's impact on the result differs very little between day and day-night matches. We also estimated, with ninety percent confidence, the likely interval for the runs scored by the Indian team while chasing against Pakistan, using single-population estimation; the population consisted of all matches in which India faced Pakistan and batted second. In addition, we studied the compensation of IPL players and tried to establish the relationship between players' skills, as measured by their statistical attributes, and the compensation they are paid, using simple and multiple linear regression analysis.

    GAGANDEEP SINGH - 12PGP015

    VIJAY KRISHNAN G - 12PGP016

    MANOJ H - 12PGP026

    NIKESH AGARWAL - 12PGP030

    SOURAV MONDAL - 12PGP042


    ACKNOWLEDGEMENT

    We would like to sincerely thank Prof. Naval Bajpai, Indian Institute of Management Raipur, for his valuable guidance on this project from conception to completion.

    We would also like to thank our beloved Prof. B.S. Sahay, Director of the Indian Institute of Management Raipur, for his support throughout the project.

    We also thank all the anonymous referees for their valuable comments on the report.

    Last but not least, we thank our classmates for their encouragement and support.

    http://iimraipur.ac.in/pdf/nbajpai.pdf

    TABLE OF CONTENTS

    ABSTRACT
    ACKNOWLEDGEMENT
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
    1.1 CRICKET
    1.2 STATISTICS IN CRICKET
    1.2.1 INDIVIDUAL STATISTICS
    1.2.2 TEAM STATISTICS
    1.3 APPLICATION OF TOOLS
    1.3.1 PIE CHART
    1.3.2 WAGON-WHEEL
    1.3.3 WORM GRAPH
    1.3.4 MANHATTAN CHART
    1.4 OBJECTIVE OF THE PROJECT
    1.5 STATISTICAL TOOLS EMPLOYED
    1.5.1 CHARTS AND GRAPHS
    1.5.2 SINGLE POPULATION ESTIMATION
    1.5.3 HYPOTHESIS TESTING FOR TWO POPULATION
    1.5.4 SIMPLE LINEAR REGRESSION
    1.5.5 MULTIPLE LINEAR REGRESSION
    CHAPTER 2 LITERATURE REVIEW
    CHAPTER 3 RESEARCH METHODOLOGY
    3.1 WINNING PERCENTAGE USING PIE CHART
    3.1.1 OBJECTIVE


    3.1.2 POPULATION
    3.1.3 PIE CHART
    3.1.4 INFERENCES
    3.2 CAPTAINCY RECORD CALCULATION USING BAR CHART
    3.2.1 OBJECTIVE
    3.2.2 POPULATION
    3.2.3 INFERENCES
    3.3 ACHIEVABLE SCORE AT THE END OF 50 OVERS
    3.3.1 POPULATION AND SAMPLING
    3.3.2 TECHNIQUE EMPLOYED
    3.4 DIFFERENCE IN IMPACT OF TOSS BETWEEN DAY AND DAY-NIGHT MATCHES
    3.4.1 POPULATION AND SAMPLING
    3.4.2 TECHNIQUE EMPLOYED
    3.5 VALUATION OF PLAYERS IN IPL
    3.5.1 REGRESSION
    CHAPTER 4 STATISTICAL ANALYSIS AND INTERPRETATION
    4.1 ESTIMATION OF SINGLE POPULATION
    4.1.1 SET NULL AND ALTERNATE HYPOTHESIS
    4.1.2 DETERMINE APPROPRIATE STATISTICAL TEST
    4.1.3 LEVEL OF SIGNIFICANCE
    4.1.4 SET THE DECISION RULE
    4.1.5 COLLECTION OF DATA
    4.1.6 ANALYZE THE DATA
    4.1.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
    4.2 HYPOTHESIS TESTING FOR TWO POPULATION
    4.2.1 SET NULL AND ALTERNATE HYPOTHESIS
    4.2.2 DETERMINE APPROPRIATE STATISTICAL TEST
    4.2.3 LEVEL OF SIGNIFICANCE
    4.2.4 SET THE DECISION RULE
    4.2.5 COLLECTION OF DATA
    4.2.6 ANALYZE THE DATA
    4.2.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
    4.3 REGRESSION ANALYSIS OF IPL VALUATION OF PLAYERS
    4.4 REGRESSION ANALYSIS


    4.4.1 AMOUNT VERSUS STRIKE RATE
    4.4.2 ANALYSIS OF VARIANCE
    4.4.3 AMOUNT VERSUS WICKETS, STRIKE RATE
    4.4.4 ANALYSIS OF VARIANCE
    4.5 DESCRIPTION OF STATISTICS OF BATSMAN
    4.5.1 AMOUNT (IN US DOLLARS) VERSUS RUNS
    4.5.2 ANALYSIS OF VARIANCE
    4.5.3 AMOUNT (IN US DOLLARS) VERSUS RUNS, AVERAGE
    4.5.4 ANALYSIS OF VARIANCE
    CHAPTER 5 DISCUSSIONS
    5.1 BOWLERS
    5.2 BATSMEN
    5.2.1 REASONS FOR NON-EXPLANATION
    CHAPTER 6 CONCLUSION
    6.1 LIMITATIONS
    6.2 FUTURE SCOPE
    REFERENCES


    LIST OF FIGURES

    FIGURE 3.1 PIE CHART FOR WINNING PERCENTAGE
    FIGURE 4.1 RESIDUAL PLOTS FOR BOWLERS
    FIGURE 4.2 RESIDUAL PLOTS FOR AMOUNT

    LIST OF TABLES

    TABLE 3.1 POPULATION DATA
    TABLE 3.2 INDIA'S WINNING RECORD UNDER MS DHONI
    TABLE 3.3 MS DHONI'S CAPTAINCY RECORD
    TABLE 4.1 DISTRIBUTION PLOT
    TABLE 4.2 DESCRIPTION OF VARIABLES
    TABLE 4.3 DESCRIPTION OF STATISTICS OF BOWLERS
    TABLE 4.4 BATSMAN STATISTICS


    CHAPTER 1 INTRODUCTION

    1.1 CRICKET

    The game of cricket has fascinated many statisticians because of the sheer amount and variety of statistics it generates. Individual statistics are recorded for each player during a match and aggregated over a career, for batting and bowling, across formats. Team statistics are recorded and maintained separately for each team in the different formats of the game, such as Test matches, One Day Internationals (ODIs), Twenty20s, First-Class matches and List-A matches. Test matches are the international variant of First-Class matches, so their statistics are included in the First-Class statistics of an individual or team. Similarly, ODIs are a variant of List-A matches, so their statistics are included in the List-A statistics of an individual or team.

    1.2 STATISTICS IN CRICKET

    The applications of statistics in cricket are very diverse, ranging from analysis of a team's or player's performance in a particular match or over a period of time to a comprehensive study of how the various aspects of the game have evolved. For example, match statistics can be used to estimate a particular player's impact on the outcome, and this impact, taken over a period of time, serves as a performance indicator for the player. Analysis of general statistics across the different formats of cricket yields venue-based and team-based statistics, and an in-depth analysis of these reveals many clues about how the game has evolved over the years.

    1.2.1 INDIVIDUAL STATISTICS

    These are generally calculated for each individual player, either for a certain set of matches or aggregated over his career.

    o Matches Played

    o Runs Scored

    o Highest Score

    o Batting/Bowling Averages

    o Centuries, Strike Rate

    o Maiden Overs

    o Economy Rate

    o Best Bowling

    o Wickets

    o Partnerships

    o Catches & Stumpings

    o Captaincy Statistics
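The list above names several statistics without defining them. As a minimal sketch, assuming the standard scorecard definitions (the report does not spell them out), a few of them can be computed as below; the function names and the figures in the usage lines are hypothetical. The bowling strike rate here is balls bowled per wicket, the definition the report itself uses in Table 4.2.

```python
# Standard formulas for some of the individual statistics listed above.
# Function names and all figures in the usage lines are hypothetical.


def batting_average(runs: int, dismissals: int) -> float:
    """Runs scored per dismissal (not-out innings add runs but no dismissal)."""
    if dismissals == 0:
        raise ValueError("batting average is undefined with no dismissals")
    return runs / dismissals


def batting_strike_rate(runs: int, balls_faced: int) -> float:
    """Runs scored per 100 balls faced."""
    return 100 * runs / balls_faced


def bowling_average(runs_conceded: int, wickets: int) -> float:
    """Runs conceded per wicket taken."""
    return runs_conceded / wickets


def bowling_strike_rate(balls_bowled: int, wickets: int) -> float:
    """Balls bowled per wicket taken (the definition used in Table 4.2)."""
    return balls_bowled / wickets


def economy_rate(runs_conceded: int, balls_bowled: int) -> float:
    """Runs conceded per six-ball over."""
    return runs_conceded / (balls_bowled / 6)


if __name__ == "__main__":
    print(batting_average(450, 12))       # 37.5
    print(batting_strike_rate(450, 360))  # 125.0
    print(bowling_average(300, 15))       # 20.0
    print(bowling_strike_rate(270, 15))   # 18.0
    print(economy_rate(300, 270))         # 6.666666666666667
```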


    1.2.2 TEAM STATISTICS

    These are generally calculated for the team as a whole, taking the statistics of all individual players into account.

    o Match Results

    o Result Margins

    o Series Results

    o Innings Totals

    o Match Scores

    o Run Rate

    o Extras etc.
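Computing a team's run rate needs some care, because overs are written as overs and balls rather than decimals: 45.3 overs means 45 overs and 3 balls (273 balls), not 45.3 × 6. The following sketch shows the conversion, assuming six-ball overs; the helper names and figures are hypothetical.

```python
def overs_to_balls(overs: str) -> int:
    """Convert cricket overs notation such as "45.3" (45 overs, 3 balls) to balls.

    The digit after the point counts balls (0-5), not tenths of an over.
    """
    whole, _, part = overs.partition(".")
    balls = int(part) if part else 0
    if not 0 <= balls <= 5:
        raise ValueError(f"invalid overs notation: {overs!r}")
    return int(whole) * 6 + balls


def run_rate(runs: int, overs: str) -> float:
    """Team run rate: runs per six-ball over."""
    return runs / (overs_to_balls(overs) / 6)


if __name__ == "__main__":
    print(overs_to_balls("45.3"))   # 273
    print(run_rate(182, "45.3"))    # 4.0
```

Taking overs as a string rather than a float avoids treating "45.3" as a decimal fraction of an over.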

    1.3 APPLICATION OF TOOLS

    Of late, television coverage has had a profound impact on the sport and has provided a strong impetus to develop interesting forms of statistical representation for viewers. Television networks have thus pioneered several innovative ways of presenting cricket statistics. Some of the most widely used new forms of statistical representation include:

    1.3.1 PIE CHART

    Pie charts are one of the most widely used methods of representing cricket statistics. A pie chart is a circular chart divided into sectors, and the size of each sector is proportional to the share of the total quantity it represents. For example, the extras in an innings can be presented as a pie chart with sectors for leg-byes, no-balls, wides, etc.

    1.3.2 WAGON-WHEEL

    A wagon wheel displays a 2D or 3D plot of the shots played or runs scored by a player or team on an overhead view of the cricket field.

    1.3.3 WORM GRAPH

    A worm graph represents the runs scored and wickets taken during an innings, plotting the cumulative runs against time or balls bowled during a match.

    1.3.4 MANHATTAN CHART

    A Manhattan chart represents the runs scored and wickets taken in each over of a match. It is a variant of the bar graph, named for its resemblance to the Manhattan skyline.
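The worm graph and the Manhattan chart can be seen as two views of the same over-by-over data: the Manhattan bars are the runs in each over, and the worm is their running total. A short sketch with hypothetical per-over scores:

```python
from itertools import accumulate

# Hypothetical runs scored in each of the first ten overs of an innings.
runs_per_over = [4, 7, 2, 10, 5, 0, 8, 6, 3, 12]

# Manhattan chart: one bar per over, so the (over, runs) pairs are the bar heights.
manhattan_bars = list(enumerate(runs_per_over, start=1))

# Worm graph: the running (cumulative) total of runs plotted against overs.
worm = list(accumulate(runs_per_over))

print(worm)  # [4, 11, 13, 23, 28, 28, 36, 42, 45, 57]
```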

    Tools such as these help viewers clearly understand the statistics behind the game of cricket. Beyond presentation, cricket analysts have devised many methods to analyze these statistics and to use statistical inference to arrive at estimates and predictions about the game.


    1.4 OBJECTIVE OF THE PROJECT

    The main objective of this project is to illustrate the application of statistical inference and regression analysis in cricket. We take the case of an India-Pakistan cricket match. For the pre-match analysis, all One Day Internationals between India and Pakistan that have ended in a result are considered; the results are represented using a pie chart, and the proportion of results in each team's favor is interpreted.

    Since the data represented in the pie chart come from matches spread over a long period, a second statistic is also considered: the wins, losses and other results achieved by India under the captaincy of MS Dhoni, represented using a bar chart. This chart highlights Dhoni's very high win-loss ratio, on the basis of which we argue that Pakistan's head-to-head advantage would not have a significant say in the outcome of the game.

    The prediction of the outcome of the game is done in two stages:

    a) Pre-match analysis: a two-population hypothesis test is used to determine whether the impact of the toss differs between day and day-night matches.

    b) Innings break: an achievable target score range for India is estimated as a ninety percent confidence interval.

    Finally, a regression analysis is carried out to determine whether the prices paid for players in the IPL auction are fully explained by the players' individual performance statistics or whether other factors also influence the pricing.

    1.5 STATISTICAL TOOLS EMPLOYED

    1.5.1 CHARTS AND GRAPHS

    A chart is a graphical representation of data, in which the data are represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart. A chart can represent tabular numeric data, functions or some kinds of qualitative structure. Charts are often used to ease understanding of large quantities of data and of the relationships between parts of the data, and they can usually be read more quickly than the raw data from which they are produced.

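    1.5.2 SINGLE POPULATION ESTIMATION

    The Z statistic can be used to estimate a population mean. A confidence interval, computed from the sample mean, the standard error and a critical z value, is an interval that contains the unknown population mean at a stated level of confidence; this is the interval estimated in Section 4.1. It differs from a prediction interval, whose lower and upper endpoints are chosen so that a single future observation X will lie in the interval with high probability.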

    1.5.3 HYPOTHESIS TESTING FOR TWO POPULATION

    A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level. A two-population test applies this idea to the difference between the means of two independent populations.


    1.5.4 SIMPLE LINEAR REGRESSION

    In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model as small as possible.
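As a check on this definition, the sketch below fits the least-squares line to the bowlers' strike rates and bid amounts in Table 4.3, using the closed-form estimates b1 = Sxy / Sxx and b0 = ȳ − b1·x̄, where Sxy and Sxx are the centred cross-product and sum of squares. It is a minimal pure-Python illustration, not the software the authors used; on that data it should reproduce the coefficients, the slope's standard error (about 23649) and its t value (-2.20) reported in Section 4.4.1.

```python
from math import sqrt

# Bowlers' data from Table 4.3: strike rate (balls per wicket) and bid amount (US$).
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, se_b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx               # slope minimising the sum of squared residuals
    b0 = y_bar - b1 * x_bar      # the fitted line passes through (x_bar, y_bar)
    sse = syy - b1 * sxy         # residual sum of squares
    mse = sse / (n - 2)          # n - 2 residual degrees of freedom
    se_b1 = sqrt(mse / sxx)      # standard error of the slope
    return b0, b1, se_b1, 1 - sse / syy


b0, b1, se_b1, r2 = simple_ols(strike_rate, amount)
print(f"Amount = {b0:.0f} {b1:+.0f} Strike Rate")  # Amount = 1899656 -51941 Strike Rate
print(f"t(slope) = {b1 / se_b1:.2f}")                # t(slope) = -2.20
```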

    1.5.5 MULTIPLE LINEAR REGRESSION

    Multiple linear regression extends simple linear regression to more than one explanatory variable; the coefficients are again estimated by least squares.


    CHAPTER 2 LITERATURE REVIEW

    Estenson et al. (1994) and Bennett and Flueck (1983) studied player compensation in baseball. The auction results showed that salaries matched players' marginal revenue products and that the open auction exhibited the declining-price anomaly found to exist in real-world auctions. Similarly, Dobson and Goddard (1998) and Kahn (1992) considered the compensation paid to players in football.

    Jones and Walsh (1988) made similar studies in ice hockey and concluded that skills are the principal determinant of salaries at all positions. Berri (1999) addresses the problem of measuring the productivity of an individual participating in a team sport by linking players' statistics in the National Basketball Association (NBA) to team wins. An economic model is employed to measure each player's marginal product. Such a study is useful in answering the question posed in the title of that paper, as well as a broader list of questions raised by industry insiders and other interested observers.

    In cricket, only a few such studies exist. Barr and Kantor (2004) set out to determine the important skill set for a batsman in one-day cricket. The batting average has been used to assess the worth of a batsman. In the one-day game, however, limits on the number of balls bowled introduce a very important additional dimension to performance. Assessing batting performance in the one-day game therefore requires at least a two-dimensional measurement approach, because of the time dimension imposed on limited-overs cricket. They used a new graphical representation, with strike rate on one axis and the probability of getting out on the other, akin to the risk-return framework used in portfolio analysis, to obtain useful, direct and comparative insights into batting performance, particularly in the one-day game. However, we have not come across any study that links compensation to player attributes.

    Rosen (1974) based his model of product differentiation on the hypothesis that goods are valued for their utility-generating attributes. According to him, when making a purchase decision, consumers evaluate product quality attributes and pay the sum of the implicit prices for each quality attribute, which is reflected in the observed market price. Hence, the price of a product is simply the sum of the prices of all its quality attributes.

    Shapiro (1983) presented a theoretical framework to examine the halo effect on prices.

    Developing an equilibrium price-quality schedule for high-quality products, under the

    assumption of competitive markets and imperfect information, he showed that reputation

    facilitates a price premium; hence, reputation building can be considered as an investment good.

    Weemaes and Riethmuller (2001) studied the role of quality attributes in preferences for fruit juices. The study involved market valuation of various attributes of fruit juice. It did not consider consumers' preferences but derived quality attributes from the product label. The study revealed that consumers paid a premium for nutrition, convenience and information. In a similar study on tea, Deodhar and Intodia (2004) showed that colour and aroma were the two important attributes of prepared tea.

    Extending the analogy to cricket, a cricket player is valued for his on-the-field (and perhaps off-the-field) performance. We propose that a cricket player sells his cricketing skills for the IPL tournament. The franchise owners bid for players' services because they want to maximize their utility, and player performance is an important argument of their utility function. In equilibrium, therefore, the final bid price of a player must be a function of the valuation of the player's winning attributes.


    CHAPTER 3 RESEARCH METHODOLOGY

    3.1 WINNING PERCENTAGE USING PIE CHART

    3.1.1 OBJECTIVE

    The objective is to represent clearly the results of the ODI matches played so far between India and Pakistan that ended in a result. We consider all matches played so far; excluding the four matches that ended in no result, this gives 117 matches.

    3.1.2 POPULATION

    Table 3.1 Population Data

    Total Matches    Won by India    Won by Pakistan
    117              48              69

    3.1.3 PIE CHART

    A pie chart is the most suitable way to represent these data.

    Figure 3.1 Pie Chart for Winning Percentage

    3.1.4 INFERENCES

    The pie chart shows that, of the 117 matches, Pakistan won more, with a win percentage of 59% (69 matches), while India won 41% (48 matches).

    [Figure 3.1: India 41%, Pakistan 59%; total matches: 117]
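The percentages in Figure 3.1 and the corresponding sector sizes follow directly from Table 3.1. A minimal sketch; the variable names are illustrative:

```python
# Results of the 117 India-Pakistan ODIs that ended in a result (Table 3.1).
results = {"India": 48, "Pakistan": 69}
total = sum(results.values())

# Each team's share of the wins and the matching pie-sector angle in degrees.
shares = {team: wins / total for team, wins in results.items()}
angles = {team: 360 * share for team, share in shares.items()}

for team in results:
    print(f"{team}: {shares[team]:.0%} of matches, sector of {angles[team]:.1f} degrees")
```

This prints India at 41% (a 147.7-degree sector) and Pakistan at 59% (212.3 degrees).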


    3.2 CAPTAINCY RECORD CALCULATION USING BAR CHART

    3.2.1 OBJECTIVE

    The objective is to find the best way to represent MS Dhoni's captaincy record. The matches won or lost by India under the captaincy of Mahendra Singh Dhoni are taken as the population, and a chart is drawn for them.

    3.2.2 POPULATION

    Table 3.2 India's Winning Record under MS Dhoni

    Total Matches    Won    Lost    Tied    No Result
    117              80     32      2       3

    A bar chart is the best tool to represent these data.

    Table 3.3 MS Dhoni's Captaincy Record

    3.2.3 INFERENCES

    The bar chart shows that under the captaincy of Mahendra Singh Dhoni, India played a total of 117 matches, of which it won 80, lost 32 and tied 2; the remaining 3 matches produced no result.

    [Table 3.3 chart: Won 80, Lost 32, Tied 2, No Result 3]
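The win-loss ratio referred to in Section 1.4 can be computed from Table 3.2. Conventions for win percentage differ (for example, whether no-result matches are counted), so the sketch below shows two common variants; which one the authors had in mind is not stated.

```python
# MS Dhoni's captaincy record as given in Table 3.2.
won, lost, tied, no_result = 80, 32, 2, 3
played = won + lost + tied + no_result  # 117

win_loss_ratio = won / lost                         # wins per defeat
win_pct_all = 100 * won / played                    # share of all matches played
win_pct_results = 100 * won / (played - no_result)  # no-result matches excluded

print(f"W/L ratio {win_loss_ratio:.2f}, win % {win_pct_all:.1f} (all), "
      f"{win_pct_results:.1f} (excluding no results)")
```

This prints a win-loss ratio of 2.50 and win percentages of 68.4 and 70.2 under the two conventions.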


    3.3 ACHIEVABLE SCORE AT THE END OF 50 OVERS

    In cricket, the chasing team wins when it exceeds the score made by its opponent; that is, for a successful chase, the team batting second must score more than the team batting first. We therefore set out to estimate the runs the Indian team could score while chasing a target against Pakistan.

    3.3.1 POPULATION AND SAMPLING

    The scores made by India while chasing against Pakistan were taken as the population.

    3.3.2 TECHNIQUE EMPLOYED

    Estimation of a single population mean was applied to obtain the intended result, and the mean was estimated at a confidence level of 90%.

    3.4 DIFFERENCE IN IMPACT OF TOSS BETWEEN DAY AND DAY-NIGHT MATCHES

    We set out to study whether the impact of the toss differs between day and day-night matches played between India and Pakistan.

    3.4.1 POPULATION AND SAMPLING

    The matches played between India and Pakistan were used as the population. From it, random samples of 38 matches were drawn for each of the two populations, day matches and day-night matches.

    3.4.2 TECHNIQUE EMPLOYED

    A two-population hypothesis test was applied to study the difference between the two population means.

    3.5 VALUATION OF PLAYERS IN IPL

    Next, our objective was to find whether the valuation of players in the IPL matches their skills, or whether they are over- or under-valued for their skills.

    3.5.1 REGRESSION

    We developed regression models to examine the relationship between players' compensation and their skills, using a sample of 8 batsmen and 8 bowlers.


    CHAPTER 4 STATISTICAL ANALYSIS AND INTERPRETATION

    4.1 ESTIMATION OF SINGLE POPULATION

    4.1.1 SET NULL AND ALTERNATE HYPOTHESIS

    In this step, we try to predict whether India will be able to successfully chase a total of 245 runs in 50 overs. Using the available data, we estimate the single population mean, assuming a population standard deviation of 53.

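    4.1.2 DETERMINE APPROPRIATE STATISTICAL TEST

    As the sample size (64) is greater than 30, we use the z-test for a single population mean. The interval estimate is calculated using the formula $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, where $\bar{x}$ is the sample mean, $\sigma$ the assumed population standard deviation and $n$ the sample size.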

    4.1.3 LEVEL OF SIGNIFICANCE

    Alpha = 0.10

    4.1.4 SET THE DECISION RULE

    For alpha = 0.10 (two-tailed), the critical value of Z from the z-distribution table is ±1.645. The null hypothesis will be rejected if the computed value of z lies outside the range -1.645 to +1.645.

    4.1.5 COLLECTION OF DATA

    Sample size (Runs): 64

    Standard Deviation: 52.71

    Mean of Sample: 243.68

    4.1.6 ANALYZE THE DATA

    Z: -0.95

    P: 0.340

    90% CI: (232.78, 254.58)

    SE Mean: 6.63
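These figures can be checked with a short standard-library sketch, assuming a two-tailed one-sample z-test with σ = 53 treated as known. The report does not state the hypothesized mean μ0 behind its Z and P values; in this sketch μ0 = 250 reproduces Z = -0.95 and P = 0.340, whereas testing against the 245-run target gives Z ≈ -0.20. The standard error, 53/√64 = 6.625, rounds to the reported 6.63.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Section 4.1: sample size, sample mean, assumed population SD.
n, x_bar, sigma = 64, 243.68, 53.0
confidence = 0.90

se = sigma / sqrt(n)                                     # standard error of the mean
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
ci = (x_bar - z_crit * se, x_bar + z_crit * se)


def z_test(mu0):
    """Two-tailed one-sample z-test of H0: mu = mu0; returns (z, p)."""
    z = (x_bar - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


print(f"SE = {se:.3f}, 90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# The hypothesized means below are assumptions: 245 is the target, 250 matches the report.
for mu0 in (245, 250):
    z, p = z_test(mu0)
    print(f"H0: mu = {mu0}: z = {z:.2f}, p = {p:.3f}")
```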

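    4.1.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION

    With 90% confidence, the mean score India makes when chasing against Pakistan lies between 232.78 and 254.58. Since the target of 245 falls within this interval, we conclude that a total of 245 in 50 overs is within India's chasing capability. Note, however, that this is an interval for the mean score rather than for the score in a single match, and it also contains values below 245, so it does not guarantee a successful chase.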


    Table 4.1 Distribution Plot

    4.2 HYPOTHESIS TESTING FOR TWO POPULATION

    4.2.1 SET NULL AND ALTERNATE HYPOTHESIS

    Null hypothesis: $H_0: \mu_1 - \mu_2 = 0$ (no significant difference in runs scored)

    Alternate hypothesis: $H_1: \mu_1 - \mu_2 \neq 0$

    4.2.2 DETERMINE APPROPRIATE STATISTICAL TEST

    As both samples are larger than 30 and independent, and the population variances are unknown, we use the two-sample z-test for the difference between population means, with the sample variances used in place of the population variances.

    4.2.3 LEVEL OF SIGNIFICANCE

    Alpha = 0.10

    4.2.4 SET THE DECISION RULE

    For alpha = 0.10 (two-tailed), the critical value of Z from the z-distribution table is ±1.645. The null hypothesis will be rejected if the computed value of z lies outside the range -1.645 to +1.645.

    4.2.5 COLLECTION OF DATA

    Sample size 1: 38

    Sample size 2: 38

    Variance of sample 1: 2370.775

    Variance of sample 2: 3119.37

    Mean of sample 1: 7.539473684

    Mean of sample 2: 5.039473684
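The z statistic for these figures can be reproduced with a short standard-library sketch. It assumes a two-tailed, large-sample z-test with the sample variances substituted for the unknown population variances, as described in Section 4.2.2, and applies the ±1.645 decision rule. The two-tailed p-value it prints is the sketch's own output, not a figure quoted from the report.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Section 4.2.5 (samples 1 and 2 as labelled in the report).
n1, n2 = 38, 38
var1, var2 = 2370.775, 3119.37
mean1, mean2 = 7.539473684, 5.039473684
alpha = 0.10

# Large-sample z statistic for H0: mu1 - mu2 = 0, using the sample variances.
se_diff = sqrt(var1 / n1 + var2 / n2)
z = (mean1 - mean2) / se_diff
p = 2 * (1 - NormalDist().cdf(abs(z)))        # two-tailed p-value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645

print(f"z = {z:.4f}, p = {p:.3f}, reject H0: {abs(z) > z_crit}")
```

The printed z of 0.2080 agrees with the value in Section 4.2.6 and falls well inside ±1.645, so the null hypothesis is not rejected.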


    4.2.6 ANALYZE THE DATA

    Z: 0.207988776

    P (Z


    The data sources include the official IPL website and two other websites, Cricinfo and Wikipedia. For convenience, we considered only 8 Indian players in each category, i.e. bowlers and batsmen. The final bid price is taken as the dependent variable, and a wealth of data is available on the cricketing attributes of IPL players hypothesized above. We have data on the individual performances of these 16 players across all the IPL seasons held to date.

    1) Batsmen: For the multiple regression analysis, we have taken two important independent variables that are the prime determinants of a player's performance in the long run: total runs scored and batting average.

    2) Bowlers: For the multiple regression analysis of bowlers, we have likewise taken two important independent variables that are the prime determinants of a player's performance in the long run: wickets taken and strike rate.

    The relevant variables are drawn from observations on skills considered important for the Twenty20 form of the game. For example, in this shorter version of the game, no one is likely to make centuries frequently; however, a player who contributes many runs consistently and has a high batting average is an asset to the team. Although the IPL is a batsman's game, a wicket-taking bowler can put a lot of pressure on the opposition and is therefore considered quite useful.

    To paraphrase the guidelines for a good specification: the estimated coefficients should have the

    right signs and be statistically significant, the equation should have a reasonably high (adjusted)

    R-square and maintain parsimony, and there should be sufficient degrees of freedom. Based on

    these guidelines, the variables chosen for estimating equation (1) and their descriptions are reported

    in Table 4.2. We selected the combination of variables that offered the best goodness of fit in terms of R-square, adjusted R-square, correct signs of the coefficients, t-statistics and F-statistics. The exact

    specification of the regression is given below in Equation (2).

    P (Batsman) = b0 + b1 (Runs) + b2 (Average)

    P (Bowler) = b0 + b1 (Wickets) + b2 (Strike Rate)

    Table 4.2 Description of Variables

    Variable: Description

    P: Final bid price of a player (in US Dollars).

    Runs: Total runs scored over a span of 5 IPL seasons.

    Average: Batting average over the same period.

    Wickets: Total number of wickets taken by a bowler in 5 IPL seasons.

    Strike Rate: Bowling strike rate, i.e. balls bowled per wicket taken.


    Table 4.3 Description of Statistics of Bowlers

    Name of the Bowler Wickets Strike Rate Amount (in US Dollars)

    Harbhajan Singh 54 18.88 1300000

    Ishant Sharma 36 29.33 450000

    Munaf Patel 70 21.22 700000

    Pragyan Ojha 69 25.63 500000

    Praveen Kumar 53 22.35 800000

    R. Ashwin 49 20.66 850000

    R.P. Singh 74 19.78 500000

    Zaheer Khan 65 19.22 900000

    4.4 REGRESSION ANALYSIS

    4.4.1 AMOUNT VERSUS STRIKE RATE

    The regression equation is

    Amount = 1899656 - 51941 Strike Rate

    Predictor Coef SE Coef T P

    Constant 1899656 529535 3.59 0.012

    Strike Rate -51941 23649 -2.20 0.070

    S = 226442 R-Sq = 44.6% R-Sq(adj) = 35.3%

    4.4.2 ANALYSIS OF VARIANCE

    Source DF SS MS F P

    Regression 1 2.47345E+11 2.47345E+11 4.82 0.070

    Residual Error 6 3.07655E+11 51275908679

    Total 7 5.55000E+11

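    Durbin-Watson statistic = 1.42023

    Critical values at the 5% level (n = 8, one predictor): dL = 0.76, dU = 1.33, 4 - dU = 2.67, 4 - dL = 3.24.

    Since dU = 1.33 < 1.42 < 2.67 = 4 - dU, there is no autocorrelation.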
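The simple regression in 4.4.1 and the Durbin-Watson statistic in 4.4.2 can be recomputed directly from Table 4.3. The sketch below is a plain-Python illustration, not the package the authors used; it assumes the regression was run on the rows in the order shown in Table 4.3 (the Durbin-Watson statistic depends on row order), and the function names are our own.

```python
# Table 4.3: bowling strike rate and final bid amount (US dollars).
# Assumption: rows are in the table's order, since Durbin-Watson depends on order.
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]


def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def durbin_watson(resid):
    """Sum of squared successive residual differences over the residual sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e ** 2 for e in resid)


b0, b1 = simple_ols(strike_rate, amount)
resid = [y - (b0 + b1 * x) for x, y in zip(strike_rate, amount)]
mean_y = sum(amount) / len(amount)
r_sq = 1 - sum(e ** 2 for e in resid) / sum((y - mean_y) ** 2 for y in amount)
dw = durbin_watson(resid)
print(f"Amount = {b0:.0f} {b1:+.0f} Strike Rate, R-Sq = {r_sq:.1%}, DW = {dw:.3f}")
```

Run as is, it should reproduce b0 ≈ 1899656, b1 ≈ -51941, R-Sq ≈ 44.6% and DW ≈ 1.420, which suggests the table rows are in the same order as the regression input.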

    4.4.3 AMOUNT VERSUS WICKETS, STRIKE RATE

    The regression equation is

    Amount = 3190691 - 13138 Wickets - 75398 Strike Rate


    Predictor Coef SE Coef T P

    Constant 3190691 716164 4.46 0.007

    Wickets -13138 5955 -2.21 0.078

    Strike Rate -75398 21286 -3.54 0.017

    S = 176571 R-Sq = 71.9% R-Sq(adj) = 60.7%

    4.4.4 ANALYSIS OF VARIANCE

    Source DF SS MS F P

    Regression 2 3.99114E+11 1.99557E+11 6.40 0.042

    Residual Error 5 1.55886E+11 31177190382

    Total 7 5.55000E+11

    Source DF Seq SS

    Wickets 1 7940674349

    Strike Rate 1 3.91173E+11

    Durbin-Watson statistic = 1.39505

    Hence there is no autocorrelation.
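With two predictors, the least-squares coefficients in 4.4.3 can be obtained by solving the 2 x 2 normal equations on mean-centred data, and R-Sq, R-Sq(adj) and F in 4.4.3 and 4.4.4 then follow from the sums of squares. This is a minimal sketch assuming Table 4.3 holds exactly the data that were fitted; `ols_two_predictors` is a hypothetical helper, and a general least-squares solver would do the same job for more predictors.

```python
# Table 4.3 (same row order): wickets, bowling strike rate, amount in US dollars.
wickets = [54, 36, 70, 69, 53, 49, 74, 65]
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]


def ols_two_predictors(x1, x2, y):
    """Least squares for y = b0 + b1*x1 + b2*x2 via the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # non-zero unless x1 and x2 are collinear
    b1 = (s1y * s22 - s12 * s2y) / det  # Cramer's rule
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


b0, b1, b2 = ols_two_predictors(wickets, strike_rate, amount)
fitted = [b0 + b1 * w + b2 * s for w, s in zip(wickets, strike_rate)]
mean_y = sum(amount) / len(amount)
ss_total = sum((y - mean_y) ** 2 for y in amount)
ss_error = sum((y - f) ** 2 for y, f in zip(amount, fitted))
n, k = len(amount), 2
r_sq = 1 - ss_error / ss_total
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)
f_stat = ((ss_total - ss_error) / k) / (ss_error / (n - k - 1))
print(f"Amount = {b0:.0f} {b1:+.0f} Wickets {b2:+.0f} Strike Rate")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {adj_r_sq:.1%}, F = {f_stat:.2f}")
```

The printed coefficients and fit statistics should agree with 4.4.3 and 4.4.4 up to rounding (R-Sq ≈ 71.9%, R-Sq(adj) ≈ 60.7%, F ≈ 6.40).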

    [Figure 4.1 Residual Plots for Bowlers: Residual Plots for Amount, with panels Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual) and Versus Order (Residual vs Observation Order)]


    4.5 DESCRIPTION OF STATISTICS OF BATSMEN

    4.5.1 AMOUNT (IN US DOLLARS) VERSUS RUNS

    Table 4.4 Batsman Statistics

    Name of the Batsman Runs Average Amount (in US Dollars)

    M.S. Dhoni 1782 37.12 1300000

    S.K. Raina 2254 33.64 1800000

    V. Sehwag 1879 30.3 1800000

    S.R. Tendulkar 2047 37.9 2400000

    V. Kohli 1639 28.25 500000

    R.G. Sharma 1975 31.35 2000000

    G. Gambhir 2065 33.31 1800000

    R. Dravid 1703 27.91 1800000

    The regression equation is

    Amount (in US Dollars) = -1663273 + 1740 Runs

    Predictor Coef SE Coef T P

    Constant -1663273 1649219 -1.01 0.352

    Runs 1740.5 855.5 2.03 0.088

    S = 467406 R-Sq = 40.8% R-Sq(adj) = 31.0%

    4.5.2 ANALYSIS OF VARIANCE

    Source DF SS MS F P

    Regression 1 9.04188E+11 9.04188E+11 4.14 0.088

    Residual Error 6 1.31081E+12 2.18469E+11

    Total 7 2.21500E+12

    Durbin-Watson statistic = 2.59499

    Since 2.595 lies between dU = 1.33 and 4 - dU = 2.67 (the critical values used in Section 4.4.2), there is no autocorrelation.


    4.5.3 AMOUNT (IN US DOLLARS) VERSUS RUNS, AVERAGE

    The regression equation is

    Amount (in US Dollars) = - 1946333 + 1562 Runs + 19235 Average

    Predictor Coef SE Coef T P

    Constant -1946333 1992016 -0.98 0.373

    Runs 1562 1080 1.45 0.207

    Average 19235 59656 0.32 0.760

    S = 506777 R-Sq = 42.0% R-Sq(adj) = 18.8%

    4.5.4 ANALYSIS OF VARIANCE

    Source DF SS MS F P

    Regression 2 9.30888E+11 4.65444E+11 1.81 0.256

    Residual Error 5 1.28411E+12 2.56822E+11

    Total 7 2.21500E+12

    Source DF Seq SS

    Runs 1 9.04188E+11

    Average 1 26699560534

    Durbin-Watson statistic = 2.39651

    Thus there is no autocorrelation.
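The fall in adjusted R-square from 31.0% (runs alone) to 18.8% (runs and average) follows from the ANOVA sums of squares in 4.5.2 and 4.5.4: adding Average raises R-square by barely one percentage point, which is less than the penalty for spending one more degree of freedom with only 8 observations. The sketch below applies the standard formula R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - k - 1); the function names are illustrative.

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(r_sq, n, k):
    """Penalise R-square for k predictors fitted on n observations."""
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)


n = 8                  # batsmen in Table 4.4
ss_total = 2.21500e12  # Total SS, identical in 4.5.2 and 4.5.4
# model name -> (regression SS from the ANOVA table, number of predictors)
models = {"Runs": (9.04188e11, 1), "Runs + Average": (9.30888e11, 2)}

results = {}
for name, (ss_reg, k) in models.items():
    r_sq = r_squared(ss_reg, ss_total)
    results[name] = (r_sq, adjusted_r_squared(r_sq, n, k))
    print(f"{name}: R-Sq = {r_sq:.1%}, R-Sq(adj) = {results[name][1]:.1%}")
```

It prints R-Sq 40.8% and R-Sq(adj) 31.0% for runs alone, and 42.0% and 18.8% for runs and average, matching the regression output.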

    [Figure 4.2 Residual Plots for Amount: Residual Plots for Amount (in US Dollars), with panels Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual) and Versus Order (Residual vs Observation Order)]


    CHAPTER 5 DISCUSSIONS

    5.1 BOWLERS

    The coefficient of determination for strike rate as a single factor (simple linear regression) is

    only 44.6%. In other words, only about 44.6% of the variation in the amount paid to a bowler in

    the IPL is explained by his strike rate; the rest of the variation is unexplained. In the multiple

    regression model, which adds wickets, the coefficient of determination rises to 71.9%, which still

    leaves about 28% of the variation unexplained, and the coefficient on wickets is negative, the

    opposite of the expected sign. Hence the performance factors alone do not determine the bid price

    of bowlers; various other factors, discussed in Section 5.2.1, also play a part. The p-value of the

    F-test for the multiple regression is 0.042, which lies in the rejection region for alpha = 0.10.

    H0: Wickets and strike rate have no linear relationship with the amount paid to bowlers in IPL

    (b1 = b2 = 0).

    H1: At least one of wickets and strike rate is related to the amount paid to bowlers.

    Since the null hypothesis is rejected, wickets and strike rate are jointly significant predictors of a

    bowler's price. However, given the share of variation they leave unexplained, the key conclusion

    of this exercise is that a variety of other factors are also responsible for the very high amounts of

    money paid to bowlers.

    5.2 BATSMEN

    The coefficient of determination for runs is quite low at 40.8%, which indicates that runs do not

    play a major role in fixing the pay of batsmen. In other words, only about 40.8% of the variation

    in the amount paid is explained by the runs scored by the batsman in the T20 format; the rest of

    the variation is unexplained. In the multiple regression model, which adds batting average, the

    coefficient of determination is only 42.0%, and the adjusted R-square falls from 31.0% to 18.8%.

    Hence the performance factors are not the key factors, as most of the variation depends on various

    other factors. The p-value of the F-test for the multiple regression is 0.256, which is greater than

    alpha = 0.10 and therefore does not lie in the rejection region.

    H0: Runs and average have no linear relationship with the amount paid to batsmen in IPL

    (b1 = b2 = 0).

    H1: At least one of runs and average is related to the amount paid to batsmen.

    Thus the null hypothesis cannot be rejected: runs and average do not explain the amount paid to

    batsmen in a statistically significant way, driving home the point that various other factors may be

    responsible for the amount of money being so high.

    These high premiums, over and above the compensation for their cricketing attributes, seem to

    reflect the players' ability to draw huge crowds nationally, owing to factors such as their

    charismatic association with film stars and the racial controversies surrounding them.
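The F-test decisions for the two multiple regressions can be checked from the reported F statistics. Both models have 2 regression and 5 residual degrees of freedom, and for an F distribution with 2 numerator degrees of freedom the upper-tail probability has the closed form P(F > f) = (1 + 2f/d2)^(-d2/2), so no statistics library is needed. This sketch covers only that 2-numerator-df case and assumes alpha = 0.10, the significance level adopted in 4.2.3; the helper name is hypothetical.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for F(2, df2); this closed form holds only for numerator df = 2."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


alpha = 0.10  # assumed significance level, as in Section 4.2.3
# F statistics from the ANOVA tables in 4.4.4 and 4.5.4 (regression DF = 2, residual DF = 5)
f_statistics = {"Bowlers": 6.40, "Batsmen": 1.81}
p_values = {}
for group, f_value in f_statistics.items():
    p_values[group] = f_upper_tail_df1_2(f_value, df2=5)
    decision = "reject H0" if p_values[group] < alpha else "do not reject H0"
    print(f"{group}: F = {f_value:.2f}, p = {p_values[group]:.3f}, {decision}")
```

Because the F statistics are rounded to two decimals in the ANOVA tables, the computed p-values agree with the reported 0.042 and 0.256 to about three decimal places.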

    5.2.1 REASONS FOR NON-EXPLANATION

    Some of the reasons that may account for the relationship not being explained are as follows:

    1. Iconic Value.

    2. Glamour.

    3. Controversy.

    4. Age.

    5. Popularity.


    CHAPTER 6 CONCLUSION

    6.1 LIMITATIONS

    The limitations of our study are:

    1. Pie charts and bar charts were appropriate for representing the statistics of earlier India-Pakistan matches, but they cannot be used to predict the result of the current match exactly,

    because of the inherent unpredictability of the game of cricket.

    2. While estimating the achievable target score as a ninety percent confidence interval,

    we take into account only the matches already played between the two teams, without considering other factors such as differences in the players selected for those

    games and for the current match, the current form of individual players, the nature of the

    pitch and the weather conditions. This might result in an incorrect interval estimate.

    3. In determining the difference in the impact of the toss, we calculate the net run-rate difference between the teams batting first and second, and arrive at two populations, one each for the day and day-night matches. However, the net run-rate difference is

    calculated over the maximum number of overs for all the matches, so cases where teams chase

    down targets easily without losing wickets are not reflected in our populations.

    4. In developing a regression model for the pricing of an IPL player based on his

    statistical attributes, we ignore the many intangible attributes of an individual player. For example, a player's brand value, image and relevance to the franchise

    are all taken into account when his price is determined, but these aspects are completely

    ignored in our regression model. This probably explains the weak relationship between the

    independent variables and the pricing of the player.

    6.2 FUTURE SCOPE

    Single population estimation could be used to estimate, with a stated level of confidence, the likely

    scores of players based on their previous performances. This could help teams formulate

    strategies against the opponent.

    A regression model could be applied to fix a player's compensation based on his skill set. This

    could help a franchise set a ceiling price for each player before going into the auction, so that it spends its money accordingly and achieves the maximum return on it.


    REFERENCES

    Armstrong, J and Willis, R J (1993). Scheduling the Cricket World Cup: A Case Study, The Journal of the Operational Research Society, 44(11), 1067-1072.

    Barr, G D I and Kantor, B S (2004). A Criterion for Comparing and Selecting Batsmen in Limited Overs Cricket, Journal of the Operational Research Society, 55(12), 1266-1274.

    Bennett, J M and Flueck, J A (1983). An Evaluation of Major League Baseball Offensive Performance Models, The American Statistician, 37(1), 76-82.

    Berri, D J (1999). Who Is Most Valuable? Measuring the Player's Production of Wins in the National Basketball Association, Managerial and Decision Economics, 20(8), 411-427.

    Cricinfo. http://www.cricinfo.com/, as on September 13, 2012.

    Estenson, P S (1994). Salary Determination in Major League Baseball: A Classroom Exercise, Managerial and Decision Economics, 15(5), 537-541.

    Jones, J C H and Walsh, W D (1988). Salary Determination in the National Hockey League: The Effects of Skills, Franchise Characteristics, and Discrimination, Industrial and Labor Relations Review, 41(4), 592-604.

    Rastogi, S K (2009). Player Pricing and Valuation of Cricketing Attributes: Exploring the IPL Twenty20 Vision, Vikalpa, 34 (April-June), 15-23.