PROJECT REPORT
To determine the reasons for lower Voter Turnout among the Urban Population in India
By:
Astha Sharma
Batch: 2013-14
Enrolment No: PGCMRDA229
1 Reason for lower voter turnout amidst Urban Population, in India
INTRODUCTION
The purpose of this project is to identify the key reasons why the majority of the Urban
Population does not exercise its right to franchise. The research tries to understand what
would drive the urban population to vote and to propose solution(s) based on the outcome.
In an ideal scenario, I would like to share the outcome of the project with the Election
Commission or with organizations that would benefit from such information.
ACKNOWLEDGEMENT
I would like to thank Prof. Vina Vani for being my guide throughout the project. Thank you for showing
immense support and guidance towards my work. She gave me many chances to develop on my own throughout
the project and corrected me whenever needed. She has tremendous knowledge of Statistics and Data
Analysis concepts.
I would like to thank Prof. Amit & Rohit, who have been of great help in teaching concepts and statistical
techniques in class. Thank you for being so kind in accommodating my doubts in your busy schedule.
My heartfelt gratitude goes to all my respondents, who spared time amidst their other commitments and filled
in the questionnaire. Thank you so much for your valuable inputs and suggestions, without which this
study would not have seen the light of day.
Also, a big thank you to Mr. Ashish Roy, Manager – Mu Sigma. Without his support and practical
experience in Analytics, it would not have been possible for the analysis to be so methodical.
TABLE OF CONTENTS
I: ABSTRACT
II: BACKGROUND OF RESEARCH PROBLEM
2.1 INTRODUCTION
2.2 LITERATURE
2.3 REASONS FOR VOTING
2.4 11 REASONS WHY PEOPLE DON’T VOTE
2.5 ASSUMPTIONS
2.6 LIMITATIONS
III: EXPECTED CONTRIBUTION
3.1 RESEARCH QUESTION
3.2 HYPOTHESES
IV: SAMPLE, SCALES USED & INSTRUMENT OF DATA COLLECTION
4.1 SAMPLE
4.2 SCALE
4.3 INSTRUMENT OF DATA COLLECTION
V: RESEARCH DESIGN
VI: EXPLORATORY DATA ANALYSIS
6.1 SUMMARY
6.2 DESCRIPTIVE ANALYSIS
6.3 CHI-SQUARE TEST
6.3.1 SUMMARY: CHI-SQUARE
VII: FACTOR ANALYSIS
VIII: ONE WAY ANOVA
IX: MODEL 1
9.1 BINARY LOGISTIC REGRESSION
9.2 INTERPRETING COEFFICIENTS
9.3 ROC CURVE
9.4 CONCLUSION
X: MODEL 2
10.1 DISCRIMINANT ANALYSIS
10.2 MULTINOMIAL LOGISTIC REGRESSION
XI: CLASSIFICATION TREE
XII: SUMMARY AND CONCLUSION
XIII: REFERENCE SECTION
PART 1: ABSTRACT
The report is aimed at understanding the reasons for lower voter turnout among the urban
population in India.
An online survey questionnaire was designed and circulated among the targeted population with
the help of social media and word of mouth. The questionnaire was aimed at understanding
the psychological behaviour of people who voted versus people who did not vote in the last
elections. It also captured the demographic profile of respondents, which helped
in analysing behaviour patterns across the population.
373 valid responses were obtained. The data was skewed towards the Bangalore population, but the
data was divided on the basis of North and South India.
The data was analysed using statistical techniques such as Factor Analysis, Binary Logistic
Regression, Discriminant Analysis and Multinomial Logistic Regression, depending on the hypothesis
and the nature of the data.
The first model was built to find the factors affecting voting decisions. The accuracy of the first
model was 91.2% and it successfully tested the various hypotheses.
The second model was built to understand the factors which differentiate people who
did not vote in the last elections (196 respondents). They were divided into 3 groups: people who
do not have a voter's ID, people who have an ID, and people who have an ID of another state.
The majority of people who did not vote in the last election had an ID card of another state and hence
could not have voted. The factors which affect them, and hence their decision to
vote, were analysed in the second model. The model had an accuracy of 76.5%. The
smaller sample size may be one of the reasons for the lower accuracy.
The results throw light on the fact that people who are motivated to vote generally believe in the
system and think a change in leadership would improve their living standards. On the other
hand, people who did not vote look forward to more convenient means, such as registration for ID
cards at their corporate offices and voting in business parks.
Further, the second model also highlighted that people who do not belong to the state
they are residing in look forward to voting in more familiar surroundings such as
business parks.
Overall, the research gave results which could be explained with real-life examples.
A further study with a larger and more varied sample can validate the accuracy of the models and
might throw more light on the psychological and demographic profiles of the people.
PART II: BACKGROUND OF RESEARCH PROBLEM
2.1 INTRODUCTION
In the 2009 general election, the Indian electorate was estimated to total approximately 714
million individuals, out of whom around 415 million (58.12%) actually cast a vote.
In India, voter turnout among the urban population is lower than the turnout percentage of the
country overall. The statistics show that the percentage of voters turning out in metros is generally
lower than that of the overall state. The table below shows the data for the last two Lok Sabha
elections (2009 and 2014):
                                      VOTERS TURNOUT (2009)    VOTERS TURNOUT (2014)
STATE         METRO                   CITY       STATE         CITY       STATE
KARNATAKA     Bangalore Central       44.60%     58.80%        55.70%     67.28%
              Bangalore North         46.70%                   56.46%
              Bangalore Rural         57.90%                   68.00%
              Bangalore South         44.74%                   55.69%
DELHI         Chandni Chowk           55.20%     51.81%        67.54%     65.09%
              East Delhi              53.40%                   65.35%
              New Delhi               55.70%                   65.00%
              North East Delhi        52.40%                   67.12%
              North West Delhi        47.70%                   61.66%
              South Delhi             47.40%                   62.98%
              West Delhi              52.40%                   66.03%
MAHARASHTRA   Mumbai North            42.60%     50.50%        52.00%     61.70%
              Mumbai North Central    39.50%                   55.00%
              Mumbai North East       42.50%                   53.00%
              Mumbai North West       44.10%                   60.00%
              Mumbai South            40.40%                   54.00%
              Mumbai South Central    39.50%                   55.00%
              Pune                    40.70%                   58.75%
WEST BENGAL   Kolkata Uttar           64.30%     81.00%        60.07%     81.35%
              Kolkata Dakshin         67.00%                   65.90%
TAMIL NADU    Chennai Central         61.00%     73.10%        60.90%     73.00%
              Chennai North           64.90%                   64.63%
              Chennai South           62.70%                   57.86%
ORISSA        Bhubaneshwar            49.10%     65.30%        40.00%     70.00%
Source: www.indiavotes.com
In the 2009 polls, the people of an elite city like Bhubaneswar disappointed with the worst voter
turnout in all three assembly segments: Bhubaneswar (Madhya), Bhubaneswar (Uttar) and Bhubaneswar
(Ekamra). In all these segments, a voter turnout of less than 40% was recorded.
The Urban Population in India has maximum exposure to information in various forms. They are
the most informed about the condition of the economy and how governments are faring. They
have access to data to make a rational decision. In spite of all this, the majority of Urban Indians do
not vote.
They also pay their taxes most honestly; they are major target customers for any
bank or investment firm. They are also the key customers for any brand trying to establish itself in the
market. They have the maximum disposable income.
When the government does not provide them with good facilities, they buy comfort. The urban
population has inverters/generators for power backup; they pay hefty amounts
for water tankers. They buy good homes; they can buy vegetables even when prices go up.
They can talk about politics at length and they know who the right candidate is; however, the
majority of them do not step out to vote.
Voters Turnout: Historical Data (for reference)
Year   Voter Turnout   Total vote      Registration    VAP Turnout   Voting age Population   Population
2014   66.38%          NA              NA              NA            81,45,00,000            1,23,70,00,000
2009   58.17%          41,70,37,606    71,69,85,101    56.45%        73,87,73,666            1,15,68,97,766
2004   58.07%          38,99,48,330    67,14,87,930    60.91%        64,01,82,791            1,04,97,00,118
1999   59.99%          37,16,69,104    61,95,36,847    65.69%        56,57,80,483            98,68,56,301
1998   61.97%          37,54,41,739    60,58,80,192    67.45%        55,66,51,400            97,09,33,000
1996   57.94%          34,33,08,035    59,25,72,288    61.08%        56,20,28,100            95,25,90,000
1991   56.73%          28,27,00,942    49,83,63,801    57.23%        49,39,63,380            85,16,61,000
1989   61.98%          30,90,50,495    49,86,47,786    65.18%        47,41,43,040            81,74,88,000
1984   63.56%          24,12,46,887    37,95,40,608    64.61%        37,33,71,000            74,67,42,000
1980   56.92%          20,27,52,893    35,62,05,329    62.35%        32,51,62,040            66,35,96,000
1977   60.49%          19,42,63,915    32,11,74,327    64.67%        30,03,92,640            62,58,18,000
1971   55.25%          15,12,96,749    27,38,32,301    57.22%        26,43,93,600            55,08,20,000
1967   61.04%          15,27,24,611    25,02,07,401    63.11%        24,19,96,800            50,41,60,000
1962   55.42%          11,99,04,284    21,63,61,569    54.42%        22,03,24,090            44,96,41,000
1957   62.23%          12,05,13,915    19,36,52,179    61.15%        19,70,90,250            40,22,25,000
1952   61.17%          10,59,50,083    17,32,12,343    58.92%        17,98,30,000            36,70,00,000
Source: http://www.idea.int/vt/countryview.cfm?CountryCode=IN
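As a quick check of how the two turnout columns relate to the raw counts, the sketch below recomputes the 2009 row of the historical table. It assumes, as the figures themselves suggest, that Voter Turnout is total votes divided by registered voters and that VAP Turnout is total votes divided by the voting-age population. The helper names are illustrative and are not part of the original study.

```python
def parse_indian_number(text: str) -> int:
    """Convert a figure written with Indian digit grouping (e.g. '41,70,37,606') to an int."""
    return int(text.replace(",", ""))


def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


# 2009 row of the historical turnout table
total_vote = parse_indian_number("41,70,37,606")
registration = parse_indian_number("71,69,85,101")
voting_age_population = parse_indian_number("73,87,73,666")

# Assumed definitions: share of registered voters vs. share of voting-age population
voter_turnout = percent(total_vote, registration)
vap_turnout = percent(total_vote, voting_age_population)

print(f"Voter turnout: {voter_turnout}%")
print(f"VAP turnout:   {vap_turnout}%")
```

Both values agree with the 2009 row (58.17% and 56.45%), which supports reading the two columns this way.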
2.2 LITERATURE
No survey or research conducted in India on why people do not vote could be found.
Voter turnout is the percentage of eligible voters who cast a ballot in an election. (Who is
eligible varies by country, and should not be confused with the total adult population. For
example, some countries discriminate based on sex, race, and/or religion. Age and citizenship
are usually among the criteria.) After increasing for many decades, there has been a trend of
decreasing voter turnout in most established democracies since the 1960s. In general, low turnout
may be due to disenchantment, indifference, or contentment. Low turnout is often considered to
be undesirable, and there is much debate over the factors that affect turnout and how to increase
it. In spite of significant study into the issue, scholars are divided on reasons for the decline. Its
cause has been attributed to a wide array of economic, demographic, cultural, technological, and
institutional factors. There have been many efforts to increase turnout and encourage voting.
Source: http://en.wikipedia.org/wiki/Voter_turnout
2.3 REASONS FOR VOTING
The basic formula for determining whether someone will vote, on the questionable assumption
that people act completely rationally, is
PB + D > C
where
P is the probability that an individual's vote will affect the outcome of an election,
B is the perceived benefit that would be received if that person's favoured political party
or candidate were elected,
D originally stood for democracy or civic duty, but today represents any social or
personal gratification an individual gets from voting, and
C is the time, effort, and financial cost involved in voting.
Since P is virtually zero in most elections, PB is also near zero, and D is thus the most important
element in motivating people to vote. For a person to vote, these factors must outweigh C.
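The decision rule described above can be illustrated with a small sketch, assuming the standard form of the inequality, PB + D > C. All numeric values below are hypothetical and chosen only to show why D dominates when P is close to zero; they are not estimates from this study.

```python
def will_vote(p: float, b: float, d: float, c: float) -> bool:
    """Return True when the expected benefit of voting, P*B + D, exceeds the cost C."""
    return p * b + d > c


# Hypothetical, purely illustrative values on an arbitrary common scale
pivotal_probability = 1e-7   # P: chance that one vote decides the election
benefit = 10_000.0           # B: perceived benefit if the favoured candidate wins
cost = 2.0                   # C: time and effort needed to vote

# P*B is only 0.001 here, so the outcome depends almost entirely on D
low_duty = will_vote(pivotal_probability, benefit, d=0.5, c=cost)
high_duty = will_vote(pivotal_probability, benefit, d=5.0, c=cost)

print("Votes with low sense of duty: ", low_duty)
print("Votes with high sense of duty:", high_duty)
```

In this toy setting, lowering C (for example, a polling booth close to the workplace) has the same effect as raising D.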
Riker and Ordeshook developed the modern understanding of D. They listed five major forms of
gratification that people receive for voting:
o Complying with the social obligation to vote;
o Affirming one's allegiance to the political system;
o Affirming a partisan preference (also known as expressive voting, or voting for a
candidate to express support, not to achieve any outcome);
o Affirming one's importance to the political system; and
o For those who find politics interesting and entertaining, researching and making a
decision.
Other political scientists have since added other motivators and questioned some of Riker and
Ordeshook's assumptions. All of these concepts are inherently imprecise, making it difficult to
discover exactly why people choose to vote.
Recently, several scholars have considered the possibility that B includes not only a personal
interest in the outcome, but also a concern for the welfare of others in the society (or at least
other members of one's favourite group or party).
In particular, experiments in which subject altruism was measured, using a dictator game,
showed that concern for the well-being of others is a major factor in predicting turnout and
political participation. Note that this motivation is distinct from D, because voters must think
others benefit from the outcome of the election, not their act of voting in and of itself.
Source: http://en.wikipedia.org/wiki/Voter_turnout
2.4 11 REASONS WHY PEOPLE DON’T VOTE
1. Many people think their vote does not count.
2. Many people have the excuse that they are too busy to vote.
3. Voting registration is a process that people can fear or feel intimidated by.
4. Apathy is probably the most common reason for not voting.
5. Some people say they do not vote, because the 'lines are too long'.
6. Some people say they do not vote because they do not like the two candidates that are on
the news every night.
7. Some people say that they cannot get to the polling place to vote.
8. If a person is traveling, it may be an excuse for not voting.
9. Some people do not vote for a third party, because they are told it will 'spoil' the vote for
one of the big two politicians.
10. Some people say that voting does not matter, because their ONE vote will not swing the
result one way or the other.
11. Some people believe all political candidates are bought off by corporations, so why
bother voting, because the votes have already been bought and sold.
Source: www.agreenroad.blogspot.in
2.5 ASSUMPTIONS
o Since the target population is the Urban population of India, it is assumed that there will not
be much variation in the attitude of people towards voting.
o Data for cities like Mumbai and Pune is combined since it is assumed that people
will have the same behavioural pattern due to similar demographics.
o Similarly, data from all the cities of a particular state is combined under the assumption
that voting decisions will not vary within the state.
2.6 LIMITATIONS
o The sample size of 373 is small considering the very large base population.
Hence the outcome can vary with a larger sample size.
o The data is skewed towards the Bangalore population and hence may or may not be an indicator of
other metros.
o Since the sample size is small, the variable Annual Income is divided into 2 parts: income
between 2-10 lacs and >10 lacs.
o The number of people who did not vote in the last election is 196, of which
only 33 do not have a registered voter's ID card. This sub-sample is too small for analysis.
Hence the results may differ with a larger sample size.
PART III: EXPECTED CONTRIBUTION
The Indian Urban Population comprises people who maintain a comfortable standard of living.
The kinds of issues which the majority of the country faces are little felt in the lives of the urban
population. This is not because they are given additional facilities but because they pay a premium
and buy the facilities.
A change of government adds little value to their day-to-day lives.
This research points out that when the government wants something to be done on a
mandatory basis, it makes sure that all the facilities are provided to assist people in completing
the directive. For example, ever since Aadhaar cards were made mandatory, the government has facilitated
Aadhaar card camps in various corporates. Similarly, there are agents who assist in making PAN
cards and in filing IT returns. But there is NO such facility provided for registration of Voter's
ID.
The concern is that when something is important for the Government, it is organised, promoted through
various channels and seen through to completion. Why, then, is it not important for the
Government to encourage people to vote, especially the urban population, who are more
informed and can take good decisions for the larger benefit of the country?
If people are not willing to take the extra effort to go and register and stand in the long queues for a
Voter's ID card, then why does the government not encourage the registration of voters through
channels like corporates?
The contribution I would like to make through this research is to point out the reasons why people do not
vote and to bring out the point that a certain segment of society likes to be treated differently.
When they pay higher taxes and bear a higher cost of living, why is their vote not made to feel
important?
Registration of Voter's ID can be driven through corporates and can be promoted as one of the
activities of Corporate Social Responsibility. Also, if polling booths are organized in
Business Parks, it is much easier to track the voting percentage since all employees have unique identity
cards. The existing technology can be used and corporates can track voters. Further, there is a
good chance that people would feel encouraged to vote when they are allowed to walk to the
polling booths along with their friends and colleagues in familiar surroundings.
All of this is possible only if the government takes a pledge to make sure everybody votes. This
research would help in quantifying the apprehensions the Urban Population has towards the
system of voting and, if used effectively, can help arrive at a solution to the problem.
3.1 RESEARCH QUESTION
What are the factors driving lower voter turnout among the Urban Population versus
the overall country average?
3.2 The following HYPOTHESES will be tested:
o The majority of the Urban Population does not vote because they do not trust the system
o They do not vote because the process of voting is very tedious (standing in long queues)
o A majority of the population belongs to other states and hence has voter's IDs of the state
of their origin and cannot vote
o They do not vote because they are not happy with the choice of politicians
o They perceive that going to vote can be a threat to their safety
o They are not comfortable with the location of polling booths
o They do not consider that their vote will add any value to the whole process
o They feel that things will not change drastically with a change of government and hence
they do not care to vote
PART IV: SAMPLE, SCALES USED & INSTRUMENT OF DATA COLLECTION
4.1 SAMPLE
The Urban Population, mostly people working in corporates, is the target population for the
study. Of 401 respondents, 373 valid responses, comprising both Voters and Non Voters, are
considered for analysis.
The questionnaire was distributed in such a manner that most of the respondents belong to the urban
population. This was done by circulating the questionnaire in corporates. It is assumed that
people working in a similar environment will have similar requirements/expectations from the
political system.
4.2 SCALE
The variables are measured using a Five Point Likert Scale and Binary answers
(Yes/No). Below is the data dictionary defining the measurement of variables:
DEFINING VARIABLES
5 POINT LIKERT SCALE (Ordinal Variables)
Strongly disagree 1
Disagree 2
Neither agree nor disagree 3
Agree 4
Strongly agree 5
Not Applicable 6
DURATION OF STAY (Ordinal Variable)
< 5 years 1
5-10 years 2
>10 years 3
ANNUAL INCOME (Ordinal Variable)
< 2 lacs p.a. 1
2-6 lacs p.a. 2
6-10 lacs p.a. 3
10-20 lacs p.a. 4
> 20 lacs p.a. 5
DECISION TREE (Nominal Variables)
Yes 1
No 0
NA/May Be 3
The areas evaluated in this study are limited to the items that were included
in the Questionnaire.
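The data dictionary above can be expressed as simple lookup tables when preparing survey responses for analysis. The sketch below is illustrative only: the dictionary and function names are hypothetical, and treating "Not Applicable" (code 6) as a missing value is an assumption, not something stated in this report.

```python
# Codes copied from the data dictionary; the Python names are hypothetical.
LIKERT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
    "Not Applicable": 6,
}
DURATION_OF_STAY_CODES = {"< 5 years": 1, "5-10 years": 2, ">10 years": 3}
YES_NO_CODES = {"Yes": 1, "No": 0, "NA/May Be": 3}


def encode(answer: str, codebook: dict) -> int:
    """Look up the numeric code of a survey answer in the given codebook."""
    return codebook[answer.strip()]


def likert_or_missing(answer: str):
    """Encode a Likert answer, mapping 'Not Applicable' to None (an assumption)."""
    code = encode(answer, LIKERT_CODES)
    return None if code == LIKERT_CODES["Not Applicable"] else code


# A made-up respondent, used only to show the encoding
example = {
    "trust": likert_or_missing("Agree"),
    "safety": likert_or_missing("Not Applicable"),
    "stay": encode("5-10 years", DURATION_OF_STAY_CODES),
    "voted": encode("No", YES_NO_CODES),
}
print(example)
```

Keeping code 6 separate from the 1 to 5 agreement scale avoids it being read as stronger agreement than "Strongly agree" when the variable is treated as ordinal.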
4.3 INSTRUMENT OF DATA COLLECTION
Primary data was collected through a structured questionnaire. The questionnaire aimed at capturing
the factors motivating people to vote or not to vote. People were also asked about their preference
for registration of voter's ID at their corporate office and for voting to happen in a business park
in/near their office. An online survey was created and circulated using social media such as
Facebook and LinkedIn.
The questionnaire consisted of 28 unique questions, not all of which were targeted at the entire group.
There are 2 major divisions of questions:
a) People who voted and people who did not vote in the last election
b) Among those who did not vote: people who are registered voters and people who are
non-registered.
Below is the link for the Questionnaire:
https://www.surveymonkey.com/s/XNTZNMB (can be accessed only once through a URL)
The table below lists all the questions. It also covers the name and type of each
variable as used for the analysis:
Project_Data_Consolidated.xlsx
CONSOLIDATED DATA FOR ANALYSIS (POST PROCESSING)
SPSS DATA FOR EDA AND FIRST MODEL
VOTING_RESPONSE is the Dependent Variable in the 1st Model, where analysis is done to
determine the factors affecting the decision to vote or not to vote.
REGISTERED_VOTER is the Dependent Variable in the 2nd Model, where analysis is done to
understand the factors which lead people not to vote.
Two models are developed using the data.
PART V: RESEARCH DESIGN
5.1 RESEARCH DESIGN
Figure 1: Conceptual Framework for VOTERS TURNOUT (Primary Study of Data and other articles
written on factors driving Voter’s Turnout)
ANALYSIS OF DATA
PART VI: EXPLORATORY DATA ANALYSIS
Exploratory Data Analysis is done to understand and summarize the data
6.1. SUMMARY
The data is primarily divided into voters and non-voters. Out of 401 respondents, 187 voted in
the latest elections and 214 did not vote. Below is the summary of data collected:
RAW DATA DUMP
SurveySummary_06082014.xls
SURVEY SUMMARY
FACTORS DRIVING DECISION TO VOTE OR NOT
Importance of Vote
- Better Policy
- Better Economy
- Better Leader
Safety and Ease
- Safety Concerns
- Voting System being Tedious
Belief in Voting System
- Trust in Politician
- Trust and Interest in Political system
Social Responsibility
- Friends and Peers Voting
- Awareness created by media
373 respondents completed the questionnaire and hence are considered for analysis. Below is
the analysis:
The respondents are further divided into Registered and Non Registered Respondents. All
who voted are considered Registered Voters; the registration question was asked to all who did not
vote in the last election. Below are the overall statistics for registered and non-registered voters:
6.2 DESCRIPTIVE ANALYSIS
Descriptive Analysis is performed on all the demographic variables to describe
the main features of the data collected.
Voting_response, the dependent variable, records whether the respondent voted in the last election or
not. Since the response is Yes/No, it is a nominal variable and hence the Chi-Square Test is used to
determine the relation of voting_response with demographic variables like Age, Gender,
Education Qualification, Annual Income, Occupation, Relationship Status, Duration of Stay in
Same City, Belong to State residing in and Duration of abroad stay.
I voted in the last elections (all respondents)
Answer Options       Response Percent   Response Count
Yes                  46.6%              187
No                   53.4%              214
answered question                       401
skipped question                        1

I voted in the last elections (completed responses)
Answer Options       Response Percent   Response Count
Yes                  47.5%              177
No                   52.5%              196
answered question                       373
skipped question                        1

I am a registered voter
Answer Options       Response Percent   Response Count
Yes                  81.8%              305
No                   18.2%              68
answered question                       373
6.3 CHI-SQUARE TEST
Below are the Chi-square tests of all the demographic variables
AGE – Ordinal Variable (voting_response * Age)
Definition: 18 to 24 – 1, 25 to 34 – 2, 35 to 44 – 3, 45 to 54 – 4, 55 or older - 5
Ho – Age of a person does not determine his/her choice to vote
H1 – Decision to vote varies with the age of a person
Age
Answer Options       Response Percent   Response Count
18 to 24             22.9%              92
25 to 34             65.6%              263
35 to 44             9.2%               37
45 to 54             1.7%               7
55 or older          0.5%               2
answered question                       401
skipped question                        1
Most respondents are in the age group of 25-34, which is about 66% of the total respondents.
Chi-Square Tests
                               Value     Df   Asymp. Sig. (2-sided)
Pearson Chi-Square             3.940a    4    .414
Likelihood Ratio               3.987     4    .408
Linear-by-Linear Association   3.377     1    .066
N of Valid Cases               373
a. 4 cells (40.0%) have expected count less than 5.
The minimum expected count is .95.
With the significance value being more than 0.05, the voting response rate does not vary
across age groups.
Null Hypothesis is accepted
Crosstab (voting_response * Age)
                                     Age
                                     1       2       3      4     5     Total
voting_response 0   Count            50      128     15     2     1     196
                    Expected Count   44.7    128.7   17.9   3.7   1.1   196.0
voting_response 1   Count            35      117     19     5     1     177
                    Expected Count   40.3    116.3   16.1   3.3   .9    177.0
Total               Count            85      245     34     7     2     373
                    Expected Count   85.0    245.0   34.0   7.0   2.0   373.0
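The Pearson Chi-Square value reported for Age can be reproduced from the observed counts in the crosstab. The following is a minimal sketch using only the Python standard library, not the SPSS procedure used in the study; the function names are hypothetical, and voting_response 0 is read as "did not vote" and 1 as "voted".

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer degrees of freedom."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed form exp(-y) * sum_{k < df/2} y^k / k!
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: start at df = 1 and step up with the incomplete-gamma recurrence
    tail = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return tail


def pearson_chi_square(observed):
    """Return (statistic, df, p_value, expected counts) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df), expected


# Observed counts from the Age crosstab (rows: voting_response 0 and 1; columns: Age 1 to 5)
age_counts = [
    [50, 128, 15, 2, 1],
    [35, 117, 19, 5, 1],
]
statistic, df, p_value, expected = pearson_chi_square(age_counts)
flat_expected = [e for row in expected for e in row]
small_cells = sum(e < 5 for e in flat_expected)

print(f"chi-square = {statistic:.3f}, df = {df}, p = {p_value:.3f}")
print(f"cells with expected count < 5: {small_cells} of {len(flat_expected)}")
print(f"minimum expected count: {min(flat_expected):.2f}")
```

The sketch also counts the cells with an expected count below 5. With 4 of 10 such cells, the chi-square approximation for Age is weak, so merging the two oldest age groups may be worth considering before drawing conclusions.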
GENDER – Nominal Variable (voting_response * Gender)
Of the total 373 respondents 115 are Female and 258 are Male
Definition: Male -1, Female - 2
Ho – Gender of a person does not drive his/her choice to vote
H1 – Gender of a person drives his/her choice to vote
Gender
Answer Options       Response Percent   Response Count
Female               30.8%              115
Male                 69.2%              258
answered question                       373
skipped question                        1
69.2% of the respondents are Male and 30.8% of respondents are Female.
Note on the Age crosstab: 128 respondents in the age group of 25-34 did not vote in the last
election and 117 respondents in the same age group voted. The trend is similar for all the age groups.
Chi-Square Tests
                               Value    Df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             .016a    1    .898
Continuity Correctionb         .000     1    .987
Likelihood Ratio               .016     1    .898
Fisher's Exact Test                                        .911         .494
Linear-by-Linear Association   .016     1    .898
N of Valid Cases               373
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 54.57.
b. Computed only for a 2x2 table
Null Hypothesis is Accepted
Note on Gender: 135 male respondents did not vote in the last election and 123 male respondents
voted. The trend is similar for Female as well. With the significance value being more than 0.05,
gender does not impact the choice to vote.
EDUCATION QUALIFICATION – Ordinal Variable (voting_response * Education_Qualification)
Definition: Graduate - 1, Professional Degrees – 2, Masters – 3, Doctorate – 4
Ho – The Education Qualification of a person does not have an effect on his/her choice to vote
H1 – The Education Qualification of a person affects his/her choice to vote
Highest Education Qualification
Answer Options        Response Percent   Response Count
Doctorate             3.8%               14
Masters               53.6%              200
Professional Degree   16.9%              63
Graduate              25.7%              96
answered question                        373
53.6% of the respondents have Masters and 25.7% of respondents are Graduates.
Chi-Square Tests
                               Value    Df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .371a    3    .946
Likelihood Ratio               .371     3    .946
Linear-by-Linear Association   .173     1    .677
N of Valid Cases               373
a. 0 cells (.0%) have expected count less than 5. The minimum expected
count is 6.64.
Note on Education Qualification: With the significance value being more than 0.05, the voting
response rate does not vary with the education qualification of a person.
OCCUPATION – Nominal Variable (voting_response * Occupation)
Definition: Student – 1, Home Maker – 2, Government Service – 3, Professional – 4,
Self Employed – 5
Ho – Occupation of a person does not impact his/her choice to vote
H1 – Occupation of a person impacts his/her decision to vote
Occupation
Answer Options       Response Percent   Response Count
Self Employed        7.0%               26
Professional         80.4%              300
Government Service   1.6%               6
Home Maker           2.7%               10
Retired              0.0%               0
Student              8.3%               31
answered question                       373
80.4% of the respondents are Professionals. This was also the key target segment for analysis.
Chi-Square Tests
                               Value     Df   Asymp. Sig. (2-sided)
Pearson Chi-Square             1.605a    4    .808
Likelihood Ratio               1.619     4    .805
Linear-by-Linear Association   .055      1    .814
N of Valid Cases               373
a. 3 cells (30.0%) have expected count less than 5. The minimum expected
count is 2.85.
With the significance value being more than 0.05, the voting response rate does not
vary with the occupation of a person.
Null Hypothesis is Accepted
Note on Occupation: 144 voters and 156 non-voters are professionals. The trend is similar for
respondents across the various occupations.
ANNUAL INCOME – Ordinal Variable (voting_response * Annual_Income)
Definition: < 10 lac p.a. – 0, > 10 lac p.a. – 1
Ho – The annual income of a person does not determine his/her choice to vote
H1 – The annual income of a person determines his/her choice to vote
Annual Income
Answer Options       Response Percent   Response Count
< 10 lac p.a.        75.6%              282
> 10 lac p.a.        24.4%              91
answered question                       373
75.6% of the respondents have an annual income < 10 lac. 24.4% of the respondents have an
income greater than INR 10 lac.
Chi-Square Tests
                               Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             1.973a    1    .160
Continuity Correctionb         1.648     1    .199
Likelihood Ratio               1.971     1    .160
Fisher's Exact Test                                         .184         .100
Linear-by-Linear Association   1.968     1    .161
N of Valid Cases               373
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 43.18.
b. Computed only for a 2x2 table
Significance value is more than 0.05 and hence indicates that annual income does not impact the
decision to vote.
154 non-voters and 128 voters are in the income bracket of < 10 lacs p.a. There is no significant
difference in the voting response w.r.t. Annual Income.
Null Hypothesis is Accepted
CITY OF RESIDENCE – Nominal Variable (voting_response * current_city_stay)
Definition: South India – 1, North India – 2, Others – 3
Ho – Geographical location of a person does not impact his/her decision to vote
H1 – Geographical location of a person impacts his/her decision to vote
City I am currently residing in
Answer Options       Response Percent   Response Count
South India          72.4%              270
North India          22.5%              84
Others               5.1%               19
answered question                       373
72.4% of the respondents are from South India and 22.5% of them from North India. Data is skewed
towards Southern India.
Chi-Square Tests
                               Value    Df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .081a    2    .960
Likelihood Ratio               .081     2    .960
Linear-by-Linear Association   .041     1    .840
N of Valid Cases               373
a. 0 cells (.0%) have expected count less than 5. The minimum expected
count is 9.02.
Null Hypothesis is Accepted
RELATIONSHIP STATUS – Nominal Variable (voting_response *
Relationship_Status)
Definition: Single, never married – 1, Married – 2, Divorced – 3
Ho – Relationship status of a person does not impact his/her decision to vote
H1 – Relationship status of a person impacts his/her decision to vote
Relationship Status
Answer Options          Response Percent   Response Count
Single, never married   53.6%              200
Married                 44.5%              166
Divorced                1.9%               7
answered question                          373
53.6% of the respondents
are Single and 44.5% of
them are married. The
distribution of respondents
across two broad categories
of relation is almost equal.
Significance value is more
than 0.05 and hence
indicates that the city of
residence does not play a
significant role in choice of
voting
143 non-voters and 127 voters
belong to South India. 43 non-
voters and 41 voters belong to
North India. There is no
significant difference in the
pattern of voting between the
geographic locations.
Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             4.289a   2    .117
Likelihood Ratio               4.295    2    .117
Linear-by-Linear Association   4.182    1    .041
N of Valid Cases               373
a. 2 cells (33.3%) have expected count less than 5. The minimum expected count is 3.32.
Null Hypothesis is Accepted
DURATION OF STAY IN THE SAME CITY – Ordinal Variable
(voting_response * Duration_Stay_Same_City)
Definition: <5 years - 1, 5-10 years – 2, >10 years – 3
Duration of stay in the State I am currently residing in
Answer Options       Response Percent   Response Count
< 5 years            45.0%              167
5-10 years           19.9%              74
>10 years            35.0%              130
answered question                       371
115 single respondents did
not vote in the last elections,
whereas 85 single
respondents voted. The
trend is reversed for married
people: 88 voted and 78 did
not vote.
Significance value is more
than 0.05 and hence
indicates that the
relationship status does
not play a significant role
in choice of voting
45.0% of respondents have
lived in same city for less than
5 years whereas 55% of the
respondents have been in the
same city for more than 5
years. The time frame should
be sufficient for people to have
voters ID created.
Ho – Duration of Stay in the same city does not impact the choice to vote
H1 – Duration of Stay in the same city impacts the choice to vote
Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             60.843a   3    .000
Likelihood Ratio               63.515    3    .000
Linear-by-Linear Association   55.298    1    .000
N of Valid Cases               373
a. 2 cells (25.0%) have expected count less than 5. The minimum expected count is .95.
Crosstab
                                      Duration_Stay_Same_City
                                      0      1       2      3       Total
voting_response  0   Count            2      115     46     33      196
                     Expected Count   1.1    87.8    38.9   68.3    196.0
                 1   Count            0      52      28     97      177
                     Expected Count   .9     79.2    35.1   61.7    177.0
Total                Count            2      167     74     130     373
                     Expected Count   2.0    167.0   74.0   130.0   373.0
Significance value is less
than 0.05 and hence
indicates that the duration
of stay is significant factor
driving votes.
There is a difference in expected count of voters and non-voters in accordance to the duration of
stay in the same city. Hence the variable has an impact on voting decision.
Null Hypothesis is Rejected
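The Pearson Chi-Square value reported for Duration of Stay can be reproduced from the observed counts of the crosstab alone. The following is a minimal sketch in plain Python (not the SPSS procedure itself); the variable names are chosen only for illustration, and the counts are the ones shown in the crosstab.

```python
# Crosstab of voting_response (rows: 0 = did not vote, 1 = voted) by
# Duration_Stay_Same_City (columns: codes 0, 1, 2, 3), copied from the report.
observed = [
    [2, 115, 46, 33],
    [0, 52, 28, 97],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 3), df)
```

The expected counts computed this way match the "Expected Count" rows of the crosstab, and the statistic agrees with the SPSS value of 60.843 on 3 degrees of freedom.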
I BELONG TO THE STATE I AM RESIDING IN – Nominal Variable
(voting_response * State_residing_flag)
Definition: Yes – 1, No – 2
Ho – Decision to vote is not impacted if a person belongs to state he/she is not residing in
H1 – Decision to vote is impacted if a person belongs to state he/she is not residing in
I belong to the state I am residing in
Answer Options       Response Percent   Response Count
Yes                  63.0%              235
No                   37.0%              138
answered question                       373
63.0% of the respondents
belong to the state they are
residing in, which can be a
motivating factor to vote for the
betterment of their state.
Significance value is less than 0.05 and hence indicates that belonging to the state people are
currently living in impacts their decision to vote.
Crosstab
                                      State_residing_flag
                                      0       1       Total
voting_response  0   Count            164     32      196
                     Expected Count   123.5   72.5    196.0
                 1   Count            71      106     177
                     Expected Count   111.5   65.5    177.0
Total                Count            235     138     373
                     Expected Count   235.0   138.0   373.0
Null Hypothesis is Rejected
Have been abroad, if yes, total duration of stay – Ordinal Variable
(voting_response*Abroad_stay)
Definition: 0-1 year – 1, 1-3 years – 2, 3-10 years – 3, >10 years – 4, Not been abroad – 5
Have been abroad, if yes, total duration of stay
Answer Options       Response Percent   Response Count
0-1 year             31.9%              119
1-3 years            7.0%               26
3-10 years           5.1%               19
>10 years            1.9%               7
Not been abroad      54.1%              202
answered question                       373
Amidst 196 respondents who
did not vote in the last
election, 164 people do not
belong to the state
(State_residing_flag = 0). Only 32
non-voters belong to the state
(State_residing_flag = 1).
There is a difference in expected
count of voters and non-voters
in accordance to the belonging
to state they are residing in.
Hence the variable has an impact
on voting decision.
54.1% of the respondents have not
been abroad and 31.9% of
respondents have spent less than 1
year outside India. Exposure to life
outside India can influence people’s
decision to make a difference.
Ho – The exposure to life outside India does not impact person’s decision to vote
H1 – The exposure to life outside India impacts the decision to vote
Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             6.517a   4    .164
Likelihood Ratio               6.900    4    .141
Linear-by-Linear Association   2.813    1    .094
N of Valid Cases               373
a. 2 cells (20.0%) have expected count less than 5. The minimum expected count is 3.32.
Null Hypothesis is Accepted
SPSS OUTPUT CHI-SQUARE ANALYSIS
Significance value is more than
0.05 and hence indicates that
having lived outside India does
not have any significant effect
on the choice of voting.
The exposure to abroad stay
has not influenced people’s
decision to vote. There was a
little difference seen in the
actual and expected count.
6.3.1 SUMMARY: CHI-SQUARE
Age, Gender, Education Qualification, Occupation and Relationship Status of a person
are not influential in driving people's choice to vote.
Annual Income is not significant in the Chi-square test (Sig. = .160), though the data shows
the following:
o From the data, more people in the income bracket of 2-6 lacs p.a. did
not vote in the last election.
o The sample size is too small to comment on the higher brackets. Though it indicated
that more people in the income bracket of 10-20 lacs p.a. did not vote, the trend was
not the same for people with income of more than 20 lacs p.a. A larger sample size
may or may not change this observation.
Duration of stay in the same city strongly influences a person's decision to vote. More
people have voted if they have stayed in the city longer than 10 years. The trend is
reversed for people who have stayed for less than 5 years.
More of the people who belong to the state they are residing in have voted than
of the ones who do not belong to the state. This variable also influences the decision to
vote.
PART VII: FACTOR ANALYSIS
The questionnaire was designed after considering the various parameters which drive people to
vote. Voting is a personal choice and every person attributes a different importance to voting.
Many factors drive the decision to vote and an effort was made to incorporate all the important
attributes in the questionnaire. Many questions might be highly interrelated and bring out the
same characteristics of human psychology.
Hence, factor analysis is used to identify the principal variables among the many variables
selected for the study. This helps in providing the underlying constructs of highly correlated
variables.
It is assumed that the sample is homogeneous and hence the responses of the people will not vary
if the same questions are asked in any other circumstances.
Variables which would help in understanding the psychology of respondents, on their reasons to
vote, are included for factor analysis. Other demographic variables will be used directly in the
model building.
The following 9 variables are considered for Analysis:
1) Vote_Tedious_TimeConsuming
2) Opinion_Safety_Concern
3) Opinion_Politicians_choice
4) Opinion_Trust_Voting_System
5) Opinion_RegstrVoterID_Org
6) Opinion_Voting_Business_Park
7) Importance_Leader
8) Opinion_Vote_Important
9) Imapct_LifeStyle
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy           .680
Bartlett's Test of Sphericity     Approx. Chi-Square      581.581
                                  Df                      36
                                  Sig.                    .000
Communalities
                                  Initial   Extraction
Vote_Tedious_TimeConsuming        1.000     .569
Opinion_Safety_Concern            1.000     .601
Opinion_Politicians_choice        1.000     .826
Opinion_Trust_Voting_System       1.000     .483
Opinion_RegstrVoterID_Org         1.000     .641
Opinion_Voting_Business_Park      1.000     .713
Importance_Leader                 1.000     .710
Opinion_Vote_Important            1.000     .779
Imapct_LifeStyle                  1.000     .777
- The Kaiser-Meyer-Olkin value of 0.680 is higher than 0.5, which signifies that the sample is
adequate for Factor analysis
- Sig = 0.000 for Bartlett's Test of Sphericity indicates that factor analysis is useful
for reduction of data
All the variables are Ordinal with values ranging from 1 to 5, with 1 being Strongly Disagree
and 5 being Strongly Agree
- Communalities of all the variables, except Opinion_Trust_Voting_System, are above 0.5, which
implies that more than 50% of the variance of each of these variables is explained by the
extracted factors. Hence, all the variables are considered for further analysis. The
communality of 0.483 is very close to 0.5 and hence even that variable is included in the
analysis.
Total Variance Explained
            Initial Eigenvalues              Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cum. %   Total   % of Variance   Cum. %        Total   % of Variance   Cum. %
1           2.380   26.450          26.450   2.380   26.450          26.450        2.291   25.459          25.459
2           1.514   16.820          43.270   1.514   16.820          43.270        1.475   16.393          41.852
3           1.189   13.209          56.478   1.189   13.209          56.478        1.228   13.645          55.497
4           1.016   11.286          67.765   1.016   11.286          67.765        1.104   12.268          67.765
5           .788    8.758           76.523
6           .768    8.533           85.056
7           .639    7.095           92.151
8           .401    4.453           96.603
9           .306    3.397           100.000
Extraction Method: Principal Component Analysis.
The 4 factors with an Eigen value greater than 1 are retained. The 4 extracted factors
explain 67.765% of the variance. The Eigen value of each component in the table is the
variance accounted for by that component/factor.
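The link between the Eigen values and the "% of Variance" column can be checked by hand: with 9 standardized variables the total variance is 9, so each component's share is its Eigen value divided by 9. A minimal sketch in plain Python, using the Eigen values from the Total Variance Explained table (the variable names are illustrative only):

```python
# Initial eigenvalues of the 9 components, copied from "Total Variance Explained".
eigenvalues = [2.380, 1.514, 1.189, 1.016, 0.788, 0.768, 0.639, 0.401, 0.306]

n_variables = len(eigenvalues)  # 9 standardized variables, so total variance = 9

# Kaiser criterion: keep components whose eigenvalue is greater than 1.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Share of the total variance carried by each retained component, in percent.
pct_variance = [100.0 * ev / n_variables for ev in retained]
cumulative_pct = sum(pct_variance)

print(len(retained), [round(p, 2) for p in pct_variance], round(cumulative_pct, 2))
```

Small differences from the SPSS figures in the second decimal place come from the Eigen values being rounded to three decimals in the table.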
Rotated Component Matrix(a)
                                  Component
                                  1       2       3       4
Vote_Tedious_TimeConsuming        -.139   .739    .059    .012
Opinion_Safety_Concern            .079    .765    -.096   -.008
Opinion_Politicians_choice        .047    -.022   -.022   .907
Opinion_Trust_Voting_System       .094    -.524   .043    .444
Opinion_RegstrVoterID_Org         .024    -.206   .745    -.207
Opinion_Voting_Business_Park      -.061   .134    .810    .185
Importance_Leader                 .840    -.073   -.001   .014
Opinion_Vote_Important            .877    -.056   -.016   .080
Imapct_LifeStyle                  .881    .003    -.032   .018
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.
SPSS OUTPUT FACTOR ANALYSIS
The Scree Plot visually helps
to identify the cut-off point
of 4. The Eigen value of the 5th
Factor is 0.788, which is
below 1, and hence 4 is the
right cut-off.
7.1 LABELLING THE FACTORS
Factor 1 – Importance_Voting_F1
Importance_Leader: It is important to me who gets elected
Opinion_Vote_Important: I feel my vote is important
Imapct_LifeStyle: There will be a direct impact on my day-to-day life with whosoever gets elected
Factor 2 – Security_Ease_F2
Vote_Tedious_TimeConsuming: The process of Voting is tedious and time consuming
Opinion_Safety_Concern: Safety is a concern when I go for voting
Factor 3 – Mobility_Comfort_F3
Opinion_RegstrVoterID_Org: Registration of Voters ID should be done at my corporate, just like
other government documents
Opinion_Voting_Business_Park: Voting should be in business parks near to work locations and can
happen on a working day
Factor 4 – Belief_Voting_System_F4
Opinion_Politicians_choice: I am satisfied with the choice of politicians
Opinion_Trust_Voting_System: The voting system is trustworthy
PART VIII: ONE WAY ANOVA
One Way Analysis of Variance (ANOVA) is performed to determine whether there are any
significant differences between the means of the 4 independent Factors, respectively, w.r.t. the
dependent variable. Though the significance of the factors will be tested during Model building,
ANOVA is performed as a part of EDA.
H01 – there is no difference in the average response of people who consider voting important
H11 – there is a difference in the average response of people who consider voting important
H02 – there is no difference in the average response of people who have concerns with safety and
ease related to voting
H12 – there is a difference in the average response of people who have concerns with safety and
ease related to voting
H03 – there is no difference in the average response of people who look forward to comfort in
terms of voters ID registration and location of polling booths
H13 – there is a difference in the average response of people who look forward to comfort in terms
of voters ID registration and location of polling booths
H04 – there is no difference in the average response of people who believe in the voting system
H14 – there is a difference in the average response of people who believe in the voting system
SPSS OUTPUT ONE WAY ANOVA
SUMMARY – ONE WAY ANOVA
The One way ANOVA for the 4 factors shows there is a difference in the means of following
factors:
Factor 1 – Importance Voting
Factor 3 – Voting Comfort
Hence the null hypothesis H01 and H03 are Rejected.
All the factors are reused during model building.
PART IX: MODEL 1
9.1 BINARY LOGISTIC REGRESSION
A Binary Logistic Regression model is used to predict the chances of a person opting to vote.
The predictor variables used are the 4 Factors created, along with all the demographic
variables. Since the dependent variable is binary and the independent variables are a mix of
continuous and categorical variables, the Binary Logistic Model is a suitable method to predict
the dependent variable.
Even though many of the demographic variables like Age, Gender etc. showed low significance
in predicting Y, as per the Chi-square tests, they are used in the model because they might become
significant when interacting with other variables. This is done to rule out the
possibility of the outcome being different when the variables interact with each other in the model.
The model below was developed after multiple iterations. These can be seen in the attached
output file.
Block 0: Beginning Block
We see that there are
373 cases used in the
analysis.
Variables in the Equation
                    B       S.E.    Wald    df   Sig.    Exp(B)
Step 0   Constant   -.102   .104    .967    1    .325    .903
Block 1: Method = Enter
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      192.205a            .580                   .775
a. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.
The Block 0 output is for a model that includes only the intercept. Given the base rates of
the two decision options (196/373 = 52.5% decided not to vote, 47.5% decided to vote),
and no other information, the best strategy is to predict, for every case, that the subject will
decide not to vote. Using that strategy, we would be correct 52.5% of the time.
Under Variables in the Equation, the intercept-only model is ln(odds) = -.102. If we
exponentiate both sides of this expression we find that our predicted odds [Exp(B)] = .903.
That is, the predicted odds of deciding to vote are .903. Since 177 subjects decided to vote
and 196 decided not to vote, our observed odds are 177/196 = .903.
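The Block 0 figures follow directly from the two counts, which can be verified with a few lines of arithmetic. This is only an illustrative sketch in Python; the variable names are not part of the SPSS output.

```python
import math

voters, non_voters = 177, 196   # counts reported in the text
n = voters + non_voters

base_rate_not_vote = non_voters / n   # accuracy of always predicting "did not vote"
odds_vote = voters / non_voters       # observed odds of deciding to vote
intercept = math.log(odds_vote)       # B of the constant in the intercept-only model

print(round(base_rate_not_vote, 3), round(odds_vote, 3), round(intercept, 3))
```

The three numbers correspond to the 52.5% base rate, Exp(B) = .903 and B = -.102 of the Step 0 constant.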
In the Model Summary we see that the -2 Log Likelihood statistic is 192.205. This statistic
measures how poorly the model predicts the decisions: the smaller the statistic, the better
the model.
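The two R Square values in the Model Summary can be related to the -2 Log Likelihood. The sketch below assumes the standard definitions of the Cox & Snell and Nagelkerke measures and derives the intercept-only -2 Log Likelihood from the 177/196 split, since that value is not printed in the report; it is a cross-check, not the SPSS computation itself.

```python
import math

voters, non_voters = 177, 196
n = voters + non_voters
neg2ll_model = 192.205          # -2 Log likelihood from the Model Summary

# -2 Log likelihood of the intercept-only model, from the observed base rates
# (an assumption of this sketch: the value is not shown in the report).
p_vote = voters / n
neg2ll_null = -2.0 * (voters * math.log(p_vote) + non_voters * math.log(1.0 - p_vote))

# Cox & Snell: 1 - (L0 / L1)^(2/n), written in terms of the -2LL values.
cox_snell = 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)

# Nagelkerke rescales Cox & Snell by its maximum attainable value.
nagelkerke = cox_snell / (1.0 - math.exp(-neg2ll_null / n))

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Under these assumptions the results agree with the reported .580 and .775 to within rounding.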
The Classification Table shows us that this rule allows us to correctly classify 169 / 177 =
95.5% of the subjects where the predicted event (deciding to vote) was observed. This is
known as the sensitivity (True Positive Rate) of prediction, the P (correct | event did occur),
that is, the percentage of occurrences correctly predicted. This rule allows us to correctly
classify 171 / 196 = 87.2% of the subjects where the predicted event was not observed. This
is known as the specificity (True Negative Rate) of prediction, the P (correct | event did not
occur), that is, the percentage of non-occurrences correctly predicted. The False Positive Rate
is 1 - specificity = 12.8%. Overall our predictions
were correct 340 out of 373 times, for an overall success rate of 91.2%.
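The rates quoted from the Classification Table can be recomputed from the four cell counts implied by the text. A small illustrative sketch in Python (names are chosen for readability only):

```python
# Counts implied by the text: 169 of 177 voters and 171 of 196 non-voters
# were classified correctly.
true_pos, false_neg = 169, 177 - 169
true_neg, false_pos = 171, 196 - 171

sensitivity = true_pos / (true_pos + false_neg)     # True Positive Rate
specificity = true_neg / (true_neg + false_pos)     # True Negative Rate
false_positive_rate = 1.0 - specificity             # share of non-voters predicted to vote
accuracy = (true_pos + true_neg) / (true_pos + false_neg + true_neg + false_pos)

print(round(100 * sensitivity, 1), round(100 * specificity, 1), round(100 * accuracy, 1))
```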
9.2 Interpreting Coefficients
Ln[p/(1-p)] = a + b1X1 + b2X2 + b3X3 + b4X4
where X1 = Annual Income, X2 = Belong to State Residing in, X3 = Importance of Voting and
X4 = Comfort of Mobility, b1 to b4 are their coefficients and a is the constant.
Annual Income, Belong to State Residing in, Importance of Voting and Comfort of Mobility
(registering ID and voting location) are significant factors driving the decision to vote.
Annual Income, though it came out to be insignificant in the Chi-square test, became a
significant factor driving the voting decision in interaction with other variables.
Each coefficient changes the odds by a multiplicative amount; the amount is e^b, which is
Exp(B). Every unit increase in X multiplies the Odds by e^b.
Annual Income: e^(-0.937) = 0.392; (0.392 - 1 = -0.608) Odds of Voting decrease by 60.8% for
people with Annual Income less than 10 lacs p.a.
Belong to the state residing in: Odds of voting increase by 605.0% for people who belong to
the state they are residing in.
Importance_Voting: e^(4.571) = 96.6; Odds of voting increase about 96-fold (roughly 9,560%) for
people who believe their vote is important, for whom it is important who gets elected and who
at the same time believe that it would improve their life style.
Mobility Comfort: Odds of voting decrease by 34.8% for people who want voters ID
registration to happen in the corporate and voting to happen in the Business Park.
THE FIRST MODEL:
Ln[p/(1-p)] = -1.586 + (-.937)*Annual Income + 1.953*State_residing_flag +
4.571*Importance_voting_F1 + (-.428)*Mobility_Comfort_F3
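To make the use of the first model concrete, the sketch below converts each coefficient into a percentage change in the odds and then scores one respondent. The respondent is hypothetical, and the coding is an assumption of this sketch: Annual Income = 1 for income below 10 lacs p.a., State_residing_flag = 1 for belonging to the state of residence, and factor scores of 0 for an average respondent. The function names are illustrative only.

```python
import math

INTERCEPT = -1.586
# Coefficients (B) of the first model as reported in the text.
COEFFICIENTS = {
    "Annual Income": -0.937,
    "State_residing_flag": 1.953,
    "Importance_voting_F1": 4.571,
    "Mobility_Comfort_F3": -0.428,
}

def percent_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase in X: (e^b - 1) * 100."""
    return (math.exp(b) - 1.0) * 100.0

def probability_of_voting(values):
    """Predicted probability p from Ln[p/(1-p)] = a + sum of b * X."""
    logit = INTERCEPT + sum(COEFFICIENTS[name] * x for name, x in values.items())
    return 1.0 / (1.0 + math.exp(-logit))

for name, b in COEFFICIENTS.items():
    print(f"{name}: Exp(B) = {math.exp(b):.3f}, change in odds = {percent_change_in_odds(b):+.1f}%")

# Hypothetical respondent (assumed coding, see the lead-in): income below 10 lacs,
# belongs to the state of residence, average scores on both factors.
respondent = {
    "Annual Income": 1,
    "State_residing_flag": 1,
    "Importance_voting_F1": 0.0,
    "Mobility_Comfort_F3": 0.0,
}
p = probability_of_voting(respondent)
print(round(p, 3))
```

The predicted probability can then be compared with the cut-off probability chosen for the model to classify the respondent as 0 or 1.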
SPSS OUTPUT FOR BINARY LOGISTIC MODEL AND ROC CURVE
TPR_FPR.xlsx
MANUAL CALCULATION FOR TPR AND FPR
9.3 ROC Curve
The Receiver Operating Characteristic (ROC) Curve is used for diagnostic test evaluation.
The True Positive Rate (Sensitivity) is plotted as a function of the False Positive Rate (100 -
Specificity) for the different cut-off points of the parameter. Each point on the ROC curve
represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area
under the ROC curve (AUC) is a measure of how well a parameter can distinguish between
two diagnostic groups.
The summary shows the number
of respondents who voted in the
last elections (177) and number
of respondents who did not vote
in the last election (196)
Coordinates of the Curve
Test Result Variable(s): Predicted probability
Positive if Greater    Sensitivity   1 - Specificity   Specificity            Error = ABS(Sensitivity
Than or Equal To(a)                                    (1-(1-Specificity))    - Specificity)
.5819164               .910          .092              .908                   .001
.5942843               .904          .092              .908                   .004
.5757379               .910          .097              .903                   .007
.6012760               .898          .092              .908                   .010
.5747964               .915          .097              .903                   .012
.6063015               .898          .087              .913                   .015
.5730680               .915          .102              .898                   .017
.6120221               .893          .087              .913                   .021
.5712820               .915          .107              .893                   .022
.6145646               .887          .087              .913                   .026
The area under the curve (AUC) is a
measure of the power of the test. An AUC
of 0.961 indicates that the model
discriminates very well between voters
and non-voters.
In other words, there is a 96.1% chance
that the model assigns a higher predicted
probability to a randomly chosen voter
than to a randomly chosen non-voter.
The closer the curve follows the left
hand border and then the top border
of the ROC space the more accurate
the test is.
Cutoff
point lies
in this
range
Coordinates of the Curve (continued)
Positive if Greater    Sensitivity   1 - Specificity   Specificity   Error
Than or Equal To(a)
.5693794               .915          .112              .888          .027
.6180047               .881          .087              .913          .032
.5677229               .921          .112              .888          .033
.6205378               .876          .087              .913          .038
.5638978               .927          .112              .888          .039
.6278912               .876          .082              .918          .043
.5589323               .932          .112              .888          .044
.6400436               .876          .077              .923          .048
.5502517               .932          .117              .883          .050
.6519254               .876          .071              .929          .053
Coordinates of the Curve (first and last rows)
Positive if Greater    Sensitivity   1 - Specificity
Than or Equal To(a)
.0000000               1.000         1.000
.0000000               1.000         .995
.0000000               1.000         .990
.9929000               .011          .000
.9938029               .006          .000
1.0000000              .000          .000
a. The smallest cutoff value is the minimum observed test value minus 1, and the largest
cutoff value is the maximum observed test value plus 1. All the other cutoff values are
the averages of two consecutive ordered observed test values.
[Figure: Sensitivity and Specificity curves plotted against the cut-off probability]
Data deleted for ease of presentation. Please
refer to the attached calculations.
The graph is plotted in Excel to
show the cut-off probability for
the model. The point of cut-off is
the intersection of the two curves,
where Specificity - Sensitivity is
zero.
The cut-off value at which both
Sensitivity (TPR) and Specificity
are high, so that the FPR is low,
is calculated.
The data is sorted to get the
cut-off probability value
below which a person is
classified as a non-voter.
This is calculated manually
from the Coordinates for
better understanding. The
cut-off probability is at the
point where the absolute
difference between
specificity and sensitivity is
zero/minimum.
Specificity = 1-(1-Specificity)
Error = ABSOLUTE (Sensitivity - Specificity)
The cut-off probability is 0.58
for the model to classify 0
and 1.
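The manual search for the cut-off can be written as a short routine: for every candidate cut-off, take the absolute gap between sensitivity and specificity and keep the cut-off with the smallest gap. The sketch below uses only the ten rows of the Coordinates of the Curve shown in the report (the full list is in the attached calculations), so it illustrates the procedure rather than replacing the spreadsheet.

```python
# (cut-off, sensitivity, 1 - specificity) rows copied from the Coordinates of the Curve.
coordinates = [
    (0.5819164, 0.910, 0.092),
    (0.5942843, 0.904, 0.092),
    (0.5757379, 0.910, 0.097),
    (0.6012760, 0.898, 0.092),
    (0.5747964, 0.915, 0.097),
    (0.6063015, 0.898, 0.087),
    (0.5730680, 0.915, 0.102),
    (0.6120221, 0.893, 0.087),
    (0.5712820, 0.915, 0.107),
    (0.6145646, 0.887, 0.087),
]

def balance_error(row):
    """Absolute gap between sensitivity and specificity at one cut-off."""
    _, sensitivity, one_minus_specificity = row
    specificity = 1.0 - one_minus_specificity
    return abs(sensitivity - specificity)

# The cut-off with the smallest gap is where the two curves cross.
best_cutoff, best_sensitivity, best_fpr = min(coordinates, key=balance_error)

print(round(best_cutoff, 2), best_sensitivity, round(1.0 - best_fpr, 3))
```

With these rows the smallest gap is at the cut-off .5819164, which rounds to the 0.58 used for the model.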
9.4 CONCLUSION
People who are residing in the State they belong to have higher chances of voting than the others.
Further, all who feel their vote is important and think that a change in leader will impact their life
style have higher chances of voting than the ones who feel that voting is not important.
People in the income bracket of 2-10 lacs per annum have a lower probability of voting than the ones
in higher income brackets. There is a possibility that they do not feel the importance of voting as it
might not impact their life style.
There is a requirement to encourage people to vote, as a lot of people do not feel their vote is
important. People who feel that registration of voter ID should happen in their
corporate and voting should happen in the business park are less likely to vote otherwise.
This is a strong indication for the government to encourage voting by streamlining the process better.
The further analysis in the research focuses on factors that affect people's decision to vote. This
would also help in verifying the outcome from the above model.
PART X: MODEL 2
INTRODUCTION
The analysis is divided into 2 parts. The first part concentrated on the factors driving the voting
decision among the urban population in India. The second part of the analysis concentrates on
people who did not vote in the last elections.
As per the data, there are 3 categories of people who did not vote in the last election:
1) People who are not registered voters (hence cannot vote)
2) People who are registered voters but did not vote
3) People who are registered voters but whose Voter's ID is of a state they are not currently
residing in
A Discriminant analysis is performed to see if there is any commonality among the 3 groups.
SPSS INPUT FOR SECOND MODEL
10.1 DISCRIMINANT ANALYSIS
The objective of using Discriminant analysis is to model the variables and to identify the
principal discriminators which differentiate the behavior of individuals who are registered
voters with a voters ID belonging to the same state, registered voters with a voters ID of a state
they are not currently residing in, and individuals who are not registered voters.
The objective of Discriminant Analysis is to understand:
How the 3 groups differ with respect to the underlying variables
How people differ with respect to underlying demographic and psychographic
dimensions
Given the information on various variables, which segment a person belongs to, and with
what probability
Since there are many variables, the Stepwise method is used to select the best variables to use in the
model.
The stepwise method starts with a model that doesn't include any of the predictors (step
0).
At each step, the predictor with the largest F to Enter value that exceeds the entry criterion
(by default, 3.84) is added to the model.
The ANOVA table shows that the following variables are significant across groups:
i. Age
ii. Relationship Status
iii. City of Residence
iv. If the person Belongs to the State he/she is residing in
v. Duration of Stay in the same State
vi. People for whom it is important who gets elected
vii. People who think it would impact their lifestyle, if they vote
Box's Test of Equality of Covariance Matrices
The significance value of .000 indicates that the data is not homogeneous in its covariance
matrices, which violates an assumption of DA.
When n is large, small deviations from homogeneity will be found significant, which is why
Box's M is interpreted in conjunction with inspection of the log determinants. Log
determinants are close to each other for groups 0 and 1, and hence the assumption of equality
of covariance can hold good for these 2 groups.
Stepwise Statistics
The table shows the variables selected in the model after 38 iterations. The above 3 variables are
considered for the analysis.
Tolerance is the proportion of a variable's variance not accounted for by the other independent
variables in the equation. It is a measure of multicollinearity.
Tolerance = 1/VIF (the closer the value is to 1, the less collinear the variables are)
A variable with low Tolerance contributes little information to a model and can cause
computational problems. The 3 variables have high tolerance, which signifies that they are not
collinear and that each contributes information of its own to the model.
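The relation between Tolerance, R-square and VIF can be illustrated for the simplest case of two predictors, where the R-square of one predictor on the other is just their squared correlation. The data below are made up for illustration and are not the survey data; with three or more predictors the R-square would come from a multiple regression instead.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores of two predictors for six respondents (illustration only).
predictor_1 = [1, 2, 3, 4, 5, 6]
predictor_2 = [2, 1, 4, 3, 6, 5]

r = pearson_r(predictor_1, predictor_2)
tolerance = 1.0 - r ** 2   # share of variance not explained by the other predictor
vif = 1.0 / tolerance      # Variance Inflation Factor

print(round(r, 3), round(tolerance, 3), round(vif, 3))
```

In this made-up example the two predictors are strongly correlated, so the tolerance is low; predictors with tolerance close to 1 would share very little variance.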
Summary of Canonical Discriminant Functions
Nearly all the variance explained by the model is through the first Discriminant Function. We can
ignore the second function. For each set of functions, this tests the hypothesis that the means of the
functions listed are equal across groups. The test of function 2 has a p value of .26, so this function
contributes little to the model.
The Eigen Value indicates the proportion of variance explained (between-groups sum of squares
divided by within-groups sum of squares). A large Eigen Value indicates a strong function.
The canonical correlation is a correlation between the discriminant scores and the levels of the
dependent variable. A high correlation indicates a function that discriminates well. The present
correlation of 0.599 is not extremely high (1.00 is perfect).
Function 1 is highly correlated to
State_Residing_Flag and Age
Function 2 is highly correlated to
Importance_Leader
Although State_residing_flag is
correlated to both the functions, the
impact is higher on Function 1.
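The Eigen Value and the canonical correlation carry the same information. The sketch below starts from the reported correlation of 0.599 and derives the corresponding Eigen Value using the standard relation between the two; the Eigen Value itself is not quoted in the text, so the derived figure is an illustration and not a value read from the SPSS output.

```python
import math

canonical_correlation = 0.599   # reported for the first discriminant function

# Squared canonical correlation: share of the variance in the discriminant
# scores that lies between the groups.
between_group_share = canonical_correlation ** 2

# Eigen value = between-groups SS / within-groups SS = r^2 / (1 - r^2).
eigenvalue = between_group_share / (1.0 - between_group_share)

# Going back from the eigenvalue recovers the canonical correlation.
recovered = math.sqrt(eigenvalue / (1.0 + eigenvalue))

print(round(between_group_share, 3), round(eigenvalue, 3), round(recovered, 3))
```

A correlation of 0.599 therefore means that roughly 36% of the variation in the scores of Function 1 is between the three groups.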
Discriminant Equation:
Function 1 = 0.219 + 2.806 State_residing_flag + 0.547 Age – 0.422 Importance_Leader
Function 2 = -3.719 + 1.304 State_residing_flag – 0.100 Age + 0.921 Importance_Leader
Classification Statistics
Group Centroids are the means of the
discriminant functions or discriminant
scores for each group.
Registered Voters with IDs belonging to the
same state and with IDs belonging to a different
state have similar scores on Function 2.
The centroids of these 2 groups are close by.
These are the actual regression coefficients
of the variables and are used to form the
linear discriminant functions.
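As an illustration of how the discriminant equations are applied, the sketch below scores one respondent on both functions. The respondent and the coded values passed in are hypothetical, and the function name is chosen only for this example; the coefficients are the ones in the Discriminant Equation above.

```python
def discriminant_scores(state_residing_flag, age, importance_leader):
    """Scores on Function 1 and Function 2 from the reported discriminant equations."""
    function_1 = 0.219 + 2.806 * state_residing_flag + 0.547 * age - 0.422 * importance_leader
    function_2 = -3.719 + 1.304 * state_residing_flag - 0.100 * age + 0.921 * importance_leader
    return function_1, function_2

# Hypothetical respondent: flag coded 1, age bracket coded 2, Importance_Leader
# answered 4 on the 1-5 scale (all values are assumptions for illustration).
score_1, score_2 = discriminant_scores(state_residing_flag=1, age=2, importance_leader=4)

print(round(score_1, 3), round(score_2, 3))
```

The resulting pair of scores is then compared with the group centroids; the respondent is assigned to the group whose centroid is nearest.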
SPSS OUTPUT DISCRIMINANT ANALYSIS
Classification Results is a summary of the number and percent of subjects classified correctly and
incorrectly.
Overall % correctly classified = 61.2%
Discriminant scores plotted on
Function 1 and Function 2 show
that group 0 and group 2 are very
close, whereas group 1 is slightly
different.
Function 1 is highly correlated with
Age and State_residing_flag and
Function 2 is correlated with
Importance_Leader.
CONCLUSION
Group 1: people who are registered voters and have the ID of the state they live in. This group has a
high score on Function 1 and is discriminated by the Age of the people along with them belonging to
the state they live in.
Groups 0 and 2: people who do not have a registered voters ID and people who have an ID
but of another state, respectively. The importance of who gets elected is a major discriminant,
along with them not belonging to the state they live in.
Hence, people who do not have a registered voters ID and the ones who have an ID but of a different
state are not impacted by who gets elected. This can be a factor driving their decision not to vote.
As further analysis, the factors driving the decision not to create a voters ID, and by what
percentage, are studied using Multinomial Logistic Regression analysis.
10.2 MULTINOMIAL LOGISTIC REGRESSION
Analysis is done to understand the factors driving registration of a voters ID card along with the
probability of a person voting. There are 3 categories to consider.
People without registered voters ID (0)
People with registered voters ID (1)
People with registered voters ID of different state (2)
H0: There is no difference between the 3 groups of people
H1: There is a significant difference in the 3 groups of people
Multinomial Logistic Regression is performed since the dependent variable has more than two
categories. The category of people without a Voters ID is taken as the reference category since it
differs from people who have an ID of the same state and people who have an ID of a different state.
Pseudo R-Square
Cox and Snell .591
Nagelkerke .681
McFadden .440
p<0.05 means rejecting the
null hypothesis that there
is no difference between
the ‘intercept only’ and
populated model
Both of these statistics test how well the
model fits that data (expected and actual
values) and p<0.05 means that there is a
significant difference between the two i.e.
the model is not a good fit.
According to the Pearson statistic the model
is a bad fit, but the Deviance statistic
suggests otherwise.
This could be due to low frequencies in
crosstabs.
Variables which were not significant are removed for easy presentation and understanding.
Parameter Estimates
                                                                                               95% Confidence Interval for Exp(B)
Registered_Voter(a)   Parameter                          B         Std. Error   Wald     df   Sig.   Exp(B)   Lower Bound   Upper Bound
1                     Intercept                          -15.589   3230.327     .000     1    .996
                      [State_residing_flag=0]            -2.997    .974         9.473    1    .002   .050     .007          .337
                      [Duration_Stay_Same_City=2]        2.667     1.146        5.419    1    .020   14.394   1.524         135.921
                      [Opinion_Politicians_choice=2]     -4.019    1.572        6.535    1    .011   .018     .001          .392
                      [Opinion_Trust_Voting_System=1]    -5.078    2.377        4.562    1    .033   .006     .000          .658
                      [Opinion_Trust_Voting_System=2]    -3.203    1.760        3.312    1    .069   .041     .001          1.280
                      [Opinion_Trust_Voting_System=4]    -2.856    1.635        3.051    1    .081   .058     .002          1.417
                      [Opinion_RegstrVoterID_Org=3]      2.596     1.250        4.314    1    .038   13.408   1.157         155.322
                      [Opinion_Voting_Business_Park=2]   3.978     1.502        7.015    1    .008   53.397   2.813         1013.610
2                     Intercept                          2.733     2.201        1.542    1    .214
                      [State_residing_flag=0]            3.454     1.061        10.603   1    .001   31.639   3.956         253.066
                      [Opinion_Safety_Concern=1]         -4.342    1.638        7.024    1    .008   .013     .001          .323
                      [Opinion_Safety_Concern=2]         -3.415    1.547        4.875    1    .027   .033     .002          .681
                      [Opinion_Safety_Concern=3]         -3.582    1.547        5.363    1    .021   .028     .001          .577
                      [Opinion_Voting_Business_Park=2]   1.942     1.106        3.085    1    .079   6.972    .799          60.873
                      [Opinion_Voting_Business_Park=4]   2.146     .688         9.724    1    .002   8.554    2.220         32.968
                      [Importance_Leader=3]              -2.430    .973         6.239    1    .012   .088     .013          .593
The pseudo R-square gives an approximate indication of how much of the variation in the dependent
variable is explained by the model; low values are normal in logistic regression. It is not a
popular measure for assessing the model.
All the variables were initially included in the model and were then removed one by one to arrive at
the list of variables which are most significant.
Variables like Opinion_RegstrVoterID_Org and Importance_Leader have an overall significance level
above 0.05 but are still included in the analysis since they are significant at the 10% level (90%
confidence). Further, they are also significant at some of their levels with respect to the
dependent variable.
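The Wald, Sig., Exp(B) and confidence-interval columns of the parameter estimates can be recomputed from B and Std. Error alone. The sketch below does this for the [State_residing_flag=0] row of category 1 (B = -2.997, Std. Error = .974). The function name is hypothetical, and small differences from the SPSS figures are expected because the inputs are already rounded to three decimals.

```python
import math
from statistics import NormalDist

def wald_summary(b, std_error, confidence=0.95):
    """Recompute Wald, Sig., Exp(B) and its confidence interval from B and its SE."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    wald = (b / std_error) ** 2
    # p-value of a chi-square statistic with 1 degree of freedom
    sig = math.erfc(math.sqrt(wald / 2.0))
    return {
        "wald": wald,
        "sig": sig,
        "exp_b": math.exp(b),
        "lower": math.exp(b - z * std_error),
        "upper": math.exp(b + z * std_error),
    }

# Values taken from the [State_residing_flag=0] row for Registered_Voter = 1.
row = wald_summary(b=-2.997, std_error=0.974)
for name, value in row.items():
    print(f"{name}: {value:.3f}")
```

An Exp(B) below 1 with an interval that excludes 1 corresponds to significantly lower odds relative to the reference category; an interval that includes 1 (as for the rows with Sig. above .05) does not.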
Interpreting Coefficients…
For the comparison between categories 0 and 1, the following variables are most significant:
0: People who are not a registered voter
1: People who are registered voter and have ID of same state
i) State_residing_flag ii) Duration_stay_same_city iii) Opinion_trust_voting_system
iv) Opinion_regstrvoterID_org v) Opinion_voting_business_park and
vi) Opinion_Politicians_choice
- The odds of having a registered Voter’s ID of the same state decrease if people do not
belong to the state they are residing in.
- The odds of having a registered Voter’s ID of the same state decrease if people have been
staying in the city for more than 5 years.
- The odds of having a registered Voter’s ID of the same state decrease for people who do
not trust the voting system.
- The odds of having a registered Voter’s ID increase if the registration of IDs is done in the
corporate.
- People who do not want voting to happen in a business park have higher chances of
having a registered ID of the same state.
Interpreting Coefficients…
For the comparison between categories 0 and 2, the following variables are most significant:
0: People who are not a registered voter
2: People who are registered voter and have ID of different state
i) State_residing_flag ii) Opinion_safety_concern iii) Opinion_voting_business_park and
iv) Importance_Leader
- The odds of having a Voter’s ID of another state increase if people do not belong
to the state they are residing in.
- The odds of having a registered Voter’s ID of another state decrease for people who do not
have concerns about safety when they go for voting.
- People who have IDs of another state prefer voting to happen in a Business Park.
- The odds of having a Voter’s ID card of another state decrease for people who are neutral
about who gets elected.
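To see how the two sets of coefficients turn into the probability of each category, the sketch below applies the multinomial logit formula with category 0 as the reference. The function name and the linear predictor values are hypothetical and for illustration only. They are not computed from the table, which omits the non-significant variables.

```python
import math

def category_probabilities(linear_predictors):
    """Convert linear predictors into category probabilities.

    linear_predictors maps each non-reference category to its linear
    predictor (intercept plus the sum of B times the indicator values).
    Category 0 is the reference, so its linear predictor is fixed at 0.
    """
    exp_terms = {cat: math.exp(eta) for cat, eta in linear_predictors.items()}
    denominator = 1.0 + sum(exp_terms.values())
    probabilities = {0: 1.0 / denominator}
    for cat, term in exp_terms.items():
        probabilities[cat] = term / denominator
    return probabilities

# Hypothetical linear predictors for one respondent (assumed values).
probs = category_probabilities({1: 0.5, 2: -1.0})
for cat in sorted(probs):
    print(f"P(category {cat}) = {probs[cat]:.3f}")
```

Each Exp(B) in the table is the factor by which the ratio P(category k) / P(category 0) changes for that level of the variable, holding the others fixed.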
SPSS OUTPUT MULTINOMIAL LOGISTIC REGRESSION
The Classification Table shows that the model correctly classifies 76.5% of the subjects overall.
The model is made up of 2 binary equations (category 1 versus 0 and category 2 versus 0), from which
the overall accuracy (76.5%) is calculated.
The model correctly classifies people with a voters ID card of another state (category 2) in 85.4%
of cases, which is better than for the other 2 categories (69.7% and 67.2%).
The small sample size can be one of the reasons for the lower accuracy of the model.
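The overall and per-category percentages in a classification table are derived from the counts of observed versus predicted categories. The sketch below shows the calculation on a hypothetical 3x3 table. The counts are invented for illustration, since the counts behind the 76.5% figure are not reproduced in this section.

```python
def classification_accuracy(confusion):
    """Return (overall accuracy, per-category accuracy).

    confusion[i][j] is the number of cases observed in category i
    and predicted as category j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    per_category = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return correct / total, per_category

# Hypothetical counts (rows: observed 0, 1, 2; columns: predicted 0, 1, 2).
confusion = [
    [20, 5, 5],
    [4, 30, 6],
    [2, 3, 25],
]
overall, per_category = classification_accuracy(confusion)
print(f"overall: {overall:.1%}")
for category, accuracy in enumerate(per_category):
    print(f"category {category}: {accuracy:.1%}")
```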
CONCLUSION
People who belong to the state they reside in have higher chances of having a Voters ID of that
state.
The model also leads to another conclusion: people who belong to the state they are residing in
are comfortable going to the government-assigned polling booths to vote, whereas people who do
not belong to the state would like voting to happen in the business park, near their work location.
People who have a registered voters ID of another state also have a negative perception of their
safety when they go for voting.
This indicates that people who belong to another state would feel more comfortable in an
environment familiar to them, whereas people who belong to the same state are ready to go out and
vote.
PART XI: CLASSIFICATION TREE
The objective of the classification tree is to understand how opinion divides on some key
concerns related to the voting process. State_residing_flag (Do you belong to the state you
are residing in) turned out to be a significant variable in the analysis.
State_residing_flag (0 or 1) is taken as the dependent variable, and the importance given to Safety,
Registration of Voters ID in the corporate and voting in a Business Park are taken as independent
variables. The tree assesses how opinion varies with the origin of a person.
0 – No, 1 – Yes
Of 373 valid responses, 235 do not belong to the state they are residing in and 138 belong to the
same state.
The CRT classification tree formed 2 nodes on the basis of people’s opinion on registration of
Voters ID happening in the corporate. People who do not think registration of ID should happen in
the corporate, or are neutral about it, formed the first node, and people who agree with the idea
formed the second node.
The second node is further split according to whether people are concerned about their safety when
they go for voting. Node 3 groups people who do not have any concern for their safety when they go
for voting, whereas node 4 groups people who have a little to a high level of concern.
NOTE: The next question has high significance for Node 2, which groups people who believe the
registration of voters ID should happen in the corporate.
Model Summary: shows the dependent and independent variables. The maximum tree depth is 5.
The resulting tree uses two variables, i.e. Opinion_Rgstrvoterid_org and Opinion_safety_concern.
The overall accuracy of the model is 63.5%. Classification of 0 is more accurate than
classification of 1. This can be due to the smaller sample size for 1, i.e. respondents who belong
to the state they are residing in.
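One way to put the 63.5% figure in context is to compare it with the accuracy of always predicting the larger group. The short sketch below does this with the group sizes stated earlier (235 and 138 of 373). The variable names are hypothetical, and the comparison is an added illustration rather than part of the SPSS output.

```python
# Group sizes from the valid responses: 0 = does not belong to the state, 1 = belongs.
counts = {0: 235, 1: 138}
total = sum(counts.values())

# Accuracy obtained by always predicting the larger group.
baseline_accuracy = max(counts.values()) / total
reported_accuracy = 0.635  # overall accuracy of the tree, as reported above

print(f"majority-group baseline: {baseline_accuracy:.1%}")
print(f"tree accuracy:           {reported_accuracy:.1%}")
```

The tree improves only slightly on this baseline, which is consistent with the remark that classification of 1 is less accurate than classification of 0.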
CONCLUSION
- Of the 373 respondents overall, 297 (79.6%) believe that registration of voter ID should
happen in the corporate.
- Of the 235 respondents who do not belong to the state they are residing in, 198 feel that
registration of voters ID should happen in the corporate.
- 165 respondents who do not belong to the state they are residing in have some degree
of concern about their safety when they go for voting.
Overall, people who do not belong to the state they are residing in prefer registration of ID
cards to happen in the corporate and are also concerned about their safety. This is a fair
conclusion, considering people feel safe in an environment they are familiar with.
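The first split of the tree can be checked by hand with the counts quoted above. The sketch below computes the Gini impurity before and after splitting on the registration opinion. It assumes that CRT was run with the Gini impurity measure and that the 297 respondents who favour registration in the corporate make up the second node; neither is stated explicitly in the report. The function names are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node given the number of cases in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_improvement(parent, children):
    """Reduction in impurity achieved by splitting parent into children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * gini(child) for child in children)
    return gini(parent) - weighted

# Class counts are [does not belong to state (0), belongs to state (1)].
root = [235, 138]
node_agree = [198, 297 - 198]                  # favour registration in the corporate
node_other = [235 - 198, 138 - (297 - 198)]    # disagree or neutral (assumed remainder)

print(f"root impurity:  {gini(root):.4f}")
print(f"agree node:     {gini(node_agree):.4f}")
print(f"other node:     {gini(node_other):.4f}")
print(f"improvement:    {gini_improvement(root, [node_agree, node_other]):.4f}")
```

Under these assumptions the improvement is small, which fits with the modest overall accuracy of the tree.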
PART XII: SUMMARY AND CONCLUSION
The research brought into consideration many aspects of the perception of the urban population
of India towards the process of voting.
People living in metros have settled in these cities for better-paying jobs or quality of life.
However, they are sometimes less attached to the city they live in if they belong to another state.
Diversity of language and culture can be a reason for this.
The research drew attention to factors which drive the urban population to vote. People who belong
to the state they reside in have a higher chance of voting than people who do not belong to the
state.
Further, the analysis also brought to light that people with an annual income of INR 5-10 lacs
have a lower probability of voting than the segment earning more than INR 10 lacs.
Also, people who have lived in the city for 5-10 years have a lower probability of voting than
those who have stayed longer.
A common thread can be drawn from the above outcomes. The urban middle class who do not
belong to the state they are residing in and have lived in the city for less than 10 years are
generally less attached to the state, since they are still settling into their day-to-day lives. They
are the people who believe in the voting system and recognise the importance of voting but look for
more familiar surroundings when it comes to registering for a voters ID or going to vote.
It was also noted that people who are not happy with the choice of politicians have a lower
probability of voting, and people who think a change in leader will make a difference have a higher
probability of voting.
To conclude, there are people who vote because the change will impact their lives, and then there
are people who vote because of the social responsibility they attach to the cause. In between is a
middle group whose day-to-day life is not impacted by changing governments. Though this group
knows the importance of voting, it still requires an extra incentive to take that step forward to
make a difference for the country. These people, who are well educated and informed, look
for familiar surroundings and easier processes. If the government encourages registration of
voters ID in the corporate, then more people can take advantage of the system. If voting
happens in business parks, then more people who look for safer and more familiar
surroundings can be encouraged to vote.
Metros in India are home to a large population. This segment can make a very
well informed decision. They have come out of the comfort of their homes to lead a better life.
They make their own living and overcome the obstacles of meeting the basic requirements of
day-to-day life. This self-sufficient group needs encouragement to go out and vote, and this
will happen if the government takes an interest and gives them surroundings they will be
comfortable in.
Corporates can drive voting as a part of Corporate Social Responsibility. If we start today to set
up the process and drive this change, there are chances that we will be able to see a higher voter
turnout by the next elections.
SPSS OUTPUT FOR CLASSIFICATION TREES
FURTHER ANALYSIS
The current sample size is very small and is not representative of the population. A similar study
can be done with a larger sample, spread across regions, to understand the demographic and
psychological behaviour of people towards voting.
Registration of ID cards and concerns about safety can be analysed in greater detail.
People with higher earnings can be analysed, and the assumption that they have a higher sense of
social responsibility can be verified.
PART XIII: REFERENCE
Books:
- Marketing Research by Malhotra and Dash
- Multivariate Data Analysis by Hair Black Babin Anderson and Tatham
Links:
http://www.southampton.ac.uk/ghp3/docs/unicef/workshop5.pdf
http://www.uk.sagepub.com/burns/website%20material/Chapter%2025%20-%20Discriminant%20Analysis.pdf
http://core.ecu.edu/ofe/StatisticsResearch/SPSS%20Discriminant%20Function%20Analysis.pdf
http://www.cs.uu.nl/docs/vakken/arm/SPSS/spss6.pdf