Post on 30-Mar-2020


Panel data estimation in regressions for symbolic data: An application to the clustering of cultural entrepreneurial regimes

Andrej Srakar, PhD, Institute for Economic Research, Ljubljana and Faculty of Economics, University of Ljubljana, Ljubljana, Slovenia, [email protected]

Marilena Vecco, PhD, Erasmus University Rotterdam, Rotterdam, The Netherlands, [email protected]


Research problem(s)

• Theoretical: a first construction and exploration of panel data estimators for regression with symbolic data

• Substantive: modelling of (cultural) entrepreneurial regimes using symbolic data analysis


Structure of the presentation

1) Literature review

2) Model

3) Data and method

4) Symbolic clustering of cultural entrepreneurial regimes

5) Regression estimation

6) Findings and conclusions


Regression methods for symbolic data, interval variables (Dias and Brito, 2015)

• Methods based on symbolic covariance definitions (Billard and Diday, 2000; 2006; Xu, 2010);

• Minimax method (Billard and Diday, 2002);

• Center and range method (Lima Neto and De Carvalho, 2008);

• Center and range least absolute deviation regression method (Maia and Carvalho, 2008);

• Constrained center and range method (Lima Neto and De Carvalho, 2010);

• LASSO IR method (Giordani, 2014);

• Bivariate symbolic regression models (Lima Neto et al., 2011);

• Linear regression models for symbolic interval data using the PSO (Particle Swarm Optimization) algorithm (Yang et al., 2011);

• Monte Carlo method (Ahn et al., 2012);

• Radial basis function networks (Su et al., 2012);

• Copula interval regression method (Neto et al., 2012);

• Interval distributional model (Dias and Brito, 2015)
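As an illustration of one entry in this list, the center and range idea can be sketched as two ordinary least-squares fits, one on interval midpoints and one on interval half-ranges. This is a minimal sketch on made-up toy data; the function names `fit_center_range` and `predict_interval` are hypothetical, and the unconstrained version shown here does not guarantee non-negative predicted half-ranges (the constrained variant in the list addresses that).

```python
import numpy as np

def fit_center_range(x_lower, x_upper, y_lower, y_upper):
    """Fit two separate OLS models: one on midpoints, one on half-ranges."""
    x_lower, x_upper = np.asarray(x_lower, float), np.asarray(x_upper, float)
    y_lower, y_upper = np.asarray(y_lower, float), np.asarray(y_upper, float)
    ones = np.ones((x_lower.shape[0], 1))
    x_mid, x_half = (x_lower + x_upper) / 2, (x_upper - x_lower) / 2
    y_mid, y_half = (y_lower + y_upper) / 2, (y_upper - y_lower) / 2
    beta_mid, *_ = np.linalg.lstsq(np.hstack([ones, x_mid]), y_mid, rcond=None)
    beta_half, *_ = np.linalg.lstsq(np.hstack([ones, x_half]), y_half, rcond=None)
    return beta_mid, beta_half

def predict_interval(beta_mid, beta_half, x_lower, x_upper):
    """Rebuild interval bounds from the predicted midpoint and half-range."""
    x_lower, x_upper = np.asarray(x_lower, float), np.asarray(x_upper, float)
    ones = np.ones((x_lower.shape[0], 1))
    y_mid = np.hstack([ones, (x_lower + x_upper) / 2]) @ beta_mid
    y_half = np.hstack([ones, (x_upper - x_lower) / 2]) @ beta_half
    return y_mid - y_half, y_mid + y_half

# Toy data (invented for illustration): one interval-valued predictor, four units.
x_mid = np.array([[1.0], [2.0], [3.0], [4.0]])
x_half = np.array([[0.5], [1.0], [1.5], [2.0]])
y_mid = 2.0 + 3.0 * x_mid[:, 0]
y_half = 1.0 + 0.5 * x_half[:, 0]

beta_mid, beta_half = fit_center_range(
    x_mid - x_half, x_mid + x_half, y_mid - y_half, y_mid + y_half
)
pred_lower, pred_upper = predict_interval(
    beta_mid, beta_half, x_mid - x_half, x_mid + x_half
)
```

Because the toy data are noise-free, the two fits recover the generating coefficients exactly, which makes the sketch easy to check by hand.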


Regression methods for symbolic data, histogram variables

• First model: Billard and Diday (2006), a non-probabilistic approach that estimates the parameters by minimizing a criterion such as the sum of squared errors.

• The regression is built on a covariance matrix derived from the covariance measure of Bertrand and Goupil (2000), which is computed differently depending on the data type: intervals are treated as uniform distributions, and histograms as weighted intervals.
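To make the "intervals as uniform distributions" idea concrete, the sketch below computes the mean and variance of an equal-weight mixture of uniform distributions, one per observed interval. This is the form commonly attributed to Bertrand and Goupil for interval data; the slides do not reproduce the formula, so treat the code as an assumption-based illustration. It covers the variance only, not the full covariance matrix, and the function name is hypothetical.

```python
import numpy as np

def interval_mean_variance(lower, upper):
    """Mean and variance of an equal-weight mixture of Uniform(a_i, b_i)."""
    a = np.asarray(lower, float)
    b = np.asarray(upper, float)
    mean = np.mean((a + b) / 2.0)                       # average of midpoints
    second_moment = np.mean((a**2 + a * b + b**2) / 3.0)  # E[X^2] of each uniform
    return mean, second_moment - mean**2

# A single interval [0, 1] gives the usual uniform moments.
m_single, v_single = interval_mean_variance([0.0], [1.0])

# Degenerate intervals behave like ordinary points (population variance).
m_points, v_points = interval_mean_variance([1.0, 3.0], [1.0, 3.0])
```

Unlike a variance computed on midpoints alone, this quantity also reflects the spread inside each interval.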


Regression methods for symbolic data, histogram variables

• Second model: Dias and Brito (2011), a method for regression on histogram-valued data based on the Wasserstein distance between quantile functions.

• They propose expanding the matrix M (which contains the corresponding quantile functions) by also adding the quantile functions of the symmetric distributions of the explanatory symbolic variables.


Regression methods for symbolic data, histogram variables

• Dias and Brito's basic model:

• With:


Regression methods for symbolic data, histogram variables

• Third model: Irpino and Verde (2011) – „two components model“

• Cuesta-Albertos et al. (1997) showed that the squared L2 Wasserstein distance can be rewritten as

• This property allows the squared distance to be written as the sum of two components: the first relates to the location of the „NPSD“ (numerical probabilistic (modal) symbolic data) and the second to their variability structure.
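The two-part split can be checked numerically. The sketch below evaluates two quantile functions on a common grid and compares the squared L2 Wasserstein distance with the sum of a location part (squared difference of means) and a variability part (squared distance between the centred quantile functions). The data are simulated and the helper names are hypothetical; the identity holds because the centred quantile functions average to zero, so the cross term vanishes.

```python
import numpy as np

def quantile_grid(sample, n_points=1000):
    """Empirical quantile function evaluated at the midpoints of a regular grid."""
    t = (np.arange(n_points) + 0.5) / n_points
    return np.quantile(sample, t)

def squared_wasserstein_parts(q1, q2):
    """Return (total, location, variability) for two quantile functions."""
    total = np.mean((q1 - q2) ** 2)
    location = (q1.mean() - q2.mean()) ** 2
    variability = np.mean(((q1 - q1.mean()) - (q2 - q2.mean())) ** 2)
    return total, location, variability

rng = np.random.default_rng(0)
q1 = quantile_grid(rng.normal(0.0, 1.0, size=500))
q2 = quantile_grid(rng.gamma(2.0, 1.5, size=500))
total, location, variability = squared_wasserstein_parts(q1, q2)
```

On the grid, `total` equals `location + variability` up to floating-point error.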


Regression methods for symbolic data, histogram variables

• Each quantile function $y_i(t)$ can be expressed as a linear combination of the means $\bar{x}_{ij}$ and of the centred quantile functions $x_{ij}^c(t)$, plus an error term $e_i(t)$ (which is itself a function), as follows:
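A rough sketch of how such a two-component fit could be computed: because centred quantile functions average to zero over $t$, the mean part and the centred part can be estimated by two separate least-squares problems. The code below is an unconstrained simplification on simulated data, with hypothetical names. In particular it omits any non-negativity constraint on the coefficients of the centred part, which would be needed to guarantee that the fitted response is a valid (non-decreasing) quantile function.

```python
import numpy as np

def fit_two_component(Yq, Xq):
    """
    Yq: (n, m) response quantile functions on a common grid of m points.
    Xq: (n, p, m) quantile functions of p predictors on the same grid.
    Returns (beta, gamma): intercept + mean coefficients, centred-part coefficients.
    """
    Yq = np.asarray(Yq, float)
    Xq = np.asarray(Xq, float)
    n, p, m = Xq.shape
    y_bar, x_bar = Yq.mean(axis=1), Xq.mean(axis=2)      # means over the grid
    y_cen = Yq - y_bar[:, None]                          # centred response
    x_cen = Xq - x_bar[:, :, None]                       # centred predictors
    # Mean component: OLS of y_bar on the predictor means (with intercept).
    A = np.hstack([np.ones((n, 1)), x_bar])
    beta, *_ = np.linalg.lstsq(A, y_bar, rcond=None)
    # Centred component: stack all units and grid points, no intercept.
    B = x_cen.transpose(0, 2, 1).reshape(n * m, p)
    gamma, *_ = np.linalg.lstsq(B, y_cen.reshape(n * m), rcond=None)
    return beta, gamma

# Simulated, noise-free example with two predictors.
rng = np.random.default_rng(1)
n, m = 6, 50
t = (np.arange(m) + 0.5) / m
base = np.stack([t - 0.5, (t - 0.5) ** 3])               # zero-mean increasing shapes
mu = rng.normal(size=(n, 2))
s = rng.uniform(0.5, 2.0, size=(n, 2))
Xq = mu[:, :, None] + s[:, :, None] * base[None, :, :]
Yq = (1.0 + 2.0 * mu[:, [0]] - 1.0 * mu[:, [1]]
      + 0.5 * s[:, [0]] * base[0] + 1.5 * s[:, [1]] * base[1])

beta, gamma = fit_two_component(Yq, Xq)
```

In a pooled panel setting the same function could be applied after stacking all country-year observations along the first axis, under the assumptions listed for the pooled model.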


Time series perspective

• Although there have been methodological advances for cross-sectional symbolic data sets, from a time series perspective the field is still in the early stages of development.

• Arroyo, González-Rivera, and Maté (2010) and Arroyo and Maté (2009) have shown that classical algorithms, such as smoothing filters and k-NN methods, can be adapted for forecasting histogram time series (HTS) and interval time series (ITS).

• González-Rivera and Arroyo (2010) – sample autocovariance based on histogram time series:

• Consequently, the empirical autocorrelation function of a histogram time series with respect to the barycenter $h_c$ is defined by:

$\rho_k = \gamma_k / \gamma_0$


Panel data – one-way error component models: fixed effects model

• Most general form for „classical“ variables (Baltagi, 2005):

• Broadly three main types: 1) Within estimator; 2) LSDV; 3) First differencing

• Within estimator:
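For classical (non-symbolic) variables, the within estimator can be sketched as follows: subtract each unit's time average from the response and the regressors, which removes any time-invariant unit effect, then run least squares on the demeaned data. This is a minimal illustration on simulated data with a hypothetical function name, not the symbolic-data estimator of the presentation.

```python
import numpy as np

def within_estimator(y, X, groups):
    """Fixed-effects (within) estimator via unit-wise demeaning."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    groups = np.asarray(groups)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()          # remove the unit mean of y
        X_dm[mask] -= X[mask].mean(axis=0)    # remove the unit means of X
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Three units observed over four periods; the unit effect is correlated with x.
rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 4)
x = rng.normal(size=12) + groups
alpha = np.array([10.0, -5.0, 0.0])[groups]
y = alpha + 2.0 * x

beta_within = within_estimator(y, x[:, None], groups)
```

Even though the unit effects here are large and correlated with `x`, demeaning removes them and the slope of 2 is recovered.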


Panel data – one-way error component models: random effects model

• Standard random effects (Feasible GLS) estimator:

• If we assume we have consistent estimators of $\sigma_u^2$ and $\sigma_c^2$:

• Including autocorrelation: Arellano (1990, RES), Stock and Watson (2008); Vogelsang (2011)
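The standard feasible GLS random effects estimator for classical variables can be sketched as quasi-demeaning followed by least squares. The sketch assumes that $\sigma_u^2$ is the variance of the idiosyncratic error and $\sigma_c^2$ the variance of the unit effect (the slide does not spell this out), and it takes both as given rather than estimating them. The function name and the data are hypothetical.

```python
import numpy as np

def random_effects_fgls(y, X, groups, sigma_u2, sigma_c2):
    """Random effects estimator via quasi-demeaning with given variance components."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    groups = np.asarray(groups)
    A = np.hstack([np.ones((X.shape[0], 1)), X])   # intercept + regressors
    y_t = np.empty_like(y)
    A_t = np.empty_like(A)
    for g in np.unique(groups):
        mask = groups == g
        T_g = mask.sum()                           # periods observed for this unit
        theta = 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T_g * sigma_c2))
        y_t[mask] = y[mask] - theta * y[mask].mean()
        A_t[mask] = A[mask] - theta * A[mask].mean(axis=0)
    beta, *_ = np.linalg.lstsq(A_t, y_t, rcond=None)
    return beta

# Five units, four periods, noise-free relation y = 1 + 2x.
rng = np.random.default_rng(3)
groups = np.repeat(np.arange(5), 4)
x = rng.normal(size=20)
y = 1.0 + 2.0 * x

beta_re = random_effects_fgls(y, x[:, None], groups, sigma_u2=1.0, sigma_c2=0.5)
beta_pooled = random_effects_fgls(y, x[:, None], groups, sigma_u2=1.0, sigma_c2=0.0)
```

With `sigma_c2 = 0` the weight `theta` is zero and the estimator reduces to pooled OLS; as `sigma_c2` grows, `theta` approaches one and the estimator approaches the within estimator.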


Panel data – pooled regression

• A more basic form:

• „Population averaged“ model – under the assumption that any latent heterogeneity has been averaged out

• If the remaining assumptions of the classical model are met (zero conditional mean of $e_i(t)$, homoscedasticity, independence across observations, and strict exogeneity of $x_i(t)$, in our case of $\bar{x}_{ij}$ and $x_{ij}^c(t)$), then least squares is the efficient estimator


Short Literature Review – Modelling entrepreneurial regimes

• Entrepreneurial activity differs markedly across countries (Blanchflower, 2004; Acs, Arenius, Hay and Minniti, 2005; van Stel, 2005; Observatory of European SMEs, 2005a; Grilo and Irigoyen, 2005).

• Besides individual characteristics (risk tolerance, entrepreneurial culture, etc.), the level of economic development (Reynolds, Bygrave, Autio, Cox and Hay, 2002; Audretsch, Carree, Thurik and van Stel, 2005) and cultural aspects (Noorderhaven, Wennekers, Thurik and van Stel, 2004; Uhlaner and Thurik, 2005) are often cited as the principal drivers of entrepreneurial activity.

• The shape and development of entrepreneurship have been strongly influenced by the institutional environment (Smallbone and Welter, 2001; 2006; Manolova, Eunni and Gyoshev, 2008).

• Building on the work of DiMaggio and Powell (1983, 1991) and North (1990), Scott (1995) classified the formal and informal institutions affecting organisations and organisational actors into regulatory, normative and cognitive categories.

• In the context of new SME creation, the institutional environment, through its regulatory institutions, defines, supports and limits entrepreneurial opportunities and consequently entrepreneurial activity, by affecting the speed and scope of entrepreneurial entry (Hwang and Powell, 2005).


Data used

• The data come from the Amadeus database and cover the 28 EU countries plus 5 non-EU countries: Albania, Bosnia and Herzegovina, FYR of Macedonia, Montenegro and Serbia.

• We include these five countries in order to capture the full heterogeneity present in the CEE countries.

• Amadeus is a database of comparable financial information for public and private companies across Europe. It includes comprehensive information on around 21 million companies in 43 European countries.

• We use the NACE II classification of cultural and creative sectors, used in some previous studies, which classifies 58 sectors as cultural and creative, encompassing market-based, mixed and culture-related activities (see e.g. Söndermann, 2010).


Data used

C18.1.1 Printing of newspapers
C18.1.2 Other printing
C18.1.3 Pre-press and pre-media services
C18.1.4 Binding and related services
C18.2.0 Reproduction of recorded media
C32.2.0 Manufacture of musical instruments
G47.6.1 Retail sale of books in specialised stores
G47.6.2 Retail sale of newspapers and stationery in specialised stores
G47.6.3 Retail sale of music and video recordings in specialised stores
G47.6.4 Retail sale of sporting equipment in specialised stores
G47.6.5 Retail sale of games and toys in specialised stores
J58.1.1 Book publishing
J58.1.2 Publishing of directories and mailing lists
J58.1.3 Publishing of newspapers
J58.1.4 Publishing of journals and periodicals
J58.1.9 Other publishing activities
J58.2.1 Publishing of computer games
J58.2.9 Other software publishing
J59.1.1 Motion picture, video and television programme production activities
J59.1.2 Motion picture, video and television programme post-production activities
J59.1.3 Motion picture, video and television programme distribution activities
J59.1.4 Motion picture projection activities
J59.2 Sound recording and music publishing activities
J59.2.0 Sound recording and music publishing activities
J60.1.0 Radio broadcasting
J60.2.0 Television programming and broadcasting activities
J61.1.0 Wired telecommunications activities
J61.2.0 Wireless telecommunications activities
J61.3.0 Satellite telecommunications activities
J61.9.0 Other telecommunications activities
J62.0.1 Computer programming activities
J62.0.2 Computer consultancy activities
J62.0.3 Computer facilities management activities
J62.0.9 Other information technology and computer service activities
J63.1.1 Data processing, hosting and related activities
J63.1.2 Web portals
J63.9.1 News agency activities
J63.9.9 Other information service activities n.e.c
M71.1.1 Architectural activities
M71.1.2 Engineering activities and related technical consultancy
M73.1.1 Advertising agencies
M73.1.2 Media representation
M73.2.0 Market research and public opinion polling
M74.1.0 Specialised design activities
M74.2.0 Photographic activities
M74.3.0 Translation and interpretation activities
N77.2.1 Renting and leasing of recreational and sports goods
N77.2.2 Renting of video tapes and disks
R90.0.1 Performing arts
R90.0.2 Support activities to performing arts
R90.0.3 Artistic creation
R90.0.4 Operation of arts facilities
R91.0.1 Library and archives activities
R91.0.2 Museums activities
R91.0.3 Operation of historical sites and buildings and similar visitor attractions
R91.0.4 Botanical and zoological gardens and nature reserves activities
R93.2.1 Activities of amusement parks and theme parks
R93.2.9 Other amusement and recreation activities


Data used

• From the existing database we extract the following variables:

- Characteristics of cultural and creative firms in individual countries: Operational revenue (OpRev); Number of employees (Empl); Level of firm capital (Capi); Level of long-term debt (Debt); Profit/Loss (difference between revenues and expenses); Total assets (TotAss); Solvency ratio (Solv; in %); Level of gross profit (GrProf); Operating Profit/Loss (PLOp); Financial Profit/Loss (PLFin);

• and add the following macroeconomic variables:

- GDP per capita (GDPpc);

- Unemployment rate (Unemp);

- Political Stability (PolStab); Government Effectiveness (GovEff); Rule of Law (RuleLaw); Control of Corruption (ContCorr) to measure institutional quality.
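One way firm-level values could be turned into histogram-valued symbolic observations, one per country and year, is sketched below. The slides do not describe the aggregation step, so the bin edges, the country-year keys and the numbers are all invented for illustration, and `to_histogram` is a hypothetical helper.

```python
import numpy as np

def to_histogram(values, bin_edges):
    """Aggregate micro-level values into relative frequencies over fixed bins."""
    counts, edges = np.histogram(values, bins=bin_edges)
    if counts.sum() == 0:
        raise ValueError("no values fall inside the bin edges")
    # Values outside [edges[0], edges[-1]] are ignored by np.histogram.
    return edges, counts / counts.sum()

# Hypothetical firm-level values for two country-year cells.
firm_values = {
    ("SI", 2015): [1.0, 2.0, 3.0, 10.0],
    ("NL", 2015): [4.0, 6.0, 12.0, 18.0],
}
bin_edges = [0.0, 5.0, 20.0]

symbolic = {key: to_histogram(v, bin_edges)[1] for key, v in firm_values.items()}
```

Using common bin edges across cells keeps the histograms comparable; changing the number of bins is then a one-line robustness check.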


Basic descriptives


Basic descriptives


Basic descriptives


Basic descriptives, 2015


Dynamics of clusters, 2006-2015


Correlation table

Weighted correlation indices

          OpRev     Empl      Capi      Debt      Profit/Loss  TotAss    Solv      GrProf    PLOp      PLFin
GDPpc     0.6844    0.3867    0.7478    0.6861    0.6743       0.7846    0.3576    0.7366    0.7657    0.4654
Unemp     -0.4971   -0.7051   -0.5351   -0.5256   -0.4682      -0.6282   -0.3774   -0.5630   -0.5366   -0.3693
PolStab   0.4717    0.2310    0.4329    0.4063    0.3830       0.5900    0.3187    0.4902    0.4560    0.2140
GovEff    0.6590    0.2469    0.6473    0.6480    0.6309       0.7745    0.4086    0.7176    0.6983    0.3961
RuleLaw   0.6725    0.2496    0.7050    0.7071    0.6691       0.8114    0.4347    0.7545    0.7446    0.4527
ContCorr  0.6594    0.2415    0.6205    0.6875    0.6297       0.7428    0.4453    0.6987    0.6922    0.3973

Correlation indices - randomly generated weights

          OpRev     Empl      Capi      Debt      Profit/Loss  TotAss    Solv      GrProf    PLOp      PLFin
GDPpc     0.6871    0.4168    0.7437    0.6896    0.6666       0.7796    0.3803    0.7249    0.7590    0.4433
Unemp     -0.5294   -0.6811   -0.5838   -0.5580   -0.5065      -0.6658   -0.3875   -0.6037   -0.5736   -0.3965
PolStab   0.4896    0.2578    0.4654    0.4316    0.4067       0.6095    0.3526    0.5044    0.4795    0.2165
GovEff    0.6569    0.2785    0.6467    0.6514    0.6234       0.7681    0.4132    0.7083    0.6975    0.3843
RuleLaw   0.6752    0.2822    0.7100    0.7134    0.6666       0.8122    0.4503    0.7526    0.7448    0.4430
ContCorr  0.6617    0.2674    0.6263    0.6924    0.6276       0.7435    0.4632    0.6935    0.6901    0.3832
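The slides do not say how the weighted indices were computed for the symbolic variables. As a generic point of reference only, a weighted Pearson correlation between two ordinary numeric vectors can be sketched as below; the function name and the data are hypothetical, and this is not claimed to be the authors' index.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation with weights normalised to sum to one."""
    x, y, w = (np.asarray(v, float) for v in (x, y, w))
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)          # weighted means
    cov = np.sum(w * (x - mx) * (y - my))          # weighted covariance
    return cov / np.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

r_equal = weighted_corr(x, y, np.ones(4))                     # same as unweighted
r_random = weighted_corr(x, y, np.random.default_rng(4).uniform(size=4))
```

With equal weights the result coincides with the ordinary Pearson coefficient; comparing it with a run under random weights mirrors the kind of sensitivity check the two tables suggest.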


„POLS“ Regression table

Irpino-Verde model

              OpRev     Empl      Capi      Debt      Profit/Loss  TotAss    Solv      GrProf    PLOp      PLFin
AV_Intercept  1.7562    2.1777*   2.5480*   2.5734*   3.4999*      3.4299*   4.6646*   3.8716*   5.4203*   4.5531*
AV_GDPpc      0.3168*   0.2313    0.2914*   0.3730*   0.3208*      0.2438*   0.2097    0.1384    0.0872    0.0933
AV_Unemp      -0.1302   -0.2368*  -0.1641   -0.1822   -0.1639      -0.1246   -0.1246   -0.0773   -0.0680   -0.0510
AV_GovEff     0.0198*   0.0158    0.0190*   0.0161    0.0185*      0.0208*   0.0233*   0.0219*   0.0157    0.0132
AV_RuleLaw    0.1430*   0.1701*   0.1055    0.1392*   0.1183*      0.0793    0.0785    0.0785    0.0510    0.0413
CEN_GDPpc     1.3278*   0.8100*   0.7776*   0.6454*   0.7938*      0.4842    0.5230*   0.4916*   0.4473    0.5815*
CEN_Unemp     0.1635    0.2060    0.2678    0.3348    0.3615       0.4809    0.3655    0.3179    0.3275    0.2522
CEN_GovEff    2.1351*   1.8575*   2.2105*   2.8957*   2.3455*      2.2986*   2.8503*   2.4227*   2.7135*   2.5235*
CEN_RuleLaw   0.0213    0.0230    0.0262    0.0218    0.0255       0.0232    0.0271    0.0336    0.0269    0.0293
n             264       264       264       264       264          264       264       264       264       264
Ω             0.6926    0.4592    0.5556    0.6056    0.4482       0.5871    0.4110    0.3740    0.4039    0.5372
RMSE_W        0.3159    0.3412    0.4470    0.4023    0.5028       0.6989    0.9785    0.7045    0.7891    0.8601
Pseudo R2     0.3212    0.2506    0.3032    0.3305    0.2446       0.3204    0.2243    0.2041    0.2204    0.2931

Note: * - significance at 5%
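The RMSE_W row is presumably a root mean squared error in the Wasserstein metric. Assuming it is the square root of the average squared L2 Wasserstein distance between observed and fitted quantile functions (the slides do not define it), it could be computed on a common grid as in this sketch. The function name and the data are hypothetical.

```python
import numpy as np

def rmse_wasserstein(Yq_obs, Yq_pred):
    """Root mean squared L2 Wasserstein distance between quantile functions."""
    Yq_obs = np.asarray(Yq_obs, float)
    Yq_pred = np.asarray(Yq_pred, float)
    # Squared distance per unit: average over the grid approximates the integral.
    d2 = np.mean((Yq_obs - Yq_pred) ** 2, axis=1)
    return float(np.sqrt(d2.mean()))

# Three units on a 100-point grid; predictions are shifted by a constant 0.5.
t = (np.arange(100) + 0.5) / 100
Yq_obs = np.stack([t, 2.0 * t, 1.0 + t])
rmse_shift = rmse_wasserstein(Yq_obs, Yq_obs + 0.5)
rmse_zero = rmse_wasserstein(Yq_obs, Yq_obs)
```

A pure location shift of size 0.5 gives an RMSE_W of 0.5, which makes the scale of the measure easy to interpret against the table.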


Robustness checks done

• Different number of bins

• Different clustering methods (hierarchical, k-means/leaders method, fuzzy c-means)

• Some different combinations of variables

• Regressions for individual years

• Billard-Diday and Dias-Brito regressions


Main findings

• First application of SDA to cultural economics and cultural entrepreneurship, to Amadeus data, and to the modelling of entrepreneurial regimes

• One of the first applications of „panel data“ analysis in SDA

• A set of 4 temporally very robust clusters

• Heterogeneity in the CEE countries

• Changes in clusters due to the crisis

• Confirmation of the strong influence of institutions and GDP on entrepreneurial development


Paths for future research

• Further development of panel data estimators in SDA regressions – fixed and random effects models

• Two components model in panel data for SDA

• Different set of variables and enlarged set of countries

• Extending the time period and modelling the time dimension of the data more appropriately

• Use of additional datasets such as GEM – Global Entrepreneurship Monitor

• Modeling (general) entrepreneurial regimes