towards a better measure of business proximity: topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015 Towards A Better Measure of Business Proximity: Topic Modeling for Industry Intelligence 1 August 13th 2015 Zhan (Michael) Shi Gene Moo Lee* Andrew B. Whinston Arizona State University University of Texas at Arlington University of Texas at Austin * presenter

Upload: gene-moo-lee

Post on 22-Jan-2018


TRANSCRIPT

Page 1: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

Towards A Better Measure of Business Proximity:

Topic Modeling for Industry Intelligence

1

August 13th 2015

Zhan (Michael) Shi Gene Moo Lee* Andrew B. Whinston

Arizona State University

University of Texas at Arlington

University of Texas at Austin

* presenter

Page 5: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015 2

Business proximity: motivation

• To measure firms’ dyadic relatedness in spaces of product, market, and technology

• Essential in competitive/industry intelligence

• Building block in strategy/industrial organization fields

• Existing methods

• Common industry membership (Wang and Zajac 2007)

• Patent holdings (Stuart 1998, Mowery et al. 1998)

• Geographic distance (Mitsuhashi and Greve 2009)

• These approaches have strong data requirements

• Such data are typically scarce for early-stage high-tech startups

Page 15: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015 3

Our Big Data approach

• Our approach: a unified framework that integrates

• Machine learning (LDA topic model)

• Statistical network model (ERGM)

• Big Data technologies (Cloud, NoSQL, Condor)

• Outperforming existing approaches

• Automatic processing (vs. manual inspection)

• Dynamic industry definition (vs. static)

• Finer granularity (vs. discrete)

• Relaxed data requirement (vs. patent, location)

Page 19: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015 4

Main contributions

1. Propose a transformative data-analytic framework for understanding the dynamic startup landscape

2. Construct an explicit network structure for understanding firm interactions

3. Implement a business intelligence (BI) system for competitive intelligence in the U.S. high-tech industry

Page 20: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015 5

Roadmap

1. CrunchBase Data

2. Data-Analytics based Business Proximity

3. Empirical Validation

4. Empirical Application on M&A Analysis

5. Industry Intelligence System

6. Conclusion and implication

Page 22: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

CrunchBase data

7

• CrunchBase: open database (“Wikipedia”) of high-tech industry

• Data collection time: April 2013 ~ April 2015

• 24,382 U.S. high-tech companies (1.4% public, 5.7 years old)

• HQ location, CB-defined industry sector, key personnel, M&A, investments, business summary

• States: CA, NY, MA, TX (stats page)

• Industries: software, web, e-commerce, ad, mobile

Page 26: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

Data: networked business

8

• M&A: 1689 total

• cross-state: 62.6%

• cross-sector: 63.6%

• top 10 buyers: 14.3% (skewed)

• Investments: 531 total

• Job mobility: 19K total

Page 35: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

Our approach on business proximity

• Objectives: data-driven, scalable, finer granularity, minimal data requirements

• Approach: topic modeling [Blei et al. 2003]

• Unsupervised learning to discover latent “topics” from a large collection of documents

10

[Diagram: 24K company descriptions → LDA → industry-wide topics and each company’s topics]

Page 36: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

Business proximity from topic model

• Business proximity p_b(i, j) between firms i and j

• Cosine similarity of topic vectors T_i and T_j

• Range: from 0 (no commonality) to 1 (same business components)
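
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name business_proximity, the 4-topic vectors, and the choice to return 0 for an all-zero vector are assumptions made here. Because topic weights are non-negative, the cosine similarity stays within [0, 1].

```python
import math


def business_proximity(t_i, t_j):
    """Cosine similarity between two topic-weight vectors of equal length."""
    if len(t_i) != len(t_j):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(t_i, t_j))
    norm_i = math.sqrt(sum(a * a for a in t_i))
    norm_j = math.sqrt(sum(b * b for b in t_j))
    if norm_i == 0.0 or norm_j == 0.0:
        # Convention assumed here: an all-zero vector has no commonality.
        return 0.0
    return dot / (norm_i * norm_j)


# Hypothetical 4-topic distributions for three firms (illustrative only).
firm_a = [0.7, 0.2, 0.1, 0.0]
firm_b = [0.6, 0.3, 0.1, 0.0]
firm_c = [0.0, 0.0, 0.1, 0.9]

print(business_proximity(firm_a, firm_b))  # similar topic mix, close to 1
print(business_proximity(firm_a, firm_c))  # little overlap, close to 0
```

Cosine similarity depends only on the direction of the two vectors, so it compares the mix of topics rather than the length of each firm's description.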

11

Page 37: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

Business topic model

[Diagram: LDA plate diagram with the following labels: per-word business topic assignment; observed business descriptions; business topics; per-firm business topics distribution; topic parameter; proportions parameter]

K: # topics, D: # companies, N: # words
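
For reference, the labels in this diagram match the standard smoothed LDA generative process of Blei et al. (2003). The symbols below (β for business topics, θ for per-firm topic distributions, z for per-word topic assignments, w for observed words, η for the topic parameter, α for the proportions parameter) follow the common convention and are an assumption; the slide shows only the labels.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1,\dots,N \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Multinomial}(\beta_{z_{d,n}})
\end{align*}
```

Under this reading, the fitted θ_d would play the role of the topic vector T_d used in the business proximity measure.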

Page 38: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

LDA topic model with CrunchBase

13

• 50 topics in total; examples shown: Video/music, Energy, Sports, Healthcare

Page 40: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

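Validation: comparison with an existing method

15

• Baseline: category match (industry co-membership)

• Business proximity when category match = 0 (different industry): 0.06

• Business proximity when category match = 1 (same industry): 0.12

• Pearson correlation coefficient between business proximity (bizprox) and category match = 0.11 (t-statistic 61.94, p-value < 2.2e-16)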
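
The correlation check can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and the six toy firm pairs are hypothetical, and the t-statistic uses the usual formula t = r · sqrt((n − 2) / (1 − r²)) for testing a Pearson coefficient against zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        raise ValueError("t-statistic is undefined for |r| >= 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical firm pairs: 1 = same industry category, 0 = different.
category_match = [0, 0, 0, 1, 1, 1]
bizprox = [0.05, 0.10, 0.02, 0.20, 0.08, 0.15]

r = pearson_r(category_match, bizprox)
print(r, t_statistic(r, len(bizprox)))
```

With a very large number of firm pairs, even a small coefficient such as 0.11 yields a large t-statistic, which is consistent with the weak but highly significant association reported on the slide.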

[Figure: business proximity ([0, 1], y-axis, ticks 0.00 to 1.00) plotted against category_match (0 or 1, x-axis)]

Page 41: Towards a better measure of business proximity: Topic modeling for industry intelligence

MISQ Workshop, Leuven, Belgium, August 2015

Validation: Leading effect on business networks

16

• Mean business proximity by pair type

• 0.293 (394 M&A pairs)

• 0.224 (129 investment pairs)

• 0.218 (9792 job mobility pairs)

• 0.068 (random pairs)

[Figure: business proximity ([0, 1], y-axis, ticks 0.00 to 1.00) plotted by group (M&A, invest, jobmob, random; x-axis)]

Page 45: Towards a better measure of business proximity: Topic modeling for industry intelligence

Analysis on high-tech M&A network

• Objective: examine the relationship between likelihood of M&A matching and nodal/dyadic characteristics

• Nodal attributes: state, industry, previous M&A

• Dyadic proximities: business, geographic, investment, social

• Challenges: recognize networked business environment

• Model all M&A deals as a network/graph

• Use statistical network model: ERGM or p* model

18

Page 51: Towards a better measure of business proximity: Topic modeling for industry intelligence

ERGM for M&A network

ERGM (Exponential Random Graph Model):

• Probability of realizing a graph = a function of the graph’s statistics [Robins et al. 2007]

• Inter-firm proximity: business, geo, social, invest

• Selective mixing: 50 states, 30 industry sectors

• Degree distribution: node degree, M&A experiences
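As a minimal sketch of what "a function of the graph's statistics" can mean, the Python below computes three illustrative statistics for a toy M&A graph (edge count, number of same-state edges as a selective-mixing term, and summed business proximity over edges) and the unnormalized weight exp(theta · z). The firm names, attribute values, coefficients, and function names are hypothetical and are not taken from the study.

```python
import math


def graph_statistics(edges, state, proximity):
    """Return illustrative statistics z(y) for an undirected M&A graph.

    edges:     list of (u, v) firm pairs, one per M&A deal
    state:     dict firm -> state code (a nodal attribute)
    proximity: dict frozenset({u, v}) -> business proximity in [0, 1]
    """
    n_edges = len(edges)
    # Selective mixing: deals whose two firms share the same state.
    same_state = sum(1 for u, v in edges if state[u] == state[v])
    # Dyadic proximity: total proximity over the realized deals.
    prox_sum = sum(proximity.get(frozenset((u, v)), 0.0) for u, v in edges)
    return [n_edges, same_state, prox_sum]


def unnormalized_weight(theta, z):
    """exp(sum_k theta_k * z_k); dividing by the normalizer gives a probability."""
    return math.exp(sum(t * s for t, s in zip(theta, z)))


# Hypothetical toy data (not from the CrunchBase sample).
state = {"A": "CA", "B": "CA", "C": "NY"}
proximity = {frozenset(("A", "B")): 0.5, frozenset(("A", "C")): 0.25}
edges = [("A", "B"), ("A", "C")]

z = graph_statistics(edges, state, proximity)
theta = [-2.0, 1.0, 2.0]  # hypothetical coefficients
w = unnormalized_weight(theta, z)
```

Turning this weight into a probability requires summing it over every possible graph on the same nodes, which is why estimation relies on MCMC rather than direct computation.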

19

degree selective mixing proximity

Page 54: Towards a better measure of business proximity: Topic modeling for industry intelligence

Estimation setup

• Dataset

• US companies founded from 2008 to 2012: |V| = 24,382

• All dyadic/nodal attributes collected in April 2013

• M&A transactions (April 2013~April 2015): |E| = 394

• Estimate our ERGM M&A model

• Randomly sample 25% of the companies for computational feasibility

• Run 100 Condor jobs with different samples

• Estimate model coefficients by Markov chain Monte Carlo (MCMC) maximum likelihood estimation (MLE)
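The sampling step can be sketched as follows. This is an illustration under stated assumptions: the slides do not say how edges are handled after sampling, so the sketch keeps only deals whose two firms were both sampled (an induced subgraph). A plain loop over seeds stands in for the 100 Condor jobs, and the node and edge data are made up.

```python
import random


def sample_induced_subgraph(nodes, edges, fraction=0.25, seed=0):
    """Sample a fraction of nodes and keep the edges among them (assumed behaviour)."""
    rng = random.Random(seed)  # one seed per job gives a different sample
    k = max(1, round(len(nodes) * fraction))
    kept = set(rng.sample(sorted(nodes), k))
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# Hypothetical toy network standing in for the company/M&A data.
nodes = [f"firm{i}" for i in range(1000)]
edges = [(f"firm{i}", f"firm{i + 1}") for i in range(0, 100, 2)]

# One sample per job; the ERGM would then be fitted to each sample separately.
samples = [sample_induced_subgraph(nodes, edges, 0.25, seed=s) for s in range(100)]
```

The ERGM fit itself is not shown here.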

20

Page 55: Towards a better measure of business proximity: Topic modeling for industry intelligence

ERGM results from a sample

21

Page 60: Towards a better measure of business proximity: Topic modeling for industry intelligence

Empirical results: M&A & proximity

22

• Proximities are normalized for comparison

• 1.0 std increase in business proximity

  = 3.64 std increase in social proximity

  = 6.89 std increase in investment proximity

+++

Page 64: Towards a better measure of business proximity: Topic modeling for industry intelligence

Empirical results: complementarity

23

• Original term (+) / squared term (-) -> inverted U-curve

• Interpretation: M&A transactions tend to occur between two firms that are complementary but not substitutes

• We can find this nonmonotonic relation because our proximity measure has (1) comprehensiveness and (2) continuity
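To see why a positive linear coefficient combined with a negative squared coefficient gives an inverted U, write the contribution of business proximity x to the log-odds of a match as a quadratic. The symbols θ₁, θ₂, and x* are notation introduced here for illustration; the estimated values are not given in this transcript.

```latex
f(x) = \theta_1 x + \theta_2 x^2, \qquad \theta_1 > 0,\ \theta_2 < 0
\\[4pt]
f'(x) = \theta_1 + 2\theta_2 x = 0
\;\Longrightarrow\;
x^{*} = -\frac{\theta_1}{2\theta_2} > 0
```

The contribution rises with proximity up to x* and falls beyond it. Whether x* lies inside the observed [0, 1] range depends on the estimated coefficients.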

+++-

Page 65: Towards a better measure of business proximity: Topic modeling for industry intelligence

Roadmap

1. CrunchBase Data

2. Data-Analytics based Business Proximity

3. Empirical Validation

4. Empirical Application on M&A Analysis

5. Industry Intelligence System

6. Conclusion and implication

Page 69: Towards a better measure of business proximity: Topic modeling for industry intelligence

Transformative platform for M&A market

• High-profile M&A involving startups (reference)

• Search for “fitted” startups in huge venture universe

• “Data-driven” platform for M&A matching and startup search

1. M&A executives to find M&A targets

2. Entrepreneurs to position products

3. VCs to monitor niche markets

4. Analysts to examine industry trends

• Implemented a cloud-based information system (IS) based on the proposed business proximity

25

Page 70: Towards a better measure of business proximity: Topic modeling for industry intelligence

Cloud-based platform design

26

[Architecture diagram with "Front end" and "Back end" regions. Component labels:]

• Industry data (Crunchbase, etc.)

• Data collector (Python)

• Raw data (MongoDB)

• Topic model builder (Scala)

• Processed data (MongoDB)

• Company meta info (JSON)

• Business proximity (JSON)

• Cloud DB (Google Datastore, Google Cloud Storage)

• API Engine (Google App Engine)

• Webpages (HTML/CSS, JavaScript)

• Users

• Search

• Company info

Big Data and Cloud technologies: Cronjob, NoSQL, Python, Scala, Condor, Google Cloud (Storage, App Engine, Datastore) and more

Page 71: Towards a better measure of business proximity: Topic modeling for industry intelligence

Algorithm: Find N nearest competitors

27

• Goal: find nearest competitors based on business proximity

• Idea:

  • instead of exhaustive comparison: O(n^2)

  • leverage sparseness of topic distributions

  • minimize # of comparisons
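One possible reading of this idea is sketched below. The slides do not give the actual algorithm, so several choices are assumptions: proximity is taken to be the cosine similarity of topic distributions, an inverted index maps each topic to the firms whose weight on it exceeds a threshold `th`, and only firms sharing such a topic with the query are compared. All names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(p, q):
    """Cosine similarity of two sparse topic distributions (dict topic -> weight)."""
    dot = sum(w * q[t] for t, w in p.items() if t in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def build_index(topics, th):
    """Inverted index: topic -> firms whose weight on that topic exceeds th."""
    index = defaultdict(set)
    for firm, dist in topics.items():
        for topic, weight in dist.items():
            if weight > th:
                index[topic].add(firm)
    return index


def nearest_competitors(firm, topics, index, th, n):
    """Return (top-n firms, number of comparisons made) for one query firm."""
    candidates = set()
    for topic, weight in topics[firm].items():
        if weight > th:
            candidates |= index.get(topic, set())
    candidates.discard(firm)
    # Compare only against candidates instead of every other firm.
    scored = [(cosine(topics[firm], topics[c]), c) for c in candidates]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [c for _, c in scored[:n]], len(candidates)


# Hypothetical toy topic distributions.
topics = {
    "photo_app": {"photo": 0.75, "social": 0.25},
    "photo_share": {"photo": 0.5, "mobile": 0.5},
    "ad_network": {"advertising": 1.0},
    "social_ads": {"social": 0.5, "advertising": 0.5},
}

exact, n_exact = nearest_competitors("photo_app", topics, build_index(topics, 0.0), 0.0, 2)
fast, n_fast = nearest_competitors("photo_app", topics, build_index(topics, 0.3), 0.3, 2)
```

In this sketch, a threshold of 0 skips only firms that share no topic with the query, whose cosine similarity is zero anyway. A higher threshold makes fewer comparisons but can miss firms that overlap only on minor topics.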

Page 72: Towards a better measure of business proximity: Topic modeling for industry intelligence

Algorithm: accuracy and speed

28

[Figure, two panels. Accuracy (%) by algorithm threshold (th=0.00, 0.10, 0.15, 0.20, 0.30), with one series each for top10, top20, top30, top50, and top100; axis from 90.0 to 100.0. Number of comparisons (million) by algorithm: brute-force, fast(th=0.0), fast(th=0.1), fast(th=0.15), fast(th=0.20), fast(th=0.30); axis from 0 to 300.]

• Our algorithm can detect 50 nearest competitors

  • with 92.5% accuracy using only 3% of calculations

  • with 100% accuracy using 36% of calculations

Page 73: Towards a better measure of business proximity: Topic modeling for industry intelligence

Platform UI: Find competitors

29

Page 74: Towards a better measure of business proximity: Topic modeling for industry intelligence

Search firms by business components

30

Page 75: Towards a better measure of business proximity: Topic modeling for industry intelligence

Search firms by business components

31

Page 76: Towards a better measure of business proximity: Topic modeling for industry intelligence

Roadmap

1. CrunchBase Data

2. Data-Analytics based Business Proximity

3. Empirical Validation

4. Empirical Application on M&A Analysis

5. Industry Intelligence System

6. Conclusion and implication

Page 80: Towards a better measure of business proximity: Topic modeling for industry intelligence

Summary

1. Proposed new machine learning-based business proximity measures for industry intelligence

• Empirical validation with high-tech industry data

2. Built and estimated a statistical network model to understand M&A in high-tech industry

3. Developed a prototype platform for industry intelligence

33

Page 83: Towards a better measure of business proximity: Topic modeling for industry intelligence

Implications

1. For managers:

• Demonstrate the value of organizing unstructured data

• Understand the trade-off between old and new business proximities

• Provide a practical BI implementation based on business proximity

2. For researchers:

• Provide alternative/complementary approach for industry structure

• Show evidence on non-monotone relationship between business proximity and M&A matching

• Statistical network model to capture “networked” business

34

Page 86: Towards a better measure of business proximity: Topic modeling for industry intelligence

Future directions

1. Extending M&A analytics

1. Dynamic network model; Predictive analytics

2. Incorporate other data sources (e.g., firm size, patent, profitability, liquidity)

2. Extending business proximity and topic model

1. Endogenize number of topics (Hierarchical Dirichlet process)

2. Topic model with word dependencies (Poisson Markov Random field)

35

Page 87: Towards a better measure of business proximity: Topic modeling for industry intelligence

Thank you!

36

Page 88: Towards a better measure of business proximity: Topic modeling for industry intelligence

Backup slides

37

Page 89: Towards a better measure of business proximity: Topic modeling for industry intelligence

Data: states and industries

38

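[Figure, two bar charts of company counts.]

By state (count axis from 0 to 8000), labels in extracted order: AK, SD, WV, WY, ND, MS, MT, NM, AR, HI, ID, ME, VT, AL, IA, RI, OK, NE, LA, KS, DE, SC, KY, NH, IN, WI, TN, DC, MO, NV, CT, UT, MN, MI, OR, MD, OH, AZ, NC, VA, CO, NJ, GA, PA, IL, WA, FL, TX, MA, NY, CA

By industry (count axis from 0 to 4000), labels in extracted order: legal, semiconductor, security, education, search, cleantech, network_hosting, hardware, public_relations, biotech, enterprise, consulting, games_video, mobile, advertising, ecommerce, other, web, software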

Back to main

Page 90: Towards a better measure of business proximity: Topic modeling for industry intelligence

Full topic model from CrunchBase

39

Back to main

Page 91: Towards a better measure of business proximity: Topic modeling for industry intelligence

Other existing proximity measures

40

• Common ownership

  • VCs, angels, institutions

  • Count shared investors of two firms

• Geographic distance

  • lat/long, city, state

  • Use great-circle distance between two coordinates

• Social linkage

  • board members, executives, developers

  • Count common people in two firms
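The three measures above can be sketched in a few lines of Python. The slides do not say which great-circle formula was used, so the haversine formula is assumed here. The function names, the Earth-radius constant, and the sample inputs are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, long) points, via haversine."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def shared_count(a, b):
    """Number of common members, e.g. shared investors or shared people."""
    return len(set(a) & set(b))


# Hypothetical inputs.
investment_proximity = shared_count(["vc1", "angel2"], ["vc1", "vc3"])
social_proximity = shared_count(["alice", "bob"], ["carol"])
half_way_round = great_circle_km(0.0, 0.0, 0.0, 180.0)
```

Note that geographic distance is a distance rather than a proximity: larger values mean the firms are further apart, whereas the two counts grow as firms become more closely linked.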

Back to Main

Page 92: Towards a better measure of business proximity: Topic modeling for industry intelligence

M&A and proximity measures

41

• Measure the distances of company pairs

  • M&A matched pairs (red)

  • Random pairs (green)

• M&A pairs are closer than random pairs in both business proximity and geographic distance

[Figure, two panels comparing the M&A and random groups against the empirical CDF (0.00 to 1.00): one panel shows business proximity ([0, 1]), the other geographic distance (km, 0 to 5000).]

Back to Main

Page 93: Towards a better measure of business proximity: Topic modeling for industry intelligence

ERGM for statistical network model

• Exponential Random Graph Model (ERGM)

• Erdős and Rényi (1959), Newman (2003)

• Explain an observed graph based on node/edge properties

• Want to estimate theta that maximizes Pr(Y=y)

42

where

• z_k(y) = a certain property of the graph y

• theta_k = parameter for the k-th statistic (want to estimate this)

• Psi = normalization constant (requires exponential computation)

• K = # of statistics we are interested in
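The equation that the "where" list refers to did not survive in this transcript. The standard ERGM form (as in Robins et al. 2007), written with the symbols defined above, would be the following. This is a reconstruction from those definitions, not a copy of the slide.

```latex
\Pr(Y = y) = \frac{1}{\Psi} \exp\!\left( \sum_{k=1}^{K} \theta_k \, z_k(y) \right),
\qquad
\Psi = \sum_{y'} \exp\!\left( \sum_{k=1}^{K} \theta_k \, z_k(y') \right)
```

The sum defining Psi runs over every possible graph y' on the same node set, which is why computing it exactly takes exponential time and estimation uses MCMC instead.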

Back to Main

Page 94: Towards a better measure of business proximity: Topic modeling for industry intelligence

Why ERGM?

• M&A deals are interdependent

• Conventional statistical models (logit, probit) assume independence: they treat each M&A deal separately

• Approach: use statistical network model “Exponential Random Graph Model (ERGM)”

43

[Illustration labels: photo, photo, photo, video, blog, face recognition]

Back to Main

Page 95: Towards a better measure of business proximity: Topic modeling for industry intelligence

ERGM 101

• Let Y = <V, E> be an M&A graph, where

• V is the set of companies (nodes)

• E is the set of M&A transactions (undirected edges)

44

Want to explain an observed graph Y with statistics on E and V. Some notations before moving on...

Back to Main

Page 96: Towards a better measure of business proximity: Topic modeling for industry intelligence

Our M&A model (conditional form)

45

Back to Main

Page 97: Towards a better measure of business proximity: Topic modeling for industry intelligence

M&A model notations

46

Back to Main

Page 98: Towards a better measure of business proximity: Topic modeling for industry intelligence

Density and node degree

Degree > 2 coefficient is positive

Power law is observed

Edge coefficient is a constant for the model

47

degree selective mixing proximity

Back to main

Page 99: Towards a better measure of business proximity: Topic modeling for industry intelligence

Selective mixing: state locations

• Selective mixing holds for state locations

• CA, MA, NJ, NY, TX, WA

48

Back to main

Page 100: Towards a better measure of business proximity: Topic modeling for industry intelligence

Selective mixing: industry sectors

• Selective mixing holds for industry sectors, but it is coarse-grained

• Proposed business proximity provides even finer-grained measures

49

degree selective mixing proximity

Back to main

Page 101: Towards a better measure of business proximity: Topic modeling for industry intelligence

M&A in high-tech industry

Back to main