[email protected] Mashups 2010
Evolution of the mashup ecosystem by copying
Michael Weiss and Solange Sari
Technology Innovation Management (TIM)
www.carleton.ca/tim
Objective
• Mashups are applications that combine data and services provided through APIs with user data
• New application development model: opportunistic programming, uses a bricolage approach
• Creation of mashups supported by an ecosystem of data providers, mashup platforms, and users
• Research questions
– How do mashup developers select APIs?
– How do mashup developers learn to develop mashups?
Relevance
• Users/platforms: can benefit from/offer tools that better support the way users work
• Directory providers: their role is to facilitate the selection of APIs and learning of developers
• Data providers: need to understand which other APIs their APIs are most often used together with (interoperability)
Previous work
• Examined structure and growth of mashup ecosystem using visualization and network analysis to identify members and their relationships
• Opportunistic programming studies how developers use online resources in problem solving
• Research on innovation: (re)combination shortens learning curve, modularity allows mix-and-match
• Models of network growth: preferential attachment
• Copying and duplication mechanisms in describing the growth of the web and biological networks
Hypothesis
• To answer the research questions, we examine to what degree developers create mashups by copying other mashups, that is, by copying a mashup “blueprint”
[Figure: Number of copies per mashup. Log-log plot of cumulative probability against number of copies. Labeled blueprints: GoogleMaps, Flickr, Flickr/GoogleMaps, YouTube, GoogleMaps/YouTube, GoogleMaps/Twitter, Amazon/GoogleMaps/YouTube, Amazon/GoogleMaps.]

ProgrammableWeb snapshot on 08/16/10:
Mashups                4983   100%
Not copied             1528    31%
Blueprints              341     7%
Copies of blueprints   3114    62%
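The three categories in the snapshot add up to the total (1528 + 341 + 3114 = 4983), which is consistent with counting each reused API combination once as a blueprint and every further use as a copy. The slides do not state how copies were identified, so the sketch below rests on an assumption: two mashups share a blueprint when they use exactly the same set of APIs. The function name `blueprint_summary` and the toy data are hypothetical.

```python
from collections import Counter

def blueprint_summary(mashups):
    """Classify mashups by API combination ("blueprint").

    Assumption: a combination used by exactly one mashup is "not copied";
    a combination used k > 1 times is one blueprint plus k - 1 copies.
    """
    combos = Counter(frozenset(apis) for apis in mashups)
    not_copied = sum(1 for k in combos.values() if k == 1)
    blueprints = sum(1 for k in combos.values() if k > 1)
    copies = sum(k - 1 for k in combos.values() if k > 1)
    return {
        "mashups": sum(combos.values()),
        "not_copied": not_copied,
        "blueprints": blueprints,
        "copies": copies,
    }

# Hypothetical toy data, not taken from ProgrammableWeb.
toy = [
    {"GoogleMaps"}, {"GoogleMaps"}, {"GoogleMaps"},
    {"Flickr", "GoogleMaps"}, {"GoogleMaps", "Flickr"},
    {"Twitter"},
]
summary = blueprint_summary(toy)
print(summary)
```

In the toy data, GoogleMaps alone and Flickr/GoogleMaps are the two blueprints, with three copies between them, and the Twitter mashup is not copied.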
Copying model
• Mashup ecosystem as network of mashups and APIs: a link indicates that a mashup uses an API
• Assumption: mashups all have m APIs
• Initialize network:
– Create m0 ≥ m APIs, one mashup
• Grow network from t = m0 + 1 to t = N:
– Add new API with probability p
– With probability 1-p, choose a mashup as a template
– For each API in template, copy the API with probability α, or choose a new API at random with probability 1-α
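The growth rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: the function name, the default parameter values, and the choice to count growth steps with `n_steps` are assumptions, and for simplicity a random replacement may pick an API the new mashup already uses.

```python
import random

def simulate_copying_model(n_steps, m=2, m0=2, p=0.2, alpha=0.8, seed=None):
    """Grow a mashup/API network by copying.

    Returns (n_apis, mashups); each mashup is a list of m API indices.
    Default values for p and alpha are illustrative, not calibrated.
    """
    if m0 < m:
        raise ValueError("m0 must be at least m")
    rng = random.Random(seed)
    n_apis = m0
    # Initial network: m0 APIs and one mashup using m distinct APIs.
    mashups = [rng.sample(range(n_apis), m)]
    for _ in range(n_steps):
        if rng.random() < p:
            n_apis += 1  # add a new API, not yet used by any mashup
            continue
        template = rng.choice(mashups)  # mashup used as the blueprint
        new_mashup = []
        for api in template:
            if rng.random() < alpha:
                new_mashup.append(api)  # copy the API from the template
            else:
                new_mashup.append(rng.randrange(n_apis))  # random API
        mashups.append(new_mashup)
    return n_apis, mashups

n_apis, mashups = simulate_copying_model(1000, seed=1)
print(n_apis, len(mashups))
```

With α = 1 every new mashup is a full copy of its template; with α = 0 every API is chosen at random.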
Example
• Initial network: APIs 1 and 2, mashup 3
• Thin solid lines indicate random selection
[Figure: initial network with nodes 1, 2, and 3. Legend: API, mashup.]
Example
• Growth: add a new mashup (4)
• Thick solid lines indicate “copies” relationship
• Thin dashed lines indicate copying
[Figure: network with nodes 1 to 4. Legend: full copy, API, mashup.]
Example
• Growth: add a new API (5)
[Figure: network with nodes 1 to 5. Legend: full copy, API, mashup.]
Example
• Growth: add a new mashup (6)
• Thin solid lines indicate random selection
[Figure: network with nodes 1 to 6. Legend: full copy, partial copy, API, mashup.]
Research method
• Calibrate simulation parameters
– N: combined actual number of APIs and mashups
– m = 2: good approximation of average actual APIs / mashup
– p: number of APIs / N (all as of 08/16/10)
• Simulate mashup ecosystem evolution
– Vary α over range 0.0 to 1.0, keep m = 2 fixed
– Run each simulation multiple times and terminate when 95% confidence interval is sufficient for the optimization
• Determine best fit of the simulated distribution of mashups / API with the actual distribution using two fitting methods: sum of squared error fit, and power law fit
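One way to picture the sum of squared error fit is to compare the two distributions rank by rank and keep the α with the smallest error. The slides do not spell out the exact error definition, so the sketch below assumes a rank-by-rank comparison with the shorter list padded by zeros; all function names and the toy numbers are hypothetical.

```python
from collections import Counter
from itertools import zip_longest

def rank_distribution(mashups):
    """Number of mashups per API, sorted from most to least used."""
    counts = Counter(api for apis in mashups for api in set(apis))
    return sorted(counts.values(), reverse=True)

def sum_squared_error(actual, simulated):
    """Rank-by-rank squared difference; the shorter list is padded with 0."""
    return sum((a - s) ** 2
               for a, s in zip_longest(actual, simulated, fillvalue=0))

def best_alpha(actual, simulated_by_alpha):
    """Return the alpha whose simulated distribution has the smallest error."""
    return min(simulated_by_alpha,
               key=lambda a: sum_squared_error(actual, simulated_by_alpha[a]))

# Hypothetical toy distributions, for illustration only.
actual = [50, 20, 10, 5, 1]
simulated_by_alpha = {
    0.2: [20, 18, 16, 15, 14, 3],
    0.8: [45, 22, 9, 6, 2, 1, 1],
}
print(best_alpha(actual, simulated_by_alpha))
```

In practice each entry of `simulated_by_alpha` would be an average over repeated simulation runs, as the method describes.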
Actual distribution
• Distribution of mashups / API follows Zipf’s law: plotting frequency of mashups relative to rank results in a line with slope close to -1 in a log-log plot
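A slope like this can be estimated with an ordinary least-squares line through the points (log rank, log number of mashups). The slides do not say which estimator was used, so this is only a sketch of one common approach; `loglog_slope` and the sample data are hypothetical, and the function needs at least two positive counts.

```python
import math

def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data that follows count = 1000 / rank exactly,
# so the estimated slope should be very close to -1.
ideal = [1000 / r for r in range(1, 101)]
slope = loglog_slope(ideal)
print(round(slope, 3))
```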
[Figure: log-log plot of number of mashups against rank for the actual distribution. Top-ranked APIs labeled: GoogleMaps, Flickr, YouTube. Slope: -0.990.]
Sum of squared error fit
• Underestimates contribution of top-ranked API
• Overestimates the number of APIs used by at least one mashup by 45% (1020 vs 703)
[Figure: two panels. Sum of squared error against copying factor (α); log-log plot of number of mashups against rank, actual and simulated (sum of squared error), α = 0.798.]
Power law fit
• Slightly overestimates contribution of top API
• Overestimates the number of APIs used by at least one mashup by 22% (859 vs 703)
[Figure: two panels. Log-log plot of number of mashups against rank, actual and simulated (power law); power law coefficient error against copying factor (α), α = 0.855.]
Cumulative contribution of APIs
• Sum of squared error fit underestimates number of APIs that contributed to 50% of API uses
• Power law fit overestimates number of APIs that contributed to 50% of API uses
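The quantity compared here, the number of top-ranked APIs that together account for 50% of API uses, can be read off a cumulative sum over the rank-ordered counts. A minimal sketch with hypothetical function names and toy counts:

```python
from itertools import accumulate

def cumulative_contribution(counts):
    """Cumulative share of all API uses, by rank (most used API first)."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]

def apis_needed(counts, share=0.5):
    """Smallest number of top-ranked APIs covering at least `share` of uses."""
    for rank, cum in enumerate(cumulative_contribution(counts), start=1):
        if cum >= share:
            return rank
    return len(counts)

# Hypothetical toy counts, for illustration only.
counts = [50, 20, 10, 10, 5, 5]
print(apis_needed(counts))
```

Running the same function on the actual and on each simulated distribution gives the numbers that the two bullets compare.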
[Figure: cumulative contribution against rank (log scale).]
Discussion
• Both methods obtained their best fit for a high copying factor: this suggests that most mashups are created by modifying an existing blueprint
• Power law fit more closely approximates the actual Zipf distribution; however, sum of squared error fit offers a better match of the actual degrees of APIs in the midrange
Insights for stakeholders
• Confirmation of practices directories follow:
– List combinations of APIs into mashups
– Keep track of developers of mashups
– Provide tutorials on mashup development
• Directory providers should make blueprints more apparent: also list frequency of blueprints
• Users benefit as they can look at blueprints to select APIs that work well together and as examples
• API providers learn which other APIs are frequently combined with their API: incentive to interoperate
Conclusion
• Results indicate that copying plays a significant role in the evolution of the mashup ecosystem
• However, we cannot rule out other factors that could explain how the mashup ecosystem grows
• Copying hypothesis in line with current thinking about innovation: e.g., Arthur’s The Nature of Technology
• Other current and future work:
– Extend simulation to include mashups of different size
– Test copying hypothesis empirically: we currently examine hereditary relationships between mashups
– Examine link between copying and diversity of ecosystem