
  • PROJECT DESCRIPTION

    1 Results from prior N.S.F. support: Intellectual merit aspects

    (N.S.F. Grant DMS-1106998, $270,000 funded 5/01/12 - 4/30/15, titled “Large Random Networks and Information Exchange Processes”). Papers 2–14 are new since the previous proposal. The two major topics will be described here; a third is mentioned briefly in section 2.2.

    1.1 Scale-invariant random spatial networks

    Real-world road networks have an approximate scale-invariance property; can one devise mathematical models of random networks whose distributions are exactly invariant under Euclidean scaling? This requires working in the continuum plane. In [9, 8] proposer introduced an axiomatization of a class of processes called “scale-invariant random spatial networks (SIRSNs)”, whose primitives are routes between each pair of points in the plane. One concrete model, based on minimum-time routes in a binary hierarchy of roads with different speed limits, was shown to satisfy the axioms, and informally one expects that two other constructions (based on Poisson line processes and on dynamic proximity graphs) should satisfy the axioms. These papers initiated study of structure theory and summary statistics for general processes in this class.

    Wilfrid Kendall [18] subsequently analysed the Poisson line process construction. Section 3.4 describes one starting point for future analysis of dynamic proximity graphs, and section 4.3 describes research problems in topological graph theory arising from this viewpoint.

    1.2 Finite Markov Information-Exchange processes

    Take an irreducible symmetric n × n matrix with non-negative off-diagonal entries νij. This can classically be used to define a continuous-time Markov chain with uniform stationary distribution. But it can also be used as a basis for a class (FMIE) of processes, in which we envisage n “agents”, agent i has some “information” xi(t) ∈ X at time t, each pair of agents {i, j} meets at times of a Poisson (rate νij) process, at which time they update their information according to some deterministic or random update rule, a function F : X × X → X × X giving the updated states. This class of models overlaps substantially with interacting particle systems but with a somewhat different emphasis, for instance envisaging large social networks. [5] gave a lengthy survey from this viewpoint. [12] studied the simple averaging model: when agents i and j meet, each replaces their (real-valued) information by the average (xi(t) + xj(t))/2. [2] studied the (conceptually opposite) compulsive gambler process, in which meeting agents play an instantaneous fair game in which one wins the other’s money. Graduate student Dan Lanoue has given a detailed treatment of the metric coalescent [21], which is the compulsive gambler process with agents positioned in a metric space S and with unit total money. Here the state space may be reinterpreted as the space of finite-support probability distributions on S, and the “entrance boundary” issue is to show that the process makes sense started from a non-atomic distribution. Lanoue also studied the iPod model [20] in which agents have preferences over a set of “songs” and upon meeting update their own preferences incrementally towards those of the other agents they meet.
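    As an illustration of the FMIE set-up, here is a minimal simulation sketch of the averaging model, assuming the rates νij are supplied as a dictionary keyed by pairs (i, j). The function name simulate_averaging, the 4-agent example and the choice of rates are hypothetical, introduced only for this sketch. It uses the standard fact that a superposition of independent Poisson processes has Exponential(total rate) gaps, with each meeting assigned to pair {i, j} with probability proportional to νij.

```python
import random


def simulate_averaging(rates, x0, t_max, seed=0):
    """Run the averaging process until time t_max and return the final values.

    rates: dict mapping a pair (i, j) with i < j to its meeting rate nu_ij > 0.
    x0:    list of initial real-valued "information", one entry per agent.
    """
    rng = random.Random(seed)
    pairs = list(rates)
    weights = [rates[p] for p in pairs]
    total_rate = sum(weights)
    x = list(x0)
    t = 0.0
    while True:
        # Superposition of independent Poisson processes: the next meeting
        # occurs after an Exponential(total_rate) time ...
        t += rng.expovariate(total_rate)
        if t > t_max:
            return x
        # ... and involves pair {i, j} with probability nu_ij / total_rate.
        i, j = rng.choices(pairs, weights=weights)[0]
        x[i] = x[j] = (x[i] + x[j]) / 2.0


# Hypothetical example: 4 agents, every pair meets at rate 1,
# one unit of "information" initially held by agent 0.
n = 4
rates = {(i, j): 1.0 for i in range(n) for j in range(i + 1, n)}
final = simulate_averaging(rates, [1.0, 0.0, 0.0, 0.0], t_max=50.0)
```

    The total information is conserved by each meeting, and on a connected rate matrix the values approach the common average; replacing the update line by a fair win-all game would give a sketch of the compulsive gambler process instead.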

    A variety of other models are under consideration for future study, but will not be emphasized in this proposal.

    2 Results from prior N.S.F. support: Broader impact

    2.1 Technical research

    The rather fuzzily-defined academic discipline of network science is rapidly growing in prominence; a goal underlying the FMIE research is to expose techniques, results and problems from relevant mathematical topics (interacting particle systems, Markov chain mixing times, etc.) to the broad network science community.

    Most aspects of the natural or engineered world have an organized complexity which is not usefully represented by simple (few parameters) probability models. So instead of inventing and analyzing unrealistic specific models, an alternative paradigm is to consider classes of processes, and focus on numerical statistics that relate to theoretical aspects of the behavior of the process but can also be estimated from data. Modeling natural language as a stationary process and invoking the statistic “entropy rate”, or modeling physical features such as coastlines as some fractal process and invoking the statistic “fractal dimension”, are iconic examples. Proposer’s work on SIRSNs provides a more sophisticated example of this paradigm. Associated with a particular SIRSN model are two obvious statistics (mean route length relative to straight line length; mean network length per unit area), but there is a less obvious “density of major roads” statistic that emerges without any prior formalism of “major road”.

    2.2 Contributions to education and resources

    Proposer is putting substantial effort into a not-easily-categorized project, which (in 10 words) is intended

    to articulate what mathematical probability says about the real world.

    Tangible aspects, described on proposer’s web site, are

    • Developing an upper division course, emphasizing student projects which test predictions of theoretical models against new real data.

    • Advertising and supervising undergraduate research projects.

    • Giving popular talks to general audiences on the topic, and a talk to young academics (in mathematical probability) on how to teach such a course.

    • Developing content for a web site, intended for casual browsing, and as a more idiosyncratic complement to two existing web sites: Understanding Uncertainty and Chance News.

    • An extensive collection (100+) of reviews of non-technical books relating to Probability.

    In the undergraduate course proposer gives 20 lectures on maximally different topics, and ideally each lecture is “anchored” by some interesting new data, preferably of the kind that students could obtain themselves. In seeking such data the main goal is to illustrate standard theory; however, looking at data sometimes suggests new directions for theory research. For instance, material for a lecture on prediction markets and martingales (extended write-up in [6]) suggested research problems solved in research paper [14]. In a more speculative direction, one topic (the great filter) in the “science fiction meets science” lecture motivated a minor research paper [4]. Section 4 describes three ongoing research projects suggested by thinking about data.

    3 Proposed research: Projects from theory

    The proposed research falls into two opposite styles. Section 3 describes four projects representing continuing development of technical mathematical probability theory. Section 4 describes three projects motivated by novel real world data.

    3.1 A compactification of finite reversible Markov chains

    We start with an analogy. A measured metric space (MMS) (S, d, µ) is a (complete separable) metric space (S, d) equipped with a probability measure µ. Two such spaces are equivalent if there is a measure-preserving isometry between them. Given such a space, define an infinite random array A = A(S, d, µ) as follows. Take independent S-valued µ-distributed random variables (ξi, 1 ≤ i

  • natural metric, the Gromov-Prohorov metric, on the space of all MMSs. See [16] for a detailed account of this theory. Note that by working with equivalence classes we are dealing with convergence “up to relabeling”.

    In the research project the objects of study are triples (S, Q, µ), where µ is a probability measure on a space S and Q is a generator for an S-valued continuous-time Markov process which is reversible with stationary distribution µ and which for t > 0 has transition densities p(x, y; t) = p(y, x; t) with respect to µ. Writing p(x, y) for the function t → p(x, y; t), this set-up is somewhat analogous to the MMS set-up: instead of the real-valued function d(x, y) satisfying the triangle inequality we have a function-valued function p(x, y) satisfying the Chapman-Kolmogorov equations. So, following the analogy, let us define $(S^{(n)}, Q^{(n)}, \mu^{(n)}) \to (S^{(\infty)}, Q^{(\infty)}, \mu^{(\infty)})$ to mean

    $A(S^{(n)}, Q^{(n)}, \mu^{(n)}) \xrightarrow{d} A(S^{(\infty)}, Q^{(\infty)}, \mu^{(\infty)}) \qquad (2)$

    in the usual product topology, where now A(·) is defined via

    Aij = p(ξi, ξj), 1 ≤ i, j

  • analyzing particular cases. The problem below arose as an effort to start some general theory.

    Suppose we have a result of the kind

    (*) provided i and j are not close, then Tij is close to its mean.

    Then the random set Si(t) of vertices within distance t from vertex i is approximately the deterministic set {j : ETij ≤ t}. This can be viewed as a rough analog of the classical “shape theorem” [19] for IID FPP on R^d. So, when is (*) true?

    A precise conjecture is formulated below. The graph weights (we) and other quantities i, j, . . . below depend on n (not shown in notation) and limits are as n → ∞. We formalize the desired property (*) as

    $\frac{T_{ij}}{E T_{ij}} \to_p 1. \qquad (3)$

    The obvious obstacle to (3) is that there may be a set A (with i ∈ A, j ∉ A) such that, in the random minimum-length route πij, the time taken to traverse the edge A → A^c is not o(ETij). One could conjecture this is the only obstacle; here is a slightly weaker conjecture, whose precise form arose from conversations with Jian Ding.

    Conjecture 1 If

    $\frac{\max\{\xi_e : e \text{ an edge in } \pi_{ij}\}}{E T_{ij}} \to_p 0 \qquad (4)$

    then (3) holds.

    It is intuitively clear (and not hard to prove) that condition (4) is necessary. Proving the conjecture would hardly be a definitive result (because condition (4) is not a condition directly on w) but would represent a start on understanding when (3) holds.
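    Property (3) can be explored numerically. The following is a rough Monte Carlo sketch, assuming the finite FPP model in which edge e has an independent Exponential(rate we) length and Tij is the resulting shortest-path distance. The function names fpp_time and ratio_samples, the complete-graph example and the choice we = 1/n are hypothetical, and ETij is replaced by a sample mean, so the output is only suggestive.

```python
import heapq
import random


def fpp_time(n, weights, source, target, rng):
    """One sample of T_ij: shortest-path distance from source to target under
    independent Exponential(rate w_e) edge lengths (Dijkstra's algorithm)."""
    adj = {v: [] for v in range(n)}
    for (u, v), w in weights.items():
        length = rng.expovariate(w)
        adj[u].append((v, length))
        adj[v].append((u, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            return d
        for v, length in adj[u]:
            if d + length < dist.get(v, float("inf")):
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return float("inf")  # target unreachable


def ratio_samples(n, weights, i, j, reps=100, seed=0):
    """Samples of T_ij divided by their sample mean (a stand-in for E T_ij)."""
    rng = random.Random(seed)
    times = [fpp_time(n, weights, i, j, rng) for _ in range(reps)]
    mean = sum(times) / reps
    return [t / mean for t in times]


# Hypothetical example: complete graph on n vertices with w_e = 1/n.
n = 30
weights = {(u, v): 1.0 / n for u in range(n) for v in range(u + 1, n)}
ratios = ratio_samples(n, weights, 0, 1)
spread = max(ratios) - min(ratios)
```

    Repeating this for increasing n and watching whether spread shrinks gives a crude empirical check of (3) for a given family of weights; it is no substitute for the conjecture.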

    3.3 Continuum limits of the finite FPP model

    In the setting of the previous section, using an n-vertex graph with edge-weights w = (we) to construct independent Exponential(rate we) edge-lengths, it is clear that

    d(i, j) := ETij

    defines a metric d on the vertex-set, which we may take to be [n] := {1, . . . , n}. Call a metric admissible if it arises in this way from some w. It is easy to see that not all metrics are admissible, and we conjecture that, for general n, there is no simple explicit characterization of the set of admissible metrics.

    But might matters be simpler in some n → ∞ limit? The topic in section 3.1 was an instance of what proposer calls “the method of induced exchangeable substructures” [3], the best known example being the notion of graphons as limits of dense graphs [23]. Can we apply this methodology in the present context? The point is that a given weighted graph, with vertex-set [n] := {1, . . . , n} say, defines a finite MMS (recall section 3.1) ([n], d, µn) where µn is just the uniform distribution on [n]. So consider a sequence of weighted graphs, and suppose condition (*) from section 3.2 holds for almost all choices of (i_n, j_n). This says that the random percolation times T^n_ij are deterministic to first order. It now makes sense to consider convergence

    ([n], d, µn) → (S∞, d∞, µ∞)   (5)

    to some general MMS limit, convergence of MMSs being in the sense of (1).

    As usual in this methodology, two basic questions now arise. What is the compactness criterion for weighted graphs – when must there be a convergent subsequence? And what MMSs can arise as limits of finite weighted graphs? The proposed research is to start to address these questions. Without checking details, it may be easy to show

    if a measured metric space (S, d, µ) has the property
    there is a geodesic between s1 and s2, for µ × µ-almost-all (s1, s2)
    then it is realizable as a limit of finite weighted graphs.

    It remains to be seen what else can be proved.

    3.4 The greedy dynamic random tree in the plane

    Minor variants of this process have apparently been considered independently by several people [24] but with no published analysis. Fix k ≥ 2 distinct points z1, . . . , zk of the unit square, and imagine point zi has color i from a palette of k colors. Take i.i.d. uniform points Uk+1, Uk+2, . . . in the unit square, and inductively, for j ≥ k + 1,

    give point Uj the color of the closest point to Uj amongst U1, . . . , Uj−1

    where we interpret Ui = zi, 1 ≤ i ≤ k. See Figure 1.
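    The inductive rule is easy to simulate. Below is a minimal sketch (brute-force nearest-neighbour search, so only suitable for a few thousand points); the function name greedy_colored_tree and the three seed points are hypothetical choices. Colors are indexed from 0 rather than 1, and the index of the nearest earlier point is recorded as a parent, which is what gives the tree structure.

```python
import random


def greedy_colored_tree(seeds, n_points, seed=0):
    """Add n_points i.i.d. uniform points to the unit square; each new point
    takes the color of the nearest point already present.

    Returns (points, colors, parent), where parent[j] is the index of the
    nearest earlier point (None for the k seed points).
    """
    rng = random.Random(seed)
    points = list(seeds)
    colors = list(range(len(seeds)))      # seed point i gets color i
    parent = [None] * len(seeds)
    for _ in range(n_points):
        u = (rng.random(), rng.random())
        nearest = min(
            range(len(points)),
            key=lambda m: (points[m][0] - u[0]) ** 2 + (points[m][1] - u[1]) ** 2,
        )
        points.append(u)
        colors.append(colors[nearest])
        parent.append(nearest)
    return points, colors, parent


# Hypothetical example with k = 3 seed points.
seeds = [(0.2, 0.2), (0.8, 0.3), (0.5, 0.8)]
points, colors, parent = greedy_colored_tree(seeds, 500)
```

    Plotting the points by color for increasing n_points is one way to produce pictures of the kind Figure 1 describes.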

    Figure 1. The random tree construction.

    It seems intuitively clear that the colored regions converge (in some sense) to a random partition of the square, and simulations suggest that the boundaries between regions in the limit have some fractal dimension in (1, 2).

  • One can reformulate the construction in terms of a space-time Poisson process of arriving points in the infinite plane. Our motivation is as the simplest case of a conjectured construction scheme for SIRSN models (as in section 1.1). We conjectured [8] that essentially any rule for creating a network by linking arriving points to some nearby existing points via some scale-invariant rule (e.g. link to the 3 nearest points) will lead in the limit to a SIRSN – the key issue is to prove that in the limit we can specify an a.s. unique route between pairs (z1, z2) of points in the plane. In the present model we have a natural tree structure to specify routes, so that issue disappears. The technical issues reduce to showing that limit regions exist and that their boundaries have zero area. Following a natural “genealogical path analysis” approach, preliminary work shows that the length L of the route between points at Euclidean distance r satisfies a tail bound

    $P(L/r > \ell) = O(\ell^{-\alpha})$ for some α > 0.

    This gives a form of convergence of the regions – based on weak convergence of empirical distributions of colored points – that is weaker than the form we really want – based on Hausdorff convergence of the Voronoi-like colored regions as closed sets. The proposed research is to continue this analysis by seeking more sophisticated percolation-style estimates.

    4 Proposed research: Projects from data

    4.1 Nash equilibria in an online casual game which people actually play

    A first course in game theory is typically taught purely as theory: suppose Alice and Bob . . . . . . . At the research level, there have been many experiments using volunteers (typically one’s students) to play games, such as Prisoners’ Dilemma or the Ultimatum Game, that explicitly fit the mathematical game theory framework. But people’s behavior in such staged tournaments is not necessarily representative of behavior in the other aspects of life to which it is often claimed that game theory can be applied – aspects as varied as economic behavior, political or military conflict, or evolutionary biology. The most interesting data that proposer can find (to “anchor” an undergraduate lecture) is also from a game, but rather different in that the “game theory” aspect is a relatively minor part of the whole game and that typical players’ self-descriptions (“age 63, retired nurse: interests church, crafts, grandkids”) suggest they are not students of game theory.

    Here is a mathematical description of the “game theory” part (with italicized comments on actual play).

    • There are M items of somewhat different known values, say b1 ≥ b2 ≥ . . . ≥ bM (always M = 5, but the values vary).

    • There are N players (N varies but 5 - 12 is typical).

    • A player can make a sealed bid for (only) one item, during a window of time (20 seconds).

    • During the time window, players see how many bids have already been placed on each item, but do not see the bid amounts.

    Of course when time expires each item is awarded to the highest bidder on that item. We assume players are seeking to maximize their expected gain. So a player has to decide three things: when to bid, which item to bid on, and how much to bid.

    The actual game is pogo.com’s Dice City Roller, and as I write (12.30pm PDT on a Wednesday) there are about 300 players, in “rooms” of typically 5 - 12 players, and by observing play one can obtain an unlimited quantity of data about how people actually play. What can we do with this data?

    This is exactly the type of game where (theory says) one might expect players to adjust strategies to converge toward a unique Nash equilibrium. But do they? In the analogous one-stage model (that is, without the time window: players make sealed bids without any information about other players’ bids) it is trivial to write the equations determining the Nash equilibrium, but perhaps surprising that one can solve them explicitly in an undergraduate lecture. At equilibrium, expected gain per player is (under a minor non-degeneracy condition)

    $c := \left( \frac{M - 1}{\sum_i b_i^{-1/(N-1)}} \right)^{N-1}$

    and the (sub-probability) density function for amount bid on item i is (at equilibrium)

    $f_i(x) := \frac{1}{N-1} \, c^{1/(N-1)} \, (b_i - x)^{-N/(N-1)}, \quad 0 \le x \le b_i - c.$
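    As a sanity check on these formulas, the sketch below computes c and the total mass of each fi and confirms numerically that the masses sum to 1, as they must if every player bids on exactly one item. The closed form pi = 1 − (c/bi)^{1/(N−1)} for the mass of fi is obtained here by integrating the displayed density over [0, bi − c]; the item values, the function names, and the reading of the non-degeneracy condition as c < min bi are assumptions made for this illustration.

```python
def equilibrium(b, N):
    """Return (c, p) for item values b and N players in the one-stage model.

    c is the equilibrium expected gain per player; p[i] is the total mass of
    f_i, i.e. the probability that a given player bids on item i.
    """
    M = len(b)
    e = 1.0 / (N - 1)
    root = (M - 1) / sum(v ** (-e) for v in b)   # equals c^(1/(N-1))
    c = root ** (N - 1)
    if c >= min(b):
        # Assumed reading of the "non-degeneracy condition": c < min(b).
        raise ValueError("formulas assume c < min(b)")
    p = [1.0 - (c / v) ** e for v in b]
    return c, p


def density(x, b_i, c, N):
    """Sub-probability density f_i(x), valid for 0 <= x <= b_i - c."""
    e = 1.0 / (N - 1)
    return e * c ** e * (b_i - x) ** (-N * e)


def integrate(f, lo, hi, steps=20000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


b = [10.0, 8.0, 6.0, 5.0, 4.0]   # hypothetical item values, M = 5
N = 6                            # hypothetical number of players
c, p = equilibrium(b, N)
masses = [integrate(lambda x, v=v: density(x, v, c, N), 0.0, v - c) for v in b]
```

    Here sum(p) equals 1 up to rounding, and each entry of masses agrees with the corresponding entry of p, which is consistent with the reconstruction of c and fi above.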

    In seeking to compare this with data, recall the actual game has a time element, and various strategies suggest themselves (and are observed): bid late on an item that few or no others have bid on, or bid early on a valuable item to discourage others from bidding on it. Consequently the observed number of bids on a given item has a more concentrated distribution than the one-stage model theory predicts. Preliminary data analysis indicates that, after adjusting for this effect, the distribution of winning bid amounts is indeed fairly close to what Nash equilibrium theory predicts using the one-stage model above.

    This is as far as one would wish to go in an undergraduate lecture. The research project is to understand theory in some model incorporating the time element. A complete “continuous time” analysis seems infeasible because of the huge variety of possible strategies. But for a discrete time two-stage model (bid or wait in first stage) one can indeed write down the equations defining the equilibrium, and we hope that numerical solution of the simplest interesting case (N = 3, M = 2) will cast some light on the general case.

    4.2 A spatial queue

    Proposer has long wished to exploit time spent in line at coffee shops to get data to “anchor” a lecture on queueing theory, but alas can find no remotely interesting data-theory connection. Nor did data from a student who spent a day observing the line at a DMV office seem to relate to anything in queueing theory. However, spending 17 minutes in line at security at Oakland airport enabled collection of the data in Figure 2.

  • Imagine you are the 100th person in line at an airport security checkpoint. As people reach the front of the line they are being processed fairly regularly. But you move less frequently, and when you do move, you typically move several units of distance, where 1 unit distance is the average distance between successive people standing in the line. Figure 2 shows data: times and positions of proposer’s moves at Oakland airport.

    [Figure 2: plot with axes “rank in line” and “time (min.)”.]

    Figure 2. Data: progress in a long queue.

    This phenomenon is easy to understand qualitatively. Take the time unit as the average time for the head person to be processed. When the head person leaves the checkpoint, the next person moves up to the front, the person behind moves, and so on, but this “wave” of motion often does not extend through the entire long line; instead, some person will move only a short distance, and the person behind will decide not to move at all. One can argue informally that, when you are around the k’th position in line, there must be some number a(k) representing both the average time between your moves and the average distance you do move – these are equal because you are moving forwards at average speed 1. This immediately suggests the question of how fast a(k) grows with k. We will describe a stochastic model in which (from convincing heuristics and simulation) a(k) grows as order k^{1/2}; the research project is to actually prove this.

    In classical queueing theory, randomness enters via assumed randomness of arrival and service times. Ironically, even though we are modeling a literal queue, randomness in our model arises in a quite different way, via each customer’s choice of exactly how far behind the preceding customer they choose to stand, after each move. That is, we assume that “how far behind” is chosen from a given probability density function f(x) on an interval [0, c]. In brief, the model is

    when the person in front of you moves forward to a new position, then you move to a new position at a random distance (chosen from density f) behind them, unless their new position is less than distance c in front of your old position, in which case you don’t move.
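    The model as just stated can be simulated directly. The sketch below is one possible reading, with several assumptions that are not in the text: one customer is served per time step, f is taken to be the uniform density on [0, c] with c = 2, the queue is finite, and the initial gaps are drawn i.i.d. from f. The names serve_head and simulate_queue are hypothetical.

```python
import random


def serve_head(positions, c, draw_gap, rng):
    """Remove the head of the queue and propagate the resulting wave of moves.

    positions: increasing list of customer positions, with positions[0] == 0.
    Returns (new_positions, number_of_customers_who_moved).
    """
    waiting = positions[1:]          # the head has been served and leaves
    new_positions = []
    ahead = None                     # new position of the customer in front
    for k, old in enumerate(waiting):
        if k == 0:
            new = 0.0                # next customer steps up to the front
        elif old - ahead < c:
            # Customer in front is now less than c ahead: this customer,
            # and hence everyone behind, stays put.
            return new_positions + waiting[k:], k
        else:
            new = ahead + draw_gap(rng)
        new_positions.append(new)
        ahead = new
    return new_positions, len(waiting)


def simulate_queue(n_customers, n_steps, c=2.0, seed=0):
    """Serve n_steps customers; return final positions and the wave sizes."""
    rng = random.Random(seed)

    def draw_gap(r):
        return r.uniform(0.0, c)     # assumed density f: uniform on [0, c]

    positions = [0.0]
    for _ in range(n_customers - 1):
        positions.append(positions[-1] + draw_gap(rng))
    wave_sizes = []
    for _ in range(n_steps):
        positions, moved = serve_head(positions, c, draw_gap, rng)
        wave_sizes.append(moved)
    return positions, wave_sizes


positions, wave_sizes = simulate_queue(n_customers=500, n_steps=200)
```

    Recording, for a customer at rank k, the times between successive moves over a long run gives an empirical estimate of a(k) that could be compared with the conjectured order k^{1/2} growth.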

    Figure 3 shows a realization of the model. Time increases upwards, and the head of the queue is on the left. The • indicate customer positions at each time, and the lines indicate the space-time trajectories of customers (for visual clarity, only alternate customers’ trajectories are shown).

    [Figure 3: plot with axes “position x” and “time t”.]

    Figure 3. A realization of the process, showing space-time trajectories of customers.

    Although Figure 3 seems the intuitively natural way to draw a realization, a different graphic is more suggestive for mathematical analysis. A configuration x = (0 = x0 < x1 < x2 < x3 . . .) of customer positions can be represented by its centered counting function

    F (x) := max{k : xk ≤ x} − x, 0 ≤ x

  • the upward-translated function

    x → G(t, x) := t + F_t(x).   (6)

    In other words, we draw the function starting at the point (0, t) instead of the origin. Taking the same realization as in Figure 3, and superimposing all these graphs, gives Figure 5.

    [Figure 5: plot with axes “position x” and “time t”.]

    Figure 5. A realization of the process G(t, x).

    Figure 5 shows “something like” coalescing random walks in one dimension, and large-scale simulations suggest the limit process is indeed coalescing Brownian motion, which would imply that a(k) grows as order k^{1/2}. But we have done a smoke-and-mirrors trick here: in a graphic of a model of coalescing random walks which is drawn to resemble Figure 5, the horizontal axis would be “time” and the vertical axis would be “space”. So in making the analogy between our process and coalescing random walks, we are interchanging the roles of space and time. In particular, our process is (by definition) Markov in the vertical direction, whereas models of coalescing random walks are (by definition) Markov in the horizontal direction. This makes it hard to get started on a proof.

    4.3 Routed planar graphs

    In contrast to the previous ones, the project here arises from consideration of theory (in section 1.1) but is in a context where one can readily obtain data, from e.g. Google maps. Take k ≥ 3 street addresses, find the $\binom{k}{2}$ routes between pairs of addresses, and consider the union of these routes as the subnetwork (of the entire real-world road network) spanned by the k addresses. Figure 6 (left, k = 4) shows a real-world example. The theory in section 1.1 considered probability models on subnetworks that were consistent as k increases, and identified the k → ∞ limit as a SIRSN model. But even in the simplest model we do not have any explicit distribution for these spanning subnetworks.
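    The notion of the subnetwork spanned by k addresses can be made concrete on a toy road network. The sketch below is purely illustrative: it takes routes to be shortest paths in a weighted graph (ties broken arbitrarily by the search order), uses a small square grid as a stand-in for a real road network, and all function names are hypothetical. It is not a model of how any mapping service computes routes.

```python
import heapq
from itertools import combinations


def shortest_path(adj, s, t):
    """Dijkstra's algorithm; returns the vertex list of one shortest s-t path.
    Assumes t is reachable from s."""
    dist, prev, done = {s: 0.0}, {}, set()
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v, length in adj[u].items():
            if d + length < dist.get(v, float("inf")):
                dist[v] = d + length
                prev[v] = u
                heapq.heappush(heap, (d + length, v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


def spanned_subnetwork(adj, addresses):
    """Routes between all pairs of addresses, and the union of their edges."""
    routes = {pair: shortest_path(adj, *pair)
              for pair in combinations(addresses, 2)}
    edges = {frozenset(e) for r in routes.values() for e in zip(r, r[1:])}
    return routes, edges


def grid_graph(m):
    """m x m grid with unit-length edges (a toy road network)."""
    adj = {(r, c): {} for r in range(m) for c in range(m)}
    for (r, c) in adj:
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in adj:
                adj[(r, c)][nb] = adj[nb][(r, c)] = 1.0
    return adj


# k = 4 addresses at the corners of a 4 x 4 grid.
corners = [(0, 0), (0, 3), (3, 0), (3, 3)]
grid = grid_graph(4)
routes, edges = spanned_subnetwork(grid, corners)
```

    With real data the routes would come from the mapping service rather than from a shortest-path computation, but the resulting object, a set of route-labelled edges, has the same form.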

    To study whether SIRSN models have any resemblance to the real-world road network, we can obtain data in the form of subnetworks on k = 4 randomly chosen addresses positioned roughly as the corners of a square. What can we do with this data?

    Figure 6. Topology of a subnetwork spanning 4 addresses.

    Of course the data is in part quantitative – the total subnetwork length and the $\binom{4}{2} = 6$ route lengths, for instance – but it seems more interesting to think in terms of the “topology” of the subnetwork. One can represent the subnetwork mathematically as a plane graph by redrawing the route-segments between the subnetwork junctions as straight edges (Figure 6, right). So from our data we could estimate the relative frequencies of different “types” (topological equivalence classes) of plane graphs. In particular, scale-invariance would imply that this distribution does not depend on the size of the square – is this empirically true?

    To get started we need to know (for k = 4) what the possible “types” are. Our setting differs from textbook topological graph theory in several ways. That theory typically envisages working on the 2-sphere, where there is no distinction between inside and outside a cycle; for us, “inside the Beltway (M25, Berliner Ring)” is hardly equivalent to outside. From SIRSN theory we may assume the 4 original vertices are leaves, and we simplify by assuming they are on the outer boundary of the subnetwork. Finally, and perhaps most interesting, the usual notion of equivalence of plane graphs would be via “a graph isomorphism induced by a homeomorphism R^2 → R^2”, but we impose the stronger requirement that the isomorphism preserves the 6 routes. In Figure 7 we distinguish routes via colors to get a “routed network”; the corresponding “unrouted” network is what you see if you are looking at Figure 7 in black-and-white.

    [Figure 7: seven panels, labeled by type number 9, 10, 12, 69, 56, 62, 62.]

    Figure 7. Examples of types.

    At this point, a rather informal preliminary analysis suggests there are exactly 71 “types” of routed network. Six are shown in Figure 7 (the two labeled 62 are the same type). In fact the Figure shows the only cases where the routed-unrouted distinction matters; 9/10 are different types as routed networks but the same type as unrouted; as are 12/69 and 56/62.

    This suggests a huge variety of research problems, all likely to be very difficult. Many do not involve probability, for instance (for general k)

    • Give a rigorous classification of “types” of routed k-leaf networks, and an algorithm for deciding whether two given routed plane (i.e. embedded) networks are isomorphic.

    • Which (topological isomorphism types of) unrouted plane graph with k leaves arise from routed k-networks?

    • Which types can be embedded into the plane in such a way that routes are shortest-length paths, where the length of an edge (v1, v2) is
      (i) Euclidean distance; or
      (ii) an arbitrary positive real number, subject to the “metric” constraint that it is at most the length of any alternate path from v1 to v2.

    From a probability viewpoint, a key question is

    • what probability distributions on types of routed 4-networks arise from some SIRSN model (as the spanning subnetwork of 4 points in a square configuration).

    But this is perhaps impossibly difficult.

  • 5 Broader impact of proposed activities

    Regarding the technical research proposed in section 3, the research is intended to show that these “compactification” and limit methods potentially work in a much wider range of settings than are currently studied; this methodology has potential impact in many “limits of discrete structures” contexts, in network science and beyond.

    Regarding general contributions to education and resources, the range of activities listed in section 2.2, in particular proposer’s ongoing interactions with undergraduates doing projects involving data, will continue. This experience encourages students to continue to graduate school in some quantitative discipline, and will surely continue to provide further “research projects inspired from data” like those in section 4.

    At another level we have in mind the following issue. Over the last decade the profile of Probability within theorem-proof mathematics has risen substantially, as evidenced by 3 Fields Medals for instance, but in parallel the strength of the traditional link with Statistics has declined, and I suspect that a typical new Ph.D. in Probability from a Mathematics department today has had no serious engagement with data. The issue is best described by a well known von Neumann quote.

    As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from “reality” . . . . . . there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganized mass of details and complexities.

    In talking with young mathematicians, proposer uses examples like those in section 4 to remind them that Probability does actually have a non-trivial connection to “reality” that is worthy of attention – or more rhetorically “your intellectual horizons should extend beyond the narrow confines of the Mathematics Library”.

  • References

    [1] David Aldous. A conjectured compactification of some finite reversible MCs. http://www.stat.berkeley.edu/~aldous/Talks/MCcompact.pdf, 2012.

    [2] David Aldous, Daniel Lanoue, and Justin Salez. The compulsive gambler process. arXiv 1406.1214, 2014.

    [3] D.J. Aldous. Exchangeability and continuum limits of discrete random structures. In R. Bhatia, editor, Proceedings of the International Congress of Mathematicians, 2010, volume 1, pages 141–153. Hindustan Book Agency, 2011.

    [4] D.J. Aldous. The great filter, branching histories and unlikely events. The Mathematical Scientist, pages 55–64, 2012.

    [5] D.J. Aldous. Interacting particle systems as stochastic social dynamics. Bernoulli, 19:1122–1149, 2013.

    [6] D.J. Aldous. Using prediction market data to illustrate undergraduate probability. American Math. Monthly, 120:583–593, 2013.

    [7] D.J. Aldous. When knowing early matters: Gossip, percolation and Nash equilibria. In A.N. Shiryaev, S.R.S. Varadhan, and E.L. Presman, editors, Prokhorov and Contemporary Probability Theory, pages 3–28. Springer-Verlag, 2013.

    [8] D.J. Aldous. Scale invariant random spatial networks. Electron. J. Probab., 19:1–41, 2014.

    [9] D.J. Aldous and K. Ganesan. True scale-invariant random spatial networks. Proc. Natl. Acad. Sci. USA, 110:8782–8785, 2013.

    [10] D.J. Aldous, M. Krikun, and L. Popovic. Five statistical questions about the tree of life. Systematic Biology, 60:318–328, 2011.

    [11] D.J. Aldous and T. Lando. The stretch-length tradeoff in geometric networks: Worst-case and average-case study. arXiv 1404.2653, 2014.

    [12] D.J. Aldous and D. Lanoue. A lecture on the averaging process. Probability Surveys, 9, 2012.

    [13] D.J. Aldous and Nathan Ross. Entropy of some models of sparse random graphs with vertex-names. Probab. Engineering Inform. Sci., 28:145–168, 2014.

    [14] D.J. Aldous and M. Shkolnikov. Fluctuations of martingales and winning probabilities of game contestants. Electron. J. Probab., 18:1–17, 2013.

    [15] Moez Draief and Laurent Massoulié. Epidemics and rumours in complex networks, volume 369 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 2010.

    [16] Andreas Greven, Peter Pfaffelhuber, and Anita Winter. Convergence in distribution of random metric measure spaces (Λ-coalescent measure trees). Probab. Theory Related Fields, 145(1-2):285–322, 2009.

    [17] O. Kallenberg. Probabilistic Symmetries and Invariance Principles. Probability and its Applications (New York). Springer, New York, 2005.

    [18] W. S. Kendall. From Random Lines to Metric Spaces. arXiv 1403.1156, 2014.

    [19] H. Kesten. First-passage percolation. In From Classical to Modern Probability, number 54 in Progr. Probab., pages 93–143. Birkhäuser, 2003.

    [20] D. Lanoue. The iPod Model. arXiv 1402.4216.

    [21] D. Lanoue. The Metric Coalescent. arXiv 1406.1131, 2014.

    [22] David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov chains and mixing times. American Mathematical Society, Providence, RI, 2009. With a chapter by James G. Propp and David B. Wilson.

    [23] László Lovász. Large networks and graph limits, volume 60 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2012.

    [24] Mathew D. Penrose and Andrew R. Wade. Random directed and on-line networks. In New perspectives in stochastic geometry, pages 248–274. Oxford Univ. Press, Oxford, 2010.

    [25] H. Towsner. Limits of Sequences of Markov Chains. arXiv 1404.3815, 2014.


  • Data Management Plan

    The theoretical aspects of the proposal do not involve data. For other aspects, data will typically be taken from public sources. If novel data is generated for use in journal publications, the data will be deposited in a permanent repository associated with the journal.