









Michelle Feng and Mason A. Porter University of California, Los Angeles (Dated: June 18, 2020)
Spatial networks are ubiquitous in social, geographical, physical, and biological applications. To understand the large-scale structure of networks, it is important to develop methods that allow one to directly probe the effects of space on structure and dynamics. Historically, algebraic topology has provided one framework for rigorously and quantitatively describing the global structure of a space, and recent advances in topological data analysis (TDA) have given scholars a new lens for analyzing network data. In this paper, we study a variety of spatial networks — including both synthetic and natural ones — using novel topological methods that we recently developed for analyzing spatial networks. We demonstrate that our methods are able to capture meaningful quantities, with specifics that depend on context, in spatial networks and thereby provide useful insights into the structure of those networks, including a novel approach for characterizing them based on their topological structures. We illustrate these ideas with examples of synthetic networks and dynamics on them, street networks in cities, snowflakes, and webs spun by spiders under the influence of various psychotropic substances.
I. INTRODUCTION
Many complex systems have a natural embedding in a low-dimensional space or are otherwise influenced by space, and it is often insightful to study such spatial complex systems using the formalism of networks [1, 2]. In a spatial network, the location of nodes and edges in space can heavily inform both the structure of the network and the behavior of dynamical processes on it. Indeed, obtaining a meaningful understanding of power grids [3–5], granular systems [6], rabbit warrens [7], and many other systems is impossible without considering the physical relationships between nodes in a network. For example, to examine traffic patterns on a transportation network in a meaningful way, it is important to include information about the physical distances between points and about the locations and directions of paths between heavily trafficked areas [8].
There are a variety of existing perspectives for studying spatial networks [1, 9]. Many of these perspectives hail from quantitative geography [10, 11]. In the 1970s, geographers were already studying the role of space in the formation of networks and in the activities of individuals and goods over geographical networks. As data have become richer and more readily available, it has become possible to take increasingly intricate computational approaches to the study of spatial networks, and a variety of complex-systems approaches have contributed greatly to the literature on spatial networks [1]. Researchers have also proposed various random models for spatial networks, and studying them yields baseline examples to compare to empirical networks [12–15]. There have also been investigations of the effects of certain spatial network properties on the behaviors of several well-known dynamical processes, including the Ising model [16], coupled oscillators [17], and random walks [18].
Although there is much existing work on the properties of spatial networks (e.g., degree distributions, shortest paths, and so on), there are relatively few network tools that leverage “global” structure in the traditional topological sense of the word. Current tools for studying global network structure tend to rely on aggregating local information in some way to paint a global picture of a network. By contrast, methods for understanding the global structure of a topological space rely intrinsically on information about the entire space. To illustrate the difference, consider a sphere. If we sample a neighborhood of any point on a sphere, we obtain a surface with the same properties as a plane. If we take a collection of a sphere’s neighborhoods (which each resemble a plane) and stitch them together, we are able to obtain a lot of information about the sphere, but we are unable to describe the void in the center of the sphere. (For example, a stereographic projection of a sphere covers the sphere’s entire surface, but it fails to capture the void.) To fully understand the structure of a sphere, we must consider the entire sphere at once.
Over the last few decades, algebraic topology has been very useful for characterizing the global structure of mathematical spaces [19, 20] through its use of mathematical tools that consider spaces as global objects. By reframing spatial networks using the language of topological spaces, we can leverage existing topological tools to better understand their structures. For a case study with voting data, see our recent paper [21].
Homology groups, which were defined originally in algebraic topology and have been applied insightfully to a broad range of mathematical topics, provide one way to distinguish between mathematical spaces based on their numbers and types of “holes” [19]. Moreover, the extension of homology to so-called “persistent homology” (PH) allows one to quantify holes in data in a meaningful way and has made it possible to apply homological ideas to a wide variety of empirical data sets [22, 23]. PH is helpful for characterizing the “shape” of data, and the myriad applications of it include protein structure [24–27], DNA structure [28], neuronal structure [29], computer vision [30], diurnal cycles in hurricanes [31], inferring symbolic dynamics in chaotic systems [32], spatial percolation problems [33], and many others. Additionally, combining machine-learning approaches with PH has also been very useful for several classification problems [34–37].
Because it is so natural to apply PH to the study of the shape of data, many successful applications of it have been to spatial networks. One particular area of interest has been the study of granular materials, because PH is able to effectively capture geometric properties of granular substances [6, 38, 39]. In addition to analyzing geometric information, PH methods are also able to describe multiscale spatial relationships. Many biological applications to proteins and DNA rely on the ability of PH to illuminate features at multiple scales, as multiscale structures and compositions of these molecules are extremely important to their functionality. PH has also been applied to larger-scale biological systems, including leaf-venation patterns [40], aggregation models [41], human migration [42], networks of blood vessels [43], and the effects of psychoactive substances on brain activity [44]. The recent review article [45] includes an extensive discussion of applications of PH to networks.
One confounding factor in the use of PH to study spatial networks is that although PH is able to capture information across scales, traditional distance-based PH constructions can have difficulty with applications in which differences in scale may not be meaningful. For example, in most applications to human geographical data, the difference in population density between urban and rural areas can dominate analyses that employ traditional PH constructions, and they thereby miss signals that do not rely on this variation in density. In a recent paper [21], we examined the shape of voting patterns in the state of California and found that traditional methods for computing PH are more likely to capture disparities in population than to detect the presence of interesting voting patterns. To address this issue, we developed two novel PH constructions (one based on network adjacency and one based on the physical geometry of a map) that were more successful at capturing these voting patterns. For a recent analysis of the difficulty of interpreting signal and noise in PH results, see [46]. For approaches other than PH for analyzing maps while accounting for density variation, see [47, 48].
In the present paper, we apply our new PH constructions to a variety of spatial complex systems to demonstrate their usefulness across many domains. We show that these methods are well-adapted to capturing interesting structural properties of spatial networks and can thereby yield new insights into such networks, especially with respect to their global structure. Our examples include several synthetic graph models and dynamics on them, city street networks (which we compare both within a city and across different cities), snowflakes, and webs spun by spiders under the influence of various psychotropic substances.
Our paper proceeds as follows. In Section II, we give technical background on PH and on our particular constructions. In Section III, we discuss computational results from computing the PH of (1) several well-known models of synthetic networks and (2) a variety of empirical data sets from diverse applications. We conclude in Section IV. A public repository of the code that we use for our computations is available at https://bitbucket.org/mhfeng/spatialtda/src/master/.
II. METHODS
A. Computing Persistent Homology
We now give a brief introduction to PH and tools for computing it. See [22, 49, 50] for more details. We begin by defining $k$-simplices and simplicial complexes. A $k$-simplex is a $k$-dimensional polytope that is a convex hull of $k + 1$ nodes. A face of a $k$-simplex is any subset (of dimension smaller than $k$) of the $k$-simplex that is itself a simplex. A simplicial complex $K$ is a set of simplices that satisfy the following requirements: (1) if $\sigma \in K$ is a $k$-simplex, then every face of $\sigma$ is in $K$; and (2) if $\sigma$ and $\tau$ are simplices in $K$, then $\sigma \cap \tau$ is a face of both $\sigma$ and $\tau$.
Given a data set $X$, we construct a sequence $X_1 \subseteq X_2 \subseteq \cdots \subseteq X_l$ of simplicial complexes of some fixed maximum dimension. We call the sequence $\{X_i\}$ a “filtered simplicial complex”, and we call each $X_i$ a “subcomplex” of the filtered simplicial complex. We equip each relation $X_i \subseteq X_{i+1}$ with an inclusion map. The filtered simplicial complex, along with its inclusion maps and the chain and boundary maps of each subcomplex, constitutes a “persistence complex”. The inclusion maps $X_i \hookrightarrow X_j$ induce a map $f_{i,j} : H_m(X_i) \to H_m(X_j)$ between homology groups. The map $f_{i,j}$ allows us to track an element of $H_m(X_i)$ (the $m$th homology group of the subcomplex $X_i$) to an element of $H_m(X_j)$. The $m$th homologies of the persistence complex are given by the pair
\[
\left( \{H_m(X_i)\}_{1 \le i \le l}\,,\; \{f_{i,j}\}_{1 \le i \le j \le l} \right),
\]
and we call them the “$m$th persistent homology” of $X$. We refer to the collection of all $m$th persistent homologies as the “persistent homology” (PH) of $X$.
Consider a generator $x \in H_m(X_i)$ for some $m$ and $i$. If $x$ is not in the image of $f_{i-1,i}$, we say that $x$ is “born” at time $i$. Correspondingly, if $x \in H_m(X_i)$ and $f_{i,i+1}(x) = 0 \in H_m(X_{i+1})$, we say that $x$ “dies” at time $i + 1$. If for every $j$ with $i \le j \le l$, we have that $f_{i,j}(x) \neq 0$, then we say that $x$ never dies, and we assign a death time of $\infty$ to the element $x$. For each element $x$ of the PH of $X$, there is a birth time $b_x$ and a death time $d_x$, and the collection of intervals $\{[b_x, d_x)\}$ is the “barcode” of $X$. Generators with longer associated half-open intervals $[b_x, d_x)$ are more persistent. It is traditional to construe
time, such that no features would occur after the first filtration step. When v = 0, there is no evolution.
FIG. 2: Illustration of level-set dynamics. Starting from (a) an initial black-and-white image, we apply level-set evolution (2) for several steps to obtain the image in (b) and then the one in (c). In these images, the white space in the center of the image shrinks until eventually it is completely covered by the expanding black surface.
By imposing $\{M_i\}$ over a triangular grid of points, as described in [21], we obtain a corresponding simplicial complex $X_i$ for each $M_i$. In Fig. 3, we give a visualization of this simplicial complex. We construct this level-set complex using a polygon whose points we choose uniformly at random from $[0, 1] \times [0, 1]$ as an initial synthetic image. Because the level-set equation (2) evolves continually outward, we automatically satisfy the condition that $X_i \subseteq X_{i+1}$, so $\{X_i\}$ is a filtered simplicial complex. Our implementation of the level-set method works with any black-and-white image (or any image that one can formulate as a piecewise-constant function $\mathbb{R}^2 \to \{0, 1\}$). We expect our level-set approach to capture information about $H_0$ and $H_1$ for any such image. The level-set approach also captures geometric information, which can be useful for some applications; however, this may make it difficult to capture information about holes that are visually irregular. Throughout this paper, we compare images that have roughly the same resolutions, where we take the image resolution from raw image data. Because image size should primarily affect the computation time of our level-set approach (but not the order in which features appear and disappear as an image evolves), we expect that it is possible to adapt our level-set construction when comparing images of different resolutions. Possible approaches for such an adaptation include normalizing image sizes or adjusting the resolution of the triangular grid that one uses for each image.
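To make the nesting condition $X_i \subseteq X_{i+1}$ concrete, the following is a minimal sketch of a discrete stand-in for the outward evolution. It is not the level-set solver or the triangular-grid construction of [21]: it assumes a square pixel grid, treats pixels with value 1 as the initial surface, and grows that surface by one 4-neighbor step per filtration step. The function name `entry_steps` is hypothetical.

```python
from collections import deque

def entry_steps(image):
    """Return, for each pixel, the step at which the growing surface covers it.

    Assumptions (not from the source): `image` is a list of rows of 0/1
    values, 1 marks the initial surface, and the surface grows by one
    4-neighbor step per filtration step (a multi-source breadth-first search).
    """
    rows, cols = len(image), len(image[0])
    step = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Every pixel of the initial surface enters the filtration at step 0.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1:
                step[r][c] = 0
                queue.append((r, c))
    # Breadth-first growth: a pixel enters one step after its nearest covered neighbor.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and step[rr][cc] is None:
                step[rr][cc] = step[r][c] + 1
                queue.append((rr, cc))
    return step

# A 5 x 5 ring of surface pixels around a 3 x 3 hole.
ring = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
steps = entry_steps(ring)
```

Because each pixel receives a single entry step, the set of pixels with entry step at most $i$ can only grow with $i$, which is the nesting property that a filtration requires. In the ring example, the last pixel of the hole is covered at step 2, in the same spirit as the shrinking white region in Fig. 2.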
III. APPLICATIONS
We now discuss applications of PH to both synthetic networks and empirical spatial networks from a diverse variety of applications.
FIG. 3: Illustration of a level-set adjacency construction of PH. In (a), we show a synthetic image that we use as an initial manifold for level-set evolution. In (b)–(d), we show various filtration steps of the filtered simplicial complex that we generate by performing a level-set evolution on the image in panel (a). Panel (b) shows the simplicial complex that we obtain by overlaying the image in panel (a) on a triangular grid. In panels (c) and (d), we add new nodes, edges, and triangles to the image as it evolves outward. Darker colors indicate simplices that enter the filtration at a later time step.
A. Synthetic Networks
In this subsection, we discuss applications of our adjacency PH construction to a dynamical process on synthetic networks in which space plays an important role. For each network $(V, E)$, we run the Watts threshold model (WTM) [55] on it. Given a graph, we select a fraction $\rho_0 = 0.05$ of its nodes uniformly at random to be “infected” at time 0. At each time step, we then compute the fraction of each node’s neighbors that are infected. (That is, we synchronously update the states of the nodes [56].) If the fraction of a node’s neighbors that are infected meets or exceeds a threshold (in our case, the threshold is $\phi = 0.18$ for all nodes), the node becomes infected. We take this implementation of the WTM to be the generator of a function $f : V \to \mathbb{N}$ [57], where $f(v)$ is the time at which node $v$ becomes infected. We say that infected nodes are in the set $I$. If $v$ never becomes infected, we set $f(v) = \max_{u \in I} f(u) + 1$, so that we eventually add all nodes to a filtered simplicial complex. The resulting filtered simplicial complex consists of the subgraphs that are generated by $I$ at each time step. See [58–60] for studies of the WTM on spatial networks.
We use parameter values of $\rho_0 = 0.05$ and $\phi = 0.18$ throughout this section. We expect changes in $\rho_0$ and $\phi$ to affect the birth times and death times of features in a filtered simplicial complex. Using a different value of $\rho_0$ entails considering a different fraction of initially infected nodes. Therefore, a larger value of $\rho_0$ yields a larger simplicial complex at the first filtration step, and a smaller value of $\rho_0$ yields a smaller simplicial complex. Using a larger value of $\phi$ results in fewer nodes becoming infected at each time step, and it thus takes more filtration steps before the simplicial complex stops growing. Because our underlying graph is the same for any choice of values of $\rho_0$ and $\phi$, we do not expect changes in the homology of the last filtration step, unless $\rho_0$ is sufficiently small or $\phi$ is sufficiently large that some nodes in a graph never become infected. However, one can certainly obtain a different PH for different values of $\rho_0$ or $\phi$, as nodes and edges can join the filtered simplicial complex at different times and (more importantly) in different orders.
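The update rule above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the code from the repository: the function name `watts_threshold_times`, the adjacency-dictionary input, the optional `seeds` argument, and the rounding of the number of initially infected nodes are all assumptions.

```python
import random

def watts_threshold_times(adj, rho0=0.05, phi=0.18, seeds=None, rng_seed=None):
    """Return a dict f with f[v] = time at which node v becomes infected.

    `adj` maps each node to the set of its neighbors. If `seeds` is None,
    a fraction `rho0` of the nodes (rounded, at least one; an assumption)
    is infected uniformly at random at time 0.
    """
    nodes = sorted(adj)
    if seeds is None:
        rng = random.Random(rng_seed)
        seeds = rng.sample(nodes, max(1, round(rho0 * len(nodes))))
    infected = set(seeds)
    times = {v: 0 for v in infected}
    t = 0
    while True:
        t += 1
        # Synchronous update: decide all new infections from the current state.
        newly = set()
        for v in nodes:
            if v in infected or not adj[v]:
                continue
            fraction = sum(1 for u in adj[v] if u in infected) / len(adj[v])
            if fraction >= phi:
                newly.add(v)
        if not newly:
            break
        for v in newly:
            times[v] = t
        infected |= newly
    # Nodes that never become infected enter one step after the last infection.
    final_step = max(times.values()) + 1
    for v in nodes:
        times.setdefault(v, final_step)
    return times

# A ring of six nodes with node 0 infected initially.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
spread = watts_threshold_times(ring, phi=0.5, seeds=[0])
stalled = watts_threshold_times(ring, phi=0.6, seeds=[0])
```

With a threshold of 0.5, the infection travels around the ring in both directions and reaches the opposite node at time 3. With a threshold of 0.6, no node other than the seed is ever infected, so every remaining node is assigned the final step, as in the definition of $f$ above.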
We examine topological changes in the infected subgraph of three different types of synthetic networks (see Fig. 4). We first examine random geometric graphs (RGGs) [61]. For each instance of an RGG, we pick 100 nodes uniformly at random from the unit square. If the Euclidean distance between two nodes is less than or equal to 0.1, we add an edge between them [see Fig. 4(a)]. Our second type of synthetic network is a square lattice with 100 nodes. We arrange the 100 nodes in a 10 × 10 grid on the unit square, and we then connect the nodes along the grid lines [see Fig. 4(b)]. Our third type of synthetic network is a Watts–Strogatz (WS) small-world network [62, 63]. We begin with a ring of 100 nodes, and we then connect each node to its $k = 2$ nearest neighbors on each side. We then rewire each edge uniformly at random with a probability of $p = 0.1$ using the implementation of the WS model in NetworkX [64]. In this version of the WS model, one removes each rewired edge before replacing it with a new edge. We show an example of a WS graph in Fig. 4(c).
For each type of synthetic network, we consider 100 instances, which we generate using NetworkX. For the RGG and WS networks, each instance is a different graph; the square lattice network is deterministic. For all three types of networks, each instance has a different initial set of infected nodes. We show visualizations of each of these types of networks (with WTM dynamics on it) in Fig. 4.
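As a dependency-free illustration of the first two network types, the sketch below builds an RGG and a square lattice as adjacency dictionaries. The text uses NetworkX to generate its instances; the functions here (`random_geometric_graph`, `square_lattice`) are hypothetical stand-ins written only to show the constructions, and they do not cover the WS rewiring step.

```python
import math
import random

def random_geometric_graph(n=100, radius=0.1, seed=None):
    """Place n nodes uniformly at random in the unit square and connect
    every pair whose Euclidean distance is at most `radius`."""
    rng = random.Random(seed)
    pos = {i: (rng.random(), rng.random()) for i in range(n)}
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return pos, adj

def square_lattice(side=10):
    """Arrange side * side nodes on a grid in the unit square (side >= 2)
    and connect the nodes along the grid lines."""
    pos = {}
    adj = {}
    for r in range(side):
        for c in range(side):
            v = r * side + c
            pos[v] = (c / (side - 1), r / (side - 1))
            adj[v] = set()
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:  # neighbor to the right
                adj[v].add(v + 1)
                adj[v + 1].add(v)
            if r + 1 < side:  # neighbor below
                adj[v].add(v + side)
                adj[v + side].add(v)
    return pos, adj

rgg_pos, rgg_adj = random_geometric_graph(n=100, radius=0.1, seed=0)
lat_pos, lat_adj = square_lattice(side=10)
```

The square lattice is deterministic, so repeated instances differ only in their initially infected nodes, whereas each call to `random_geometric_graph` with a different seed gives a different graph.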
Our adjacency construction begins by selecting the initially infected nodes and the edges between them of a network to create an infected subgraph that we call an “infection network”. As an infection spreads, we add more nodes and edges to the infection network until eventually we have added all nodes and edges to it.
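For the connected components alone, the growth of an infection network can be tracked with a union-find structure. The sketch below is a simplified illustration, not the authors' implementation: it computes only the $H_0$ barcode, it assumes that an edge enters the filtration as soon as both of its endpoints are infected, and the name `h0_barcode` is hypothetical.

```python
def h0_barcode(adj, times):
    """Return the H_0 barcode of a graph filtered by node entry times.

    `adj` maps each node to its set of neighbors (nodes must be orderable),
    and `times[v]` is the step at which node v enters. An edge {u, v} is
    assumed to enter at max(times[u], times[v]). Zero-length bars are dropped.
    """
    parent = {v: v for v in adj}
    birth = dict(times)  # birth time of the component represented by each root

    def find(v):
        # Find the root of v's component, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = sorted(
        (max(times[u], times[v]), u, v)
        for u in adj for v in adj[u] if u < v
    )
    bars = []
    for t, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # the edge closes a cycle; H_0 does not change
        # Elder rule: the component that was born later dies at the merge.
        if birth[root_u] > birth[root_v]:
            root_u, root_v = root_v, root_u
        if birth[root_v] < t:
            bars.append((birth[root_v], t))
        parent[root_v] = root_u
    # Components that survive to the end never die.
    bars.extend((birth[r], float("inf")) for r in {find(v) for v in adj})
    return sorted(bars)

# A path 0 - 1 - 2 - 3 in which the two end nodes are infected first.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
bars = h0_barcode(path, {0: 0, 1: 2, 2: 1, 3: 0})
```

In this example, two components are born at time 0 around the two end nodes. They merge at time 2 when node 1 joins, which gives one finite bar and one bar that never dies.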
Examining the PHs of the RGGs (see Fig. 5), we see for our parameter values that an infection network tends to have several connected components, resulting in a large number of features in $H_0$. However, because of the spreading behavior of the WTM, new nodes can become infected only via their infected neighbors. Because features in $H_0$ record connected components of a graph, new infected nodes join existing connected components. Therefore, features can only be born at time 0 or in the last step, which is when we add all remaining uninfected nodes to our filtered simplicial complex. By contrast, features in $H_1$ are relatively rare, as most cycles that occur in an RGG are filled because of the uniform probability distribution of the node locations.
FIG. 4: An instance of each of our synthetic networks with Watts threshold model (WTM) dynamics on it. The corresponding persistence diagrams (PDs) are in Figs. 5, 6, and 7. We color the nodes based on the time that they become infected. The three types of synthetic networks are (a) a random geometric graph, (b) a square lattice network, and (c) a Watts–Strogatz small-world network.
FIG. 5: The PD for an instance of the WTM on an RGG. We plot each feature as a point on the PD, for which the horizontal coordinate represents the birth time and the vertical coordinate represents the death time. We plot features with infinite persistence (i.e., features that do not die within the range of filtration parameters that we use for a PH computation) on a horizontal line at the top of the PD. We plot features in $H_0$ (which indicates the connected components) as pink circles, and we plot features in $H_1$ (which indicates the one-dimensional holes) as dark-blue squares.
For a square lattice network (see Fig. 6 for a PD of the WTM on such a network), we first note that there is only a single infinite-length feature in $H_0$, as the final infection network necessarily consists of a single connected compo-
shapefile, we obtain a bounding box for each point. We sample uniformly within this bounding box, discarding points that do not lie within the polygonal district geometry that is defined in the shapefile. We stop sampling when we reach the desired number of points. In total, we sample ten points from each administrative district, and we also include nine historical landmarks with coordinates from Google Maps [75]. In Fig. 8, we show maps and their associated PDs for two examples.
FIG. 8: Two sampled street networks from (a) Pudong New Area and (b) Zhabei district. [We generated both maps using OSMnx [73].]
After computing PH (in the form of a PD) for each map, we compute the bottleneck distance between each pair of maps. Bottleneck distance is a metric that is defined on the space of PDs. It gives the shortest distance $d$ for which there exists a perfect matching between the points of the two PDs (along with all diagonal points), such that any pair of matched points are at most a distance $d$ from each other, where we use the supremum norm in $\mathbb{R}^2$ to compute the distance between points. Once we have pairwise bottleneck distances between PDs, we perform average-linkage hierarchical clustering into three clusters. (We chose to have three clusters based on looking at the dendrogram.) We can replace our metric with a different metric (such as a Wasserstein distance [76]) on PDs or cluster our PDs using a different clustering algorithm. We do not discuss the impact that such choices may have on our results, although we note in passing that we performed k-medoids clustering [77] for our case study of Shanghai and obtained qualitatively similar results.
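The definition of bottleneck distance can be turned directly into a small brute-force computation. The sketch below is suitable only for tiny diagrams and is not the routine used for the results in this paper: it assumes that each PD is a list of finite (birth, death) pairs, it searches over the candidate distances, and it tests for a perfect matching with a simple augmenting-path search. The name `bottleneck_distance` is hypothetical.

```python
def bottleneck_distance(dgm_a, dgm_b):
    """Bottleneck distance between two small PDs of finite (birth, death) pairs."""
    na, nb = len(dgm_a), len(dgm_b)
    n = na + nb
    if n == 0:
        return 0.0

    def to_diagonal(p):
        # Supremum-norm distance from (b, d) to the nearest diagonal point.
        return (p[1] - p[0]) / 2

    def cost(i, j):
        # Left side: na points of A, then nb diagonal slots.
        # Right side: nb points of B, then na diagonal slots.
        a_is_point, b_is_point = i < na, j < nb
        if a_is_point and b_is_point:
            return max(abs(dgm_a[i][0] - dgm_b[j][0]),
                       abs(dgm_a[i][1] - dgm_b[j][1]))
        if a_is_point:
            return to_diagonal(dgm_a[i])
        if b_is_point:
            return to_diagonal(dgm_b[j])
        return 0.0  # diagonal matched to diagonal

    costs = [[cost(i, j) for j in range(n)] for i in range(n)]

    def has_perfect_matching(limit):
        match_right = [-1] * n

        def try_augment(i, seen):
            for j in range(n):
                if costs[i][j] <= limit and not seen[j]:
                    seen[j] = True
                    if match_right[j] == -1 or try_augment(match_right[j], seen):
                        match_right[j] = i
                        return True
            return False

        return all(try_augment(i, [False] * n) for i in range(n))

    # Binary search for the smallest candidate that admits a perfect matching.
    candidates = sorted({c for row in costs for c in row})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_perfect_matching(candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]

d_close = bottleneck_distance([(0, 4)], [(0, 5)])
d_far = bottleneck_distance([(0, 10)], [(0, 2)])
```

In the first example, it is cheapest to match the two points to each other. In the second example, it is cheaper to match both points to the diagonal, which is why the diagonal points are part of the definition.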
In Fig. 9, we show the sampled points (which we color according to their cluster assignment). We observe that the three clusters consist largely of historical areas (“City center”), concession-era areas (“Transition areas”), and modern areas (“New construction”). In Fig. 10, we show administrative districts along with the year that they were constructed. We break them down by the percentage of the sample points that are in each cluster.
FIG. 9: Sampled points in Shanghai. We color these points according to their cluster assignment from average-linkage hierarchical clustering of areas of Shanghai into three clusters.
FIG. 10: Breakdown of administrative districts in Shanghai into our three clusters. (We order the districts roughly by their year of development.) Most of the older districts have a larger percentage of points that are assigned to the “City center” cluster, whereas the points in the “Transition areas” cluster tend to occur in districts that included development in the 19th and early 20th centuries. The “New construction” cluster is the most common assignment for administrative districts from the 1950s or later.
We continue our analysis of cities by characterizing and comparing the structures of street networks of 306 cities across the globe. We downloaded latitude and longitude coordinates from SimpleMaps [78] and selected all cities with a population of at least 1.5 million people. Given these latitude and longitude coordinates, we use OSMnx [73] to obtain street networks. We then compute PH for each city and cluster their PDs using average-linkage hierarchical clustering with three clusters. We sometimes refer to a city in a given cluster as a city of a certain “type”. Our results depend on the specific latitude and longitude coordinates in our downloaded data set. Accordingly, our results are influenced by the particular location of a city’s coordinates, which are the standard ones in SimpleMaps.
In the following paragraphs, we describe our clusters of cities. We define “blocks” to be the cells of a planar street graph. Although our level-set construction for computing PH is not designed explicitly to characterize blocks, we take advantage of the fact that our level-set construction takes the set of streets as its initial manifold. As the streets expand outward according to the level-set equation (2), they fill in the blocks. Larger blocks take longer to fill in, and blocks fill in more evenly when they are closer to circular in shape. Roughly, we characterize block sizes based on the death times of features in $H_1$: small sizes correspond to early death times (specifically, less than 10), medium sizes correspond to death times between 10 and 15, and large sizes correspond to late death times (specifically, more than 15). We also designate blocks as “regular” (when they are close to a regular convex polygon) or “irregular” (for blocks that do not resemble a rectangle or some other regular convex polygon). If a block is very irregular, then as its streets expand, it is possible that narrow parts of the block will shrink and close off, such that the block segments into smaller blocks. We refer to this phenomenon as “pinching”.
Our three main clusters are dominated by (1) gridlike cities; (2) cities with gridlike patches that are interspersed with larger, non-gridlike blocks; and (3) cities that have a large number of non-gridlike structures (specifically, dead ends or large holes) that interrupt other structures. We use the term “interrupted grid” for cities that are either (1) mostly gridlike with some patches that are not gridlike or (2) consist of patches of disparate grids that are stitched together (with other features between them).
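The rough size classes for blocks can be written as a short helper that bins the death times of features in $H_1$. This is a sketch for illustration: the function names are hypothetical, and treating the endpoints 10 and 15 as “medium” is an assumption, because the text specifies only “less than 10”, “between 10 and 15”, and “more than 15”.

```python
from collections import Counter

def block_size_class(death_time):
    """Map the death time of an H_1 feature to a rough block-size class."""
    if death_time < 10:
        return "small"
    if death_time <= 15:  # endpoints counted as medium (an assumption)
        return "medium"
    return "large"

def block_size_profile(h1_diagram):
    """Count the size classes in a list of (birth, death) pairs for H_1."""
    return Counter(block_size_class(death) for _, death in h1_diagram)

# A made-up diagram, used only to show the call.
profile = block_size_profile([(0, 4), (0, 6), (1, 12), (0, 22)])
```

A profile that is dominated by “small” and “medium” counts corresponds to the gridlike pattern that is described above, whereas many “large” entries indicate big blocks that take a long time to fill in.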
Our first major cluster has 99 cities and is dominated by cities with small, gridlike blocks. All regions of the world have some cities of this type, but North America has the largest percentage (relative to all of the cities that we sample from that continent) of these gridlike cities and Europe has the smallest percentage of them. The block sizes in this cluster tend to be small or of medium size, resulting in filtrations whose maximum filtration value tends to be small in comparison to cities in the other two clusters. In the PDs, we also observe that the distributions of death times of features in $H_1$ tend to be close to uniform and over a small range. Such distributions occur because these gridlike cities tend to have even distributions of block sizes, even though they include some areas with slightly smaller and/or slightly larger grid sizes. They do not have large blocks, so they do not have features in $H_1$ with late death times.
FIG. 11: Cities in our first major cluster have gridlike street layouts. One example of such a city is Los Angeles, which we show in this figure. We show its street network in the top row and its associated PD in the bottom row.
Our second major cluster has cities with patches of grids that are interspersed with structures that are not gridlike. This cluster, with 149 cities, is the largest of our three clusters. The PDs in this cluster tend to have larger maximum death times than the PDs for the cities in our first cluster. In the PDs, gridlike blocks yield collections of features in $H_1$ with early death times; and the larger, non-gridlike structures yield features in $H_1$ with late death times. The non-gridlike areas in these cities tend to have fairly regular shapes, resulting in a relatively small number of features in $H_1$ with late birth
with gridlike street layouts. This is consistent with the common perception that North American cities are much more gridlike than European cities. In all regions, we also observe that a large fraction of the cities are interrupted grids. Additionally, we observe that South America, Africa, and Asia have similar distributions of city types. Interestingly, from the map in Fig. 14, South America, Asia, and Africa appear to have areas that are dominated by specific major clusters. We observe non-gridlike cities in the northern part of South America, whereas we see gridlike cities along its east coast. In Africa, most of the non-gridlike cities occur along the western coastline. In Asia, most of the gridlike and non-gridlike cities occur in East Asia, whereas Southeast Asia is dominated by interrupted grids. Across the map, there appears to be a potential equatorial band of non-gridlike cities. We do not have an explanation for these patterns, but they are fascinating and seem worthy of future research efforts.
We compare our results to the city classification of Louf and Barthelemy [80], who associated each city with a conditional probability distribution that captures the area and shape of its blocks. We choose their method as a point of comparison because they studied a wide range of cities and (like us) codified cities from a block-based perspective. They used the word “fingerprint” as a moniker for their block-based representation of cities. In our method, we codify cities according to their PHs, which we generate using the level-set construction of Section II C. Both the approach of [80] and our approach capture information that is based on city blocks, although our PH representation differs substantially from the fingerprints of [80].
Louf and Barthelemy clustered cities into four groups, whereas we have chosen to cluster our cities into three groups. In [80], European and North American cities largely inhabit the same cluster (group three in [80]), but they appear in distinct subclusters, demonstrating that there is a substantive difference between cities from the two regions. Our method finds that North America has the largest proportion of cities with gridlike streets among all of the regions and that Europe has the smallest proportion of such cities. In contrast to the above situation, Africa, Asia, and South America have a fairly balanced composition of city types, with a potential equatorial band of non-gridlike cities. Louf and Barthelemy observed several clusters (groups one, two, and four in [80]) that occur predominantly in Africa, Asia and Oceania (which they combined into one entity), and South America. Notably, none of our clusters are as dominant as group three (which they described as having heterogeneous block sizes and shapes) of [80], although we do observe that our cluster of cities with interrupted grids (such cities are characterized in part by their heterogeneous block sizes) is also our largest cluster.
Now that we have compared our results to those of [80], we briefly compare and contrast the types of information that the two methods can capture. Recall that our level-set construction for PH generates filtered simplicial complexes that first consist of streets and then expand outward to absorb the blocks between them. The PH of such a filtered simplicial complex thereby gives a low-dimensional representation of the original image of a city street network. Because irregularly shaped blocks are absorbed into the surrounding streets at a different pace than regular blocks, we capture information about the regularity of each block. Louf and Barthelemy’s method also uses information about the regularity of block shape. See Eq. (3.2) in [80] for a precise mathematical statement of how they measured the regularity of blocks. It is related to a subset of so-called “compactness measures” [81] (which are used in the study of gerrymandering [82, 83]) that compare the area of a shape to the area of a circle that circumscribes the shape.
Because the original image of a city street network includes information about the spatial relationships between blocks, the PH that results from our approach also encodes some of this information. By contrast, Louf and Barthelemy’s fingerprints do not encode information about the spatial relationships of blocks to each other. Additionally, our method captures information from dead ends, which Louf and Barthelemy discarded.
Overall, although both our approach and that of [80] use a block-based representation to characterize cities, there are subtle differences in the way that the two approaches encode block information. Nevertheless, the commonality of a block-based perspective results in some similarities. For example, the clusters that result from the two approaches seem to be based heavily on block size and regularity. However, our approach appears to prioritize spatial relationships between different clusters of blocks (specifically, whether blocks are arranged in a grid); such information is not captured in the approach of [80]. Consequently, the two approaches capture different city morphologies, and we expect them to be useful as complementary techniques for studying structures in spatial complex systems.
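As a rough illustration of the kind of compactness measure mentioned above, the following Python sketch computes the ratio of a block's area to the area of the smallest circle that encloses it. It is not taken from [80] and may differ in detail from their Eq. (3.2). The function names, the representation of a block as a list of polygon vertices, and the brute-force circle search are all illustrative assumptions.

```python
import math
from itertools import combinations


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _circle_from_two(p, q):
    """Circle whose diameter is the segment pq, as (cx, cy, r)."""
    return (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0, math.dist(p, q) / 2.0


def _circle_from_three(p, q, r):
    """Circumcircle of three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.dist((ux, uy), p)


def smallest_enclosing_circle(points):
    """Brute-force search (fine for small polygons; hypothetical helper).

    The smallest enclosing circle is determined by two or three of the
    points, so we test every such candidate and keep the smallest one
    that contains all points.
    """
    candidates = [_circle_from_two(p, q) for p, q in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = _circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for cx, cy, r in candidates:
        if all(math.dist((cx, cy), p) <= r + 1e-9 for p in points):
            if best is None or r < best[2]:
                best = (cx, cy, r)
    return best


def compactness(vertices):
    """Ratio of polygon area to the area of its smallest enclosing circle."""
    _, _, radius = smallest_enclosing_circle(vertices)
    return polygon_area(vertices) / (math.pi * radius * radius)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
elongated = [(0, 0), (4, 0), (4, 1), (0, 1)]
print(f"{compactness(square):.3f}")     # 2/pi, about 0.637
print(f"{compactness(elongated):.3f}")  # 16/(17*pi), about 0.300
```

In this toy example, the square block scores higher than the elongated block, which matches the intuition that a ratio of this type rewards regular, compact shapes.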
C. Snowflakes
As a second application that uses empirical data, we consider snowflake crystals [84]. We start with twelve different images (from [84]) of snowflakes with different crystalline structures. (See Fig. 20 in Appendix A.) Using the GNU Image Manipulation Program [85], we threshold these grayscale images (using a thresholding setting of
Cities colored by their cluster assignments from average-linkage hierarchical clustering of cities into three clusters. [The shapefile of the world map is from [79].]
spiders who were administered more toxic substances produce webs that are more deformed (in comparison to a web that is spun by a drug-free spider) than the webs of spiders who were administered less toxic substances. Additionally, using techniques from statistical crystallography, they concluded that spiders fail to complete more sides of their webs when they are under the influence of more toxic substances.
In our case study of PH in spiderwebs, we use five images from the NASA technical briefing [88] and two images from Witt [87] of webs that were spun by spiders under the influence of a variety of psychotropic substances. We threshold the grayscale images to turn them into black-and-white images (using a thresholding setting of 205 in the GNU Image Manipulation Program), apply our level-set construction to compute PH, and perform average-linkage hierarchical clustering to yield the dendrogram in Fig. 18. We show the images of the spiderwebs and their associated PDs in Fig. 19.
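The first and last steps of this pipeline (thresholding and average-linkage clustering) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code that was used for the paper: the function names are hypothetical, we assume that pixels at or above the cutoff become white, and the distance matrix below is a made-up toy example that stands in for pairwise distances between PDs. The level-set PH computation is omitted.

```python
def threshold_image(gray, cutoff=205):
    """Convert a grayscale image (rows of 0-255 integers) to black and white.

    Assumption: pixels with value >= cutoff become white (1), and all
    other pixels become black (0).
    """
    return [[1 if value >= cutoff else 0 for value in row] for row in gray]


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    At each step, we merge the two clusters whose members have the
    smallest mean pairwise distance, until n_clusters remain.
    Returns the final clusters and the list of merges (a dendrogram
    can be drawn from the merge heights).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                mean = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or mean < best[0]:
                    best = (mean, a, b)
        height, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters], merges


# Toy data only: a tiny "image" and a made-up distance matrix for five items.
toy_image = [[0, 204, 205, 255]]
toy_dist = [
    [0, 1, 8, 9, 20],
    [1, 0, 7, 8, 21],
    [8, 7, 0, 2, 19],
    [9, 8, 2, 0, 18],
    [20, 21, 19, 18, 0],
]
binary = threshold_image(toy_image)
clusters, merges = average_linkage(toy_dist, n_clusters=3)
print(binary)
print(clusters)
```

In practice, one would replace the toy matrix with distances between the PDs of the thresholded images and cut the resulting dendrogram at the desired number of clusters.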
Our classification places the drug-free spider into its own cluster. The spiderweb of the drug-free spider is characterized by a clear central hole, threads radiating outward at approximately even intervals, and completed rings of threads that surround the center. We place the webs that were spun by spiders under the influence of marijuana, peyote, and LSD into the same cluster. In these webs, there is a clearly identifiable center, and most of the radial threads are evenly spaced, straight, and radiate outward directly from the center. However, for the webs in this cluster, rings of threads are either difficult to see or are incomplete. The final cluster consists of webs that were spun by spiders under the influence of chloral hydrate, caffeine, and speed. In the caffeinated spider’s web, one cannot even clearly identify a center [89]. One can locate a center in the webs of the spiders that were under the influence of speed or chloral hydrate (a sedative that is used in sleeping pills), but many of the radial threads do not join the center and some of the radial threads are not straight. Almost no complete rings of thread are visible in any of the three webs in this cluster.
IV. CONCLUSIONS
It is important to exploit spatial information in the study of spatial complex systems. In this paper, by using new methods of computing persistent homology that take spatial information into account, we presented several applications of topological data analysis to spatial networks. We showed that topological methods are capable of characterizing network structures and detecting structural differences from images of various spatial networks. We also demonstrated, using both synthetic examples and networks from empirical data, that such methods are able to provide insights into large-scale network structures that complement those from traditional techniques of network analysis. As an extended case study, we examined the morphology of street networks in cities, and we used spatial TDA to compare and contrast (1) different regions of the same city and (2) different cities. We hope that our examples help illustrate some ways in which topological methods, especially ones that directly incorporate spatial information in their formulation, can be useful for the analysis of spatial complex systems.
FIG. 18: Classification of webs that were spun by spiders under the influence of various psychotropic substances.
Appendix A: Additional Snowflake Images
In Fig. 20, we show the images of all twelve snowflakes that we examined.
ACKNOWLEDGMENTS
We thank Marc Barthelemy, Heather Zinn Brooks, Hanbaek Lyu, Elizabeth Munch, Stan Osher, Nina Otter, Giovanni Petri, Bernadette Stolz, and an anonymous referee for helpful comments. We are particularly grateful to Joshua Gensler for his many helpful comments on both our paper and our code. We also acknowledge support from the National Science Foundation (grant number 1922952) through the Algorithms for Threat Detection (ATD) program.
FIG. 19: Webs spun by a drug-free spider and spiders that were under the influence of various psychotropic substances, with the associated PD displayed beneath each web. We compare the webs of (a) a drug-free spider with webs spun by spiders that were under the influence of (b) chloral hydrate (which is used in some sleeping pills), (c) marijuana, (d) speed, (e) caffeine, (f) peyote, and (g) LSD. [The images for panels (a)–(e) are from [88], and the images for panels (f) and (g) are from [87].]
[1] M. Barthelemy, Morphogenesis of Spatial Networks (Springer International Publishing, Cham, Switzerland, 2018).
[2] M. E. J. Newman, Networks, 2nd ed. (Oxford University Press, Oxford, UK, 2018).
[3] R. V. Solé, M. Rosas-Casals, B. Corominas-Murtra, and S. Valverde, Physical Review E 77, 026102 (2008).
[4] H. Kim, D. Olave-Rojas, E. Álvarez-Miranda, and S.-W. Son, Scientific Data 5, 180209 (2018).
[5] R. Albert, I. Albert, and G. L. Nakarado, Physical Review E 69, 025103 (2004).
[6] L. Papadopoulos, M. A. Porter, K. E. Daniels, and D. S. Bassett, Journal of Complex Networks 6, 485 (2018).
[7] S. H. Lee, M. Cucuringu, and M. A. Porter, Physical Review E 89, 032810 (2014).
On Machine Learning And Applications (ICMLA) (2019) pp. 1211–1218.
[37] C. Cai and Y. Wang, arXiv:2001.06058 (2020).
[38] M. Kramár, A. Goullet, L. Kondic, and K. Mischaikow, Physical Review E 87 (2013).
[39] M. Buchet, Y. Hiraoka, and I. Obayashi, in Nanoinformatics (Springer-Verlag, Heidelberg, Germany, 2018) pp. 75–95.
[40] H. Ronellenfitsch and E. Katifori, Physical Review Letters 117, 138301 (2016).
[41] C. M. Topaz, L. Ziegelmeier, and T. Halverson, PLOS ONE 10, e0126383 (2015).
[42] P. S. P. Ignacio and I. K. Darcy, EPJ Data Science 8, 1 (2019).
[43] H. M. Byrne, H. A. Harrington, R. Muschel, G. Reinert, B. J. Stolz, and U. Tillmann, Mathematics Today 55, 206 (2019).
[44] G. Petri, P. Expert, F. Turkheimer, R. Carhart-Harris, D. Nutt, P. J. Hellyer, and F. Vaccarino, Journal of the Royal Society Interface 11 (2014).
[45] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, and G. Petri, arXiv:2006.01764 (2020).
[46] P. Bubenik, M. Hull, D. Patel, and B. Whittle, Inverse Problems 36, 025008 (2020).
[47] W. R. Tobler, Geographical Review 53, 59 (1963).
[48] M. T. Gastner and M. E. J. Newman, Proceedings of the National Academy of Sciences of the United States of America 101, 7499 (2004), arXiv:0401102 [physics].
[49] A. Zomorodian and G. Carlsson, Discrete & Computational Geometry 33, 249 (2005).
[50] R. Ghrist, Bulletin of the American Mathematical Society 45, 61 (2008).
[51] B. J. Stolz, H. A. Harrington, and M. A. Porter, Chaos 27, 047410 (2017).
[52] The Gudhi Project, Gudhi User and Reference Manual (Gudhi Editorial Board, 2015).
[53] C. Maria, in Gudhi User and Reference Manual (Gudhi Editorial Board, 2015).
[54] S. Osher and R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces, Vol. 153 (Springer-Verlag, Heidelberg, Germany, 2003).
[55] D. J. Watts, Proceedings of the National Academy of Sciences of the United States of America 99, 5766 (2002).
[56] M. A. Porter and J. P. Gleeson, Dynamical Systems on Networks: A Tutorial, Vol. 4 (Springer International Publishing, Cham, Switzerland, 2016).
[57] We use the convention that N includes 0.
[58] D. Taylor, F. Klimm, H. A. Harrington, M. Kramár, K. Mischaikow, M. A. Porter, and P. J. Mucha, Nature Communications 6, 7723 (2015).
[59] B. I. Mahler, U. Tillmann, and M. A. Porter, arXiv:1812.09806 (2018).
[60] F. M. Ying, Dynamical processes on random geometric graphs (2013), available at https://www.math.ucla.edu/~mason/research/fabian-report-092913.pdf.
[61] M. Penrose, Random Geometric Graphs (Oxford University Press, Oxford, UK, 2003).
[62] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
[63] M. A. Porter, Scholarpedia 7, 1739 (2012).
[64] A. A. Hagberg, D. A. Schult, and P. J. Swart, in Proceedings of the 7th Python in Science Conference, edited by G. Varoquaux, T. Vaught, and J. Millman (Pasadena, CA USA, 2008) pp. 11–15.
[65] M. Barthelemy, Nature Reviews Physics 1, 406 (2019).
[66] G. Boeing, International Journal of Information Management (2019), available at doi:10.1016/j.ijinfomgt.2019.09.009.
[67] G. Boeing, Environment and Planning B: Urban Analytics and City Science 47, 590 (2020).
[68] M. Barthelemy, Environment and Planning B: Urban Analytics and City Science 44, 256 (2017).
[69] A. Cardillo, S. Scellato, V. Latora, and S. Porta, Physical Review E 73, 066107 (2006).
[70] M. Ahmed, B. T. Fasy, and C. Wenk, in Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’14 (Association for Computing Machinery, New York, NY, USA, 2014) pp. 43–52.
[71] Y. Wu, G. Shindnes, V. Karve, D. Yager, D. B. Work, A. Chakraborty, and R. B. Sowers, in IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, Vol. 2018-March (Institute of Electrical and Electronics Engineers Inc., 2018) pp. 1–6.
[72] J. Thompson, M. Stevenson, J. S. Wijnands, K. A. Nice, G. D. P. A. Aschwanden, J. Silver, M. Nieuwenhuijsen, P. Rayner, R. Schofield, R. Hariharan, and C. N. Morrison, The Lancet: Planetary Health 4, E32 (2020).
[73] G. Boeing, Computers, Environment, and Urban Systems 65, 126 (2017).
[74] E. Song, Administrative district boundaries of city of Shanghai, People’s Republic of China, 2017, ArcGIS (2017).
[75] Google, Google Maps search for Shanghai, available at https://www.google.com/maps/place/Shanghai,+China/data=!4m2!3m1!1s0x35b27040b1f53c33:0x295129423c364a1?sa=X&ved=2ahUKEwjSmom9nevmAhXNuZ4KHangDhIQ8gEwK3oECBkQBA.
[76] M. Kerber, D. Morozov, and A. Nigmetov, Journal of Experimental Algorithmics 22, 1.4 (2017).
[77] H.-S. Park and C.-H. Jun, Expert Systems with Applications 36, 3336 (2009).
[78] SimpleMaps, World cities database (2019), available at https://simplemaps.com/data/world-cities.
[79] M. Belgiu, UIA Latitude/Longitude Graticules and World Countries Boundaries, ArcGIS (2015), available at https://www.arcgis.com/home/item.html?id=a21fdb46d23e4ef896f31475217cbb08.
[80] R. Louf and M. Barthelemy, Journal of the Royal Society Interface 11, 20140924 (2014).
[81] R. Gillman, Math Horizons 10, 10 (2002).
[82] R. Barnes and J. Solomon, arXiv:1803.02857 (2018).
[83] M. Duchin and B. E. Tenner, arXiv:1808.05860 (2018).
[84] K. G. Libbrecht, arXiv:1910.06389 (2019).
[85] The GIMP Development Team, GIMP.
[86] Interestingly, whiskey itself produces webs [90].
[87] P. N. Witt, Behavioral Science 16, 98 (1971).
[88] D. A. Noever, R. J. Cronise, and R. A. Relwani, NASA Tech Briefs 19, 82 (1995).
[89] The web that was produced by the caffeinated spider is always fun to point out when giving presentations.
[90] S. J. Williams, M. J. Brown, and A. D. Carrithers, Physical Review Fluids 4, 100511 (2019).