Sampling of Random Graphs with Prescribed Degree Sequence

Date

2019-05

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Random graph generation is the foundation of the statistical study of complex networks, which are commonly found in social, technological, and biological systems. In empirical studies, often only limited information about the topological structure of a network is available. In order to make the best use of this information, one must sample the ensemble of graphs that satisfy the constraint of the known structure, but are otherwise as random as possible.

Similar to the microcanonical ensemble and canonical ensemble in statistical physics, there are two types of methods to generate an ensemble of graphs with prescribed topological constraints. A hard constraint method generates an ensemble of graphs in which each graph satisfies the constraints exactly. On the other hand, a soft constraint method generates an ensemble of graphs in which the constraints are satisfied only on average within the ensemble.

In this dissertation, inspired by the idea of maximizing entropy, improvements in a hard constraint method called Sequential Importance Sampling (SIS) are developed for the case when the number of connections of each node, the degree sequence, is prescribed. Among the improvements is a more stable method of calculating ensemble averages that allows much larger networks to be sampled. With the improved methods, results from hard constraint methods are compared with those of soft constraint methods. It is found that soft constraint methods significantly overestimate the global clustering coefficient for both regular random graphs and scale-free graphs. This implies that problems exist with many network analyses and that care must be taken about the assumptions of network statistical analyses.

A dynamical model to generate random graphs with prescribed degrees is also considered. The graphs resulting from this model are split graphs. We develop a linear complexity algorithm to decompose any graph into a series of split graphs, which can potentially be used to improve the efficiency of hard constraint sampling methods.

Description

Keywords

Graph sampling, Sequential importance sampling, Exponential random graph models

Citation