I am a permanent Lecturer in Computer Science at the University of Bristol. Prior to that, I was a senior researcher at the Max Planck Institute for Informatics, where I led the research group on Combinatorics, Computing, and Randomness. From 2013 to 2015 I was affiliated with the Cluster of Excellence on "Multimodal Computing and Interaction" at Saarland University, where I led the research group on Randomised Algorithms. In 2014 I was awarded a Simons-Berkeley Research Fellowship and worked at the Simons Institute for the Theory of Computing, UC Berkeley.
I obtained a bachelor's degree (2005) and a PhD (2010), both from Fudan University. In 2004 I won the President Medal of Fudan University, awarded to 2 out of 45,000 students that year. My thesis won the Shanghai Distinguished Dissertation Award.
My main research areas span machine learning and data mining, applied probability and statistics, combinatorial optimisation, matrix analysis, and spectral graph theory. I am particularly interested in applying these techniques to designing efficient algorithms for massive data sets, as well as to analysing the dynamics of large-scale networks.
some of my recent publications
A spectral sparsifier is a reweighted sparse subgraph of the original graph whose Laplacian quadratic form approximates that of the original graph on every real vector. Since many algorithms run faster on sparse graphs and storing sparse graphs is cheaper, spectral sparsification has been widely used in designing fast algorithms. We present the first almost-linear time algorithm for constructing linear-sized spectral sparsifiers. This algorithm is almost optimal with respect to both the runtime and the sparsity of the output graph.
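The defining property can be checked numerically. Below is a minimal Python sketch (a toy, not the construction from the paper): the Laplacian quadratic form x^T L x equals the sum of w * (x_u - x_v)^2 over the weighted edges, and a hand-picked reweighted 4-cycle inside K4 stands in for the output of a real sparsification algorithm.

```python
import random

def laplacian_quadratic_form(edges, x):
    """x^T L x = sum over weighted edges (u, v, w) of w * (x[u] - x[v])**2."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)

# Toy example: the complete graph K4 with unit weights ...
n = 4
graph = [(u, v, 1.0) for u in range(n) for v in range(u + 1, n)]
# ... and, standing in for a sparsifier, a 4-cycle with its weights doubled.
sparsifier = [(0, 1, 2.0), (1, 2, 2.0), (2, 3, 2.0), (3, 0, 2.0)]

random.seed(0)
for _ in range(3):
    x = [random.gauss(0, 1) for _ in range(n)]
    qg = laplacian_quadratic_form(graph, x)
    qh = laplacian_quadratic_form(sparsifier, x)
    print(f"ratio x^T L_H x / x^T L_G x = {qh / qg:.3f}")
```

For this particular pair the ratio always lies in [1, 2]; a sparsifier in the technical sense pushes the ratio into [1 - eps, 1 + eps] while keeping only a linear number of edges.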
Partitioning a graph into several pieces, each of which is better connected internally than towards the outside, is a fundamental problem in algorithm design with wide applications in Machine Learning and Computer Vision. We study the widely used spectral clustering algorithms and show that, for a suitable class of well-clustered graphs, their approximation ratios can be bounded. We further show how the heat kernel can be used in graph partitioning to obtain a nearly-linear time algorithm.
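To illustrate the basic spectral approach (a sketch, not the algorithms analysed in the paper), the following pure-Python code partitions a graph by the signs of an approximate Fiedler vector, the eigenvector of the Laplacian's second-smallest eigenvalue, computed by power iteration on c*I - L. The example graph, the shift c, and the iteration count are all chosen for illustration.

```python
import random

def fiedler_partition(n, edges, iters=1000):
    """Split vertices by the sign pattern of an approximate Fiedler vector,
    obtained by power iteration on c*I - L with the all-ones vector
    (the trivial eigenvector of L) projected out at every step."""
    deg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
        deg[u] += 1; deg[v] += 1
    c = 2 * max(deg) + 1          # shift exceeding the largest eigenvalue of L
    random.seed(1)
    x = [random.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        # y = (c*I - L) x, where (L x)_u = deg[u]*x[u] - sum of x over neighbours
        y = [(c - deg[u]) * x[u] + sum(x[v] for v in adj[u]) for u in range(n)]
        mean = sum(y) / n
        y = [yi - mean for yi in y]           # project out the all-ones direction
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return ([u for u in range(n) if x[u] >= 0],
            [u for u in range(n) if x[u] < 0])

# Two triangles joined by a single bridge edge: a well-clustered toy graph.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
a, b = fiedler_partition(6, edges)
print(sorted(a), sorted(b))
```

On this well-clustered instance the sign split recovers the two triangles exactly, which is the kind of guarantee the approximation-ratio analysis makes rigorous.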
One central problem in distributed computing is to balance the load (tokens) among the processors (nodes) through local communications; the objective is to analyse the convergence rate of different communication models under different initial distributions and network topologies. We analysed two popular models: the diffusion model and the matching model. For both models, we analysed several iterative protocols based on the idea of randomised rounding, and showed that these protocols achieve almost the same discrepancy within the same number of rounds as their counterparts for divisible load.
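A small simulation conveys the flavour of such results (the protocol below is illustrative and not the exact protocol from the paper): in a diffusion-style round with indivisible tokens, each node forwards a randomly rounded fraction of its load difference to each lighter neighbour.

```python
import random

def diffusion_round(load, adj, alpha):
    """One synchronous diffusion round with randomised rounding: node u sends
    roughly alpha * (load[u] - load[v]) tokens to each lighter neighbour v,
    rounding the fractional part up with probability equal to that part."""
    n = len(load)
    delta = [0] * n
    for u in range(n):
        avail = load[u]
        for v in adj[u]:
            if load[u] > load[v]:
                frac = alpha * (load[u] - load[v])
                t = int(frac) + (1 if random.random() < frac - int(frac) else 0)
                t = min(t, avail)              # never send tokens we do not hold
                avail -= t
                delta[u] -= t
                delta[v] += t
    return [load[u] + delta[u] for u in range(n)]

# An 8-node cycle with all 64 tokens initially on one node.
random.seed(0)
n = 8
adj = [[(u - 1) % n, (u + 1) % n] for u in range(n)]
load = [64] + [0] * (n - 1)
for _ in range(200):
    load = diffusion_round(load, adj, alpha=0.25)
print(load, "discrepancy:", max(load) - min(load))
```

The point of the analysis is that the discrepancy reached by such rounded protocols is almost as small as in the idealised divisible-load setting.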
The d-choice process is a basic stochastic process with applications to resource allocation and scheduling. However, for many practical applications in networks where tasks correspond to balls and processors correspond to bins, the d-choice process becomes infeasible due to its high communication cost. We introduced a natural process for allocating balls into bins that are organised as the vertices of an undirected graph, and analysed the impact of the graph's structure on the maximum load. In particular, for expander graphs our new process performs as well as the classical d-choice process.
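One natural variant of such a process can be simulated in a few lines (an illustration; the precise process and graph classes studied in the paper may differ): each ball picks a uniformly random edge of the graph and is placed on the endpoint with the smaller current load.

```python
import random

def graph_two_choice(n, edges, m):
    """Allocate m balls into n bins: each ball picks a uniformly random edge
    of the graph and goes to the endpoint with the smaller current load
    (ties broken by a fair coin)."""
    load = [0] * n
    for _ in range(m):
        u, v = random.choice(edges)
        if load[u] < load[v] or (load[u] == load[v] and random.random() < 0.5):
            load[u] += 1
        else:
            load[v] += 1
    return load

# Compare the complete graph, which recovers the classical 2-choice process,
# with a sparse cycle, where the graph structure can hurt the maximum load.
random.seed(42)
n, m = 64, 512
complete = [(u, v) for u in range(n) for v in range(u + 1, n)]
cycle = [(u, (u + 1) % n) for u in range(n)]
for name, es in [("complete", complete), ("cycle", cycle)]:
    load = graph_two_choice(n, es, m)
    print(name, "max load:", max(load), "average:", m // n)
```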
Randomized rumor spreading protocols are simple epidemic protocols that spread a message (rumor) from one node to all other nodes: in each round, every node may exchange the rumor with a randomly chosen neighbor. The main question is to determine the number of rounds required to spread the rumor to all nodes in different networks. Our results include: (1) analyzing the rumor spreading time on different network topologies; (2) analyzing the randomness requirement of rumor spreading protocols.
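The push variant of the protocol takes only a few lines of Python (a sketch; pull and push-pull variants work analogously):

```python
import random

def push_rounds(adj):
    """Push protocol: in every round, each informed node sends the rumor to a
    uniformly random neighbour; returns the number of rounds until all nodes
    are informed, starting from node 0."""
    n = len(adj)
    informed = [False] * n
    informed[0] = True
    rounds = 0
    while not all(informed):
        targets = [random.choice(adj[u]) for u in range(n) if informed[u] and adj[u]]
        for v in targets:
            informed[v] = True
        rounds += 1
    return rounds

# On the complete graph the rumor reaches everyone in O(log n) rounds w.h.p.
random.seed(7)
n = 128
adj = [[v for v in range(n) if v != u] for u in range(n)]
print(push_rounds(adj))
```

Since the informed set can at most double per round, at least log2(n) rounds are always necessary; the analysis pins down the matching upper bounds on various topologies.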
Most real-world structured data sets are nowadays formulated as massive graphs, and these graphs may change over time. One modern approach is to process the underlying graph as a stream in which edges arrive sequentially. Graph streaming algorithms run in sub-linear space and approximate certain graph quantities with high probability. We developed a new technique for designing graph streaming algorithms using complex-valued hash functions, and presented the first algorithm of this kind that runs in distributed and dynamic graph streams.
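A toy sketch of the underlying structural idea (the actual estimators in the paper are more involved): hash each vertex to a random root of unity and maintain a linear sketch of the edge stream, so that deletions cancel insertions exactly; this linearity is what makes such sketches usable in dynamic streams.

```python
import cmath
import random

def make_vertex_hash(n, k, seed):
    """Assign each of n vertices an independent uniformly random k-th root of unity."""
    rng = random.Random(seed)
    return [cmath.exp(2j * cmath.pi * rng.randrange(k) / k) for _ in range(n)]

class EdgeSketch:
    """A linear sketch of a dynamic edge stream: an insertion of edge (u, v)
    adds h(u)*h(v) to the running sum, a deletion subtracts it."""
    def __init__(self, h):
        self.h = h
        self.z = 0j
    def update(self, u, v, sign):          # sign = +1 insert, -1 delete
        self.z += sign * self.h[u] * self.h[v]

h = make_vertex_hash(10, 3, seed=0)
sk = EdgeSketch(h)
sk.update(1, 2, +1)
sk.update(3, 4, +1)
sk.update(1, 2, -1)   # deleting the edge restores the earlier state
print(abs(sk.z - h[3] * h[4]) < 1e-12)
```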
Given a set of points in the Euclidean space, the minimum Manhattan network problem asks for a rectilinear network of minimum total length such that, for any pair of input points, there is a path connecting them whose length equals their L1 distance. We proved that (1) the decision version of this problem is strongly NP-complete, and (2) presented the best approximation algorithm with respect to runtime and approximation ratio. The complexity status of this problem had been open for 11 years before our result.
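For intuition, a feasible (but far from minimum) Manhattan network always exists: connect every pair of points by an L-shaped path, which is monotone and hence has length exactly their L1 distance. The sketch below builds this trivial network; the optimisation problem is to find a network of minimum total length with the same pairwise property.

```python
def l1(p, q):
    """Manhattan (L1) distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def staircase_network(points):
    """A trivial feasible Manhattan network: join every pair (p, q) by an
    L-shaped path through the corner (q_x, p_y).  Each such path is monotone,
    so its length equals l1(p, q)."""
    segments = []
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            corner = (q[0], p[1])
            segments.append((p, corner))   # horizontal segment
            segments.append((corner, q))   # vertical segment
    return segments

points = [(0, 0), (2, 3), (5, 1)]
net = staircase_network(points)
total = sum(l1(a, b) for a, b in net)
print("total segment length:", total)
```

This construction wastes length on overlapping segments; approximation algorithms for the problem reuse segments across many pairs to approach the minimum.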
a selection of my conference and invited talks
Last updated on April 21st, 2015