I am a permanent lecturer of Computer Science at the University of Bristol. Prior to that, I was a senior researcher at the Max Planck Institute for Informatics, and led the reserach group on Combinatorics, Computing, and Randomness. I was affiliated with the Cluster of Excellence on "Multimodal Computing and Interaction" at the Saarland University from 2013 - 2015, and led the research group on Randomised Algorithms. I was awarded a Simons-Berkeley Research Fellowship, and worked at the Simons Institute for the Theory of Computing, UC Berkeley in 2014.

I obtained a Bachelor Degree from Fudan University (2005), and a PhD from Fudan University (2010). I won the President Medal of Fudan University in 2004, awarded to 2 out of 45,000 students that year. My thesis won Shanghai Distinguished Dissertation Award.

My main research areas range over the fields of machine learning and data mining, applied probablity and statistics, combinatorial optimisation, matrix analysis, and spectral graph theory. I am particularly interested in applying these techniques to designing efficient algorithms for massive data sets, as well as analysing the dynamics of large-scale networks.

Events

  • Probability and Statistics Seminar Series, University of Bristol, May 2016.
  • BIRS Workshop on Algebraic and Spectral Graph Theory, July 2016

Research

some of my recent publications

Spectral Sparsification

A spectral sparsifier is a reweighted sparse subgraph of the original graph such that, for all real vectors, the Laplacian quadratic forms between that subgraph and the original graph are approximately the same. Since many algorithms run faster on sparese graphs and storing sparse graphs are cheaper, spectral sparsification has been widely used in designing fast algorithms. We present the first almost-linear time algorithm for constructing linear-sized spectral sparsifiers. This algorithm is almost optimal respect to the runtime and the sparsity of the output graph.

  • Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time. joint work with Yin Tat Lee.
            Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 250-269, 2015.
            Conference version | arXiv version

Graph Partitioning

Partitioning a graph into several pieces such that each piece is better connected on the inside than towards the outside is an important problem in algorithm design, and has comprehensive applications in Machine Learning, Computer Vision, and Algorithm Design. We study the widely used spectral clustering algorithms and show that, for a suitable class of well-clustered graphs, the approximation ratios of spectral clustering algorithms can be bounded. We further show how heat kernel can be used in graph partitioning to get a nearly-linear time algorithm.

  • Partitioning Well-Clustered Graphs: Spectral Clustering Works! joint work with Richard Peng, and Luca Zanetti
            Proceedings of the Annual Conference on Learning Theory (COLT), pages 1423-1455, 2015.
            Conference version | arXiv version | Poster

Load Balancing

One central problem in distributed computing is to balance the load (tokens) that must be performed by the processors (nodes) through local communications, and the objective is to analyze the converence rate of different communication models with different initial distributions and network topologies. We analysed two popular models, the diffusion and the matching model. For both models, we analyze several iterative protocols that use the idea of randomized rounding. We show that these protocols almost achieve the same discrepancy with the same number of rounds as their counterparts for the case of divisible load.

  • Tight Bounds For Randomized Load Balancing on Arbitrary Network Topologies. joint work with Thomas Sauerwald
            Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 341-350, 2012.
           
    Conference version | arXiv version

Balls-in-Bins in Graphs

The d-choice process is a basic stochastic process with applications to resource allocation and scheduling. However, for many practical applications in networks where tasks correspond to balls and processors correspond to bins, a d-choice process becomes infeasible due to high communication cost. We introduced a natural process for allocating balls into bins that are organised as the vertices of an undirected graph, and analysed the impact of the graph's structure on the maximum load. In particular, for expander graphs our new process behaves as good as the classical d-choice process.

  • Balls into Bins via Local Search. joint work with Paul Bogdan, Thomas Sauerwald, and Alexandre Stauffer
            Proceedings of the 24th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 16-34, 2013.
           
    Conference version | arXiv version
  • Balls into Bins via Local Search: Cover Time and Maximum Load. joint work with Karl Bringmann, Thomas Sauerwald, and Alexandre Stauffer
            Proceedings of the 31st International Symposium on Theoretical Aspects of Computer Science (STACS), 2014.
            The full version is to appear in Random Structures & Algorithms, 2015.
            Conference version | arXiv version

Information Spreading

Randomized Rumor spreading are simple epidemic protocols that spread messegeas (rumors) from one node to all other nodes. This is done by allowing nodes to exchange the rumor with a randomly chosen neighbor with a each round. The main questions is th determine the number of required rounds to spread the rumor to all nodes in different networks. Our results include: (1) analyzing the rumor spreading time on different network topologies; (2) analyzing the randomness requirement of rumor spreading protocols.

  • Gossip vs. Markov Chains, and Randomness-Efficient Rumor Spreading. with Zeyu Guo
            Proceedings of the 26th ACM-SIAM Symposium on Discrete Algorithms (SODA), 2015.
           
    Conference version | arXiv version
  • Randomized Rumor Spreading: the Effect of the Network Topology. with Xavier Pérez Giménez, Konstantinos Panagiotou, Thomas Sauerwald
            Combinatorics, Probability and Computing, 2015.
           
    Journal version
  • Low Randomness Rumor Spreading via Hashing. with George Giakkoupis, Thomas Sauerwald, and Philipp Woelfel.
            Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science (STACS), pages 314-325, 2012.
           
    Conference version

Subgraph Counting in Streams

Most real-world structured data sets nowadays are formulated as massive graphs, and these graphs may change over time. One modern approach is to process the underlying graph as a stream where every edge comes sequentially. Graph streaming algorithms run in sub-linear space, and approximate certain graph quantities with high probability. We developed a new technique of designing graph streaming algorithms using complex-valued hash functions, and presented the first algorithm for this problem that runs in distributed and dynamic graph streams.

  • Counting Hypergraphs in Data Streams.
            arXiv version
  • Counting Arbitrary Subgraphs in Data Streams. joint work with Daniel M. Kane, Kurt Mehlhorn, and Thomas Sauerwald.
    Proceedings of the 39th International Colloquium on Automata, Languages and Programming (ICALP), pages 598-609, 2012.
            Conference version
  • Approximate Counting of Cycles in Streams. joint work with Madhusudan Manjunath, Kurt Mehlhorn, and Konstantinos Panagiotou.
    Proceedings of the 19th Annual European Symposium on Algorithms (ESA), pages 677-688, 2011.
           
    Conference version  

Minimum Manhattan Networks

Given a set of points in the Euclidean space, the minimum Manhattan network problem is to construct a rectilinear network such that, for any pair of input points, there is a path connecting them whose length equals to their L1 distance. We proved that (1) the decision version of this problem is strong NP-complete, and (2) present the best approximation algorithm with respect to runtime and approximation ratio. The complexity status of this problem was open for 11 years before our result.

  • Minimum Manhattan Network is NP-Complete. joint work with Francis Chin, and Zeyu Guo
            Discrete and Computational Geometry, 45(4):701-722, 2011.
           
    Conference version | Journal version  
  • Greedy Construction of 2-Approximate Minimum Manhattan Networks. joint work with Zeyu Guo, and Hong Zhu
            International Journal of Computational Geometry and Applications, 21(3): 331-350, 2011.
           
    Conference version | Journal version  
  • A Fast 2-Approximation Algorithm for the Minimum Manhattan Network Problem. joint work with Zeyu Guo, and Hong Zhu
            Proceedings of the 4th International Confernce on Algorithmic Aspects in Information Management (AAIM), pages 212-223, 2008.
           
    Conference version  

Teaching

Courses


Advanced Algorithms

  • University of Bristol, Spring 2016

Data Structures and Algorithms

  • University of Bristol, Fall 2015

Advanced Algorithms

  • University of Bristol, Spring 2015

Great Ideas in Theoretical Computer Science

  • Saarland University, Spring 2014

Spectral Graph Theory

  • Saarland University, Spring 2014

Great Ideas in Theoretical Computer Science

  • Saarland University, Spring 2013

Spectral Graph Theory and Applications

  • Saarland University, Spring 2012

Expander Graphs in Computer Science

  • Saarland University, Spring 2011

Students

  • Luca Zanetti. PhD student, University of Bristol, Oct. 2012 - present
  • Pavel Kolev. master student, Max Planck Institute for Informatics, 2012 - 2013.

Talks

a selection of my conference and invited talks

Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time

  • The Technical University of Dortmund, March 2016
  • Dagstuhl Seminar on Data Structures and Advanced Models of Computation on Big Data, March 2016
  • The University of Birmingham, February 2016
  • The University of Warwick, November 2015
  • Max Planck Institute for Informatics, Saarbruecken, September 2015
  • EPFL, Lausanne, August 2015

Heat Kernel in Graphs: A Journey from Random Walks to Geometry, and Back

  • The University of Warwick, November 2015
  • The University of Middlesex, October 2015
  • EPFL, Lausanne, August 2015
  • Institute of Software, Chinese Academy of Sciences, Beijing, July 2015
  • The University of Cambridge, Cambridge, June 2015
  • Workshop on Random Walks on Random Graphs and Applications, Eindhoven, April 2015
  • Max Planck Institute for Informatics, January 2015

Partitioning Well-Clustered Graphs with k-Means and Heat Kernel

  • Reunion Workshop at the Simons Institute for the Theory of Computing, UC Berkeley, December 2015
  • Workshop on Distributed Machine Learning and Optimization, Edinburgh, November 2015
  • Tsinghua University, Beijing, July 2015
  • The University of Bristol, Bristol, July 2015
  • Simons Institute for the Theory of Computing, UC Berkeley, December 2014

Gossip vs. Markov Chains, and Randomness-Efficient Rumor Spreading.

  • The University of Bristol, Bristol, April 2015
  • Simons Institute for the Theory of Computing, UC Berkeley, October 2014

Efficient Algorithms in Massive Graphs

  • The University of California, Davis, November 2014
  • The University of Bristol, Bristol, July 2014
  • Eindhoven University of Technology, Eindhoven, May 2014
  • Joint MPI-INF/MPI-SWS/MMCI Lecture Series, Saarbrücken, December 2013

Randomness-Effiient Rumor Spreading

  • Max Planck Institute for Informatics, Saarbrücken, April 2013
  • Dagstuhl Seminar: Epidemic Algorithms and Processes, January 2013

From Kant To Turing: On the Philosophical Foundations of Computer Science

  • Max Planck Institute for Informatics, Saarbrücken, November 2012

Counting Arbitrary Subgraphs in Data Streams

  • Dagstuhl Seminar: Data Structures and Advanced Models of Computation on Big Data, February 2014
  • The University of Cambridge, October 2013
  • Dagstuhl Seminar: Computational Counting, January. 2013
  • Stanford University, Palo Alto, November 2012
  • California Institute of Technology, Pasadena, October 2012
  • Workshop on Algorithms for Data Streams, Dortmund, July 2012
  • 39th International Colloquium on Automata, Languages and Programming, Warwick, July 2012

Tight Bounds For Randomized Load Balancing on Arbitrary Network Topologies

  • Conference on Applications of Graph Spectra in Computer Science, Bellaterra, July 2012
  • 5th Annual Meeting of the Asian Association for Algorithms and Computation, Shanghai, April 2012

Low Randomness Rumor Spreading via Hashing

  • 29th International Symposium on Theoretical Aspects of Computer Science, Paris, March 2012
  • 12th Max Planck Advanced Course on the Foundations of Computer Science, Saarbrücken, August 2011

Approximate Counting of Cycles in Streams

  • 19th Annual European Symposium on Algorithms, Saarbrücken, September 2011
  • 15th International Conference on Random Structures and Algorithms, Atlanda, May 2011

Last updatd on April 21st, 2015