Benchmarking at Scale: Comparing Analysis Workflows for Single-Cell Genomic Data, UTHealth, School of Biomedical Informatics, Precision Medicine Seminar Series, November 2019, Houston, TX.

The rapid adoption of single-cell RNA sequencing (scRNA-seq) has created a new pressure point in computational analysis. As of July 2019, >450 tools have appeared to address tasks such as normalization, clustering, and imputation, yet the community still struggles to identify the best tool(s) for any given task. When a method is published, its authors typically show that it outperforms others in author-defined settings, using real data with presumed “truth”, sometimes supplemented with synthetic data simulated under specific models (e.g., discrete clusters or continuous trajectories). Comparative re-evaluations of available tools tend to be limited to default workflows and rely on simulations that are neither community-agreed nor easily extendable. To make standardized benchmarking possible at large scale, we created >1,000 archival-quality simulated scRNA-seq datasets with complete knowledge of their underlying clusters, and used them to test 15 clustering algorithms across 225 workflows. The datasets are transcript count matrices, linked in a hyper-grid of parameters to cover a range of models and known degrees of difficulty. The differential performance of the 225 workflows across the >1,000 datasets enabled both global statistical control of the model space and fine-grained assessment of the algorithmic decisions that affect performance.
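The evaluation loop behind a benchmark of this kind can be pictured with a minimal Python sketch: a toy gamma-Poisson simulator with known cluster labels, one possible workflow (library-size normalization, PCA, k-means), and the adjusted Rand index as the score. All parameter names and values here are illustrative assumptions; the actual hyper-grid, the 15 algorithms, and the 225 workflows are not reproduced.

```python
# Sketch of one benchmark iteration: simulate counts with known clusters,
# run a clustering workflow, score against the ground truth.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

def simulate_counts(n_cells=500, n_genes=2000, n_clusters=4, de_frac=0.1, fold=3.0):
    """Toy negative-binomial (gamma-Poisson) simulator with cluster-specific DE genes."""
    base_mean = rng.gamma(shape=2.0, scale=1.0, size=n_genes)   # gene-level expression means
    labels = rng.integers(n_clusters, size=n_cells)             # ground-truth cluster labels
    counts = np.empty((n_cells, n_genes), dtype=int)
    de_masks = [rng.random(n_genes) < de_frac for _ in range(n_clusters)]
    for k in range(n_clusters):
        mean_k = base_mean * np.where(de_masks[k], fold, 1.0)   # up-regulate DE genes in cluster k
        idx = labels == k
        # negative binomial via gamma-Poisson mixture; dispersion is illustrative
        lam = rng.gamma(shape=2.0, scale=mean_k / 2.0, size=(idx.sum(), n_genes))
        counts[idx] = rng.poisson(lam)
    return counts, labels

def cluster_workflow(counts, n_clusters, n_pcs=20):
    """One of many possible workflows: log-normalize, PCA, k-means."""
    lib = counts.sum(axis=1, keepdims=True)
    x = np.log1p(counts / lib * 1e4)          # library-size normalization + log1p
    z = PCA(n_components=n_pcs, random_state=0).fit_transform(x)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(z)

counts, truth = simulate_counts()
pred = cluster_workflow(counts, n_clusters=4)
print("Adjusted Rand Index:", round(adjusted_rand_score(truth, pred), 3))
```

In the full benchmark, this loop would be repeated over every simulated dataset in the parameter hyper-grid and every workflow, so that performance can be compared both globally and per algorithmic decision.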

This project is an example of the research conducted at the Michigan Center for Single-Cell Genomic Data Analytics. We and our colleagues in the Center develop new metrics and revise existing algorithms to address the data science challenges of sparse count data. I will discuss our vision for developing guidelines that learn statistically relevant features from real datasets and adjust the simulations accordingly, so that open-source in silico data become sufficiently real: matching the empirical data and platform to arbitrary closeness, and reusable at any scale. We also examine data science behavior: the diversity of decision-making styles among analysts. The ultimate goal of this research is to build a general-purpose support system, including an evolving knowledge base of available algorithms and checklists for making claims, for the mass customization of new pipelines based on the statistical structure and difficulty of the data rather than on the biological topic.
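As a rough illustration of learning statistically relevant features from real data and adjusting a simulation to match, the sketch below estimates per-gene means, method-of-moments dispersions, and library sizes from an empirical count matrix, simulates from them, and checks how closely the simulation reproduces the empirical mean-variance trend. The chosen statistics and closeness check are assumptions for illustration, not the Center's actual calibration procedure.

```python
# Sketch of simulation calibration: learn summary statistics from a real
# count matrix, simulate from them, and measure how close the result is.
import numpy as np

rng = np.random.default_rng(1)

def learn_features(counts):
    """Summary statistics a simulator could be conditioned on (illustrative)."""
    lib_sizes = counts.sum(axis=1)
    cpm = counts / lib_sizes[:, None] * 1e6
    mean = cpm.mean(axis=0)
    var = cpm.var(axis=0)
    # method-of-moments NB dispersion: var = mean + mean^2 / size
    with np.errstate(divide="ignore", invalid="ignore"):
        size = np.where(var > mean, mean**2 / (var - mean), np.inf)
    return {"lib_sizes": lib_sizes, "mean": mean, "size": size}

def simulate_like(features, n_cells):
    """Draw gamma-Poisson counts whose parameters come from the real data."""
    lib = rng.choice(features["lib_sizes"], size=n_cells, replace=True)
    mean_per_cell = features["mean"][None, :] * lib[:, None] / 1e6
    size = np.clip(features["size"], 1e-8, 1e8)[None, :]
    lam = rng.gamma(shape=size, scale=mean_per_cell / size)
    return rng.poisson(lam)

def mean_var_gap(real, sim):
    """One possible closeness check: gap in log mean and log variance per gene."""
    def mv(c):
        return np.log1p(c.mean(axis=0)), np.log1p(c.var(axis=0))
    (m_r, v_r), (m_s, v_s) = mv(real), mv(sim)
    return float(np.mean(np.abs(m_r - m_s))), float(np.mean(np.abs(v_r - v_s)))

# usage with a placeholder "real" matrix; substitute an actual count matrix
real_counts = rng.poisson(2.0, size=(300, 1000))
feats = learn_features(real_counts)
sim_counts = simulate_like(feats, n_cells=300)
print("mean/variance gaps:", mean_var_gap(real_counts, sim_counts))
```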