Generating Interpretable, Reliable, and Quantitative Models of Emergent Behavior from High-dimensional Data
Event Details
- Type
- Center for Studies in Physics and Biology Seminars
- Speaker(s)
- Jason Kim, Ph.D., Kavli Institute at Cornell Theory Fellow, Cornell University
- Speaker bio(s)
- Natural systems with emergent behaviors often organize along nonlinear low-dimensional subsets of high-dimensional spaces. For example, despite the tens of thousands of genes in the human genome, the principled study of genomics is fruitful because biological processes rely on coordinated organization along lower-dimensional subspaces of phenotypes. To uncover this organization, many dimensionality reduction techniques embed high-dimensional data in low-dimensional spaces by modeling local relationships between data points. However, these methods fail to directly model the subspaces in which the data reside, thereby limiting their ability to infer the biological processes that globally organize the data and to generalize out of distribution. Here, we address this limitation by directly learning a nonlinear subspace that is well-behaved not only in regions where there are data but also in regions where there are none. We do so by regularizing the curvature of manifolds generated by autoencoders, a method we call the "Γ-autoencoder." We demonstrate its utility on a wide range of datasets, including bulk RNA-seq from healthy and cancer tissues, single-cell RNA-seq from cell differentiation, and neural activity from the mouse hippocampus. We discover the global biological programs that emerge as relevant variables, demonstrate superior predictions on data from completely unseen out-of-distribution classes, and consistently learn the same nonlinear subspaces across different random initializations. Broadly, we anticipate that directly modeling the low-dimensional subspaces that generate and organize data, by regularizing the curvature of generative models, will enable more interpretable, generalizable, and consistent models in any high-dimensional system with emergent low-dimensional behavior.
- Open to
- Public
- Phone
- (212) 327-8636
- Sponsor
- Melanie Lee
  (212) 327-8636
  leem@rockefeller.edu