Talk title: Issue with CS curriculum in the era of data science and machine learning
Abstract: In the era of big data and AI, how should the traditional CS curriculum be improved and expanded? What new courses should be introduced, and what knowledge should be updated, to keep pace with rapidly developing information technology? In this talk, I will introduce Rice University's curriculum reform in big data and machine learning and share my thoughts on it.
Talk title: Inference of Phylogenetic Networks in the Post-genomic Era
Abstract: Using genome-wide data for phylogenetic inference and analysis has become commonplace in the post-genomic era, giving rise to the field of phylogenomics. The multispecies coalescent (MSC) model has emerged as the main stochastic process that helps capture the intricate relationship between species trees and gene trees. Combined with models of sequence evolution, the MSC can be viewed as a generative model of genomic sequence data in the context of a (species) phylogenetic tree.
A significant outcome of the use of genome-wide data has been the increasing evidence, or hypotheses, of reticulation (e.g., hybridization) during the evolution of various groups of eukaryotic species. Reticulate evolutionary histories are best represented as phylogenetic networks, which extend the tree model to allow for admixtures of genetic material. In this talk, I will describe the multispecies network coalescent (MSNC) model, which extends the MSC model so that it operates within the branches of a phylogenetic network. This extended model naturally allows for modeling vertical and horizontal evolutionary processes acting within and across species boundaries. In particular, it simultaneously accounts for gene tree incongruence across loci due to both hybridization and incomplete lineage sorting. I will then describe a likelihood function for this model, as well as a method for Bayesian sampling of phylogenetic networks and their parameters using reversible-jump Markov chain Monte Carlo (RJMCMC).
Talk title: Elucidating Intratumor Heterogeneity from Single-cell DNA Sequencing Data
Abstract: Intra-tumor heterogeneity, as caused by a combination of mutation and selection, poses significant challenges to the diagnosis and clinical therapy of cancer. Resolving this heterogeneity to identify the tumor cell populations (clones) and delineate their evolutionary history is of critical importance in improving cancer diagnosis and therapy. This heterogeneity can be readily elucidated and understood through the reconstruction of the clonal genotypes and evolutionary history of the tumor cells. These tasks are challenging since genomic data is most often collected from a single snapshot during the evolution of the tumor's constituent cells. Consequently, using computational methods that infer the tumor phylogeny and tumor subpopulations from sequence data is the approach of choice. Recently emerged single-cell DNA sequencing (SCS) technologies promise to resolve intra-tumor heterogeneity to a single-cell level. However, inherent technical errors in SCS datasets, including false-positive (FP) errors, false-negative (FN) errors due to allelic dropout, cell doublets, and coverage non-uniformity, significantly complicate these tasks.
In this talk, I will first describe a maximum likelihood method for inferring tumor trees from imperfect SCS genotype data with potentially missing entries, under a finite-sites model of evolution. I will then describe a non-parametric Bayesian method that simultaneously reconstructs the clonal populations as clusters of single cells, mutations associated with each clone, and the genealogical relationships between the clonal populations. I will demonstrate the performance of the methods on both synthetic and real data sets.