Introduction

Understanding NSBM

schist is based on Nested Stochastic Block Models (NSBM), a generative process based on the notion of group of nodes. Here, we use NSBM to cluster cells in scRNA-seq experiments.

Bayesian approach

Cell to cell relations are commonly represented as a neighborhood graph (typically KNN or SNN), cell groups are identified as graph communities, this step is usually performed maximising graph modularity. In schist, instead, partitioning is performed maximising the Bayesian posterior probability P(b|A), that is the likehood of generating a network _A_ with a partition _b_ and it is obtained according to

\frac{ \sum_{t=0}^{N}f(t,k) }{N}

System Message: WARNING/2 (P(\b|\A) = \frac{P(A|\theta,b)P(\theta,b)}{P(A)})

latex exited with error [stdout] This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=latex) restricted \write18 enabled. entering extended mode (./math.tex LaTeX2e <2017-04-15> Babel <3.18> and hyphenation patterns for 84 language(s) loaded. (/usr/share/texlive/texmf-dist/tex/latex/base/article.cls Document Class: article 2014/09/29 v1.4h Standard LaTeX document class (/usr/share/texlive/texmf-dist/tex/latex/base/size12.clo)) (/usr/share/texlive/texmf-dist/tex/latex/base/inputenc.sty (/usr/share/texlive/texmf-dist/tex/latex/ucs/utf8x.def)) (/usr/share/texlive/texmf-dist/tex/latex/ucs/ucs.sty (/usr/share/texlive/texmf-dist/tex/latex/ucs/data/uni-global.def)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amstext.sty (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsgen.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsbsy.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsopn.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amscls/amsthm.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amssymb.sty (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amsfonts.sty)) (/usr/share/texlive/texmf-dist/tex/latex/anyfontsize/anyfontsize.sty) (/usr/share/texlive/texmf-dist/tex/latex/tools/bm.sty) (./math.aux) (/usr/share/texlive/texmf-dist/tex/latex/ucs/ucsencs.def) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsa.fd) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsb.fd) LaTeX Warning: Command \b invalid in math mode on input line 14. ! Undefined control sequence. <argument> \split@tag \begin {split}P(\b |\A ) = \frac {P(A|\theta ,b)P(\the... l.14 ...P(A|\theta,b)P(\theta,b)}{P(A)}\end{split} LaTeX Warning: Command \b invalid in math mode on input line 14. ! Undefined control sequence. <argument> \split@tag \begin {split}P(\b |\A ) = \frac {P(A|\theta ,b)P(\the... l.14 ...P(A|\theta,b)P(\theta,b)}{P(A)}\end{split} [1] (./math.aux) ) (see the transcript file for additional information) Output written on math.dvi (1 page, 456 bytes). Transcript written on math.log.

Where P(A|θ,b) is the probability of obtaining the network _A_ given the partition _b_ and additional parameters _θ_; P(θ,b) is the probability of occurrence of the partition _b_ having observed the netwok _A_; P(A) is the “model evidence” and it is the same for all possible partitions. Refer to the excellent graph-tool documentation <https://graph-tool.skewed.de/static/doc/demos/inference/inference.html>_ for more details. Note that maximising this quantity is equivalent to minimizing the entropy

System Message: WARNING/2 (Sigma&space;=&space;-\ln&space;P(\boldsymbol&space;A|\boldsymbol\theta,&space;\boldsymbol&space;b)&space;-&space;\ln&space;P(\boldsymbol\theta,&space;\boldsymbol&space;b)" title="\Sigma = -\ln P(\boldsymbol A|\boldsymbol\theta, \boldsymbol b) - \ln P(\boldsymbol\theta, \boldsymbol b))

latex exited with error [stdout] This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=latex) restricted \write18 enabled. entering extended mode (./math.tex LaTeX2e <2017-04-15> Babel <3.18> and hyphenation patterns for 84 language(s) loaded. (/usr/share/texlive/texmf-dist/tex/latex/base/article.cls Document Class: article 2014/09/29 v1.4h Standard LaTeX document class (/usr/share/texlive/texmf-dist/tex/latex/base/size12.clo)) (/usr/share/texlive/texmf-dist/tex/latex/base/inputenc.sty (/usr/share/texlive/texmf-dist/tex/latex/ucs/utf8x.def)) (/usr/share/texlive/texmf-dist/tex/latex/ucs/ucs.sty (/usr/share/texlive/texmf-dist/tex/latex/ucs/data/uni-global.def)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amstext.sty (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsgen.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsbsy.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsopn.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amscls/amsthm.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amssymb.sty (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amsfonts.sty)) (/usr/share/texlive/texmf-dist/tex/latex/anyfontsize/anyfontsize.sty) (/usr/share/texlive/texmf-dist/tex/latex/tools/bm.sty) (./math.aux) (/usr/share/texlive/texmf-dist/tex/latex/ucs/ucsencs.def) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsa.fd) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsb.fd) ! Extra alignment tab has been changed to \cr. <template> }$\hfill \endtemplate l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Argument of \boldsymbol has an extra }. <inserted text> \par l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} Runaway argument? ! Paragraph ended before \boldsymbol was complete. <to be read again> \par l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing $ inserted. <inserted text> $ l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing } inserted. <inserted text> } l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Misplaced \cr. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} ! Missing \cr inserted. <inserted text> \cr l.14 ...oldsymbol\theta, \boldsymbol b)\end{split} (That makes 100 errors; please try again.) No pages of output. Transcript written on math.log.

The nested model introduces a hierarchy of priors used to infer the optimal recursive grouping of single cell groups. If you are familiar with Leiden or Louvain methods to find cell groups, you may think at this multilevel approach as a multiresolution one, except that it is not. Here, not only the cell groups at each hierarchy level are found maximising the equation above, but the hierarchy itself (hence the groups of groups) is part of the model. Since there may be more than one fit with similar probability, schist uses the graph-tool routines to apply a Markov chain Monte Carlo sampling of the posterior distribution aiming to converge to the best model. One of the main limitations of schist is that it requires significantly more time than any other state of the art approach to identify cell groups. This cost comes with the benefit that it is possible to choose between different parameters according to the likelihood of a partition set to be found over a network.

Fast model vs standard approach

In the standard approach, the model is initialized by minimizing the description length (entropy). This requires extra time but, in general, returns better results. It is possible to achieve good results with lower memory footprint and in shorter times settting fast_model:

nested_model(adata, fast_model=True)

This will seed the model with a dummy partition scheme, then a greedy merge-split MCMC will explore solutions until it converges.

Marginals

When using the following invocation

nested_model(adata, collect_marginals=True, equilibrate=True)

an additional step (with fixed number of iterations) is added to execution. During this step, schist collects the probability of having a certain number of groups for each level of the hierarchy. These are stored into adata.uns[‘nsbm’][‘group_marginals’]:

level = 2
S = adata.uns['nsbm']['group_marginals'][level].sum()
p = adata.uns['nsbm']['group_marginals'][level] / S
ng = range(1, len(p) + 1)
bar(ng, p)
xticks(ng)
xlabel('Number of groups')
ylabel('Probability')
title(f'Group marginals for level {level}')
alternate text

Prior to version 0.3.4, this option would have collected posterior probabilities of cells to belong to a certain group. Since this quantity is interesting and it would have required a long time to compute, we’d rather calculate variation of global entropy by moving all cells to all identified groups and transform that into a probaility, which is now stored into adata.uns[‘nsbm’][‘cell_affinity’]. Here, a dictionary keyed with NSBM levels counts the times a cell has been successfully moved to a group. These probabilities can be efficiently used as covariates when looking for marker genes, this approach will weight the belief that a cell belongs to a group. We have prepared a notebook showing an example.