1 Introduction

The main purpose of the LandSCENT package is to provide a means of estimating the differentiation potency of single cells without the need to assume prior biological knowledge (e.g. marker expression or timepoint). As such, it may provide a more unbiased means for assessing potency or pseudotime. The package features:

Input support for the SingleCellExperiment and CellDataSet classes, for interoperability with a wide range of other Bioconductor packages such as scater and monocle;
Tools for visualising entropy values of scRNA-seq data, especially 3D density plots with cell clusters;
Inference of distinct potency states for a given scRNA-seq dataset;
Applicability to both single-cell RNA-seq data and bulk RNA-seq data, with functions for comparing the two.

This document gives a detailed tutorial of the LandSCENT package, from data normalization to result visualization. The LandSCENT package requires two main sources of input:

Raw or normalized single-cell RNA-seq data
A user-defined functional gene network

The following sections describe in detail how to prepare these inputs for use in LandSCENT.

2 User defined functional gene network

LandSCENT requires as input a user-defined functional gene network, for instance a protein-protein interaction (PPI) network including the main interactions that take place in a cell. Although these networks are mere caricatures of the underlying signaling networks, ignoring time, spatial and biological contexts, one recent discovery is that cell potency appears to be encoded by a subtle positive correlation between transcriptome and connectome, with hubs in these networks generally exhibiting higher expression in more potent cells. For details we refer the reader to our publications given at the end of this vignette (Teschendorff and Enver 2017; Teschendorff, Sollich, and Kuehn 2014).

In this vignette we use a previously defined PPI network to calculate all the example results. This network is derived from Pathway Commons, an integrated resource collating PPIs from several distinct sources. In particular, the network is constructed by integrating the following sources: the Human Protein Reference Database (HPRD), the National Cancer Institute Nature Pathway Interaction Database (NCI-PID), the Interactome (IntAct) and the Molecular Interaction Database (MINT).

We have stored the networks in the package. Two versions of the PPI network are available, under the filenames “net17Jan2016.m.RData” and “net13Jun12.m.RData” (an earlier version). You can load them with the data function. Here we use the earlier version:

library(LandSCENT)
data(net13Jun12.m)

Importantly, the nodes (genes) in this network are labeled with Entrez gene IDs, and entries take the values “0” and “1”, with “0” indicating that there is no interaction or connection between the two genes, and “1” indicating that an interaction has been reported. It is also important to note that the diagonal entries are set to “0”.
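If you want to supply your own network instead, the format described above can be reproduced from an edge list. The following is a minimal sketch in base R; the edge list and the object name toy_net.m are made up for illustration and are not part of the package.

```r
# Made-up edge list of Entrez gene IDs (illustrative values only)
edges <- data.frame(from = c("1510", "1510", "7917"),
                    to   = c("10436", "7917", "4173"),
                    stringsAsFactors = FALSE)

genes <- sort(unique(c(edges$from, edges$to)))
toy_net.m <- matrix(0, nrow = length(genes), ncol = length(genes),
                    dimnames = list(genes, genes))

# Mark each reported interaction in both directions (undirected network)
toy_net.m[cbind(edges$from, edges$to)] <- 1
toy_net.m[cbind(edges$to, edges$from)] <- 1

# Self-interactions are not allowed: force the diagonal to 0
diag(toy_net.m) <- 0

# Sanity checks matching the format described above
stopifnot(isSymmetric(toy_net.m),
          all(toy_net.m %in% c(0, 1)),
          all(diag(toy_net.m) == 0))
```

Indexing the matrix with a two-column character matrix fills entries by row and column name, which avoids an explicit loop over edges.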

3 Single cell RNA-seq data

We assume that you have a matrix/object containing expression count data summarised at the level of genes. You then need to do quality control and normalization on the data.

If you already have a normalized data matrix, you can go directly to Section 4 of this vignette.

Moreover, the DoIntegPPI function accepts objects of the SingleCellExperiment (scater) and CellDataSet (monocle) classes. If you have an object of either class, you can also go directly to Section 4.

Here we use a scRNA-Seq dataset from (Chu et al. 2016), generated with the Fluidigm C1 platform, as an example. Due to the package size restriction, we cannot store the whole data matrix in the package. Instead, the example is a subset of 100 cells covering two cell types: pluripotent human embryonic stem cells (hESC) and non-pluripotent progenitors of endothelial cells (EC). You can access the raw data and phenotype information using the data function.

data(rawExample.m)
data(phenoExample.v)

The full dataset can be downloaded from the GEO website under accession number GSE75748; the specific file to download is the supplementary file named “GSE75748_sc_cell_type_ec.csv.gz”. You can download the dataset with the getGEOSuppFiles function in the GEOquery package and extract the raw expression matrix from the supplementary files.

require(GEOquery)
require(Biobase)
GSE75748 <- getGEOSuppFiles("GSE75748")
gunzip(rownames(GSE75748)[3])
rawdata.m <- as.matrix(read.csv("GSE75748/GSE75748_sc_cell_type_ec.csv", row.names = 1))

We also provide the phenotype information of the full dataset in the package. It can be loaded into your session using the data function.

data(phenoscChu.v)

3.1 Quality control

Here we use the scater package to do quality control and normalization on the raw data. Since LandSCENT is rather robust to different normalization methods, you can choose a workflow better suited to your own dataset, as long as the normalized data meet the specific requirements of the LandSCENT package. First, we create a SingleCellExperiment object containing the raw data. Rows of the object correspond to genes, while columns correspond to single cells.

require(scater)
example.sce <- SingleCellExperiment(assay = list(counts = rawExample.m))

Then, we detect low-quality cells based on library size and number of expressed genes in each library. We also select cells with a low proportion of mitochondrial genes and spike-in RNA. Using the isOutlier function in the scater package (here with nmads = 5), we remove 4 cells, most of which (3 cells) are filtered out because of high spike-in gene content.

### Detect mitochondrial genes and spike-in RNA
### Note: "^MT" also matches nuclear symbols such as MTOR; a stricter pattern
### such as "^MT-" may suit datasets whose mitochondrial symbols carry that prefix

is.mito <- grepl("^MT", rownames(example.sce))
is.spike <- grepl("^ERCC", rownames(example.sce))

counts(example.sce) <- as(counts(example.sce), "dgCMatrix")
example.sce <- calculateQCMetrics(example.sce, feature_controls=list(Spike=is.spike, Mt=is.mito))

### Cell filtering with the isOutlier function

libsize.drop <- isOutlier(example.sce$total_counts, nmads=5, type="lower", log=TRUE)
mito.drop <- isOutlier(example.sce$pct_counts_Mt, nmads=5, type="higher")
spike.drop <- isOutlier(example.sce$pct_counts_Spike, nmads=5, type="higher")

filter_example.sce <- example.sce[, !(libsize.drop | mito.drop | spike.drop)]
phenoExample.v <- phenoExample.v[!(libsize.drop | mito.drop | spike.drop)]
data.frame(ByLibSize=sum(libsize.drop), ByMito=sum(mito.drop), 
           BySpike=sum(spike.drop), Remaining=ncol(filter_example.sce))

##   ByLibSize ByMito BySpike Remaining
## 1         0      1       3        96

example.sce <- filter_example.sce
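To make the filtering rule concrete, the following base R sketch flags values that lie more than a given number of median absolute deviations (MADs) from the median, which is the idea behind the nmads argument used above. The helper flag_outlier and the toy library sizes are hypothetical; this illustrates the principle and is not the scater implementation.

```r
# Hypothetical helper: flag values more than `nmads` MADs from the median
flag_outlier <- function(x, nmads = 5, type = c("lower", "higher"), log = FALSE) {
  type <- match.arg(type)
  # The base of the logarithm does not matter: median and MAD rescale together
  if (log) x <- log(x)
  med <- median(x)
  dev <- nmads * mad(x)   # mad() rescales by 1.4826 by default
  if (type == "lower") x < med - dev else x > med + dev
}

# Toy library sizes: 19 similar cells and one cell with very few reads
set.seed(1)
lib_sizes <- c(round(rnorm(19, mean = 1e6, sd = 5e4)), 2e3)

low_quality <- flag_outlier(lib_sizes, nmads = 5, type = "lower", log = TRUE)
which(low_quality)
```

Working on the log scale keeps a few very large libraries from inflating the MAD and masking small ones.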

3.2 Normalization

After quality control, we move on to normalization. The scater package defines the size factors from the scaled library sizes of all cells.

sizeFactors(example.sce) <- librarySizeFactors(example.sce)
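As a sketch of what "scaled library sizes" means, the size factor of each cell can be computed in base R as its total count divided by the mean total count across cells. The toy matrix below is made up for illustration.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns)
toy_counts <- matrix(c(10, 0, 5,
                       20, 2, 8,
                        5, 1, 4,
                       40, 3, 17),
                     nrow = 3,
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

lib_sizes <- colSums(toy_counts)              # total counts per cell
size_factors <- lib_sizes / mean(lib_sizes)   # scaled so that the mean is 1
size_factors
```

Cells sequenced more deeply than average get a size factor above 1, so dividing their counts by it brings them onto a comparable scale.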

Scaling normalization is then used to remove cell-specific biases, e.g. coverage or capture efficiency. Log-transformed normalized expression values can be computed with the normalize function.

Importantly, we need to add an offset value of 1.1 before log-transformation. The offset is added in order to ensure that the minimum value after log-transformation would not be 0, but a nonzero value (typically log2(1.1)~0.14). We do not want zeroes in our final matrix since the computation of signaling entropy rate involves ratios of gene expression values and zeros in the denominator are undefined.

example.sce <- normalize(example.sce, log_exprs_offset = 1.1)
example.m <- as.matrix(assay(example.sce, i = "logcounts"))
min(example.m)

## [1] 0.1375035
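The effect of the offset can be checked on a toy matrix in base R. This sketch assumes the normalized value is log2 of the size-factor-scaled count plus the offset; the toy counts are made up.

```r
# Toy counts: 3 genes x 2 cells, including zeros
toy_counts <- matrix(c(0, 4, 12, 0, 7, 30), nrow = 3)
size_factors <- colSums(toy_counts) / mean(colSums(toy_counts))

# Divide each cell (column) by its size factor, add the offset, then take log2
norm.m <- log2(sweep(toy_counts, 2, size_factors, "/") + 1.1)

min(norm.m)       # zero counts map to log2(1.1), never to 0
all(norm.m > 0)
```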

3.3 Check gene identifier

Because LandSCENT will integrate the network with scRNA-Seq data, the row names and column names of the network must use the same gene identifier as used in the scRNA-Seq data.

In our example, the row names of the data matrix are all human gene symbols. So we need to use the mapIds function in the AnnotationDbi package, along with the org.Hs.eg.db package, to get the corresponding human Entrez gene IDs.

require(AnnotationDbi)
require(org.Hs.eg.db)
# Map gene symbols to Entrez gene IDs (first match when a symbol maps to several IDs)
anno.v <- mapIds(org.Hs.eg.db, keys = rownames(example.m), keytype = "SYMBOL", 
                 column = "ENTREZID", multiVals = "first")
unique_anno.v <- unique(anno.v)
example_New.m <- matrix(0, nrow = length(unique_anno.v), ncol = ncol(example.m))
for (i in seq_along(unique_anno.v)) {
  # Rows whose symbols map to the current Entrez ID
  tmp <- example.m[which(anno.v == unique_anno.v[i]), , drop = FALSE]
  # Average the rows when several symbols share one Entrez ID
  example_New.m[i, ] <- colMeans(tmp)
}
rownames(example_New.m) <- unique_anno.v
colnames(example_New.m) <- colnames(example.m)
# Drop the row for unmapped symbols (NA identifier); safe even if there is none
example_New.m <- example_New.m[!is.na(rownames(example_New.m)), ]
Example.m <- example_New.m
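For large matrices, the same averaging can be done without a loop. The sketch below uses base R's rowsum on a toy matrix; the symbol-to-ID mapping is hypothetical and stands in for the anno.v vector above.

```r
# Toy expression matrix with gene symbols as row names
toy.m <- matrix(c(1, 2,
                  3, 4,
                  5, 6,
                  7, 8),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("A", "B", "C", "D"), c("cell1", "cell2")))

# Hypothetical symbol-to-Entrez mapping: A and B share an ID, D is unmapped
ids <- c(A = "11", B = "11", C = "22", D = NA)

keep <- !is.na(ids)                                   # drop unmapped symbols
sums.m <- rowsum(toy.m[keep, , drop = FALSE], group = ids[keep])
n_per_id <- table(ids[keep])[rownames(sums.m)]        # symbols per Entrez ID
collapsed.m <- sums.m / as.vector(n_per_id)           # row-wise average
collapsed.m
```

Looking up the counts by rownames(sums.m) keeps sums and counts aligned regardless of how either function orders the groups.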

Now the scRNA-seq data are ready for the calculation of signaling entropy rate.

4 How to use `LandSCENT` package

Before introducing the functionality of the LandSCENT package, here are several points to check, in case you skipped one or both of the sections above:

Make sure the diagonal elements in your network matrix are set to “0”;
Make sure the scRNA-seq data use the same gene identifiers as the network matrix;
Make sure the minimal expression value in your normalized scRNA-seq matrix/object is a positive (non-zero) number. We suggest a value of around 0.1.

Some further notes for users with SingleCellExperiment and CellDataSet objects:

LandSCENT accepts both kinds of objects and normalizes the data internally, so there is no need to normalize the data before running LandSCENT. The scRNA-seq data must still use the same gene identifiers as the network matrix.
The functions are used in exactly the same way as with a data matrix, so the following sections use a data matrix in the examples.
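These three checks are easy to automate. The helper below is a hypothetical convenience function, not part of LandSCENT, shown on a made-up toy network and expression matrix.

```r
# Hypothetical helper (not part of LandSCENT) covering the three points above
check_inputs <- function(exp.m, ppiA.m) {
  c(zero_diagonal   = all(diag(ppiA.m) == 0),
    shared_ids      = length(intersect(rownames(exp.m), rownames(ppiA.m))) > 0,
    positive_values = min(exp.m) > 0)
}

ids <- c("1510", "10436", "7917")
toy_net.m <- matrix(c(0, 1, 1,
                      1, 0, 0,
                      1, 0, 0), nrow = 3, dimnames = list(ids, ids))
toy_exp.m <- matrix(c(0.14, 9.4, 4.5, 3.2, 0.14, 7.7), nrow = 3,
                    dimnames = list(ids, c("cell1", "cell2")))

check_inputs(toy_exp.m, toy_net.m)
```

Any element that comes back FALSE points to the preparation step that needs revisiting.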

4.1 Differentiation potency estimation

The estimation of differentiation potency with LandSCENT consists of two major steps:

Integration of the scRNA-Seq data with the user-defined gene functional network.
Computation of the Signaling Entropy Rate (denoted SR) which is used to approximate differentiation potency of single cells.

A typical workflow starts with the integration of the scRNA-Seq data with the user-defined gene functional network. Here you can simply apply the DoIntegPPI function to the Example.m dataset and the PPI network net13Jun12.m:

Integration.l <- DoIntegPPI(exp.m = Example.m, ppiA.m = net13Jun12.m)
str(Integration.l)

## List of 3
##  $ expMC: num [1:8393, 1:96] 0.138 9.374 4.464 11.055 10.152 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:8393] "1510" "10436" "7917" "4173" ...
##   .. ..$ : chr [1:96] "H1_Exp1.001" "H1_Exp1.002" "H1_Exp1.003" "H1_Exp1.004" ...
##  $ adjMC: num [1:8393, 1:8393] 0 1 1 0 0 0 0 0 0 0 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:8393] "1510" "10436" "7917" "4173" ...
##   .. ..$ : chr [1:8393] "1510" "10436" "7917" "4173" ...
##  $ data : num [1:18935, 1:96] 3.57 2.773 0.138 0.138 0.138 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:18935] "57496" "135228" "80325" "23139" ...
##   .. ..$ : chr [1:96] "H1_Exp1.001" "H1_Exp1.002" "H1_Exp1.003" "H1_Exp1.004" ...

The DoIntegPPI function takes these two arguments as input. It finds the overlap between the gene identifiers labeling the network and those labeling the rows of the scRNA-seq matrix, and then extracts the maximally connected subnetwork, returned as the adjMC element. The function also constructs the correspondingly reduced scRNA-Seq matrix, returned as the expMC element.

With the output object Integration.l, we can now proceed to compute the SR value for any given cell, using the CompSRana function. It takes three arguments as input:
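The two steps described above (identifier overlap, then extraction of the largest connected subnetwork) can be sketched on toy data in base R. The helper largest_component and all toy objects are hypothetical; this is an illustration of the idea, not the code used inside DoIntegPPI.

```r
# Hypothetical helper: node names in the largest connected component
largest_component <- function(adj.m) {
  unvisited <- rownames(adj.m)
  best <- character(0)
  while (length(unvisited) > 0) {
    comp <- character(0)
    frontier <- unvisited[1]
    # Breadth-first search starting from the first unvisited node
    while (length(frontier) > 0) {
      comp <- c(comp, frontier)
      nbrs <- rownames(adj.m)[rowSums(adj.m[, frontier, drop = FALSE]) > 0]
      frontier <- setdiff(nbrs, comp)
    }
    if (length(comp) > length(best)) best <- comp
    unvisited <- setdiff(unvisited, comp)
  }
  best
}

# Toy network: g1-g2-g3 form a chain, g4-g5 form a separate pair
genes_net <- paste0("g", 1:5)
toy_net.m <- matrix(0, 5, 5, dimnames = list(genes_net, genes_net))
toy_net.m["g1", "g2"] <- toy_net.m["g2", "g1"] <- 1
toy_net.m["g2", "g3"] <- toy_net.m["g3", "g2"] <- 1
toy_net.m["g4", "g5"] <- toy_net.m["g5", "g4"] <- 1

# Toy expression matrix: g5 is not measured, g6 is not in the network
toy_exp.m <- matrix(seq(0.5, 5, by = 0.5), nrow = 5,
                    dimnames = list(c("g1", "g2", "g3", "g4", "g6"),
                                    c("cell1", "cell2")))

shared <- intersect(rownames(toy_net.m), rownames(toy_exp.m))
sub_net.m <- toy_net.m[shared, shared]
mc <- largest_component(sub_net.m)

toy_adjMC <- sub_net.m[mc, mc]
toy_expMC <- toy_exp.m[mc, , drop = FALSE]
rownames(toy_adjMC)
```

In this toy case g4 drops out because its only partner is not measured, which mirrors why expMC in the output above has fewer rows than the full data matrix.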

The output object Integration.l from DoIntegPPI function;
local, a logical parameter to tell the function whether to report back the normalized local, i.e. gene-centric, signaling entropies;
mc.cores, the number of cores to use, i.e. the maximum number of child processes run simultaneously. This function uses the parallel package, with a default mc.cores value of 1.

SR.o <- CompSRana(Integration.l, local = TRUE, mc.cores = 40)

Here we run the CompSRana function with 40 cores, which means the computer should have at least 40 processing cores. In your case, you may set mc.cores value based on your own computer/server.
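One way to pick a value for your own machine is to query the number of available cores with the parallel package and leave one free. This is only a suggestion; the variable names are illustrative, and on platforms that do not support forking you may need to keep mc.cores at 1.

```r
library(parallel)

n_detected <- detectCores()   # may return NA on some platforms
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
n_cores

# The value can then be passed on, e.g.:
# SR.o <- CompSRana(Integration.l, local = TRUE, mc.cores = n_cores)
```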

The output SR.o is a list that adds four elements to the input Integration.l:

SR: the SR value of the cells.
inv: a matrix specifying the invariant measures, or steady-state probabilities, for each cell/sample over the network. That is, each column labels a cell/sample and is of length equal to the number of nodes in the adjMC matrix, with its entries adding to 1.
s: a matrix containing the unnormalized local signaling entropies, with one row per node in the adjMC matrix and one column per cell.
ns: if local=TRUE, a matrix containing normalized local signaling entropies.
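To show how these quantities relate to each other, here is a minimal base R sketch for a single cell on a toy network. It assumes the formulation described in the cited publications: an expression-weighted random walk on the network, its invariant measure, and the resulting entropy rate scaled by its maximum. The toy network and expression values are made up, and the code is an illustration rather than the package's implementation.

```r
# Toy connected network with 4 genes (symmetric, zero diagonal)
adj.m <- matrix(c(0, 1, 1, 1,
                  1, 0, 1, 0,
                  1, 1, 0, 0,
                  1, 0, 0, 0), nrow = 4, byrow = TRUE)
exp.v <- c(5.0, 2.0, 1.0, 0.14)   # strictly positive expression of one cell

# Expression-weighted random walk: p_ij is proportional to A_ij * x_j
nbr_sum.v <- as.vector(adj.m %*% exp.v)
p.m <- t(t(adj.m) * exp.v) / nbr_sum.v

# Invariant measure: pi_i proportional to x_i * (A x)_i, entries sum to 1
inv.v <- exp.v * nbr_sum.v
inv.v <- inv.v / sum(inv.v)

# Unnormalized local entropies, treating 0 * log(0) as 0
local_s.v <- apply(p.m, 1, function(p) -sum(p[p > 0] * log(p[p > 0])))

# Entropy rate, scaled by its maximum (log of the top eigenvalue of adj.m)
max_sr <- log(max(eigen(adj.m, symmetric = TRUE, only.values = TRUE)$values))
sr <- sum(inv.v * local_s.v) / max_sr
sr
```

The division by nbr_sum.v is the ratio of expression values that breaks down when the matrix contains zeros, which is why a positive offset is required during normalization.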

A further note for users with SingleCellExperiment and CellDataSet objects: the SR values are also added to the phenotype information of the original sce and cds objects under the name SR.

One note on the above step: the local gene-based entropies can be used in downstream analyses for ranking genes according to differential entropy, but only if appropriately normalized. For instance, they could be used to identify the main genes driving changes in the global signaling entropy rate of the network. However, if you only wish to estimate potency, specifying local=FALSE is sufficient and saves some RAM on the output object, which is why it is the default option.

As mentioned above, we provide the example’s phenotype information in the package, stored in the phenoExample.v vector. With SR values and the phenotype information, you can then check that the SR values do indeed correlate with potency:

boxplot(SR.o$SR ~ phenoExample.v, main = "SR values against cell types", xlab = "Cell Types", ylab = "SR values")

Here hESC and EC refer to human embryonic stem cells and progenitors of endothelial cells, respectively.

4.2 Infer the potency states in a cell population

Having estimated the cell potency values, we can then infer cell potency states with the InferPotency function. This function infers discrete potency states of single cells and their distribution across the single-cell population.

If you have no phenotype information, you can simply apply this method to the SR.o object obtained above:

InferPotency.o <- InferPotency(SR.o)

The inferred potency state of every cell is then stored in the element named potencyState.

InferPotency.o$potencyState

But if phenotype information is provided, InferPotency will return the distribution of potency states in relation to the phenotype classes provided (e.g. cell-types):

InferPotency.o <- InferPotency(SR.o, pheno.v = phenoExample.v)

## [1] "Fit Gaussian Mixture Model to Signaling Entropies"
## [1] "Identified 2 potency states"
## [1] "Compute Shannon (Heterogeneity) Index for each Phenotype class"
## [1] "Done"

InferPotency.o$distPSPH

##        ordpotS.v
## pheno.v  1  2
##    EC    0 46
##    hESC 50  0

This result indicates that the InferPotency function inferred 2 potency states, with the first potency state being occupied only by human embryonic stem cells (hESC), while the second potency state contains only progenitors of endothelial cells (EC).
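The structure of this table, and the heterogeneity index mentioned in the log messages, can be reproduced in base R from toy labels that mirror the counts above. The Shannon index here is the plain entropy of the potency-state proportions within each phenotype; whether the package applies any additional normalization is not covered here, so treat this as a sketch.

```r
# Toy labels mirroring the counts reported above
pheno <- rep(c("hESC", "EC"), times = c(50, 46))
state <- rep(c(1, 2), times = c(50, 46))

# Cross-tabulate phenotype against potency state
dist.m <- table(pheno, state)
dist.m

# Shannon index of the potency-state distribution within each phenotype
shannon <- apply(dist.m, 1, function(n) {
  p <- n[n > 0] / sum(n)
  -sum(p * log(p))
})
shannon
```

An index of 0 means that all cells of a phenotype fall into a single potency state; larger values indicate a more heterogeneous population.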

Such results will also be stored for SingleCellExperiment and CellDataSet objects.

4.3 Infer potency-coexpression clusters (landmarks)

As a next step, we may want to explore the heterogeneity in potency within the cell population and infer lineage relationships. One way to approach this question is to use the InferLandmark function.

This function identifies potency-coexpression clusters of single cells, called landmarks, and then infers the dependencies between these landmarks, which can aid in reconstructing lineage trajectories.

With the InferPotency.o result from above, this function can be run as follows:

InferLandmark.o <- InferLandmark(InferPotency.o, pheno.v = phenoExample.v,
                                 reduceMethod = "PCA", clusterMethod = "PAM",
                                 k_pam = 2)

## [1] "Now estimating number of significant components of variation in scRNA-Seq data"
## [1] "Centering and scaling matrix"
## [1] "Done, now performing SVD"
## [1] "Performing full SVD since dimensionality of data matrix is not big"
## [1] "Done"
## [1] "Number of significant components = 4"
## [1] "Do dimension reduction via PCA"
## [1] "Identifying co-expression clusters via PAM"
## [1] "Inferred 2 clusters"
## [1] "Now identifying landmarks (potency co-expression clusters)"
## [1] "Identified 2 Landmarks"
## [1] "Constructing expression medoids of landmarks"
## [1] "Inferring dependencies/trajectories/transitions between landmarks"

InferLandmark performs dimension reduction and clustering internally, so we provide two arguments, reduceMethod and clusterMethod, for you to choose the methods you want.

For reduceMethod, one can choose PCA or tSNE.

If you choose PCA, InferLandmark returns the coordinates of the top two principal components and stores them in InferLandmark.o$coordinates.
If you choose tSNE, InferLandmark first applies the RMT method to the data matrix to estimate the number of significant components, and then runs PCA with the estimated number of components. tSNE is then performed on this PCA result to generate coordinates in two-dimensional space. The coordinates are also stored in InferLandmark.o$coordinates.

For clusterMethod, one can select PAM or dbscan.

Clustering via PAM: you need to specify the maximal number of clusters through the k_pam argument.
Clustering via dbscan: we provide two arguments, eps_dbscan and minPts_dbscan, for you to optimize the clustering result. You need to tune these two arguments based on your own dataset. The default values are 10 and 5, respectively. The implementation is also very straightforward:

InferLandmark.o <- InferLandmark(InferPotency.o, pheno.v = phenoExample.v,
                                 reduceMethod = "PCA", clusterMethod = "dbscan",
                                 eps_dbscan = 10, minPts_dbscan = 5)
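To illustrate what the PCA and PAM options involve, the sketch below reduces a toy matrix to its top two principal components and then selects the number of clusters, up to a maximum, by average silhouette width. The selection criterion is an assumption made for this illustration and the toy data are simulated; this is not the code used inside InferLandmark.

```r
library(cluster)   # provides pam(); a recommended package shipped with R

set.seed(42)
# Toy data: 40 cells x 50 genes, two well-separated groups of 20 cells
cells.m <- rbind(matrix(rnorm(20 * 50, mean = 0), nrow = 20),
                 matrix(rnorm(20 * 50, mean = 5), nrow = 20))

# Dimension reduction: keep the top two principal components
coords.m <- prcomp(cells.m, center = TRUE, scale. = FALSE)$x[, 1:2]

# Try k = 2..k_pam and keep the k with the largest average silhouette width
k_pam <- 4
avg_sil <- sapply(2:k_pam, function(k) pam(coords.m, k)$silinfo$avg.width)
best_k <- (2:k_pam)[which.max(avg_sil)]

clusters <- pam(coords.m, best_k)$clustering
table(clusters)
```

Because k_pam is only an upper bound in this sketch, setting it somewhat higher than the expected number of clusters lets the silhouette criterion decide.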

Apart from the coordinates, the results of this function are stored in a list named InferLandmark.l inside the output object InferLandmark.o. For example, you can access the landmark index of every single cell with:

InferLandmark.o$InferLandmark.l$cl

##   H1_Exp1.001   H1_Exp1.002   H1_Exp1.003   H1_Exp1.004   H1_Exp1.006 
##             1             1             1             1             1 
##   H1_Exp1.007   H1_Exp1.008   H1_Exp1.009   H1_Exp1.010   H1_Exp1.011 
##             1             1             1             1             1 
##   H1_Exp1.012   H1_Exp1.014   H1_Exp1.015   H1_Exp1.016   H1_Exp1.017 
##             1             1             1             1             1 
##   H1_Exp1.018   H1_Exp1.019   H1_Exp1.020   H1_Exp1.021   H1_Exp1.022 
##             1             1             1             1             1 
##   H1_Exp1.023   H1_Exp1.024   H1_Exp1.025   H1_Exp1.026   H1_Exp1.027 
##             1             1             1             1             1 
##   H1_Exp1.029   H1_Exp1.030   H1_Exp1.031   H1_Exp1.032   H1_Exp1.033 
##             1             1             1             1             1 
##   H1_Exp1.035   H1_Exp1.036   H1_Exp1.038   H1_Exp1.039   H1_Exp1.040 
##             1             1             1             1             1 
##   H1_Exp1.041   H1_Exp1.042   H1_Exp1.043   H1_Exp1.044   H1_Exp1.045 
##             1             1             1             1             1 
##   H1_Exp1.047   H1_Exp1.048   H1_Exp1.049   H1_Exp1.050   H1_Exp1.051 
##             1             1             1             1             1 
##   H1_Exp1.052   H1_Exp1.053   H1_Exp1.054   H1_Exp1.055   H1_Exp1.057 
##             1             1             1             1             1 
## EC_Batch1.001 EC_Batch1.002 EC_Batch1.003 EC_Batch1.004 EC_Batch1.005 
##             2             2             2             2             2 
## EC_Batch1.006 EC_Batch1.007 EC_Batch1.008 EC_Batch1.011 EC_Batch1.012 
##             2             2             2             2             2 
## EC_Batch1.013 EC_Batch1.014 EC_Batch1.015 EC_Batch1.016 EC_Batch1.017 
##             2             2             2             2             2 
## EC_Batch1.018 EC_Batch1.019 EC_Batch1.020 EC_Batch1.021 EC_Batch1.022 
##             2             2             2             2             2 
## EC_Batch1.023 EC_Batch1.026 EC_Batch1.027 EC_Batch1.028 EC_Batch1.030 
##             2             2             2             2             2 
## EC_Batch1.031 EC_Batch1.032 EC_Batch1.033 EC_Batch1.034 EC_Batch1.036 
##             2             2             2             2             2 
## EC_Batch1.037 EC_Batch1.038 EC_Batch1.042 EC_Batch1.043 EC_Batch1.044 
##             2             2             2             2             2 
## EC_Batch1.046 EC_Batch1.047 EC_Batch1.048 EC_Batch1.049 EC_Batch1.050 
##             2             2             2             2             2 
## EC_Batch1.051 EC_Batch1.052 EC_Batch1.053 EC_Batch1.054 EC_Batch1.055 
##             2             2             2             2             2 
## EC_Batch1.057 
##             2

You can also access the cell number distribution of phenotype against landmark with:

InferLandmark.o$InferLandmark.l$distPHLM

##        psclID.v
## pheno.v PS1-CL1 PS2-CL2
##    EC         0      46
##    hESC      50       0

For more information, you can check the help pages.

4.4 Application on bulk samples

It is important to note that the differentiation potency estimation step can also be applied to bulk RNA-seq data. The procedure is identical to that for single-cell RNA-Seq data; the only difference is in the specific preprocessing and normalization of the data.

4.5 Density based visualization tool

Here, we use an example dataset containing 3473 human breast epithelial cells from (Nguyen et al. 2018) to demonstrate our density-based visualization functions Plot_LandSR and Plot_CellSR.

4.5.1 `Plot_LandSR` function

This function generates figures that compare cell density across the distinct potency states inferred by the InferPotency function.

We provide arguments for you to choose whether to inherit the dimension reduction result from the InferLandmark.o object, or to supply the reduced-dimension coordinates yourself.

You can also specify the colors used for the density of all cells and for the density of cells in each potency state.

The horizon lines of these density maps decrease from left to right, indicating that the cell potency states decrease from high (PS1) to low (PS3).

LandSR.o <- Plot_LandSR(InferLandmark.o, coordinates = tsne.o, colpersp = NULL, 
                        colimage = NULL, bty = "f", PDF = FALSE)

The output list LandSR.o stores the input coordinates as an element, LandSR.o$coordinates, which you can access with

LandSR.o$coordinates

4.5.2 `Plot_CellSR` function

This function generates figures that show cell density on top of the distribution of cell SR values.

Again, we provide arguments for you to choose whether to inherit the dimension reduction result from the InferLandmark.o object, or to supply the reduced-dimension coordinates yourself.

You can also specify the colors of the cell density and of the SR value distribution.

Note that the input argument num_grid, which sets the number of grid points in each direction, is sensitive to the data size, so it should be chosen based on your own dataset. See the package help page for more details.

CellSR.o <- Plot_CellSR(InferLandmark.o, coordinates = tsne.o, 
                        num_grid = 35, theta = 40, colpersp = NULL, 
                        colimage = NULL, bty = "f", PDF = FALSE)
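To get a feel for how the number of grid points affects a density surface, you can experiment with a generic two-dimensional kernel density estimate. The sketch below uses kde2d from the MASS package on simulated coordinates; it is an assumption that this resembles the density estimation behind the plots, so use it only to build intuition about grid resolution.

```r
library(MASS)   # provides kde2d(); a recommended package shipped with R

set.seed(1)
coords.m <- cbind(rnorm(200), rnorm(200))   # toy 2D coordinates for 200 cells

num_grid <- 35
dens <- kde2d(coords.m[, 1], coords.m[, 2], n = num_grid)
dim(dens$z)   # num_grid x num_grid matrix of density estimates

# The surface can be inspected with, e.g.:
# persp(dens$x, dens$y, dens$z, theta = 40, phi = 30)
```

A grid that is too coarse hides local density peaks, while a very fine grid on a small dataset mostly adds computation without adding detail.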

The output list CellSR.o will also store the input coordinates as an element, in CellSR.o$coordinates.

4.6 A “pipeline” function for easy use

In LandSCENT, we provide a function named DoLandSCENT for users who care less about the intermediate results than about the final output. It takes the scRNA-seq data and the network matrix, together with some control arguments, as its inputs, and returns a list containing SR values, potency states and other results after running all the necessary functions. You can also control plotting via the PLOT and PDF arguments.

DoLandSCENT.o <- DoLandSCENT(exp.m = Example.m, ppiA.m = net13Jun12.m,
                             mc.cores = 30, pheno.v = NULL, 
                             coordinates = NULL,
                             PLOT = FALSE, PDF = FALSE)

4.7 Employ diffusion maps to infer differentiation trajectory

In the latest update, we have integrated diffusion maps from the destiny package with our potency estimates, i.e. SR values, to infer differentiation trajectories.

Here we provide two functions:

DoDiffusionMap: constructs the diffusion map and selects the root cell of the trajectory.
Plot_DiffusionMap: plots the diffusion maps based on the result of DoDiffusionMap.

4.7.1 `DoDiffusionMap` function

Typically, the main input of DoDiffusionMap is the output of the InferLandmark function. It also requires you to specify several arguments based on your own dataset:

mean_gap and sd_gap are the mean and standard deviation thresholds, respectively, used inside the function for selecting highly variable genes (HVGs).
The root argument offers two choices: cell and state.

cell means the function chooses as root the cell with the highest SR value among all cells.
state means the function first clusters the high-potency cells and chooses the cluster with the highest median SR value. The root cell is then the cell with the highest SR value inside the chosen cluster.

num_comp defines how many diffusion components to calculate, which also affects the subsequent plotting step.
k is the number of nearest neighbors to consider when constructing the diffusion map.

DoDiffusionMap.o <- DoDiffusionMap(Integration.l,
                                   mean_gap = 1, sd_gap = 1,
                                   root = c("cell", "state"),
                                   num_comp = 3,
                                   k = 30)
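The difference between the two root choices can be seen on a toy example in base R. The SR values and cluster labels below are made up, and the clustering of high-potency cells is taken as given; this is an illustration of the selection logic described above, not the package code.

```r
# Toy SR values for six cells and hypothetical cluster labels
sr.v <- c(c1 = 0.91, c2 = 0.95, c3 = 0.70, c4 = 0.97, c5 = 0.60, c6 = 0.93)
cluster.v <- c(c1 = "A", c2 = "A", c3 = "B", c4 = "B", c5 = "B", c6 = "A")

# root = "cell": the single cell with the highest SR value
root_cell <- names(which.max(sr.v))

# root = "state": the cluster with the highest median SR, then its top cell
med.v <- tapply(sr.v, cluster.v, median)
top_cluster <- names(which.max(med.v))
in_top <- sr.v[cluster.v == top_cluster]
root_state <- names(which.max(in_top))

c(cell = root_cell, state = root_state)
```

In this toy case the two choices disagree: the cell with the overall highest SR sits in a cluster whose median SR is low, so the state option is less sensitive to a single extreme cell.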

In the output object DoDiffusionMap.o, three new elements are added:

A diffusion map object in DM
The diffusion components in DMEigen
The index of the root cell in root

If you prefer to generate the plots differently, or want to add further elements to them, these elements are easy to extract and reuse.

4.7.2 `Plot_DiffusionMap` function

This function is quite straightforward. By default, Plot_DiffusionMap produces a 3D figure. The function also estimates diffusion pseudotime (DPT) from the diffusion map constructed by DoDiffusionMap, so you can choose to color the cells by their SR values or by DPT with the color_by parameter.

With the DoDiffusionMap.o result from above, it can be run as follows:

Plot_DiffusionMap(DoDiffusionMap.o,
                  dim = c(1, 2, 3),
                  color_by = "SR",
                  phi = 40,
                  theta = 135,
                  bty = "g",
                  PDF = FALSE)

In the figure, Plot_DiffusionMap automatically highlights the root cell and the predicted terminal cells. We provide two arguments, phi and theta, for tuning the viewing angle.

There are three notes to bear in mind:

The dim argument is a vector indicating which dimensions to use and in what order. For example, dim = c(1,2,3) means that diffusion components 1, 2 and 3 are used to generate the figure, with the x-axis corresponding to DC1, the y-axis to DC2 and the z-axis to DC3, while dim = c(2,3,5) means that the x-axis corresponds to DC2, the y-axis to DC3 and the z-axis to DC5. More importantly, if you specify a negative number in dim (e.g. dim = c(2,3,-5)), the corresponding component (here DC5) is flipped. Moreover, if the largest dimension you choose here exceeds num_comp in the DoDiffusionMap function, the plot will fail.
Setting color_by to DPT colors the cells by a diffusion pseudotime (DPT) estimate. In some cases, for example when there are too few cells or when the cell clusters are too far apart from each other, the estimation may fail. So before generating the figure with DPT, you may need to inspect the diffusion map itself and make sure that the DPT estimation can be carried out correctly.
The 2D figure is generated with ggplot2, so it is convenient to add customized features using ggplot2 functions, as in the following example:

Plot_DiffusionMap(DoDiffusionMap.o,
                  dim = c(1, 2),
                  color_by = "DPT",
                  TIPs = c(1, 2, 3),
                  PDF = FALSE) +
  annotate("text", x = -0.02, y = -0.007, label = "Basal", size = 7) +
  annotate("text", x = 0.013, y = 0.053, label = "Lum2", size = 7) +
  annotate("text", x = 0.013, y = -0.023, label = "Lum1", size = 7)
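The behaviour of the dim argument described in the first note can be mimicked in a few lines of base R. The helper select_components and the toy matrix are hypothetical and only illustrate the selection, ordering and sign-flipping logic.

```r
# Hypothetical helper: pick diffusion components, flipping those given as negative
select_components <- function(dc.m, dim) {
  if (max(abs(dim)) > ncol(dc.m)) {
    stop("dim exceeds the number of computed components")
  }
  out <- sweep(dc.m[, abs(dim), drop = FALSE], 2, sign(dim), "*")
  colnames(out) <- paste0("DC", abs(dim))
  out
}

# Toy matrix of 5 diffusion components for 3 cells
dc.m <- matrix(1:15, nrow = 3, dimnames = list(NULL, paste0("DC", 1:5)))

flipped.m <- select_components(dc.m, dim = c(2, 3, -5))
flipped.m
```

Flipping a component only mirrors the plot along that axis; distances between cells are unchanged, so it is purely a matter of presentation.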

4.8 Extract objects from function results

If you provided a SingleCellExperiment or CellDataSet object as input, you can easily extract the sce or cds object from the results. These objects are stored in the result list under the names data.sce and data.cds, respectively. The SR values and potency states have already been added to their phenotype information, so you can find them in colData or pData, respectively.

Take SR.o as an example:

result.sce <- SR.o$data.sce
result.cds <- SR.o$data.cds

With these objects, you can then easily work with monocle, scater and other packages.

5 Session information

sessionInfo()

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS:   /usr/local/lib64/R/3.6.0/lib64/R/lib/libRblas.so
## LAPACK: /usr/local/lib64/R/3.6.0/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] marray_1.62.0               limma_3.40.2               
##  [3] scater_1.12.1               ggplot2_3.1.1              
##  [5] SingleCellExperiment_1.6.0  SummarizedExperiment_1.14.0
##  [7] DelayedArray_0.10.0         BiocParallel_1.18.0        
##  [9] matrixStats_0.54.0          Biobase_2.44.0             
## [11] GenomicRanges_1.36.0        GenomeInfoDb_1.20.0        
## [13] IRanges_2.18.0              S4Vectors_0.22.0           
## [15] BiocGenerics_0.30.0         LandSCENT_0.99.3           
## [17] BiocStyle_2.12.0           
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15               ggbeeswarm_0.6.0        
##   [3] VGAM_1.1-1               colorspace_1.4-1        
##   [5] RcppEigen_0.3.3.5.0      class_7.3-15            
##   [7] rio_0.5.16               mclust_5.4.3            
##   [9] qvalue_2.16.0            corpcor_1.6.9           
##  [11] XVector_0.24.0           BiocNeighbors_1.2.0     
##  [13] clue_0.3-57              proxy_0.4-23            
##  [15] ggrepel_0.8.1            ranger_0.11.2           
##  [17] splines_3.6.0            docopt_0.6.1            
##  [19] robustbase_0.93-5        knitr_1.23              
##  [21] cluster_2.0.9            pheatmap_1.0.12         
##  [23] BiocManager_1.30.4       compiler_3.6.0          
##  [25] assertthat_0.2.1         Matrix_1.2-17           
##  [27] lazyeval_0.2.2           BiocSingular_1.0.0      
##  [29] htmltools_0.3.6          tools_3.6.0             
##  [31] rsvd_1.0.0               igraph_1.2.4.1          
##  [33] misc3d_0.8-4             gtable_0.3.0            
##  [35] glue_1.3.1               GenomeInfoDbData_1.2.1  
##  [37] RANN_2.6.1               reshape2_1.4.3          
##  [39] dplyr_0.8.1              ggthemes_4.2.0          
##  [41] Rcpp_1.0.1               carData_3.0-2           
##  [43] slam_0.1-45              DDRTree_0.1.5           
##  [45] cellranger_1.1.0         JADE_2.0-1              
##  [47] DelayedMatrixStats_1.6.0 lmtest_0.9-37           
##  [49] laeken_0.5.0             xfun_0.7                
##  [51] stringr_1.4.0            openxlsx_4.1.0          
##  [53] irlba_2.3.3              isva_1.9                
##  [55] DEoptimR_1.0-8           zoo_1.8-5               
##  [57] zlibbioc_1.30.0          MASS_7.3-51.4           
##  [59] scales_1.0.0             VIM_4.8.0               
##  [61] hms_0.4.2                plot3D_1.1.1            
##  [63] monocle_2.12.0           RColorBrewer_1.1-2      
##  [65] yaml_2.2.0               curl_3.3                
##  [67] gridExtra_2.3            fastICA_1.2-1           
##  [69] stringi_1.4.3            e1071_1.7-1             
##  [71] destiny_2.14.0           TTR_0.23-4              
##  [73] boot_1.3-22              densityClust_0.3        
##  [75] zip_2.0.2                rlang_0.3.4             
##  [77] pkgconfig_2.0.2          bitops_1.0-6            
##  [79] qlcMatrix_0.9.7          evaluate_0.13           
##  [81] lattice_0.20-38          purrr_0.3.2             
##  [83] tidyselect_0.2.5         plyr_1.8.4              
##  [85] magrittr_1.5             bookdown_0.10           
##  [87] R6_2.4.0                 combinat_0.0-8          
##  [89] withr_2.1.2              pillar_1.4.0            
##  [91] haven_2.1.0              foreign_0.8-71          
##  [93] xts_0.11-2               scatterplot3d_0.3-41    
##  [95] abind_1.4-5              RCurl_1.95-4.12         
##  [97] sp_1.3-1                 nnet_7.3-12             
##  [99] tibble_2.1.1             crayon_1.3.4            
## [101] car_3.0-2                rmarkdown_1.12          
## [103] viridis_0.5.1            readxl_1.3.1            
## [105] grid_3.6.0               data.table_1.12.2       
## [107] FNN_1.1.3                forcats_0.4.0           
## [109] vcd_1.4-4                HSMMSingleCell_1.4.0    
## [111] sparsesvd_0.1-4          digest_0.6.19           
## [113] dbscan_1.1-3             munsell_0.5.0           
## [115] beeswarm_0.2.3           viridisLite_0.3.0       
## [117] smoother_1.1             vipor_0.4.5
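
When reproducing this tutorial, it can help to compare locally installed package versions with those in the listing above. The following is a minimal sketch using only base R; `expected` and `check_versions` are hypothetical names introduced for illustration and are not part of LandSCENT. The version strings are copied from the session listing.

```r
# Versions copied from the session listing above (a small, illustrative subset).
expected <- c(LandSCENT = "0.99.3", scater = "1.12.1",
              monocle = "2.12.0", igraph = "1.2.4.1")

# Hypothetical helper: compare installed versions against the expected ones.
check_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    # system.file() returns "" when the package is not installed.
    if (nzchar(system.file(package = pkg))) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))

  # compareVersion() returns 0 when the two version strings are equal.
  same <- function(i, e) !is.na(i) && utils::compareVersion(i, e) == 0

  data.frame(package   = names(expected),
             expected  = unname(expected),
             installed = unname(installed),
             match     = unname(mapply(same, installed, expected)),
             stringsAsFactors = FALSE)
}

print(check_versions(expected))
```

A mismatch does not necessarily mean the tutorial will fail; it only flags where the local environment differs from the one used to build this document.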

References

Chu, Li-Fang, Ning Leng, Jue Zhang, Zhonggang Hou, Daniel Mamott, David T. Vereide, Jeea Choi, Christina Kendziorski, Ron Stewart, and James A. Thomson. 2016. “Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm.” Genome Biology 17 (1): 173.

Nguyen, Quy H., Nicholas Pervolarakis, Kerrigan Blake, Dennis Ma, Ryan Tevia Davis, James Nathan, Anh T. Phung, et al. 2018. “Profiling human breast epithelial cells using single cell RNA sequencing identifies cell diversity.” Nature Communications 9 (1): 2028.

Teschendorff, Andrew E., and Tariq Enver. 2017. “Single-cell entropy for accurate estimation of differentiation potency from a cell’s transcriptome.” Nature Communications 8 (1): 15599.

Teschendorff, Andrew E., Peter Sollich, and Reimer Kuehn. 2014. “Signalling entropy: A novel network-theoretical framework for systems analysis and interpretation of functional omic data.” Methods 67 (3): 282.

A Tutorial for `LandSCENT`: Landscape Single Cell Entropy R package

2019-05-23

Package

Contents

1 Introduction

2 User defined functional gene network

3 Single cell RNA-seq data

3.1 Quality control

3.2 Normalization

3.3 Check gene identifier

4 How to use `LandSCENT` package

4.1 Differentiation potency estimation

4.2 Infer the potency states in a cell population

4.3 Infer potency-coexpression clusters (landmarks)

4.4 Application on bulk samples

4.5 Density based visualization tool

4.5.1 `Plot_LandSR` function

4.5.2 `Plot_CellSR` function

4.6 A “pipeline” function for easy use

4.7 Employ diffusion maps to infer differentiation trajectory

4.7.1 `DoDiffusionMap` function

4.7.2 `Plot_DiffusionMap` function

4.8 Extract objects from function results

5 Session information

References
