Spatially-dependent normalisation for spatial transcriptomics data
Source:R/mainSpaNorm.R
Performs normalisation of spatial transcriptomics data using spatially-dependent spot- and gene-specific size factors.
Usage
SpaNorm(
spe,
sample.p = 0.25,
gene.model = c("nb"),
adj.method = c("auto", "logpac", "pearson", "medbio", "meanbio"),
scale.factor = 1,
df.tps = 6,
lambda.a = 1e-04,
batch = NULL,
tol = 1e-04,
step.factor = 0.5,
maxit.nb = 50,
maxit.psi = 25,
maxn.psi = 500,
overwrite = FALSE,
verbose = TRUE,
...
)
# S4 method for class 'SpatialExperiment'
SpaNorm(
spe,
sample.p = 0.25,
gene.model = c("nb"),
adj.method = c("auto", "logpac", "pearson", "medbio", "meanbio"),
scale.factor = 1,
df.tps = 6,
lambda.a = 1e-04,
batch = NULL,
tol = 1e-04,
step.factor = 0.5,
maxit.nb = 50,
maxit.psi = 25,
maxn.psi = 500,
overwrite = FALSE,
verbose = TRUE,
...
)
Arguments
- spe
a SpatialExperiment or Seurat object, with the count data stored in 'counts' or 'data' assays respectively.
- sample.p
a numeric, specifying the (maximum) proportion of cells/spots to sample for model fitting (default is 0.25).
- gene.model
a character, specifying the model to use for gene/protein abundances (default 'nb'). This should be 'nb' for count based datasets.
- adj.method
a character, specifying the method to use to adjust the data (default 'auto', see details)
- scale.factor
a numeric, specifying the sample-specific scaling factor used to scale the adjusted counts (default is 1).
- df.tps
a numeric, specifying the degrees of freedom for the thin-plate spline (default is 6).
- lambda.a
a numeric, specifying the smoothing parameter for regularizing regression coefficients (default is 0.0001). Actual lambda.a used is lambda.a * ncol(spe).
- batch
a vector or numeric matrix, specifying the batch design to regress out (default NULL, representing no batch effects). See details for more information on how to define this variable.
- tol
a numeric, specifying the tolerance for convergence (default is 1e-4).
- step.factor
a numeric, specifying the multiplicative factor to decrease IRLS step by when log-likelihood diverges (default is 0.5).
- maxit.nb
a numeric, specifying the maximum number of IRLS iterations for estimating NB mean parameters for a given dispersion parameter (default is 50).
- maxit.psi
a numeric, specifying the maximum number of IRLS iterations to estimate the dispersion parameter (default is 25).
- maxn.psi
a numeric, specifying the maximum number of cells/spots to sample for dispersion estimation (default is 500).
- overwrite
a logical, specifying whether to force recomputation and overwrite an existing fit (default FALSE). Note that if df.tps, batch, lambda.a, or gene.model are changed, the model is recomputed and overwritten.
- verbose
a logical, specifying whether to show update messages (default TRUE).
- ...
other fitting parameters.
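To make the scaling of sample.p and lambda.a concrete, the following is a small arithmetic sketch using the numbers from the example on this page (4015 spots, sample.p = 0.05, default lambda.a). The variable names are illustrative only, and rounding the sampled count up is an assumption that happens to agree with the "201 cells/spots sampled" message in the example output; the package may choose the sample differently.

```r
# Illustrative arithmetic only; these variables are not part of the SpaNorm API.
n_spots  <- 4015   # ncol(spe) for HumanDLPFC, as printed in the example output
sample_p <- 0.05   # value of sample.p used in the example
lambda_a <- 1e-04  # default value of lambda.a

# Assumption: the (maximum) number of sampled spots is rounded up.
n_sampled <- ceiling(sample_p * n_spots)

# Per the lambda.a description, the penalty actually used is lambda.a * ncol(spe).
lambda_effective <- lambda_a * n_spots

n_sampled
lambda_effective
```

Because the effective penalty grows with the number of spots, the same lambda.a value implies stronger absolute regularisation on larger datasets.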
Value
a SpatialExperiment or Seurat object with the adjusted data stored in 'logcounts' or 'data', respectively.
Details
SpaNorm works by first fitting a spatial regression model for library size to the data. Normalised data can then be computed using various adjustment approaches. When a negative binomial gene-model is used, the data can be adjusted using the following approaches: 'logpac', 'pearson', 'medbio', and 'meanbio'.
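As a hedged sketch of selecting one of these adjustments explicitly, the call below reuses the coarse fitting settings from the Examples section and requests Pearson residuals through adj.method. It only runs when the SpaNorm package is installed, and reading the result with SingleCellExperiment::logcounts() assumes the input is a SpatialExperiment, as described under Value.

```r
# Sketch only: guarded so that it is skipped when SpaNorm is not installed.
if (requireNamespace("SpaNorm", quietly = TRUE)) {
  library(SpaNorm)
  data(HumanDLPFC)

  # Coarse settings copied from the Examples section to keep the fit fast;
  # adj.method = "pearson" is one of the values listed in Usage.
  spe_pearson <- SpaNorm(
    HumanDLPFC,
    sample.p = 0.05, df.tps = 2, tol = 1e-2,
    adj.method = "pearson",
    verbose = FALSE
  )

  # For a SpatialExperiment, the adjusted data are stored in 'logcounts'.
  norm_pearson <- SingleCellExperiment::logcounts(spe_pearson)
  print(dim(norm_pearson))
}
```

Swapping "pearson" for "logpac", "medbio" or "meanbio" should select the other adjustments named above.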
Batch effects can be specified using the batch parameter. If this parameter is a vector, a design matrix is created within the function using model.matrix. A custom design supplied as a numeric matrix should ideally also be created using model.matrix. The batch matrix should be created with an intercept term; the SpaNorm function automatically detects the intercept column and removes it. Alternatively, users can subset the model matrix to remove this column manually. In either case, the model formula should include the intercept term, and the intercept column should only be removed afterwards.
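The batch design described above can be sketched in base R as follows. The batch labels are hypothetical, and the commented-out SpaNorm calls only indicate where the objects would be passed; they are not run here.

```r
# Hypothetical batch labels for six spots taken from three slides.
batch_labels <- factor(c("slideA", "slideA", "slideB", "slideB", "slideC", "slideC"))

# Option 1: pass the vector and let SpaNorm build the design internally.
# SpaNorm(spe, batch = batch_labels)   # not run

# Option 2: build the design yourself, keeping the intercept in the formula.
design <- model.matrix(~ batch_labels)
colnames(design)

# Optionally drop the intercept column manually, after the matrix is built.
# drop = FALSE keeps the result a matrix even if one column remains.
design_no_int <- design[, colnames(design) != "(Intercept)", drop = FALSE]
colnames(design_no_int)

# SpaNorm(spe, batch = design)          # not run; intercept detected and removed
# SpaNorm(spe, batch = design_no_int)   # not run; intercept already removed
```

Building the matrix with the intercept and removing it afterwards keeps the first level (here slideA) as the reference, whereas a formula such as ~ 0 + batch_labels would produce one indicator column per level instead.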
Examples
data(HumanDLPFC)
# \donttest{
SpaNorm(HumanDLPFC, sample.p = 0.05, df.tps = 2, tol = 1e-2)
#> (1/2) Fitting SpaNorm model
#> 201 cells/spots sampled to fit model
#> iter: 1, estimating gene-wise dispersion
#> iter: 1, log-likelihood: -1131632.440359
#> iter: 1, fitting NB model
#> iter: 1, iter: 1, log-likelihood: -1131632.440359
#> iter: 1, iter: 2, log-likelihood: -813184.377314
#> iter: 1, iter: 3, log-likelihood: -726819.709782
#> iter: 1, iter: 4, log-likelihood: -708014.945154
#> iter: 1, iter: 5, log-likelihood: -704554.727696
#> iter: 1, iter: 6, log-likelihood: -703928.449409
#> iter: 1, iter: 7, log-likelihood: -703796.923679
#> iter: 1, iter: 8, log-likelihood: -703757.707708 (converged)
#> iter: 2, estimating gene-wise dispersion
#> iter: 2, log-likelihood: -703309.607508
#> iter: 2, fitting NB model
#> iter: 2, iter: 1, log-likelihood: -703309.607508
#> iter: 2, iter: 2, log-likelihood: -703040.415059
#> iter: 2, iter: 3, log-likelihood: -703023.823163 (converged)
#> iter: 3, log-likelihood: -703023.823163 (converged)
#> (2/2) Normalising data
#> class: SpatialExperiment
#> dim: 5076 4015
#> metadata(1): SpaNorm
#> assays(2): counts logcounts
#> rownames(5076): ENSG00000188976 ENSG00000188290 ... ENSG00000198727
#> ENSG00000278817
#> rowData names(2): gene_name gene_biotype
#> colnames(4015): AAACAAGTATCTCCCA-1 AAACACCAATAACTGC-1 ...
#> TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
#> colData names(3): cell_count sample_id AnnotatedCluster
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
# }