This function runs a permutation framework to compute a p-value for the correlation between the vectorised genes and each cluster.
Usage
compute_permp(
data,
cluster_info,
perm.size,
bin_type,
bin_param,
all_genes,
correlation_method = "pearson",
n.cores = 1,
correction_method = "BH",
w_x,
w_y
)
Arguments
- data
A list of matrices containing the coordinates of transcripts.
- cluster_info
A data frame or matrix containing the centroid coordinates and cluster label for each cell. The column names should include "x" (x coordinate), "y" (y coordinate), and "cluster" (cluster label).
- perm.size
A positive number specifying the number of permutations to run.
- bin_type
A string indicating which bin shape is to be used for vectorization. One of "square" (default), "rectangle", or "hexagon".
- bin_param
A numeric vector indicating the size of the bin. If bin_type is "square" or "rectangle", this will be a vector of length two giving the numbers of rectangular quadrats in the x and y directions. If bin_type is "hexagon", this will be a number giving the side length of hexagons. Positive numbers only.
- all_genes
A vector of strings giving the names of the genes you want to test correlation for.
gene_mt.
- correlation_method
A parameter passed to cor indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman"; can be abbreviated.
- n.cores
A positive number specifying the number of cores used for parallelizing permutation testing. Default is one core (sequential processing).
- correction_method
A character string passed to p.adjust specifying the correction method for multiple testing.
- w_x
A numeric vector of length two specifying the x coordinate limits of the enclosing box.
- w_y
A numeric vector of length two specifying the y coordinate limits of the enclosing box.
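To make the bin_type, bin_param, w_x and w_y arguments more concrete, here is a minimal base R sketch of how points could be counted in a square grid over the enclosing box. It is an illustration only and not the package's implementation; the helper name count_in_grid and the toy coordinates are hypothetical.

```r
# Hypothetical helper: count points in a grid of bin_param[1] x bin_param[2]
# bins spanning the enclosing box given by w_x and w_y.
count_in_grid <- function(x, y, w_x, w_y, bin_param) {
  # Bin edges along each axis; bin_param gives the number of bins in x and y.
  x_breaks <- seq(w_x[1], w_x[2], length.out = bin_param[1] + 1)
  y_breaks <- seq(w_y[1], w_y[2], length.out = bin_param[2] + 1)
  # Assign each point to a bin; include.lowest keeps points on the lower edge.
  x_bin <- cut(x, breaks = x_breaks, include.lowest = TRUE, labels = FALSE)
  y_bin <- cut(y, breaks = y_breaks, include.lowest = TRUE, labels = FALSE)
  # Tabulate over all bins, including empty ones, and flatten to a vector.
  counts <- table(factor(x_bin, levels = seq_len(bin_param[1])),
                  factor(y_bin, levels = seq_len(bin_param[2])))
  as.vector(counts)
}

set.seed(1)
# Toy coordinates: two groups of 50 points each.
x <- c(rnorm(50, mean = 20, sd = 5), rnorm(50, mean = 100, sd = 5))
y <- c(rnorm(50, mean = 20, sd = 5), rnorm(50, mean = 100, sd = 5))
# Enclosing box, built the same way as in the Examples section.
w_x <- c(floor(min(x)), ceiling(max(x)))
w_y <- c(floor(min(y)), ceiling(max(y)))
gene_vec <- count_in_grid(x, y, w_x, w_y, bin_param = c(2, 2))
length(gene_vec)  # one count per bin
sum(gene_vec)     # every point falls in exactly one bin
```

Because the box limits are rounded outwards with floor and ceiling, no point can fall outside the grid, which is why the counts add up to the number of points.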
Value
A named list with the following components:
- obs.stat
A matrix containing the observed statistic for every gene and every cluster. Each row refers to a gene, and each column refers to a cluster.
- perm.arrays
A three-dimensional array. The first two dimensions represent the correlation between the genes and permuted clusters. The third dimension refers to the different permutation runs.
- perm.pval
A matrix containing the raw permutation p-values. Each row refers to a gene, and each column refers to a cluster.
- perm.pval.adj
A matrix containing the adjusted permutation p-values. Each row refers to a gene, and each column refers to a cluster.
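As a rough illustration of how perm.pval and perm.pval.adj relate, the sketch below applies p.adjust with the "BH" method to a made-up matrix of raw p-values and then lists the genes that pass a 0.05 threshold in each cluster. Adjusting within each cluster (column) is an assumption made for this example; this page does not state whether the correction is applied per cluster or across all gene-cluster pairs.

```r
# Made-up raw p-values: rows are genes, columns are clusters.
perm_pval <- matrix(c(0.01, 0.02, 0.60, 0.80,
                      0.70, 0.90, 0.01, 0.03),
                    nrow = 4,
                    dimnames = list(c("gene_A1", "gene_A2", "gene_B1", "gene_B2"),
                                    c("A", "B")))

# Assumption: adjust within each cluster column using Benjamini-Hochberg.
perm_pval_adj <- apply(perm_pval, 2, p.adjust, method = "BH")
dimnames(perm_pval_adj) <- dimnames(perm_pval)

# Genes whose adjusted p-value is below 0.05, listed per cluster.
sig_genes <- lapply(colnames(perm_pval_adj), function(cl) {
  rownames(perm_pval_adj)[perm_pval_adj[, cl] < 0.05]
})
names(sig_genes) <- colnames(perm_pval_adj)
sig_genes
```

The same filtering step could be applied to the perm.pval.adj component of a real result, since it has the same genes-by-clusters layout.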
Details
To get a permutation p-value for the correlation between a gene
and a cluster, this function randomly permutes the cluster label of
each cell and calculates the correlation between the genes and the
permuted clusters. This process is repeated perm.size times, and the
permutation p-value is calculated as the probability of a permuted
correlation being larger than the observed correlation.
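The procedure described above can be sketched in base R for a single gene and a single cluster. This is a simplified illustration, not the package's implementation: the toy data, the 4 x 4 grid, the helpers bin_id and cluster_vec, and the use of a plain proportion as the p-value estimate are all assumptions made for the example.

```r
set.seed(100)
perm_size <- 200  # plays the role of perm.size

# Toy data: 200 cells in two spatially separated clusters.
cells <- data.frame(
  x = c(rnorm(100, mean = 20, sd = 5), rnorm(100, mean = 100, sd = 5)),
  y = c(rnorm(100, mean = 20, sd = 5), rnorm(100, mean = 100, sd = 5)),
  cluster = rep(c("A", "B"), each = 100)
)
# Toy transcripts for one gene located where cluster A sits.
gene <- data.frame(x = rnorm(100, mean = 20, sd = 5),
                   y = rnorm(100, mean = 20, sd = 5))

# Hypothetical helper: square-bin index (4 x 4 grid) for each point.
n_bins <- 4
rng <- range(c(cells$x, cells$y, gene$x, gene$y))
breaks <- seq(rng[1], rng[2], length.out = n_bins + 1)
bin_id <- function(x, y) {
  bx <- cut(x, breaks, include.lowest = TRUE, labels = FALSE)
  by <- cut(y, breaks, include.lowest = TRUE, labels = FALSE)
  factor((by - 1) * n_bins + bx, levels = seq_len(n_bins^2))
}

# Vectorise the gene and the cells labelled "A" as counts per bin.
gene_vec <- as.vector(table(bin_id(gene$x, gene$y)))
cluster_vec <- function(labels) {
  keep <- labels == "A"
  as.vector(table(bin_id(cells$x[keep], cells$y[keep])))
}

# Observed correlation, then correlations under permuted cluster labels.
obs_stat <- cor(gene_vec, cluster_vec(cells$cluster), method = "pearson")
perm_stat <- replicate(perm_size,
                       cor(gene_vec, cluster_vec(sample(cells$cluster)),
                           method = "pearson"))

# Proportion of permuted correlations larger than the observed one.
perm_pval <- mean(perm_stat > obs_stat)
perm_pval
```

Shuffling the labels with sample keeps the cell positions and cluster sizes fixed, so only the link between location and label is broken under the null.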
Examples
set.seed(100)
# simulate coordinates for clusters
df_clA = data.frame(x = rnorm(n=100, mean=20, sd=5),
                    y = rnorm(n=100, mean=20, sd=5), cluster="A")
df_clB = data.frame(x = rnorm(n=100, mean=100, sd=5),
                    y = rnorm(n=100, mean=100, sd=5), cluster="B")
clusters = rbind(df_clA, df_clB)
clusters$sample="rep1"
# simulate coordinates for genes
trans_info = data.frame(rbind(cbind(x = rnorm(n=100, mean=20, sd=5),
                                    y = rnorm(n=100, mean=20, sd=5),
                                    feature_name="gene_A1"),
                              cbind(x = rnorm(n=100, mean=20, sd=5),
                                    y = rnorm(n=100, mean=20, sd=5),
                                    feature_name="gene_A2"),
                              cbind(x = rnorm(n=100, mean=100, sd=5),
                                    y = rnorm(n=100, mean=100, sd=5),
                                    feature_name="gene_B1"),
                              cbind(x = rnorm(n=100, mean=100, sd=5),
                                    y = rnorm(n=100, mean=100, sd=5),
                                    feature_name="gene_B2")))
# cbind with a character column stores x and y as character, so convert back
trans_info$x=as.numeric(trans_info$x)
trans_info$y=as.numeric(trans_info$y)
# enclosing box covering both the transcripts and the cell centroids
w_x = c(min(floor(min(trans_info$x)),
            floor(min(clusters$x))),
        max(ceiling(max(trans_info$x)),
            ceiling(max(clusters$x))))
w_y = c(min(floor(min(trans_info$y)),
            floor(min(clusters$y))),
        max(ceiling(max(trans_info$y)),
            ceiling(max(clusters$y))))
rep1 = list(trans_info = trans_info)
# run the permutation test on a 2 x 2 square grid
perm_res_lst = compute_permp(data=rep1,
                             cluster_info=clusters,
                             perm.size=100,
                             bin_type="square",
                             bin_param=c(2,2),
                             all_genes=unique(trans_info$feature_name),
                             correlation_method = "pearson",
                             n.cores=2,
                             correction_method="BH",
                             w_x=w_x ,
                             w_y=w_y)
#> Correlation Method = pearson
#> Running 100 permutation with 2 cores in parallel
perm_pvalue = perm_res_lst$perm.pval.adj