Calculate a p-value for correlation with permutation.

This function runs a permutation framework to compute a p-value for the correlation between each vectorised gene and each vectorised cluster.

Usage

compute_permp(
  data,
  cluster_info,
  perm.size,
  bin_type,
  bin_param,
  all_genes,
  correlation_method = "pearson",
  n.cores = 1,
  correction_method = "BH",
  w_x,
  w_y
)

Arguments

data: A list of matrices containing the coordinates of transcripts.
cluster_info: A dataframe/matrix containing the centroid coordinates and cluster label for each cell. The column names should include "x" (x coordinate), "y" (y coordinate), and "cluster" (cluster label).
perm.size: A positive number specifying the number of permutations to run.
bin_type: A string indicating which bin shape is to be used for vectorization. One of "square", "rectangle", or "hexagon".
bin_param: A numeric vector indicating the size of the bin. If bin_type is "square" or "rectangle", this is a vector of length two giving the numbers of rectangular quadrats in the x and y directions. If bin_type is "hexagon", this is a single number giving the side length of the hexagons. Positive numbers only.
all_genes: A vector of strings giving the names of the genes to test for correlation.
correlation_method: A parameter passed to cor, indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman"; can be abbreviated.
n.cores: A positive number specifying the number of cores used to parallelize the permutation testing. Default is one core (sequential processing).
correction_method: A character string passed to p.adjust, specifying the correction method for multiple testing.
w_x: A numeric vector of length two specifying the x coordinate limits of the enclosing box.
w_y: A numeric vector of length two specifying the y coordinate limits of the enclosing box.
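
The correlation_method and correction_method arguments are handed to the base R functions cor and p.adjust. The sketch below calls those two functions directly on small made-up vectors (x, y and raw_p are illustrative and not part of the package), to show which values they accept. How compute_permp groups the p-values before adjusting them is not stated here, so the adjustment over a single vector is only an assumption for illustration.

```r
# Illustrative vectors only; these are not produced by compute_permp.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# cor() accepts "pearson", "kendall" or "spearman", and also
# unambiguous abbreviations of those names.
rho_full  <- cor(x, y, method = "spearman")
rho_abbrv <- cor(x, y, method = "s")

# p.adjust() accepts any of the method names listed in p.adjust.methods.
raw_p <- c(0.01, 0.04, 0.03, 0.20)
adj_p <- p.adjust(raw_p, method = "BH")

print(p.adjust.methods)
```

Any of the names printed by p.adjust.methods should be usable as correction_method, since the string is passed through to p.adjust.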

Value

A named list with the following components:

obs.stat: A matrix containing the observed statistic for every gene and every cluster. Each row refers to a gene, and each column refers to a cluster.
perm.arrays: A three-dimensional array. The first two dimensions represent the correlation between the genes and the permuted clusters. The third dimension refers to the different permutation runs.
perm.pval: A matrix containing the raw permutation p-values. Each row refers to a gene, and each column refers to a cluster.
perm.pval.adj: A matrix containing the adjusted permutation p-values. Each row refers to a gene, and each column refers to a cluster.
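
As a rough illustration of how the gene-by-cluster matrices might be used, the sketch below builds a hypothetical stand-in for the returned list and pulls out the gene and cluster pairs whose adjusted p-value falls below a threshold. The numbers are placeholders and not real results. The 0.05 cutoff, and the assumption that the matrices carry gene and cluster names as row and column names, are choices made for this example only.

```r
# Hypothetical stand-in for the list returned by compute_permp().
# All values are placeholders, not output from the package.
genes    <- c("gene_A1", "gene_B1")
clusters <- c("A", "B")
res <- list(
  obs.stat      = matrix(c(0.9, -0.2, -0.3, 0.8), nrow = 2,
                         dimnames = list(genes, clusters)),
  perm.pval.adj = matrix(c(0.01, 0.60, 0.70, 0.02), nrow = 2,
                         dimnames = list(genes, clusters))
)

# Row and column positions of the entries below the (assumed) 0.05 cutoff.
sig <- which(res$perm.pval.adj < 0.05, arr.ind = TRUE)

# One row per selected gene and cluster pair.
sig_pairs <- data.frame(
  gene     = rownames(res$perm.pval.adj)[sig[, "row"]],
  cluster  = colnames(res$perm.pval.adj)[sig[, "col"]],
  obs_stat = res$obs.stat[sig],
  p_adj    = res$perm.pval.adj[sig]
)
print(sig_pairs)
```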

Details

To obtain a permutation p-value for the correlation between a gene and a cluster, this function randomly permutes the cluster labels of the cells and calculates the correlation between each gene and each permuted cluster. This process is repeated perm.size times, and the permutation p-value is calculated as the proportion of permuted correlations that are larger than the observed correlation.
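
The procedure described above can be sketched in base R for a single gene and a single cluster. This is a minimal illustration and not the package's implementation: the names cell_bin, cell_cluster, gene_vec and cluster_vec are hypothetical, the data are simulated, and the cluster is assumed to be vectorised by counting its cells in each bin.

```r
set.seed(1)

n_bins  <- 16
n_cells <- 200

# Hypothetical data: the bin each cell falls in, and its cluster label.
cell_bin     <- sample(seq_len(n_bins), n_cells, replace = TRUE)
cell_cluster <- ifelse(cell_bin <= n_bins / 2, "A", "B")

# Hypothetical gene vector: transcript counts per bin, higher in the
# bins where cluster A cells are located.
gene_vec <- rpois(n_bins, lambda = ifelse(seq_len(n_bins) <= n_bins / 2, 20, 2))

# Count the cells carrying label `target` in every bin (assumed vectorisation).
cluster_vec <- function(labels, target) {
  tabulate(cell_bin[labels == target], nbins = n_bins)
}

# Observed correlation between the gene vector and the cluster A vector.
obs_stat <- cor(gene_vec, cluster_vec(cell_cluster, "A"), method = "pearson")

# Shuffle the cluster labels across cells and recompute the correlation.
perm_size <- 200
perm_stat <- replicate(perm_size, {
  shuffled <- sample(cell_cluster)
  cor(gene_vec, cluster_vec(shuffled, "A"), method = "pearson")
})

# Proportion of permuted correlations larger than the observed one.
perm_pval <- mean(perm_stat > obs_stat)
```

In this sketch the cells keep their positions and only the labels move, so the permuted cluster vectors describe what the correlation would look like if cluster membership were unrelated to location.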

Examples


set.seed(100)
# simulate coordinates for clusters
df_clA = data.frame(x = rnorm(n=100, mean=20, sd=5),
                   y = rnorm(n=100, mean=20, sd=5), cluster="A")
df_clB = data.frame(x = rnorm(n=100, mean=100, sd=5),
                  y = rnorm(n=100, mean=100, sd=5), cluster="B")
clusters = rbind(df_clA, df_clB)
clusters$sample="rep1"
# simulate coordinates for genes
trans_info = data.frame(rbind(cbind(x = rnorm(n=100, mean=20, sd=5),
                                    y = rnorm(n=100, mean=20, sd=5),
                                 feature_name="gene_A1"),
                           cbind(x = rnorm(n=100, mean=20, sd=5),
                                 y = rnorm(n=100, mean=20, sd=5),
                                 feature_name="gene_A2"),
                           cbind(x = rnorm(n=100, mean=100, sd=5),
                                 y = rnorm(n=100, mean=100, sd=5),
                                 feature_name="gene_B1"),
                           cbind(x = rnorm(n=100, mean=100, sd=5),
                                 y = rnorm(n=100, mean=100, sd=5),
                                 feature_name="gene_B2")))
trans_info$x=as.numeric(trans_info$x)
trans_info$y=as.numeric(trans_info$y)
# enclosing box that covers both the transcripts and the cell centroids
w_x = c(min(floor(min(trans_info$x)),
            floor(min(clusters$x))),
        max(ceiling(max(trans_info$x)),
            ceiling(max(clusters$x))))
w_y = c(min(floor(min(trans_info$y)),
            floor(min(clusters$y))),
        max(ceiling(max(trans_info$y)),
            ceiling(max(clusters$y))))
rep1 = list(trans_info = trans_info)
perm_res_lst = compute_permp(data=rep1,
                    cluster_info=clusters,
                    perm.size=100,
                    bin_type="square",
                    bin_param=c(2,2),
                    all_genes=unique(trans_info$feature_name),
                    correlation_method = "pearson",
                    n.cores=2,
                    correction_method="BH",
                    w_x=w_x ,
                    w_y=w_y)
#> Correlation Method = pearson
#> Running 100 permutation with 2 cores in parallel
perm_pvalue = perm_res_lst$perm.pval.adj