WAD.processing
- WAD.processing.merge_by_celltype(sample_names: list, celltype_map: dict, chrom_sizes: dict, temp_dir: str = 'temp', threshold_fraction: float = 0.25)
Merge denoised signals across samples for each cell type, chromosome by chromosome.
- Parameters:
sample_names (list of str) – List of sample names.
celltype_map (dict) – Mapping from sample name to cell type.
chrom_sizes (dict) – Chromosome sizes dictionary.
temp_dir (str, optional) – Directory containing parquet files (default is “temp”).
threshold_fraction (float, optional) – Minimum fraction of samples with non-zero signal required to keep a peak (default is 0.25).
- Returns:
The function writes a merged and filtered denoised parquet file for each cell type and chromosome to temp_dir. It also estimates scATAC cell-type proportions from the summed denoised signal and returns them.
- Return type:
dict
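The filtering rule and the proportion estimate described above can be illustrated with a minimal sketch. It is not WAD's implementation: the helper names `filter_by_sample_fraction` and `proportions_from_signal_sum` are hypothetical, the comparison is assumed to be inclusive (`>=`), and the proportions are assumed to be each cell type's signal sum divided by the total.

```python
def filter_by_sample_fraction(signals, threshold_fraction=0.25):
    """Return indices of bins kept by the non-zero fraction rule.

    signals: dict mapping sample name -> list of per-bin values (equal lengths).
    Assumption: a bin is kept when the fraction of samples with a non-zero
    value is at least threshold_fraction.
    """
    samples = list(signals.values())
    n_samples = len(samples)
    n_bins = len(samples[0])
    kept = []
    for i in range(n_bins):
        nonzero = sum(1 for s in samples if s[i] != 0)
        if nonzero / n_samples >= threshold_fraction:
            kept.append(i)
    return kept


def proportions_from_signal_sum(celltype_signals):
    """Assumed estimate: per-cell-type signal sum divided by the grand total."""
    totals = {ct: sum(sig) for ct, sig in celltype_signals.items()}
    grand = sum(totals.values())
    if grand == 0:
        return {ct: 0.0 for ct in totals}
    return {ct: total / grand for ct, total in totals.items()}


# Four hypothetical samples of one cell type, four bins each.
signals = {
    "s1": [0, 1, 0, 2],
    "s2": [0, 0, 0, 3],
    "s3": [0, 0, 1, 1],
    "s4": [0, 0, 0, 0],
}
kept_default = filter_by_sample_fraction(signals)        # threshold 0.25
kept_strict = filter_by_sample_fraction(signals, 0.5)    # threshold 0.5
```

With the default threshold, any bin where at least one of the four samples is non-zero survives; raising the threshold to 0.5 keeps only the bin supported by three samples.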
- WAD.processing.process_all_samples(scatac: list[str], cell_types: list[str], chrom_sizes: dict, temp_dir: str = 'temp', step: int = 50, wavelet_name: str = 'db4', level: int = 1)
Process all bigWig files into denoised chromosome-level parquet files.
- Parameters:
scatac (list of str) – Paths to single-cell ATAC bigWig files.
cell_types (list of str) – Corresponding cell types for each bigWig file.
chrom_sizes (dict) – Chromosome sizes dictionary.
temp_dir (str, optional) – Directory to store intermediate parquet files (default is “temp”).
step (int, optional) – Window size for bigWig signal extraction (default is 50).
wavelet_name (str, optional) – Wavelet type for DWT denoising (default is “db4”).
level (int, optional) – Decomposition level for DWT (default is 1).
- Returns:
The function writes denoised chromosome-level parquet files to temp_dir.
- Return type:
None
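The sketch below shows how the arguments relate to each other: `scatac` and `cell_types` are parallel lists, and each file is paired with every chromosome in `chrom_sizes`. The helper `build_tasks` is hypothetical, and the assumption that the work is split into one task per file and chromosome (matching the arguments of `process_one_chrom`) is not confirmed by this reference.

```python
def build_tasks(scatac, cell_types, chrom_sizes):
    """Pair each bigWig path with its cell type and every chromosome.

    Returns a list of (bigwig_path, cell_type, chrom, size) tuples.
    """
    if len(scatac) != len(cell_types):
        raise ValueError("scatac and cell_types must have the same length")
    return [
        (path, cell_type, chrom, size)
        for path, cell_type in zip(scatac, cell_types)
        for chrom, size in chrom_sizes.items()
    ]


# Hypothetical inputs; the file names are placeholders.
tasks = build_tasks(
    scatac=["sample_a.bw", "sample_b.bw"],
    cell_types=["T_cell", "B_cell"],
    chrom_sizes={"chr1": 1000, "chr2": 500},
)
```

Checking the list lengths up front catches a mismatched `cell_types` list before any file is read.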
- WAD.processing.process_one_chrom(bigwig_path: str, cell_type: str, chrom: str, size: int, temp_dir: str = 'temp', step: int = 50, wavelet_name: str = 'db4', level: int = 1)
Process a single chromosome for one bigWig file: extract signal, denoise and save as parquet.
- Parameters:
bigwig_path (str) – Path to the bigWig file.
cell_type (str) – Cell type label.
chrom (str) – Chromosome name.
size (int) – Chromosome size.
temp_dir (str, optional) – Directory to store intermediate parquet files (default is “temp”).
step (int, optional) – Window size for signal extraction (default is 50).
wavelet_name (str, optional) – Wavelet type for DWT (default is “db4”).
level (int, optional) – Decomposition level for DWT (default is 1).
- Returns:
The function writes the denoised chromosome-level signal to a parquet file in temp_dir.
- Return type:
None
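To make the denoising step concrete, here is a dependency-free sketch of single-level DWT denoising. It deliberately swaps the default `db4` wavelet for the simpler Haar wavelet so that the transform fits in a few lines. The soft-thresholding rule and the `threshold` argument are assumptions, since this reference does not state how WAD shrinks the detail coefficients. In practice a library such as PyWavelets (`pywt.wavedec` / `pywt.waverec`) would typically handle `db4`.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT; odd-length input is padded by repeating the last value."""
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = math.sqrt(2)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t (assumed thresholding rule)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Decompose, shrink the detail coefficients, reconstruct, trim any padding."""
    approx, detail = haar_dwt(signal)
    rec = haar_idwt(approx, soft_threshold(detail, threshold))
    return rec[: len(signal)]


binned = [1.0, 3.0, 5.0, 5.0, 0.0]   # hypothetical per-window signal
smoothed = denoise(binned, threshold=10.0)
```

A threshold of zero reproduces the input up to floating-point error, while a threshold larger than every detail coefficient replaces each pair of neighbouring windows with its mean.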
- WAD.processing.process_rocco(bigwig_files: list[str], chrom_sizes_file: str, chrom_sizes_dict: dict, temp_dir: str, step: int = 50, budget: float = 0.03, gamma: float = 1.0, c_1: float = 1.0, c_2: float = -1.0, c_3: float = 1.0)
Call peaks using Rocco on all cell-type specific bigWig files in parallel.
- Parameters:
bigwig_files (list of str) – Paths to cell-type specific bigWig files.
chrom_sizes_file (str) – Path to chromosome sizes file.
chrom_sizes_dict (dict) – Chromosome sizes dictionary.
temp_dir (str) – Directory to store output BED files.
step (int, optional) – Window size for bigWig signal extraction (default is 50).
budget (float, optional) – Sparsity budget for Rocco (default is 0.03).
gamma (float, optional) – Weight parameter for Rocco optimization (default is 1.0).
c_1 (float, optional) – Linear term weight for central tendency (default is 1.0).
c_2 (float, optional) – Linear term weight for dispersion (default is -1.0).
c_3 (float, optional) – Linear term weight for boundary (default is 1.0).
- Returns:
The function writes BED peak files for each chromosome to temp_dir and merges them into a single BED file.
- Return type:
None
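Because `process_rocco` takes both a chromosome sizes file and a chromosome sizes dictionary, one way to keep the two consistent is to derive the dictionary from the file. The helper `read_chrom_sizes` below is hypothetical and not part of WAD. It assumes the common two-column layout (chromosome name, then size, separated by whitespace).

```python
import os
import tempfile


def read_chrom_sizes(path):
    """Parse a two-column chromosome sizes file into {chrom: size}."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            chrom, size = line.split()[:2]
            sizes[chrom] = int(size)
    return sizes


# Write a small example file and read it back (values are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "example.chrom.sizes")
    with open(example_path, "w") as handle:
        handle.write("chr1\t248956422\nchr2\t242193529\n")
    chrom_sizes_dict = read_chrom_sizes(example_path)
```

The resulting dictionary could then be passed as `chrom_sizes_dict` alongside the same path as `chrom_sizes_file`.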