WAD.processing

WAD.processing.merge_by_celltype(sample_names: list, celltype_map: dict, chrom_sizes: dict, temp_dir: str = 'temp', threshold_fraction: float = 0.25)

Merge denoised signals across samples for each cell type, chromosome by chromosome.

Parameters:
  • sample_names (list of str) – List of sample names.

  • celltype_map (dict) – Mapping from sample name to cell type.

  • chrom_sizes (dict) – Chromosome sizes dictionary.

  • temp_dir (str, optional) – Directory containing parquet files (default is “temp”).

  • threshold_fraction (float, optional) – Minimum fraction of samples with non-zero signal required to keep a peak (default is 0.25).

Returns:

The function writes a merged, filtered, denoised parquet file for each cell type and chromosome to temp_dir, and estimates scATAC cell-type proportions from the summed denoised signal.

Return type:

dict
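
The effect of threshold_fraction can be illustrated with a minimal sketch. It assumes that a position is kept when the fraction of samples with non-zero signal is at least threshold_fraction (an inclusive cutoff). The helper name keep_positions and the in-memory list layout are hypothetical and not part of WAD, which reads its input from parquet files.

```python
def keep_positions(signals, threshold_fraction=0.25):
    """Return the indices of positions with enough non-zero samples.

    signals: list of per-sample signal lists, all of equal length.
    This layout is a hypothetical stand-in for the parquet data.
    """
    if not signals:
        return []
    n_samples = len(signals)
    n_positions = len(signals[0])
    kept = []
    for i in range(n_positions):
        # Count the samples whose signal at this position is non-zero.
        nonzero = sum(1 for sample in signals if sample[i] != 0)
        # Assumption: "minimum fraction" means an inclusive (>=) cutoff.
        if nonzero / n_samples >= threshold_fraction:
            kept.append(i)
    return kept


# Four samples of one cell type, four positions each (toy values).
signals = [
    [0.0, 1.2, 0.0, 0.4],
    [0.0, 0.9, 0.0, 0.0],
    [0.0, 1.1, 0.3, 0.0],
    [0.0, 0.8, 0.0, 0.0],
]
print(keep_positions(signals, threshold_fraction=0.5))
```

With the default of 0.25, a position supported by a single sample out of four would still pass under this reading; raising the value makes the filter stricter.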

WAD.processing.process_all_samples(scatac: list[str], cell_types: list[str], chrom_sizes: dict, temp_dir: str = 'temp', step: int = 50, wavelet_name: str = 'db4', level: int = 1)

Process all bigWig files into denoised chromosome-level parquet files.

Parameters:
  • scatac (list of str) – Paths to single-cell ATAC bigWig files.

  • cell_types (list of str) – Corresponding cell types for each bigWig file.

  • chrom_sizes (dict) – Chromosome sizes dictionary.

  • temp_dir (str, optional) – Directory to store intermediate parquet files (default is “temp”).

  • step (int, optional) – Window size for bigWig signal extraction (default is 50).

  • wavelet_name (str, optional) – Wavelet type for DWT denoising (default is “db4”).

  • level (int, optional) – Decomposition level for DWT (default is 1).

Returns:

The function writes denoised chromosome-level parquet files to temp_dir.

Return type:

None
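
The chrom_sizes argument is described only as a dictionary. The sketch below assumes it maps each chromosome name to its length in base pairs and shows one way to build such a dictionary from a two-column, whitespace-separated chromosome sizes file. The helper read_chrom_sizes is hypothetical and not part of WAD, and the chromosome names and lengths are toy values.

```python
import io


def read_chrom_sizes(handle):
    """Parse 'name<whitespace>length' lines into a {name: length} dict."""
    sizes = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        sizes[fields[0]] = int(fields[1])
    return sizes


# An in-memory stand-in for a chromosome sizes file (toy values).
example = io.StringIO("chrA\t1000\nchrB\t750\n\n# comment\n")
chrom_sizes = read_chrom_sizes(example)
print(chrom_sizes)
```

In practice the handle would come from open() on the sizes file for the genome build that the bigWig files were generated against.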

WAD.processing.process_one_chrom(bigwig_path: str, cell_type: str, chrom: str, size: int, temp_dir: str = 'temp', step: int = 50, wavelet_name: str = 'db4', level: int = 1)

Process a single chromosome for one bigWig file: extract the signal, denoise it, and save it as parquet.

Parameters:
  • bigwig_path (str) – Path to the bigWig file.

  • cell_type (str) – Cell type label.

  • chrom (str) – Chromosome name.

  • size (int) – Chromosome size.

  • temp_dir (str, optional) – Directory to store intermediate parquet files (default is “temp”).

  • step (int, optional) – Window size for signal extraction (default is 50).

  • wavelet_name (str, optional) – Wavelet type for DWT (default is “db4”).

  • level (int, optional) – Decomposition level for DWT (default is 1).

Returns:

The function writes the denoised chromosome-level signal to a parquet file in temp_dir.

Return type:

None
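
The denoising step can be pictured with a simplified sketch of single-level DWT denoising. To stay dependency-free it uses the Haar wavelet with soft thresholding rather than the documented default "db4", and the threshold is passed in by hand. Both choices, along with the helper name haar_denoise, are assumptions made for illustration and do not describe WAD's actual implementation.

```python
import math


def haar_denoise(signal, threshold):
    """One-level Haar DWT, soft-threshold the details, then invert."""
    values = list(signal)
    n = len(values)
    if n == 0:
        return []
    if n % 2:
        # Pad odd-length input by repeating the last value.
        values.append(values[-1])
    root2 = math.sqrt(2.0)
    out = []
    for a, b in zip(values[0::2], values[1::2]):
        approx = (a + b) / root2   # low-pass (approximation) coefficient
        detail = (a - b) / root2   # high-pass (detail) coefficient
        # Soft thresholding: shrink the detail coefficient toward zero.
        shrunk = math.copysign(max(abs(detail) - threshold, 0.0), detail)
        # Inverse Haar transform for this pair.
        out.append((approx + shrunk) / root2)
        out.append((approx - shrunk) / root2)
    # Drop the padding value, if any.
    return out[:n]


noisy = [1.0, 3.0, 2.0, 2.0]
print(haar_denoise(noisy, threshold=0.0))   # threshold 0 leaves the signal unchanged
print(haar_denoise(noisy, threshold=10.0))  # large threshold averages each pair
```

A threshold of zero reproduces the input up to floating-point rounding, while a threshold larger than every detail coefficient replaces each pair of windows with its mean.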

WAD.processing.process_rocco(bigwig_files: list[str], chrom_sizes_file: str, chrom_sizes_dict: dict, temp_dir: str, step: int = 50, budget: float = 0.03, gamma: float = 1.0, c_1: float = 1.0, c_2: float = -1.0, c_3: float = 1.0)

Call peaks using Rocco on all cell-type-specific bigWig files in parallel.

Parameters:
  • bigwig_files (list of str) – Paths to cell-type specific bigWig files.

  • chrom_sizes_file (str) – Path to chromosome sizes file.

  • chrom_sizes_dict (dict) – Chromosome sizes dictionary.

  • temp_dir (str) – Directory to store output BED files.

  • step (int, optional) – Window size for bigWig signal extraction (default is 50).

  • budget (float, optional) – Sparsity budget for Rocco (default is 0.03).

  • gamma (float, optional) – Weight parameter for Rocco optimization (default is 1.0).

  • c_1 (float, optional) – Linear term weight for central tendency (default is 1.0).

  • c_2 (float, optional) – Linear term weight for dispersion (default is -1.0).

  • c_3 (float, optional) – Linear term weight for boundary (default is 1.0).

Returns:

The function writes BED peak files for each chromosome to temp_dir and merges them into a single BED file.

Return type:

None
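
The final merge step can be sketched as follows. This is a minimal illustration that assumes the per-chromosome outputs are plain tab-separated BED files. The helper merge_bed_files, the file names, and the choice to sort by chromosome and start are hypothetical and may differ from what process_rocco does internally.

```python
import os
import tempfile


def merge_bed_files(bed_paths, out_path):
    """Concatenate BED files into one file sorted by chrom, start, end."""
    records = []
    for path in bed_paths:
        with open(path) as handle:
            for line in handle:
                # Skip BED header and comment lines.
                if line.startswith(("#", "track", "browser")):
                    continue
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 3:
                    continue  # not a valid BED record
                records.append((fields[0], int(fields[1]), int(fields[2]), fields[3:]))
    # Chromosome names sort lexicographically here.
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    with open(out_path, "w") as out:
        for chrom, start, end, rest in records:
            out.write("\t".join([chrom, str(start), str(end), *rest]) + "\n")
    return len(records)


# Demonstration with two toy per-chromosome files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    chr2_path = os.path.join(tmp, "chr2.bed")
    chr1_path = os.path.join(tmp, "chr1.bed")
    with open(chr2_path, "w") as f:
        f.write("chr2\t100\t200\n")
    with open(chr1_path, "w") as f:
        f.write("chr1\t300\t400\nchr1\t50\t150\n")
    merged_path = os.path.join(tmp, "merged.bed")
    n_records = merge_bed_files([chr2_path, chr1_path], merged_path)
    with open(merged_path) as f:
        merged_lines = f.read().splitlines()

print(n_records, merged_lines)
```

Sorting after concatenation keeps the merged file usable by tools that expect position-sorted BED input, regardless of the order in which the per-chromosome jobs finished.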