WAD.io

WAD.io.concat_chromosome_parquets_for_celltype(celltype: str, chrom_sizes: dict, temp_dir: str = 'temp', denoised: bool = False) → DataFrame

Load all chromosome-level parquet files for a given cell type and concatenate them into a single DataFrame.

Parameters:
  • celltype (str) – Name of the cell type.

  • chrom_sizes (dict) – Dictionary mapping chromosomes to their sizes.

  • temp_dir (str, optional) – Directory containing the parquet files (default is “temp”).

  • denoised (bool, optional) – Whether to load denoised parquet files (default is False).

Returns:

Concatenated DataFrame with columns ['chrom', 'start', 'end', 'values'].

Return type:

pd.DataFrame

WAD.io.group_samples(scATAC: list[str], cell_type: list[str])

Build a mapping from each cell type to the paths of its scATAC samples.

Parameters:
  • scATAC (list of str) – List of scATAC bigWig file paths.

  • cell_type (list of str) – List of cell type labels, one for each scATAC file and in the same order.

Returns:

Dictionary mapping each cell type to a list of file paths.

Return type:

defaultdict(list)
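
The grouping described above can be sketched with the standard library alone. This is an illustrative stand-in, not the code of WAD.io.group_samples: the function name group_samples_sketch, the length check, and the file names are assumptions made for the example.

```python
from collections import defaultdict


def group_samples_sketch(scATAC, cell_type):
    """Illustrative stand-in for WAD.io.group_samples (not the library's code)."""
    # Assumption: the two lists are parallel, so a length mismatch is an error.
    if len(scATAC) != len(cell_type):
        raise ValueError("scATAC and cell_type must have the same length")
    groups = defaultdict(list)
    # Pair each file path with its label and append it under that label.
    for path, label in zip(scATAC, cell_type):
        groups[label].append(path)
    return groups


# Hypothetical file names, used only for illustration.
paths = ["b_rep1.bw", "b_rep2.bw", "t_rep1.bw"]
labels = ["B_cell", "B_cell", "T_cell"]
grouped = group_samples_sketch(paths, labels)
print(dict(grouped))
```

Because the result is a defaultdict(list), looking up a cell type that was never seen returns an empty list rather than raising KeyError.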

WAD.io.load_celltype_parquets_for_chrom(cell_type: str, sample_names: list, chrom: str, temp_dir: str = 'temp', denoised: bool = False) → list

Load (original or denoised) parquet files for all samples of a given cell type and chromosome.

Parameters:
  • cell_type (str) – Name of the cell type.

  • sample_names (list of str) – List of sample names.

  • chrom (str) – Chromosome name.

  • temp_dir (str, optional) – Directory containing the parquet files (default is “temp”).

  • denoised (bool, optional) – Whether to load denoised parquet files (default is False).

Returns:

List of DataFrames loaded from the parquet files.

Return type:

list of pd.DataFrame

WAD.io.load_chrom_sizes(path: str) → dict

Convert a tab-delimited chromosome sizes file into a dictionary.

Parameters:

path (str) – Path to the chromosome sizes file.

Returns:

Dictionary mapping chromosome names to their sizes.

Return type:

dict
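
As a rough illustration of the conversion, the sketch below parses a two-column, tab-delimited file into a dictionary. It is not the library's implementation: the function name load_chrom_sizes_sketch, the assumed column layout (name, then size), the choice to skip blank lines, and the toy sizes are all assumptions.

```python
import os
import tempfile


def load_chrom_sizes_sketch(path):
    """Illustrative stand-in for WAD.io.load_chrom_sizes (not the library's code)."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumed layout: name in column 1, size in column 2; extra columns ignored.
            fields = line.split("\t")
            sizes[fields[0]] = int(fields[1])
    return sizes


# Write a tiny file with toy sizes so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".sizes", delete=False) as tmp:
    tmp.write("chr1\t1000\nchr2\t800\n")

chrom_sizes = load_chrom_sizes_sketch(tmp.name)
os.remove(tmp.name)
print(chrom_sizes)
```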

WAD.io.load_matrix(file_path: str)

Load a TSV matrix file into a DataFrame for deconvolution.

Parameters:

file_path (str) – Path to the TSV matrix file.

Returns:

DataFrame containing the numeric matrix with the index column removed.

Return type:

pd.DataFrame

WAD.io.read_bigWig(bigwig_path: str, chrom: str, start: int, end: int, step: int = 50)

Read the bigWig signal over the half-open region [start, end) in fixed-size bins of step base pairs (default: 50).

Parameters:
  • bigwig_path (str) – Path to the bigWig file.

  • chrom (str) – Chromosome name.

  • start (int) – Start position of the interval.

  • end (int) – End position of the interval.

  • step (int, optional) – Bin size in base pairs (default is 50).

Returns:

DataFrame with columns ['chrom', 'start', 'end', 'values'] representing the binned signal.

Return type:

pd.DataFrame
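
To make the binning concrete, the sketch below computes the bin boundaries for a half-open region. It only illustrates fixed-step tiling and does not read a bigWig file; the helper name fixed_step_bins is hypothetical, and clipping the last partial bin to end is an assumption, since the documentation does not say how read_bigWig treats a region whose length is not a multiple of step.

```python
def fixed_step_bins(start, end, step=50):
    """Return (bin_start, bin_end) pairs tiling the half-open interval [start, end)."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Assumption: a final partial bin is clipped to `end` rather than dropped.
    return [(s, min(s + step, end)) for s in range(start, end, step)]


bins = fixed_step_bins(100, 230, step=50)
print(bins)  # [(100, 150), (150, 200), (200, 230)]
```

Each pair would correspond to one row's 'start' and 'end' in the returned DataFrame.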

WAD.io.save_deconvolution_results(deconvolution_results: dict, output_dir: str) → None

Save the deconvolution results as a TSV file.

Parameters:
  • deconvolution_results (dict) – Dictionary mapping bulk sample names to cell-type proportions.

  • output_dir (str) – Directory to save the output TSV file.

Returns:

Nothing. The function writes the resulting matrix as a TSV file in output_dir.

Return type:

None
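
One plausible shape for the output is sketched below, using only the standard library. Several details are assumptions, not documented behavior: the file name deconvolution_results.tsv, the layout (one row per bulk sample, one column per cell type), the inner dictionaries keyed by cell type, and filling missing cell types with 0.0. Unlike the documented function, the sketch returns the output path, purely so the example can read the file back.

```python
import csv
import os
import tempfile


def save_results_sketch(deconvolution_results, output_dir,
                        filename="deconvolution_results.tsv"):
    """Illustrative stand-in for WAD.io.save_deconvolution_results."""
    os.makedirs(output_dir, exist_ok=True)
    # Collect every cell type seen in any sample; sort for a stable column order.
    cell_types = sorted({ct for props in deconvolution_results.values() for ct in props})
    out_path = os.path.join(output_dir, filename)  # hypothetical file name
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample"] + cell_types)
        for sample, props in deconvolution_results.items():
            # Assumption: a cell type absent from a sample is written as 0.0.
            writer.writerow([sample] + [props.get(ct, 0.0) for ct in cell_types])
    return out_path


results = {
    "bulk_1": {"B_cell": 0.25, "T_cell": 0.75},
    "bulk_2": {"B_cell": 0.5, "T_cell": 0.5},
}
out_path = save_results_sketch(results, tempfile.mkdtemp())
with open(out_path) as handle:
    lines = handle.read().splitlines()
print(lines)
```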

WAD.io.write_bigWig(path: str, header: list[tuple[str, int]], df: DataFrame)

Write a pandas DataFrame to a bigWig file. The DataFrame must have the columns chrom, start, end, and value.

Parameters:
  • path (str) – Path to the output bigWig file.

  • header (list of tuple) – List of (chrom, size) tuples for the bigWig header.

  • df (pd.DataFrame) – DataFrame containing the genomic intervals and values.

Returns:

Nothing. The function writes the bigWig file to path.

Return type:

None
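
The header argument can be derived from a chromosome sizes dictionary such as the one returned by load_chrom_sizes. The sketch below shows that conversion plus a simple pre-write sanity check; both helpers (build_header, out_of_bounds) are hypothetical and not part of WAD.io, and plain tuples stand in for DataFrame rows to keep the example dependency-free.

```python
def build_header(chrom_sizes):
    """Turn a {chrom: size} dict into a [(chrom, size), ...] header list."""
    return [(chrom, int(size)) for chrom, size in chrom_sizes.items()]


def out_of_bounds(rows, chrom_sizes):
    """Return the (chrom, start, end, value) rows that do not fit their chromosome."""
    bad = []
    for chrom, start, end, value in rows:
        size = chrom_sizes.get(chrom)
        # A row is rejected if its chromosome is unknown, it is empty or
        # reversed, or it extends past either end of the chromosome.
        if size is None or start < 0 or end > size or start >= end:
            bad.append((chrom, start, end, value))
    return bad


chrom_sizes = {"chr1": 1000, "chr2": 800}  # toy sizes
header = build_header(chrom_sizes)
rows = [("chr1", 0, 50, 1.5), ("chr2", 780, 830, 0.2)]  # second row overruns chr2
bad_rows = out_of_bounds(rows, chrom_sizes)
print(header)
print(bad_rows)
```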