WAD.io

WAD.io.concat_chromosome_parquets_for_celltype(celltype: str, chrom_sizes: dict, temp_dir: str = 'temp', denoised: bool = False) → DataFrame

Load all chromosome-level parquet files for a given cell type and concatenate them into a single DataFrame.

Parameters:

celltype (str) – Name of the cell type.
chrom_sizes (dict) – Dictionary mapping chromosomes to their sizes.
temp_dir (str, optional) – Directory containing the parquet files (default is “temp”).
denoised (bool, optional) – Whether to load denoised parquet files (default is False).

Returns:

Concatenated DataFrame with columns [‘chrom’, ‘start’, ‘end’, ‘values’].

Return type:

pd.DataFrame

WAD.io.group_samples(scATAC: list[str], cell_type: list[str])

Build a mapping from each cell type to its scATAC sample paths.

Parameters:

scATAC (list of str) – List of scATAC bigWig file paths.
cell_type (list of str) – List of cell type labels, one for each scATAC file.

Returns:

Dictionary mapping each cell type to a list of file paths.

Return type:

defaultdict(list)
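
The grouping described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: it assumes the two lists are aligned by position, and `group_samples_sketch`, the file paths, and the labels are all hypothetical.

```python
from collections import defaultdict


def group_samples_sketch(scATAC, cell_type):
    """Group file paths by cell type label (illustrative re-implementation)."""
    groups = defaultdict(list)
    # Assumption: scATAC[i] belongs to the cell type named in cell_type[i].
    for path, label in zip(scATAC, cell_type):
        groups[label].append(path)
    return groups


# Hypothetical bigWig paths and their cell type labels.
paths = ["a.bw", "b.bw", "c.bw"]
labels = ["Tcell", "Bcell", "Tcell"]
groups = group_samples_sketch(paths, labels)
print(dict(groups))
```

Because the result is a `defaultdict(list)`, looking up a cell type that was never seen returns an empty list instead of raising `KeyError`.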

WAD.io.load_celltype_parquets_for_chrom(cell_type: str, sample_names: list, chrom: str, temp_dir: str = 'temp', denoised: bool = False) → list

Load (original or denoised) parquet files for all samples of a given cell type and chromosome.

Parameters:

cell_type (str) – Name of the cell type.
sample_names (list of str) – List of sample names.
chrom (str) – Chromosome name.
temp_dir (str, optional) – Directory containing the parquet files (default is “temp”).
denoised (bool, optional) – Whether to load denoised parquet files (default is False).

Returns:

List of DataFrames loaded from the parquet files.

Return type:

list of pd.DataFrame

WAD.io.load_chrom_sizes(path: str) → dict

Convert a tab-delimited chromosome sizes file into a dictionary.

Parameters:

path (str) – Path to the chromosome sizes file.

Returns:

Dictionary mapping chromosome names to their sizes.

Return type:

dict
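
A minimal sketch of this kind of parsing, assuming a two-column file with the chromosome name first and an integer size second. The function name `load_chrom_sizes_sketch`, the file name, and the sizes are hypothetical, and the real function may handle extra columns or blank lines differently.

```python
import os
import tempfile


def load_chrom_sizes_sketch(path):
    """Parse '<chrom><TAB><size>' lines into {chrom: size} (illustrative only)."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: only the first two tab-separated fields matter.
            chrom, size = line.split("\t")[:2]
            sizes[chrom] = int(size)
    return sizes


# Write a small hypothetical chromosome sizes file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sizes_path = os.path.join(tmp, "example.chrom.sizes")
    with open(sizes_path, "w") as handle:
        handle.write("chr1\t1000\nchr2\t500\n")
    chrom_sizes = load_chrom_sizes_sketch(sizes_path)

print(chrom_sizes)
```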

WAD.io.load_matrix(file_path: str)

Load a TSV matrix file into a DataFrame for deconvolution.

Parameters:

file_path (str) – Path to the TSV matrix file.

Returns:

DataFrame containing the numeric matrix with the index column removed.

Return type:

pd.DataFrame

WAD.io.read_bigWig(bigwig_path: str, chrom: str, start: int, end: int, step: int = 50)

Read the bigWig signal over the half-open region [start, end) in fixed-size bins of step base pairs (default: 50).

Parameters:

bigwig_path (str) – Path to the bigWig file.
chrom (str) – Chromosome name.
start (int) – Start position of the interval.
end (int) – End position of the interval.
step (int, optional) – Bin size in base pairs (default is 50).

Returns:

DataFrame with columns [‘chrom’, ‘start’, ‘end’, ‘values’] representing the binned signal.

Return type:

pd.DataFrame
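
To make the binning concrete, the sketch below computes only the bin boundaries for a half-open interval; it does not read a bigWig file. The helper `fixed_step_bins` is hypothetical, and clipping the last bin at `end` is an assumption, since the documentation does not say how a region whose length is not a multiple of `step` is handled.

```python
def fixed_step_bins(start, end, step=50):
    """Return (bin_start, bin_end) pairs covering the half-open interval [start, end)."""
    if step <= 0:
        raise ValueError("step must be positive")
    bins = []
    for bin_start in range(start, end, step):
        # Assumption: the final bin is clipped at `end` when the interval
        # length is not a multiple of `step`.
        bins.append((bin_start, min(bin_start + step, end)))
    return bins


bins = fixed_step_bins(100, 230, step=50)
print(bins)
```

Each pair would correspond to the `start` and `end` columns of one row in the returned DataFrame, with the signal for that bin in `values`.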

WAD.io.save_deconvolution_results(deconvolution_results: dict, output_dir: str) → None

Save the deconvolution results as a TSV file.

Parameters:

deconvolution_results (dict) – Dictionary mapping bulk sample names to cell-type proportions.
output_dir (str) – Directory to save the output TSV file.

Returns:

Nothing is returned; the function writes the resulting matrix to output_dir.

Return type:

None
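
One way such a dictionary could be written out is sketched below, using only the standard library. Several details are assumptions not stated in the documentation: that each sample maps to a `{cell_type: proportion}` dictionary, that rows are samples and columns are cell types, and the output file name `proportions.tsv`. The function name `save_results_sketch` is hypothetical.

```python
import csv
import os
import tempfile


def save_results_sketch(deconvolution_results, output_dir, filename="proportions.tsv"):
    """Write {sample: {cell_type: proportion}} as a TSV, one row per sample."""
    os.makedirs(output_dir, exist_ok=True)
    # Collect every cell type seen across samples, in a stable (sorted) order.
    cell_types = sorted({ct for props in deconvolution_results.values() for ct in props})
    out_path = os.path.join(output_dir, filename)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample"] + cell_types)
        for sample, props in deconvolution_results.items():
            # Assumption: a cell type missing for a sample is written as 0.0.
            writer.writerow([sample] + [props.get(ct, 0.0) for ct in cell_types])
    return out_path


# Hypothetical proportions for two bulk samples.
results = {
    "bulk1": {"Bcell": 0.25, "Tcell": 0.75},
    "bulk2": {"Bcell": 0.5, "Tcell": 0.5},
}
with tempfile.TemporaryDirectory() as tmp:
    written = save_results_sketch(results, tmp)
    with open(written) as handle:
        rows = [line.rstrip("\n").split("\t") for line in handle]

print(rows)
```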

WAD.io.write_bigWig(path: str, header: list[tuple[str, int]], df: DataFrame)

Write a pandas DataFrame to bigWig format. The DataFrame must have the columns chrom, start, end, and value.

Parameters:

path (str) – Path to the output bigWig file.
header (list of tuple) – List of (chrom, size) tuples for the bigWig header.
df (pd.DataFrame) – DataFrame containing the genomic intervals and values.

Returns:

Nothing is returned; the function writes the bigWig file to path.

Return type:

None
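
The `header` argument has the same information as a chromosome sizes dictionary, just as a list of tuples. A minimal sketch of the conversion, assuming a dictionary shaped like the one `load_chrom_sizes` is documented to return; the sizes shown are hypothetical.

```python
# Hypothetical chromosome sizes, e.g. as returned by a chromosome sizes loader.
chrom_sizes = {"chr1": 1000, "chr2": 500}

# Build the list of (chrom, size) tuples described for the header argument.
# Dictionary insertion order is preserved, so the header keeps the file order.
header = [(chrom, int(size)) for chrom, size in chrom_sizes.items()]
print(header)
```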