WAD.io
- WAD.io.concat_chromosome_parquets_for_celltype(celltype: str, chrom_sizes: dict, temp_dir: str = 'temp', denoised: bool = False) → DataFrame
Load all chromosome-level parquet files for a given cell type and concatenate them into a single DataFrame.
- Parameters:
celltype (str) – Name of the cell type.
chrom_sizes (dict) – Dictionary mapping chromosomes to their sizes.
temp_dir (str, optional) – Directory containing the parquet files (default is “temp”).
denoised (bool, optional) – Whether to load denoised parquet files (default is False).
- Returns:
Concatenated DataFrame with columns ['chrom', 'start', 'end', 'values'].
- Return type:
pd.DataFrame
- WAD.io.group_samples(scATAC: list[str], cell_type: list[str])
Build a mapping from each cell type to its scATAC sample paths.
- Parameters:
scATAC (list of str) – List of scATAC bigWig file paths.
cell_type (list of str) – List of cell type labels, one for each scATAC file.
- Returns:
Dictionary mapping each cell type to a list of file paths.
- Return type:
defaultdict(list)
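The grouping described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `group_samples_sketch` and the file names are hypothetical, and the sketch assumes the two lists are aligned by position. The length check is an added assumption as well.

```python
from collections import defaultdict


def group_samples_sketch(scATAC, cell_type):
    """Group file paths by cell type label (illustrative stand-in, not WAD.io code)."""
    # Assumption: both lists have the same length and matching order.
    if len(scATAC) != len(cell_type):
        raise ValueError("scATAC and cell_type must have the same length")
    groups = defaultdict(list)
    for path, label in zip(scATAC, cell_type):
        groups[label].append(path)
    return groups


groups = group_samples_sketch(
    ["a.bw", "b.bw", "c.bw"],        # hypothetical bigWig paths
    ["T_cell", "B_cell", "T_cell"],  # one label per path
)
print(dict(groups))
```

Because the result is a `defaultdict(list)`, looking up a cell type that was never seen yields an empty list instead of raising `KeyError`.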
- WAD.io.load_celltype_parquets_for_chrom(cell_type: str, sample_names: list, chrom: str, temp_dir: str = 'temp', denoised: bool = False) → list
Load (original or denoised) parquet files for all samples of a given cell type and chromosome.
- Parameters:
cell_type (str) – Name of the cell type.
sample_names (list of str) – List of sample names.
chrom (str) – Chromosome name.
temp_dir (str, optional) – Directory containing the parquet files (default is “temp”).
denoised (bool, optional) – Whether to load denoised parquet files (default is False).
- Returns:
List of DataFrames loaded from the parquet files.
- Return type:
list of pd.DataFrame
- WAD.io.load_chrom_sizes(path: str) → dict
Convert a tab-delimited chromosome sizes file into a dictionary.
- Parameters:
path (str) – Path to the chromosome sizes file.
- Returns:
Dictionary mapping chromosome names to their sizes.
- Return type:
dict
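A minimal sketch of this kind of conversion, assuming the common two-column layout (chromosome name, then size) with one chromosome per line. `load_chrom_sizes_sketch` and the demo file are hypothetical; the actual function may handle extra columns or malformed lines differently.

```python
import os
import tempfile


def load_chrom_sizes_sketch(path):
    """Parse 'name<TAB>size' lines into a {name: size} dict (illustrative only)."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: the first two tab-separated fields are name and size.
            name, size = line.split("\t")[:2]
            sizes[name] = int(size)
    return sizes


# Write a small demo file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.chrom.sizes")
    with open(demo_path, "w") as handle:
        handle.write("chr1\t1000\nchr2\t500\n")
    chrom_sizes = load_chrom_sizes_sketch(demo_path)

print(chrom_sizes)
```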
- WAD.io.load_matrix(file_path: str)
Load a TSV matrix file into a DataFrame for deconvolution.
- Parameters:
file_path (str) – Path to the TSV matrix file.
- Returns:
DataFrame containing the numeric matrix with the index column removed.
- Return type:
pd.DataFrame
- WAD.io.read_bigWig(bigwig_path: str, chrom: str, start: int, end: int, step: int = 50)
Read the bigWig signal over the half-open interval [start, end) in fixed-width bins of size step (default: 50).
- Parameters:
bigwig_path (str) – Path to the bigWig file.
chrom (str) – Chromosome name.
start (int) – Start position of the interval.
end (int) – End position of the interval.
step (int, optional) – Bin size in base pairs (default is 50).
- Returns:
DataFrame with columns ['chrom', 'start', 'end', 'values'] representing the binned signal.
- Return type:
pd.DataFrame
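To make the binning concrete, the sketch below computes the bin boundaries for a half-open interval. It is only an illustration: `bin_edges_sketch` is a hypothetical helper, and clipping the last bin to `end` is an assumption; the documentation above does not say how a trailing partial bin is handled.

```python
def bin_edges_sketch(start, end, step=50):
    """Return (bin_start, bin_end) pairs covering the half-open interval [start, end)."""
    # Assumption: a final bin shorter than `step` is clipped to `end`.
    return [(s, min(s + step, end)) for s in range(start, end, step)]


bins = bin_edges_sketch(0, 120, step=50)
print(bins)  # [(0, 50), (50, 100), (100, 120)]
```

Each pair would correspond to one row's 'start' and 'end' in the returned DataFrame, with 'values' holding the signal summarized over that bin.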
- WAD.io.save_deconvolution_results(deconvolution_results: dict, output_dir: str) → None
Save the deconvolution results as a TSV file.
- Parameters:
deconvolution_results (dict) – Dictionary mapping bulk sample names to cell-type proportions.
output_dir (str) – Directory to save the output TSV file.
- Returns:
Nothing is returned; the function writes the resulting matrix to output_dir.
- Return type:
None
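One way such a TSV could be produced is sketched below. Several details are assumptions rather than documented behavior: the results are taken to be a nested dict of the form {sample: {cell_type: proportion}}, and the file name `proportions.tsv`, the `sample` header column, and the helper `save_results_sketch` are all hypothetical.

```python
import csv
import os
import tempfile


def save_results_sketch(results, output_dir, filename="proportions.tsv"):
    """Write {sample: {cell_type: proportion}} as a samples-by-cell-types TSV."""
    os.makedirs(output_dir, exist_ok=True)
    # Use the union of all cell types so every row has the same columns.
    cell_types = sorted({ct for props in results.values() for ct in props})
    out_path = os.path.join(output_dir, filename)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample"] + cell_types)
        for sample, props in results.items():
            # Assumption: a cell type missing for a sample is written as 0.0.
            writer.writerow([sample] + [props.get(ct, 0.0) for ct in cell_types])
    return out_path


demo = {
    "bulk1": {"T_cell": 0.7, "B_cell": 0.3},
    "bulk2": {"T_cell": 0.4, "B_cell": 0.6},
}
with tempfile.TemporaryDirectory() as tmp:
    with open(save_results_sketch(demo, tmp)) as handle:
        rows = [line.rstrip("\n").split("\t") for line in handle]

print(rows)
```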
- WAD.io.write_bigWig(path: str, header: list[tuple[str, int]], df: DataFrame)
Write a pandas DataFrame to bigWig format. The DataFrame must have the columns chrom, start, end, and value.
- Parameters:
path (str) – Path to the output bigWig file.
header (list of tuple) – List of (chrom, size) tuples for the bigWig header.
df (pd.DataFrame) – DataFrame containing the genomic intervals and values.
- Returns:
The function writes the bigWig file to the path.
- Return type:
None
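The header argument can be derived from a chromosome sizes dictionary such as the one returned by load_chrom_sizes. The sketch below shows that conversion along with a simple pre-write sanity check. Both helpers are hypothetical, and the bounds check is a suggestion rather than something write_bigWig is documented to perform.

```python
def build_header_sketch(chrom_sizes):
    """Turn a {chrom: size} dict into a list of (chrom, size) tuples."""
    return [(chrom, int(size)) for chrom, size in chrom_sizes.items()]


def out_of_bounds_rows(rows, chrom_sizes):
    """Return (chrom, start, end, value) rows that do not fit their chromosome."""
    return [
        row for row in rows
        if row[1] < 0 or row[1] >= row[2] or row[2] > chrom_sizes.get(row[0], 0)
    ]


chrom_sizes = {"chr1": 1000, "chr2": 500}
header = build_header_sketch(chrom_sizes)

# Hypothetical intervals; the second one runs past the end of chr2.
rows = [("chr1", 0, 50, 1.5), ("chr2", 450, 550, 0.2)]
bad = out_of_bounds_rows(rows, chrom_sizes)

print(header)
print(bad)
```

Checking intervals against the header before writing helps surface coordinate mistakes early, since the chromosome sizes in the header define the valid range for every interval.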