A dataset of chromosomal instability gene signature scores in normal and cancer cells from the human breast

About this collection

Extent

1 digital object.

Cite This Work

Baba, Shahnawaz A.; Labhsetwar, Shreyas; Klemke, Richard; Desgrosellier, Jay S. (2023). A dataset of chromosomal instability gene signature scores in normal and cancer cells from the human breast. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0R78FDG

Description

These data show the relative amount of chromosomal instability (CIN) in a diverse array of human breast cell types, including non-transformed mammary epithelial cells as well as cancer cell lines. Additional data are provided from human embryonic and mesenchymal stem cells. To produce this dataset, we compared a published chromosomal instability gene signature against publicly available datasets containing gene expression information for each cell. We then analyzed these data with the Python GSEAPY software package, which provided a CIN enrichment score for each cell. These data are useful for comparing the relative amounts of CIN in different breast cell types, including cells representing the major clinical subtypes (ER/PR+, HER2+ and Triple-negative) as well as the intrinsic breast cancer subtypes (Luminal B, HER2+, Basal-like and Claudin-low). Our dataset has great potential for re-use given the recent surge of interest in the role of CIN in breast cancer. The large size of the dataset, coupled with the diversity of the cell types represented, provides numerous possibilities for future comparisons.

Creation Date

2018 to 2023

Date Issued

2023

Principal Investigator

Researchers

Methods

FASTQ files were converted to gene-expression matrices, and the resulting files were stripped of all header information so that only the data remained. The CIN gene signature was taken from Bakhoum et al. CIN scores were obtained by testing for enrichment of the CIN-associated gene signature in each cell type represented in the sequencing datasets, following Barbie et al. To generate the CIN scores for each cell type, we analyzed the data with the Python GSEAPY library (https://gseapy.readthedocs.io/en/latest/). First, input files were read with Python’s Pandas library and joined on their ID columns, and any unnecessary columns were then deleted. Ensembl gene IDs were mapped to their HGNC symbols using Python’s BioMart API (https://pypi.org/project/biomart). Any Ensembl ID without a corresponding HGNC symbol was dropped. This produced a data frame with HGNC symbols as rows, samples as columns, and feature counts as values; the final data frame comprised 36866 rows and 106 columns. To determine the enrichment scores (ES), this data frame, along with the CIN gene set, was passed to Single Sample GSEA (ssGSEA) as implemented in GSEAPY. The analysis was repeated with a normalized version of the data frame, but the normalized enrichment scores (NES) were identical to the ES. GSEAPY output was then converted to Excel format and saved as the final results files.
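The steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene IDs, sample names, and counts are hypothetical toy values, a plain dictionary stands in for the BioMart lookup, duplicate symbols are summed (the record does not say how duplicates were handled), and `ssgsea_score` is a simplified single-sample enrichment score in the spirit of Barbie et al. with an assumed weighting exponent of 0.25, not GSEAPY's own implementation.

```python
import numpy as np
import pandas as pd


def merge_count_tables(tables, key="ID"):
    """Inner-join per-sample count tables on a shared gene ID column."""
    merged = tables[0]
    for table in tables[1:]:
        merged = merged.merge(table, on=key, how="inner")
    return merged


def map_to_symbols(counts, id_to_symbol, key="ID"):
    """Replace gene IDs with symbols and drop IDs that have no symbol."""
    out = counts.copy()
    out["symbol"] = out[key].map(id_to_symbol)
    out = out.dropna(subset=["symbol"]).drop(columns=[key])
    # Assumption: duplicate symbols are collapsed by summing their counts.
    return out.groupby("symbol").sum()


def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: pd.Series of counts indexed by gene symbol.
    Returns the summed difference between the weighted hit CDF and
    the miss CDF over the rank-ordered gene list.
    """
    ranks = expression.rank(method="average")
    ordered = ranks.sort_values(ascending=False)
    in_set = ordered.index.isin(list(gene_set))
    n_hit = int(in_set.sum())
    n_miss = len(ordered) - n_hit
    if n_hit == 0 or n_miss == 0:
        return float("nan")
    weights = np.where(in_set, ordered.to_numpy() ** alpha, 0.0)
    hit_cdf = np.cumsum(weights) / weights.sum()
    miss_cdf = np.cumsum(~in_set) / n_miss
    return float(np.sum(hit_cdf - miss_cdf))


# Hypothetical toy inputs: one table per sample, keyed by a gene ID column.
table_a = pd.DataFrame({"ID": ["ENSG1", "ENSG2", "ENSG3", "ENSG4"],
                        "sample_a": [100, 80, 5, 1]})
table_b = pd.DataFrame({"ID": ["ENSG1", "ENSG2", "ENSG3", "ENSG4"],
                        "sample_b": [2, 3, 90, 70]})

# Stand-in for the BioMart lookup; ENSG4 has no symbol and is dropped.
id_to_symbol = {"ENSG1": "GENE_A", "ENSG2": "GENE_B", "ENSG3": "GENE_C"}

# Hypothetical gene set standing in for the CIN signature.
gene_set = ["GENE_A", "GENE_B"]

counts = map_to_symbols(merge_count_tables([table_a, table_b]), id_to_symbol)
scores = counts.apply(lambda column: ssgsea_score(column, gene_set))
print(scores)
```

In this toy example the gene set is highly expressed in `sample_a` and weakly expressed in `sample_b`, so the first score is positive and the second negative. For real data, the GSEAPY documentation linked above describes the library's own single-sample GSEA interface.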

Note

Shahnawaz A. Baba was responsible for conceptualization of the study. Shahnawaz A. Baba, Shreyas Labhsetwar, Richard Klemke, and Jay S. Desgrosellier were responsible for developing the methodology. Shreyas Labhsetwar and Richard Klemke were responsible for formal analysis and software development. Jay S. Desgrosellier was Principal Investigator.

Funding

Tobacco-Related Disease Research Program [Grant #T32IR4741 (to J.S.D.)]; and the California Breast Cancer Research Program [Grant #B28IB5479 (to J.S.D.)].

Topics

Language

English

Related Resources

Primary associated publication

Shahnawaz A. Baba; Qi Sun; Samson Mugisha; Shreyas Labhsetwar; Richard Klemke; Jay S. Desgrosellier (2023). Breast cancer stem cells tolerate chromosomal instability during tumor progression via c-Jun/AXL stress signaling. Heliyon. https://doi.org/10.1016/j.heliyon.2023.e20182

Source data

National Library of Medicine, Transcriptome of sorted HCC38 breast cancer cells - BioProject: https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA750073

NCBI, Gene Expression Omnibus, Characterization of cell lines derived from breast cancers and normal mammary tissues for the study of the intrinsic molecular subtypes: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50470

Software

Python BioMart API: https://pypi.org/project/biomart

Python GSEAPY Library: https://gseapy.readthedocs.io/en/latest/

Reference

Bakhoum, S., Ngo, B., Laughney, A. et al. Chromosomal instability drives metastasis through a cytosolic DNA response. Nature 553, 467–472 (2018). https://doi.org/10.1038/nature25432

Barbie, D., Tamayo, P., Boehm, J. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108–112 (2009). https://doi.org/10.1038/nature08460

Hong, C., Schubert, M., Tijhuis, A.E. et al. cGAS–STING drives the IL-6-dependent survival of chromosomally instable cancers. Nature 607, 366–373 (2022). https://doi.org/10.1038/s41586-022-04847-2

Prat, A., Karginova, O., Parker, J.S. et al. Characterization of cell lines derived from breast cancers and normal mammary tissues for the study of the intrinsic molecular subtypes. Breast Cancer Res Treat 142, 237–255 (2013). https://doi.org/10.1007/s10549-013-2743-3

Sun, Q., Wang, Y., Officer, A. et al. Stem-like breast cancer cells in the activated state resist genetic stress via TGFBI-ZEB1. npj Breast Cancer 8, 5 (2022). https://doi.org/10.1038/s41523-021-00375-w

Collection image

Image source: Shahnawaz Baba. "CIN enrichment score pipeline."
