A dataset of chromosomal instability gene signature scores in normal and cancer cells from the human breast
About this collection
- Extent
-
1 digital object.
- Cite This Work
-
Baba, Shahnawaz A.; Labhsetwar, Shreyas; Klemke, Richard; Desgrosellier, Jay S. (2023). A dataset of chromosomal instability gene signature scores in normal and cancer cells from the human breast. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0R78FDG
- Description
-
These data show the relative amount of chromosomal instability (CIN) in a diverse array of human breast cell types, including non-transformed mammary epithelial cells as well as cancer cell lines. Additional data are also provided from human embryonic and mesenchymal stem cells. To produce this dataset, we compared a published chromosomal instability gene signature against publicly available datasets containing gene expression information for each cell. We then analyzed these data with the Python GSEAPY software package, providing a CIN enrichment score for each cell. These data are useful for comparing the relative amounts of CIN in different breast cell types. This includes cells representing the major clinical subtypes (ER/PR+, HER2+ & Triple-negative) as well as the intrinsic breast cancer subtypes (Luminal B, HER2+, Basal-like and Claudin-low). Our dataset has great potential for re-use given the recent surge in interest surrounding the role of CIN in breast cancer. The large size of the dataset, coupled with the diversity of the cell types represented, provides numerous possibilities for future comparisons.
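As an illustration of what a single-sample enrichment score measures, the following is a minimal sketch of the rank-based statistic described by Barbie et al. (2009), written in plain Python. It is not the GSEAPY implementation used to produce this dataset. The function name `enrichment_score`, the toy gene names and values, and the weighting exponent of 0.25 (taken here as an assumption from that paper) are chosen only for the example.

```python
def enrichment_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict mapping gene symbol -> expression value.
    gene_set:   collection of gene symbols (the signature).
    alpha:      exponent applied to the ranks of signature genes (assumed).
    """
    # Order genes from highest to lowest expression.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    # Rank-normalise: the top gene gets rank n, the bottom gene rank 1.
    rank = {gene: n - i for i, gene in enumerate(ordered)}
    members = set(gene_set)
    hits = [gene for gene in ordered if gene in members]
    if not hits or len(hits) == n:
        raise ValueError("gene set must overlap the sample without covering it")
    hit_total = sum(rank[gene] ** alpha for gene in hits)
    miss_step = 1.0 / (n - len(hits))
    p_hit = p_miss = score = 0.0
    # Walk down the ranking and accumulate the gap between the weighted
    # cumulative distribution of signature genes and that of all other genes.
    for gene in ordered:
        if gene in members:
            p_hit += rank[gene] ** alpha / hit_total
        else:
            p_miss += miss_step
        score += p_hit - p_miss
    return score


# Toy sample with hypothetical gene names and values.
sample = {"G1": 10.0, "G2": 8.0, "G3": 6.0, "G4": 4.0, "G5": 2.0, "G6": 1.0}
high = enrichment_score(sample, {"G1", "G2"})  # signature genes highly expressed
low = enrichment_score(sample, {"G5", "G6"})   # signature genes weakly expressed
print(high, low)
```

In this sketch, a signature whose genes sit near the top of a sample's expression ranking receives a positive score and one whose genes sit near the bottom receives a negative score, which is why such scores can be compared across cell types.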
- Creation Date
- 2018 to 2023
- Date Issued
- 2023
- Principal Investigator
- Researchers
- Methods
-
FASTQs were converted to gene-expression matrices, and the files were processed to remove all header information and retain only the data. The CIN gene signature was acquired from Bakhoum et al. The CIN scores were obtained by examining enrichment for the CIN-associated gene signature in each cell type represented in the sequencing datasets, according to Barbie et al. To generate the CIN scores for each cell type, we analyzed the data with the Python GSEAPY library (https://gseapy.readthedocs.io/en/latest/). First, input files were read using Python’s Pandas library and joined with each other using the ID & amp columns before deleting any unnecessary columns. Ensembl gene IDs were mapped to their HGNC symbols using Python’s BioMart API (https://pypi.org/project/biomart). Any Ensembl ID that did not have a corresponding HGNC symbol was dropped. The resulting data frame had HGNC symbols as rows, samples as columns, and feature counts as values, and comprised 36,866 rows and 106 columns. This data frame, along with the CIN gene set, was then passed to the single-sample GSEA function of GSEAPY to determine the enrichment scores (ES). The experiment was repeated with a normalized version of the data frame, but the normalized enrichment scores (NES) were identical to the ES. GSEAPY output was then processed into Excel format and saved as the final results files.
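The preparation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `build_expression_frame` and `score_cin`, the join key `ID`, the choice of an inner join, the sample names, and the toy counts are hypothetical, and a hand-written dictionary stands in for the BioMart lookup. The `gseapy.ssgsea` call uses argument names from the GSEAPY documentation, which may differ between versions.

```python
import pandas as pd


def build_expression_frame(tables, id_to_symbol, key="ID"):
    """Join per-dataset tables on a shared ID column and index by HGNC symbol.

    tables:       list of DataFrames, each with a `key` column of Ensembl IDs
                  plus one or more sample columns.
    id_to_symbol: dict mapping Ensembl gene ID -> HGNC symbol (a stand-in
                  for the BioMart lookup described in the text).
    """
    merged = tables[0].copy()
    for table in tables[1:]:
        # Inner join is an assumption; the source does not state the join type.
        merged = merged.merge(table, on=key, how="inner")
    # Map Ensembl IDs to symbols; IDs without a symbol become NaN.
    merged["symbol"] = merged[key].map(id_to_symbol)
    # Drop rows that have no corresponding HGNC symbol.
    merged = merged.dropna(subset=["symbol"])
    # Remove the now-unneeded ID column and use symbols as row labels.
    return merged.drop(columns=[key]).set_index("symbol")


def score_cin(expr, cin_genes):
    """Run single-sample GSEA on the prepared frame (requires gseapy)."""
    import gseapy  # imported lazily so the preparation step runs without it

    result = gseapy.ssgsea(
        data=expr,
        gene_sets={"CIN": list(cin_genes)},
        sample_norm_method="rank",
        min_size=1,   # lowered only so a tiny toy gene set is not filtered out
        outdir=None,  # keep results in memory instead of writing files
    )
    return result.res2d  # table of per-sample scores


# Toy inputs: two hypothetical tables sharing an ID column.
mapping = {"ENSG00000141510": "TP53", "ENSG00000012048": "BRCA1"}
ids = ["ENSG00000141510", "ENSG00000012048", "ENSG_UNMAPPED_EXAMPLE"]
table_a = pd.DataFrame({"ID": ids, "sample_1": [5, 3, 9]})
table_b = pd.DataFrame({"ID": ids, "sample_2": [7, 1, 4]})

expr = build_expression_frame([table_a, table_b], mapping)
print(expr)
```

With a real signature, `score_cin(expr, cin_genes)` would be called on the full data frame. The sketch does not handle several Ensembl IDs mapping to the same symbol, which a real run would need to address.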
- Note
-
Shahnawaz A. Baba was responsible for conceptualization of the study. Shahnawaz A. Baba, Shreyas Labhsetwar, Richard Klemke, and Jay S. Desgrosellier were responsible for developing the methodology. Shreyas Labhsetwar and Richard Klemke were responsible for formal analysis and software development. Jay S. Desgrosellier was Principal Investigator.
- Funding
-
Tobacco-Related Disease Research Program [Grant #T32IR4741 (to J.S.D.)]; and the California Breast Cancer Research Program [Grant #B28IB5479 (to J.S.D.)].
- Topics
- Language
- English
- Related Resources
- Shahnawaz A. Baba; Qi Sun; Samson Mugisha; Shreyas Labhsetwar; Richard Klemke; Jay S. Desgrosellier (2023). Breast cancer stem cells tolerate chromosomal instability during tumor progression via c-Jun/AXL stress signaling. Heliyon. https://doi.org/10.1016/j.heliyon.2023.e20182
- National Library of Medicine, Transcriptome of sorted HCC38 breast cancer cells - BioProject: https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA750073
- NCBI, Gene Expression Omnibus, Characterization of cell lines derived from breast cancers and normal mammary tissues for the study of the intrinsic molecular subtypes: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50470
- Python BioMart API: https://pypi.org/project/biomart
- Python GSEAPY Library: https://gseapy.readthedocs.io/en/latest/
- Bakhoum, S., Ngo, B., Laughney, A. et al. Chromosomal instability drives metastasis through a cytosolic DNA response. Nature 553, 467–472 (2018). https://doi.org/10.1038/nature25432
- Barbie, D., Tamayo, P., Boehm, J. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108–112 (2009). https://doi.org/10.1038/nature08460
- Hong, C., Schubert, M., Tijhuis, A.E. et al. cGAS–STING drives the IL-6-dependent survival of chromosomally instable cancers. Nature 607, 366–373 (2022). https://doi.org/10.1038/s41586-022-04847-2
- Prat, A., Karginova, O., Parker, J.S. et al. Characterization of cell lines derived from breast cancers and normal mammary tissues for the study of the intrinsic molecular subtypes. Breast Cancer Res Treat 142, 237–255 (2013). https://doi.org/10.1007/s10549-013-2743-3
- Sun, Q., Wang, Y., Officer, A. et al. Stem-like breast cancer cells in the activated state resist genetic stress via TGFBI-ZEB1. npj Breast Cancer 8, 5 (2022). https://doi.org/10.1038/s41523-021-00375-w
- Image source: Shahnawaz Baba. "CIN enrichment score pipeline."