Data from: Exploration and Explanation in Computational Notebooks
About this collection
- Extent
-
1 digital object.
- Cite This Work
-
Rule, Adam; Tabard, Aurélien; Hollan, James D. (2018). Data from: Exploration and Explanation in Computational Notebooks. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0JW8C39
- Description
-
In July 2017, our team queried, downloaded, and analyzed approximately 1.25 million Jupyter Notebooks in public repositories on GitHub. By our calculation, this was about 95% of all Jupyter Notebooks publicly available on GitHub at the time. This dataset includes:
~1.25 million Jupyter Notebooks
Metadata about each notebook
Metadata about each of the nearly 200,000 public repositories that contained a Jupyter Notebook
Top-level README files for nearly 150,000 repositories containing a Jupyter Notebook
In addition to this core data, the dataset includes:
A smaller starter dataset with 1,000 randomly selected repositories containing ~6,000 notebooks
CSV files summarizing and indexing the notebooks, repositories, and READMEs
Log files documenting when each file was downloaded
Scripts for our initial analysis of the dataset
- Date Collected
- July 2017
- Date Issued
- 2018
- Creators
- Rule, Adam; Tabard, Aurélien; Hollan, James D.
- Funding
-
This research was funded by NSF grants #1319829 and #1735234 as well as NLM grant #T15LM011271.
- Language
- English
- Related Resources
- Rule A, Tabard A, and Hollan J. (2018). Exploration and Explanation in Computational Notebooks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’18). ACM Press, New York, NY. https://doi.org/10.1145/3173574.3173606
- Analysis scripts on GitHub: https://github.com/activityhistory/jupyter_on_github