https://github.com/nestordemeure/nersc_documentation_summarizer
Boils the NERSC (mkdocs) documentation to a single markdown file for LLM ingestion.
- Host: GitHub
- URL: https://github.com/nestordemeure/nersc_documentation_summarizer
- Owner: nestordemeure
- License: apache-2.0
- Created: 2025-03-28T19:16:15.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-28T21:56:23.000Z (about 1 year ago)
- Last Synced: 2025-04-10T14:24:14.329Z (about 1 year ago)
- Topics: documentation, llm, markdown
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# NERSC's Documentation Summarizer
Boils the [NERSC (mkdocs) documentation](https://gitlab.com/NERSC/nersc.gitlab.io/-/tree/main?ref_type=heads) to a single file for LLM ingestion, keeping only the most viewed pages.
## Installing
Create a dedicated Python environment:
```sh
# Create a virtual environment
python3 -m venv venv
# Activate the virtual environment
source venv/bin/activate
# Install dependencies
pip install pandas pyyaml
```
## Usage
Place the `Pages_and_screens_Page_title_and_screen_class.csv` file in an `inputs` folder.
Update the NERSC documentation submodule:
```sh
# Pull the latest changes from the submodule's main branch
git submodule update --remote
# Stage and commit the updated submodule reference
# (-a stages every modified tracked file, including the submodule pointer)
git commit -am "Update NERSC documentation submodule"
```
Load the Python environment and run the code to collect the current documentation into a single file:
```sh
source venv/bin/activate
python3 merge.py
```
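The internals of `merge.py` are not reproduced in this README. As a rough illustration of the general idea only (filter pages by view count, then concatenate them), here is a minimal sketch. It is not the repository's code: the function name `merge_viewed_pages`, the CSV columns `path` and `views`, and the output layout are all hypothetical, and it uses the standard-library `csv` module rather than pandas to stay self-contained.

```python
import csv
from pathlib import Path


def merge_viewed_pages(csv_path, docs_root, output_path, min_views=1):
    """Concatenate the markdown pages listed in a view-count CSV.

    Assumed (hypothetical) CSV columns: `path`, a markdown file relative
    to docs_root, and `views`, an integer view count.
    Returns the number of pages written.
    """
    docs_root = Path(docs_root)
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = [r for r in csv.DictReader(f) if int(r["views"]) >= min_views]
    # Most viewed pages first, so the most requested material leads the file.
    rows.sort(key=lambda r: int(r["views"]), reverse=True)

    sections = []
    for row in rows:
        page = docs_root / row["path"]
        if not page.is_file():
            continue  # the export may list pages that no longer exist
        text = page.read_text(encoding="utf-8").strip()
        # Keep a marker of where each section came from.
        sections.append(f"<!-- source: {row['path']} -->\n{text}")

    Path(output_path).write_text("\n\n".join(sections) + "\n", encoding="utf-8")
    return len(sections)
```

Sorting by view count is a design choice of this sketch; any stable order would work as long as every retained page ends up in the output.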
## Results
As of March 2025, the 248 files of our documentation that were viewed at least once that month merge into a file 666,296 Gemini tokens long, which fits comfortably within Gemini 2.5 Pro's 1M-token context length.
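To make the remaining headroom concrete, the arithmetic can be sketched as follows. It assumes "1M" means exactly 1,000,000 tokens; the constant names are illustrative.

```python
CONTEXT_TOKENS = 1_000_000  # assumption: "1M" taken as exactly one million
DOC_TOKENS = 666_296        # size of the merged documentation file

used = DOC_TOKENS / CONTEXT_TOKENS
remaining = CONTEXT_TOKENS - DOC_TOKENS
print(f"{used:.1%} of the context used, {remaining} tokens left")
```

Under that assumption, roughly a third of the context remains for the prompt, the question, and the model's answer.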
Short tests show that Gemini 2.5 Pro is able to answer questions meaningfully using the following prompt:
```md
You have been given the [NERSC Supercomputing Center's documentation](https://docs.nersc.gov/). Use it to provide a NERSC-specific answer to questions submitted to you.
```
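One way to combine this prompt with the merged file and a user question is sketched below. The helper name `build_prompt` and the ordering (instruction, documentation, question) are assumptions made for illustration; the README does not specify how the pieces were assembled, and no model API call is shown.

```python
# The instruction text is the prompt given above, copied verbatim.
SYSTEM_PROMPT = (
    "You have been given the [NERSC Supercomputing Center's documentation]"
    "(https://docs.nersc.gov/). Use it to provide a NERSC-specific answer "
    "to questions submitted to you."
)


def build_prompt(documentation: str, question: str) -> str:
    """Assemble instruction, documentation, and question into one string.

    The layout is a hypothetical choice, not the repository's method.
    """
    return f"{SYSTEM_PROMPT}\n\n{documentation}\n\nQuestion: {question}"
```

The resulting string can then be passed to whichever client is used to query the model.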