https://github.com/trygu/ssb-pypi-statistics
https://github.com/trygu/ssb-pypi-statistics
Last synced: 8 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/trygu/ssb-pypi-statistics
- Owner: trygu
- License: mit
- Created: 2024-11-27T19:24:31.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-10-04T20:31:17.000Z (9 months ago)
- Last Synced: 2025-10-04T22:19:16.040Z (9 months ago)
- Language: HTML
- Size: 607 KB
- Stars: 1
- Watchers: 1
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Security: SECURITY.md
Awesome Lists containing this project
README
# SSB Package Statistics Viewer
This project provides a tool to download, process, and interactively explore statistics about public packages using the [Libraries.io API](https://libraries.io/). It fetches data about all public packages associated with Statistics Norway and presents the results in an interactive table format.
If you're just interested in the processed results, visit the [GitHub Pages deployment](https://trygu.github.io/ssb-pypi-statistics/).
## Features
- **Package Data Fetching**: Fetches data about all public packages from Libraries.io.
- **Interactive Table**: Displays package data in a dynamic, searchable, and sortable table using [Tabulator.js](http://tabulator.info/).
- **CSV Download**: Allows users to download the dataset as a CSV file for offline use.
- **DuckDB Integration**: Easily query and sample the data in the DuckDB Web Shell for further analysis.
## Requirements
To fetch and process data using the Libraries.io API:
1. **Libraries.io API Key**:
- You'll need a valid API key from Libraries.io. You can sign up for one [here](https://libraries.io/account).
- Add your API key to the appropriate part of the data-fetching script.
2. **Python Environment**:
- Install the required Python dependencies:
```bash
pip install pandas requests
```
## How to Use
### 1. Download and Process Data
Run the data-fetching script to download the package data:
```bash
python fetch_data.py
```
The script will:
- Fetch all public packages associated with Statistics Norway from Libraries.io.
- Save the results as `results.csv` in the `src/` directory.
### 2. Open the Results Viewer
- Open `index.html` in your browser to view the interactive table.
### 3. Explore the Results
- Use the **"Download CSV"** button to save the data for offline use.
- Use the **"Open in DuckDB Web Shell"** button to query the dataset directly in the DuckDB Web Shell.
### Preprocessed Results
If you don't want to fetch and process the data yourself, you can access the processed results directly:
- [Interactive Viewer on GitHub Pages](https://trygu.github.io/ssb-pypi-statistics/)
## DuckDB Query Example
The DuckDB Web Shell button includes a query to:
1. Load the dataset into a table called `ssb_packages`.
2. Sample 10 random rows from the table.
The SQL query used:
```sql
-- Load CSV file and create a table
CREATE TABLE ssb_packages AS
SELECT *
FROM read_csv_auto('https://trygu.github.io/ssb-pypi-statistics/results.csv');
-- Sample 10 rows from the table
FROM ssb_packages USING SAMPLE 10;
```
## Development Notes
- Ensure the API key is correctly configured in the `fetch_data.py` script before running it.
- The data viewer (`index.html`) is designed to use a preprocessed `results.csv`. Modify the DuckDB query URL in the HTML if hosting the dataset elsewhere.
## Credits
- [Libraries.io API](https://libraries.io/)
- [DuckDB](https://duckdb.org/)
- [Tabulator.js](http://tabulator.info/)
## License
This project is licensed under the [MIT License](LICENSE).