{"id":21425205,"url":"https://github.com/dr-lego/gag-network","last_synced_at":"2025-07-14T08:32:13.438Z","repository":{"id":257214884,"uuid":"796221619","full_name":"Dr-Lego/gag-network","owner":"Dr-Lego","description":"Network Visualizer for the 'Geschichten aus der Geschichte' Podcast","archived":false,"fork":false,"pushed_at":"2025-02-01T14:28:13.000Z","size":93433,"stargazers_count":17,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-08T05:45:13.846Z","etag":null,"topics":["data-science","data-visualization","database","javascript","network-analysis","podcast","python","sqlite3","wikipedia","wikipedia-dump"],"latest_commit_sha":null,"homepage":"https://alpharee.de/geschichte/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Dr-Lego.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-05T10:03:34.000Z","updated_at":"2025-02-01T19:01:49.000Z","dependencies_parsed_at":"2024-09-15T10:48:50.794Z","dependency_job_id":"a9a610e8-e649-48a0-83af-3fcb8ae12c73","html_url":"https://github.com/Dr-Lego/gag-network","commit_stats":null,"previous_names":["dr-lego/gag-network"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Dr-Lego/gag-network","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dr-Lego%2Fgag-network","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dr-Lego%2Fgag-network/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dr-Lego%2Fgag-
network/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dr-Lego%2Fgag-network/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Dr-Lego","download_url":"https://codeload.github.com/Dr-Lego/gag-network/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Dr-Lego%2Fgag-network/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265262638,"owners_count":23736439,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","data-visualization","database","javascript","network-analysis","podcast","python","sqlite3","wikipedia","wikipedia-dump"],"created_at":"2024-11-22T21:27:26.747Z","updated_at":"2025-07-14T08:32:08.589Z","avatar_url":"https://github.com/Dr-Lego.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![](https://raw.githubusercontent.com/Dr-Lego/gag-network/main/assets/banner.png)\n\n# `Geschichten aus der Geschichte` Network Visualizer\n\nThis project aims to visualize the intricate web of connections within the German podcast [\"Geschichten aus der Geschichte\"](https://www.geschichte.fm) (Stories from History). 
It creates a network visualization based on the podcast's [episode list on Wikipedia](https://de.wikipedia.org/w/index.php?title=Geschichten_aus_der_Geschichte_(Podcast)/Episodenliste\u0026useskin=vector), showcasing how the Wikipedia articles of the topics mentioned in the podcast are interconnected.\n\n\u003e [!Note]\n\u003e I am 16 years old and made this in my spare time, so please don't blame me for inefficient code :)\n\n## Project Overview\n\nThe GAG Network Visualizer scrapes data from Wikipedia, processes it, and generates an interactive network graph. It reveals interesting connections between historical topics discussed in the podcast episodes.\n\nKey features:\n- Scrapes podcast episode data and related Wikipedia articles\n- Processes and analyzes links between articles\n- Generates a visual network of interconnected topics\n- Provides an interactive web interface to explore the network\n\n### Tech Stack\n- Python\n- Web scraping: `requests`, `BeautifulSoup`, and `Selenium`\n- Data processing: `pandas` and `numpy`\n- Text parsing: `wikitextparser`\n- Multiprocessing\n- SQLite database \n- SvelteKit as JavaScript frontend framework\n- Network visualization with `vis.js` \n\n### Performance Optimization\n\nProcessing complete Wikipedia dumps is time-consuming, so significant effort has been put into optimizing the performance of this project:\n- The Wikipedia index file is converted to an SQLite database at the beginning so that it can be searched more easily.\n- Tasks are executed in parallel using multiprocessing.\n- Database indexing and optimized SQL queries speed up data retrieval.\n- Network data is pre-loaded and compressed to reduce initial loading times for the visualization.\n\n## Prerequisites\n- Chrome browser for Selenium WebDriver (or change the browser in `build.py`)\n- Python packages: `pip install -r requirements.txt`\n- Node.js tools: `sudo npm install terser roadroller -g`\n- Initialize frontend: `cd frontend \u0026\u0026 npm install`\n\n## Downloading Wikipedia 
Dumps\n1. Visit [dumps.wikimedia.org/dewiki](https://dumps.wikimedia.org/dewiki/) and [dumps.wikimedia.org/enwiki](https://dumps.wikimedia.org/enwiki/)\n2. Choose the latest timestamp directory\n3. Download the two topmost files: `*-pages-articles-multistream.xml.bz2` and `*-pages-articles-multistream-index.txt.bz2`\n\n## Setting Up Environment Variables\n\nYou need to set several environment variables that define the user agent, database path, and paths to the Wikipedia dump files.\n\n```bash\nGAG_USER_AGENT=\"GAG-Network (your-email@example.com)\"\nGAG_DATABASE=\"/path/to/database\"\nGAG_WIKIDUMP_DE=\"/path/to/german/wikipedia-dump.xml.bz2\"\nGAG_WIKIDUMP_DE_INDEX=\"/path/to/german/wikipedia-dump-index.txt.bz2\" \nGAG_WIKIDUMP_EN=\"/path/to/english/wikipedia-dump.xml.bz2\"\nGAG_WIKIDUMP_EN_INDEX=\"/path/to/english/wikipedia-dump-index.txt.bz2\" \n```\n\n\n## Running the Project\n\nThe project's main script (`main.py`) orchestrates the entire process of data collection, processing, and visualization creation. You can run it in different modes depending on your needs.\n\n\n### Usage\n\nYou can run `main.py` in three different modes:\n\n1. **Full Process**\n\n   To run the entire process (refresh data and create save):\n\n   ```\n   python main.py\n   ```\n\n   This will execute all steps in order:\n   - Create the database\n   - Transform icons\n   - Load the network\n   - Build the frontend\n\n2. **Refresh Data Only**\n\n   To only refresh the data without creating a new save:\n\n   ```\n   python main.py --data\n   ```\n\n   This will:\n   - Create the database\n   - Transform icons\n\n3. 
**Create Save Only**\n\n   To create a new preloaded network save without refreshing data:\n\n   ```\n   python main.py --preload\n   ```\n\n   This will:\n   - Load the network\n   - Build the frontend\n\n### Output\n\nAfter running `main.py`, you will have:\n\n- A SQLite database with scraped and processed data\n- JavaScript files containing network data and metadata\n- A preloaded network save for faster initial loading\n\n### Notes\n\n- The full process can be time-consuming, especially when processing complete Wikipedia dumps. Ensure you have sufficient computational resources available.\n- If you encounter any issues, check the console output for error messages and ensure all prerequisites are correctly set up.\n- The `--data` option is useful when you want to update the underlying data without regenerating the visualization.\n- The `--preload` option is helpful when you've made changes to the visualization code but don't need to refresh the underlying data.\n\nAfter running the script, you can view the visualization by opening `build/index.html` in a web browser.\nAn up-to-date build can also be found at [Dr-Lego/gag-network-build](https://github.com/Dr-Lego/gag-network-build).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdr-lego%2Fgag-network","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdr-lego%2Fgag-network","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdr-lego%2Fgag-network/lists"}