{"id":22643173,"url":"https://github.com/hochfrequenz/edi_energy_scraper","last_synced_at":"2025-09-05T00:41:24.399Z","repository":{"id":58864206,"uuid":"534141657","full_name":"Hochfrequenz/edi_energy_scraper","owner":"Hochfrequenz","description":"Python script to download/mirror edi-energy.de and sort the files (more structured than `wget --mirror`)","archived":false,"fork":false,"pushed_at":"2025-04-07T18:17:55.000Z","size":2961,"stargazers_count":1,"open_issues_count":8,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-07T19:25:20.720Z","etag":null,"topics":["ahb","anwendungshandbuch","beautifulsoup4","edi-energy","energiewirtschaft","message-implementation-guide","mig"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Hochfrequenz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-08T09:31:12.000Z","updated_at":"2025-04-07T18:09:53.000Z","dependencies_parsed_at":"2023-02-15T08:16:42.754Z","dependency_job_id":"c7a54346-6f84-483a-90ca-56ddf83e0a77","html_url":"https://github.com/Hochfrequenz/edi_energy_scraper","commit_stats":{"total_commits":33,"total_committers":2,"mean_commits":16.5,"dds":0.303030303030303,"last_synced_commit":"0f2c369f24513d0a5498afbde7d5fe97a28fab9a"},"previous_names":[],"tags_count":38,"template":false,"template_full_name":"Hochfrequenz/python_template_repository","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hochfrequenz%2Fedi_energy_scraper","tags_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/Hochfrequenz%2Fedi_energy_scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hochfrequenz%2Fedi_energy_scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Hochfrequenz%2Fedi_energy_scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Hochfrequenz","download_url":"https://codeload.github.com/Hochfrequenz/edi_energy_scraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248358555,"owners_count":21090402,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ahb","anwendungshandbuch","beautifulsoup4","edi-energy","energiewirtschaft","message-implementation-guide","mig"],"created_at":"2024-12-09T05:09:33.926Z","updated_at":"2025-04-11T06:55:14.687Z","avatar_url":"https://github.com/Hochfrequenz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# edi-energy.de scraper\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n![Unittests status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Unittests/badge.svg)\n![Coverage status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Coverage/badge.svg)\n![Linting status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Linting/badge.svg)\n![Black status badge](https://github.com/Hochfrequenz/edi_energy_scraper/workflows/Black/badge.svg)\n![PyPi Status 
Badge](https://img.shields.io/pypi/v/edi_energy_scraper)\n![Python Versions (officially) supported](https://img.shields.io/pypi/pyversions/edi_energy_scraper.svg)\n\nThe Python package `edi_energy_scraper` provides easy-to-use methods to mirror the free documents on bdew-mako.de.\n\n### Rationale / Why?\n\nIf you'd like to be informed about new regulations or data formats being published on bdew-mako.de, you can either\n\n- visit the site every day and hope that you see the changes if this is your favourite hobby,\n- or automate the task.\n\nThis repository helps you with the latter. It allows you to create an up-to-date copy of edi-energy.de on your local\ncomputer. Unlike mirroring the files with `wget` or `curl`, this gives you a clean and intuitive directory\nstructure.\n\nFrom there you can, e.g., commit the files into a VCS (like our [edi_energy_mirror](https://github.com/Hochfrequenz/edi_energy_mirror)) or scrape the PDF/Word files for later use.\n\nWe're all hoping for the day of true digitization on which this repository will become obsolete.\n\n### See also\nThere is a similar project in C# by Fabian Wetzel: [fabsenet/edi-energy-extractor](https://github.com/fabsenet/edi-energy-extractor/).\nUnlike this project, it stores the downloaded data in a database instead of a file system.\nIt also works with `bdew-mako.de`.\n\n## How to use the Package (as a user)\n\nInstall via pip:\n\n```bash\npip install edi_energy_scraper\n```\n\nCreate a directory in which you'd like to save the mirrored data:\n\n```bash\nmkdir edi_energy_de\n```\n\nThen import it and start the download:\n\n```python\nimport asyncio\nfrom edi_energy_scraper import EdiEnergyScraper\n\n\n# add the following lines to enable debug logging to stdout (CLI)\n# import logging\n# import sys\n# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n\nasync def mirror():\n    scraper = EdiEnergyScraper(path_to_mirror_directory=\"edi_energy_de\")\n    await scraper.mirror()\n\n\nif 
__name__ == \"__main__\":\n    # asyncio.run() creates and closes its own event loop\n    asyncio.run(mirror())\n```\n\nThis creates a directory structure:\n\n```\n-|-your_script_cwd.py\n |-edi_energy_de\n    |- FV2310 (contains files valid since 2023-10-01)\n        |- ahb.pdf\n        |- ahb.docx\n        |- ...\n    |- FV2404 (contains files valid since 2024-04-03)\n        |- mig.pdf\n        |- mig.docx\n        |- ...\n    |- FV2504 (contains files valid since 2025-06-06)\n        |- allgemeine_festlegungen.pdf\n        |- schema.xsd\n        |- ...\n```\n\n\u003e [!TIP]\n\u003e You can extract the information encoded in the filenames:\n\u003e ```python\n\u003e from edi_energy_scraper import DocumentMetadata\n\u003e structured_information = DocumentMetadata.from_filename(\"AHB_COMDIS_1.0f_99991231_20250605_20250605_8872.pdf\")\n\u003e # example result (shown for a different file): DocumentMetadata(kind='MIG', edifact_format=\u003cEdifactFormat.REQOTE: 'REQOTE'\u003e, valid_from=datetime.date(2023, 9, 30), valid_unt...traordinary_publication=True, is_error_correction=False, is_informational_reading_version=True, additional_text=None, id=10071)\n\u003e ```\n\n## How to use this Repository on Your Machine (for development)\n\nPlease follow the instructions in\nour [Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine).\nFor further information, see the [Tox Repository](https://github.com/tox-dev/tox).\n\n## Contribute\n\nYou are very welcome to contribute to this repository by opening a pull request against the main branch.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhochfrequenz%2Fedi_energy_scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhochfrequenz%2Fedi_energy_scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhochfrequenz%2Fedi_energy_scraper/lists"}