{"id":13811976,"url":"https://github.com/niklaswais/gesp","last_synced_at":"2025-05-14T20:31:18.373Z","repository":{"id":44484826,"uuid":"505325934","full_name":"niklaswais/gesp","owner":"niklaswais","description":null,"archived":false,"fork":false,"pushed_at":"2023-07-16T04:27:24.000Z","size":243,"stargazers_count":16,"open_issues_count":6,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-09-28T23:16:05.463Z","etag":null,"topics":["court-decisions","preprocessing","web-scraping"],"latest_commit_sha":null,"homepage":"https://nwais.de/gesp","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/niklaswais.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-06-20T06:48:02.000Z","updated_at":"2024-05-01T15:31:44.000Z","dependencies_parsed_at":"2023-01-20T15:30:52.721Z","dependency_job_id":"4d3c25f7-317e-4f8f-aa7e-2229a0d247ad","html_url":"https://github.com/niklaswais/gesp","commit_stats":{"total_commits":95,"total_committers":5,"mean_commits":19.0,"dds":"0.12631578947368416","last_synced_commit":"34e2eb6e05066386839b3c05baa5075eaaeddb1b"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklaswais%2Fgesp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklaswais%2Fgesp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklaswais%2Fgesp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/niklaswais%2Fgesp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/owners/niklaswais","download_url":"https://codeload.github.com/niklaswais/gesp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225307868,"owners_count":17453889,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["court-decisions","preprocessing","web-scraping"],"created_at":"2024-08-04T04:00:43.653Z","updated_at":"2025-05-14T20:31:18.344Z","avatar_url":"https://github.com/niklaswais.png","language":"Python","funding_links":[],"categories":["Europe \u0026 European Union"],"sub_categories":["Germany"],"readme":"**Copyright notice**: Automated retrieval of decisions from federal and state databases is permitted for non-commercial purposes only. Since gesp accesses these databases, the use of gesp is also permitted for **non-commercial purposes only**.\n\n## gesp: convenient scraping of German court decisions\n\nThe federal and state governments in Germany make court decisions available for download on individual online platforms. In addition to the lack of uniformity, these platforms only allow individual retrieval out of the box. With gesp, decisions can be downloaded in large quantities in a filter-based and reproducible manner.\n\n### A. Installation\nUsing **pip**:\n```Shell\npython -m pip install gesp \n```\n\n**Or** download \u0026 build the package (requires \"build\")...\n```Shell\ngit clone https://github.com/niklaswais/gesp \u0026\u0026 cd gesp \u0026\u0026 python -m build\n```\n... and install the local .tar.gz:\n```Shell\npython -m pip install dist/gesp-0.2.0.tar.gz\n```\n\n### B. 
Basic Usage\nA call without command-line arguments will result in the retrieval of all **machine-readable** (= non-PDF) court decisions. If only a **subset** is to be downloaded, the arguments **\"-s\"** (followed by abbreviations of states) and **\"-c\"** (followed by abbreviations of court types) can be used. Multiple states or court types are separated by commas.\n```Shell\npython -m gesp -s bund,by,hh,nw -c bgh,ag,lg,olg\n```\nSince Saxony and Bremen provide court decisions only as PDF files, they are excluded when gesp is run without flags. An explicit call nevertheless makes the corresponding files available (-s sn,hb).\n\nA specific path under which the decisions are to be stored can be specified with the argument \"-p\". If the folder has not been created yet, gesp will take care of that. If the folder has already been created and contains the results of a previous execution, this will cause an **update** of the dataset.\n```Shell\npython -m gesp -p path/to/folder\n```\n\nAn existing fingerprint (see D.) can be used to **reconstruct** a dataset. To do so, the path to the fingerprint file must be passed as an argument using \"-fp\". Naturally, \"-c\" and \"-s\" arguments are not allowed in this case.\n```Shell\npython -m gesp -fp /path/to/fingerprint\n```\n\n### C. Results\nIf no specific path is passed with \"-p\", gesp will create a folder for the results in the current working directory (\"results/\"). The name of the subfolder is based on the date and time of execution to avoid conflicts in subsequent runs. Decisions that are available as **html/xhtml** files are preferentially downloaded as such. However, some federal states unfortunately provide decisions only as PDF files. The editable documents are minimally cleaned up (e.g., print dialogs and navigation menus are removed), but **not pre-processed**, unless \"-pp\" is used (see E.).\n\n### D. 
Reproducibility\nTo create a fingerprint file alongside the downloaded court decisions in the results folder (\"fp.xz\"), set the \"-fp\" **flag**. If you want to reconstruct the dataset of a previous run, e.g. because you are working on multiple machines or in a team, simply share the fingerprint file. Using the fingerprint file by means of \"-fp\" as an **argument** will result in the assembly of an identical collection.\n\nThe fingerprinting feature of gesp can also be used to meet good scientific practice standards without the need to provide large collections of data. Since it is part of good scientific practice to disclose the data basis of the results obtained, publications on the empirical study of court decisions must be accompanied by relatively large data sets. Instead of making the entire collection of decisions available for retrieval online, simply share the fingerprint file that others may use to retrieve your data.\n\n### E. Pre-Processing\nThe use of \"-pp\" activates pre-processing. A separate subfolder is created in the \"results\" folder for the subsequent outputs.\n\n### F. Delayed Retrieval\nYou can use the argument \"-w\" to add a delay between two consecutive downloads of decisions from the same source. This reduces the server load for the provider of the decisions and can prevent bans.\n\n### G. Appendix\n#### 1. Abbreviations for \"-s\" (federal/states)\n| Name | Abbreviation |\n| --- | --- |\n| Federal | *bund* |\n| Baden-Württemberg | *bw* |\n| Bavaria | *by* |\n| Berlin | *be* |\n| Brandenburg | *bb* |\n| Bremen | *hb* |\n| Hamburg | *hh* |\n| Hesse | *he* |\n| Mecklenburg-Vorpommern | *mv* |\n| Lower Saxony | *ni* |\n| North Rhine-Westphalia | *nw* |\n| Rhineland-Palatinate | *rp* |\n| Saarland | *sl* |\n| Saxony | *sn* |\n| Saxony-Anhalt | *st* |\n| Schleswig-Holstein | *sh* |\n| Thuringia | *th* |\n\n#### 2. 
Abbreviations for \"-c\" (court types)\n| Name | Abbreviation |\n| --- | --- |\n| Amtsgerichte | *ag* |\n| Arbeitsgerichte | *arbg* |\n| Bundesgerichtshof | *bgh* |\n| Bundesfinanzhof | *bfh* |\n| Bundesverwaltungsgericht | *bverwg* |\n| Bundesverfassungsgericht | *bverfg* |\n| Bundespatentgericht | *bpatg* |\n| Bundesarbeitsgericht | *bag* |\n| Bundessozialgericht | *bsg* |\n| Finanzgerichte | *fg* |\n| Landesarbeitsgerichte | *lag* |\n| Landgerichte | *lg* |\n| Landessozialgerichte | *lsg* |\n| Landesverfassungsgerichte | *verfgh* |\n| Oberlandesgerichte (incl. KG, BayObLG) | *olg* |\n| Oberverwaltungsgerichte (incl. vgh) | *ovg* |\n| Sozialgerichte | *sg* |\n \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniklaswais%2Fgesp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fniklaswais%2Fgesp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fniklaswais%2Fgesp/lists"}