{"id":29572761,"url":"https://github.com/maxblee/court_data_collection","last_synced_at":"2025-07-19T05:11:16.392Z","repository":{"id":52300253,"uuid":"226866354","full_name":"maxblee/court_data_collection","owner":"maxblee","description":"This is a project designed to collect civil court data from states with unified court systems. ","archived":false,"fork":false,"pushed_at":"2021-06-02T00:46:52.000Z","size":22,"stargazers_count":0,"open_issues_count":2,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2023-03-08T20:57:10.673Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/maxblee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-09T12:30:03.000Z","updated_at":"2019-12-09T19:11:09.000Z","dependencies_parsed_at":"2022-09-12T17:50:50.375Z","dependency_job_id":null,"html_url":"https://github.com/maxblee/court_data_collection","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"purl":"pkg:github/maxblee/court_data_collection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxblee%2Fcourt_data_collection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxblee%2Fcourt_data_collection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxblee%2Fcourt_data_collection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxblee%2Fcourt_data_collection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/maxblee","download_url":
"https://codeload.github.com/maxblee/court_data_collection/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/maxblee%2Fcourt_data_collection/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265892544,"owners_count":23845039,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-19T05:11:15.157Z","updated_at":"2025-07-19T05:11:16.381Z","avatar_url":"https://github.com/maxblee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Court Data Collection Project\n\nThis is a project designed to collect court-specific information for a set\nof cases that have unified court systems (i.e. where there is a single court\ndatabase system for the entire state or for large portions of the state).\n\nThroughout this README, you'll see information about which court cases we've collected data from\nand what information we're collecting using individual scrapers.\n\n## Table of Contents\n\n- **[Installation](#installation)**\n- **[Structure](#structure)**\n- **[States](#states)**\n    - **[Connecticut](#connecticut)**\n- **[Steps Forward](#steps-forward)**\n\n## Installation\n\nIn order to run this program, you will need Firefox Geckodriver installed, ideally in its default location (e.g. `/usr/bin/geckodriver`). 
Instructions are available [here](https://github.com/mozilla/geckodriver).\n\nTo install the other requirements, simply type\n\n```shell\npipenv install\n```\n\nat the root of this repository.\n\n## Structure\n\n### API\n\nAll of these scrapers are designed to be fairly easy to use from the perspective of an end-user. All of the scrapers share the same basic\ninterface, built on Python's `with` statement.\n\nTo get all cases occurring on a single date, simply type:\n\n```python\nfrom datetime import date\nfrom court_scrapers.\u003c2-letter state abbreviation\u003e import \u003cScraper Name\u003e\n\ndate_query = date(year, month, day)\n\nwith \u003cScraper Name\u003e as court_cases:\n    query_results = court_cases.get_court_cases(date_query)\n```\n\nGetting all cases in a date range is similarly easy:\n\n```python\nwith \u003cScraper Name\u003e as court_cases:\n    query_results = court_cases.collect_cases(start_date, end_date)\n```\n\n### Developer Guide\n\nFrom the perspective of a developer, it's a little bit more complicated, but not too much. The scrapers generally inherit from an abstract base class, `SeleniumBase`. That base class has a number of convenience methods, like simple date validation, designed to reduce the amount of code that developers of future scrapers will need to write. \n\nHowever, in some cases, you might need to override those methods. So keep in mind that, by default, the base class validates date queries and implements `collect_cases` by iterating over `get_court_cases`.\n\nAdditionally, cases should generally contain the following fields:\n\n```python\nCASE_FIELDS = [\n    \"case_num\", \"case_type\", \"date_filed\", \"parties\", \"court_location\"\n]\n```\n\nFiles containing state scrapers should be named by their lowercase two-letter state abbreviation (e.g. `ct.py`). And the scrapers themselves should be given predictable names, containing a state name and a description of the type of court. 
However, there aren't great ways of standardizing this since states have differing court structures.\n\n#### Testing\n\nI use `pytest` for testing in this project. Tests requiring scraping\nshould be left in their own files, one for each state (e.g. `tests/test_ct.py`). This allows developers to easily run tests for one state or to run generalized tests that run faster than fully fledged scraping jobs.\n\n## States\n\n### Connecticut\n\n`ConnecticutCivil`\n\n- Courts supported: Civil court\n- Date queries only work for the current date or future dates. The Connecticut court search system for civil court does not allow you to query dates in the past. For this reason, people considering using this tool should likely run only `get_court_cases` and use a scheduler (like a cron job on an EC2 instance) to update the cases.\n\n## Steps Forward\n\nFrom here, there are a few places where I really need to improve the script. \n\n1. First of all, the error handling should be improved, particularly in how it handles duplicate cases. Right now, I'm using sets to reduce the number of duplicate rows, but without even trimming whitespace, the accuracy of these deduplication efforts is pretty limited.\n2. This tool should ideally enable people to update a running database of collected court cases (especially since Connecticut does not allow you to query historic civil cases). \n3. Finally, we need to add more states into this database. I know for a fact that Virginia has a) a unified court system and b) a court system with date fields that would allow you to write similar scrapers. 
I imagine other states do as well.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxblee%2Fcourt_data_collection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaxblee%2Fcourt_data_collection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaxblee%2Fcourt_data_collection/lists"}