{"id":44167576,"url":"https://github.com/open-austin/indigent-defense-stats","last_synced_at":"2026-02-09T09:41:59.507Z","repository":{"id":38188714,"uuid":"394014478","full_name":"open-austin/indigent-defense-stats","owner":"open-austin","description":"A web scraper for collecting and processing public case records from sites using Tyler Technology's Odyssey court records database software.","archived":false,"fork":false,"pushed_at":"2024-11-26T01:29:49.000Z","size":763,"stargazers_count":25,"open_issues_count":27,"forks_count":13,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-10-11T06:24:50.564Z","etag":null,"topics":["court-cases","parser","python","scraper"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/open-austin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-08-08T16:34:31.000Z","updated_at":"2025-09-15T12:08:11.000Z","dependencies_parsed_at":"2023-01-30T16:45:26.064Z","dependency_job_id":"2a8a47b7-dc45-4281-97a5-441f4d27e8ed","html_url":"https://github.com/open-austin/indigent-defense-stats","commit_stats":{"total_commits":412,"total_committers":14,"mean_commits":"29.428571428571427","dds":"0.24757281553398058","last_synced_commit":"081aecf3c07e0bd074613319f3eb2de12451c853"},"previous_names":["open-austin/odyssey-court-records-to-json"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/open-austin/indigent-defense-stats","repository_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/open-austin%2Findigent-defense-stats","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-austin%2Findigent-defense-stats/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-austin%2Findigent-defense-stats/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-austin%2Findigent-defense-stats/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/open-austin","download_url":"https://codeload.github.com/open-austin/indigent-defense-stats/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-austin%2Findigent-defense-stats/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29261216,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-09T04:11:57.159Z","status":"ssl_error","status_checked_at":"2026-02-09T04:11:56.117Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["court-cases","parser","python","scraper"],"created_at":"2026-02-09T09:41:54.878Z","updated_at":"2026-02-09T09:41:59.497Z","avatar_url":"https://github.com/open-austin.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tyler Technologies Odyssey scraper and parser\n\nThis is a 
scraper to collect and process public case records from the Tyler Technologies Odyssey court records system. If you are a developer or want to file an issue, please read [CONTRIBUTING](CONTRIBUTING.md).\n\n## Local setup\n\n### Install toolchain\n\n1. Clone this repo and navigate to it.\n   - `git clone https://github.com/open-austin/indigent-defense-stats`\n   - `cd indigent-defense-stats`\n2. Install Pyenv if it is not already installed ([Linux/macOS](https://github.com/pyenv/pyenv) or [Windows](https://github.com/pyenv-win/pyenv-win)).\n3. Run `pyenv install` to get the right Python version.\n\n### Set up `venv`\n\nFirst, create a virtual environment. The command differs depending on your OS.\n\nOn Linux/macOS:\n\n```bash\npython -m venv .venv --prompt ids # (you can replace `ids` with any name you want)\n```\n\nOn Windows:\n\n```powershell\npython -m venv c:\\path\\to\\repo\\ids # (you can replace `ids` with any name you want)\n```\n\nNext, \"activate\" the venv. You'll need to do this every time you work in the codebase. Also tell your IDE which Python environment to use; otherwise, it will likely default to whichever `python` resolves first on your system path. 
The specific command you run depends on both your OS and your shell.\n\n| Platform | Shell      | Command to activate the virtual environment |\n| :------- | :--------- | :------------------------------------------ |\n| POSIX    | bash/zsh   | `source \u003cvenv\u003e/bin/activate`              |\n|          | fish       | `source \u003cvenv\u003e/bin/activate.fish`         |\n|          | csh/tcsh   | `source \u003cvenv\u003e/bin/activate.csh`          |\n|          | PowerShell | `\u003cvenv\u003e/bin/Activate.ps1`                 |\n| Windows  | cmd.exe    | `\u003cvenv\u003e\\Scripts\\activate.bat`             |\n|          | PowerShell | `\u003cvenv\u003e\\Scripts\\Activate.ps1`             |\n\nReplace `\u003cvenv\u003e` with the path to the virtual environment directory you created (for example, `.venv`).\n\nSource: https://docs.python.org/3/library/venv.html#how-venvs-work\n\nNote: Again, you'll need to activate the venv _every time you want to work in the codebase_.\n\nIf the above doesn't work, try these steps for creating and activating a virtual environment (Windows, cmd.exe):\n\n1. Navigate to your project directory: `cd [insert file path]`\n2. Create a virtual environment: `python -m venv venv`\n3. Activate the virtual environment: `.\\venv\\Scripts\\activate.bat`\n\n### Install Python dependencies\n\nUsing `pip`, install the project dependencies.\n\n```shell\npip install -r requirements.txt\n```\n\n### Running CLI\n\n@TODO - this section needs to be updated.\n\n1. Set the parameters for the main command:\n   - `counties`: the counties listed in the county CSV. Set the \"scrape\" column in the CSV to \"yes\" to include a county.\n   - `start_date`: the first date you want to scrape for case data. Update in the scraper.\n   - `end_date`: the last date you want to scrape for case data. Update in the scraper.\n2. Run the handler.\n   - `python run python .src/orchestrator`\n\n## Structure of Code\n\n- County Database (`resources/texas_county_data.csv`): A CSV table containing the Odyssey links and Odyssey version for each county in Texas. One column (\"scrape\") indicates whether that county should be scraped. 
Currently, Hays is the default.\n- Handler (`src/handler`): This reads the CSV to find the counties to be scraped and runs the following processes for each county. You can also set the start and end dates of the scraping period here.\n\n  - **Scraper** (`src/scraper`): This scrapes the cases for each judicial officer on each day within the period set in the handler and saves the HTML to `data/[county name]/case_html`.\n  - **Parser** (`src/parser`): This parses the HTML in the county-specific HTML folder into corresponding JSON files in `data/[county name]/case_json`.\n  - **Cleaner** (`src/cleaner`): This cleans and redacts the files in the county-specific JSON folder and writes the results to a new folder of JSON files in `data/[county name]/case_json_cleaned`.\n  - **Updater** (`src/updater`): This pushes the cleaned and redacted JSON in the county-specific cleaned JSON folder to a container in CosmosDB, where it can then be used for visualization.\n\n## Flowchart: Relationships Between Functions and Directories\n\n```mermaid\nflowchart TD\n    orchestrator{\"src/orchestrator (class): \u003cbr\u003e orchestrate (function)\"} --\u003e county_db[resources/texas_county_data.csv]\n    county_db  --\u003e |return counties where 'scrape' = 'yes'| orchestrator\n    orchestrator --\u003e|loop through these counties \u003cbr\u003e and run these four functions| scraper(1. src/scraper: scrape)\n    scraper --\u003e parser(2. src/parser: parse)\n    scraper --\u003e |create 1 HTML per case| data_html[data/county/case_html/case_id.html]\n    parser--\u003e pre2017(src/parser/pre2017)\n    parser--\u003e post2017(src/parser/post2017)\n    pre2017 --\u003e cleaner[3. src/cleaner: clean]\n    post2017 --\u003e cleaner\n    parser --\u003e |create 1 JSON per case| data_json[data/county/case_json/case_id.json]\n    cleaner --\u003e |look for charge in db\u003cbr\u003eand normalize it to uccs| charge_db[resources/umich-uccs-database.json]\n    charge_db --\u003e cleaner\n    cleaner --\u003e updater(4. 
src/updater: update)\n    cleaner --\u003e |create 1 JSON per case| data_json_cleaned[data/county/case_json_cleaned/case_id.json]\n    updater --\u003e |send final cleaned JSON to CosmosDB container| CosmosDB_container[CosmosDB container]\n    CosmosDB_container --\u003e visualization{live visualization}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-austin%2Findigent-defense-stats","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopen-austin%2Findigent-defense-stats","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-austin%2Findigent-defense-stats/lists"}