https://github.com/iamshouvikmitra/bharat-courts
Programmatically access Indian court data from eCourts. Search cases, download orders, and get cause lists across 700+ District Courts, 25 High Courts and the Supreme Court. Robust web scraper with built-in OCR based captcha resolvers.
- Host: GitHub
- URL: https://github.com/iamshouvikmitra/bharat-courts
- Owner: iamshouvikmitra
- License: mit
- Created: 2026-03-02T18:35:36.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-25T22:23:51.000Z (about 2 months ago)
- Last Synced: 2026-04-26T00:32:31.121Z (about 2 months ago)
- Topics: ecourts, high-courts, indian-laws, lawyers, sci, supreme-court
- Language: Python
- Homepage: https://pypi.org/project/bharat-courts/
- Size: 229 KB
- Stars: 5
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# bharat-courts
> Async Python SDK for Indian court data — search cases, download orders, and access cause lists from eCourts and the Supreme Court.
---
## What is this?
India's eCourts platform holds millions of case records across 25+ High Courts, 700+ District Courts, and the Supreme Court — but there's no official API. Checking case status means navigating clunky portals, solving CAPTCHAs by hand, and copy-pasting results one at a time.
**bharat-courts** fixes that. It gives you — and your AI assistant — direct programmatic access to:
- **Find any judgment with one call** — `Judgments().find(judge=..., year=..., text=..., cnr=...)` routes to the right backend (archive vs live) and returns a uniform result. The SDK does the heavy lifting; you describe the data, not where it lives.
- **Track matters** — search by case number, party name, or advocate across any High Court or District Court
- **Download orders & judgments** — get PDFs for all orders in a case with one call
- **Monitor cause lists** — see which cases are listed before which bench, every day
- **Pull recent Supreme Court judgments** — scrape the homepage's "Latest Judgements / Orders" feed and download the PDFs
- **Query the historical archive** — instant offline search across SCI judgments from 1950 onwards and judgments from 25 High Courts (CC-BY-4.0 AWS Open Data, no CAPTCHA, no rate limits)
- **Access District Courts** — dynamically discover courts across 36 states/UTs and search 700+ court complexes
- **Bulk download judgments** — paginate through results, batch-download PDFs with automatic session management
- **Automate CAPTCHA handling** — built-in OCR solver, ONNX solver, or plug in your own
Works standalone as a Python library, as a CLI tool, or as an **AI agent skill** — install it into Claude Code, GitHub Copilot, or any MCP-compatible assistant and ask questions in plain English.
Built for practicing lawyers, litigation teams, legal researchers, legal aid organizations, and legal tech builders.
## Installation
```bash
pip install bharat-courts
# With automatic CAPTCHA solving (recommended)
pip install bharat-courts[ocr]
# With lightweight ONNX CAPTCHA solver (alternative to ddddocr)
pip install bharat-courts[onnx]
# With CLI
pip install bharat-courts[cli]
# With historical-archive support (DuckDB over AWS Open Data buckets)
pip install bharat-courts[archive]
# Everything (OCR + ONNX + CLI + archive + dev tools)
pip install bharat-courts[all]
```
**Requires Python 3.11+**
## Quick Start
### Find a judgment without picking a backend
The `Judgments` facade is the recommended entry point for "find a judgment matching some criteria" — it owns both the archive and live clients, picks the right one per query, and returns a uniform `Judgment` list. Use this unless you specifically need a portal-only feature (cause list, live case status, district-court drill-down).
```python
import asyncio

from bharat_courts import Judgments


async def main():
    async with Judgments() as j:
        # Structured filters → archive (no CAPTCHA, partition-pruned)
        for r in await j.find(judge="chandrachud", year=(2018, 2024), court="sci", limit=10):
            print(f"{r.decision_date} {r.case_id} {r.title}")

        # Free-text → live (only the live portal does full-text)
        for r in await j.find(text="right to privacy", limit=5):
            print(f"{r.decision_date} {r.title} [{r.source}]")

        # CNR alone: the prefix routes to the right bucket, no full scan
        result = await j.find(cnr="DLHC010230802020")
        pdf = await j.fetch_pdf(result[0])
        with open("judgment.pdf", "wb") as f:
            f.write(pdf)

        # Force a specific backend if you need to
        archive_only = await j.find(text="bail", source="archive", limit=5)


asyncio.run(main())
```
Routing rules (see API reference for details):
| Filter shape | Backend |
|---|---|
| `cnr=` | archive (prefix → court → partition) |
| `text=` only | live (only it does full-text body search) |
| structured only (judge/party/year/court/citation) | archive |
| `text=` + structured | archive — `text` folds into a title-substring match |
Each returned `Judgment` carries a `source` field (`"archive"` or `"live"`) so consumers can still tell where it came from when that matters.
### Find all pending matters for your client
```python
import asyncio

from bharat_courts import get_court, HCServicesClient
from bharat_courts.captcha.ocr import OCRCaptchaSolver


async def main():
    delhi = get_court("delhi")
    solver = OCRCaptchaSolver()
    async with HCServicesClient(captcha_solver=solver) as client:
        cases = await client.case_status_by_party(
            delhi,
            party_name="Reliance Industries",
            year="2024",
            status_filter="Pending",
        )
        for case in cases:
            print(f"{case.case_number}: {case.petitioner} v {case.respondent}")
            print(f" CNR: {case.cnr_number}")


asyncio.run(main())
```
### Check case status and download orders
```python
async with HCServicesClient(captcha_solver=solver) as client:
    # Look up a specific writ petition. `case_type` is the numeric code from
    # list_case_types(). Codes are listed per court and bench ("134" is
    # W.P.(C) on Delhi HC), so confirm the value for the court you query.
    cases = await client.case_status(
        get_court("bombay"),
        case_type="134",
        case_number="4520",
        year="2023",
    )
    # case_type on the result is now a label like "W.P.(C)" (from the
    # portal's type_name field). The showRecords endpoint does not return
    # case status, so case.status is always empty.
    print(f"{cases[0].case_type} {cases[0].case_number} — CNR: {cases[0].cnr_number}")

    # Download all orders for the case
    orders = await client.court_orders(
        get_court("bombay"),
        case_type="134",
        case_number="4520",
        year="2023",
    )
    for order in orders:
        print(f"{order.order_date} — {order.order_type} by {order.judge}")
        # download_order_pdf raises RuntimeError if the portal hands back its
        # 30-byte BOM+error string instead of a real PDF.
        pdf = await client.download_order_pdf(order.pdf_url)
        with open(f"order_{order.order_date}.pdf", "wb") as f:
            f.write(pdf)
```
### Get tomorrow's cause list before court
```python
pdfs = await client.cause_list(
    get_court("delhi"),
    civil=True,
    causelist_date="03-03-2026",  # DD-MM-YYYY
)
for pdf in pdfs:
    print(f"{pdf.bench} — {pdf.cause_list_type}")
    print(f" Download: {pdf.pdf_url}")
```
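The `causelist_date` value above is a `DD-MM-YYYY` string. A minimal sketch of building tomorrow's value with the standard library follows; `causelist_date_for` and `tomorrow` are hypothetical helper names, not part of the SDK.

```python
from datetime import date, timedelta
from typing import Optional


def causelist_date_for(day: date) -> str:
    """Format a date as DD-MM-YYYY, the shape shown for causelist_date."""
    return day.strftime("%d-%m-%Y")


def tomorrow(today: Optional[date] = None) -> date:
    """Return the calendar day after `today` (defaults to the local date)."""
    return (today or date.today()) + timedelta(days=1)


# Hypothetical usage: pass the string as causelist_date=... to cause_list().
print(causelist_date_for(tomorrow(date(2026, 3, 2))))  # 03-03-2026
```

The sketch does not skip weekends or court holidays; whether a list exists for a given day is up to the portal.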
### Search District Court cases
```python
from bharat_courts import DistrictCourtClient
from bharat_courts.districtcourts.parser import parse_complex_value

async with DistrictCourtClient(captcha_solver=solver) as client:
    # Discover the court hierarchy
    districts = await client.list_districts("8")  # Bihar
    complexes = await client.list_complexes("8", "1")  # Patna district

    # Parse complex value to get code + establishment info
    complex_val = list(complexes.keys())[-1]  # e.g. "1080010@2,3,4@Y"
    code, ests, needs_est = parse_complex_value(complex_val)
    est = ests[0] if needs_est else ""

    # Look up case types. The portal returns codes as compound strings
    # containing "^" (e.g. "89^2"); pass them back verbatim.
    case_types = await client.list_case_types("8", "1", code, est)
    # {"89^2": "ADMINISTRATIVE SUITE", "152^2": "Anticipatory Bail - ABP", ...}

    cases = await client.case_status(
        state_code="8", dist_code="1",
        court_complex_code=code, est_code=est,
        case_type="89^2",  # full compound code, not just "89"
        case_number="100", year="2024",
    )
    for case in cases:
        print(f"{case.case_number}: {case.petitioner} v {case.respondent}")
```
### List recent Supreme Court judgments
```python
from bharat_courts import SCIClient

# www.sci.gov.in surfaces the 50 most recent items inline on the homepage.
# No CAPTCHA, no search form: just scrape the feed.
async with SCIClient() as client:
    recent = await client.list_recent_judgments(limit=10)
    for j in recent:
        print(f"{j.judgment_date}: {j.title}")
        print(f" Diary: {j.source_id} {j.case_number}")
        # Download via /sci-get-pdf/?diary_no=... (portal viewer URL).
        await client.download_pdf(j)
        if j.pdf_bytes:
            with open(f"sci_{j.source_id}.pdf", "wb") as f:
                f.write(j.pdf_bytes)
```
(Date-range and party-name search against the legacy `main.sci.gov.in` host no longer works: that host permanently returns HTTP 503, and the live `www.sci.gov.in` portal gates those flows behind a CAPTCHA-protected case-no/diary-no form that the SDK does not yet wire up. `search_by_year` and `search_by_party` raise `NotImplementedError`.)
### Query the historical archive (no CAPTCHA, no rate limits)
For research workloads — "find every judgment by Justice X", "all 2020 Delhi HC writ
petitions", bulk PDF retrieval — use the `ArchiveClient`, which reads the public
[AWS Open Data buckets](https://registry.opendata.aws/indian-supreme-court-judgments/)
maintained by Dattam Labs: SCI judgments from 1950 onwards and 25 High Courts.
```python
from bharat_courts import ArchiveClient

async with ArchiveClient() as client:
    # Substring match on judge, year range, partition-pruned in DuckDB.
    results = await client.search(
        court="sci", judge="chandrachud", year=(2018, 2024), limit=20,
    )
    for j in results:
        print(f"{j.decision_date} {j.case_id} {j.title}")
        print(f" {j.citation} outcome: {j.disposal_nature}")

    # Stream every Delhi 2020 judgment (~18k) without holding them all in memory.
    # process() stands for your own per-judgment handler.
    async for j in client.iter_judgments(court="delhi", year=2020, batch_size=500):
        process(j)

    # Fetch the PDF. The CNR alone is enough; the SDK infers the source.
    pdf_bytes = await client.fetch_pdf("DLHC010230802020")
    # SCI judgments default to English; pass language="hindi" / "tamil" / etc.
    sci_pdf = await client.fetch_pdf("ESCR010000301950", language="english")
```
Notes:
- **Freshness gap**: the buckets update bi-monthly (SCI) and quarterly (HC). For
judgments delivered in the last 2–3 months, fall back to `JudgmentSearchClient`.
- **Cache**: PDFs and parquet shards cache under `~/.cache/bharat-courts/archive/`.
Default cap is 5 GiB (`BHARAT_COURTS_ARCHIVE_CACHE_MAX_GB`); metadata TTL is
30 days (`BHARAT_COURTS_ARCHIVE_METADATA_TTL_DAYS`).
- **License**: data is CC-BY-4.0 — attribute Dattam Labs / the eCourts platform
when redistributing.
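
As a rough illustration of the two cache settings above, the sketch below reads the documented environment variables and falls back to the documented defaults (5 GiB, 30 days). How the SDK itself parses these values is not stated here, so `archive_cache_limits` is a hypothetical helper and the float parsing is an assumption.

```python
import os
from datetime import timedelta

GIB = 1024 ** 3  # bytes in one gibibyte


def archive_cache_limits(env=os.environ):
    """Return (max cache size in bytes, metadata TTL) from the environment.

    Assumption: both variables hold plain numbers; the SDK may be stricter.
    """
    max_gb = float(env.get("BHARAT_COURTS_ARCHIVE_CACHE_MAX_GB", "5"))
    ttl_days = float(env.get("BHARAT_COURTS_ARCHIVE_METADATA_TTL_DAYS", "30"))
    return int(max_gb * GIB), timedelta(days=ttl_days)


# With neither variable set, the documented defaults apply.
print(archive_cache_limits({}))
```

To change a limit, set the variable in the shell or process environment before the client starts.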
### Use with AI agents (Claude Code, Copilot, etc.)
Install the bundled skill so your AI assistant can look up court data for you in natural language:
```bash
bharat-courts install-skills
```
Then just ask your AI agent:
> "Find all pending writ petitions for Tata Motors in Delhi High Court from 2024"
> "Download the latest order in WP(C) 4520/2023 before the Bombay High Court"
> "What's on the cause list for Karnataka High Court tomorrow?"
> "Search for cases filed by State of Bihar in Patna district court in 2024"
> "Show me the most recent Supreme Court judgments from this week"
The agent uses bharat-courts under the hood — handles CAPTCHA, sessions, and parsing automatically.
### JSON serialization
All models support `to_dict()` and `to_json()` — pipe results into spreadsheets, dashboards, or case management tools:
```python
import json

cases = await client.case_status_by_party(delhi, party_name="HDFC", year="2024")

# Export to JSON for your case tracker
with open("matters.json", "w") as f:
    json.dump([c.to_dict(exclude_none=True) for c in cases], f, indent=2)
```
```json
[
  {
    "case_number": "3/2024",
    "case_type": "W.P.(C)",
    "cnr_number": "DLHC010582482024",
    "filing_number": "213400000032024",
    "registration_number": "3",
    "petitioner": "HDFC BANK LTD.",
    "respondent": "UNION OF INDIA & ORS.",
    "court_name": "Delhi High Court",
    "judges": []
  }
]
```
Note: `status`, `registration_date`, `judges`, and `next_hearing_date` are not returned by the live `showRecords` endpoint and stay empty/null. They live behind the case-history endpoint which the SDK does not call yet.
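
To make the effect of `exclude_none=True` concrete, here is a minimal sketch using a dataclass. `CaseSketch` is a hypothetical stand-in, not the SDK's `CaseInfo`, and the filtering shown is an assumption about what "exclude none" means: fields equal to `None` are dropped, while empty lists such as `judges` are kept, which matches the sample output above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class CaseSketch:
    """Hypothetical stand-in for a result model; not the SDK's CaseInfo."""
    case_number: str
    cnr_number: str
    status: Optional[str] = None
    next_hearing_date: Optional[str] = None
    judges: list = field(default_factory=list)

    def to_dict(self, exclude_none: bool = False) -> dict:
        data = asdict(self)
        if exclude_none:
            # Drop unset fields; empty lists survive because they are not None.
            data = {k: v for k, v in data.items() if v is not None}
        return data


case = CaseSketch(case_number="3/2024", cnr_number="DLHC010582482024")
print(json.dumps(case.to_dict(exclude_none=True), indent=2))
```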
## Supported Portals
| Source | Client | Status |
|--------|--------|--------|
| **Federated (archive + live)** | **`Judgments`** | **Recommended entry point — one `find()` call routes to archive vs live by query shape** |
| [HC Services](https://hcservices.ecourts.gov.in) | `HCServicesClient` | Fully working |
| [District Courts](https://services.ecourts.gov.in) | `DistrictCourtClient` | Case status, orders, cause lists across 700+ courts |
| [Judgment Search](https://judgments.ecourts.gov.in) | `JudgmentSearchClient` | Search, pagination, bulk PDF download |
| [Supreme Court](https://www.sci.gov.in) | `SCIClient` | Recent judgments feed + PDF download (case-no search not yet implemented) |
| [Calcutta High Court](https://calcuttahighcourt.gov.in) | `CalcuttaHCClient` | Order/judgment search + PDF download (direct from HC website) |
| [SCI Archive](https://registry.opendata.aws/indian-supreme-court-judgments/) (S3, CC-BY-4.0) | `ArchiveClient` | DuckDB metadata search + PDF retrieval, 1950–present; SCI bi-monthly + 25 HCs quarterly |
| [HC Archive](https://registry.opendata.aws/indian-high-court-judgments/) (S3, CC-BY-4.0) | `ArchiveClient` | Same as above; routed via the unified `ArchiveClient` |
## API Reference
### `Judgments` (federated facade)
The recommended entry point for finding judgments. Owns an `ArchiveClient` and a `JudgmentSearchClient` internally (lazy-initialised), picks the right backend per query, and returns a uniform `list[Judgment]`. Reach for the portal-specific clients only when you need something the facade doesn't expose (cause lists, case status, district-court drill-down).
```python
from bharat_courts import Judgments
async with Judgments() as j:
    ...
```
The facade needs no install extra of its own; it uses whichever backends are installed. For best behaviour, install both:
```bash
pip install 'bharat-courts[archive,ocr]'
```
---
#### `find(*, text=None, court=None, year=None, judge=None, party=None, citation=None, cnr=None, source="auto", limit=50) -> list[Judgment]`
Run a search, routing transparently between the archive and the live portal.
| Parameter | Type | Description |
|-----------|------|-------------|
| `text` | `str \| None` | Free-text keyword search. Routes to live (only the live judgments portal does full-body search). |
| `court` | `Court \| str \| None` | Court object or code (`"sci"`, `"delhi"`). |
| `year` | `int \| tuple[int, int] \| None` | Single year or inclusive range. Drives archive partition pruning — strongly recommended for non-CNR queries. |
| `judge` | `str \| None` | Substring on the judge field (archive). |
| `party` | `str \| None` | Substring on petitioner/respondent (SCI) or title (HC). |
| `citation` | `str \| None` | Citation substring (archive, SCI only). |
| `cnr` | `str \| None` | Exact CNR match. Auto-routes via the 4-letter prefix when `source="auto"`. |
| `source` | `"auto" \| "archive" \| "live"` | Override the automatic routing. |
| `limit` | `int` | Total results to return (default 50). |
**Returns:** `list[Judgment]` with `source` set to `"archive"` or `"live"` per item.
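
The `year` parameter accepts either a single year or an inclusive `(start, end)` pair. The sketch below only illustrates those inclusive semantics; `expand_years` is a hypothetical helper, and how the SDK maps years onto archive partitions is not described here.

```python
from typing import List, Tuple, Union

YearFilter = Union[int, Tuple[int, int]]


def expand_years(year: YearFilter) -> List[int]:
    """Turn a single year or an inclusive (start, end) pair into a list of years."""
    if isinstance(year, int):
        return [year]
    start, end = year
    if start > end:
        raise ValueError(f"year range is reversed: {year!r}")
    return list(range(start, end + 1))  # end + 1 because the range is inclusive


print(expand_years((2018, 2024)))  # [2018, 2019, 2020, 2021, 2022, 2023, 2024]
```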
**Routing (in `source="auto"` mode):**
| filter shape | backend |
|---|---|
| `cnr=` set | archive (prefix-routed; no scan) |
| `text=` set, no structured filters | live |
| structured filters only (court/year/judge/party/citation) | archive |
| `text=` + structured filters | archive — `text` folds into a title-substring match (party slot) |
| nothing | raises `ValueError` |
The decision is logged at INFO level (`logging.getLogger("bharat_courts.facade")`).
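
The routing table above can be read as a small decision function. The following is a sketch of that reading, not the facade's actual code: `pick_backend` and its `structured` flag (true when any of court/year/judge/party/citation is set) are hypothetical, and letting an explicit `source` win before any other check is an assumption.

```python
from typing import Optional


def pick_backend(
    *,
    text: Optional[str] = None,
    cnr: Optional[str] = None,
    structured: bool = False,
    source: str = "auto",
) -> str:
    """Return "archive" or "live" following the documented routing table."""
    if source in ("archive", "live"):
        return source        # explicit override wins (assumed)
    if source != "auto":
        raise ValueError(f"unknown source: {source!r}")
    if cnr:
        return "archive"     # CNR lookups are prefix-routed in the archive
    if text and not structured:
        return "live"        # only the live portal does full-text search
    if text or structured:
        return "archive"     # structured, or text folded into a title match
    raise ValueError("at least one filter is required")


print(pick_backend(text="right to privacy"))             # live
print(pick_backend(text="bail", structured=True))        # archive
print(pick_backend(cnr="DLHC010230802020"))              # archive
```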
---
#### `fetch_pdf(judgment_or_cnr, *, language="english") -> bytes`
Convenience wrapper that delegates to `ArchiveClient.fetch_pdf`. Accepts a `Judgment` instance or a CNR string. Raises `NotImplementedError` if you pass a `Judgment` with `source="live"` — the live download path needs the original `JudgmentResult` (for session continuity), so call `JudgmentSearchClient.download_pdf(judgment_result, court_type)` directly for that case.
---
#### `live_to_judgment(jr: JudgmentResult) -> Judgment`
Public helper if you're doing routing yourself: normalises a `JudgmentResult` (returned by `JudgmentSearchClient.search`) into the unified `Judgment` shape. Maps `source_id → cnr`, resolves `court_name` via the registry, pulls `disposal_nature` / `registration_date` out of `metadata`.
---
### `HCServicesClient`
Primary client for High Court case data via `hcservices.ecourts.gov.in`.
```python
from bharat_courts import HCServicesClient
client = HCServicesClient(
    config=None,          # BharatCourtsConfig | None; uses the global config singleton if None
    captcha_solver=None,  # CaptchaSolver | None; defaults to OCRCaptchaSolver if ddddocr is installed
    http_client=None,     # RateLimitedClient | None; creates one internally if None
)
```
Use as an async context manager (no solver needed if `bharat-courts[ocr]` is installed):
```python
async with HCServicesClient() as client:
    ...
```
---
#### `list_benches(court) -> dict[str, str]`
Get available benches for a High Court. **No CAPTCHA required.**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `court` | `Court` | Yes | Court object from `get_court()` |
**Returns:** `dict[str, str]` — mapping of bench code to bench name.
```python
delhi = get_court("delhi")
benches = await client.list_benches(delhi)
# {'1': 'Principal Bench at Delhi'}
bombay = get_court("bombay")
benches = await client.list_benches(bombay)
# {'1': 'Principal Seat at Bombay', '2': 'Nagpur Bench', '3': 'Aurangabad Bench', '4': 'Goa Bench'}
```
---
#### `list_case_types(court, *, bench_code="1") -> dict[str, str]`
Get available case type codes for a court bench. **No CAPTCHA required.**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `court` | `Court` | Yes | — | Court object |
| `bench_code` | `str` | No | `"1"` | Bench code from `list_benches()` |
**Returns:** `dict[str, str]` — mapping of case type code to name.
```python
case_types = await client.list_case_types(delhi)
# {'134': 'W.P.(C)(CIVIL WRITS)-134', '27': 'W.P.(CRL)-27', '3': 'EL.PET.-3', ...}
```
---
#### `case_status(court, *, case_type, case_number, year, bench_code="1") -> list[CaseInfo]`
Look up case status by case number. **CAPTCHA required** (auto-retried, default 5 attempts).
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `court` | `Court` | Yes | — | Court object |
| `case_type` | `str` | Yes | — | Numeric case type code (use `list_case_types()` to discover) |
| `case_number` | `str` | Yes | — | Case number without type/year |
| `year` | `str` | Yes | — | Registration year, e.g. `"2024"` |
| `bench_code` | `str` | No | `"1"` | Bench code from `list_benches()` |
**Returns:** `list[CaseInfo]` — matching cases. Notable field semantics:
- `case_type` on the result is a **label** like `"W.P.(C)"` (sourced from the portal's `type_name` field), not the numeric code you passed in.
- `registration_number` is populated from the portal's `case_no2` field.
- `status` is **always empty** — the live `showRecords` endpoint doesn't return Pending/Disposed (that data lives behind `o_civil_case_history.php`, which the SDK doesn't call yet). Same for `registration_date`, `judges`, and `next_hearing_date`.
```python
cases = await client.case_status(
    delhi,
    case_type="134",  # numeric code from list_case_types()
    case_number="1",
    year="2024",
)
for case in cases:
    print(f"{case.case_type} {case.case_number} CNR: {case.cnr_number}")
    print(f" {case.petitioner} v {case.respondent}")
```
---
#### `case_status_by_party(court, *, party_name, year, bench_code="1", status_filter="Both") -> list[CaseInfo]`
Search cases by party name. **CAPTCHA required** (auto-retried, default 5 attempts).
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `court` | `Court` | Yes | — | Court object |
| `party_name` | `str` | Yes | — | Petitioner or respondent name (min 3 characters) |
| `year` | `str` | Yes | — | Registration year — **mandatory**, server returns error if empty |
| `bench_code` | `str` | No | `"1"` | Bench code |
| `status_filter` | `str` | No | `"Both"` | `"Pending"`, `"Disposed"`, or `"Both"` (forwarded to the portal — but the response carries no status field, so filtering happens server-side and the returned `CaseInfo.status` is still empty) |
**Returns:** `list[CaseInfo]` — matching cases. Same field-population caveats as `case_status` above. Wide queries can return tens of thousands of records in a single response with no pagination — see [issue tracker](https://github.com/iamshouvikmitra/bharat-courts/issues).
```python
cases = await client.case_status_by_party(
    delhi,
    party_name="state",
    year="2024",
    status_filter="Pending",
)
for case in cases:
    print(f"{case.case_number}: {case.petitioner} v {case.respondent}")
```
---
#### `court_orders(court, *, case_type, case_number, year, bench_code="1") -> list[CaseOrder]`
Get court orders for a case. **CAPTCHA required** (auto-retried).
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `court` | `Court` | Yes | — | Court object |
| `case_type` | `str` | Yes | — | Numeric case type code |
| `case_number` | `str` | Yes | — | Case number |
| `year` | `str` | Yes | — | Registration year |
| `bench_code` | `str` | No | `"1"` | Bench code |
**Returns:** `list[CaseOrder]` — orders with dates, types, judges, and PDF URLs.
```python
orders = await client.court_orders(
    delhi,
    case_type="134",
    case_number="1",
    year="2024",
)
for order in orders:
    print(f"{order.order_date}: {order.order_type} by {order.judge}")
    if order.pdf_url:
        pdf_bytes = await client.download_order_pdf(order.pdf_url)
        with open(f"order_{order.order_date}.pdf", "wb") as f:
            f.write(pdf_bytes)
```
---
#### `cause_list(court, *, civil=True, bench_code="1", causelist_date="") -> list[CauseListPDF]`
Get cause list PDFs for a court. **CAPTCHA required** (auto-retried).
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `court` | `Court` | Yes | — | Court object |
| `civil` | `bool` | No | `True` | `True` for civil, `False` for criminal |
| `bench_code` | `str` | No | `"1"` | Bench code |
| `causelist_date` | `str` | No | `""` (today) | Date in `DD-MM-YYYY` format |
**Returns:** `list[CauseListPDF]` — one entry per bench with bench name, list type, and PDF URL.
```python
pdfs = await client.cause_list(delhi, civil=True)
for pdf in pdfs:
    print(f"#{pdf.serial_number} {pdf.bench} — {pdf.cause_list_type}")
    print(f" PDF: {pdf.pdf_url}")

# Criminal cause list for a specific date
criminal_pdfs = await client.cause_list(
    delhi,
    civil=False,
    causelist_date="15-01-2025",
)
```
---
#### `download_order_pdf(pdf_url) -> bytes`
Download an order or judgment PDF. **No CAPTCHA required.**
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `pdf_url` | `str` | Yes | URL from `CaseOrder.pdf_url` or `CauseListPDF.pdf_url` |
**Returns:** `bytes` — raw PDF file content.
**Raises:** `RuntimeError` if the response doesn't start with the `%PDF` magic bytes. The HC Services portal sometimes hands back a 30-byte BOM-prefixed `"Unable to connect to server"` string with HTTP 200; this method now refuses to silently return that as a PDF.
```python
pdf_bytes = await client.download_order_pdf(order.pdf_url)
with open("order.pdf", "wb") as f:
    f.write(pdf_bytes)
```
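
The magic-byte check described above is easy to reproduce when you handle PDF bytes from other sources. This is a minimal sketch, not the SDK's implementation: `ensure_pdf` is a hypothetical helper, and treating the BOM as a UTF-8 BOM is an assumption.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # assumed encoding of the portal's BOM


def ensure_pdf(payload: bytes) -> bytes:
    """Return payload unchanged if it starts with %PDF, else raise RuntimeError."""
    if payload.startswith(b"%PDF"):
        return payload
    # Strip the BOM so the portal's error text reads cleanly in the message.
    preview = payload.removeprefix(UTF8_BOM)[:60].decode("utf-8", errors="replace")
    raise RuntimeError(f"expected a PDF, got {len(payload)} bytes: {preview!r}")


# The error body described above: 3 BOM bytes plus a 27-character message.
portal_error = UTF8_BOM + b"Unable to connect to server"
print(len(portal_error))  # 30
```

Checking the first bytes rather than the HTTP status matters here because the portal returns that error body with HTTP 200.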
---
### `DistrictCourtClient`
Client for District Courts across India via `services.ecourts.gov.in`. Covers 700+ court complexes across 36 states/UTs.
Unlike High Courts (which use static `get_court()` codes), district courts require dynamic discovery of the 4-level hierarchy: **State → District → Court Complex → Establishment**.
```python
from bharat_courts import DistrictCourtClient
client = DistrictCourtClient(
    config=None,          # BharatCourtsConfig | None
    captcha_solver=None,  # CaptchaSolver | None; defaults to OCRCaptchaSolver if ddddocr is installed
    http_client=None,     # RateLimitedClient | None
)
```
Use as an async context manager:
```python
async with DistrictCourtClient() as client:
    ...
```
---
#### Court Discovery Methods (No CAPTCHA)
These methods discover the court hierarchy dynamically.
##### `list_states() -> dict[str, str]`
Returns all 36 states/UTs with their codes. Static data, no network call.
```python
states = await client.list_states()
# {"8": "Bihar", "7": "Delhi", "27": "Maharashtra", ...}
```
##### `list_districts(state_code) -> dict[str, str]`
Get districts for a state.
```python
districts = await client.list_districts("8") # Bihar
# {"1": "Patna", "35": "Gaya", "38": "Muzaffarpur", ...}
```
##### `list_complexes(state_code, dist_code) -> dict[str, str]`
Get court complexes for a district. The dictionary keys (the portal's option values) are in `code@ests@flag` format; the dictionary values are the display names.
```python
complexes = await client.list_complexes("8", "1") # Bihar, Patna
# {"1080010@2,3,4@Y": "Civil Court, Patna Sadar", ...}
# Parse the value to extract the code and check if establishment selection is needed
from bharat_courts.districtcourts.parser import parse_complex_value
code, est_codes, needs_est = parse_complex_value("1080010@2,3,4@Y")
# code="1080010", est_codes=["2","3","4"], needs_est=True
```
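
For readers curious about the `code@ests@flag` shape, the sketch below splits such a string by hand. It is an illustration only, not the library's `parse_complex_value`: `split_complex_value` is a hypothetical name, and reading the `Y` flag as "establishment selection needed" is inferred from the documented output above.

```python
from typing import List, Tuple


def split_complex_value(value: str) -> Tuple[str, List[str], bool]:
    """Split "code@est1,est2@flag" into (code, [est codes], needs_establishment)."""
    code, ests, flag = value.split("@")  # raises ValueError unless exactly 3 parts
    est_codes = [e for e in ests.split(",") if e]
    return code, est_codes, flag == "Y"


print(split_complex_value("1080010@2,3,4@Y"))
# ('1080010', ['2', '3', '4'], True)
```

In real code, prefer the library's own parser, which may handle malformed values differently.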
##### `list_establishments(state_code, dist_code, court_complex_code) -> dict[str, str]`
Get establishments for a court complex. Only needed when `needs_est` is `True`.
```python
establishments = await client.list_establishments("8", "1", "1080010")
# {"2": "DJ Div. Patna Sadar", "3": "CJM Div. Patna Sadar", ...}
```
##### `list_case_types(state_code, dist_code, court_complex_code, est_code) -> dict[str, str]`
Get available case types for a court. Codes are returned in the portal's compound `code^N` format (for example `"89^2"`). Pass them back **verbatim** to `case_status` / `court_orders`; do not strip the `^N` suffix.
```python
case_types = await client.list_case_types("8", "1", "1080010", "2")
# {"89^2": "ADMINISTRATIVE SUITE", "152^2": "Anticipatory Bail - ABP", ...}
```
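
Because the codes must be passed back verbatim, it is usually easier to look a code up by its label than to type it by hand. A minimal sketch, assuming the `dict[str, str]` shape shown above; `find_case_type_codes` is a hypothetical helper, not an SDK function.

```python
from typing import Dict, List


def find_case_type_codes(case_types: Dict[str, str], needle: str) -> List[str]:
    """Return every code whose label contains `needle`, case-insensitively."""
    wanted = needle.casefold()
    return [code for code, label in case_types.items() if wanted in label.casefold()]


case_types = {"89^2": "ADMINISTRATIVE SUITE", "152^2": "Anticipatory Bail - ABP"}
print(find_case_type_codes(case_types, "anticipatory bail"))  # ['152^2']
```

The returned strings keep their `^N` suffix, so they can go straight into `case_type=`.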
##### `list_cause_list_courts(state_code, dist_code, court_complex_code, est_code="") -> dict[str, str]`
Get the courts dropdown for cause-list lookup. Returns a mapping of `court_no` (e.g. `"1@2"`) to court display name (e.g. `"District & Sessions Judge - DJ Div. Patna Sadar"`). The cause-list form requires both values, but you can pass just the code to `cause_list()`, which looks up the matching name automatically.
```python
courts = await client.list_cause_list_courts("8", "1", "1080010", "2")
# {"1@2": "District & Sessions Judge - DJ Div. Patna Sadar", ...}
```
---
#### Search Methods (CAPTCHA Required)
All search methods take the 4-level court identifiers as keyword arguments.
##### `case_status(*, state_code, dist_code, court_complex_code, est_code, case_type, case_number, year) -> list[CaseInfo]`
Search by case number. `case_type` must be the full compound `code^N` string from `list_case_types()`.
```python
cases = await client.case_status(
    state_code="8", dist_code="1",
    court_complex_code="1080010", est_code="2",
    case_type="89^2",  # full compound code, not just "89"
    case_number="100", year="2024",
)
```
##### `case_status_by_party(*, state_code, dist_code, court_complex_code, est_code, party_name, year, status_filter="Both") -> list[CaseInfo]`
Search by party name (min 3 characters). `year` is mandatory.
```python
cases = await client.case_status_by_party(
    state_code="8", dist_code="1",
    court_complex_code="1080010", est_code="2",
    party_name="kumar", year="2024",
    status_filter="Pending",  # "Pending", "Disposed", or "Both"
)
```
##### `court_orders(*, state_code, dist_code, court_complex_code, est_code, case_type, case_number, year) -> list[CaseOrder]`
Get court orders for a case.
```python
orders = await client.court_orders(
    state_code="8", dist_code="1",
    court_complex_code="1080010", est_code="2",
    case_type="89^2",  # full compound code from list_case_types()
    case_number="100", year="2024",
)
```
##### `cause_list(*, state_code, dist_code, court_complex_code, est_code, court_no, court_name="", causelist_date="", civil=True) -> list[CauseListEntry]`
Get cause list entries. `court_no` is now **required** — discover the available codes via `list_cause_list_courts()`. `court_name` is the option's display label; the portal validates against it (sending an empty `court_name_txt` triggers a `"Court Name is required"` error). If you leave `court_name` blank, this method calls `list_cause_list_courts()` once and looks up the matching label for `court_no`.
```python
entries = await client.cause_list(
    state_code="8", dist_code="1",
    court_complex_code="1080010", est_code="2",
    court_no="1@2",  # required, from list_cause_list_courts()
    civil=True,
    causelist_date="20-03-2026",  # DD-MM-YYYY, defaults to today
)
for e in entries:
    print(f"#{e.serial_number} {e.case_number} — {e.petitioner} v {e.respondent}")
```
---
### `JudgmentSearchClient`
Client for the eCourts judgment search portal (`judgments.ecourts.gov.in`).
```python
from bharat_courts import JudgmentSearchClient
async with JudgmentSearchClient(captcha_solver=solver) as client:
    ...
```
---
#### `search(search_text, *, page=1, page_size=10, search_opt="PHRASE", court_type="2", max_captcha_attempts=5) -> SearchResult`
Search for judgments by keyword. **CAPTCHA required.**
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `search_text` | `str` | Yes | — | Search query text |
| `page` | `int` | No | `1` | Page number (1-indexed) |
| `page_size` | `int` | No | `10` | Rows per page (portal supports 10/25/50/100/1000) |
| `search_opt` | `str` | No | `"PHRASE"` | `"PHRASE"`, `"ANY"`, or `"ALL"` |
| `court_type` | `str` | No | `"2"` | `"2"` for High Courts, `"3"` for SCR |
| `max_captcha_attempts` | `int` | No | `5` | Max CAPTCHA retry attempts |
**Returns:** `SearchResult` — contains `items: list[JudgmentResult]`, `total_count`, pagination info. Each `JudgmentResult` includes parsed metadata (CNR number, disposal nature, registration date) and `source_id` (CNR) when available.
**Raises:** `CaptchaError` if the CAPTCHA solver couldn't authenticate within `max_captcha_attempts` tries. Empty results now mean "the portal returned zero rows" — they no longer mask a silent CAPTCHA failure (older versions returned `SearchResult()` with no signal).
```python
from bharat_courts import JudgmentSearchClient
from bharat_courts.captcha.ocr import OCRCaptchaSolver
async with JudgmentSearchClient(captcha_solver=OCRCaptchaSolver()) as client:
    results = await client.search("right to privacy")
    print(f"Found {results.total_count} results")
    for judgment in results.items:
        print(f"{judgment.title}")
        print(f" Court: {judgment.court_name}, Date: {judgment.judgment_date}")
        print(f" CNR: {judgment.source_id}")
        print(f" Metadata: {judgment.metadata}")
```
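
The retry behaviour described for `max_captcha_attempts` can be pictured as a bounded loop that raises when the attempts run out, so an empty result is never confused with a failed CAPTCHA. The sketch below is a generic illustration under that assumption; `with_captcha_retry` and `CaptchaFailed` are hypothetical names standing in for the SDK's internals and its `CaptchaError`.

```python
import asyncio


class CaptchaFailed(Exception):
    """Hypothetical stand-in for the SDK's CaptchaError."""


async def with_captcha_retry(attempt, max_attempts: int = 5):
    """Call `attempt()` until it returns a non-None result or attempts run out."""
    for _ in range(max_attempts):
        result = await attempt()
        if result is not None:  # None models "CAPTCHA rejected, try again"
            return result
    raise CaptchaFailed(f"CAPTCHA not solved after {max_attempts} attempts")


async def demo():
    outcomes = iter([None, None, {"rows": 0}])  # two rejections, then success

    async def attempt():
        return next(outcomes)

    return await with_captcha_retry(attempt)


print(asyncio.run(demo()))  # {'rows': 0}
```

Note that the successful result here is an empty one (`rows` is 0): the caller can trust it because a failed CAPTCHA would have raised instead.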
---
#### `search_all(search_text, *, page_size=25, search_opt="PHRASE", court_type="2", max_captcha_attempts=5) -> AsyncIterator[SearchResult]`
Iterate through all pages of search results. Yields one `SearchResult` per page, automatically handling pagination, token rotation, and session expiry (re-authenticates mid-walk if the portal session lapses).
```python
async with JudgmentSearchClient(captcha_solver=solver) as client:
    async for page in client.search_all("land acquisition"):
        for judgment in page.items:
            print(f"{judgment.title} ({judgment.judgment_date})")
```
---
#### `download_pdf(judgment, *, court_type="2") -> JudgmentResult`
Download the PDF for a judgment result.
Important: `judgment.pdf_url` is **not** a directly-fetchable URL — it's the row's relative `path` from the portal's `open_pdf(...)` JS handler. This method does the `openpdfcaptcha` resolution dance to obtain a per-session `outputfile` URL, then GETs the actual PDF bytes. Each row's `pdf_val` (also stashed by the parser inside `judgment.metadata`) is forwarded automatically; without it the portal serves the first row's PDF for every subsequent call within the same session.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `judgment` | `JudgmentResult` | Yes | — | A result from `search()` |
| `court_type` | `str` | No | `"2"` | Same `"2"` / `"3"` as on `search()` |
**Returns:** the same `JudgmentResult` mutated in place — `pdf_bytes` is set on success.
**Raises:** `RuntimeError` if the response is empty, non-JSON, or doesn't start with `%PDF`.
```python
judgment = results.items[0]
await client.download_pdf(judgment)
if judgment.pdf_bytes:
    with open("judgment.pdf", "wb") as f:
        f.write(judgment.pdf_bytes)
```
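The `%PDF` check listed under **Raises** is a plain magic-byte test. A minimal sketch of the idea follows; `ensure_pdf` is a hypothetical helper written for illustration and is not the SDK's internal function:

```python
PDF_MAGIC = b"%PDF"  # every PDF file starts with these four bytes


def ensure_pdf(payload: bytes) -> bytes:
    """Return payload unchanged if it looks like a PDF, else raise RuntimeError."""
    if not payload:
        raise RuntimeError("empty response")
    if not payload.startswith(PDF_MAGIC):
        # Typically an HTML error page or a JSON envelope instead of the file.
        raise RuntimeError("response does not start with %PDF")
    return payload
```

Checking the first bytes is cheap and catches the common failure where a portal answers with an error page and a 200 status.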
---
#### `download_pdfs(judgments, *, court_type="2", stop_on_error=False) -> list[JudgmentResult]`
Bulk-download PDFs for multiple judgments. Skips entries that already have `pdf_bytes` set. Failed downloads are logged at WARNING level by default; pass `stop_on_error=True` to raise on the first failure instead.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `judgments` | `list[JudgmentResult]` | Yes | — | Judgments to download PDFs for |
| `court_type` | `str` | No | `"2"` | Forwarded to each `download_pdf` call |
| `stop_on_error` | `bool` | No | `False` | Re-raise the first download exception instead of logging it |
**Returns:** the same list, with `pdf_bytes` populated where successful.
```python
async with JudgmentSearchClient(captcha_solver=solver) as client:
    results = await client.search("constitution")
    await client.download_pdfs(results.items)
    for j in results.items:
        if j.pdf_bytes:
            with open(f"{j.case_number}.pdf", "wb") as f:
                f.write(j.pdf_bytes)
```
---
### `SCIClient`
Client for the Supreme Court of India (`www.sci.gov.in`). **No CAPTCHA required.**
The legacy host (`main.sci.gov.in`) that older versions of this SDK targeted has been in long-term maintenance for years and now returns HTTP 503 to every path. The live site is `www.sci.gov.in` (WordPress); `SCIClient` was rewritten against it.
```python
from bharat_courts import SCIClient
# Note: no captcha_solver parameter; the homepage feed doesn't use CAPTCHAs
async with SCIClient() as client:
    ...
```
---
#### `list_recent_judgments(*, limit=50) -> list[JudgmentResult]`
Scrape the homepage's "Latest Judgements / Orders" feed. Returns up to `limit` of the most recent items the portal surfaces inline. The portal itself caps the feed at 50, so a larger `limit` has no effect; pass a smaller one to truncate.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `limit` | `int` | No | `50` | Max items to return |
**Returns:** `list[JudgmentResult]` — each carries:
- `title` — `"PETITIONER VS. RESPONDENT"`
- `case_number` — e.g. `"C.A. No. 6677/2026"`
- `judgment_date` — parsed from the row's "DD-MMM-YYYY" tail
- `source_id` — diary number (the portal's primary key)
- `pdf_url` — the `/sci-get-pdf/?diary_no=...` URL the in-page viewer iframe uses
- `source_url` — the matching `/view-pdf/?diary_no=...` URL (for opening in a browser)
- `metadata["petitioner"]`, `metadata["respondent"]`, `metadata["type"]` (`"j"` = judgment, `"o"` = order)
```python
async with SCIClient() as client:
    recent = await client.list_recent_judgments(limit=10)
    for j in recent:
        print(f"{j.judgment_date}: {j.title} [diary {j.source_id}]")
```
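The "DD-MMM-YYYY" tail that `judgment_date` is parsed from maps onto a standard `strptime` pattern. A small sketch, assuming English month abbreviations such as `Apr`; `parse_feed_date` is a hypothetical name and the SDK's own parser may differ:

```python
from datetime import date, datetime


def parse_feed_date(text: str) -> date:
    """Parse a 'DD-MMM-YYYY' string such as '24-Apr-2026' into a date."""
    # %b matches the abbreviated month name in the current locale
    # (English in the default C locale).
    return datetime.strptime(text.strip(), "%d-%b-%Y").date()
```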
---
#### `download_pdf(judgment) -> JudgmentResult`
Download the PDF bytes for a Supreme Court judgment via the `/sci-get-pdf/?diary_no=...` endpoint.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `judgment` | `JudgmentResult` | Yes | An item from `list_recent_judgments()` |
**Returns:** the same `JudgmentResult` with `pdf_bytes` populated.
**Raises:** `RuntimeError` if the response doesn't start with `%PDF`.
---
#### `search_by_year(year, month=None)` and `search_by_party(party_name)` — not implemented
Both methods now raise `NotImplementedError`. The legacy `main.sci.gov.in` form they hit is permanently 503; the equivalent flow on `www.sci.gov.in` is gated behind a CAPTCHA-protected case-no/diary-no/party-name form (`/judgements-case-no/`) that the SDK does not yet wire up. Use `list_recent_judgments()` for the most recent items.
---
### `CalcuttaHCClient`
Client for Calcutta High Court's own website (`calcuttahighcourt.gov.in`). Provides order/judgment search with PDF download for cases from September 2020 onwards (CIS system). Has better PDF coverage than the eCourts portal for Calcutta HC cases.
```python
from bharat_courts import CalcuttaHCClient
async with CalcuttaHCClient() as client:
    ...
```
---
#### `search_orders(*, case_type, case_number, year, establishment="appellate", max_captcha_attempts=5) -> tuple[CaseInfo | None, list[CaseOrder]]`
Search for orders/judgments by case number. **CAPTCHA required** (auto-retried, default 5 attempts).
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `case_type` | `str` | Yes | — | Numeric case type code (e.g. `"12"` for WPA) |
| `case_number` | `str` | Yes | — | Case registration number |
| `year` | `str` | Yes | — | Case year |
| `establishment` | `str` | No | `"appellate"` | `"appellate"`, `"original"`, `"jalpaiguri"`, or `"portblair"` |
| `max_captcha_attempts` | `int` | No | `5` | Max CAPTCHA retries |
**Returns:** `tuple[CaseInfo | None, list[CaseOrder]]`. The `CaseInfo` carries the case-level metadata the portal returns alongside the order rows (parties, CNR, full case number, side); previous versions silently dropped this. Returns `(None, [])` when nothing matched.
```python
case_info, orders = await client.search_orders(
    case_type="12",  # WPA
    case_number="12886",
    year="2024",
    establishment="appellate",
)
if case_info:
    print(f"{case_info.case_number} CNR: {case_info.cnr_number}")
    print(f" {case_info.petitioner} v {case_info.respondent}")
for order in orders:
    print(f"{order.order_date}: {order.order_type} by {order.judge}")
    print(f" Neutral Citation: {order.neutral_citation}")
    if order.pdf_url:
        pdf = await client.download_order_pdf(order.pdf_url)
```
---
#### `download_order_pdf(pdf_url) -> bytes`
Download an order/judgment PDF. **No CAPTCHA required.**
**Raises:** `RuntimeError` if the response doesn't start with the `%PDF` magic bytes.
```python
pdf_bytes = await client.download_order_pdf(order.pdf_url)
```
---
### `ArchiveClient`
Read-only access to the public AWS Open Data judgment archives (no CAPTCHA, no
rate limits, no accounts). Requires the `archive` extra:
```bash
pip install 'bharat-courts[archive]'
```
```python
from bharat_courts import ArchiveClient
async with ArchiveClient(
    cache_dir=None,        # str | None: defaults to ~/.cache/bharat-courts/archive/
    cache_max_bytes=None,  # int | None: defaults to 5 GiB (or env override)
    metadata_cache=True,   # bool: disable to skip the local parquet mirror
) as client:
    ...
```
DuckDB runs the metadata queries against partitioned parquet files; PDFs are
served via direct HTTP GET (HC, one file per judgment) or random-access tar
extraction (SCI, one tar per year). Both layers cache on disk.
---
#### `search(*, court=None, year=None, judge=None, party=None, citation=None, cnr=None, limit=50) -> list[Judgment]`
Search both archives in one call. CNR-only queries auto-route via the CNR
prefix, so `court=` is not needed when you pass `cnr=` (the same routing
applies to `fetch_pdf("DLHC...")`).
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `court` | `Court \| str \| None` | No | `None` | Court object, code string (`"sci"`, `"delhi"`), or `None` to query both buckets |
| `year` | `int \| tuple[int, int] \| None` | No | `None` | Single year (`2020`) or inclusive range (`(2018, 2024)`). Drives partition pruning — strongly recommended for non-CNR queries |
| `judge` | `str \| None` | No | `None` | Case-insensitive substring on the judge field |
| `party` | `str \| None` | No | `None` | SCI: searches petitioner/respondent/title. HC: title only (HC parquet has no party columns) |
| `citation` | `str \| None` | No | `None` | SCI only — silently ignored for HC |
| `cnr` | `str \| None` | No | `None` | Exact CNR match. Auto-resolves source from the 4-letter prefix when `court` isn't given |
| `limit` | `int` | No | `50` | Total results across sources |
**Returns:** `list[Judgment]` sorted by `decision_date DESC`.
---
#### `iter_judgments(*, court=None, year=None, judge=None, party=None, citation=None, cnr=None, batch_size=500, max_results=None) -> AsyncIterator[Judgment]`
Stream judgments page-by-page via `LIMIT/OFFSET` with a stable sort
(`decision_date DESC, cnr`). Use this for bulk pulls — "all 18k Delhi 2020
judgments" — without materialising everything in memory.
Sources are streamed sequentially (SCI first, then HC) with no cross-source
date merge.
```python
count = 0
async for j in client.iter_judgments(court="delhi", year=2020, batch_size=500):
    count += 1
    # process(j)
```
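The `LIMIT/OFFSET` walk with a stable sort can be illustrated without the archive. The sketch below uses the standard-library `sqlite3` module instead of DuckDB, a synchronous generator instead of an async one, and made-up CNR values; only the paging pattern is the point:

```python
import sqlite3
from collections.abc import Iterator


def iter_rows(conn: sqlite3.Connection, batch_size: int = 2) -> Iterator[tuple[str, str]]:
    """Yield rows batch by batch; the cnr tiebreak keeps page boundaries stable."""
    offset = 0
    while True:
        batch = conn.execute(
            "SELECT cnr, decision_date FROM judgments "
            "ORDER BY decision_date DESC, cnr LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not batch:
            return
        yield from batch
        offset += len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE judgments (cnr TEXT, decision_date TEXT)")
conn.executemany(
    "INSERT INTO judgments VALUES (?, ?)",
    [
        ("DLHC000000000003", "2020-03-01"),  # made-up CNRs for illustration
        ("DLHC000000000001", "2020-03-01"),
        ("DLHC000000000002", "2020-01-15"),
    ],
)
rows = list(iter_rows(conn))
```

Without the secondary `cnr` key, two rows sharing a `decision_date` could swap places between queries and be skipped or repeated across pages.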
---
#### `fetch_pdf(judgment_or_cnr, *, language="english") -> bytes`
Fetch a judgment PDF. Pass a `Judgment` (preferred — avoids a metadata lookup)
or a CNR string. SCI judgments support `language="hindi" | "tamil" | "gujarati" | …`
(see `bharat_courts.archive.endpoints.SCI_LANGUAGE_MAP`); HC PDFs are
English-only in the archive.
```python
data = await client.fetch_pdf("DLHC010230802020") # ~250 KB direct GET
data = await client.fetch_pdf("ESCR010000301950", language="english")
# First SCI fetch in a year downloads the year tar (~40–500 MB); subsequent
# fetches for that year are tar-extraction-fast.
```
**Raises:** `ArchivePdfError` for missing files, missing metadata fields, or
HTTP failures.
---
#### `prefetch_sci_year(year, language="english") -> str`
Pre-warm the SCI tar cache for a year. Useful before a batch of related
fetches.
---
#### `count(*, court=None, year=None) -> dict[str, int]`
Per-bucket row counts, e.g. `{"sci": 571}` for a single SCI year.
---
#### `cache_info() -> dict`
Snapshot: `{"cache_dir": ..., "files": ..., "bytes": ..., "max_bytes": ...}`.
---
#### Helpers
```python
from bharat_courts import infer_court_from_cnr
infer_court_from_cnr("DLHC010230802020") # → Court(code="delhi", ...)
infer_court_from_cnr("ESCR010000301950") # → SUPREME_COURT
infer_court_from_cnr("ZZZZ012345") # → None
```
Use this if you're routing CNRs yourself (e.g. into the live `JudgmentSearchClient`).
---
### Court Registry Functions
```python
from bharat_courts import get_court, get_court_by_name, list_high_courts, list_all_courts
from bharat_courts.courts import get_court_by_judgment_code
```
#### `get_court(code) -> Court | None`
Look up a court by its code. Case-insensitive.
```python
get_court("delhi") # Delhi High Court
get_court("bombay-nagpur") # Bombay HC, Nagpur Bench
get_court("sci") # Supreme Court of India
get_court("nonexistent") # None
```
#### `get_court_by_name(name) -> Court | None`
Look up a court by its full name. Case-insensitive exact match.
```python
get_court_by_name("Delhi High Court") # Court(name="Delhi High Court", ...)
```
#### `get_court_by_judgment_code(judgment_code) -> Court | None`
Look up a court by its `judgments.ecourts.gov.in` code. Returns the main court (not bench variants).
```python
get_court_by_judgment_code("7") # Delhi High Court
get_court_by_judgment_code("27") # Bombay High Court (main, not bench)
```
#### `list_high_courts() -> list[Court]`
Returns all 29 High Court entries (25 HCs + bench-specific entries for Bombay and Allahabad).
#### `list_all_courts() -> list[Court]`
Returns all 30 courts (Supreme Court + all High Courts).
#### Module-level constants
```python
from bharat_courts import ALL_COURTS, SUPREME_COURT
SUPREME_COURT # Court(name="Supreme Court of India", code="sci", state_code="0")
ALL_COURTS # list of all 30 Court objects
```
---
### Data Models
All models are Python dataclasses with `to_dict()` and `to_json()` serialization methods.
```python
# Available on all models
model.to_dict(exclude_none=False) # -> dict (dates become ISO strings, enums become values)
model.to_json(indent=None, exclude_none=False) # -> JSON string
```
#### `Court`
```python
@dataclass(frozen=True)
class Court:
    name: str                 # "Delhi High Court"
    code: str                 # "delhi"
    state_code: str           # "26" (hcservices.ecourts.gov.in)
    court_type: CourtType     # CourtType.HIGH_COURT
    bench: str | None = None  # "Lucknow Bench" (for bench-specific entries)
    judgment_code: str = ""   # "7" (judgments.ecourts.gov.in)

    @property
    def slug(self) -> str: ...  # code lowercased, spaces replaced with hyphens

    @property
    def judgment_compound_code(self) -> str: ...  # "{judgment_code}~{state_code}", e.g. "7~26"
```
#### `CourtType`
```python
class CourtType(str, Enum):
    SUPREME_COURT = "supreme_court"
    HIGH_COURT = "high_court"
    DISTRICT_COURT = "district_court"
    TRIBUNAL = "tribunal"
```
#### `CaseInfo`
Returned by `case_status()` and `case_status_by_party()`, and as the first element of the tuple from `CalcuttaHCClient.search_orders()`.
```python
@dataclass
class CaseInfo:
    case_number: str  # "3/2024"
    case_type: str  # Case type label, e.g. "W.P.(C)"
    cnr_number: str = ""  # "DLHC010582482024"
    filing_number: str = ""
    registration_number: str = ""
    registration_date: date | None = None
    petitioner: str = ""
    respondent: str = ""
    status: str = ""  # empty for HC Services (showRecords doesn't return it)
    court_name: str = ""
    judges: list[str] = field(default_factory=list)
    next_hearing_date: date | None = None
```
#### `CaseOrder`
Returned by `court_orders()` and, as the list element of the tuple, by `CalcuttaHCClient.search_orders()`.
```python
@dataclass
class CaseOrder:
    order_date: date
    order_type: str  # "Judgment" | "Order" | "Interim Order"
    judge: str = ""
    pdf_url: str = ""
    pdf_bytes: bytes | None = None  # populated by download_order_pdf(); excluded from serialization
    order_text: str = ""
    neutral_citation: str = ""  # e.g. "2024:CHC-AS:1277" (Calcutta HC)
```
#### `CauseListPDF`
Returned by `cause_list()`.
```python
@dataclass
class CauseListPDF:
    serial_number: int
    bench: str  # "Division Bench"
    cause_list_type: str = ""  # "COMPLETE CAUSE LIST"
    pdf_url: str = ""
    pdf_bytes: bytes | None = None  # excluded from serialization
```
#### `JudgmentResult`
Returned by `JudgmentSearchClient.search()` / `search_all()` and `SCIClient.list_recent_judgments()`.
```python
@dataclass
class JudgmentResult:
    title: str
    court_name: str
    case_number: str = ""
    judgment_date: date | None = None
    judges: list[str] = field(default_factory=list)
    pdf_url: str = ""
    pdf_bytes: bytes | None = None  # populated by download_pdf(); excluded from serialization
    citation: str = ""
    bench_type: str = ""  # "Division Bench" | "Single Bench" | "Full Bench"
    source_url: str = ""
    source_id: str = ""
    metadata: dict = field(default_factory=dict)
```
#### `SearchResult`
Returned by `JudgmentSearchClient.search()`.
```python
@dataclass
class SearchResult:
    items: list[CaseInfo | JudgmentResult | CauseListEntry] = field(default_factory=list)
    total_count: int = 0
    page: int = 1
    page_size: int = 10
    has_next: bool = False

    @property
    def total_pages(self) -> int: ...  # ceil(total_count / page_size)
```
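The `total_pages` formula is a ceiling division. A standalone sketch of the same arithmetic, written as a free function rather than the property itself:

```python
import math


def total_pages(total_count: int, page_size: int) -> int:
    """Number of pages needed to hold total_count rows at page_size rows each."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_count / page_size)
```

For example, 101 results at 25 per page need 5 pages, because the last page holds the single leftover row.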
#### `CauseListEntry`
Structured cause list data (for parsed cause list entries).
```python
@dataclass
class CauseListEntry:
    serial_number: int
    case_number: str
    case_type: str = ""
    petitioner: str = ""
    respondent: str = ""
    advocate_petitioner: str = ""
    advocate_respondent: str = ""
    court_number: str = ""
    judge: str = ""
    listing_date: date | None = None
    item_number: str = ""
```
---
### CAPTCHA Solvers
All solvers implement the `CaptchaSolver` abstract base class:
```python
from bharat_courts.captcha.base import CaptchaSolver
class CaptchaSolver(ABC):
    @abstractmethod
    async def solve(self, image_bytes: bytes) -> str:
        """Given raw CAPTCHA image bytes, return the solved text."""
```
#### `OCRCaptchaSolver`
Automatic CAPTCHA solving using `ddddocr`. Requires `pip install bharat-courts[ocr]`.
```python
from bharat_courts.captcha.ocr import OCRCaptchaSolver
solver = OCRCaptchaSolver(
    preprocess=False,  # Apply image binarization + median filter before OCR
    threshold=128,     # Binarization threshold (0-255), used if preprocess=True
)
```
~75% accuracy on the judgments portal in our measurements; failed attempts are automatically retried with fresh sessions (default 5 retries — `P(all fail) ≈ 0.1%`). Outputs that aren't exactly 6 alphanumeric characters are rejected before being submitted, so the portal's "captcha must be 6 chars" envelope no longer burns a retry.
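The `P(all fail)` figure follows from simple arithmetic. A quick check, assuming each attempt succeeds independently with the measured ~75% accuracy (independence is an assumption here, not a measured property):

```python
# Assumption: each CAPTCHA attempt succeeds independently with probability 0.75.
accuracy = 0.75
attempts = 5

# Probability that every one of the attempts fails.
p_all_fail = (1 - accuracy) ** attempts

print(f"P(all {attempts} attempts fail) = {p_all_fail:.4%}")
```

The result is 0.25 to the fifth power, which is just under one in a thousand.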
#### `ONNXCaptchaSolver`
Lightweight CAPTCHA solver using ONNX Runtime. Requires `pip install bharat-courts[onnx]`. Uses a pre-trained model from HuggingFace (captchabreaker), downloaded to `~/.cache/bharat-courts/` at init time.
**Requires `HF_TOKEN`**: The HuggingFace model repo requires authentication. Set `export HF_TOKEN=hf_...` (get a token at https://huggingface.co/settings/tokens). If you don't have a token, use `OCRCaptchaSolver` instead.
```python
from bharat_courts.captcha.onnx import ONNXCaptchaSolver
solver = ONNXCaptchaSolver()
# Or with a custom model file
solver = ONNXCaptchaSolver(model_path="/path/to/custom_model.onnx")
```
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `model_path` | `str \| Path \| None` | No | `None` | Path to a custom ONNX model. If `None`, downloads the default captchabreaker model. |
Validates that decoded text is exactly 6 characters — returns empty string on wrong length to trigger client retry.
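A length-and-charset gate of this kind fits in a few lines. The sketch below combines the 6-character rule with the alphanumeric rule described for `OCRCaptchaSolver`; `accept_or_blank` is a hypothetical name, and stripping surrounding whitespace is an assumption:

```python
import re

# Exactly six ASCII letters or digits.
_CAPTCHA_RE = re.compile(r"[A-Za-z0-9]{6}")


def accept_or_blank(decoded: str) -> str:
    """Return the decoded text if it is a plausible CAPTCHA answer, else ''."""
    text = decoded.strip()
    return text if _CAPTCHA_RE.fullmatch(text) else ""
```

Returning an empty string lets the caller retry with a fresh image instead of submitting an answer that is certain to be rejected.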
#### `ManualCaptchaSolver`
Interactive solver that saves the CAPTCHA image and prompts the user.
```python
from bharat_courts.captcha.manual import ManualCaptchaSolver
# Prompt on stdin (saves image to /tmp/*.png for viewing)
solver = ManualCaptchaSolver()
# Or provide a custom callback (sync or async)
solver = ManualCaptchaSolver(callback=my_captcha_handler)
```
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `callback` | `Callable[[bytes], str \| Awaitable[str]] \| None` | No | `None` | Custom handler. Receives image bytes, returns solved text. If `None`, prompts on stdin. |
#### Custom Solver
Implement `CaptchaSolver` for your own solving strategy:
```python
from bharat_courts.captcha.base import CaptchaSolver
from bharat_courts.hcservices.client import HCServicesClient

class MyCaptchaSolver(CaptchaSolver):
    async def solve(self, image_bytes: bytes) -> str:
        # Send to a CAPTCHA solving service, ML model, etc.
        return "solved_text"

async with HCServicesClient(captcha_solver=MyCaptchaSolver()) as client:
    ...
```
## CLI
The CLI is organised into one command group per portal, matching the SDK module layout:
```
bharat-courts version
bharat-courts courts [--type all|hc|sc]
bharat-courts find [--text | --judge | --party | --citation | --cnr | --court | --year | --source]
bharat-courts hcservices benches | case-types | search | search-by-party | orders | cause-list
bharat-courts districtcourts states | districts | complexes | establishments | case-types | courts | search | search-by-party | orders | cause-list
bharat-courts calcuttahc search
bharat-courts judgments search | search-all
bharat-courts sci recent
bharat-courts archive query | get | download | count | cache
bharat-courts install-skills
```
The top-level `find` command is the federated entry point: it routes between archive and live based on which filters you pass. Use it as the default for "find a judgment"; reach for the portal-specific groups when you need a portal-only feature.
Global flags (apply to every subcommand):
| Flag | Description |
|------|-------------|
| `--json` | Emit machine-readable JSON instead of formatted text. Lists return arrays; single dataclasses return objects; `calcuttahc search` returns `{"case_info": ..., "orders": [...]}`. |
| `--captcha-attempts N` | Override the default CAPTCHA retry budget (5). Currently honoured by `judgments` and `calcuttahc`; `hcservices` and `districtcourts` use a fixed internal budget. |
| `--verbose` / `-v` | Enable INFO-level SDK logging on stderr. |
Every PDF-producing command takes `--download DIR` to save PDFs alongside the printed output. Filenames are `_.pdf`, sanitised.
### Examples
```bash
# Print version, list available courts
bharat-courts version
bharat-courts courts --type hc
# Federated find — routes between archive and live for you
bharat-courts find --judge "chandrachud" --year 2022 --court sci --limit 5
bharat-courts find --cnr DLHC010230802020 # auto-routed via CNR prefix
bharat-courts find --text "right to privacy" --limit 5 # → live (full-text)
bharat-courts find --text "asian hotels" --court delhi --year 2020 # mixed → archive
bharat-courts find --party "tata motors" --year 2020 --source archive # force backend
bharat-courts --json find --judge "bobde" --year 2020 --court sci # JSON output
# HC Services — discover bench / case-type codes, then search
bharat-courts hcservices benches delhi
bharat-courts hcservices case-types delhi --bench 1
bharat-courts hcservices search delhi --case-type 134 --case-number 1 --year 2024
bharat-courts hcservices orders delhi --case-type 134 --case-number 1 --year 2024 --download ./orders/
bharat-courts hcservices cause-list delhi --date 24-04-2026
# District Courts — drill down state -> district -> complex -> establishment
bharat-courts districtcourts states
bharat-courts districtcourts districts --state 8
bharat-courts districtcourts complexes --state 8 --dist 1
bharat-courts districtcourts case-types --state 8 --dist 1 --complex 1080010 --est 2
bharat-courts districtcourts search \
  --state 8 --dist 1 --complex 1080010 --est 2 \
  --case-type "89^2" --case-number 100 --year 2024
bharat-courts districtcourts cause-list \
  --state 8 --dist 1 --complex 1080010 --est 2 \
  --court-no "1@2"  # --court-name auto-resolves if blank
# Calcutta HC (returns case_info + orders)
bharat-courts calcuttahc search --case-type 12 --case-number 12886 --year 2024
# Judgments portal
bharat-courts judgments search --text "right to privacy" --page-size 25
bharat-courts judgments search-all --text "land acquisition" --max-pages 5 --download ./pdfs/
# Supreme Court — homepage feed
bharat-courts sci recent --limit 10
bharat-courts sci recent --limit 5 --download ./sci-pdfs/
# Historical archive (AWS Open Data buckets — needs `pip install bharat-courts[archive]`)
bharat-courts archive query --court sci --judge "chandrachud" --year 2022 --limit 5
bharat-courts archive query --court delhi --year 2020 --judge endlaw --limit 3
bharat-courts archive get --cnr DLHC010230802020 --pdf --out ./judgment.pdf
bharat-courts archive download --court sci --year 2020 # pre-warm the year tar
bharat-courts archive count --court sci --year 2020 # → "sci: 571"
bharat-courts archive cache # show disk usage
bharat-courts archive cache --clear # wipe local cache
# JSON output for piping to jq / spreadsheets
bharat-courts --json courts --type sc | jq '.[].name'
bharat-courts --json hcservices benches bombay
bharat-courts --json archive query --court sci --year 2020 --judge bobde --limit 10
# Install the AI agent skill bundle (Claude Code, Copilot, etc.)
bharat-courts install-skills
```
## Configuration
Environment variables with `BHARAT_COURTS_` prefix:
| Variable | Default | Description |
|----------|---------|-------------|
| `BHARAT_COURTS_REQUEST_DELAY` | `1.0` | Seconds between requests |
| `BHARAT_COURTS_TIMEOUT` | `60` | Request timeout (seconds). Wide District Courts party-name searches genuinely take 30-60s on the portal — the previous default of 30 was too tight and triggered timeouts against endpoints that were about to respond. |
| `BHARAT_COURTS_MAX_RETRIES` | `3` | Retry count on failure (only applied to 5xx and connect/read timeouts; 4xx responses propagate immediately). |
| `BHARAT_COURTS_LOG_LEVEL` | `INFO` | Logging level |
Or use a `.env` file. See [.env.example](.env.example).
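To see which values would apply in a given environment, the table above can be mirrored in a few lines. This is only an inspection sketch with a hypothetical `effective_settings` helper; the SDK's own loading goes through its settings module and also converts types:

```python
import os
from collections.abc import Mapping

# Defaults copied from the table above, kept as strings as they appear in the environment.
DEFAULTS = {
    "BHARAT_COURTS_REQUEST_DELAY": "1.0",
    "BHARAT_COURTS_TIMEOUT": "60",
    "BHARAT_COURTS_MAX_RETRIES": "3",
    "BHARAT_COURTS_LOG_LEVEL": "INFO",
}


def effective_settings(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return each documented variable, using the environment value when set."""
    return {name: environ.get(name, default) for name, default in DEFAULTS.items()}
```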
## Supported Courts
All 25 High Courts with verified eCourts state codes and judgment portal codes:
| Court | Code | State Code | Judgment Code |
|-------|------|------------|---------------|
| Allahabad HC | `allahabad` | 13 | 9 |
| Andhra Pradesh HC | `andhra` | 2 | 28 |
| Bombay HC | `bombay` | 1 | 27 |
| Calcutta HC | `calcutta` | 16 | 19 |
| Chhattisgarh HC | `chhattisgarh` | 18 | 22 |
| Delhi HC | `delhi` | 26 | 7 |
| Gauhati HC | `gauhati` | 6 | 18 |
| Gujarat HC | `gujarat` | 17 | 24 |
| Himachal Pradesh HC | `himachal` | 5 | 2 |
| J&K HC | `jammu` | 12 | 1 |
| Jharkhand HC | `jharkhand` | 7 | 20 |
| Karnataka HC | `karnataka` | 3 | 29 |
| Kerala HC | `kerala` | 4 | 32 |
| Madhya Pradesh HC | `mp` | 23 | 23 |
| Madras HC | `madras` | 10 | 33 |
| Manipur HC | `manipur` | 25 | 14 |
| Meghalaya HC | `meghalaya` | 21 | 17 |
| Orissa HC | `orissa` | 11 | 21 |
| Patna HC | `patna` | 8 | 10 |
| Punjab & Haryana HC | `punjab` | 22 | 3 |
| Rajasthan HC | `rajasthan` | 9 | 8 |
| Sikkim HC | `sikkim` | 24 | 11 |
| Telangana HC | `telangana` | 29 | 36 |
| Tripura HC | `tripura` | 20 | 16 |
| Uttarakhand HC | `uttarakhand` | 15 | 5 |
| Supreme Court | `sci` | 0 | — |
Bombay and Allahabad HCs also have bench-specific entries (e.g., `bombay-nagpur`, `allahabad-lucknow`).
## Contributing
Contributions are welcome! Here's how to get set up.
### Prerequisites
- **Python 3.11+** — check with `python3 --version`
- **git**
### Dev environment setup
```bash
# 1. Fork and clone
git clone https://github.com/<your-username>/bharat-courts.git
cd bharat-courts
# 2. Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# 3. Install with all extras (OCR, CLI, dev tools)
pip install -e ".[all]"
# 4. Verify everything works
pytest # 250 unit tests, no network needed
ruff check . && ruff format --check . # lint + format check
```
### Running tests
```bash
# Unit tests (fast, offline)
pytest
# Single test file
pytest tests/test_hcservices_parser.py
# Single test
pytest tests/test_hcservices_parser.py::test_parse_case_status_json
# With verbose output
pytest -v
# Live integration tests against real eCourts portals (requires ddddocr + network)
python tests/integration/hcservices.py # live HC Services + CAPTCHA solver
python tests/integration/archive.py # archive + facade against real S3
# see tests/integration/README.md for the full list
```
### Code style
The project uses [ruff](https://docs.astral.sh/ruff/) for linting and formatting:
```bash
# Check for issues
ruff check .
# Auto-fix what's possible
ruff check --fix .
# Format code
ruff format .
```
Config is in `pyproject.toml` — Python 3.11 target, 100-char line length, rules: E/F/I/N/W.
### Project structure
```
src/bharat_courts/
├── __init__.py # Public API exports
├── models.py # Dataclasses: CaseInfo, CaseOrder, CauseListPDF, etc.
├── config.py # Pydantic Settings (BHARAT_COURTS_ env prefix)
├── http.py # Rate-limited async HTTP client (httpx)
├── courts.py # Registry of 25+ HCs with eCourts codes
├── captcha/
│ ├── base.py # CaptchaSolver ABC
│ ├── manual.py # Stdin/callback solver
│ ├── ocr.py # ddddocr-based solver
│ └── onnx.py # ONNX Runtime solver (captchabreaker)
├── hcservices/ # HC Services portal (primary, fully working)
│ ├── client.py # HCServicesClient
│ ├── endpoints.py # URL + form builders
│ └── parser.py # JSON + HTML response parsers
├── districtcourts/ # District Courts portal (700+ courts)
│ ├── client.py # DistrictCourtClient
│ ├── endpoints.py # URL + form builders + state codes
│ └── parser.py # HTML response parsers
├── calcuttahc/ # Calcutta High Court (direct website)
│ ├── client.py # CalcuttaHCClient
│ ├── endpoints.py # URL + form builders
│ └── parser.py # JSON + HTML response parsers
├── judgments/ # Judgment Search portal (basic)
│ ├── client.py
│ ├── endpoints.py
│ └── parser.py
├── sci/ # Supreme Court (basic)
│ ├── client.py
│ └── parser.py
├── archive/ # AWS Open Data archive (opt-in via [archive] extra)
│ ├── client.py # ArchiveClient — async facade
│ ├── endpoints.py # Bucket URIs + SCI language map
│ ├── metadata.py # DuckDB query layer over partitioned parquet
│ ├── metadata_cache.py# Local mirror of parquet shards (TTL-invalidated)
│ ├── schema.py # Row → Judgment mapping (handles SCI + HC schemas)
│ └── storage.py # PDF cache (per-tar SCI, per-file HC) + LRU eviction
├── facade.py # Judgments — federated find()/fetch_pdf, routes archive vs live
└── cli.py # Click CLI entry point
```
### Areas where help is needed
- **Better CAPTCHA solving** — ddddocr is ~75% accurate on the judgments portal; the ONNX solver is an alternative, but a fine-tuned model would help further
- **District court search reliability** — `case_status`, `court_orders`, and `cause_list` were rewired this cycle to send the right portal field names; broader coverage testing would surface remaining edge cases (and `case_status_by_party` still has no pagination)
- **Supreme Court case search** — `SCIClient.search_by_year` / `search_by_party` are stubbed; the live `www.sci.gov.in` portal has a CAPTCHA-protected case-no/diary-no/party-name form that needs wiring up
- **HC Services case history** — `case_status` doesn't return Pending/Disposed (or registration date / next hearing) because the SDK hits `showRecords` only; calling `o_civil_case_history.php` afterwards would fill in the rest
- **More High Court coverage** — test the client against courts beyond Delhi/Bombay/Allahabad
- **Documentation** — more examples, tutorials
### Submitting changes
1. Fork the repo and create a branch (`git checkout -b my-feature`)
2. Make your changes
3. Run `pytest` and `ruff check .` to ensure tests pass and code is clean
4. Commit with a descriptive message
5. Open a pull request
## How it works
### HC Services Portal
The eCourts HC Services portal (`hcservices.ecourts.gov.in`) uses a PHP backend with:
1. **Session cookies** — `GET main.php` establishes `HCSERVICES_SESSID`
2. **Securimage CAPTCHAs** — pinned to the session (same image within one session)
3. **AJAX POST requests** — `cases_qry/index_qry.php` with `action_code` parameter
4. **JSON responses** — `{"con": ["[{...}]"], "totRecords": N, "Error": ""}`
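The response envelope in step 4 appears to wrap the rows as a JSON string inside a JSON list, so it needs two decoding passes. A sketch under that reading; the row field `case_no` is made up and `parse_records` is a hypothetical helper:

```python
import json


def parse_records(body: str) -> list[dict]:
    """Decode the outer envelope, then each JSON-encoded chunk inside 'con'."""
    payload = json.loads(body)
    if payload.get("Error"):
        raise RuntimeError(payload["Error"])
    rows: list[dict] = []
    for chunk in payload.get("con", []):
        rows.extend(json.loads(chunk))  # each chunk is itself a JSON array
    return rows


# Sample body shaped like the envelope above; the row content is illustrative.
sample = json.dumps(
    {"con": [json.dumps([{"case_no": "1"}])], "totRecords": 1, "Error": ""}
)
```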
### District Courts Portal
The District Courts portal (`services.ecourts.gov.in/ecourtindia_v6/`) uses a similar PHP backend with key differences:
1. **Session cookies** — `SERVICES_SESSID` (established on page load)
2. **Rotating `app_token`** — every AJAX response returns a new token that must be sent with the next request
3. **MVC-style AJAX** — `/?p=controller/action` URL pattern (e.g., `/?p=casestatus/submitCaseNo`)
4. **HTML responses** — search results are pre-rendered HTML tables (not JSON)
5. **4-level court hierarchy** — State → District → Court Complex → Establishment (discovered dynamically)
Both portals are handled transparently — session management, token rotation, CAPTCHA solving with retry, request/response parsing, and rate limiting.
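Token rotation is the least obvious of these steps, so here is a toy model of it. `FakePortal` simulates a server that rejects stale tokens, and the `app_token` response key is an assumption made for the sketch; nothing here touches the real portal:

```python
import itertools


class FakePortal:
    """Stand-in server: rejects stale tokens and issues a fresh one per response."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.current = "token-0"

    def post(self, action: str, app_token: str) -> dict:
        if app_token != self.current:
            raise PermissionError("stale app_token")
        self.current = f"token-{next(self._counter)}"
        return {"app_token": self.current, "html": f"<table><!-- {action} --></table>"}


class TokenSession:
    """Client side: always send the token received in the previous response."""

    def __init__(self, portal: FakePortal, initial_token: str) -> None:
        self._portal = portal
        self._token = initial_token

    def call(self, action: str) -> str:
        response = self._portal.post(action, self._token)
        self._token = response["app_token"]  # rotate before the next request
        return response["html"]
```

Because each token is single-use, requests within one session have to run sequentially; two concurrent calls would both send the same token and one would fail.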
### Historical Archive (AWS Open Data)
Two public S3 buckets in `ap-south-1`, CC-BY-4.0, maintained by Dattam Labs:
- `s3://indian-supreme-court-judgments/` — SCI judgments 1950–present, bi-monthly updates
- `s3://indian-high-court-judgments/` — 25 High Courts, quarterly updates
Both partition metadata as Hive-style parquet (SCI: `year=YYYY/`; HC:
`year=YYYY/court=_/bench=/`) and ship PDFs in
per-year tar bundles. The HC bucket additionally exposes individual PDFs at
`data/pdf/year=…/court=…/bench=…/`, so single-PDF fetches don't
need to download the whole tar.
`ArchiveClient` reads anonymously via DuckDB's `httpfs` extension (no AWS
account needed), translates rows through `row_to_judgment()` into the unified
`Judgment` shape, and serves PDFs with on-disk LRU caching. CNR-only queries
auto-route via the 4-letter prefix — `DLHC*` → Delhi, `ESCR*` → SCI,
`HCBM*` → Bombay, `WBCH*` → Calcutta, etc. (the full mapping is in
`courts._CNR_PREFIX_TO_COURT_CODE`, verified against a 2020 sample of every
HC partition).
The archive is **complementary** to the live clients, not a replacement: it
only contains delivered judgments and lags by 2–3 months, so case status,
cause lists, and in-progress orders still need `HCServicesClient` /
`DistrictCourtClient`.
## License
[MIT](LICENSE)