https://github.com/scriptogre/romanian-law-data
Romanian legal corpus (legislatie.just.ro → parquet). Daily releases of acts, articles, paragraphs. Powers intreaba-legea.
- Host: GitHub
- URL: https://github.com/scriptogre/romanian-law-data
- Owner: scriptogre
- Created: 2026-05-28T06:02:10.000Z (24 days ago)
- Default Branch: main
- Last Pushed: 2026-05-28T08:11:07.000Z (24 days ago)
- Last Synced: 2026-05-28T08:13:06.820Z (24 days ago)
- Language: Python
- Size: 174 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Romanian Law Data
Zstd-compressed Parquet exports of the Romanian legal corpus (acts, articles, paragraphs) for use with [DuckDB](https://duckdb.org/). Sourced from [legislatie.just.ro](https://legislatie.just.ro/) (Ministry of Justice) via its public SOAP API.
The exports are rebuilt daily by GitHub Actions. Download the latest bundle from [Releases](https://github.com/scriptogre/romanian-law-data/releases).
## Tables
| Table | Content | Rows |
|---|---|---|
| **acte** | One row per act (LEGE, OUG, HG, ORDIN, DECIZIE, …) with metadata + full text | ~187k |
| **articole** | One row per article (parsed from `acte.content`) | ~993k |
| **alineate** | One row per paragraph — the finest citation unit (e.g. `art. 188 alin. (1)`) | ~1.96M |
Tables use Romanian legal vocabulary (`acte`, `articole`, `alineate`); columns use English SQL convention (`type`, `published_at`, `gazette_number`, …) with Romanian `COMMENT ON` metadata in [`create_views.sql`](create_views.sql).
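Since `alineate` is addressed by citations such as `art. 188 alin. (1)`, it can be handy to split a citation string into article and paragraph numbers before building a query. The following is a minimal sketch, assuming citations follow the `art. N alin. (M)` shape shown in the table; `Citation`, `parse_citation` and the regular expression are illustrative names that are not part of this repository, and more complex citation forms would need a broader pattern.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    article: int
    paragraph: Optional[int]  # None when the citation has no "alin." part


# Hypothetical pattern: "art. 188", optionally followed by "alin. (1)".
_CITATION_RE = re.compile(
    r"art\.\s*(\d+)(?:\s+alin\.\s*\((\d+)\))?", re.IGNORECASE
)


def parse_citation(text: str) -> Optional[Citation]:
    """Return the article/paragraph numbers found in `text`, or None."""
    match = _CITATION_RE.search(text)
    if match is None:
        return None
    article = int(match.group(1))
    paragraph = int(match.group(2)) if match.group(2) else None
    return Citation(article, paragraph)
```

The two numbers can then be passed as query parameters instead of being interpolated into SQL strings.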
## Subject lenses
`create_views.sql` also exposes pre-filtered views over `acte` for each canonical code and for jurisprudence:
| View | Filters |
|---|---|
| `constitutie` | Constituția României (1991, republicată 2003) |
| `cod_civil` | Legea 287/2009 |
| `cod_penal` | Legea 286/2009 |
| `cod_muncii` | Legea 53/2003 (republicată) |
| `cod_procedura_civila` | Legea 134/2010 (republicată) |
| `cod_procedura_penala` | Legea 135/2010 |
| `cod_fiscal` | Legea 227/2015 |
| `jurisprudenta` | CCR + ÎCCJ decisions |
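Conceptually, each lens is a named filter over `acte`. The sketch below illustrates that idea on a toy table. It makes several assumptions: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra dependencies, the `number` and `year` columns are hypothetical, and the actual filter conditions in `create_views.sql` may differ.

```python
import sqlite3

# sqlite3 is a stand-in for DuckDB here; the SQL idea is the same.
conn = sqlite3.connect(":memory:")

# Toy `acte` table. `number` and `year` are assumed column names.
conn.execute(
    "CREATE TABLE acte ("
    "id INTEGER PRIMARY KEY, type TEXT, number INTEGER, year INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO acte (id, type, number, year, title) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "LEGE", 286, 2009, "Codul penal"),
        (2, "LEGE", 287, 2009, "Codul civil"),
        (3, "HG", 1, 2016, "Sample row, not real data"),
    ],
)

# A lens is a view that keeps only the rows of one act.
conn.execute(
    "CREATE VIEW cod_penal AS "
    "SELECT * FROM acte WHERE type = 'LEGE' AND number = 286 AND year = 2009"
)

# Other tables can then filter with: WHERE act_id IN (SELECT id FROM cod_penal)
penal_ids = [row[0] for row in conn.execute("SELECT id FROM cod_penal")]
print(penal_ids)
```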
## Usage
```bash
# Download the assets of the latest release and unpack the bundle into data/
gh release download -R scriptogre/romanian-law-data
mkdir -p data
tar xzf laws.tar.gz -C data/
```
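The export step writes a SHA-256 alongside the bundle, so the download can be checked before use. A minimal sketch follows; `sha256_of` and `verify` are hypothetical helpers, and since this README does not say how the checksum is published, the expected digest is simply passed in as a hex string.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large bundle is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest."""
    # Tolerate surrounding whitespace and upper-case hex in the input.
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify(Path("laws.tar.gz"), published_digest)` would return `False` for a truncated or corrupted download.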
```python
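from pathlib import Path

import duckdb

conn = duckdb.connect()  # in-memory database
# Register the tables and views defined by the bundle.
conn.execute(Path("data/create_views.sql").read_text(encoding="utf-8"))

# Article 188 of the Criminal Code, selected through the cod_penal view.
rows = conn.execute("""
    SELECT act_citation, link, article_citation, content
    FROM articole
    WHERE act_id IN (SELECT id FROM cod_penal)
      AND article_number = 188
""").fetchall()
print(rows)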
```
## Pipeline
```
collect.py    SOAP API → data/raw_acts.jsonl
normalize.py  fix encoding, dedup, extract dates + gazette → stdout
parse.py      extract articles + alineate → stdout
export.py     write parquet bundle + sha256 ← stdin
```
`collect` checkpoints its output in `data/raw_acts.jsonl`, because the SOAP API is slow and rate-limited and the raw responses are worth caching. `normalize → parse → export` runs as a single pipe, so no intermediate JSONL is written to disk.
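The checkpointing idea can be sketched as a resumable collector: read the ids already present in the JSONL file, skip them, and append each new record as soon as it arrives. This is an illustration only, not the code in `collect.py`; `load_seen_ids`, `collect`, the `fetch` callable and the `id` field on raw records are all assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_seen_ids(checkpoint: Path) -> set:
    """Collect the ids already stored in the JSONL checkpoint, if any."""
    if not checkpoint.exists():
        return set()
    with checkpoint.open(encoding="utf-8") as fh:
        return {json.loads(line)["id"] for line in fh if line.strip()}


def collect(
    ids: Iterable[int], fetch: Callable[[int], dict], checkpoint: Path
) -> int:
    """Fetch only ids missing from the checkpoint; return how many were fetched."""
    seen = load_seen_ids(checkpoint)
    fetched = 0
    with checkpoint.open("a", encoding="utf-8") as fh:
        for act_id in ids:
            if act_id in seen:
                continue  # already cached by an earlier run
            record = fetch(act_id)  # stand-in for the slow SOAP call
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            fh.flush()  # keep progress if the run is interrupted
            seen.add(act_id)
            fetched += 1
    return fetched
```

Appending and flushing per record means an interrupted run loses at most the request in flight, and a rerun only pays for the ids that are still missing.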
```bash
uv sync
uv run python -m scripts.collect
uv run python -m scripts.normalize \
| uv run python -m scripts.parse \
| uv run python -m scripts.export
```
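Each stage in such a pipe can be written as a streaming filter that reads records from stdin and writes them to stdout one at a time. The sketch below assumes the stages exchange JSON Lines, which this README implies but does not state outright; `read_jsonl`, `write_jsonl`, `run_stage` and `strip_title` are hypothetical names, and the real `normalize` step does considerably more than trimming a field.

```python
import json
from typing import Callable, Iterable, Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


def write_jsonl(records: Iterable[dict], stream: TextIO) -> None:
    """Write one JSON object per line, keeping diacritics readable."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def run_stage(
    transform: Callable[[dict], dict], src: TextIO, dst: TextIO
) -> None:
    """Stream records through `transform` without holding the corpus in memory."""
    write_jsonl((transform(record) for record in read_jsonl(src)), dst)


def strip_title(record: dict) -> dict:
    # Hypothetical transform used only to demonstrate the plumbing.
    return {**record, "title": record.get("title", "").strip()}
```

A stage script would end with `run_stage(strip_title, sys.stdin, sys.stdout)`. Because the generator is consumed lazily, memory use stays flat regardless of corpus size.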
## License
The corpus is published by the Romanian Ministry of Justice and is public information. This repository only provides format conversion + pipeline tooling.