{"id":15722147,"url":"https://github.com/hueyy/lacuna-db","last_synced_at":"2026-03-05T21:32:18.857Z","repository":{"id":65598975,"uuid":"586434377","full_name":"hueyy/lacuna-db","owner":"hueyy","description":"legal data in machine-readable form","archived":false,"fork":false,"pushed_at":"2026-03-03T16:42:50.000Z","size":702048,"stargazers_count":12,"open_issues_count":0,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-03T18:32:02.379Z","etag":null,"topics":["dataset","datasette","git-scraping","law","lawtech","legal","legaltech","open-data","singapore"],"latest_commit_sha":null,"homepage":"https://lacunadb.huey.xyz/","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hueyy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-01-08T05:36:41.000Z","updated_at":"2026-03-03T16:42:54.000Z","dependencies_parsed_at":"2023-10-16T17:07:44.487Z","dependency_job_id":"1af87008-7fce-4808-b249-ec85b89c2c17","html_url":"https://github.com/hueyy/lacuna-db","commit_stats":{"total_commits":1946,"total_committers":2,"mean_commits":973.0,"dds":"0.054470709146968166","last_synced_commit":"b032dd9a93d8317d7760ab42af1e1da2573e0da9"},"previous_names":["hueyy/law-archive-data","hueyy/sg-courts-hearings-list","hueyy/lacuna-db"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/hueyy/lacuna-db","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hueyy%2Fl
acuna-db","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hueyy%2Flacuna-db/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hueyy%2Flacuna-db/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hueyy%2Flacuna-db/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hueyy","download_url":"https://codeload.github.com/hueyy/lacuna-db/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hueyy%2Flacuna-db/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30150428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T21:15:50.531Z","status":"ssl_error","status_checked_at":"2026-03-05T21:15:11.173Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","datasette","git-scraping","law","lawtech","legal","legaltech","open-data","singapore"],"created_at":"2024-10-03T22:04:25.440Z","updated_at":"2026-03-05T21:32:18.818Z","avatar_url":"https://github.com/hueyy.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LacunaDB\n\nThis repository contains Singapore legal data obtained from various public sources and converted into a machine-readable format, including the following:\n\n- Court 
hearings: [`/data/hearings.json`](./data/hearings.json)\n- Senior Counsels: [`/data/sc.json`](./data/sc.json)\n- PDPC undertakings: [`/data/pdpc-undertakings.json`](./data/pdpc-undertakings.json)\n- PDPC decisions: [`/data/pdpc-decisions.json`](./data/pdpc-decisions.json)\n- LSS DT reports: [`/data/lss-dt-reports.json`](./data/lss-dt-reports.json)\n- State Court judgments [`/data/stc-judgments.json`](./data/stc-judgments.json)\n- Family Court and Juvenile Court judgments [`/data/fc-judgments.json`](./data/fc-judgments.json)\n- Telecommunications FBO licences [`/data/telco-fbo.json`](./data/telco-fbo.json)\n\nYou can view and query the data using [this Datasette instance](https://lacunadb.huey.xyz/data).\n\nThe code and configuration files in this repository are licensed under the EUPL-1.2 as set out in the [LICENCE](./LICENCE) file.\n\nThis repository is not affiliated with the Singapore Academy of Law, Singapore Courts, Law Society, or any other organisation, and is provided for educational purposes only.\n\n## Architecture\n\nThis is a big picture overview of the general architecture of this project:\n\n```mermaid\nflowchart LR\n  subgraph pipeline[\"Data Pipeline\"]\n    subgraph /input/ scripts\n    Website--\u003edata[\"/data/ (JSON files)\"]\n    end\n    subgraph build_script[\"build_db.bb script\"]\n    data--\u003esqlite[\"SQLite DB (/data/data.db)\"]\n    end\n  end\n  subgraph backend[\"Backend\"]\n    Datasette\n  end\n  build_script--\u003ebackend\n  subgraph frontend[\"Frontend\"]\n    html[\"HTML templates\"]\n    cljs[\"CLJS scripts\"]\n  end\n  frontend-- served by --\u003ebackend\n```\n\n### Data pipeline\n\nIn the data pipeline, everything is just a script (aka a microservice™). Although most of the scripts are [Babashka](https://github.com/babashka/babashka) scripts written in [Clojure](https://clojure.org/), new scripts can be in any language. 
\n\nThe data is obtained periodically via scheduled GitHub Actions workflows and committed to this repository. Each GitHub Actions workflow runs one of the input scripts in the [`/input` folder](./input/). Each input script stores the data obtained in a JSON file in the [`/data` folder](./data/). Each JSON file is just a snapshot in time, i.e. it contains only the data obtained in the last run of the respective script as opposed to all data ever obtained using that script. \n\nThe [`/.github/workflows/deploy.yml`](./.github/workflows/deploy.yml) workflow runs the [`/scripts/build_db.bb`](./scripts/build_db.bb) script, which uses the [`git-history`](https://github.com/simonw/git-history) tool to create a SQLite database from the historical data across all the commits in this repository. The script then builds a [Datasette](https://datasette.io/) Docker image and deploys that via [Fly.io](https://fly.io/).\n\nSome of the scripts in the `/scripts` folder run Python tools. This project uses [Poetry](https://python-poetry.org/) to manage its [Python](https://www.python.org/) dependencies, so install Poetry and the dependencies before running those scripts. \n\n### Frontend\n\nSee [`/app/README.md`](./app/README.md) for frontend development.\n\n## Development\n\n### Setup via devenv (recommended)\n\nThis project uses [devenv](https://devenv.sh/) to quickly and conveniently set up a reproducible development environment. `devenv` is particularly useful here because this project contains code written in various languages and has a variety of dependencies to be installed.\n\nAfter [installing `devenv`](https://devenv.sh/getting-started/), enter a `devenv` shell. This should automatically set up the environment and install all the dependencies:\n\n```shell\ndevenv shell\n```\n\nThe JSON data files across the various commits to the `git` repository should then be aggregated into a SQLite database for ease of analysis. 
To create and populate the SQLite database:\n\n```shell\ndevenv shell build-db\n```\n\nThis may take some time (possibly \u003e1h) as there have been many commits to this repository. The `build_db.bb` script also does some processing on the data, e.g. it creates and populates certain columns for ease of use based on the raw data (see e.g. [`/scripts/computed_columns.bb`](./scripts/computed_columns.bb)). Alternatively, you can download a copy of the database from [lacunadb.huey.xyz](https://lacunadb.huey.xyz/).\n\nYou can run the same command above to update the SQLite database as necessary (e.g. after pulling subsequent commits). \n\nOnce you have the SQLite database, you can analyse it by running [Datasette](https://datasette.io/) locally:\n\n```shell\ndevenv shell dev-datasette\n```\n\n\n### Manual setup\n\nMake sure you have [Babashka](https://github.com/babashka/babashka), [Python](https://www.python.org/), and [Poetry](https://python-poetry.org/) installed.\n\nInstall the Poetry dependencies by running `poetry install --no-root`. \n\nThis project uses various CLI utilities, which you will need to install to run the input scripts:\n\n#### pdftotext\n\n[`pdftotext`](https://manpages.ubuntu.com/manpages/lunar/en/man1/pdftotext.1.html) is used to extract text from PDFs. It is bundled within [`poppler`](https://en.wikipedia.org/wiki/Poppler_(software)).\n\nOn Ubuntu/Debian:\n\n```shell\nsudo apt install poppler-utils\n```\n\nOn macOS, you can install it using [Homebrew](https://formulae.brew.sh/formula/poppler):\n\n```shell\nbrew install poppler\n```\n\n#### ocrmypdf\n\n[`ocrmypdf`](https://ocrmypdf.readthedocs.io/en/latest/) is used to run OCR on PDFs. 
It is already a Poetry dependency, but it requires [`tesseract`](https://github.com/tesseract-ocr/tesseract) and [`ghostscript`](https://www.ghostscript.com/) to be installed.\n\nOn Ubuntu/Debian:\n\n```shell\nsudo apt install tesseract-ocr ghostscript\n```\n\nOn macOS:\n\n```shell\nbrew install tesseract ghostscript\n```\n\n#### Building the SQLite database\n\nAfter cloning this repository and following [the setup steps above](#manual-setup), you can generate the SQLite database on your machine by running the [`/scripts/build_db.bb` script](./scripts/build_db.bb):\n\n```shell\nbb --main scripts.build-db\n```\n\nIf you do not have SQLite installed, you will need to install it.\n\nOn Ubuntu/Debian:\n\n```shell\nsudo apt install sqlite3\n```\n\nOn macOS:\n\n```shell\nbrew install sqlite3\n```\n\n#### Running Datasette\n\nYou can use the [`/scripts/dev_docker.bb` script](./scripts/dev_docker.bb). \n\n```shell\nbb ./scripts/dev_docker.bb\n```\n\nIt may be helpful to refer to the Docker images or the `devenv.nix` configuration file for a better idea of how the project functions and how to run certain scripts.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhueyy%2Flacuna-db","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhueyy%2Flacuna-db","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhueyy%2Flacuna-db/lists"}