https://github.com/openaleph/ftm-lakehouse
Data standard and archive storage for structured FollowTheMoney data, leaked data, private and public document collections.
- Host: GitHub
- URL: https://github.com/openaleph/ftm-lakehouse
- Owner: openaleph
- License: agpl-3.0
- Created: 2024-10-04T06:32:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-12-18T17:49:17.000Z (6 months ago)
- Last Synced: 2025-12-18T22:33:59.122Z (6 months ago)
- Topics: aleph, archive, datalake, deltalake, followthemoney, lakehouse, openaleph, opensanctions
- Language: Python
- Homepage: https://openaleph.org/docs/lib/ftm-lakehouse
- Size: 27.5 MB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Notice: NOTICE
README
# ftm-lakehouse
`ftm-lakehouse` provides a _data standard_ and _archive storage_ for leaked data, private and public document collections. The concepts and implementations were originally inspired by [mmmeta](https://github.com/simonwoerpel/mmmeta) and [Aleph's servicelayer archive](https://github.com/alephdata/servicelayer).
`ftm-lakehouse` acts as a multi-tenant storage and retrieval mechanism for structured entity data, documents and their metadata. It provides a high-level interface for generating and sharing document collections and importing them into various search and analysis platforms, such as [_OpenAleph_](https://openaleph.org), [_ICIJ Datashare_](https://datashare.icij.org/) or [_Liquid Investigations_](https://github.com/liquidinvestigations/).
## Installation
Requires Python 3.11 or later.
```bash
pip install ftm-lakehouse
```
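To confirm that the environment meets the stated requirement and that the package was actually installed, a small check like the following may help. This is a minimal sketch that relies only on the Python standard library (`importlib.metadata`); the helper names `python_ok` and `installed_version` are hypothetical and not part of `ftm-lakehouse`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Minimum interpreter version, taken from the requirement stated above.
MIN_PYTHON = (3, 11)


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str = "ftm-lakehouse") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper: it only queries package metadata and does not
    import or call anything from ftm-lakehouse itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("ftm-lakehouse:", installed_version() or "not installed")
```

Querying the distribution metadata rather than importing the package keeps the check independent of the library's module layout, which is not described here.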
## Documentation
[openaleph.org/docs/lib/ftm-lakehouse](https://openaleph.org/docs/lib/ftm-lakehouse)
## Development
This package uses [poetry](https://python-poetry.org/) for packaging and dependency management, so first [install it](https://python-poetry.org/docs/#installation).
Clone [this repository](https://github.com/openaleph/ftm-lakehouse) to a local destination.
Within the repo directory, run:

    poetry install --with dev
This installs a few development dependencies, including [pre-commit](https://pre-commit.com/), which needs to be registered:

    poetry run pre-commit install
Before each commit, pre-commit checks code formatting (isort, black) and runs the other hooks configured in `.pre-commit-config.yaml`.
### Testing
`ftm-lakehouse` uses [pytest](https://docs.pytest.org/en/stable/) as its testing framework. Run the test suite with:

    make test
## License and Copyright
`ftm-lakehouse`, (c) 2024 [investigativedata.io](https://investigativedata.io)
`ftm-lakehouse`, (c) 2025 [Data and Research Center – DARC](https://dataresearchcenter.org)
`ftm-lakehouse` is licensed under the AGPLv3 or later.