
# Trawler
Trawler is an open source data catalogue for mapping and monitoring your data and systems.

**NOTE**: Trawler is currently being rewritten from scratch with some new
architecture ideas and is thus not suitable for production deployment.
The new version will support ingesting data via
[datahub](https://datahubproject.io/)'s tooling. This will allow us to focus on
the data model and user experience and avoid having to maintain a large number
of connectors (at least until the project has matured).

## Getting started
The easiest way to get started with trawler locally is to run our `docker-compose` file:

```bash
curl https://raw.githubusercontent.com/scalar-dev/trawler/master/docker-compose.example.yml -o docker-compose.yml
docker-compose up
```
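
Once the containers are up, you can check that the backend is reachable. This
assumes the example compose file maps the backend to port 8081 (where the
embedded UI is served, as noted below):

```bash
# List the running services
docker-compose ps

# The embedded UI should respond with HTTP 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8081/ui
```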

You can use the `acryl-datahub` CLI to ingest metadata into `trawler`.
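
If you don't already have the CLI, it is published on PyPI (this assumes a
working Python installation):

```bash
# Install the datahub CLI
pip install acryl-datahub
```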

```bash
# Point the CLI at the local trawler metadata service
export DATAHUB_GMS_URL="http://localhost:8081/api/datahub/main"

# Run a recipe
datahub ingest -c recipe.yml
```
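
A recipe tells the CLI which source to pull metadata from and which sink to
push it to. As a rough sketch (the PostgreSQL source and its credentials are
purely illustrative; see the datahub docs for real source configurations):

```bash
# Write a minimal, illustrative recipe; the sink mirrors the URL set above
cat > recipe.yml <<'EOF'
source:
  type: postgres                # hypothetical source; any datahub source should work
  config:
    host_port: localhost:5432   # example connection details
    database: analytics
    username: reader
    password: secret
sink:
  type: datahub-rest
  config:
    server: http://localhost:8081/api/datahub/main
EOF
```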

Trawler serves an embedded UI at `http://localhost:8081/ui`.

To read more, see [the datahub docs](https://datahubproject.io/docs/metadata-ingestion) or check out one of the
examples in the `datahub/` directory of this repository.

## Goals
Trawler is intended to be different from other data catalogue products:

- **Easy to deploy**. A basic but fully functional deployment requires only a
single backend service and a PostgreSQL database. Extra features and
scalability may call for additional services, but these will always be
optional. Getting started with a powerful data catalogue should be possible
for every team, small or large.

- **Federated**. Trawler will be the first data catalogue to support federation via
the ActivityPub protocol. This will allow individual teams to run their own
trawler instances (should they wish) and to link their knowledge graphs together
or track changes in upstream data sources. Granting access to outside users or organisations
will be easy and secure.

- **Social**. Capturing institutional knowledge is critical to maintaining a useful data
catalogue. We will allow users to track documentation and communication related
to data assets alongside Trawler's core machine-generated metadata.

- **Flexible**. Existing products mostly offer a fixed set of entities and
properties that can be recorded in the data catalogue. Where they support
extension at all, it can be painful. We intend to support a fully extensible
metadata model with a well-typed schema.

- **Compatible and extensible**. Trawler should be easy to integrate with legacy or
bespoke systems. We support popular existing tools for capturing and collating
metadata by implementing the [datahub](https://datahubproject.io) REST API.

- **Standards compliant**. We will support existing semantic web formats for
exchanging information about data assets:
[DCAT](https://www.w3.org/TR/vocab-dcat-3/) and
[PROV](https://www.w3.org/TR/prov-o/) via JSON-LD.

## Building trawler
If you're a fan of [nix](https://nixos.org/), you can run `nix develop` to get a
shell configured with the dependencies needed to build `trawler`.

To build and run the backend, go to the `metadata/` directory and run:

```bash
# Generate code, then build and run the server
go generate .
go run ./cmd/server
```

To run the frontend, go to the `ui/` directory and run:

```bash
npm install --include=dev
npm run dev
```

## Scalar
Trawler is proudly developed and sponsored by [Scalar](https://www.scalar.dev),
a consultancy specialising in novel data engineering solutions.