# GraphSense Library

[![Test and Build Status](https://github.com/graphsense/graphsense-lib/actions/workflows/run_tests.yaml/badge.svg)](https://github.com/graphsense/graphsense-lib/actions) [![PyPI version](https://badge.fury.io/py/graphsense-lib.svg)](https://badge.fury.io/py/graphsense-lib) [![Python](https://img.shields.io/pypi/pyversions/graphsense-lib)](https://pypi.org/project/graphsense-lib/) [![Downloads](https://static.pepy.tech/badge/graphsense-lib)](https://pepy.tech/project/graphsense-lib)

A comprehensive Python library for the GraphSense crypto-analytics platform. It provides database access, data ingestion, maintenance tools, and analysis capabilities for cryptocurrency transactions and networks.

> **Note:** This library uses optional dependencies. Use `graphsense-lib[all]` to install all features.

## Quick Start

### Installation

```bash
# Install with all features
uv add "graphsense-lib[all]"

# Install from source
git clone https://github.com/graphsense/graphsense-lib.git
cd graphsense-lib
make install
```

### Serving the REST API locally

The web API requires two backend connections: a **Cassandra** cluster (blockchain data) and a **TagStore** (PostgreSQL). You can configure them via environment variables or a YAML config file.

#### Option A: Environment variables only

```bash
GS_CASSANDRA_ASYNC_NODES='[""]' \
GRAPHSENSE_TAGSTORE_READ_URL='postgresql+asyncpg://:@:/tagstore' \
GS_CASSANDRA_ASYNC_CURRENCIES='{"btc":{"raw": "btc_raw", "transformed": "btc_transformed"},"eth":{}}' \
uv run --extra web uvicorn graphsenselib.web.app:create_app --factory --host localhost --port 9000 --reload
```
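The two Cassandra variables carry JSON, so a shell-quoting slip is easy to make. As a minimal sketch, the values can be sanity-checked with the standard library before starting the server. The helper `check_cassandra_env` and the `localhost` node are assumptions for illustration only; graphsense-lib does its own parsing.

```python
import json


def check_cassandra_env(env):
    """Parse the JSON-valued Cassandra variables from a mapping such as os.environ.

    Hypothetical helper for illustration; not part of graphsense-lib.
    """
    nodes = json.loads(env["GS_CASSANDRA_ASYNC_NODES"])
    currencies = json.loads(env["GS_CASSANDRA_ASYNC_CURRENCIES"])
    if not isinstance(nodes, list):
        raise ValueError("GS_CASSANDRA_ASYNC_NODES must be a JSON list of hosts")
    if not isinstance(currencies, dict):
        raise ValueError("GS_CASSANDRA_ASYNC_CURRENCIES must be a JSON object")
    return nodes, currencies


if __name__ == "__main__":
    example = {
        # "localhost" is an assumed placeholder host
        "GS_CASSANDRA_ASYNC_NODES": '["localhost"]',
        "GS_CASSANDRA_ASYNC_CURRENCIES": (
            '{"btc": {"raw": "btc_raw", "transformed": "btc_transformed"}, "eth": {}}'
        ),
    }
    nodes, currencies = check_cassandra_env(example)
    print(nodes, sorted(currencies))
```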

#### Option B: YAML config file

Point `CONFIG_FILE` to a REST-specific config (see `instance/config.yaml` for a full example):

```bash
CONFIG_FILE=./instance/config.yaml make serve-web
```

Or without Make:

```bash
CONFIG_FILE=./instance/config.yaml \
uv run --extra web uvicorn graphsenselib.web.app:create_app --factory --host localhost --port 9000 --reload
```

#### Option C: `.graphsense.yaml` with a `web` key

If you already have a `.graphsense.yaml` (or `~/.graphsense.yaml`) for the CLI, you can add a `web` key containing the REST config. The app will pick it up automatically without setting `CONFIG_FILE`:

```yaml
# .graphsense.yaml
environments:
  # ... your existing CLI config ...

web:
  database:
    nodes: [""]
    currencies:
      btc:
      eth:
  gs-tagstore:
    url: "postgresql+asyncpg://:@:/tagstore"
```

```bash
make serve-web
```

**Config resolution order:** explicit `config_file` param > `CONFIG_FILE` env var > `./instance/config.yaml` > `.graphsense.yaml` `web` key > env vars only.
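The precedence above can be read as a first-match search. The following is a rough sketch of that order, not the library's implementation: the function name, the return labels, and the `cwd` parameter are assumptions, and the check for a `web` key inside `.graphsense.yaml` is reduced to a file-existence test.

```python
import os
from pathlib import Path


def resolve_config_source(config_file=None, env=None, cwd="."):
    """Return a (kind, path) tuple naming the first config source that applies.

    Illustrative only: names and labels are hypothetical.
    """
    env = os.environ if env is None else env
    if config_file:  # 1. explicit parameter wins
        return ("param", Path(config_file))
    if env.get("CONFIG_FILE"):  # 2. CONFIG_FILE environment variable
        return ("env:CONFIG_FILE", Path(env["CONFIG_FILE"]))
    instance = Path(cwd) / "instance" / "config.yaml"
    if instance.is_file():  # 3. ./instance/config.yaml
        return ("instance", instance)
    for candidate in (Path(cwd) / ".graphsense.yaml", Path.home() / ".graphsense.yaml"):
        if candidate.is_file():  # 4. CLI config that may carry a `web` key
            return ("graphsense.yaml:web", candidate)
    return ("env-only", None)  # 5. fall back to environment variables
```

Calling `resolve_config_source()` from the project root reports which source would be consulted first on the current machine.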

#### Optional REST settings (env vars)

| Variable | Default | Description |
|---|---|---|
| `GSREST_DISABLE_AUTH` | `false` | Disable API key authentication |
| `GSREST_ENSURE_TAGSTORE_SCHEMA_ON_STARTUP` | `false` | Auto-initialize TagStore tables/views at startup when missing |
| `GSREST_ALLOWED_ORIGINS` | `*` | CORS allowed origins |
| `GSREST_LOGGING_LEVEL` | — | Logging level (DEBUG, INFO, …) |
| `GS_CASSANDRA_ASYNC_PORT` | `9042` | Cassandra port |
| `GS_CASSANDRA_ASYNC_USERNAME` | — | Cassandra username |
| `GS_CASSANDRA_ASYNC_PASSWORD` | — | Cassandra password |

When enabling `GSREST_ENSURE_TAGSTORE_SCHEMA_ON_STARTUP=true`, keep in mind:

- The DB user must have DDL privileges (create tables/views/indexes/extensions/procedures).
- Startup may be slower because schema checks and potential initialization run before the app serves traffic.
- In multi-replica deployments, initialize schema once (migration/init job) to avoid startup races.

If TagStore is not configured (`gs-tagstore` missing) or the TagStore URL is unreachable, the REST app falls back to a mock TagStore so that endpoints still work. In this mode, tag-specific responses (labels, actors, taxonomies, tag counts) are empty.

### Basic Usage

#### Database Access with Configuration File

```python
from graphsenselib.db import DbFactory

# Using GraphSense config file (default: ~/.graphsense.yaml)
with DbFactory().from_config("development", "btc") as db:
    highest_block = db.transformed.get_highest_block()
    print(f"Highest BTC block: {highest_block}")

    # Get block details
    block = db.transformed.get_block(100000)
    print(f"Block 100000: {block.block_hash}")
```

#### Direct Database Connection

```python
from graphsenselib.db import DbFactory

# Direct connection without config file
with DbFactory().from_name(
    raw_keyspace_name="eth_raw",
    transformed_keyspace_name="eth_transformed",
    schema_type="account",
    cassandra_nodes=["localhost"],
    currency="eth",
) as db:
    print(f"Highest block: {db.transformed.get_highest_block()}")
```

#### Async Database Services

The async services are used internally by the REST API and can also be used standalone. `AddressesService` depends on several other services:

```python
from graphsenselib.db.asynchronous.services import (
    BlocksService, AddressesService, TagsService,
    EntitiesService, RatesService,
)

# Services are initialized with their dependencies.
# db, config, logger and the remaining services are assumed to exist already.
blocks_service = BlocksService(db, rates_service, config, logger)
addresses_service = AddressesService(
    db, tags_service, entities_service, blocks_service, rates_service, logger
)

# The calls below are coroutines, so they must run inside an async function.
address_info = await addresses_service.get_address("btc", "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
txs = await addresses_service.list_address_txs("btc", "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
```

## Command Line Interface

GraphSense-lib exposes a comprehensive CLI tool: `graphsense-cli`

### Basic Commands

```bash
# Show help and available commands
graphsense-cli --help

# Check version
graphsense-cli version

# Show current configuration
graphsense-cli config show

# Generate config template
graphsense-cli config template > ~/.graphsense.yaml

# Show config file path
graphsense-cli config path
```

## Modules

### Database Management

Query and manage the GraphSense database state.

```bash
# Show database management options
graphsense-cli db --help

# Check database state/summary
graphsense-cli db state -e development

# Get block information
graphsense-cli db block info -e development -c btc --height 100000

# Query logs (for Ethereum-based chains)
graphsense-cli db logs -e development -c eth --from-block 1000000 --to-block 1000100
```

### Schema Operations

Create and validate database schemas.

```bash
# Show schema options
graphsense-cli schema --help

# Create database schema for a currency
graphsense-cli schema create -e dev -c btc

# Validate existing schema
graphsense-cli schema validate -e dev -c btc

# Show expected schema for currency
graphsense-cli schema show-by-currency btc

# Show schema by type (utxo/account)
graphsense-cli schema show-by-schema-type utxo
```

### Data Ingestion

Ingest raw cryptocurrency data from nodes.

```bash
# Show ingestion options
graphsense-cli ingest --help

# Ingest blocks from cryptocurrency node
graphsense-cli ingest from-node \
    -e dev \
    -c btc \
    --start-block 0 \
    --end-block 1000 \
    --create-schema

# Ingest with custom batch size
graphsense-cli ingest from-node \
    -e dev \
    -c eth \
    --start-block 1000000 \
    --end-block 1001000 \
    --batch-size 100
```

### Delta Updates

Update transformed keyspace from raw keyspace.

```bash
# Show delta update options
graphsense-cli delta-update --help

# Check update status
graphsense-cli delta-update status -e dev -c btc

# Perform delta update
graphsense-cli delta-update update -e dev -c btc

# Validate delta update consistency
graphsense-cli delta-update validate -e dev -c btc

# Patch exchange rates for specific blocks
graphsense-cli delta-update patch-exchange-rates \
    -e dev \
    -c btc \
    --start-block 100000 \
    --end-block 200000
```

### Exchange Rates

Fetch and ingest exchange rates from various sources.

```bash
# Show exchange rate options
graphsense-cli exchange-rates --help

# Fetch from CoinDesk
graphsense-cli exchange-rates coindesk -e dev -c btc

# Fetch from CoinMarketCap (requires API key in config)
graphsense-cli exchange-rates coinmarketcap -e dev -c btc
```

### Monitoring

Monitor GraphSense infrastructure health and state.

```bash
# Show monitoring options
graphsense-cli monitoring --help

# Get database summary
graphsense-cli monitoring get-summary -e dev

# Get summary for specific currency
graphsense-cli monitoring get-summary -e dev -c btc

# Send notifications to configured handlers
graphsense-cli monitoring notify \
    --topic "database-update" \
    --message "BTC ingestion completed"
```

### Event Watching (Alpha)

Watch for cryptocurrency events and generate notifications.

```bash
# Show watch options
graphsense-cli watch --help

# Watch for money flows on specific addresses
graphsense-cli watch money-flows \
    -e dev \
    -c btc \
    --address 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa \
    --threshold 1000000 # satoshis
```

### File Conversion Tools

Convert between different file formats.

```bash
# Show conversion options
graphsense-cli convert --help
```

## Configuration

GraphSense-lib uses a YAML configuration file that defines database connections and environment settings. Default locations: `./.graphsense.yaml`, `~/.graphsense.yaml`.

### Generate Configuration Template

```bash
graphsense-cli config template > ~/.graphsense.yaml
```

### Example Configuration Structure

```yaml
# Optional: default environment to use
default_environment: dev

environments:
  dev:
    # Cassandra cluster configuration
    cassandra_nodes: ["localhost"]
    port: 9042
    # Optional authentication
    # username: "cassandra"
    # password: "cassandra"

    # Currency/keyspace configurations
    keyspaces:
      btc:
        raw_keyspace_name: "btc_raw"
        transformed_keyspace_name: "btc_transformed"
        schema_type: "utxo"

        # Node connection for ingestion
        ingest_config:
          node_reference: "http://localhost:8332"
          # Optional authentication for node
          # username: "rpcuser"
          # password: "rpcpassword"

        # Keyspace setup for schema creation
        keyspace_setup_config:
          raw:
            replication_config: "{'class': 'SimpleStrategy', 'replication_factor': 1}"
          transformed:
            replication_config: "{'class': 'SimpleStrategy', 'replication_factor': 1}"

      eth:
        raw_keyspace_name: "eth_raw"
        transformed_keyspace_name: "eth_transformed"
        schema_type: "account"

        ingest_config:
          node_reference: "http://localhost:8545"

        keyspace_setup_config:
          raw:
            replication_config: "{'class': 'SimpleStrategy', 'replication_factor': 1}"
          transformed:
            replication_config: "{'class': 'SimpleStrategy', 'replication_factor': 1}"

  prod:
    cassandra_nodes: ["cassandra1.prod", "cassandra2.prod", "cassandra3.prod"]
    username: "gs_user"
    password: "secure_password"

    keyspaces:
      btc:
        raw_keyspace_name: "btc_raw"
        transformed_keyspace_name: "btc_transformed"
        schema_type: "utxo"

        ingest_config:
          node_reference: "http://bitcoin-node.internal:8332"

        keyspace_setup_config:
          raw:
            replication_config: "{'class': 'NetworkTopologyStrategy', 'datacenter1': 3}"
          transformed:
            replication_config: "{'class': 'NetworkTopologyStrategy', 'datacenter1': 3}"

# Optional: Slack notification configuration
slack_topics:
  database-update:
    hooks: ["https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"]

  payment_flow_notifications:
    hooks: ["https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"]

# Optional: API keys for external services
coingecko_api_key: ""
coinmarketcap_api_key: "YOUR_CMC_API_KEY"

# Optional: cache directory for temporary files
cache_directory: "~/.graphsense/cache"
```
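The `replication_config` values are strings that look like CQL replication maps. As a sketch of how such a string could slot into keyspace creation, assuming it is passed through unchanged (the helper `create_keyspace_cql` is hypothetical, and the statement graphsense-lib actually issues may differ):

```python
def create_keyspace_cql(keyspace_name, replication_config):
    """Build a CQL CREATE KEYSPACE statement from a replication_config string.

    Assumption: the configured string is used as the replication map as-is.
    """
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace_name} "
        f"WITH replication = {replication_config};"
    )


statement = create_keyspace_cql(
    "btc_raw", "{'class': 'SimpleStrategy', 'replication_factor': 1}"
)
print(statement)
```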

## Advanced Features

### Tagpack Management

GraphSense-lib includes comprehensive tagpack management tools (formerly standalone tagpack-tool). For detailed documentation, see [Tagpack README](tagpack/docs/README.md).

```bash
# Validate tagpacks
graphsense-cli tagpack-tool tagpack validate /path/to/tagpack

# Insert tagpack into tagstore
graphsense-cli tagpack-tool insert \
    --url "postgresql://user:pass@localhost/tagstore" \
    /path/to/tagpack

# Show quality measures
graphsense-cli tagpack-tool quality show-measures \
    --url "postgresql://user:pass@localhost/tagstore"
```

### Tagstore Operations

```bash
# Initialize tagstore database
graphsense-cli tagstore init

# Initialize with custom database URL
graphsense-cli tagstore init --db-url "postgresql://user:pass@localhost/tagstore"

# Get DDL SQL for manual setup
graphsense-cli tagstore get-create-sql
```

### Cross-chain Analysis

```python
# Using an initialized AddressesService (see above for setup)
related = await addresses_service.get_cross_chain_pubkey_related_addresses(
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"
)

for addr in related:
    print(f"Network: {addr.network}, Address: {addr.address}")
```

### Function Call Parsing

```python
from graphsenselib.utils.function_call_parser import parse_function_call

# Parse Ethereum function calls
function_signatures = {
    "0xa9059cbb": [{
        "name": "transfer",
        "inputs": [
            {"name": "to", "type": "address"},
            {"name": "value", "type": "uint256"}
        ]
    }]
}

parsed = parse_function_call(tx_input_bytes, function_signatures)
if parsed:
    print(f"Function: {parsed['name']}")
    print(f"Parameters: {parsed['parameters']}")
```
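To make the input format concrete, here is a hand-rolled sketch of what decoding `transfer(address,uint256)` calldata involves under the standard Ethereum ABI layout: a 4-byte selector followed by one 32-byte word per argument. It does not use graphsense-lib; `decode_transfer_calldata`, the sample recipient, and the shape of the returned dictionary are assumptions chosen to mirror the snippet above.

```python
def decode_transfer_calldata(data: bytes) -> dict:
    """Decode calldata for transfer(address,uint256) by hand.

    Layout: 4-byte selector, then two 32-byte big-endian words.
    """
    selector, args = data[:4], data[4:]
    if selector.hex() != "a9059cbb":
        raise ValueError("not a transfer(address,uint256) call")
    if len(args) != 64:
        raise ValueError("expected exactly two 32-byte arguments")
    to = "0x" + args[12:32].hex()  # an address is the low 20 bytes of its word
    value = int.from_bytes(args[32:64], "big")
    return {"name": "transfer", "parameters": {"to": to, "value": value}}


# Build sample calldata: selector + padded recipient + amount (made-up values)
calldata = bytes.fromhex(
    "a9059cbb"
    + "00" * 12 + "11" * 20
    + (1000).to_bytes(32, "big").hex()
)
print(decode_transfer_calldata(calldata))
```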

## Development

**Important:** Requires Python >=3.10, <3.13.

### Setup Development Environment

```bash
# Initialize development environment (installs deps + pre-commit hooks)
make dev

# Or install dev dependencies only
make install-dev
```

### Code Quality and Testing

Before committing, please format, lint, and test your code:

```bash
# Format code
make format

# Lint code
make lint

# Run fast tests
make test

# Or run all steps at once
make pre-commit
```

For comprehensive testing:

```bash
# Run complete test suite (including slow tests)
make test
```

#### Podman Notes

If you run the test suite with Podman, make sure your shell points at the Podman socket:

```bash
export DOCKER_HOST="unix://${XDG_RUNTIME_DIR}/podman/podman.sock"
```

The test fixtures automatically disable Ryuk when `DOCKER_HOST` contains `podman.sock` and rely on explicit fixture cleanup instead.
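The detection described above might look roughly like the sketch below. The function name is hypothetical and the repository's fixtures may implement this differently; `TESTCONTAINERS_RYUK_DISABLED` is the environment variable testcontainers reads to skip starting its Ryuk cleanup container.

```python
import os


def should_disable_ryuk(env=None):
    """Return True when DOCKER_HOST points at a Podman socket."""
    env = os.environ if env is None else env
    return "podman.sock" in env.get("DOCKER_HOST", "")


if should_disable_ryuk():
    # Leave an explicit user setting untouched; only set a default.
    os.environ.setdefault("TESTCONTAINERS_RYUK_DISABLED", "true")
```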

### Release Process

This repository uses two source-of-truth versions in the root `Makefile`:

- **Library version**: `RELEASESEM` (released with `vX.Y.Z`, `vX.Y.Z-rc.N`, or `vX.Y.Z-dev.N` tags)
- **OpenAPI/API version**: `WEBAPISEM` (written to `src/graphsenselib/web/version.py`)

The Python client package version is derived from the API version and should match it.

Library package versioning is dynamic via `setuptools_scm` (`pyproject.toml`):

- Git tag `v2.9.8` -> package version `2.9.8`
- Git tag `v2.9.8-rc.1` -> package version `2.9.8rc1`
- Git tag `v2.9.8-dev.1` -> package version `2.9.8.dev1`
- Commits after a tag append local metadata, for example `2.9.8.dev1+g.d`
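The tag-to-version mapping for exact tags can be sketched with a small regular expression. This is an illustration of the rule listed above, not the logic `setuptools_scm` runs internally, and `tag_to_package_version` is a hypothetical name. Local metadata for commits after a tag is not modeled.

```python
import re

# vX.Y.Z with an optional -rc.N or -dev.N suffix
_TAG = re.compile(r"^v(\d+\.\d+\.\d+)(?:-(rc|dev)\.(\d+))?$")


def tag_to_package_version(tag: str) -> str:
    """Map a library release tag to the package version string it produces."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported tag format: {tag!r}")
    base, kind, number = match.groups()
    if kind is None:
        return base  # stable release
    if kind == "rc":
        return f"{base}rc{number}"  # release candidate
    return f"{base}.dev{number}"  # development prerelease


for tag in ("v2.9.8", "v2.9.8-rc.1", "v2.9.8-dev.1"):
    print(tag, "->", tag_to_package_version(tag))
```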

Use the root Makefile helpers:

```bash
# Show all current versions
make show-versions

# Update and validate OpenAPI contract version
make update-api-version WEBAPISEM=v2.10.0
make check-api-version WEBAPISEM=v2.10.0

# Sync client version from API version and validate
make sync-client-version WEBAPISEM=v2.10.0
make check-client-version WEBAPISEM=v2.10.0

# Generate Python client (package version = OpenAPI info.version)
make generate-python-client

# Create both release tags from Makefile versions
make tag-version
```

Tagging behavior:

- Library release tag: `vX.Y.Z`, `vX.Y.Z-rc.N`, or `vX.Y.Z-dev.N` (from `RELEASESEM`)
- Client release tag: `webapi-vA.B.C` (from `WEBAPISEM`)

Recommended library versioning routine:

1. For development prereleases, set `RELEASESEM` to `vX.Y.Z-dev.N` (for example `v2.10.0-dev.1`)
2. For release candidates, set `RELEASESEM` to `vX.Y.Z-rc.N`
3. For stable releases, set `RELEASESEM` to `vX.Y.Z`
4. Create tags with `make tag-version`
5. Push tags with `git push origin --tags`

CI trigger background:

- Stable library tags (`vX.Y.Z`) trigger:
  - GitHub Release creation
  - Python library package build/publish (`graphsense-lib`)
  - Docker image build/publish
- Client tags (`webapi-vA.B.C`) trigger Python client package build/publish (`clients/python`)
- Other library tags (`vX.Y.Z-rc.N`, `vX.Y.Z-dev.N`) do not trigger GitHub Release or Python package publish; they only trigger Docker image build/publish
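As a quick way to check what a given tag would set off, the trigger rules above can be written as a small classifier. This is only a sketch: the job labels are made-up names rather than the workflow names used in the repository, and the patterns are inferred from the tag formats listed here.

```python
import re

STABLE = re.compile(r"^v\d+\.\d+\.\d+$")
PRERELEASE = re.compile(r"^v\d+\.\d+\.\d+-(rc|dev)\.\d+$")
CLIENT = re.compile(r"^webapi-v\d+\.\d+\.\d+$")


def triggered_jobs(tag: str) -> set:
    """Return hypothetical labels for the CI jobs a pushed tag would trigger."""
    if STABLE.match(tag):
        return {"github-release", "publish-library", "docker-image"}
    if PRERELEASE.match(tag):
        return {"docker-image"}
    if CLIENT.match(tag):
        return {"publish-python-client"}
    return set()  # anything else is not a release tag


for tag in ("v2.10.0", "v2.10.0-rc.1", "webapi-v2.10.0"):
    print(tag, sorted(triggered_jobs(tag)))
```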

Release checklist:

1. Update CHANGELOG.md with new features and fixes
2. Update relevant versions (library/API/client) based on what changed
3. Sync API/client versions if needed (`make update-api-version` + `make sync-client-version`)
4. Create and push tags:

```bash
make tag-version
git push origin --tags
```

## Troubleshooting

### OpenSSL Errors

Some components use OpenSSL hash functions that aren't available by default in OpenSSL 3.0+ (e.g., ripemd160). This can cause test suite failures. To fix this, enable legacy providers in your OpenSSL configuration. See the "fix openssl legacy mode" step in `.github/workflows/run_tests.yaml` for an example.
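To check whether the local Python build is affected, a short probe with the standard library is enough. `hashlib.new` raises `ValueError` when the linked OpenSSL does not provide the requested digest; the function name here is just an illustrative choice.

```python
import hashlib


def ripemd160_available() -> bool:
    """Report whether hashlib can construct a ripemd160 digest."""
    try:
        hashlib.new("ripemd160", b"")
    except ValueError:
        return False
    return True


print("ripemd160 available:", ripemd160_available())
```

If this prints `False`, the legacy provider is most likely not enabled in the active OpenSSL configuration.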

### Common Issues

1. **Connection Refused**: Verify Cassandra is running and accessible
2. **Schema Validation Errors**: Ensure database schema matches expected version
3. **Import Errors**: Install with `[all]` option for complete feature set
4. **Python Version**: Requires Python >=3.10, <3.13

### Getting Help

- Check [GitHub Issues](https://github.com/graphsense/graphsense-lib/issues)
- Review [GraphSense Documentation](https://graphsense.github.io/)
- Use `--help` with any CLI command for detailed usage information
- For tagpack-specific issues, see [Tagpack Documentation](tagpack/docs/README.md)

## License

See LICENSE file for licensing details.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run `make pre-commit` to ensure code quality
5. Submit a pull request

---

**GraphSense** - Open Source Crypto Analytics Platform
Website: https://graphsense.github.io/