https://github.com/creatiwity/siren
Siren API to serve INSEE v3 data
api docker-image insee rust siren siren-api sirene
- Host: GitHub
- URL: https://github.com/creatiwity/siren
- Owner: Creatiwity
- License: mit
- Created: 2019-10-11T09:36:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2025-03-04T14:54:34.000Z (about 1 year ago)
- Last Synced: 2025-03-21T16:21:26.878Z (about 1 year ago)
- Topics: api, docker-image, insee, rust, siren, siren-api, sirene
- Language: Rust
- Homepage: https://creatiwity.net/realisations/api-siren/23
- Size: 903 KB
- Stars: 17
- Watchers: 5
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# SIREN API
REST API for serving INSEE files v3 with full-text search and geographic search capabilities.
## Getting started
To get a working copy of this project, follow the instructions below.
### Installation
1. **Set up Rust**: Install [Rust](https://www.rust-lang.org) (version 1.70+ recommended)
2. **Environment variables**: Define the environment variables listed in `.env.sample`. You can either set them manually or use a `.env` file.
3. **PostgreSQL database**: Set up PostgreSQL with the required extensions (macOS commands):
```bash
brew install postgresql
createuser --pwprompt sirene # set password to sirenepw for instance
createdb --owner=sirene sirene
# Connect to the database and enable the required extensions
psql -U sirene -d sirene <<'SQL'
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS unaccent;
SQL
```
4. **Required PostgreSQL extensions**:
- `postgis` (for geographic search)
- `pg_trgm` (for full-text search with trigram similarity)
- `unaccent` (for accent-insensitive search, via immutable wrapper)
5. **Optional**: For development, you may want to install:
```bash
cargo install diesel_cli --no-default-features --features postgres # For database migrations
cargo install cargo-watch # For auto-reloading during development
```
## Documentation
### Configuration
Recommended configuration for production with Docker:
```
RUST_LOG=sirene=warn
SIRENE_ENV=production
BASE_URL=[Your base URL, needed to update asynchronously]
API_KEY=[Any randomized string, needed to use the HTTP admin endpoint]
DATABASE_URL=postgresql://[USER]:[PASSWORD]@[PG_HOST]:[PG_PORT]/[PG_DATABASE]
DATABASE_POOL_SIZE=100
INSEE_CREDENTIALS=[API_KEY]
```
**How to generate INSEE_CREDENTIALS**
This variable is only needed if you want daily updates.
1. Go to https://portail-api.insee.fr/catalog/all
2. Create an account or sign in
3. Create an application on this portal
4. Subscribe this application to API SIRENE (Sirene 4 - v3.11)
5. Generate a key in the application details
6. Copy the key and paste it into `.env` in place of `[API_KEY]`
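Two of the variables above are optional and switch features on or off: `INSEE_CREDENTIALS` is only needed for daily updates, and the HTTP maintenance API is only enabled when an `API_KEY` is provided. The sketch below shows one way to summarise this from the environment. It is not the project's actual startup code; `OptionalFeatures` and `optional_features` are hypothetical names, and treating a blank value as unset is an assumption.

```rust
/// Optional features inferred from the environment (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct OptionalFeatures {
    /// Daily INSEE synchronisation needs INSEE_CREDENTIALS.
    pub daily_updates: bool,
    /// The HTTP maintenance endpoints need API_KEY.
    pub admin_api: bool,
}

/// `lookup` abstracts over the environment so the function can be tested
/// without touching real process variables.
pub fn optional_features<F>(lookup: F) -> OptionalFeatures
where
    F: Fn(&str) -> Option<String>,
{
    // Assumption of this sketch: a blank value counts as "not set".
    let is_set = |name: &str| lookup(name).map_or(false, |v| !v.trim().is_empty());
    OptionalFeatures {
        daily_updates: is_set("INSEE_CREDENTIALS"),
        admin_api: is_set("API_KEY"),
    }
}
```

With the real environment, this could be called as `optional_features(|name| std::env::var(name).ok())`.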
### CLI
**> sirene --help**
```
Sirene service used to update data in database and serve it through a HTTP REST API
Usage: sirene
Commands:
update Update data from CSV source files
serve Serve data from database to /unites_legales/ and /etablissements/
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
```
**> sirene serve --help**
```
Serve data from database to /unites_legales/ and /etablissements/
Usage: sirene serve [OPTIONS] --env --port --host
Options:
--env Configure log level [env: SIRENE_ENV=development] [possible values: development, staging, production]
--port Listen this port [env: PORT=3000]
--host Listen this host [env: HOST=localhost]
--api-key API key needed to allow maintenance operation from HTTP [env: API_KEY=]
--base-url Base URL needed to configure asynchronous polling for updates [env: BASE_URL=http://localhost:3000]
-h, --help Print help
```
**> sirene update --help**
```
Update data from CSV source files
Usage: sirene update [OPTIONS] [COMMAND]
Commands:
update-data Download, unzip and load CSV file in database in loader-table
swap-data Swap loader-table to production
sync-insee Synchronise daily data from INSEE since the last modification
finish-error Set a staled update process to error, use only if the process is really stopped
help Print this message or the help of the given subcommand(s)
Arguments:
Configure which part will be updated [possible values: unites-legales, etablissements, all]
Options:
--force Force update even if the source data where not updated
-h, --help Print help
```
### HTTP API
#### Lookup Endpoints
```
GET /v3/unites_legales/
GET /v3/etablissements/
```
#### Search Endpoints (NEW!)
**Search Establishments**
```
GET /v3/etablissements?q=&lat=&lng=&radius=&sort=&direction=&limit=&offset=
```
**Search Legal Units**
```
GET /v3/unites_legales?q=&sort=&direction=&limit=&offset=
```
**Query Parameters**:
- `q`: Full-text search query (searches in denomination and commune name for establishments, denomination only for legal units)
- `lat`, `lng`, `radius`: Geographic search (establishments only) - keeps only results within `radius` meters of the point (`lat`, `lng`)
- `sort`: Sort field - `distance` (geo only), `relevance` (text search), `date_creation`, `date_debut`
- `direction`: Sort direction - `asc` or `desc` (defaults to sensible values per sort field)
- `limit`: Results per page (default: 20, max: 100)
- `offset`: Pagination offset (default: 0, max: 10000)
> **`total` field**: exact count for filter-only queries; capped at 10,000 for text searches (`q`) — if `total == 10000` there may be more results.
- `etat_administratif`: Filter by administrative status (A=active, F=closed)
- `code_postal`: Filter by postal code
- `siren`: Filter by SIREN (establishments only)
- `code_commune`: Filter by commune code
- `activite_principale`: Filter by main activity code
- `etablissement_siege`: Filter by headquarters status (establishments only)
- `categorie_juridique`: Filter by legal category (legal units only)
- `categorie_entreprise`: Filter by company category (legal units only)
- `date_creation`: Filter by creation date (legal units only)
- `date_debut`: Filter by start date (legal units only)
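As an illustration of how a client might assemble these parameters, here is a minimal sketch that builds a search path, percent-encodes the values, and clamps `limit` and `offset` to the documented maxima. The function names are hypothetical, and whether the server itself clamps or rejects out-of-range values is not stated here, so the clamping is a client-side assumption.

```rust
/// Maximum values documented for the search endpoints.
const MAX_LIMIT: u32 = 100;
const MAX_OFFSET: u32 = 10_000;

/// Percent-encode a query value, keeping RFC 3986 unreserved characters.
fn encode(value: &str) -> String {
    let mut out = String::new();
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build a search path such as `/v3/etablissements?q=...&limit=...&offset=...`.
/// `resource` is "etablissements" or "unites_legales"; `params` are
/// (name, value) pairs such as ("q", "boulangerie") or ("code_postal", "75001").
pub fn search_path(resource: &str, params: &[(&str, &str)], limit: u32, offset: u32) -> String {
    let mut parts: Vec<String> = params
        .iter()
        .map(|(name, value)| format!("{}={}", name, encode(value)))
        .collect();
    // Clamp pagination to the documented maxima before sending the request.
    parts.push(format!("limit={}", limit.min(MAX_LIMIT)));
    parts.push(format!("offset={}", offset.min(MAX_OFFSET)));
    format!("/v3/{}?{}", resource, parts.join("&"))
}
```

The resulting path can be appended to the server's base URL and passed to any HTTP client.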
**Maintenance**
_This API is enabled only if you have provided an API_KEY when starting the `serve` process._
```
POST /admin/update
{
api_key: string,
group_type: "UnitesLegales" | "Etablissements" | "All",
force: bool,
asynchronous: bool,
}
```
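For clients written in Rust, the request body could be assembled as in the following sketch. It assumes the body is sent as JSON with exactly the field names shown above; the types and helper functions are hypothetical, and a real client would more likely use a JSON library than hand-written escaping.

```rust
/// Dataset targeted by the update, mirroring the documented `group_type` values.
#[derive(Debug, Clone, Copy)]
pub enum GroupType {
    UnitesLegales,
    Etablissements,
    All,
}

impl GroupType {
    fn as_str(self) -> &'static str {
        match self {
            GroupType::UnitesLegales => "UnitesLegales",
            GroupType::Etablissements => "Etablissements",
            GroupType::All => "All",
        }
    }
}

/// Quote and escape a string as a JSON string literal.
fn json_string(value: &str) -> String {
    let mut out = String::from("\"");
    for c in value.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serialise the body for `POST /admin/update` (assumed to be JSON).
pub fn update_body(api_key: &str, group_type: GroupType, force: bool, asynchronous: bool) -> String {
    format!(
        "{{\"api_key\":{},\"group_type\":\"{}\",\"force\":{},\"asynchronous\":{}}}",
        json_string(api_key),
        group_type.as_str(),
        force,
        asynchronous
    )
}
```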
If `asynchronous` is set to `true`, the update endpoint will immediately return the following:
```
Status: 202 Accepted
Location: /admin/update/status?api_key=string
Retry-After: 10
[Initial status for the started update]
```
```
GET /admin/update/status?api_key=string
```
If an update is in progress, the status code will be 202, otherwise 200.
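A client polling this endpoint has to decide what to do with each response. The sketch below models that decision as a pure function of the status code and an optional `Retry-After` value. Whether the status endpoint itself sends `Retry-After` is an assumption; when the header is missing or unparsable, the sketch falls back to the 10 seconds shown in the example response above. `PollAction` and `next_action` are hypothetical names.

```rust
use std::time::Duration;

/// What a polling client should do after a status response.
#[derive(Debug, PartialEq)]
pub enum PollAction {
    /// 202: the update is still running; wait, then ask again.
    RetryAfter(Duration),
    /// 200: no update in progress; inspect the body for the final status.
    Done,
    /// Any other status code is left to the caller to handle.
    Unexpected(u16),
}

/// Fallback delay, taken from the `Retry-After: 10` example.
const DEFAULT_RETRY: Duration = Duration::from_secs(10);

pub fn next_action(status: u16, retry_after: Option<&str>) -> PollAction {
    match status {
        202 => {
            // Only the delay-in-seconds form of Retry-After is handled here.
            let wait = retry_after
                .and_then(|value| value.trim().parse::<u64>().ok())
                .map(Duration::from_secs)
                .unwrap_or(DEFAULT_RETRY);
            PollAction::RetryAfter(wait)
        }
        200 => PollAction::Done,
        other => PollAction::Unexpected(other),
    }
}
```

Keeping the decision separate from the HTTP call makes the loop easy to test and lets the caller choose any HTTP client.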
```
POST /admin/update/status/error
{
api_key: string,
}
```
### Basic usage
Serve:
```
cargo run serve
```
Update:
```
cargo run update all
```
Help:
```
cargo run help
```
## Features
### Core Features
- REST API for INSEE SIREN/SIRET data
- Automatic updates from INSEE API
- PostgreSQL backend with efficient indexing
- Docker support for easy deployment
### New Search Features (v5.0+)
- **Full-text search**: Trigram similarity (`pg_trgm`) with case-insensitive and accent-insensitive matching, supporting partial and fuzzy matches
- **Geographic search**: Radius filtering and distance-based sorting using PostGIS
- **Field filtering**: Filter by administrative status, activity codes, dates, etc.
- **Flexible sorting**: By relevance, distance, or dates
- **Pagination**: Efficient offset/limit pagination — exact total for filter-only queries, capped at 10,000 for text searches (`q`) to avoid full-table counts
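To make the pagination limits concrete, the following sketch computes the offsets a client would request to walk through a result set, and flags when a text-search `total` may be truncated. The constants come from the documented limits (100 results per page, offset up to 10,000, text-search totals capped at 10,000); the function names are hypothetical.

```rust
const MAX_LIMIT: u32 = 100;
const MAX_OFFSET: u32 = 10_000;
const TEXT_SEARCH_TOTAL_CAP: u64 = 10_000;

/// Offsets to request, in order, to walk through `total` results.
/// Stops once the documented maximum offset has been reached.
pub fn page_offsets(total: u64, limit: u32) -> Vec<u32> {
    let limit = limit.clamp(1, MAX_LIMIT);
    let mut offsets = Vec::new();
    let mut offset: u32 = 0;
    while u64::from(offset) < total && offset <= MAX_OFFSET {
        offsets.push(offset);
        offset += limit;
    }
    offsets
}

/// For text searches, a `total` at the cap means more results may exist.
pub fn total_may_be_truncated(total: u64, is_text_search: bool) -> bool {
    is_text_search && total >= TEXT_SEARCH_TOTAL_CAP
}
```

Because the offset is bounded, a very large result set cannot be read to the end with offset/limit alone; narrowing the query with filters is the practical alternative.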
### Technical Features
- **PostgreSQL extensions**: PostGIS for spatial data, pg_trgm + unaccent for full-text search
- **Optimized queries**: Raw SQL with parameterized queries for performance
- **OpenAPI documentation**: Complete API documentation via Scalar
- **Async support**: Optional asynchronous updates for large datasets
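For readers unfamiliar with trigram matching, here is a small sketch of the similarity measure that `pg_trgm` documents: each word is lowercased, padded with two leading spaces and one trailing space, split into three-character windows, and two strings are compared by the ratio of shared trigrams to all distinct trigrams. This is an illustration only, not the project's query logic (which runs inside PostgreSQL), and it omits the accent removal that `unaccent` provides.

```rust
use std::collections::HashSet;

/// Trigrams of a string, following the padding rule documented for pg_trgm.
pub fn trigrams(text: &str) -> HashSet<String> {
    let mut set = HashSet::new();
    let lowered = text.to_lowercase();
    // Words are runs of alphanumeric characters.
    for word in lowered
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        let padded: Vec<char> = format!("  {} ", word).chars().collect();
        for window in padded.windows(3) {
            set.insert(window.iter().collect::<String>());
        }
    }
    set
}

/// Shared trigrams divided by all distinct trigrams (0.0 when both are empty).
pub fn similarity(a: &str, b: &str) -> f64 {
    let (ta, tb) = (trigrams(a), trigrams(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}
```

For example, `similarity("word", "two words")` shares 4 of 11 distinct trigrams, which is why partial matches still score above zero.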
## Tests
```bash
cargo test
```
## Deployment
A Docker image is built for this project, and the sample `docker-compose.yml` with its `docker` folder can be used to try it out.
### Docker Setup
```bash
docker-compose up -d
```
### Environment Variables
Required for production:
```
RUST_LOG=sirene=warn
SIRENE_ENV=production
BASE_URL=https://your-domain.com
API_KEY=your-secret-key
DATABASE_URL=postgresql://user:password@db:5432/sirene
DATABASE_POOL_SIZE=100
INSEE_CREDENTIALS=your-insee-api-key
```
## Migrating from pg_search
If you have an existing installation that uses the `pg_search` extension (ParadeDB), run the manual migration script provided at the root of the project:
```bash
psql -U sirene -d sirene -f migrate_from_pg_search.sql
```
This script (transactional):
1. Drops the old ParadeDB BM25 indexes
2. Enables `pg_trgm` and `unaccent`
3. Creates the `immutable_unaccent()` function (an IMMUTABLE wrapper required for indexes)
4. Rebuilds `search_denomination` with `lower(immutable_unaccent(...))` normalization
5. Creates the new GIN indexes
6. Recreates the staging tables so that they inherit the new indexes and columns
7. Disables the `pg_search` extension
New installations do not need this script: the Diesel migrations use `pg_trgm` directly.
## Development
### Running locally
```bash
# Start the server
cargo run -- serve --env development --port 8080 --host 0.0.0.0
# Run tests
cargo test
# Run with auto-reload
cargo watch -x 'run -- serve --env development --port 8080'
```
### Database Migrations
```bash
# Run migrations
diesel migration run
# Create new migration
diesel migration generate migration_name
```
## API Documentation
The API includes comprehensive OpenAPI documentation accessible at:
- `/scalar` - Interactive Scalar API documentation
- `/openapi.json` - OpenAPI specification
## Examples
### Search Establishments
```bash
# Text search
curl "http://localhost:8080/v3/etablissements?q=boulangerie&limit=5"
# Geographic search (within 1km of Eiffel Tower)
curl "http://localhost:8080/v3/etablissements?lat=48.8584&lng=2.2945&radius=1000&sort=distance"
# Combined search with filters
curl "http://localhost:8080/v3/etablissements?q=restaurant&code_postal=75001&etat_administratif=A&sort=relevance&limit=10"
```
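When checking geographic results on the client side, a great-circle distance is often enough to confirm that a returned establishment really lies within the requested radius. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km. This is an approximation and an assumption of the sketch: the server computes distances with PostGIS, whose result may differ slightly. The function names are hypothetical.

```rust
/// Mean Earth radius in meters (spherical approximation).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in meters between two (lat, lng) points in degrees.
pub fn haversine_m(lat1: f64, lng1: f64, lat2: f64, lng2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lng2 - lng1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * a.sqrt().asin()
}

/// True when `point` lies within `radius_m` meters of `center`.
pub fn within_radius(center: (f64, f64), point: (f64, f64), radius_m: f64) -> bool {
    haversine_m(center.0, center.1, point.0, point.1) <= radius_m
}
```

With the Eiffel Tower coordinates from the example above as `center`, `within_radius` can be applied to each result's coordinates to double-check a `radius=1000` query.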
### Search Legal Units
```bash
# Text search with sorting
curl "http://localhost:8080/v3/unites_legales?q=creati&sort=date_creation&direction=desc&limit=5"
# Filter by activity code
curl "http://localhost:8080/v3/unites_legales?activite_principale=62.01Z&categorie_juridique=5710"
```
## Authors
- **Julien Blatecky** - [@Julien1619](https://twitter.com/Julien1619)
## License
[MIT](LICENSE)