{"id":43171945,"url":"https://github.com/opencitations/oc_sparql","last_synced_at":"2026-02-01T02:35:40.386Z","repository":{"id":276308892,"uuid":"928824487","full_name":"opencitations/oc_sparql","owner":"opencitations","description":"This repository contains the SPARQL service for OpenCitations","archived":false,"fork":false,"pushed_at":"2025-12-11T14:52:18.000Z","size":6651,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-12-12T18:43:54.549Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opencitations.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-02-07T09:59:14.000Z","updated_at":"2025-12-11T14:52:22.000Z","dependencies_parsed_at":"2025-02-07T13:35:39.976Z","dependency_job_id":"54a9c2ad-a6ea-4506-ab70-d4155e9b90d1","html_url":"https://github.com/opencitations/oc_sparql","commit_stats":null,"previous_names":["opencitations/oc_sparql"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/opencitations/oc_sparql","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_sparql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_sparql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_sparql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_sparql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opencitations","download_url":"https://codeload.github.com/opencitations/oc_sparql/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_sparql/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28965430,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T02:14:24.993Z","status":"ssl_error","status_checked_at":"2026-02-01T02:13:55.706Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-01T02:35:39.751Z","updated_at":"2026-02-01T02:35:40.381Z","avatar_url":"https://github.com/opencitations.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenCitations SPARQL Service\n\nThis repository contains the SPARQL service for OpenCitations, allowing users to query the OpenCitations datasets using SPARQL.\n\n## Overview\n\nThe service provides two main SPARQL endpoints:\n\n- **Index endpoint** (`/index`): For querying the OpenCitations Index database\n- **Meta endpoint** (`/meta`): For querying the OpenCitations Meta database\n\n## Features\n\n- SPARQL query interface powered by YASQE/YASR\n- Support for both GET and POST SPARQL queries\n- SPARQL Update queries are not permitted\n- Request logging\n- Docker deployment ready\n\n## Configuration\n\n### Environment Variables\n\nThe service reads the following environment variables. When set, they take precedence over the corresponding values defined in `conf.json`:\n\n- `BASE_URL`: Base URL for the SPARQL endpoint\n- `LOG_DIR`: Directory path where log files will be stored\n- `SPARQL_ENDPOINT_INDEX`: URL for the index SPARQL endpoint\n- `SPARQL_ENDPOINT_META`: URL for the meta SPARQL endpoint\n- `SYNC_ENABLED`: Enable/disable static file synchronization (default: `false`)\n\nFor instance:\n\n```env\nBASE_URL=sparql.opencitations.net\nLOG_DIR=/home/dir/log/\nSPARQL_ENDPOINT_INDEX=http://qlever-service.default.svc.cluster.local:7011\nSPARQL_ENDPOINT_META=http://virtuoso-service.default.svc.cluster.local:8890/sparql\nSYNC_ENABLED=true\n```\n\n\u003e **Note**: When running with Docker, environment variables always override the corresponding values in `conf.json`. If an environment variable is not set, the application falls back to the value defined in `conf.json`.\n\n### Static Files Synchronization\n\nThe application can synchronize static files from a GitHub repository. This configuration is managed in `conf.json`:\n\n```json\n{\n  \"oc_services_templates\": \"https://github.com/opencitations/oc_services_templates\",\n  \"sync\": {\n    \"folders\": [\n      \"static\",\n      \"html-template/common\"\n    ],\n    \"files\": [\n      \"test.txt\"\n    ]\n  }\n}\n```\n\n- `oc_services_templates`: The GitHub repository URL to sync files from\n- `sync.folders`: List of folders to synchronize\n- `sync.files`: List of individual files to synchronize\n\nWhen static sync is enabled (via `--sync-static` or `SYNC_ENABLED=true`), the application will:\n\n1. Clone the specified repository\n2. Copy the specified folders and files\n3. Keep the local static files up to date\n\n\u003e **Note**: Make sure the specified folders and files exist in the source repository.\n\n## Running Options\n\n### Local Development\n\nFor local development and testing, the application uses the built-in web.py HTTP server.\n\nThe application supports the following command-line arguments:\n\n- `--sync-static`: Synchronize static files at startup and enable periodic sync (every 30 minutes)\n- `--port PORT`: Specify the port to run the application on (default: 8080)\n\nExamples:\n\n```bash\n# Run with default settings\npython3 sparql_oc.py\n\n# Run with static sync enabled\npython3 sparql_oc.py --sync-static\n\n# Run on a custom port\npython3 sparql_oc.py --port 8085\n\n# Run with both options\npython3 sparql_oc.py --sync-static --port 8085\n```\n\n### Production Deployment (Docker)\n\nWhen running in Docker/Kubernetes, the application uses **Gunicorn** as the WSGI HTTP server for better performance and concurrency handling:\n\n- **Server**: Gunicorn with gevent workers\n- **Workers**: 2 concurrent worker processes\n- **Worker Type**: gevent (asynchronous), so each worker can serve many simultaneous requests\n- **Timeout**: 1200 seconds (20 minutes), to accommodate long-running SPARQL queries\n- **Connections per worker**: 800 simultaneous connections (up to 1,600 across the 2 workers)\n\nThe Docker container automatically uses Gunicorn and enables static sync by default through the `SYNC_ENABLED=true` environment variable (the `--sync-static` flag only applies when running `python3 sparql_oc.py`).\n\n\u003e **Note**: The application code automatically detects the execution environment. When run with `python3 sparql_oc.py`, it uses the built-in web.py server. When run with Gunicorn (as in Docker), it is served through its WSGI entry point, `sparql_oc:application`.\n\nYou can customize the Gunicorn server configuration by modifying the `gunicorn.conf.py` file.\n\n### Dockerfile\n\nThe Dockerfile sets the default values of the environment variables. You can change them there, or override them when starting the container (for example, with `docker run -e`):\n\n```dockerfile\n# Base image: Python slim for a lightweight container\nFROM python:3.11-slim\n\n# Define environment variables with default values\n# These can be overridden during container runtime\nENV BASE_URL=\"sparql.opencitations.net\" \\\n    LOG_DIR=\"/mnt/log_dir/oc_sparql\" \\\n    SPARQL_ENDPOINT_INDEX=\"http://qlever-service.default.svc.cluster.local:7011\" \\\n    SPARQL_ENDPOINT_META=\"http://virtuoso-service.default.svc.cluster.local:8890/sparql\" \\\n    SYNC_ENABLED=\"true\"\n\n# Ensure Python output is unbuffered\nENV PYTHONUNBUFFERED=1\n\n# Install system dependencies: git (to clone the repository) and the tools needed to compile Python packages\nRUN apt-get update \u0026\u0026 \\\n    apt-get install -y \\\n    git \\\n    python3-dev \\\n    build-essential\n\n# Set the working directory for our application\nWORKDIR /website\n\n# Clone the main branch of the repository\n# The dot at the end means clone into the current directory\nRUN git clone --single-branch --branch main https://github.com/opencitations/oc_sparql .\n\n# Install Python dependencies from requirements.txt\nRUN pip install -r requirements.txt\n\n# Expose the port that our service will listen on\nEXPOSE 8080\n\n# Start the application with gunicorn for production\nCMD [\"gunicorn\", \"-c\", \"gunicorn.conf.py\", \"sparql_oc:application\"]\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencitations%2Foc_sparql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopencitations%2Foc_sparql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencitations%2Foc_sparql/lists"}