{"id":43172065,"url":"https://github.com/opencitations/oc_meta","last_synced_at":"2026-02-01T02:35:45.973Z","repository":{"id":37831194,"uuid":"179958337","full_name":"opencitations/oc_meta","owner":"opencitations","description":"OpenCitations Meta Software is the software that manages OpenCitations Meta. OpenCitations Meta is the bibliographic database containing bibliographic metadata related to the documents involved in the citations stored in the OpenCitations indexes","archived":false,"fork":false,"pushed_at":"2026-01-20T09:39:09.000Z","size":177022,"stargazers_count":9,"open_issues_count":26,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2026-01-20T19:17:16.113Z","etag":null,"topics":["bibliographic-data","change-tracking","open-science","provenance","semantic-web"],"latest_commit_sha":null,"homepage":"https://api.opencitations.net/meta/v1","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/opencitations.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-04-07T11:46:32.000Z","updated_at":"2026-01-20T09:39:13.000Z","dependencies_parsed_at":"2025-12-06T10:09:46.224Z","dependency_job_id":null,"html_url":"https://github.com/opencitations/oc_meta","commit_stats":null,"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/opencitations/oc_meta","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/opencitations%2Foc_meta","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_meta/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_meta/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_meta/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/opencitations","download_url":"https://codeload.github.com/opencitations/oc_meta/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/opencitations%2Foc_meta/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28965430,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T02:14:24.993Z","status":"ssl_error","status_checked_at":"2026-02-01T02:13:55.706Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bibliographic-data","change-tracking","open-science","provenance","semantic-web"],"created_at":"2026-02-01T02:35:45.895Z","updated_at":"2026-02-01T02:35:45.961Z","avatar_url":"https://github.com/opencitations.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[\u003cimg src=\"https://img.shields.io/badge/powered%20by-OpenCitations-%239931FC?labelColor=2D22DE\" /\u003e](http://opencitations.net)\n[![Run 
tests](https://github.com/opencitations/oc_meta/actions/workflows/run_tests.yml/badge.svg)](https://github.com/opencitations/oc_meta/actions/workflows/run_tests.yml)\n[![Coverage](https://byob.yarr.is/arcangelo7/badges/opencitations-oc_meta_coverage)](https://opencitations.github.io/oc_meta/)\n![PyPI](https://img.shields.io/pypi/pyversions/oc_meta?logo=python\u0026logoColor=white\u0026label=python\u0026color=blue)\n![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/opencitations/oc_meta)\n\n# OpenCitations Meta Software\n\nOpenCitations Meta contains bibliographic metadata associated with the documents involved in the citations stored in the [OpenCitations](https://opencitations.net/) infrastructure. The OpenCitations Meta Software performs several key functions:\n\n1. Data curation of provided CSV files\n2. Generation of RDF files compliant with the [OpenCitations Data Model](http://opencitations.net/model)\n3. Provenance tracking and management\n4. Data validation and fixing utilities\n\nAn example of a raw CSV input file can be found in [`example.csv`](https://github.com/opencitations/meta/blob/master/oc_meta/example.csv).\n\n## Table of contents\n\n- [OpenCitations Meta Software](#opencitations-meta-software)\n- [Meta production workflow](#meta-production-workflow)\n  - [Preprocessing input data (optional)](#preprocessing-input-data-optional)\n  - [Main processing](#main-processing)\n  - [Verifying processing results](#verifying-processing-results)\n  - [Manual upload to triplestore](#manual-upload-to-triplestore)\n- [Virtuoso bulk loading (performance optimization)](#virtuoso-bulk-loading-performance-optimization)\n- [Analysing the dataset](#analysing-the-dataset)\n  - [General statistics (SPARQL)](#general-statistics-sparql)\n  - [Venue statistics (CSV)](#venue-statistics-csv)\n- [Finding and merging duplicates](#finding-and-merging-duplicates)\n  - [Finding duplicate identifiers from 
files](#finding-duplicate-identifiers-from-files)\n  - [Grouping entities for efficient merging](#grouping-entities-for-efficient-merging)\n  - [Merging duplicate entities](#merging-duplicate-entities)\n- [Running tests](#running-tests)\n- [Creating releases](#creating-releases)\n- [How to cite](#how-to-cite)\n\n## Meta production workflow\n\nThe Meta production process involves several steps to process bibliographic metadata. An optional but recommended preprocessing step is available to optimize the input data before the main processing.\n\n### Preprocessing input data (optional)\n\nThe [`preprocess_input.py`](https://github.com/opencitations/oc_meta/blob/master/oc_meta/run/meta/preprocess_input.py) script helps filter and optimize CSV files before they are processed by the main Meta workflow. This preprocessing step is particularly useful for large datasets as it:\n\n1. Removes duplicate entries across all input files\n2. Optionally filters out entries that already exist in the database (using either Redis or SPARQL)\n3. Splits large input files into smaller, more manageable chunks\n\nTo run the preprocessing script:\n\n```console\n# Basic usage: only deduplicate and split files (no storage checking)\npoetry run python -m oc_meta.run.meta.preprocess_input \u003cINPUT_DIR\u003e \u003cOUTPUT_DIR\u003e\n\n# With Redis storage checking\npoetry run python -m oc_meta.run.meta.preprocess_input \u003cINPUT_DIR\u003e \u003cOUTPUT_DIR\u003e --storage-type redis\n\n# With SPARQL storage checking\npoetry run python -m oc_meta.run.meta.preprocess_input \u003cINPUT_DIR\u003e \u003cOUTPUT_DIR\u003e --storage-type sparql --sparql-endpoint \u003cSPARQL_ENDPOINT_URL\u003e\n\n# Custom file size and Redis settings\npoetry run python -m oc_meta.run.meta.preprocess_input \u003cINPUT_DIR\u003e \u003cOUTPUT_DIR\u003e \\\n  --rows-per-file 5000 \\\n  --storage-type redis \\\n  --redis-host 192.168.1.100 \\\n  --redis-port 6380 \\\n  --redis-db 5\n```\n\nParameters:\n- `\u003cINPUT_DIR\u003e`: Directory containing the input CSV files to process\n- `\u003cOUTPUT_DIR\u003e`: Directory where the filtered and 
optimized CSV files will be saved\n- `--rows-per-file`: Number of rows per output file (default: 3000)\n- `--storage-type`: Type of storage to check IDs against (`redis` or `sparql`). If not specified, ID checking is skipped\n- `--redis-host`: Redis host (default: localhost)\n- `--redis-port`: Redis port (default: 6379)\n- `--redis-db`: Redis database number to use if storage type is Redis (default: 10)\n- `--sparql-endpoint`: SPARQL endpoint URL (required if storage type is `sparql`)\n\nThe script will generate a detailed report showing:\n- Total number of input rows processed\n- Number of duplicate rows removed\n- Number of rows with IDs that already exist in the database (if storage checking is enabled)\n- Number of rows that passed the filtering and were written to output files\n\n### Main processing\n\nThe main Meta processing is executed through the [`meta_process.py`](https://github.com/opencitations/oc_meta/blob/master/oc_meta/run/meta_process.py) file, which orchestrates the entire data processing workflow:\n\n```console\npoetry run python -m oc_meta.run.meta_process -c \u003cCONFIG_PATH\u003e\n```\n\nParameters:\n- `-c --config`: Path to the configuration YAML file.\n\n#### What Meta process does\n\nThe Meta process performs the following key operations:\n\n1. **Preparation**:\n   - sets up the required directory structure\n   - initializes connections to Redis and the triplestore\n   - loads configuration settings\n\n2. **Data curation**:\n   - processes input CSV files containing bibliographic metadata\n   - validates and normalizes the data\n   - handles duplicate entries and invalid data\n\n3. **RDF creation**:\n   - converts the curated data into RDF format following the OpenCitations Data Model\n   - generates entity identifiers and establishes relationships\n   - creates provenance information for tracking data lineage\n\n4. 
**Storage and triplestore upload**:\n   - directly generates SPARQL queries for triplestore updates\n   - loads RDF data directly into the configured triplestore via SPARQL endpoint\n   - executes necessary SPARQL updates\n   - ensures data is properly indexed for querying\n\n#### Meta configuration\n\nThe Meta process requires a YAML configuration file that specifies various settings for the processing workflow. Here's an example of the configuration structure with explanations:\n\n```yaml\n# Endpoint URLs for data and provenance storage\ntriplestore_url: \"http://127.0.0.1:8805/sparql\"\nprovenance_triplestore_url: \"http://127.0.0.1:8806/sparql\"\n\n# Base IRI for RDF entities\nbase_iri: \"https://w3id.org/oc/meta/\"\n\n# JSON-LD context file\ncontext_path: \"https://w3id.org/oc/corpus/context.json\"\n\n# Responsible agent for provenance\nresp_agent: \"https://w3id.org/oc/meta/prov/pa/1\"\n\n# Source information for provenance\nsource: \"https://api.crossref.org/\"\n\n# Redis configuration for counter handling\nredis_host: \"localhost\"\nredis_port: 6379\nredis_db: 0\nredis_cache_db: 1\n\n# Processing settings\nsupplier_prefix: \"060\"\ndir_split_number: 10000\nitems_per_file: 1000\ndefault_dir: \"_\"\n\n# Output control\ngenerate_rdf_files: false\nzip_output_rdf: true\noutput_rdf_dir: \"/path/to/output\"\n\n# Data processing options\nsilencer: [\"author\", \"editor\", \"publisher\"]\nnormalize_titles: true\nuse_doi_api_service: false\n```\n\n### Verifying processing results\n\nAfter processing your data with the Meta workflow, you can verify that all identifiers were correctly processed and have associated data in the triplestore using the [`check_results.py`](https://github.com/opencitations/oc_meta/blob/master/oc_meta/run/meta/check_results.py) script. 
This verification step helps identify potential issues such as missing OMIDs, missing provenance, or identifiers with multiple OMIDs.\n\n#### Running the verification script\n\n```console\npoetry run python -m oc_meta.run.meta.check_results \u003cCONFIG_PATH\u003e [--output \u003cOUTPUT_FILE\u003e]\n```\n\nParameters:\n- `\u003cCONFIG_PATH\u003e`: Path to the same meta_config.yaml file used for processing\n- `--output`: Optional path to save the report to a file. If not specified, results are printed to console\n\n#### What the script checks\n\nThe verification script performs the following checks:\n\n1. **Identifier analysis**:\n   - parses all identifiers from input CSV files (id, author, editor, publisher, venue columns)\n   - queries the triplestore to find associated OMIDs for each identifier\n\n2. **OMID verification**:\n   - checks if identifiers have corresponding OMIDs in the triplestore\n   - identifies identifiers without any OMID (potential processing failures)\n   - detects identifiers with multiple OMIDs (potential disambiguation issues)\n\n3. **Data graph verification** (when RDF file generation is enabled):\n   - verifies that data graphs exist in the generated RDF files\n   - reports missing data graphs for entities that should have been created\n\n4. **Provenance verification**:\n   - checks if provenance graphs exist in the generated RDF files\n   - queries the provenance triplestore to verify provenance data\n   - identifies OMIDs without associated provenance information\n\n### Manual upload to triplestore\n\nOccasionally, the automatic upload process during Meta execution might fail due to connection issues, timeout errors, or other problems. 
In such cases, you can use the [`on_triplestore.py`](https://github.com/opencitations/oc_meta/blob/master/oc_meta/run/upload/on_triplestore.py) script to manually upload the generated SPARQL files to the triplestore.\n\n#### Running the manual upload script\n\n```console\npoetry run python -m oc_meta.run.upload.on_triplestore \u003cENDPOINT_URL\u003e \u003cSPARQL_FOLDER\u003e [OPTIONS]\n```\n\nParameters:\n- `\u003cENDPOINT_URL\u003e`: The SPARQL endpoint URL of the triplestore\n- `\u003cSPARQL_FOLDER\u003e`: Path to the folder containing SPARQL update query files (.sparql)\n\nOptions:\n- `--batch_size`: Number of quadruples to include in each batch (default: 10)\n- `--cache_file`: Path to the cache file tracking processed files (default: \"ts_upload_cache.json\")\n- `--failed_file`: Path to the file recording failed queries (default: \"failed_queries.txt\")\n- `--stop_file`: Path to the stop file used to gracefully interrupt the process (default: \".stop_upload\")\n\n## Virtuoso bulk loading (performance optimization)\n\nFor large-scale data ingestion into Virtuoso triplestores, the Meta process supports an optional bulk loading mode that significantly improves performance compared to standard SPARQL INSERT queries. This mode leverages Virtuoso's native `ld_dir`/`rdf_loader_run` mechanism for fast data loading.\n\n### Prerequisites\n\nBefore enabling bulk loading, ensure:\n\n1. **Docker setup**: Both data and provenance Virtuoso instances must run in Docker containers\n2. **Volume mapping**: Host directories for data and provenance must be mounted as volumes into their respective containers\n3. 
**DirsAllowed configuration**: The bulk load directory must be listed in `DirsAllowed` parameter in `virtuoso.ini`\n\nExample Docker volume mapping:\n```bash\n# Data container\ndocker run -d \\\n  --name virtuoso-data \\\n  -v /srv/meta/data_bulk:/database/bulk_load \\\n  -p 8890:8890 \\\n  -p 1111:1111 \\\n  openlink/virtuoso-opensource-7:latest\n\n# Provenance container\ndocker run -d \\\n  --name virtuoso-prov \\\n  -v /srv/meta/prov_bulk:/database/bulk_load \\\n  -p 8891:8890 \\\n  -p 1112:1111 \\\n  openlink/virtuoso-opensource-7:latest\n```\n\nExample `virtuoso.ini` configuration:\n```ini\n[Parameters]\nDirsAllowed = ., /database, /database/bulk_load\n```\n\n### Configuration\n\nEdit your `meta_config.yaml` to enable bulk loading:\n\n```yaml\nvirtuoso_bulk_load:\n  # Set to true to enable bulk loading mode\n  enabled: true\n\n  # Docker container name for the data triplestore\n  data_container: \"virtuoso-data\"\n\n  # Docker container name for the provenance triplestore\n  prov_container: \"virtuoso-prov\"\n\n  # Host directory mounted as volume in the data container\n  # Files will be generated directly here (visible to both host and container)\n  # This directory must be mounted in the data container as bulk_load_dir\n  data_mount_dir: \"/srv/meta/data_bulk\"\n\n  # Host directory mounted as volume in the provenance container\n  # Files will be generated directly here (visible to both host and container)\n  # This directory must be mounted in the prov container as bulk_load_dir\n  prov_mount_dir: \"/srv/meta/prov_bulk\"\n\n  # Path INSIDE the container where bulk load files are accessed\n  # This directory must be:\n  # 1. Mapped as a volume from the host to the container\n  # 2. 
Listed in the DirsAllowed parameter in virtuoso.ini\n  bulk_load_dir: \"/database/bulk_load\"\n```\n\n### Behavior\n\n- **Success**: all files are loaded and remain in the mounted directories\n- **Failure**: if any file fails to load, the process crashes immediately with a detailed error message; the files remain in the mounted directories for manual inspection or retry\n\n## Analysing the dataset\n\nTo gather statistics on the dataset, you can use the provided analysis tools.\n\n### General statistics (SPARQL)\n\nFor most statistics, such as counting bibliographic resources (`--br`) or agent roles (`--ar`), the `sparql_analyser.py` script is the recommended tool. It queries the SPARQL endpoint directly.\n\n```console\npoetry run python -m oc_meta.run.analyser.sparql_analyser \u003cSPARQL_ENDPOINT_URL\u003e --br --ar\n```\n\n### Venue statistics (CSV)\n\n**Warning:** using the SPARQL analyser for venue statistics (`--venues`) against an OpenLink Virtuoso endpoint is **not recommended**. The complex query required for venue disambiguation can exhaust Virtuoso's RAM, causing it to return partial (and thus incorrect) results. 
As this query is not yet optimized for Virtuoso, this count will be wrong.\n\nFor reliable venue statistics, use the `meta_analyser.py` script to process the raw CSV output files directly.\n\nTo count the disambiguated venues, run the following command:\n\n```console\npoetry run python -m oc_meta.run.analyser.meta_analyser -c \u003cPATH_TO_CSV_DUMP\u003e -w venues\n```\nThe script will save the result in a file named `venues_count.txt`.\n\n## Finding and merging duplicates\n\nThe OpenCitations Meta Software provides plugins to identify and merge duplicate entities in the dataset.\n\n### Finding duplicate identifiers from files\n\nThe [`duplicated_ids_from_files.py`](https://github.com/opencitations/oc_meta/blob/master/oc_meta/run/find/duplicated_ids_from_files.py) script scans RDF files stored in ZIP archives to find duplicate identifiers.\n\n#### Running the script\n\n```console\npoetry run python -m oc_meta.run.find.duplicated_ids_from_files \u003cFOLDER_PATH\u003e \u003cCSV_PATH\u003e [OPTIONS]\n```\n\nParameters:\n- `\u003cFOLDER_PATH\u003e`: Path to the folder containing the `id` subfolder with ZIP files\n- `\u003cCSV_PATH\u003e`: Path to the output CSV file where duplicates will be saved\n\nOptions:\n- `--chunk-size`: Number of ZIP files to process per chunk (default: 5000). Decrease this value if you encounter memory issues\n- `--temp-dir`: Directory for temporary files (default: system temp directory). The script automatically cleans up temporary files after completion\n\n### Grouping entities for efficient merging\n\nBefore merging duplicates, it's recommended to group related entities using the [`group_entities.py`](https://github.com/opencitations/oc_meta/blob/master/oc_meta/run/merge/group_entities.py) script. 
This preprocessing step analyzes the CSV files containing merge instructions and groups interconnected entities together, enabling efficient multiprocessing during the merge phase.\n\n#### Why group entities?\n\nThe grouping script solves two important problems:\n\n1. **RDF relationship consistency**: entities to be merged may have relationships with other entities in the dataset. When processing merges in parallel, interconnected entities must be handled in the same process to maintain consistency.\n\n2. **File-level conflicts**: entities sharing the same RDF file (e.g., `br/060/10000/1000.zip`) should be grouped together to minimize file lock contention during parallel processing.\n\nThe script performs the following steps:\n\n1. **Identifies relationships**: queries the SPARQL endpoint to find all entities related to those being merged\n2. **Groups by RDF connections**: uses a Union-Find algorithm to group entities that share relationships\n3. **Groups by file range**: additionally groups entities that share the same RDF file path (considering supplier prefix and number ranges)\n4. **Optimizes for parallelization**: combines small independent groups while keeping large interconnected groups separate\n5. **Creates balanced workloads**: targets a minimum group size to ensure efficient parallel processing\n\nAlthough the `oc_ocdm` Storer uses FileLock for safety, this grouping reduces lock contention by ensuring that workers process non-overlapping file ranges.\n\n#### Running the grouping script\n\n```console\npoetry run python -m oc_meta.run.merge.group_entities \u003cCSV_FILE\u003e \u003cOUTPUT_DIR\u003e \u003cMETA_CONFIG\u003e [--min_group_size SIZE]\n```\n\nParameters:\n- `\u003cCSV_FILE\u003e`: Path to the CSV file containing merge instructions\n- `\u003cOUTPUT_DIR\u003e`: Directory where grouped CSV files will be saved\n- `\u003cMETA_CONFIG\u003e`: Path to the Meta configuration YAML file (reads `triplestore_url`, `dir_split_number`, `items_per_file`, `zip_output_rdf`)\n- `--min_group_size`: Minimum target size for groups (default: 50)\n\n### Merging duplicate entities\n\nOnce you have identified duplicates (and optionally grouped them), you can merge them using the [`entities.py`](https://github.com/opencitations/oc_meta/blob/master/oc_meta/run/merge/entities.py) script. 
This script processes the CSV files generated by the duplicate-finding scripts and performs the actual merge operations.\n\n#### Running the merge script\n\n```console\npoetry run python -m oc_meta.run.merge.entities \u003cCSV_FOLDER\u003e \u003cMETA_CONFIG\u003e \u003cRESP_AGENT\u003e [OPTIONS]\n```\n\nParameters:\n- `\u003cCSV_FOLDER\u003e`: Path to the folder containing CSV files with merge instructions (use the output from `group_entities.py` for optimal parallel processing)\n- `\u003cMETA_CONFIG\u003e`: Path to the Meta configuration YAML file\n- `\u003cRESP_AGENT\u003e`: Responsible agent URI for provenance\n\nOptions:\n- `--entity_types`: Types of entities to merge (default: `ra`, `br`, `id`)\n- `--stop_file`: Path to the stop file for graceful interruption (default: `stop.out`)\n- `--workers`: Number of parallel workers (default: 4)\n\n## Running tests\n\nThe test suite is automatically executed via GitHub Actions upon pushes and pull requests. The workflow is defined in [`.github/workflows/run_tests.yml`](https://github.com/opencitations/oc_meta/blob/master/.github/workflows/run_tests.yml) and handles the setup of necessary services (Redis, Virtuoso) using Docker.\n\nTo run the test suite locally, follow these steps:\n\n1. **Install dependencies:** \n   Ensure you have [Poetry](https://python-poetry.org/) and [Docker](https://www.docker.com/) installed. Then, install project dependencies:\n   ```console\n   poetry install\n   ```\n\n2. 
**Start services:**\n   Use the provided script to start the required Redis and Virtuoso Docker containers:\n   ```console\n   chmod +x test/start-test-databases.sh\n   ./test/start-test-databases.sh\n   ```\n   Wait for the script to confirm that the services are ready.\n   (The Virtuoso SPARQL endpoint will be available at http://localhost:8805/sparql and ISQL on port 1105.\n   Redis will be available at localhost:6379, using database 0 for some tests and database 5 for most test cases including counter handling and caching).\n\n3. **Execute tests:**\n   Run the tests using the following command, which also generates a coverage report:\n   ```console\n   poetry run coverage run --rcfile=test/coverage/.coveragerc\n   ```\n   To view the coverage report in the console:\n   ```console\n   poetry run coverage report\n   ```\n   To generate an HTML coverage report (saved in the `htmlcov/` directory):\n   ```console\n   poetry run coverage html -d htmlcov\n   ```\n\n4. **Stop services:**\n   Once finished, stop the Docker containers:\n   ```console\n   chmod +x test/stop-test-databases.sh\n   ./test/stop-test-databases.sh\n   ```\n\n## Creating releases\n\nThe project uses semantic-release for versioning and publishing releases to PyPI. To create a new release:\n\n1. **Commit changes:**\n   Make your changes and commit them with a message that includes `[release]` to trigger the release workflow.\n   For details on how to structure semantic commit messages, see the [Semantic Commits Guide](SEMANTIC_COMMITS.md).\n\n2. **Push to master:**\n   Push your changes to the master branch. This will trigger the test workflow first.\n\n3. 
**Automatic release process:**\n   If tests pass, the release workflow will:\n   - create a new semantic version based on commit messages\n   - generate a changelog\n   - create a GitHub release\n   - build and publish the package to PyPI\n\nThe release workflow is configured in [`.github/workflows/release.yml`](https://github.com/opencitations/oc_meta/blob/master/.github/workflows/release.yml) and is triggered automatically when:\n- The commit message contains `[release]`\n- The tests workflow completes successfully\n- The changes are on the master branch\n\n## How to cite\n\nIf you have used OpenCitations Meta in your research, please cite the following paper:\n\nArcangelo Massari, Fabio Mariani, Ivan Heibi, Silvio Peroni, David Shotton; OpenCitations Meta. *Quantitative Science Studies* 2024; 5 (1): 50–75. doi: [https://doi.org/10.1162/qss_a_00292](https://doi.org/10.1162/qss_a_00292)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencitations%2Foc_meta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopencitations%2Foc_meta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopencitations%2Foc_meta/lists"}