{"id":19784049,"url":"https://github.com/openeventdata/phoenix_pipeline","last_synced_at":"2025-07-13T00:34:59.304Z","repository":{"id":14012788,"uuid":"16714337","full_name":"openeventdata/phoenix_pipeline","owner":"openeventdata","description":"Turning news into events since 2014.","archived":false,"fork":false,"pushed_at":"2017-05-01T14:55:00.000Z","size":1383,"stargazers_count":51,"open_issues_count":12,"forks_count":31,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-30T22:38:58.762Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openeventdata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-02-11T00:37:57.000Z","updated_at":"2025-02-12T02:46:27.000Z","dependencies_parsed_at":"2022-08-24T13:37:33.506Z","dependency_job_id":null,"html_url":"https://github.com/openeventdata/phoenix_pipeline","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/openeventdata/phoenix_pipeline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openeventdata%2Fphoenix_pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openeventdata%2Fphoenix_pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openeventdata%2Fphoenix_pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openeventdata%2Fphoenix_pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openeventdata","download_url":"https://codeload.github
.com/openeventdata/phoenix_pipeline/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openeventdata%2Fphoenix_pipeline/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265075621,"owners_count":23707510,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T06:10:07.184Z","updated_at":"2025-07-13T00:34:59.257Z","avatar_url":"https://github.com/openeventdata.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"phoenix_pipeline\n================\n\n[![Build Status](https://travis-ci.org/openeventdata/phoenix_pipeline.svg?branch=master)](https://travis-ci.org/openeventdata/phoenix_pipeline)\n[![Join the chat at https://gitter.im/openeventdata/phoenix_pipeline](https://badges.gitter.im/openeventdata/phoenix_pipeline.svg)](https://gitter.im/openeventdata/phoenix_pipeline?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\nTurning news into events since 2014.\n\n\nThis system links a series of Python programs that convert files downloaded by a\n[web scraper](https://github.com/openeventdata/scraper) into coded event data,\nwhich is then uploaded to a website designated in the config file. The system\nprocesses a single day of information, which can be drawn from multiple text\nfiles. The pipeline also implements a filter for source URLs, as defined by the\nkeys in the `source_keys.txt` file. 
These keys correspond to the\n`source` field in the MongoDB instance.\n\nFor more information, please visit the [documentation](http://phoenix-pipeline.readthedocs.org/en/latest/).\n\n\n## Requirements\n\nThe pipeline requires either\n[Petrarch](https://github.com/openeventdata/petrarch) or\n[Petrarch2](https://github.com/openeventdata/petrarch2) to be installed. Both\nare Python programs and can be installed from GitHub using pip.\n\nThe pipeline assumes that stories are stored in a MongoDB database in a particular\nformat: the one used by the OEDA news RSS scraper. See [the\ncode](https://github.com/openeventdata/scraper/blob/master/mongo_connection.py)\nfor details on how the scraper structures stories in MongoDB. Using this pipeline with\ndifferently formatted databases will require changing field names throughout\nthe code. The pipeline also requires that stories have been parsed with\nStanford CoreNLP. See the [simple and\nstable](https://github.com/openeventdata/stanford_pipeline) way to do this, or\nthe [experimental distributed](https://github.com/oudalab/biryani) approach.\n\nThe pipeline requires one of two geocoding systems to be running: CLIFF/CLAVIN\nor Mordecai. For CLIFF, see a VM version\n[here](https://github.com/ahalterman/CLIFF-up) or a Docker container version\n[here](https://github.com/openeventdata/cliff_container). For Mordecai, see the\nsetup instructions [here](https://github.com/openeventdata/mordecai). The\nversion of the pipeline deployed in production currently uses CLIFF/CLAVIN, but\nfuture development will focus on improvements to Mordecai.\n\n## Configuration\n\nThe pipeline has two configuration files. `PHOX_config.ini` specifies which\ngeolocation system to use, how to name the files produced by the pipeline, and\nhow to upload the files to a remote server if desired.\n\n`petr_config.ini` is the configuration file for Petrarch2 itself, including the\nlocation of dictionaries, new actor extraction options, and the one-a-day filter. 
For\nmore details see the main [Petrarch2 repo](https://github.com/openeventdata/petrarch2/).\n\n## Running\n\nTo run the program:\n\n```\npython pipeline.py\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopeneventdata%2Fphoenix_pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopeneventdata%2Fphoenix_pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopeneventdata%2Fphoenix_pipeline/lists"}