{"id":46414974,"url":"https://github.com/viaacode/prefect-flow-arc-indexer","last_synced_at":"2026-03-05T14:03:15.017Z","repository":{"id":245414179,"uuid":"818178357","full_name":"viaacode/prefect-flow-arc-indexer","owner":"viaacode","description":"Prefect flow for indexing postgres JSON records in Elasticsearch","archived":false,"fork":false,"pushed_at":"2026-01-12T10:31:34.000Z","size":109,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-01-12T18:09:02.515Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/viaacode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-06-21T09:10:30.000Z","updated_at":"2026-01-12T10:29:21.000Z","dependencies_parsed_at":"2024-08-13T16:10:42.233Z","dependency_job_id":"70b0e5ff-caf6-4f59-aad3-0571b0b1021f","html_url":"https://github.com/viaacode/prefect-flow-arc-indexer","commit_stats":null,"previous_names":["viaacode/prefect-flow-arc-indexer"],"tags_count":33,"template":false,"template_full_name":null,"purl":"pkg:github/viaacode/prefect-flow-arc-indexer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viaacode%2Fprefect-flow-arc-indexer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viaacode%2Fprefect-flow-arc-indexer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/viaacode%2Fprefect-flow-arc-indexer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viaacode%2Fprefect-flow-arc-indexer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/viaacode","download_url":"https://codeload.github.com/viaacode/prefect-flow-arc-indexer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viaacode%2Fprefect-flow-arc-indexer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30130031,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T12:40:50.676Z","status":"ssl_error","status_checked_at":"2026-03-05T12:39:32.209Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-03-05T14:03:14.349Z","updated_at":"2026-03-05T14:03:15.009Z","avatar_url":"https://github.com/viaacode.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hetarchief Elasticsearch indexer\n\nPrefect flow to index documents from a postgres database in Elasticsearch for hetarchief.be. \n\n\n## Synopsis\n\nThis flow runs the following tasks:\n1. Retrieve a list of all indexes from the postgres view\n2. If there is a full sync:\n   1. delete all indexes that are no longer present in the view\n   2. 
retrieve the indexes ordered by size\n   3. For each index (smallest first)\n      1. create a new index\n      2. stream documents to the elasticsearch API\n      3. replace the old index with the new index by switching the alias and deleting the old index\n3. If it is not a full sync, stream documents to the elasticsearch API to be deleted or indexed\n\nIf an error occurs during streaming, the created indexes are rolled back.\n\nThe Flow's diagram:\n\n![Diagram of Prefect flow](diagram.png)\n\n## Prerequisites\n\nThe following Prefect Blocks need to be configured:\n- location and credentials of the postgres database (type: `prefect_sqlalchemy.DatabaseCredentials`)\n- location and credentials of the elasticsearch cluster (type: `prefect_meemoo.credentials.ElasticsearchCredentials`)\n\n\n## Usage\n\nThe Prefect Flow requires setting the following parameters:\n- `db_block_name`: name of the database block\n- `db_table`: name of the table or view where the index documents are stored\n- `es_block_name`: name of the elasticsearch block\n- `db_column_es_id`: name of the column that contains the document identifiers (default: `\"id\"`)\n- `db_column_es_index`: name of the column that contains the index alias (default: `\"index\"`)\n- `or_ids_to_run`: list of indexes that need to be included. Set to `None` for all indexes. 
(default: `None`)\n- `full_sync`: run the sync as a full reload (default: `False`)\n- `db_batch_size`: batch size of the database cursor (default: `1000`)\n- `es_chunk_size`: elasticsearch chunk size (default: `500`)\n- `es_request_timeout`: elasticsearch request timeout (default: `30`)\n- `es_max_retries`: maximum number of elasticsearch retries when a document fails to index (default: `10`)\n- `es_retry_on_timeout`: retry indexing a document when the request times out (default: `True`)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviaacode%2Fprefect-flow-arc-indexer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fviaacode%2Fprefect-flow-arc-indexer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviaacode%2Fprefect-flow-arc-indexer/lists"}