{"id":46687518,"url":"https://github.com/informationgrid/ingrid-harvester","last_synced_at":"2026-03-09T02:30:44.620Z","repository":{"id":246186036,"uuid":"820321601","full_name":"informationgrid/ingrid-harvester","owner":"informationgrid","description":"Standalone component that collects data from diverse sources and stores it in Elasticsearch indices for processing, ensuring data is always available in a unified format.","archived":false,"fork":false,"pushed_at":"2026-03-03T17:58:10.000Z","size":11617,"stargazers_count":4,"open_issues_count":2,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-03-03T18:42:44.806Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://ingrid-oss.eu/latest/components/harvester/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"eupl-1.2","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/informationgrid.png","metadata":{"files":{"readme":"README.MD","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-06-26T08:37:48.000Z","updated_at":"2026-02-23T17:49:22.000Z","dependencies_parsed_at":"2024-07-16T19:44:52.624Z","dependency_job_id":"c168bb1b-5776-43da-85a6-69db39becc70","html_url":"https://github.com/informationgrid/ingrid-harvester","commit_stats":null,"previous_names":["informationgrid/ingrid-harvester"],"tags_count":55,"template":false,"template_full_name":null,"purl":"pkg:github/informationgrid/ingrid-harvester","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/informationgrid%2Fingrid-h
arvester","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/informationgrid%2Fingrid-harvester/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/informationgrid%2Fingrid-harvester/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/informationgrid%2Fingrid-harvester/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/informationgrid","download_url":"https://codeload.github.com/informationgrid/ingrid-harvester/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/informationgrid%2Fingrid-harvester/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30280805,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T02:23:26.802Z","status":"ssl_error","status_checked_at":"2026-03-09T02:22:46.175Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-03-09T02:30:42.258Z","updated_at":"2026-03-09T02:30:44.614Z","avatar_url":"https://github.com/informationgrid.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# InGrid Harvester\n\n\u003cimg src=\"https://ingrid-oss.eu/8.2.0/assets/components/ingrid-plattform.png\" alt=\"InformationGrid illustration\" width=\"480\" 
align=\"right\"\u003e\n\nThis repository is part of **[InGrid](https://ingrid-oss.eu)**, an open-source solution for building, managing, and exposing metadata-driven information systems. \n\n**About InGrid Harvester:**  \nStandalone component that collects data from diverse sources and stores it in Elasticsearch indices for processing, ensuring data is always available in a unified format.\n\n# Installation\n\nThe InGrid Harvester runs two components in a single docker container: the actual `server` application and the admin `client`. It depends on an Elasticsearch instance and a PostgreSQL installation.\n\n## General steps\n\n* Checkout this repo\n* Add readonly wemove docker hub credentials to your docker setup\n    ```bash\n    sudo docker login docker-registry.wemove.com\n    Username: readonly\n    Password: readonly\n    ```\n\n## Configuration\n\n### General notes\n\n* If you want the InGrid Harvester to be accessed at a sub-path (i.e., not directly at root), you have to **both**\n  * set `BASE_URL` to the desired path (environment variable)\n  * set `contextPath` in the client config file to the same value\n* This is in addition to appropriate nginx settings\n\n### Configuration files\n\n| Config file location (project) | Config file location (docker container)                    | Purpose                                         |\n|--------------------------------|------------------------------------------------------------|-------------------------------------------------|\n| server/config.json             | /opt/ingrid/harvester/server/config.json                   | Harvester configuration                         |\n| server/config-general.json     | /opt/ingrid/harvester/server/config-general.json           | General settings (Elasticsearch, Postgres, ...) 
|\n| client/src/assets/config.json  | /opt/ingrid/harvester/server/app/webapp/assets/config.json | Client settings                                 |\n\nIn a docker setup, you probably want to map these files from the host system into the container.\n\n### Environment variables\n\nSeveral general settings can also be configured via environment variables. These settings take precedence over configuration files.\n\n| Variable                    | Note                                                              |\n|-----------------------------|-------------------------------------------------------------------|\n| DB_CONNECTION_STRING        |                                                                   |\n| DB_URL                      |                                                                   |\n| DB_PORT                     |                                                                   |\n| DB_NAME                     |                                                                   |\n| DB_USER                     |                                                                   |\n| DB_PASSWORD                 |                                                                   |\n| ELASTIC_URL                 |                                                                   |\n| ELASTIC_VERSION             | Major version (6, 7, or 8)                                        |\n| ELASTIC_USER                |                                                                   |\n| ELASTIC_PASSWORD            |                                                                   |\n| ELASTIC_REJECT_UNAUTHORIZED | Whether to reject Es connections if the certificate is invalid    |\n| ELASTIC_INDEX               |                                                                   |\n| ELASTIC_ALIAS               |                                                                   |\n| ELASTIC_PREFIX              |                                      
                             |\n| ELASTIC_NUM_SHARDS          |                                                                   |\n| ELASTIC_NUM_REPLICAS        |                                                                   |\n| PORTAL_URL                  | Base URL for displaying portal website (no trailing slash)        |\n| PROXY_URL                   | URL needs to contain credentials and port, if applicable          |\n| ALLOW_ALL_UNAUTHORIZED      | If all connections should be allowed, regardless of SSL state     |\n| IMPORTER_PROFILE            | Profile to use for the application: diplanung, mcloud             |\n| BASE_URL                    | Subpath where the Harvester is being served at, if not on `/`     |\n\n\n## Local development setup\n\n### Running in a local docker container\n\nYou can use the same setup as outlined in the section `Test setup` below, but with `docker-compose-dev.yml`. This scales down memory requirements and uses `ts-node-dev` instead of `node`.\n\n### Running in a terminal\n\nPrerequisites:\n* node.js v16\n* Postgresql \u003e= v14\n* Elasticsearch \u003e= 6\n\nYou may wish to run the server and the client outside of the docker container, for debugging and faster deployment/development purposes. 
Currently you have to change some files to achieve this, outlined below:\n\n* `server/config-general.json`:\n  * change the value of `elasticsearch.url` to `http://localhost:9200`\n  * change the value of `elasticsearch.password`\n* Now, first start an Elasticsearch instance (either from the docker container or directly on your machine), then run the client and server separately:\n  ```bash\n  cd client\n  npm run start\n  ```\n  ```bash\n  cd server\n  npm run start-{profile}\n  ```\n  where `{profile}` is one of `mcloud`, `diplanung`, `lvr`\n* Now you can access the harvester\n    * via GUI: http://localhost:4200\n    * via Elasticsearch API: http://localhost:9200\n\n\n## Test setup\n\n* `server/config-general.json`: change the value of `elasticsearch.password`\n* Build, run, and detach the containers:\n    ```bash\n    sudo docker-compose -f docker-compose.yml up --build -d\n    ```\n* Now you can access the harvester\n    * via GUI: http://localhost:8090\n    * via Elasticsearch API: http://localhost:9200\n        * user: `read_user`\n        * password: *the one you set in `elasticsearch/create-users.json`*\n\n\n## Test setup in a Kubernetes environment\n\n* TODO\n\n\n## Production setup in a Kubernetes environment\n\n* TODO\n\n\u003cbr /\u003e\n\u003cbr /\u003e\n\n---\n\n***Below you find the old version of the readme, which targeted an RPM release***\n\n\n# Configuration\n\nEdit the file config.js to define the location of the excel file to be imported ('filePath'). 
You can also\nconfigure the URL of the Elasticsearch instance that the data is indexed to ('elasticsearch.url').\n\nTo disable authentication during development, comment out the following line in \"AuthMiddleware.ts\":\n\u003e// throw new Unauthorized(\"Unauthorized\");\n\n# Run\n\nExecute the following commands to run a single import:\n\nRun Elasticsearch:\n\u003e docker-compose up -d\n\nFor the server:\n\u003e npm run start-dev\n\nFor the server (node 16+):\n\u003e npm run start-dev-16\n\nFor the client:\n\u003e npm run start\n\n# Test\n\n\u003e npm run test\n\nor\n\n\u003e mocha -r ts-node/register test/*.spec.ts\n\n# Development\n\nThe main document is \"server/model/index-document.ts\", which represents the Elasticsearch document. This model is used by all harvesters and helps them stay synchronized. When you add a new index field, the compiler will let you know about missing implementations.\n\n# Release\n\n* Update the changelog file\n* Create an annotated tag with the message \"Release\"\n  * `git tag -m \"Release X.Y.Z\"`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finformationgrid%2Fingrid-harvester","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finformationgrid%2Fingrid-harvester","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finformationgrid%2Fingrid-harvester/lists"}