{"id":20585751,"url":"https://github.com/triglav-dataflow/triglav-agent-hdfs","last_synced_at":"2026-01-28T04:49:06.691Z","repository":{"id":59158065,"uuid":"77115026","full_name":"triglav-dataflow/triglav-agent-hdfs","owner":"triglav-dataflow","description":"HDFS agent for Triglav, data-driven workflow tool","archived":false,"fork":false,"pushed_at":"2017-09-04T02:53:13.000Z","size":74,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-09-18T11:47:20.507Z","etag":null,"topics":["hdfs","jruby","ruby","triglav-agent"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/triglav-dataflow.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-12-22T05:30:45.000Z","updated_at":"2017-03-14T15:09:35.000Z","dependencies_parsed_at":"2022-09-13T20:11:40.321Z","dependency_job_id":null,"html_url":"https://github.com/triglav-dataflow/triglav-agent-hdfs","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/triglav-dataflow%2Ftriglav-agent-hdfs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/triglav-dataflow%2Ftriglav-agent-hdfs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/triglav-dataflow%2Ftriglav-agent-hdfs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/triglav-dataflow%2Ftriglav-agent-hdfs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/triglav-dataflow","download_url":"https://codeload.github.com/triglav-dataflow/triglav-agent-hdfs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240636591,"owners_count":19832921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hdfs","jruby","ruby","triglav-agent"],"created_at":"2024-11-16T07:09:16.247Z","updated_at":"2026-01-28T04:49:06.652Z","avatar_url":"https://github.com/triglav-dataflow.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Triglav::Agent::Hdfs\n\nTriglav Agent for Hdfs\n\n## Requirements\n\n* JRuby \u003e= 9.1.5.0\n* Java \u003e= 1.8.0_45\n\n\n## Prerequisites\n\n* HDFS path to be monitored must be created or modified atomically. 
To modify an HDFS path atomically, use one of the following strategies, for example:\n  * Create a tmp directory and copy files into the directory, then move it to the target path\n  * Create a marker file such as `_SUCCESS` after copying is done, and monitor the `_SUCCESS` file\n\n## Installation\n\nAdd this line to your application's Gemfile:\n\n```ruby\ngem 'triglav-agent-hdfs'\n```\n\nAnd then execute:\n\n    $ bundle\n\nOr install it yourself as:\n\n    $ gem install triglav-agent-hdfs\n\n## CLI\n\n```\nUsage: triglav-agent-hdfs [options]\n    -c, --config VALUE               Config file (default: config.yml)\n    -s, --status VALUE               Status storage file (default: status.yml)\n    -t, --token VALUE                Triglav access token storage file (default: token.yml)\n        --dotenv                     Load environment variables from .env file (default: false)\n    -h, --help                       help\n        --log VALUE                  Log path (default: STDOUT)\n        --log-level VALUE            Log level (default: info)\n```\n\nRun as:\n\n```\nTRIGLAV_ENV=development bundle exec triglav-agent-hdfs --dotenv -c config.yml\n```\n\n## Configuration\n\nPrepare config.yml as [example/config.yml](./example/config.yml).\n\nYou can use ERB templates. 
You may load environment variables from a .env file with the `--dotenv` option.\n\n### serverengine section\n\nYou can specify any [serverengine](https://github.com/fluent/serverengine) options in this section.\n\n### triglav section\n\nSpecify the triglav API URL and a credential to authenticate.\n\nThe access token obtained is stored in a token storage file (--token option).\n\n### hdfs section\n\nThis section is specific to triglav-agent-hdfs.\n\n* **monitor_interval**: The interval to watch tables (number, default: 60)\n* **connection_info**: key-value pairs of hdfs connection info, where keys are resource URI patterns written as regular expressions and values are connection information\n\n### Specification of Resource URI\n\nA resource URI must have the form:\n\n```\nhdfs://{namespace}/#{path}\n```\n\nPath accepts `strftime` format such as `%Y-%m-%d`.\n\n## How it behaves\n\n1. Authenticate with triglav\n  * Store the access token into the token storage file\n  * Read the token from the token storage file next time\n  * Refresh the access token if it is expired\n2. Repeat the following every `monitor_interval` seconds:\n3. Obtain resource (table) lists of the specified prefix (keys of connection_info) from triglav.\n4. Connect to hdfs with the appropriate connection info for a resource URI, and find tables which are newer than the last check.\n5. 
Store the check information in the status storage file for the next check.\n\n## Development\n\n### Prepare\n\n```\nbundle\nbundle exec rake vendor_jars\n```\n\n```\n./prepare.sh\n```\n\nEdit `.env` file or `config.yml` file directly.\n\n### Start\n\nStart up triglav api on localhost.\n\nRun triglav-agent-hdfs as:\n\n```\nTRIGLAV_ENV=development bundle exec triglav-agent-hdfs --dotenv --debug -c example/config.yml\n```\n\nDebug mode (the `--debug` option) ignores the `last_modification_time` value in the status file.\n\n## Contributing\n\nBug reports and pull requests are welcome on GitHub at https://github.com/triglav-dataflow/triglav-agent-hdfs. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.\n\n\n## License\n\nThe gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).\n\n## ToDo\n\n* prepare mocks of both triglav and hdfs for tests\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftriglav-dataflow%2Ftriglav-agent-hdfs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftriglav-dataflow%2Ftriglav-agent-hdfs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftriglav-dataflow%2Ftriglav-agent-hdfs/lists"}