{"id":17495518,"url":"https://github.com/rdig/dataacquisitionschedulerpocappplatform","last_synced_at":"2026-04-02T01:59:55.943Z","repository":{"id":248363300,"uuid":"828486295","full_name":"rdig/dataAcquisitionSchedulerPocAppPlatform","owner":"rdig","description":"Data Acquisition Scheduler Proof of Concept hosted built to be hosted on DO's App Platform","archived":false,"fork":false,"pushed_at":"2024-07-15T00:31:36.000Z","size":17032,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-10-19T17:32:49.549Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rdig.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-14T09:52:07.000Z","updated_at":"2024-07-15T00:31:39.000Z","dependencies_parsed_at":"2024-07-14T11:26:28.080Z","dependency_job_id":"3d8b1763-f3ca-4255-b716-3037158518bd","html_url":"https://github.com/rdig/dataAcquisitionSchedulerPocAppPlatform","commit_stats":{"total_commits":22,"total_committers":2,"mean_commits":11.0,"dds":"0.045454545454545414","last_synced_commit":"514c2e6618f2d983be3818be5c3537b01761387d"},"previous_names":["rdig/dataacquisitionschedulerpocappplatform"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rdig%2FdataAcquisitionSchedulerPocAppPlatform","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rdig%2FdataAcquisitionSchedulerPocA
ppPlatform/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rdig%2FdataAcquisitionSchedulerPocAppPlatform/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rdig%2FdataAcquisitionSchedulerPocAppPlatform/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rdig","download_url":"https://codeload.github.com/rdig/dataAcquisitionSchedulerPocAppPlatform/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243240937,"owners_count":20259497,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-19T14:06:38.087Z","updated_at":"2025-12-26T06:01:14.563Z","avatar_url":"https://github.com/rdig.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dataAcquisitionSchedulerPocAppPlatform\nData Acquisition Scheduler Proof of Concept built to be hosted on DO's App Platform\n\nThis builds on the initial work done via [dataAcquisitionSchedulerPOC](https://github.com/rdig/dataAcquisitionSchedulerPOC)\n\nCode for the scheduler _(deployed as a [DO App Platform](https://docs.digitalocean.com/products/app-platform/) Node service)_ is in the main `index.js` file, and is built and deployed via the `Dockerfile` in the root directory.\n\nThe serverless function _(deployed as a [DO App Platform Function](https://docs.digitalocean.com/products/functions/))_ is in the `worker` folder\n\n### General Description\n- scheduler manages a list of urls to fetch data from _(since it's a PoC, 
that data is only the page title)_\n- once it notices a new url in that list, it assigns it to a worker (serverless function)\n- worker will take itself off the \"available list\", fetch the url, parse it and save it in the database\n- then it will make itself available again via that \"available list\" _(currently a db collection since it's just a PoC)_\n- scheduler will then mark the url as \"done\" and remove it from the list\n- scheduler also has a timeout: if the worker does not fetch the url in time, it will de-assign the worker and put the url back in the list for another worker to pick up\n- lastly, the scheduler will listen for responses from the worker, and if the worker encounters an error fetching or parsing the url, it will de-assign the worker and remove the url from the list entirely, as at this point it considers the url one that cannot be fetched\n\n### Available endpoints:\n_(all protected by a bearer token, except for the main entry)_\n\n- `/` - main entry\n- `/add` - add new urls for the scheduler to process.\n  - Accepts JSON formatted as an array of objects with a `url` key _(note that url strings will be validated)_\n  - Example: `curl -H \"Content-Type: application/json\" -H \"x-auth-bearer: XXX\" -X POST -d '[{\"url\":\"https://www.google.com\"}, {\"url\":\"https://www.yahoo.com\"}]' https://\u003cscheduler-app-url\u003e/add`\n- `/processor` - show the current list of urls that are assigned to workers\n- `/db` - list out the current database entries\n- `/queue` - show the current queue _(urls that are waiting to be processed, but are not assigned to workers yet)_\n- `/reset` - clear out the queue list\n- `/worker` - Access the worker serverless function directly _(if deployed)_\n  - Accepts JSON formatted as a single object with a `url` key _(note that url strings will be validated)_\n  - Example: `curl -H \"Content-Type: application/json\" -H \"x-auth-bearer: XXX\" -X POST -d '{\"url\":\"https://www.google.com\"}' 
https://\u003cscheduler-app-url\u003e/worker`\n\n### Demo\n\n![demo](./demo.gif)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frdig%2Fdataacquisitionschedulerpocappplatform","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frdig%2Fdataacquisitionschedulerpocappplatform","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frdig%2Fdataacquisitionschedulerpocappplatform/lists"}