{"id":25426726,"url":"https://github.com/cptlobster/aggregation-framework","last_synced_at":"2025-10-26T19:49:19.795Z","repository":{"id":273615605,"uuid":"920287206","full_name":"cptlobster/aggregation-framework","owner":"cptlobster","description":"A Swiss-army knife Scala library for scraping and processing data from the web.","archived":false,"fork":false,"pushed_at":"2025-05-25T20:44:07.000Z","size":208,"stargazers_count":1,"open_issues_count":17,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-03T09:34:39.976Z","etag":null,"topics":["data-scraping","http","kafka","kafka-producer","scala","selenium"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cptlobster.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE_GPL.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-21T22:08:50.000Z","updated_at":"2025-05-25T20:44:13.000Z","dependencies_parsed_at":"2025-05-14T05:15:31.513Z","dependency_job_id":"87ca0d7a-e51c-4166-9ff8-a0a459df1f02","html_url":"https://github.com/cptlobster/aggregation-framework","commit_stats":null,"previous_names":["cptlobster/aggregation_framework","cptlobster/aggregation-framework"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cptlobster/aggregation-framework","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cptlobster%2Faggregation-framework","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cptlobster%2Faggregation-framework/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cptlobster%2Faggregation-framework/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cptlobster%2Faggregation-framework/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cptlobster","download_url":"https://codeload.github.com/cptlobster/aggregation-framework/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cptlobster%2Faggregation-framework/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279005417,"owners_count":26083883,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-scraping","http","kafka","kafka-producer","scala","selenium"],"created_at":"2025-02-17T00:21:32.117Z","updated_at":"2025-10-10T21:37:23.034Z","avatar_url":"https://github.com/cptlobster.png","language":"Scala","readme":"# aggregation-framework\n\nA Swiss-army knife library for scraping and processing data from the web. Provides a unified interface for multiple\ndifferent HTTP clients, and convenience functionality for parsing and preprocessing data for your applications to use.\n\n- Quickly build HTTP requests for a variety of data formats and APIs.\n- Parse common data formats such as XML, HTML, and JSON.\n- Push your aggregated data automatically to your preferred database (such as Kafka, MySQL, or Postgres).\n- Write your own collectors for non-standard data formats.\n\n```mermaid\ngraph LR\n    EXT1[(External HTTP API)]\n    EXT2[(External HTTP API)]\n    EXT3[(External HTTP API)]\n\n    COL1[/Collector/]\n    COL2[/Collector/]\n    COL3[/Collector/]\n    \n    DB[(Application Database)]\n    BE1[Backend Application]\n    BE2[Backend Application]\n    BE3[Backend Application]\n  \n    subgraph AP[Aggregation Framework]\n        COL1\n        COL2\n        COL3\n    end\n    \n    EXT1 --\u003e COL1\n    EXT2 --\u003e COL2\n    EXT3 --\u003e COL3\n    \n    COL1 \u0026 COL2 \u0026 COL3 --\u003e DB --\u003e BE1 \u0026 BE2 \u0026 BE3\n```\n\n## Get Started\n\nAdd Aggregation Framework and your preferred extensions to your project. For sbt:\n\n```sbt\n// add Forge as a resolver\nresolvers += \"Gitea Package API\" at \"https://forge.cptlobster.dev/api/packages/cptlobster/maven\"\n\nlibraryDependencies += \"dev.cptlobster\" %% \"aggregation-framework-core\" % \"0.1.0-SNAPSHOT\"\n// for JSON parsing\nlibraryDependencies += \"dev.cptlobster\" %% \"aggregation-framework-json\" % \"0.1.0-SNAPSHOT\"\n```\n\n*Note: Snapshot versions are available here at forge.cptlobster.dev. Release versions will be made available on Maven\nCentral at a future date.*\n\nTo create a consumer, [follow the tutorial](docs/tutorial.md).\n\n## Target Artifacts\n\nThe project is split into a collection of packages. These are split so that you don't have to install a ton of external\npackages that you aren't going to use.\n\nThe core package is located under `/core` in this repository, and the extension packages are located under their own\nsubdirectories in `/ext`. Each extension package has its own README that describes it in more detail.\n\n```mermaid\ngraph BT\n    CORE[aggregation-framework-core]\n    JSON[aggregation-framework-json]\n    KAFKA[aggregation-framework-kafka]\n    SEL[aggregation-framework-selenium]\n    RUNNER[aggregation-framework-runner]\n    CORE --\u003e JSON \u0026 KAFKA \u0026 SEL \u0026 RUNNER\n```\n\n## Development\nThis project uses sbt for project and dependency management. Install sbt via your preferred package manager; if you use\nIntelliJ, it can manage sbt for you.\n\nTo build the entire project:\n\n```shell\nsbt compile\n```\n\n## License\nThis program is licensed under the [GNU Lesser General Public License, version 3](LICENSE_LGPL.md).\n\n*This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General\nPublic License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any\nlater version.*\u003cbr /\u003e\n*This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied\nwarranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.*\n\u003cbr /\u003e\n*You should have received a copy of the GNU Lesser General Public License (and the GNU General Public License) along\nwith this program. If not, see https://www.gnu.org/licenses/.*","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcptlobster%2Faggregation-framework","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcptlobster%2Faggregation-framework","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcptlobster%2Faggregation-framework/lists"}