# openproject-crawler

![OpenProject](https://img.shields.io/badge/OpenProject-2D8CFF?style=for-the-badge&logo=openproject&logoColor=white)
![Go](https://img.shields.io/badge/go-%2300ADD8.svg?style=for-the-badge&logo=go&logoColor=white)
![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)
![Selenium](https://img.shields.io/badge/-selenium-%43B02A?style=for-the-badge&logo=selenium&logoColor=white)

This tool collects data from OpenProject. It uses the OpenProject API where possible and uses Selenium to scrape the web portal for data the API does not provide. The scraping runs asynchronously to make it faster and more stable.

## Installation

To install the required dependencies, use:
```bash
pip install -r requirements.txt
```

## Important variables

- *`username`*: Its value depends on whether you collect data through the API or from the web portal. For the API, it must be the literal string `apikey` ([see the OpenProject docs](https://www.openproject.org/docs/api/introduction/#api-key-through-basic-auth)). For the portal, it is the username you use to log in.
- *`password`*: As with `username`, for the API this must be your API access token (see [this note](https://www.openproject.org/docs/api/introduction/#api-key-through-basic-auth)). For the portal, it is the password of that account.
- *`api_url`*: The API base URL, ending with `/api/v3`, e.g. `https://myopenproject.example/api/v3`.
- *`portal_url`*: The portal base URL without any path, e.g. `https://myopenproject.example`.

## Usage example (Python)

The script [utils.py](src/python/sample/utils.py) shows how to use this module to scrape data from a specific project. Everything runs asynchronously except `DataParser`, which uses a [thread pool](https://docs.python.org/3/library/concurrent.futures.html) instead. You therefore need to drive the crawler from asynchronous code with *`async/await`* syntax. An overview of the main pieces:

- *`Crawler`* class: initializes the crawler and fetches data such as project IDs, a project's task IDs, and task activities.
  - function `get_projects_id` -> gets all available projects and their IDs
  - function `get_tasks_id` -> gets all tasks belonging to project `"my_project"`, using filter parameters in the HTTP request
  - function `get_tasks_activities_data` -> scrapes data from `work_packages/{id}`

### Set up a Python venv to use the tool

* Navigate to the project source:

```bash
cd /path/to/openproject-crawler/src/python
```

* Create a virtual environment:

```bash
python -m venv venv
```

* Activate the environment:

  + On Windows:

      ```
      .\venv\Scripts\activate
      ```
  + On Unix or macOS:

      ```bash
      source venv/bin/activate
      ```
* Install the required dependencies:

```bash
pip install -e .
```

## Usage example (Go)

[main.go](src/golang/openproject-crawler/cmd/main.go) shows detailed usage. The flow is the same as in the Python version:

`Get project IDs` -> `Get task IDs of a specific project` -> `Get task activities of a specific project`

```bash
go run main.go
```

## Data structure

* Project IDs:
```json
{
  "1" : "mainproject",
  "2" : "demoproject"
}
```

* Task IDs:

  * Go data (as printed):

    ```text
    [45 278 13 225]
    ```

  * Python data:

    ```python
    [45, 278, 13, 225]
    ```

* Task activities:

```json
{
  "Task name": "Scraping data from openproject",
  "Task info": {
    "Project": "Data collection",
    "ID": "2",
    "Type": "Task",
    "Priority": "Normal",
    "Create date": "2024-06-09 15:12:26",
    "End Date": "2024-06-19 16:44:31",
    "Duration": "10 days"
  },
  "Task activities": [
    {
      "Datetime": "2024-06-19 16:44:31",
      "Action": [
        "Status changed from In progress to Closed"
      ]
    }
  ]
}
```

## License

Released under the MIT License.

![GitHub License](https://img.shields.io/github/license/mach1el/openproject-crawler?style=flat-square&color=%23FF5E0E)
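## Example: API authentication (Python sketch)

The *Important variables* section says API requests authenticate with HTTP Basic Auth. The username is the literal `apikey` and the password is the API access token. Below is a minimal standard-library sketch that builds that header and sanity-checks the shapes of `api_url` and `portal_url`. `basic_auth_header` and `check_urls` are hypothetical helpers, not functions from this repository.

```python
import base64
from urllib.parse import urlparse


def basic_auth_header(token: str, username: str = "apikey") -> str:
    """Build an HTTP Basic Authorization header value.

    For the OpenProject API the username is the literal string "apikey"
    and the password is the API access token.
    """
    credentials = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def check_urls(api_url: str, portal_url: str) -> None:
    """Raise ValueError if the URLs do not match the expected shapes."""
    # api_url must end with /api/v3 (a trailing slash is tolerated).
    if not api_url.rstrip("/").endswith("/api/v3"):
        raise ValueError(f"api_url should end with /api/v3: {api_url!r}")
    # portal_url must be a bare base URL with no path component.
    if urlparse(portal_url).path not in ("", "/"):
        raise ValueError(f"portal_url should have no path: {portal_url!r}")


if __name__ == "__main__":
    check_urls("https://myopenproject.example/api/v3", "https://myopenproject.example")
    print(basic_auth_header("my-token"))
```

The header value goes into the `Authorization` request header of whatever HTTP client you use.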
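## Example: filtered work-package request (Python sketch)

`get_tasks_id` fetches a project's tasks with filter parameters in the HTTP request. The sketch below assumes the OpenProject API v3 endpoint `/projects/{id}/work_packages`, whose `filters` query parameter takes a JSON-encoded list. The specific filter (status with operator `*`, meaning any status) and the `pageSize` value are illustrative assumptions, so check them against the OpenProject API documentation. The helper `work_packages_url` is hypothetical, and it only builds the URL without sending a request.

```python
import json
from urllib.parse import urlencode


def work_packages_url(api_url: str, project_id: int, page_size: int = 100) -> str:
    """Build a work-package listing URL for one project."""
    # Assumed filter: include work packages of any status ("*" operator).
    filters = [{"status": {"operator": "*", "values": []}}]
    query = urlencode({
        "filters": json.dumps(filters, separators=(",", ":")),
        "pageSize": page_size,
    })
    return f"{api_url.rstrip('/')}/projects/{project_id}/work_packages?{query}"


if __name__ == "__main__":
    print(work_packages_url("https://myopenproject.example/api/v3", 1))
```

Encoding the filters with `json.dumps` and then `urlencode` keeps the brackets and quotes correctly percent-escaped.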
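## Example: async crawl with a thread-pool parse step (Python sketch)

The Python `Crawler` runs its steps (project IDs -> task IDs -> task activities) asynchronously, while `DataParser` uses a thread pool. The sketch below shows that general pattern with the standard library only:

- a semaphore caps the number of concurrent requests;
- `asyncio.gather` fans out the per-task fetches;
- blocking parsing is handed to a `ThreadPoolExecutor` through `run_in_executor`.

`fetch_json`, `parse_activity`, `crawl` and the canned `FAKE_API` data are hypothetical stand-ins that do not call OpenProject. This is not the repository's implementation.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned responses standing in for real HTTP calls.
FAKE_API = {
    "projects": {"1": "mainproject", "2": "demoproject"},
    "tasks/mainproject": [45, 278, 13, 225],
}


async def fetch_json(path: str, sem: asyncio.Semaphore):
    """Stand-in for an HTTP GET; the semaphore limits concurrent requests."""
    async with sem:
        await asyncio.sleep(0.01)  # simulate network latency
        if path.startswith("work_packages/"):
            task_id = int(path.rsplit("/", 1)[1])
            return {"id": task_id, "raw": f"activity log of {task_id}"}
        return FAKE_API[path]


def parse_activity(raw: dict) -> dict:
    """Blocking parse step, run in a worker thread (like DataParser)."""
    return {"ID": str(raw["id"]), "Task activities": [raw["raw"]]}


async def crawl(project_name: str, max_concurrency: int = 5) -> list[dict]:
    sem = asyncio.Semaphore(max_concurrency)
    # Step 1: project IDs; make sure the requested project exists.
    projects = await fetch_json("projects", sem)
    if project_name not in projects.values():
        raise KeyError(project_name)
    # Step 2: task IDs of that project.
    task_ids = await fetch_json(f"tasks/{project_name}", sem)
    # Step 3: fetch every task concurrently (gather preserves input order).
    raw_items = await asyncio.gather(
        *(fetch_json(f"work_packages/{tid}", sem) for tid in task_ids)
    )
    # Parse in worker threads so blocking work does not stall the event loop.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(await asyncio.gather(
            *(loop.run_in_executor(pool, parse_activity, raw) for raw in raw_items)
        ))


if __name__ == "__main__":
    for item in asyncio.run(crawl("mainproject")):
        print(item)
```

Replacing `fetch_json` with a real HTTP client call (and `parse_activity` with real parsing) keeps the same structure. The semaphore is what prevents the fan-out from flooding the server.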
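## Example: consuming the task activities record (Python sketch)

The *Task activities* record shown under *Data structure* stores dates as `YYYY-MM-DD HH:MM:SS` strings, alongside a human-readable `Duration`. Assuming exactly those field names and formats, the sketch below does two things with a trimmed copy of the sample record:

- recompute the duration in whole days;
- collect status changes from the activity log.

`duration_days` and `status_changes` are hypothetical helpers.

```python
import json
from datetime import datetime

DATE_FMT = "%Y-%m-%d %H:%M:%S"

# Trimmed copy of the sample "Task activities" record.
SAMPLE = """
{
  "Task name": "Scraping data from openproject",
  "Task info": {
    "Create date": "2024-06-09 15:12:26",
    "End Date": "2024-06-19 16:44:31",
    "Duration": "10 days"
  },
  "Task activities": [
    {"Datetime": "2024-06-19 16:44:31",
     "Action": ["Status changed from In progress to Closed"]}
  ]
}
"""


def duration_days(task: dict) -> int:
    """Whole days between 'Create date' and 'End Date'."""
    info = task["Task info"]
    start = datetime.strptime(info["Create date"], DATE_FMT)
    end = datetime.strptime(info["End Date"], DATE_FMT)
    return (end - start).days


def status_changes(task: dict) -> list[str]:
    """Return every action line that records a status change."""
    return [
        action
        for entry in task["Task activities"]
        for action in entry["Action"]
        if action.startswith("Status changed")
    ]


if __name__ == "__main__":
    task = json.loads(SAMPLE)
    print(duration_days(task), status_changes(task))
```

For the sample, the computed value (10) agrees with the stored `"Duration": "10 days"`.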