{"id":19607153,"url":"https://github.com/robocorp/example-simple-web-scraper","last_synced_at":"2025-04-27T19:33:11.685Z","repository":{"id":55014251,"uuid":"315296225","full_name":"robocorp/example-simple-web-scraper","owner":"robocorp","description":"Opens a web page and stores some content.","archived":false,"fork":false,"pushed_at":"2024-01-04T10:29:16.000Z","size":61,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":17,"default_branch":"main","last_synced_at":"2025-04-05T03:03:22.320Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/robocorp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-11-23T11:39:12.000Z","updated_at":"2024-06-04T10:48:37.000Z","dependencies_parsed_at":"2022-08-14T09:10:12.144Z","dependency_job_id":"2c6a181a-7f58-41df-8717-a85a719e0346","html_url":"https://github.com/robocorp/example-simple-web-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robocorp%2Fexample-simple-web-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robocorp%2Fexample-simple-web-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robocorp%2Fexample-simple-web-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/robocorp%2Fexample-simple-web-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/robocorp","download_url":"https://codeload.github.com/robocorp/example-simple-web-scraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251196481,"owners_count":21550960,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T10:09:06.891Z","updated_at":"2025-04-27T19:33:11.440Z","avatar_url":"https://github.com/robocorp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A simple web scraper example\n\nThe example task opens a web page, scrapes it for the intended values, and stores those values in a file.\nThe output is stored in the \"output\" directory.\n\nWhen run, the task will:\n\n- Open a real web browser\n- Navigate to `\"https://finance.yahoo.com/crypto\"`\n- Dismiss the cookie consent pop-up\n- Detect whether the targeted information (the Cryptocurrencies table) is available\n- Collect the Top 10 cryptocurrencies\n- Pretty print them to the screen\n- Save the collected data to a CSV file\n\n## The Preparation\n\nThe `conda.yaml` file in this project lists the necessary Python packages.\nYou'll find appropriate explanations or links.\n\nThe `robot.yaml` file in this project contains the meta information that configures the execution of the necessary tasks.\n\n## The Main Task\n\nThe main robot file (`tasks.py`) contains the `task: web_scraper_top_10_crypto` your robot is going to complete when run.\nThis is indicated by the `@task` decorator that is imported from the `robocorp.tasks` 
library.\n\n```python\n@task\ndef web_scraper_top_10_crypto() -\u003e None:\n    \"\"\"\n    Automate web scraping for the current Top 10 Cryptocurrency.\n    Output would be printed out in this format:\n        ##################################################\n        ### Top 10 Cryptocurrencies:\n        ##################################################\n        ### 1  | Bitcoin USD              | $ 44,853.27\n    Output will also be saved in a CSV file in the output folder.\n    ---------------------------\n    The Robocorp Web Inspector was used to generate the locator values/selectors/identifiers.\n    \"\"\"\n    ...\n```\n\nThe Python script needs nothing else to run beyond the `robocorp` library \u0026 CLI facilitators.\nYou can easily execute the `@task` using the [Robocorp Code VSCode Extension](https://marketplace.visualstudio.com/items?itemName=robocorp.robocorp-code).\n\nYou will find comments and helpful insights in the code itself.\n\n\u003e Note on creating locators: Simply put, a locator is an object constructed after a selector is executed.\n\u003e The selector value is usually pretty difficult to construct. 
As such, you can use the `Robocorp Web Inspector`\n\u003e from the [Robocorp Code VSCode Extension](https://marketplace.visualstudio.com/items?itemName=robocorp.robocorp-code)\n\u003e to build valid selectors with ease.\n\nWhen the `@task` finishes executing, the screen output will look similar to this:\n```\n##################################################\n### Top 10 Cryptocurrencies:\n##################################################\n### 1  | Bitcoin USD              | $ 42,383.66\n### 2  | Ethereum USD             | $ 2,208.85\n### 3  | Tether USDt USD          | $ 1.0007\n### 4  | BNB USD                  | $ 304.37\n### 5  | Solana USD               | $ 96.09\n### 6  | XRP USD                  | $ 0.550529\n### 7  | USD Coin USD             | $ 1.0002\n### 8  | Lido Staked ETH USD      | $ 2,206.01\n### 9  | Cardano USD              | $ 0.534951\n### 10 | Avalanche USD            | $ 35.68\n##################################################\n```\n\nYou can take a look inside the `output` folder and find different files related to the execution.\nThe `output` folder contains log files and journals, as well as the CSV output of the `@task`: `top-10-cryptos-(today's date).csv`\n\n\u003e Note: It is important to know that the Control Room Work Items will \u0026 should be part of the `output` folder as well.\n\nThe `output` folder also contains the `log.html` file, which you might find interesting.\nIt gives you detailed, nicely formatted insight into the `@task` execution.\n\n\u003e Note: Observe how the `print` function is called and displayed in the `log.html` format.\n\n## Summary\n\nYou executed a web scraper task, congratulations!\n\nAlong the way, you learned how to:\n\n- Set up a project and its dependencies\n- Define a task\n- Use the `browser` library provided by `robocorp`\n- Navigate to a new web page \u0026 wait for the load state\n- Resolve intermediate steps before getting to the target content\n- Ignore issues if elements are nonexistent\n- Detect valid 
selectors by using the `Robocorp Web Inspector` from the [Robocorp Code VSCode Extension](https://marketplace.visualstudio.com/items?itemName=robocorp.robocorp-code)\n- Wait for the elements and assert that they actually exist on the web page\n- Scrape the values of the targeted elements\n- Pretty print them to the screen output\n- Save all to a CSV file\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobocorp%2Fexample-simple-web-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frobocorp%2Fexample-simple-web-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frobocorp%2Fexample-simple-web-scraper/lists"}