---
id: web-scraper-robot-tutorial
title: Web scraper robot tutorial
description: Tutorial for creating a web scraper software robot using Robot Framework and RPA Framework.
---

To learn some of the more advanced features of Robot Framework, you are going to build a web scraper robot.

When run, the robot will:

- open a real web browser
- collect the latest tweets by a given Twitter user
- create a file system directory named after the Twitter user
- store the text content of each tweet in a separate file in the directory
- store a screenshot of each tweet in the directory

![Tweet screenshot](tweet.png)

## Prerequisites

> To complete this tutorial, you need a working [Python](https://www.python.org/) (version 3) installation. To check that it is installed, open the terminal and run `python3 --version` on macOS / Linux, or open the command prompt and run `py --version` on Windows.

## Create a directory for your software robot projects

Create a directory for your software robot projects. If you already have a directory for your projects, you can use that.

## Set up a virtual Python environment

Navigate to your projects directory in the terminal or the command prompt. Set up a virtual Python environment by running the following command:

Windows:

```
py -m venv venv
```

macOS / Linux:

```bash
python3 -m venv venv
```

Activate the Python virtual environment:

Windows:

```
venv\Scripts\activate
```

macOS / Linux:

```bash
. venv/bin/activate
```

## Install Robocode CLI

```bash
pip install robocode
```

## Initialize the software robot directory

```bash
robo init web-scraper-robot
```

Navigate to the directory:

```bash
cd web-scraper-robot
```

## Install RPA Framework

```bash
pip install rpa-framework
```

## Robot task file

Paste the following Robot Framework code into the `tasks/robot.robot` file:

```robot
*** Settings ***
Documentation   Web scraper robot. Stores tweets.
Resource        keywords.robot
Variables       variables.py

*** Tasks ***
Store the latest tweets by given user name
    Store the latest ${NUMBER_OF_TWEETS} tweets by user name "${USER_NAME}"
```

## Robot keywords file

Paste the following Robot Framework code into the `resources/keywords.robot` file:

```robot
*** Settings ***
Library     OperatingSystem
Library     RPA.Browser

*** Keywords ***
Store the latest ${number_of_tweets} tweets by user name "${user_name}"
    Open Twitter homepage   ${user_name}
    Store tweets            ${user_name}    ${number_of_tweets}
    [Teardown]              Close Browser

Open Twitter homepage
    [Arguments]             ${user_name}
    Open Available Browser  ${TWITTER_URL}/${user_name}

Store tweets
    [Arguments]                     ${user_name}            ${number_of_tweets}
    ${tweets_locator}=              Get tweets locator      ${user_name}
    Wait Until Element Is Visible   ${tweets_locator}
    @{tweets}=                      Get WebElements         ${tweets_locator}
    ${tweet_directory}=             Get tweet directory     ${user_name}
    Create Directory                ${tweet_directory}
    ${index}=                       Set Variable            1

    FOR     ${tweet}  IN  @{tweets}
        Exit For Loop If            ${index} > ${number_of_tweets}
        ${screenshot_file}=         Set Variable    ${tweet_directory}/tweet-${index}.png
        ${text_file}=               Set Variable    ${tweet_directory}/tweet-${index}.txt
        ${text}=                    Set Variable    ${tweet.find_element_by_xpath(".//div[@lang='en']").text}
        Capture Element Screenshot  ${tweet}        ${screenshot_file}
        Create File                 ${text_file}    ${text}
        ${index}=                   Evaluate        ${index} + 1
    END

Get tweets locator
    [Arguments]     ${user_name}
    [Return]        xpath://article[descendant::span[contains(text(), "\@${user_name}")]]

Get tweet directory
    [Arguments]     ${user_name}
    [Return]        ${CURDIR}/../output/tweets/${user_name}
```

## Variables file

Paste the following Python code into the `variables/variables.py` file:

```py
NUMBER_OF_TWEETS = 3
TWITTER_URL = "https://twitter.com"
USER_NAME = "robotframework"
```

## Wrap the robot

```bash
robo wrap --force
```

## Run the robot

Windows:

```
robo run entrypoint.cmd
```

macOS / Linux:

```bash
robo run entrypoint.sh
```

The robot should have created a directory `temp/robocode/web-scraper-robot/output/tweets/robotframework` containing images (screenshots of the tweets) and text files (the texts of the tweets).

## Robot script explained

### `tasks/robot.robot`

```robot
*** Settings ***
Documentation   Web scraper robot. Stores tweets.
Resource        keywords.robot
Variables       variables.py

*** Tasks ***
Store the latest tweets by given user name
    Store the latest ${NUMBER_OF_TWEETS} tweets by user name "${USER_NAME}"
```

The main robot file (`.robot`) contains the task(s) your robot is going to complete when run.

The `Settings` section provides short documentation (`Documentation`) for the script.

`Resource` is used to import a _resource file_. The resource file typically contains the keywords for the robot.

`Variables` is used to import _variables_. The convention is to define the variables using Python (`.py` files).

The `Tasks` section defines the tasks for the robot.

`Store the latest tweets by given user name` is the name of the task.

`Store the latest ${NUMBER_OF_TWEETS} tweets by user name "${USER_NAME}"` is a keyword call. The keyword is imported from the `keywords.robot` file, where it is implemented.

`${NUMBER_OF_TWEETS}` and `${USER_NAME}` are references to variables defined in the `variables.py` file.

### `keywords.robot`

#### Settings section

```robot
*** Settings ***
Library     OperatingSystem
Library     RPA.Browser
```

The `Settings` section imports two _libraries_ using `Library`.

Libraries typically contain Python code that accomplishes tasks, such as creating file system directories and files (`OperatingSystem`) and commanding a web browser (`RPA.Browser`).

The libraries provide _keywords_ that can be used in robot scripts.

#### Keywords section

```robot
*** Keywords ***
Store the latest ${number_of_tweets} tweets by user name "${user_name}"
    Open Twitter homepage   ${user_name}
    Store tweets            ${user_name}    ${number_of_tweets}
    [Teardown]              Close Browser
```

The `Keywords` section defines the keywords for the robot.

`Store the latest ${number_of_tweets} tweets by user name "${user_name}"` is a keyword that takes two _arguments_: `${number_of_tweets}` and `${user_name}`. Because the arguments are embedded in the keyword name itself (Robot Framework's _embedded arguments_ syntax), this keyword has no `[Arguments]` line.

The keyword is called in the main robot file (`.robot`), which provides values for the arguments.

In this case, the default value for the number of tweets is `3`, and the default value for the user name is `robotframework` (see `variables.py`). With those values substituted, the keyword implementation can be read like this:

```robot
Store the latest 3 tweets by user name "robotframework"
    Open Twitter homepage   robotframework
    Store tweets            robotframework  3
    [Teardown]              Close Browser
```

Keywords can call other keywords.

`Open Twitter homepage` is another keyword. It takes one argument: `${user_name}`.

The `Store tweets` keyword takes two arguments: `${user_name}` and `${number_of_tweets}`.

`[Teardown]` tells Robot Framework to run the given keyword (`Close Browser`) as the last step. The teardown always runs, even if one of the preceding steps fails.

```robot
Open Twitter homepage
    [Arguments]             ${user_name}
    Open Available Browser  ${TWITTER_URL}/${user_name}
```

`Open Twitter homepage` is one of your own keywords; it is not provided by any external library. You can define as many keywords as you need. Your keywords can call other keywords, both your own and those provided by libraries.

```robot
    [Arguments]     ${user_name}
```

The `[Arguments]` line tells Robot Framework the names of the arguments this keyword expects, in order from left to right. In this case, there is one argument: `${user_name}`.

```robot
    Open Available Browser  ${TWITTER_URL}/${user_name}
```

`Open Available Browser` is a keyword provided by the `RPA.Browser` library. In this case, you call it with one argument: the URL (`https://twitter.com/robotframework`).

The URL combines a variable (`${TWITTER_URL}`, defined in `variables.py`) and an argument (`${user_name}`, provided when calling your keyword).

```robot
Store tweets
    [Arguments]                     ${user_name}            ${number_of_tweets}
    ${tweets_locator}=              Get tweets locator      ${user_name}
    Wait Until Element Is Visible   ${tweets_locator}
    @{tweets}=                      Get WebElements         ${tweets_locator}
    ${tweet_directory}=             Get tweet directory     ${user_name}
    Create Directory                ${tweet_directory}
    ${index}=                       Set Variable            1

    FOR     ${tweet}  IN  @{tweets}
        Exit For Loop If            ${index} > ${number_of_tweets}
        ${screenshot_file}=         Set Variable    ${tweet_directory}/tweet-${index}.png
        ${text_file}=               Set Variable    ${tweet_directory}/tweet-${index}.txt
        ${text}=                    Set Variable    ${tweet.find_element_by_xpath(".//div[@lang='en']").text}
        Capture Element Screenshot  ${tweet}        ${screenshot_file}
        Create File                 ${text_file}    ${text}
        ${index}=                   Evaluate        ${index} + 1
    END
```

The `Store tweets` keyword contains the steps for collecting and storing a screenshot and the text of each tweet.

This keyword could also be provided by a library. Libraries are typically used when the implementation is complex and would be difficult to express in Robot Framework syntax. Using ready-made libraries is recommended: there is no need to spend time implementing your own solution if a ready-made one exists.

[RPA Framework](https://pypi.org/project/rpa-framework/) provides many open-source libraries for typical RPA (Robotic Process Automation) use cases.

Here, the keyword is implemented in Robot Framework syntax to demonstrate what kind of "programming" logic the syntax supports.

More complex business logic is better implemented by taking advantage of libraries (such as `OperatingSystem` and `RPA.Browser`, or your own library).

```robot
[Arguments]                     ${user_name}            ${number_of_tweets}
```

`Store tweets` takes two arguments: `${user_name}` and `${number_of_tweets}`.

```robot
${tweets_locator}=              Get tweets locator      ${user_name}
```

A _locator_ (an instruction for the browser to find specific element(s)) is provided by the keyword `Get tweets locator`, which takes one argument: `${user_name}`. The computed locator is stored in a _local variable_, `${tweets_locator}`. The assignment symbol (`=`) is not required, but including it is a recommended convention that makes the assignment explicit.

![Inspecting the DOM to find tweet elements](inspecting-the-dom-to-find-tweet-elements.png)
_Inspecting the DOM to find tweet elements_

In this case, the returned locator is an [XPath](https://developer.mozilla.org/en-US/docs/Web/XPath) expression (`//article[descendant::span[contains(text(), "@robotframework")]]`) with the `xpath:` prefix, which tells `SeleniumLibrary` (the library `RPA.Browser` builds on) to use the XPath locator strategy.

> Tip: You can test XPath expressions in [Firefox](https://www.mozilla.org/en-US/firefox/new/) and in [Chrome](https://www.google.com/chrome/). Right-click on a web page and select `Inspect` or `Inspect Element` to open the developer tools. Select the `Console` tab. In the console, type `$x('//div')` and press Enter. The console displays the matched elements (in this case, all the `div` elements). Experiment with your query until it works. You can then use the query as a `SeleniumLibrary` element locator by prefixing it with `xpath:`.

![Locating elements in browser console with XPath](locating-elements-in-browser-console-with-xpath.png)
_Locating elements in browser console with XPath_

```robot
Wait Until Element Is Visible   ${tweets_locator}
```

`Wait Until Element Is Visible` is a keyword provided by the `RPA.Browser` library. It takes a _locator_ as an argument and waits until the element is visible. If the element does not become visible before the timeout (five seconds by default), the keyword fails.

```robot
@{tweets}=                      Get WebElements         ${tweets_locator}
```

The `Get WebElements` keyword (`RPA.Browser`) finds and returns all elements matching the given locator (`${tweets_locator}`). The elements are stored in a local _list variable_, `@{tweets}`. List variables start with `@` instead of `$`.

```robot
${tweet_directory}=             Get tweet directory     ${user_name}
```

The `Get tweet directory` keyword returns a directory path based on the given `${user_name}` argument. The path is stored in a local `${tweet_directory}` variable.

```robot
Create Directory                ${tweet_directory}
```

The `Create Directory` keyword is provided by the `OperatingSystem` library. It creates a file system directory at the given path (`${tweet_directory}`).

```robot
${index}=                       Set Variable            1
```

The `Set Variable` keyword assigns raw values to variables. In this case, a local variable `${index}` is created with the value `1`. This variable keeps track of the loop index so that the stored files get unique names.

```robot
    FOR     ${tweet}  IN  @{tweets}
        ...
    END
```

Robot Framework supports loops using the `FOR` syntax. The loop iterates over the found tweet elements and runs the same set of steps for each tweet.

```robot
Exit For Loop If            ${index} > ${number_of_tweets}
```

The `Exit For Loop If` keyword terminates the loop when the given condition evaluates to true. In this case, the loop ends once the requested number of tweets has been processed.

The `${text}` line uses Robot Framework's extended variable syntax to call the Selenium `find_element_by_xpath` method on the `${tweet}` element and read the `text` of its English-language (`lang='en'`) `div`. Newer Selenium releases have removed the `find_element_by_*` methods in favor of `find_element(By.XPATH, ...)`.

The `Capture Element Screenshot` and `Create File` keywords take a screenshot of each element and create a text file containing the element text.

```robot
${index}=                   Evaluate        ${index} + 1
```

The previously initialized `${index}` variable is incremented by one at the end of each loop iteration using the `Evaluate` keyword. `Evaluate` takes a Python _expression_ as an argument and returns the evaluated value.

```robot
Get tweets locator
    [Arguments]     ${user_name}
    [Return]        xpath://article[descendant::span[contains(text(), "\@${user_name}")]]
```

The `Get tweets locator` keyword returns an element locator based on the given `${user_name}` argument. The backslash in `\@` is a Robot Framework escape, so the returned locator contains a plain `@`. Robot Framework uses the `[Return]` setting for returning values.

```robot
Get tweet directory
    [Arguments]     ${user_name}
    [Return]        ${CURDIR}/../output/tweets/${user_name}
```

The `Get tweet directory` keyword implementation uses one of the _predefined variables_ in Robot Framework. `${CURDIR}` is the absolute path of the directory containing the file in which it is used (here, the `resources` directory). It is not the current working directory. As a result, `${CURDIR}/../output/tweets/${user_name}` always points to `output/tweets/${user_name}` one level above `resources`, no matter where the robot is started from.

### `variables.py`

```py
NUMBER_OF_TWEETS = 3
TWITTER_URL = "https://twitter.com"
USER_NAME = "robotframework"
```

Variables are defined using Python by convention.

## Summary

You executed a web scraper robot, congratulations!

During the process, you learned some concepts and features of Robot Framework and some good practices:

- Defining `Settings` for your script (`*** Settings ***`)
- Documenting scripts (`Documentation`)
- Importing libraries (`OperatingSystem`, `RPA.Browser`)
- Using keywords provided by libraries (`Open Available Browser`)
- Splitting a robot script into multiple files (`*.py`, `*.robot`)
- Creating your own keywords
- Defining arguments (`[Arguments]`)
- Calling keywords with arguments
- Returning values from keywords (`[Return]`)
- Using predefined variables (`${CURDIR}`)
- Using your own variables
- Creating loops with Robot Framework syntax
- Running teardown steps (`[Teardown]`)
- Opening a real browser
- Navigating to web pages
- Locating web elements
- Building and testing locators (`$x('//div')`)
- Scraping text from web elements
- Taking screenshots of web elements
- Creating file system directories
- Creating and writing to files
- Installing Robocode CLI (`pip install robocode`)
- Installing RPA Framework (`pip install rpa-framework`)
- Creating an executable package (`robo wrap`)
- Running robots (`robo run entrypoint.sh`, `robo run entrypoint.cmd`)
- Organizing your project files in subdirectories (`robo init web-scraper-robot`)
- Using a Python virtual environment (`venv`)
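The `Get tweets locator` keyword builds its XPath locator from the user name. A quick way to see the resulting string, after Robot Framework has turned the escaped `\@` into a plain `@`, is to build the same string in Python. The following is a minimal sketch. The function name `get_tweets_locator` is hypothetical and only mirrors the keyword. The sketch assumes the user name contains no double quotes, which holds for Twitter handles because they consist only of letters, digits, and underscores.

```python
def get_tweets_locator(user_name: str) -> str:
    """Build the same locator string as the `Get tweets locator` keyword.

    The `xpath:` prefix selects SeleniumLibrary's XPath locator strategy;
    everything after it is a plain XPath expression.
    """
    # Match every <article> that has a descendant <span> whose text contains "@<user_name>".
    return f'xpath://article[descendant::span[contains(text(), "@{user_name}")]]'


print(get_tweets_locator("robotframework"))
```

Without the `xpath:` prefix, the printed expression is what you would paste into `$x('...')` in the browser console to check which `article` elements match.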
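`${CURDIR}` refers to the directory of the file that uses it, not the current working directory, so the path returned by `Get tweet directory` does not depend on where the robot is started. The following sketch shows how `${CURDIR}/../output/tweets/${user_name}` resolves. It reproduces the path arithmetic in plain Python and is not how Robot Framework computes `${CURDIR}` internally. The function name `get_tweet_directory` and the example project path are hypothetical.

```python
import os


def get_tweet_directory(keywords_file: str, user_name: str) -> str:
    """Mimic `${CURDIR}/../output/tweets/${user_name}` for a given keywords file."""
    # ${CURDIR} corresponds to the directory containing the keywords file.
    # abspath() only matters if a relative file name is passed in.
    curdir = os.path.dirname(os.path.abspath(keywords_file))
    # normpath() collapses the ".." segment lexically, without touching the file system.
    return os.path.normpath(os.path.join(curdir, "..", "output", "tweets", user_name))


print(get_tweet_directory("/project/web-scraper-robot/resources/keywords.robot", "robotframework"))
```

On macOS / Linux, this prints `/project/web-scraper-robot/output/tweets/robotframework`. The `output` directory sits next to `resources`, which matches the location the tutorial reports after running the wrapped robot.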
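The tutorial notes that `Store tweets` could also be provided by a library, and that more complex logic is better implemented in one. Robot Framework can import a plain Python module as a library and exposes its public functions as keywords. The following minimal sketch moves the text-file part of the `FOR` loop into such a module. The module name `tweet_storage.py` and the function `store_tweet_texts` are hypothetical, and the sketch handles only the texts, because screenshots still need the browser.

```python
"""tweet_storage.py: a hypothetical custom keyword library (not part of the tutorial).

Robot Framework exposes each public function of an imported module as a
keyword, so `store_tweet_texts` can be called as `Store Tweet Texts`.
"""
import os
from typing import List


def store_tweet_texts(directory: str, texts: List[str], number_of_tweets: int) -> List[str]:
    """Write the first `number_of_tweets` texts to tweet-1.txt, tweet-2.txt, ...

    Returns the paths of the files that were written.
    """
    os.makedirs(directory, exist_ok=True)  # plays the role of `Create Directory`
    written = []
    # enumerate(start=1) replaces the manual ${index} counter, and slicing
    # replaces `Exit For Loop If    ${index} > ${number_of_tweets}`.
    # int() also accepts a count that arrives as a string such as "3".
    for index, text in enumerate(texts[: int(number_of_tweets)], start=1):
        path = os.path.join(directory, f"tweet-{index}.txt")
        with open(path, "w", encoding="utf-8") as file:  # plays the role of `Create File`
            file.write(text)
        written.append(path)
    return written
```

Assuming the file sits next to `keywords.robot`, it could be imported with `Library    tweet_storage.py` and called as `Store Tweet Texts`. Robot Framework resolves a relative library path against the directory of the importing file. With this split, the Robot Framework loop would only need to take the screenshots.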
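`variables.py` hard-codes the number of tweets and the user name. Robot Framework variable files can also define a special `get_variables()` function. When a variable file defines it, Robot Framework calls it and uses the returned dictionary as the variables. The following is a minimal sketch of a variant of `variables/variables.py` that keeps the tutorial's values as defaults but lets environment variables override them. The environment variable names `TWEETS_COUNT` and `TWEETS_USER_NAME` are hypothetical and were chosen only for this example.

```python
"""A hypothetical variant of variables/variables.py using the get_variables() hook."""
import os


def get_variables():
    """Return the robot's variables, falling back to the tutorial's defaults."""
    return {
        # TWEETS_COUNT and TWEETS_USER_NAME are illustrative names, not a standard.
        "NUMBER_OF_TWEETS": int(os.environ.get("TWEETS_COUNT", "3")),
        "TWITTER_URL": "https://twitter.com",
        "USER_NAME": os.environ.get("TWEETS_USER_NAME", "robotframework"),
    }
```

The existing `Variables       variables.py` import in the task file would stay unchanged. The design choice is to keep the defaults in code and override them from the environment, so the same robot can scrape another account without any file edits.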