https://github.com/robocorp/example-html-table-robot
This robot demonstrates how to work with HTML tables using Beautiful Soup and RPA Framework.
- Host: GitHub
- URL: https://github.com/robocorp/example-html-table-robot
- Owner: robocorp
- License: apache-2.0
- Created: 2021-02-15T10:46:24.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-01-03T11:54:01.000Z (over 2 years ago)
- Last Synced: 2025-04-27T19:46:19.106Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 27.3 KB
- Stars: 8
- Watchers: 16
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Working with HTML tables
This robot demonstrates how to work with HTML tables.
## The example HTML table
We use the table at https://www.w3schools.com/html/html_tables.asp as an example:
```html
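<table>
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
  <tr><td>Centro comercial Moctezuma</td><td>Francisco Chang</td><td>Mexico</td></tr>
  <tr><td>Ernst Handel</td><td>Roland Mendel</td><td>Austria</td></tr>
  <tr><td>Island Trading</td><td>Helen Bennett</td><td>UK</td></tr>
  <tr><td>Laughing Bacchus Winecellars</td><td>Yoshi Tannamuri</td><td>Canada</td></tr>
  <tr><td>Magazzini Alimentari Riuniti</td><td>Giovanni Rovelli</td><td>Italy</td></tr>
</table>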
```
## The HTML parser library: Beautiful Soup
The robot declares `beautifulsoup4` and `robocorp` as dependencies in the `conda.yaml` configuration file.
> [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
> We use a Python dataclass to store the table data. For more complex cases, consider [Pandas](https://pypi.org/project/pandas/).
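
To illustrate what storing table data in a dataclass might look like, here is a minimal sketch. The `Table` class below, its `headers` and `rows` fields, and the `column` and `as_dicts` helpers are assumptions made for this example; the robot's actual dataclass may be shaped differently.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical container for parsed table data (not the robot's actual class)."""

    headers: list[str] = field(default_factory=list)
    rows: list[list[str]] = field(default_factory=list)

    def column(self, name: str) -> list[str]:
        """Return every value in the column with the given header."""
        index = self.headers.index(name)
        return [row[index] for row in self.rows]

    def as_dicts(self) -> list[dict[str, str]]:
        """Return each row as a mapping from header to cell value."""
        return [dict(zip(self.headers, row)) for row in self.rows]


# Two rows taken from the example table.
table = Table(
    headers=["Company", "Contact", "Country"],
    rows=[
        ["Alfreds Futterkiste", "Maria Anders", "Germany"],
        ["Ernst Handel", "Roland Mendel", "Austria"],
    ],
)
print(table.column("Country"))  # ['Germany', 'Austria']
```

A plain dataclass like this keeps the example dependency-free. Once filtering, joins, or type conversion are needed, a Pandas `DataFrame` is likely the better fit.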
## The HTML table custom parser library
> HTML tables come in many shapes and forms. This example uses a well-formatted and straightforward table. More complex tables might require more effort to parse. Still, the idea is the same: Read and parse the HTML. Return a generic data structure that is easy to work with.
The `get_html_table` function returns the example HTML table markup from https://www.w3schools.com/html/html_tables.asp.
The `read_table_from_html` function is provided by the `html_tables.py` library. It parses the given HTML table and returns it as a `Table` structure.
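
The parsing step can be sketched roughly as follows. This is not the code from `html_tables.py`: the function name `parse_first_table`, the `Table` fields, and the `SAMPLE_HTML` snippet are hypothetical, chosen only to show the idea of reading the markup with Beautiful Soup and returning a generic structure. It assumes a simple table with one header row of `<th>` cells and data rows of `<td>` cells.

```python
from dataclasses import dataclass

from bs4 import BeautifulSoup


@dataclass
class Table:
    # Hypothetical shape; the robot's real `Table` may differ.
    headers: list[str]
    rows: list[list[str]]


def parse_first_table(html: str) -> Table:
    """Parse the first <table> element in `html` into a Table."""
    # "html.parser" is Python's built-in parser, so no extra dependency is needed.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")

    headers: list[str] = []
    rows: list[list[str]] = []
    for tr in table.find_all("tr"):
        header_cells = tr.find_all("th")
        if header_cells and not headers:
            # Treat the first row containing <th> cells as the header row.
            headers = [th.get_text(strip=True) for th in header_cells]
            continue
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return Table(headers=headers, rows=rows)


# Shortened, hand-written version of the example table.
SAMPLE_HTML = """
<table>
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
  <tr><td>Island Trading</td><td>Helen Bennett</td><td>UK</td></tr>
</table>
"""

table = parse_first_table(SAMPLE_HTML)
print(table.headers)
for row in table.rows:
    print(row)
```

Tables with merged cells (`rowspan`, `colspan`), nested tables, or multiple header rows would need additional handling that this sketch deliberately leaves out.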