https://github.com/gugarosa/jucesp_rpa
🤖 An RPA-based tool for extracting information over JUCESP.
- Host: GitHub
- URL: https://github.com/gugarosa/jucesp_rpa
- Owner: gugarosa
- License: gpl-3.0
- Created: 2020-10-20T00:56:08.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-10-05T12:18:28.000Z (over 2 years ago)
- Last Synced: 2024-10-18T07:39:49.134Z (4 months ago)
- Topics: jucesp, rpa, selenium, web-scrapping
- Language: Python
- Homepage:
- Size: 48.8 KB
- Stars: 7
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# JUCESP Robotic Process Automation
*This repository holds all the code needed to run an automation robot that extracts company-related information from [JUCESP](https://www.jucesponline.sp.gov.br/BuscaAvancada.aspx).*
---
## Package Guidelines
### Installation
Install the required dependencies with:
```Bash
pip install -r requirements.txt
```
### Configuration File
Copy `config.ini.example` to `config.ini` and fill in your 2Captcha API key.
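To check that `config.ini` was filled in before starting a long run, a small loader along the following lines may help. This is only a sketch: the section name `2captcha` and the option name `api_key` are assumptions made for illustration, so use whatever names actually appear in `config.ini.example`.

```python
import configparser
from pathlib import Path


def load_api_key(path="config.ini", section="2captcha", option="api_key"):
    """Return the API key stored in an INI file, or raise a clear error.

    The default section and option names are hypothetical; adjust them to
    match config.ini.example.
    """
    if not Path(path).is_file():
        raise FileNotFoundError(f"{path} not found; copy config.ini.example first")

    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")

    # With a fallback, a missing section or option yields "" instead of raising.
    key = parser.get(section, option, fallback="").strip()
    if not key:
        raise ValueError(f"[{section}] {option} is empty or missing in {path}")
    return key
```

Calling `load_api_key()` at the top of a script fails early with a readable message instead of failing later in the middle of a CAPTCHA request.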
---
## Usage
### Advanced Search
The first step is to run the advanced search at JUCESP and save the resulting HTML content. To do so, use the following script:
```Bash
python advanced_search.py -h
```
*Note that `-h` prints the script's help message, which lists the available parameters.*
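For readers unfamiliar with how a `-h` helper usually comes about in Python, the sketch below shows the standard-library `argparse` pattern. It assumes, without confirmation from this README, that the scripts are built this way; the `output` and `--timeout` arguments are hypothetical and are not the real parameters of `advanced_search.py`.

```python
import argparse


def build_parser():
    """Build a command-line parser; argparse adds -h/--help automatically."""
    parser = argparse.ArgumentParser(
        description="Example: run a search and save the resulting HTML."
    )
    # Hypothetical arguments, for illustration only.
    parser.add_argument("output", help="path of the HTML file to write")
    parser.add_argument(
        "--timeout",
        type=int,
        default=30,
        help="seconds to wait for the page to load",
    )
    return parser
```

A script that calls `build_parser().parse_args()` prints the description and every `help=` string when invoked with `-h`, then exits without doing any work.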
### Parse Advanced Search
After running the search, parse the HTML into a CSV file holding each company's identifier and city. Use the following script to do so:
```Bash
python parse_advanced_search.py -h
```
### Company Information
With the identifier of each company, it is possible to extract its information page as HTML, as follows:
```Bash
python company_info.py -h
```
### Parse Company Information
Finally, the HTML of every company is dumped to the `companies/` folder. Use the following script to parse that information into a readable CSV file:
```Bash
python parse_company_info.py -h
```
### Bash Script
Instead of invoking each script separately, it is also possible to use the provided shell script, as follows:
```Bash
./pipeline.sh
```
This script runs every step of the automation process. You can also change any input argument defined in the script.
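On systems without a Bash shell, the same chaining can be sketched in Python. The version below is an assumption about what such a pipeline does (run the four scripts in the order of the Usage section and stop at the first failure); it is not a transcription of `pipeline.sh`. The `extra_args` mapping is a hypothetical convenience, and no script arguments are filled in because the README does not list them.

```python
import subprocess
import sys

# Script names come from this README; the order mirrors the Usage section.
STEPS = [
    "advanced_search.py",
    "parse_advanced_search.py",
    "company_info.py",
    "parse_company_info.py",
]


def build_commands(extra_args=None):
    """Return one command list per step, using the current Python interpreter.

    extra_args maps a script name to a list of its arguments (hypothetical
    helper; see each script's -h output for the real parameters).
    """
    extra_args = extra_args or {}
    return [[sys.executable, step, *extra_args.get(step, [])] for step in STEPS]


def run_pipeline(extra_args=None):
    """Run each step in order and stop at the first failure."""
    for command in build_commands(extra_args):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Stopping at the first failure matters here because each step consumes the output of the previous one, so continuing after an error would only parse stale or missing files.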
---
## Support
We do our best, but we inevitably make mistakes. If you ever need to report a bug or a problem, or simply want to talk to us, please do so! We will do our best to respond at this repository.
---