Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/joeri-abbo/python-credly-scraper
This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an
badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/joeri-abbo/python-credly-scraper
- Owner: Joeri-Abbo
- Created: 2023-03-31T23:02:54.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-04-10T13:38:17.000Z (7 months ago)
- Last Synced: 2024-04-10T15:31:56.763Z (7 months ago)
- Topics: badges, crawler, credly, data-extraction, json, organizations, python, python3, requests-library, skills, web-crawling
- Language: Python
- Homepage:
- Size: 63.6 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Project: Credly Crawler (WIP)
This project consists of Python scripts designed to crawl and extract data from the Credly platform. The main components of the project are:

1. crawl-by-arg.py
2. crawl-by-search-terms.py
3. crawl-by-skills.py
4. get-badges.py
5. helper.py

## Requirements
- Python 3.x
- requests library
Install the requirements using the following command:

```bash
pip install requests
```

## Usage
### 1. crawl-by-arg.py
This script crawls the Credly platform using a single search term passed as a command-line argument.
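
As a rough illustration of how a single search term could be read from the command line, the sketch below uses the standard-library `argparse` module. The function name `parse_search_term` and the positional argument name `search_term` are assumptions made for this example; the actual script may read its argument differently (for instance directly from `sys.argv`).

```python
import argparse


def parse_search_term(argv=None):
    """Return the single search term given on the command line.

    Hypothetical helper for illustration; not taken from the project.
    """
    parser = argparse.ArgumentParser(
        description="Crawl Credly for one search term."
    )
    # One required positional argument: the term to search for.
    parser.add_argument("search_term", help="term to search for on Credly")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv).search_term
```

Passing a list such as `["python"]` to `parse_search_term` makes the function easy to exercise without touching the real command line.
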
#### Usage:
```bash
python crawl-by-arg.py <search-term>
```

### 2. crawl-by-search-terms.py
This script crawls the Credly platform using a list of search terms specified in the `data/search-terms.json` file.
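
The general shape of such a batch crawl might look like the sketch below. It is only an illustration under stated assumptions: the layout of `data/search-terms.json` is not documented here, so the code accepts either a JSON array of terms or a JSON object whose keys are the terms, and `fetch` is a stand-in for whatever function performs the real search. The names `load_terms` and `crawl_terms` are hypothetical.

```python
import json


def load_terms(path):
    """Return the search terms stored in a JSON file as a list of strings."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: terms are stored either as a JSON array or as the keys of
    # a JSON object, so plain iteration covers both layouts.
    return [str(term) for term in data]


def crawl_terms(terms, fetch):
    """Call fetch(term) for every term and collect the results per term."""
    return {term: fetch(term) for term in terms}
```

Keeping the search function as a parameter lets the loop be tried out with a dummy callable before any request is sent to Credly.
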
#### Usage:
```bash
python crawl-by-search-terms.py
```

### 3. crawl-by-skills.py
This script crawls the Credly platform using a list of skills that are retrieved from the `data/skills.json` file.
#### Usage:
```bash
python crawl-by-skills.py
```

### 4. get-badges.py
This script retrieves all badges for each organization specified in the `data/organizations.json` file. The badges are then saved to the `data/badges.json` file.
#### Usage:
```bash
python get-badges.py
```

### 5. helper.py
This script contains helper functions used by the other scripts in this project. Functions include:
- get_skills_file()
- get_organizations_file()
- get_badges_file()
- get_search_terms_file()
- get_items_by_search_term(search_term)
- search_terms()
- get_items_from_file(file_name)
- set_items_from_file(file_name, items)
- crawl_search_terms(terms)

## Notes
Before running the scripts, make sure to create the necessary data files in the data directory:
- skills.json
- organizations.json
- badges.json
- search-terms.json

Each of these files should contain an empty JSON object `{}` if there is no initial data.
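
The data files can also be created programmatically. The following is a minimal sketch, assuming the `data` directory sits next to the scripts; the function name `init_data_files` is hypothetical and not part of the project. It writes `{}` only to files that do not exist yet, so previously collected data is left untouched.

```python
import json
from pathlib import Path

# File names taken from the list above.
DATA_FILES = (
    "skills.json",
    "organizations.json",
    "badges.json",
    "search-terms.json",
)


def init_data_files(data_dir="data"):
    """Create each missing data file with an empty JSON object.

    Returns the list of paths that were newly created.
    """
    directory = Path(data_dir)
    directory.mkdir(parents=True, exist_ok=True)
    created = []
    for name in DATA_FILES:
        path = directory / name
        if not path.exists():  # never overwrite existing data
            path.write_text(json.dumps({}) + "\n", encoding="utf-8")
            created.append(path)
    return created
```

Calling `init_data_files()` once before the first crawl should be enough; a second call creates nothing and returns an empty list.
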