https://github.com/jharemza/workday_scraper
Python script for scraping Workday-based job listings and logging results to Notion.
- Host: GitHub
- URL: https://github.com/jharemza/workday_scraper
- Owner: jharemza
- License: MIT
- Created: 2025-04-26T12:35:18.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-07-22T16:31:20.000Z (3 months ago)
- Last Synced: 2025-07-22T18:12:16.077Z (3 months ago)
- Topics: api, automation, job-search, notion, python, scraping
- Language: Python
- Size: 76.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Workday Job Scraper to Notion
This project automates the collection of job postings from Workday-powered career portals and uploads them to a Notion database for centralized tracking and analysis.
> Originally developed to scrape and track M&T Bank job listings, the scraper now supports **multi-institutional configuration** via YAML.
---
## Features
- Supports multiple institutions with distinct Workday URLs
- Filters jobs by location and keyword (`searchText`)
- Uploads job metadata to a Notion database
- Avoids duplicates by checking existing Req IDs
- Parses job descriptions into Notion-friendly blocks
- Stores raw job data in structured JSON files
- Modular design with clear separation of concerns
- Progress bars and log files with per-institution labeling

---
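The duplicate-avoidance step can be illustrated with a small sketch. Note that `filter_new_jobs` and the job dictionary shape are hypothetical names for illustration, not the project's actual code; it assumes the existing Req IDs have already been fetched from Notion:

```python
# Sketch of Req-ID-based duplicate detection (illustrative names; assumes
# the set of Req IDs already in Notion has been fetched beforehand).

def filter_new_jobs(jobs, existing_req_ids):
    """Return only jobs whose Req ID is not already tracked."""
    seen = set(existing_req_ids)
    new_jobs = []
    for job in jobs:
        req_id = job.get("req_id")
        if req_id and req_id not in seen:
            seen.add(req_id)  # also guards against duplicates within one scrape
            new_jobs.append(job)
    return new_jobs


jobs = [
    {"req_id": "R-100", "title": "Data Analyst"},
    {"req_id": "R-101", "title": "SQL Developer"},
    {"req_id": "R-100", "title": "Data Analyst"},  # duplicate within the batch
]
print(filter_new_jobs(jobs, existing_req_ids={"R-101"}))
# → [{'req_id': 'R-100', 'title': 'Data Analyst'}]
```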
## Folder Structure
```bash
.
├── config/                 # Institution config in YAML
│   └── institutions.yaml
├── json_output/            # Raw JSON responses (ignored in git)
├── docs/                   # Architecture & version docs
│   └── v0.4.0_architecture.md
├── scraper.py              # Main entry point
├── institution_runner.py   # Runs scraping + upload per institution
├── notion_client.py        # Notion API interface
├── job_parser.py           # HTML to Notion block conversion
├── config_loader.py        # Loads YAML config
└── .env                    # Notion token & DB IDs (not committed)
```

---
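As a rough illustration of what `config_loader.py` might do, here is a minimal sketch of loading `config/institutions.yaml` with PyYAML. The function name `load_institutions` and the validation shown are assumptions for illustration, not the project's actual code:

```python
# Minimal sketch of loading config/institutions.yaml with PyYAML.
# Function name and validation are illustrative, not the project's real API.
import yaml


def load_institutions(path="config/institutions.yaml"):
    """Load the YAML config and return the list of institution entries."""
    with open(path, encoding="utf-8") as f:
        config = yaml.safe_load(f)
    institutions = config.get("institutions", [])
    for inst in institutions:
        # Require the fields the scraper cannot run without.
        if not inst.get("name") or not inst.get("workday_url"):
            raise ValueError(f"Incomplete institution entry: {inst}")
    return institutions
```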
## Setup
### 1. Clone the Repo
```bash
git clone https://github.com/yourusername/workday_scraper.git
cd workday_scraper
```

### 2. Install Dependencies
### Option A: Using Conda (Recommended)
Create and activate the environment:
```bash
conda env create -f environment.yml
conda activate workday_scraper
```

> This installs all required packages in an isolated environment named `workday_scraper`.
### Option B: Using pip (Alternative)
If you're not using Conda, manually install the required packages:
```bash
pip install requests python-dotenv tqdm beautifulsoup4 html2text PyYAML
```

Note: This project does not include a `requirements.txt` file by default.
If you'd like to generate one from an active environment:

```bash
pip freeze > requirements.txt
```

Then later you can reuse it with:
```bash
pip install -r requirements.txt
```

### 3. Create a `.env` File
```env
NOTION_TOKEN=your_notion_secret_token
DATABASE_ID=your_notion_database_id
APPLIED_DATABASE_ID=your_applied_jobs_database_id
```

### 4. Define Institutions
You can use the included `config/institutions.yaml` as-is to scrape the institutions it already defines.
Otherwise, edit `config/institutions.yaml` to define the Workday URLs and filters for each institution you wish to query, using the sample format:
```yaml
institutions:
- name: ""
workday_url: ""
locations:
- "Remote, USA"
- "Walla Walla, WA"
- "Ding Dong, TX"
search_text: "sql"
```

## Usage
To run the scraper across all defined institutions:
```bash
python scraper.py
```

- Logs will be written to `scraper.log`
- JSON job data is saved to `json_output/`
- Notion is updated with new postings
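As an illustration of the Notion update step, here is a hedged sketch of how a page payload for one posting might be assembled. The helper name `build_page_payload`, the `Stage` value `"Scraped"`, and the job dictionary shape are assumptions for illustration; only the property names follow the database schema described in the Example Notion Properties section, and the endpoint mentioned is the standard Notion `POST /v1/pages` API, not this project's exact code:

```python
# Sketch of the payload for creating a Notion page for one job posting.
# Property names mirror the database schema in this README; the helper name,
# the "Scraped" status value, and the job dict shape are illustrative.

def build_page_payload(database_id, job):
    return {
        "parent": {"database_id": database_id},
        "properties": {
            "Company": {"title": [{"text": {"content": job["company"]}}]},
            "Position": {"rich_text": [{"text": {"content": job["position"]}}]},
            "Req ID": {"rich_text": [{"text": {"content": job["req_id"]}}]},
            "Job Posting URL": {"url": job["url"]},
            "Stage": {"status": {"name": "Scraped"}},
        },
    }


payload = build_page_payload(
    "your_notion_database_id",
    {
        "company": "Example Bank",
        "position": "SQL Developer",
        "req_id": "R-12345",
        "url": "https://example.wd1.myworkdayjobs.com/en-US/careers/job/R-12345",
    },
)
# The payload would then be POSTed to https://api.notion.com/v1/pages with
# the NOTION_TOKEN from .env sent as a Bearer token.
```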
## Versioning
This project follows [Semantic Versioning](https://semver.org/).
Detailed architecture docs are versioned under `/docs`.
## Example Notion Properties
To work with this scraper, your Notion database should include the following properties:
- Properties to be Scraped
- `Company` (Title)
- `Position` (Rich Text)
- `Req ID` (Rich Text)
- `Job Posting URL` (URL)
- `Stage` (Status)
- `Base Pay Low` (Number)
- `Base Pay High` (Number)
- `Application Deadline` (Date)
- Properties for Manual Updates
- `Due Date` (Date)
- `Resume` (Files & media)
- `Cover Letter` (Files & media)
- `Ready to Apply Date` (Date)
- `Applied Date` (Date)
- `HR Screen Date` (Date)
- `Interview Date` (Date)

## License
MIT License. See [LICENSE](LICENSE) for details.
## Contributing
Contributions welcome! Feel free to open issues or submit pull requests.
## Contact
Maintained by [Jeremiah Haremza](https://github.com/jharemza).