https://github.com/jwc20/waasuapi
A scraper API for fetching available jobs from YCombinator's workatastartup.com
- Host: GitHub
- URL: https://github.com/jwc20/waasuapi
- Owner: jwc20
- License: mit
- Created: 2022-08-31T22:31:31.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-26T13:54:21.000Z (5 months ago)
- Last Synced: 2025-01-20T12:48:32.339Z (17 days ago)
- Topics: beautifulsoup, jobsearch, python, scraper, scraper-api, selenium
- Language: Python
- Homepage:
- Size: 470 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# workatastartup-api
### Install
To install, run the following commands.
```
git clone git@github.com:jwc20/workatastartup-api.git
cd workatastartup-api
```

### Usage
To use the API, first set up and activate a Python virtual environment (venv).
```
python3 -m venv venv
. ./venv/bin/activate
```

Install the requirements.
```
pip3 install -r requirements.txt
```

An example of the scraper is in `example.py`.
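
After installing, one quick way to confirm that the requirements landed in the active virtual environment is to check that each one can be imported. The sketch below is not part of this project: `missing_modules` is a hypothetical helper, and the import names (for example `bs4` for beautifulsoup4) are assumed from the Dependencies list.

```python
import importlib.util

# Import names assumed for the packages in requirements.txt
# (beautifulsoup4 is imported as bs4).
REQUIRED_MODULES = ["bs4", "lxml", "requests", "selenium", "pprintpp"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported as missing, re-run the `pip3 install` step with the virtual environment activated.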
### Log In
Set your username and password.
```
username = "example_username"
password = "example_password"
```

Update the chromedriver path in `waasu/__init__.py`. On macOS and Linux, you can find the path with the `which` command.
```python
chrome_driver_path = '/usr/local/bin/chromedriver-linux64/chromedriver'  # linux
# chrome_driver_path = '/opt/homebrew/bin/chromedriver'  # mac (m1)
```

Once set, create a client and log in using the following method.
```
import waasu

client = waasu.WorkAtAStartUp()
client.log_in(username=username, password=password)
```

### Scrape
To scrape companies, pass a list of query keywords along with any optional filter parameters.
```
query = ["python", "javascript", "data", "typescript"]
client.get_companies(query=query, jobType="contract", role="eng", scroll_delay=10)
# => [
# {
# 'about': 'Our mission is to take cars off the road by making it safe and easy to vanpool in the post Covid world.',
# 'company_url': 'magicbus.io',
# 'founders': ['Jason Kraft', 'Chris Upjohn'],
# 'jobs': [
# {
# 'details': 'Remote fulltime Visa Required 3+ years',
# 'job_name': 'Software Developer - DevOps',
# 'job_url': 'https://www.workatastartup.com/jobs/43098',
# },
# {
# 'details': 'Remote contract 6+ years',
# 'job_name': 'Software Developer - Frontend',
# 'job_url': 'https://www.workatastartup.com/jobs/16340',
# },
# ],
# 'location': 'Los Angeles, CA',
# 'name': 'MagicBus',
# 'size': '18 people',
# 'tech': "We're working on problems related to geospatial route optimization, machine learning, and natural language processing in addition to mobile web and app development.",
# 'waasu_url': 'https://www.workatastartup.com/companies/magicbus',
# },
# {
# 'about': "Tenjin manages mobile growth infrastructure for our clients by organizing, analyzing, and securing the rush of data generated by mobile devices and marketing channels. We're reshaping mobile marketing by breaking down data silos and building an integrated data platform to replace the detached services in use today.",
# 'company_url': 'tenjin.com',
# ...
# ]
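# A minimal sketch, not part of waasu: flatten a result shaped like the sample
# above into one row per job. `flatten_jobs` is a hypothetical helper, and the
# keys it reads ('name', 'location', 'jobs', ...) are assumed from that sample.
def flatten_jobs(companies):
    rows = []
    for company in companies:
        # Companies without a 'jobs' key simply contribute no rows.
        for job in company.get("jobs", []):
            rows.append({
                "company": company.get("name"),
                "location": company.get("location"),
                "job_name": job.get("job_name"),
                "job_url": job.get("job_url"),
                "details": job.get("details"),
            })
    return rows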
```

### Optional Filters
```
hasEquity=("any", "true", "false") ("any" same as "false")
demographic=("any", "black-founders", "women-founders", "latinx-founders")
hasSalary=("any", "true", "false")
industry=("any", "B2B Software and Services", "Consumer", "Education", "Healthcare", "Financial Technology and Services", "Real Estate and Construction", "Industrials", "Government", "Unspecified")
interviewProcess=("any", "true", "false")
jobType=("any", "fulltime", "contract", "intern")
remote=("any", "only", "yes", "no")
sortBy=("keyword"(default), "recommended", "most_active", "created_desc", "company_created_desc", "size", "size_desc", "name", "name_desc")
usVisaNotRequired=("any", "true", "false")
query=query
companySize=("any", "seed", "small", "medium", "large")
expo=("any")
```

#### Roles Filters
For some roles, there are additional `role_type` filters.
```
role=("any", "eng", "design", "product", "science", "sales", "marketing", "support", "recruiting", "operations", "finance", "legal")

# role_type
"eng" => role_type=("android", "be", "data_sci", "devops", "embedded", "eng_mgmt", "fe", "ios", "fs", "ml", "robotics", "qa", "electrical", "hw", "mechanical", "bio", "chemical")
"design" => role_type=("web", "mobile", "ui_ux", "brand_graphic", "animation", "hardware", "user_research", "illustration", "ar_vr", "design_mgmt")
"science" => role_type=("bio", "biotech", "chem", "genetics", "health", "immuno", "lab", "onc", "pharma", "process", "research")
```

### Dependencies
```
beautifulsoup4
lxml
requests
selenium
pprintpp
```

### See Also
- [hnjobs](https://hnjobs.emilburzo.com/)
- [Who Is Hiring?](https://kennytilton.github.io/whoishiring/)
- [hn_search](https://news.ycombinator.com/item?id=10313519)
- [O(n)](https://getoofn.com/)