https://github.com/jwc20/waasuapi

A scraper API for fetching available jobs from YCombinator's workatastartup.com


# workatastartup-api

### Install

To install, run the following commands.

```
git clone git@github.com:jwc20/workatastartup-api.git
cd workatastartup-api
```

### Usage

To use the API, first set up and activate a Python virtual environment (venv).

```
python3 -m venv venv
. ./venv/bin/activate
```

Install the requirements.

```
pip3 install -r requirements.txt
```

See `example.py` for an example of the scraper.

### Log In

Set your username and password.

```
username = "example_username"
password = "example_password"
```
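
If you would rather not hard-code credentials, one option is to read them from environment variables. This is a minimal sketch: the variable names `WAASU_USERNAME` and `WAASU_PASSWORD` and the helper `load_credentials` are hypothetical and are not part of this library.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    WAASU_USERNAME and WAASU_PASSWORD are hypothetical names chosen for
    this sketch; the library itself does not read them.
    """
    try:
        return env["WAASU_USERNAME"], env["WAASU_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Usage (after exporting both variables in your shell):
# username, password = load_credentials()
```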

Update the chromedriver path in `waasu/__init__.py`. On macOS and Linux, you can find the path with the `which` command (for example, `which chromedriver`).

```python
chrome_driver_path = '/usr/local/bin/chromedriver-linux64/chromedriver' # linux

chrome_driver_path = '/opt/homebrew/bin/chromedriver' # mac (m1)
```
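
The same lookup can be done from Python with the standard library's `shutil.which`, which searches `PATH` the way the `which` command does. This is only a sketch for locating the binary; the helper name `find_chromedriver` is hypothetical, and you still need to copy the resulting path into `waasu/__init__.py` yourself.

```python
import shutil


def find_chromedriver(candidates=("chromedriver",)):
    """Return the first matching executable path found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


chrome_driver_path = find_chromedriver()
print(chrome_driver_path)  # None if chromedriver is not on PATH
```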

Once set, create a client and log in using the following method.

```
import waasu

client = waasu.WorkAtAStartUp()
client.log_in(username=username, password=password)
```

### Scrape

To scrape companies, pass a list of query keywords and any optional filter parameters to `get_companies`.

```
query = ["python", "javascript", "data", "typescript"]
client.get_companies(query=query, jobType="contract", role="eng", scroll_delay=10)

#=> [
#     {
#         'about': 'Our mission is to take cars off the road by making it safe and easy to vanpool in the post Covid world.',
#         'company_url': 'magicbus.io',
#         'founders': ['Jason Kraft', 'Chris Upjohn'],
#         'jobs': [
#             {
#                 'details': 'Remote fulltime Visa Required 3+ years',
#                 'job_name': 'Software Developer - DevOps',
#                 'job_url': 'https://www.workatastartup.com/jobs/43098',
#             },
#             {
#                 'details': 'Remote contract 6+ years',
#                 'job_name': 'Software Developer - Frontend',
#                 'job_url': 'https://www.workatastartup.com/jobs/16340',
#             },
#         ],
#         'location': 'Los Angeles, CA',
#         'name': 'MagicBus',
#         'size': '18 people',
#         'tech': "We're working on problems related to geospatial route optimization, machine learning, and natural language processing in addition to mobile web and app development.",
#         'waasu_url': 'https://www.workatastartup.com/companies/magicbus',
#     },
#     {
#         'about': "Tenjin manages mobile growth infrastructure for our clients by organizing, analyzing, and securing the rush of data generated by mobile devices and marketing channels. We're reshaping mobile marketing by breaking down data silos and building an integrated data platform to replace the detached services in use today.",
#         'company_url': 'tenjin.com',
#         ...
# ]
```
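
The result is a list of company dictionaries, each with a nested `jobs` list, so individual postings can be picked out with plain Python. The sketch below assumes the structure shown above; `jobs_matching` is a hypothetical helper, not part of the library, and the sample data is trimmed from the example output.

```python
def jobs_matching(companies, keyword):
    """Yield (company name, job name, job URL) for jobs whose details mention keyword."""
    keyword = keyword.lower()
    for company in companies:
        for job in company.get("jobs", []):
            # 'details' is a space-separated string such as 'Remote contract 6+ years'.
            if keyword in job.get("details", "").lower().split():
                yield company.get("name"), job.get("job_name"), job.get("job_url")


# Trimmed sample shaped like the output above.
companies = [
    {
        "name": "MagicBus",
        "jobs": [
            {
                "details": "Remote fulltime Visa Required 3+ years",
                "job_name": "Software Developer - DevOps",
                "job_url": "https://www.workatastartup.com/jobs/43098",
            },
            {
                "details": "Remote contract 6+ years",
                "job_name": "Software Developer - Frontend",
                "job_url": "https://www.workatastartup.com/jobs/16340",
            },
        ],
    }
]

for row in jobs_matching(companies, "contract"):
    print(row)
# ('MagicBus', 'Software Developer - Frontend', 'https://www.workatastartup.com/jobs/16340')
```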

### Optional Filters

```
hasEquity=("any", "true", "false")  # "any" is the same as "false"
demographic=("any", "black-founders", "women-founders", "latinx-founders")
hasSalary=("any", "true", "false")
industry=("any", "B2B Software and Services", "Consumer", "Education", "Healthcare", "Financial Technology and Services", "Real Estate and Construction", "Industrials", "Government", "Unspecified")
interviewProcess=("any", "true", "false")
jobType=("any", "fulltime", "contract", "intern")
remote=("any", "only", "yes", "no")
sortBy=("keyword", "recommended", "most_active", "created_desc", "company_created_desc", "size", "size_desc", "name", "name_desc")  # default: "keyword"
usVisaNotRequired=("any", "true", "false")
query=query
companySize=("any", "seed", "small", "medium", "large")
expo=("any")
```
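
Because every filter value is a plain string, a typo is easy to miss. The sketch below checks values against some of the options listed above before they are passed on. `ALLOWED` and `check_filters` are hypothetical names for illustration, not part of the library, and the table covers only a subset of the filters.

```python
# Allowed values copied from the list above (subset, for illustration).
ALLOWED = {
    "hasEquity": ("any", "true", "false"),
    "demographic": ("any", "black-founders", "women-founders", "latinx-founders"),
    "hasSalary": ("any", "true", "false"),
    "interviewProcess": ("any", "true", "false"),
    "jobType": ("any", "fulltime", "contract", "intern"),
    "remote": ("any", "only", "yes", "no"),
    "usVisaNotRequired": ("any", "true", "false"),
    "companySize": ("any", "seed", "small", "medium", "large"),
}


def check_filters(**filters):
    """Return filters unchanged, raising ValueError on a value not listed above."""
    for name, value in filters.items():
        allowed = ALLOWED.get(name)
        # Filters missing from ALLOWED are passed through unchecked.
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return filters


params = check_filters(jobType="contract", remote="only")
print(params)  # {'jobType': 'contract', 'remote': 'only'}
# client.get_companies(query=query, **params)
```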

#### Roles Filters

Some roles have additional `role_type` filters.

```
role=("any", "eng", "design", "product", "science", "sales", "marketing", "support", "recruiting", "operations", "finance", "legal")

# role_type
"eng" => role_type=("android", "be", "data_sci", "devops", "embedded", "eng_mgmt", "fe", "ios", "fs", "ml", "robotics", "qa", "electrical", "hw", "mechanical", "bio", "chemical")
"design" => role_type=("web", "mobile", "ui_ux", "brand_graphic", "animation", "hardware", "user_research", "illustration", "ar_vr", "design_mgmt")
"science" => role_type=("bio", "biotech", "chem", "genetics", "health", "immuno", "lab", "onc", "pharma", "process", "research")
```
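
Since `role_type` values only make sense for their parent role, the pairing can be checked up front. This is a sketch under the assumption that `role_type` is passed to `get_companies` as a keyword argument alongside `role`; `ROLE_TYPES` and `role_params` are hypothetical names, not part of the library.

```python
# role_type values per role, copied from the list above.
ROLE_TYPES = {
    "eng": ("android", "be", "data_sci", "devops", "embedded", "eng_mgmt", "fe",
            "ios", "fs", "ml", "robotics", "qa", "electrical", "hw", "mechanical",
            "bio", "chemical"),
    "design": ("web", "mobile", "ui_ux", "brand_graphic", "animation", "hardware",
               "user_research", "illustration", "ar_vr", "design_mgmt"),
    "science": ("bio", "biotech", "chem", "genetics", "health", "immuno", "lab",
                "onc", "pharma", "process", "research"),
}


def role_params(role, role_type=None):
    """Build role/role_type keyword arguments, checking role_type against ROLE_TYPES."""
    params = {"role": role}
    if role_type is not None:
        allowed = ROLE_TYPES.get(role, ())
        if role_type not in allowed:
            raise ValueError(f"role_type={role_type!r} is not listed for role={role!r}")
        params["role_type"] = role_type
    return params


print(role_params("eng", "be"))  # {'role': 'eng', 'role_type': 'be'}
# client.get_companies(query=query, **role_params("eng", "be"))  # assumed call shape
```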

### Dependencies

```
beautifulsoup4
lxml
requests
selenium
pprintpp
```

### See Also

- [hnjobs](https://hnjobs.emilburzo.com/)
- [Who Is Hiring?](https://kennytilton.github.io/whoishiring/)
- [hn_search](https://news.ycombinator.com/item?id=10313519)
- [O(n)](https://getoofn.com/)