https://github.com/oxylabs/how-to-scrape-indeed

A tutorial for collecting job postings from Indeed using Python and Oxylabs Web Scraper API.
api job-posting python scraper-api web-scraper web-scraping

Host: GitHub
URL: https://github.com/oxylabs/how-to-scrape-indeed
Owner: oxylabs
Created: 2024-01-19T08:50:56.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2025-02-11T13:17:02.000Z (over 1 year ago)
Last Synced: 2025-03-29T13:10:00.163Z (over 1 year ago)
Topics: api, job-posting, python, scraper-api, web-scraper, web-scraping
Language: Python
Homepage: https://oxylabs.io/products/scraper-api/web
Size: 49.8 KB
Stars: 151
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: readme.md

# How to Scrape Indeed

[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=877&url_id=112)

This tutorial shows how to extract job postings from [Indeed](https://www.indeed.com/) using Oxylabs [Web Scraper API](https://oxylabs.io/products/scraper-api/web) (**1-week free trial**) and Python.

For the complete guide with in-depth explanations and visuals, check our [blog post](https://oxylabs.io/blog/how-to-scrape-indeed).

## Project setup

### Creating a virtual environment

```shell
python -m venv indeed_env   # Windows
python3 -m venv indeed_env  # Mac and Linux
```

### Activating the virtual environment

```shell
.\indeed_env\Scripts\Activate    # Windows
source indeed_env/bin/activate   # Mac and Linux
```

### Installing libraries

```shell
# requests sends the API calls; pandas is used later for the CSV export
pip install requests pandas
```

## Overview of Web Scraper API

The following example sends a basic request to Web Scraper API and prints the JSON response.

```python
# scraper_api_demo.py
import requests

payload = {
    "source": "universal",
    "url": "https://www.indeed.com",
}

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    json=payload,
    auth=("username", "password"),  # replace with your API credentials
)

print(response.json())
```
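To keep credentials out of the script, the request arguments can be assembled from environment variables. This is a minimal sketch, not part of the original tutorial: the variable names `OXYLABS_USERNAME` and `OXYLABS_PASSWORD` and the helper `build_request` are assumptions chosen for illustration. The function only builds the arguments and sends nothing.

```python
import os

API_URL = "https://realtime.oxylabs.io/v1/queries"


def build_request(target_url, env=os.environ):
    """Return keyword arguments for requests.post without sending anything."""
    # OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical variable names.
    username = env.get("OXYLABS_USERNAME")
    password = env.get("OXYLABS_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set OXYLABS_USERNAME and OXYLABS_PASSWORD first")
    return {
        "url": API_URL,
        "json": {"source": "universal", "url": target_url},
        "auth": (username, password),
    }
```

With both variables set, the call from the demo becomes `requests.post(**build_request("https://www.indeed.com"))`.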

## Web Scraper API parameters

### Parsing the page title and retrieving results in JSON

```json
{
    "title": {
        "_fns": [
            {
                "_fn": "xpath_one",
                "_args": ["//title/text()"]
            }
        ]
    }
}
```

If you send this as `parsing_instructions`, the output would be the following JSON.

```json
{ "title": "Job Search | Indeed", "parse_status_code": 12000 }
```

Note that a `parse_status_code` of `12000` means the page was parsed successfully.
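A script can check this code before trusting the parsed fields. The sketch below is illustrative only: the helper `parsed_ok` is a hypothetical name, and the sample dictionary copies the output shown above.

```python
PARSE_SUCCESS = 12000  # value taken from the sample output above


def parsed_ok(content):
    """Return True when the parsed content reports a successful parse."""
    # Guard against non-dict content, such as a raw HTML string.
    return isinstance(content, dict) and content.get("parse_status_code") == PARSE_SUCCESS


sample = {"title": "Job Search | Indeed", "parse_status_code": 12000}
print(parsed_ok(sample))  # True
```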

The following code prints the title of the Indeed page.

```python
# indeed_title.py
import requests

payload = {
    "source": "universal",
    "url": "https://www.indeed.com",
    "parse": True,
    "parsing_instructions": {
        "title": {
            "_fns": [
                {
                    "_fn": "xpath_one",
                    "_args": ["//title/text()"],
                }
            ]
        }
    },
}

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    json=payload,
    auth=("username", "password"),  # replace with your API credentials
)

# The parsed fields are in the "content" of the first result.
print(response.json()["results"][0]["content"])
```
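Indexing `['results'][0]['content']` directly raises an error if the response has no results. A slightly more defensive lookup might look like the sketch below; `first_content` is a hypothetical helper, and the sample response is hand-written to mirror the structure accessed above rather than captured from the API.

```python
def first_content(response_json):
    """Return the content of the first result, or None if there is none."""
    results = response_json.get("results") or []
    if not results:
        return None
    return results[0].get("content")


# Hand-written sample that mirrors the structure used in indeed_title.py.
sample_response = {
    "results": [
        {"content": {"title": "Job Search | Indeed", "parse_status_code": 12000}}
    ]
}

content = first_content(sample_response)
print(content["title"])  # Job Search | Indeed
```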

## Scraping Indeed job postings

### Selecting a job listing

```
.job_seen_beacon
```

### Creating the placeholder for a job listing

```json
{
    "job_listings": {
        "_fns": [
            {
                "_fn": "css",
                "_args": [".job_seen_beacon"]
            }
        ],
        "_items": {
            "job_title": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": [".//h2[contains(@class,'jobTitle')]/a/span/text()"]
                    }
                ]
            },
            "company_name": {
                "_fns": [
                    {
                        "_fn": "xpath_one",
                        "_args": [".//span[@data-testid='company-name']/text()"]
                    }
                ]
            }
        }
    }
}
```
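Because every field repeats the same `_fns` wrapper, the instructions can also be generated in Python instead of typed by hand. This is only a sketch: `xpath_one` and `listing_instructions` are hypothetical helpers, not functions provided by the API, and the selectors are the ones shown above.

```python
def xpath_one(expression):
    """Wrap an XPath expression in the structure used for a single field."""
    return {"_fns": [{"_fn": "xpath_one", "_args": [expression]}]}


def listing_instructions(css_selector, fields):
    """Select each listing by CSS, then extract the named fields from it."""
    return {
        "_fns": [{"_fn": "css", "_args": [css_selector]}],
        "_items": {name: xpath_one(expr) for name, expr in fields.items()},
    }


parsing_instructions = {
    "job_listings": listing_instructions(
        ".job_seen_beacon",
        {
            "job_title": ".//h2[contains(@class,'jobTitle')]/a/span/text()",
            "company_name": ".//span[@data-testid='company-name']/text()",
        },
    )
}
```

Adding another data point then only requires one more entry in the `fields` dictionary.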

### Adding other selectors

```json
{
  "source": "universal",
  "url": "https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco%2C+CA",
  "parse": true,
  "parsing_instructions": {
    "job_listings": {
      "_fns": [
        {
          "_fn": "css",
          "_args": [".job_seen_beacon"]
        }
      ],
      "_items": {
        "job_title": {
          "_fns": [
            {
              "_fn": "xpath_one",
              "_args": [".//h2[contains(@class,'jobTitle')]/a/span/text()"]
            }
          ]
        },
        "company_name": {
          "_fns": [
            {
              "_fn": "xpath_one",
              "_args": [".//span[@data-testid='company-name']/text()"]
            }
          ]
        }
      }
    }
  }
}
```

For the remaining data points, see [job_search_payload.json](src/job_search_payload.json).
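The search URL in the payload encodes the query and location as the `q` and `l` parameters. To search for something else, the URL can be built with the standard library, as in this sketch; `indeed_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def indeed_search_url(query, location):
    """Build an Indeed search URL from a query and a location."""
    # urlencode turns spaces into "+" and percent-encodes the comma.
    return "https://www.indeed.com/jobs?" + urlencode({"q": query, "l": location})


print(indeed_search_url("work from home", "San Francisco, CA"))
# https://www.indeed.com/jobs?q=work+from+home&l=San+Francisco%2C+CA
```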

### Saving the payload as a separate JSON file

```python
# parse_jobs.py
import json

import requests

# Load the request payload from the separate JSON file.
with open("job_search_payload.json") as f:
    payload = json.load(f)

response = requests.post(
    url="https://realtime.oxylabs.io/v1/queries",
    json=payload,
    auth=("username", "password"),  # replace with your API credentials
)

print(response.status_code)

# Save the full API response.
with open("results.json", "w") as f:
    json.dump(response.json(), f, indent=4)
```
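Keeping the payload in a file also makes it easy to reuse the same parsing instructions for a different search. The sketch below assumes a hypothetical helper, `load_payload`, that loads the file and optionally swaps the target URL; the demo writes a small stand-in payload to a temporary directory so it runs without the real file.

```python
import json
import tempfile
from pathlib import Path


def load_payload(path, url=None):
    """Load a payload file and optionally override its target URL."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if url is not None:
        payload["url"] = url
    return payload


# Demo with a stand-in payload file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "job_search_payload.json"
    demo_path.write_text(
        json.dumps({"source": "universal", "url": "https://www.indeed.com"}),
        encoding="utf-8",
    )
    payload = load_payload(demo_path, url="https://www.indeed.com/jobs?q=python")

print(payload["url"])  # https://www.indeed.com/jobs?q=python
```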

## Exporting to JSON and CSV

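```python
# parse_jobs.py (continued)
import pandas as pd  # requires: pip install pandas

data = response.json()

# Save the full response as JSON.
with open("results.json", "w") as f:
    json.dump(data, f, indent=4)

# Save only the parsed job listings as CSV.
df = pd.DataFrame(data["results"][0]["content"]["job_listings"])
df.to_csv("job_search_results.csv", index=False)
```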
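If pandas is not available, the standard library `csv` module can produce a similar file. The following sketch swaps in `csv.DictWriter` for the DataFrame; it is an alternative, not the tutorial's method. The helper `listings_to_csv` and the sample listings are made up for illustration.

```python
import csv
import io


def listings_to_csv(listings, fieldnames):
    """Serialize a list of job-listing dicts to CSV text."""
    buffer = io.StringIO()
    # Missing keys become empty cells; unexpected keys are ignored.
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(listings)
    return buffer.getvalue()


# Made-up listings; the second one has no company name.
sample_listings = [
    {"job_title": "Support Agent", "company_name": "Example Co"},
    {"job_title": "Data Analyst"},
]

print(listings_to_csv(sample_listings, ["job_title", "company_name"]))
```

Writing the returned text to `job_search_results.csv` gives a file with the same columns as the pandas export for these two fields.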

## Final word

Check our [documentation](https://developers.oxylabs.io/scraper-apis/web-scraper-api) for more on the API parameters and variables used in this tutorial.

If you have any questions, feel free to contact us at support@oxylabs.io.
