{"id":20710098,"url":"https://github.com/oxylabs/how-to-scrape-imdb","last_synced_at":"2026-04-02T02:02:43.475Z","repository":{"id":234832028,"uuid":"744482831","full_name":"oxylabs/how-to-scrape-imdb","owner":"oxylabs","description":"A quick guide to effortlessly scraping IMDb data, such as public movie lists and reviews.","archived":false,"fork":false,"pushed_at":"2025-06-26T08:23:12.000Z","size":22,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-10T01:19:27.921Z","etag":null,"topics":["imdb-api","imdb-information","imdb-movies","imdb-rating","imdb-scraper","imdb-webscrapping","python-imdb-crawler","python-scraper","web-scraping"],"latest_commit_sha":null,"homepage":"https://oxylabs.io/blog/how-to-scrape-imdb","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oxylabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-01-17T11:47:52.000Z","updated_at":"2025-06-26T08:23:15.000Z","dependencies_parsed_at":"2025-06-30T00:07:29.685Z","dependency_job_id":"4904bddf-1c94-43fa-8cda-988b2aa6393f","html_url":"https://github.com/oxylabs/how-to-scrape-imdb","commit_stats":null,"previous_names":["oxylabs/how-to-scrape-imdb"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/oxylabs/how-to-scrape-imdb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fhow-to-scrape-imdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fhow-to-scrape-imdb/tags
","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fhow-to-scrape-imdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fhow-to-scrape-imdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oxylabs","download_url":"https://codeload.github.com/oxylabs/how-to-scrape-imdb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fhow-to-scrape-imdb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31294379,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T01:43:37.129Z","status":"online","status_checked_at":"2026-04-02T02:00:08.535Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["imdb-api","imdb-information","imdb-movies","imdb-rating","imdb-scraper","imdb-webscrapping","python-imdb-crawler","python-scraper","web-scraping"],"created_at":"2024-11-17T02:09:48.101Z","updated_at":"2026-04-02T02:02:43.464Z","avatar_url":"https://github.com/oxylabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# How to Scrape IMDb Data: Step-by-Step Guide\n\n[![Oxylabs promo 
code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.io/pages/gitoxy?utm_source=877\u0026utm_medium=affiliate\u0026groupid=877\u0026utm_content=how-to-scrape-imdb-github\u0026transaction_id=102f49063ab94276ae8f116d224b67)\n\n[![](https://dcbadge.limes.pink/api/server/Pds3gBmKMH?style=for-the-badge\u0026theme=discord)](https://discord.gg/Pds3gBmKMH) [![YouTube](https://img.shields.io/badge/YouTube-Oxylabs-red?style=for-the-badge\u0026logo=youtube\u0026logoColor=white)](https://www.youtube.com/@oxylabs)\n\nLearn how to easily scrape public IMDb data for your projects using Python and Oxylabs [IMDb Scraper API](https://oxylabs.io/products/scraper-api/web/imdb). Get your **1-week free trial** by registering on the [dashboard](https://dashboard.oxylabs.io/en/) and code your way to success.\n\nSee the complete [blog post](https://oxylabs.io/blog/how-to-scrape-imdb) for an in-depth tutorial with images.\n\n- [1. Setting up for scraping IMDb](#1-setting-up-for-scraping-imdb)\n  * [Creating a virtual environment](#creating-a-virtual-environment)\n  * [Activating the virtual environment](#activating-the-virtual-environment)\n  * [Installing required libraries](#installing-required-libraries)\n- [2. Overview of IMDb Scraper API](#2-overview-of-imdb-scraper-api)\n  * [Scraping a title](#scraping-a-title)\n- [3. Scraping movie info from a list](#3-scraping-movie-info-from-a-list)\n- [4. Scraping movie reviews](#4-scraping-movie-reviews)\n- [5. Exporting to JSON and CSV](#5-exporting-to-json-and-csv)\n\n## 1. 
Setting up for scraping IMDb\nAs you’ll be writing a Python script, make sure you have [Python](https://www.python.org/downloads/) 3.8 or newer installed on your machine.\n\n### Creating a virtual environment\n```bash\npython -m venv imdb_env # Windows\npython3 -m venv imdb_env # Mac and Linux\n```\nReplace `imdb_env` with the name you'd like to give to your virtual environment.\n\n### Activating the virtual environment\n```bash\n.\\imdb_env\\Scripts\\Activate # Windows\nsource imdb_env/bin/activate # Mac and Linux\n```\n\n### Installing required libraries\nWe'll use the `requests` library to make HTTP requests and `pandas` to export the results to CSV. Install both by running the following command:\n\n```bash\n$ pip install requests pandas\n```\n\n## 2. Overview of IMDb Scraper API\n\nOxylabs' [IMDb Scraper API](https://oxylabs.io/products/scraper-api/web/imdb) allows you to extract data from many complex websites easily. Claim your **1-week free trial** by registering on the [Oxylabs dashboard](https://dashboard.oxylabs.io/en/) and follow along. 
Below you can see a basic example that shows how Scraper API works:\n\n```python\n# scraper_api_demo.py\nimport requests\n\nUSERNAME = \"username\"\nPASSWORD = \"password\"\n\npayload = {\n    \"source\": \"universal\",\n    \"url\": \"https://www.imdb.com\"\n}\n\nresponse = requests.post(\n    url=\"https://realtime.oxylabs.io/v1/queries\",\n    json=payload,\n    auth=(USERNAME, PASSWORD),\n)\n\nprint(response.json())\n```\n\n### Scraping a title\nThe following code uses parsing instructions to extract and print the title of the IMDb home page:\n\n```python\n# imdb_title.py\nimport requests\n\nUSERNAME = \"username\"\nPASSWORD = \"password\"\n\npayload = {\n    \"source\": \"universal\",\n    \"url\": \"https://www.imdb.com\",\n    \"parse\": True,\n    \"parsing_instructions\": {\n        \"title\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath_one\",\n                    \"_args\": [\n                        \"//title/text()\"\n                    ]\n                }\n            ]\n        }\n    },\n}\n\nresponse = requests.post(\n    url=\"https://realtime.oxylabs.io/v1/queries\",\n    json=payload,\n    auth=(USERNAME, PASSWORD),\n)\n\nprint(response.json()['results'][0]['content'])\n```\n\nLearn more about the Custom Parser feature [here](https://developers.oxylabs.io/scraper-apis/custom-parser).\n\n## 3. Scraping movie info from a list\nBefore scraping a page, we need to examine the page structure. See the steps in our [blog post](https://oxylabs.io/blog/how-to-scrape-imdb#3.-scraping-movie-info-from-a-list). 
We'll target this [IMDb top 250](https://www.imdb.com/chart/top/?ref_=nv_mv_250) listing page:\n```python\n# dump_payload.py\nimport json\n\npayload = {\n    \"source\": \"universal\",\n    \"url\": \"https://www.imdb.com/chart/top/?ref_=nv_mv_250\",\n    \"parse\": True,\n    \"parsing_instructions\": {\n        \"movies\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"xpath\",\n                    \"_args\": [\n                        \"//li[contains(@class,'ipc-metadata-list-summary-item')]\"\n                    ]\n                }\n            ],\n            \"_items\": {\n                \"movie_name\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//h3/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"year\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//*[contains(@class,'cli-title-metadata-item')]/text()\"\n                            ]\n                        }\n                    ]\n                },\n                \"rating\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"xpath_one\",\n                            \"_args\": [\n                                \".//*[contains(@aria-label,'IMDb rating')]/text()\"\n                            ]\n                        }\n                    ]\n                }\n            }\n        }\n    }\n}\n\nwith open(\"top_250_payload.json\", 'w') as f:\n    json.dump(payload, f, indent=4)\n```\nA good way to organize your code is to save the payload as a separate JSON file. 
It will allow you to keep your Python file short:\n\n```python\n# parse_top_250.py\nimport requests\nimport json\n\nUSERNAME = \"username\"\nPASSWORD = \"password\"\n\npayload = {}\nwith open(\"top_250_payload.json\") as f:\n    payload = json.load(f)\n\n\nresponse = requests.post(\n    url=\"https://realtime.oxylabs.io/v1/queries\",\n    json=payload,\n    auth=(USERNAME, PASSWORD),\n)\n\n\nprint(response.status_code)\n\n\nwith open(\"result.json\", \"w\") as f:\n    json.dump(response.json(),f, indent=4)\n```\n## 4. Scraping movie reviews\nLet's scrape [movie reviews](https://www.imdb.com/title/tt0111161/reviews?ref_=tt_urv) of Shawshank Redemption:\n\n```json\n{\n    \"source\": \"universal\",\n    \"url\": \"https://www.imdb.com/title/tt0111161/reviews?ref_=tt_urv\",\n    \"parse\": true,\n    \"parsing_instructions\": {\n        \"movie_name\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"css_one\",\n                    \"_args\": [\n                        \".parent a\"\n                    ]\n                },\n                {\n                    \"_fn\": \"element_text\"\n                }\n            ]\n        },\n        \"reviews\": {\n            \"_fns\": [\n                {\n                    \"_fn\": \"css\",\n                    \"_args\": [\n                        \".imdb-user-review\"\n                    ]\n                }\n            ],\n            \"_items\": {\n                \"review_title\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"css_one\",\n                            \"_args\": [\n                                \".title\"\n                            ]\n                        },\n                        {\n                            \"_fn\": \"element_text\"\n                        }\n                    ]\n                },\n                \"review-body\": {\n                    \"_fns\": [\n                        {\n      
                      \"_fn\": \"css_one\",\n                            \"_args\": [\n                                \".content\u003e.show-more__control\"\n                            ]\n                        },\n                        {\n                            \"_fn\": \"element_text\"\n                        }\n                    ]\n                },\n                \"rating\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"css_one\",\n                            \"_args\": [\n                                \".rating-other-user-rating\"\n                            ]\n                        },\n                        {\n                            \"_fn\": \"element_text\"\n                        }\n                    ]\n                },\n                \"name\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"css_one\",\n                            \"_args\": [\n                                \".display-name-link a\"\n                            ]\n                        },\n                        {\n                            \"_fn\": \"element_text\"\n                        }\n                    ]\n                },\n                \"review_date\": {\n                    \"_fns\": [\n                        {\n                            \"_fn\": \"css_one\",\n                            \"_args\": [\n                                \".review-date\"\n                            ]\n                        },\n                        {\n                            \"_fn\": \"element_text\"\n                        }\n                    ]\n                }\n            }\n        }\n    }\n}\n```\nOnce your payload file is ready, you can use the same Python code file shown in the previous section, point to this payload, and run the code to get the results.\n\n## 5. 
Exporting to JSON and CSV\n\n```python\n# parse_reviews.py\nimport json\n\nimport pandas as pd\n\n# save results into a variable data\n# (`response` is the requests.post() result from the previous section)\ndata = response.json()\n\n# save the data as a JSON file\nwith open(\"results_reviews.json\", \"w\") as f:\n    json.dump(data, f, indent=4)\n\n# save the reviews in a CSV file\ndf = pd.DataFrame(data['results'][0]['content']['reviews'])\ndf.to_csv('reviews.csv', index=False)\n```\nYou might also be interested in reading about scraping other targets such as [YouTube](https://oxylabs.io/blog/how-to-scrape-youtube), [Google News](https://oxylabs.io/blog/how-to-scrape-google-news), or [Netflix](https://oxylabs.io/products/scraper-api/web/netflix).\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fhow-to-scrape-imdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foxylabs%2Fhow-to-scrape-imdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fhow-to-scrape-imdb/lists"}