{"id":26236147,"url":"https://github.com/dan3002/imdb-crawler","last_synced_at":"2026-04-27T01:31:19.685Z","repository":{"id":280419015,"uuid":"941890501","full_name":"DAN3002/IMDb-Crawler","owner":"DAN3002","description":"A powerful Python-based web crawler that collects comprehensive movie information from IMDb using both GraphQL API and web scraping techniques. This tool can gather detailed movie data including basic information, reviews, and ratings for any type of movies based on customizable filters.","archived":false,"fork":false,"pushed_at":"2025-03-06T10:47:05.000Z","size":376,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-02T08:45:48.025Z","etag":null,"topics":["crawler","imdb","imdb-dataset","selenium"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DAN3002.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-03T08:19:40.000Z","updated_at":"2025-05-27T07:20:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"5f826a95-cfbf-4254-a3aa-4d15caabea27","html_url":"https://github.com/DAN3002/IMDb-Crawler","commit_stats":null,"previous_names":["dan3002/imdb-crawler"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DAN3002/IMDb-Crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DAN3002%2FIMDb-Crawler","tags_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DAN3002%2FIMDb-Crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DAN3002%2FIMDb-Crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DAN3002%2FIMDb-Crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DAN3002","download_url":"https://codeload.github.com/DAN3002/IMDb-Crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DAN3002%2FIMDb-Crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32319559,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"ssl_error","status_checked_at":"2026-04-26T23:26:25.802Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","imdb","imdb-dataset","selenium"],"created_at":"2025-03-13T03:27:32.631Z","updated_at":"2026-04-27T01:31:19.679Z","avatar_url":"https://github.com/DAN3002.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# IMDb Movie Crawler\n\nA powerful Python-based web crawler that collects comprehensive movie information from IMDb using both GraphQL API and web scraping techniques. 
This tool can gather detailed movie data including basic information, reviews, and ratings for any type of movie based on customizable filters.\n\n## Features\n\n- Advanced movie filtering capabilities similar to IMDb's search:\n  - By country of origin\n  - By year range\n  - By genre\n  - By rating range\n  - By number of votes\n  - By title type (movie, TV series, etc.)\n  - By language\n  - By user review count\n- Collects detailed movie information including:\n  - Basic details (title, original title, year, runtime)\n  - Ratings and reviews\n  - Plot summaries\n  - Genre information\n  - Country of origin\n  - Popularity rankings\n  - Certificate ratings\n- Comprehensive user review collection\n- JSON to CSV conversion for easy data analysis\n- Robust logging system\n- Rate limiting to prevent server overload\n- Progress saving and error handling\n- Session management with automatic retry\n\n## Project Structure\n\n```\nIMDb-Crawler/\n├── imdb_crawler.py           # Main crawler for basic movie information\n├── movie_detail_crawler.py   # Detailed movie information crawler\n├── user_review_crawler.py    # Movie reviews crawler\n├── filter_movies.py          # Movie filtering script\n├── json_to_csv_converter.py  # JSON to CSV conversion utility\n├── utils/\n│   └── logger.py             # Logging utility\n├── logs/                     # Log files directory\n├── output/                   # Output files directory\n└── error_logs/               # Error logging directory\n```\n\n## Requirements\n\n- Python 3.x\n- Chrome browser\n- Selenium WebDriver\n\n## Installation\n\n1. Clone the repository:\n```bash\ngit clone https://github.com/DAN3002/IMDb-Crawler.git\ncd IMDb-Crawler\n```\n\n2. Install required packages:\n```bash\npip install -r requirements.txt\n```\n\n3. Install the Chrome WebDriver (ChromeDriver) that matches your Chrome browser version\n\n## Usage\n\n1. Basic Movie Crawling:\n```bash\npython imdb_crawler.py\n```\n\n2. 
Detailed Movie Information:\n```bash\npython movie_detail_crawler.py\n```\n\n3. User Reviews Collection:\n```bash\npython user_review_crawler.py\n```\n\n4. Custom Filtering:\n```python\n# Example in filter_movies.py\nfilter_criteria = {\n    'votes_min': 1000,              # Minimum votes\n    'rating_min': 7.0,              # Minimum rating\n    'year_range': (2000, 2024),     # Year range\n    'countries': ['US', 'UK'],      # Countries\n    'genres': ['Action', 'Drama'],  # Genres\n    'reviews_min': 5                # Minimum reviews\n}\n```\n\n5. Convert Results to CSV:\n```bash\npython json_to_csv_converter.py\n```\n\n## Customizing Filter Criteria When Crawling Movies\n\n```python\n# Example in imdb_crawler.py\nvariables = {\n    \"first\": self.PAGE_SIZE,\n    \"locale\": \"vi-VN\",\n    \"originCountryConstraint\": {\n        \"anyPrimaryCountries\": [\"VN\"]\n    },\n    \"titleTypeConstraint\": {\n        \"anyTitleTypeIds\": [\"movie\"],\n        \"excludeTitleTypeIds\": []\n    },\n    \"sortBy\": \"POPULARITY\",\n    \"sortOrder\": \"ASC\"\n}\n```\n\n## Output Formats\n\nThe crawler generates several output files:\n\n- `movie_details.json`: Complete movie information\n- `filtered_movies.json`: Filtered movie results\n- `movie_reviews.json`: User review data\n- Corresponding CSV files for each JSON file\n\n## Error Handling\n\nThe crawler includes comprehensive error handling and logging:\n\n- Automatic session refresh on connection issues\n- Rate limiting to prevent IP blocking\n- Progress saving for long-running crawls\n- Detailed error logs in the `error_logs` directory\n\n## Author\n\nThis project was created by [@DAN3002](https://github.com/DAN3002).\n\n## Contributing\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/AmazingFeature`)\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the branch (`git push origin feature/AmazingFeature`)\n5. 
Open a Pull Request\n\n## License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## Disclaimer\n\nThis tool is for educational purposes only. Please review IMDb's terms of service and robots.txt before using this crawler. Ensure you comply with IMDb's usage policies and implement appropriate rate limiting.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdan3002%2Fimdb-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdan3002%2Fimdb-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdan3002%2Fimdb-crawler/lists"}