{"id":27444374,"url":"https://github.com/ascender1729/leetcode_scraper","last_synced_at":"2026-06-20T04:31:04.576Z","repository":{"id":284887601,"uuid":"956384822","full_name":"ascender1729/leetcode_scraper","owner":"ascender1729","description":"Extract topic tags from LeetCode problems to streamline interview preparation.","archived":false,"fork":false,"pushed_at":"2025-03-28T07:07:23.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-15T16:15:40.081Z","etag":null,"topics":["beautifulsoup","coding-interview","data-analysis","graphql","leetcode","python","scraper","web-scraping"],"latest_commit_sha":null,"homepage":"https://github.com/ascender1729/leetcode_scraper","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ascender1729.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-28T06:56:51.000Z","updated_at":"2025-03-28T07:20:53.000Z","dependencies_parsed_at":"2025-03-28T08:20:54.213Z","dependency_job_id":null,"html_url":"https://github.com/ascender1729/leetcode_scraper","commit_stats":null,"previous_names":["ascender1729/leetcode_scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ascender1729/leetcode_scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ascender1729%2Fleetcode_scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ascender1729%2Fleetcode_scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/ascender1729%2Fleetcode_scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ascender1729%2Fleetcode_scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ascender1729","download_url":"https://codeload.github.com/ascender1729/leetcode_scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ascender1729%2Fleetcode_scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34557551,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-20T02:00:06.407Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","coding-interview","data-analysis","graphql","leetcode","python","scraper","web-scraping"],"created_at":"2025-04-15T03:15:57.617Z","updated_at":"2026-06-20T04:31:04.558Z","avatar_url":"https://github.com/ascender1729.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LeetCode Topic Tag Scraper\r\n\r\nA Python tool to extract topic tags for LeetCode problems. 
This script takes a list of LeetCode problem URLs and automatically extracts the topic tags associated with each problem (e.g., \"Array\", \"String\", \"Dynamic Programming\", etc.).\r\n\r\n## Why is this useful?\r\n\r\nWhen studying for coding interviews, it's helpful to know which data structures and algorithms are being tested by each LeetCode problem. This tool automatically extracts those tags to help you focus your study efforts on specific topics.\r\n\r\n## Prerequisites\r\n\r\n- Python 3.6 or higher\r\n- pip (Python package installer)\r\n\r\n## Installation\r\n\r\n1. Clone this repository:\r\n```bash\r\ngit clone https://github.com/ascender1729/leetcode_scraper.git\r\ncd leetcode_scraper\r\n```\r\n\r\n2. Create a virtual environment:\r\n```bash\r\n# Windows\r\npython -m venv venv\r\nvenv\\Scripts\\activate\r\n\r\n# macOS/Linux\r\npython -m venv venv\r\nsource venv/bin/activate\r\n```\r\n\r\n3. Install dependencies:\r\n```bash\r\npip install requests beautifulsoup4 pandas\r\n```\r\n\r\n## How to Use\r\n\r\n### Step 1: Get Company-Specific LeetCode Problems\r\n\r\nYou can obtain company-specific LeetCode problem lists from these repositories:\r\n- [LeetCode-Questions-CompanyWise](https://github.com/krishnadey30/LeetCode-Questions-CompanyWise)\r\n- [leetcode-company-wise-problems](https://github.com/liquidslr/leetcode-company-wise-problems)\r\n\r\nTo get the links:\r\n1. Clone one of these repositories:\r\n   ```bash\r\n   git clone https://github.com/krishnadey30/LeetCode-Questions-CompanyWise.git\r\n   # OR\r\n   git clone https://github.com/liquidslr/leetcode-company-wise-problems.git\r\n   ```\r\n\r\n2. Find the CSV file for your desired company (e.g., Google, Amazon, Microsoft)\r\n\r\n3. Extract all the LeetCode problem links from the CSV file\r\n\r\n### Step 2: Create Input File\r\n\r\n1. Create a text file called `links.txt` in the project directory\r\n2. 
Paste all the LeetCode problem links you extracted, one link per line, for example:\r\n```\r\nhttps://leetcode.com/problems/two-sum/\r\nhttps://leetcode.com/problems/add-two-numbers/\r\nhttps://leetcode.com/problems/longest-substring-without-repeating-characters/\r\n```\r\n\r\n### Step 3: Run the Script\r\n\r\n```bash\r\npython scraper.py\r\n```\r\n\r\nThe script will:\r\n1. Process each link in the `links.txt` file\r\n2. Extract the topic tags for each problem\r\n3. Save the results to `output/leetcode_topics.csv`\r\n\r\n## Output\r\n\r\nThe output CSV file will have three columns:\r\n- `problem_link`: The full URL of the problem\r\n- `problem_id`: The problem ID/slug (e.g., \"two-sum\")\r\n- `topics`: Comma-separated list of topic tags for the problem\r\n\r\nExample output:\r\n```\r\nproblem_link,problem_id,topics\r\nhttps://leetcode.com/problems/two-sum/,two-sum,\"Array, Hash Table\"\r\nhttps://leetcode.com/problems/add-two-numbers/,add-two-numbers,\"Linked List, Math, Recursion\"\r\n```\r\n\r\n## Troubleshooting\r\n\r\n### Rate Limiting\r\nIf you're processing a large number of problems, you might encounter rate limiting from LeetCode. If this happens:\r\n- Try increasing the delay between requests by modifying the `time.sleep()` value\r\n- Run the script in smaller batches\r\n\r\n### Module Not Found Errors\r\nIf you see \"ModuleNotFoundError\", make sure you've installed all required dependencies:\r\n```bash\r\npip install requests beautifulsoup4 pandas\r\n```\r\n\r\n### Connection Errors\r\nIf you're getting connection errors, it might be due to network issues or LeetCode blocking the requests. 
Try:\r\n- Checking your internet connection\r\n- Waiting for a few minutes before trying again\r\n- Using a VPN if necessary\r\n\r\n## Example Workflow\r\n\r\nHere's a complete example of how you might use this tool:\r\n\r\n```bash\r\n# Clone the repository\r\ngit clone https://github.com/ascender1729/leetcode_scraper.git\r\ncd leetcode_scraper\r\n\r\n# Set up the environment\r\npython -m venv venv\r\nvenv\\Scripts\\activate  # Windows; on macOS/Linux use: source venv/bin/activate\r\npip install requests beautifulsoup4 pandas\r\n\r\n# Get company-specific problem lists\r\ngit clone https://github.com/krishnadey30/LeetCode-Questions-CompanyWise.git\r\n# Now open the CSV for your target company and copy the links\r\n\r\n# Create links.txt with your LeetCode problem URLs\r\n# (each URL on a new line)\r\n\r\n# Run the scraper\r\npython scraper.py\r\n\r\n# Results will be in output/leetcode_topics.csv\r\n```\r\n\r\n## How It Works\r\n\r\nThe script tries two approaches, in order, to extract topic tags:\r\n\r\n1. **GraphQL API**: It first queries LeetCode's GraphQL API to fetch the topic tags directly\r\n2. **HTML Parsing**: If the API approach fails, it falls back to scraping the HTML content of the problem page\r\n\r\nThe script saves progress every 10 problems, so if it's interrupted, you won't lose all your data.\r\n\r\n## License\r\n\r\nMIT\r\n\r\n## Contributing\r\n\r\nContributions are welcome! Please feel free to submit a Pull Request to the [repository](https://github.com/ascender1729/leetcode_scraper).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fascender1729%2Fleetcode_scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fascender1729%2Fleetcode_scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fascender1729%2Fleetcode_scraper/lists"}