Projects in Awesome Lists tagged with scrape
A curated list of projects in awesome lists tagged with scrape.
https://github.com/twintproject/twint
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
elasticsearch kibana osint python scrape scrape-followers scrape-following scrape-likes tweep tweets twint twitter
Last synced: 05 Oct 2025
https://github.com/alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
ai artificial-intelligence automation crawler machine-learning python scrape scraper scraping web-scraping webautomation webscraping
Last synced: 13 May 2025
https://github.com/anorov/cloudflare-scrape
A Python module to bypass Cloudflare's anti-bot page.
anti-bot-page cloudflare protected-page scrape scraping-websites
Last synced: 13 May 2025
https://github.com/Anorov/cloudflare-scrape
A Python module to bypass Cloudflare's anti-bot page.
anti-bot-page cloudflare protected-page scrape scraping-websites
Last synced: 26 Mar 2025
https://github.com/d60/twikit
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
bot client python python-web-scraper python3 scrape scraper scraping search tweepy twitter twitter-api twitter-bot twitter-client twitter-internal-api twitter-scraper wrapper x x-api
Last synced: 15 May 2025
https://github.com/microlinkhq/metascraper
Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
Last synced: 13 May 2025
https://github.com/trevorhobenshield/twitter-api-client
Implementation of X/Twitter v1, v2, and GraphQL APIs
api async automation bot client scrape search twitter twitter-api twitter-bot twitter-scraper x x-api x-bot x-scraper
Last synced: 14 May 2025
https://github.com/altimis/scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
dowload-images followers following python save-image scrape scrape-followers scrape-following scrape-images scrape-likes scrape-tweets scraper scraping selenium-webdriver tweets twitter twitter-scraper
Last synced: 14 May 2025
https://github.com/glebarez/cero
Scrape domain names from SSL certificates of arbitrary hosts
domain-names recon scrape ssl tls websecurity
Last synced: 12 Apr 2025
https://github.com/austinoboyle/scrape-linkedin-selenium
`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.
linkedin python scrape scraper scraping selenium selenium-webdriver web-scraper web-scraping
Last synced: 04 Apr 2025
https://github.com/ScriptSmith/instamancer
Scrape Instagram's API with Puppeteer
data-mining instagram instagram-api instagram-scraper puppeteer scrape
Last synced: 04 Apr 2025
https://github.com/unixfox/pupflare
A webpage proxy that sends requests through Chromium (Puppeteer). It can be used to bypass Cloudflare anti-bot / anti-DDoS protection from any application (such as curl).
anti-bot-page chromium cloudflare cloudflare-bypass cloudflare-scrape docker koa protected-page proxy puppeteer scrape scraping-websites
Last synced: 09 Apr 2025
https://github.com/danieldotnl/ha-multiscrape
Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.
hacs home-assistant home-assistant-custom rest scrape scraper scraping sensor
Last synced: 07 Apr 2025
https://github.com/anonyfox/elixir-scrape
Scrape any website, article or RSS/Atom Feed with ease!
data-science elixir feed html information-retrieval readability rss scrape scraping
Last synced: 13 Apr 2025
https://github.com/Anonyfox/elixir-scrape
Scrape any website, article or RSS/Atom Feed with ease!
data-science elixir feed html information-retrieval readability rss scrape scraping
Last synced: 30 Mar 2025
https://github.com/yaroslaff/nudecrawler
Crawl telegra.ph searching for nudes!
crawl crawler find nsfw nsfw-recognition nude nudes nudity-detection onlyfans python python3 scrape scraper scraping search spider telegra-ph tits web-scraping webscraping
Last synced: 04 Apr 2025
https://github.com/ultralytics/google-images-download
Google/Bing Images Web Downloader
bing-images-downloader download google-image-downloader google-image-search google-images-downloader images scrape scraper
Last synced: 14 Mar 2025
https://github.com/andrewstuart/goq
A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library
decoder golang goquery html html-unmarshaling scrape selector selectors struct unmarshaling unmarshall unmarshaller
Last synced: 13 Oct 2025
https://github.com/andrewstuart/Goq
A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library
decoder golang goquery html html-unmarshaling scrape selector selectors struct unmarshaling unmarshall unmarshaller
Last synced: 12 Mar 2025
https://github.com/evyatarmeged/humanoid
Node.js package to bypass CloudFlare's anti-bot JavaScript challenges
anti-bot anti-bot-page bot bypass bypass-waf humanoid scrape scraping web-scraping
Last synced: 10 Apr 2025
https://github.com/essamamdani/search-result-scraper-markdown
This project provides a powerful web scraping tool that fetches search results and converts them into Markdown format using FastAPI, SearXNG, and Browserless. It includes the capability to use proxies for web scraping and handles HTML content conversion to Markdown efficiently.
ai browserless exa exaai fastapi firecrawl groq httpx jina markdown metasearch openai proxy python reader requests scrape searxng
Last synced: 28 Oct 2025
https://github.com/drudge/n8n-nodes-puppeteer
n8n node for browser automation using Puppeteer
browser chromium n8n n8n-nodes pdf proxy-server puppeteer scrape scraping screenshot screenshots script stealth-mode
Last synced: 05 Oct 2025
https://github.com/Jimut123/jimutmap
API to quickly get an enormous amount of high-resolution satellite images from satellites.pro through multi-threading! Create your own map dataset. Bringing data to humans.
api areal-image beautifulsoup4 dataset deep-learning-dataset enormous fake-header geo high image images jimutmap ml multithreading resolution satellite satellite-data scrape scraping segmentation-mask
Last synced: 07 Apr 2025
https://github.com/tegridydev/auto-md
Convert Files / Folders / GitHub Repos Into AI / LLM-ready Files
ai ai-tool convert github llm llm-tools md python python-convert python-script scrape
Last synced: 05 Apr 2025
https://github.com/JMousqueton/ransomware.live
🏴‍☠️💰 Another Ransomware gang tracker
cti encyclopedia negotiation parse python ransom ransomware scrape screenshot threat-intelligence threatintel victim
Last synced: 30 Mar 2025
https://github.com/jmousqueton/ransomware.live
🏴‍☠️💰 Another Ransomware gang tracker
cti encyclopedia negotiation parse python ransom ransomware scrape screenshot threat-intelligence threatintel victim
Last synced: 22 Mar 2025
https://github.com/html2rss/html2rss
📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.
atom-feed extract feed feed-configs html html2rss json rss rss-aggregator rss-bridge rss-builder rss-feed rss-feed-scraper rss-generator ruby scrape scraper scraping scraping-websites yahoo-pipes
Last synced: 14 Mar 2025
https://github.com/fefit/visdom
A library with a jQuery-like API for HTML parsing, node selection, and node mutation, suitable for web scraping and HTML confusion.
confuse css-selector dom-manipulation html-parse jquery rust scrape
Last synced: 27 Dec 2025
https://github.com/serp-spider/core
🕷️ The PHP SERP Spider - A search engine scraper
Last synced: 05 Apr 2025
https://github.com/warifp/shopee-scrape
Shopee Scrape is a tool for collecting the data you need, such as photos, prices, names, store locations, and more.
curl curl-functions curl-library curlphp indonesia marketplace php php-library scrape scrape-images scrape-websites scraped-data scraper scraper-engine shopee shopee-api
Last synced: 22 Mar 2025
https://github.com/issung/gchan
Scrape boards & threads from 4chan. Download images, videos and HTML if desired.
4chan 4chan-downloader 4chan-scraper csharp daemon dotnet gchan scrape scraper winforms
Last synced: 19 Aug 2025
https://github.com/SilentDemonSD/FZBypassBot
An Elegant, Fast, Multi-Threaded Bypass Bot for Bigger Deeds. Try Now!
bypass bypasscaptcha bypassing link-bypasser link-shortener scrape scrapers scraping-websites telegram-bot
Last synced: 08 Jul 2025
https://github.com/lapwat/reCatchable
Turn a site into a book. Download a whole website and upload it to your reMarkable.
ebook epub remarkable remarkable-tablet remarkable-tablets scrape scraper
Last synced: 05 Apr 2025
https://github.com/vanyasem/vk-scraper
Scrape VK media
api downloader python scrape scraper vk vk-api vkontakte vkontakte-api
Last synced: 23 Oct 2025
https://github.com/swader/diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
ai artificial-intelligence bot crawl crawling diffbot machine-learning nlp php scrape scraped-data scraper scraping
Last synced: 21 Aug 2025
https://github.com/brenns10/tswift
MetroLyrics API for Python
lyrics python scrape taylor-swift
Last synced: 07 Oct 2025
https://github.com/NightMachinery/readability-cli
A CLI for Mozilla Readability. Get clean, uncluttered, ready-to-read HTML from any webpage!
cleaner cli html mercury-parser mozilla-readability read readability reader sanitize-html scrape scraping scraping-websites webpage
Last synced: 06 Apr 2025
https://github.com/montoyamoraga/scrapers
scrapers for building your own image databases
scrape scraper scraping selenium
Last synced: 19 Mar 2025
https://github.com/sanjaysunil/email-scraper
Generate thousands of temporary emails within seconds!
automation email email-generator email-scraper email-scrapping email-service python scrape scraper temp-email temporary web-scraper web-scraping
Last synced: 26 Oct 2025
https://github.com/oxylabs/scrape-google-python
In this tutorial, we showcase how to scrape public Google data with Python and Oxylabs API.
google python python-scraping python-web-scraper scrape scrape-google scraper-api scraping scraping-api web-scraping
Last synced: 21 Sep 2025
https://github.com/obscurely/falion
An open-source, privacy-focused tool and crate, written in Rust, for interacting with programming resources (like Stack Overflow) quickly, efficiently, and asynchronously/in parallel using the CLI or GUI.
async cli fast parallel resources rust scrape stackoverflow ui
Last synced: 07 Apr 2025
https://github.com/alias-rahil/lyrics-searcher
A Simple Lyrics Finder That Just Works
azlyrics cli genius google javascript lyrics-fetcher lyrics-finder lyrics-searcher musixmatch rapidapi scrape song typescript
Last synced: 10 Oct 2025
https://github.com/saadmanrafat/imgur-scraper
Retrieve years of imgur.com's data without any authentication.
command-line-tool data-mining hacktoberfest2021 imgur imgur-api imgur-scraper machine-learning no-authentication pypi python scrape
Last synced: 16 Mar 2025
https://github.com/projectwallace/extract-css-core
Extract all CSS from a given url, both server side and client side rendered.
css extract extract-css inline-styling js-styling scrape wallace
Last synced: 19 Sep 2025
https://github.com/stamen/the-ultimate-tile-stitcher
stitch & scrape tiles from slippy map services
carto-tools scrape slippy-map stitch tilemap
Last synced: 14 Jul 2025
https://github.com/ph-7/crawling-emails
Very simple bash script to crawl email addresses from a specific website.
bash crawler email email-scraper scrape scrape-email scraper scraping shell wget
Last synced: 22 Aug 2025
https://github.com/searchformyusername/fastproxy
MultiThreaded Application to Scrape Working Web Proxies
beginner-friendly contributions-welcome hacktoberfest hacktoberfest-accepted hide ip live proxy pytest python scrape scraping security threading web
Last synced: 16 Jun 2025
https://github.com/1uc1f3r616/fastproxy
MultiThreaded Application to Scrape Working Web Proxies
beginner-friendly contributions-welcome hacktoberfest hacktoberfest-accepted hide ip live proxy pytest python scrape scraping security threading web
Last synced: 13 Apr 2025
https://github.com/oxylabs/oxylabs-sdk-go
Go SDK for the Oxylabs Scraper APIs.
api mslm oxylabs oxylabs-library proxy scrape scraper sdk sdk-oxylabs serp-api serp-api-go
Last synced: 31 Aug 2025
https://github.com/pedro-stanaka/prom-scrape-analyzer
Get insights from your scrape endpoint (even if it speaks Protobuf)
metrics observability prometheus scrape
Last synced: 06 Apr 2025
https://github.com/scriptsmith/insta-scrape
Scrape Instagram
hashtag instagram instagram-api instagram-hashtag scrape scraping
Last synced: 09 Apr 2025
https://github.com/fernandod1/producthunt-scraper
Scraper script for the well-known website Producthunt.com. Scrapes all offers and saves them to an Excel spreadsheet.
crawler crawling crawling-sites data-mining datamining producthunt producthunt-api producthunt-users python python-script python3 scrape scraped-data scraper scraper-engine scraping scraping-bot scraping-python scraping-tool scraping-websites
Last synced: 16 Jun 2025
https://github.com/toolworks-dev/auto-md
Convert Files / Folders / GitHub Repos Into AI / LLM-ready Files
ai ai-tool convert github llm llm-tools md python python-convert python-script scrape webapp
Last synced: 13 Apr 2025
https://github.com/shashwatah/gitwiz
A handy portal to query public repos on multiple version control platforms.
api express git github gitlab gitwiz graphql handlebars javascript nodejs portal repositories repository scrape search search-engine ts-node types typescript version-control
Last synced: 12 Apr 2025
https://github.com/ohmybahgosh/FONTS_DOT_COM_RIPPER
Script to extract entire font families from Fonts.com, rips them as woff2 and final output includes woff2 and ttf files
bash bash-script curl datamining download-fonts font fonts scrape scrape-websites scraper sed shell-script typography woff2 woff2-files xidel
Last synced: 27 Mar 2025
https://github.com/ph-7/emails-scraper
🐏 Simple PHP Email Grabber that collects emails from a txt file containing a list of URLs (one URL per line).
crawler email email-grabber email-scraper grabber php php-scraper scrape scrape-email scraper scraping script
Last synced: 09 Apr 2025
https://github.com/tomas2d/puppeteer-table-parser
Scrape and parse HTML tables with the Puppeteer table parser.
csv html javascript puppeteer puppeteer-tables scrape scraping table typescript
Last synced: 22 Aug 2025
https://github.com/waynechang65/ptt-crawler
ptt-crawler is a web crawler module designed to scrape data from Ptt.
api crawl crawler javascript nodejs ptt scrape scraper scraping spider typescript web-crawler webcrawler
Last synced: 08 Oct 2025
https://github.com/ninja-beans/cloudflare-iuam-solver
CloudflareIuamSolver is a Java library for getting past Cloudflare's "I am Under Attack Mode".
anti-bot-page cli cloudflare cloudflare-bypass curl java protected-page scrape scraping scraping-websites
Last synced: 14 Apr 2025
https://github.com/thatsinewave/spy.pet-info
This repository serves as an index for all the information the community has gathered on the Spy.pet situation, as well as my own tables and tools written for these investigations. Spy.pet was taken down by Discord on 11.08.2024; this is just an archive of which bots were in each server.
bot bots database discord discord-api discord-bot discord-data discord-py discord-token scrape scraper scraping scraping-websites scrapper security security-scanner security-tools spy-pet spypet thatsinewave
Last synced: 30 Apr 2025
https://github.com/meomundep/meomundep-airdrop-data-base.
Contribute stars if you want me to make scripts fast :)
airdrop airdrop-claim-bot airdrop-farm airdrop-free airdrops-bot airdrops-tools data data-base github meomundep scrape web
Last synced: 08 Aug 2025
https://github.com/miroshnikov/scrapyteer
Web crawling & scraping framework for Node.js on top of headless Chrome browser
crawer crawling crawling-framework crawling-sites crawling-tool headless scrape scraper scraping scraping-websites scrapy scrapy-crawler spider spider-framework web-crawler web-crawling web-scraping web-scraping-nodejs
Last synced: 26 Oct 2025
https://github.com/rishi-raj-jain/twitterusernamefromuserid
twitterUsernameviaUserID is an advanced Twitter scraping tool written in Python and Selenium that scrapes Twitter usernames from Twitter IDs without using Twitter's API.
automation chrome chromedriver json opensource python python3 scrape selenium time tweet-usernames twint twitter twitter-api
Last synced: 05 May 2025
https://github.com/ruichongliu/Crawler_pubg.op.gg
This is a web crawler for pubg.op.gg, written by Ruichong Liu. It scrapes PUBG (PlayerUnknown's Battlegrounds) game data.
beautifulsoup4 crawler pubg python3 scrape selenium
Last synced: 25 Mar 2025
https://github.com/harshcasper/blind-app-reviews
Scraped reviews of over 25 companies from the Blind App ⚡️
blind-app company-reviews dataset nlp scrape scraped-data text-mining webscraping
Last synced: 20 Feb 2025
https://github.com/supadata-ai/mcp
Official Supadata MCP Server - Adds powerful video & web scraping to Cursor, Claude and any other LLM clients.
ai crawler llm mcp scrape tiktok transcript whisper youtube
Last synced: 14 Oct 2025
https://github.com/petrpatek/airbnb-scraper
Apify public actor for scraping Airbnb homes.
airbnb airbnb-api apify crawler data-extraction scrape
Last synced: 20 Mar 2025
https://github.com/malina/metascraper
Metascraper is a Crystal library for web scraping.
Last synced: 15 Mar 2025
https://github.com/jkamlah/scrape-editorial-board
Scraping editorial board of journals
editorial-board journals scrape
Last synced: 29 Jun 2025
https://github.com/richardsondev/pse-outages
Tracking Puget Sound Energy outage history since March 2021
git-history git-scrape git-scraper git-scraping outages power-outage power-outages puget-sound-data pugetsound scrape scraping washington-state
Last synced: 06 Jan 2026
https://github.com/trevorhobenshield/reddit-api-client
Reddit API
api automation bot reddit scrape
Last synced: 12 Apr 2025
https://github.com/agentfabulous/aosp-bulletin-scrape
A simple scraper to retrieve Android AOSP Security Bulletins.
android aosp beautifulsoup beautifulsoup4 python scrape scraper security security-bulletin security-tools shell
Last synced: 21 Mar 2025
https://github.com/aigptcode/osint
⭐ Project Review: OSINT Data Collection Tool ⭐ This project provides a foundational tool for OSINT (Open-Source Intelligence) data collection, using Python to aggregate information from social media platforms, search engines, and WHOIS data. The code is modular and easy to follow.
ai api google hack llm osint osint-python osint-tool scrape scraping twitter twitter-google
Last synced: 29 Oct 2025
https://github.com/vante-dev/ookla-speedtest-results
Scrape data from speedtest.net
data-structures ookla ookla-speedtest ooklaserver-speedtest scrape speedtest
Last synced: 22 Apr 2025
https://github.com/social-media-public-analysis/dozent
Dozent is a powerful downloader for collecting large amounts of Twitter data from the Internet Archive.
accelerator dask download followers following image likes python save-image scrape scraper scraping selenium social-media tweets twitter webdriver
Last synced: 06 Oct 2025
https://github.com/koraa/iacr-events-scraper
Scrape https://iacr.org/events/ and export an ICS file for your calendar
Last synced: 28 Jul 2025
https://github.com/jeffwilliams/nest-scrape
Scrape Nest temperature sensor information from the Nest website
Last synced: 11 Sep 2025
https://github.com/theohbrothers/pssitescraper
Cmdlets for scraping a site.
html powershell pwsh scrape site sitemap uri uri-scheme url website
Last synced: 12 Apr 2025
https://github.com/utkarsh914/serp-extended
This npm module lets you run Google searches with or without proxies. It provides different options for scraping the Google results (either the list of referenced sites or the number of results).
google googlescraper scrape scrapegoogle scraper seo serp serps
Last synced: 26 Jul 2025
https://github.com/adambielat/roscrape
A Selenium-based tool used to scrape all the owners of a limited and send them friend requests.
automation roblox rolimons scrape scraping selenium
Last synced: 20 Mar 2025
https://github.com/techguy940/proxies-scraper
Scraper for HTTP, HTTPS, SOCKS4, and SOCKS5 proxies
anonymous api elite free free-proxies free-proxy http https proxies proxy-list proxy-scraper scrape scraper socks4 socks5 transparent
Last synced: 13 Apr 2025
https://github.com/brianary/selecthtml
A PowerShell module for extracting data from HTML using XPath
fsharp html html-parsing powershell powershell-module scrape xpath
Last synced: 24 Apr 2025
https://github.com/gayanukabulegoda/web-scraping-starter-kit
Repository designed to help beginners easily grasp the basics of web scraping, offering simple guides and examples to build a strong foundation.
python python-web-scraper python3 scrap-data scrape scraping scraping-data scraping-images scraping-python scraping-web simple-scraping web-scraper web-scraping web-scraping-project web-scraping-python web-scraping-tutorials web-scrapper-python web-scrapping
Last synced: 27 Sep 2025
https://github.com/samfisherirl/sxweet_scraper_gui
Sxweet_Scraper_GUI, forked from https://github.com/Altimis/Scweet, continuing updates and changes. Twitter educational research reader.
Last synced: 14 Sep 2025
https://github.com/ahmadxgani/yt-downloader
Download a playlist or a single video from YouTube
cli javascript nodejs scrape youtube-dl
Last synced: 15 Apr 2025