An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with scrape

A curated list of projects in awesome lists tagged with scrape.

https://github.com/twintproject/twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

elasticsearch kibana osint python scrape scrape-followers scrape-following scrape-likes tweep tweets twint twitter

Last synced: 05 Oct 2025

https://github.com/anorov/cloudflare-scrape

A Python module to bypass Cloudflare's anti-bot page.

anti-bot-page cloudflare protected-page scrape scraping-websites

Last synced: 13 May 2025

https://github.com/d60/twikit

Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

bot client python python-web-scraper python3 scrape scraper scraping search tweepy twitter twitter-api twitter-bot twitter-client twitter-internal-api twitter-scraper wrapper x x-api

Last synced: 15 May 2025

https://github.com/microlinkhq/metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.

metadata parse scrape

Last synced: 13 May 2025

https://github.com/altimis/scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

dowload-images followers following python save-image scrape scrape-followers scrape-following scrape-images scrape-likes scrape-tweets scraper scraping selenium-webdriver tweets twitter twitter-scraper

Last synced: 14 May 2025

https://github.com/glebarez/cero

Scrape domain names from SSL certificates of arbitrary hosts

domain-names recon scrape ssl tls websecurity

Last synced: 12 Apr 2025

https://github.com/austinoboyle/scrape-linkedin-selenium

`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.

linkedin python scrape scraper scraping selenium selenium-webdriver web-scraper web-scraping

Last synced: 04 Apr 2025

https://github.com/unixfox/pupflare

A webpage proxy that sends requests through Chromium (Puppeteer); it can be used to bypass Cloudflare anti-bot / anti-DDoS protection from any application (such as curl).

anti-bot-page chromium cloudflare cloudflare-bypass cloudflare-scrape docker koa protected-page proxy puppeteer scrape scraping-websites

Last synced: 09 Apr 2025

https://github.com/danieldotnl/ha-multiscrape

Home Assistant custom component for scraping multiple values (HTML, XML, or JSON) from a single HTTP request, with a separate sensor/attribute for each value. Supports (login) form-submit functionality.

hacs home-assistant home-assistant-custom rest scrape scraper scraping sensor

Last synced: 07 Apr 2025

https://github.com/anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 13 Apr 2025

https://github.com/andrewstuart/goq

A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library

decoder golang goquery html html-unmarshaling scrape selector selectors struct unmarshaling unmarshall unmarshaller

Last synced: 13 Oct 2025

https://github.com/jaredlgillespie/proxyscrape

Python library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5).

proxy python python3 scrape scraper

Last synced: 11 May 2025

https://github.com/evyatarmeged/humanoid

Node.js package to bypass Cloudflare's anti-bot JavaScript challenges

anti-bot anti-bot-page bot bypass bypass-waf humanoid scrape scraping web-scraping

Last synced: 10 Apr 2025

https://github.com/essamamdani/search-result-scraper-markdown

This project provides a powerful web scraping tool that fetches search results and converts them into Markdown format using FastAPI, SearXNG, and Browserless. It includes the capability to use proxies for web scraping and handles HTML content conversion to Markdown efficiently.

ai browserless exa exaai fastapi firecrawl groq httpx jina markdown metasearch openai proxy python reader requests scrape searxng

Last synced: 28 Oct 2025

https://github.com/rocketlaunchr/google-search

Scrape Google search results

api go golang google scrape search

Last synced: 23 Oct 2025

https://github.com/Jimut123/jimutmap

API to get an enormous amount of high-resolution satellite images from satellites.pro quickly through multi-threading! Create your own map dataset. Bringing data to humans.

api areal-image beautifulsoup4 dataset deep-learning-dataset enormous fake-header geo high image images jimutmap ml multithreading resolution satellite satellite-data scrape scraping segmentation-mask

Last synced: 07 Apr 2025

https://github.com/tegridydev/auto-md

Convert Files / Folders / GitHub Repos Into AI / LLM-ready Files

ai ai-tool convert github llm llm-tools md python python-convert python-script scrape

Last synced: 05 Apr 2025

https://github.com/drkain/scrape-youtube

A lightning fast package to scrape YouTube search results

bot discord maintainer-wanted movie nodejs npm playlist scrape search video youtube

Last synced: 06 Apr 2025

https://github.com/html2rss/html2rss

📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.

atom-feed extract feed feed-configs html html2rss json rss rss-aggregator rss-bridge rss-builder rss-feed rss-feed-scraper rss-generator ruby scrape scraper scraping scraping-websites yahoo-pipes

Last synced: 14 Mar 2025

https://github.com/fefit/visdom

A library with a jQuery-like API for HTML parsing, node selection, and node mutation, suitable for web scraping and HTML confusion.

confuse css-selector dom-manipulation html-parse jquery rust scrape

Last synced: 27 Dec 2025

https://github.com/serp-spider/core

🕷 The PHP SERP Spider - A search engine scraper

scrap scrape scraping serp

Last synced: 05 Apr 2025

https://github.com/warifp/shopee-scrape

Shopee Scrape is a tool for collecting the data you need, such as photos, prices, names, store locations, and more.

curl curl-functions curl-library curlphp indonesia marketplace php php-library scrape scrape-images scrape-websites scraped-data scraper scraper-engine shopee shopee-api

Last synced: 22 Mar 2025

https://github.com/issung/gchan

Scrape boards & threads from 4chan. Download images, videos and HTML if desired.

4chan 4chan-downloader 4chan-scraper csharp daemon dotnet gchan scrape scraper winforms

Last synced: 19 Aug 2025

https://github.com/SilentDemonSD/FZBypassBot

An Elegant, Fast, Multi-Threaded Bypass Bot for Bigger Deeds. Try Now!!

bypass bypasscaptcha bypassing link-bypasser link-shortener scrape scrapers scraping-websites telegram-bot

Last synced: 08 Jul 2025

https://github.com/lapwat/reCatchable

Turn a site into a book. Download a whole website and upload it to your reMarkable.

ebook epub remarkable remarkable-tablet remarkable-tablets scrape scraper

Last synced: 05 Apr 2025

https://github.com/swader/diffbot-php-client

[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library

ai artificial-intelligence bot crawl crawling diffbot machine-learning nlp php scrape scraped-data scraper scraping

Last synced: 21 Aug 2025

https://github.com/brenns10/tswift

MetroLyrics API for Python

lyrics python scrape taylor-swift

Last synced: 07 Oct 2025

https://github.com/NightMachinery/readability-cli

A CLI for Mozilla Readability. Get clean, uncluttered, ready-to-read HTML from any webpage!

cleaner cli html mercury-parser mozilla-readability read readability reader sanitize-html scrape scraping scraping-websites webpage

Last synced: 06 Apr 2025

https://github.com/montoyamoraga/scrapers

Scrapers for building your own image databases

scrape scraper scraping selenium

Last synced: 19 Mar 2025

https://github.com/fritzh321/logo-scrape

🕷🚀 Scrapes/crawls the logo from the provided URL(s)/website for your Node.js applications.

crawler fetch logo nodejs scrape website

Last synced: 22 Feb 2025

https://github.com/oxylabs/scrape-google-python

In this tutorial, we showcase how to scrape public Google data with Python and the Oxylabs API.

google python python-scraping python-web-scraper scrape scrape-google scraper-api scraping scraping-api web-scraping

Last synced: 21 Sep 2025

https://github.com/obscurely/falion

An open-source, privacy-focused tool and crate, written in Rust, for interacting with programming resources (like Stack Overflow) quickly, efficiently, and asynchronously/in parallel using the CLI or GUI.

async cli fast parallel resources rust scrape stackoverflow ui

Last synced: 07 Apr 2025

https://github.com/scalawilliam/amazon-wishlist-api

Scrape Amazon wishlist and provide an API. Play 2.5, JSoup, React.

amazon jsoup react scala scrape

Last synced: 14 Apr 2025

https://github.com/projectwallace/extract-css-core

Extract all CSS from a given URL, both server-side and client-side rendered.

css extract extract-css inline-styling js-styling scrape wallace

Last synced: 19 Sep 2025

https://github.com/stamen/the-ultimate-tile-stitcher

Stitch & scrape tiles from slippy map services

carto-tools scrape slippy-map stitch tilemap

Last synced: 14 Jul 2025

https://github.com/ph-7/crawling-emails

Very simple Bash script to crawl email addresses from a specific website.

bash crawler email email-scraper scrape scrape-email scraper scraping shell wget

Last synced: 22 Aug 2025

https://github.com/veltzer/pyscrapers

Project to produce various useful scrapers

download facebook images instagram pics scrape social vk

Last synced: 10 Apr 2025

https://github.com/pedro-stanaka/prom-scrape-analyzer

Get insights from your scrape endpoint (even if it speaks Protobuf)

metrics observability prometheus scrape

Last synced: 06 Apr 2025

https://github.com/toolworks-dev/auto-md

Convert Files / Folders / GitHub Repos Into AI / LLM-ready Files

ai ai-tool convert github llm llm-tools md python python-convert python-script scrape webapp

Last synced: 13 Apr 2025

https://github.com/ohmybahgosh/FONTS_DOT_COM_RIPPER

Script to extract entire font families from Fonts.com; it rips them as woff2, and the final output includes woff2 and ttf files.

bash bash-script curl datamining download-fonts font fonts scrape scrape-websites scraper sed shell-script typography woff2 woff2-files xidel

Last synced: 27 Mar 2025

https://github.com/ph-7/emails-scraper

🐏 Simple PHP email grabber that gets emails from a txt file containing a list of URLs (one URL per line).

crawler email email-grabber email-scraper grabber php php-scraper scrape scrape-email scraper scraping script

Last synced: 09 Apr 2025

https://github.com/zyrouge/node-youtube-ext

A simple YouTube scraper.

scrape youtube ytdl

Last synced: 27 Mar 2025

https://github.com/tomas2d/puppeteer-table-parser

Scrape and parse HTML tables with the Puppeteer table parser.

csv html javascript puppeteer puppeteer-tables scrape scraping table typescript

Last synced: 22 Aug 2025

https://github.com/waynechang65/ptt-crawler

ptt-crawler is a web crawler module designed to scrape data from Ptt.

api crawl crawler javascript nodejs ptt scrape scraper scraping spider typescript web-crawler webcrawler

Last synced: 08 Oct 2025

https://github.com/ninja-beans/cloudflare-iuam-solver

CloudflareIuamSolver is a Java library for breaking through Cloudflare's "I am Under Attack Mode".

anti-bot-page cli cloudflare cloudflare-bypass curl java protected-page scrape scraping scraping-websites

Last synced: 14 Apr 2025

https://github.com/thatsinewave/spy.pet-info

This repository serves as an index of all the info the community has gathered on the Spy.pet situation, as well as my own tables and tools written for these investigations. Spy.pet was taken down by Discord on 11.08.2024; this is just an archive of which bots were in each server.

bot bots database discord discord-api discord-bot discord-data discord-py discord-token scrape scraper scraping scraping-websites scrapper security security-scanner security-tools spy-pet spypet thatsinewave

Last synced: 30 Apr 2025

https://github.com/rishi-raj-jain/twitterusernamefromuserid

twitterUsernameviaUserID is an advanced Twitter scraping tool, written in Python with Selenium, that scrapes tweet usernames from Twitter IDs without using Twitter's API.

automation chrome chromedriver json opensource python python3 scrape selenium time tweet-usernames twint twitter twitter-api

Last synced: 05 May 2025

https://github.com/ruichongliu/Crawler_pubg.op.gg

This is a web crawler for pubg.op.gg, written by Ruichong Liu. Scrapes PUBG (PlayerUnknown's Battlegrounds) game data.

beautifulsoup4 crawler pubg python3 scrape selenium

Last synced: 25 Mar 2025

https://github.com/harshcasper/blind-app-reviews

Scraped reviews of over 25 companies from the Blind App ⚡️

blind-app company-reviews dataset nlp scrape scraped-data text-mining webscraping

Last synced: 20 Feb 2025

https://github.com/supadata-ai/mcp

Official Supadata MCP Server - Adds powerful video & web scraping to Cursor, Claude and any other LLM clients.

ai crawler llm mcp scrape tiktok transcript whisper youtube

Last synced: 14 Oct 2025

https://github.com/petrpatek/airbnb-scraper

Apify public actor for scraping Airbnb homes.

airbnb airbnb-api apify crawler data-extraction scrape

Last synced: 20 Mar 2025

https://github.com/malina/metascraper

Metascraper is a Crystal library for web scraping.

crystal scrape scraped-data

Last synced: 15 Mar 2025

https://github.com/ruddra/django1.7-scrapy1.0.3

An example project built using Django 1.7 and Scrapy 1.0.3

django python scrape scrapping scrapy

Last synced: 06 Oct 2025

https://github.com/jkamlah/scrape-editorial-board

Scraping editorial board of journals

editorial-board journals scrape

Last synced: 29 Jun 2025

https://github.com/aigptcode/osint

⭐ Project Review: OSINT Data Collection Tool ⭐ This project provides a foundational tool for OSINT (Open-Source Intelligence) data collection, using Python to aggregate information from social media platforms, search engines, and WHOIS data. The code is modular and easy to follow.

ai api google hack llm osint osint-python osint-tool scrape scraping twitter twitter-google

Last synced: 29 Oct 2025

https://github.com/social-media-public-analysis/dozent

Dozent is a powerful downloader that is used to collect large amounts of Twitter data from the internet archive.

accelerator dask download followers following image likes python save-image scrape scraper scraping selenium social-media tweets twitter webdriver

Last synced: 06 Oct 2025

https://github.com/asjadnaqvi/pakistan-national-budgets

Data scraped from Pakistan Federal Budget PDFs

budget dta excel federal national pakistan scrape

Last synced: 04 Oct 2025

https://github.com/brandenc40/safer

An API to scrape data from the Department of Transportation's Safety and Fitness Electronic Records (SAFER) System.

company dot fmcsa golang mc mx safer scaping scrape scraper snapshot usdot webscraping

Last synced: 11 Apr 2025

https://github.com/koraa/iacr-events-scraper

Scrape https://iacr.org/events/ and export an ICS file for your calendar

calendar crypto scrape

Last synced: 28 Jul 2025

https://github.com/jeffwilliams/nest-scrape

Scrape Nest temperature sensor information from the Nest website

nest scrape

Last synced: 11 Sep 2025

https://github.com/utkarsh914/serp-extended

This npm module lets you run Google searches with or without proxies. It provides different options for scraping the Google results (either the list of referenced sites or the number of results).

google googlescraper scrape scrapegoogle scraper seo serp serps

Last synced: 26 Jul 2025

https://github.com/adambielat/roscrape

A Selenium-based tool used to scrape and send friend requests to all the owners of a limited.

automation roblox rolimons scrape scraping selenium

Last synced: 20 Mar 2025

https://github.com/brianary/selecthtml

A PowerShell module for extracting data from HTML using XPath

fsharp html html-parsing powershell powershell-module scrape xpath

Last synced: 24 Apr 2025

https://github.com/simonwaldherr/scrapeems

ScrapeEMS is a #golang #cli tool to scrape the EMS (#ELDIS Management Suite)

eldis ems golang login scrape scraper web

Last synced: 30 Mar 2025

https://github.com/acidjazz/tubestrip

PHP Laravel YouTube Scraper

crawl guzzle laravel scrape youtube

Last synced: 22 Jun 2025

https://github.com/samfisherirl/sxweet_scraper_gui

Sxweet_Scraper_GUI, forked from https://github.com/Altimis/Scweet, continuing updates and changes. Twitter educational research reader.

scrape scweet sxweet twitter

Last synced: 14 Sep 2025

https://github.com/ahmadxgani/yt-downloader

Download playlist videos or a single video from YouTube

cli javascript nodejs scrape youtube-dl

Last synced: 15 Apr 2025

https://github.com/sarthakpranesh/gyan

Get Google Images and Wikipedia information on {You Say What}

api golang google gyan knowledge scrape wikipedia

Last synced: 23 Mar 2025