Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with scraper
A curated list of projects in awesome lists tagged with scraper.
https://github.com/twiny/spidy
Domain names collector - Crawl websites and collect domain names along with their availability status.
backlinks crawler domain expired-domain golang scraper seotools spider
Last synced: 01 Aug 2024
https://github.com/HarryShomer/Hockey-Scraper
Python Package for scraping NHL Play-by-Play and Shift data
hockey nhl python scraper sports web-scraping
Last synced: 01 Aug 2024
https://github.com/zehina/webtoon-downloader
Webtoons Scraper able to download all chapters of any series wanted.
manhwa manhwa-scraper python python3 scraper webtoon-crawler webtoon-downloader webtoons webtoons-downloader
Last synced: 28 Sep 2024
https://github.com/toadlyBroodle/spam-bot-3000
Social media research and promotion, semi-autonomous CLI bot
automation bot cli command-line-tool facebook firefox geckodriver hashtag instagram keyword marketing promotion python reddit research scrape-dumps scraper selenium social-media twitter
Last synced: 01 Aug 2024
https://github.com/Zehina/Webtoon-Downloader
Webtoons Scraper able to download all chapters of any series wanted.
manhwa manhwa-scraper python python3 scraper webtoon-crawler webtoon-downloader webtoons webtoons-downloader
Last synced: 01 Aug 2024
https://github.com/fernandod1/instagram-to-discord
Monitor an Instagram user account and automatically post new images to a Discord channel via a webhook. Working 2022!
discord discord-bot discordapp discordbot instagram instagram-bot instagram-downloader instagram-photos instagram-scraper monitor monitoring-scripts monitors python python3 scraper scraping scraping-python scraping-websites scrapper webhook-discord
Last synced: 01 Oct 2024
https://github.com/voliveirajr/seleniumcrawler
An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
asp-net python scraper scraping scraping-websites scrapper scrapy selenium selenium-webdriver webcrawler webcrawling
Last synced: 28 Sep 2024
https://github.com/alash3al/scraply
Scraply: a simple DOM scraper to fetch information from any HTML-based website
crawler crawling dom golang scraper scrapers scraping-websites scrapy server
Last synced: 01 Aug 2024
https://github.com/yaboipy/godm
The Fast And Advanced Discord Multi-Tool
automation bot discord discord-multi-tool discord-multitool discord-raid-tool discord-token discordapp dm dm-spammer mass-dm massdm multitool proxy raid-bot scraper spammer websocket
Last synced: 26 Sep 2024
https://github.com/urbanadventurer/bing-ip2hosts
bingip2hosts is a Bing.com web scraper that discovers websites by IP address
bing discovery hostnames ipaddress kali kali-linux osint osint-reconnaissance osint-tool reconnaissance scraper search-engine webscraping
Last synced: 31 Jul 2024
https://github.com/luckylittle/blinkist-m4a-downloader
Grabs all of the audio files from all of the Blinkist books
audiobooks blinkist books crawler data-archiving data-mining data-processing go golang scraper spider
Last synced: 31 Jul 2024
https://github.com/basilioss/obsidian-scrapers
Get information from link for Obsidian
obsidian obsidian-md parser scraper
Last synced: 01 Aug 2024
https://github.com/ohmybahgosh/YT-DLP-SCRIPTS
...Just a place for me to share my various YT-DLP & related bash scripts.
bash bash-script downloading ffmpeg ffmpeg-script parser scraper shell-script youtube-dl yt-dlp
Last synced: 31 Jul 2024
https://github.com/html2rss/html2rss
📰 Build RSS 2.0 feeds from websites (and JSON APIs) with a few CSS selectors.
atom-feed extract feed feed-configs html html2rss json rss rss-aggregator rss-bridge rss-builder rss-feed rss-feed-scraper rss-generator ruby scrape scraper scraping scraping-websites yahoo-pipes
Last synced: 30 Jul 2024
https://github.com/Linch1/WeChartWeb3
Build a PooCoin clone: scrape all the prices from PancakeSwap or any similar DEX, build a historical record, and offer an API to your users.
blockchain bsc cryptocurrency dex dextools ethereum historical-data nodejs pancakeswap poocoin scraper uniswap
Last synced: 02 Aug 2024
https://github.com/lachlanjc/predictcovid
Visualize & track the 2020 COVID-19 pandemic by country.
coronavirus covid-19 covid19 dataviz prisma2 redwoodjs scraper
Last synced: 02 Oct 2024
https://github.com/aahouzi/instagram-scraper-2021
Scrape Instagram content and stories, using a new technique based on the har file (No Token + No public API).
browsermob-proxy data facebook facebook-graph-api graphql-api instagram instagram-api instagram-bot instagram-crawler instagram-feed instagram-scraper instagram-stories meta scraper selenium webscraping
Last synced: 27 Sep 2024
https://github.com/cdimascio/essence
Automatically extract the main text content (and more) from an HTML document
extractor hacktoberfest html-extractor scraper web-content-extractor webpage-extractor website-extractor
Last synced: 02 Oct 2024
https://github.com/aahouzi/Instagram-Scraper-2021
Scrape Instagram content and stories, using a new technique based on the har file (No Token + No public API).
browsermob-proxy data facebook facebook-graph-api graphql-api instagram instagram-api instagram-bot instagram-crawler instagram-feed instagram-scraper instagram-stories meta scraper selenium webscraping
Last synced: 31 Jul 2024
https://github.com/Crinibus/scraper
Web scraper for scraping, tracking and visualizing prices of products on various websites.
amazon avcables computersalg coolshop ebay elgiganten expert komplett mm-vision newegg prices products proshop python scrape-prices scraper sharkgaming shein tech-scraper web-scraping
Last synced: 01 Aug 2024
https://github.com/codingforentrepreneurs/Web-Scraping
Learn how to leverage Python's amazing tools to scrape data from other websites. The end goal of this course is to scrape blogs to analyze trending keywords and phrases. We'll be using Python 3.6, Requests, BeautifulSoup, Asyncio, Pandas, Numpy, and more!
aysncio beautifulsoup beautifulsoup4 joincfe numpy pandas python python-requests python3 requests scraper sraping tutorial web-scraping
Last synced: 05 Aug 2024
https://github.com/situmorang-com/whatsapp-group-contacts-scraper
How to scrape WhatsApp group contacts from https://web.whatsapp.com/
javascript scraper whatsapp whatsapp-group whatsapp-parser whatsapp-web
Last synced: 30 Sep 2024
https://github.com/shailshouryya/yt-videos-list
Create and **automatically** update a list of all videos on a YouTube channel (in txt/csv/md form) via YouTube bot with end-to-end web scraping - no API tokens required. Multi-threaded support for YouTube videos list updates.
automation bravedriver chromedriver csv firefox-headless geckodriver operadriver safaridriver scraper selenium txt youtube youtube-api youtube-channel youtube-dl youtube-downloader youtube-playlist yt yt-downloader ytdl
Last synced: 28 Sep 2024
https://github.com/Karlheinzniebuhr/the-weather-scraper
A Lightweight Weather Scraper
datasets machine-learning scraper weather wunderground
Last synced: 01 Aug 2024
https://github.com/pavlovtech/WebReaper
Web scraper, crawler and parser in C#. Designed as simple, declarative and scalable web scraping solution.
crawler datamining parser parsing scraper scraping scraping-api scraping-data scraping-tool scraping-web scraping-websites webcrawler webscraping
Last synced: 01 Aug 2024
https://github.com/scrapehero/zillow_real_estate
Zillow.com Web Scraper written in Python and LXML to extract real estate listings available based on a zip code.
html lxml parsing python-requests scraper web-scraping
Last synced: 01 Aug 2024
https://github.com/jadkins89/Recipe-Scraper
A JS package for scraping recipes from the web.
food-recipes recipe-scraper recipes scraper
Last synced: 01 Aug 2024
https://github.com/st1vms/unofficial-claude-api
Unofficial Claude API supporting direct HTTP chat creation/deletion/retrieval, messages with multiple file attachments and auto session gathering using Firefox with geckodriver.
api assistant chatbot claude claude-api claude3 documented easy-to-use file-attachment firefox free image-processing image-recognition large-file-upload long-text python scraper selenium summarizer unofficial-api
Last synced: 01 Aug 2024
https://github.com/mondeja/pymarketcap
Python3 API wrapper and web scraper for https://coinmarketcap.com
api asyncio c coinmarketcap cryptocurrencies cryptotrading cython graphs libcurl pypi python scraper trading urllib
Last synced: 01 Oct 2024
https://github.com/patxijuaristi/google_maps_scraper
Script to scrape data from Google Maps places.
google-maps google-maps-scraping google-my-business python scraper selenium selenium-python
Last synced: 13 Aug 2024
https://github.com/cowboy-bebug/app-store-scraper
Single API ☝ App Store Review Scraper 🧹
app-store appstore review-data scraper
Last synced: 01 Aug 2024
https://github.com/henson/Scraper
Tracking the most popular Github repos, updated daily.
Last synced: 01 Aug 2024
https://github.com/ArchiveTeam/wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
archiveteam archiving crawl crawler crawlers crawling downloader ftp lua scraper scraping spider warc webarchiving wget wget-lua zstd
Last synced: 06 Aug 2024
https://github.com/html2rss/html2rss-web
🕸 Generates and delivers RSS feeds via HTTP. Docker image available! Create your own feeds or get started quickly with the included configs.
builder docker feed feed-configs html2rss html2rss-configs roda rolling-release rss rss-aggregator rss-feed rss-feed-scraper ruby scraper serves webfeed webfeeds website-scraper
Last synced: 30 Jul 2024
https://github.com/gurbaaz27/linkedin-comments-scraper
Script to scrape comments (including name, profile link, profile picture, designation, email (if present), and comment) from a LinkedIn post, given the URL of the post.
linkedin linkedin-comments-scraper linkedin-post python python3 scraper selenium selenium-python selenium-webdriver webscraping
Last synced: 28 Sep 2024
https://github.com/qeeqbox/osint
Build custom OSINT tools and APIs (Ping, Traceroute, Scans, Archives, DNS, Scrape, Whois, Metadata & built-in database for more info) with this python package
dns osint ping python scan scraper tool traceroute whois
Last synced: 01 Aug 2024
https://github.com/ScrapingAnt/amazon_scraper
Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt
amazon amazon-scraper amazon-scraping-library data-mining js node-js price-scraper price-scraping scrape-products scraper scraping scraping-api scraping-data scraping-python scraping-web scraping-websites web-crawler web-crawlers web-crawling
Last synced: 01 Aug 2024
https://github.com/5agado/conversation-analyzer
Analyzer and statistics generator for text-based conversations. Includes Facebook scraper and parser
data-science facebook quantified-self scraper
Last synced: 01 Aug 2024
https://github.com/bellingcat/reddit-post-scraping-tool
Given a subreddit name and a keyword, this program returns all top (by default) posts that contain the specified keyword.
command-line gui open-source-research python reddit scraper visual-basic
Last synced: 02 Aug 2024
https://github.com/scoooooott/tinyPornManager
A Pornhub.com scraper addon for tinyMediaManager v4
metadata playwright playwright-java pornhub pornhub-metadata pornhub-scraper scraper tinymediamanager
Last synced: 01 Aug 2024
https://github.com/philshem/gmaps_popular_times_scraper
Scraper for Google Maps "Popular Times" for place entries
google-maps python3 scraper scrapers
Last synced: 01 Aug 2024
https://github.com/aofdev/instagram-get-images
Instagram get images 🌄 (hashtags, account, locations) with puppeteer
hacktoberfest images instagram instagram-scraper puppeteer scraper
Last synced: 03 Aug 2024
https://github.com/piquette/qtrn
A CLI tool to streamline financial markets data analysis 🔧
cli data data-science finance go golang options quotes scraper stock stock-analysis stock-market
Last synced: 01 Aug 2024
https://github.com/linkpreview/linkpreview
Open Graph, Twitter Card, Oembed preview. Shows visual cards that mimic link previews in Social Media like facebook, twitter, vk and other sites that support link preview.
cheeriojs linkpreview nodejs oembed opengraph react reactjs redux scraper scraping twittercard
Last synced: 30 Jul 2024
https://github.com/mahesh-hegde/rrip
Bulk image downloader for reddit.
downloader golang reddit scraper
Last synced: 01 Aug 2024
https://github.com/DrakenWan/Yale3
A simple LinkedIn profile scraper implemented as a chrome extension
browser-extension chrome chrome-extension hrms linkedin linkedin-profile linkedin-profile-scraping-tool linkedin-scraper recruiter recruiting recruitment scraper
Last synced: 31 Jul 2024
https://github.com/philipjkim/goreadability
Webpage summary extractor using Facebook Open Graph and arc90's readability
Last synced: 03 Aug 2024
https://github.com/openbytedev/sourcescraper
Simple library which helps you to retrieve the source of various video streaming sites.
extractor nodejs npm-package scraper scrapers scraping scraping-tool source-extraction
Last synced: 28 Sep 2024
https://github.com/scrapehero/yellowpages-scraper
Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.
business-directory extract html lxml parsing python scraper web-scraper yellow-pages yellow-pages-scraper
Last synced: 01 Aug 2024
https://github.com/OpenByteDev/SourceScraper
Simple library which helps you to retrieve the source of various video streaming sites.
extractor nodejs npm-package scraper scrapers scraping scraping-tool source-extraction
Last synced: 31 Jul 2024
https://github.com/lapwat/reCatchable
Turn a site into a book. Download a whole website and upload it to your reMarkable.
ebook epub remarkable remarkable-tablet remarkable-tablets scrape scraper
Last synced: 01 Aug 2024
https://github.com/rodolflying/GPT_scraper
This repository provides a way to scrape a user's full history from (or simply use) ChatGPT through two methods: a frontend "hidden" API or Selenium. Each has its own pros. It can help avoid spending API credits while still using ChatGPT programmatically.
automation chatgpt chrome gpt4 scraper selenium webdriver
Last synced: 05 Aug 2024
https://github.com/not-kennethreitz/pysoundcloud
Scraping the Un–scrapable™
kennethreitz python requests-html scraper soundcloud
Last synced: 29 Sep 2024
https://github.com/daijro/SearchifyX
Fast flashcard searcher study tool
education quizizz quizlet scraper webscraper webscraping
Last synced: 04 Aug 2024
https://github.com/absingh31/tor_spider
Python project to crawl and scrape the lesser-known deep web, also called the dark web. Just provide the onion link and get started.
crawler file-manager ioc python3 scraper scraping socks stem tor tor-config tor-spider
Last synced: 03 Aug 2024
https://github.com/fa0311/twitter-openapi-typescript
Implementation of Twitter internal API (Twitter graphql API) in TypeScript
graphql openapi scraper twitter typescript undocumented unofficial
Last synced: 31 Jul 2024
https://github.com/orsifrancesco/instagram-without-api-node
Simple Node.js code to get unlimited public Instagram pictures from any user, without an API and without credentials.
instagram instagram-api instagram-scraper node node-js nodejs scraper scraping scraping-api without-api
Last synced: 02 Aug 2024
https://github.com/bajins/tool-gin
A project built on the go-gin framework to reduce redundant tasks, such as downloading certain tools
crawler gin gin-gonic golang key keygen mobaxterm-keygen navicat nginx-conf nginx-configuration python3 registry-workshop scraper shell spider xftp xmanager xshell
Last synced: 01 Oct 2024
https://github.com/vanyasem/vk-scraper
Scrape VK media
api downloader python scrape scraper vk vk-api vkontakte vkontakte-api
Last synced: 25 Sep 2024
https://github.com/Matthew17-21/Captcha-Tools
All-in-one Python (And now Go!) module to help solve captchas with the Capmonster, 2captcha, Anticaptcha, and Capsolver APIs!
2captcha 2captcha-api anticaptcha anticaptcha-client capsolver capsolvercom captcha hcaptcha recaptcha scraper scraping scraping-api sneakerbot sneakerbots sneakers
Last synced: 01 Aug 2024
https://github.com/fa0311/TwitterFrontendFlow
Unofficial Client for Twitter Internal API
scraper twitter twitter-bot unofficial
Last synced: 04 Aug 2024
https://github.com/daijro/EssayGen
Essay generator
bot essay essay-generation python scraper tor webscraper webscraping
Last synced: 07 Aug 2024
https://github.com/ozencb/yts-scraper
Download .torrent files from YTS YIFY
downloader python scraper torrent-files yify yts yts-scraper
Last synced: 01 Aug 2024
https://github.com/nazliander/scrape-nr-of-deaths-istanbul
A scraper and simple time series analysis example with Selenium and Seaborn.
docker scraper selenium-python
Last synced: 12 Aug 2024
https://github.com/LexiestLeszek/scrapeGPT
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.
crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper
Last synced: 01 Aug 2024
https://github.com/hfreire/browser-as-a-service
A web browser 🌎 hosted as a service, to render your JavaScript web pages as HTML
browser browser-as-a-service crawler docker github-actions javascript puppeteer rest-api scraper server webcrawler
Last synced: 02 Aug 2024
https://github.com/a11ywatch/crawler
gRPC web crawler turbo charged for performance
a11ywatch crawler grpc scraper
Last synced: 29 Sep 2024
https://github.com/bjesus/pipet
a swiss-army tool for scraping and extracting data from online assets, made for hackers
css curl gjson json playwright scraper scraping
Last synced: 01 Oct 2024
https://github.com/oscarmorrison/nightmare-heroku
😱 a setup for nightmarejs on heroku
heroku nightmare nightmarejs node scary scraper
Last synced: 06 Aug 2024
https://github.com/orsifrancesco/instagram-without-api
Simple PHP code to get unlimited public Instagram pictures from any user, without an API and without credentials.
instagram instagram-api instagram-scraper php scraper scraping without-api
Last synced: 02 Aug 2024
https://github.com/crackernutter/EsriRESTScraper
A Python class that scrapes ESRI Rest Endpoints and exports data to a geodatabase
arcgis-server esri featureclass geodatabase geometry ijson polygon python rest-api schema scraper
Last synced: 13 Aug 2024
https://github.com/serpapi/public-roadmap
Public Roadmap for SerpApi, LLC (https://serpapi.com)
baidu-scraper google-image-scraper google-maps-scraping google-search-scraper scraper scraping serp-api serpapi web-scraper web-scraping webscraping yahoo-scraper
Last synced: 08 Aug 2024
https://github.com/HelloChatterbox/wikipedia_for_humans
api scraper wiki wikipedia wikipedia-api
Last synced: 01 Aug 2024
https://github.com/tibobrc/Blinkist-to-Readwise
Extract highlights from your Blinkist account and upload them to your Readwise account, or download them to a CSV file.
blinkist blinkist-highlights blinkist-to-readwise highlights python readwise readwise-highlights scraper
Last synced: 02 Aug 2024
https://github.com/lorepozo/magnet
Search for a torrent from the command-line and start streaming
magnet-link scraper stream torrent
Last synced: 31 Jul 2024
https://github.com/greenpeace/gpes-check-my-pages
Scraping script used to test the Spanish web archive and redirects system, with more than 10,000 pages. It checks redirections, HTTP responses, analytics, files hosted on soon-to-die servers, canonical URLs and more.
command-line-tool csv golang scraper
Last synced: 03 Aug 2024
https://github.com/openzim/warc2zim
Command line tool to convert a file in the WARC format to a file in the ZIM format
Last synced: 09 Aug 2024
https://github.com/dobizz/TikTok
Download public videos on TikTok using Python with Selenium
chromedriver concurrency downloader javascript python3 reverse-engineering robots scraper selenium tiktok tiktok-api
Last synced: 29 Jul 2024
https://github.com/Tatsh/youtube-unofficial
Access parts of your account unavailable through normal YouTube API access.
command-line python scraper utilities utility youtube
Last synced: 31 Jul 2024
https://github.com/sanghviharshit/pocket-tagger
📖👓🏷Tag your getpocket.com articles automatically using natural language processing
articles getpocket google-cloud natural-language-processing nlp pocket scraper tag
Last synced: 31 Jul 2024
https://github.com/un1cum/free-proxies-and-useragents
Proxies & useragents parser
free free-proxies free-proxy free-useragent linux proxy proxy-scraper python python3 scraper termux termux-tool useragent useragent-scraper windows windows-10 windows-11 windows10 windows11
Last synced: 27 Sep 2024
https://github.com/bellingcat/vk-url-scraper
Scrape VK URLs to fetch info and media - python API or command line tool.
command-line media-downloader open-source-research python scraper vk
Last synced: 26 Sep 2024
https://github.com/tamarasaurus/immo-feed
An extensible app for scraping property listings
api immobilier real-estate scraper
Last synced: 12 Aug 2024
https://github.com/kunalnagarco/imdb-scraper
🎬 An attempt at the most complete IMDb API
imdb imdb-api imdb-dataset imdb-information imdb-movies imdb-webscrapping scraper scraping-api scraping-websites
Last synced: 31 Jul 2024
https://github.com/donderjoekel/Mangarr
An *arr inspired approach to downloading manga using individual sources
manga manga-scraper manhua manhua-scraper manhwa manhwa-scraper scraper
Last synced: 01 Aug 2024
https://github.com/kalbhor/Image-Scraper
Fast concurrent image scraper
golang image-scraper multithreading scraper
Last synced: 04 Aug 2024
https://github.com/mattmoony/d4v1d
Social-Media OSINT tool - gather info on users across multiple platforms; easily extensible by design. 📷
graph information-gathering instagram network osint py python recon reconnaissance scraper social-network web
Last synced: 01 Oct 2024
https://github.com/RandomNinjaAtk/docker-raromprocessor
RA ROM Processor is a Docker container used to acquire/organize/process/verify/dedupe/scrape a ROM library automatically by matching ROMs to the RetroAchievement.org website Hash database.
bash emulationstation rahasher retroachievements retrogaming roms scraper script
Last synced: 01 Aug 2024
https://github.com/xiaoluoboding/vercel-metafy
Easily scrape metadata from websites as a service using Vercel.
metadata scraper serverless-functions vercel
Last synced: 10 Aug 2024
https://github.com/yjl9903/AnimeGarden
動漫花園 (Anime Garden) third-party mirror site and anime torrent aggregation site
animation anime anime-tracker animegarden animelist animespace anitomy bangumi dmhy scraper torrent
Last synced: 06 Aug 2024