Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with scraper

A curated list of projects in awesome lists tagged with scraper.

https://github.com/cantino/huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping

Last synced: 31 Jul 2024

https://github.com/huginn/huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping

Last synced: 29 Sep 2024

https://github.com/NaiboWang/EasySpider

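A visual no-code/code-free web crawler/spider. EasySpider: a visual browser automation testing, data collection, and crawler tool that lets you design and run crawling tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.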

batch-processing batch-script code-free crawler data-collection frontend gui html input-parameters layman parameters robotics rpa scraper spider visual visualization visualprogramming web www

Last synced: 31 Jul 2024

https://github.com/cheeriojs/cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.

cheerio dom hacktoberfest html htmlparser htmlparser2 jquery parser scraper selector

Last synced: 29 Sep 2024

https://github.com/iawia002/annie

👾 Fast and simple video download library and CLI tool written in Go

bilibili crawler download downloader go golang iqiyi qq scraper tumblr video youku youtube

Last synced: 30 Jul 2024

https://github.com/iawia002/lux

👾 Fast and simple video download library and CLI tool written in Go

bilibili crawler download downloader go golang iqiyi qq scraper tumblr video youku youtube

Last synced: 29 Sep 2024

https://github.com/asciimoo/colly

Elegant Scraper and Crawler Framework for Golang

crawler crawling framework go golang scraper scraping spider

Last synced: 30 Jul 2024

https://github.com/gocolly/colly

Elegant Scraper and Crawler Framework for Golang

crawler crawling framework go golang scraper scraping spider

Last synced: 29 Sep 2024

https://github.com/apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping

Last synced: 29 Sep 2024

https://github.com/codelucas/newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

crawler crawling news news-aggregator python scraper

Last synced: 29 Sep 2024

https://github.com/feder-cr/auto_jobs_applier_aihawk

Auto_Jobs_Applier_AIHawk is a tool that automates the jobs application process. Utilizing artificial intelligence, it enables users to apply for multiple job offers in an automated and personalized way.

application-resume automate automation bot challenge chatgpt chrome gpt human-resources job jobs jobsearch jobseeker opeai python python3 resume scraper scraping selenium

Last synced: 02 Oct 2024

https://github.com/apifytech/apify-js

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation crawler crawling headless headless-chrome javascript nodejs npm playwright puppeteer scraper scraping typescript web-crawler web-crawling web-scraping

Last synced: 05 Aug 2024

https://github.com/pwxcoo/chinese-xinhua

📙 Chinese Xinhua Dictionary database. Includes xiehouyu (two-part allegorical sayings), chengyu (idioms), words, and Chinese characters.

chinese chinese-characters chinese-language chinese-nlp chinese-simplified chinese-traditional data json json-data json-dataset python3 scraper

Last synced: 02 Oct 2024

https://github.com/guyueyingmu/avbook

AV film management system; crawler for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database

adult adult-video avmoo crawler database guzzlehttp javbus javlibrary laravel magnet magnet-link scraper spider

Last synced: 30 Sep 2024

https://github.com/evil0ctal/douyin_tiktok_download_api

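🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.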

api asgi async asyncio crawler douyin douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi httpx no-watermark online-parsing python pywebio scraper spider tiktok tiktok-scraper web-scraping

Last synced: 29 Sep 2024

https://github.com/mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler

Last synced: 29 Sep 2024

https://github.com/Evil0ctal/Douyin_TikTok_Download_API

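🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.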

api asgi async asyncio crawler douyin douyin-scraper douyin-tiktok-api douyin-tiktok-download fastapi httpx no-watermark online-parsing python pywebio scraper spider tiktok tiktok-scraper web-scraping

Last synced: 31 Jul 2024

https://github.com/fent/node-ytdl-core

YouTube video downloader in javascript.

node scraper video-downloader youtube youtube-downloader

Last synced: 29 Sep 2024

https://github.com/madawei2699/mygptreader

A community-driven way to read and chat with AI bots - powered by chatGPT.

ai chatgpt crawler daily-news embedding gpt-35-turbo hot-news openai prompt reader scraper slack-bot

Last synced: 30 Sep 2024

https://github.com/madawei2699/myGPTReader

A community-driven way to read and chat with AI bots - powered by chatGPT.

ai chatgpt crawler daily-news embedding gpt-35-turbo hot-news openai prompt reader scraper slack-bot

Last synced: 31 Jul 2024

https://github.com/justanotherarchivist/snscrape

A social networking service scraper in Python

python scraper social-media social-network

Last synced: 30 Sep 2024

https://github.com/JustAnotherArchivist/snscrape

A social networking service scraper in Python

python scraper social-media social-network

Last synced: 31 Jul 2024

https://github.com/apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation beautifulsoup crawler crawling headless headless-chrome pip playwright python scraper scraping web-crawler web-crawling web-scraping

Last synced: 30 Sep 2024

https://github.com/IonicaBizau/scrape-it

🔮 A Node.js scraper for humans.

hacktoberfest node-scraper scraper

Last synced: 31 Jul 2024

https://github.com/ionicabizau/scrape-it

🔮 A Node.js scraper for humans.

hacktoberfest node-scraper scraper

Last synced: 30 Sep 2024

https://github.com/UltimaHoarder/UltimaScraper

Scrape all the media from an OnlyFans account - Updated regularly

archive datascraping onlyfans scraper

Last synced: 31 Jul 2024

https://github.com/niespodd/browser-fingerprinting

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

automation bot bot-detection browser-fingerprinting chromedriver chromium chromium-browser crawler detection fingerprinting puppeteer recaptcha scraper spider stealth web webscraping

Last synced: 27 Sep 2024

https://github.com/javscraper/emby.plugins.javscraper

A Japanese movie metadata scraper plugin for Emby/Jellyfin that can fetch film information from certain websites.

adult emby fanart-poster fc2 japanese jav jav-scraper javbus jellyfin jsproxy metadata plugin scraper synology

Last synced: 30 Sep 2024

https://github.com/JavScraper/Emby.Plugins.JavScraper

A Japanese movie metadata scraper plugin for Emby/Jellyfin that can fetch film information from certain websites.

adult emby fanart-poster fc2 japanese jav jav-scraper javbus jellyfin jsproxy metadata plugin scraper synology

Last synced: 31 Jul 2024

https://github.com/aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

python python3 scraper scraping selenium

Last synced: 01 Aug 2024

https://github.com/jae-jae/querylist

🕷 The progressive PHP crawler framework! An elegant, progressive PHP data collection framework.

crawler querylist scraper spider

Last synced: 30 Sep 2024

https://github.com/jae-jae/QueryList

🕷 The progressive PHP crawler framework! An elegant, progressive PHP data collection framework.

crawler querylist scraper spider

Last synced: 30 Jul 2024

https://github.com/geziyor/geziyor

Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.

crawler go scraper scraping spider

Last synced: 01 Oct 2024

https://github.com/meetDeveloper/freeDictionaryAPI

There was no free Dictionary API on the web when I wanted one for my friend, so I created one.

api dictionary-api dictonary free-api google google-dictionary scraper

Last synced: 31 Jul 2024

https://github.com/lucasjinreal/weibo_terminater

Final Weibo Crawler: scrape anything from Weibo, including comments, Weibo contents, followers, and more. The Terminator.

chatbot chinese corpus scraper sina weibo

Last synced: 30 Sep 2024

https://github.com/facundoolano/google-play-scraper

Node.js scraper to get data from Google Play

api crawler google-play nodejs scraper

Last synced: 01 Oct 2024

https://github.com/serene-arc/bulk-downloader-for-reddit

Downloads and archives content from reddit

archive downloader gfycat imgur python reddit scraper

Last synced: 30 Sep 2024

https://github.com/aliparlakci/bulk-downloader-for-reddit

Downloads and archives content from reddit

archive downloader gfycat imgur python reddit scraper

Last synced: 31 Jul 2024

https://github.com/PaulMcInnis/JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

automated beautifulsoup beautifulsoup4 csv glassdoor indeed international job jobs monster python scraper search tfidf waterloo yaml

Last synced: 31 Jul 2024

https://github.com/paulmcinnis/jobfunnel

Scrape job websites into a single spreadsheet with no duplicates.

automated beautifulsoup beautifulsoup4 csv glassdoor indeed international job jobs monster python scraper search tfidf waterloo yaml

Last synced: 29 Sep 2024

https://github.com/feder-cr/linkedIn_auto_jobs_applier_with_AI

LinkedIn_AIHawk is a tool that automates the jobs application process on LinkedIn. Utilizing artificial intelligence, it enables users to apply for multiple job offers in an automated and personalized way.

application-resume automate automation bot challenge chatgpt chrome gpt job jobsearch jobseeker linkedin-api linkedin-scraper opeai python python3 resume scraper scraping selenium

Last synced: 12 Aug 2024

https://github.com/extractus/article-extractor

To extract main article from given URL with Node.js

article article-extractor article-parser crawler extract nodejs readability scraper

Last synced: 01 Oct 2024

https://github.com/website-scraper/node-website-scraper

Download website to local directory (including all css, images, js, etc.)

hacktoberfest javascript nodejs scraper website-scraper

Last synced: 30 Sep 2024

https://github.com/ahmadibrahiim/website-downloader

💡 Download the complete source code of any website (including all assets). [ Javascripts, Stylesheets, Images ] using Node.js

assets downloader offline-web-pages scraper

Last synced: 30 Sep 2024

https://github.com/edoardottt/cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

bugbounty crawler crawling endpoint-discovery endpoints go golang hacktoberfest infosec osint penetration-testing pentesting recon reconnaissance redteam scraper secret-keys secrets-detection security security-tools

Last synced: 30 Sep 2024

https://github.com/AhmadIbrahiim/Website-downloader

💡 Download the complete source code of any website (including all assets). [ Javascripts, Stylesheets, Images ] using Node.js

assets downloader offline-web-pages scraper

Last synced: 01 Aug 2024

https://github.com/teamnewpipe/newpipeextractor

NewPipe's core library for extracting data from streaming sites

bandcamp crawler extractor mediaccc newpipe peertube scraper soundcloud youtube

Last synced: 26 Sep 2024

https://github.com/felipecsl/wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

crawler dsl ruby scraper

Last synced: 01 Oct 2024

https://github.com/Adyzng/jd-autobuy

Python crawler for JD.com (Jingdong): automatic login and online flash-purchasing of products.

crawler jingdong python scraper

Last synced: 04 Aug 2024

https://github.com/justfoolingaround/animdl

A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.

anime download python scraper

Last synced: 30 Sep 2024

https://github.com/lorey/mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

crawler crawler-python crawling extraction-engine html machine-learning scraper scraping

Last synced: 30 Sep 2024

https://github.com/TeamNewPipe/NewPipeExtractor

NewPipe's core library for extracting data from streaming sites

bandcamp crawler extractor mediaccc newpipe peertube scraper soundcloud youtube

Last synced: 07 Aug 2024

https://github.com/avnsx/fansly-downloader

Easy to use fansly.com content downloading tool. Written in python, but ships as a standalone Executable App for Windows too. Enjoy your Fansly content offline anytime, anywhere in the highest possible content resolution! Fully customizable to download in bulk or single: photos, videos & audio from timeline, messages, collection & specific posts 👍

cross-platform database datascraping downloader fansly fansly-download fansly-downloader fansly-scraper gui image-download linux macos open-source portable python reddit scraper video video-download windows

Last synced: 25 Sep 2024

https://github.com/Avnsx/fansly-downloader

Easy to use fansly.com content downloading tool. Written in python, but ships as a standalone Executable App for Windows too. Enjoy your Fansly content offline anytime, anywhere in the highest possible content resolution! Fully customizable to download in bulk or single: photos, videos & audio from timeline, messages, collection & specific posts 👍

cross-platform database datascraping downloader fansly fansly-download fansly-downloader fansly-scraper gui image-download linux macos open-source portable python reddit scraper video video-download windows

Last synced: 04 Aug 2024

https://github.com/cinemagoer/cinemagoer

Cinemagoer is a Python package useful to retrieve and manage the data of the IMDb (to which we are not affiliated in any way) movie database about movies, people, characters and companies

actors cast character cinema cinemagoer company database db imdb internet-movie-database movie movie-database movies parser python scraper sql

Last synced: 01 Oct 2024

https://github.com/huaying/instagram-crawler

Get Instagram posts/profile/hashtag data without using Instagram API

auto autoliker instagram instagram-bot instagram-crawler instagram-liker instagram-scraper likers python scraper webdriver

Last synced: 30 Sep 2024

https://github.com/holgerd77/django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

django python scraper scraping scrapy spider webscraping

Last synced: 03 Oct 2024

https://github.com/vesche/scanless

online port scan scraper

command-line pentesting port-scanner scanning scraper

Last synced: 26 Sep 2024

https://github.com/shadowmoose/RedditDownloader

Scrapes Reddit to download media of your choice.

archival backup downloader media python3 reactjs reddit scraper

Last synced: 01 Aug 2024

https://github.com/shadowmoose/redditdownloader

Scrapes Reddit to download media of your choice.

archival backup downloader media python3 reactjs reddit scraper

Last synced: 30 Sep 2024

https://github.com/zerodytrash/TikTok-Live-Connector

Node.js library to receive live stream events (comments, gifts, etc.) in realtime from TikTok LIVE.

api api-wrapper bot broadcast chat chat-reader connector hacktoberfest javascript live livestream nodejs package scraper stream tiktok tiktok-api tiktok-live webcast websocket

Last synced: 01 Aug 2024

https://github.com/metafates/mangal

📖 The most advanced (yet simple) cli manga downloader in the entire universe! Lua scrapers, export formats, anilist integration, fancy TUI and more!

anilist anime cli comic-downloader command-line go golang linux lua macos manga manga-downloader manga-reader mangadex mangal pdf scraper terminal tui windows

Last synced: 01 Oct 2024

https://github.com/altimis/scweet

A simple and unlimited twitter scraper : scrape tweets, likes, retweets, following, followers, user info, images...

dowload-images followers following python save-image scrape scrape-followers scrape-following scrape-images scrape-likes scrape-tweets scraper scraping selenium-webdriver tweets twitter twitter-scraper

Last synced: 28 Sep 2024

https://github.com/vifreefly/kimuraframework

Kimurai is a modern web scraping framework written in Ruby that works out of the box with Headless Chromium/Firefox, PhantomJS, or simple HTTP requests, and lets you scrape and interact with JavaScript-rendered websites.

crawler headless-chrome kimurai scraper scrapy

Last synced: 30 Sep 2024

https://github.com/fredwu/crawler

A high performance web crawler / scraper in Elixir.

crawler elixir files offline scraper scraper-engine spider

Last synced: 31 Jul 2024

https://github.com/vladkens/twscrape

2024! X / Twitter API scraper with authorization support. Allows you to scrape search results, user profiles (followers/following), tweets (favoriters/retweeters), and more.

api async automation elonmusk httpx python scraper snscrape twitter twitter-api twitter-bot twitter-scraper x-api

Last synced: 01 Oct 2024

https://github.com/Nriver/Episode-ReName

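Automated renaming tool for TV series and anime, with one-click batch renaming. Can be paired with qBittorrent to rename files automatically after download, which makes automatic metadata scraping in Emby easier. Runs on Windows, Linux, macOS, Docker, and as a Synology package.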

automation command-line command-line-tool docker linux macos python python3 qbittorrent rename rename-script scraper synology windows

Last synced: 02 Aug 2024

https://github.com/oltarasenko/crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider

Last synced: 01 Aug 2024

https://github.com/elixir-crawly/crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

crawler crawling elixir erlang extract-data scraper scraping scraping-websites spider

Last synced: 29 Sep 2024

https://github.com/jikan-me/jikan

Unofficial MyAnimeList PHP+REST API which provides functions other than the official API

anime api json library manga myanimelist myanimelist-api parsing php psr-2 psr-4 rest rest-php scraper

Last synced: 04 Aug 2024

https://github.com/iawia002/Lulu

[Unmaintained] A simple and clean video/music/image downloader 👾

crawler crawling downloader python python3 scraper scraping video

Last synced: 09 Aug 2024

https://github.com/DataHenHQ/till

DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

crawler man-in-the-middle mitm proxy-server scraper scraping web-scraping

Last synced: 31 Jul 2024

https://github.com/postmodern/spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

crawler ruby scraper spider spider-links web web-crawler web-scraper web-scraping web-spider

Last synced: 31 Jul 2024

https://github.com/feder-cr/Auto_Jobs_Applier_AIHawk

LinkedIn_AIHawk is a tool that automates the jobs application process on LinkedIn. Utilizing artificial intelligence, it enables users to apply for multiple job offers in an automated and personalized way.

application-resume automate automation bot challenge chatgpt chrome gpt job jobsearch jobseeker linkedin-api linkedin-scraper opeai python python3 resume scraper scraping selenium

Last synced: 24 Sep 2024

https://github.com/sananth12/ImageScraper

✂️ High performance, multi-threaded image scraper

command-line commandline-tool pypi python scraper scraping terminal

Last synced: 31 Jul 2024

https://github.com/JosephLai241/URS

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

archiving command-line comments csv data-analysis data-science json livestream osint-tool praw pyo3 python reddit reddit-scraper redditor rust scraper subreddit trees wordcloud

Last synced: 31 Jul 2024

https://github.com/ruippeixotog/scala-scraper

A Scala library for scraping content from HTML pages

dsl hacktoberfest html-parsing scala scraper

Last synced: 31 Jul 2024

https://github.com/fanyong920/jvppeteer

Headless Chrome for Java (Java crawler)

chrome chrome-headless crawler java jvppeteer puppeteer scraper

Last synced: 27 Sep 2024

https://github.com/gajus/surgeon

Declarative DOM extraction expression evaluator. 👨‍⚕️

css-selector parser scraper subroutines

Last synced: 01 Oct 2024

https://github.com/graniet/operative-framework

operative framework is a Rust OSINT investigation framework. You can interact with multiple targets, execute multiple modules, create links with targets, export reports to PDF files, add notes to targets or results, interact with a RESTful API, and write your own modules.

enterprise fingerprint forensics framework gathering geoint investigation linkedin osint phone rust rust-lang scraper societe whatsapp whatsapp-api whatsapp-web

Last synced: 11 Aug 2024

https://github.com/gaulliath/operative-framework

operative framework is a Rust OSINT investigation framework. You can interact with multiple targets, execute multiple modules, create links with targets, export reports to PDF files, add notes to targets or results, interact with a RESTful API, and write your own modules.

enterprise fingerprint forensics framework gathering geoint investigation linkedin osint phone rust rust-lang scraper societe whatsapp whatsapp-api whatsapp-web

Last synced: 30 Sep 2024

https://github.com/d60/twikit

Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

api-wrapper bot client free python python3 scrape scraper scraping search twitter twitter-api twitter-bot twitter-client twitter-internal-api twitter-scraper twitter-scraper-2023 wrapper x x-api

Last synced: 31 Jul 2024

https://github.com/slotix/dataflowkit

Extract structured data from web sites. Web sites scraping.

cdp chrome-fetcher crawling extract-data go golang golang-library headless scraper scraping scraping-websites

Last synced: 30 Jul 2024

https://github.com/benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery

Last synced: 30 Sep 2024