Crawler | Ecosyste.ms: Awesome

https://github.com/rajatomar788/pywebcopy

Saves webpages locally to your hard disk with images, CSS, JS, and links as is.

archive-tool crawler html html-parser mirror python web webpage

Last synced: 20 Nov 2024

https://github.com/avidlearnerinprogress/python-automation-scripts

Simple yet powerful automation scripts.

beautifulsoup codetopdf comic-downloader crawler cricinfo cricket-api crime-data-scraper images imdb-webscrapping instagram instagram-scraper medium-downloader news-scraper pdf pdf-converter quora quora-crawler scraping-websites selenium-webdriver word-of-the-day

Last synced: 04 Jan 2025

https://github.com/avidLearnerInProgress/python-automation-scripts

Simple yet powerful automation scripts.

beautifulsoup codetopdf comic-downloader crawler cricinfo cricket-api crime-data-scraper images imdb-webscrapping instagram instagram-scraper medium-downloader news-scraper pdf pdf-converter quora quora-crawler scraping-websites selenium-webdriver word-of-the-day

Last synced: 10 Nov 2024

https://github.com/erma0/douyin

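Douyin crawler: collects public data such as account home pages, likes, favorites, original music sounds, topics, search results, collections, posts, followings, and followers.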

crawler douyin python spider

Last synced: 29 Oct 2024

https://github.com/zhuyingda/webster

A reliable high-level web crawling & scraping framework for Node.js.

automation-test automation-ui chromium crawler crawling headless-chrome javascript javascript-framework nodejs nodejs-framework puppeteer scraping-framework spider

Last synced: 03 Jan 2025

https://github.com/crawljax/crawljax

Crawljax

crawler crawling dom dynamic event-driven-crawling javascript test-generation web-analysis web-testing

Last synced: 05 Jan 2025

https://github.com/nanshihui/scan-t

A new Python-based crawler with additional features, including network fingerprint search

crawler netfingerprint python sybersecurity

Last synced: 03 Nov 2024

https://github.com/nanshihui/Scan-T

A new Python-based crawler with additional features, including network fingerprint search

crawler netfingerprint python sybersecurity

Last synced: 13 Nov 2024

https://github.com/chushuai/wscan

Wscan is a web security scanner that focuses on web security, dedicated to making web security accessible to everyone.

cel-go chromedp crawler headless martian passive-vulnerability-scanner poc sql-injection subdomains testwaf vulnerability-scanner waf webscan wscan xss

Last synced: 21 Nov 2024

https://github.com/jaeksoft/opensearchserver

Open-source Enterprise Grade Search Engine Software

crawler custom-search enterprise indexing java lucene ocr opensearchserver search search-engine synonyms webcrawler webcrawling

Last synced: 04 Jan 2025

https://github.com/abhisharma404/vault

Swiss Army knife for hackers

crawler fuzzing hacking hacking-tool information-gathering lfi networking offensive-security osint pentesting port-scanner python rfi scanner scrapy security sqlite ssl-inspection vault xss-vulnerability

Last synced: 03 Nov 2024

https://github.com/chenjiandongx/mmjpg

👩 Crawler for photo sets of beautiful women (part 1)

crawler meinv

Last synced: 05 Jan 2025

https://github.com/dirtyfilthy/freshonions-torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion

crawler darknet hidden-services onion scraper spider tor

Last synced: 06 Nov 2024

https://github.com/AlexMathew/scrapple

A framework for creating semi-automatic web content extractors

beautifulsoup crawler css-selector extractor lxml python scrapers scraping scrapy selector selector-expression tutorial web-scraper web-scraping xpath-expression

Last synced: 31 Oct 2024

https://github.com/stanzhai/html2article

HTML web page main-text extraction

article content crawler html spider topic

Last synced: 05 Jan 2025

https://github.com/stanzhai/Html2Article

HTML web page main-text extraction

article content crawler html spider topic

Last synced: 20 Nov 2024

https://github.com/ChenZixinn/spider_reverse

crawler python requests spider

Last synced: 31 Oct 2024

https://github.com/yhy0/Jie

Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gathering, and exploitation, elevating it to an indispensable toolkit for both security professionals and penetration testers. (expectations)

apollo-exp crawler jie scan scanner security-copilot shiro-exp vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners

Last synced: 02 Jan 2025

https://github.com/yhy0/jie

Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gathering, and exploitation, elevating it to an indispensable toolkit for both security professionals and penetration testers. (expectations)

apollo-exp crawler jie scan scanner security-copilot shiro-exp vul vulnerability vulnerability-detection vulnerability-exploitation vulnerability-scanners

Last synced: 04 Jan 2025

https://github.com/shaohua0116/ICLR2020-OpenReviewData

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

conference crawler data-analysis iclr iclr2020 machine-learning visualization

Last synced: 27 Nov 2024

https://github.com/hect0x7/jmcomic-crawler-python

Python API for JMComic | Provides a Python API for accessing JMComic (禁漫天堂), supporting both the web and mobile clients | GitHub Actions downloader for JMComic 🚀

18comic crawler downloader github-actions jmcomic pypi python readthedocs

Last synced: 04 Jan 2025

https://github.com/AndyTheFactory/newspaper4k

📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

articles articles-data crawler datasets-preparation news newspaper3k python requests scraper scraping

Last synced: 26 Oct 2024

https://github.com/andythefactory/newspaper4k

📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

articles articles-data crawler datasets-preparation news newspaper3k python requests scraper scraping

Last synced: 03 Jan 2025

https://github.com/tasos-py/Search-Engines-Scraper

Search Google, Bing, Yahoo, and other search engines with Python

bing crawler google python scraper search-engine yahoo

Last synced: 20 Nov 2024

https://github.com/rebrowser/rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

automation bot bot-detection chrome chromedriver cloudflare crawler crawling datadome headless headless-chrome playwright puppeteer puppeteer-extra rebrowser scraping selenium stealth web-scraping webdriver

Last synced: 04 Jan 2025

https://github.com/cyubuchen/free_proxy_website

A collection of websites for obtaining free SOCKS/HTTPS/HTTP proxies

crawler free-proxy-list ip proxy proxy-checker spider

Last synced: 17 Nov 2024

https://github.com/gadfly0x/signature_algorithm

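Request-signing or encryption algorithms for various apps, mini programs, and websites. Currently covers: Ziroom (自如), Xiaohongshu (小红书), Danke Apartment (蛋壳公寓), luckin coffee (瑞幸咖啡), and bangkokair (曼谷航空)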

crawler reverse-engineering spider

Last synced: 11 Nov 2024

https://github.com/roniemartinez/dude

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath

Last synced: 03 Jan 2025

https://github.com/lgraubner/sitemap-generator

Easily create XML sitemaps for your website.

crawler google seo sitemap sitemap-generator xml-sitemap

Last synced: 27 Nov 2024

https://github.com/heqin-zhu/music-recover

:musical_note: Converts cache files into MP3 files

crawler mp3 python regex

Last synced: 25 Nov 2024

https://github.com/snakem982/pandora-box

A Simple Mihomo GUI.

crawler gui linux mac mihomo windows

Last synced: 06 Jan 2025

https://github.com/platonai/PulsarRPA

Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

crawler data-mining data-science rpa scraper scraping web-automation web-crawler web-mining web-scraping web-sql

Last synced: 05 Nov 2024

https://github.com/howie6879/magic_google

Google search results crawler; get the Google search results you need

crawler google google-search spider

Last synced: 04 Jan 2025

https://github.com/smuyyh/crawlerforreader

Local web-novel crawler for Android, based on jsoup and XPath

android bookreader crawler jsoup xpath

Last synced: 06 Jan 2025

https://github.com/shaohua0116/ICLR2019-OpenReviewData

Script that crawls metadata from the ICLR OpenReview webpage. Tutorials on installing and using Selenium and ChromeDriver on Ubuntu.

crawler crawling-python openreview tutorial

Last synced: 27 Nov 2024

https://github.com/mhmdiaa/second-order

Second-order subdomain takeover scanner

crawler crawling infosec mapping penetration-testing penetration-testing-tools pentesting recon reconnaissance security security-tools web-application-security wordlist wordlist-generator

Last synced: 02 Jan 2025

https://github.com/brendonboshell/supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

crawler distributed-crawler robot sitemap web-crawler

Last synced: 25 Oct 2024

https://github.com/microsoft/ghcrawler

Crawl GitHub APIs and store the discovered orgs, repos, commits, ...

crawler data github github-api github-webhooks ospo

Last synced: 25 Sep 2024

https://github.com/chishui/jssoup

JavaScript + BeautifulSoup = JSSoup

beautifulsoup crawler html javascript nodejs parser react-native spider

Last synced: 06 Jan 2025

https://github.com/elvisyjlin/media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok

crawler instagram pixiv reddit scraper tiktok tumblr twitter

Last synced: 04 Nov 2024

https://github.com/python3spiders/allnewsspider

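The Paper (澎湃新闻), Sina News, Tencent News, Sohu News, Xinwen Lianbo, The Times, The New York Times, BBC News: aims to crawl news from all news portal sites. Commercial use of the collected data is prohibited!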

bbc-news crawler newsapi nytimes sina sohu spider tencent thetimes xwlb

Last synced: 07 Jan 2025

https://github.com/duzun/hquery.php

An extremely fast web scraper that parses megabytes of invalid HTML in a blink of an eye. PHP5.3+, no dependencies.

broken-html crawler css-selectors domcrawler fast hquery html html-parser invalid-html jquery-like jquery-selectors parser php psr-0 psr-4 scraper selectors xml xml-parser

Last synced: 04 Jan 2025

https://github.com/Josue87/EmailFinder

Search emails from a domain through search engines

crawler osint

Last synced: 13 Nov 2024

https://github.com/salimk/rcrawler

An R web crawler and scraper

crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping

Last synced: 01 Jan 2025

https://github.com/scrapy-plugins/scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

crawler crawler-detection plugin proxy scraping scrapy

Last synced: 04 Jan 2025

https://github.com/salimk/Rcrawler

An R web crawler and scraper

crawler crawlers r rpackage scraper webcrawler webscraper webscraping webscrapping

Last synced: 25 Oct 2024

https://github.com/Malwarize/webpalm

🕸️ Crawl in the web network

crawler crawling data data-science datamining go golang hack mining osint redteam spider tool

Last synced: 08 Nov 2024

https://github.com/xiyuan-fengyu/ppspider

Web spider framework built on puppeteer. Provides flexible task-queue management and task scheduling via decorators, convenient data storage (nedb / mongodb), and support for data visualization and user interaction.

angular cheerio crawler headless mongodb nedb node node-spider nodejs nodejs-spider proxy puppeteer spider task-queue task-scheduling typescript

Last synced: 04 Jan 2025

https://github.com/crwlrsoft/crawler

Library for Rapid (Web) Crawler and Scraper Development

crawler crawling hacktoberfest php scraper scraping scraping-websites web-crawler web-crawling web-scraper web-scraping

Last synced: 04 Jan 2025

https://github.com/rivermont/spidy

The simple, easy-to-use command-line web crawler.

crawler crawling python python3 web-crawler web-spider

Last synced: 29 Oct 2024

https://github.com/mikemeliz/torcrawl.py

Crawl and extract (regular or onion) webpages through TOR network

crawler extractor onion osint python tor

Last synced: 17 Nov 2024

https://github.com/dmi3kno/polite

Be nice on the web

crawler memoise r r-package rate-limiter robotstxt rstats rvest scraper webscraping

Last synced: 25 Oct 2024

https://github.com/yangjianxin1/qqmusicspider

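Scrapy-based QQ Music crawler (QQ Music Spider) that scrapes song information, lyrics, featured comments, and more, and shares a music corpus of 490,000+ entries from the top 6,400 mainland Chinese, Hong Kong, and Taiwanese artists on QQ Music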

crawler music musicspider qqmusic scrapy

Last synced: 06 Jan 2025

https://github.com/jackluson/chinese-fund-crawler

Crawling and aggregate analysis of Chinese off-exchange (OTC) fund data

crawler fund morningstar

Last synced: 11 Nov 2024

https://github.com/dennis-tra/nebula

🌌 A network agnostic DHT crawler, monitor, and measurement tool that exposes timely information about DHT networks.

cid crawler filecoin golang hacktoberfest ipfs libp2p

Last synced: 05 Jan 2025

https://github.com/krypton-byte/tiktok-downloader

Tiktok Downloader/Scraper using requests & bs4

asynchronous asyncio beautifulsoup bs4 crawler downloader flask krypton-byte lightweight nowm python python3 requests tiktok watermark web without

Last synced: 05 Jan 2025

https://github.com/snakem982/Pandora-Box

A Simple Mihomo GUI.

crawler gui linux mac mihomo windows

Last synced: 28 Oct 2024

https://github.com/MikeMeliz/TorCrawl.py

Crawl and extract (regular or onion) webpages through TOR network

crawler extractor onion osint python tor

Last synced: 06 Nov 2024

https://github.com/jaybizzle/laravel-crawler-detect

A Laravel wrapper for CrawlerDetect - the web crawler detection library

bot crawler detect laravel php spider

Last synced: 05 Jan 2025

https://github.com/infinitbyte/gopa

🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)

crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider

Last synced: 14 Dec 2024

https://github.com/infinilabs/crawler

🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)

crawler crawling elasticsearch lightweight scraping spider web-crawler web-scraping web-spider

Last synced: 03 Jan 2025

https://github.com/TikHubIO/TikHub-API-Python-SDK

High-performance asynchronous API for Douyin (抖音), TikTok, Xiaohongshu (小红书), Kuaishou (快手), Weibo (微博), Instagram, YouTube, Twitter (X), a captcha solver, and temp mail.

api captcha-solver crawler data-api douyin douyin-tiktok-api instagram kuaishou netease-cloud-music private-api scrapy tiktok twitter weibo xiaohongshu xiguashipin

Last synced: 29 Oct 2024

https://github.com/lgraubner/sitemap-generator-cli

Creates an XML-Sitemap by crawling a given site.

cli crawler google seo sitemap xml-sitemap

Last synced: 11 Nov 2024

https://github.com/yaroslaff/nudecrawler

Crawl telegra.ph searching for nudes!

crawl crawler find nsfw nsfw-recognition nude nudes nudity-detection onlyfans python python3 scrape scraper scraping search spider telegra-ph tits web-scraping webscraping

Last synced: 04 Jan 2025

https://github.com/twtrubiks/line-bot-tutorial

line-bot-tutorial using Python Flask

bot crawler heroku line ptt python-flask tutorial

Last synced: 05 Jan 2025

https://github.com/flairnlp/fundus

A very simple news crawler with a funny name

cc-news commoncrawl corpus crawler news-crawler news-scraping nlp python rss scraper sitemap text-extraction web-corpus web-scraping

Last synced: 05 Jan 2025

https://github.com/oppsec/pinkerton

🕵️ Pinkerton is a JavaScript file crawler and secret finder tool developed in Python

crawl crawler hacktoberfest javascript pentest python python3 redteam secrets

Last synced: 07 Jan 2025

https://github.com/mustafadalga/instagram-bot

An Instagram bot developed using the Selenium Framework

automation automation-selenium bot bulk-comments bulk-unfollow crawler crawling download-stories instagram instagram-api instagram-bot instagram-downloader instagram-without-api mass-liking python python3 selenium selenium-framework selenium-python selenium-webdriver

Last synced: 28 Sep 2024

https://github.com/GraySilver/wencai

This is a wencai crawler. (A Pythonic toolkit for the strategy backtesting API of iWencai (i问财).)

crawler finance pandas quant quantitative-finance tushare wencai

Last synced: 30 Oct 2024

https://github.com/oxylabs/python-web-scraping-tutorial

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.

amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 06 Jan 2025

https://github.com/evil0ctal/fast-powerful-whisper-ai-services-api

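⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: inference runs on a locally hosted Whisper model, with multi-GPU concurrency and a design aimed at distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.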

asr crawler douyin-api fastapi faster-whisper openai-whisper speech-recognition speech-to-text speech-to-text-api tiktok-analytics tiktok-api tiktok-crawler video-analysis whisper-ai whisper-api whisperbot

Last synced: 06 Jan 2025

https://github.com/s0rg/crawley

The unix-way web crawler

cli crawler go golang golang-application pentest pentest-tool pentesting unix-way web-crawler web-scraping web-spider

Last synced: 01 Jan 2025

https://github.com/eight04/comiccrawler

An image crawler written in Python.

cli crawler gui image-crawler python tkinter

Last synced: 03 Jan 2025

https://github.com/devanshbatham/Gorecon

Gorecon is an all-in-one reconnaissance tool, a.k.a. a Swiss Army knife for reconnaissance: a tool that every pentester/bug hunter might want to consider adding to their arsenal

admin-panel-finder backups-finder cmsdetecter configurationfiles crawler directory-bruteforce dns dnsrecon email-hunter geo-ip nameserver recon reconaissance reverse-dns scanner subdomain-enumeration subdomain-scanner subnet-lookup whois-lookup wordpress-scanner

Last synced: 04 Nov 2024

https://github.com/eight04/ComicCrawler

An image crawler written in Python.

cli crawler gui image-crawler python tkinter

Last synced: 07 Dec 2024

https://github.com/zhupingqi/RuiJi.Net

crawler framework, distributed crawler extractor

crawler extractor headless-chrome netcore owin scraper scrapy

Last synced: 13 Nov 2024

https://github.com/Jasonnor/th-music-video-generator

Touhou Project random music video generator/player that crawls images and videos from websites to generate MVs.

crawler javascript music-video touhou web

Last synced: 11 Nov 2024

https://github.com/chenjiandongx/github-spider

Crawler for analyzing GitHub repositories and users

crawler github scrapy

Last synced: 02 Jan 2025

https://github.com/MarshalX/telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

crawler crawling crawling-python parser telegram telegram-org telegram-updates

Last synced: 19 Nov 2024

https://github.com/marshalx/telegram-crawler

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

crawler crawling crawling-python parser telegram telegram-org telegram-updates

Last synced: 06 Jan 2025

https://github.com/chenjiandongx/Github-spider

Crawler for analyzing GitHub repositories and users

crawler github scrapy

Last synced: 12 Nov 2024

https://github.com/algolia/algoliasearch-netlify

Official Algolia Plugin for Netlify. Index your website to Algolia when deploying your project to Netlify with the Algolia Crawler

algolia algolia-crawler algoliasearch crawler jamstack netlify netlify-plugin search

Last synced: 04 Jan 2025

https://github.com/hezhizheng/go-movies

Golang spider/crawler for movies

colly crawler docker fasthttp go gocolly golang movies redis spider

Last synced: 05 Jan 2025

https://github.com/glaucocustodio/tanakai

Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

chrome-headless crawler kimurai scraper scrapy webscraping

Last synced: 31 Oct 2024

https://github.com/antchfx/antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go

crawler crawling framework golang scraping web-crawler web-spider

Last synced: 26 Oct 2024

https://github.com/lucasjinreal/weibo_terminator_workflow

Updated version of weibo_terminator. This is the workflow version, aimed at getting the job done!

crawler nlp scraper sentiment-analysis weibo-terminator

Last synced: 02 Jan 2025

https://github.com/dwisiswant0/galer

A fast tool to fetch URLs from HTML attributes by crawl-in.

crawler devtool extractor galer go golang spider url-extractor url-parser waybackurls

Last synced: 01 Jan 2025

https://github.com/xyntax/filesensor

Crawler-based dynamic sensitive-file detection tool

crawler fuzzing pentesting scrapy

Last synced: 01 Jan 2025

https://github.com/zrashwani/arachnid

Crawl all unique internal links found on a given website and extract SEO-related information. Supports JavaScript-based sites.

crawler php scraping seo

Last synced: 29 Oct 2024

https://github.com/zntfdr/selenops

A Swift Web Crawler 🕷

command-line-tool crawler scripting swift web

Last synced: 02 Jan 2025

https://github.com/zntfdr/Selenops

A Swift Web Crawler 🕷

command-line-tool crawler scripting swift web

Last synced: 25 Nov 2024

https://github.com/myvyang/chromium_for_spider

dynamic crawler for web vulnerability scanner

chromium crawler puppeteer security spider

Last synced: 21 Nov 2024

https://github.com/commoncrawl/news-crawl

News crawling with StormCrawler - stores content as WARC

apache-storm common-crawl commoncrawl crawler news storm-crawler warc web-crawler

Last synced: 16 Nov 2024

https://github.com/cwjokaka/ok_ip_proxy_pool

🍿 Proxy IP pool for crawlers, in Python 🍟 A pretty OK IP proxy pool

aiohttp async beautifulsoup4 crawler flask http ip pool proxy proxypool py python python3 spider sqlite

Last synced: 02 Jan 2025

https://github.com/vitorfs/woid

Simple news aggregator displaying top stories in real time

crawler django news

Last synced: 01 Jan 2025

https://github.com/sudheer-ranga/aliexpress-product-scraper

Get AliExpress product details as a JSON response, including feedback, variants, shipping info, description, images, etc.

aliexpress aliexpress-api aliexpress-crawler aliexpress-product-json aliexpress-product-scraper aliexpress-scraper aliexpress-spider crawler dropship dropshipping hacktoberfest hacktoberfest19 hacktoberfest2019 product-json product-reviews product-scraper scraper spider

Last synced: 07 Jan 2025

https://github.com/kong36088/ZhihuSpider

Multithreaded Zhihu user crawler, based on Python 3

crawler multi-threading python python3 spider zhihu

Last synced: 26 Nov 2024

https://github.com/dwisiswant0/gf-secrets

Secret and/or credential patterns used for gf.

alienvault-otx bugbounty crawler gau gf gitleaks infosec open-threat-exchange secrets-detection trufflehog trufflehog3 wayback wayback-machine waybackurl

Last synced: 01 Jan 2025

https://github.com/ScottSloan/Bili23-Downloader

Download Bilibili videos, anime series, movies, documentaries, and other resources

bilibili crawler linux macos python videodownloader windows wxpython

Last synced: 27 Oct 2024

https://github.com/lgh06/web-page-monitor

Web Site Page Changes Monitor. Monitors website pages for updates and changes and sends alerts.

change-alert change-detection change-monitor crawler monitor website-change-monitor website-monitoring

Last synced: 02 Jan 2025