Crawler | Ecosyste.ms: Awesome

https://github.com/0xhjk/x12306

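12306 ticket-lookup helper: query every station along the route in one click, board first and buy the ticket afterwards, for a more worry-free trip.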

12306 12306buyticket 12306helper 12306qiang-piao crawler fk12306 helper reqeusts spider ticket train x12306

Last synced: 14 Nov 2024

https://github.com/kylemocode/medium-stat-box

Practical pinned gist that shows your latest Medium stats 📌

awesome-pinned-gists crawler github-action github-gists medium-stats

Last synced: 02 Nov 2024

https://github.com/twtrubiks/auto_crawler_ptt_beauty_image

Automatically crawl PTT Beauty board images using Python schedule

beauty crawler heroku image ptt python schedule tutorial

Last synced: 16 Nov 2024

https://github.com/hackfengJam/ArticleSpider

Crawling zhihu, jobbole, lagou by Scrapy, and using Elasticsearch+Django to build a Search Engine website --- README_zh.md (including: implementation roadmap, distributed-crawler and coping with anti-crawling strategies).

crawler distributed-systems django elasticsearch scrapy

Last synced: 31 Oct 2024

https://github.com/heyingcai/cetty

A crawler framework based on event dispatching

crawler event-dispatcher gather spider

Last synced: 13 Nov 2024

https://github.com/scrapy-plugins/scrapy-zyte-api

Zyte API integration for Scrapy

crawler plugin proxy scraping scrapy

Last synced: 12 Nov 2024

https://github.com/xfgryujk/taobaoanalysis

A project for practicing NLP by analyzing Taobao reviews

crawler nlp taobao

Last synced: 08 Nov 2024

https://github.com/haxzie-xx/instagram-downloader

Node.js/Express app to retrieve Instagram video/image download URLs

crawler downloader express instagram instagram-scraper nodejs

Last synced: 27 Oct 2024

https://github.com/jfreegman/toxcrawler

A Tox DHT network crawler

crawler dht dht-network tox toxcore

Last synced: 08 Nov 2024

https://github.com/gamemann/bestbuy-parser

A personal tool using Python's Scrapy framework to scrape Best Buy's product pages for RTX 3080 TIs and notify if available/not sold out.

3080 automation best bestbuy bot buy crawler parser python python3 rtx scrapy ti

Last synced: 27 Oct 2024

https://github.com/wenyalintw/google-patents-scraper

Automatically download all PDF files of search results & their patent families found on Google Patents.

crawler google-patents patent patents pdf scraper scraping scrapy web-scraping

Last synced: 11 Nov 2024

https://github.com/apocelipes/schannel-qt5

A GUI client of schannel powered by therecipe/qt and golang

client-side crawler go golang goqt linux qcharts qt5

Last synced: 09 Nov 2024

https://github.com/migalabs/armiarma

Armiarma is a Libp2p open-network crawler with a current focus on Ethereum's CL network

crawler ethereum libp2p monitoring

Last synced: 15 Nov 2024

https://github.com/andreaskoch/gargantua

The fast website crawler

command-line crawler golang xml-sitemap

Last synced: 16 Nov 2024

https://github.com/VeliovGroup/spiderable-middleware

🤖 Prerendering for JavaScript powered websites. Great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites based on top of front-end JavaScript frameworks

crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable

Last synced: 04 Aug 2024

https://github.com/subins2000/phpwebcrawler

A Web Crawler Created in PHP

crawler php

Last synced: 13 Nov 2024

https://github.com/veliovgroup/spiderable-middleware

🤖 Prerendering for JavaScript powered websites. Great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites based on top of front-end JavaScript frameworks

crawler meteor meteor-package middleware nodejs npm npm-package seo seo-optimization spiderable

Last synced: 14 Oct 2024

https://github.com/code4everything/visual-spider

Try our brand-new desktop productivity tool RunFlow: https://myrest.top/myflow

crawler crawler4j-java java-8 java8 javafx javafx-application spider visualization

Last synced: 29 Sep 2024

https://github.com/miry/medup

Download all content from Medium and Dev.to to a local folder

cli crawler devto json markdown medium sync tool

Last synced: 06 Nov 2024

https://github.com/ph-7/crawling-emails

Very simple bash script to crawl email addresses from a specific website.

bash crawler email email-scraper scrape scrape-email scraper scraping shell wget

Last synced: 28 Oct 2024

https://github.com/debugtalk/webcrawler

A web crawler based on requests-html, mainly intended for URL validation testing.

crawler requests-html web-crawler weblink

Last synced: 08 Nov 2024

https://github.com/minhhungit/github-action-rss-crawler

Auto crawl RSS feeds using Github Action

crawler csharp github-actions litedb netcore rss rss-crawler rss-items

Last synced: 09 Nov 2024

https://github.com/gomjellie/pysaint

[deprecated] u-SAINT Python client

crawler sap soongsil unofficial

Last synced: 28 Oct 2024

https://github.com/fanhuaandluomu/sina_spider

Sina Weibo crawler: login, keyword search of posts, and post monitoring

crawler python-2 sina-spider

Last synced: 12 Oct 2024

https://github.com/juzeon/advanced-php-crawler

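Crawler for Sina Blog articles and the wenku8 light-novel library; it can fetch and save images and build an e-book in one click. A must-have for Kindle readers!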

calibre crawler gitbook kindle php sina

Last synced: 10 Nov 2024

https://github.com/mamal72/iranian-calendar-events

Fetch Iranian calendar events (Jalali, Hijri and Gregorian) from time.ir website

crawler events iranian jalali jalali-calendar persian

Last synced: 02 Nov 2024

https://github.com/kshru9/web-crawler

A multithreaded web crawler using two mechanisms: a single lock and thread-safe data structures

concurrency concurrent-data-structure cpp crawler data-structures html-parser lock multithreading openssl pagerank pthread reader-writer-lock search-engine socket threading threadsafe webcrawler website-downloader

Last synced: 28 Oct 2024

https://github.com/k1low/utsusemi

A tool to generate a static website by crawling the original site.

api aws aws-lambda crawler s3-website serverless serverless-framework

Last synced: 17 Oct 2024

https://github.com/pykong/pypergrabber

Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.

crawler email-inbox google-scholar pdf pmid pubmed python sci-hub scraper

Last synced: 08 Nov 2024

https://github.com/k1LoW/utsusemi

A tool to generate a static website by crawling the original site.

api aws aws-lambda crawler s3-website serverless serverless-framework

Last synced: 04 Aug 2024

https://github.com/mjavadhpour/telegram-member-inviter

Crawling client's groups and channels to invite their members to a target group.

crawler python python3 robot telegram telegram-client telethon

Last synced: 16 Nov 2024

https://github.com/mendableai/firecrawl-py

Crawl and convert any website into clean markdown

ai crawler llm python scraper

Last synced: 08 Nov 2024

https://github.com/fedebotu/iclr2023-openreviewdata

Crawl & Visualize ICLR 2023 Data from OpenReview

crawler dataset iclr iclr2023 openreview peer-review review scraper

Last synced: 06 Nov 2024

https://github.com/italia/publiccode-crawler

publiccode.yml crawler for the Open Source software catalog of Developers Italia

crawler developers-italia hacktoberfest publiccode publiccodeyml

Last synced: 10 Nov 2024

https://github.com/zenrows/scaling-to-distributed-crawling

Repository for the Mastering Web Scraping in Python: Scaling to Distributed Crawling blogpost with the final code.

crawler crawling distributed python python3 scraping spider

Last synced: 16 Nov 2024

https://github.com/riptl/ytpriv

YT metadata exporter

big-data crawler csv datascience json video youtube

Last synced: 16 Nov 2024

https://github.com/tychozzz/article_crawler

✨ Article Crawler is a package used to crawl articles with Markdown format from a specific webpage and store them locally in HTML / Markdown formats.

article crawler html markdown pypi python

Last synced: 12 Nov 2024

https://github.com/marcel0024/cococrawler

A declarative and easy-to-use web crawler and scraper in C#

cococrawler crawler crawling-tool csharp dotnet dotnetcore scraper scraping-tool webcrawler webcrawler-csharp webcrawling webscraper

Last synced: 12 Oct 2024

https://github.com/spider-rs/spider-nodejs

Spider ported to Node.js

crawler distributed-systems headless-chrome indexer nodejs scraper spider typescript

Last synced: 05 Nov 2024

https://github.com/ERap320/CrowLeer

Powerful C++ web crawler based on libcurl

cli crawler crawling download

Last synced: 16 Nov 2024

https://github.com/alex-page/get-site-urls

🔗 Get all of the URLs from a website.

crawler sitemap-generator urls

Last synced: 27 Oct 2024

https://github.com/novemberde/serverless-crawler-demo

Serverless Architecture Crawler demo

aws crawler demo handson serverless

Last synced: 10 Nov 2024

https://github.com/dachcom-digital/pimcore-lucene-search

Pimcore Website Indexer (powered by Zend Search Lucene)

crawler lucene lucenesearch pimcore

Last synced: 14 Nov 2024

https://github.com/bartozzz/crawlerr

A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs.

crawler jsdom nodejs scraper spider web-crawler

Last synced: 08 Nov 2024

https://github.com/Smartproxy/Python-scraper-tutorial

A short introduction to scraping with Python with given steps and an example scraper script.

beautifulsoup crawler data-mining data-science github-python json-database-python learning python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 04 Aug 2024

https://github.com/aliosm/kontests

Competitive programming contests schedule

a2oj atcoder codeforces codeforces-gym codeshef competitive-programming crawler csacademy hackerearth hackerrank kickstart leetcode topcoder

Last synced: 09 Oct 2024

https://github.com/mattwang44/uspto-patft-web-crawler

Crawler for fetching information of US Patents and PDF bulk download

crawler patent patent-crawler pyqt5 python3 uspto

Last synced: 02 Oct 2024

https://github.com/jurooravec/crawlee-one

Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.

actor apify crawlee crawler framework scraper scraping web

Last synced: 13 Nov 2024

https://github.com/matheusfelipeog/froxy

Hide your IP with free proxies using Froxy 🔄

crawler free-proxy froxy hide-ip proxies proxies-scraper proxy python requests requests-module scraping

Last synced: 17 Nov 2024

https://github.com/nicolasmure/crawlerdetectbundle

A Symfony bundle for the Crawler-Detect library (detects bots/crawlers/spiders via the user agent)

bot bundle crawler php symfony

Last synced: 16 Nov 2024

https://github.com/harismuneer/android-apps-downloader

📱 A tool to download android apps from Google Play Store and Xiaomi App Store (the famous Chinese Store).

android-application-downloader android-apps-crawler android-market-scraper android-research android-scraper android-tool app-downloader crawler crawling-tool google-play-application-downloader google-play-store-scraper gplaycli open-source-project python-scraper research-tool scraper scraping-tool wget-utility xiaomi-apps xiaomi-store-scraper

Last synced: 12 Nov 2024

https://github.com/alessandrodd/googleplay_api

Google Play Unofficial Python 3 API Library

android crawler googleplay googleplay-api playstore

Last synced: 27 Oct 2024

https://github.com/ivan-sincek/chad

Search Google Dorks like Chad. / Broken link hijacking tool.

broken-link-hijacking bug-bounty crawler ethical-hacking google-dorking google-dorks offensive-security penetration-testing playwright python red-team-engagement scraper search-engine security social-media social-media-takeover threat-hunting threat-intelligence web web-penetration-testing

Last synced: 15 Nov 2024

https://github.com/ysh329/douban-crawler

Scrapes Douban group information (groups, users, posts).

crawler douban douban-crawler

Last synced: 23 Oct 2024

https://github.com/o8e/soccer-scrape

:page_with_curl: Scrape football data from Bet365

bet365 betting crawler es6 football javascript puppeteer scraper soccer

Last synced: 13 Nov 2024

https://github.com/kagami/tistore

:camera: Tistory photo grabber

crawler cross-platform electron tistory

Last synced: 22 Oct 2024

https://github.com/xiongwilee/techweekly

A highly configurable tool for pushing tech weekly newsletters by email

crawler nodejs techweekly

Last synced: 08 Nov 2024

https://github.com/wwwwwydev/crawlist

A universal solution for web crawling lists

crawl crawler crawler-python python reptile

Last synced: 12 Nov 2024

https://github.com/feng19/spider_man

SpiderMan, a fast, high-level web crawling & scraping framework for Elixir, based on Broadway.

crawler data-mining elixir erlang framework spider

Last synced: 29 Oct 2024

https://github.com/tokahuke/lopez

Crawling and scraping the Web for fun and profit

crawler rust scraper seo web-scraping

Last synced: 14 Nov 2024

https://github.com/capjamesg/indieweb-search

Source code for the IndieWeb search engine.

crawler indieweb search search-engine

Last synced: 16 Nov 2024

https://github.com/alanshaw/libp2p-dht-scrape-aas

🧹 A libp2p DHT scraper as a service allowing anyone to collect, consume and use to generate useful reports & visualisations.

crawler dht kademlia libp2p p2p scraper

Last synced: 09 Nov 2024

https://github.com/mechazawa/redbetter-wm2

Better.php crawler for Redacted that uses WhatManager

crawler flac redacted seedbox transcoding whatcd whatmanager

Last synced: 06 Nov 2024

https://github.com/rzo1/crawler4j

Open Source Web Crawler for Java - A maintained fork of yasserg/crawler4j

crawler crawler4j java spider web-crawler web-spider

Last synced: 29 Sep 2024

https://github.com/tokenmill/crawling-framework

Easily crawl news portals or blog sites using Storm Crawler.

crawler crawling crawling-framework elasticsearch java scraping storm storm-crawler vaadin

Last synced: 10 Nov 2024

https://github.com/gruppio/slackwebhooksgithubcrawler

Search for Slack webhook tokens publicly exposed on GitHub

crawler crawling hack messages nodejs puppeteer slack slack-bot slack-webhook slackbot webhook

Last synced: 16 Nov 2024

https://github.com/asing1001/movierater

A useful website for finding movie ratings in Chinese and English, built by crawling Yahoo, Ptt, and IMDb.

apollo-client chai crawler graphql material-ui mocha mongodb movies nodejs reactjs redis server-side-rendering service-worker sinon typescript

Last synced: 07 Nov 2024

https://github.com/norconex/collector-filesystem

Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations into various data repositories such as search engines.

crawler filesystem-crawler java norconex-filesystem-collector search-engine

Last synced: 11 Nov 2024

https://github.com/RuedigerVoigt/exoskeleton

A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend

crawler crawling-framework database machine-learning mariadb network python python-3 scraping

Last synced: 08 Nov 2024

https://github.com/ruedigervoigt/exoskeleton

A Python framework to build polite, but tenacious crawlers / scrapers with a MariaDB backend

crawler crawling-framework database machine-learning mariadb network python python-3 scraping

Last synced: 08 Nov 2024

https://github.com/Actomaton/ActoCrawler

🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.

crawler swift

Last synced: 09 Aug 2024

https://github.com/fanhuaandluomu/qqspider

Crawls QQ user information (basic details such as QQ number, nickname, birthday, and address) and performs a brief analysis.

crawler python qq spider

Last synced: 12 Nov 2024

https://github.com/yokawasa/scrapy-azuresearch-crawler-samples

Scrapy as a Web Crawler for Azure Search Samples

azure azure-search crawler python python3 scrapy search

Last synced: 30 Oct 2024

https://github.com/thaoshibe/crawl-original-google-images

Python scripts for crawling original images from Google Images

chrome-extension crawler crawling crawling-python google google-images pafy scraper youtube youtube-dl youtube-search

Last synced: 11 Oct 2024

https://github.com/nvk681/gumo

A crawler that extracts data from a dynamic webpage. Written in Node.js.

crawler elasticsearch neo4j nodejs

Last synced: 11 Oct 2024

https://github.com/waynechang65/ptt-crawler

ptt-crawler is a web crawler module designed to scrape data from Ptt.

crawler javascript nodejs ptt scraper scraping spider web-crawler webcrawler

Last synced: 19 Oct 2024

https://github.com/loomisloud/onion-crawler

Tor website crawler (specific for Alphabay at the time)

crawler onion parser python tor

Last synced: 17 Nov 2024

https://github.com/s045pd/sharingan

We will try to find as much of your visible basic footprint on social media as possible - 😤 more sites are coming soon

asyncio crawler httpx python38 social-network

Last synced: 07 Nov 2024

https://github.com/xiyuan-fengyu/ppspider_example

ppspider crawler examples: crawling Bilibili video info and comments, QQ Music info and comments, and Twitter topic comments and user info

bilibili cheerio crawler ppspider puppeteer qq-music spider twitter

Last synced: 07 Nov 2024

https://github.com/petehouston/udemy-crawler

Crawl Udemy course info and save it in JSON format.

crawler crawling node node-cli udemy udemy-api udemy-crawl

Last synced: 23 Oct 2024

https://github.com/tower1229/crawler

Nodejs crawler for cnbeta.com

crawler nodejs

Last synced: 14 Oct 2024

https://github.com/ArchiveTeam/WebArchiver

Decentralized web archiving

archiver archiving crawler decentralized python warc web webarchiving

Last synced: 06 Nov 2024

https://github.com/p0dalirius/crawlersuseragents

Python script to check whether an application's responses differ when the request comes from a search engine's crawler.

bugbounty crawler crawlers pentest request tool user-agent web

Last synced: 29 Oct 2024

https://github.com/fernandod1/producthunt-scraper

Scraper script for the well-known website Producthunt.com. Scrapes all offers and saves them to an Excel spreadsheet.

crawler crawling crawling-sites data-mining datamining producthunt producthunt-api producthunt-users python python-script python3 scrape scraped-data scraper scraper-engine scraping scraping-bot scraping-python scraping-tool scraping-websites

Last synced: 12 Nov 2024

https://github.com/smolijar/offensive-fortune

A script for generating fortune cookies from the funniest and most offensive stuff collected off the Internet.

crawler fortune fortune-cookie vilejoke

Last synced: 07 Nov 2024

https://github.com/twtrubiks/youtube-trends-spider

Crawl YouTube trends using Selenium with Python

crawler python selenium tutorial youtube-trends-spider

Last synced: 16 Nov 2024

https://github.com/bkeepers/spiderman

your friendly neighborhood web crawler

crawler crawler-engine http httprb nokogiri ruby spider spider-framework web-crawler web-scraping webcrawler webscraping

Last synced: 09 Nov 2024

https://github.com/cristianzsh/python-hacking-tools

Python tools for ethical hacking

arp-spoofing backdoor code-injection crawler dns interceptor keylogger mac malware network packet python scanner scapy scapy-arp send-email sniffer spoofing tool tools

Last synced: 17 Nov 2024

https://github.com/mauriceconrad/xml-parser

A Node.js XML DOM, Parser & Stringifier.

crawler crawling dom html html-parser html-parsing xml xml-parser xml-parsing xml-schema

Last synced: 28 Oct 2024

https://github.com/josecelano/my-favourite-appliances

Laravel CRUD sample

crawler crud laravel sample

Last synced: 29 Oct 2024

https://github.com/alinebastos/crawler

Web Crawler created with Node.js and Puppeteer

crawler fs javascript nodejs puppeteer scraping

Last synced: 05 Nov 2024

https://github.com/enijkamp/supermonkey

A crawler for automated Android UI testing.

ai android crawler

Last synced: 09 Nov 2024

https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel

A toolkit for Spatie's Crawler and Laravel.

crawler laravel laravel-crawler php-crawler php-scraper spatie-crawler

Last synced: 12 Nov 2024

https://github.com/paambaati/websight

🕷A simple but *really* fast crawler built with Node.js & TypeScript

coding-challenge crawler interview-questions javascript monzo nodejs typescript

Last synced: 08 Nov 2024

https://github.com/PadishahIII/SecretScraper

SecretScraper is a web scraper that crawls through target websites, scrapes HTTP responses, and extracts secret information via regular expressions.

crawler cyper hyperscan pentest-tool pentesting python sensitivity-analysis webscraper

Last synced: 13 Aug 2024

https://github.com/neuralegion/bright-cli

Command Line Interface (CLI) tool for NeuraLegion's solutions.

api cli crawler cyber-security devops har nexploit oas secops security typescript

Last synced: 14 Nov 2024

https://github.com/vignif/crawler-google-scholar

This bot crawls and downloads statistics and pictures of researchers from Google Scholar.

crawler downloading-statistics google-scholar indexes statistics

Last synced: 06 Nov 2024

https://github.com/racinmat/premium-downloader

crawler pornhub pornhub-downloader python

Last synced: 06 Nov 2024

https://github.com/omkarcloud/botasaurus-starter

🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖

beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping

Last synced: 08 Nov 2024

https://github.com/lixi5338619/lxparse

A library for parsing links from list pages and extracting content from detail pages

crawler htmlparse python

Last synced: 05 Nov 2024