Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-10 00:06:02 UTC
- JSON Representation
https://github.com/adamfisher/scrapyrt.client
A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.
crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider
Last synced: 26 Nov 2024
https://github.com/greatdrake/contributecounter
crawl Wikipedia for contributers
Last synced: 14 Dec 2024
https://github.com/pnguyen215/instagram-crawler
Instagram Crawler is a Python script to download posts from a specified Instagram account.
crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler
Last synced: 12 Nov 2024
https://github.com/mmqnym/pyppeteer-use-case
Show how to do web crawl via pyppeteer
crawl crawler pyppeteer python
Last synced: 17 Nov 2024
https://github.com/dizys/weibo-crawler
A nodejs weibo crawler
crawler nodejs typescript weibo-spider
Last synced: 27 Dec 2024
https://github.com/madis/flatcrawl
Clojure app for crawling apartment information from http://kv.ee
clojure crawler real-estate webapp
Last synced: 12 Nov 2024
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 06 Dec 2024
https://github.com/liyun-li/meh-bot
Just a bot that clicks an image
bot crawler docker headless-firefox meh python python3 selenium twilio-sms-api
Last synced: 25 Nov 2024
https://github.com/mustafadalga/website-crawler
Hedef web sitesini tarayarak linklerini listeleyen bir web crawler scripti || A web crawler script that lists links by scanning the target website.
crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling
Last synced: 17 Nov 2024
https://github.com/phanikmr/linkcrawler
A LinkCrawler is a Python module that takes a url on the web (ex: http://python.org), fetches the web-page corresponding to that url, and parses all the links on that page into a repository of links. Next, it fetches the contents of any of the url from the repository just created, parses the links from this new content into the repository and continues this process for all links in the repository until stopped or after a given number of links are fetched.
async crawler linkcrawler parse python scrapy spider
Last synced: 29 Nov 2024
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 28 Dec 2024
https://github.com/opda0887/bahamut-crawler-to-gmail
發想:使用Python爬蟲取得巴哈姆特版面的最新論壇,並用gmail傳送這些訊息給自己。A thought: Use Python crawler to the latest forums in Bahamut, and use gmail to send these messages to myself.
Last synced: 27 Nov 2024
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 31 Dec 2024
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 27 Nov 2024
https://github.com/bitscoper/bitscoper_crawler
Crawls the titles of webpages in series by number and creates a list of the available links.
Last synced: 05 Dec 2024
https://github.com/uranusx86/dcard-crawler-analyzer
get Dcard & Meteor forum content and analyze !
crawl crawler dcard nlp python word-cloud word-count word-frequency
Last synced: 20 Nov 2024
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different mehtods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 23 Dec 2024
https://github.com/amirzenoozi/aparat-videos-dataset
Some Simple Information About Aparat Videos for DataScientists
aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video
Last synced: 20 Nov 2024
https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
crawler gallery images python3
Last synced: 17 Dec 2024
https://github.com/zephyrpersonal/github-trending-crawler
transform github-trending repos to json data
cheerio crawler fetch github node repository spider trending
Last synced: 28 Nov 2024
https://github.com/pxlrbt/website-diff
Utility tool that bundles a crawler and BackstopJS for visual regression testing.
backstopjs crawler visual-regression-testing
Last synced: 28 Nov 2024
https://github.com/tcc0lin/magiccrawler
Collect all kinds of interesting crawler scripts and tackle them against the anti-climbing method :bowtie::heavy_check_mark::heavy_check_mark::heavy_check_mark:
Last synced: 17 Nov 2024
https://github.com/tanja-4732/od-get
A Rust tool for recursively crawling & downloading data from open directories
cli crawler open-directory open-directory-downloader rust
Last synced: 14 Nov 2024
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 14 Oct 2024
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed js at the end of Unit 35 (Computer Science Module) of the Trybe's Web Development course
crawler crawler-python data-science pytest python
Last synced: 17 Nov 2024
https://github.com/hoishing/selenium-crawler
a web crawler written in python, powered by Selenium and Tesseract OCR
Last synced: 17 Nov 2024
https://github.com/victorhuu/amazonmovieintegration
本仓库是同济大学数据仓库的第一个个人作业——利用爬虫与ETL工具整理Amazon的电影数据
crawler data-warehouse movies pandas scrapy xpath
Last synced: 28 Nov 2024
https://github.com/konradlinkowski/mailcrawler
Crawler to find emails in the websites
Last synced: 28 Nov 2024
https://github.com/snuzi/devblogs-aggregator
The backend aggregator project of DevBlogs.net
aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news
Last synced: 09 Nov 2024
https://github.com/danielemoraschi/go-sitemap-common
Simple GO sitemap generator and crawler.
crawler golang sitemap sitemap-generator
Last synced: 31 Dec 2024
https://github.com/abdus/scrape-web
A simple web scrapper for Node.js
crawler web-scraping web-scrapper
Last synced: 03 Dec 2024
https://github.com/khilnani/spidey.py
Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.
cli crawler python scaper web-spider
Last synced: 02 Dec 2024
https://github.com/ccrashzer0/web_crawler
A python based web crawler
crawler internet python python3 webcrawler
Last synced: 28 Nov 2024
https://github.com/teal33t/base_crawler
Simple scaffold for selenium based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 23 Nov 2024
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 28 Nov 2024
https://github.com/karantyagi/web-crawler
BFS and DFS implementations for a wikipedia crawler
Last synced: 13 Nov 2024
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 22 Dec 2024
https://github.com/par7133/splash-bot-crawler
Splash Bot creates splash on the fly of your websites - GPL License 🔥
bot crawler gallery open-source opensource php splash
Last synced: 13 Nov 2024
https://github.com/hantang/list-movies-top
豆瓣(douban.com)、IMDb(imdb.com)、时光网(mtime.com)、猫眼(maoyan.com)Top电影定时抓取
Last synced: 07 Jan 2025
https://github.com/deployment-helper/api-template-crawler
API interface to crawl the templates
api crawler deployment-helper gcp gcp-cloud-run golang rest
Last synced: 14 Nov 2024
https://github.com/sammwyy/craw
a website-crawler library for nodejs
crawler crawlers html javascript library node nodejs nodejs-module npm npm-module parser spider website
Last synced: 16 Nov 2024
https://github.com/nelcifranmagalhaes/web_crawler
A web crawler for all Naruto characters
anime beautifulsoup characters crawler naruto python
Last synced: 03 Dec 2024
https://github.com/flavien-hugs/scrapy-test
Manipulation de la librairie Scrapy. Mini script permet d'extraire l'ensemble des personnages de dessin animé sur Wikipedia.
crawler python scraping scrapy
Last synced: 09 Dec 2024
https://github.com/amirsorouri00/dsl-se
This is a MVP provided based on the "Search Engine And Data Mining" Course. The idea behind this project is the forked project which its link provided is
container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine
Last synced: 18 Nov 2024
https://github.com/captain-woof/zhi-zhu
Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages and all urls appearing in it, for specific (regex) words.
crawler crawler-python crawling-python python3
Last synced: 31 Dec 2024
https://github.com/feliz-szk/berserk
Berserk: Crawler to increase web traffic(based on tor and privoxy)
anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser
Last synced: 13 Nov 2024
https://github.com/krishealty/whoknows
All in One Advanced and Detailed Web Scanner with over 1000 plug-ins.
bug-bounty bypass crawler enumeration ethical-hacking footprinting hacking hacking-tool intelligence-gathering javascript offensive-security osint pentesting pentesting-tools security-tools subdomain-enumeration vulnerability-analysis vulnerability-detection web-application-security web-reconnaissance
Last synced: 07 Jan 2025
https://github.com/pierlauro/mdbubing
From WARC records to MongoDB documents
bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving
Last synced: 09 Dec 2024
https://github.com/pjullrich/link-crawler
Python Crawler that reports broken links on a given website and its sup-pages
asyncio breadth-first-search broken-links crawler python
Last synced: 22 Nov 2024
https://github.com/xcrypt0r/xcrawler
✂️ A crawling example for maplestory with various languages using multi-threading
crawler crawling multithreading parsing regexp
Last synced: 09 Jan 2025
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 25 Dec 2024
https://github.com/afuntw/misc-crawler
some small crawler for specific website
Last synced: 13 Nov 2024
https://github.com/santhin/real-estate
Real estate crawler with ML on scraped data
crawler jupyter-notebook ml real-estate scrapy
Last synced: 24 Nov 2024
https://github.com/beanwei/zmt-post-crawler
Crawler the ZMT platform site ,put the author id, get the post list.This project is coding for my friend
Last synced: 28 Dec 2024
https://github.com/m-osource/cassiopeiabot
C++ multithread Linux Web Crawler
algorithm berkeleydb bot cassiopeia cplusplus crawler download engine hashing html-parser information-retrieval link-analysis multithread open-source regex search web web-crawler webcrawler www
Last synced: 08 Jan 2025
https://github.com/aminehsan/crawler-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scarping
Last synced: 04 Dec 2024
https://github.com/ycrao/some-spider-code
some spider code 财经资讯以及基金股票外汇价格爬虫
crawler economics fin-eco-news finance forex fund-value spider stock-price
Last synced: 19 Nov 2024
https://github.com/jackfsuia/chats-crawler
Discourse chat data crawling and on-the-way parsing straight for LLM instruction finetuning. Data include texts, images and links ( Discourse论坛对话(图片,文本)数据爬取并解析,以直接用于(多模态)指令微调).
crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser
Last synced: 14 Nov 2024
https://github.com/uzsoftic/ecommerce-web-crawler
WebCrawler for ecommerce sites
bot crawler crawler-php ecommerce laravel parser php php8
Last synced: 24 Dec 2024
https://github.com/zhaotianff/qzone
想起那天夕阳下的奔跑,那是我逝去的青春
crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf
Last synced: 15 Nov 2024
https://github.com/ghost---shadow/feature-extractor-from-codebase
Copies the target java file and all its dependencies recursively to another directory
Last synced: 16 Nov 2024
https://github.com/maxmindlin/swarm
Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.
Last synced: 06 Dec 2024
https://github.com/enansari/guess-price-car
Car price estimation based on the information of a car sales site | final project of Maktabkhooneh | حدس قیمت خودرو با ماشین لرنینگ | پروژه نهایی مکتبخونه
crawler jadi machine-learning maktabkhoone maktabkhooneh python
Last synced: 09 Jan 2025
https://github.com/anjackson/scrapy-url-frontier
A Scrapy module for URL Frontier integration
crawler frontier scrapy spider
Last synced: 05 Jan 2025
https://github.com/sonhm3029/crawl-data-bot
This project making a base crawl data from web bot, include text data and images data
crawler google medical vietnamese
Last synced: 16 Nov 2024
https://github.com/microlinkhq/ua
A simple redis primitives to incr() and top() user agents
crawler redis user-agent user-agent-parser
Last synced: 12 Nov 2024
https://github.com/milouk/web-crawler
Phoneutria Crawler
crawler crawlers database internet jar java spider web web-crawler
Last synced: 18 Nov 2024
https://github.com/chen0040/ios-stock-tracker
Stock tracker implemented using Objective-C for iOS
crawler ios-app objective-c stock-prices
Last synced: 16 Dec 2024
https://github.com/dhchenx/quick-crawler
A toolkit for quickly performing crawler functions
Last synced: 01 Dec 2024
https://github.com/developerjosh/gogo-crawler
The tool kit for making an anime website with a database full of anime
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 16 Nov 2024
https://github.com/droiddevgeeks/nodelearning
This is node learning demo. It has covered all basics of node.
crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign
Last synced: 13 Nov 2024
https://github.com/toannd96/chromedp-example-login
chromedp crawler golang goquery
Last synced: 18 Nov 2024
https://github.com/hudson-newey/user-web-crawler
The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.
Last synced: 10 Jan 2025
https://github.com/lykmapipo/producthunt-python-scrapy-scraper
Python Scrapy spiders that scrapes data from producthunt.com
crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper
Last synced: 21 Dec 2024
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 14 Nov 2024
https://github.com/geoffreybauduin/website-checker
Performs useful checks against a website, such as 404 errors reporting, structured data validation...
crawler seo structured-data web-spider website
Last synced: 25 Dec 2024
https://github.com/sinkaroid/webnovelcrawler
Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.
Last synced: 23 Dec 2024