Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-27 00:06:15 UTC
- JSON Representation
https://github.com/d-w-arnold/local-news-data-collection
Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎
crawler data-collection python
Last synced: 14 Dec 2024
https://github.com/smikodanic/dex8-sdk
DEX8 SDK is software development kit for DEX8.com platform.
crawler crawler-engine data-extraction dex8 scraper scraping-websites spider
Last synced: 26 Dec 2024
https://github.com/terminaldweller/crawley
A creepy crawler that runs as a sleepy daemon.
Last synced: 26 Dec 2024
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 13 Jan 2025
https://github.com/pmuens/crawler
Multi-threaded Web crawler with support for custom fetching and persisting logic
crawler crawler-engine rust rust-lang web-crawler web-crawling
Last synced: 26 Dec 2024
https://github.com/tinoco/ticapsoriginal_div2png
Ticapsoriginal programmatically div design to png generator of html code from url
beutifulsoup crawler data design div2png generated-art generator html2image parse programmatically-layout pycodestyle python requests ticapsoriginal url urllib
Last synced: 09 Jan 2025
https://github.com/gabrielolobo/crawley
This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.
crawler poetry python scrapping
Last synced: 11 Jan 2025
https://github.com/gnehs/twse-financial-ratios-crawler
透過指定的股票代號清單從公開資訊觀測站自動抓取財務比率資訊,並自動計算平均
Last synced: 26 Dec 2024
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scraps corpora, automatically cleans it up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 11 Jan 2025
https://github.com/sahaavi/web-scraping
Learn Web-Scraping using BeautifulSoup, Selenium and Scrapy with hands on projects!
beautifulsoup4 crawler headless-mode pagination scrapy selenium spider splash web-scraper web-scraping
Last synced: 26 Dec 2024
https://github.com/mohammadreza-mohammadi94/python-webscraper-projects
A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.
bs4 crawler object-oriented-programming python requests scrapy webscraping
Last synced: 26 Dec 2024
https://github.com/ggteixeira/motorcycle-simulator
A toy project that fetches prices from motorcycles from OLX and does some calculations for those who want to buy them..
crawler motorcycle olx scraper
Last synced: 11 Jan 2025
https://github.com/devindon/movie-crawler
Movie crawler for douban.com, pianku.tv, etc.
Last synced: 06 Dec 2024
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 10 Jan 2025
https://github.com/theabbie/shopcrawler
Crawler for Discovering Product URLs on E-commerce Websites (assignment)
Last synced: 17 Jan 2025
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 23 Jan 2025
https://github.com/tsaohucn/crawler_fb_page
This is crawler use selenium for facebook pages
crawler facebook-page rails ruby selenium
Last synced: 20 Jan 2025
https://github.com/berecat/selenium_facebook_scraper
A simple python3 script used to download a users's friend list from facebook.
automation crawler facebook facebook-scraper webscraper
Last synced: 08 Jan 2025
https://github.com/bandie91/extip
Fetch external IP from known ext. ip providers
address cli crawler external ip ipv4-address parallel
Last synced: 03 Jan 2025
https://github.com/zzzzer91/match_spider
某菠菜网站爬虫,该网站已倒闭:disappointed_relieved:
Last synced: 10 Jan 2025
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 05 Jan 2025
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application which crawls profiles on Twitter for all of their tweets, all tweets related to them, including their attachments, statistics and data of their authors. Main data is stored in an SQLite database and all media are downloaded. Then it'll be able to reconstruct a Twitter profile in front-end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 03 Jan 2025
https://github.com/hvtuananh/twitter_crawler
Daemon to call and get tweets from Twitter Public Stream API
crawler java streaming-api tweets twitter twitter-crawler
Last synced: 23 Oct 2024
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 09 Oct 2024
https://github.com/ariefrahmansyah/crawler
Simple website crawler using Go programming language.
Last synced: 05 Dec 2024
https://github.com/jnbdz/xtamia-crawler
(!!!Still being built!!!) An open-source web crawler build on Electron for Windows, Mac OS X, and Linux
crawler electron foundation foundation-css javascript scraper vuejs xtamia
Last synced: 10 Jan 2025
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 03 Jan 2025
https://github.com/reineimi/va2crawl
Website crawler, validator and SEO optimizer
crawler seo-optimization seotools validator website-crawler
Last synced: 10 Jan 2025
https://github.com/jauharibill/animeindo-crawler
this crawler is used for research only. the creator doesn't take any responsibility for any harmful usage
Last synced: 29 Dec 2024
https://github.com/mach1el/openproject-crawler
Scraping data on OpenProject
crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling
Last synced: 10 Jan 2025
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 21 Jan 2025
https://github.com/rmncldyo/google-reverse-image-search
A simple python wrapper designed for leveraging Google's search by image capabilities to perform reverse image searches programatically.
beautifulsoup beautifulsoup4 crawler google google-image google-image-crawler google-image-scraper google-image-search google-images google-reverse-image-crawler google-reverse-image-scraper google-reverse-image-search image image-search python python3 requests reverse-image-search scraper search-by-image
Last synced: 04 Jan 2025
https://github.com/cold-bin/jwzx-mail
use golang to construct cqupt-jwzx crawler application
Last synced: 11 Jan 2025
https://github.com/usethisname1419/connectioncrawler
crawls a website and checks for connections
connection crawler http-headers reporting website-analyzer
Last synced: 26 Jan 2025
https://github.com/tri613/nespresso
A mobile version for nespresso coffee website :coffee:
Last synced: 04 Jan 2025
https://github.com/zhqiang1989/youtube-graph-collector
A demo in python on how to collect youtube video engagement graph data
Last synced: 11 Jan 2025
https://github.com/bytejoseph/osintgit
OSINT investigation tool for Github
crawler email github github-to-email hacking hacking-tool hacktoberfest hacktoberfest2024 latest open-source-intelligence osint osint-python osint-tool pentesting pentesting-tools python python3 script streamlit streamlit-webapp
Last synced: 26 Jan 2025
https://github.com/monumentality/ifiend
Check latest YouTube uploads without leaving the comfort of your terminal.
crawler headless-chrome terminal-based youtube yt-dlp
Last synced: 11 Jan 2025
https://github.com/hoan02/novel-crawler
Tool cào dữ liệu truyện để phục vụ cho doctruyen.io.vn
Last synced: 20 Jan 2025
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 11 Jan 2025
https://github.com/thomas-rothe/symfonywebcrawler
PHP project for helping in SEO
crawler docker php php8 seo sitemap-xml symfony7
Last synced: 17 Jan 2025
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 17 Nov 2024
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshot of websites based on headless chrome (puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 19 Jan 2025
https://github.com/rafaelmoraes003/tech-news
Analysis and manipulation of news data from a technology website obtained through data scraping using Python.
crawler data-scraping https mongodb parsel pymongo python web-scraping
Last synced: 26 Jan 2025
https://github.com/jlenon7/sef_automation
📑 Crawler that automatically enrol in open vacancies in SEF website.
athenna crawler esm nodejs playwright portugal residence sef typescript
Last synced: 13 Dec 2024
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 21 Jan 2025
https://github.com/viko16/hatcher
🐣[WIP] Provides APIs by simple configuration.
api api-server cli crawler koa-middleware nodejs spider
Last synced: 26 Jan 2025
https://github.com/licoy/win4000-images-crawler
基于scrapy爬取&下载win4000.com的图片壁纸
Last synced: 08 Dec 2024
https://github.com/estavadormir/scrappist
A web scrapper that takes an URL/URLs and converts into a PDF.
bun cli crawler pdf-generation
Last synced: 11 Jan 2025
https://github.com/limdongjin/bill-scraper
Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: 20대 국회 법안 크롤러
Last synced: 12 Jan 2025
https://github.com/kasperomari/simplecrawlerapi
A simple RESTful API that takes a URL and returns all the links in a specific depth.
crawler flask-api flask-restful
Last synced: 12 Jan 2025
https://github.com/wilmsn/simple_deye_crawler
A simple crawler to get data from the Deye Inverter using the status webpage
crawler deye fhem inverter shell-script
Last synced: 18 Jan 2025
https://github.com/tylpk1216/favorite-youtube-to-video
Download your favorite youtube video in PHP
Last synced: 26 Jan 2025
https://github.com/capturr/json-deep-equal
Check if json objects contains the same values (ignoring arrays order).
array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript
Last synced: 07 Jan 2025
https://github.com/k0nxt3d/web-scrapers
Web Scraping Scripts in PhP and Bash
bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget
Last synced: 12 Jan 2025
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 08 Jan 2025
https://github.com/tylpk1216/new-taipei-parkinfo
Find the available parking in New Taipei, Taiwan.
Last synced: 26 Jan 2025
https://github.com/wcygan/crawler
web crawler
crawler crawling tokio tokio-rs web-crawler
Last synced: 12 Jan 2025
https://github.com/jenting/compare-drugstore-price
Compare price between cosmeceutical shops
cosmed crawler golang poya side-project watsons
Last synced: 05 Dec 2024
https://github.com/shamsher31/crawler
Simple site crawler that extracts all the URL links from the given website
Last synced: 12 Jan 2025
https://github.com/eghuro/crawlcheck
Extensible web crawler
configuration crawler http plugin python robots-txt sitemap
Last synced: 12 Jan 2025
https://github.com/tisfeng/bing-dict
A Bing command line dictionary, which obtains the query results of bing dictionary by crawler.
bing-dictionary command-line crawler nodejs
Last synced: 03 Jan 2025
https://github.com/yosh1/mio-crawler
A crawler that acquires data usage of iijmio .
Last synced: 12 Jan 2025
https://github.com/tech-espm/misc-webbot
This project is aimed on creating personal assistants for replying messages about specifics issues.
classification-model crawler nlp
Last synced: 11 Jan 2025
https://github.com/purrproof/smartcrawl
blockchain cli crawler explorer framework go golang hacktoberfest
Last synced: 29 Nov 2024
https://github.com/leegeunhyeok/python-gongucrawler
파이썬3 공유마당 이미지 및 상세정보 크롤러
Last synced: 22 Dec 2024
https://github.com/lin-jun-xiang/python-crawler
Using CloudScraper, Requests, API, Thread, Async... for scrape the data
async cloudscraper crawler multithreading python requests scraper selenium
Last synced: 21 Dec 2024
https://github.com/yukihirai0505/streamcrawler
akka stream × crawler
akka-streams crawler elasticsearch instagram sbt scala
Last synced: 13 Jan 2025
https://github.com/jurooravec/knwldg
Datasets, scrapers, pipelines
companies crawler data dataset non-profit-organizations scraper scrapy
Last synced: 12 Jan 2025
https://github.com/briangershon/crawlee-playwright
Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript
crawlee crawler playwright starter-template typescript vite
Last synced: 20 Dec 2024
https://github.com/mnemocron/VPNNetworkShareCrawler
ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it
Last synced: 23 Oct 2024
https://github.com/mahdijamebozorg/cryptonewscrawler
An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.
crawler crypto cryptocurrency data-mining datamining information-retrieval llm python
Last synced: 16 Jan 2025
https://github.com/zenoyang/webcrawler
一些爬虫代码
crawler scrapy spider web-crawler
Last synced: 17 Jan 2025
https://github.com/nextlevelshit/node-crawl
Webcrawler for nodejs
crawl crawler javascript nodejs
Last synced: 20 Jan 2025