Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-28 00:06:05 UTC
- JSON Representation
https://github.com/mikiw/reactweb3
Ethereum transaction crawler in ReactJs.
Last synced: 10 Jan 2025
https://github.com/devindon/movie-crawler
Movie crawler for douban.com, pianku.tv, etc.
Last synced: 06 Dec 2024
https://github.com/theabbie/shopcrawler
Crawler for Discovering Product URLs on E-commerce Websites (assignment)
Last synced: 17 Jan 2025
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 23 Jan 2025
https://github.com/tylpk1216/new-taipei-parkinfo
Find the available parking in New Taipei, Taiwan.
Last synced: 26 Jan 2025
https://github.com/tiennhm/crawl-sanfoundry-mcqs
Sanfoundry MQCS Crawler
beautifulsoup4 bs4 crawler csv flask python
Last synced: 27 Jan 2025
https://github.com/triekai/review-radar
An intelligent tool that analyzes Google Maps reviews to detect potential fake reviews and suspicious patterns.
crawler google-maps nextjs openai react
Last synced: 24 Jan 2025
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 17 Nov 2024
https://github.com/bandie91/extip
Fetch external IP from known ext. ip providers
address cli crawler external ip ipv4-address parallel
Last synced: 03 Jan 2025
https://github.com/zzzzer91/match_spider
某菠菜网站爬虫,该网站已倒闭:disappointed_relieved:
Last synced: 10 Jan 2025
https://github.com/jenting/compare-drugstore-price
Compare price between cosmeceutical shops
cosmed crawler golang poya side-project watsons
Last synced: 05 Dec 2024
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application which crawls profiles on Twitter for all of their tweets, all tweets related to them, including their attachments, statistics and data of their authors. Main data is stored in an SQLite database and all media are downloaded. Then it'll be able to reconstruct a Twitter profile in front-end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 03 Jan 2025
https://github.com/mnemocron/VPNNetworkShareCrawler
ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it
Last synced: 23 Oct 2024
https://github.com/tsaohucn/crawler_fb_user_group
This is crawler use selenium for facebook user groups
crawler facebook-user-groups rails ruby
Last synced: 20 Jan 2025
https://github.com/capturr/json-deep-equal
Check if json objects contains the same values (ignoring arrays order).
array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript
Last synced: 07 Jan 2025
https://github.com/jnbdz/xtamia-crawler
(!!!Still being built!!!) An open-source web crawler build on Electron for Windows, Mac OS X, and Linux
crawler electron foundation foundation-css javascript scraper vuejs xtamia
Last synced: 10 Jan 2025
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 03 Jan 2025
https://github.com/qqxs/usda_pomological_watercolors
爬取美国农业部果树水彩的数据
crawler koa2 nodejs watercolors
Last synced: 18 Jan 2025
https://github.com/reineimi/va2crawl
Website crawler, validator and SEO optimizer
crawler seo-optimization seotools validator website-crawler
Last synced: 10 Jan 2025
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 08 Jan 2025
https://github.com/iyowei/fs-deep-walk
专注于深度扫描指定磁盘位置。
crawler directory file folder folder-tooling fs nodejs recursively-search scan scandir scandir-recursive scanner walker
Last synced: 29 Dec 2024
https://github.com/xoraus/revieworacle
The proposed system assists users in deciding which product to buy. It gathers reviews along with the details from multiple websites, which sell the product. Other than that the system is trained to analyze the polarity of the product.
ai crawler datascience machinelearning scrappy selenium-webdriver
Last synced: 13 Jan 2025
https://github.com/mach1el/openproject-crawler
Scraping data on OpenProject
crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling
Last synced: 10 Jan 2025
https://github.com/tisfeng/bing-dict
A Bing command line dictionary, which obtains the query results of bing dictionary by crawler.
bing-dictionary command-line crawler nodejs
Last synced: 03 Jan 2025
https://github.com/rmncldyo/google-reverse-image-search
A simple python wrapper designed for leveraging Google's search by image capabilities to perform reverse image searches programatically.
beautifulsoup beautifulsoup4 crawler google google-image google-image-crawler google-image-scraper google-image-search google-images google-reverse-image-crawler google-reverse-image-scraper google-reverse-image-search image image-search python python3 requests reverse-image-search scraper search-by-image
Last synced: 04 Jan 2025
https://github.com/marcosvbras/twitton
A simple Python library to make Twitter Search API easily to use
crawler crawling python spider twitter twitter-api
Last synced: 05 Dec 2024
https://github.com/tech-espm/misc-webbot
This project is aimed on creating personal assistants for replying messages about specifics issues.
classification-model crawler nlp
Last synced: 11 Jan 2025
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 28 Dec 2024
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 07 Jan 2025
https://github.com/cold-bin/jwzx-mail
use golang to construct cqupt-jwzx crawler application
Last synced: 11 Jan 2025
https://github.com/bradsec/gofindfiles
Crawl websites attempting to find and download files with matching file types. For use as OSINT or RECON intelligence collection tool.
crawler osint osint-tool recon scraper web-scraper
Last synced: 07 Jan 2025
https://github.com/tri613/nespresso
A mobile version for nespresso coffee website :coffee:
Last synced: 04 Jan 2025
https://github.com/zhqiang1989/youtube-graph-collector
A demo in python on how to collect youtube video engagement graph data
Last synced: 11 Jan 2025
https://github.com/monumentality/ifiend
Check latest YouTube uploads without leaving the comfort of your terminal.
crawler headless-chrome terminal-based youtube yt-dlp
Last synced: 11 Jan 2025
https://github.com/iamtonmoy0/sitemap-crawler
site map crawler with golang and goquery
Last synced: 05 Jan 2025
https://github.com/snwfdhmp/3gm-bot
Bot for the online french indie game 3gm.fr implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 15 Jan 2025
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 11 Jan 2025
https://github.com/ekojs/web-crawler
Web Crawler untuk mengambil judul penelitian pada Google Scholar
Last synced: 08 Jan 2025
https://github.com/thomas-rothe/symfonywebcrawler
PHP project for helping in SEO
crawler docker php php8 seo sitemap-xml symfony7
Last synced: 17 Jan 2025
https://github.com/kernelerr/pixivurls
An awesome tool to get Pixiv image URLs.
Last synced: 19 Jan 2025
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 21 Jan 2025
https://github.com/appliedsoul/headless-screenshot
High-level library for taking screenshot of websites based on headless chrome (puppeteer)
crawler headless-chromium javascript nodejs scrapper screenshot testing
Last synced: 19 Jan 2025
https://github.com/arman-aminian/divar-text-exploring
The first practice of Dr. Asgari's NLP lesson - Data Exploration
crawler natural-language-processing nlp preprocessing scrapy
Last synced: 08 Jan 2025
https://github.com/berecat/selenium_facebook_scraper
A simple python3 script used to download a users's friend list from facebook.
automation crawler facebook facebook-scraper webscraper
Last synced: 08 Jan 2025
https://github.com/edumucelli/rubybikes
A set of Bike Sharing System parsers in Ruby
Last synced: 24 Dec 2024
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 02 Jan 2025
https://github.com/moj124/web_crawler
The web_crawler is a asynchoronous gevent link crawler that maps all the associated local links constrained by the input webpage url.
crawler crawler-python links-spider
Last synced: 20 Jan 2025
https://github.com/kehiy/prawler
Pactus P2P Network Crawler
crawler crawling metrics networking p2p pactus
Last synced: 28 Dec 2024
https://github.com/timpletin/comming-soon
Coming Soon Page - Simple and clean design fully responsive on all screen, Count the days, hours, minutes and seconds for coming event
crawler css java javaweb nextjs nextjs-boilerplate nextjs-typescript nextjs14-typescript object-detection paypal python tailwindui tensorflow typescript
Last synced: 21 Jan 2025
https://github.com/estavadormir/scrappist
A web scrapper that takes an URL/URLs and converts into a PDF.
bun cli crawler pdf-generation
Last synced: 11 Jan 2025
https://github.com/basemax/css-properties
The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.
crawler css css-properties css-property css3
Last synced: 14 Jan 2025
https://github.com/limdongjin/bill-scraper
Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: 20대 국회 법안 크롤러
Last synced: 12 Jan 2025
https://github.com/kasperomari/simplecrawlerapi
A simple RESTful API that takes a URL and returns all the links in a specific depth.
crawler flask-api flask-restful
Last synced: 12 Jan 2025
https://github.com/wilmsn/simple_deye_crawler
A simple crawler to get data from the Deye Inverter using the status webpage
crawler deye fhem inverter shell-script
Last synced: 18 Jan 2025
https://github.com/zawlinnnaing/my-wiki-crawler
A simple program for crawling Burmese wikipedia using Media wiki API.
crawler myanmar-tools python wikipedia-api
Last synced: 25 Dec 2024
https://github.com/k0nxt3d/web-scrapers
Web Scraping Scripts in PhP and Bash
bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget
Last synced: 12 Jan 2025
https://github.com/ariefrahmansyah/crawler
Simple website crawler using Go programming language.
Last synced: 05 Dec 2024
https://github.com/jonasrenault/pubchem-api-crawler
Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.
chemistry crawler molecular-formula pubchem python
Last synced: 27 Jan 2025
https://github.com/briangershon/crawlee-playwright
Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript
crawlee crawler playwright starter-template typescript vite
Last synced: 20 Dec 2024
https://github.com/davelongdev/link-report-crawler
A web crawler using Node.js that crawls a site and returns a report showing all internal links.
crawler crawling javascript seo seo-tools
Last synced: 02 Jan 2025
https://github.com/josepedrodias/naivebot
attempt to mimic googlebot behaviour in nodejs with nightmarejs
crawler googlebot nightmarejs nodejs robots
Last synced: 21 Jan 2025
https://github.com/wcygan/crawler
web crawler
crawler crawling tokio tokio-rs web-crawler
Last synced: 12 Jan 2025
https://github.com/kyagara/lol-match-crawler
Very simple crawler for League of Legends matches.
crawler league-of-legends pgx postgres riot-games sql
Last synced: 01 Dec 2024
https://github.com/vishaalpkumar/skysift
A distributed search engine from scratch
aws crawler css distributed-systems html java search-engine
Last synced: 22 Dec 2024
https://github.com/shaharashe/url-crawler
crawler design-patterns http-requests java
Last synced: 06 Jan 2025
https://github.com/joyceannie/moviespider
This project is used to crawl movie data from IMDb. Scrapy framework is used to extract relevant information like movie title, datePublished, summary, genres, director etc.
crawler datascience python scrapy spider webscraper
Last synced: 01 Dec 2024
https://github.com/homuchen/instagram-crawler
Instagram crawler
crawler instagram nodejs-crawler
Last synced: 01 Dec 2024
https://github.com/qzcool/uscis-case-status-estimation-system-stat-ez
Estimates time of case results arrival, for applicants who are waiting for their USCIS case results with the receipt numbers at hand.
beautifulsoup crawler immigration web
Last synced: 21 Jan 2025
https://github.com/truongdd03/searchengine
A search engine written in c++.
cpp crawler search search-engine
Last synced: 20 Dec 2024
https://github.com/roc41d/http-web-crawler
Http web crawler with Nodejs + TDD
crawler http javascript jest jest-test nodejs webcrawler
Last synced: 21 Jan 2025
https://github.com/leonardopinho/instagramfeed
Image list based on a tag for the Instagram feed.
Last synced: 07 Dec 2024
https://github.com/shamsher31/crawler
Simple site crawler that extracts all the URL links from the given website
Last synced: 12 Jan 2025
https://github.com/radityaharya/sitesweeper
Sitesweeper is a python package to help you automate your web scraping process, outputting pages to a file
crawler pdf python website-crawler
Last synced: 05 Dec 2024