Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-27 00:06:15 UTC
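The definition above says a crawler "systematically browses" the Web. As a rough illustration only, here is a minimal breadth-first crawl sketch in Python. The `bfs_crawl` function and the in-memory `site` link map are hypothetical stand-ins for fetching and parsing real pages. No network access is involved, and real crawlers also need politeness, robots.txt handling, and error handling.

```python
from collections import deque


def bfs_crawl(links, seed, max_pages=100):
    """Visit pages breadth-first, starting from `seed`.

    `links` maps each URL to the URLs it links to; it is a hypothetical
    stand-in for downloading a page and extracting its links.
    """
    frontier = deque([seed])  # URLs waiting to be "fetched"
    seen = {seed}             # URLs already queued, so none is visited twice
    order = []                # URLs in the order they were visited
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                frontier.append(out)
    return order


# Hypothetical four-page site used only for illustration.
site = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],
    "/b": ["/c"],
    "/c": [],
}

print(bfs_crawl(site, "/"))  # ['/', '/a', '/b', '/c']
```

The `seen` set is what keeps the crawl from looping on cycles such as `/a` linking back to `/`; swapping the `deque` for a stack would turn it into a depth-first crawl.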
https://github.com/bkdev98/ebooks-crawler
An ebook crawler for personal use, built with ReactJS.
crawler material-ui nodejs reactjs
Last synced: 01 Jan 2025
https://github.com/dimo414/pycrawl
Simple Python web crawler, primarily designed for inspecting and diagnosing your own website
Last synced: 18 Dec 2024
https://github.com/scrwdrv/siege-crawler
This CLI tool finds same-domain URLs in a web page and requests them to discover even more URLs until the server crashes (or the benchmark ends). It is used to test a server's maximum capacity or to find glitches that users might encounter.
benchmark cli crawler ddos debug siege tool
Last synced: 18 Dec 2024
https://github.com/schbenedikt/web-crawler
A simple web crawler using Python that stores the metadata of each web page in a database.
crawler database mariadb mysql python python-crawler web
Last synced: 08 Nov 2024
https://github.com/amirespahbodi/url_crawler
url crawler
crawler fastapi pydantic python3 sqlalchemy
Last synced: 02 Jan 2025
https://github.com/igorbrizack/web-scraper
An HTML data-scraping application built in Python.
crawler pytest python3 scraper
Last synced: 26 Jan 2025
https://github.com/shiritai/wallpaper_master
My first individual project!
crawler file-explorer javafx-application maven-shade mini-system wallpaper wallpaper-master
Last synced: 01 Jan 2025
https://github.com/christopher-besch/therapy_search
Computes call times from arztsuche-bw and puts them into a calendar.
appointments calendar crawler gatsby therapy time-management typescript
Last synced: 28 Dec 2024
https://github.com/vindecodex/automated-crawler-wget
Uses wget to crawl a site.
Last synced: 01 Jan 2025
https://github.com/soakit/book-download
book-download
crawler html2epub nodejs novel-downloader
Last synced: 28 Dec 2024
https://github.com/lysagxra/eromedownloader
Erome albums and profile downloader
bulk bulk-downloader concurrent-processing crawler downloader erome erome-download erome-downloader parallel-processing profile-downloader python python3
Last synced: 17 Jan 2025
https://github.com/sammwyy/craw
A website-crawler library for Node.js.
crawler crawlers html javascript library node nodejs nodejs-module npm npm-module parser spider website
Last synced: 17 Jan 2025
https://github.com/akashrajpurohit/node-crawler
A Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs of the domain.
crawler node-crawler nodejs url
Last synced: 25 Dec 2024
https://github.com/lukasherz/22fs-sc-twitter-crawler
Used for a research project in social computing at UZH (FS22).
crawler crawling database twitter twitter-api-v2
Last synced: 25 Dec 2024
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 25 Dec 2024
https://github.com/yordadev/fenrisjs
A Node.js application that scrapes all links from a given input and writes the results into one of two files, external or internal, for further analysis.
analysis crawler link-collection link-crawler nodejs nodejs-application
Last synced: 10 Jan 2025
https://github.com/alatiera/ellinofreneia-crawler
Crawler of ellinofreneianet.gr for offline content consumption
Last synced: 01 Jan 2025
https://github.com/woorim960/nate.com-comments-crawler
nate.com-comments-crawler
chromedriver crawler python3 selenium
Last synced: 28 Dec 2024
https://github.com/developerjosh/gogo-crawler
A toolkit for building an anime website with a database full of anime.
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 17 Jan 2025
https://github.com/0xpr03/clantool
CF Management & Data Analysis Tool, crawler backend in rust
backend-server crawler data-analysis rust
Last synced: 02 Jan 2025
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course.
crawler crawler-python data-science pytest python
Last synced: 18 Jan 2025
https://github.com/projectx3193275578/prjctxx8264
A simple, open-source, easy to use, and free download manager for malware samples.
crawler downloader malware manager samples
Last synced: 05 Jan 2025
https://github.com/teal33t/base_crawler
Simple scaffold for selenium based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 23 Jan 2025
https://github.com/droiddevgeeks/nodelearning
A Node.js learning demo that covers all the basics of Node.
crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign
Last synced: 13 Jan 2025
https://github.com/khadkarajesh/aptoide
Aptoide app crawler using beautifulsoup
beautifulsoup4 crawler flask python3 web-application
Last synced: 13 Jan 2025
https://github.com/dylanhogg/cloud-products
A package for getting cloud products and product descriptions from a cloud provider website.
aws cloud-products crawler data text-processing
Last synced: 23 Jan 2025
https://github.com/kahsolt/allchan
An image crawler for xChan(4chan/8ch/...) image board.
4chan 4chan-downloader 8chan crawler image-crawler
Last synced: 03 Jan 2025
https://github.com/orafaelfragoso/itunes-crawler
Retrieves information about an artist by crawling the iTunes API and iTunes Page
Last synced: 19 Dec 2024
https://github.com/suddi/fundscraper
Collection of web crawlers to scrape fund data using Scrapy
Last synced: 11 Oct 2024
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 26 Jan 2025
https://github.com/davideferre/covid19-data-crawler-ita
COVID-19 Italian data crawler
coronavirus covid19 crawler hacktoberfest hacktoberfest2021 python
Last synced: 11 Jan 2025
https://github.com/zabuzard/wslotter
WSlotter is a Selenium-driven tool for signing up for events on 'https://www.gruppe-w.de'.
Last synced: 12 Jan 2025
https://github.com/andmerk93/scrapy_parser_pep
A training project built with Scrapy that parses PEPs and outputs them in two formats.
Last synced: 24 Jan 2025
https://github.com/dangdungcntt/crawl-fb-v2
Simple script to detect email addresses and phone numbers in Facebook comments.
Last synced: 18 Jan 2025
https://github.com/naveenaidu/google-crawler
Google Crawler - Curates the search results
Last synced: 18 Jan 2025
https://github.com/karantyagi/web-crawler
BFS and DFS implementations for a Wikipedia crawler
Last synced: 12 Jan 2025
https://github.com/par7133/splash-bot-crawler
Splash Bot creates splashes of your websites on the fly - GPL License 🔥
bot crawler gallery open-source opensource php splash
Last synced: 12 Jan 2025
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawls Wikipedia with Python, powered by BeautifulSoup4, supporting GUI/CUI
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 03 Jan 2025
https://github.com/hoishing/selenium-crawler
a web crawler written in python, powered by Selenium and Tesseract OCR
Last synced: 18 Jan 2025
https://github.com/mmqnym/pyppeteer-use-case
Shows how to crawl the web with pyppeteer.
crawl crawler pyppeteer python
Last synced: 18 Jan 2025
https://github.com/maxgio92/package-crawler
A package crawler for the most well-known Linux distros
Last synced: 26 Jan 2025
https://github.com/opda0887/bahamut-crawler-to-gmail
Idea: use a Python crawler to fetch the latest forum threads from Bahamut boards and send them to myself via Gmail.
Last synced: 26 Jan 2025
https://github.com/fa7ad/aiub-notes-dl
Download all notes from AIUB's portal
Last synced: 24 Oct 2024
https://github.com/buren/site_health
Crawl a site and check various health indicators
Last synced: 28 Oct 2024
https://github.com/loggerhead/dianping_crawler
A Dianping crawler based on Scrapy (Python 3.5)
Last synced: 24 Jan 2025
https://github.com/amirsorouri00/dsl-se
This is an MVP based on the "Search Engine And Data Mining" course. The idea behind this project comes from the forked project, whose link is provided
container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine
Last synced: 19 Jan 2025
https://github.com/mahmoudgalalz/pupt
A starter for web crawling using Puppeteer
Last synced: 05 Jan 2025
https://github.com/mc256/node-static-webpage-crawler
Downloads an entire website along with its directory structure.
cache-server crawler nodejs static-site
Last synced: 24 Jan 2025
https://github.com/toannd96/chromedp-example-login
chromedp crawler golang goquery
Last synced: 19 Jan 2025
https://github.com/somnisomni/trawler-csharp
The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#
crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling twitter-scraper
Last synced: 05 Jan 2025
https://github.com/tsaohucn/crawler_fb_group
A crawler that uses Selenium for Facebook groups.
crawler facebook-groups rails ruby
Last synced: 20 Jan 2025
https://github.com/liebki/githubnet
This library allows you to retrieve several things from GitHub, things like trending repositories, profiles of users, the repositories of users and related information.
crawler crawling github github-trending htmlagilitypack microsoft
Last synced: 24 Jan 2025
https://github.com/eea/eea-crawler
EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).
airflow-dags crawler elasticsearch etl-pipeline indexing
Last synced: 24 Jan 2025
https://github.com/basemax/jadi-net-blog
This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.
blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp
Last synced: 24 Jan 2025
https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.
Last synced: 30 Dec 2024
https://github.com/jimmy-ly00/dhe-prime-grabber
Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.
certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3
Last synced: 29 Dec 2024
https://github.com/efishery/wpi-kkp-crawler
A crawler for fisheries prices on wpi.kkp.go.id.
Last synced: 02 Jan 2025
https://github.com/rogerluo410/gcrawler
Google search crawler written in Ruby. Crawls each link's text and URL by keyword on Google.com.
Last synced: 02 Jan 2025
https://github.com/d7isme/pixiv-downloader-mod
Modded version of the pixiv downloader extension from the Chrome Web Store, with the premium feature unlocked.
chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader
Last synced: 09 Jan 2025
https://github.com/jurooravec/knwldg
Datasets, scrapers, pipelines
companies crawler data dataset non-profit-organizations scraper scrapy
Last synced: 12 Jan 2025
https://github.com/marceloneppel/crawler
Simple web crawler developed in Go.
Last synced: 03 Dec 2024
https://github.com/palpitate-xus/sge_data_insert
Uses GitHub Actions to automatically fetch sge data and store it in a database.
Last synced: 16 Dec 2024
https://github.com/massongit/ibaraki-univ-circle-crawler
Crawls Ibaraki University's official circles (student clubs) from the university's website.
Last synced: 03 Dec 2024
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 28 Dec 2024
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 17 Nov 2024
https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp
A web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as each event's title, date, venue, and links.
anglesharp crawler minhaentrada
Last synced: 31 Dec 2024
https://github.com/khanof89/twitter_scraper
Scrape tweet details from user profile using selenium
crawler scraper selenium twitter twitter-bot
Last synced: 10 Jan 2025
https://github.com/zfael/scrape-it-all
Modular web scraper for Node.JS
crawler scraper scraping scraping-websites web-scraping
Last synced: 23 Dec 2024
https://github.com/mirusu400/berryz-dl
Batch download berryz webshare files recursively!
berryz berryz-webshare crawler downloader scraper
Last synced: 26 Dec 2024
https://github.com/filipsedivy/tachometer-check
🚘 MDČR - odometer check
Last synced: 23 Dec 2024
https://github.com/ronierisonmaciel/crawler
A crawler built with BeautifulSoup that aims to extract information from websites efficiently and in a structured way. BeautifulSoup is a Python library that makes it easy to parse and extract data from HTML and XML pages. The project lets you collect and organize relevant information.
beautifulsoup4 crawler crawling python python3
Last synced: 03 Dec 2024
https://github.com/miiraak/scrapc
C# WinForms - Crawler & Scraper Web content
crawler csharp html scraper url web windows-forms
Last synced: 13 Oct 2024
https://github.com/istador/mediaindexer
Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.
Last synced: 22 Jan 2025
https://github.com/tetreum/xupopter_runner
Executes crawling recipes coming from Xupopter Chrome Extension.
crawler scrapper scrapping webscraper
Last synced: 17 Dec 2024
https://github.com/ndoolan360/go-crawler
A simple web crawling program written in Go in an afternoon. 🕷️🕸️
afternoon-project crawler scraper
Last synced: 18 Jan 2025
https://github.com/aminehsan/datamining-divar.ir
Analyzing and Extracting Insights from Ads on 'divar.ir'
crawler data-mining data-science divar-ir scraping
Last synced: 04 Dec 2024
https://github.com/allotmentandy/socialmedialinkextractor
PHP Laravel package that extracts social media links from an array of links; used as part of a spider for checking londinium.com website links.
crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube
Last synced: 23 Dec 2024
https://github.com/tetreum/xupopter_client
Simple interface to manage Xupopter recipes as well as its runners.
crawler scrapper scrapping webscraper
Last synced: 17 Dec 2024
https://github.com/alphabs/navercafeclient
A .NET library for crawling Naver Cafe post lists.
crawler crawling dotnet naver naver-api naver-cafe web-scraper web-scraping
Last synced: 29 Nov 2024
https://github.com/billy0402/tibame-python-data-analysis
A learning project from TibaMe Python data analysis course.
ai course crawler jupyter-notebook matplotlib pandas python requests
Last synced: 14 Jan 2025