Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-10 00:06:02 UTC
https://github.com/zenixls2/2chpreprocess
Dump messages from 2ch with some preprocessing for ML analysis
Last synced: 04 Dec 2024
https://github.com/ilovebacteria/digikala-api
This Python package queries the Digikala API and retrieves a product's details.
Last synced: 14 Nov 2024
https://github.com/limdongjin/bill-scraper
Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: Crawler for bills of the 20th National Assembly (South Korea)
Last synced: 12 Nov 2024
https://github.com/vivekg13186/lucas
A web crawler
crawler crawler-engine crawling-framework java
Last synced: 09 Dec 2024
https://github.com/erickj3/strike-api
A web scraping API built with NestJS
api crawler flow nestjs scraping typescript
Last synced: 24 Nov 2024
https://github.com/eklem/vinmonopolet-crawler
Crawls Vinmonopolet data and indexes it into a norch search index
crawler dataset javascript norch search-engine
Last synced: 04 Dec 2024
https://github.com/tetreum/puppeteer-for-crawling
Daily use crawling methods for puppeteer
Last synced: 09 Dec 2024
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 30 Dec 2024
https://github.com/spider-rs/spider-clients
Clients to use with the hosted spider service - spider.cloud
ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping
Last synced: 05 Nov 2024
https://github.com/radityaharya/sitesweeper
Sitesweeper is a Python package that helps you automate your web scraping process, outputting pages to a file
crawler pdf python website-crawler
Last synced: 05 Dec 2024
https://github.com/nagilum/focus
Simple CLI tool, written in C#, to crawl a site and log the responses.
cli crawl crawler csharp playwright
Last synced: 16 Nov 2024
https://github.com/estavadormir/scrappist
A web scraper that takes one or more URLs and converts them into a PDF.
bun cli crawler pdf-generation
Last synced: 12 Nov 2024
https://github.com/mstephen19/apify-click-events
Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to
apify apify-sdk crawler scraper web-automation
Last synced: 10 Dec 2024
https://github.com/vhdm/twitter-hashtag-crawler
Twitter hashtag crawler using Selenium, without using the Twitter API ;)
Last synced: 05 Jan 2025
https://github.com/n3d1117/sisop17
Exercise for the Operating Systems exam - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 19 Dec 2024
https://github.com/gxjansen/website-to-pdf
Creates a PDF based on the content of a website/subdomain
claude-3-sonnet crawler python3
Last synced: 10 Dec 2024
https://github.com/athulmurali/flickr-api-docs-crawler
A Python-based crawler that extracts API documentation and writes it to a file as JSON. A polished documentation page can then be built from the JSON file using Docusaurus
api beautifulsoup4 crawler documentation python3
Last synced: 09 Jan 2025
https://github.com/docongminh/vinbdi-crawler
Crawl data using Scrapy + bs4
bs4-requests crawler scrapy splash
Last synced: 28 Dec 2024
https://github.com/ahsouza/iquizz-api
RESTful API developed in Node.js with MongoDB
animations cluster crawler docker docker-compose ejs-templates es8 font-awesome grunt-task helmet-detection heroku javascript jquery material-design mongodb nodejs passport-strategy passportjs pusher token-authetication
Last synced: 10 Dec 2024
https://github.com/mohammadreza-mohammadi94/python-webscraper-projects
A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.
bs4 crawler object-oriented-programming python requests scrapy webscraping
Last synced: 26 Dec 2024
https://github.com/emarifer/search-engine
A mini Google. Custom web crawler & indexer written in Golang.
crawler dashboard deep-first-search fiber-framework full-text-search golang gorm-orm htmx htmx-go hyperscript indexer inverted-index response-caching search-engine templ worker-pool
Last synced: 16 Nov 2024
https://github.com/der3318/daily-pixiv
Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations
crawler line-notify pixiv workflow
Last synced: 14 Nov 2024
https://github.com/notreeceharris/webstalker
🕸 A Powerful Relational Web Crawler
Last synced: 14 Nov 2024
https://github.com/949886/pixiv-crawler
Crawls Pixiv illustration info into a local MySQL database.
Last synced: 28 Dec 2024
https://github.com/luickk/vulnerability-crawler
Small Python program that analyzes random sites found on Google for vulnerabilities.
Last synced: 28 Dec 2024
https://github.com/zawlinnnaing/my-wiki-crawler
A simple program for crawling Burmese Wikipedia using the MediaWiki API.
crawler myanmar-tools python wikipedia-api
Last synced: 25 Dec 2024
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 13 Nov 2024
https://github.com/nowshad-sust/corona
A simple data endpoint for coronavirus updates
api corona coronavirus-updates crawler dcoker-compose excel nodejs
Last synced: 23 Nov 2024
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 07 Jan 2025
https://github.com/kenanbek/tutorial-python-crawler
Crawling website data using Python with requests and Beautiful Soup libraries
beautifulsoup crawler crawling miner parser python python-requests requests
Last synced: 11 Dec 2024
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 28 Dec 2024
https://github.com/yaoshanliang/linkedinspider
Crawl job information from LinkedIn for data analysis
big-data crawler python social-network-analysis
Last synced: 11 Dec 2024
https://github.com/zzzzer91/match_spider
Crawler for a certain gambling website; the site has since shut down :disappointed_relieved:
Last synced: 12 Nov 2024
https://github.com/claudio-code/nap-web-crawler
A crawler that finds broken links in the documentation of frameworks and languages
Last synced: 11 Dec 2024
https://github.com/chenbingwei1201/threads_scraper
A Python package for scraping Threads posts.
chromedriver crawler csv-format pypi pypi-package python python3 scraper scraping-websites
Last synced: 11 Dec 2024
https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper
Web scraper that scrapes the entire Codeforces problemset into a CSV or JSON file.
codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider
Last synced: 15 Nov 2024
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 17 Nov 2024
https://github.com/sinipelto/repo-license-crawler
Collects and summarizes license information on Python and NPM packages into output files.
crawler crawler-python license license-checker license-checking license-crawler license-management licenses licensing nodejs npm npm-license-crawler npm-license-tracker npm-licenses python python-script python3
Last synced: 11 Dec 2024
https://github.com/izh318/genie-music-artist-album-crawler
A Python script that crawls, in a single run, the album information of a specific artist registered on Genie Music.
Last synced: 28 Dec 2024
https://github.com/billy0402/tibame-python-data-analysis
A learning project from TibaMe Python data analysis course.
ai course crawler jupyter-notebook matplotlib pandas python requests
Last synced: 14 Nov 2024
https://github.com/shentengtu/cht-yp-crawler
Simple Crawler of www.iyp.com.tw.
crawler node-js nodejs yellow-pages yellowpages
Last synced: 12 Nov 2024
https://github.com/kianoushamirpour/crawl_google_scholar_with_selenium_fastapi_mongodb
Crawl Google Scholar profiles with Selenium, store the extracted data in MongoDB, and serve queries with FastAPI.
crawler fastapi google-scholar mongodb python selenium
Last synced: 25 Dec 2024
https://github.com/bennettdams/vace-it-crawler
Python (Scrapy) crawler to access data of FACEIT.com
Last synced: 14 Nov 2024
https://github.com/iomarmochtar/imagecrawler
Simple image crawler that follows links recursively; no dependencies needed, for Python 2.7+
Last synced: 25 Dec 2024
https://github.com/anthonysigogne/scrapy
A list of simple scrapers made with Scrapy
crawler elasticsearch python scrapy spider
Last synced: 12 Nov 2024
https://github.com/m1/smap
smap is a site-mapping engine written in Go.
crawler go go-library go-package golang golang-library golang-package golang-tools sitemap sitemap-generator web-crawler web-crawling
Last synced: 10 Dec 2024
https://github.com/waived/google-drive-crawler
Proxy-based crawler to expose public (shared) Google Drive links
crawler crawler-python file-crawler google-drive-api shared-folders web-spider
Last synced: 05 Dec 2024
https://github.com/murilobsd/icrop-csv
Icrop-csv automates the process of downloading reports.
Last synced: 28 Dec 2024
https://github.com/antoniowd/crawly
A web crawler that explores the web in search of specific information (emails, phone numbers, etc.)
crawler got jsdom nodejs webcrawler webscraping
Last synced: 12 Dec 2024
https://github.com/bandie91/extip
Fetch your external IP address from known external-IP providers
address cli crawler external ip ipv4-address parallel
Last synced: 03 Jan 2025
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 08 Jan 2025
https://github.com/tryagi/firecrawl
Generated C# SDK based on official Firecrawl OpenAPI specification
ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk
Last synced: 14 Oct 2024
https://github.com/fulcrum6378/twitter_profile_exporter
A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and data about their authors. The main data is stored in an SQLite database and all media are downloaded, so the application can then reconstruct a Twitter profile in the front end.
crawler exporter profile social-media sqlite twitter twitter-api
Last synced: 03 Jan 2025
https://github.com/ariefrahmansyah/crawler
Simple website crawler using Go programming language.
Last synced: 05 Dec 2024
https://github.com/dnlzrgz/excursionist
Scrapy-powered flight price crawler.
crawler crawlers crawling flight flights playwright scraper scraping-websites scrapy travel traveling
Last synced: 23 Dec 2024
https://github.com/engageintellect/scrapers
A repository of web scrapers using Python & Scrapy
Last synced: 13 Dec 2024
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 23 Dec 2024
https://github.com/tonystrawberry/tcj-nihongo-crawler
🤖 Scraper for personal usage
crawler scraper selenium selenium-webdriver
Last synced: 14 Nov 2024
https://github.com/sirius-mhlee/naver-cafe-crawler
NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4
beautifulsoup4 crawler pandas selenium tqdm
Last synced: 14 Nov 2024
https://github.com/xoraus/revieworacle
The proposed system helps users decide which product to buy. It gathers reviews and product details from multiple websites that sell the product. In addition, the system is trained to analyze the polarity of the product.
ai crawler datascience machinelearning scrappy selenium-webdriver
Last synced: 14 Nov 2024
https://github.com/tetreum/xupopter_chrome_extension
Extension to easily create crawling recipes
crawler scrapper scrapping webscraper
Last synced: 17 Dec 2024
https://github.com/d-w-arnold/local-news-data-collection
Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎
crawler data-collection python
Last synced: 14 Dec 2024
https://github.com/abdymm/abtelegrambot-sample
Sample project using a Telegram bot
crawler football php scheduler telegram-bot webhook
Last synced: 10 Jan 2025
https://github.com/zahraarshia/cti_crawl
This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.
crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper
Last synced: 09 Jan 2025
https://github.com/tinoco/ticapsoriginal_website_score_overview
Ticapsoriginal website sitemaps checker score overview
advertools beautifulsoup behave bs4 chart crawler linkbuilding matplotlib metrics metrics-visualization parser python requests score sitemaps ticapsoriginal tqdm unittesting urllib
Last synced: 09 Jan 2025
https://github.com/octcarp/sustech_cs209a-java2_f24_proj
(Spring Boot + Vue3) Stack Overflow data crawling and visualization: our project for CS209A 2024 Fall: Computer System Design and Applications A (a.k.a. Java 2), SUSTech. Taught by Yida Tao (@yidatao).
crawler spring-boot stackexchange sustech visualization
Last synced: 01 Jan 2025
https://github.com/keizerzilla/ssh-hunter
Script that hunts for vulnerable Raspberry Pis on the internet (open SSH port and unchanged default password).
Last synced: 23 Dec 2024
https://github.com/keizerzilla/search4dwango9
My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8
Last synced: 23 Dec 2024
https://github.com/tsaohucn/crawler_fb_page
A crawler for Facebook pages that uses Selenium
crawler facebook-page rails ruby selenium
Last synced: 19 Nov 2024
https://github.com/cristiangreco/gcrawler
A simple (not concurrent) web crawler written in Java.
Last synced: 23 Dec 2024
https://github.com/sanhphanvan96/php-training-crawler
Simple PHP crawler for training purposes
crawler docker docker-compose nginx php php-fpm
Last synced: 10 Jan 2025
https://github.com/machinecyc/lotteryinsight
Uses a crawler to collect Taiwan Lotto data and saves it to a local MySQL server.
crawler data docker lottery mysql-database python3 taiwan
Last synced: 05 Dec 2024
https://github.com/g-ongenae/morphalou-crawler
A Crawler for CNRTL's Morphologie words
crawler french lexical-databases list-of-words words
Last synced: 15 Oct 2024
https://github.com/tigercosmos/web-crawler
Web Crawler in Java Maven Project
Last synced: 05 Dec 2024
https://github.com/ggteixeira/corpus-cleaner
Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.
beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping
Last synced: 12 Nov 2024
https://github.com/apexcaptain/allergy-alert
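Crawls the cafeteria information published on a certain university's website for today's date and reports the menus, grouped by hall and menu category, along with the allergy-inducing foods in each menu item.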
crawler docker expressjs puppeteer reactjs sqlite typescript
Last synced: 01 Dec 2024