Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-29 00:06:32 UTC
https://github.com/omkarcloud/dentalkart-scraper
🚀 SCRAPE 1000'S OF PRODUCTS FROM DENTALKART 🤖
beautifulsoup crawler crawling crawling-framework crawling-python dentalkart dentalkart-product-scraper dentalkart-scraper dentalkart-scraping node-crawler scraper scraping scraping-framework scraping-python selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 02 Jan 2025
https://github.com/shunk031/lineblogscraper
Scraper for LINE Blog in Scrapy
crawler lineblog scraper scrapy
Last synced: 10 Jan 2025
https://github.com/yjg30737/pyqt-google-image-crawler
Crawls image files from Google search results with Python and icrawler
beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application
Last synced: 03 Jan 2025
https://github.com/imthaghost/gocloneold
Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.
Last synced: 19 Dec 2024
https://github.com/imkrunalkanojiya/seo-checker
Resolve your SEO-related issues using the SEO Checker REST API
crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools
Last synced: 03 Jan 2025
https://github.com/Juphex/SupremeBot
Demonstrates automated purchasing from the clothing brand "Supreme". This was a fun project with no further application.
android chrome crawler kivy python3 webscraping windows
Last synced: 23 Oct 2024
https://github.com/zabuzard/mplogger
Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. It then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.
bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api
Last synced: 19 Dec 2024
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 11 Jan 2025
https://github.com/wangyihang/acw-sc-v2-py
Python requests.HTTPAdapter for `acw_sc__v2`
Last synced: 05 Jan 2025
https://github.com/systemfsoftware/youtube-autocomplete-scraper
YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.
actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api
Last synced: 11 Jan 2025
https://github.com/tufayellus/linkedin-cv-downloader
Python-based GUI automation software for bulk-downloading LinkedIn CVs / LinkedIn resumes from a list of profile links
crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine
Last synced: 23 Jan 2025
https://github.com/fernandod1/yahoo-finance-scraper
This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them to a local text file.
crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api
Last synced: 12 Jan 2025
https://github.com/madis/flatcrawl
Clojure app for crawling apartment information from http://kv.ee
clojure crawler real-estate webapp
Last synced: 12 Jan 2025
https://github.com/zabuzard/songcrawler
Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.
command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler
Last synced: 12 Jan 2025
https://github.com/nakabonne/staticcollector
Application to analyze static files of competing sites
Last synced: 14 Dec 2024
https://github.com/eklem/browsercrawler
Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 19 Dec 2024
https://github.com/gabrielrf/bsbdf
Telegram Public Channel
crawler python telegram telegram-channel telegraph
Last synced: 13 Jan 2025
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 24 Jan 2025
https://github.com/idanhoro/nasa-heat-maps-prediction
In this project we research the correlations between different weather conditions and try to predict future scenarios by using image processing and traditional machine learning algorithms
beautifulsoup crawler machine-learning pillow prediction python sklearn
Last synced: 20 Jan 2025
https://github.com/highbreed/web-crawler
A web crawler script that crawls the target website and lists its links
Last synced: 13 Jan 2025
https://github.com/0000xffff/webgrab
web page: crawler / file scanner / downloader
crawler download downloader scrape scraper webcrawler
Last synced: 19 Jan 2025
https://github.com/telanflow/scrago
A micro crawler framework written in Golang.
crawler go micro-framework spider
Last synced: 19 Jan 2025
https://github.com/ph-7/gettermails
GetterMails, Scraper
bot crawler email php python retrieve-web-page scrape scraper scraping scraping-websites scrapper webdriver
Last synced: 19 Jan 2025
https://github.com/denrydu/baiduimagecrawler
Two self-written scripts for crawling Baidu Images, handy for CV researchers building datasets. Two ways to download images from Baidu, a useful tool for making CV datasets!
Last synced: 27 Dec 2024
https://github.com/thiiagoms/dict-crawler
Simple crawler on UOL dictionary
beautifulsoup4 crawler dic python pythonic
Last synced: 16 Jan 2025
https://github.com/litingyes/cobweb
Collect, store and distribute meaningful static data
apis bing-image bing-wallpapers crawler image random-image
Last synced: 05 Dec 2024
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 11 Nov 2024
https://github.com/linkspreed/twig
Twig🔍 - the fastest and safest search engine📐 for the web🌐, images🤳, news📰 and much more
crawler engine search search-engine web5
Last synced: 03 Jan 2025
https://github.com/amirzenoozi/aparat-videos-dataset
Some Simple Information About Aparat Videos for Data Scientists
aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video
Last synced: 21 Jan 2025
https://github.com/microlinkhq/ua
Simple Redis primitives to incr() and top() user agents
crawler redis user-agent user-agent-parser
Last synced: 12 Jan 2025
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.
Last synced: 06 Dec 2024
https://github.com/sirius-mhlee/naver-cafe-crawler
NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4
beautifulsoup4 crawler pandas selenium tqdm
Last synced: 14 Jan 2025
https://github.com/tsoliangwu0130/ex-dividend-date-notification
crawler email-notification python3 stock-market vanguard
Last synced: 08 Jan 2025
https://github.com/jfcherng/wiki-cgroup-crawler
This script fetches Wikipedia's public conversion group (CGroup) dictionaries and saves the results as external files.
crawler php-71 wiki-cgroup-crawler wikipedia
Last synced: 22 Jan 2025
https://github.com/baerwang/sec_craw
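A crawler that helps security researchers collect daily security digests. It currently covers 90sec, the Kanxue forum, v2ex, the Jingyi forum, the 52pojie forum, and other lab blogs, and is continuously updated.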
crawler security security-tools threat threat-intelligence
Last synced: 21 Jan 2025
https://github.com/saketh7382/smartcrawler
Package for crawling items from web pages and storing them as a JSON file
crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager
Last synced: 08 Dec 2024
https://github.com/tsoliangwu0130/ptt-search
A simple Python script to fetch PTT posts from the command line.
Last synced: 08 Jan 2025
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 11 Nov 2024
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different methods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 23 Dec 2024
https://github.com/deployment-helper/api-template-crawler
API interface to crawl the templates
api crawler deployment-helper gcp gcp-cloud-run golang rest
Last synced: 14 Jan 2025
https://github.com/emarifer/search-engine
A mini Google. Custom web crawler & indexer written in Golang.
crawler dashboard deep-first-search fiber-framework full-text-search golang gorm-orm htmx htmx-go hyperscript indexer inverted-index response-caching search-engine templ worker-pool
Last synced: 17 Jan 2025
https://github.com/maxiroellplenty/gs-robot
Node.js tool to scrape gelbe-seiten
axios cheerio crawler gelbe-seiten nodejs scraper yargs
Last synced: 23 Jan 2025
https://github.com/pjullrich/link-crawler
Python crawler that reports broken links on a given website and its subpages
asyncio breadth-first-search broken-links crawler python
Last synced: 23 Jan 2025
https://github.com/tcc0lin/magiccrawler
Collects all kinds of interesting crawler scripts and pits them against anti-crawling measures :bowtie::heavy_check_mark::heavy_check_mark::heavy_check_mark:
Last synced: 18 Jan 2025
https://github.com/ghost---shadow/feature-extractor-from-codebase
Copies the target java file and all its dependencies recursively to another directory
Last synced: 16 Jan 2025
https://github.com/hamidrabedi/digikala-crawler
A crawler for Digikala built with the Django framework, Selenium, and a REST API. It also scrapes data from the gathered URLs.
crawler digikala digikala-crawler django python scraper
Last synced: 14 Dec 2024
https://github.com/greatdrake/contributecounter
Crawls Wikipedia for contributors
Last synced: 14 Dec 2024
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 27 Jan 2025
https://github.com/piopi/behatcrawler
A Behat extension that crawls links on a website and executes a user-defined function on each of them.
behat behat-extension crawler php selenium-webdriver
Last synced: 19 Dec 2024
https://github.com/juangesino/gazette
A personal news aggregator application using Meteor.
crawler meteor meteorjs news news-aggregator news-feed scraper
Last synced: 23 Jan 2025
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 06 Dec 2024
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 28 Dec 2024
https://github.com/srx-2000/swaiter
A program that waits until a Selenium element has loaded (a Selenium element-wait helper)
crawler selenium selenium-python
Last synced: 22 Jan 2025
https://github.com/enansari/guess-price-car
Car price estimation with machine learning, based on data from a car sales site | Final project for Maktabkhooneh
crawler jadi machine-learning maktabkhoone maktabkhooneh python
Last synced: 09 Jan 2025
https://github.com/ccrashzer0/web_crawler
A Python-based web crawler
crawler internet python python3 webcrawler
Last synced: 27 Jan 2025
https://github.com/pythoript/pgn-scraper
PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.
7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip
Last synced: 23 Jan 2025
https://github.com/milouk/web-crawler
Phoneutria Crawler
crawler crawlers database internet jar java spider web web-crawler
Last synced: 19 Jan 2025
https://github.com/cseas/shares-monitor
Web crawler to fetch and monitor shares details.
crawler python python3 scraper scraping-websites shares
Last synced: 27 Dec 2024
https://github.com/dhchenx/quick-crawler
A toolkit for quickly performing crawler functions
Last synced: 01 Dec 2024
https://github.com/sefinek/niedlascamu.pl-tracker
Tracks changes on the niedlascamu.pl website.
crawl crawler niedlascamu tracker tracking
Last synced: 07 Dec 2024
https://github.com/zhoudaxia233/unilogo
A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.
Last synced: 15 Dec 2024
https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
crawler gallery images python3
Last synced: 17 Dec 2024
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 14 Oct 2024
https://github.com/purrproof/smartcrawl
An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.
blockchain cli crawler explorer framework go golang hacktoberfest
Last synced: 27 Jan 2025
https://github.com/openpj/manifoldcf-sdk
Apache ManifoldCF SDK is a Maven project focused on helping developers to extend ManifoldCF with new connectors and extensions
apache crawler docker ecm extensions integrations manifoldcf migration sdk search
Last synced: 25 Jan 2025
https://github.com/coghost/crawlers
crawlers in one
crawler python3 staticimg weibo
Last synced: 02 Jan 2025
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 14 Jan 2025
https://github.com/abdus/scrape-web
A simple web scraper for Node.js
crawler web-scraping web-scrapper
Last synced: 03 Dec 2024
https://github.com/loggerhead/dianping_crawler
A Dianping crawler based on Scrapy (Python 3.5)
Last synced: 24 Jan 2025
https://github.com/mkfsn/chronos
A light cron-like container service - create cron job easily.
Last synced: 22 Jan 2025
https://github.com/zhs007/lottery-crawler
A crawler based on jarvis-task, mainly used to crawl lottery data.
Last synced: 03 Jan 2025
https://github.com/amirsorouri00/dsl-se
This is an MVP built for the "Search Engine And Data Mining" course. The idea behind this project is the forked project which its link provided is
container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine
Last synced: 19 Jan 2025
https://github.com/konradlinkowski/mailcrawler
Crawler that finds emails on websites
Last synced: 26 Jan 2025
https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper
Crawls exclusive video games released for the specified platforms on the GameFAQs website to generate a table in Markdown format with the crawled titles.
console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox
Last synced: 01 Dec 2024
https://github.com/redco/goose-phantom-environment
Environment for the Goose parser that allows it to run in PhantomJS
crawler environment goose goose-parser nodejs parse parser phantomjs scraper
Last synced: 22 Dec 2024
https://github.com/devidw/google-untitled-spam-spider
A spam spider that targets 'Untitled' spam pages in Google search results.
crawler crawling crawling-algorithm crawling-python crawling-sites crawling-tool google-untitled python python3 spam spam-detection spammer untitled untitled-spam
Last synced: 07 Dec 2024
https://github.com/victorhuu/amazonmovieintegration
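This repository is the first individual assignment for the Data Warehouse course at Tongji University: organizing Amazon movie data using crawlers and ETL tools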
crawler data-warehouse movies pandas scrapy xpath
Last synced: 26 Jan 2025
https://github.com/lucasbotang/project_financial_markets_text_mining
Predict stock market movement based on news
crawler data-science natural-language-processing python
Last synced: 25 Jan 2025
https://github.com/sonhm3029/crawl-data-bot
This project builds a basic bot for crawling data from the web, including text data and image data
crawler google medical vietnamese
Last synced: 17 Jan 2025
https://github.com/maxmindlin/swarm
Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.
Last synced: 06 Dec 2024
https://github.com/camilamaia/crawl4us
[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️
beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping
Last synced: 08 Dec 2024
https://github.com/donuts-are-good/araknnid
GO GO TINY SPIDER!
crawler hacktoberfest search-engine spider
Last synced: 28 Dec 2024