Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
![](https://explore-feed.github.com/topics/crawler/crawler.png)
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-09 00:06:38 UTC
https://github.com/ruedigervoigt/salted
Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files
asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python
Last synced: 11 Oct 2024
https://github.com/zhaotianff/crawler-line
C# command-line crawler
command-line command-line-tool crawler csharp dotnet-core
Last synced: 15 Jan 2025
https://github.com/gatenlp/wpextract
Create datasets from WordPress sites for research or archiving
corpus crawler nlp text-extraction text-mining web-scraping wordpress
Last synced: 13 Nov 2024
https://github.com/fanyong920/crawlitem-puppeteer
An example of scraping product listings with Puppeteer
chromnium crawler javascript nodejs puppeteer scrapy
Last synced: 23 Dec 2024
https://github.com/mrmarble/mineseek
Minecraft server scanner
crawler minecraft minecraft-server scanner slp
Last synced: 17 Jan 2025
https://github.com/oxylabs/web-crawler
Web Crawler is a tool used to discover target URLs, select the relevant content, and have it delivered in bulk. It crawls websites in real-time and at scale to quickly deliver all content or only the data you need based on your chosen criteria.
api crawler github-python scraper web-crawler web-crawler-python web-scraping web-scraping-api webscraping
Last synced: 17 Nov 2024
https://github.com/obaskly/kikfriender.com-bot
A multifunctional bot that increases your likes and hotness points and adds positive feedback. It can also flag an account of your choice as fake and add negative feedback. In addition, it can check a given wordlist, print out the Kik usernames it finds, and store them in a new text file.
ai artificial-intelligence bot checker chrome crawl crawler crawling kik proxies proxy scraper scraping selenium wordlist
Last synced: 08 Jan 2025
https://github.com/roccomuso/is-baidu
Verify that a request is from Baidu crawlers using DNS verification
baidu crawler dns ip js nodejs verification
Last synced: 22 Jan 2025
https://github.com/simoninithomas/news-crawler-parse-backend
A crawler made with Scrapy.py that crawls French news articles and sends them to your Parse.com backend
Last synced: 17 Jan 2025
https://github.com/sergioburdisso/solidscraper
Easy to use JQuery-Like API for Web Scraping/Crawling.
crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping
Last synced: 23 Nov 2024
https://github.com/pyaesoneaungrgn/2d-crawler
2D crawler for set.or.th
2d 2d-crawler crawler myanmar php
Last synced: 09 Nov 2024
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 19 Oct 2024
https://github.com/ktont/curlas
A Node.js spider tool
chrome-extension crawler spider
Last synced: 13 Jan 2025
https://github.com/tikazyq/colly-crawlers
Crawlers using Golang-based web crawling framework Colly
Last synced: 02 Jan 2025
https://github.com/e73b025/simple-python-url-crawler
Super simple Python3 website URL scraper/crawler. Multi-threaded.
crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple
Last synced: 11 Nov 2024
https://github.com/mohammadrezaamani/squirrel
Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.
Last synced: 21 Dec 2024
https://github.com/linkspreed/twig
Twig 🔍 - the fastest and safest search engine 📐 for the web 🌐, images 🤳, news 📰 and much more
crawler engine search search-engine web5
Last synced: 03 Jan 2025
https://github.com/nakabonne/staticcollector
Application to analyze static files of competing sites
Last synced: 07 Feb 2025
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 11 Nov 2024
https://github.com/xiantang/mini_scrapy
A lightweight crawler framework modeled on Scrapy
crawler python3 requets scrapy
Last synced: 01 Feb 2025
https://github.com/akagi201/spy
A lightweight distributed web crawler
crawler distributed lightweight nsq
Last synced: 08 Jan 2025
https://github.com/nava45/simplempcrawler
Simple Multiprocessing Crawler in python
crawler multiprocessing python
Last synced: 05 Jan 2025
https://github.com/zabuzard/songcrawler
Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.
command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler
Last synced: 12 Jan 2025
https://github.com/genfuture/cryptocurrency-scraper
Cryptocurrency Data Crawler 🚀 Updates CoinData Every 12 hours. High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from CoinGecko API. Collects market data, and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools
binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper
Last synced: 17 Jan 2025
https://github.com/gill-singh-a/crawler
A program that crawls the web starting from a given page, searching for keywords across the internal links it finds
crawler multithreading osint python python3 requests scraper
Last synced: 09 Nov 2024
https://github.com/benderpan/fakeagent.net
Fake Agent for .Net Standard.
agent crawler fake-agent http-headers
Last synced: 23 Dec 2024
https://github.com/skulltech/arachnid
Crawling Instagram for reasons.
crawler instagram instagram-scraper python3 scraper scrapy
Last synced: 01 Feb 2025
https://github.com/nirjharlo/complete-google-seo-scan
WordPress Plugin with inbuilt SEO crawler
crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin
Last synced: 27 Oct 2024
https://github.com/omkarcloud/multiple-account-generation-template
🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING MULTIPLE ACCOUNTS ON A WEBSITE. 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 02 Jan 2025
https://github.com/kluhan/kraken
Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.
celery crawler google-play-store python web-crawling
Last synced: 08 Feb 2025
https://github.com/denrydu/baiduimagecrawler
Two self-written scripts for downloading images from Baidu Images, a handy tool for computer vision researchers building datasets.
Last synced: 27 Dec 2024
https://github.com/superreal/octopus
Recursive and multi-threaded broken link checker
Last synced: 07 Jan 2025
https://github.com/sieep-coding/web-crawler
A simple web crawler implemented in Go.
Last synced: 16 Jan 2025
https://github.com/highbreed/web-crawler
A web crawler script that crawls the target website and lists its links
Last synced: 13 Jan 2025
https://github.com/fabrix-app/spool-scraper
Spool: Webscraper
cheerio crawler fabrix nodejs scraping spools typescript webscraper
Last synced: 13 Jan 2025
https://github.com/kokseen1/chii
A minimal marketplace bot maker.
auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction
Last synced: 13 Jan 2025
https://github.com/roccomuso/is-twitter
Verify that a request is from Twitter crawlers using DNS verification steps
bot crawler dns ip js nodejs twitter verification
Last synced: 07 Jan 2025
https://github.com/galaxiat/galaxiat.serve.seo
Node.JS package to serve React app and prerender path (cron)
crawler cron puppeteer seo seo-optimization ssr
Last synced: 23 Dec 2024
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 31 Dec 2024
https://github.com/eklem/browsercrawler
Crawls content from a site within the browser. A basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 19 Dec 2024
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 24 Jan 2025
https://github.com/raspi/scrapy-kuntavaalit2021-yle
Fetch YLE kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/polakosz/smf-scraper
You know, just for backup :smile: - The only, and therefore the best, Simple Machines Forum C# scraper on GitHub :cat:
crawler csharp forum machines php scraper simple simplemachines smf
Last synced: 18 Dec 2024
https://github.com/0000xffff/webgrab
web page: crawler / file scanner / downloader
crawler download downloader scrape scraper webcrawler
Last synced: 19 Jan 2025
https://github.com/panyanyany/vps_spider
VPS Spider powering https://findallvps.com
Last synced: 11 Jan 2025
https://github.com/telanflow/scrago
A micro crawler framework implemented in Go.
crawler go micro-framework spider
Last synced: 19 Jan 2025
https://github.com/gabrielrf/bsbdf
Telegram Public Channel
crawler python telegram telegram-channel telegraph
Last synced: 13 Jan 2025
https://github.com/ph-7/gettermails
GetterMails, Scraper
bot crawler email php python retrieve-web-page scrape scraper scraping scraping-websites scrapper webdriver
Last synced: 19 Jan 2025
https://github.com/marcbperez/python-webcrawler
Crawls HTML pages for prices and other pieces of data.
Last synced: 20 Jan 2025
https://github.com/nextlevelshit/fick
Fucking Incredible Command line King. Add CLI flavour to any website you like.
Last synced: 20 Jan 2025
https://github.com/first-coding/django-and-web
A Django web project with separated front end and back end.
Last synced: 28 Dec 2024
https://github.com/becky-dai/flower-knowledge-graph-visualization
A full-stack knowledge graph visualization project
crawler css django echarts html js knowledge-graph neo4j python
Last synced: 21 Dec 2024
https://github.com/coverified/spider
A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)
akka crawler graphql hacktoberfest microservice spider
Last synced: 25 Dec 2024
https://github.com/marvnc/pixiv-dump
Pixiv Encyclopedia DB Dumps, updated daily
crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping
Last synced: 20 Dec 2024
https://github.com/erikmueller/jazmax
Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator
crawler data-science efficiency green heatpump jaz
Last synced: 29 Jan 2025
https://github.com/brunojppb/airport-crawler
Simple and powerful CLI app to get worldwide airport information in JSON format
Last synced: 14 Jan 2025
https://github.com/sean2077/leetcode_anki
Leetcode Anki card factory.
anki crawler leetcode leetcode-anki scrapy
Last synced: 11 Jan 2025
https://github.com/antoinegagne/treewalker
A web crawler in Erlang that respects `robots.txt`.
Last synced: 20 Dec 2024
https://github.com/keosariel/ramby
Ramby is a simple way to setup a webscraper
beautifulsoup crawler python3 webscraping
Last synced: 01 Feb 2025
https://github.com/gnujoow/crawl-repo
Crawls basic information about GitHub repositories
crawler github github-api python3
Last synced: 07 Feb 2025
https://github.com/carloocchiena/python_url_crawler
A script that, starting from a webpage, iterates through all its links and appends them to a list. A rough way to enumerate all pages in a website.
beautifulsoup crawler python python3
Last synced: 28 Nov 2024
https://github.com/litingyes/cobweb
Collect, store and distribute meaningful static data
apis bing-image bing-wallpapers crawler image random-image
Last synced: 05 Dec 2024
https://github.com/z3ntl3/redeye
Crawl real, recent user agents from the two major databases.
crawler header ua user-agents useragents
Last synced: 09 Feb 2025
https://github.com/manojahi/is-there-any-song-reference-in-article
Tells whether an article from a website contains any song references.
crawler lyrics-search python webscraping
Last synced: 01 Jan 2025
https://github.com/maraf/staticsitecrawler
A simple util for crawling links from root URL and saving HTML documents.
Last synced: 17 Jan 2025
https://github.com/codeforequity-at/botium-crawler
Botium Crawler - Like a Website Crawler, just for Conversation Flows
Last synced: 20 Oct 2024
https://github.com/zekrotja/r34-crawler
A simple CLI tool to fetch and download images from rule34.xxx
crawler go rest-api rule34 worker-pool xml
Last synced: 17 Dec 2024
https://github.com/aicore/app_info_extracter
An application for extracting information about apps from the internet
android appreview apps crawler googleplaystore
Last synced: 13 Nov 2024
https://github.com/viclafouch/pe-crawler
📌 An automated system that serves data extracted from the Google Help Center
crawler javascript nodejs postgresql sequelize
Last synced: 29 Jan 2025
https://github.com/camara94/crawlers
Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere
crawler python scraping scrapy spider
Last synced: 23 Dec 2024
https://github.com/travorlzh/temperature-analyzer
Python crawler that helps fetch temperature of Beijing, China
crawler homework python variance
Last synced: 17 Jan 2025
https://github.com/rudrakshi99/web_crawler
A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.
Last synced: 22 Nov 2024
https://github.com/tsonglew/spidreat
Article Spider with Python & Node.js :beetle:
Last synced: 19 Dec 2024
https://github.com/agricolamz/2017_andan_course
Course for ANDAN Summer School about strings and texts in R
crawler language-detection r regular-expressions rstats string-distance string-manipulation strings teaching teaching-materials text-analysis tf-idf tidytext
Last synced: 30 Jan 2025
https://github.com/nemmusu/free-vpn-downloader
This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.
automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn
Last synced: 30 Jan 2025
https://github.com/norconex/committer-neo4j
Implementation of Norconex Committer for Neo4j.
crawler neo4j neo4j-committer norconex-committer
Last synced: 17 Dec 2024
https://github.com/ozakboy/taiwan-news-crawlers
.NET-based crawlers for Taiwanese news (data is returned as objects for ease of use)
crawler data-collection dataset-generation dotnet news taiwan webcrawlers
Last synced: 22 Jan 2025
https://github.com/ysh329/stock-newspaper-crawler
[UNMAINTAINED] Crawls four kinds of finance newspaper corpus (from CCSTOCK.CN).
corpus crawled-data crawler database stock-newspaper-crawler
Last synced: 09 Feb 2025
https://github.com/ging-dev/sitemap-crawler
Collect links through the sitemap.xml or robots.txt
crawler php php8 sitemap sitemap-crawler
Last synced: 18 Nov 2024
https://github.com/spraakbanken/svt-crawler
Programme for crawling SVT's API for news articles and converting the data to XML.
Last synced: 28 Jan 2025
https://github.com/lockblock-dev/crawlarr
Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.
Last synced: 24 Jan 2025
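Several entries above describe the same basic loop: fetch a page, find its anchor tags, and follow the links. A minimal sketch of that loop in Python follows. It is not taken from any listed project; it runs sequentially rather than concurrently, and the names `AnchorParser`, `extract_links`, `crawl`, `PAGES` and the `example.test` URLs are assumptions made for illustration. The `fetch` argument is a caller-supplied function, so the sketch can run against an in-memory site without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class AnchorParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs for all anchors in `html`."""
    parser = AnchorParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from `start_url`.

    `fetch` is a hypothetical caller-supplied function that maps a URL
    to its HTML string, or to None when the page is unavailable.
    Returns the URLs of successfully fetched pages in visit order.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it, keep crawling
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:  # never enqueue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


# A tiny in-memory "website" (made-up URLs) standing in for real HTTP.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b#top">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/missing">gone</a>',
}

if __name__ == "__main__":
    print(crawl("http://example.test/", PAGES.get))
```

To crawl a real site, `fetch` could wrap an HTTP client. A real crawler would typically also honor `robots.txt`, restrict itself to chosen hosts, and rate-limit its requests.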