Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
![](https://explore-feed.github.com/topics/crawler/crawler.png)
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-10 00:06:28 UTC
https://github.com/igeligel/TeamFortressOutpostApi
:repeat: An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 13 Nov 2024
https://github.com/spa5k/quick-scraper
An easy, lightweight scraper built with TypeScript for a good developer experience.
crawler dx easy-to-use esbuild scraper typescript
Last synced: 13 Nov 2024
https://github.com/sauerbraten/chef
Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.
crawler extinfo go sauerbraten spy stalker
Last synced: 14 Nov 2024
https://github.com/huzecong/film-spider
Spiders crawling for film listing websites.
Last synced: 11 Jan 2025
https://github.com/simoninithomas/news-crawler-parse-backend
A crawler built with Scrapy.py that crawls French news articles and sends them to your Parse.com backend.
Last synced: 17 Jan 2025
https://github.com/capturr/price-extract
A performant way to extract the price amount and its metadata (currency, decimal and thousands separators) from any string.
amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript
Last synced: 07 Jan 2025
https://github.com/feliz-szk/berserk
Berserk: a crawler to increase web traffic (based on Tor and Privoxy)
anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser
Last synced: 12 Jan 2025
https://github.com/yakuza8/coronavirus-timeseries-predictor
Time-series analyzer for coronavirus data using a recurrent neural network
asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper
Last synced: 24 Jan 2025
https://github.com/tokenmill/crawling-framework-example
Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store the results in Elasticsearch. Use this configuration to set up your own crawler.
crawler crawling-framework elasticsearch storm-crawler
Last synced: 06 Jan 2025
https://github.com/mmqnym/etherscan_tracker
Shows how to track a wallet on etherscan.io
Last synced: 18 Jan 2025
https://github.com/hrvadl/goweekly
Application that fetches the top articles from https://golangweekly.com/, translates them into Ukrainian and sends them to a Telegram channel
article chatgpt crawler go golang openai-api telegram telegram-bot
Last synced: 13 Oct 2024
https://github.com/zhaotianff/qzone
Remembering that run beneath the setting sun; that was my lost youth
crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf
Last synced: 15 Jan 2025
https://github.com/shunk031/lineblogscraper
Scraper for LINE Blog in Scrapy
crawler lineblog scraper scrapy
Last synced: 10 Jan 2025
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 11 Nov 2024
https://github.com/xiantang/mini_scrapy
A lightweight crawler framework modeled on Scrapy
crawler python3 requets scrapy
Last synced: 01 Feb 2025
https://github.com/litingyes/cobweb
Collect, store and distribute meaningful static data
apis bing-image bing-wallpapers crawler image random-image
Last synced: 05 Dec 2024
https://github.com/thiiagoms/dict-crawler
Simple crawler on UOL dictionary
beautifulsoup4 crawler dic python pythonic
Last synced: 16 Jan 2025
https://github.com/krishpranav/spider
A ruby web spidering tool that can spider a site, multiple domains, certain links or infinitely
crawler ruby spider web-crawler web-scraping
Last synced: 01 Feb 2025
https://github.com/basemax/fakefaces
This repository contains a crawler that downloads thousands of fake human face images from various sources on the internet. Additionally, the repository includes a dataset of thousands of face images of fake humans.
crawler crawler-php crawler-testing crawlers curl dataset datasets face face-fake faces fake-face fake-faces php php-curl
Last synced: 09 Feb 2025
https://github.com/denrydu/baiduimagecrawler
Two scripts I wrote for downloading images from Baidu Images, a useful tool for CV researchers building datasets.
Last synced: 27 Dec 2024
https://github.com/highbreed/web-crawler
A web crawler script that crawls the target website and lists its links
Last synced: 13 Jan 2025
https://github.com/idanhoro/nasa-heat-maps-prediction
In this project we research the correlations between different weather conditions and try to predict future scenarios by using image processing and traditional machine learning algorithms
beautifulsoup crawler machine-learning pillow prediction python sklearn
Last synced: 20 Jan 2025
https://github.com/keosariel/ramby
Ramby is a simple way to setup a webscraper
beautifulsoup crawler python3 webscraping
Last synced: 01 Feb 2025
https://github.com/gabrielrf/bsbdf
Telegram Public Channel
crawler python telegram telegram-channel telegraph
Last synced: 13 Jan 2025
https://github.com/eklem/browsercrawler
Crawls content from a site within the browser. A basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 19 Dec 2024
https://github.com/wangshouh/icourse163_script
A Python script for liking and commenting on China University MOOC (icourse163).
crawler icourse163 python requests
Last synced: 02 Feb 2025
https://github.com/wangyihang/acw-sc-v2-py
Python requests.HTTPAdapter for `acw_sc__v2`
Last synced: 05 Jan 2025
https://github.com/zabuzard/mplogger
Saves marketprices for items, based on transactions, from the game 'http://www.freewar.de/' in a database by using a bot. Then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.
bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api
Last synced: 19 Dec 2024
https://github.com/Juphex/SupremeBot
Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.
android chrome crawler kivy python3 webscraping windows
Last synced: 23 Oct 2024
https://github.com/imthaghost/gocloneold
Website Cloner - Uses goroutines to clone websites to your computer within seconds.
Last synced: 19 Dec 2024
https://github.com/kapitanluffy/sunny-crawler
That moment when I tried learning things about "Big Data" and "Inverted Indexes"
big-data crawler inverted-index php search
Last synced: 07 Feb 2025
https://github.com/skulltech/arachnid
Crawling Instagram for reasons.
crawler instagram instagram-scraper python3 scraper scrapy
Last synced: 01 Feb 2025
https://github.com/kokseen1/chii
A minimal marketplace bot maker.
auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction
Last synced: 13 Jan 2025
https://github.com/aicore/app_info_extracter
This application would be used to extract information about apps from the internet
android appreview apps crawler googleplaystore
Last synced: 13 Nov 2024
https://github.com/tsonglew/spidreat
Article Spider with Python & Node.js :beetle:
Last synced: 19 Dec 2024
https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse
[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.
browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms
Last synced: 26 Jan 2025
https://github.com/qianbinbin/moebooru-crawler
Retrieve links to images from moebooru-based sites, like yande.re and konachan.com.
Last synced: 09 Feb 2025
https://github.com/eduardosbcabral/desafio-tecnico-mp
Challenge: a file generator in C# that uses a web crawler and buffers to write the file to disk.
Last synced: 13 Jan 2025
https://github.com/nava45/simplempcrawler
Simple Multiprocessing Crawler in python
crawler multiprocessing python
Last synced: 05 Jan 2025
https://github.com/marvnc/pixiv-dump
Pixiv Encyclopedia DB Dumps, updated daily
crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping
Last synced: 20 Dec 2024
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS and Elastic search
angularjs crawler elasticsearch java search-engine
Last synced: 07 Jan 2025
https://github.com/carloocchiena/python_url_crawler
A script that, starting from a web page, iterates through all its links and appends them to a list. A sort of proxy for getting all the pages of a website.
beautifulsoup crawler python python3
Last synced: 28 Nov 2024
https://github.com/first-coding/django-and-web
A Django web project with separate front end and back end.
Last synced: 28 Dec 2024
https://github.com/antoinegagne/treewalker
A web crawler in Erlang that respects `robots.txt`.
Last synced: 20 Dec 2024
https://github.com/z3ntl3/redeye
Crawls real, recent user agents from the two largest databases.
crawler header ua user-agents useragents
Last synced: 09 Feb 2025
https://github.com/norconex/committer-neo4j
Implementation of Norconex Committer for Neo4j.
crawler neo4j neo4j-committer norconex-committer
Last synced: 09 Feb 2025
https://github.com/epigos/newsbot
A news bot written in Go for Dialogflow and Facebook messenger
autocert chatbot crawler datastore dialogflow facebook-messenger-bot golang letsencrypt newsfeed
Last synced: 27 Jan 2025
https://github.com/gnujoow/crawl-repo
Crawls basic information about GitHub repositories
crawler github github-api python3
Last synced: 07 Feb 2025
https://github.com/erikmueller/jazmax
Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator
crawler data-science efficiency green heatpump jaz
Last synced: 29 Jan 2025
https://github.com/codeforequity-at/botium-crawler
Botium Crawler - Like a Website Crawler, just for Conversation Flows
Last synced: 20 Oct 2024
https://github.com/becky-dai/flower-knowledge-graph-visualization
A full-stack knowledge graph visualization project
crawler css django echarts html js knowledge-graph neo4j python
Last synced: 21 Dec 2024
https://github.com/yidas/tw-stock-crawler-php
PHP Crawler for Taiwan Stock Data (台股資料爬蟲)
crawler stock taiwan taiwan-stock-information taiwan-stock-market
Last synced: 29 Oct 2024
https://github.com/sebi75/lightweight-sitemapper
A lightweight sitemapper written in typescript, built on top of fast-xml-parser and relying on few dependencies
Last synced: 21 Dec 2024
https://github.com/spraakbanken/svt-crawler
Programme for crawling SVT's API for news articles and converting the data to XML.
Last synced: 28 Jan 2025
https://github.com/nirjharlo/complete-google-seo-scan
WordPress Plugin with inbuilt SEO crawler
crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin
Last synced: 27 Oct 2024
https://github.com/mohammadrezaamani/squirrel
Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.
Last synced: 21 Dec 2024
https://github.com/santhoshse7en/alcoholics-anonymous
Research Project to analyse the knowledge about Alcoholics Anonymous in public
aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api
Last synced: 14 Jan 2025
https://github.com/airtoxin/stackable-crawler
Middleware-based lightweight crawler framework
crawler javascript lightweight
Last synced: 24 Dec 2024
https://github.com/superreal/octopus
Recursive and multi-threaded broken link checker
Last synced: 07 Jan 2025
https://github.com/roccomuso/is-twitter
Verify that a request is from Twitter crawlers using DNS verification steps
bot crawler dns ip js nodejs twitter verification
Last synced: 07 Jan 2025
https://github.com/tikazyq/colly-crawlers
Crawlers using Golang-based web crawling framework Colly
Last synced: 02 Jan 2025
https://github.com/joelkoen/wls
Easily crawl multiple sitemaps and list URLs
Last synced: 07 Nov 2024
https://github.com/fabrix-app/spool-scraper
Spool: Webscraper
cheerio crawler fabrix nodejs scraping spools typescript webscraper
Last synced: 13 Jan 2025
https://github.com/omkarcloud/multiple-account-generation-template
🚀 THIS WEB SCRAPING TEMPLATE PROVIDES YOU WITH A GREAT STARTING POINT WHEN CREATING MULTIPLE ACCOUNTS ON A WEBSITE. 🤖
beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping
Last synced: 02 Jan 2025
https://github.com/foufou-exe/yspeed
Yspeed is a library that scrapes the Speedtest site
crawler python rich scraper scraping selenium selenium-python speedtest
Last synced: 08 Jan 2025
https://github.com/sangupta/shopify-burst-crawler
Simple crawler to download meta information for all stock pics from Shopify Burst website
burst crawler java shopify stock-photos
Last synced: 08 Nov 2024
https://github.com/sieep-coding/web-crawler
A simple web crawler implemented in Go.
Last synced: 16 Jan 2025
https://github.com/jiannei/github-trending
GitHub trending crawler based on Lumen.
crawler github-trending lumen php
Last synced: 09 Nov 2024
https://github.com/linkspreed/twig
Twig 🔍 - the fastest and safest search engine 📐 for the web 🌐, images 🤳, news 📰 and much more
crawler engine search search-engine web5
Last synced: 03 Jan 2025
https://github.com/jofaval/webscraping
WebScraper providing tools to scrape tons of websites with the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 04 Feb 2025
https://github.com/restuwahyu13/node-scraper-content
Example Node.js scraper for programming content, built with Puppeteer
crawler nodejs puppeter scrapper
Last synced: 03 Jan 2025
https://github.com/truethari/fcrawler
Python application that can be used to copy files of a given file type from a folder directory.
copy copy-files crawl crawler crawler-python file files
Last synced: 07 Jan 2025
https://github.com/akagi201/spy
A lightweight distributed web crawler
crawler distributed lightweight nsq
Last synced: 08 Jan 2025
https://github.com/gill-singh-a/crawler
A program that crawls the web starting from a given page, following the internal links it finds and searching them for keywords
crawler multithreading osint python python3 requests scraper
Last synced: 09 Nov 2024
https://github.com/jjlibra/bake-mediacrawler
NanmiCoder's self-media data crawling software
Last synced: 30 Nov 2024
https://github.com/maxbubblegum47/spotydump
Spotify scraper combined with a Genius scraper. Scrape artists from a given time period or region of the world and dump all their songs!
crawler dump genius lyrics python spotify unimore-informatica
Last synced: 28 Jan 2025
https://github.com/maraf/staticsitecrawler
A simple util for crawling links from root URL and saving HTML documents.
Last synced: 17 Jan 2025
https://github.com/sean2077/leetcode_anki
Leetcode Anki card factory.
anki crawler leetcode leetcode-anki scrapy
Last synced: 11 Jan 2025
https://github.com/panyanyany/vps_spider
VPS Spider powering https://findallvps.com
Last synced: 11 Jan 2025
https://github.com/raspi/scrapy-kuntavaalit2021-yle
Fetch YLE kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/e73b025/simple-python-url-crawler
Super simple Python3 website URL scraper/crawler. Multi-threaded.
crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple
Last synced: 11 Nov 2024