Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-08 00:06:12 UTC
https://github.com/fanyong920/crawlitem-puppeteer
An example of scraping product listings with Puppeteer
chromnium crawler javascript nodejs puppeteer scrapy
Last synced: 05 Nov 2024
https://github.com/vivekg13186/easy_web_crawler
Web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.
Last synced: 28 Oct 2024
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 19 Oct 2024
https://github.com/yakuza8/coronavirus-timeseries-predictor
Timeseries analyzer for coronavirus with recurrent neural network
asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper
Last synced: 12 Oct 2024
https://github.com/roccomuso/is-baidu
Verify that a request is from Baidu crawlers using DNS verification
baidu crawler dns ip js nodejs verification
Last synced: 17 Oct 2024
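The DNS verification named in the entry above is commonly done as a forward-confirmed reverse DNS lookup: resolve the request IP to a hostname, check the hostname's domain, then resolve that hostname back and confirm it yields the original IP. The following is a minimal Python sketch of that idea, not the is-baidu package's own code. The function name `is_verified_crawler` and its injectable `reverse`/`forward` parameters are hypothetical, and the `.baidu.com` / `.baidu.jp` suffixes are an assumption about the hostnames Baidu's crawlers resolve to.

```python
import socket


def is_verified_crawler(ip, suffixes=(".baidu.com", ".baidu.jp"),
                        reverse=None, forward=None):
    """Forward-confirmed reverse DNS check (illustrative sketch)."""
    # Default resolvers use the system DNS; both can be swapped out for testing.
    # gethostbyname_ex only returns IPv4 addresses, so this sketch is IPv4-only.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        # Step 1: reverse lookup (IP -> hostname).
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False
    # Step 2: the hostname must belong to an expected crawler domain (assumed list).
    if not host.endswith(tuple(suffixes)):
        return False
    try:
        # Step 3: forward lookup (hostname -> IPs) must include the original IP.
        return ip in forward(host)
    except OSError:
        return False
```

The forward step matters because anyone who controls the reverse zone for their own IP range can publish an arbitrary hostname there, while only the domain owner controls what that hostname resolves to.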
https://github.com/dnlzrgz/winzig
A tiny search engine for personal use.
async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3
Last synced: 05 Nov 2024
https://github.com/testica/a3hrgo-sdk
a3HRgo SDK to automate your reports
a3hrgo crawler javascript puppeteer
Last synced: 10 Oct 2024
https://github.com/roccomuso/is-duckduck
Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo
crawler duckduck duckduckbot duckduckgo ip js nodejs verify web
Last synced: 17 Oct 2024
https://github.com/rodyherrera/cdrake-se
✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and other resources using the engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).
bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube
Last synced: 06 Nov 2024
https://github.com/igeligel/TeamFortressOutpostApi
:repeat: An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 02 Aug 2024
https://github.com/xdk78/grabbi
grabbi a simple web scraper/crawler
crawler html scraper web-scraper
Last synced: 23 Oct 2024
https://github.com/chenmozhijin/mediawikiextractor
A Python script for extracting data from a MediaWiki website and saving it as JSON.
crawler crawler-python crawling extractor json mediawiki python regex web-crawler
Last synced: 09 Oct 2024
https://github.com/rimiti/ping-urls
🏓 Ping URLs by batch.
cache crawler ping prerender prerendering seo
Last synced: 07 Nov 2024
https://github.com/sauerbraten/chef
Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.
crawler extinfo go sauerbraten spy stalker
Last synced: 02 Aug 2024
https://github.com/bitebait/curry
🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real (USDxBRL) exchange rate at some stores in Paraguay.
api brasil crawler currency-exchange-rates go golang paraguay webcrawler
Last synced: 02 Aug 2024
https://github.com/ozansz/github-crawler
A basic utility for crawling users and their e-mail addresses
Last synced: 16 Oct 2024
https://github.com/jmkim/stock-crawler
Universal Stock Crawler
crawler stock stock-market yahoo-finance
Last synced: 13 Oct 2024
https://github.com/arshadkazmi42/github-scanner-local
Locally scan all the repositories of a github organization
bounty bug bug-bounty crawler github local no-api scanner
Last synced: 28 Oct 2024
https://github.com/qianbinbin/moebooru-crawler
Retrieve links of images from moebooru-based sites, like yande.re and konachan.com.
Last synced: 29 Oct 2024
https://github.com/ysh329/stock-newspaper-crawler
[UNMAINTAINED] Crawl four kinds of finance newspaper corpora (from CCSTOCK.CN).
corpus crawled-data crawler database stock-newspaper-crawler
Last synced: 29 Oct 2024
https://github.com/sebi75/lightweight-sitemapper
A lightweight sitemapper written in typescript, built on top of fast-xml-parser and relying on few dependencies
Last synced: 27 Oct 2024
https://github.com/marvnc/pixiv-dump
Pixiv Encyclopedia DB Dumps, updated daily
crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping
Last synced: 26 Oct 2024
https://github.com/tsonglew/spidreat
Article Spider with Python & Node.js :beetle:
Last synced: 31 Oct 2024
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 08 Nov 2024
https://github.com/tikazyq/colly-crawlers
Crawlers using Golang-based web crawling framework Colly
Last synced: 11 Oct 2024
https://github.com/galaxiat/galaxiat.serve.seo
Node.JS package to serve React app and prerender path (cron)
crawler cron puppeteer seo seo-optimization ssr
Last synced: 05 Nov 2024
https://github.com/erikmueller/jazmax
Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator
crawler data-science efficiency green heatpump jaz
Last synced: 14 Oct 2024
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 31 Oct 2024
https://github.com/krishpranav/spider
A ruby web spidering tool that can spider a site, multiple domains, certain links or infinitely
crawler ruby spider web-crawler web-scraping
Last synced: 15 Oct 2024
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 11 Oct 2024
https://github.com/roccomuso/is-twitter
Verify that a request is from Twitter crawlers using DNS verification steps
bot crawler dns ip js nodejs twitter verification
Last synced: 17 Oct 2024
https://github.com/denrydu/baiduimagecrawler
Two self-written scripts for crawling images from Baidu, a useful tool for CV researchers building datasets.
Last synced: 07 Nov 2024
https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse
[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.
browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms
Last synced: 13 Oct 2024
https://github.com/eklem/browsercrawler
Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 15 Oct 2024
https://github.com/santhoshse7en/alcoholics-anonymous
Research Project to analyse the knowledge about Alcoholics Anonymous in public
aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api
Last synced: 11 Oct 2024
https://github.com/brunojppb/airport-crawler
Simple and powerful CLI app to get worldwide airport information in JSON format
Last synced: 11 Oct 2024
https://github.com/tbarnes94/fortnite-weapons-bot
A bot that returns fortnite weapon statistics based on input from Discord users. Written in TypeScript.
crawler discord discord-bot discord-js typescript2
Last synced: 15 Oct 2024
https://github.com/xiantang/mini_scrapy
A lightweight crawler framework modeled on Scrapy
crawler python3 requets scrapy
Last synced: 15 Oct 2024
https://github.com/codeforequity-at/botium-crawler
Botium Crawler - Like a Website Crawler, just for Conversation Flows
Last synced: 20 Oct 2024
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS and Elastic search
angularjs crawler elasticsearch java search-engine
Last synced: 11 Oct 2024
https://github.com/yidas/tw-stock-crawler-php
PHP crawler for Taiwan stock data
crawler stock taiwan taiwan-stock-information taiwan-stock-market
Last synced: 29 Oct 2024
https://github.com/gabrielrf/bsbdf
Telegram Public Channel
crawler python telegram telegram-channel telegraph
Last synced: 15 Oct 2024
https://github.com/polakosz/smf-scraper
You know, just for backup :smile: - The only, and therefore the best, Simple Machines Forum C# scraper on GitHub :cat:
crawler csharp forum machines php scraper simple simplemachines smf
Last synced: 30 Oct 2024
https://github.com/jofaval/webscraping
WebScraper providing tools to scrape tons of websites with the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 21 Oct 2024
https://github.com/benderpan/fakeagent.net
Fake Agent for .Net Standard.
agent crawler fake-agent http-headers
Last synced: 05 Nov 2024
https://github.com/natshah/natshah-crawler
Natshah Crawler crawls a selected domain along with all its internal links and internal pages.
crawler database filter natshah-crawler
Last synced: 26 Oct 2024
https://github.com/imthaghost/gocloneold
Website Cloner - Uses goroutines to clone websites to your computer within seconds.
Last synced: 31 Oct 2024
https://github.com/airtoxin/stackable-crawler
middleware based lightweight crawler framework
crawler javascript lightweight
Last synced: 06 Nov 2024
https://github.com/zain-ul-din/lgu-crawler
LGU timetable Crawler
crawler lahore-garrison-university lahore-garrison-university-timetable
Last synced: 26 Oct 2024
https://github.com/gnujoow/crawl-repo
Crawls basic information about GitHub repositories
crawler github github-api python3
Last synced: 27 Oct 2024
https://github.com/norconex/committer-neo4j
Implementation of Norconex Committer for Neo4j.
crawler neo4j neo4j-committer norconex-committer
Last synced: 10 Oct 2024
https://github.com/Juphex/SupremeBot
Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.
android chrome crawler kivy python3 webscraping windows
Last synced: 23 Oct 2024
https://github.com/zabuzard/mplogger
Saves marketprices for items, based on transactions, from the game 'http://www.freewar.de/' in a database by using a bot. Then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.
bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api
Last synced: 31 Oct 2024
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 13 Oct 2024
https://github.com/nirjharlo/complete-google-seo-scan
WordPress Plugin with inbuilt SEO crawler
crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin
Last synced: 27 Oct 2024
https://github.com/nakabonne/staticcollector
Application to analyze static files of competing sites
Last synced: 27 Oct 2024
https://github.com/nakabonne/netsurfer
netsurfer is a very lightweight scraping framework
Last synced: 27 Oct 2024
https://github.com/joelkoen/wls
Easily crawl multiple sitemaps and list URLs
Last synced: 07 Nov 2024
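Crawling multiple sitemaps, as the entry above describes, comes down to parsing each sitemap document, collecting its `<loc>` values, and following sitemap index files to their child sitemaps. The following is a small Python sketch of that process using only the standard library; it is not taken from the wls project. The names `parse_sitemap` and `list_urls` and the injected `fetch` callable are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) for one sitemap document."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    # A <sitemapindex> lists further sitemaps; a <urlset> lists pages.
    if root.tag == SITEMAP_NS + "sitemapindex":
        return [], locs
    return locs, []


def list_urls(sitemap_urls, fetch):
    """Walk the given sitemaps, following index files, and list page URLs."""
    pending, seen, pages = list(sitemap_urls), set(), []
    while pending:
        url = pending.pop(0)
        if url in seen:
            continue  # avoid fetching the same sitemap twice
        seen.add(url)
        found, children = parse_sitemap(fetch(url))
        pages.extend(found)
        pending.extend(children)
    return pages
```

Here `fetch` is any function that takes a sitemap URL and returns its XML text, so the network layer (and any handling of gzipped sitemaps) stays outside the sketch.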
https://github.com/schbenedikt/web-crawler
A simple web crawler using Python that stores the metadata of each web page in a database.
crawler database mariadb mysql python python-crawler web
Last synced: 08 Nov 2024
https://github.com/joaopauloaramuni/python
Repo Python
crawler python scraping scrapy
Last synced: 04 Nov 2024
https://github.com/geoffreybauduin/website-checker
Performs useful checks against a website, such as 404 errors reporting, structured data validation...
crawler seo structured-data web-spider website
Last synced: 06 Nov 2024
https://github.com/yjg30737/pyqt-google-image-crawler
Crawling image files from Google search result with Python and icrawler
beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application
Last synced: 07 Nov 2024
https://github.com/yjg30737/pyqt-wikipedia-crawler
Crawling the Wikipedia with Python powered by BeautifulSoup4, Supporting GUI/CUI
beautifulsoup4 crawler pyqt pyqt5 wikipedia
Last synced: 07 Nov 2024
https://github.com/princed/specht
Check links found in html or js files by pattern
cli crawler html javascript streams
Last synced: 12 Oct 2024
https://github.com/phanikmr/linkcrawler
LinkCrawler is a Python module that takes a URL on the web (e.g. http://python.org), fetches the corresponding web page, and parses all the links on that page into a repository of links. It then fetches the contents of any URL from that repository, parses the links from the new content into the repository, and repeats this for all links in the repository until it is stopped or a given number of links have been fetched.
async crawler linkcrawler parse python scrapy spider
Last synced: 14 Oct 2024
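The fetch, parse, and repeat loop described in the entry above can be sketched in a few lines of standard-library Python. This is an illustration of the general technique, not the linkcrawler module's actual code. The names `LinkParser`, `fetch_html`, and `crawl`, the `max_pages` limit, and the breadth-first order are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_html(url):
    # Default fetcher; decodes leniently because page encodings vary.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=10, fetch=fetch_html):
    """Fetch pages breadth-first while growing a repository of links."""
    repository = deque([start_url])  # links waiting to be fetched
    seen = {start_url}               # every link ever added to the repository
    fetched = []
    while repository and len(fetched) < max_pages:
        url = repository.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to load
        fetched.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                repository.append(link)
    return fetched, seen
```

Calling `crawl("http://python.org", max_pages=5)` would fetch at most five pages and return both the pages fetched and every link discovered. A real crawler would also want to respect robots.txt and rate limits, which this sketch leaves out.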
https://github.com/akashrajpurohit/node-crawler
Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs of the domain
crawler node-crawler nodejs url
Last synced: 06 Nov 2024
https://github.com/liyun-li/meh-bot
Just a bot that clicks an image
bot crawler docker headless-firefox meh python python3 selenium twilio-sms-api
Last synced: 13 Oct 2024
https://github.com/konradlinkowski/mailcrawler
Crawler to find emails on websites
Last synced: 14 Oct 2024
https://github.com/uzsoftic/ecommerce-web-crawler
WebCrawler for ecommerce sites
bot crawler crawler-php ecommerce laravel parser php php8
Last synced: 06 Nov 2024
https://github.com/coverified/spider
A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)
akka crawler graphql hacktoberfest microservice spider
Last synced: 06 Nov 2024
https://github.com/arshadkazmi42/gh-crawl
Crawler for Github repositories. Finds all the broken links from the repositories
bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python
Last synced: 28 Oct 2024
https://github.com/camara94/crawlers
Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.
crawler python scraping scrapy spider
Last synced: 05 Nov 2024
https://github.com/skylightqp/namu2csv
A namuwiki crawler that converts header to csv file for kartrider wiki
Last synced: 19 Oct 2024
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 30 Oct 2024
https://github.com/piopi/behatcrawler
A Behat extension that crawls links on a website and executes a user-defined function on each of them.
behat behat-extension crawler php selenium-webdriver
Last synced: 01 Nov 2024
https://github.com/maxgio92/package-crawler
A package crawler for most known Linux distros
Last synced: 13 Oct 2024
https://github.com/pjullrich/link-crawler
Python crawler that reports broken links on a given website and its subpages
asyncio breadth-first-search broken-links crawler python
Last synced: 13 Oct 2024