Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-15 00:06:36 UTC
https://github.com/1uc1f3r616/dark-net-websites-dataset
Dataset of Onion Websites
crawler darknet data-analysis dataset onion search-engine website
Last synced: 10 Jan 2025
https://github.com/ivan-alone/instastories-saver-cpp
Program for saving Instagram Stories, rewritten in C++
api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories
Last synced: 19 Dec 2024
https://github.com/roccomuso/is-baidu
Verify that a request is from Baidu crawlers using DNS verification
baidu crawler dns ip js nodejs verification
Last synced: 07 Jan 2025
https://github.com/georgea93/crawley
nodejs web crawler
crawler depth es6 javascript node nodejs nodejs-web-crawler npm npm-module npm-package robots-txt sitemap web yarn
Last synced: 20 Nov 2024
https://github.com/xdk78/grabbi
grabbi: a simple web scraper/crawler
crawler html scraper web-scraper
Last synced: 13 Jan 2025
https://github.com/simoninithomas/news-crawler-parse-backend
A crawler made with Scrapy that crawls French news articles and sends them to your Parse.com backend
Last synced: 17 Nov 2024
https://github.com/jmkim/stock-crawler
Universal Stock Crawler
crawler stock stock-market yahoo-finance
Last synced: 27 Nov 2024
https://github.com/yakuza8/coronavirus-timeseries-predictor
Timeseries analyzer for coronavirus with recurrent neural network
asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper
Last synced: 22 Dec 2024
https://github.com/agmmnn/nis-scraper
Scrapy script to scrape nisanyansozluk.com
Last synced: 21 Dec 2024
https://github.com/developerdavi/meli-crawler
Basic web crawler API for getting products from MercadoLibre (BRL | MLB)
api crawler meli-crawler mercadolibre mercadolibre-sdk mercadolivre mercadolivre-sdk nextjs now products react zeit
Last synced: 25 Nov 2024
https://github.com/pyaesoneaungrgn/2d-crawler
2D crawler for set.or.th
2d 2d-crawler crawler myanmar php
Last synced: 09 Nov 2024
https://github.com/mmqnym/etherscan_tracker
Shows how to track a wallet on etherscan.io
Last synced: 17 Nov 2024
https://github.com/ktont/curlas
a nodejs spider tool
chrome-extension crawler spider
Last synced: 13 Jan 2025
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from the Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 19 Oct 2024
https://github.com/mikirasora/osuplayedbeatmapscrawler
A crawler that fetches and downloads the osu! beatmaps you have played
Last synced: 01 Jan 2025
https://github.com/sergioburdisso/solidscraper
Easy-to-use jQuery-like API for web scraping/crawling.
crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping
Last synced: 23 Nov 2024
https://github.com/natlee/myanimelist-comment-crawler
Crawl all reviews and information for anime works on MyAnimeList. ;)
anime crawler data-analysis data-mining data-science kaggle kaggle-dataset myanimelist python requests scrapy-crawler sqlite
Last synced: 21 Nov 2024
https://github.com/arshadkazmi42/github-scanner-local
Locally scan all the repositories of a github organization
bounty bug bug-bounty crawler github local no-api scanner
Last synced: 28 Oct 2024
https://github.com/sieep-coding/web-crawler
A simple web crawler implemented in Go.
Last synced: 08 Nov 2024
https://github.com/achannarasappa/locust-cli
Developer tools to accelerate development of Locust jobs
cli crawler headless-chrome puppeteer scraper
Last synced: 18 Nov 2024
https://github.com/zhaotianff/crawler-line
C# command-line crawler
command-line command-line-tool crawler csharp dotnet-core
Last synced: 15 Jan 2025
https://github.com/v-braun/hero-scrape
Find the hero (main) image of a URL
crawler fastimage hero hero-image opengraph webscraping
Last synced: 15 Jan 2025
https://github.com/rodyherrera/cdrake-se
✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and other resources using the engines the library provides, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).
bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube
Last synced: 25 Dec 2024
https://github.com/dnlzrgz/winzig
A tiny search engine for personal use.
async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3
Last synced: 05 Nov 2024
https://github.com/chenmozhijin/mediawikiextractor
A Python script for extracting data from a MediaWiki website and saving it as JSON.
crawler crawler-python crawling extractor json mediawiki python regex web-crawler
Last synced: 09 Oct 2024
https://github.com/capturr/price-extract
A performant way to extract a price amount and its metadata (currency, decimal and thousands separators) from any string.
amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript
Last synced: 07 Jan 2025
https://github.com/testica/a3hrgo-sdk
a3HRgo SDK to automate your reports
a3hrgo crawler javascript puppeteer
Last synced: 10 Oct 2024
https://github.com/archan937/webhead
An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.
api cookies crawler fetch file-uploads forms headless json node redirects scraper spider traversing
Last synced: 10 Nov 2024
https://github.com/ozansz/github-crawler
A basic utility for crawling users and their e-mail addresses
Last synced: 06 Dec 2024
https://github.com/thiiagoms/dict-crawler
Simple crawler on UOL dictionary
beautifulsoup4 crawler dic python pythonic
Last synced: 15 Nov 2024
https://github.com/huzecong/film-spider
Spiders crawling for film listing websites.
Last synced: 11 Jan 2025
https://github.com/joshuaquek/docusite-to-pdf
Provide a URL and this will generate multiple PDF documents of the whole site within the bounds of the URL path. This code repo is for educational purposes only.
crawler documentation-generator html2pdf pdf pdf-converter pdf-document pdf-generation scraper
Last synced: 12 Jan 2025
https://github.com/qin2dim/istockphoto-go
📸 Gracefully download dataset from iStockPhoto.
Last synced: 28 Dec 2024
https://github.com/gatenlp/wpextract
Create datasets from WordPress sites for research or archiving
corpus crawler nlp text-extraction text-mining web-scraping wordpress
Last synced: 13 Nov 2024
https://github.com/oldkingcone/pbandj
PasteBin Crawler, crawls the url https://pastebin.com/archive
crawler headless headless-chrome python python-crawler selenium-python selenium-webdriver
Last synced: 16 Nov 2024
https://github.com/spa5k/quick-scraper
An easy, lightweight scraper built using typescript for good developer experience.
crawler dx easy-to-use esbuild scraper typescript
Last synced: 13 Nov 2024
https://github.com/igeligel/TeamFortressOutpostApi
🔁 An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 13 Nov 2024
https://github.com/ruedigervoigt/salted
Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files
asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python
Last synced: 11 Oct 2024
https://github.com/alexmili/reachable
Check if a URL exists and is reachable
crawler health-check monitoring reachability webscraping
Last synced: 10 Dec 2024
https://github.com/ribeirogab/technology-insights
Program with the aim of using the data from Stack Overflow Insights 2020 and generating informative graphs.
crawler python scraping typescript
Last synced: 19 Nov 2024
https://github.com/indatawetrust/reporter
Crawler queue creation tool for paging
Last synced: 13 Dec 2024
https://github.com/feliz-szk/berserk
Berserk: crawler to increase web traffic (based on Tor and Privoxy)
anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser
Last synced: 12 Jan 2025
https://github.com/roccomuso/is-duckduck
Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo
crawler duckduck duckduckbot duckduckgo ip js nodejs verify web
Last synced: 07 Jan 2025
https://github.com/tokenmill/crawling-framework-example
Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.
crawler crawling-framework elasticsearch storm-crawler
Last synced: 06 Jan 2025
https://github.com/vivekg13186/easy_web_crawler
Web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.
Last synced: 09 Dec 2024
https://github.com/fanyong920/crawlitem-puppeteer
An example of scraping product listings with Puppeteer
chromnium crawler javascript nodejs puppeteer scrapy
Last synced: 23 Dec 2024
https://github.com/hrvadl/goweekly
Application for querying top articles from https://golangweekly.com/, translating them into Ukrainian and sending them to the Telegram channel
article chatgpt crawler go golang openai-api telegram telegram-bot
Last synced: 13 Oct 2024
https://github.com/igeligel/teamfortressoutpostapi
🔁 An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 19 Nov 2024
https://github.com/rimiti/ping-urls
🏓 Ping URLs by batch.
cache crawler ping prerender prerendering seo
Last synced: 28 Dec 2024
https://github.com/sauerbraten/chef
Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.
crawler extinfo go sauerbraten spy stalker
Last synced: 14 Nov 2024
https://github.com/bitebait/curry
🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real exchange rate (USDxBRL) at some stores in Paraguay.
api brasil crawler currency-exchange-rates go golang paraguay webcrawler
Last synced: 14 Nov 2024
https://github.com/oxylabs/web-crawler
Web Crawler is a tool used to discover target URLs, select the relevant content, and have it delivered in bulk. It crawls websites in real-time and at scale to quickly deliver all content or only the data you need based on your chosen criteria.
api crawler github-python scraper web-crawler web-crawler-python web-scraping web-scraping-api webscraping
Last synced: 17 Nov 2024
https://github.com/obaskly/kikfriender.com-bot
A multifunctional bot that increases your likes and hotness points and adds positive feedback. It can also flag an account of your choice as fake and add negative feedback. In addition, it can check a given wordlist, print out the Kik usernames it finds, and store them in a new text file.
ai artificial-intelligence bot checker chrome crawl crawler crawling kik proxies proxy scraper scraping selenium wordlist
Last synced: 08 Jan 2025
https://github.com/hctilg/pinterest-crawler
Downloads all images matching a search
Last synced: 07 Nov 2024
https://github.com/zhaotianff/qzone
Remembering that day's run under the setting sun; that was my lost youth
crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf
Last synced: 15 Jan 2025
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 31 Dec 2024
https://github.com/jofaval/webscraping
WebScraper providing tools to scrape tons of websites with the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 09 Dec 2024
https://github.com/raspi/scrapy-kuntavaalit2021-yle
Fetch YLE kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/marvnc/pixiv-dump
Pixiv Encyclopedia DB Dumps, updated daily
crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping
Last synced: 20 Dec 2024
https://github.com/camara94/crawlers
Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere
crawler python scraping scrapy spider
Last synced: 23 Dec 2024
https://github.com/kapitanluffy/sunny-crawler
That moment when I tried learning things about "Big Data" and "Inverted Indexes"
big-data crawler inverted-index php search
Last synced: 14 Dec 2024
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 11 Jan 2025
https://github.com/airtoxin/stackable-crawler
middleware based lightweight crawler framework
crawler javascript lightweight
Last synced: 24 Dec 2024
https://github.com/anzo52/jcrawl
Java web crawler
crawler java java-web-crawler web web-crawler
Last synced: 01 Jan 2025
https://github.com/madis/flatcrawl
Clojure app for crawling apartment information from http://kv.ee
clojure crawler real-estate webapp
Last synced: 12 Jan 2025
https://github.com/sangupta/shopify-burst-crawler
Simple crawler to download meta information for all stock pics from Shopify Burst website
burst crawler java shopify stock-photos
Last synced: 08 Nov 2024
https://github.com/akagi201/spy
A lightweight distributed web crawler
crawler distributed lightweight nsq
Last synced: 08 Jan 2025
https://github.com/eklem/browsercrawler
Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 19 Dec 2024
https://github.com/natshah/natshah-crawler
Natshah Crawler crawls a selected domain with all its internal links and internal pages.
crawler database filter natshah-crawler
Last synced: 14 Dec 2024
https://github.com/kokseen1/chii
A minimal marketplace bot maker.
auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction
Last synced: 13 Jan 2025
https://github.com/travorlzh/temperature-analyzer
Python crawler that helps fetch the temperature of Beijing, China
crawler homework python variance
Last synced: 16 Nov 2024
https://github.com/maraf/staticsitecrawler
A simple utility for crawling links from a root URL and saving the HTML documents.
Last synced: 16 Nov 2024
https://github.com/roccomuso/is-twitter
Verify that a request is from Twitter crawlers using DNS verification steps
bot crawler dns ip js nodejs twitter verification
Last synced: 07 Jan 2025
https://github.com/maxbubblegum47/spotydump
Spotify scraper combined with a Genius scraper. Scrape artists from a given time period or region of the world and dump all their songs!
crawler dump genius lyrics python spotify unimore-informatica
Last synced: 29 Nov 2024