Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 23 Dec 2024

https://github.com/darealfreak/figure-tracker

application to keep watch of wished figures on multiple sites and notify you about auctions, sales or sudden price drops

crawler figure-tracker monitoring

Last synced: 11 Dec 2024

https://github.com/zekrotja/r34-crawler

A simple CLI tool to fetch and download images from rule34.xxx

crawler go rest-api rule34 worker-pool xml

Last synced: 17 Dec 2024

https://github.com/manojahi/is-there-any-song-reference-in-article

It will tell if there are any songs references in article from a website.

crawler lyrics-search python webscraping

Last synced: 01 Jan 2025

https://github.com/polakosz/smf-scraper

You know, just for backup :smile: - The only so the best Simple Machines Forum C# scraper on GitHub :cat:

crawler csharp forum machines php scraper simple simplemachines smf

Last synced: 18 Dec 2024

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 Updates CoinData Every 12 hours. High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from CoinGecko API. Collects market data, and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 17 Jan 2025

https://github.com/tvrcgo/collect

数据采集

crawler scraper

Last synced: 19 Dec 2024

https://github.com/spaceemotion/goodreads-browser

Custom crawler + interface to have better filtering and sorting of the goodreads database 📚🔍

books crawler goodreads

Last synced: 26 Dec 2024

https://github.com/ewertoncodes/mind-crawler

A simple api written in Rails to extract quotations from the Quotes to Scrape site.

crawler ruby ruby-on-rails

Last synced: 23 Jan 2025

https://github.com/pjt3591oo/exchange-crawler

업비트, 코인원 크롤러

crawler data exchange python

Last synced: 26 Dec 2024

https://github.com/shunk031/lineblogscraper

Scraper for LINE Blog in Scrapy

crawler lineblog scraper scrapy

Last synced: 10 Jan 2025

https://github.com/yjg30737/pyqt-google-image-crawler

Crawling image files from Google search result with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 03 Jan 2025

https://github.com/imkrunalkanojiya/seo-checker

Resolve your SEO related issue by using SEO Checker Rest API

crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools

Last synced: 03 Jan 2025

https://github.com/techguy-bhushan/web-spider

multi-threaded webs crawler

crawler python web-spider

Last synced: 17 Jan 2025

https://github.com/andreoliwa/scrapy-tegenaria

🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢

crawler flask postgresql python python3 scrapy

Last synced: 11 Jan 2025

https://github.com/systemfsoftware/youtube-autocomplete-scraper

YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.

actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api

Last synced: 11 Jan 2025

https://github.com/fernandod1/yahoo-finance-scraper

This python script scraps "Open" and "Previous Close" values from any company in Yahoo Finance and save them in a local text file.

crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api

Last synced: 12 Jan 2025

https://github.com/madis/flatcrawl

Clojure app for crawling apartment information from http://kv.ee

clojure crawler real-estate webapp

Last synced: 12 Jan 2025

https://github.com/zabuzard/songcrawler

Crawles all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.

command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler

Last synced: 12 Jan 2025

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 24 Jan 2025

https://github.com/0000xffff/webgrab

web page: crawler / file scanner / downloader

crawler download downloader scrape scraper webcrawler

Last synced: 19 Jan 2025

https://github.com/telanflow/scrago

A micro crawler framework. achieved by GOLANG.

crawler go micro-framework spider

Last synced: 19 Jan 2025

https://github.com/alatiera/ellinofreneia-crawler

Crawler of ellinofreneianet.gr for offline content consumption

crawler ellinofreneia

Last synced: 01 Jan 2025

https://github.com/microlinkhq/ua

A simple redis primitives to incr() and top() user agents

crawler redis user-agent user-agent-parser

Last synced: 12 Jan 2025

https://github.com/dean9703111/humandesign_nodejs

用nodejs爬蟲工具將人類圖網頁上的資訊爬下來,再存到雲端的google excel

crawler googlesheetapi googlesheets nodejs

Last synced: 12 Jan 2025

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 26 Dec 2024

https://github.com/chen0040/ios-stock-tracker

Stock tracker implemented using Objective-C for iOS

crawler ios-app objective-c stock-prices

Last synced: 16 Dec 2024

https://github.com/developerjosh/gogo-crawler

The tool kit for making an anime website with a database full of anime

crawler crawler-js gogoanime gogoanime-api gogoanime-scraper

Last synced: 17 Jan 2025

https://github.com/alexzhangs/stockdb

Stock data collecting and analyzing

crawler django pandas scrapy stock tushare

Last synced: 08 Jan 2025

https://github.com/vitaee/laravelandcrawlers

php web crawler examples with oop concept and laravel project

crawler laravel php

Last synced: 26 Dec 2024

https://github.com/sebyx07/active_proxy

Ruby proxy fetcher, retries request until completed, provides user agent🚀🚀

crawler http proxy rails ruby

Last synced: 28 Dec 2024

https://github.com/0xpr03/clantool

CF Management & Data Analysis Tool, crawler backend in rust

backend-server crawler data-analysis rust

Last synced: 02 Jan 2025

https://github.com/dean9703111/shopee_find_mac

用最快的速度找到便宜符合自己要求規格的mac

argparse crawler mac pip python python2 xlsxwriter

Last synced: 12 Jan 2025

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 02 Jan 2025

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes user-defined function on each one of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 19 Dec 2024

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 22 Dec 2024

https://github.com/abdus/scrape-web

A simple web scrapper for Node.js

crawler web-scraping web-scrapper

Last synced: 03 Dec 2024

https://github.com/soulyma/web_crawler

A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.

beautifulsoup4 crawler csv data json python structured-data

Last synced: 13 Dec 2024

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor shares details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Dec 2024

https://github.com/tsaohucn/crawler_fb_group

This is crawler use selenium for facebook groups

crawler facebook-groups rails ruby

Last synced: 20 Jan 2025

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 25 Dec 2024

https://github.com/jiamingla/mvdis_i18n

機車駕照預約考試多語友善版 Non-official

crawler jquery koa koajs nodejs supertest

Last synced: 28 Nov 2024

https://github.com/gozeon/weibo-crawler

微博爬虫

crawler web-crawler

Last synced: 28 Nov 2024

https://github.com/zephyrpersonal/github-trending-crawler

transform github-trending repos to json data

cheerio crawler fetch github node repository spider trending

Last synced: 28 Nov 2024

https://github.com/raphaelalmeidamartins/python-tech-news

Python data science project developed js at the end of Unit 35 (Computer Science Module) of the Trybe's Web Development course

crawler crawler-python data-science pytest python

Last synced: 18 Jan 2025

https://github.com/0fatal/zjxxc-crawl

在浙学爬虫:作业情况和登录

crawler

Last synced: 16 Dec 2024

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrapes data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 21 Dec 2024

https://github.com/dizys/weibo-crawler

A nodejs weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 27 Dec 2024

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 01 Dec 2024

https://github.com/exp-codes/pyzone-crawler

QQ空间爬虫(Python版)

crawler programming

Last synced: 16 Dec 2024

https://github.com/marcinrek/sauron

Basic page crawler written in Node.js

crawler json node-js nodejs requests

Last synced: 29 Nov 2024

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 04 Jan 2025

https://github.com/teal33t/base_crawler

Simple scaffold for selenium based crawler bots

crawler scaffold-template selenium selenium-python

Last synced: 23 Jan 2025

https://github.com/f-ca7/movie-cat

A website displaying movies

crawler golang website

Last synced: 03 Jan 2025

https://github.com/pxlrbt/website-diff

Utility tool that bundles a crawler and BackstopJS for visual regression testing.

backstopjs crawler visual-regression-testing

Last synced: 28 Nov 2024

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 08 Jan 2025

https://github.com/gnaneshkunal/book-miner

Web crawler for Book reviews (Goodreads)

crawler goodreads typescript

Last synced: 16 Dec 2024

https://github.com/sinkaroid/webnovelcrawler

Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.

crawler dompdf webnovel

Last synced: 23 Dec 2024

https://github.com/leomaurodesenv/smm-maker-profile

A package to fetching the maker profile - Super Mario Maker

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/j-hoplin/naver_news_headtopic_news_scraper

네이버 뉴스에서 헤드라인 뉴스 스크레이핑

crawler naver-news scraper

Last synced: 11 Dec 2024

https://github.com/orsinium-labs/gpcc

Python library and CLI tool to fetch information from GCP Browser (https://gpc-browser.gs1.org/)

crawler gpc gs1

Last synced: 17 Jan 2025

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 19 Jan 2025

https://github.com/im-perativa/public_crawler

A collection of crawler project for Indonesia dataset

crawler indonesia indonesia-api scrapy

Last synced: 25 Jan 2025

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 23 Jan 2025

https://github.com/sonhm3029/crawl-data-bot

This project making a base crawl data from web bot, include text data and images data

crawler google medical vietnamese

Last synced: 17 Jan 2025

https://github.com/victorhuu/amazonmovieintegration

本仓库是同济大学数据仓库的第一个个人作业——利用爬虫与ETL工具整理Amazon的电影数据

crawler data-warehouse movies pandas scrapy xpath

Last synced: 28 Nov 2024

https://github.com/kahsolt/allchan

An image crawler for xChan(4chan/8ch/...) image board.

4chan 4chan-downloader 8chan crawler image-crawler

Last synced: 03 Jan 2025

https://github.com/aleclarson/recrawl

Filesystem crawler

crawler fs nodejs

Last synced: 09 Jan 2025

https://github.com/konradlinkowski/wikipediafinder

Find words in wikipage

crawler scraper wikipedia

Last synced: 28 Nov 2024

https://github.com/konradlinkowski/mailcrawler

Crawler to find emails in the websites

crawler scraper

Last synced: 28 Nov 2024

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of webpages in series by number and creates a list of the available links.

crawler lister

Last synced: 05 Dec 2024

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 06 Dec 2024

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT post from the command line.

crawler ptt python

Last synced: 08 Jan 2025

https://github.com/wondervictor/spiderman

2017 Software Course Project

crawler distribute-crawler zhihu-crawler

Last synced: 17 Jan 2025

https://github.com/victorpre/erlich

Erlich Bachman - Hacker Hostel

chatbot crawler elixir housing umbrella

Last synced: 22 Jan 2025

https://github.com/yassilah/nuxt-crawler

Automatic crawler & search for Nuxt SSG.

algolia crawler nuxt search ssg

Last synced: 17 Jan 2025

https://github.com/beomi/pycon2017

2017 파이콘 발표자료: <처음부터 알아보는 웹 크롤러>

crawler pyconkr python

Last synced: 11 Jan 2025

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 23 Jan 2025

https://github.com/danielemoraschi/go-sitemap-common

Simple GO sitemap generator and crawler.

crawler golang sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/anjackson/scrapy-url-frontier

A Scrapy module for URL Frontier integration

crawler frontier scrapy spider

Last synced: 05 Jan 2025

https://github.com/hudson-newey/user-web-crawler

The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.

archive crawler open-internet

Last synced: 10 Jan 2025

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 31 Dec 2024

https://github.com/rxcai/python3-weibo-crawler

基于Python3实现的微博小爬虫

crawler python python3 spider weibo

Last synced: 28 Nov 2024

https://github.com/khilnani/spidey.py

Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.

cli crawler python scaper web-spider

Last synced: 02 Dec 2024