Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/dean9703111/ithelp_total_count

Calculates the total number of views, likes, and comments on IT邦幫忙 articles.

crawler ithelp total-likes total-responses total-views

Last synced: 12 Jan 2025

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 28 Oct 2024

https://github.com/saketh7382/smartcrawler

A package for crawling items from web pages and storing them as a JSON file.

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 03 Feb 2025

https://github.com/joeri-abbo/python-credly-scraper

This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an

badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling

Last synced: 15 Jan 2025

https://github.com/gnaneshkunal/book-miner

Web crawler for book reviews (Goodreads)

crawler goodreads typescript

Last synced: 09 Feb 2025

https://github.com/yordadev/fenrisjs

A Node.js application that scrapes all links from a given input and writes them into one of two files, external or internal, for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 10 Jan 2025

https://github.com/supadata-ai/js

Official TypeScript/JavaScript SDK for the Supadata API.

ai crawler llm markdown scraper transcript web-crawler youtube

Last synced: 27 Jan 2025

https://github.com/ycrao/some-spider-code

Some spider code: crawlers for financial news and for fund, stock, and forex prices.

crawler economics fin-eco-news finance forex fund-value spider stock-price

Last synced: 19 Nov 2024

https://github.com/supadata-ai/py

Official Python SDK for the Supadata API.

ai api crawler llm markdown scraping sdk transcript web-scraper youtube

Last synced: 27 Jan 2025

https://github.com/mahmoudgalalz/pupt

A starter for web crawling using Puppeteer

crawler nodejs scraping

Last synced: 05 Jan 2025

https://github.com/phanikmr/linkcrawler

LinkCrawler is a Python module that takes a URL on the web (e.g. http://python.org), fetches the web page at that URL, and parses all the links on that page into a repository of links. Next, it fetches the content of the URLs in that repository, parses the links from this new content into the repository, and repeats the process for every link in the repository until it is stopped or a given number of links has been fetched.

async crawler linkcrawler parse python scrapy spider

Last synced: 27 Jan 2025
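The repository-growing loop described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not LinkCrawler's actual API: the names `crawl`, `fetch_html`, and `max_links` are hypothetical, and the fetcher is injectable so the loop can run against an in-memory site.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url: str) -> str:
    """Default fetcher (hypothetical helper): download a page with urllib."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, max_links: int = 100,
          fetch: Callable[[str], str] = fetch_html) -> list[str]:
    """Grow a repository of links until no unfetched links remain
    or max_links pages have been fetched."""
    repository = [start_url]      # every link discovered, in discovery order
    seen = {start_url}
    queue = deque([start_url])    # discovered links not fetched yet
    fetched = 0
    while queue and fetched < max_links:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue              # skip pages that fail to load
        fetched += 1
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                repository.append(link)
                queue.append(link)
    return repository
```

Passing a dictionary lookup as `fetch` lets the loop be exercised offline; `max_links` counts fetched pages, so links discovered on the last fetched page still appear in the repository.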

https://github.com/scrwdrv/siege-crawler

This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs, until the server crashes (or the benchmark ends). It is used to test the maximum capacity of a server or to find glitches that users might encounter.

benchmark cli crawler ddos debug siege tool

Last synced: 10 Feb 2025

https://github.com/rogerluo410/gcrawler

A Google search crawler for Ruby. It crawls the text and URL of each result link for given keywords on Google.com.

crawler crawling google ruby

Last synced: 02 Jan 2025

https://github.com/efishery/wpi-kkp-crawler

A crawler for fisheries prices on wpi.kkp.go.id.

crawler kkp wpi

Last synced: 02 Jan 2025

https://github.com/jimmy-ly00/dhe-prime-grabber

Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.

certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3

Last synced: 29 Dec 2024

https://github.com/appliedsoul/crawlmatic

A static and dynamic website crawling library: a common promise-based wrapper around the node-crawler and hccrawler libraries.

crawler scraper

Last synced: 30 Dec 2024

https://github.com/hctilg/taaghche-dl

Save books purchased from taaghche.com!

crawler downloader pillow-library python3 selenium taaghche

Last synced: 09 Jan 2025

https://github.com/roccomuso/is-apple

Verify that a request is from Apple crawlers using DNS verification steps

apple bot crawler dns ip js nodejs

Last synced: 22 Jan 2025
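The usual DNS verification for a crawler claiming to be Applebot is forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname's domain, then resolve that hostname and confirm it maps back to the same IP. The Python sketch below illustrates that idea; it is not the `is-apple` package's API (which is a Node.js module). The suffix `.applebot.apple.com` is an assumption based on Apple's published Applebot guidance, `is_apple_crawler` is a hypothetical name, and the lookups are injectable so the logic can be tested without network access.

```python
import socket
from typing import Callable


def _reverse(ip: str) -> str:
    """PTR lookup: IP -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> list[str]:
    """A-record lookup: hostname -> IPv4 addresses (IPv6 not handled here)."""
    return socket.gethostbyname_ex(host)[2]


def is_apple_crawler(ip: str,
                     reverse_lookup: Callable[[str], str] = _reverse,
                     forward_lookup: Callable[[str], list[str]] = _forward,
                     suffix: str = ".applebot.apple.com") -> bool:  # assumed domain
    """Forward-confirmed reverse DNS check for a request's source IP."""
    try:
        hostname = reverse_lookup(ip)             # step 1: IP -> hostname
    except OSError:                               # socket.herror/gaierror subclass OSError
        return False
    if not hostname.rstrip(".").endswith(suffix):
        return False                              # step 2: wrong domain
    try:
        addresses = forward_lookup(hostname)      # step 3: hostname -> IPs
    except OSError:
        return False
    return ip in addresses                        # step 4: must round-trip to the same IP
```

The forward lookup in step 4 matters because whoever controls an IP block can set its PTR record to any name; only the owner of the domain controls the forward record.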

https://github.com/fnkr/gocrawl

Simple web crawler.

crawler http-client

Last synced: 28 Jan 2025

https://github.com/tungct/golangcrawler

A crawler using goroutines in Golang

crawler go

Last synced: 14 Jan 2025

https://github.com/somnisomni/trawler-csharp

The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#

crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling twitter-scraper

Last synced: 05 Jan 2025

https://github.com/hwywl/mzitu-crawler

Crawls photos of girls from the mzitu website (mind your health).

crawler mzitu python

Last synced: 08 Jan 2025

https://github.com/jfcherng/wiki-cgroup-crawler

This script fetches Wikipedia's public conversion group (CGroup) word lists and saves the results to external files.

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 22 Jan 2025

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 21 Jan 2025

https://github.com/linux0hat/cpp-web-crawler

Explore the web.

cpp crawler sqlite3

Last synced: 12 Jan 2025

https://github.com/vitaee/laravelandcrawlers

PHP web crawler examples using OOP concepts, plus a Laravel project

crawler laravel php

Last synced: 26 Dec 2024

https://github.com/sebyx07/active_proxy

A Ruby proxy fetcher that retries requests until they complete and provides a user agent 🚀🚀

crawler http proxy rails ruby

Last synced: 28 Dec 2024

https://github.com/thomashirtz/douban-crawler

A simple crawler for retrieving information about movies or TV shows from the famous www.douban.com website.

crawler douban

Last synced: 25 Dec 2024

https://github.com/krishpranav/gocralwer

An awesome crawler made in Go

crawler

Last synced: 18 Jan 2025

https://github.com/pjullrich/link-crawler

A Python crawler that reports broken links on a given website and its subpages

asyncio breadth-first-search broken-links crawler python

Last synced: 23 Jan 2025
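A breadth-first broken-link check like the one described above can be sketched as follows. This is a synchronous illustration, not the repository's asyncio implementation: `find_broken_links`, `get_page`, and `max_pages` are hypothetical names, and `get_page` is assumed to return an HTTP status code plus the page's `href` values.

```python
from collections import deque
from typing import Callable
from urllib.parse import urljoin, urlparse


def find_broken_links(start_url: str,
                      get_page: Callable[[str], tuple[int, list[str]]],
                      max_pages: int = 500) -> dict[str, int]:
    """Breadth-first crawl of one site; return {url: status} for links answering >= 400."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])       # FIFO queue gives breadth-first order
    broken: dict[str, int] = {}
    checked = 0
    while queue and checked < max_pages:
        url = queue.popleft()
        status, hrefs = get_page(url)
        checked += 1
        if status >= 400:
            broken[url] = status
            continue                 # an error page has no links worth following
        if urlparse(url).netloc != site:
            continue                 # check external links, but do not crawl them
        for href in hrefs:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return broken
```

Checking external links without expanding them keeps the crawl confined to the given website and its subpages while still catching dead outbound links.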

https://github.com/j-hoplin/naver_news_headtopic_news_scraper

Scrapes headline news from Naver News.

crawler naver-news scraper

Last synced: 05 Feb 2025

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 29 Jan 2025

https://github.com/tcc0lin/magiccrawler

A collection of all kinds of interesting crawler scripts that work around anti-crawling measures

crawler python3 spider

Last synced: 18 Jan 2025

https://github.com/curegit/nominium

A crawler system that notifies you, by email and other means, of new listings on peer-to-peer (C2C) marketplace sites.

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 18 Jan 2025

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target Java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 16 Jan 2025

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 23 Jan 2025

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 09 Feb 2025

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 14 Jan 2025

https://github.com/purrproof/smartcrawl

An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 27 Jan 2025

https://github.com/enansari/guess-price-car

Car price estimation with machine learning, based on data from a car sales site | final project of Maktabkhooneh

crawler jadi machine-learning maktabkhoone maktabkhooneh python

Last synced: 09 Jan 2025

https://github.com/pythoript/pgn-scraper

PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.

7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip

Last synced: 23 Jan 2025

https://github.com/h4r5h1t/crawlytics

A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.

appsec crawler crawler-python mechanicalsoup security security-tools webcrawler

Last synced: 28 Dec 2024

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 31 Dec 2024

https://github.com/knourian/freelancer.com-category-scrapping

Scrapes categories from Freelancer.com using Scrapy, along with the number of projects in each category

crawler freelancer python3 scrapy web-crawler

Last synced: 05 Jan 2025

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different methods, and mass-download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 23 Dec 2024

https://github.com/chunkingz/youtubelinks-scraper

A Python script that scrapes YouTube links from a predefined website of choice.

crawler python scraper spider websitescraper youtube

Last synced: 02 Jan 2025

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 02 Jan 2025

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 29 Jan 2025

https://github.com/hantang/list-movies-top

Scheduled scraping of top movies from Douban (douban.com), IMDb (imdb.com), Mtime (mtime.com), and Maoyan (maoyan.com)

crawler douban imdb movie

Last synced: 07 Jan 2025

https://github.com/bujosa/aldebaran

An example of using App Engine with Python 3, a thread pool, and web scraping

appengine crawler flask gcp python3 thread-pool

Last synced: 21 Jan 2025

https://github.com/christopher-besch/therapy_search

Computes call times from arztsuche-bw and adds them to a calendar.

appointments calendar crawler gatsby therapy time-management typescript

Last synced: 28 Dec 2024

https://github.com/citiususc/polypus

Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

analytics bigdata crawler scraper sentiment-analysis twitter

Last synced: 29 Jan 2025

https://github.com/suddi/fundscraper

Collection of web crawlers to scrape fund data using Scrapy

crawler funds scraper scrapy

Last synced: 11 Oct 2024

https://github.com/beanwei/zmt-post-crawler

Crawls the ZMT platform site: enter an author ID to get that author's post list. This project was written for a friend.

crawler golang golang-ui

Last synced: 28 Dec 2024

https://github.com/codelegant/movie-crawler-api

An API for scraping movie ticket information from Taobao, Maoyan, and Gewara

async await crawler mongoose request

Last synced: 11 Feb 2025

https://github.com/igorbrizack/web-scraper

An HTML data-scraping application built in Python.

crawler pytest python3 scraper

Last synced: 26 Jan 2025

https://github.com/zhs007/lottery-crawler

A crawler based on jarvis-task, mainly used to crawl lottery data.

crawler jarvis-task

Last synced: 03 Jan 2025

https://github.com/okwilkins/web-crawler

This program will crawl through entire domains, exporting every link it can find into a txt file.

crawler crawling files html htmlparser python python3 reader scraper threading threads web writer

Last synced: 21 Jan 2025

https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper

Crawls exclusive video games released for the platforms specified on GameFAQs website to generate a table in Markdown format with the crawled titles.

console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox

Last synced: 29 Jan 2025

https://github.com/arshadkazmi42/gh-crawl

A crawler for GitHub repositories that finds all the broken links in them

bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python

Last synced: 21 Dec 2024

https://github.com/fi1a/crawler

PHP crawler

crawler php

Last synced: 29 Jan 2025

https://github.com/redco/goose-phantom-environment

Environment for Goose parser which allows to run it in PhantomJS

crawler environment goose goose-parser nodejs parse parser phantomjs scraper

Last synced: 22 Dec 2024

https://github.com/schbenedikt/web-crawler

A simple web crawler using Python that stores the metadata of each web page in a database.

crawler database mariadb mysql python python-crawler web

Last synced: 08 Nov 2024

https://github.com/jorgeparavicini/medalytik-python

Python crawlers for a job mediation firm

crawler python scrapy

Last synced: 02 Feb 2025

https://github.com/sonhm3029/crawl-data-bot

A basic bot for crawling data from the web, including text and image data

crawler google medical vietnamese

Last synced: 17 Jan 2025

https://github.com/victorpre/erlich

Erlich Bachman - Hacker Hostel

chatbot crawler elixir housing umbrella

Last synced: 02 Feb 2025

https://github.com/richecr/pyhltv

Repository to extract information from the HLTV website.

crawler csgo hacktoberfest hltv hltv-api python3

Last synced: 20 Jan 2025

https://github.com/priyakdey/github-api-crawler

A crawler to crawl and save the APIs found in the Public APIs github repo - https://github.com/public-apis/public-apis. Visit README for details.

api crawler mongo python3

Last synced: 02 Feb 2025

https://github.com/maxgio92/package-crawler

A package crawler for the best-known Linux distros

crawler go linux package

Last synced: 26 Jan 2025

https://github.com/jiamingla/mvdis_i18n

A multilingual-friendly version of the motorcycle license test booking service (unofficial)

crawler jquery koa koajs nodejs supertest

Last synced: 26 Jan 2025

https://github.com/gozeon/weibo-crawler

Weibo crawler

crawler web-crawler

Last synced: 26 Jan 2025

https://github.com/iarsham/scrapify

Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.

403-bypass arkose cloudflare crawler golang http-client scraper

Last synced: 05 Feb 2025

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrape data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 21 Dec 2024

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawls Wikipedia with Python, powered by BeautifulSoup4, with GUI and CUI support

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 03 Jan 2025

https://github.com/leomaurodesenv/smm-maker-profile

A package for fetching maker profiles from Super Mario Maker

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/deptno/nsdi

㉿ nsdi downloader built on Puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 31 Dec 2024

https://github.com/im-perativa/public_crawler

A collection of crawler projects for Indonesian datasets

crawler indonesia indonesia-api scrapy

Last synced: 25 Jan 2025

https://github.com/geoffreybauduin/website-checker

Performs useful checks against a website, such as 404 errors reporting, structured data validation...

crawler seo structured-data web-spider website

Last synced: 25 Dec 2024

https://github.com/zephyrpersonal/github-trending-crawler

Transforms GitHub trending repos into JSON data

cheerio crawler fetch github node repository spider trending

Last synced: 26 Jan 2025

https://github.com/hudson-newey/user-web-crawler

The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.

archive crawler open-internet

Last synced: 10 Jan 2025

https://github.com/aleclarson/recrawl

Filesystem crawler

crawler fs nodejs

Last synced: 09 Jan 2025

https://github.com/camilamaia/crawl4us

[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️

beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping

Last synced: 02 Feb 2025

https://github.com/danielemoraschi/go-sitemap-common

Simple Go sitemap generator and crawler.

crawler golang sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/rxcai/python3-weibo-crawler

A small Weibo crawler implemented in Python 3

crawler python python3 spider weibo

Last synced: 26 Jan 2025

https://github.com/khilnani/spidey.py

Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.

cli crawler python scaper web-spider

Last synced: 30 Jan 2025