Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web scraper that scrapes the entire Codeforces problemset into a CSV or JSON file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 16 Jan 2025

https://github.com/ekojs/web-crawler

Web crawler for retrieving research titles from Google Scholar

crawler nodejs web-crawler

Last synced: 08 Jan 2025

https://github.com/kimi0230/pstocks

Python stock market crawler

crawler numpy pandas python python3 stocks

Last synced: 16 Jan 2025

https://github.com/bingxyz/btcethcrawler

Telegram broadcast channel for Bitcoin and Ether

bash bash-script crawler telegram-bot

Last synced: 22 Jan 2025

https://github.com/ilovebacteria/digikala-api

This Python package sends requests to the Digikala API and retrieves product details.

crawler digikala pypi

Last synced: 14 Nov 2024

https://github.com/ariefrahmansyah/crawler

Simple website crawler using Go programming language.

crawler go

Last synced: 05 Dec 2024

https://github.com/amazingcoderpro/pythonup

Have fun with Python! For improving Python skills.

crawler python

Last synced: 30 Nov 2024

https://github.com/abhijeetps/noddler

Web crawler built using Node.js

cheerio crawler csv nodejs

Last synced: 15 Dec 2024

https://github.com/nagilum/focus

Simple CLI tool, written in C#, to crawl a site and log the responses.

cli crawl crawler csharp playwright

Last synced: 16 Jan 2025

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 21 Jan 2025

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 14 Jan 2025

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 26 Jan 2025

https://github.com/arman-aminian/divar-text-exploring

The first exercise of Dr. Asgari's NLP course: Data Exploration

crawler natural-language-processing nlp preprocessing scrapy

Last synced: 08 Jan 2025

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 12 Jan 2025

https://github.com/berecat/selenium_facebook_scraper

A simple Python 3 script used to download a user's friend list from Facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 08 Jan 2025

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 14 Jan 2025

https://github.com/mahdijamebozorg/cryptonewscrawler

An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 16 Jan 2025

https://github.com/leegeunhyeok/python-gongucrawler

Python 3 crawler for images and their details from Gongu Madang (공유마당)

crawler python

Last synced: 22 Dec 2024

https://github.com/bing-su/arcalive-crawler-python

Arca Live (아카라이브) crawler

crawler python

Last synced: 02 Jan 2025

https://github.com/raspi/scrapy-crucial

Web crawler for Crucial (crucial.com)

crawler hardware memory scrapy spider

Last synced: 08 Jan 2025

https://github.com/raspi/scrapy-corsair

Web crawler for Corsair (corsair.com)

crawler hardware memory scrapy spider

Last synced: 08 Jan 2025

https://github.com/raspi/scrapy-amp

Crawler for Amiga Music Preservation (AMP) site

amiga crawler mod module music python s3m scrapy spider tracker

Last synced: 08 Jan 2025

https://github.com/raspi/scrapy-vgmusic

Crawler for vgmusic web site

crawler game midi music python scrapy spider

Last synced: 08 Jan 2025

https://github.com/davelongdev/link-report-crawler

A web crawler using Node.js that crawls a site and returns a report showing all internal links.

crawler crawling javascript seo seo-tools

Last synced: 02 Jan 2025

https://github.com/zenoyang/webcrawler

Some crawler code

crawler scrapy spider web-crawler

Last synced: 17 Jan 2025

https://github.com/nextlevelshit/node-crawl

Webcrawler for nodejs

crawl crawler javascript nodejs

Last synced: 20 Jan 2025

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 15 Dec 2024

https://github.com/fscotto/noahcrawler

A simple web crawler written in Java to support a database of Italian regions.

crawler java jsoup-library

Last synced: 21 Jan 2025

https://github.com/rayspock/go-web-crawler

A web crawler that fetches all the links from a given website using goroutines.

concurrency crawler golang goroutine

Last synced: 14 Jan 2025

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 14 Jan 2025

https://github.com/tpeterw/summariser

Summarizer for PDF and text-based uploads

crawler hackathon nlp node nodejs python

Last synced: 08 Jan 2025

https://github.com/jauharibill/animeindo-crawler

This crawler is for research use only. The creator takes no responsibility for any harmful usage.

crawler python3 scrapy

Last synced: 29 Dec 2024

https://github.com/ark930/douban-movie-crawler

Douban movie review crawler

crawler douban movie python

Last synced: 24 Jan 2025

https://github.com/jpleorx/tagblender

A simple java API to retrieve hashtags from https://www.tagblender.net/

api crawler hashtags java jsoup parser

Last synced: 25 Jan 2025

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 22 Jan 2025

https://github.com/luminovrym/crawler-tools-js

Crawler Tools Js is an application used for scraping data from a website

crawler crawler-js data js web-scraping

Last synced: 02 Jan 2025

https://github.com/sevenecks/web-crawler

Crawls a website, finds pages and links, maps the relationships between them, and reports on 404s and other errors

404 checker crawler site web

Last synced: 02 Jan 2025

https://github.com/tungct/tngtcrawler

Crawler using Scrapy

crawler python scrapy

Last synced: 14 Jan 2025

https://github.com/pjt3591oo/python-parse

These are modules for URL parsing

crawler

Last synced: 26 Dec 2024

https://github.com/shivamsaraswat/webxcrawler

WebXCrawler is a fast static crawler to crawl a website and get all the links.

crawler crawling python scraping webcrawler webxcrawler

Last synced: 06 Nov 2024

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 17 Jan 2025

https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp

C# web crawler that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as the title, date, venue, and links of the events.

anglesharp crawler minhaentrada

Last synced: 31 Dec 2024

https://github.com/sedrubal/webcrawler

Crawl sites and search for security issues.

crawler script security website-auditing

Last synced: 24 Jan 2025

https://github.com/jarircse16/bot_detection_firewall

Detects and Blocks generic crawlers from your website.

bot crawler php

Last synced: 30 Dec 2024

https://github.com/abx123/crawler

Simple lambda function to crawl daily web novel updates.

crawler firebase-database golang lambda-functions

Last synced: 07 Dec 2024

https://github.com/abx123/coronachan

Simple lambda function to crawl MKN twitter account for daily Malaysia COVID-19 updates.

crawler lambda-functions python

Last synced: 07 Dec 2024

https://github.com/kaymen99/imdb-scraper

IMDb scraper that collects movie and TV show data from the IMDb website

crawler python scraper scraping scrapy

Last synced: 22 Jan 2025

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 26 Dec 2024

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 14 Jan 2025

https://github.com/artemnikitin/crawler

Example of web crawler implemented in Go

crawler go golang

Last synced: 08 Jan 2025

https://github.com/murilobsd/icrop-csv

Icrop-csv automates the process of downloading reports.

crawler csv-export python3

Last synced: 28 Dec 2024

https://github.com/iomarmochtar/imagecrawler

Simple image crawler that follows links recursively; no dependencies needed; for Python 2.7+

crawler python-library

Last synced: 25 Dec 2024

https://github.com/zzzzer91/chinaxinge

chinaxinge crawler.

crawler python python3

Last synced: 10 Jan 2025

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 07 Jan 2025

https://github.com/zawlinnnaing/my-wiki-crawler

A simple program for crawling Burmese Wikipedia using the MediaWiki API.

crawler myanmar-tools python wikipedia-api

Last synced: 25 Dec 2024

https://github.com/jjeffcaii/ok-spider

a simple web crawler like scrapy

crawler nodejs scrapy spider

Last synced: 25 Dec 2024

https://github.com/949886/pixiv-crawler

Crawler that saves Pixiv illustration info to a local MySQL database.

crawler mysql pixiv

Last synced: 28 Dec 2024

https://github.com/spider-rs/spider-clients

Clients to use with the hosted spider service - spider.cloud

ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping

Last synced: 05 Nov 2024

https://github.com/brianmacintosh/wikicrawler

Sandbox project for manipulating Wikimedia wikis

c-sharp crawler mediawiki-bot wikipedia-bot

Last synced: 30 Dec 2024

https://github.com/tatamiya/gas-new-books-crawler

Crawls new book information from Hanmoto.com (版元ドットコム, https://www.hanmoto.com/)

crawler gas

Last synced: 21 Jan 2025

https://github.com/manikantasanjay/stackoverflow_tag_generator_webcrawler

StackOverFlow Tag Generator Using a WebCrawler.

crawler python

Last synced: 22 Dec 2024

https://github.com/thejoin95/free-proxies.info

API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time, and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 06 Jan 2025

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 28 Dec 2024

https://github.com/xiangronglin/novel2go

Android app that creates a PDF from a website and sends it to your Kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 21 Dec 2024

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 03 Dec 2024

https://github.com/timpletin/comming-soon

Coming Soon page: simple, clean design, fully responsive on all screens; counts the days, hours, minutes, and seconds until an upcoming event

crawler css java javaweb nextjs nextjs-boilerplate nextjs-typescript nextjs14-typescript object-detection paypal python tailwindui tensorflow typescript

Last synced: 21 Jan 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 29 Dec 2024

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 23 Jan 2025

https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words into a MongoDB database.

bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Last synced: 21 Dec 2024

https://github.com/lin-jun-xiang/python-crawler

Using CloudScraper, Requests, API, Thread, Async... to scrape data

async cloudscraper crawler multithreading python requests scraper selenium

Last synced: 21 Dec 2024

https://github.com/hoan02/novel-crawler

Tool for scraping story data for doctruyen.io.vn

crawler python

Last synced: 20 Jan 2025

https://github.com/lopins/article-crawler

A simple web article crawling tool that lets you customize which fields to extract; simple and easy to get started with.

article crawler ftp mysql python sqlite3

Last synced: 21 Dec 2024

https://github.com/yuchenq/comp90055-project

This is the latest version of my project for Comp90055.

couchdb crawler data-visualization python3 textblob tweepy

Last synced: 19 Jan 2025

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 20 Dec 2024

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 04 Jan 2025

https://github.com/juangesino/ah-bonus-crawler

React + Express application that crawls Albert Heijn's promotions.

crawler crawling express expressjs headless-chrome nodejs react reactjs

Last synced: 23 Jan 2025

https://github.com/eneax/web-crawler

A web crawler built in Node.js

crawler javascript nodejs web-crawler

Last synced: 22 Dec 2024

https://github.com/jjpaulo2/crawler-financeiro

Python module that extracts public data on pension plans from the SUSEP portal.

crawler docker ocr python selenium tesseract

Last synced: 21 Nov 2024

https://github.com/mg98/ipfs-replicate

Replicate IPFS' distributed data structure locally, based on network traces.

crawler dag ipfs redisgraph scraper

Last synced: 30 Nov 2024

https://github.com/zigai/crawlwright

Web crawling framework powered by Playwright

crawler crawling playwright python scraping wrighter

Last synced: 07 Dec 2024

https://github.com/govau/warcraider

Convert WARC files into Avro for big data processing

avro bigquery crawler rust warc

Last synced: 21 Jan 2025

https://github.com/semoal/pythoncrawler

Python crawler with XML-RPC & BeautifulSoup

beautifulsoup crawler python wordpress xmlrpc

Last synced: 15 Dec 2024

https://github.com/hackthedev/botnet

Tool to find IPs on the Web, check SSH availability, and brute-force logins with a wordlist. For educational purposes only!

botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web

Last synced: 23 Jan 2025

https://github.com/kimseogyu/crawling-music-ranks

Music chart ranking crawling code

crawler jest typescript

Last synced: 21 Dec 2024

https://github.com/josepedrodias/naivebot

Attempt to mimic Googlebot behaviour in Node.js with NightmareJS

crawler googlebot nightmarejs nodejs robots

Last synced: 21 Jan 2025

https://github.com/qqxs/usda_pomological_watercolors

Crawls data from the USDA pomological watercolors

crawler koa2 nodejs watercolors

Last synced: 18 Jan 2025