Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/copha-project/copha

Open-Source Software For Managing Tasks

crawler framework nodejs puppeteer selenium

Last synced: 15 Jan 2025

https://github.com/aweirddev/air-web

A lightweight package for crawling the web with the minimalist of code.

crawl crawler markdown scrape scraper web

Last synced: 09 Nov 2024

https://github.com/webdevcave/directory-crawler-php

Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.

crawler crawling directory path php php-library

Last synced: 09 Nov 2024

https://github.com/martinius96/web-scraper

Web scraper on ESP8266 board in client mode. Postprocessing in PHP with regular expressions.

arduino bot code crawler esp32 esp8266 html mysql php php7 robot scraper source web

Last synced: 03 Jan 2025

https://github.com/madret/selenium_crawler

Selenium Webcrawler based on the chromedriver.

chromedriver crawler human-like selenium selenium-webdriver webcrawler

Last synced: 15 Jan 2025

https://github.com/fredcodee/pexel.com-image-scrapper

download images from pexel.com

crawler image python selenium

Last synced: 08 Jan 2025

https://github.com/jofaval/open-graph-visualizer

Web Scraping showcase of how crawlers retrieve site's details through the Open Graph Protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 04 Feb 2025

https://github.com/fengzixu/crawlinganything

如果你对数据有兴趣,那么就应该立即行动起来

crawler python

Last synced: 08 Jan 2025

https://github.com/marcosvbras/twitton

A simple Python library to make Twitter Search API easily to use

crawler crawling python spider twitter twitter-api

Last synced: 01 Feb 2025

https://github.com/snwfdhmp/3gm-bot

Bot for the online french indie game 3gm.fr implemented in Ruby. Mostly website crawling and task automation.

3gm-bot crawler game-bot task-automation web-crawling

Last synced: 15 Jan 2025

https://github.com/ekojs/web-crawler

Web Crawler untuk mengambil judul penelitian pada Google Scholar

crawler nodejs web-crawler

Last synced: 08 Jan 2025

https://github.com/abhijeetps/noddler

Web Crawler build using NodeJS

cheerio crawler csv nodejs

Last synced: 15 Dec 2024

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 21 Jan 2025

https://github.com/arman-aminian/divar-text-exploring

The first practice of Dr. Asgari's NLP lesson - Data Exploration

crawler natural-language-processing nlp preprocessing scrapy

Last synced: 08 Jan 2025

https://github.com/berecat/selenium_facebook_scraper

A simple python3 script used to download a users's friend list from facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 08 Jan 2025

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 15 Jan 2025

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-transcend

Crawler for transcend (us.transcend-info.com)

crawler hardware memory scrapy spider

Last synced: 08 Jan 2025

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/zhanziyuan/webdownloader

Download elements from the specified website.

crawler downloader image image-downloader python python-crawler web

Last synced: 08 Jan 2025

https://github.com/yanglr/csharp_spider

Crawler in C#

crawler csharp spider

Last synced: 22 Jan 2025

https://github.com/tetreum/puppeteer-for-crawling

Daily use crawling methods for puppeteer

crawler crawling puppeteer

Last synced: 04 Feb 2025

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 09 Jan 2025

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 14 Jan 2025

https://github.com/bing-su/arcalive-crawler-python

아카라이브 크롤러

crawler python

Last synced: 02 Jan 2025

https://github.com/zahraarshia/cti_crawl

This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.

crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper

Last synced: 09 Jan 2025

https://github.com/raspi/scrapy-crucial

Web crawler for Crucial (crucial.com)

crawler hardware memory scrapy spider

Last synced: 08 Jan 2025

https://github.com/kodemartin/webcrawler

A simple webcrawler

crawler rust

Last synced: 25 Jan 2025

https://github.com/raspi/scrapy-corsair

Web crawler for Corsair (corsair.com)

crawler hardware memory scrapy spider

Last synced: 08 Jan 2025

https://github.com/raspi/scrapy-amp

Crawler for Amiga Music Preservation (AMP) site

amiga crawler mod module music python s3m scrapy spider tracker

Last synced: 08 Jan 2025

https://github.com/raspi/scrapy-vgmusic

Crawler for vgmusic web site

crawler game midi music python scrapy spider

Last synced: 08 Jan 2025

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the Website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 16 Jan 2025

https://github.com/davelongdev/link-report-crawler

A web crawler using Node.js that crawls a site and returns a report showing all internal links.

crawler crawling javascript seo seo-tools

Last synced: 02 Jan 2025

https://github.com/pinpox/go-random-downloader

Download Html using "Random Page"

crawler golang

Last synced: 28 Jan 2025

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 16 Jan 2025

https://github.com/shentengtu/cht-yp-crawler

Simple Crawler of www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 11 Jan 2025

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 16 Jan 2025

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 22 Dec 2024

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 22 Jan 2025

https://github.com/fscotto/noahcrawler

A simple web crawler written in Java to support a database of Italian regions.

crawler java jsoup-library

Last synced: 21 Jan 2025

https://github.com/tpeterw/summariser

summarizer for pdf and text based uploads

crawler hackathon nlp node nodejs python

Last synced: 08 Jan 2025

https://github.com/jenting/compare-drugstore-price

Compare price between cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 01 Feb 2025

https://github.com/jauharibill/animeindo-crawler

this crawler is used for research only. the creator doesn't take any responsibility for any harmful usage

crawler python3 scrapy

Last synced: 29 Dec 2024

https://github.com/luminovrym/crawler-tools-js

Crawler Tools Js adalah sebuah aplikasi yang digunakan untuk scrapping data pada sebuah web

crawler crawler-js data js web-scraping

Last synced: 02 Jan 2025

https://github.com/sevenecks/web-crawler

crawl a website, find pages, find links, find relationships between them and report on 404 and other errors

404 checker crawler site web

Last synced: 02 Jan 2025

https://github.com/mustafadalga/website-crawler

Hedef web sitesini tarayarak linklerini listeleyen bir web crawler scripti || A web crawler script that lists links by scanning the target website.

crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling

Last synced: 18 Jan 2025

https://github.com/frobware/grawler

Web Crawler

crawler go

Last synced: 16 Jan 2025

https://github.com/tungct/tngtcrawler

Crawler using Scrapy

crawler python scrapy

Last synced: 14 Jan 2025

https://github.com/notreeceharris/webstalker

🕸 A Powerful Relational Web Crawler

crawler datamining webcrawler

Last synced: 14 Jan 2025

https://github.com/brianbruggeman/vax

A vaccination signup tool

covid-19 crawler signup vaccination

Last synced: 16 Jan 2025

https://github.com/bruce-lee-ly/crawler

Several fun crawler cases implemented in Python.

crawler python

Last synced: 16 Jan 2025

https://github.com/pjt3591oo/python-parse

this are modules for url pasing

crawler

Last synced: 26 Dec 2024

https://github.com/thiiagoms/car-stealth

REST API to all cars that were stolen

api cars crawler student

Last synced: 16 Jan 2025

https://github.com/aristotelesbr/api_quotes

Project test for job.

crawler mongodb rails5

Last synced: 17 Jan 2025

https://github.com/shivamsaraswat/webxcrawler

WebXCrawler is a fast static crawler to crawl a website and get all the links.

crawler crawling python scraping webcrawler webxcrawler

Last synced: 06 Nov 2024

https://github.com/erickj3/strike-api

this is a web scraping api with nestsj

api crawler flow nestjs scraping typescript

Last synced: 24 Jan 2025

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

Crawler to crawl india's all states and cities

cities crawler india php script states

Last synced: 16 Jan 2025

https://github.com/jarircse16/bot_detection_firewall

Detects and Blocks generic crawlers from your website.

bot crawler php

Last synced: 30 Dec 2024

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web Scrapper that scrap the whole problemset of Codeforces into csv or json file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 16 Jan 2025

https://github.com/kimi0230/pstocks

Python 爬股市

crawler numpy pandas python python3 stocks

Last synced: 16 Jan 2025

https://github.com/bingxyz/btcethcrawler

telegram 比特幣、乙太幣廣播頻道

bash bash-script crawler telegram-bot

Last synced: 22 Jan 2025

https://github.com/ilovebacteria/digikala-api

This python package requests to Digikala API and gets a product detail.

crawler digikala pypi

Last synced: 14 Nov 2024

https://github.com/lulurun/kick-off-crawling

make web scraping easy

crawler nodejs scraper

Last synced: 26 Dec 2024

https://github.com/nagilum/focus

Simple CLI tool, written in C#, to crawl a site and log the responses.

cli crawl crawler csharp playwright

Last synced: 16 Jan 2025

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 14 Jan 2025

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 12 Jan 2025

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 14 Jan 2025

https://github.com/artemnikitin/crawler

Example of web crawler implemented in Go

crawler go golang

Last synced: 08 Jan 2025

https://github.com/murilobsd/icrop-csv

Icrop-csv para automatizar o processo do download dos relatórios.

crawler csv-export python3

Last synced: 28 Dec 2024

https://github.com/iomarmochtar/imagecrawler

Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+

crawler python-library

Last synced: 25 Dec 2024

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 28 Jan 2025

https://github.com/phanletrunghieu/webcrawler

A web crawler with Spring MVC

crawler java servlet spring-mvc springframework

Last synced: 28 Jan 2025

https://github.com/zzzzer91/chinaxinge

chinaxinge 爬虫。

crawler python python3

Last synced: 10 Jan 2025

https://github.com/mahdijamebozorg/cryptonewscrawler

An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 16 Jan 2025

https://github.com/mstephen19/apify-click-events

Like TypeScript, but for clicking ;) Manage automated clicks, and ensure your Apify web-crawler is only clicking exactly what you allow it to

apify apify-sdk crawler scraper web-automation

Last synced: 04 Feb 2025

https://github.com/leegeunhyeok/python-gongucrawler

파이썬3 공유마당 이미지 및 상세정보 크롤러

crawler python

Last synced: 22 Dec 2024

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 07 Jan 2025

https://github.com/zawlinnnaing/my-wiki-crawler

A simple program for crawling Burmese wikipedia using Media wiki API.

crawler myanmar-tools python wikipedia-api

Last synced: 25 Dec 2024

https://github.com/jjeffcaii/ok-spider

a simple web crawler like scrapy

crawler nodejs scrapy spider

Last synced: 25 Dec 2024

https://github.com/949886/pixiv-crawler

Pixiv illustration info crawler to local MySQL database.

crawler mysql pixiv

Last synced: 28 Dec 2024

https://github.com/zenoyang/webcrawler

一些爬虫代码

crawler scrapy spider web-crawler

Last synced: 17 Jan 2025

https://github.com/nextlevelshit/node-crawl

Webcrawler for nodejs

crawl crawler javascript nodejs

Last synced: 20 Jan 2025

https://github.com/amazingcoderpro/pythonup

玩转Python!for improving python skills

crawler python

Last synced: 28 Jan 2025

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 15 Dec 2024

https://github.com/rayspock/go-web-crawler

A web crawler to fetch all the links from a given website via go routines.

concurrency crawler golang goroutine

Last synced: 14 Jan 2025

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 14 Jan 2025