Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/btlmd/asahi_nikkei_news_crawler

日本经济新闻、朝日新闻爬虫

crawler

Last synced: 21 Jan 2025

https://github.com/datvodinh/laptop-price-prediction

An End to End Data Science Project about Laptop Price Prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 17 Nov 2024

https://github.com/longluo/spider

My Python Spider / Crawler

crawler python spider twitter weibo weibo-crawler weibo-spider

Last synced: 06 Jan 2025

https://github.com/devindon/movie-crawler

Movie crawler for douban.com, pianku.tv, etc.

crawler nodejs typescript

Last synced: 06 Dec 2024

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 07 Jan 2025

https://github.com/russellsteadman/netscrape

A Node.js framework for creating good bots

bot crawler crawling exclusion rfc9309 scraper scraping web-scraping

Last synced: 03 Jan 2025

https://github.com/copha-project/copha

Open-Source Software For Managing Tasks

crawler framework nodejs puppeteer selenium

Last synced: 15 Jan 2025

https://github.com/aweirddev/air-web

A lightweight package for crawling the web with the minimalist of code.

crawl crawler markdown scrape scraper web

Last synced: 09 Nov 2024

https://github.com/webdevcave/directory-crawler-php

Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.

crawler crawling directory path php php-library

Last synced: 09 Nov 2024

https://github.com/madret/selenium_crawler

Selenium Webcrawler based on the chromedriver.

chromedriver crawler human-like selenium selenium-webdriver webcrawler

Last synced: 15 Jan 2025

https://github.com/phanletrunghieu/webcrawler

A web crawler with Spring MVC

crawler java servlet spring-mvc springframework

Last synced: 30 Nov 2024

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 07 Jan 2025

https://github.com/fredcodee/pexel.com-image-scrapper

download images from pexel.com

crawler image python selenium

Last synced: 08 Jan 2025

https://github.com/sxoxgxi/webcrawler

A multi threaded web crawler

crawler python webcrawling

Last synced: 25 Jan 2025

https://github.com/zawlinnnaing/my-wiki-crawler

A simple program for crawling Burmese wikipedia using Media wiki API.

crawler myanmar-tools python wikipedia-api

Last synced: 25 Dec 2024

https://github.com/jjeffcaii/ok-spider

a simple web crawler like scrapy

crawler nodejs scrapy spider

Last synced: 25 Dec 2024

https://github.com/kahsolt/qzone_mood_dumper

Dump your qzone mood(说说) history to local SQL database storage

crawler dumper qzone-mood

Last synced: 03 Jan 2025

https://github.com/949886/pixiv-crawler

Pixiv illustration info crawler to local MySQL database.

crawler mysql pixiv

Last synced: 28 Dec 2024

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 15 Jan 2025

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-transcend

Crawler for transcend (us.transcend-info.com)

crawler hardware memory scrapy spider

Last synced: 08 Jan 2025

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/zhanziyuan/webdownloader

Download elements from the specified website.

crawler downloader image image-downloader python python-crawler web

Last synced: 08 Jan 2025

https://github.com/yanglr/csharp_spider

Crawler in C#

crawler csharp spider

Last synced: 22 Jan 2025

https://github.com/sc0vu/gocrawl

Simple crawl for golang

crawler golang

Last synced: 02 Dec 2024

https://github.com/freakwill/mycrawlers

🕷 My Crawlers for Movies、Information、Encyclopedia...

baidu crawler douban movie quotes taobao

Last synced: 26 Jan 2025

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 09 Jan 2025

https://github.com/igorbrizack/crawler-web

Aplicação de coleta de dados Web com ReactJS e Python - API Rest

beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper

Last synced: 26 Jan 2025

https://github.com/truongdd03/searchengine

A search engine written in c++.

cpp crawler search search-engine

Last synced: 20 Dec 2024

https://github.com/jonasrenault/pubchem-api-crawler

Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.

chemistry crawler molecular-formula pubchem python

Last synced: 28 Nov 2024

https://github.com/zahraarshia/cti_crawl

This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.

crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper

Last synced: 09 Jan 2025

https://github.com/kodemartin/webcrawler

A simple webcrawler

crawler rust

Last synced: 25 Jan 2025

https://github.com/spider-rs/spider-clients

Clients to use with the hosted spider service - spider.cloud

ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping

Last synced: 05 Nov 2024

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the Website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 16 Jan 2025

https://github.com/brianmacintosh/wikicrawler

Sandbox project for manipulating Wikimedia wikis

c-sharp crawler mediawiki-bot wikipedia-bot

Last synced: 30 Dec 2024

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 16 Jan 2025

https://github.com/shentengtu/cht-yp-crawler

Simple Crawler of www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 11 Jan 2025

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 16 Jan 2025

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 22 Dec 2024

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio~

aiohttp asyncio crawler

Last synced: 22 Jan 2025

https://github.com/tatamiya/gas-new-books-crawler

Crawling new book information from 版元ドットコム(https://www.hanmoto.com/)

crawler gas

Last synced: 21 Jan 2025

https://github.com/mustafadalga/website-crawler

Hedef web sitesini tarayarak linklerini listeleyen bir web crawler scripti || A web crawler script that lists links by scanning the target website.

crawl crawler crawling-sites hacking hacking-tool web-crawler web-crawler-python web-crawling

Last synced: 18 Jan 2025

https://github.com/frobware/grawler

Web Crawler

crawler go

Last synced: 16 Jan 2025

https://github.com/thejoin95/free-proxies.info

API service for get anonymous and non proxy, filter by latency, country, updatetime and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 06 Jan 2025

https://github.com/johanbook/node-web-crawler

Nodejs CLI for web crawling

cli crawler nodejs typescript

Last synced: 16 Jan 2025

https://github.com/notreeceharris/webstalker

🕸 A Powerful Relational Web Crawler

crawler datamining webcrawler

Last synced: 14 Jan 2025

https://github.com/brianbruggeman/vax

A vaccination signup tool

covid-19 crawler signup vaccination

Last synced: 16 Jan 2025

https://github.com/bruce-lee-ly/crawler

Several fun crawler cases implemented in Python.

crawler python

Last synced: 16 Jan 2025

https://github.com/thiiagoms/car-stealth

REST API to all cars that were stolen

api cars crawler student

Last synced: 16 Jan 2025

https://github.com/aristotelesbr/api_quotes

Project test for job.

crawler mongodb rails5

Last synced: 17 Jan 2025

https://github.com/erickj3/strike-api

this is a web scraping api with nestsj

api crawler flow nestjs scraping typescript

Last synced: 24 Jan 2025

https://github.com/brnrajoriya/india-s-states-and-cities-crawler

Crawler to crawl india's all states and cities

cities crawler india php script states

Last synced: 16 Jan 2025

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 28 Dec 2024

https://github.com/radityaharya/sitesweeper

Sitesweeper is a python package to help you automate your web scraping process, outputting pages to a file

crawler pdf python website-crawler

Last synced: 05 Dec 2024

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web Scrapper that scrap the whole problemset of Codeforces into csv or json file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 16 Jan 2025

https://github.com/usethisname1419/connectioncrawler

crawls a website and checks for connections

connection crawler http-headers reporting website-analyzer

Last synced: 26 Jan 2025

https://github.com/kimi0230/pstocks

Python 爬股市

crawler numpy pandas python python3 stocks

Last synced: 16 Jan 2025

https://github.com/bingxyz/btcethcrawler

telegram 比特幣、乙太幣廣播頻道

bash bash-script crawler telegram-bot

Last synced: 22 Jan 2025

https://github.com/ilovebacteria/digikala-api

This python package requests to Digikala API and gets a product detail.

crawler digikala pypi

Last synced: 14 Nov 2024

https://github.com/amazingcoderpro/pythonup

玩转Python!for improving python skills

crawler python

Last synced: 30 Nov 2024

https://github.com/nagilum/focus

Simple CLI tool, written in C#, to crawl a site and log the responses.

cli crawl crawler csharp playwright

Last synced: 16 Jan 2025

https://github.com/yyj08070631/web-spider

一个网络蜘蛛

crawler spider webspider

Last synced: 20 Jan 2025

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 14 Jan 2025

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 12 Jan 2025

https://github.com/nextlevelshit/adonis-crawler

A free web crawler on top of the incredibile AdonisJS Framework

adonisjs crawler javascript nodejs regex spider websocket

Last synced: 20 Jan 2025

https://github.com/vhdm/twitter-hashtag-crawler

Twitter hashtag crawler by selenium, without using the Twitter API ;)

crawler python tor twitter

Last synced: 05 Jan 2025

https://github.com/surister/scrupy

Python library to create web Crawlers which aims to be powerful yet simple.

crawler crawling-framework crawling-python http library python scraping

Last synced: 18 Jan 2025

https://github.com/mahdijamebozorg/cryptonewscrawler

An end-to-end AI pipeline that performs technical and fundamental analysis of different cryptocurrencies.

crawler crypto cryptocurrency data-mining datamining information-retrieval llm python

Last synced: 16 Jan 2025

https://github.com/n3d1117/sisop17

Esercizio per esame di Sistemi Operativi - 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 19 Dec 2024

https://github.com/leegeunhyeok/python-gongucrawler

파이썬3 공유마당 이미지 및 상세정보 크롤러

crawler python

Last synced: 22 Dec 2024

https://github.com/xiangronglin/novel2go

Android app to create pdf from website and send to your kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 21 Dec 2024

https://github.com/docongminh/vinbdi-crawler

crawl data using scrapy + bs4

bs4-requests crawler scrapy splash

Last synced: 28 Dec 2024

https://github.com/luickk/vulnerability-crawler

Small python program meant to analyze random sites found on google for any vulnerabilities!

crawler xss

Last synced: 28 Dec 2024

https://github.com/zenoyang/webcrawler

一些爬虫代码

crawler scrapy spider web-crawler

Last synced: 17 Jan 2025

https://github.com/nextlevelshit/node-crawl

Webcrawler for nodejs

crawl crawler javascript nodejs

Last synced: 20 Jan 2025

https://github.com/timpletin/comming-soon

Coming Soon Page - Simple and clean design fully responsive on all screen, Count the days, hours, minutes and seconds for coming event

crawler css java javaweb nextjs nextjs-boilerplate nextjs-typescript nextjs14-typescript object-detection paypal python tailwindui tensorflow typescript

Last synced: 21 Jan 2025

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 15 Dec 2024

https://github.com/rayspock/go-web-crawler

A web crawler to fetch all the links from a given website via go routines.

concurrency crawler golang goroutine

Last synced: 14 Jan 2025

https://github.com/pourmand1376/crawler

Simple Crawler, Indexer and Search Engine Web Application

crawler csharp csharp-code dotnet mvc

Last synced: 14 Jan 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 29 Dec 2024

https://github.com/ark930/douban-movie-crawler

豆瓣影评爬虫

crawler douban movie python

Last synced: 24 Jan 2025

https://github.com/jpleorx/tagblender

A simple java API to retrieve hashtags from https://www.tagblender.net/

api crawler hashtags java jsoup parser

Last synced: 25 Jan 2025

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 22 Jan 2025

https://github.com/ronniery/crawler.synom

A crawler for the sinonimo.com.br website that saves the words into mongodb database.

bot crawler html html5 javascript mongodb nodejs nosql npm scraper thesaurus typescript web website xml

Last synced: 21 Dec 2024