Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
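The definition above boils down to one loop: take a URL from a frontier, fetch it, extract its links, and queue the ones not seen before. Below is a minimal breadth-first sketch of that loop using only the Python standard library. The `fetch` callable is a hypothetical hook added so the sketch runs offline. A real crawler would put an HTTP client there and should also respect robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    seed: str,
    fetch: Callable[[str], Optional[str]],  # hypothetical hook: returns HTML or None
    max_pages: int = 100,
) -> Dict[str, List[str]]:
    """Breadth-first crawl from `seed`; maps each fetched URL to its outgoing links."""
    frontier = deque([seed])  # URLs waiting to be visited, in FIFO order
    seen = {seed}             # every URL ever queued, so nothing is queued twice
    graph: Dict[str, List[str]] = {}
    while frontier and len(graph) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # fetch failed; skip this URL
            continue
        extractor = LinkExtractor()
        extractor.feed(html)
        # Resolve relative hrefs against the page URL and drop #fragments.
        links = [urldefrag(urljoin(url, href))[0] for href in extractor.links]
        graph[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return graph
```

Passing `fetch=pages.get` for a dict that maps URLs to HTML lets you try the crawl without network access. Replacing it with a function built on `urllib.request.urlopen` would make it fetch real pages.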

https://github.com/hctilg/taaghche-dl

Save books purchased from taaghche.com!

crawler downloader pillow-library python3 selenium taaghche

Last synced: 09 Jan 2025

https://github.com/openpj/manifoldcf-sdk

Apache ManifoldCF SDK is a Maven project focused on helping developers to extend ManifoldCF with new connectors and extensions

apache crawler docker ecm extensions integrations manifoldcf migration sdk search

Last synced: 25 Jan 2025

https://github.com/roccomuso/is-apple

Verify that a request is from Apple crawlers using DNS verification steps

apple bot crawler dns ip js nodejs

Last synced: 22 Jan 2025
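DNS verification of a crawler is commonly done as forward-confirmed reverse DNS. First look up the PTR record for the requesting IP. Then check that the hostname belongs to the crawler operator's domain. Finally, resolve that hostname forward and confirm it maps back to the same IP. The Python sketch below is not the is-apple package's API, since is-apple is a Node.js module. It rests on a few assumptions:

* The accepted domain suffix is a parameter. For Apple's crawler, take it from Apple's own Applebot documentation. The suffix in the test is hypothetical.
* The lookup functions can be swapped out, so the check can be tested without live DNS.
* `socket.gethostbyname_ex` only returns IPv4 addresses. IPv6 clients would need `socket.getaddrinfo` instead.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],  # e.g. the operator's documented crawler domain
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for an IP claiming to be a crawler."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)  # PTR lookup
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    suffixes = [s.strip(".").lower() for s in allowed_suffixes]
    # Require a real domain boundary so "bot.example.com.evil.test" is rejected.
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    try:
        _name, _aliases, addresses = forward_lookup(hostname)  # A lookup
    except OSError:
        return False
    return ip in addresses  # the name must resolve back to the same IP
```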

https://github.com/hwywl/mzitu-crawler

Crawls photos of girls from the mzitu website. Mind your nutrition.

crawler mzitu python

Last synced: 08 Jan 2025

https://github.com/fnkr/gocrawl

Simple web crawler.

crawler http-client

Last synced: 28 Jan 2025

https://github.com/basemax/kashan-university-phone-directory

This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.

crawler crawlers database html-scraper json kashan kashan-university scraper scraper-api scraper-html scrapers university university-of-kashan

Last synced: 09 Feb 2025

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and historical numbers of Swisslos

crawler euromillions lotto rust swisslos

Last synced: 27 Jan 2025

https://github.com/maxgio92/package-crawler

A package crawler for the best-known Linux distros

crawler go linux package

Last synced: 26 Jan 2025

https://github.com/anjackson/scrapy-url-frontier

A Scrapy module for URL Frontier integration

crawler frontier scrapy spider

Last synced: 05 Jan 2025

https://github.com/coghost/crawlers

All-in-one crawlers

crawler python3 staticimg weibo

Last synced: 02 Jan 2025

https://github.com/hantang/list-movies-top

Scheduled scraping of the top-movie lists from Douban (douban.com), IMDb (imdb.com), Mtime (mtime.com), and Maoyan (maoyan.com)

crawler douban imdb movie

Last synced: 07 Jan 2025

https://github.com/bujosa/aldebaran

An example of using App Engine with Python 3, ThreadPool, and web scraping

appengine crawler flask gcp python3 thread-pool

Last synced: 21 Jan 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Simple information about Aparat videos for data scientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 21 Jan 2025

https://github.com/curegit/nominium

A crawler system that notifies you, by email and other means, of new listings on consumer-to-consumer (C2C) marketplace sites

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 18 Jan 2025

https://github.com/bing-su/arcalive-crawler-python

Arca Live crawler

crawler python

Last synced: 02 Jan 2025

https://github.com/jesseokeya/linkedin-scraper

Uses Selenium WebDriver to get information from LinkedIn

chromedriver crawler linkedin os python scraper selenium-webdriver

Last synced: 25 Dec 2024

https://github.com/twknab/django_ajax_web_crawler

Web crawler which retrieves all links on any page. Python & Django-powered.

beautifulsoup4 crawler django-application

Last synced: 25 Dec 2024

https://github.com/lillyschramm/spiegel.de-miner

A bot that automatically saves any posts published on Spiegel.de

crawler spiegel-online

Last synced: 01 Jan 2025

https://github.com/zhanziyuan/webdownloader

Download elements from the specified website.

crawler downloader image image-downloader python python-crawler web

Last synced: 08 Jan 2025

https://github.com/dominikrys/web-scraper

🎬 IMDB Web Scraper in Go

crawler go mongodb

Last synced: 10 Jan 2025

https://github.com/yyj08070631/web-spider

A web spider

crawler spider webspider

Last synced: 01 Feb 2025

https://github.com/g-ongenae/morphalou-crawler

A crawler for CNRTL's Morphologie words

crawler french lexical-databases list-of-words words

Last synced: 15 Oct 2024

https://github.com/kevincolemaninc/mm-crawler

Scrapes meetme user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 01 Jan 2025

https://github.com/smikodanic/dex8-sdk

DEX8 SDK is a software development kit for the DEX8.com platform.

crawler crawler-engine data-extraction dex8 scraper scraping-websites spider

Last synced: 26 Dec 2024

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia data on the kuntavaalit 2021 (2021 Finnish municipal elections)

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/terminaldweller/crawley

A creepy crawler that runs as a sleepy daemon.

crawler daemon python3

Last synced: 26 Dec 2024

https://github.com/pmuens/crawler

Multi-threaded Web crawler with support for custom fetching and persisting logic

crawler crawler-engine rust rust-lang web-crawler web-crawling

Last synced: 26 Dec 2024

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma data on the kuntavaalit 2021 (2021 Finnish municipal elections)

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/gabrielolobo/crawley

This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.

crawler poetry python scrapping

Last synced: 11 Jan 2025

https://github.com/gnehs/twse-financial-ratios-crawler

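Automatically fetches financial ratio data from the Market Observation Post System (MOPS) for a given list of stock ticker symbols and computes the averages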

crawler nodejs

Last synced: 26 Dec 2024

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 11 Jan 2025

https://github.com/datvodinh/laptop-price-prediction

An end-to-end data science project on laptop price prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 17 Nov 2024

https://github.com/pjt3591oo/spider-base_crawler

Building a Scrapy-based crawler

crawler python scrapy spider

Last synced: 26 Dec 2024

https://github.com/raspi/scrapy-transcend

Crawler for transcend (us.transcend-info.com)

crawler hardware memory scrapy spider

Last synced: 08 Jan 2025

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen data on the kuntavaalit 2021 (2021 Finnish municipal elections)

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/sahaavi/web-scraping

Learn web scraping with BeautifulSoup, Selenium, and Scrapy through hands-on projects!

beautifulsoup4 crawler headless-mode pagination scrapy selenium spider splash web-scraper web-scraping

Last synced: 26 Dec 2024

https://github.com/lsongdev/node-crawler

Simple crawler

crawler node-crawler

Last synced: 02 Jan 2025

https://github.com/mohammadreza-mohammadi94/python-webscraper-projects

A collection of Python web scraping projects, showcasing techniques to extract and process data from various websites. Perfect for learning how to gather and analyze web data efficiently.

bs4 crawler object-oriented-programming python requests scrapy webscraping

Last synced: 26 Dec 2024

https://github.com/ggteixeira/motorcycle-simulator

A toy project that fetches motorcycle prices from OLX and does some calculations for those who want to buy one.

crawler motorcycle olx scraper

Last synced: 11 Jan 2025

https://github.com/grayhat12/grawler

A web-based crawler that takes two inputs (a search term and the number of sites to search). It currently displays readable content as plain text, but the code can be modified to display the HTML instead.

crawler scraping scraping-websites scrapper scrapy-crawler

Last synced: 01 Feb 2025

https://github.com/mikiw/reactweb3

Ethereum transaction crawler in ReactJS.

blockchain crawler ethereum

Last synced: 10 Jan 2025

https://github.com/krishpranav/gozap

⚡️ Multi-target ZAP scanning written in Go

cli crawler go go-crawler golang zap

Last synced: 01 Feb 2025

https://github.com/theabbie/shopcrawler

Crawler for Discovering Product URLs on E-commerce Websites (assignment)

crawler

Last synced: 17 Jan 2025

https://github.com/nowshad-sust/corona

A simple data endpoint for coronavirus updates

api corona coronavirus-updates crawler dcoker-compose excel nodejs

Last synced: 23 Jan 2025

https://github.com/bandie91/extip

Fetch your external IP address from well-known external IP providers

address cli crawler external ip ipv4-address parallel

Last synced: 03 Jan 2025
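extip asks several external IP providers in parallel, as its `parallel` tag suggests. The Python sketch below shows that idea under a few assumptions. Each provider is a hypothetical zero-argument callable that returns the address as text. In practice this would be an HTTP GET to a what-is-my-IP service, with its own timeout. Reconciling disagreeing answers by majority vote is a choice made here and not necessarily what extip does.

```python
import ipaddress
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def external_ip(providers: Sequence[Callable[[], str]]) -> Optional[str]:
    """Query all (hypothetical) providers concurrently; return the majority answer."""

    def ask(provider: Callable[[], str]) -> Optional[str]:
        try:
            # Normalise and validate: anything that is not an IP address is discarded.
            return str(ipaddress.ip_address(provider().strip()))
        except (OSError, ValueError):  # network failure or malformed reply
            return None

    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        answers = [a for a in pool.map(ask, providers) if a is not None]
    if not answers:
        return None
    # most_common keeps first-seen order on ties, so earlier providers win ties.
    return Counter(answers).most_common(1)[0][0]
```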

https://github.com/zzzzer91/match_spider

A crawler for a certain gambling website, which has since shut down :disappointed_relieved:

crawler python

Last synced: 10 Jan 2025

https://github.com/zzzzer91/crash

A general-purpose multi-threaded crawler framework.

crawler framework python

Last synced: 10 Jan 2025

https://github.com/ma-pony/playwright-spider-utils

Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.

crawl crawler playwright python scrapy selenium spider spiderman

Last synced: 08 Feb 2025

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application that crawls Twitter profiles for all of their tweets and all tweets related to them, including attachments, statistics, and data about their authors. The main data is stored in an SQLite database and all media are downloaded, so the application can then reconstruct a Twitter profile in the front end.

crawler exporter profile social-media sqlite twitter twitter-api

Last synced: 03 Jan 2025

https://github.com/tormol/zenphoto-dl

A script for recursively downloading all pictures from zenphoto-based photo albums.

crawler python-script

Last synced: 30 Jan 2025

https://github.com/cseas/crawler

Recursive web crawler

crawler python seed-webpage

Last synced: 27 Dec 2024

https://github.com/rkeytech/scrapy-adventure

A Scrapy framework project that scrapes a paginated site, extracts item information, and outputs it to a CSV file or inserts it into a database.

beginner-friendly beginner-project crawler csv database python python3 scrapy scrapy-framework spider web-scraping

Last synced: 10 Feb 2025

https://github.com/jnbdz/xtamia-crawler

(!!!Still being built!!!) An open-source web crawler built on Electron for Windows, Mac OS X, and Linux

crawler electron foundation foundation-css javascript scraper vuejs xtamia

Last synced: 10 Jan 2025

https://github.com/kahsolt/tieba-dl

A simple image crawler/downloader for Baidu tieba.

baidu-tieba crawler image-crawler tieba

Last synced: 03 Jan 2025

https://github.com/reineimi/va2crawl

Website crawler, validator and SEO optimizer

crawler seo-optimization seotools validator website-crawler

Last synced: 10 Jan 2025

https://github.com/ayoubzulfiqar/spidy

The Dart library for data crawling and scraping

crawler dart flutter scraper scraping spider

Last synced: 03 Jan 2025

https://github.com/tca166/ck2-history-extractor

A tool for creating an encyclopedia from your CK2 savefile

ck2 crawler crusader-kings-2

Last synced: 07 Feb 2025

https://github.com/sxoxgxi/webcrawler

A multi-threaded web crawler

crawler python webcrawling

Last synced: 25 Jan 2025

https://github.com/salman0ansari/sitefetch

Fetch a site and extract its readable content as Markdown (to be used with AI models).

ai chatgpt crawler fetcher golang scraping

Last synced: 07 Feb 2025

https://github.com/moojing/coinmarketcap-crypto-crawler

A Raycast plugin for getting the latest price of your favorite coins from CoinMarketCap.

crawler cryptocurrency

Last synced: 07 Feb 2025

https://github.com/splorg/sage

A scraper to get every quote from a book off of Goodreads.

books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping

Last synced: 21 Jan 2025

https://github.com/marceloneppel/crawler

Simple web crawler developed in Go.

crawler go golang web-crawler

Last synced: 30 Jan 2025

https://github.com/cold-bin/jwzx-mail

A crawler application for cqupt-jwzx, built in Go

crawler golang

Last synced: 11 Jan 2025

https://github.com/ekojs/web-crawler

A web crawler for fetching research titles from Google Scholar

crawler nodejs web-crawler

Last synced: 08 Jan 2025

https://github.com/massongit/ibaraki-univ-circle-crawler

Crawls Ibaraki University's official student circles (clubs) from the university's website

crawler python

Last synced: 30 Jan 2025

https://github.com/tri613/nespresso

A mobile version of the Nespresso coffee website :coffee:

crawler nespresso node-js

Last synced: 04 Jan 2025

https://github.com/zhqiang1989/youtube-graph-collector

A demo in Python of how to collect YouTube video engagement graph data

crawler graph video youtube

Last synced: 11 Jan 2025

https://github.com/snwfdhmp/3gm-bot

Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.

3gm-bot crawler game-bot task-automation web-crawling

Last synced: 15 Jan 2025

https://github.com/monumentality/ifiend

Check latest YouTube uploads without leaving the comfort of your terminal.

crawler headless-chrome terminal-based youtube yt-dlp

Last synced: 11 Jan 2025

https://github.com/fengzixu/crawlinganything

If you are interested in data, you should get started right away.

crawler python

Last synced: 08 Jan 2025

https://github.com/anthonysigogne/scrapy

A list of simple scrapers made with Scrapy

crawler elasticsearch python scrapy spider

Last synced: 11 Jan 2025

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 15 Jan 2025

https://github.com/thomas-rothe/symfonywebcrawler

A PHP project to help with SEO

crawler docker php php8 seo sitemap-xml symfony7

Last synced: 17 Jan 2025

https://github.com/abhijeetps/noddler

A web crawler built with Node.js

cheerio crawler csv nodejs

Last synced: 08 Feb 2025

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 19 Jan 2025

https://github.com/d-w-arnold/local-news-data-collection

Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎

crawler data-collection python

Last synced: 07 Feb 2025

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 14 Jan 2025

https://github.com/rayc2045/ghibli-crawler

Automatically download 1,178 photos from Studio Ghibli's works

axios crawler ghibli node node-js nodejs puppeteer rest-api restful restful-api

Last synced: 26 Jan 2025

https://github.com/ryu1kn/procedural-page-crawler

Page Crawler. Tell it where to go and what to look for.

crawler npm-package scraper

Last synced: 03 Feb 2025

https://github.com/jofaval/open-graph-visualizer

A web scraping showcase of how crawlers retrieve a site's details through the Open Graph protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 04 Feb 2025
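Open Graph metadata sits in `<meta property="og:…" content="…">` tags, usually in a page's `<head>`. The following is a minimal sketch of how a crawler might collect those tags with Python's standard `html.parser`. It is an illustration and not code from open-graph-visualizer. Keeping only the first value of a repeated property is a simplifying assumption, because Open Graph allows some properties (such as `og:image`) to appear more than once.

```python
from html.parser import HTMLParser
from typing import Dict


class OpenGraphParser(HTMLParser):
    """Collects og:* properties from <meta property=... content=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Assumption: keep the first value when a property repeats.
            self.properties.setdefault(prop, content)


def extract_open_graph(html: str) -> Dict[str, str]:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties
```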

https://github.com/coding-dream/aspider

A spider that runs on the Android platform

crawler jsoup spider

Last synced: 11 Jan 2025

https://github.com/estavadormir/scrappist

A web scraper that takes one or more URLs and converts them into PDFs.

bun cli crawler pdf-generation

Last synced: 11 Jan 2025

https://github.com/limdongjin/bill-scraper

Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: Crawler for bills of the 20th National Assembly (South Korea)

crawler python scraper

Last synced: 12 Jan 2025

https://github.com/kahsolt/qzone_mood_dumper

Dump your Qzone mood (说说, status post) history to a local SQL database

crawler dumper qzone-mood

Last synced: 03 Jan 2025

https://github.com/wilmsn/simple_deye_crawler

A simple crawler to get data from the Deye Inverter using the status webpage

crawler deye fhem inverter shell-script

Last synced: 18 Jan 2025

https://github.com/ashwantmanikoth/aipoweredwebcrawler

An AI-powered crawler that searches the web for information based on your input.

crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation

Last synced: 10 Feb 2025

https://github.com/khanof89/twitter_scraper

Scrape tweet details from user profile using selenium

crawler scraper selenium twitter twitter-bot

Last synced: 10 Jan 2025

https://github.com/truongdd03/searchengine

A search engine written in C++.

cpp crawler search search-engine

Last synced: 20 Dec 2024

https://github.com/jamesponddotco/wikiextract

[READ-ONLY] A word extractor for Wikipedia articles.

crawler crawling diceware go wikipedia wikipedia-crawler word-extraction

Last synced: 21 Jan 2025

https://github.com/russellsteadman/netscrape

A Node.js framework for creating good bots

bot crawler crawling exclusion rfc9309 scraper scraping web-scraping

Last synced: 03 Jan 2025
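The `rfc9309` tag on netscrape refers to the Robots Exclusion Protocol, which standardizes robots.txt. netscrape is a Node.js framework, and the following is not its API. It is a Python sketch of the same politeness check, built on the standard library's `urllib.robotparser`. One caveat applies: that parser predates RFC 9309. It applies rules in file order instead of the RFC's longest-match rule and does not implement the `*`/`$` path wildcards, so treat it as an approximation. The robots.txt content and user-agent name below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real crawler would download /robots.txt first.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_robot_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return rules


rules = build_robot_rules(ROBOTS_TXT)
# Check each URL before fetching it.
print(rules.can_fetch("ExampleBot", "https://example.com/private/page.html"))
print(rules.can_fetch("ExampleBot", "https://example.com/public/page.html"))
```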