Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/vietdoo/sg-property-hub

SG Property Hub is a comprehensive platform for managing and analyzing property data.

airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping

Last synced: 13 Dec 2024

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawling the Wikipedia with Python powered by BeautifulSoup4, Supporting GUI/CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 03 Jan 2025

https://github.com/codelegant/movie-crawler-api

淘宝,猫眼,格瓦拉影票信息抓取接口

async await crawler mongoose request

Last synced: 18 Dec 2024

https://github.com/vindecodex/automated-crawler-wget

Using wget to crawl site

crawler shell-script

Last synced: 01 Jan 2025

https://github.com/akashrajpurohit/node-crawler

Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain

crawler node-crawler nodejs url

Last synced: 25 Dec 2024

https://github.com/lukasherz/22fs-sc-twitter-crawler

used for a research project in social computing @ uzh (fs22)

crawler crawling database twitter twitter-api-v2

Last synced: 25 Dec 2024

https://github.com/tubone24/askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

askfm chromedriver corpus-builder crawler selenium

Last synced: 25 Dec 2024

https://github.com/alatiera/ellinofreneia-crawler

Crawler of ellinofreneianet.gr for offline content consumption

crawler ellinofreneia

Last synced: 01 Jan 2025

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the selenium library

bot crawler python scraper selenium web wrapper

Last synced: 19 Dec 2024

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 26 Dec 2024

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes user-defined function on each one of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 19 Dec 2024

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor shares details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Dec 2024

https://github.com/0xpr03/clantool

CF Management & Data Analysis Tool, crawler backend in rust

backend-server crawler data-analysis rust

Last synced: 02 Jan 2025

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 19 Nov 2024

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 12 Oct 2024

https://github.com/jfcherng/wiki-cgroup-crawler

此腳本用於抓取維基百科的公共轉換組詞庫,並將結果儲存為外部檔案。

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 28 Sep 2024

https://github.com/f-ca7/movie-cat

A website displaying movies

crawler golang website

Last synced: 03 Jan 2025

https://github.com/soulyma/web_crawler

A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.

beautifulsoup4 crawler csv data json python structured-data

Last synced: 13 Dec 2024

https://github.com/thiiagoms/car-stealth

REST API to all cars that were stolen

api cars crawler student

Last synced: 15 Nov 2024

https://github.com/kahsolt/allchan

An image crawler for xChan(4chan/8ch/...) image board.

4chan 4chan-downloader 8chan crawler image-crawler

Last synced: 03 Jan 2025

https://github.com/dizys/weibo-crawler

A nodejs weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 27 Dec 2024

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of webpages in series by number and creates a list of the available links.

crawler lister

Last synced: 05 Dec 2024

https://github.com/anjackson/scrapy-url-frontier

A Scrapy module for URL Frontier integration

crawler frontier scrapy spider

Last synced: 05 Jan 2025

https://github.com/istador/mediaindexer

Software for a cronjob to crawl the ViMP media center and generate an index for it as a static website.

crawler website

Last synced: 22 Nov 2024

https://github.com/dnknth/robot.py

Simple web spider

crawler curio python

Last synced: 23 Nov 2024

https://github.com/kaymen99/imdb-scraper

IMDB scraper allows to collect movies and tv shows data from the imdb website

crawler python scraper scraping scrapy

Last synced: 22 Nov 2024

https://github.com/vaenow/chromeless-coursera-caption

Chromeless crawler coursera video's caption / subtitle

caption chromeless coursera crawler crx subtitle

Last synced: 13 Dec 2024

https://github.com/leegeunhyeok/python-gongucrawler

파이썬3 공유마당 이미지 및 상세정보 크롤러

crawler python

Last synced: 22 Dec 2024

https://github.com/seanghay/wpget

⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API

crawler wordpress wp-json

Last synced: 22 Nov 2024

https://github.com/jul10l1r4/objetive

This software is a mini-crawler that aims to grab some text parts from some website or ip that responds http*

bigdata crawler data-science security-tools web

Last synced: 19 Nov 2024

https://github.com/sajjadanwar0/booking.com-scraping

Scraping booking.com using Selenium and Beautiful Soup

crawler data python scraping selenium

Last synced: 14 Nov 2024

https://github.com/yosh1/mio-crawler

A crawler that acquires data usage of iijmio .

crawler iijmio mio ruby

Last synced: 13 Nov 2024

https://github.com/danielvigaru/easyreach

crawler for faster amazon reach

amazon crawler python

Last synced: 01 Jan 2025

https://github.com/smikodanic/dex8-sdk

DEX8 SDK is software development kit for DEX8.com platform.

crawler crawler-engine data-extraction dex8 scraper scraping-websites spider

Last synced: 26 Dec 2024

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 14 Nov 2024

https://github.com/terminaldweller/crawley

A creepy crawler that runs as a sleepy daemon.

crawler daemon python3

Last synced: 26 Dec 2024

https://github.com/pvital/cra-cra

Another web crawler

crawler python

Last synced: 22 Nov 2024

https://github.com/shamsher31/crawler

Simple site crawler that extracts all the URL links from the given website

crawler

Last synced: 13 Nov 2024

https://github.com/bradsec/gomine

A Go CLI tool to quickly crawl and mine (download) specific file types from websites.

cli crawler golang terminal-based

Last synced: 22 Dec 2024

https://github.com/willi-dev/dtcapp

dtcapp : distributed twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 14 Nov 2024

https://github.com/landrisek/contentbot

Create simple content (discussion posts and products description) from previously used data or crawl them from public data.

content crawler golang php php72

Last synced: 13 Nov 2024

https://github.com/seart-group/github-keyword-crawler

A simple and easy-to-deploy script for mining mentions of keywords across various :octocat: API endpoints

api-mining crawler dockerized github-api miner mongodb-database python-script

Last synced: 07 Dec 2024

https://github.com/waived/google-drive-crawler

Proxy-based crawler to expose public (shared) Google Drive links

crawler crawler-python file-crawler google-drive-api shared-folders web-spider

Last synced: 05 Dec 2024

https://github.com/dominikrys/web-scraper

🎬 IMDB Web Scraper in Go

crawler go mongodb

Last synced: 11 Nov 2024

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 14 Dec 2024

https://github.com/juangesino/ah-bonus-crawler

React + Express application that crawls Albert Heijn's promotions.

crawler crawling express expressjs headless-chrome nodejs react reactjs

Last synced: 22 Nov 2024

https://github.com/thecloer/crawler-himym

How I met your mother script PDF generator for learning English

crawler pdf pdf-generation typescript web-scraping webscraping

Last synced: 10 Dec 2024

https://github.com/ri0n/unboxer

MP4 crawler and extractor

crawler extractor mp4 object-oriented-design qt

Last synced: 13 Nov 2024

https://github.com/vaenow/crawler-chromeless

A chromeless crawler for coursera

chromeless coursera crawler puppeteer

Last synced: 13 Dec 2024

https://github.com/im-perativa/public_crawler

A collection of crawler project for Indonesia dataset

crawler indonesia indonesia-api scrapy

Last synced: 25 Nov 2024

https://github.com/pmuens/crawler

Multi-threaded Web crawler with support for custom fetching and persisting logic

crawler crawler-engine rust rust-lang web-crawler web-crawling

Last synced: 26 Dec 2024

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 13 Nov 2024

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 26 Nov 2024

https://github.com/igapyon/selecrawler

Simple selenium based web crawler

chrome crawler java selenium web

Last synced: 06 Jan 2025

https://github.com/gesiscss/github_traffic_crawler

Retrieve the data information from the repositories (insight, usage, commits)

crawler github traffic

Last synced: 03 Jan 2025

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 22 Dec 2024

https://github.com/kodemartin/webcrawler

A simple webcrawler

crawler rust

Last synced: 26 Nov 2024

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 14 Nov 2024

https://github.com/thamindur/ir-project

Search Engine for Sri Lankan MPs

crawler elasticsearch python scraping search-engine

Last synced: 17 Dec 2024

https://github.com/amazingcoderpro/pythonup

玩转Python!for improving python skills

crawler python

Last synced: 30 Nov 2024

https://github.com/gnehs/twse-financial-ratios-crawler

透過指定的股票代號清單從公開資訊觀測站自動抓取財務比率資訊,並自動計算平均

crawler nodejs

Last synced: 26 Dec 2024

https://github.com/huakunshen/cron-crawler-template

Web Crawler Cron Job Template running with GitHub Action. Capable of sending email notifications.

crawler github-actions python

Last synced: 16 Nov 2024

https://github.com/ilovebacteria/digikala-api

This python package requests to Digikala API and gets a product detail.

crawler digikala pypi

Last synced: 14 Nov 2024

https://github.com/vaibhavyadav-dev/codeforces-problemset-scrapper

Web Scrapper that scrap the whole problemset of Codeforces into csv or json file.

codeforces competative competative-programming crawler problemset programming python scrapy-crawler scrapy-spider

Last synced: 15 Nov 2024

https://github.com/lopins/article-crawler

一个简单的网页文章爬取工具,可以自定义抽取自己所需要的字段内容,简单容易上手。

article crawler ftp mysql python sqlite3

Last synced: 21 Dec 2024

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 07 Jan 2025

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 20 Dec 2024

https://github.com/nowshad-sust/corona

A simple data endpoint for coronavirus updates

api corona coronavirus-updates crawler dcoker-compose excel nodejs

Last synced: 23 Nov 2024

https://github.com/limdongjin/bill-scraper

Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: 20대 국회 법안 크롤러

crawler python scraper

Last synced: 12 Nov 2024

https://github.com/kestarumper/imagecrawler

Downloads images from given URL

crawler image-downloader

Last synced: 06 Jan 2025

https://github.com/erickj3/strike-api

this is a web scraping api with nestsj

api crawler flow nestjs scraping typescript

Last synced: 24 Nov 2024

https://github.com/estavadormir/scrappist

A web scrapper that takes an URL/URLs and converts into a PDF.

bun cli crawler pdf-generation

Last synced: 12 Nov 2024

https://github.com/coding-dream/aspider

A spider run on Android Platform

crawler jsoup spider

Last synced: 12 Nov 2024

https://github.com/oleksandr-moik/spring-boot-web-crawler

Web Crawler app on Spring Boot. Getting categories and relevant news category.

crawler gradle java spring-boot

Last synced: 08 Dec 2024

https://github.com/aristotelesbr/api_quotes

Project test for job.

crawler mongodb rails5

Last synced: 16 Nov 2024

https://github.com/cseas/crawler

Recursive web crawler

crawler python seed-webpage

Last synced: 27 Dec 2024

https://github.com/tylpk1216/new-taipei-parkinfo

Find the available parking in New Taipei, Taiwan.

crawler golang goverment-data

Last synced: 27 Nov 2024

https://github.com/tylpk1216/favorite-youtube-to-video

Download your favorite youtube video in PHP

crawler php tool youtube

Last synced: 27 Nov 2024

https://github.com/tryagi/firecrawl

Generated C# SDK based on official Firecrawl OpenAPI specification

ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk

Last synced: 14 Oct 2024

https://github.com/sanhphanvan96/php-training-crawler

Simple php crawler for training purpose

crawler docker docker-compose nginx php php-fpm

Last synced: 11 Nov 2024