Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/akashrajpurohit/node-crawler

Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain

crawler node-crawler nodejs url

Last synced: 25 Dec 2024

https://github.com/lukasherz/22fs-sc-twitter-crawler

used for a research project in social computing @ uzh (fs22)

crawler crawling database twitter twitter-api-v2

Last synced: 25 Dec 2024

https://github.com/tubone24/askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

askfm chromedriver corpus-builder crawler selenium

Last synced: 25 Dec 2024

https://github.com/alatiera/ellinofreneia-crawler

Crawler of ellinofreneianet.gr for offline content consumption

crawler ellinofreneia

Last synced: 01 Jan 2025

https://github.com/tsaohucn/crawler_fb_group

This is crawler use selenium for facebook groups

crawler facebook-groups rails ruby

Last synced: 19 Nov 2024

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 26 Dec 2024

https://github.com/knourian/freelancer.com-category-scrapping

Scrapping Categories from Freelancer.com Using scrapy with number of project for each category

crawler freelancer python3 scrapy web-crawler

Last synced: 05 Jan 2025

https://github.com/chunkingz/youtubelinks-scraper

A python script that scrapes Youtube links from a predefined website of choice.

crawler python scraper spider websitescraper youtube

Last synced: 02 Jan 2025

https://github.com/0xpr03/clantool

CF Management & Data Analysis Tool, crawler backend in rust

backend-server crawler data-analysis rust

Last synced: 02 Jan 2025

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 15 Nov 2024

https://github.com/arshadkazmi42/gh-crawl

Crawler for Github repositories. Finds all the broken links from the repositories

bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python

Last synced: 21 Dec 2024

https://github.com/skylightqp/namu2csv

A namuwiki crawler that converts header to csv file for kartrider wiki

crawler rust

Last synced: 08 Dec 2024

https://github.com/vietdoo/sg-property-hub

SG Property Hub is a comprehensive platform for managing and analyzing property data.

airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping

Last synced: 13 Dec 2024

https://github.com/f-ca7/movie-cat

A website displaying movies

crawler golang website

Last synced: 03 Jan 2025

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawling the Wikipedia with Python powered by BeautifulSoup4, Supporting GUI/CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 03 Jan 2025

https://github.com/kahsolt/allchan

An image crawler for xChan(4chan/8ch/...) image board.

4chan 4chan-downloader 8chan crawler image-crawler

Last synced: 03 Jan 2025

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 10 Jan 2025

https://github.com/beomi/pycon2017

2017 파이콘 발표자료: <처음부터 알아보는 웹 크롤러>

crawler pyconkr python

Last synced: 11 Jan 2025

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the selenium library

bot crawler python scraper selenium web wrapper

Last synced: 19 Dec 2024

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes user-defined function on each one of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 19 Dec 2024

https://github.com/songjiayang/china_repos

github repo 爬虫

china crawler statistics

Last synced: 11 Jan 2025

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor shares details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Dec 2024

https://github.com/princed/specht

Check links found in html or js files by pattern

cli crawler html javascript streams

Last synced: 19 Nov 2024

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 12 Oct 2024

https://github.com/jfcherng/wiki-cgroup-crawler

此腳本用於抓取維基百科的公共轉換組詞庫,並將結果儲存為外部檔案。

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 28 Sep 2024

https://github.com/dean9703111/humandesign_nodejs

用nodejs爬蟲工具將人類圖網頁上的資訊爬下來,再存到雲端的google excel

crawler googlesheetapi googlesheets nodejs

Last synced: 12 Jan 2025

https://github.com/dean9703111/shopee_find_mac

用最快的速度找到便宜符合自己要求規格的mac

argparse crawler mac pip python python2 xlsxwriter

Last synced: 12 Jan 2025

https://github.com/soulyma/web_crawler

A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.

beautifulsoup4 crawler csv data json python structured-data

Last synced: 13 Dec 2024

https://github.com/thiiagoms/car-stealth

REST API to all cars that were stolen

api cars crawler student

Last synced: 15 Nov 2024

https://github.com/dizys/weibo-crawler

A nodejs weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 27 Dec 2024

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of webpages in series by number and creates a list of the available links.

crawler lister

Last synced: 05 Dec 2024

https://github.com/rbkgh/dailytext-crawler

Crawl jw.org to retrieve daily text

crawler dailytext java jsoup jw

Last synced: 15 Jan 2025

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 06 Dec 2024

https://github.com/sonhm3029/crawl-data-bot

This project making a base crawl data from web bot, include text data and images data

crawler google medical vietnamese

Last synced: 16 Nov 2024

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 04 Jan 2025

https://github.com/oleksandr-moik/spring-boot-web-crawler

Web Crawler app on Spring Boot. Getting categories and relevant news category.

crawler gradle java spring-boot

Last synced: 08 Dec 2024

https://github.com/ymdarake/otenki-crawler

Yet another weather data scraper.

crawler weather weather-data

Last synced: 15 Nov 2024

https://github.com/bockstaller/europarl-crawler

Crawler for the documents published by the European Parliament

crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union

Last synced: 06 Jan 2025

https://github.com/tylpk1216/new-taipei-parkinfo

Find the available parking in New Taipei, Taiwan.

crawler golang goverment-data

Last synced: 27 Nov 2024

https://github.com/tylpk1216/favorite-youtube-to-video

Download your favorite youtube video in PHP

crawler php tool youtube

Last synced: 27 Nov 2024

https://github.com/phanletrunghieu/webcrawler

A web crawler with Spring MVC

crawler java servlet spring-mvc springframework

Last synced: 30 Nov 2024

https://github.com/basemax/css-properties

The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.

crawler css css-properties css-property css3

Last synced: 14 Jan 2025

https://github.com/bramtenhove/issue-crawler

Crawls Drupal issues and keeps stats

crawler

Last synced: 29 Dec 2024

https://github.com/kartikmehta8/pycrawler

PyCrawler is a web scraper that takes a link as input and returns all the links connected to the page(s). Goes beyond recursion. Threaded.

crawler cybersecurity python

Last synced: 15 Nov 2024

https://github.com/mindfiredigital/deepscanbot

It allows you to crawl websites with various configurations, including crawl depth, timeout settings, proxy support, and output options.

bot crawl crawler go golang google webcrawler

Last synced: 28 Dec 2024

https://github.com/xiangronglin/novel2go

Android app to create pdf from website and send to your kindle

android crawler jetpack kotlin pdf-generation readability

Last synced: 21 Dec 2024

https://github.com/jyasskin/pbot-crawler

Crawler for PBOT's website to show what has changed.

crawler

Last synced: 30 Nov 2024

https://github.com/phatpham9/scraper.fun

Building, using & sharing HTML scraper are way funnier!

crawler html-scraper scraper

Last synced: 02 Dec 2024

https://github.com/fredcodee/pexel.com-image-scrapper

download images from pexel.com

crawler image python selenium

Last synced: 08 Jan 2025

https://github.com/jlenon7/sef_automation

📑 Crawler that automatically enrol in open vacancies in SEF website.

athenna crawler esm nodejs playwright portugal residence sef typescript

Last synced: 13 Dec 2024

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/jeanluc162/prnt-sc-crawler

Crawler for the Website prnt.sc

crawler net5 net50 prntsc screenshots

Last synced: 15 Nov 2024

https://github.com/nblthree/python-url-crawler

Simple web crawler

crawler python3

Last synced: 03 Dec 2024

https://github.com/onetail/crawler-with-kafka-docker

homework to crawler and anaylsis

analysis crawler kafka-docker

Last synced: 24 Nov 2024

https://github.com/freakwill/mycrawlers

🕷 My Crawlers for Movies、Information、Encyclopedia...

baidu crawler douban movie quotes taobao

Last synced: 28 Nov 2024

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 02 Dec 2024

https://github.com/licoy/win4000-images-crawler

基于scrapy爬取&下载win4000.com的图片壁纸

crawler python scraper

Last synced: 08 Dec 2024

https://github.com/onetail/applenews

simple crawler

crawler simple

Last synced: 24 Nov 2024

https://github.com/kehiy/prawler

Pactus P2P Network Crawler

crawler crawling metrics networking p2p pactus

Last synced: 28 Dec 2024

https://github.com/igorbrizack/web-scraper

Web-Scraper aplication

crawler pytest python3 scraper

Last synced: 28 Nov 2024

https://github.com/igorbrizack/crawler-web

Aplicação de coleta de dados Web com ReactJS e Python - API Rest

beautifulsoup crawler docker fastapi mongodb nodejs python3 react scraper

Last synced: 28 Nov 2024

https://github.com/thejoin95/free-proxies.info

API service for get anonymous and non proxy, filter by latency, country, updatetime and more

api crawler http-proxy proxy proxy-list python scraper

Last synced: 06 Jan 2025

https://github.com/Kissaki/website-downloader

A website Crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 23 Oct 2024

https://github.com/bandie91/extip

Fetch external IP from known ext. ip providers

address cli crawler external ip ipv4-address parallel

Last synced: 03 Jan 2025

https://github.com/zzzzer91/match_spider

某菠菜网站爬虫,该网站已倒闭:disappointed_relieved:

crawler python

Last synced: 10 Jan 2025

https://github.com/grayhat12/grawler

A web based Crawler that takes two inputs(search item, number of sites to search)and curently displays Readable Content in Text Format but the Code can be modified to display the HTML code.

crawler scraping scraping-websites scrapper scrapy-crawler

Last synced: 06 Dec 2024

https://github.com/namchee/hackerbits

Web Crawler dan Clustering pada website HackerNews.

clustering crawler python3

Last synced: 02 Dec 2024

https://github.com/zzzzer91/crash

通用多线程爬虫框架。

crawler framework python

Last synced: 10 Jan 2025

https://github.com/jamesjarvis/web-graph

Experiment with web scraping

colly crawler database golang web-graph

Last synced: 02 Dec 2024

https://github.com/isaqueveras/scrape-google-results

Scrape Google Results in Golang

crawler golang google scraper webcrawler

Last synced: 28 Nov 2024

https://github.com/der3318/daily-pixiv

Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations

crawler line-notify pixiv workflow

Last synced: 13 Jan 2025

https://github.com/berecat/selenium_facebook_scraper

A simple python3 script used to download a users's friend list from facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 08 Jan 2025

https://github.com/fulcrum6378/twitter_profile_exporter

A web-based application which crawls profiles on Twitter for all of their tweets, all tweets related to them, including their attachments, statistics and data of their authors. Main data is stored in an SQLite database and all media are downloaded. Then it'll be able to reconstruct a Twitter profile in front-end.

crawler exporter profile social-media sqlite twitter twitter-api

Last synced: 03 Jan 2025

https://github.com/kimi0230/pstocks

Python 爬股市

crawler numpy pandas python python3 stocks

Last synced: 15 Nov 2024

https://github.com/jenting/compare-drugstore-price

Compare price between cosmeceutical shops

cosmed crawler golang poya side-project watsons

Last synced: 05 Dec 2024

https://github.com/danielemoraschi/sitemap-common

Simple PHP Sitemap generator and crawler library.

crawler php php-library php-sitemap-generator sitemap

Last synced: 31 Dec 2024

https://github.com/danielemoraschi/sitemap-app

Sitemap generator command line application using dmoraschi/sitemap-common library

crawler php php-library sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/arman-aminian/divar-text-exploring

The first practice of Dr. Asgari's NLP lesson - Data Exploration

crawler natural-language-processing nlp preprocessing scrapy

Last synced: 08 Jan 2025

https://github.com/hileix/jjxy-lib-search

图书馆书籍查询爬虫工具

crawler expressjs nodejs phantomjs request

Last synced: 28 Nov 2024

https://github.com/jackfsuia/chats-crawler

Discourse chat data crawling and on-the-way parsing straight for LLM instruction finetuning. 论坛数据爬取和解析,直接用于对话微调。

crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser

Last synced: 13 Jan 2025

https://github.com/flaribbit/pixiv-favorites-list

爬取P站收藏夹保存为json格式

crawler pixiv python

Last synced: 29 Nov 2024

https://github.com/jnbdz/xtamia-crawler

(!!!Still being built!!!) An open-source web crawler build on Electron for Windows, Mac OS X, and Linux

crawler electron foundation foundation-css javascript scraper vuejs xtamia

Last synced: 10 Jan 2025

https://github.com/timzatko/fiit-vinf-1

School project - data crawling, storing using ElasticSearch and visualisation.

angular crawler elasticsearch

Last synced: 16 Dec 2024

https://github.com/kofj/octopus

Octopus an open source software to collect data from web pages.

crawler

Last synced: 28 Nov 2024

https://github.com/mnemocron/VPNNetworkShareCrawler

ugly scripts to connect a Raspberry Pi to a VPN and attach network share to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/matheusfaustino/jazzmaster_crawler

It is a crawling for getting the audio programs from a specific radio program called Jazzmaster

crawler python scrapy

Last synced: 28 Dec 2024