Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
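
The "systematically browses" part of that definition can be illustrated with a minimal sketch of a breadth-first crawl. It is not taken from any project listed here. The `crawl` function, the `LinkParser` class, the `fetch` callback, and the `example.test` pages are all hypothetical names chosen for this illustration, and the in-memory `PAGES` dictionary stands in for real HTTP requests so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at seed.

    fetch(url) is an assumed callback that returns the page HTML as a
    string, or None when the page cannot be retrieved.
    Returns the URLs in the order they were successfully visited.
    """
    seen = {seed}          # URLs already queued, to avoid revisiting
    queue = deque([seed])  # frontier of URLs still to visit
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory "site" used instead of real network requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": "",
}

visited = crawl("http://example.test/", PAGES.get)
```

A real crawler would replace `PAGES.get` with an HTTP client and would typically also honor robots.txt and rate limits, which this sketch leaves out.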

https://github.com/arman2409/datafalcon

Web crawler

crawler extract-data

Last synced: 27 Oct 2024

https://github.com/sc0vu/gocrawl

Simple crawler for Go

crawler golang

Last synced: 14 Oct 2024

https://github.com/bockstaller/europarl-crawler

Crawler for the documents published by the European Parliament

crawler datamining elasticsearch europarl-crawler european european-parliament opendata parliament union

Last synced: 10 Nov 2024

https://github.com/jarircse16/bot_detection_firewall

Detects and blocks generic crawlers on your website.

bot crawler php

Last synced: 08 Nov 2024

https://github.com/mnemocron/VPNNetworkShareCrawler

Ugly scripts that connect a Raspberry Pi to a VPN and attach a network share in order to periodically crawl the documents on it

crawler samba vpn

Last synced: 23 Oct 2024

https://github.com/frostming/daily-wallpaper

A small crawler to get wallpapers from Unsplash

crawler python requests unsplash wallpaper

Last synced: 13 Oct 2024

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen data for the 2021 Finnish municipal elections (kuntavaalit 2021)

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-vgmusic

Crawler for vgmusic web site

crawler game midi music python scrapy spider

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-transcend

Crawler for transcend (us.transcend-info.com)

crawler hardware memory scrapy spider

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-amp

Crawler for Amiga Music Preservation (AMP) site

amiga crawler mod module music python s3m scrapy spider tracker

Last synced: 10 Nov 2024

https://github.com/cristiangreco/gcrawler

A simple (not concurrent) web crawler written in Java.

crawler java

Last synced: 05 Nov 2024

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma data for the 2021 Finnish municipal elections (kuntavaalit 2021)

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia data for the 2021 Finnish municipal elections (kuntavaalit 2021)

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-corsair

Web crawler for Corsair (corsair.com)

crawler hardware memory scrapy spider

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-crucial

Web crawler for Crucial (crucial.com)

crawler hardware memory scrapy spider

Last synced: 10 Nov 2024

https://github.com/zhanziyuan/webdownloader

Download elements from the specified website.

crawler downloader image image-downloader python python-crawler web

Last synced: 10 Nov 2024

https://github.com/lin-jun-xiang/python-crawler

Using CloudScraper, Requests, API, Thread, Async... to scrape data

async cloudscraper crawler multithreading python requests scraper selenium

Last synced: 03 Nov 2024

https://github.com/thamindur/ir-project

Search Engine for Sri Lankan MPs

crawler elasticsearch python scraping search-engine

Last synced: 29 Oct 2024

https://github.com/matheusfaustino/phrawl

Phrawl: A web crawling framework in PHP (or it seems so)

crawler crawling crawling-framework php scraper wip

Last synced: 07 Nov 2024

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 16 Nov 2024

https://github.com/berecat/selenium_facebook_scraper

A simple Python 3 script used to download a user's friend list from Facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 11 Nov 2024

https://github.com/arman-aminian/divar-text-exploring

The first exercise from Dr. Asgari's NLP course: data exploration

crawler natural-language-processing nlp preprocessing scrapy

Last synced: 11 Nov 2024

https://github.com/briangershon/crawlee-playwright

Browser-based automations with Crawlee and Playwright using Vite tooling and TypeScript

crawlee crawler playwright starter-template typescript vite

Last synced: 02 Nov 2024

https://github.com/ekojs/web-crawler

Web crawler for retrieving research titles from Google Scholar

crawler nodejs web-crawler

Last synced: 11 Nov 2024

https://github.com/fengzixu/crawlinganything

If you are interested in data, you should take action right away

crawler python

Last synced: 11 Nov 2024

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshots of websites, based on headless Chrome (Puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 18 Nov 2024

https://github.com/wingkwong/daily_weather_temperature_in_hong_kong

Crawling daily weather temperature in Hong Kong

crawler hongkong python temperature

Last synced: 06 Nov 2024

https://github.com/beckkramer/puppeteer-traverse

Puppeteer utility to easily run a function you define per route on a set of routes.

crawler crawling nodejs puppeteer

Last synced: 18 Nov 2024

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 11 Nov 2024

https://github.com/kissaki/website-downloader

A website crawler and downloader. Useful for archiving dynamic websites as static files.

archive crawler csharp download gpl website

Last synced: 26 Oct 2024

https://github.com/cseas/crawler

Recursive web crawler

crawler python seed-webpage

Last synced: 07 Nov 2024

https://github.com/yuchenq/comp90055-project

This is the latest version of my project for Comp90055.

couchdb crawler data-visualization python3 textblob tweepy

Last synced: 18 Nov 2024

https://github.com/jannchie/go-probe

HTML and JSON data crawler based on Golang. Simple and fast, very easy to use.

collector crawler fetcher golang spider

Last synced: 05 Nov 2024

https://github.com/yyj08070631/web-spider

A web spider

crawler spider webspider

Last synced: 16 Oct 2024

https://github.com/gabrielolobo/crawley

This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.

crawler poetry python scrapping

Last synced: 12 Nov 2024

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scrapes corpora, automatically cleans them up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 12 Nov 2024

https://github.com/zahraarshia/cti_crawl

This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.

crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper

Last synced: 11 Nov 2024

https://github.com/matheusfaustino/jazzmaster_crawler

A crawler for fetching the audio programs of a specific radio program called Jazzmaster

crawler python scrapy

Last synced: 07 Nov 2024

https://github.com/kernelerr/pixivurls

An awesome tool to get Pixiv image URLs.

crawler downloader pixiv

Last synced: 12 Oct 2024

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 11 Nov 2024

https://github.com/xoraus/revieworacle

The proposed system assists users in deciding which product to buy. It gathers reviews, along with product details, from multiple websites that sell the product. In addition, the system is trained to analyze the polarity of the product.

ai crawler datascience machinelearning scrappy selenium-webdriver

Last synced: 14 Nov 2024

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 14 Nov 2024

https://github.com/shunk031/amebloscraper

Scraper for Ameblo in Scrapy

ameblo crawler scraper scrapy

Last synced: 12 Nov 2024

https://github.com/mikiw/reactweb3

Ethereum transaction crawler in ReactJs.

blockchain crawler ethereum

Last synced: 12 Nov 2024

https://github.com/jlenon7/sef_automation

📑 Crawler that automatically enrolls in open vacancies on the SEF website.

athenna crawler esm nodejs playwright portugal residence sef typescript

Last synced: 26 Oct 2024

https://github.com/jofaval/open-graph-visualizer

Web scraping showcase of how crawlers retrieve a site's details through the Open Graph protocol

crawler javascript opengraph scraping web web-scraping

Last synced: 21 Oct 2024

https://github.com/copha-project/copha

Open-Source Software For Managing Tasks

crawler framework nodejs puppeteer selenium

Last synced: 15 Nov 2024

https://github.com/ecklf/reddit-clawler

A command-line tool written in Rust that crawls Reddit posts from a user or subreddit

cli crawler downloader downloader-for-reddit reddit

Last synced: 25 Oct 2024

https://github.com/pxlrbt/website-diff

Utility tool that bundles a crawler and BackstopJS for visual regression testing.

backstopjs crawler visual-regression-testing

Last synced: 07 Oct 2024

https://github.com/agucova/needs-seeding

🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.

crawler sci-hub torrents

Last synced: 11 Nov 2024

https://github.com/danielvigaru/easyreach

Crawler for faster Amazon reach

amazon crawler python

Last synced: 08 Nov 2024