Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/tech-espm/misc-webbot

This project is aimed on creating personal assistants for replying messages about specifics issues.

classification-model crawler nlp

Last synced: 12 Nov 2024

https://github.com/tisfeng/bing-dict

A Bing command line dictionary, which obtains the query results of bing dictionary by crawler.

bing-dictionary command-line crawler nodejs

Last synced: 09 Nov 2024

https://github.com/pxlrbt/website-diff

Utility tool that bundles a crawler and BackstopJS for visual regression testing.

backstopjs crawler visual-regression-testing

Last synced: 07 Oct 2024

https://github.com/datvodinh/laptop-price-prediction

An End to End Data Science Project about Laptop Price Prediction

crawler ensemble-learning scrapy selenium xgboost

Last synced: 17 Nov 2024

https://github.com/keizerzilla/ssh-hunter

Script que caça por Raspberry Pis vulneráveis na internet (porta SSH aberta e senha padrão não modificada).

crawler raspberry-pi ssh

Last synced: 05 Nov 2024

https://github.com/nagilum/focus

Simple CLI tool, written in C#, to crawl a site and log the responses.

cli crawl crawler csharp playwright

Last synced: 16 Nov 2024

https://github.com/tri613/nespresso

A mobile version for nespresso coffee website :coffee:

crawler nespresso node-js

Last synced: 09 Nov 2024

https://github.com/georgynet/crawler

Web Crawler

crawler go golang web-crawler

Last synced: 09 Nov 2024

https://github.com/iamtonmoy0/sitemap-crawler

site map crawler with golang and goquery

crawler

Last synced: 09 Nov 2024

https://github.com/lesterrry/mutt

More Usable Time Tracker

crawler ios-calendar parser

Last synced: 10 Nov 2024

https://github.com/lesterrry/campfire

Shock-drop watching utility

crawler parser web-crawler web-parser

Last synced: 10 Nov 2024

https://github.com/keizerzilla/search4dwango9

My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8

crawler datamining doom-wad

Last synced: 05 Nov 2024

https://github.com/n3d1117/sisop17

Esercizio per esame di Sistemi Operativi - 2017

crawler html java parser semaphores synchronization thread-safety threading

Last synced: 31 Oct 2024

https://github.com/filipsedivy/tachometer-check

🚘 MDČR - kontrola tachometru

crawler czech-republic mdcr

Last synced: 05 Nov 2024

https://github.com/aweirddev/air-web

A lightweight package for crawling the web with the minimalist of code.

crawl crawler markdown scrape scraper web

Last synced: 09 Nov 2024

https://github.com/webdevcave/directory-crawler-php

Directory Crawler PHP is a simple PHP library for recursively crawling through directories and listing files and directories.

crawler crawling directory path php php-library

Last synced: 09 Nov 2024

https://github.com/artemnikitin/crawler

Example of web crawler implemented in Go

crawler go golang

Last synced: 10 Nov 2024

https://github.com/fredcodee/pexel.com-image-scrapper

download images from pexel.com

crawler image python selenium

Last synced: 10 Nov 2024

https://github.com/wangyihang/acw-sc-v2-py

Python requests.HTTPAdapter for `acw_sc__v2`

acw-sc-v2 crawler waf

Last synced: 09 Nov 2024

https://github.com/sbstjn/tatort

Query information for upcoming Tatort shows

crawler node nodejs tatort

Last synced: 09 Nov 2024

https://github.com/tpeterw/summariser

summarizer for pdf and text based uploads

crawler hackathon nlp node nodejs python

Last synced: 10 Nov 2024

https://github.com/eklem/vinmonopolet-crawler

Crawling Vinmonopolet-data and indexing it to a norch search index

crawler dataset javascript norch search-engine

Last synced: 15 Oct 2024

https://github.com/raspi/scrapy-kuntavaalit2021-keskisuomalainen

Fetch Keskisuomalainen kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-vgmusic

Crawler for vgmusic web site

crawler game midi music python scrapy spider

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-transcend

Crawler for transcend (us.transcend-info.com)

crawler hardware memory scrapy spider

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-amp

Crawler for Amiga Music Preservation (AMP) site

amiga crawler mod module music python s3m scrapy spider tracker

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-kuntavaalit2021-sanoma

Fetch Sanoma kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-kuntavaalit2021-almamedia

Fetch Almamedia kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-corsair

Web crawler for Corsair (corsair.com)

crawler hardware memory scrapy spider

Last synced: 10 Nov 2024

https://github.com/raspi/scrapy-crucial

Web crawler for Crucial (crucial.com)

crawler hardware memory scrapy spider

Last synced: 10 Nov 2024

https://github.com/zhanziyuan/webdownloader

Download elements from the specified website.

crawler downloader image image-downloader python python-crawler web

Last synced: 10 Nov 2024

https://github.com/tetreum/puppeteer-for-crawling

Daily use crawling methods for puppeteer

crawler crawling puppeteer

Last synced: 21 Oct 2024

https://github.com/berecat/selenium_facebook_scraper

A simple python3 script used to download a users's friend list from facebook.

automation crawler facebook facebook-scraper webscraper

Last synced: 11 Nov 2024

https://github.com/arman-aminian/divar-text-exploring

The first practice of Dr. Asgari's NLP lesson - Data Exploration

crawler natural-language-processing nlp preprocessing scrapy

Last synced: 11 Nov 2024

https://github.com/guilhem/cachanais

Populate cache by crawling pages

cache crawler hacktoberfest

Last synced: 04 Nov 2024

https://github.com/ekojs/web-crawler

Web Crawler untuk mengambil judul penelitian pada Google Scholar

crawler nodejs web-crawler

Last synced: 11 Nov 2024

https://github.com/fengzixu/crawlinganything

如果你对数据有兴趣,那么就应该立即行动起来

crawler python

Last synced: 11 Nov 2024

https://github.com/earelin/jwraith

A Java clone of the Wraith website comparison tool.

crawler screenshots screenshots-comparison selenium webtest

Last synced: 31 Oct 2024

https://github.com/andrefs/derzis

A path-aware distributed linked data crawler

crawler linked-data

Last synced: 11 Nov 2024

https://github.com/gabrielolobo/crawley

This project is designed to run crawlers and process the results based on the specified output format. It takes command-line arguments to select the crawler and output format.

crawler poetry python scrapping

Last synced: 12 Nov 2024

https://github.com/ggteixeira/corpus-cleaner

Linguistic tool (made by a linguist, for linguists) that scraps corpora, automatically cleans it up, and generates n-grams.

beautifulsoup4 bs4 corpora corpus corpus-linguistics crawler linguistics nlp python scraper web-scraping

Last synced: 12 Nov 2024

https://github.com/zahraarshia/cti_crawl

This cyber threat intelligence crawler can be used to gather information from various sources, including open-source and commercial feeds.

crawler cti cyber-news-bot cyber-threat-intelligence mongodb python scrapy sqlite3 web-scraper

Last synced: 11 Nov 2024

https://github.com/r3c0ger/douban-movie-top250-crawler

Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.

beautifulsoup4 crawler lxml python3 spider

Last synced: 11 Nov 2024

https://github.com/xoraus/revieworacle

The proposed system assists users in deciding which product to buy. It gathers reviews along with the details from multiple websites, which sell the product. Other than that the system is trained to analyze the polarity of the product.

ai crawler datascience machinelearning scrappy selenium-webdriver

Last synced: 14 Nov 2024

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 14 Nov 2024

https://github.com/dominikrys/web-scraper

🎬 IMDB Web Scraper in Go

crawler go mongodb

Last synced: 11 Nov 2024

https://github.com/agucova/needs-seeding

🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.

crawler sci-hub torrents

Last synced: 11 Nov 2024

https://github.com/kasperomari/simplecrawlerapi

A simple RESTful API that takes a URL and returns all the links in a specific depth.

crawler flask-api flask-restful

Last synced: 27 Oct 2024

https://github.com/rsheremeta/web-crawler

A tiny web-crawler which looks for the links, extract and prints them concurrently to the Terminal output

crawler go golang web-crawler webcrawler

Last synced: 11 Nov 2024

https://github.com/moe131/webcrawler

Python web crawler designed to scrape websites

crawler crawling-python python python-crawler scraping simhash web-crawler

Last synced: 05 Nov 2024

https://github.com/shentengtu/cht-yp-crawler

Simple Crawler of www.iyp.com.tw.

crawler node-js nodejs yellow-pages yellowpages

Last synced: 12 Nov 2024

https://github.com/billy0402/tibame-python-data-analysis

A learning project from TibaMe Python data analysis course.

ai course crawler jupyter-notebook matplotlib pandas python requests

Last synced: 14 Nov 2024

https://github.com/athulmurali/flickr-api-docs-crawler

A python based crawler that extracts the documentation of apis and writes it into a file as JSON. A beautiful documentation page can be built from the JSON file using Docusaurus

api beautifulsoup4 crawler documentation python3

Last synced: 11 Nov 2024

https://github.com/tryagi/firecrawl

Generated C# SDK based on official Firecrawl OpenAPI specification

ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk

Last synced: 14 Oct 2024

https://github.com/abhijeetps/noddler

Web Crawler build using NodeJS

cheerio crawler csv nodejs

Last synced: 27 Oct 2024

https://github.com/ryu1kn/procedural-page-crawler

Page Crawler. Tell it where to go and what to look for.

crawler npm-package scraper

Last synced: 20 Oct 2024

https://github.com/semoal/pythoncrawler

Python crawler with XMLRPC & BeautifulSoap

beautifulsoup crawler python wordpress xmlrpc

Last synced: 28 Oct 2024

https://github.com/guanbinrui/img-crawler

A image crawler.

crawler

Last synced: 26 Oct 2024

https://github.com/dizys/weibo-crawler

A nodejs weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 07 Nov 2024

https://github.com/sandrewtx08/gearbest_scraper

Seeks catalog ads from Gearbest web page, scraping catalogs information then it's storing by a sequence of SQL commands through a relational database.

crawler gearbest lxml python scraper scraping sqlite3

Last synced: 11 Nov 2024

https://github.com/notreeceharris/webstalker

🕸 A Powerful Relational Web Crawler

crawler datamining webcrawler

Last synced: 14 Nov 2024

https://github.com/zigai/crawlwright

Web crawling framework powered by Playwright

crawler crawling playwright python scraping wrighter

Last synced: 18 Oct 2024

https://github.com/sanhphanvan96/php-training-crawler

Simple php crawler for training purpose

crawler docker docker-compose nginx php php-fpm

Last synced: 11 Nov 2024

https://github.com/aristotelesbr/api_quotes

Project test for job.

crawler mongodb rails5

Last synced: 16 Nov 2024

https://github.com/coding-dream/aspider

A spider run on Android Platform

crawler jsoup spider

Last synced: 12 Nov 2024

https://github.com/hvtuananh/twitter_crawler

Daemon to call and get tweets from Twitter Public Stream API

crawler java streaming-api tweets twitter twitter-crawler

Last synced: 23 Oct 2024

https://github.com/discountry/crawler-microservice

crawler microservice

crawler

Last synced: 27 Oct 2024

https://github.com/estavadormir/scrappist

A web scrapper that takes an URL/URLs and converts into a PDF.

bun cli crawler pdf-generation

Last synced: 12 Nov 2024

https://github.com/ma-pony/playwright-spider-utils

Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.

crawl crawler playwright python scrapy selenium spider spiderman

Last synced: 09 Oct 2024

https://github.com/limdongjin/bill-scraper

Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: 20대 국회 법안 크롤러

crawler python scraper

Last synced: 12 Nov 2024

https://github.com/ilovebacteria/digikala-api

This python package requests to Digikala API and gets a product detail.

crawler digikala pypi

Last synced: 14 Nov 2024

https://github.com/huakunshen/cron-crawler-template

Web Crawler Cron Job Template running with GitHub Action. Capable of sending email notifications.

crawler github-actions python

Last synced: 16 Nov 2024

https://github.com/serge45/pytwgasprices

APIs to fetch the latest Taiwan gas prices

crawler gas price python taiwan

Last synced: 14 Nov 2024

https://github.com/sssshefer/web-crawler-http

Basic web crawler which represents the linking structure of the website

crawler jest jest-tests js

Last synced: 13 Nov 2024

https://github.com/viko16/hatcher

🐣[WIP] Provides APIs by simple configuration.

api api-server cli crawler koa-middleware nodejs spider

Last synced: 01 Oct 2024

https://github.com/devindon/movie-crawler

Movie crawler for douban.com, pianku.tv, etc.

crawler nodejs typescript

Last synced: 16 Oct 2024

https://github.com/kenanbek/tutorial-python-crawler

Crawling website data using Python with requests and Beautiful Soup libraries

beautifulsoup crawler crawling miner parser python python-requests requests

Last synced: 23 Oct 2024

https://github.com/appliedsoul/headless-screenshot

High-level library for taking screenshot of websites based on headless chrome (puppeteer)

crawler headless-chromium javascript nodejs scrapper screenshot testing

Last synced: 18 Nov 2024

https://github.com/landrisek/contentbot

Create simple content (discussion posts and products description) from previously used data or crawl them from public data.

content crawler golang php php72

Last synced: 13 Nov 2024

https://github.com/willi-dev/dtcapp

dtcapp : distributed twitter crawler.

crawler distributed-systems hazelcast java twitter twitter-api

Last synced: 14 Nov 2024

https://github.com/shamsher31/crawler

Simple site crawler that extracts all the URL links from the given website

crawler

Last synced: 13 Nov 2024

https://github.com/intina47/ee_error

implementation of a web crawler using c++

cpp crawler curl gumbo libcurl stanford-nlp web

Last synced: 15 Oct 2024

https://github.com/beckkramer/puppeteer-traverse

Puppeteer utility to easily run a function you define per route on a set of routes.

crawler crawling nodejs puppeteer

Last synced: 18 Nov 2024

https://github.com/cak/foot

Foot is a library that fetches a list of URLs and silly walks through each site to gather information.

bugbounty crawler scraping

Last synced: 14 Nov 2024

https://github.com/licoy/win4000-images-crawler

基于scrapy爬取&下载win4000.com的图片壁纸

crawler python scraper

Last synced: 19 Oct 2024

https://github.com/yosh1/mio-crawler

A crawler that acquires data usage of iijmio .

crawler iijmio mio ruby

Last synced: 13 Nov 2024