Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
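As a rough illustration of what "systematically browses" means in practice, here is a minimal breadth-first crawl sketch using only the Python standard library. It is not taken from any project listed on this page. The names `LinkParser`, `crawl`, `fetch`, and `PAGES`, and the `example.test` URLs, are hypothetical; an in-memory dictionary stands in for real HTTP requests so the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` is an assumed callable that
    returns the page's HTML as a string, or None if it is unavailable."""
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # frontier of URLs still to fetch
    visited = []                # URLs fetched successfully, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" standing in for real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">x</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A real crawler would replace `PAGES.get` with an HTTP client and would typically also honor robots.txt, rate limits, and a domain allow-list, none of which this sketch attempts.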

https://github.com/hctilg/pinterest-crawler

Downloads all images that match a search query

crawler pinterest

Last synced: 07 Nov 2024

https://github.com/gatenlp/wpextract

Create datasets from WordPress sites for research or archiving

corpus crawler nlp text-extraction text-mining web-scraping wordpress

Last synced: 13 Nov 2024

https://github.com/ericz99/go-crawler

A simple, lightweight crawler that finds all endpoints on a website.

crawler golang

Last synced: 28 Jan 2025

https://github.com/rimiti/ping-urls

🏓 Ping URLs by batch.

cache crawler ping prerender prerendering seo

Last synced: 28 Dec 2024

https://github.com/obaskly/kikfriender.com-bot

A multifunctional bot that increases your likes and hotness points and adds positive feedback. It can also flag an account of your choice as fake and add negative feedback. In addition, it can check a given wordlist, print out the Kik usernames it finds, and store them in a new text file.

ai artificial-intelligence bot checker chrome crawl crawler crawling kik proxies proxy scraper scraping selenium wordlist

Last synced: 08 Jan 2025

https://github.com/bitebait/curry

🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real (USDxBRL) exchange rate at several stores in Paraguay.

api brasil crawler currency-exchange-rates go golang paraguay webcrawler

Last synced: 14 Nov 2024

https://github.com/sauerbraten/chef

Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.

crawler extinfo go sauerbraten spy stalker

Last synced: 14 Nov 2024

https://github.com/vaibhavpandeyvpz/cbse-scraper

This script scrapes information about schools affiliated with CBSE for a given state.

cbse crawler data schools scraper

Last synced: 09 Nov 2024

https://github.com/archan937/webhead

An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.

api cookies crawler fetch file-uploads forms headless json node redirects scraper spider traversing

Last synced: 10 Nov 2024

https://github.com/dnlzrgz/winzig

A tiny search engine for personal use.

async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3

Last synced: 05 Nov 2024

https://github.com/YektaDev/Krawler

A configurable HTML Crawler written in Kotlin (JVM), powered by Coroutines, Kotlin Serialization (JSON), Ktor Client, Exposed, and SQLite.

crawl crawler crawlers crawling

Last synced: 06 Feb 2025

https://github.com/ribeirogab/technology-insights

Program with the aim of using the data from Stack Overflow Insights 2020 and generating informative graphs.

crawler python scraping typescript

Last synced: 19 Nov 2024

https://github.com/waynechang65/baha-crawler

baha-crawler is a web crawler module designed to scrape data from Bahamut Forum.

bahamut crawler javascript nodejs scraper spider webcrawler

Last synced: 19 Oct 2024

https://github.com/mrmarble/mineseek

Minecraft server scanner

crawler minecraft minecraft-server scanner slp

Last synced: 17 Jan 2025

https://github.com/glutexo/onigumo

Parallel web scraping framework

crawler

Last synced: 25 Jan 2025

https://github.com/ktont/curlas

a nodejs spider tool

chrome-extension crawler spider

Last synced: 13 Jan 2025

https://github.com/rodyherrera/cdrake-se

✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and other resources using the engines the library provides, such as Bing, Google, Yahoo, Wikipedia, and YouTube. Browse safely and privately with the CodexDrake Search Engine =).

bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube

Last synced: 25 Dec 2024

https://github.com/veasion/automation_testing

Automated testing framework (runs automated tests via JS scripts)

automation crawler

Last synced: 22 Jan 2025

https://github.com/zabuzard/mplogger

Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. It then processes the data and creates the corresponding market price articles in 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 12 Feb 2025

https://github.com/krishpranav/spider

A Ruby web spidering tool that can spider a single site, multiple domains, specific links, or crawl indefinitely

crawler ruby spider web-crawler web-scraping

Last synced: 01 Feb 2025

https://github.com/sc0vu/jspachong

Js crawler library.

crawler pachong

Last synced: 12 Feb 2025

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 14 Feb 2025

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrape data from producthunt.com

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 14 Feb 2025

https://github.com/izumisy/scalable-crawler

Scalable crawler, fully managed by Google Cloud Platform

crawler docker gcp golang ruby

Last synced: 10 Feb 2025

https://github.com/gnujoow/crawl-repo

Crawls basic information about GitHub repositories

crawler github github-api python3

Last synced: 07 Feb 2025

https://github.com/skulltech/arachnid

Crawling Instagram for reasons.

crawler instagram instagram-scraper python3 scraper scrapy

Last synced: 01 Feb 2025

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts the underlying HTML code and, with it, data stored in a database. The scraper can then replicate the entire website content elsewhere.

crawler python scraping scrapy spider

Last synced: 23 Dec 2024

https://github.com/travorlzh/temperature-analyzer

Python crawler that fetches temperature data for Beijing, China

crawler homework python variance

Last synced: 17 Jan 2025

https://github.com/aicore/app_info_extracter

This application would be used to extract information about apps from the internet

android appreview apps crawler googleplaystore

Last synced: 13 Nov 2024

https://github.com/madis/flatcrawl

Clojure app for crawling apartment information from http://kv.ee

clojure crawler real-estate webapp

Last synced: 12 Jan 2025

https://github.com/jofaval/webscraping

WebScraper providing tools to scrape tons of websites with the same base

crawler e-commerce python scraper webscraper webscraping

Last synced: 04 Feb 2025

https://github.com/eklem/browsercrawler

Crawls content from a site within the browser. A basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 12 Feb 2025

https://github.com/wangshouh/qzone_api

Retrieves information by calling Qzone's public APIs from Python

crawler python qzone requests

Last synced: 02 Feb 2025

https://github.com/gill-singh-a/crawler

A program that crawls the web starting from a given page, following the internal links it finds and searching for keywords

crawler multithreading osint python python3 requests scraper

Last synced: 09 Nov 2024

https://github.com/yjg30737/pyqt-google-image-crawler

Crawls image files from Google search results with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 03 Jan 2025

https://github.com/shunk031/lineblogscraper

Scraper for LINE Blog in Scrapy

crawler lineblog scraper scrapy

Last synced: 10 Jan 2025

https://github.com/liuzl/newsmth

A go crawler for newsmth.net

bigdata crawler newsmth nlp

Last synced: 25 Dec 2024

https://github.com/akagi201/spy

A lightweight distributed web crawler

crawler distributed lightweight nsq

Last synced: 08 Jan 2025

https://github.com/coghost/izen

encapsulation of some useful features

chaos crawler encrypt izen mqtt profig python3 utils

Last synced: 09 Nov 2024

https://github.com/ging-dev/sitemap-crawler

Collect links through the sitemap.xml or robots.txt

crawler php php8 sitemap sitemap-crawler

Last synced: 18 Nov 2024

https://github.com/nazanin1369/searchengine

Implementing a search engine using Java, AngularJS, and Elasticsearch

angularjs crawler elasticsearch java search-engine

Last synced: 07 Jan 2025

https://github.com/first-coding/django-and-web

A Django web project with a separated front end and back end.

crawler django python

Last synced: 28 Dec 2024

https://github.com/lockblock-dev/crawlarr

Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.

crawler golang

Last synced: 24 Jan 2025

https://github.com/stevieflyer/quokka

An easy-to-use web crawler framework, supporting parallel crawling without a line of code and headless running.

crawler parallel web-automation

Last synced: 07 Feb 2025

https://github.com/superreal/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 07 Jan 2025

https://github.com/Anakeyn/website-contextual-links

Retrieving the contextual links of a website with R.

crawler gephi r

Last synced: 24 Nov 2024

https://github.com/marcbperez/python-webcrawler

Crawls HTML pages for prices and other pieces of data.

crawler docker gradle python

Last synced: 20 Jan 2025

https://github.com/nextlevelshit/fick

Fucking Incredible Command line King. Add CLI flavour to any website you like to.

cli crawler

Last synced: 20 Jan 2025

https://github.com/truethari/fcrawler

Python application that can be used to copy files of a given file type from a folder directory.

copy copy-files crawl crawler crawler-python file files

Last synced: 07 Jan 2025

https://github.com/wangshouh/icourse163_script

A Python script for liking and commenting on China University MOOC.

crawler icourse163 python requests

Last synced: 02 Feb 2025

https://github.com/exp-codes/sina-crawler

Sina Blog crawler

crawler programming

Last synced: 09 Feb 2025

https://github.com/yuminn-k/crawling-tabelog

Crawling store information from tabelog

crawler python3

Last synced: 18 Jan 2025

https://github.com/carloocchiena/python_url_crawler

A script that, starting from a webpage, iterates through all its links and appends them to a list. A rough way to get all the pages of a website

beautifulsoup crawler python python3

Last synced: 28 Nov 2024

https://github.com/brunojppb/airport-crawler

Simple and powerful CLI app to get worldwide airport information in JSON format

airport cli crawler ruby

Last synced: 14 Jan 2025

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 12 Feb 2025

https://github.com/harryandriyan/21scrap

Cinema XXI movie data scraper

crawler python scrapy

Last synced: 21 Jan 2025

https://github.com/stangirard/crawlycolly

Website crawler that extracts all URLs

colly crawler discover golang sitemap

Last synced: 15 Jan 2025

https://github.com/tikazyq/colly-crawlers

Crawlers using Golang-based web crawling framework Colly

crawler

Last synced: 02 Jan 2025

https://github.com/spraakbanken/svt-crawler

Programme for crawling SVT's API for news articles and converting the data to XML.

corpus crawler

Last synced: 28 Jan 2025

https://github.com/basemax/fakefaces

This repository contains a crawler that downloads thousands of fake human face images from various sources on the internet. Additionally, the repository includes a dataset of thousands of face images of fake humans.

crawler crawler-php crawler-testing crawlers curl dataset datasets face face-fake faces fake-face fake-faces php php-curl

Last synced: 09 Feb 2025

https://github.com/eduardozepeda/go-web-crawler

A concurrent web crawler written in Go that looks for exposed .git and .env URIs.

crawler environment-variables git go pentesting security-audit

Last synced: 16 Jan 2025

https://github.com/zhifengle/js-hook

Parses the JavaScript AST and adds custom hooks

crawler js-reverse

Last synced: 14 Nov 2024

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full-stack knowledge graph visualization project

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 21 Dec 2024

https://github.com/darealfreak/figure-tracker

An application that watches wishlisted figures on multiple sites and notifies you about auctions, sales, or sudden price drops

crawler figure-tracker monitoring

Last synced: 05 Feb 2025

https://github.com/maximiliancw/crawlio

Asynchronous web crawling and scraping with Python for minimalists

asyncio crawler fastapi framework picocss python scraper vuejs

Last synced: 13 Nov 2024

https://github.com/fbielejec/nagger

nag reviewers of PRs

bot crawler github slack

Last synced: 09 Jan 2025

https://github.com/erikmueller/jazmax

Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator

crawler data-science efficiency green heatpump jaz

Last synced: 29 Jan 2025

https://github.com/sieep-coding/web-crawler

A simple web crawler implemented in Go.

crawler go golang web-crawler

Last synced: 16 Jan 2025

https://github.com/imthaghost/gocloneold

Website Cloner - Uses goroutines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 11 Feb 2025

https://github.com/buaadreamer/buaastar

Beihang Planet website: final project for Beihang University's summer 2021 English-language Python course

crawler css flask html javascript python

Last synced: 23 Jan 2025

https://github.com/der3318/zijfhchat-crawler

Chat room crawler and real-time lookup for the mobile game 紫禁繁花

crawler dashboard line-notify

Last synced: 13 Jan 2025

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in TypeScript, built on top of fast-xml-parser and relying on few dependencies

crawler node-js sitemap

Last synced: 13 Feb 2025

https://github.com/0000xffff/webgrab

web page: crawler / file scanner / downloader

crawler download downloader scrape scraper webcrawler

Last synced: 19 Jan 2025

https://github.com/marabesi/social-crawler

Easy way to find emails from social networks

crawler emails php social-crawler social-network

Last synced: 11 Nov 2024

https://github.com/ductnn/curls

A simple tool that crawls URLs from a domain

colly crawler domain golang scanning url

Last synced: 09 Feb 2025

https://github.com/litingyes/cobweb

Collect, store and distribute meaningful static data

apis bing-image bing-wallpapers crawler image random-image

Last synced: 05 Dec 2024

https://github.com/wangyihang/acw-sc-v2-py

Python requests.HTTPAdapter for `acw_sc__v2`

acw-sc-v2 crawler waf

Last synced: 05 Jan 2025

https://github.com/pjt3591oo/exchange-crawler

Crawler for Upbit and Coinone

crawler data exchange python

Last synced: 26 Dec 2024

https://github.com/idanhoro/nasa-heat-maps-prediction

In this project we research the correlations between different weather conditions and try to predict future scenarios by using image processing and traditional machine learning algorithms

beautifulsoup crawler machine-learning pillow prediction python sklearn

Last synced: 20 Jan 2025

https://github.com/ozakboy/taiwan-news-crawlers

.NET-based crawlers for Taiwanese news (data is returned as objects for easy use)

crawler data-collection dataset-generation dotnet news taiwan webcrawlers

Last synced: 22 Jan 2025

https://github.com/leveled-up/memedl

Memedl is a very simple tool to download the latest images from a specific subreddit.

crawler download extract images javascript meme memes node reddit regex rip

Last synced: 23 Dec 2024

https://github.com/kapitanluffy/sunny-crawler

That moment when I tried learning things about "Big Data" and "Inverted Indexes"

big-data crawler inverted-index php search

Last synced: 07 Feb 2025

https://github.com/xiantang/mini_scrapy

A lightweight crawler framework modeled on Scrapy

crawler python3 requets scrapy

Last synced: 01 Feb 2025

https://github.com/linkspreed/twig

Twig 🔍 - the fastest and safest search engine 📐 for the web 🌐, images 🤳, news 📰, and much more

crawler engine search search-engine web5

Last synced: 03 Jan 2025

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 23 Oct 2024

https://github.com/airtoxin/stackable-crawler

middleware based lightweight crawler framework

crawler javascript lightweight

Last synced: 24 Dec 2024

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 08 Feb 2025

https://github.com/vmandic/tris-web-crawler

Tris is a simple NodeJS web crawler tool to help you collect links from visited links of a website's domain.

crawler data-tools nodejs scraping seo-tools web-scraper

Last synced: 12 Feb 2025

https://github.com/eduardosbcabral/desafio-tecnico-mp

Challenge: a file generator in C# that uses a web crawler and buffers to write the file to disk.

crawler csharp dotnet

Last synced: 13 Jan 2025

https://github.com/congcoi123/crawler-sheis

A small crawler for getting data from the website: https://sheis.vn

crawler webcrawler webcrawling webscraper webscraping

Last synced: 31 Dec 2024

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 30 Jan 2025

https://github.com/afsh7n/crawly-automation

Crawly Automation is a lightweight, modular, and extensible web crawling framework built on top of Puppeteer. Whether you need to scrape data, automate browser interactions, manage CAPTCHAs, or handle advanced data extraction, Crawly Automation simplifies the process.

automation crawler nodejs puppeteer webscraping

Last synced: 07 Feb 2025

https://github.com/epigos/newsbot

A news bot written in Go for Dialogflow and Facebook messenger

autocert chatbot crawler datastore dialogflow facebook-messenger-bot golang letsencrypt newsfeed

Last synced: 27 Jan 2025

https://github.com/0fatal/zjxxc-crawl

Crawler for the 在浙学 platform: assignment status and login

crawler

Last synced: 09 Feb 2025