Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/first-coding/django-and-web

A Django web project with separated front end and back end.

crawler django python

Last synced: 28 Dec 2024

https://github.com/z3ntl3/redeye

Crawl real, new user agents from the two major databases.

crawler header ua user-agents useragents

Last synced: 16 Dec 2024

https://github.com/exp-codes/sina-crawler

Sina Blog crawler

crawler programming

Last synced: 16 Dec 2024

https://github.com/carloocchiena/python_url_crawler

A script that, starting from a webpage, iterates through all of its links and appends them to a list. A rough way to get all the pages of a website.

beautifulsoup crawler python python3

Last synced: 28 Nov 2024

https://github.com/travorlzh/temperature-analyzer

Python crawler that fetches temperature data for Beijing, China

crawler homework python variance

Last synced: 17 Jan 2025

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 23 Dec 2024

https://github.com/darealfreak/figure-tracker

Application that watches wish-listed figures on multiple sites and notifies you about auctions, sales, or sudden price drops

crawler figure-tracker monitoring

Last synced: 11 Dec 2024

https://github.com/zhifengle/js-hook

Parses the JavaScript AST and adds custom hooks

crawler js-reverse

Last synced: 14 Nov 2024

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 21 Dec 2024

https://github.com/harryandriyan/21scrap

Cinema XXI movie data scraper

crawler python scrapy

Last synced: 21 Jan 2025

https://github.com/zekrotja/r34-crawler

A simple CLI tool to fetch and download images from rule34.xxx

crawler go rest-api rule34 worker-pool xml

Last synced: 17 Dec 2024

https://github.com/sieep-coding/web-crawler

A simple web crawler implemented in Go.

crawler go golang web-crawler

Last synced: 16 Jan 2025

https://github.com/manojahi/is-there-any-song-reference-in-article

Tells whether an article from a website contains any song references.

crawler lyrics-search python webscraping

Last synced: 01 Jan 2025

https://github.com/jiannei/github-trending

GitHub trending crawler based on Lumen.

crawler github-trending lumen php

Last synced: 09 Nov 2024

https://github.com/maxbubblegum47/spotydump

Spotify scraper combined with a Genius scraper. Scrape artists from a given time period or region of the world and dump all their songs!

crawler dump genius lyrics python spotify unimore-informatica

Last synced: 29 Nov 2024

https://github.com/nazanin1369/searchengine

Implementing a search engine using Java, AngularJS and Elasticsearch

angularjs crawler elasticsearch java search-engine

Last synced: 07 Jan 2025

https://github.com/diogoazevedos/x-ray-build

A helper that builds an x-ray based on a schema

crawler schema scraper structure x-ray

Last synced: 31 Dec 2024

https://github.com/stangirard/crawlycolly

Website Crawler to extract all urls

colly crawler discover golang sitemap

Last synced: 15 Jan 2025

https://github.com/sangupta/shopify-burst-crawler

Simple crawler to download meta information for all stock pics from Shopify Burst website

burst crawler java shopify stock-photos

Last synced: 08 Nov 2024

https://github.com/congcoi123/crawler-sheis

A small crawler for getting data from the website: https://sheis.vn

crawler webcrawler webcrawling webscraper webscraping

Last synced: 31 Dec 2024

https://github.com/polakosz/smf-scraper

You know, just for backup :smile: - The only, and therefore the best, Simple Machines Forum C# scraper on GitHub :cat:

crawler csharp forum machines php scraper simple simplemachines smf

Last synced: 18 Dec 2024

https://github.com/coghost/izen

encapsulation of some useful features

chaos crawler encrypt izen mqtt profig python3 utils

Last synced: 09 Nov 2024

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 Updates CoinData every 12 hours. High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from the CoinGecko API. Collects market data and blockchain details, with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 17 Jan 2025

https://github.com/truethari/fcrawler

Python application that can be used to copy files of a given file type from a directory.

copy copy-files crawl crawler crawler-python file files

Last synced: 07 Jan 2025

https://github.com/marvnc/pixiv-dump

Pixiv Encyclopedia DB Dumps, updated daily

crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping

Last synced: 20 Dec 2024

https://github.com/tvrcgo/collect

Data collection

crawler scraper

Last synced: 19 Dec 2024

https://github.com/tbarnes94/fortnite-weapons-bot

A bot that returns Fortnite weapon statistics based on input from Discord users. Written in TypeScript.

crawler discord discord-bot discord-js typescript2

Last synced: 05 Dec 2024

https://github.com/nava45/simplempcrawler

Simple Multiprocessing Crawler in python

crawler multiprocessing python

Last synced: 05 Jan 2025

https://github.com/liuzl/newsmth

A Go crawler for newsmth.net

bigdata crawler newsmth nlp

Last synced: 25 Dec 2024

https://github.com/spaceemotion/goodreads-browser

Custom crawler + interface to have better filtering and sorting of the goodreads database 📚🔍

books crawler goodreads

Last synced: 26 Dec 2024

https://github.com/eduardozepeda/go-web-crawler

A concurrent web crawler written in Go that looks for exposed .git and .env URIs.

crawler environment-variables git go pentesting security-audit

Last synced: 16 Jan 2025

https://github.com/Anakeyn/website-contextual-links

Retrieval of a website's contextual links with R.

crawler gephi r

Last synced: 24 Nov 2024

https://github.com/ewertoncodes/mind-crawler

A simple api written in Rails to extract quotations from the Quotes to Scrape site.

crawler ruby ruby-on-rails

Last synced: 23 Jan 2025

https://github.com/pjt3591oo/exchange-crawler

Upbit and Coinone crawler

crawler data exchange python

Last synced: 26 Dec 2024

https://github.com/lockblock-dev/crawlarr

Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.

crawler golang

Last synced: 24 Jan 2025

https://github.com/ging-dev/sitemap-crawler

Collect links through the sitemap.xml or robots.txt

crawler php php8 sitemap sitemap-crawler

Last synced: 18 Nov 2024

https://github.com/shunk031/lineblogscraper

Scraper for LINE Blog in Scrapy

crawler lineblog scraper scrapy

Last synced: 10 Jan 2025

https://github.com/jofaval/webscraping

WebScraper providing tools to scrape tons of websites with the same base

crawler e-commerce python scraper webscraper webscraping

Last synced: 09 Dec 2024

https://github.com/eduardosbcabral/desafio-tecnico-mp

Challenge: a file generator in C# that uses a web crawler and buffers to write the file to disk.

crawler csharp dotnet

Last synced: 13 Jan 2025

https://github.com/yjg30737/pyqt-google-image-crawler

Crawling image files from Google search results with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 03 Jan 2025

https://github.com/qianbinbin/moebooru-crawler

Retrieve links of images from moebooru-based sites, like yande.re and konachan.com.

crawler moebooru shell

Last synced: 17 Dec 2024

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in TypeScript, built on top of fast-xml-parser and relying on few dependencies

crawler node-js sitemap

Last synced: 21 Dec 2024

https://github.com/keosariel/ramby

Ramby is a simple way to set up a web scraper

beautifulsoup crawler python3 webscraping

Last synced: 06 Dec 2024

https://github.com/qiaocco/crawler

Crawlers for Baidu Tieba, Jinri Toutiao (Yangguang Kuanpin), and Biquge

crawler python

Last synced: 05 Dec 2024

https://github.com/marcbperez/python-webcrawler

Crawls HTML pages for prices and other pieces of data.

crawler docker gradle python

Last synced: 20 Jan 2025

https://github.com/imkrunalkanojiya/seo-checker

Resolve your SEO-related issues using the SEO Checker REST API

crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools

Last synced: 03 Jan 2025

https://github.com/nextlevelshit/fick

Fucking Incredible Command line King. Add CLI flavour to any website you like.

cli crawler

Last synced: 20 Jan 2025

https://github.com/techguy-bhushan/web-spider

Multi-threaded web crawler

crawler python web-spider

Last synced: 17 Jan 2025

https://github.com/yidas/tw-stock-crawler-php

PHP crawler for Taiwan stock data

crawler stock taiwan taiwan-stock-information taiwan-stock-market

Last synced: 29 Oct 2024

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 19 Dec 2024

https://github.com/andreoliwa/scrapy-tegenaria

🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢

crawler flask postgresql python python3 scrapy

Last synced: 11 Jan 2025

https://github.com/foufou-exe/yspeed

Yspeed is a library that scrapes the Speedtest site

crawler python rich scraper scraping selenium selenium-python speedtest

Last synced: 08 Jan 2025

https://github.com/systemfsoftware/youtube-autocomplete-scraper

YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.

actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api

Last synced: 11 Jan 2025

https://github.com/coverified/spider

A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)

akka crawler graphql hacktoberfest microservice spider

Last synced: 25 Dec 2024

https://github.com/aicore/app_info_extracter

An application for extracting information about apps from the internet

android appreview apps crawler googleplaystore

Last synced: 13 Nov 2024

https://github.com/brunojppb/airport-crawler

Simple and powerful CLI app to get worldwide airport information in JSON format

airport cli crawler ruby

Last synced: 14 Jan 2025

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full-stack knowledge graph visualization project

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 21 Dec 2024

https://github.com/natshah/natshah-crawler

Natshah Crawler crawls a selected domain along with all its internal links and internal pages.

crawler database filter natshah-crawler

Last synced: 14 Dec 2024

https://github.com/fernandod1/yahoo-finance-scraper

This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them to a local text file.

crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api

Last synced: 12 Jan 2025

https://github.com/madis/flatcrawl

Clojure app for crawling apartment information from http://kv.ee

clojure crawler real-estate webapp

Last synced: 12 Jan 2025

https://github.com/kapitanluffy/sunny-crawler

That moment when I tried learning things about "Big Data" and "Inverted Indexes"

big-data crawler inverted-index php search

Last synced: 14 Dec 2024

https://github.com/codeforequity-at/botium-crawler

Botium Crawler - Like a Website Crawler, just for Conversation Flows

botium chatbots crawler

Last synced: 20 Oct 2024

https://github.com/xiantang/mini_scrapy

A lightweight crawler framework modeled on Scrapy

crawler python3 requets scrapy

Last synced: 06 Dec 2024

https://github.com/zabuzard/songcrawler

Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.

command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler

Last synced: 12 Jan 2025

https://github.com/rudrakshi99/web_crawler

A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.

crawler python spider

Last synced: 22 Nov 2024

https://github.com/imthaghost/gocloneold

Website Cloner - Uses goroutines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 19 Dec 2024

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 15 Dec 2024

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 23 Oct 2024

https://github.com/zabuzard/mplogger

Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. Then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 19 Dec 2024

https://github.com/wangyihang/acw-sc-v2-py

Python requests.HTTPAdapter for `acw_sc__v2`

acw-sc-v2 crawler waf

Last synced: 05 Jan 2025

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 24 Jan 2025

https://github.com/raspi/scrapy-kuntavaalit2021-yle

Fetch YLE kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/nakabonne/staticcollector

Application to analyze static files of competing sites

crawler go golang

Last synced: 14 Dec 2024

https://github.com/maximiliancw/crawlio

Asynchronous web crawling and scraping with Python for minimalists

asyncio crawler fastapi framework picocss python scraper vuejs

Last synced: 13 Nov 2024

https://github.com/fbielejec/nagger

nag reviewers of PRs

bot crawler github slack

Last synced: 09 Jan 2025

https://github.com/eklem/browsercrawler

Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 19 Dec 2024

https://github.com/0000xffff/webgrab

web page: crawler / file scanner / downloader

crawler download downloader scrape scraper webcrawler

Last synced: 19 Jan 2025

https://github.com/telanflow/scrago

A micro crawler framework, implemented in Go.

crawler go micro-framework spider

Last synced: 19 Jan 2025

https://github.com/gabrielrf/bsbdf

Telegram Public Channel

crawler python telegram telegram-channel telegraph

Last synced: 13 Jan 2025

https://github.com/gill-singh-a/crawler

A program that crawls the web starting from a given page, following the internal links it finds and looking for keywords

crawler multithreading osint python python3 requests scraper

Last synced: 09 Nov 2024

https://github.com/idanhoro/nasa-heat-maps-prediction

In this project we research the correlations between different weather conditions and try to predict future scenarios by using image processing and traditional machine learning algorithms

beautifulsoup crawler machine-learning pillow prediction python sklearn

Last synced: 20 Jan 2025

https://github.com/restuwahyu13/node-scraper-content

Example Node.js scraper that collects all programming content using Puppeteer

crawler nodejs puppeter scrapper

Last synced: 03 Jan 2025

https://github.com/highbreed/web-crawler

A web crawler script that crawls the target website and lists its links

crawler crawling python3

Last synced: 13 Jan 2025

https://github.com/akagi201/spy

A lightweight distributed web crawler

crawler distributed lightweight nsq

Last synced: 08 Jan 2025

https://github.com/joelkoen/wls

Easily crawl multiple sitemaps and list URLs

crawler sitemap url

Last synced: 07 Nov 2024

https://github.com/zhaoweih/meizitu-crawler

🕷️ Meizitu crawler built with Scrapy

crawler meizitu python scrapy spider

Last synced: 31 Oct 2024

https://github.com/denrydu/baiduimagecrawler

Two self-written scripts for crawling Baidu Images, handy for CV researchers building datasets.

baidu crawler dynamic python3

Last synced: 27 Dec 2024

https://github.com/thiiagoms/dict-crawler

Simple crawler for the UOL dictionary

beautifulsoup4 crawler dic python pythonic

Last synced: 16 Jan 2025