Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
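The systematic browsing described above can be sketched in Python as a breadth-first traversal: a FIFO frontier of URLs waiting to be fetched, a set of URLs already queued, and link extraction from each fetched page. This is a minimal, generic sketch and not the design of any project listed below. The `fetch` callable is a hypothetical stand-in for an HTTP client, and the sketch only follows links on the seed URL's host. Real crawlers also honor robots.txt, rate-limit their requests, and handle redirects and errors, all of which are omitted here.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> List[str]:
    """Breadth-first crawl starting at `seed`, staying on the seed's host.

    `fetch` (hypothetical) returns a page's HTML, or None if it cannot be
    retrieved. Returns the successfully fetched URLs in visit order.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])  # URLs waiting to be fetched; FIFO gives BFS order
    seen = {seed}             # every URL ever queued, so none is fetched twice
    visited: List[str] = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # unreachable page: skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page, drop #fragments
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "site" stands in for real HTTP fetching.
    site = {
        "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a> '
                                '<a href="https://other.org/">off-site</a>',
        "https://example.com/a": '<a href="/b#top">B</a> <a href="/c">C</a>',
        "https://example.com/b": "",
        "https://example.com/c": '<a href="/">home</a>',
    }
    print(crawl("https://example.com/", site.get))
    # ['https://example.com/', 'https://example.com/a',
    #  'https://example.com/b', 'https://example.com/c']
```

The `seen` set is checked when a URL is queued rather than when it is fetched, so a page linked from many places enters the frontier only once. Replacing the deque with a priority queue would turn this into a best-first crawler.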

https://github.com/nava45/simplempcrawler

Simple Multiprocessing Crawler in Python

crawler multiprocessing python

Last synced: 05 Jan 2025

https://github.com/akagi201/spy

A lightweight distributed web crawler

crawler distributed lightweight nsq

Last synced: 08 Jan 2025

https://github.com/coghost/izen

encapsulation of some useful features

chaos crawler encrypt izen mqtt profig python3 utils

Last synced: 09 Nov 2024

https://github.com/maxbubblegum47/spotydump

Spotify scraper combined with a Genius scraper. Scrape the artists of a given period or region of the world and dump all their songs!

crawler dump genius lyrics python spotify unimore-informatica

Last synced: 28 Jan 2025

https://github.com/stangirard/crawlycolly

Website crawler to extract all URLs

colly crawler discover golang sitemap

Last synced: 15 Jan 2025

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing from the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 23 Oct 2024

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 24 Jan 2025

https://github.com/andreoliwa/scrapy-tegenaria

🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢

crawler flask postgresql python python3 scrapy

Last synced: 11 Jan 2025

https://github.com/z3ntl3/redeye

Crawl real and new user agents from the two major user-agent databases.

crawler header ua user-agents useragents

Last synced: 16 Dec 2024

https://github.com/joelkoen/wls

Easily crawl multiple sitemaps and list URLs

crawler sitemap url

Last synced: 07 Nov 2024

https://github.com/tikazyq/colly-crawlers

Crawlers using Golang-based web crawling framework Colly

crawler

Last synced: 02 Jan 2025

https://github.com/eklem/browsercrawler

Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 19 Dec 2024

https://github.com/ysh329/stock-newspaper-crawler

[UNMAINTAINED] Crawl four kinds of finance newspaper corpora (from CCSTOCK.CN).

corpus crawled-data crawler database stock-newspaper-crawler

Last synced: 16 Dec 2024

https://github.com/idanhoro/nasa-heat-maps-prediction

In this project, we research the correlations between different weather conditions and try to predict future scenarios using image processing and traditional machine learning algorithms.

beautifulsoup crawler machine-learning pillow prediction python sklearn

Last synced: 20 Jan 2025

https://github.com/coverified/spider

A microservice with web-crawler/spider capabilities that only follows and indexes URLs of the provided host domain(s)

akka crawler graphql hacktoberfest microservice spider

Last synced: 25 Dec 2024

https://github.com/0000xffff/webgrab

web page: crawler / file scanner / downloader

crawler download downloader scrape scraper webcrawler

Last synced: 19 Jan 2025

https://github.com/epigos/newsbot

A news bot written in Go for Dialogflow and Facebook messenger

autocert chatbot crawler datastore dialogflow facebook-messenger-bot golang letsencrypt newsfeed

Last synced: 27 Jan 2025

https://github.com/telanflow/scrago

A micro crawler framework written in Go.

crawler go micro-framework spider

Last synced: 19 Jan 2025

https://github.com/jofaval/webscraping

WebScraper providing tools to scrape tons of websites with the same base

crawler e-commerce python scraper webscraper webscraping

Last synced: 09 Dec 2024

https://github.com/spraakbanken/svt-crawler

Programme for crawling SVT's API for news articles and converting the data to XML.

corpus crawler

Last synced: 28 Jan 2025

https://github.com/highbreed/web-crawler

A web crawler script that crawls the target website and lists its links

crawler crawling python3

Last synced: 13 Jan 2025

https://github.com/ctf-archives/live-photo-crawler

Image-scraping script for real-time photo-hosting services

crawler pailixiang photoplus

Last synced: 29 Jan 2025

https://github.com/qianbinbin/moebooru-crawler

Retrieve image links from Moebooru-based sites, such as yande.re and konachan.com.

crawler moebooru shell

Last synced: 17 Dec 2024

https://github.com/buaadreamer/buaastar

BUAA Star website, the final project for Beihang University's (BUAA) 2021 summer-term Python course taught in English

crawler css flask html javascript python

Last synced: 23 Jan 2025

https://github.com/nakabonne/netsurfer

netsurfer is a very lightweight scraping framework

crawler go library scraping

Last synced: 14 Dec 2024

https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse

[GeckoFX/Firefox]: Shows how to intercept requests, capture responses, customize GeckoPreferences, handle certificate errors, change the user agent, and more.

browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms

Last synced: 26 Jan 2025

https://github.com/nazanin1369/searchengine

Implementing a search engine using Java, AngularJS, and Elasticsearch

angularjs crawler elasticsearch java search-engine

Last synced: 07 Jan 2025

https://github.com/systemfsoftware/youtube-autocomplete-scraper

YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.

actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api

Last synced: 11 Jan 2025

https://github.com/roccomuso/is-twitter

Verify that a request is from Twitter crawlers using DNS verification steps

bot crawler dns ip js nodejs twitter verification

Last synced: 07 Jan 2025

https://github.com/nueip/curl

NUEiP Curl Lib

crawler php

Last synced: 24 Nov 2024

https://github.com/rudrakshi99/web_crawler

A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.

crawler python spider

Last synced: 22 Nov 2024

https://github.com/leveled-up/memedl

Memedl is a very simple tool to download the latest images from a specific subreddit.

crawler download extract images javascript meme memes node reddit regex rip

Last synced: 23 Dec 2024

https://github.com/shunk031/lineblogscraper

Scraper for LINE Blog in Scrapy

crawler lineblog scraper scrapy

Last synced: 10 Jan 2025

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in TypeScript, built on top of fast-xml-parser and relying on few dependencies

crawler node-js sitemap

Last synced: 21 Dec 2024

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data sources, like YouTube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 15 Dec 2024

https://github.com/qiaocco/crawler

Crawlers for Baidu Tieba, Toutiao (Yangguang Kuanpin video site), and Biquge

crawler python

Last synced: 05 Dec 2024

https://github.com/cryptoc1/earl

Earl is looking for URLs in your area.

crawler middleware nuget webscraping

Last synced: 27 Jan 2025

https://github.com/xcrypt0r/xcrawler

✂️ A crawling example for MapleStory in various languages using multithreading

crawler crawling multithreading parsing regexp

Last synced: 09 Jan 2025

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 28 Oct 2024

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 06 Dec 2024

https://github.com/ccrashzer0/web_crawler

A python based web crawler

crawler internet python python3 webcrawler

Last synced: 27 Jan 2025

https://github.com/flavien-hugs/scrapy-test

Experimenting with the Scrapy library: a small script that extracts all the cartoon characters listed on Wikipedia.

crawler python scraping scrapy

Last synced: 09 Dec 2024

https://github.com/dizys/weibo-crawler

A Node.js Weibo crawler

crawler nodejs typescript weibo-spider

Last synced: 27 Dec 2024

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of sequentially numbered webpages and creates a list of the available links.

crawler lister

Last synced: 05 Dec 2024

https://github.com/fa7ad/aiub-notes-dl

Download all notes from AIUB's portal

aiub beautifulsoup4 crawler

Last synced: 24 Oct 2024

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 31 Dec 2024

https://github.com/tubone24/askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

askfm chromedriver corpus-builder crawler selenium

Last synced: 25 Dec 2024

https://github.com/lukasherz/22fs-sc-twitter-crawler

Used for a research project in social computing at UZH (FS22)

crawler crawling database twitter twitter-api-v2

Last synced: 25 Dec 2024

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 08 Jan 2025

https://github.com/henkman/crawlers

:squirrel: some crawlers and downloaders

crawler

Last synced: 16 Jan 2025

https://github.com/alatiera/ellinofreneia-crawler

Crawler of ellinofreneianet.gr for offline content consumption

crawler ellinofreneia

Last synced: 01 Jan 2025

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 26 Dec 2024

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawling Wikipedia with Python, powered by BeautifulSoup4, supporting GUI/CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 03 Jan 2025

https://github.com/akashrajpurohit/node-crawler

Node.js crawler that scrapes a website on a live domain and crawls it to find all URLs of the domain

crawler node-crawler nodejs url

Last synced: 25 Dec 2024

https://github.com/developerjosh/gogo-crawler

A toolkit for making an anime website with a database full of anime

crawler crawler-js gogoanime gogoanime-api gogoanime-scraper

Last synced: 17 Jan 2025

https://github.com/vindecodex/automated-crawler-wget

Using wget to crawl a site

crawler shell-script

Last synced: 01 Jan 2025

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 06 Dec 2024

https://github.com/openpj/manifoldcf-sdk

Apache ManifoldCF SDK is a Maven project focused on helping developers to extend ManifoldCF with new connectors and extensions

apache crawler docker ecm extensions integrations manifoldcf migration sdk search

Last synced: 25 Jan 2025

https://github.com/mkfsn/chronos

A lightweight cron-like container service for easily creating cron jobs.

crawler cron cronjob golang

Last synced: 22 Jan 2025

https://github.com/excaliburhan/littlenews

A news app built with Electron

crawler electron rss-feed

Last synced: 28 Jan 2025

https://github.com/christopher-besch/therapy_search

Compute call times from arztsuche-bw into a calendar.

appointments calendar crawler gatsby therapy time-management typescript

Last synced: 28 Dec 2024

https://github.com/nelcifranmagalhaes/web_crawler

A web crawler for all Naruto characters

anime beautifulsoup characters crawler naruto python

Last synced: 30 Jan 2025

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 06 Dec 2024

https://github.com/rbkgh/dailytext-crawler

Crawl jw.org to retrieve daily text

crawler dailytext java jsoup jw

Last synced: 15 Jan 2025

https://github.com/0xpr03/clantool

CF management and data analysis tool, with a crawler backend in Rust

backend-server crawler data-analysis rust

Last synced: 02 Jan 2025

https://github.com/arghyadipchak/craww

Gemini (protocol) crawler written in Rust

crawler gemini gemini-protocol rust

Last synced: 04 Jan 2025

https://github.com/vietdoo/sg-property-hub

SG Property Hub is a comprehensive platform for managing and analyzing property data.

airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping

Last synced: 13 Dec 2024

https://github.com/srx-2000/swaiter

A program that waits until a Selenium element has loaded (Selenium element-wait utility)

crawler selenium selenium-python

Last synced: 22 Jan 2025

https://github.com/codelegant/movie-crawler-api

Movie-ticket information scraping API for Taobao, Maoyan, and Gewara

async await crawler mongoose request

Last synced: 18 Dec 2024

https://github.com/yordadev/fenrisjs

A Node.js application that scrapes all links from a given input and writes the results into one of two files, external or internal, for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 10 Jan 2025

https://github.com/richecr/pyhltv

Repository to extract information from the HLTV website.

crawler csgo hacktoberfest hltv hltv-api python3

Last synced: 20 Jan 2025

https://github.com/anjackson/scrapy-url-frontier

A Scrapy module for URL Frontier integration

crawler frontier scrapy spider

Last synced: 05 Jan 2025

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT posts from the command line.

crawler ptt python

Last synced: 08 Jan 2025

https://github.com/laurybueno/monibus

API for monitoring buses in São Paulo

api crawler django docker mapping sptrans

Last synced: 27 Jan 2025

https://github.com/microlinkhq/ua

Simple Redis primitives to incr() and top() user agents

crawler redis user-agent user-agent-parser

Last synced: 12 Jan 2025

https://github.com/raphaelalmeidamartins/python-tech-news

Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course

crawler crawler-python data-science pytest python

Last synced: 18 Jan 2025

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 17 Dec 2024

https://github.com/bingxyz/blackcat

Look up Black Cat (T-Cat) shipment status with a Telegram bot

crawler nodejs telegram-bot

Last synced: 22 Jan 2025

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 16 Jan 2025

https://github.com/captain-woof/zhi-zhu

Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific (regex) words.

crawler crawler-python crawling-python python3

Last synced: 31 Dec 2024

https://github.com/skylightqp/namu2csv

A NamuWiki crawler that converts headers to a CSV file for the KartRider wiki

crawler rust

Last synced: 08 Dec 2024