Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
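
As a rough illustration of what "systematically browses" can mean, here is a minimal breadth-first crawl sketch in Python. It is not taken from any project listed below. The `crawl` function, the `fetch` callable, and the `FAKE_WEB` dictionary are all hypothetical names introduced for this example, and the in-memory dictionary stands in for real HTTP requests so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited.

    `fetch` is a hypothetical callable mapping a URL to an HTML string,
    or to None when the page is unavailable.
    """
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # FIFO frontier gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c#top">C</a>',
}

pages = crawl("http://example.test/", FAKE_WEB.get)
```

A real crawler would replace `FAKE_WEB.get` with an HTTP client and would normally also respect robots.txt and rate limits, which this sketch leaves out.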

https://github.com/codeforequity-at/botium-crawler

Botium Crawler - Like a Website Crawler, just for Conversation Flows

botium chatbots crawler

Last synced: 20 Oct 2024

https://github.com/xiantang/mini_scrapy

A lightweight crawler framework modeled on Scrapy

crawler python3 requets scrapy

Last synced: 15 Oct 2024

https://github.com/e73b025/simple-python-url-crawler

Super simple Python3 website URL scraper/crawler. Multi-threaded.

crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple

Last synced: 11 Nov 2024

https://github.com/airtoxin/stackable-crawler

Middleware-based lightweight crawler framework

crawler javascript lightweight

Last synced: 06 Nov 2024

https://github.com/stangirard/crawlycolly

Website Crawler to extract all urls

colly crawler discover golang sitemap

Last synced: 14 Nov 2024

https://github.com/mrmarble/mineseek

Minecraft server scanner

crawler minecraft minecraft-server scanner slp

Last synced: 17 Nov 2024

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in typescript, built on top of fast-xml-parser and relying on few dependencies

crawler node-js sitemap

Last synced: 27 Oct 2024

https://github.com/nakabonne/staticcollector

Application to analyze static files of competing sites

crawler go golang

Last synced: 27 Oct 2024

https://github.com/tbarnes94/fortnite-weapons-bot

A bot that returns fortnite weapon statistics based on input from Discord users. Written in TypeScript.

crawler discord discord-bot discord-js typescript2

Last synced: 15 Oct 2024

https://github.com/liuzl/newsmth

A go crawler for newsmth.net

bigdata crawler newsmth nlp

Last synced: 06 Nov 2024

https://github.com/nava45/simplempcrawler

Simple Multiprocessing Crawler in python

crawler multiprocessing python

Last synced: 09 Nov 2024

https://github.com/travorlzh/temperature-analyzer

Python crawler that helps fetch temperature of Beijing, China

crawler homework python variance

Last synced: 16 Nov 2024

https://github.com/eklem/browsercrawler

Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 15 Oct 2024

https://github.com/akagi201/spy

A lightweight distributed web crawler

crawler distributed lightweight nsq

Last synced: 11 Nov 2024

https://github.com/benderpan/fakeagent.net

Fake Agent for .Net Standard.

agent crawler fake-agent http-headers

Last synced: 05 Nov 2024

https://github.com/denrydu/baiduimagecrawler

Two self-written scripts for crawling Baidu Images, a useful tool for CV researchers building datasets.

baidu crawler dynamic python3

Last synced: 07 Nov 2024

https://github.com/erikmueller/jazmax

Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator

crawler data-science efficiency green heatpump jaz

Last synced: 14 Oct 2024

https://github.com/andreoliwa/scrapy-tegenaria

🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢

crawler flask postgresql python python3 scrapy

Last synced: 31 Oct 2024

https://github.com/nakabonne/netsurfer

netsurfer is a very lightweight scraping framework

crawler go library scraping

Last synced: 27 Oct 2024

https://github.com/galaxiat/galaxiat.serve.seo

Node.JS package to serve React app and prerender path (cron)

crawler cron puppeteer seo seo-optimization ssr

Last synced: 05 Nov 2024

https://github.com/aicore/app_info_extracter

This application would be used to extract information about apps from the internet

android appreview apps crawler googleplaystore

Last synced: 13 Nov 2024

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 31 Oct 2024

https://github.com/joshuaquek/docusite-to-pdf

Provide a URL and this will generate multiple PDF documents of the whole site within the bounds of the URL path. This code repo is for educational purposes only.

crawler documentation-generator html2pdf pdf pdf-converter pdf-document pdf-generation scraper

Last synced: 13 Nov 2024

https://github.com/ysh329/stock-newspaper-crawler

[UNMAINTAINED] Crawl 4 kinds of finance newspaper corpus (from CCSTOCK.CN).

corpus crawled-data crawler database stock-newspaper-crawler

Last synced: 29 Oct 2024

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 13 Oct 2024

https://github.com/marabesi/social-crawler

Easy way to find emails from social networks

crawler emails php social-crawler social-network

Last synced: 11 Nov 2024

https://github.com/zenrows/crawling-from-scratch

Repository for the "Mastering Web Scraping in Python: Crawling from Scratch" blog post, with the final code.

crawler crawling python python3 scraping

Last synced: 16 Nov 2024

https://github.com/brunojppb/airport-crawler

Simple and powerful CLI app to get worldwide airport information in JSON format

airport cli crawler ruby

Last synced: 14 Nov 2024

https://github.com/qianbinbin/moebooru-crawler

Retrieve links of images from moebooru-based sites, like yande.re and konachan.com.

crawler moebooru shell

Last synced: 29 Oct 2024

https://github.com/maximiliancw/crawlio

Asynchronous web crawling and scraping with Python for minimalists

asyncio crawler fastapi framework picocss python scraper vuejs

Last synced: 13 Nov 2024

https://github.com/zhifengle/js-hook

Parses the JavaScript AST and adds custom hooks

crawler js-reverse

Last synced: 14 Nov 2024

https://github.com/obaskly/kikfriender.com-bot

A multifunctional bot that increases your likes and hotness points and adds positive feedback. It can also flag an account of your choice as fake and add negative feedback. In addition, it can check a given wordlist, print out the kik usernames it finds, and store them in a new text file.

ai artificial-intelligence bot checker chrome crawl crawler crawling kik proxies proxy scraper scraping selenium wordlist

Last synced: 11 Nov 2024

https://github.com/jiannei/github-trending

GitHub trending crawler based on Lumen.

crawler github-trending lumen php

Last synced: 09 Nov 2024

https://github.com/congcoi123/crawler-sheis

A small crawler for getting data from the website: https://sheis.vn

crawler webcrawler webcrawling webscraper webscraping

Last synced: 08 Nov 2024

https://github.com/sc0vu/jspachong

Js crawler library.

crawler pachong

Last synced: 14 Oct 2024

https://github.com/diogoazevedos/x-ray-build

A helper that builds an x-ray based on a schema

crawler schema scraper structure x-ray

Last synced: 08 Nov 2024

https://github.com/linkspreed/twig

Twig🔍 - the fastest and safest search engine📐 for the web🌐, images🤳, news📰 and much more

crawler engine search search-engine web5

Last synced: 09 Nov 2024

https://github.com/marvnc/pixiv-dump

Pixiv Encyclopedia DB Dumps, updated daily

crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping

Last synced: 26 Oct 2024

https://github.com/coghost/izen

encapsulation of some useful features

chaos crawler encrypt izen mqtt profig python3 utils

Last synced: 09 Nov 2024

https://github.com/zhaoweih/meizitu-crawler

🕷️ Meizitu crawler (Scrapy)

crawler meizitu python scrapy spider

Last synced: 31 Oct 2024

https://github.com/der3318/zijfhchat-crawler

Mobile game 「紫禁繁花」: chat room crawler with real-time lookup

crawler dashboard line-notify

Last synced: 14 Nov 2024

https://github.com/raspi/scrapy-kuntavaalit2021-yle

Fetch YLE kuntavaalit 2021 data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/panyanyany/vps_spider

VPS Spider powering https://findallvps.com

crawler spider vps

Last synced: 12 Nov 2024

https://github.com/sangupta/shopify-burst-crawler

Simple crawler to download meta information for all stock pics from Shopify Burst website

burst crawler java shopify stock-photos

Last synced: 08 Nov 2024

https://github.com/eduardozepeda/go-web-crawler

A concurrent web crawler written in Go that looks for exposed .git and .env URIs.

crawler environment-variables git go pentesting security-audit

Last synced: 16 Nov 2024

https://github.com/pjt3591oo/exchange-crawler

Upbit and Coinone crawler

crawler data exchange python

Last synced: 06 Nov 2024

https://github.com/truethari/fcrawler

Python application that can be used to copy files of a given file type from a folder directory.

copy copy-files crawl crawler crawler-python file files

Last synced: 10 Nov 2024

https://github.com/roccomuso/is-twitter

Verify that a request is from Twitter crawlers using DNS verification steps

bot crawler dns ip js nodejs twitter verification

Last synced: 17 Oct 2024

https://github.com/techguy-bhushan/web-spider

Multi-threaded web crawler

crawler python web-spider

Last synced: 12 Oct 2024

https://github.com/pjt3591oo/rust-exchange-crawler

A crawler built as a way to learn Rust

crawler rust

Last synced: 06 Nov 2024

https://github.com/joelkoen/wls

Easily crawl multiple sitemaps and list URLs

crawler sitemap url

Last synced: 07 Nov 2024

https://github.com/gill-singh-a/crawler

A program that crawls the web starting from a given page, following the internal links it finds and searching them for keywords

crawler multithreading osint python python3 requests scraper

Last synced: 09 Nov 2024

https://github.com/krishpranav/spider

A ruby web spidering tool that can spider a site, multiple domains, certain links or infinitely

crawler ruby spider web-crawler web-scraping

Last synced: 15 Oct 2024

https://github.com/superreal/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 10 Nov 2024

https://github.com/eduardosbcabral/desafio-tecnico-mp

Challenge: a file generator in C# that uses a web crawler and buffers to write the file to disk.

crawler csharp dotnet

Last synced: 13 Nov 2024

https://github.com/leveled-up/memedl

Memedl is a very simple tool to download the latest images from a specific subreddit.

crawler download extract images javascript meme memes node reddit regex rip

Last synced: 05 Nov 2024

https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse

[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.

browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms

Last synced: 13 Oct 2024

https://github.com/xcrypt0r/hyacinth

🌸 Dcinside image crawler with deadly simple structure

beautifulsoup4 crawler dcinside parsing pyqt5 pyside2

Last synced: 11 Nov 2024

https://github.com/zabuzard/mplogger

Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. It then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 31 Oct 2024

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 23 Oct 2024

https://github.com/sean2077/leetcode_anki

Leetcode Anki card factory.

anki crawler leetcode leetcode-anki scrapy

Last synced: 12 Nov 2024

https://github.com/maraf/staticsitecrawler

A simple util for crawling links from root URL and saving HTML documents.

crawler static-site-generator

Last synced: 16 Nov 2024

https://github.com/polakosz/smf-scraper

You know, just for backup :smile: - The only, and therefore the best, Simple Machines Forum C# scraper on GitHub :cat:

crawler csharp forum machines php scraper simple simplemachines smf

Last synced: 30 Oct 2024

https://github.com/norconex/committer-neo4j

Implementation of Norconex Committer for Neo4j.

crawler neo4j neo4j-committer norconex-committer

Last synced: 10 Oct 2024

https://github.com/izumisy/scalable-crawler

Scalable crawler, fully managed by Google Cloud Platform

crawler docker gcp golang ruby

Last synced: 30 Oct 2024

https://github.com/tikazyq/colly-crawlers

Crawlers using Golang-based web crawling framework Colly

crawler

Last synced: 08 Nov 2024

https://github.com/khadkarajesh/aptoide

Aptoide app crawler using beautifulsoup

beautifulsoup4 crawler flask python3 web-application

Last synced: 13 Nov 2024

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 04 Nov 2024

https://github.com/bkdev98/ebooks-crawler

Ebooks crawler for personal use, built with ReactJS.

crawler material-ui nodejs reactjs

Last synced: 08 Nov 2024

https://github.com/schbenedikt/web-crawler

A simple web crawler using Python that stores the metadata of each web page in a database.

crawler database mariadb mysql python python-crawler web

Last synced: 08 Nov 2024

https://github.com/deptno/nsdi

㉿ nsdi downloader built on puppeteer

crawler downloader nsdi openapi puppeteer

Last synced: 08 Nov 2024

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 14 Nov 2024

https://github.com/jorgeparavicini/medalytik-python

Python crawlers for a job mediation firm

crawler python scrapy

Last synced: 17 Oct 2024

https://github.com/hudson-newey/user-web-crawler

The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.

archive crawler open-internet

Last synced: 11 Nov 2024

https://github.com/droiddevgeeks/nodelearning

This is a Node learning demo. It covers all the basics of Node.

crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign

Last synced: 13 Nov 2024

https://github.com/aleclarson/recrawl

Filesystem crawler

crawler fs nodejs

Last synced: 17 Oct 2024

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 16 Nov 2024

https://github.com/yordadev/fenrisjs

A NodeJS application that scrapes all links from a given input and writes the results to one of two files, external or internal, for further analysis.

analysis crawler link-collection link-crawler nodejs nodejs-application

Last synced: 11 Nov 2024

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes a user-defined function on each of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 01 Nov 2024

https://github.com/dimo414/pycrawl

Simple Python web crawler, primarily designed for inspecting and diagnosing your own website

crawler python

Last synced: 12 Oct 2024

https://github.com/carloocchiena/python_url_crawler

A script that, starting from a webpage, iterates through all its links and appends them to a list. A rough proxy for getting all the pages of a website

beautifulsoup crawler python python3

Last synced: 14 Oct 2024

https://github.com/scrwdrv/siege-crawler

This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs, until the server crashes (or the benchmark ends). It is used to test the maximum capacity of a server or to find glitches that users might encounter.

benchmark cli crawler ddos debug siege tool

Last synced: 12 Oct 2024

https://github.com/pjullrich/link-crawler

Python crawler that reports broken links on a given website and its subpages

asyncio breadth-first-search broken-links crawler python

Last synced: 13 Oct 2024

https://github.com/orafaelfragoso/itunes-crawler

Retrieves information about an artist by crawling the iTunes API and iTunes Page

api crawler itunes itunes-api

Last synced: 01 Nov 2024

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 30 Oct 2024

https://github.com/loggerhead/dianping_crawler

A Dianping (大众点评) crawler based on Scrapy (Python 3.5)

crawler python-3-5

Last synced: 12 Oct 2024

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the selenium library

bot crawler python scraper selenium web wrapper

Last synced: 01 Nov 2024

https://github.com/mattmoony/webcrawler.py

A very simple Python web crawler. This is just a fun little side project, which I used to gain some valuable experience with advanced Python and web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 12 Oct 2024

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 13 Oct 2024

https://github.com/suddi/fundscraper

Collection of web crawlers to scrape fund data using Scrapy

crawler funds scraper scrapy

Last synced: 11 Oct 2024

https://github.com/konradlinkowski/wikipediafinder

Find words in a wiki page

crawler scraper wikipedia

Last synced: 14 Oct 2024

https://github.com/konradlinkowski/mailcrawler

Crawler to find emails on websites

crawler scraper

Last synced: 14 Oct 2024