Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/ayusharma/rss-parser

A simple crawler in ReactJS

crawler reactjs rss-parser

Last synced: 13 Oct 2024

https://github.com/arshadkazmi42/scraplink

Scraplink library, for scraping links and images url from a webpage

crawler mongdb nodejs scraplink url web

Last synced: 28 Oct 2024

https://github.com/arshadkazmi42/github-scanner-local

Locally scan all the repositories of a github organization

bounty bug bug-bounty crawler github local no-api scanner

Last synced: 28 Oct 2024

https://github.com/sauerbraten/chef

Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.

crawler extinfo go sauerbraten spy stalker

Last synced: 02 Aug 2024

https://github.com/bitebait/curry

🍛 Curry é um WebCrawler escrito em Golang com finalidade de verificar o valor do câmbio de Dólar para Real (USDxBRL) em algumas lojas no Paraguay.

api brasil crawler currency-exchange-rates go golang paraguay webcrawler

Last synced: 02 Aug 2024

https://github.com/erikjiang/book_crawler

:lizard: book_crawler

crawler douban golang

Last synced: 14 Oct 2024

https://github.com/dnlzrgz/winzig

A tiny search engine for personal use.

async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3

Last synced: 05 Nov 2024

https://github.com/hctilg/pinterest-crawler

Downloads all images suitable for search

crawler pinterest

Last synced: 07 Nov 2024

https://github.com/waynechang65/baha-crawler

baha-crawler is a web crawler module designed to scarp data from Bahamut Forum.

bahamut crawler javascript nodejs scraper spider webcrawler

Last synced: 19 Oct 2024

https://github.com/testica/a3hrgo-sdk

a3HRgo sdk to automatize your reports

a3hrgo crawler javascript puppeteer

Last synced: 10 Oct 2024

https://github.com/yidas/tw-stock-crawler-php

PHP Crawler for Taiwan Stock Data (台股資料爬蟲)

crawler stock taiwan taiwan-stock-information taiwan-stock-market

Last synced: 29 Oct 2024

https://github.com/tikazyq/colly-crawlers

Crawlers using Golang-based web crawling framework Colly

crawler

Last synced: 11 Oct 2024

https://github.com/brunojppb/airport-crawler

Simple and powerful CLI app to get worldwide airport information in JSON format

airport cli crawler ruby

Last synced: 11 Oct 2024

https://github.com/stangirard/crawlycolly

Website Crawler to extract all urls

colly crawler discover golang sitemap

Last synced: 11 Oct 2024

https://github.com/nazanin1369/searchengine

Implementing a search engine using Java, AngularJS and Elastic search

angularjs crawler elasticsearch java search-engine

Last synced: 11 Oct 2024

https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse

[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.

browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms

Last synced: 13 Oct 2024

https://github.com/marvnc/pixiv-dump

Pixiv Encyclopedia DB Dumps, updated daily

crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping

Last synced: 26 Oct 2024

https://github.com/techguy-bhushan/web-spider

multi-threaded webs crawler

crawler python web-spider

Last synced: 12 Oct 2024

https://github.com/roccomuso/is-twitter

Verify that a request is from Twitter crawlers using DNS verification steps

bot crawler dns ip js nodejs twitter verification

Last synced: 17 Oct 2024

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 13 Oct 2024

https://github.com/sc0vu/jspachong

Js crawler library.

crawler pachong

Last synced: 14 Oct 2024

https://github.com/marabesi/social-crawler

Easy way to find emails from social networks

crawler emails php social-crawler social-network

Last synced: 11 Oct 2024

https://github.com/qianbinbin/moebooru-crawler

Retrieve links of images from moebooru-based sites, like yande.re and konachan.com .

crawler moebooru shell

Last synced: 29 Oct 2024

https://github.com/erikmueller/jazmax

Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator

crawler data-science efficiency green heatpump jaz

Last synced: 14 Oct 2024

https://github.com/ysh329/stock-newspaper-crawler

[UNMAINTAINED]Crawl 4 kinds of finance newspaper corpus (from CCSTOCK.CN).

corpus crawled-data crawler database stock-newspaper-crawler

Last synced: 29 Oct 2024

https://github.com/eklem/browsercrawler

Crawling content from a site within the browser. A basis for i.e. a search solution for static sites.

crawler search-engine website-generation

Last synced: 15 Oct 2024

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 31 Oct 2024

https://github.com/galaxiat/galaxiat.serve.seo

Node.JS package to serve React app and prerender path (cron)

crawler cron puppeteer seo seo-optimization ssr

Last synced: 05 Nov 2024

https://github.com/tbarnes94/fortnite-weapons-bot

A bot that returns fortnite weapon statistics based on input from Discord users. Written in TypeScript.

crawler discord discord-bot discord-js typescript2

Last synced: 15 Oct 2024

https://github.com/xiantang/mini_scrapy

模仿scrapy的轻量级爬虫框架

crawler python3 requets scrapy

Last synced: 15 Oct 2024

https://github.com/congcoi123/crawler-sheis

A small crawler for getting data from the website: https://sheis.vn

crawler webcrawler webcrawling webscraper webscraping

Last synced: 20 Oct 2024

https://github.com/codeforequity-at/botium-crawler

Botium Crawler - Like a Website Crawler, just for Conversation Flows

botium chatbots crawler

Last synced: 20 Oct 2024

https://github.com/gabrielrf/bsbdf

Telegram Public Channel

crawler python telegram telegram-channel telegraph

Last synced: 15 Oct 2024

https://github.com/andreoliwa/scrapy-tegenaria

🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢

crawler flask postgresql python python3 scrapy

Last synced: 31 Oct 2024

https://github.com/denrydu/baiduimagecrawler

自己写的两个用来爬取百度图片的脚本,方便CV研究者制作数据集。Two ways to download images from baidu, useful tool for making cv datasets!

baidu crawler dynamic python3

Last synced: 07 Nov 2024

https://github.com/jofaval/webscraping

WebScraper providing tools to scrape tons of websites with the same base

crawler e-commerce python scraper webscraper webscraping

Last synced: 21 Oct 2024

https://github.com/natshah/natshah-crawler

Natshah Crawler works to crawl a selected domain with all it's internal links and internal pages.

crawler database filter natshah-crawler

Last synced: 26 Oct 2024

https://github.com/gnujoow/crawl-repo

crawling github's repositories basic info

crawler github github-api python3

Last synced: 27 Oct 2024

https://github.com/zhaoweih/meizitu-crawler

🕷️妹子图爬虫-Scrapy

crawler meizitu python scrapy spider

Last synced: 31 Oct 2024

https://github.com/imthaghost/gocloneold

Website Cloner - Utilizes powerful go routines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 31 Oct 2024

https://github.com/igaozp/jobwitcher

JobWitcher 招聘网站爬虫合集

crawler python3 redis scrapy spider

Last synced: 07 Nov 2024

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in typescript, built on top of fast-xml-parser and relying on few dependencies

crawler node-js sitemap

Last synced: 27 Oct 2024

https://github.com/norconex/committer-neo4j

Implementation of Norconex Committer for Neo4j.

crawler neo4j neo4j-committer norconex-committer

Last synced: 10 Oct 2024

https://github.com/leveled-up/memedl

Memedl is a very simple tool to download the latest images from a specific sub reddit.

crawler download extract images javascript meme memes node reddit regex rip

Last synced: 05 Nov 2024

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 23 Oct 2024

https://github.com/zabuzard/mplogger

Saves marketprices for items, based on transactions, from the game 'http://www.freewar.de/' in a database by using a bot. Then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 31 Oct 2024

https://github.com/polakosz/smf-scraper

You know, just for backup :smile: - The only so the best Simple Machines Forum C# scraper on GitHub :cat:

crawler csharp forum machines php scraper simple simplemachines smf

Last synced: 30 Oct 2024

https://github.com/izumisy/scalable-crawler

Scalable crawler, fully-managed by Google Cloud Platrom

crawler docker gcp golang ruby

Last synced: 30 Oct 2024

https://github.com/nakabonne/staticcollector

Application to analyze static files of competing sites

crawler go golang

Last synced: 27 Oct 2024

https://github.com/airtoxin/stackable-crawler

middleware based lightweight crawler framework

crawler javascript lightweight

Last synced: 06 Nov 2024

https://github.com/nakabonne/netsurfer

netsurfer is a very lightweight scraping framework

crawler go library scraping

Last synced: 27 Oct 2024

https://github.com/krishpranav/spider

A ruby web spidering tool that can spider a site, multiple domains, certain links or infinitely

crawler ruby spider web-crawler web-scraping

Last synced: 15 Oct 2024

https://github.com/benderpan/fakeagent.net

Fake Agent for .Net Standard.

agent crawler fake-agent http-headers

Last synced: 05 Nov 2024

https://github.com/liuzl/newsmth

A go crawler for newsmth.net

bigdata crawler newsmth nlp

Last synced: 06 Nov 2024

https://github.com/pjt3591oo/rust-exchange-crawler

rust 공부겸 만들어보는 크롤러

crawler rust

Last synced: 06 Nov 2024

https://github.com/pjt3591oo/exchange-crawler

업비트, 코인원 크롤러

crawler data exchange python

Last synced: 06 Nov 2024

https://github.com/dimo414/pycrawl

Simple Python web crawler, primarily designed for inspecting and diagnosing your own website

crawler python

Last synced: 12 Oct 2024

https://github.com/skylightqp/namu2csv

A namuwiki crawler that converts header to csv file for kartrider wiki

crawler rust

Last synced: 19 Oct 2024

https://github.com/vitaee/laravelandcrawlers

php web crawler examples with oop concept and laravel project

crawler laravel php

Last synced: 06 Nov 2024

https://github.com/maxgio92/package-crawler

A package crawler for most known Linux distros

crawler go linux package

Last synced: 13 Oct 2024

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 05 Nov 2024

https://github.com/tanja-4732/od-get

A Rust tool for recursively crawling & downloading data from open directories

cli crawler open-directory open-directory-downloader rust

Last synced: 11 Oct 2024

https://github.com/redco/goose-phantom-environment

Environment for Goose parser which allows to run it in PhantomJS

crawler environment goose goose-parser nodejs parse parser phantomjs scraper

Last synced: 05 Nov 2024

https://github.com/khadkarajesh/aptoide

Aptoide app crawler using beautifulsoup

beautifulsoup4 crawler flask python3 web-application

Last synced: 11 Oct 2024

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 11 Oct 2024

https://github.com/antoinegagne/treewalker

A web crawler in Erlang that respects `robots.txt`.

crawler erlang webcrawler

Last synced: 24 Oct 2024

https://github.com/aleclarson/recrawl

Filesystem crawler

crawler fs nodejs

Last synced: 17 Oct 2024

https://github.com/comigor/balances

Your checking and savings accounts balances on banks and brokers.

balance bank broker crawler node

Last synced: 20 Oct 2024

https://github.com/dylanhogg/cloud-products

A package for getting cloud products and product descriptions from a cloud provider website.

aws cloud-products crawler data text-processing

Last synced: 27 Oct 2024

https://github.com/zephyrpersonal/github-trending-crawler

transform github-trending repos to json data

cheerio crawler fetch github node repository spider trending

Last synced: 14 Oct 2024

https://github.com/yjg30737/pyqt-google-image-crawler

Crawling image files from Google search result with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 22 Oct 2024

https://github.com/yjg30737/pyqt-wikipedia-crawler

Crawling the Wikipedia with Python powered by BeautifulSoup4, Supporting GUI/CUI

beautifulsoup4 crawler pyqt pyqt5 wikipedia

Last synced: 22 Oct 2024

https://github.com/roccomuso/is-apple

Verify that a request is from Apple crawlers using DNS verification steps

apple bot crawler dns ip js nodejs

Last synced: 17 Oct 2024

https://github.com/ging-dev/sitemap-crawler

Collect links through the sitemap.xml or robots.txt

crawler php php8 sitemap sitemap-crawler

Last synced: 12 Oct 2024

https://github.com/phanikmr/linkcrawler

A LinkCrawler is a Python module that takes a url on the web (ex: http://python.org), fetches the web-page corresponding to that url, and parses all the links on that page into a repository of links. Next, it fetches the contents of any of the url from the repository just created, parses the links from this new content into the repository and continues this process for all links in the repository until stopped or after a given number of links are fetched.

async crawler linkcrawler parse python scrapy spider

Last synced: 14 Oct 2024

https://github.com/buren/site_health

Crawl a site and check various health indicators

crawler rubygem site-health

Last synced: 28 Oct 2024

https://github.com/suddi/fundscraper

Collection of web crawlers to scrape fund data using Scrapy

crawler funds scraper scrapy

Last synced: 11 Oct 2024

https://github.com/jfcherng/wiki-cgroup-crawler

此腳本用於抓取維基百科的公共轉換組詞庫,並將結果儲存為外部檔案。

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 28 Sep 2024

https://github.com/tranbavinhson/crawler

Crawler by Scrapy

crawler python scrapy

Last synced: 06 Nov 2024

https://github.com/e73b025/simple-python-url-crawler

Super simple Python3 website URL scraper/crawler. Multi-threaded.

crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple

Last synced: 11 Oct 2024

https://github.com/mohabmes/matool

A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }

cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web

Last synced: 11 Oct 2024

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes user-defined function on each one of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 01 Nov 2024

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 13 Oct 2024

https://github.com/ozansz/simple-web-downloader

A simple web page downloader program in C

c crawler curl libcurl web

Last synced: 16 Oct 2024

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 11 Oct 2024

https://github.com/mattmoony/webcrawler.py

A very simple python webcrawler. This is just a fun little side project, which I used to gather some valuable experience with advanced Python- and Web techniques. 🐍

beautifulsoup crawler indexing mongodb multithreading pymongo python spider web webcrawler

Last synced: 12 Oct 2024

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different mehtods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 05 Nov 2024

https://github.com/loggerhead/dianping_crawler

基于 Scrapy (python 3.5) 的大众点评爬虫

crawler python-3-5

Last synced: 12 Oct 2024

https://github.com/dingpingzhang/papermedia

A scrapy-based crawler for crawling paper media.

crawler scrapy spider

Last synced: 04 Nov 2024

https://github.com/leomaurodesenv/smm-maker-profile

A package to fetching the maker profile - Super Mario Maker

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024