Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
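The "systematically browses" part of this definition can be sketched as a breadth-first crawl. This is a minimal Python sketch, not code from any project listed here. The `fetch_links` callback is a hypothetical stand-in for downloading a page and parsing its links, and a toy in-memory link graph replaces real HTTP requests.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, fetch_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl starting at `seed`, visiting at most `max_pages` pages.

    `fetch_links` is a hypothetical callback returning the URLs linked from a page;
    a real crawler would fetch and parse HTML at this point.
    """
    visited: set[str] = {seed}            # URLs already queued, so none is visited twice
    frontier: deque[str] = deque([seed])  # FIFO queue gives breadth-first order
    order: list[str] = []                 # pages in the order they were "indexed"
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                visited.add(link)
                frontier.append(link)
    return order


# Toy link graph standing in for real web pages.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": [],
}
print(crawl("a", lambda u: graph.get(u, [])))  # prints ['a', 'b', 'c', 'd']
```

A production crawler would also respect robots.txt, rate-limit its requests, and normalize URLs before checking the visited set.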

https://github.com/pawod/gis-berlin-rents

A web crawler for ImmobilienScout24.de that was implemented for a small project at the Institute of Geographic Sciences of the Free University of Berlin.

apartment-rents berlin crawler gis immobilienscout24

Last synced: 04 Nov 2024

https://github.com/igeligel/BackpackLogin

:arrow_forward: A .NET Core library to handle the login to Backpack.tf. Backpack.tf is a trading site for Team Fortress 2, Counter-Strike: Global Offensive, and Dota 2, offering community item pricing, item trading and stats, and much more.

bot bot-framework crawler csgo csgo-bot steam steam-api steambot steamweb teamfortress2

Last synced: 02 Aug 2024

https://github.com/aurelius84/pycrawler

A flexible spider based on MySQL

crawler etl mysql scrapy spider

Last synced: 06 Nov 2024

https://github.com/windfarer/biu

biubiubiu~~ I'm a tiny web crawler framework

crawler python spider spider-framework web-crawler

Last synced: 28 Oct 2024

https://github.com/nakabonne/webcrawlerforserps

Web crawler that scrapes Google search results

cli crawler golang

Last synced: 24 Oct 2024

https://github.com/bfwg/node-tinycrawler

A tiny web crawler in a nutshell for Node.js

crawler nodejs redis

Last synced: 11 Oct 2024

https://github.com/softmarshmallow/inked-news-crawler

🕷 Korean news source crawler (real-time & bulk)

crawler naver-news python3 scrapy

Last synced: 11 Oct 2024

https://github.com/omilab/internet-archive-link-extractor

Tool for extracting external links of a URL from Internet Archive snapshots

crawler internetarchive

Last synced: 07 Aug 2024

https://github.com/the1812/bingwallpapers

A tool for downloading wallpapers from Bing.

crawler csharp wpf

Last synced: 04 Nov 2024

https://github.com/khaleddallah/LinkedinScraper

A Python Scrapy project that parses people's profiles from LinkedIn search results and arranges the content in Excel and JSON files

crawler excel json linkedin python scraper scrapy spider

Last synced: 05 Nov 2024

https://github.com/thesp0nge/nightcrawler

A Python program that crawls a website and tries to stress it by polluting forms with bogus data

crawler offensive-scripts offensive-security stress-test web-crawler web-crawling

Last synced: 12 Oct 2024

https://github.com/amirhoseinsb/Cloud_Player_V2

You can use the Cloud Player tool to listen to music by the singer you want, very quickly and without visiting a specific website.

cloud-player crawler crawling music music-player programming python url-player

Last synced: 04 Aug 2024

https://github.com/fedebotu/neurips2022-openreviewdata

Crawl & Visualize NeurIPS 2022 Data from OpenReview

crawler dataset neurips neurips-2022 openreview peer-review review scraper

Last synced: 06 Nov 2024

https://github.com/yerkopalma/bash-crawler

:computer: Get a site's links with Bash

bash crawler

Last synced: 13 Oct 2024

https://github.com/eric2788/platformscrawler

Multi-platform crawler with modular management, used to collect data and publish it via Redis pub/sub

bilibili crawler crawling pubsub redis twitter youtube

Last synced: 11 Oct 2024

https://github.com/Antosser/web-crawler

Rust Web Crawler that finds every page, image, and script on a website (and downloads it)

crawler html rust seo web

Last synced: 24 Sep 2024

https://github.com/bugfishtm/bugfish-image-downloader

💾 Bugfish Image Downloader: Effortless web image downloads, subsite exploration, and HD selection. Windows app, .NET 4.5, no registry usage. Download now!

bugfish bugfish-software bugfishtm crawler downloader downloadmanager downloadtool gplv3 image imagedownloader imagedownloadertool imageprocessing portable-executable portableapps software utilityapp webscraping windows windows-desktop

Last synced: 06 Nov 2024

https://github.com/tikazyq/github-crawler

GitHub repository crawler

crawler scrapy

Last synced: 11 Oct 2024

https://github.com/AmirAref/DivarCrawler

A script to crawl divar.ir and extract phone numbers

crawler scraper selenium

Last synced: 05 Aug 2024

https://github.com/oscarnevarezleal/ecommerce-crawler

Parallel e-commerce crawler using Docker and Puppeteer on GCP

crawler gcp nodejs pubnub puppeteer

Last synced: 09 Aug 2024

https://github.com/integralist/go-web-crawler

A web crawler built in the Go programming language

concurrency crawler go golang web-crawler

Last synced: 11 Oct 2024

https://github.com/Hound-fm/podcatcher

Audio media crawler for LBRY.

crawler lbry python

Last synced: 03 Aug 2024

https://github.com/AmirAref/Torobot

An inline Telegram bot for easily accessing and searching torob.com products from Telegram.

crawler python python-telegram-bot scraper telegtam-bot

Last synced: 05 Aug 2024

https://github.com/dynesshely/everydaynews

A repository that fetches most news and information, then stores and organizes it.

crawler data fetcher network news

Last synced: 05 Nov 2024

https://github.com/gabfl/sitecrawl

Simple Python module to crawl a website and extract URLs
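Extracting URLs from a page can be sketched with the standard library's `html.parser` and `urllib.parse`. This illustrates the general technique under our own assumptions and is not sitecrawl's code; `LinkExtractor` and `extract_urls` are hypothetical names. Relative links are resolved against the page URL, fragments are dropped, and non-HTTP schemes such as `mailto:` are skipped.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then strip any "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")) and absolute not in self.links:
                    self.links.append(absolute)


def extract_urls(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = ('<a href="/about">About</a> <a href="https://example.org/x#top">X</a> '
        '<a href="mailto:hi@example.com">Mail</a>')
print(extract_urls(page, "https://example.com/index.html"))
# prints ['https://example.com/about', 'https://example.org/x']
```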

crawl crawler crawler-python crawling-sites

Last synced: 13 Oct 2024

https://github.com/feedeo/youtube-channel-crawler

YouTube Channel :tv: Crawler

crawler youtube youtube-channel

Last synced: 11 Oct 2024

https://github.com/luizppa/web-crawler

A web crawler that collects and indexes web pages. Made with Chilkat and the Gumbo parser.

chilkat cpp crawler webcrawler

Last synced: 28 Oct 2024

https://github.com/dotenorio/freeloader-of-data

A simple crawler or scraper to get Open Graph and other metadata from any website.

crawler graph hacktoberfest meta-data open-graph scraper

Last synced: 25 Oct 2024

https://github.com/capturr/jsonld-extract

A damn simple tool to extract JSON-LD metadata from a web page using a jQuery-like API (jQuery, Cheerio, CashDom, ...).

cashdom cheerio crawler crawling data extract extractor javascript jquery json jsonld metadata nodejs parser scraper scraping spider typescript

Last synced: 28 Oct 2024

https://github.com/ajcerejeira/base.gov.pt

A crawler that fetches data from base.gov.pt

crawler csv python scrapy

Last synced: 06 Nov 2024

https://github.com/karambir/ugc-colleges

A Python script to extract college names from the UGC (India) website.

college crawler extract html-parser python python-script ugc

Last synced: 24 Oct 2024

https://github.com/jean-baptiste-camps/iiif-crawler

Interrogate IIIF servers and get images of manuscripts

crawler iiif iiif-image manuscripts

Last synced: 11 Oct 2024

https://github.com/simin75simin/libgencrawl

Crawl all books from a Library Genesis search

crawler free-software libgen python3 scraper

Last synced: 05 Nov 2024

https://github.com/sayakie/pixiv-crawler

Crawls images from Pixiv 🚀

crawler nodejs pixiv typescript

Last synced: 28 Oct 2024

https://github.com/kernelerr/pixivsync

Pixiv image download and sync tool

crawler pixiv pixiv-crawler python

Last synced: 12 Oct 2024

https://github.com/iml1111/toonkor_collector

Toonkor comic collector

crawler python

Last synced: 21 Oct 2024

https://github.com/vmdang/historycrawler

An OOP project that collects and displays historical data about Vietnam

crawler gson java javafx jsoup

Last synced: 11 Oct 2024

https://github.com/roccomuso/is-bing

Verify that a request is from Bing crawlers using Bing's DNS verification steps

bing bot check crawler dns ip js nodejs verify

Last synced: 17 Oct 2024
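Bing's DNS verification steps, as we understand them, are a reverse DNS lookup on the requesting IP, a check that the returned hostname ends in `search.msn.com`, and a forward lookup confirming that hostname resolves back to the same IP. The Python sketch below shows that reverse-then-forward check. It is not is-bing's implementation: the function names are hypothetical, the suffix should be confirmed against Bing's current documentation, and the resolvers are injectable so the logic can be exercised without network access. `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
from __future__ import annotations

import socket
from typing import Callable


def reverse_lookup(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> list[str]:
    # socket.gethostbyname_ex returns (hostname, aliaslist, ipaddrlist); IPv4 only.
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: tuple[str, ...] = (".search.msn.com",),  # assumed Bing suffix
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], list[str]] = forward_lookup,
) -> bool:
    """Reverse-then-forward DNS check (hypothetical helper)."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    # The PTR record alone can be spoofed, so the hostname must match...
    if not host.endswith(allowed_suffixes):
        return False
    # ...and must resolve back to the original IP.
    try:
        return ip in forward(host)
    except OSError:
        return False
```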

https://github.com/vinitkumar/pycrawler

Crawler in Python 3.7, 3.8, 3.9, and PyPy3

crawler python python35 python36 utils

Last synced: 28 Oct 2024

https://github.com/haxzie-xx/crode.js-node-web-crawler

A Node.js crawler built to collect movie links from open FTP sites.

crawler nodejs

Last synced: 01 Nov 2024

https://github.com/birkhofflee/blizzard_forum.js

An unofficial Node.js API for Blizzard Forums. (works in 2019)

api crawler web

Last synced: 08 Oct 2024

https://github.com/juliandavidmr/raptor

A lightweight tool for scanning websites that works as a spider. Once executed, it starts scanning pages for websites to visit, indexing them automatically.

crawler kotlin mysql spider

Last synced: 11 Oct 2024

https://github.com/ernesto-jimenez/crawler

Easily crawl websites in Go.

crawler golang

Last synced: 13 Oct 2024

https://github.com/dist1ll/hltv-rust

A client to fetch and parse data from HLTV.org

api crawler hltv parser rust

Last synced: 14 Oct 2024

https://github.com/licoy/java-crawler

Crawling data in Java with the jsoup crawler framework

crawler java jsoup

Last synced: 19 Oct 2024

https://github.com/danielmorell/se_bot_checker

Validate search engine user agents and IP addresses.

crawler googlebot python search-engine spider

Last synced: 15 Oct 2024

https://github.com/vinouno/BilibiliDanmuCrawler

A Python project that crawls danmaku (bullet comments) from bilibili.com and generates word clouds

crawler python

Last synced: 27 Oct 2024

https://github.com/leomaurodesenv/smm-course-search

A package for searching Super Mario Maker courses

bookmark-site crawler javascript json mario-game mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/holmofy/spring-spider

Spring Spider App Utility Library.

crawler java spider spring spring-spider

Last synced: 27 Oct 2024

https://github.com/stopka/fedicrawl

Collect feeds to follow on Fediverse nodes.

crawler docker fediverse nodejs prisma typescript

Last synced: 05 Nov 2024

https://github.com/marzzzello/appstore_crawler

(mirror) Download the IDs and metadata of all apps in the Apple App Store

apple appstore crawler metadata scrapy

Last synced: 05 Nov 2024

https://github.com/agmmnn/nis-scraper

Scrapy script to scrape nisanyansozluk.com

cli crawler python scraper

Last synced: 04 Nov 2024

https://github.com/sauerbraten/chef

Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.

crawler extinfo go sauerbraten spy stalker

Last synced: 02 Aug 2024

https://github.com/testica/a3hrgo-sdk

An a3HRgo SDK to automate your reports

a3hrgo crawler javascript puppeteer

Last synced: 10 Oct 2024

https://github.com/jmkim/stock-crawler

Universal Stock Crawler

crawler stock stock-market yahoo-finance

Last synced: 13 Oct 2024

https://github.com/ruedigervoigt/salted

Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files

asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python

Last synced: 11 Oct 2024

https://github.com/ozansz/github-crawler

A basic utility for crawling users and their e-mail addresses

crawler github python python3

Last synced: 16 Oct 2024

https://github.com/rodyherrera/cdrake-se

✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and more using the different engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).

bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube

Last synced: 06 Nov 2024

https://github.com/qin2dim/istockphoto-go

📸 Gracefully download dataset from iStockPhoto.

colly crawler istockphoto

Last synced: 31 Oct 2024

https://github.com/roccomuso/is-baidu

Verify that a request is from Baidu crawlers using DNS verification

baidu crawler dns ip js nodejs verification

Last synced: 17 Oct 2024

https://github.com/thaddeusjiang/campcat

Campsite reservation monitoring bot

bot crawler telegram

Last synced: 25 Oct 2024

https://github.com/roccomuso/is-duckduck

Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo

crawler duckduck duckduckbot duckduckgo ip js nodejs verify web

Last synced: 17 Oct 2024

https://github.com/waynechang65/baha-crawler

baha-crawler is a web crawler module designed to scrape data from the Bahamut forum.

bahamut crawler javascript nodejs scraper spider webcrawler

Last synced: 19 Oct 2024

https://github.com/elliotxx/readnewspaper

Automatically fetches electronic editions of newspapers for convenient daily reading

crawler lxml newspaper pypdf2 python requests

Last synced: 06 Nov 2024

https://github.com/glutexo/onigumo

Parallel web scraping framework

crawler

Last synced: 26 Oct 2024

https://github.com/xdk78/grabbi

grabbi: a simple web scraper/crawler

crawler html scraper web-scraper

Last synced: 23 Oct 2024

https://github.com/arshadkazmi42/github-scanner-local

Locally scan all the repositories of a GitHub organization

bounty bug bug-bounty crawler github local no-api scanner

Last synced: 28 Oct 2024

https://github.com/arshadkazmi42/scraplink

Scraplink: a library for scraping links and image URLs from a web page

crawler mongdb nodejs scraplink url web

Last synced: 28 Oct 2024

https://github.com/dnlzrgz/winzig

A tiny search engine for personal use.

async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3

Last synced: 05 Nov 2024

https://github.com/vivekg13186/easy_web_crawler

A web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.

crawler spider web

Last synced: 28 Oct 2024

https://github.com/igeligel/TeamFortressOutpostApi

:repeat: An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.

bot bot-framework crawler steam steam-api steambot teamfortress2

Last synced: 02 Aug 2024

https://github.com/hrvadl/goweekly

Application for querying top articles from https://golangweekly.com/, translating them into Ukrainian, and sending them to a Telegram channel

article chatgpt crawler go golang openai-api telegram telegram-bot

Last synced: 13 Oct 2024

https://github.com/hangyan/generate-cs-word-dict

Generate a CS word dictionary from Stack Overflow/GitHub tags

crawler dict github python word

Last synced: 15 Oct 2024

https://github.com/chenmozhijin/mediawikiextractor

A Python script for extracting data from a MediaWiki website and saving it as JSON.

crawler crawler-python crawling extractor json mediawiki python regex web-crawler

Last synced: 09 Oct 2024

https://github.com/dylanhogg/legaldata

Provides access to Australian legal data

crawler data law lawtech legal legaltech

Last synced: 27 Oct 2024

https://github.com/bitebait/curry

🍛 Curry is a web crawler written in Go that checks the US dollar to Brazilian real exchange rate (USDxBRL) at several stores in Paraguay.

api brasil crawler currency-exchange-rates go golang paraguay webcrawler

Last synced: 02 Aug 2024

https://github.com/erikjiang/book_crawler

:lizard: book_crawler

crawler douban golang

Last synced: 14 Oct 2024

https://github.com/ayusharma/rss-parser

A simple crawler in ReactJS

crawler reactjs rss-parser

Last synced: 13 Oct 2024

https://github.com/polakosz/smf-scraper

You know, just for backup :smile: - The only, and therefore the best, Simple Machines Forum C# scraper on GitHub :cat:

crawler csharp forum machines php scraper simple simplemachines smf

Last synced: 30 Oct 2024

https://github.com/techguy-bhushan/web-spider

A multi-threaded web crawler

crawler python web-spider

Last synced: 12 Oct 2024

https://github.com/zabuzard/mplogger

Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. It then processes the data and creates corresponding market price articles on 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 31 Oct 2024

https://github.com/zhaoweih/meizitu-crawler

🕷️ Meizitu crawler (Scrapy)

crawler meizitu python scrapy spider

Last synced: 31 Oct 2024

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 23 Oct 2024

https://github.com/norconex/committer-neo4j

Implementation of Norconex Committer for Neo4j.

crawler neo4j neo4j-committer norconex-committer

Last synced: 10 Oct 2024

https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse

[GeckoFX/Firefox]: Shows how to intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change the user agent, and more.

browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms

Last synced: 13 Oct 2024

https://github.com/imthaghost/gocloneold

Website Cloner - Utilizes powerful go routines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 31 Oct 2024

https://github.com/erikmueller/jazmax

Crawl the JAZ calculator for the JAZ of different heat pumps, depending on flow and return temperatures

crawler data-science efficiency green heatpump jaz

Last synced: 14 Oct 2024