Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/polakosz/smf-scraper

You know, just for backup :smile: - The only so the best Simple Machines Forum C# scraper on GitHub :cat:

crawler csharp forum machines php scraper simple simplemachines smf

Last synced: 18 Dec 2024

https://github.com/congcoi123/crawler-sheis

A small crawler for getting data from the website: https://sheis.vn

crawler webcrawler webcrawling webscraper webscraping

Last synced: 31 Dec 2024

https://github.com/qianbinbin/moebooru-crawler

Retrieve links of images from moebooru-based sites, like yande.re and konachan.com .

crawler moebooru shell

Last synced: 17 Dec 2024

https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse

[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.

browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms

Last synced: 26 Jan 2025

https://github.com/eduardozepeda/go-web-crawler

A concurrent web crawler written in go that looks for exposed .git and .env uris.

crawler environment-variables git go pentesting security-audit

Last synced: 16 Jan 2025

https://github.com/Anakeyn/website-contextual-links

Récupération des liens contextuels d'un site Web avec R.

crawler gephi r

Last synced: 24 Nov 2024

https://github.com/lockblock-dev/crawlarr

Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.

crawler golang

Last synced: 24 Jan 2025

https://github.com/ging-dev/sitemap-crawler

Collect links through the sitemap.xml or robots.txt

crawler php php8 sitemap sitemap-crawler

Last synced: 18 Nov 2024

https://github.com/codeforequity-at/botium-crawler

Botium Crawler - Like a Website Crawler, just for Conversation Flows

botium chatbots crawler

Last synced: 20 Oct 2024

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full stack program of knowledge graph visualization 一个关于知识图谱可视化的全栈项目

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 21 Dec 2024

https://github.com/manojahi/is-there-any-song-reference-in-article

It will tell if there are any songs references in article from a website.

crawler lyrics-search python webscraping

Last synced: 01 Jan 2025

https://github.com/zekrotja/r34-crawler

A simple CLI tool to fetch and download images from rule34.xxx

crawler go rest-api rule34 worker-pool xml

Last synced: 17 Dec 2024

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 19 Dec 2024

https://github.com/coverified/spider

A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)

akka crawler graphql hacktoberfest microservice spider

Last synced: 25 Dec 2024

https://github.com/kapitanluffy/sunny-crawler

That moment when I tried learning things about "Big Data" and "Inverted Indexes"

big-data crawler inverted-index php search

Last synced: 14 Dec 2024

https://github.com/darealfreak/figure-tracker

application to keep watch of wished figures on multiple sites and notify you about auctions, sales or sudden price drops

crawler figure-tracker monitoring

Last synced: 11 Dec 2024

https://github.com/yidas/tw-stock-crawler-php

PHP Crawler for Taiwan Stock Data (台股資料爬蟲)

crawler stock taiwan taiwan-stock-information taiwan-stock-market

Last synced: 29 Oct 2024

https://github.com/camara94/crawlers

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere

crawler python scraping scrapy spider

Last synced: 23 Dec 2024

https://github.com/travorlzh/temperature-analyzer

Python crawler that helps fetch temperature of Beijing, China

crawler homework python variance

Last synced: 17 Jan 2025

https://github.com/mohammadrezaamani/squirrel

Squirrel is a web crawler designed to collect all pages from Iranian websites, enabling you to download and store web page content in a structured format.

crawler iran python

Last synced: 21 Dec 2024

https://github.com/ctf-archives/live-photo-crawler

实时图床的图像爬取脚本

crawler pailixiang photoplus

Last synced: 29 Jan 2025

https://github.com/harryandriyan/21scrap

Cinema XXI movie data scraper

crawler python scrapy

Last synced: 21 Jan 2025

https://github.com/viclafouch/pe-crawler

📌 An automated system that serves data extracted from the Google Help Center

crawler javascript nodejs postgresql sequelize

Last synced: 29 Jan 2025

https://github.com/exp-codes/sina-crawler

新浪博客爬虫

crawler programming

Last synced: 16 Dec 2024

https://github.com/rudrakshi99/web_crawler

A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.

crawler python spider

Last synced: 22 Nov 2024

https://github.com/aicore/app_info_extracter

This application would be used to extract information about apps from the internet

android appreview apps crawler googleplaystore

Last synced: 13 Nov 2024

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data-sources, like Youtube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 15 Dec 2024

https://github.com/z3ntl3/redeye

Crawl real and new user agents from the most major 2 databases.

crawler header ua user-agents useragents

Last synced: 16 Dec 2024

https://github.com/ysh329/stock-newspaper-crawler

[UNMAINTAINED]Crawl 4 kinds of finance newspaper corpus (from CCSTOCK.CN).

corpus crawled-data crawler database stock-newspaper-crawler

Last synced: 16 Dec 2024

https://github.com/nemmusu/free-vpn-downloader

This repository contains three Python scripts designed to simplify the process of downloading and configuring free VPN .ovpn files for use with OpenVPN.

automation crawler download downloader free freevpn openvpn ovpn ovpn-files vpn

Last synced: 30 Jan 2025

https://github.com/buaadreamer/buaastar

北航星球网站 北航2021年夏季学期Python英文课大作业

crawler css flask html javascript python

Last synced: 23 Jan 2025

https://github.com/leveled-up/memedl

Memedl is a very simple tool to download the latest images from a specific sub reddit.

crawler download extract images javascript meme memes node reddit regex rip

Last synced: 23 Dec 2024

https://github.com/skulltech/arachnid

Crawling Instagram for reasons.

crawler instagram instagram-scraper python3 scraper scrapy

Last synced: 01 Feb 2025

https://github.com/rdil/crawley

My attempt at a web crawler.

bs4 crawler python python3 web

Last synced: 04 Jan 2025

https://github.com/natshah/natshah-crawler

Natshah Crawler works to crawl a selected domain with all it's internal links and internal pages.

crawler database filter natshah-crawler

Last synced: 14 Dec 2024

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in typescript, built on top of fast-xml-parser and relying on few dependencies

crawler node-js sitemap

Last synced: 21 Dec 2024

https://github.com/denrydu/baiduimagecrawler

自己写的两个用来爬取百度图片的脚本,方便CV研究者制作数据集。Two ways to download images from baidu, useful tool for making cv datasets!

baidu crawler dynamic python3

Last synced: 27 Dec 2024

https://github.com/galaxiat/galaxiat.serve.seo

Node.JS package to serve React app and prerender path (cron)

crawler cron puppeteer seo seo-optimization ssr

Last synced: 23 Dec 2024

https://github.com/spraakbanken/svt-crawler

Programme for crawling SVT's API for news articles and converting the data to XML.

corpus crawler

Last synced: 28 Jan 2025

https://github.com/sangupta/shopify-burst-crawler

Simple crawler to download meta information for all stock pics from Shopify Burst website

burst crawler java shopify stock-photos

Last synced: 08 Nov 2024

https://github.com/antoinegagne/treewalker

A web crawler in Erlang that respects `robots.txt`.

crawler erlang webcrawler

Last synced: 20 Dec 2024

https://github.com/sieep-coding/web-crawler

A simple web crawler implemented in Go.

crawler go golang web-crawler

Last synced: 16 Jan 2025

https://github.com/keosariel/ramby

Ramby is a simple way to setup a webscraper

beautifulsoup crawler python3 webscraping

Last synced: 01 Feb 2025

https://github.com/gabrielrf/bsbdf

Telegram Public Channel

crawler python telegram telegram-channel telegraph

Last synced: 13 Jan 2025

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 21 Jan 2025

https://github.com/flavien-hugs/scrapy-test

Manipulation de la librairie Scrapy. Mini script permet d'extraire l'ensemble des personnages de dessin animé sur Wikipedia.

crawler python scraping scrapy

Last synced: 03 Feb 2025

https://github.com/jfcherng/wiki-cgroup-crawler

此腳本用於抓取維基百科的公共轉換組詞庫,並將結果儲存為外部檔案。

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 22 Jan 2025

https://github.com/maxiroellplenty/gs-robot

NodeJs tool to scrap gelbe-seiten

axios cheerio crawler gelbe-seiten nodejs scraper yargs

Last synced: 23 Jan 2025

https://github.com/supadata-ai/js

Official TypeScript/JavaScript SDK for the Supadata API.

ai crawler llm markdown scraper transcript web-crawler youtube

Last synced: 27 Jan 2025

https://github.com/tubone24/askfm-qa-crawler

Crawl Ask.fm QA lists and create corpus for ML.

askfm chromedriver corpus-builder crawler selenium

Last synced: 25 Dec 2024

https://github.com/camilamaia/crawl4us

[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️

beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping

Last synced: 02 Feb 2025

https://github.com/lukasherz/22fs-sc-twitter-crawler

used for a research project in social computing @ uzh (fs22)

crawler crawling database twitter twitter-api-v2

Last synced: 25 Dec 2024

https://github.com/jovijovi/ether-crawler

A transaction crawler for the Ethereum ecosystem.

blockchain crawler ether ethereum transaction

Last synced: 16 Jan 2025

https://github.com/bingxyz/blackcat

使用telegram bot查詢黑貓物流

crawler nodejs telegram-bot

Last synced: 22 Jan 2025

https://github.com/priyakdey/github-api-crawler

A crawler to crawl and save the APIs found in the Public APIs github repo - https://github.com/public-apis/public-apis. Visit README for details.

api crawler mongo python3

Last synced: 02 Feb 2025

https://github.com/estroz/seekret

Seekret is a sensitive data crawler for GitHub repositories

crawler security

Last synced: 25 Dec 2024

https://github.com/akashrajpurohit/node-crawler

Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain

crawler node-crawler nodejs url

Last synced: 25 Dec 2024

https://github.com/im-perativa/public_crawler

A collection of crawler project for Indonesia dataset

crawler indonesia indonesia-api scrapy

Last synced: 25 Jan 2025

https://github.com/microlinkhq/ua

A simple redis primitives to incr() and top() user agents

crawler redis user-agent user-agent-parser

Last synced: 12 Jan 2025

https://github.com/danielemoraschi/go-sitemap-common

Simple GO sitemap generator and crawler.

crawler golang sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/vindecodex/automated-crawler-wget

Using wget to crawl site

crawler shell-script

Last synced: 01 Jan 2025

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 01 Feb 2025

https://github.com/krishpranav/gocralwer

A awsome crawler made in go

crawler

Last synced: 18 Jan 2025

https://github.com/sefinek/niedlascamu.pl-tracker

Śledzenie zmian na stronie niedlascamu.pl.

crawl crawler niedlascamu niedlascamu-pl tracker tracking

Last synced: 02 Feb 2025

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of webpages in series by number and creates a list of the available links.

crawler lister

Last synced: 01 Feb 2025

https://github.com/vietdoo/sg-property-hub

SG Property Hub is a comprehensive platform for managing and analyzing property data.

airflow celery-redis crawler etl etl-pipeline fastapi minio mongodb nextjs postgresql s3 spark webscraping

Last synced: 13 Dec 2024

https://github.com/codelegant/movie-crawler-api

淘宝,猫眼,格瓦拉影票信息抓取接口

async await crawler mongoose request

Last synced: 18 Dec 2024

https://github.com/suddi/fundscraper

Collection of web crawlers to scrape fund data using Scrapy

crawler funds scraper scrapy

Last synced: 11 Oct 2024

https://github.com/pjullrich/link-crawler

Python Crawler that reports broken links on a given website and its sup-pages

asyncio breadth-first-search broken-links crawler python

Last synced: 23 Jan 2025

https://github.com/beanwei/zmt-post-crawler

Crawler the ZMT platform site ,put the author id, get the post list.This project is coding for my friend

crawler golang golang-ui

Last synced: 28 Dec 2024

https://github.com/j-hoplin/naver_news_headtopic_news_scraper

네이버 뉴스에서 헤드라인 뉴스 스크레이핑

crawler naver-news scraper

Last synced: 05 Feb 2025

https://github.com/purrproof/smartcrawl

An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 27 Jan 2025

https://github.com/somehowchris/swisslos-cralwer

(WIP) Crawler to access the current and history numbers of swisslos

crawler euromillions lotto rust swisslos

Last synced: 27 Jan 2025

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 29 Jan 2025

https://github.com/tcc0lin/magiccrawler

Collect all kinds of interesting crawler scripts and tackle them against the anti-climbing method :bowtie::heavy_check_mark::heavy_check_mark::heavy_check_mark:

crawler python3 spider

Last synced: 18 Jan 2025

https://github.com/curegit/nominium

個人間取引サイトの新着商品をメールなどで通知するクローラーシステム

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 18 Jan 2025

https://github.com/zabuzard/wslotter

WSlotter is a Selenium driven tool for assigning to events on 'https://www.gruppe-w.de'.

bot crawler gruppe-w

Last synced: 12 Jan 2025

https://github.com/srx-2000/swaiter

a programe to wait until the selenium element has loaded——selenium模拟器元素等待程序

crawler selenium selenium-python

Last synced: 22 Jan 2025

https://github.com/andmerk93/scrapy_parser_pep

Учебный проект на Scrapy, парсит PEP, выводит в 2х форматах

crawler scrapy

Last synced: 24 Jan 2025

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 16 Jan 2025

https://github.com/moontai0724/auto-notify-pu-courses-quota

A small crawler to fetch remains quota of a list of courses in Providence University every 2 to 10 minutes, then send webhook when change.

crawler javascript nodejs

Last synced: 01 Feb 2025

https://github.com/dangdungcntt/crawl-fb-v2

Simple script to detect email and phone from facebook comment.

crawler facebook

Last synced: 18 Jan 2025

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 14 Jan 2025

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 01 Feb 2025

https://github.com/naveenaidu/google-crawler

Google Crawler - Curates the search results

beautifulsoup crawler scraper

Last synced: 18 Jan 2025

https://github.com/karantyagi/web-crawler

BFS and DFS implementations for a wikipedia crawler

beautifulsoup crawler

Last synced: 12 Jan 2025