Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

https://github.com/pjt3591oo/exchange-crawler

Crawler for Upbit and Coinone

crawler data exchange python

Last synced: 26 Dec 2024

https://github.com/shunk031/lineblogscraper

Scraper for LINE Blog in Scrapy

crawler lineblog scraper scrapy

Last synced: 10 Jan 2025

https://github.com/yjg30737/pyqt-google-image-crawler

Crawls image files from Google search results with Python and icrawler

beautifulsoup4 crawler icrawler image-crawler pyqt pyqt5 pyqt5-desktop-application

Last synced: 03 Jan 2025

https://github.com/imthaghost/gocloneold

Website Cloner - Uses goroutines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 19 Dec 2024

https://github.com/imkrunalkanojiya/seo-checker

Resolve your SEO-related issues using the SEO Checker REST API

crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools

Last synced: 03 Jan 2025

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 23 Oct 2024

https://github.com/techguy-bhushan/web-spider

Multi-threaded web crawler

crawler python web-spider

Last synced: 17 Jan 2025

https://github.com/zabuzard/mplogger

Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. It then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 19 Dec 2024

https://github.com/andreoliwa/scrapy-tegenaria

🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢

crawler flask postgresql python python3 scrapy

Last synced: 11 Jan 2025

https://github.com/wangyihang/acw-sc-v2-py

Python requests.HTTPAdapter for `acw_sc__v2`

acw-sc-v2 crawler waf

Last synced: 05 Jan 2025

https://github.com/systemfsoftware/youtube-autocomplete-scraper

YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.

actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api

Last synced: 11 Jan 2025

https://github.com/fernandod1/yahoo-finance-scraper

This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them in a local text file.

crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api

Last synced: 12 Jan 2025

https://github.com/madis/flatcrawl

Clojure app for crawling apartment information from http://kv.ee

clojure crawler real-estate webapp

Last synced: 12 Jan 2025

https://github.com/zabuzard/songcrawler

Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.

command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler

Last synced: 12 Jan 2025

https://github.com/nakabonne/staticcollector

Application to analyze static files of competing sites

crawler go golang

Last synced: 14 Dec 2024

https://github.com/eklem/browsercrawler

Crawls content from a site within the browser. A basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 19 Dec 2024

https://github.com/gabrielrf/bsbdf

Telegram Public Channel

crawler python telegram telegram-channel telegraph

Last synced: 13 Jan 2025

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 24 Jan 2025

https://github.com/idanhoro/nasa-heat-maps-prediction

In this project we research the correlations between different weather conditions and try to predict future scenarios by using image processing and traditional machine learning algorithms

beautifulsoup crawler machine-learning pillow prediction python sklearn

Last synced: 20 Jan 2025

https://github.com/highbreed/web-crawler

A web crawler script that crawls the target website and lists its links

crawler crawling python3

Last synced: 13 Jan 2025

https://github.com/zhaoweih/meizitu-crawler

🕷️ Meizitu (妹子图) crawler built with Scrapy

crawler meizitu python scrapy spider

Last synced: 31 Oct 2024

https://github.com/0000xffff/webgrab

web page: crawler / file scanner / downloader

crawler download downloader scrape scraper webcrawler

Last synced: 19 Jan 2025

https://github.com/telanflow/scrago

A micro crawler framework implemented in Go.

crawler go micro-framework spider

Last synced: 19 Jan 2025

https://github.com/denrydu/baiduimagecrawler

Two self-written scripts for crawling images from Baidu, a useful tool for CV researchers making datasets.

baidu crawler dynamic python3

Last synced: 27 Dec 2024

https://github.com/thiiagoms/dict-crawler

Simple crawler on UOL dictionary

beautifulsoup4 crawler dic python pythonic

Last synced: 16 Jan 2025

https://github.com/litingyes/cobweb

Collect, store and distribute meaningful static data

apis bing-image bing-wallpapers crawler image random-image

Last synced: 05 Dec 2024

https://github.com/marabesi/social-crawler

Easy way to find emails from social networks

crawler emails php social-crawler social-network

Last synced: 11 Nov 2024

https://github.com/linkspreed/twig

Twig🔍 - the fastest and safest search engine📐 for the web🌐, images🤳, news📰 and much more

crawler engine search search-engine web5

Last synced: 03 Jan 2025

https://github.com/der3318/zijfhchat-crawler

Chat room crawler and real-time lookup for the mobile game 「紫禁繁花」

crawler dashboard line-notify

Last synced: 13 Jan 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Some Simple Information About Aparat Videos for DataScientists

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 21 Jan 2025

https://github.com/microlinkhq/ua

Simple Redis primitives to incr() and top() user agents

crawler redis user-agent user-agent-parser

Last synced: 12 Jan 2025

https://github.com/moontai0724/auto-notify-pu-courses-quota

A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.

crawler javascript nodejs

Last synced: 06 Dec 2024

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 14 Jan 2025

https://github.com/jfcherng/wiki-cgroup-crawler

This script fetches Wikipedia's public conversion group (CGroup) dictionaries and saves the results to external files.

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 22 Jan 2025

https://github.com/baerwang/sec_craw

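A crawler that helps security researchers collect daily security news. It currently covers 90sec, the Kanxue (看雪) forum, v2ex, the Jingyi (精易) forum, the 52pojie (52破解) forum, and other lab blogs, and is updated continuously.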

crawler security security-tools threat threat-intelligence

Last synced: 21 Jan 2025

https://github.com/saketh7382/smartcrawler

Package for crawling items from webpages and storing them as a JSON file

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 08 Dec 2024

https://github.com/tsoliangwu0130/ptt-search

A simple Python script to fetch PTT posts from the command line.

crawler ptt python

Last synced: 08 Jan 2025

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 11 Nov 2024

https://github.com/marzzzello/gplaycrawler

(mirror) Discover apps by different methods. Mass download app packages and metadata.

crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper

Last synced: 23 Dec 2024

https://github.com/krishpranav/gocralwer

An awesome crawler made in Go

crawler

Last synced: 18 Jan 2025

https://github.com/maxiroellplenty/gs-robot

Node.js tool to scrape gelbe-seiten

axios cheerio crawler gelbe-seiten nodejs scraper yargs

Last synced: 23 Jan 2025

https://github.com/pjullrich/link-crawler

Python crawler that reports broken links on a given website and its subpages

asyncio breadth-first-search broken-links crawler python

Last synced: 23 Jan 2025

https://github.com/tcc0lin/magiccrawler

Collects all kinds of interesting crawler scripts and pits them against anti-crawling measures :bowtie::heavy_check_mark::heavy_check_mark::heavy_check_mark:

crawler python3 spider

Last synced: 18 Jan 2025

https://github.com/curegit/nominium

A crawler system that notifies you of newly listed items on consumer-to-consumer marketplace sites by email and other channels

c2c chromium crawler ecommerce firefox selenium shopping webdriver

Last synced: 18 Jan 2025

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the selenium library

bot crawler python scraper selenium web wrapper

Last synced: 19 Dec 2024

https://github.com/ghost---shadow/feature-extractor-from-codebase

Copies the target java file and all its dependencies recursively to another directory

code-splitting crawler

Last synced: 16 Jan 2025

https://github.com/hamidrabedi/digikala-crawler

A crawler for Digikala built with the Django framework, Selenium, and a REST API. Also scrapes data from the gathered URLs.

crawler digikala digikala-crawler django python scraper

Last synced: 14 Dec 2024

https://github.com/greatdrake/contributecounter

Crawls Wikipedia for contributors

crawler python scraping

Last synced: 14 Dec 2024

https://github.com/cryptoc1/earl

Earl is looking for URLs in your area.

crawler middleware nuget webscraping

Last synced: 27 Jan 2025

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes a user-defined function on each of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 19 Dec 2024

https://github.com/juangesino/gazette

A personal news aggregator application using Meteor.

crawler meteor meteorjs news news-aggregator news-feed scraper

Last synced: 23 Jan 2025

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 06 Dec 2024

https://github.com/h4r5h1t/crawlytics

A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.

appsec crawler crawler-python mechanicalsoup security security-tools webcrawler

Last synced: 28 Dec 2024

https://github.com/srx-2000/swaiter

A program that waits until a Selenium element has loaded

crawler selenium selenium-python

Last synced: 22 Jan 2025

https://github.com/enansari/guess-price-car

Car price estimation with machine learning, based on information from a car sales site | final project for Maktabkhooneh

crawler jadi machine-learning maktabkhoone maktabkhooneh python

Last synced: 09 Jan 2025

https://github.com/ccrashzer0/web_crawler

A Python-based web crawler

crawler internet python python3 webcrawler

Last synced: 27 Jan 2025

https://github.com/pythoript/pgn-scraper

PGN Scraper is a command-line application written in Go, designed to scrape Portable Game Notation (PGN) files and related formats from the internet.

7zip cbv chess chessbase cli command-line-tool crawler downloader go golang open-source pgn pgn-extract scid scraper web-crawler web-scraper zip

Last synced: 23 Jan 2025

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor shares details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Dec 2024

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 01 Dec 2024

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 31 Dec 2024

https://github.com/machu-gwu/crawlib-project

Tool set for crawler projects.

crawler framework mongodb python scrapy

Last synced: 18 Jan 2025

https://github.com/sefinek/niedlascamu.pl-tracker

Tracks changes on the niedlascamu.pl website.

crawl crawler niedlascamu tracker tracking

Last synced: 07 Dec 2024

https://github.com/zhoudaxia233/unilogo

A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.

crawler python visualization

Last synced: 15 Dec 2024

https://github.com/idlesign/gallerycrawler

Generic crawling for galleries

crawler gallery images python3

Last synced: 17 Dec 2024

https://github.com/purrproof/smartcrawl

An adaptable framework for gathering, aggregating and analyzing data, focusing on blockchain and smart contracts.

blockchain cli crawler explorer framework go golang hacktoberfest

Last synced: 27 Jan 2025

https://github.com/openpj/manifoldcf-sdk

Apache ManifoldCF SDK is a Maven project focused on helping developers to extend ManifoldCF with new connectors and extensions

apache crawler docker ecm extensions integrations manifoldcf migration sdk search

Last synced: 25 Jan 2025

https://github.com/coghost/crawlers

crawlers in one

crawler python3 staticimg weibo

Last synced: 02 Jan 2025

https://github.com/zhanymkanov/marketplace_parser

Products and Reviews Crawler

crawler python scrapy

Last synced: 14 Jan 2025

https://github.com/abdus/scrape-web

A simple web scraper for Node.js

crawler web-scraping web-scrapper

Last synced: 03 Dec 2024

https://github.com/marcinrek/sauron

Basic page crawler written in Node.js

crawler json node-js nodejs requests

Last synced: 29 Nov 2024

https://github.com/excaliburhan/littlenews

A news app via electron

crawler electron rss-feed

Last synced: 28 Jan 2025

https://github.com/loggerhead/dianping_crawler

Dianping (大众点评) crawler based on Scrapy (Python 3.5)

crawler python-3-5

Last synced: 24 Jan 2025

https://github.com/mkfsn/chronos

A light cron-like container service - create cron job easily.

crawler cron cronjob golang

Last synced: 22 Jan 2025

https://github.com/zhs007/lottery-crawler

A crawler based on jarvis-task, mainly used to crawl lottery data.

crawler jarvis-task

Last synced: 03 Jan 2025

https://github.com/amirsorouri00/dsl-se

This is an MVP built for the "Search Engine And Data Mining" course. The idea behind this project is the forked project which its link provided is

container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine

Last synced: 19 Jan 2025

https://github.com/konradlinkowski/mailcrawler

Crawler to find emails on websites

crawler scraper

Last synced: 26 Jan 2025

https://github.com/elektrostudios/gamefaqs-platform-exclusive-games-scraper

Crawls exclusive video games released for the platforms specified on GameFAQs website to generate a table in Markdown format with the crawled titles.

console-app console-application crawler dotnet game gamefaqs games megadrive netframework nintendo ps3 ps4 ps5 scraper snes vbnet videogame videogames windows xbox

Last synced: 01 Dec 2024

https://github.com/konradlinkowski/wikipediafinder

Find words in a wiki page

crawler scraper wikipedia

Last synced: 26 Jan 2025

https://github.com/redco/goose-phantom-environment

Environment for the Goose parser that allows running it in PhantomJS

crawler environment goose goose-parser nodejs parse parser phantomjs scraper

Last synced: 22 Dec 2024

https://github.com/laurybueno/monibus

Bus monitoring API for São Paulo

api crawler django docker mapping sptrans

Last synced: 27 Jan 2025

https://github.com/victorhuu/amazonmovieintegration

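This repository is the first individual assignment for the data warehouse course at Tongji University: organizing Amazon movie data using crawlers and ETL tools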

crawler data-warehouse movies pandas scrapy xpath

Last synced: 26 Jan 2025

https://github.com/sonhm3029/crawl-data-bot

This project builds a basic bot for crawling data from the web, including text and image data

crawler google medical vietnamese

Last synced: 17 Jan 2025

https://github.com/maxmindlin/swarm

Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.

crawler golang mongodb

Last synced: 06 Dec 2024

https://github.com/camilamaia/crawl4us

[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️

beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping

Last synced: 08 Dec 2024

https://github.com/simonrichardson/crwlr

Crawl all the things!

crawler meshuggah

Last synced: 01 Dec 2024