Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-24 00:06:40 UTC
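As a rough illustration of the "systematically browses" part of the definition above, here is a minimal breadth-first crawl sketch in Python. It is not taken from any project listed here. The `crawl` function, the `LinkParser` class, the `fetch` callback, and the `example.test` pages are all hypothetical names chosen for the example, and an in-memory dictionary stands in for real HTTP requests so the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None on failure."""
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])  # frontier of URLs still to fetch
    visited = []                # URLs fetched successfully, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c#top">C</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

A real crawler would replace `PAGES.get` with an HTTP client and would normally also honor robots.txt, rate limits, and per-host politeness, none of which this sketch attempts.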
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 11 Jan 2025
https://github.com/systemfsoftware/youtube-autocomplete-scraper
YouTube AutoComplete Scraper - An Apify actor that scrapes YouTube's search suggestions with intelligent deduplication using pglite and trigram similarity matching. Perfect for content research, SEO, and trend analysis.
actor apify autocomplete crawler deduplication pglite scraper search similarity suggestions trigram youtube youtube-api
Last synced: 11 Jan 2025
https://github.com/tufayellus/linkedin-cv-downloader
Python-based GUI automation software for bulk-downloading LinkedIn CVs / LinkedIn resumes from a list of profile links
crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine
Last synced: 23 Jan 2025
https://github.com/zabuzard/mplogger
Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' in a database. Then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.
bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api
Last synced: 19 Dec 2024
https://github.com/wangyihang/acw-sc-v2-py
Python requests.HTTPAdapter for `acw_sc__v2`
Last synced: 05 Jan 2025
https://github.com/fernandod1/yahoo-finance-scraper
This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them in a local text file.
crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api
Last synced: 12 Jan 2025
https://github.com/madis/flatcrawl
Clojure app for crawling apartment information from http://kv.ee
clojure crawler real-estate webapp
Last synced: 12 Jan 2025
https://github.com/nakabonne/staticcollector
Application to analyze static files of competing sites
Last synced: 14 Dec 2024
https://github.com/zabuzard/songcrawler
Crawls all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.
command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler
Last synced: 12 Jan 2025
https://github.com/eklem/browsercrawler
Crawling content from a site within the browser. A basis for, e.g., a search solution for static sites.
crawler search-engine website-generation
Last synced: 19 Dec 2024
https://github.com/gabrielrf/bsbdf
Telegram Public Channel
crawler python telegram telegram-channel telegraph
Last synced: 13 Jan 2025
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 24 Jan 2025
https://github.com/idanhoro/nasa-heat-maps-prediction
In this project we research the correlations between different weather conditions and try to predict future scenarios by using image processing and traditional machine learning algorithms
beautifulsoup crawler machine-learning pillow prediction python sklearn
Last synced: 20 Jan 2025
https://github.com/highbreed/web-crawler
A web crawler script that crawls the target website and lists its links
Last synced: 13 Jan 2025
https://github.com/0000xffff/webgrab
web page: crawler / file scanner / downloader
crawler download downloader scrape scraper webcrawler
Last synced: 19 Jan 2025
https://github.com/telanflow/scrago
A micro crawler framework implemented in Go.
crawler go micro-framework spider
Last synced: 19 Jan 2025
https://github.com/ph-7/gettermails
GetterMails, Scraper
bot crawler email php python retrieve-web-page scrape scraper scraping scraping-websites scrapper webdriver
Last synced: 19 Jan 2025
https://github.com/denrydu/baiduimagecrawler
Two self-written scripts for crawling Baidu Images, handy for CV researchers building datasets. Two ways to download images from Baidu, a useful tool for making CV datasets!
Last synced: 27 Dec 2024
https://github.com/thiiagoms/dict-crawler
Simple crawler on UOL dictionary
beautifulsoup4 crawler dic python pythonic
Last synced: 16 Jan 2025
https://github.com/litingyes/cobweb
Collect, store and distribute meaningful static data
apis bing-image bing-wallpapers crawler image random-image
Last synced: 05 Dec 2024
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 11 Nov 2024
https://github.com/alatiera/ellinofreneia-crawler
Crawler of ellinofreneianet.gr for offline content consumption
Last synced: 01 Jan 2025
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 31 Dec 2024
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 27 Nov 2024
https://github.com/chen0040/ios-stock-tracker
Stock tracker implemented using Objective-C for iOS
crawler ios-app objective-c stock-prices
Last synced: 16 Dec 2024
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.
Last synced: 06 Dec 2024
https://github.com/efishery/wpi-kkp-crawler
A crawler for fisheries prices on wpi.kkp.go.id
Last synced: 02 Jan 2025
https://github.com/princed/specht
Check links found in html or js files by pattern
cli crawler html javascript streams
Last synced: 19 Jan 2025
https://github.com/developerjosh/gogo-crawler
The tool kit for making an anime website with a database full of anime
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 17 Jan 2025
https://github.com/1970mr/link-crawler
Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.
clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper
Last synced: 11 Nov 2024
https://github.com/sonhm3029/crawl-data-bot
A base bot for crawling data from the web, including text data and image data
crawler google medical vietnamese
Last synced: 17 Jan 2025
https://github.com/0xpr03/clantool
CF Management & Data Analysis Tool, crawler backend in rust
backend-server crawler data-analysis rust
Last synced: 02 Jan 2025
https://github.com/woorim960/nate.com-comments-crawler
nate.com-comments-crawler
chromedriver crawler python3 selenium
Last synced: 28 Dec 2024
https://github.com/jimmy-ly00/dhe-prime-grabber
Grabs Diffie-Hellman primes from certificates using OpenSSL. Uses multiprocessing to collect over 50 million Diffie-Hellman primes.
certificate certificates crawler dhe-prime-grabber diffie-hellman ipv4 multiprocessing openssl prime prime-numbers python python-3
Last synced: 29 Dec 2024
https://github.com/dean9703111/humandesign_nodejs
Uses a Node.js crawler to scrape information from a Human Design web page, then saves it to a Google spreadsheet in the cloud
crawler googlesheetapi googlesheets nodejs
Last synced: 12 Jan 2025
https://github.com/suddi/fundscraper
Collection of web crawlers to scrape fund data using Scrapy
Last synced: 11 Oct 2024
https://github.com/exasol/error-code-crawler-maven-plugin
Validator and crawler for exasol-error-codes in Java code
catalog crawler error-handling error-report error-reporting exasol exasol-integration java unification
Last synced: 13 Jan 2025
https://github.com/liebki/githubnet
This library allows you to retrieve several things from GitHub, things like trending repositories, profiles of users, the repositories of users and related information.
crawler crawling github github-trending htmlagilitypack microsoft
Last synced: 24 Jan 2025
https://github.com/captain-woof/zhi-zhu
Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages, and all URLs appearing in them, for specific (regex) words.
crawler crawler-python crawling-python python3
Last synced: 31 Dec 2024
https://github.com/christopher-besch/therapy_search
Compute Call Times from arztsuche-bw into a Calendar.
appointments calendar crawler gatsby therapy time-management typescript
Last synced: 28 Dec 2024
https://github.com/juangesino/gazette
A personal news aggregator application using Meteor.
crawler meteor meteorjs news news-aggregator news-feed scraper
Last synced: 23 Jan 2025
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course
crawler crawler-python data-science pytest python
Last synced: 18 Jan 2025
https://github.com/appliedsoul/crawlmatic
Static and Dynamic website crawling library - a common promise based wrapper around node-crawler & hccrawler libraries.
Last synced: 30 Dec 2024
https://github.com/eea/eea-crawler
EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).
airflow-dags crawler elasticsearch etl-pipeline indexing
Last synced: 24 Jan 2025
https://github.com/zephyrpersonal/github-trending-crawler
transform github-trending repos to json data
cheerio crawler fetch github node repository spider trending
Last synced: 28 Nov 2024
https://github.com/sefinek/niedlascamu.pl-tracker
Tracking changes on the niedlascamu.pl website.
crawl crawler niedlascamu tracker tracking
Last synced: 07 Dec 2024
https://github.com/basemax/jadi-net-blog
This Python script is used to extract posts from a WordPress blog (https://jadi.net/) and save them in HTML format. The script fetches the RSS feed, parses the posts, and saves each post as an individual HTML file.
blog-copier copier crawler crawler-python crawlers jadi-blog jadi-clone jadi-net-blog jadi-net-clone jadinet-blog py python python-crawler wordpress wp
Last synced: 24 Jan 2025
https://github.com/teal33t/base_crawler
Simple scaffold for selenium based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 23 Jan 2025
https://github.com/sinkaroid/webnovelcrawler
Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.
Last synced: 23 Dec 2024
https://github.com/pxlrbt/website-diff
Utility tool that bundles a crawler and BackstopJS for visual regression testing.
backstopjs crawler visual-regression-testing
Last synced: 28 Nov 2024
https://github.com/j-hoplin/naver_news_headtopic_news_scraper
Scraping headline news from Naver News
Last synced: 11 Dec 2024
https://github.com/orsinium-labs/gpcc
Python library and CLI tool to fetch information from GPC Browser (https://gpc-browser.gs1.org/)
Last synced: 17 Jan 2025
https://github.com/arshadkazmi42/gh-crawl
Crawler for GitHub repositories. Finds all the broken links in the repositories
bug-bounty-recon crawl crawler gh-crawler github github-crawler githubcrawler python
Last synced: 21 Dec 2024
https://github.com/dean9703111/shopee_find_mac
Find an inexpensive Mac that meets your required specs as quickly as possible
argparse crawler mac pip python python2 xlsxwriter
Last synced: 12 Jan 2025
https://github.com/camilamaia/crawl4us
[WIP] A Python web crawler looking wildly for tables 🕵️‍♀️
beautifulsoup4 crawler crawling pypi python-3 python-module scraper scraping tables web-scraping
Last synced: 08 Dec 2024
https://github.com/skylightqp/namu2csv
A namuwiki crawler that converts header to csv file for kartrider wiki
Last synced: 08 Dec 2024
https://github.com/dylanhogg/cloud-products
A package for getting cloud products and product descriptions from a cloud provider website.
aws cloud-products crawler data text-processing
Last synced: 23 Jan 2025
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 08 Jan 2025
https://github.com/deployment-helper/api-template-crawler
API interface to crawl the templates
api crawler deployment-helper gcp gcp-cloud-run golang rest
Last synced: 14 Jan 2025
https://github.com/kahsolt/allchan
An image crawler for xChan(4chan/8ch/...) image board.
4chan 4chan-downloader 8chan crawler image-crawler
Last synced: 03 Jan 2025
https://github.com/mkfsn/chronos
A light cron-like container service - create cron job easily.
Last synced: 22 Jan 2025
https://github.com/victorhuu/amazonmovieintegration
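This repository is the first individual assignment for the data warehouse course at Tongji University: organizing Amazon movie data using crawlers and ETL tools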
crawler data-warehouse movies pandas scrapy xpath
Last synced: 28 Nov 2024
https://github.com/dizys/weibo-crawler
A nodejs weibo crawler
crawler nodejs typescript weibo-spider
Last synced: 27 Dec 2024
https://github.com/fa7ad/aiub-notes-dl
Download all notes from AIUB's portal
Last synced: 24 Oct 2024
https://github.com/lykmapipo/producthunt-python-scrapy-scraper
Python Scrapy spiders that scrape data from producthunt.com
crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper
Last synced: 21 Dec 2024
https://github.com/konradlinkowski/mailcrawler
Crawler to find emails on websites
Last synced: 28 Nov 2024
https://github.com/wondervictor/spiderman
2017 Software Course Project
crawler distribute-crawler zhihu-crawler
Last synced: 17 Jan 2025
https://github.com/bitscoper/bitscoper_crawler
Crawls the titles of webpages in series by number and creates a list of the available links.
Last synced: 05 Dec 2024
https://github.com/zhoudaxia233/unilogo
A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.
Last synced: 15 Dec 2024
https://github.com/buren/site_health
Crawl a site and check various health indicators
Last synced: 28 Oct 2024
https://github.com/richecr/pyhltv
Repository to extract information from the HLTV website.
crawler csgo hacktoberfest hltv hltv-api python3
Last synced: 20 Jan 2025
https://github.com/ghost---shadow/feature-extractor-from-codebase
Copies the target java file and all its dependencies recursively to another directory
Last synced: 16 Jan 2025
https://github.com/anjackson/scrapy-url-frontier
A Scrapy module for URL Frontier integration
crawler frontier scrapy spider
Last synced: 05 Jan 2025
https://github.com/danielemoraschi/go-sitemap-common
Simple Go sitemap generator and crawler.
crawler golang sitemap sitemap-generator
Last synced: 31 Dec 2024
https://github.com/gnaneshkunal/book-miner
Web crawler for Book reviews (Goodreads)
Last synced: 16 Dec 2024