Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with web-scraper

A curated list of projects in awesome lists tagged with web-scraper.

https://github.com/getmaxun/maxun

🔥 Open-source no-code web data extraction platform. Turn websites to APIs and spreadsheets with no-code robots in minutes! [In Beta]

agents api automation browser browser-automation data-extraction no-code no-code-web-scraper playwright robotic-process-automation rpa scraper self-hosted web-agent web-automation web-scraper web-scraping web-scraping-agent webscraping website-to-api

Last synced: 02 Jan 2025

https://github.com/anaskhan96/soup

Web Scraper in Go, similar to BeautifulSoup

beautifulsoup go golang html-node web-scraper webscraper webscraping

Last synced: 02 Jan 2025

https://github.com/tholian-network/stealth

:rocket: Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy

anonymity browser-automation privacy-protection web-browser web-filter web-proxy web-scraper

Last synced: 29 Dec 2024

https://github.com/gosom/google-maps-scraper

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place.

distributed-scraper distributed-scraping golang google-maps google-maps-scraping web-scraper web-scraping

Last synced: 05 Nov 2024

https://github.com/postmodern/spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

crawler ruby scraper spider spider-links web web-crawler web-scraper web-scraping web-spider

Last synced: 02 Jan 2025

https://github.com/k0rnh0li0/onlyfans-dl

OnlyFans content downloader

media-downloader onlyfans python web-scraper

Last synced: 29 Oct 2024

https://github.com/je-suis-tm/web-scraping

Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

bloomberg data-scraper data-scraping financial-data financial-times futures futures-historical-data news-scraper news-websites newsletter options-data python-web-scraper reuters scrapper sraping wall-street-journal wallstreetbets web-scraper web-scrapers web-scraping

Last synced: 03 Jan 2025

https://github.com/gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

archiving cli crawler deno dockerfile nodejs scraping-websites single-file web-archiving web-crawler web-scraper web-scraping

Last synced: 03 Jan 2025

https://github.com/cassidoo/scrapers

A list of scrapers from around the web.

list scrape-websites scraper web-scraper

Last synced: 10 Dec 2024

https://github.com/oxylabs/quick-start-guide

Python quick start guides to get the most out of Oxylabs' Web Scraper API free trial.

oxylabs scraper scraper-api scraper-python scrapers scraping scraping-websites web-scraper web-scraping

Last synced: 17 Nov 2024

https://github.com/austinoboyle/scrape-linkedin-selenium

`scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles & company pages, turning the data into structured JSON.

linkedin python scrape scraper scraping selenium selenium-webdriver web-scraper web-scraping

Last synced: 04 Jan 2025

https://github.com/oxylabs/how-to-scrape-google-scholar

A guide for extracting titles, authors, and citations from Google Scholar using Python and Oxylabs SERP Scraper API.

google-scholar google-scholar-scraper google-scholar-scrapper google-search-scraper python python-scraper scraper-api web-scraper web-scraping

Last synced: 30 Dec 2024

https://github.com/paulpierre/markdown-crawler

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG

html-to-markdown html-to-markdown-converter html2md llm llmops markdown markdown-crawler markdown-parser markdown-scraper md-crawler rag web-scraper

Last synced: 04 Jan 2025

https://github.com/phantominsights/summarizer

A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.

nlp praw python3 reddit-bot spacy web-scraper wordcloud

Last synced: 01 Jan 2025

https://github.com/PhantomInsights/summarizer

A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.

nlp praw python3 reddit-bot spacy web-scraper wordcloud

Last synced: 12 Nov 2024

https://github.com/epiqueras/getsy

A simple browser/client-side web scraper.

browser client-side scraper web-scraper

Last synced: 01 Jan 2025

https://github.com/wikimedia/html-metadata

MetaData html scraper and parser for Node.js (supports Promises and callback style)

javascript metadata-extraction metadata-extractor node-module nodejs web-scraper web-scraping

Last synced: 04 Jan 2025

https://github.com/areed1192/python-sec

A simple Python library that gives easy access to the SEC website so that users can parse filings, collect data, and query documents.

finance python sec securities-and-exchange-commission web-scraper

Last synced: 01 Jan 2025

https://github.com/passivebot/facebook-marketplace-scraper

This repository contains a script to scrape Facebook Marketplace data using Playwright, BeautifulSoup and Streamlit.

database facebook facebook-marketing-automation facebook-marketplace playwright playwright-python python sqlite3 web-automation web-scraper web-scraping

Last synced: 19 Nov 2024

https://github.com/khuyentran1401/top-github-scraper

Scrape top GitHub repositories and users based on keywords

github github-api python scraping web-scraper web-scraping

Last synced: 19 Dec 2024

https://github.com/scrapehero/yellowpages-scraper

Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.

business-directory extract html lxml parsing python scraper web-scraper yellow-pages yellow-pages-scraper

Last synced: 04 Nov 2024

https://github.com/oxylabs/playwright-web-scraping

A tutorial for web scraping using Playwright headless browser

playwright web-scraper web-scraping

Last synced: 17 Nov 2024

https://github.com/cobalt-uoft/uoft-scrapers

Public web scraping scripts for the University of Toronto.

open-data toronto uoft web-scraper

Last synced: 03 Nov 2024

https://github.com/nasdin/videorecognition-realtime-autotrainer-alerts

State-of-the-art real-time object detection using the YOLOv3 algorithm, augmented with a process that allows easy training of the classifier as a plug-and-play solution. Provides an alert if an item in an alert list is detected.

alerts automatic convolutional-neural-networks darknet deep-learning google-image-search image-processing image-recognition machine-learning object-detection real-time tensorflow video-recognition web-scraper webcam webscraping yolo yolo2 yolov2 yolov2-model

Last synced: 07 Nov 2024

https://github.com/jlospinoso/abrade

A fast Web API scraper written in C++ and built on Boost ASIO

boost-asio boost-beast cpp web-scraper

Last synced: 30 Oct 2024

https://github.com/mawrkus/jason-the-miner

⛏ A versatile Web scraper for Node.js

crawler crawling javascript scraper scraping web-scraper

Last synced: 13 Nov 2024

https://github.com/janchaloupka/web-scraper-nabidek-pronajmu

A tool for watching new real-estate listings on popular real-estate sites. Listings are posted to a Discord channel.

apartment-finder discord discord-bot docker python renting web-scraper

Last synced: 06 Nov 2024

https://github.com/milahu/opensubtitles-scraper

scrape subtitles from opensubtitles.org

opensubtitles subtitles web-scraper

Last synced: 13 Dec 2024

https://github.com/jetkai/proxy-scraper

This is an application that scrapes various Proxy API Endpoints, then compiles the proxies into files within the "/proxies/" directory.

exe gradle httpclient jackson-json jar java jdk11 kotlin launch4j proxies proxy proxy-scrape proxy-scraper scraper scraping selenium-java web-scraper web-scraping

Last synced: 30 Dec 2024

https://github.com/phantominsights/reddit-bots

A collection of Reddit bots that I use to enhance the subreddits I manage.

beautifulsoup praw python3 reddit-bot requests rss web-scraper

Last synced: 11 Nov 2024

https://github.com/phantominsights/mexican-jobs-2020

Data ETL & Analysis on thousands of job listings from the official Mexican job board (2020 edition).

hacktoberfest job-offers lxml pandas plotly python3 selenium web-scraper

Last synced: 11 Nov 2024

https://github.com/azogue/esiosdata

Web scraper for data on electricity demand, production, and cost in Spain, plus an electricity billing simulator based on the PVPC

energy energy-monitor esios python-3 scraper web-scraper

Last synced: 09 Nov 2024

https://github.com/PhantomInsights/tweet-transcriber

A Reddit bot that transcribes tweets from comments and submissions links, mirrors their images and replies back with a formatted Markdown message.

beautifulsoup imgur praw python3 reddit-bot web-scraper

Last synced: 12 Nov 2024

https://github.com/phantominsights/tweet-transcriber

A Reddit bot that transcribes tweets from comments and submissions links, mirrors their images and replies back with a formatted Markdown message.

beautifulsoup imgur praw python3 reddit-bot web-scraper

Last synced: 11 Nov 2024

https://github.com/shobrook/git-pull

Parallelized web scraper for GitHub

github github-api github-scraper parallel scraper web-scraper

Last synced: 28 Oct 2024

https://github.com/earowang/rwalkr

R package to provide API to Melbourne pedestrian data

r web-scraper

Last synced: 14 Oct 2024

https://github.com/deep5050/abosar

অবসর 📚 A collection of short Bengali stories web scraped from various Bengali eMagazines and eNewspapers.

bengali cron-jobs stories web-scraper web-scraping webcrawler

Last synced: 09 Nov 2024

https://github.com/faheel/youtube-scraper-api

A web API that scrapes a YouTube video's data and returns it as JSON

api json json-api python python3 scraper web-scraper youtube youtube-data

Last synced: 12 Oct 2024

https://github.com/metalwarrior665/actor-rust-scraper

Experimental scraper in Rust suited for running locally or on the Apify platform. Inspired by Apify SDK.

apify rust web-scraper

Last synced: 30 Dec 2024

https://github.com/knlnks/uber_eats_scraper

An Uber Eats scraper written in Python.

python restaurant selenium uber-eats uber-eats-scraper web-scraper

Last synced: 12 Nov 2024

https://github.com/metalwarrior665/actor-article-extractor-smart

Combines Apify's crawling system and article parsing with the unfluff library.

actor apify article-extractor scraper web-scraper

Last synced: 30 Dec 2024

https://github.com/j4asper/dmr.py

Pull data from the Danish vehicle registry with dmr.py

denmark dmr motorregister nummerplade python python-library python3 web-scraper

Last synced: 25 Nov 2024

https://github.com/sgtfloyd/mtg-db

Ruby gem containing structured data for all Magic: The Gathering cards

card-database magic-the-gathering mtg ruby-gem web-scraper

Last synced: 22 Dec 2024

https://github.com/oxylabs/golang-web-scraper

A tutorial for building a web scraper in Golang

go golang url-scraper web-scraper web-scraping

Last synced: 17 Nov 2024

https://github.com/jlumbroso/princeton-scraper-seas-faculty

This is a web scraper that produces publicly accessible, static JSON feeds directly and automatically from the public SEAS directory website.

directory faculty princeton princeton-university web-scraper

Last synced: 02 Dec 2024

https://github.com/developerjosh/nekonode-site

Watch high-quality, ad-free anime streaming on NekoNode – your ultimate anime destination!

anime anime-api anime-scraper anime-streaming api nextjs streaming-video web-scraper website

Last synced: 16 Nov 2024

https://github.com/deadsec-security/easy-scraper

Create easy workflows for web scraping using the web and drag and drop features. Making scraping easy and fast!

docker easy-to-use selfhostable selfhosted web-scraper web-scraping web-scraping-software web-scrapper-python

Last synced: 22 Oct 2024

https://github.com/anlisha-maharjan/laravel-web-scraping

Web Scraping With PHP. A Laravel REST API to fetch content of any website.

laravel8 php php-crawler spatie-crawler web-scraper

Last synced: 11 Oct 2024

https://github.com/rija/ghost-ssg

A Docker-based pipeline to publish the content of a local Ghost 4 server as static pages.

bash blogging cli docker docker-compose ghost ghost-cms gitlab integration jamstack nginx nodejs publishing scraping-tool self-hosted static-site-generator web-scraper wget workflow

Last synced: 27 Oct 2024

https://github.com/pps-22-scooby/pps-22-scooby

Scala application that crawls and scrapes the web pages given as input, following special rules passed to it through a DSL.

crawler crawlers internal-dsl scala scraper scrapers web web-crawler web-crawling web-scraper web-scrapers

Last synced: 14 Oct 2024

https://github.com/jlumbroso/princeton-scraper-cos-people

This is a web scraper that produces publicly accessible, static JSON feeds directly and automatically from the public COS directory website.

directory faculty princeton princeton-cs princeton-university web-scraper

Last synced: 02 Dec 2024

https://github.com/breadrock1/socialnetworkscraper

Web scraping is simply the process of using a social media web scraper to gather data automatically. It saves users time, effort and sometimes money since it's an automatic process performed by bots. You could take the time to search the web for all mentions of a certain word or find all prices for a certain product, but that would take a lot of time.

facebook facebook-scraping flake8 mailru osint osint-python python python3 scraper scraping site-scraper social-network social-network-analysis twitter vk-api vkontakte web-scraper web-scraping

Last synced: 11 Nov 2024

https://github.com/mkearney/r-bloggers

[Tweet bot] R script tweeting new links to R-bloggers posts

r r-bloggers r-rtweet tweetbot tweets twitter web-scraper

Last synced: 15 Nov 2024

https://github.com/ganevdev/actor-webdesignernews-scraper

Scraper for www.webdesignernews.com, using Apify.

actor apify scrap scraper scraping web-scraper

Last synced: 27 Oct 2024

https://github.com/beautifulmoon211/onthemarket-scraping

Web scraping tool used to extract real estate information from OnTheMarket.com, a leading property portal in the United Kingdom.

cheerio data-extraction onthemarket onthemarket-scraper real-estate requests typescript web-scraper

Last synced: 14 Nov 2024

https://github.com/aaryanrr/DownDetector-CLI

CLI Client for DownDetector.com

cli downdetector python3 web-scraper web-scraping

Last synced: 06 Nov 2024

https://github.com/sahilbansal17/moodletracker

This is a simple script which will check whether there are any updates on a registered moodle course and print them in the terminal.

beautiful-soup python script web-scraper

Last synced: 12 Nov 2024