Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with web-scraping

A curated list of projects in awesome lists tagged with web-scraping.

https://github.com/chrismuir/zillow

Zillow Scraper for Python using Selenium

chromedriver python scraper selenium web-scraping zillow

Last synced: 19 Dec 2024

https://github.com/NLPatVCU/PaperScraper

A web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university accessible journals.

journal-web-scraper natural-language-processing pubmed-articles-grabber scientific-publications selenium-webdriver web-scraping

Last synced: 06 Nov 2024

https://github.com/nlpatvcu/paperscraper

A web scraping tool to systematically extract the text of scientific papers and corresponding metadata from university accessible journals.

journal-web-scraper natural-language-processing pubmed-articles-grabber scientific-publications selenium-webdriver web-scraping

Last synced: 19 Dec 2024

https://github.com/apify/actor-page-analyzer

Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.

headless-chrome javascript web-scraping

Last synced: 07 Nov 2024

https://github.com/HarryShomer/Hockey-Scraper

Python Package for scraping NHL Play-by-Play and Shift data

hockey nhl python scraper sports web-scraping

Last synced: 06 Nov 2024

https://github.com/hominee/dyer

Dyer is designed for reliable, flexible and fast web crawling, providing some high-level, comprehensive features without compromising speed.

crawler rust rust-programming-language spider web-crawler web-framework web-scraping

Last synced: 06 Nov 2024

https://github.com/bertrandmartel/tableau-scraping

Tableau scraper python library. R and Python scripts to scrape data from Tableau viz

dataframe pandas python r tableau web-scraping

Last synced: 30 Dec 2024

https://github.com/trainingbypackt/data-wrangling-with-python

Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices

analytics beautifulsoup data-analytics data-munging data-science data-wrangling database numpy pandas python regular-expression web-scraping

Last synced: 01 Jan 2025

https://github.com/my8100/scrapyd-cluster-on-heroku

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.

cluster heroku logparser python scrapy scrapyd scrapydweb web-crawling web-scraping

Last synced: 20 Dec 2024

https://github.com/apify/actor-scraper

House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.

apify web-scraping

Last synced: 07 Nov 2024

https://github.com/codingforentrepreneurs/Web-Scraping

Learn how to leverage Python's amazing tools to scrape data from other websites. The end goal of this course is to scrape blogs to analyze trending keywords and phrases. We'll be using Python 3.6, Requests, BeautifulSoup, Asyncio, Pandas, Numpy, and more!

aysncio beautifulsoup beautifulsoup4 joincfe numpy pandas python python-requests python3 requests scraper sraping tutorial web-scraping

Last synced: 22 Nov 2024

https://github.com/sangaline/scrapy-wayback-machine

A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

archive-dot-org middleware python scrapy scrapy-extension wayback-machine web-scraping

Last synced: 01 Jan 2025

https://github.com/Crinibus/scraper

Web scraper for scraping, tracking and visualizing prices of products on various websites.

amazon avcables computersalg coolshop ebay elgiganten expert komplett mm-vision newegg prices products proshop python scrape-prices scraper sharkgaming shein tech-scraper web-scraping

Last synced: 06 Nov 2024

https://github.com/vindarel/cl-torrents

Searching torrents on popular trackers - CLI, readline, GUI, web client. Tutorial and binaries (issue tracker on https://gitlab.com/vindarel/cl-torrents/)

1337 1337x common-lisp pirate-bay torrents tutorial web-scraping

Last synced: 19 Dec 2024

https://github.com/siongui/instago

Download/access photos, videos, stories, story highlights, postlives, following and followers of Instagram

downloader go golang gopherjs instagram web-scraping webscraping

Last synced: 29 Oct 2024

https://github.com/passivebot/facebook-marketplace-scraper

This repository contains a script to scrape Facebook Marketplace data using Playwright, BeautifulSoup and Streamlit.

database facebook facebook-marketing-automation facebook-marketplace playwright playwright-python python sqlite3 web-automation web-scraper web-scraping

Last synced: 19 Nov 2024

https://github.com/king04aman/all-in-one-python-projects

A huge collection of awesome beginner-friendly Python projects, from the very basics to advanced. Perfect repository for learning Python and enhancing your Python programming skills.

artificial-intelligence automate-task automation beginner-friendly hacktoberfest hacktoberfest2024 machine-learning open-source-project python python-projects python-projects-basic-to-advanced python-tools web-scraping

Last synced: 29 Dec 2024

https://github.com/hrbrmstr/splashr

💦 Tools to Work with the 'Splash' JavaScript Rendering Service in R

har phantomjs r r-cyber rstats selenium splash web-scraping

Last synced: 27 Oct 2024

https://github.com/scrapehero/zillow_real_estate

Zillow.com Web Scraper written in Python and LXML to extract real estate listings available based on a zip code.

html lxml parsing python-requests scraper web-scraping

Last synced: 04 Nov 2024

https://github.com/scrapinghub/web-poet

Web scraping Page Objects core library

hacktoberfest page-objects python web-scraping

Last synced: 30 Dec 2024

https://github.com/apify/browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

browser-automation headless-browsers playwright puppeteer rpa scraping web-scraping

Last synced: 01 Jan 2025

https://github.com/zoranpandovski/bookingscraper

🌎 🏨 Scrape Booking.com 🏨 🌎

beautifulsoup booking python3 request scraper web-scraping webscraper webscraping

Last synced: 29 Dec 2024

https://github.com/seanfhear/tab-scraper

Interface for downloading guitar tabs from Ultimate Guitar

chords guitar guitar-chords guitar-tablature guitar-tabs ultimate-guitar web-scraping

Last synced: 11 Nov 2024

https://github.com/mostlypanda/node-js-functionalities

This repository contains useful RESTful APIs and functionality in Node.js, including tutorial code for mastering Node.js. All tutorials have been published on medium.com.

2-way-authentication crudapi express html login logout mongodb multer-storage node-js nodejs-tutorials npm packages payment-gateway rest-api signup sms-services smtp twilio web-scraping

Last synced: 05 Nov 2024

https://github.com/khuyentran1401/top-github-scraper

Scrape top GitHub repositories and users based on keywords

github github-api python scraping web-scraper web-scraping

Last synced: 19 Dec 2024

https://github.com/umesh-01/python-assistant

Python Assistant (PA) is a voice-command-based assistant service written in Python 3.9+. It can recognize human speech, talk to the user, and execute basic commands.

ai-assistants google-recognition nlp openweathermap-api pycharm-ide python python-assistant python-automation python39 pyttsx3 speech-recognition text-to-speech virtual-assistant voice-assistant voice-commands voice-recognition web-scraping wikipedia-search wolfram-alpha

Last synced: 09 Oct 2024

https://github.com/crawlzone/crawlzone

Crawlzone is a fast asynchronous internet crawling framework for PHP.

automated-testing crawler crawling-framework middleware php web-scraping web-search

Last synced: 29 Oct 2024

https://github.com/umarbutler/open-australian-legal-corpus-creator

The code used to create and update the Open Australian Legal Corpus, the first and only multijurisdictional open corpus of Australian legislative and judicial documents.

australia corpus dataset datasets law legal open-data scraping web-scraping

Last synced: 01 Jan 2025

https://github.com/dddat1017/Scraping-Youtube-Comments

Scrape comments from any YouTube video

data-scraping python selenium web-scraping

Last synced: 02 Dec 2024

https://github.com/utkuufuk/ping-sm

Receive an email or Telegram message as soon as Migros Sanalmarket is available for delivery in your neighborhood.

mailgun-api web-scraping

Last synced: 07 Nov 2024

https://github.com/hrbrmstr/decapitated

Headless 'Chrome' Orchestration in R

headless-chrome javascript r r-cyber rstats web-scraping

Last synced: 22 Nov 2024

https://github.com/yusuftaufiq/laravel-books-api

Fully documented & tested Laravel 9 RESTful books API scraped from Gramedia.

docker laravel9 php81 restful-api web-scraping

Last synced: 11 Oct 2024

https://github.com/oxylabs/playwright-web-scraping

A tutorial for web scraping using Playwright headless browser

playwright web-scraper web-scraping

Last synced: 17 Nov 2024

https://github.com/b0o/apple-autofill-domains

Apple's allowed autofill domains

apple data-analysis github-actions web-scraping

Last synced: 29 Oct 2024

https://github.com/scrapehero/selectorlib

A library to read a YAML file with XPath or CSS selectors and extract data from HTML pages using them

python scraping selectors web-scraping xpath

Last synced: 04 Nov 2024

https://github.com/techfanetechnologies/qtsapp

The Python library for QtsApp, which displays the option chain in near real time. This program retrieves the data from the QtsApp site and generates useful analysis of the option chain for the specified index or stock. It also refreshes the option chain every second, along with Implied Volatility (IV), Open Interest (OI), Delta, Theta, Vega, Gamma, Vanna, Charm, Speed, Zomma, Color, Volga, and Veta, and visually displays the trend in various indicators useful for technical analysis.

analysis app banknifty derivatives drmoonejune equity moonedrjune nifty nifty50 nse option-chain option-greeks option-pricing option-trading options options-trading python script strike-price web-scraping

Last synced: 10 Nov 2024

https://github.com/alex000kim/slack-gpt-bot

GPT4-powered Slack bot that can scrape URL contents

chatbot gpt-4 gpt4 slack slack-bot web-scraping webscraping

Last synced: 07 Nov 2024

https://github.com/dojutsu-user/imdb-scraper

Scrapy project for scraping data from IMDb, with a movie dataset covering 58,623 movies.

imdb-webscrapping movie-dataset python3 scrapy scrapy-crawler scrapy-framework web-scraping

Last synced: 28 Oct 2024

https://github.com/hrbrmstr/wayback

⏪ Tools to Work with the Various Internet Archive Wayback Machine APIs

internet-archive memento r r-cyber rstats wayback wayback-machine web-scraping

Last synced: 28 Oct 2024

https://github.com/gadingnst/kampus-scraper

Scraper and GraphQL API for data on higher-education institutions in Indonesia, based on the website of the RISTEKDIKTI Ministry.

api graphql puppeteer scraper serverless web web-scraping

Last synced: 20 Dec 2024

https://github.com/ibnesayeed/linkextractor

A Docker tutorial using a link extraction application example

docker hacktoberfest interactive link-extraction php python ruby tutorial web-scraping

Last synced: 26 Dec 2024

https://github.com/deedy5/primp

🪞 PRIMP (Python Requests IMPersonate). The fastest Python HTTP client that can impersonate web browsers

akamai-fingerprint fingerprint http http-client http2-fingerprint https ja3-fingerprint ja4-fingerprint tls-fingerprint web-scraping

Last synced: 17 Dec 2024

https://github.com/goodbyteco/letterboxd-watchlist-picker

A simple website that gives you a random film off your Letterboxd watchlist (or any list).

film go letterboxd movies watchlist web-scraping webapp

Last synced: 05 Nov 2024

https://github.com/ahmedshahriar/bd-medicine-scraper

Scrapy-Django PostgreSQL integrated API with Proxy IP configuration that scrapes all medicine data (meds, prices, generics, companies, indications) from Bangladesh (30k+ pages)

django django-rest-framework drug manufacturer medicine medicine-database postgresql proxy-ip python python3 rest-api scrapy web-scraping

Last synced: 19 Dec 2024

https://github.com/gugarosa/viviner

🍷 Scrapes data from Vivino and collects outstanding wine-based metadata.

data-mining requests vivino web-scraping wine

Last synced: 01 Oct 2024

https://github.com/danmorse314/hockeyR

Collect and Clean Hockey Stats

hockey nhl nhl-data web-scraping

Last synced: 04 Dec 2024

https://github.com/ramonpaolo/api-b3

Simple API that returns data about a given stock/company listed on B3

api flask heroku opensource python web-scraping

Last synced: 22 Oct 2024

https://github.com/rebrowser/rebrowser-playwright-python

A drop-in replacement for playwright-python patched with rebrowser-patches. It allows automation to pass modern automation detection tests.

automation bot bot-detection captcha headless playwright playwright-python rebrowser rebrowser-patches scraping web-scraping

Last synced: 01 Jan 2025

https://github.com/oxylabs/web-scraping-tutorials

Web scraping, data parsing and automation tutorials. Suited for both beginners and intermediate/advanced programmers.

csharp curl github-python golang javascript python r-language ruby web-proxies web-scraping wikipedia-scraper

Last synced: 17 Nov 2024

https://github.com/florents-tselai/greek-wines-analysis

Scraper, Data and Analysis for "Analyzing 1000+ Greek Wines with Python"

beautifulsoup data-science pandas python seaborn web-scraping

Last synced: 31 Oct 2024

https://github.com/oxylabs/scraping-dynamic-javascript-ajax-websites-with-beautifulsoup

A guide on how to scrape JavaScript rendered websites with Python and BeautifulSoup.

ajax beautiful-soup github-python javascript python scraping web-scraping

Last synced: 17 Nov 2024

https://github.com/hrbrmstr/htmlunit

🕸🧰☕️ Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library

htmlunit javascript r r-cyber rstats web-scraping

Last synced: 28 Oct 2024

https://github.com/wenyalintw/google-patents-scraper

Automatically download all PDF files of search results and their patent families found on Google Patents.

crawler google-patents patent patents pdf scraper scraping scrapy web-scraping

Last synced: 11 Nov 2024

https://github.com/soumyajit4419/ai_for_social_good

Using natural language processing to analyze the sentiments of people and detect suicidal ideation on online social content.

lstm natural-language-processing random-forest tfidf-vectorizer web-scraping

Last synced: 22 Oct 2024

https://github.com/drewcarlson/ktsoup

A Kotlin multiplatform HTML5 parsing library

jsoup kotlin kotlin-multiplatform lexbor web-scraping

Last synced: 25 Dec 2024

https://github.com/ekarton/uoft-timetable-generator

A web application that generates timetables for university students at the University of Toronto

genetic-algorithm productivity timetable-generator uoft web-application web-scraping

Last synced: 24 Nov 2024

https://github.com/City-Bureau/city-scrapers-template

Template for creating a City Scrapers project in your area

city-scrapers open-data python web-scraping

Last synced: 04 Dec 2024

https://github.com/ayaka14732/lihkg-scraper

A Python script for scraping LIHKG

scraper web-scraping

Last synced: 28 Oct 2024

https://github.com/vaasudevans/google-podcast-downloader

Command-line tool to download the entire Google Podcasts library for the provided URL 🎵

google-podcasts podcast-downloader python web-scraping

Last synced: 13 Dec 2024

https://github.com/0x0be/scrapeadvisor

A user-friendly python-based GUI which provides sentiment analysis of users' reviews toward a specific TripAdvisor facility

data-mining data-science python3 r scraping sentiment-analysis sentiment-classification text-mining tripadvisor tripadvisor-scraper web-scraping

Last synced: 04 Nov 2024

https://github.com/jetkai/proxy-scraper

This is an application that scrapes various Proxy API Endpoints, then compiles the proxies into files within the "/proxies/" directory.

exe gradle httpclient jackson-json jar java jdk11 kotlin launch4j proxies proxy proxy-scrape proxy-scraper scraper scraping selenium-java web-scraper web-scraping

Last synced: 30 Dec 2024

https://github.com/mainakrepositor/py-automation

Automating social media, mailing, and kernel processes using Python.

automated-tests automation modules os python3 security-tools selenium testing-tools web-scraping webdriver

Last synced: 12 Nov 2024

https://github.com/gabrieldim/a1on-webscraping-pandas-data-science

Learning web scraping using pandas in Python (data science).

data data-science pandas sciecne web-scraping

Last synced: 20 Nov 2024

https://github.com/Granitosaurus/parsel-cli

CLI for evaluating CSS and XPath selectors

cli css lxml parsel web-scraping xpath

Last synced: 06 Nov 2024

https://github.com/websemantics/codepen-puppeteer

Use Puppeteer to download pens from Codepen.io as single HTML pages

codepen headless-chrome puppeteer web-scraping

Last synced: 06 Nov 2024

https://github.com/papagorgio23/vegaslines

Historical Vegas betting lines for the NBA and NFL

nba nfl sports-betting sportsbetting vegas-lines web-scraping

Last synced: 01 Dec 2024

https://github.com/davidsvy/Neural-Scam-Artist

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

dataset deduplication fine-tuning fraud gpt2 huggingface lsh minhash nlp pytorch readability scam transformer web-scraping

Last synced: 22 Nov 2024

https://github.com/d4n3436/gscraper

A collection of search engine image scrapers (Google Images, DuckDuckGo and Brave)

brave duckduckgo google google-images gscraper web-scraping

Last synced: 08 Nov 2024

https://github.com/wiringbits/simple-http-proxy

A very simple HTTP proxy that runs on a Raspberry Pi

http-proxy playframework raspberry-pi scala web-scraping

Last synced: 21 Nov 2024