Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with web-scraping

A curated list of projects in awesome lists tagged with web-scraping.

https://github.com/hrbrmstr/madhttr

🎩Tidy Helper Methods for Many Types of Unkempt Internet Metadata and Content

har httr openssl r rstats web-scraping

Last synced: 15 Nov 2024

https://github.com/sycanz04/schedulr

A Chrome extension that transfers a timetable from CLiC to Google Calendar.

google-api google-calendar javascript web-scraping

Last synced: 06 Nov 2024

https://github.com/vrikodar/hacker-news-scraper

Get a list of all stories on Hacker News right in your terminal with one command

hacker-news hackernews hackernews-api news-feed newsfeed scraping-websites stories web-scraping

Last synced: 07 Nov 2024

https://github.com/jlumbroso/basic-git-scraper-template

🔬 Starter template for automating web scrapers using GitHub Actions workflows to incrementally commit data to Git 📈 Includes sample script, scheduling, dependency installation, output to CSV/JSON, and ethics guide 🤖 Customizable for diverse sites and use cases!

git-scraping github-template template web-scraping

Last synced: 02 Dec 2024

https://github.com/bhattbhavesh91/amazon-ratings-reviews-scrapper-python

A simple Python web scraper to extract Reviews & Ratings from Amazon's Product Display Page

amazon-scraper page-scraper scrape-products web-crawler web-scraping web-scraping-tutorials web-scrapper-python

Last synced: 16 Nov 2024

https://github.com/luccahellriegel/four-clojure-to-roam

Import all the 4clojure problems into RoamResearch by importing one of the provided json files.

4clojure clojure learning markdown roamresearch spaced-repetition web-scraping

Last synced: 13 Nov 2024

https://github.com/suwadith/winning-eleven-scout-evaluation-and-analysis-to-enhance-football-player-recommendations-ml-flask

A Machine-Learning based Recommendation system for club football personnel to identify underperforming players of a given football club and provide appropriate younger replacements from other clubs.

flask football machine-learning player recommendation-system soccer web-scraping

Last synced: 13 Nov 2024

https://github.com/kwokhing/network-analysis-on-mrt-station

Demo on applying the concept of network analysis to a network of connected railway stations, attempting to identify the important stations (nodes) in this network. Web scraping techniques using the rvest package are also briefly discussed.

betweenness-centrality closeness-centrality data-cleaning degree-centrality eigenvector-centrality gephi graph-analysis igraph r rvest social-network-analysis social-networks web-scraping xpath

Last synced: 02 Dec 2024

https://github.com/laggui/image-search-scraper

Dataset builder tool from web image scraping

dataset-generation image-scraper search-engine web-scraping

Last synced: 15 Dec 2024

https://github.com/bestmahdi2/uni__webcrawlerproject

A university project in which a web crawler is designed for the Instagram website and fasttext is used to predict the positive or negative content of a post's comments.

beautifulsoup4 fasttext gui matplotlib pandas prediction-model python selenium tkinter web-scraping

Last synced: 16 Nov 2024

https://github.com/formysister/selenium-whatsapp-messanger

Broadcast messages through WhatsApp Web using Python.

automation pip python selenium selenium-webdriver web-scraping whatsapp

Last synced: 22 Nov 2024

https://github.com/bestmahdi2/uni__contactmanagementsystem

A university project in which a Contact Management System (CMS) is designed with Python and PyQt5.

api cms contact-management-system pyqt pyqt5 pyqt5-gui python web-scraping

Last synced: 16 Nov 2024

https://github.com/kwokhing/exploratory-data-analysis-on-smrt-tweets

Demo on performing exploratory data analysis (EDA) on train service disruptions based on scraped tweets (user-generated content) from the train operator's (SMRT) Twitter account

data-analysis data-cleaning data-collection data-preparation exploratory-data-analysis exploratory-data-visualizations folium geospatial-data leaflet-map python python3 regex scraping selenium selenium-python social-media text-processing user-generated-content web-scraping webscraping

Last synced: 02 Dec 2024

https://github.com/maxbarsukov/korol-i-shut-lyrics-parser

🍯💀 Allows you to collect all the lyrics of your favorite band

gorshok king-and-jester kish korol-i-shut lyrics nokogiri-parser parser ruby-parser web-scraping

Last synced: 01 Dec 2024

https://github.com/spekulatius/phpscraper-keyword-length-distribution-example

Example demonstrating the parsing of keywords, along with a simple analysis of the data to get a length distribution

example-project keyword-extraction keyword-scraper keyword-search php php-scraper phpscraper-example scraper web-scraping

Last synced: 12 Nov 2024

https://github.com/gatenlp/wpextract

Create datasets from WordPress sites for research or archiving

corpus crawler nlp text-extraction text-mining web-scraping wordpress

Last synced: 13 Nov 2024

https://github.com/prdpx7/webdict

Fetch definition/meaning of words from dictionary.com and urbandictionary

dictionary nodejs npm-package web-scraping

Last synced: 01 Dec 2024

https://github.com/lebrancconvas/japanese-lyrics

Get Japanese lyrics from a website and output them as a Markdown file.

japanese japanese-songs personal-project puppeteer side-project typescript web-scraping

Last synced: 11 Nov 2024

https://github.com/34j/cached-historical-data-fetcher

Python utility for fetching any historical data using caching. Suitable for news, posts, weather, etc.

aiohttp asyncio cache fetch hacktoberfest historical-data http joblib lz4 pagenation pandas python realtime scraper scraping tqdm update web-scraping

Last synced: 28 Oct 2024

https://github.com/serpapi/serpapi-search-rust

Search results in Rust powered by SerpApi.com

bing google seo serpapi web-scraping webscraping

Last synced: 20 Nov 2024

https://github.com/kadnan/yelp-scraper-go

A simple web scraper for Yelp written in Go.

go golang scraper scraper-go web-scraping

Last synced: 19 Nov 2024

https://github.com/leroyanders/acrticle-scrapper

This Python-based repository hosts a sophisticated service designed for scraping web articles and converting them into Markdown format. The core functionality of this service includes extracting the main content of articles, such as headlines, key paragraphs, and associated images, and then seamlessly transforming this content into well-structured…

article-parser content-creation-tools content-extraction data-archiving html-to-markdown-converter image-downloading markdown-conversion metadata-extraction python web-scraping

Last synced: 27 Oct 2024

https://github.com/axsddlr/rpilocator_api

An Unofficial REST API for checking Raspberry Pi 4 availability https://rpilocator.com/

fastapi python rest-api web-scraping

Last synced: 05 Nov 2024

https://github.com/lukasdrsman/edupage-homework

A simple web scraper to retrieve homework from EduPage

cli edupage edupage-org homework online-school python school scraper selenium web-scraping

Last synced: 30 Nov 2024

https://github.com/vtlim/patfam

Web app to determine whether patent applications from different jurisdictions (USPTO, EPO, WIPO, etc.) are of the same family.

css d3js epo html javascript materializecss patent-applications patent-offices python selenium uspto web-scraping wipo

Last synced: 06 Dec 2024

https://github.com/darsan-in/nexa-auto

Nexa Auto automates the process of verifying the authenticity of addresses for room service eligibility and retrieving detailed specifications across multiple websites. Utilizing Selenium for web automation and GPT for handling missing data, Nexa Auto significantly reduces manual effort in data entry tasks.

address-lookup address-validation address-verification ai-enhanced-automation automation-tools customizable-scripts data-completeness data-entry-automation data-verification error-reduction gpt-integration missing-data-prediction property-details real-estate real-estate-data room-service-eligibility scalable-automation selenium-scraping web-automation web-scraping

Last synced: 12 Dec 2024

https://github.com/dcs-training/summerschool2024-stream2

Welcome to the repository for all the materials for Stream 2 of the CDCS 2024 summer school. Go to the readme file

data-analysis data-visualisation data-wrangling r sentiment-analysis statistics text-analysis web-scraping

Last synced: 10 Nov 2024

https://github.com/hrbrmstr/htmlunitjars

☕️ Java Archive Wrapper Supporting the 'htmlunit' Package

htmlunit r r-cyber rjava rstats web-scraping

Last synced: 15 Nov 2024

https://github.com/briatte/swd

Two-day workshop on scraping legislative data, organised by URFIST Bordeaux in 2018.

legislative-bill-analysis legislative-data r web-scraping

Last synced: 14 Dec 2024

https://github.com/anshu-krishna/html-scraper

A PHP class to simplify data extraction from HTML.

html-scraper html-scraping php php-queryselector scraper web-scraper web-scraping

Last synced: 09 Nov 2024

https://github.com/arefshojaei/digikala-scapper

This project is a scraper for the "digikala.com" website, built with Node.js + API

api back-end digikala digikala-api digikala-crawler digikala-data expressjs nodejs rest-api restful-api scapper web-scraping

Last synced: 01 Jan 2025

https://github.com/utrechtuniversity/ia-webscraping

An AWS workflow for collecting webpages from the Internet Archive

aws internet-archive python terraform web-scraping

Last synced: 22 Nov 2024

https://github.com/oxylabs/automate-competitors-benchmark-analysis

A tutorial for automating competitors’ & benchmark analysis using Python

analysis automation github-python python web-scraping

Last synced: 17 Nov 2024

https://github.com/ayushsoni1010/portfoliogram

⚡️Elevate your portfolio analysis with our cutting-edge web scraping tool. Uncover valuable insights about individuals, their skills, and social profiles effortlessly.

analytics hacktoberfest hactoberfest2023 javascript mongodb nodejs openai openai-api portfolio portfolio-website puppeteer react scraping scraping-tool typescript web-scraping website

Last synced: 13 Dec 2024

https://github.com/johnsell620/sentiment-analysis-goodreads-reviews

Document-level sentiment analysis of book reviews scraped from the Goodreads website. Technologies used include TensorFlow, Spark, HDFS, Sqoop, Scrapy, and D3.js.

data-analysis data-visualization recurrent-neural-networks web-scraping

Last synced: 29 Nov 2024

https://github.com/chimera-hackathon-team/chimera-hackathon-preparation

Welcome to the Chimera team repository! Here you will find the work and resources we use to prepare for hackathons.

artificial-intelligence backend data-science database frontend machine-learning web-scraping

Last synced: 18 Dec 2024

https://github.com/palewire/reuters-jobs

A bot that posts job openings at Reuters News

bot jobs journalism mastodon-bot news python twitter-bot web-scraping

Last synced: 18 Oct 2024

https://github.com/imranr98/instacartflation

A Python script that scrapes your Instacart order history and saves the data in a JSON file.

data-extraction data-ownership export instacart python selenium selenium-webdriver web web-scraping

Last synced: 19 Nov 2024

https://github.com/oxylabs/playwright-proxy-integration-js

A tutorial for implementing Oxylabs' Residential and Datacenter proxies with Playwright using JavaScript

javascript nodejs playwright playwright-java proxies proxy-list proxy-list-github proxy-rotator proxy-site residential-proxy rotating-proxy web-scraping

Last synced: 17 Nov 2024

https://github.com/oxylabs/parse-html-pyquery

Learn to parse HTML using PyQuery, a Python library for web scraping and manipulating HTML.

parser parsing pyquery python web-scraping web-scraping-python

Last synced: 17 Nov 2024

https://github.com/zigai/wrighter

Web scraping/browser automation framework built on top of Playwright

playwright playwright-plugins playwright-python python web-automation web-scraping wrighter

Last synced: 14 Oct 2024

https://github.com/varunon9/github-scraper

A Node.js script (using the cheerio module) to extract GitHub user information and save it to a JSON file.

cheerio github-scraping nodejs-scraping web-scraping

Last synced: 31 Oct 2024

https://github.com/rafaelgoulartb/web-scrapping

🕷 Web scraper made with Node.js and Puppeteer

nodejs puppeteer web-scraping

Last synced: 09 Nov 2024

https://github.com/hunj/tsa-passenger-throughput

A mini web scraping + data visualization project for number of travelers per day, from TSA's website. Uses GitHub Actions Workflow to schedule grabbing data.

github-actions python-script scraping visualization web-scraping

Last synced: 05 Nov 2024

https://github.com/msarmadqadeer/debank-and-nanoly-scrapping

A web platform that tells you about your token's best yield based on your public address.

beautifulsoup4 flask helium python3 web-scraping

Last synced: 18 Nov 2024

https://github.com/oxylabs/web-scraping-r

A tutorial for web scraping with R

proxy-scraper r-language web-scraping wikipedia-scraper

Last synced: 17 Nov 2024

https://github.com/darsan-in/job-crawler

The Job Crawler is an integral component of the Job RAID project, designed to automatically scrape and collect data from various job listing websites. This crawler enables Job RAID to aggregate comprehensive job listings, ensuring that users have access to up-to-date and relevant job opportunities.

automated-job-listings crawler-integration data-extraction data-gathering job-aggregator job-crawler job-data job-data-collection job-data-miner job-listing-crawler job-portal-scraping job-scraping job-scraping-tool job-search-automation job-search-engine multi-site-job-scraping real-time-job-data scraping-jobs web-crawler web-scraping

Last synced: 12 Dec 2024

https://github.com/nvzard/spoileralert

Stay up to date with your favorite TV shows' upcoming releases.

beautifulsoup4 python tv-shows web-scraping

Last synced: 10 Nov 2024

https://github.com/oxylabs/web-crawler

Web Crawler is a tool used to discover target URLs, select the relevant content, and have it delivered in bulk. It crawls websites in real-time and at scale to quickly deliver all content or only the data you need based on your chosen criteria.

api crawler github-python scraper web-crawler web-crawler-python web-scraping web-scraping-api webscraping

Last synced: 17 Nov 2024

https://github.com/ahmedosamamath/issuu-downloader

A Python tool for downloading Issuu documents as PDFs and a JavaScript snippet for extracting publication URLs from Issuu user pages. For educational use only.

educational issuu issuu-downloader web-scraping

Last synced: 19 Nov 2024

https://github.com/tirtharajsinha/bostas

Python package: bostas is a cool social media tool for automated surfing.

automation bostas bot python python-package selenium self-built-python-package social-media-automation web-scraping webdriver

Last synced: 11 Oct 2024

https://github.com/hjsblogger/web-scraping-with-python

Demonstration of Web Scraping using Selenium Python (Pytest & Pyunit) and Beautiful Soup

beautiful-soup beautifulsoup beautifulsoup4 lambdatest selenium-python selenium-webdriver web-scraping youtube-scrapping

Last synced: 11 Oct 2024

https://github.com/pushpendra-1697/web-scraper-puppeteer

Web-Scraper-Puppeteer uses Puppeteer, a Node.js library that provides a high-level API to control headless Chrome or Chromium, to automate various tasks, including web scraping.

nodejs puppeteer web-scraping

Last synced: 06 Jan 2025

https://github.com/serhatci/web-scraping-from-cryptoexchanges

A script that collects real-time cryptocurrency price, bid-ask spread, and trade history data from online trade platforms of several different crypto-exchanges and saves it to the MySQL database.

bitcoin crypto-exchange real-time selenium web-scraping

Last synced: 28 Nov 2024

https://github.com/dchan3/thoughtfulsoup

dchan3's thoughtful extension of BS4

beautifulsoup beautifulsoup4 bs4 extension python web-scraping

Last synced: 11 Nov 2024

https://github.com/jakeoeding/lexicon

Quick script to grab the definition or synonyms of a given word

beautifulsoup4 easygui keyboard pyperclip python requests synonyms web-scraping

Last synced: 26 Dec 2024

https://github.com/renanstn/olx-scrap-de-produtos

[DEPRECATED] Telegram bot built to make life easier for people who search OLX daily for specific items. It no longer works, due to changes in the structure of OLX's HTML.

python telegram-bot web-scraping

Last synced: 30 Nov 2024

https://github.com/qharny/dart_web_scraper

A versatile command-line web scraper built with Dart. This tool allows you to scrape web pages and save the extracted data in various formats.

dart scraper url web-scraping

Last synced: 07 Nov 2024

https://github.com/akshatvg/lastminutepptfrontend

Project which automatically generates crucial business and leisure presentation slides based on what you say in real time.

business last-minute-ppt ml nlp presentation real-time voice web-scraping website

Last synced: 08 Dec 2024

https://github.com/cleversamer/imdb-scraping

A Python scraping bot that extracts data from the IMDb website and writes it to an Excel file

beautifulsoup excel python script web-scraping

Last synced: 14 Nov 2024

https://github.com/bhimrazy/web-scraping-using-python

Scrape the list of American films of 2022 from Wikipedia

beautifulsoup4 python web-scraping wikipedia-scraper

Last synced: 16 Nov 2024

https://github.com/0xnu/amazon_scraper

Scrape Amazon product data such as Product Name, Product Images, Number of Reviews, Price, Product URL, and ASIN.

amazon amazon-scraper amazon-scraping asin ecommerce product-data products web-scraper web-scraping

Last synced: 15 Dec 2024

https://github.com/saketkothari/web-scraping-with-puppeteer

Simple script to scrape data from a web page using Puppeteer

puppeteer scraping web-scraping

Last synced: 26 Nov 2024

https://github.com/gipsh/miradetodo-scrapper

Extract links from miradetodo.co

deobfuscator scraper web-scraping

Last synced: 02 Jan 2025

https://github.com/andronovo-bit/nationso-trend-integration

Nationso Trend Integration is a Python project that fetches and saves the latest and most relevant content from dailydev, github, and medium as nation.so pages.

api content-curation dailydev github medium notion-api python web-scraping

Last synced: 14 Nov 2024

https://github.com/fern-aerell/web-crawling-to-txt

A simple web crawling application that can crawl URLs, extract text content, and save the results in TXT format.

beautifulsoup4 crawling python requests scraping txt web-crawling web-scraping

Last synced: 12 Nov 2024

https://github.com/firefly55lm/footballers_performances

Sentiment analysis and topic modelling about Premier League footballers (2023)

football fotmob nitter-scraper premier-league web-scraping

Last synced: 25 Nov 2024

https://github.com/mzubairtahir/latest-twitter-scraper

A Python scraper for the latest Twitter website structure that scrapes the tweets of a Twitter account

data-scraping latest-twitter playwright-python python-automation python3 twitter-bot twitter-scraper web-crawler web-scraping

Last synced: 22 Dec 2024

https://github.com/bijoy-sust/simple-web-scraping-in-python

A list of resources and introductory notebooks for Web Scraping in Python using BeautifulSoup.

beautifulsoup machine-learning notebook python-3 web-scraping

Last synced: 25 Nov 2024

https://github.com/slyautomation/astar_pathfinding_node_networks

This project extracts the canvas data on https://www.osrsmap.net/ and converts each canvas display to a PNG file. This is done by exploiting HTML elements on the website and calling JavaScript methods such as .toDataURL(). The .toDataURL() method returns a data URI containing a representation of the image in the format specified by the type parameter (defaults to PNG). WebDriver is an open source tool for automated testing of web apps across many browsers. It provides capabilities for navigating to web pages, user input, JavaScript execution, and more. Download the Chrome WebDriver here: https://chromedriver.chromium.org/downloads. The base64 module provides functions for encoding binary data to printable ASCII characters and decoding such encodings back to binary data, which is useful for converting the canvas data to a PNG file. The next function merges those images by looping over the canvas images, resulting in the final product: a full, detailed OSRS map with icons.

astar-algorithm chrome-webdriver map-generator maps node-networks osrs python web-scraper web-scraping webdriver

Last synced: 22 Nov 2024
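
The base64 step described in the astar_pathfinding_node_networks entry above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `data_url_to_bytes` is hypothetical, and the sample data URL is built locally from the PNG file signature rather than taken from the site.

```python
import base64

# The 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def data_url_to_bytes(data_url: str) -> bytes:
    """Decode a base64 data URL (as returned by canvas.toDataURL()) to raw bytes.

    Hypothetical helper for illustration only.
    """
    # A data URL looks like "data:image/png;base64,<payload>".
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(payload)


# Build a stand-in data URL locally instead of reading a real canvas.
sample = "data:image/png;base64," + base64.b64encode(PNG_SIGNATURE).decode("ascii")
decoded = data_url_to_bytes(sample)
```

In a real run, the string would presumably come from executing JavaScript in the browser through WebDriver, and the decoded bytes would be written to a `.png` file opened in binary mode.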

https://github.com/mousazourob/essayresearcher

A website that streamlines the research process by showing articles with excerpts based on a particular topic and group of keywords

beautifulsoup bootstrap css flask html javascript jquery python web-scraping

Last synced: 22 Dec 2024

https://github.com/malkiii/youtube-summarizer

An AI-powered tool that summarizes any YouTube video that has captions enabled, built with React and Express SSR.

express gemini-api i18next ssr summarization tailwindcss typescript vite web-scraping youtube

Last synced: 22 Nov 2024

https://github.com/kennethleungty/web-scraping-walkthrough-hcp-info

Web scraping script (with Python and Selenium) to automatically compile a list of licensed healthcare professionals along with their respective public details

python selenium web-scraping

Last synced: 22 Nov 2024

https://github.com/fethullahceviz/team3-weather-application

Weather application with API and Scrapy

api python weather-app web-scraping

Last synced: 07 Jan 2025

https://github.com/pradipchaudhary/javascript-web-scraping

A repository dedicated to exploring and implementing web scraping techniques using JavaScript. Learn how to extract data from websites efficiently and effectively, leveraging the power of JavaScript for scraping tasks.

automation scraping testing-library web-scraping

Last synced: 12 Nov 2024

https://github.com/spekulatius/link-scraping-test-beautifulsoup-vs-phpscraper

Tasking both BeautifulSoup and PHPScraper to extract links - a comparison of code and performance.

beautifulsoup4 link-extractor phpscraper phpscraper-example web-scraper web-scraping

Last synced: 12 Nov 2024

https://github.com/k9mil/eagle

🦅 A simple, fast, and fun CLI-based application which functions as a helper to find answers to your programming questions! Written in Golang + Cobra.

api-client cli cli-app cobra eagle fmt go golang http json json-api regex scraper scraping-websites stackoverflow stackoverflow-answer stackoverflow-api stackoverflow-questions web-scraper web-scraping

Last synced: 28 Dec 2024

https://github.com/radom12/ai_resume_analyzer

The AI Resume Analyzer is a Streamlit-based application that provides detailed resume analysis, skill recommendations, job search tools, and career insights. Utilizing NLP and machine learning, it helps users identify strengths and improvement areas, suggest relevant courses, and find job opportunities tailored to their profiles.

ai aiml data-science final-year-project machine-learning nlp python resume-analysis selenium streamlit ui web-scraping

Last synced: 21 Dec 2024