Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with web-scraping

A curated list of projects in awesome lists tagged with web-scraping.

https://github.com/3choff/docs-miner

A VSCode extension that generates markdown documentation from web pages and GitHub repositories.

developer-tools documentation-generator documentation-tool github-to-markdown markdown-generator vscode-extension web-scraping website-to-markdown

Last synced: 03 Dec 2024

https://github.com/bipinoli/online-price-tracker-with-chrome-extension

Go to an e-commerce site, select the price, and hit the extension button. That's it: the price is now tracked. The system knows where to look for the price on each site, and once the price drops to your desired threshold it should notify you.

dom-manipulation javascript php price-tracker server-sent-events web-scraping

Last synced: 22 Nov 2024

https://github.com/deadsec-security/easy-scraper

Create easy workflows for web scraping using the web and drag and drop features. Making scraping easy and fast!

docker easy-to-use selfhostable selfhosted web-scraper web-scraping web-scraping-software web-scrapper-python

Last synced: 22 Oct 2024

https://github.com/gdsoumya/content_master

Content Master is a content aggregator that collects content from different sources, organizes them and puts them in one place for consumption.

content-aggregation content-aggregator flask python3 web-scraping

Last synced: 14 Oct 2024

https://github.com/breadrock1/socialnetworkscraper

Web scraping is simply the process of using a social media web scraper to gather data automatically. It saves users time, effort and sometimes money since it’s an automatic process performed by bots. You could take the time to search the web for all mentions of a certain word or find all prices for a certain product, but that would take a lot of time.

facebook facebook-scraping flake8 mailru osint osint-python python python3 scraper scraping site-scraper social-network social-network-analysis twitter vk-api vkontakte web-scraper web-scraping

Last synced: 11 Nov 2024

https://github.com/shafiqsadat/kankorrobot

Afghanistan Kankor Results Robot

afghanistan java kankor sqlite3 telegram web-scraping

Last synced: 10 Nov 2024

https://github.com/Ghurtchu/github-topics-web-scraper

📄➡️📂 Web Scraper for GitHub topics.

csv functional-programming github scala web-scraping zio

Last synced: 25 Nov 2024

https://github.com/Bipinoli/Online-Price-Tracker-with-Chrome-Extension

Go to an e-commerce site, select the price, and hit the extension button. That's it: the price is now tracked. The system knows where to look for the price on each site, and once the price drops to your desired threshold it should notify you.

dom-manipulation javascript php price-tracker server-sent-events web-scraping

Last synced: 07 Nov 2024

https://github.com/shubhamdutta2000/github-user-bot

With this bot, a user can log in to GitHub, create a new repository, and clone other users' repositories. Feel free to add your own scripts for GitHub automation.

create-repository github github-bot github-login hacktoberfest hacktoberfest2022 puppeteer repository-clone web-scraping

Last synced: 07 Nov 2024

https://github.com/serpapi/google-maps-pb-decoder

Google Maps pb (i.e., protobuf) parameter decoder.

google-maps google-maps-scraping ruby web-scraping webscraping

Last synced: 20 Nov 2024

https://github.com/ghurtchu/github-topics-web-scraper

📄➡️📂 Web Scraper for GitHub topics.

csv functional-programming github scala web-scraping zio

Last synced: 11 Nov 2024

https://github.com/rinminase/anidb-be

💬🐳 Rin Minase's AniDB API Service utilizing the latest version of Laravel and deployed to Heroku

anilist-api cloudinary docker functional-testing graphql heroku laravel open-api open-api-v3 php phpunit postgresql restful-api swagger web-scraping

Last synced: 07 Nov 2024

https://github.com/apify/actor-legacy-phantomjs-crawler

The actor implements the legacy Apify Crawler product. It uses PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of JavaScript code.

apify headless-browsers phantomjs web-crawler web-scraping

Last synced: 07 Nov 2024

https://github.com/sachs7/flight-finder

Given round-trip dates, find flight availability by scraping the Google Flights site and send a screenshot and a text file via Slack!

bot flight-finder python selenium-webdriver slack-bot splinter travel-helper web-scraping

Last synced: 10 Dec 2024

https://github.com/rafabelokurows/sports-odds

Obtaining odds for MLB and NFL games through an API

api mlb nfl sports-betting web-scraping

Last synced: 02 Nov 2024

https://github.com/duyndh98/mangaproject

A Manga Crawler and Viewer Project (truyentranh.net)

html-css python web-crawler web-scraping web-view

Last synced: 15 Nov 2024

https://github.com/umihico/minigun-requests

Web scraping API that offloads large volumes of GET requests and XPath queries to the cloud

crawler crawling scraping scraping-api scraping-framework scraping-python web-scraping

Last synced: 15 Nov 2024

https://github.com/lexiestleszek/sova_ollama

Open source implementation of Sova, a RAG-based web search engine that uses the power of LLMs. Built with Langchain, Ollama, and HuggingFace Embeddings, and scrapes Google search results.

large-language-models llm rag-implementation retrieval-augmented-generation web-scraping

Last synced: 14 Nov 2024

https://github.com/oxylabs/webscraping-with-ruby

A tutorial for web scraping with Ruby

ruby web-scraping

Last synced: 17 Nov 2024

https://github.com/rtlee9/sic-list

List of SIC codes and descriptions from authoritative sources

beautifulsoup industry-classification web-scraping

Last synced: 09 Nov 2024

https://github.com/renanstn/buscador-de-boardgames

Bot that compares prices of board games listed on Ludopedia and notifies you when the price is below average.

boardgames heroku python telegram telegram-bot web-scraping

Last synced: 30 Nov 2024

https://github.com/nishanthmuruganantham/ndtv-api

This API scrapes the news content on the NDTV website and provides the data in JSON format.

api api-documentation flask flaskapi herokuhosting lxml ndtv ndtv-api ndtvapi news news-api news-apis news-data newsapi newsapi-python python-3 python-flask rest web-scraping webscraping

Last synced: 25 Nov 2024

https://github.com/garyhtou/railsconf-2022-schedule

Web scrapes RailsConf's 2022 schedule to create an ICS link

calendar ics railsconf railsconf2022 web-scraping

Last synced: 08 Nov 2024

https://github.com/od-c0d3r/ibnhayyan-dataminer

A web data mining tool for files with the PDF extension, crafted with the Nokogiri (鋸) RubyGem and Ruby.

data-mining nokogiri ruby web-scraping

Last synced: 21 Oct 2024

https://github.com/toannd96/devread

DevRead - an application that aggregates knowledge for developers

colly concurrency echo-framework golang web-scraping

Last synced: 18 Nov 2024

https://github.com/cedoor/scraper

🚜 Simple desktop scraper app.

scraper web-scraping

Last synced: 04 Nov 2024

https://github.com/pogzyb/tourist

Open-source, LLM-ready SERP and web scraping service

langchain llama llamaindex llm-tool-call llm-tools llmops search-engine selenium-python serpapi web-scraping

Last synced: 10 Dec 2024

https://github.com/darideveloper/phone-emails-scraper-multithreading

Project for extracting emails and phone numbers from a list of web pages, with multithreading, using requests, bs4, regex, and selenium to get more data.

python script web-automation web-scraping

Last synced: 20 Nov 2024

https://github.com/aaryanrr/DownDetector-CLI

CLI Client for DownDetector.com

cli downdetector python3 web-scraper web-scraping

Last synced: 06 Nov 2024

https://github.com/ashwinpn/advanced-python

Python for Machine Learning/AI/DS, Game Theory and Convex Optimization using Python, Managing Docker in Python, Web Scraping / Development in Python using Django and Flask, Functional Programming in Python.

convex-optimization data-science docker flask functional-programming game-theory machine-learning machine-learning-algorithms python web-development web-scraping

Last synced: 15 Nov 2024

https://github.com/volfpeter/graphscraper

Python 3 graph implementation designed to be turned into a web scraper for graph data.

cache-storage graph graph-algorithms graph-database python python3 social-network-analysis web web-scraping

Last synced: 23 Oct 2024

https://github.com/hrbrmstr/drill-html-tools

Apache Drill UDFs for retrieving and working with HTML text

apache-drill css-selectors dom html-parsing jsoup web-scraping

Last synced: 11 Oct 2024

https://github.com/vivint/selenium-docker

Extending Selenium drivers with extra runtime goodies!

docker gevent selenium selenium-python web-scraping

Last synced: 31 Oct 2024

https://github.com/jarvisprestidge/hacker-news-scraper

Simple command line application to scrape a user specified number of Hacker News articles and output as valid JSON

hacker-news python web-scraping

Last synced: 26 Nov 2024

https://github.com/sukanyabag/statistical-analysis-of-my-medium-articles

This repository contains an exploratory data analysis of my writer data at Medium. I use it to carry out data analysis once every 4 months to see audience and fan growth, and the topics they love!

data-analysis data-storytelling matplotlib pandas seaborn statistical-analysis sweetviz web-scraping

Last synced: 10 Nov 2024

https://github.com/dori-dev/news-reader

Advanced news reader website using django and celery.

celery celerybeat django news news-reader news-scraper pydantic redis web-scraping website

Last synced: 09 Nov 2024

https://github.com/chrismuir/mma-data-scrape

Scrape and clean MMA/UFC data using R and rvest

mma r rstats ufc web-scraping

Last synced: 26 Nov 2024

https://github.com/ahmedshahriar/burnout-tweets-scraper

A scraper that collects '#burnout' tweets daily, powered by GitHub Actions and snscrape (stopped on June 30, 2023)

automation burnout dataset git-automation git-scraper git-scraping github-action snscrape social-media twitter twitter-scraper web-scraping

Last synced: 16 Nov 2024

https://github.com/shockz-offsec/Scraping-Notion-Backup

This script automates the backup process of Notion data into Markdown and CSV formats, removing the need for tokens and private Notion APIs. It also removes AWS identifiers in the markdown files, folders, and internal references in the backup.

automation background backup export linux local markdown notion python remover selenium web-scraping windows

Last synced: 25 Nov 2024

https://github.com/davidumoru/scryer

Transform web data into actionable knowledge

content-parsing data-extraction gemini-api google-gemini web-scraping

Last synced: 15 Dec 2024

https://github.com/ahmedshahriar/bd-ponno

An API integrating Scrapy, MongoDB, and Djongo that scrapes popular e-commerce sites (10+) from Bangladesh

django django-rest-framework djongo mongodb mongodb-atlas python python37 restful-api scrapy web-scraping

Last synced: 16 Nov 2024

https://github.com/belsman/ruby-capstone

This project entails building a program that crawls daily data about COVID-19 from a website and displays it in the terminal. Built with Ruby.

microverse-projects ruby web-scraping

Last synced: 18 Oct 2024

https://github.com/kishlayjeet/github-topics-data-scraping

This code is a web scraping script that extracts data from GitHub. It creates a CSV file with the top 100 topics from GitHub and the top 20 repositories for each topic.

beautifulsoup data-scraping featured-repo github pandas pandas-dataframe python python-script requests scraping-websites web-scraping

Last synced: 24 Dec 2024

https://github.com/onlyphantom/pricemate

A simple scraper for Jakarta-to-Bandung departure times and prices from Tiket.com

beautifulsou beautifulsoup tiket-kereta-api web-scraper web-scraping

Last synced: 13 Dec 2024

https://github.com/yash22222/ibm-csrbox-internship-project

The objective of the Data Analytics internship at CSRBOX is to provide interns with hands-on experience in applying data analytics techniques to real-world projects in the field of corporate social responsibility (CSR). Interns will gain practical skills in data collection, cleaning, analysis, visualization, and reporting, while working on projects

data-mining data-preprocessing data-science exploratory-data-analysis feature-engineering lemmatization machine-learning pandas pos-tagging random-forest random-forest-classifier scikit-learn sentiment-analysis web-scraping wordcloud

Last synced: 09 Nov 2024

https://github.com/jmoseka/astro-tabiri

Web application that provides users with daily horoscope readings through data scraping from horoscope.com and gathers comprehensive zodiac information via an API.

api-rest reactjs redux tailwind-css web-scraping

Last synced: 07 Dec 2024

https://github.com/rexsimiloluwah/mediumreader

LISTEN to your favourite medium blog posts.

nodejs react speech-synthesis web-scraping

Last synced: 11 Dec 2024

https://github.com/ahmed-alnassif/net-spider

Net-Spider is a web scraping tool designed to retrieve the source code for a web page, including front-end elements such as JavaScript, CSS, images, and fonts. It allows you to crawl and download the source code from a target website.

beautifulsoup4 command-line-interface front-end-web-development python3 source-code-extraction web-automation web-crawling web-development-tool web-optimization web-scraping

Last synced: 16 Nov 2024

https://github.com/foxt451/actor-quora-scraper

This actor scrapes Quora's API and lets you collect questions, answers, upvotes, etc.

quora quora-scraper web-scraping

Last synced: 11 Oct 2024

https://github.com/raahulrathore/Python-YouTube-Scraper

A simple YouTube Scraper to scrape data from YouTube using URL and download audio and video files.

python pytube web-scraping youtube-downloader youtube-scraper

Last synced: 29 Nov 2024

https://github.com/bertrandmartel/covid19-nyc-vaccine-tracker

Covid19 NYC Vaccine Tracker data extracted from Tableau

covid19 python scraper web-scraping

Last synced: 03 Jan 2025

https://github.com/worldbank/firms-web-scraping

The aim of this project is to scrape metadata about business firms given only their name and the country where they operate.

business-firms-data machine-learning nlp smart-web-scraping web-scraping

Last synced: 10 Nov 2024

https://github.com/joshpetit/collegiatecovid

Tracks covid responses and statistics at universities. Done during the HackDuke hackathon. Created with React, Python, and Firebase.

beautifulsoup-python-library college colleges coronavirus covid firebase hackathon health puppeteer pyppeteer python react statistics universities web-scraping

Last synced: 30 Nov 2024

https://github.com/yusufcinarci/web-scraping-projects

In these project files, I will host the web scraping examples that I will make day by day.

data-analysis data-science jupyter-notebook python web-scraping

Last synced: 26 Dec 2024

https://github.com/sycanz04/schedulr

A Chrome extension that transfers a timetable from CLiC to Google Calendar.

google-api google-calendar javascript web-scraping

Last synced: 06 Nov 2024

https://github.com/matheusfillipe/wcofun.cli

Stream and download anime directly from your terminal

anime anime-downloader bash shell-script video web-scraping

Last synced: 19 Nov 2024

https://github.com/vmanot/browserkit

Web-scraping made easy with Swift.

async-await framework swift web-scraping

Last synced: 15 Oct 2024

https://github.com/jakewarren/scrape

A command line scraping utility supporting CSS selectors or XPath

css-selector css-selectors scraping-utility web-scraping xpath

Last synced: 16 Nov 2024

https://github.com/kuhumcst/cuphic

Transform or scrape Hiccup with a declarative DSL.

data-mining data-transformation declarative dsl hiccup html scraping sgml web-scraping xml

Last synced: 16 Nov 2024

https://github.com/ndleah/currency-converter

💱 A program that will perform currency conversion using data fetched from an open-source API

api-rest currency-converter python unit-testing web-scraping

Last synced: 13 Nov 2024

https://github.com/justquick/pdf12step

Generates PDF meeting guides from sites using the 12 Step Meeting List WordPress plugin

flask html pdf pdf-generation web-scraping

Last synced: 14 Oct 2024

https://github.com/deep5050/amori-banglabhasha

Collection of Bangladeshi Stories

bangladesh bengali scraper story storybook web-scraping

Last synced: 02 Jan 2025

https://github.com/thelastgimbus/apis-scraper

Python web scraper for getting Polish political parties support percentage!

beautifulsoup beautifulsoup4 vote voter-engagement voting web-scraping

Last synced: 29 Nov 2024

https://github.com/johnwmillr/sharktank

Analysis of Shark Tank deals 🦈💰

analysis dataisbeautiful shark-tank web-scraping

Last synced: 10 Dec 2024

https://github.com/engageintellect/beauty-by-jitka

A rich, modern, and elegant landing page for a medical injector. Using Sveltekit, svelte-superforms, google-forms payload, fastapi, tiktok, shadcn-svelte, gsap, svelte-maplibre, and zod.

beautifulsoup4 fastapi google-forms gsap iconify puppeteer python shadcn-svelte shadcn-ui svelte svelte-maplibre svelte-superforms sveltekit tailwind tailwindcss tiktok-api typescript web-scraping zod

Last synced: 12 Oct 2024

https://github.com/ariear/tikchan

A minimal TikTok downloader

cheerio tiktok tiktok-api tiktok-downloader web-scraping

Last synced: 17 Nov 2024

https://github.com/rivaquiroga/datapalooza-2024-webscraping

Materials for the workshop on web scraping with Python at Datapalooza 2024

python web-scraping

Last synced: 02 Jan 2025

https://github.com/jfilter/get-wayback-machine

Fetch a URL via the latest Wayback Machine snapshot

wayback-machine web-scraping

Last synced: 11 Nov 2024

https://github.com/hrbrmstr/crux

Identify the Crux of an Article

crux r rjava rstats web-scraping

Last synced: 15 Nov 2024

https://github.com/someshsingh22/flaireddit-midas

A Webapp deployed on Heroku which detects the 'flair' tags of a Reddit Post from the subreddit r/india

flask heroku reddit text-classification web-application web-scraping

Last synced: 17 Nov 2024

https://github.com/hrbrmstr/splashttpd

Slight modifications to the scrapinghub/splash Docker environment to enable an internal web server to render "files" from a mounted filesystem

docker splash web-scraping

Last synced: 15 Nov 2024

https://github.com/hrbrmstr/scala-splash

Scala interface to the ScrapingHub Splash API

scala scrapinghub scrapinghub-api splash web-scraping

Last synced: 15 Nov 2024

https://github.com/mohammadkarbalaee/json-html-parser

This project aims to mimic what the Gson and Jackson packages do in the Java world

java json parsing shahid-beheshti-university web-scraping

Last synced: 06 Dec 2024

https://github.com/tnytcoder/url_checker

Python Script To Verify Url Existence And Provide Basic Information

hacking-tool python requests termux termux-hacking termux-tools web-scraping website-scraper

Last synced: 23 Dec 2024