Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Projects in Awesome Lists tagged with web-scraping

A curated list of projects in awesome lists tagged with web-scraping.

https://github.com/wiringbits/simple-http-proxy

A very simple http proxy that runs on a Raspberry Pi

http-proxy playframework raspberry-pi scala web-scraping

Last synced: 21 Nov 2024

https://github.com/tokahuke/lopez

Crawling and scraping the Web for fun and profit

crawler rust scraper seo web-scraping

Last synced: 14 Nov 2024

https://github.com/capturr/scraper

All-in-one API to easily scrape data from any website, without worrying about captchas and bot detection mechanisms.

captcha cheerio crawler crawling data declarative extract growth-hacking html javascript json jsonld nodejs recaptcha scraper scraping spider typescript web web-scraping

Last synced: 06 Dec 2024

https://github.com/pirtleshell/scrape-a-grave

Scrape and Retrieve FindAGrave memorial page data and save them to an SQL database

citation genealogy scraper web-scraping

Last synced: 09 Nov 2024

https://github.com/moonlitgrace/mangareader-api

A Python-based web scraping API built with FastAPI that provides easy access to manga content

anime fastapi manga manga-api mangareader python python-web-scraper scraping web-scraping

Last synced: 28 Oct 2024

https://github.com/prakharchoudhary/twitteradvsearch

A scraping tool to scrape tweets with user-provided keywords and hashtags, in a given date range.

csv python-3-5 selenium twitter twitter-scraper web-scraping

Last synced: 22 Dec 2024

https://github.com/ahmedshahriar/depression-tweets-scraper

A scraper that collects '#depression' tweets daily, powered by GitHub Actions and snscrape (stopped on June 30, 2023)

automation dataset depression git-automation git-scraper git-scraping github-action snscrape social-media twitter twitter-scraper web-scraping

Last synced: 16 Nov 2024

https://github.com/oxylabs/backlink-monitoring

Backlink checker is a simple tool that checks backlink quality, identifies problematic backlinks, and outputs them to a specific Slack channel

backlinks backlinks-checker python screen-scraping web-scraping

Last synced: 17 Nov 2024

https://github.com/pgomba/mdpi_explorer

A simple package to explore MDPI's articles by journal. A series of functions help to obtain lists of papers, obtain data from them (turnaround times, special issues and article types) and create summary graphs.

analysis data-analysis data-visualization mdpi metrics scientific-journals visualization web-scraping

Last synced: 18 Nov 2024

https://github.com/arnie97/chrome-cookiejar

Import your Chrome cookies to a Python CookieJar

chrome chromium cookie web-scraping

Last synced: 28 Nov 2024

https://github.com/apify/actor-content-checker

You can use this actor to monitor any page's content and get a notification when the content changes.

apify content-selector web-scraping

Last synced: 07 Nov 2024

https://github.com/ianramzy/article-summary-deep-learning

📖 Using deep learning and scraping to analyze/summarize articles! Just drop in any URL!

fact-extractor flask named-entity-recognition nlp summarization web-scraping

Last synced: 19 Nov 2024

https://github.com/duskvirkus/dafonts-free

Dafonts Free Dataset and python scripts used to make it

dafont dataset fonts ml-datasets otf-fonts ttf-fonts web-scraping

Last synced: 25 Nov 2024

https://github.com/pb2204/covid-19

A web scraping project with COVID-19 data from two very popular and authentic websites

covid-19 covid19-data web-scraping web-scraping-python web-scrapping

Last synced: 16 Nov 2024

https://github.com/w-henderson/Unlimited-YouTube-Search

🔍 Search YouTube without the YouTube Data API.

python-library search web-scraping youtube youtube-search

Last synced: 06 Nov 2024

https://github.com/w-henderson/unlimited-youtube-search

🔍 Search YouTube without the YouTube Data API.

python-library search web-scraping youtube youtube-search

Last synced: 28 Sep 2024

https://github.com/openanime/aniscrape

Web scraping tool to download videos from TürkAnimeTV and AnimeciX

anime puppeteer web-scraping

Last synced: 28 Nov 2024

https://github.com/inputsh/automation-scripts

Simple scripts that I'm using to automate the boring things.

automation home-automation web-scraping

Last synced: 05 Nov 2024

https://github.com/Etwas-Builders/Twitter-Source-Bot

Ever wanted to know the source of a tweet? Just @whosaidthis_bot and I'll tell you where it came from

bot mozilla-builders nlp source-verify twitter-bot twitter-source-bot web-scraping

Last synced: 06 Nov 2024

https://github.com/gabryxx7/ai_dating

What to do when you end up single during a pandemic in 2020? Create an AI to deal with dating apps for you! Analyse bios, messages, pictures and more! Or just use this as a desktop client for Tinder (and Bumble) or to scrape some data for research purposes!

nlp nltk pyqt5 pyside2 python qt requests scraper tinder web-scraping

Last synced: 24 Nov 2024

https://github.com/till-tietz/parsel

Parallel execution of RSelenium

cran parallel r rselenium web-scraping

Last synced: 26 Nov 2024

https://github.com/maxhumber/scrape.world

The Web Scraping Sandbox

gazpacho selenium web-scraping

Last synced: 19 Dec 2024

https://github.com/dbeley/senscritiquescraper

Python library to extract data from senscritique.com.

python scraper senscritique web-scraping

Last synced: 10 Nov 2024

https://github.com/mohammad-mghn/dev-tab

WEB TAB makes it easy for you to stay up-to-date with the latest developer news, tools, jobs and events.

axios cheerio fs nextjs nextjs-13 react tailwindcss typescript web-scraping

Last synced: 23 Oct 2024

https://github.com/webmiddle/webmiddle

Node.js framework for modular web scraping and data extraction

data-extraction framework jsx jsx-components modular nodejs web-scraping

Last synced: 11 Oct 2024

https://github.com/vito-mohagheghian/dev-tab

WEB TAB makes it easy for you to stay up-to-date with the latest developer news, tools, jobs and events.

axios cheerio fs nextjs nextjs-13 react tailwindcss typescript web-scraping

Last synced: 03 Oct 2024

https://github.com/hrbrmstr/reapr

🕸→ℹ️ Reap Information from Websites

html r r-cyber rstats rvest web-scraping xpath

Last synced: 11 Oct 2024

https://github.com/prankshaw/beware-web-scraper

Web scraping project including a C projects scraper for GitHub, an ICC rankings scraper, a YouTube Trending scraper, a LinkedIn profile scraper, and a Wikipedia image scraper

batting c chrome-webdriver chromedriver cricket github icc icc-rankings-scraper pandas python python-3 rankings scraper selenium selenium-webdriver web-scraping wikipedia-image-scraper

Last synced: 03 Dec 2024

https://github.com/palewire/fed-dot-plot-scraper

Extracting the "dot plot" economic projections posted online by the Federal Open Market Committee

data-journalism economic-data federal-reserve fomc journalism macroeconomics monetary-policy news python scraper web-scraping

Last synced: 18 Oct 2024

https://github.com/jonathanhefner/grubby

Fail-fast web scraping

mechanize ruby web-scraping

Last synced: 22 Dec 2024

https://github.com/deep5050/abosar

অবসর 📚 A collection of short Bengali stories web scraped from various Bengali eMagazines and eNewspapers.

bengali cron-jobs stories web-scraper web-scraping webcrawler

Last synced: 09 Nov 2024

https://github.com/yashkathe/f1-api-json

F1-API is a TypeScript-based web scraping API designed to extract information about Formula 1 races, drivers, cars, standings, and race schedules. This powerful web scraper automates the process of gathering data and aggregating it into a structured format for easy analysis and consumption.

axios cheerio f1 f1-api formula1 formula1-analysis formula1-api nodejs npm typescript web-scraping

Last synced: 06 Nov 2024

https://github.com/apify/super-scraper

Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!

api apify cheerio javascript nodejs playwright scraping typescript web-scraping

Last synced: 07 Nov 2024

https://github.com/alexandrevl/supersummarizeai

Unleash the power of AI with SuperSummarizeAI! Effortlessly extract, condense, and clip content from webpages and YouTube videos using ChatGPT. Turning endless streams of content into digestible summaries.

beautifulsoup chatgpt content-analysis multilingual nlp openai papperclip text text-processing text-summarization web-scraping youtube

Last synced: 09 Nov 2024

https://github.com/beatrizmilz/noticiasgov

Scraping data from government news portals

git-scraping r rstats web-scraping

Last synced: 04 Dec 2024

https://github.com/cimentadaj/dataharvesting

Material for the course 'Data Harvesting' in the master's program in Computational Social Science at UC3M

api data-science r web-scraping

Last synced: 09 Nov 2024

https://github.com/zzzul/diwa

An unofficial, simple API for https://distrowatch.com/

api distrowatch hacktoberfest laravel php swagger-ui web-scraping

Last synced: 12 Oct 2024

https://github.com/oxylabs/news-scraping

A tutorial for scraping news

news news-scraper web-scraping

Last synced: 17 Nov 2024

https://github.com/amrrs/hn_scraper_in_r

Hacker News front page scraper in R

hacker-news r scraper top-stories web-scraping

Last synced: 15 Nov 2024

https://github.com/shockz-offsec/automatic-notion-backup

This script automates the backup process of Notion data into Markdown and CSV formats. Additionally, the script processes the data to remove any AWS identifiers that may be present in the Markdown files, folders, and internal references to other files in the backup

api automatic automation backup export linux markdown notion remove selenium web-scraping windows

Last synced: 09 Nov 2024

https://github.com/oxylabs/news-article-scraper

Learn about web scraping news articles using Python and JavaScript.

github-python javascript python scraping url-scraper web-scraping

Last synced: 17 Nov 2024

https://github.com/shockz-offsec/Automatic-Notion-Backup

This script automates the backup process of Notion data into Markdown and CSV formats. Additionally, the script processes the data to remove any AWS identifiers that may be present in the Markdown files, folders, and internal references to other files in the backup

api automatic automation backup export linux markdown notion remove selenium web-scraping windows

Last synced: 07 Nov 2024

https://github.com/geminidsystems/googlenewsscraper

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

crawler googleautomator googlenews googlenewsscraper googlescraper python scraper scraping selenium web-scraping webcrawler webdriver webscraper

Last synced: 19 Nov 2024

https://github.com/discovai/discovai-crawl

🕷️ DiscovAI Crawl API (🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.

ai api crawler embedding vector-database web-scraping

Last synced: 12 Nov 2024

https://github.com/r2dev2/onefactorauth

A tool to bypass two-factor authentication.

one-factor-auth python two-factor-authentication web-scraping wtfpl

Last synced: 28 Oct 2024

https://github.com/iiitv/lyrics-crawler

Simple crawler to collect lyrics, written in Python

azlyrics custom-crawler hindilyrics lyrics lyricsmasti metrolyrics python web-scraping

Last synced: 08 Nov 2024

https://github.com/oxylabs/golang-web-scraper

A tutorial for building a web scraper in Golang

go golang url-scraper web-scraper web-scraping

Last synced: 17 Nov 2024

https://github.com/pratapvardhan/elections-india-2014

Results related to General Assembly (Lok Sabha) elections 2014 in India.

data elections india python web-scraping

Last synced: 07 Nov 2024

https://github.com/ahmedshahriar/twittercelebritymatcher

Match celebrity users with their respective tweets by making use of Semantic Textual Similarity on over 900+ celebrity users' 2.5 million+ scraped tweets utilizing SBERT, streamlit, tweepy and FastAPI

fastapi multilingual-bert mypy pydantic python310 python39 pytorch pytorch-gpu rest-api sbert sentence-transformers streamlit streamlit-webapp tweepy twitter-scraping web-scraping

Last synced: 16 Nov 2024

https://github.com/rly0nheart/tarantula

Python web crawler tool

crawling scraping web-crawler web-scraping

Last synced: 13 Oct 2024

https://github.com/BufferingIO/film2subtitle

A REST API for film2subtitle.com website written in Python and FastAPI ⚡

beautifulsoup4 fastapi film2subtitle python rest-api web-scraping

Last synced: 20 Nov 2024

https://github.com/firaskahlaoui/web-scraping-lvl2

The "web-scraping-lvl2" project is a learning exercise to explore advanced web scraping techniques using Scrapy.

framework learning python scrapy web-scraping

Last synced: 15 Nov 2024

https://github.com/chen0040/scrapy-projects

Projects using selenium, requests, bs4, and scrapy for web scraping on google images, google trends and others

google-images google-trends pokemon requests scrapy-crawler selenium web-scraping

Last synced: 16 Dec 2024

https://github.com/bl4ck44/web-scraping

A tool for extracting data from web pages.

hacking osint osint-tool python python3 web-hacking web-scraping website

Last synced: 22 Nov 2024

https://github.com/emibcn/covid

Progressive Web Application that displays data extracted from the official website https://dadescovid.cat

catalunya climate-change covid covid-19 covid-data create-react-app frontend github-page hacktoberfest javascript reactjs visual-widgets web-scraping webapp widgets workflow

Last synced: 14 Nov 2024

https://github.com/jmrchelani/scrap-this-web

API to scrape the HTML, CSS, and JS of a website

javascript scraper web web-scraping

Last synced: 06 Nov 2024

https://github.com/martincastroalvarez/html2vec

Algorithm that converts an HTML to a vectorized object suitable for neural networks.

data-science html2vec natural-language-processing python web-scraping word2vec

Last synced: 22 Dec 2024

https://github.com/neysofu/qetesh

Web scraper to train profanity detectors. NSFW!

bot nsfw python scrapy web-scraping

Last synced: 26 Dec 2024

https://github.com/aziz-axg/puppeteer-copy-content

A basic web script for copying contents

fs nodejs puppeteer scraping web-scraping

Last synced: 11 Nov 2024

https://github.com/itaynir1/brute-force

This project is a Python script for conducting a brute-force attack on a login page. It takes a target URL, a username, and a password file as inputs, attempting to find the correct password through successive login attempts.

automation brute-force-attack command-line-tool ethical-hacking http-requests login-page password-cracking password-management python-libraries python-script requests security term-colors user-authentication web-application-testing web-development web-scraping web-security

Last synced: 26 Dec 2024

https://github.com/nirantak/wsiwn

What Should I Watch Next? Expert System built using Python/Flask and Prolog.

expert-system flask prolog python python3 react swi-prolog web-scraping

Last synced: 12 Nov 2024

https://github.com/theritikchoure/crawlyx

Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

cli command-line-tool crawler crawlyx hacktoberfest hacktoberfest-2023 hacktoberfest-accepted nodejs npmjs open-source scraper web-scraping

Last synced: 12 Oct 2024

https://github.com/julzerinos/python-scraping-tools

A collection of repeatable methods and concepts that appear in Python web scraping with Scrapy and Selenium

bot python python-bot python-web-scraper scraping scrapy selenium selenium-python web-scraping

Last synced: 09 Nov 2024

https://github.com/talaatmagdyx/socials_regex

🪡 Social account detection and extraction in Ruby, e.g. for crawling/scraping.

ruby web-crawling web-scraping

Last synced: 09 Nov 2024

https://github.com/shaido987/invivogen-printer-tool

For automatic download of specified TDS documents

automatisation pdf pdfbox printer printing web-scraping

Last synced: 13 Nov 2024

https://github.com/deadsec-security/easy-scraper

Create easy workflows for web scraping using the web and drag and drop features. Making scraping easy and fast!

docker easy-to-use selfhostable selfhosted web-scraper web-scraping web-scraping-software web-scrapper-python

Last synced: 22 Oct 2024

https://github.com/andredarcie/best-games-of-all-time-data-based

🏆 Definitive best games of all time, based on data from multiple sources

best critics data dataset game rank video-game video-games web-crawling web-scraping

Last synced: 11 Nov 2024

https://github.com/3choff/docs-miner

A VSCode extension that generates markdown documentation from web pages and GitHub repositories.

developer-tools documentation-generator documentation-tool github-to-markdown markdown-generator vscode-extension web-scraping website-to-markdown

Last synced: 03 Dec 2024

https://github.com/apify/apify-zapier-integration

Apify integration for Zapier

api apify web-scraping zapier

Last synced: 07 Nov 2024

https://github.com/bipinoli/online-price-tracker-with-chrome-extension

Go to an e-commerce site, select the price, hit the extension button, and that's it. That price will now be tracked. The system knows where to look for the price on each site, and once the price drops to your desired threshold, it will notify you.

dom-manipulation javascript php price-tracker server-sent-events web-scraping

Last synced: 22 Nov 2024