Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/glaucocustodio/tanakai

Tanakai is a modern web scraping framework written in Ruby, forked from Kimurai.

chrome-headless crawler kimurai scraper scrapy webscraping

Last synced: 03 Jul 2024

https://github.com/datawizard1337/ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and can crawl a broad range of different websites. On those websites, ARGUS can perform tasks such as scraping text or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

crawling python scraping scrapy scrapyd webcrawling webscraping

Last synced: 01 Jul 2024

https://github.com/nberlette/dql

Web Scraping with Deno: DOM + GraphQL

deno deno-deploy denoland dom dom-parser dql graphql graphql-scraper scraper webscraping

Last synced: 29 Jun 2024

https://github.com/mov-cli/mov-cli

Watch everything from your terminal.

android cli hacktober ios linux scraping webscraping windows

Last synced: 29 Jun 2024

https://github.com/aish2002/Movie-Info-Telegram-Bot

A telegram bot which scrapes IMDb website to get details on movies and TV shows

bot python telegram-bot webscraping

Last synced: 27 Jun 2024

https://github.com/Hesbadami/Footballemrooz

Scrape data on all soccer matches daily, and create a stylized image containing info on today's matches (e.g. kick-off time and broadcasters).

postgresql soccer-matches telegram-bots webscraping

Last synced: 27 Jun 2024

https://github.com/roniemartinez/dude

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath

Last synced: 27 Jun 2024

https://github.com/FastestMolasses/fBrowser

Helpful Selenium functions to make web-scraping easier and faster

helper-functions python3 selenium selenium-functions selenium-python webscraper webscraping

Last synced: 27 Jun 2024

https://github.com/SeroviICAI/Movie_Recommender

This program recommends a movie in any genre by scraping data from IMDb. The project was done for educational purposes.

data-science educational entertainment movies pandas python regex webscraping

Last synced: 27 Jun 2024

https://github.com/daijro/SearchifyX

Fast flashcard searcher study tool

education quizizz quizlet scraper webscraper webscraping

Last synced: 27 Jun 2024

https://github.com/jtanwk/nytcrossword

An exploration of New York Times crossword answers from 1994-2017, i.e. the Will Shortz era.

crosswords dataviz linguistic-analysis nytimes nytimes-crossword rvest webscraping

Last synced: 27 Jun 2024

https://github.com/cornelk/goscrape

Web scraper that can create an offline readable version of a website

go golang scraper webscraping

Last synced: 27 Jun 2024

https://github.com/openaustralia/morph

Take the hassle out of web scraping

civictech docker webscraping

Last synced: 18 Jun 2024

https://github.com/pavlovtech/WebReaper

Web scraper, crawler and parser in C#. Designed as a simple, declarative and scalable web scraping solution.

crawler datamining parser parsing scraper scraping scraping-api scraping-data scraping-tool scraping-web scraping-websites webcrawler webscraping

Last synced: 15 Jun 2024

https://github.com/MayankPandey01/BrokenLinkHijacker

A Fast Broken Link Hijacker Tool written in Python

blh bug-bounty python reconnaissance scanner webhacking webscraping

Last synced: 14 Jun 2024

https://github.com/z0m31en7/Uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 14 Jun 2024

https://github.com/m8sec/CrossLinked

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

enumeration linkedin-scraper osint pentest-scripts pentest-tool python3 username-generator webscraping

Last synced: 14 Jun 2024

https://github.com/jamesturk/scrapeghost

👻 Experimental library for scraping websites using OpenAI's GPT API.

gpt openai-api webscraping

Last synced: 14 Jun 2024

https://github.com/Indie-Platforms/scrapecomfort

Desktop AI Data Scraper

ai data webscraping

Last synced: 11 Jun 2024

https://github.com/TheWebScrapingClub/webscraping-from-0-to-hero

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

playwright python scrapy scrapy-spider scrapysplash webscraping

Last synced: 11 Jun 2024

https://github.com/requests-cache/requests-cache

Persistent HTTP cache for python requests

cache dynamodb http mongodb performance redis requests sqlite web webscraping

Last synced: 09 Jun 2024

https://github.com/giuseppegambino/Scraping-TripAdvisor-with-Python-2020

Python implementation of TripAdvisor web scraping with Selenium, targeting the new 2019 version of the website

python selenium tripadvisor tripadvisor-scraper tripadvisorreview webscraper webscraper-website webscraping

Last synced: 08 Jun 2024

https://github.com/matt-dray/altcheckr

🌄 ✅ R package: assess image alt text on websites

accessibility alt-text package r rpackage rstats webscraping

Last synced: 04 Jun 2024

https://github.com/benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery

Last synced: 03 Jun 2024

https://github.com/JBGruber/paperboy

A comprehensive (eventually) collection of webscraping scripts for news media sites

r r-package rstats webscraping

Last synced: 02 Jun 2024

https://github.com/DEENUU1/property-aggregator

🏠 A web application written in FastAPI and a console application for scraping and parsing data enabling the collection of offers for apartments, houses and other premises for both rent and purchase

alembic celery docker docker-compose fastapi flower postgresql redis sqlite webscraping

Last synced: 02 Jun 2024

https://github.com/jchao01/TradingView-data-scraper

Extract price and indicator data from TradingView charts to create ML datasets

algorithmic-trading data-mining json tradingview webscraping

Last synced: 02 Jun 2024

https://github.com/siongui/instago

Download/access photos, videos, stories, story highlights, postlives, following and followers of Instagram

downloader go golang gopherjs instagram web-scraping webscraping

Last synced: 31 May 2024

https://github.com/niespodd/browser-fingerprinting

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

automation bot bot-detection browser-fingerprinting chromedriver chromium chromium-browser crawler detection fingerprinting puppeteer recaptcha scraper spider stealth web webscraping

Last synced: 30 May 2024

https://github.com/owainlewis/falkor

Open Source web scraping API. Falkor turns web pages into queryable JSON

webscraping webscrapper

Last synced: 29 May 2024

https://github.com/dimitryzub/youtube-mention-tracker

Find target keyword mention(s) from YouTube videos. Similar to Mention but for videos. Sponsored by SerpApi.

mention-detection python webscraping youtube youtube-downloader

Last synced: 29 May 2024

https://github.com/Flybell/web_to_obsidian

A Python 3 script that scrapes an html/xml page to extract text, then creates markdown files for Obsidian & the dataview plugin

beautifulsoup dataview markdown obsidian python3 webscraping

Last synced: 27 May 2024

https://github.com/h4r7w3l1/http_file_prober

A simple Bash tool to check the size and content type of URLs

bash calculate-size prober remote-file web-size webscraping

Last synced: 26 May 2024

https://github.com/driscoll42/ebayMarketAnalyzer

Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel

ebay python scraping-websites webscraping

Last synced: 26 May 2024

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 26 May 2024

https://github.com/feddelegrand7/ralger

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

dataextraction r rstats webcrawling webscraper-website webscraping

Last synced: 20 May 2024

https://github.com/OpenCourseAPI/OwlAPI

An open source REST API written in Python to scrape and serve Foothill / De Anza course data 📒

api course data de-anza foothill myportal owl-api webscraping

Last synced: 19 May 2024

https://github.com/rithwik003/Worthit

A web app using Node.js and Express for tracking Amazon product price history and setting up price drop alerts.

amazon express-js nodejs price-tracker pricedrop-alert webapp webscraping worthit

Last synced: 16 May 2024

https://github.com/aeksco/aws-pdf-textract-pipeline

🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript

aws aws-cdk aws-textract cdk cloudformation data-pipeline dynamodb jest lambda pdf puppeteer s3 serverless sns textract typescript webscraping

Last synced: 14 May 2024

https://github.com/anaskhan96/soup

Web Scraper in Go, similar to BeautifulSoup

beautifulsoup go golang html-node web-scraper webscraper webscraping

Last synced: 11 May 2024

https://github.com/fabienvauchelles/scrapoxy

Scrapoxy is a super proxy aggregator, allowing you to manage all proxies in one place 🎯, rather than spreading it across multiple scrapers 🕸️. It also smartly handles traffic routing 🔀 to minimize bans and increase success rates 🚀.

antibot blacklisting proxies webscraping

Last synced: 10 May 2024

https://github.com/laxmanbalaraman/Broken-Links-Finder

A web application to find all the dead links in a website

bfs-algorithm broken-links django html5 multithreading spiderbot webscraping

Last synced: 10 May 2024

https://github.com/FrankFlitton/autoyeai.com

A TensorFlow.js Kanye West lyrics generator and data ingestion pipeline.

netlify react styled-components tensorflow tensorflowjs webscraping webworkers

Last synced: 07 May 2024

https://github.com/maxhumber/gazpacho

🥫 The simple, fast, and modern web scraping library

gazpacho scraping webscraping

Last synced: 05 May 2024

https://github.com/yusuzech/r-web-scraping-cheat-sheet

Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.

cheatsheet httr r rselenium rvest scrape-websites web-scraping webscraping

Last synced: 02 May 2024

https://github.com/hagabaka/youtube-timestamps-to-playlist

Chrome and Firefox extension which creates a playlist from time tags on a YouTube page

chrome-extension firefox-addon playlist web-components webscraping youtube youtube-timestamps

Last synced: 01 May 2024

https://github.com/huginn/huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping

Last synced: 30 Apr 2024

https://github.com/Skallwar/suckit

Suck the InTernet

hacktoberfest rust webscraping

Last synced: 29 Apr 2024

https://github.com/nicodds/chesf

CHeSF is the Chrome Headless Scraping Framework, very, very alpha code for scraping JavaScript-intensive web pages

chrome-headless scraping selenium webscraping

Last synced: 27 Apr 2024

https://github.com/ian-whitestone/toronto-apartment-finder

[really old and probably doesn't work] Slack bot to post relevant Toronto apartment listings from Kijiji & Craigslist

google-maps python slack-bot webscraping

Last synced: 26 Apr 2024

https://github.com/mBaratta96/musicScraper

CLI tool for scraping information from music websites (Rateyourmusic, Metal Archives), with nice album ASCII art

cli go golang metalarchives metalarchives-parser metallum rateyourmusic rym tui webscraping

Last synced: 21 Apr 2024

https://github.com/0xMH/OkanimeDownloader

Scrape your favorite Anime from Okanime.com without effort

anime beautifulsoup python python3 webscraping

Last synced: 17 Apr 2024

https://github.com/testdrivenio/selenium-grid-docker-swarm

web scraping in parallel with Selenium Grid and Docker

docker docker-swarm selenium selenium-grid selenium-webdriver webscraping

Last synced: 16 Apr 2024

https://github.com/salil-gtm/SmartTourister

We have developed a fully AI/ML-based itinerary recommendation system that lets people visiting any place better optimize their cost and time. We have developed 3 inputs: Scraping Twitter, UI Form, and FB Chatbot

chatbot css flask html javascript nodejs python tourism travel webscraping

Last synced: 13 Apr 2024

https://github.com/currentslab/extractnet

A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one package

author-extraction content-extraction date-extraction machine-learning news news-articles news-extraction news-extractor python text-cleaning text-mining web-scraping webscraping

Last synced: 12 Apr 2024

https://github.com/Adrian-Winter/Meta-Music-Toolbox

A collection of handy tools such as adding Key & BPM to your music library

bpm key music musiclibrary selenium tunebat webscraping

Last synced: 05 Apr 2024

https://github.com/Amr2812/devto-stats-card

Display your Dev.to blog followers count and total post views count in a card image.

dev devcommunity github-profile nodejs webscraping

Last synced: 05 Apr 2024

https://github.com/HarshdipD/eztrackr

v3 of Eztrackr's Chrome extension. Designed to ease your job hunt by adding your jobs in an organized Trello board ✨

chrome-extension hacktoberfest hacktoberfest2020 javascript trello trello-boards webscraping

Last synced: 05 Apr 2024

https://github.com/guilhermecgs/ir

Project for automatically calculating income tax (Imposto de Renda) on Bovespa trading operations. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR, income tax, finance, yahoo finance, stocks, FII, ETF, python, crawler, webscraping, income tax calculator

acoes b3 bovespa calculadora-ir canal-eletronico-investidor cei crawler etf fii finance imposto-de-renda irpf webscraping

Last synced: 31 Mar 2024

https://github.com/reworkd/tarsier

Vision utilities for web interaction agents 👀

gpt4v llms ocr playwright pypi-package python selenium webscraping

Last synced: 31 Mar 2024

https://github.com/dewey/feedbridge

Plugin based RSS feed generator for sites that don't offer any. Serves RSS, Atom and JSON Feeds.

atom-feed jsonfeed-generator rss rss-generator scraping webscraping

Last synced: 30 Mar 2024

https://github.com/decryptr/decryptr

An extensible API for breaking captchas

captcha r rstats tidyverse webscraping

Last synced: 26 Mar 2024

https://github.com/openzim/python-libzim

Libzim binding for Python: read/write ZIM files in Python

binding library libzim offline python webscraping

Last synced: 24 Mar 2024

https://github.com/smyja/blackmaria

Python package for web scraping using natural language

gpt-3 nlp openai python webscraping

Last synced: 24 Mar 2024

https://github.com/urbanadventurer/bing-ip2hosts

bingip2hosts is a Bing.com web scraper that discovers websites by IP address

bing discovery hostnames ipaddress kali kali-linux osint osint-reconnaissance osint-tool reconnaissance scraper search-engine webscraping

Last synced: 23 Mar 2024

https://github.com/brianckeegan/wikifunctions

Python functions for retrieving data from the MediaWiki/Wikipedia API

mediawiki python3 webscraping wikipedia

Last synced: 21 Mar 2024

https://github.com/holgerd77/django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

django python scraper scraping scrapy spider webscraping

Last synced: 19 Mar 2024

https://github.com/pacjo/UTnotifier

Python script providing UserTesting notifications

selenium usertesting webscraping

Last synced: 17 Mar 2024

https://github.com/Nickwasused/FreeGamesonSteam

Searching SteamDB for Free Games and Activating them using ArchiSteamFarm

archisteamfarm free-games steam-api steambot webscraping

Last synced: 16 Mar 2024

https://github.com/kboghe/NordVPN-switcher

Rotate between different NordVPN servers with ease. Works both on Linux and Windows without any required changes to your code!

nordvpn vpn webscraping

Last synced: 16 Mar 2024

https://github.com/antonio-nicolau/chaleno

A Dart package for scraping data from websites easily and quickly, using fewer lines of code.

dart flutter-webscrap webscraping webscraping-data

Last synced: 16 Mar 2024

https://github.com/A-Wheeto/Dashboard

A tkinter GUI collating various data

apis dashboard gui tkinter webscraper webscraping

Last synced: 14 Mar 2024