Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with webscraping

A curated list of projects in awesome lists tagged with webscraping.

https://github.com/cantino/huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping

Last synced: 22 Nov 2024

https://github.com/huginn/huginn

Create agents that monitor and act on your behalf. Your agents are standing by!

agent automation feed feedgenerator huginn monitoring notifications rss scraper twitter twitter-streaming webscraping

Last synced: 16 Dec 2024

https://github.com/mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

ai ai-scraping crawler data html-to-markdown llm markdown rag scraper scraping web-crawler webscraping

Last synced: 16 Dec 2024

https://github.com/assafelovic/gpt-researcher

LLM based autonomous agent that conducts local and web research on any topic and generates a comprehensive report with citations.

agent ai automation llms openai python research search webscraping

Last synced: 16 Dec 2024

https://github.com/niespodd/browser-fingerprinting

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

automation bot bot-detection browser-fingerprinting chromedriver chromium chromium-browser crawler detection fingerprinting puppeteer recaptcha scraper spider stealth web webscraping

Last synced: 17 Dec 2024

https://github.com/anaskhan96/soup

Web Scraper in Go, similar to BeautifulSoup

beautifulsoup go golang html-node web-scraper webscraper webscraping

Last synced: 19 Dec 2024

https://github.com/fabienvauchelles/scrapoxy

Scrapoxy is a super proxy aggregator, allowing you to manage all proxies in one place 🎯, rather than spreading them across multiple scrapers 🕸️. It also smartly handles traffic routing 🔀 to minimize bans and increase success rates 🚀.

antibot blacklisting proxies webscraping

Last synced: 27 Oct 2024

https://github.com/thewebscrapingclub/webscraping-from-0-to-hero

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

playwright python scrapy scrapy-spider scrapysplash webscraping

Last synced: 20 Dec 2024

https://github.com/TheWebScrapingClub/webscraping-from-0-to-hero

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

playwright python scrapy scrapy-spider scrapysplash webscraping

Last synced: 26 Oct 2024

https://github.com/reworkd/tarsier

Vision utilities for web interaction agents 👀

gpt4v llms ocr playwright pypi-package python selenium webscraping

Last synced: 17 Dec 2024

https://github.com/jamesturk/scrapeghost

👻 Experimental library for scraping websites using OpenAI's GPT API.

gpt openai-api webscraping

Last synced: 19 Dec 2024

https://github.com/requests-cache/requests-cache

Persistent HTTP cache for python requests

cache dynamodb http mongodb performance redis requests sqlite web webscraping

Last synced: 17 Dec 2024

https://github.com/m8sec/CrossLinked

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

enumeration linkedin-scraper osint pentest-scripts pentest-tool python3 username-generator webscraping

Last synced: 06 Nov 2024

https://github.com/m8sec/crosslinked

LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

enumeration linkedin-scraper osint pentest-scripts pentest-tool python3 username-generator webscraping

Last synced: 18 Dec 2024

https://github.com/holgerd77/django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

django python scraper scraping scrapy spider webscraping

Last synced: 20 Dec 2024

https://github.com/maxhumber/gazpacho

🥫 The simple, fast, and modern web scraping library

gazpacho scraping webscraping

Last synced: 19 Dec 2024

https://github.com/skallwar/suckit

Suck the InTernet

hacktoberfest rust webscraping

Last synced: 16 Nov 2024

https://github.com/Skallwar/suckit

Suck the InTernet

hacktoberfest rust webscraping

Last synced: 31 Oct 2024

https://github.com/mov-cli/mov-cli

Watch everything from your terminal.

android cli hacktober ios linux scraping webscraping windows

Last synced: 21 Nov 2024

https://github.com/benibela/xidel

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

cli command-line css-selector curl data-processing datascraping html http httpie json rest scraper web webscraper webscraping wget xml xmlstarlet xpath xquery

Last synced: 18 Dec 2024

https://github.com/chris-greening/instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

beginner-friendly data-mining data-science instagram instagram-data instagram-scraper lightweight python python-scraper python3 webscraping

Last synced: 06 Nov 2024

https://github.com/z0m31en7/uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 21 Dec 2024

https://github.com/wodsuz/easyapplyjobsbot

A Python bot that automatically applies to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto fill additional questions, apply automatically!

ai apply-jobs automated automation bot challenge chatgpt find-jobs glassdoor glassdoor-scraper indeed job jobs linkedin list-jobs python3 selenium webscraping ziprecruiter

Last synced: 21 Dec 2024

https://github.com/adrianhajdin/pricewise

Dive into web scraping and build a Next.js 13 eCommerce price tracker within a single video that teaches you data scraping, cron jobs, sending emails, deployment, and more.

scraping webscraping

Last synced: 21 Dec 2024

https://github.com/z0m31en7/Uscrapper

Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.

darkweb darkweb-crawler information-extraction information-gathering osint osint-python osint-tool python reconnaissance selenium selenium-webscraper tor web-scraping webcra webcrawler webscraping website-scraper websites

Last synced: 13 Nov 2024

https://github.com/jchao01/TradingView-data-scraper

Extract price and indicator data from TradingView charts to create ML datasets

algorithmic-trading data-mining json tradingview webscraping

Last synced: 30 Oct 2024

https://github.com/openaustralia/morph

Take the hassle out of web scraping

civictech docker webscraping

Last synced: 04 Nov 2024

https://github.com/roniemartinez/dude

dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

async beautifulsoup4 crawler css framework lxml parsel playwright python scraper scraping selenium sync web-scraping webscraping xpath

Last synced: 13 Dec 2024

https://github.com/openzim/zimit

Make a ZIM file from any Web site and surf offline!

docker scraper webscraping zim

Last synced: 15 Dec 2024

https://github.com/yusuzech/r-web-scraping-cheat-sheet

Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.

cheatsheet httr r rselenium rvest scrape-websites web-scraping webscraping

Last synced: 09 Nov 2024

https://github.com/wodsuz/EasyApplyJobsBot

A Python bot that automatically applies to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto fill additional questions, apply automatically!

ai apply-jobs automated automation bot challenge chatgpt find-jobs glassdoor glassdoor-scraper indeed job jobs linkedin list-jobs python3 selenium webscraping ziprecruiter

Last synced: 07 Dec 2024

https://github.com/oxylabs/python-web-scraping-tutorial

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.

amazon-scraper-python crawler github-python json-database-python python python-projects python-web-crawler python-web-scraper scraper-python scraping web-crawler-python web-scraping web-scraping-api web-scraping-python webscraping

Last synced: 15 Dec 2024

https://github.com/glaucocustodio/tanakai

Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

chrome-headless crawler kimurai scraper scrapy webscraping

Last synced: 31 Oct 2024

https://github.com/clueless-community/scrape-up

A web-scraping-based python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.

beautifulsoup hacktoberfest hacktoberfest2023 package pip python selenium webscraping

Last synced: 20 Dec 2024

https://github.com/ispras/web-scraper-chrome-extension

Web data extraction tool implemented as chrome extension

javascript scraping scraping-tool webscraping

Last synced: 18 Dec 2024

https://github.com/browserutils/kooky

Go code to read cookies from browser cookie stores.

browser cookies firefox go golang google-chrome safari webscraping

Last synced: 16 Dec 2024

https://github.com/kboghe/NordVPN-switcher

Rotate between different NordVPN servers with ease. Works both on Linux and Windows without any required changes to your code!

nordvpn vpn webscraping

Last synced: 03 Nov 2024

https://github.com/scrapfly/scrapfly-scrapers

Web scrapers for popular targets powered by Scrapfly.io

crawling python webscraping

Last synced: 12 Dec 2024

https://github.com/alexjc/weboptout

Opt-Out tool to check Copyright reservations in a way that even machines can understand.

command-line-tool copyright data-ops ml-pipeline opt-out robots-txt terms-of-service webscraping

Last synced: 18 Dec 2024

https://github.com/brucedone/clock

Visual task scheduling system, streamlined into a single binary file (a web-based visual task scheduler: yes, just one binary solves all the problems!)

dag-scheduling gocron scheduler task taskflow visual web web-scheduler webscraping

Last synced: 18 Dec 2024

https://github.com/decryptr/decryptr

An extensible API for breaking captchas

captcha r rstats tidyverse webscraping

Last synced: 25 Oct 2024

https://github.com/owainlewis/falkor

Open Source web scraping API. Falkor turns web pages into queryable JSON

webscraping webscrapper

Last synced: 01 Nov 2024

https://github.com/driscoll42/ebayMarketAnalyzer

Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel

ebay python scraping-websites webscraping

Last synced: 06 Nov 2024

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 17 Nov 2024

https://github.com/mehmetozkaya/DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. The library is designed like other strong crawler libraries such as WebMagic and Scrapy, but can be extended to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 09 Nov 2024

https://github.com/cornelk/goscrape

Web scraper that can create an offline readable version of a website

go golang scraper webscraping

Last synced: 17 Nov 2024

https://github.com/guilhermecgs/ir

Project for automatically calculating income tax (Imposto de Renda) on Bovespa trades. Tags: canal eletronico do investidor, CEI, selenium, bovespa, IRPF, IR, imposto de renda, finance, yahoo finance, acao, fii, etf, python, crawler, webscraping, calculadora ir

acoes b3 bovespa calculadora-ir canal-eletronico-investidor cei crawler etf fii finance imposto-de-renda irpf webscraping

Last synced: 11 Nov 2024

https://github.com/pwlmaciejewski/imghash

Perceptual image hashing for Node.js

computer-vision image-processing imghash webscraping

Last synced: 15 Dec 2024

https://github.com/aeksco/aws-pdf-textract-pipeline

🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript

aws aws-cdk aws-textract cdk cloudformation data-pipeline dynamodb jest lambda pdf puppeteer s3 serverless sns textract typescript webscraping

Last synced: 19 Dec 2024

https://github.com/dedsecinside/gotor

This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.

cli command-line command-line-tool docker go golang golang-server hacktoberfest http-server information-extraction osint osint-tools rest-api service tor torbot webcrawler webcrawling webscraping

Last synced: 18 Dec 2024

https://github.com/sshh12/llm_osint

LLM OSINT is a proof-of-concept method of using LLMs to gather information from the internet and then perform a task with this information.

gpt-4 llms osint webscraping

Last synced: 21 Dec 2024

https://github.com/feddelegrand7/ralger

ralger makes it easy to scrape a website. Built on the shoulders of titans: rvest, xml2.

dataextraction r rstats webcrawling webscraper-website webscraping

Last synced: 16 Dec 2024

https://github.com/smyja/blackmaria

Python package for webscraping in Natural language

gpt-3 nlp openai python webscraping

Last synced: 29 Nov 2024

https://github.com/urbanadventurer/bing-ip2hosts

bingip2hosts is a Bing.com web scraper that discovers websites by IP address

bing discovery hostnames ipaddress kali kali-linux osint osint-reconnaissance osint-tool reconnaissance scraper search-engine webscraping

Last synced: 12 Nov 2024

https://github.com/Indie-Platforms/scrapecomfort

Desktop AI Data Scraper

ai data webscraping

Last synced: 08 Nov 2024

https://github.com/jtanwk/nytcrossword

An exploration of New York Times crossword answers from 1994-2017, i.e. the Will Shortz era.

crosswords dataviz linguistic-analysis nytimes nytimes-crossword rvest webscraping

Last synced: 20 Nov 2024

https://github.com/pavlovtech/WebReaper

Web scraper, crawler and parser in C#. Designed as a simple, declarative and scalable web scraping solution.

crawler datamining parser parsing scraper scraping scraping-api scraping-data scraping-tool scraping-web scraping-websites webcrawler webscraping

Last synced: 06 Nov 2024

https://github.com/chukhraiartur/seo-keyword-research-tool

Python SEO keywords suggestion tool. Google Autocomplete, People Also Ask and Related Searches.

cli google google-autocomplete google-related-search people-also-ask python seo serpapi webscraping

Last synced: 15 Dec 2024

https://github.com/siongui/instago

Download/access photos, videos, stories, story highlights, postlives, following and followers of Instagram

downloader go golang gopherjs instagram web-scraping webscraping

Last synced: 29 Oct 2024

https://github.com/A-Wheeto/Dashboard

A tkinter GUI collating various data

apis dashboard gui tkinter webscraper webscraping

Last synced: 31 Oct 2024

https://github.com/dimitryzub/scrape-google-scholar-py

Extract data from all Google Scholar pages from a single Python module. NOTE: I'm no longer maintaining this repo. Chrome driver/selectors might need an update.

beautifulsoup4 googlescholar lexbor python-3 requests selectolax serp serp-api serpapi webscraping

Last synced: 14 Dec 2024

https://github.com/zoranpandovski/bookingscraper

🌎 🏨 Scrape Booking.com 🏨 🌎

beautifulsoup booking python3 request scraper web-scraping webscraper webscraping

Last synced: 15 Dec 2024

https://github.com/cecobask/imdb-trakt-sync

Automatic sync from IMDb to Trakt (watchlist, lists, ratings and history) using GitHub actions.

github-actions golang imdb trakt webscraping

Last synced: 08 Nov 2024

https://github.com/giuseppegambino/Scraping-TripAdvisor-with-Python-2020

Python implementation of web scraping of TripAdvisor with Selenium on the new 2019 website

python selenium tripadvisor tripadvisor-scraper tripadvisorreview webscraper webscraper-website webscraping

Last synced: 06 Nov 2024

https://github.com/datawizard1337/ARGUS

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the websites, ARGUS is able to perform tasks like scraping texts or collecting hyperlinks between websites. See: https://link.springer.com/article/10.1007/s11192-020-03726-9

crawling python scraping scrapy scrapyd webcrawling webscraping

Last synced: 27 Oct 2024