Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with beautifulsoup

A curated list of projects in awesome lists tagged with beautifulsoup.

https://github.com/apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

apify automation beautifulsoup crawler crawling hacktoberfest headless headless-chrome pip playwright python scraper scraping web-crawler web-crawling web-scraping

Last synced: 30 Dec 2024

https://github.com/MechanicalSoup/MechanicalSoup

A Python library for automating interaction with websites.

beautifulsoup mechanicalsoup pypi python python-library requests web

Last synced: 25 Oct 2024

https://github.com/hickford/MechanicalSoup

A Python library for automating interaction with websites.

beautifulsoup mechanicalsoup pypi python python-library requests web

Last synced: 22 Nov 2024

https://github.com/mechanicalsoup/mechanicalsoup

A Python library for automating interaction with websites.

beautifulsoup mechanicalsoup pypi python python-library requests web

Last synced: 30 Dec 2024

https://github.com/ashvardanian/stringzilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖

beautifulsoup common-crawl csv dataset html information-retrieval json laion ndjson parser pattern-recognition simd sorting-algorithms string string-manipulation string-matching string-parsing string-search substring

Last synced: 02 Jan 2025

https://github.com/anaskhan96/soup

Web Scraper in Go, similar to BeautifulSoup

beautifulsoup go golang html-node web-scraper webscraper webscraping

Last synced: 02 Jan 2025

https://github.com/paulmcinnis/jobfunnel

Scrape job websites into a single spreadsheet with no duplicates.

automated beautifulsoup beautifulsoup4 csv glassdoor indeed international job jobs monster python scraper search tfidf waterloo yaml

Last synced: 31 Dec 2024

https://github.com/ashvardanian/StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖

beautifulsoup common-crawl csv dataset html information-retrieval json laion ndjson parser pattern-recognition simd sorting-algorithms string string-manipulation string-matching string-parsing string-search substring

Last synced: 28 Oct 2024

https://github.com/PaulMcInnis/JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

automated beautifulsoup beautifulsoup4 csv glassdoor indeed international job jobs monster python scraper search tfidf waterloo yaml

Last synced: 27 Oct 2024

https://github.com/chishui/jssoup

JavaScript + BeautifulSoup = JSSoup

beautifulsoup crawler html javascript nodejs parser react-native spider

Last synced: 30 Dec 2024

https://github.com/walissonsilva/web-scraping-python

🌐 Repository with the content (slides, examples, code) from the YouTube video series on Web Scraping with Python.

beautifulsoup python requests selenium web-scraping

Last synced: 31 Oct 2024

https://github.com/sdaqo/anipy-cli

A small Python tool for watching and downloading anime from the terminal (the better way to watch anime). Also usable as an API.

anime anime-scraper beautifulsoup cli gogoanime gplv3 python python3 requests-library-python scraper watch

Last synced: 31 Oct 2024

https://github.com/clueless-community/scrape-up

A web-scraping-based python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.

beautifulsoup hacktoberfest hacktoberfest2023 package pip python selenium webscraping

Last synced: 03 Jan 2025

https://github.com/jimywork/djangohunter

Tool designed to help identify incorrectly configured Django applications that are exposing sensitive information.

beautifulsoup bs4 django hacking python python3 shodan tool

Last synced: 27 Dec 2024

https://github.com/facelessuser/soupsieve

A modern CSS selector implementation for BeautifulSoup

beautifulsoup css css-selector css4 html html5 python soup-sieve xml

Last synced: 02 Jan 2025

https://github.com/alisonmitchell/stock-prediction

Technical and sentiment analysis to predict the stock market with machine learning models based on historical time series data and news article sentiment collected using APIs and web scraping.

beautifulsoup bert gensim huggingface keras-tensorflow machine-learning matplotlib mplfinance nlp nltk numpy pandas plotly python scikit-learn scipy seaborn spacy textblob yfinance

Last synced: 19 Dec 2024

https://github.com/nedlir/languagepod101-scraper

Python scraper for Language Pods such as Japanesepod101.com 👹 🗾 🍣 Compatible with Japanese, Chinese, French, German, Italian, Korean, Portuguese, Russian, Spanish and many more! ✨

beautifulsoup chinese-language course download japanese japanese-language japanesepod japanesepod101 jpod101 language language-learning learn learn-chinese learn-japanese learn-spanish podcast python requests scraping spanish-language

Last synced: 05 Nov 2024

https://github.com/danclaudiupop/robox

Simple library for exploring/scraping the web or testing a website you’re developing

beautifulsoup httpx python scraping

Last synced: 09 Dec 2024

https://github.com/rohithasrk/GSoC-Organisation-Scraper

Scrape GSoC organisations using a single script.

beautifulsoup gsoc organisation python requests terminal

Last synced: 26 Oct 2024

https://github.com/trainingbypackt/data-wrangling-with-python

Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices

analytics beautifulsoup data-analytics data-munging data-science data-wrangling database numpy pandas python regular-expression web-scraping

Last synced: 01 Jan 2025

https://github.com/dangsh/hive

Lots of spiders (many web crawlers)

beautifulsoup python3 scrapy selenium-webdriver spider

Last synced: 11 Oct 2024

https://github.com/codingforentrepreneurs/Web-Scraping

Learn how to leverage Python's amazing tools to scrape data from other websites. The end goal of this course is to scrape blogs to analyze trending keywords and phrases. We'll be using Python 3.6, Requests, BeautifulSoup, Asyncio, Pandas, Numpy, and more!

aysncio beautifulsoup beautifulsoup4 joincfe numpy pandas python python-requests python3 requests scraper sraping tutorial web-scraping

Last synced: 22 Nov 2024

https://github.com/Udzu/pudzu

Various python scripts, mostly geared towards dataviz.

beautifulsoup pillow python visualization

Last synced: 27 Nov 2024

https://github.com/zoranpandovski/bookingscraper

🌎 🏨 Scrape Booking.com 🏨 🌎

beautifulsoup booking python3 request scraper web-scraping webscraper webscraping

Last synced: 29 Dec 2024

https://github.com/loks0n/supreme-drop-bot

A Supreme web bot, written in Python, that grabs a list of specified products and checks out before they sell out!

autocomplete beautifulsoup python-3-6 selenium splinter supreme supreme-bot supreme-product supremewebstore tkinter

Last synced: 22 Dec 2024

https://github.com/leerob/facebook-data-analyzer

📊Python script to analyze the contents of your Facebook data export

beautifulsoup data-analysis facebook python

Last synced: 10 Dec 2024

https://github.com/Sparsh1212/gsocanalyzer

A blazingly fast tool to analyze all the selected organizations in Google Summer of Code in the form of graphical analytics.

analytics beautifulsoup google-summer-of-code gsoc gsoc-analyser javascript organisation python reactjs scraping

Last synced: 01 Nov 2024

https://github.com/michalczaplinski/pitchfork

🎶 Unofficial Python API for pitchfork.com reviews.

beautifulsoup pitchfork python requests scraper

Last synced: 19 Dec 2024

https://github.com/kangvcar/awsomespider

A collection of small Python crawler projects (job listings / movie info / stock info / weather info / Tieba posts / images / videos..)

beautifulsoup lxml pymysql pyspider python scrapy selenium spider urllib2

Last synced: 18 Nov 2024

https://github.com/boringppl/linkedin-profiles-scraping

Automatically scrape the web data of people's profiles on LinkedIn based on a specific search query

beautifulsoup beautifulsoup4 python python3 selenium selenium-webdriver webscraper webscraping webscraping-data webscrapper webscrapping

Last synced: 03 Dec 2024

https://github.com/Flybell/web_to_obsidian

A Python 3 script that scrapes an html/xml page to extract text, then creates markdown files for Obsidian & the dataview plugin

beautifulsoup dataview markdown obsidian python3 webscraping

Last synced: 04 Dec 2024

https://github.com/ChemLez/xmcTiaoJiInformation_Pachong

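Web crawler for postgraduate admission transfer (tiaoji) information. It mainly scrapes transfer listings from the Xiaomuchong website and can fetch listings for any year and any major. Scraped fields include: title, school name, major, number of openings, posting time, and the link to the school's transfer announcement page. Main libraries used: BeautifulSoup, requests, re.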

beautifulsoup kao-yan pa-chong re requests tiao-ji tiao-ji-xin-xi xiao-mu-chong

Last synced: 30 Oct 2024

https://github.com/hackersandslackers/beautifulsoup-tutorial

✨ 🍜 Scrape webpage metadata using BeautifulSoup.

beautifulsoup beautifulsoup4 python scraping scraping-websites scripting tutorial

Last synced: 03 Jan 2025

https://github.com/rmccorm4/pokefetch

🕹️ Command-line tool similar to neofetch for looking up pokemon in the terminal.

beautifulsoup catimg developer-tools development-environment neofetch pokemon screenfetch shiny sprites

Last synced: 29 Oct 2024

https://github.com/griffintaur/news-at-command-line

📰 News at the command line

beautifulsoup python-2 requests yaml

Last synced: 22 Dec 2024

https://github.com/rmccorm4/Pokefetch

🕹️ Command-line tool similar to neofetch for looking up pokemon in the terminal.

beautifulsoup catimg developer-tools development-environment neofetch pokemon screenfetch shiny sprites

Last synced: 15 Nov 2024

https://github.com/wazzabeee/copy-spotter

Make plagiarism detection easier. This script will find similar sentences between given files and highlight them in a side by side comparison.

beautifulsoup bs4 docx odt pdf plagiarism plagiarism-check plagiarism-checker plagiarism-detection plagiarism-detector python side-by-sidediff similarity similarity-detection similarity-score txt

Last synced: 16 Dec 2024

https://github.com/Woahai321/list-sync

ListSync automates the import of your IMDB & Trakt lists into Overseerr & Jellyseerr, simplifying your movie management.

beautifulsoup beautifulsoup4 docker imdb imdb-webscrapping jellyfin jellyseerr overseerr plex plex-media-server python radarr sonarr trakt trakt-tv

Last synced: 26 Oct 2024

https://github.com/sr1jan/videoautoproduction

A simple program to automate the production of videos for a news channel on youtube.

automation bash beautifulsoup ffmpeg gcp moviepy python texttospeech-api video youtube

Last synced: 11 Nov 2024

https://github.com/maicius/universityrecruitment-ssurvey

Using serious data to answer the question: "What kinds of companies recruit at what kinds of universities?"

analysis beautifulsoup crawler data redis university

Last synced: 11 Nov 2024

https://github.com/adregner/beautifulscraper

Python web-scraping library that wraps urllib2 and BeautifulSoup

beautifulsoup cookie python python2 python3 urllib2

Last synced: 13 Oct 2024

https://github.com/florents-tselai/greek-wines-analysis

Scraper, Data and Analysis for "Analyzing 1000+ Greek Wines with Python"

beautifulsoup data-science pandas python seaborn web-scraping

Last synced: 31 Oct 2024

https://github.com/galarzaa90/tibia.py

API to parse tibia.com content into python objects.

beautifulsoup crawling-python python python3 tibia webcrawling

Last synced: 30 Dec 2024

https://github.com/jameszbl/zhilian_spider

A crawler that searches Zhilian Zhaopin job listings by keyword

beautifulsoup python python3 requests scrapy spider zhilian

Last synced: 28 Oct 2024

https://github.com/amrrs/scraper-projects

🕸 List of mini projects that involve web scraping 🕸

beautifulsoup scraper scraping

Last synced: 15 Nov 2024

https://github.com/cw1997/tieba-birthday-spider

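Baidu Tieba birthday crawler: scrapes the birthdays of members of a Tieba forum and automatically sends greetings on the corresponding date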

beautifulsoup birthday config mongodb post pymongo python queue requests spider threading tieba

Last synced: 28 Nov 2024

https://github.com/rsharifnasab/create_word_cloud

Create word clouds with wordcloud-fa for Twitter and Telegram chats

beautifulsoup python python3 telegram twint twitter worldcould

Last synced: 14 Nov 2024

https://github.com/phantominsights/reddit-bots

A collection of Reddit bots that I use to enhance the subreddits I manage.

beautifulsoup praw python3 reddit-bot requests rss web-scraper

Last synced: 11 Nov 2024

https://github.com/montanaz0r/mma-parser-for-sherdog-and-ufc-data

Python web scraper for Sherdog & UFC data. Creates output of your choice in csv or json format.

beautifulsoup data-science mma python ufc webscraping

Last synced: 14 Dec 2024

https://github.com/dheavy/househunterbot

Use Python, Google Spreadsheet, Google Shortener and CALLR API to automate your apartment search in Paris.

api beautifulsoup bot callr google-api google-spreadsheet google-spreadsheets python scraping

Last synced: 28 Oct 2024

https://github.com/PhantomInsights/tweet-transcriber

A Reddit bot that transcribes tweets from comments and submissions links, mirrors their images and replies back with a formatted Markdown message.

beautifulsoup imgur praw python3 reddit-bot web-scraper

Last synced: 12 Nov 2024

https://github.com/phantominsights/tweet-transcriber

A Reddit bot that transcribes tweets from comments and submissions links, mirrors their images and replies back with a formatted Markdown message.

beautifulsoup imgur praw python3 reddit-bot web-scraper

Last synced: 11 Nov 2024

https://github.com/sachaarbonel/beautifulsoup.dart

A Dart port of the famous Python library BeautifulSoup

beautifulsoup dart

Last synced: 20 Dec 2024

https://github.com/manasvigoyal/gmail-classification

Extract Emails from Gmail account, convert to Excel file and classify using various classification algorithms.

beautifulsoup classification email-classification excel gmail jupyter-notebooks machine-learning matplotlib numpy pandas python scikit-learn seaborn

Last synced: 11 Oct 2024

https://github.com/lironmiz/python_mini_projects

My Python mini projects, built as part of the complete Python Pro Bootcamp for 2023 - 100 Days of Code course

100-days-of-code api beautifulsoup bootstrap5 css3 flask html5 mathplotlib numpy pandas pycharm python3 pythongui requests sympy-library threading tkinter turtle udemy-course webscraping

Last synced: 27 Oct 2024

https://github.com/bbergerud/merlin

Bird Identification Quiz + Webscraping

beautifulsoup bird-recognition birds ornithology vimeo webscapping

Last synced: 12 Nov 2024

https://github.com/sawyerclick/scrapers

Scrapin' some data, man

actions beautifulsoup python scraping-python

Last synced: 15 Dec 2024

https://github.com/gamemann/how-to-use-selenium-and-beautifulsoup

A full lab and how-to guide on how to use Selenium paired with Beautiful Soup to parse and extract data from a website using Python.

beautifulsoup beautifulsoup4 bs4 firefox geckodriver node nodejs python react selenium selenium-python selenium-webdriver webscraper webscraping

Last synced: 27 Oct 2024

https://github.com/georgiydemo/instagramtotelegram

Post photos and videos from Instagram to a Telegram channel by tag

beautifulsoup grabber grabbing-content grabble instagram python3 telegram telegram-bot telegram-channel

Last synced: 05 Dec 2024

https://github.com/hackersandslackers/jsonld-scraper-tutorial

🌎 🖥 Supercharge your scraper to extract quality page metadata by parsing JSON-LD data via Python's extruct library.

beautifulsoup extruct json-ld python scraper structured-data tutorial

Last synced: 09 Nov 2024

https://github.com/chaitjo/music-library-ocd-fixer

Automatically fetch metadata for your music collection and rename files accordingly

beautifulsoup eyed3 metadata music python

Last synced: 25 Oct 2024

https://github.com/alexandrevl/supersummarizeai

Unleash the power of AI with SuperSummarizeAI! Effortlessly extract, condense, and clip content from webpages and YouTube videos using ChatGPT. Turning endless streams of content into digestible summaries.

beautifulsoup chatgpt content-analysis multilingual nlp openai papperclip text text-processing text-summarization web-scraping youtube

Last synced: 09 Nov 2024

https://github.com/ptyadana/web-scraping-and-api-in-python

Web Scraping and API in Python using beautifulsoup, requests, requests-xml, etc. for processing multiple APIs and scraping multiple sites such as YouTube, SoundCloud and many more.

365datascience api beautifulsoup exchangeratesapi-io github-api itunes-api jokes-api jupyter-notebook juypter lxml python3 requests requests-html soundcloud steam urllib webscraping youtube

Last synced: 15 Nov 2024

https://github.com/gamemann/selenium-and-beautifulsoup-lab

A full lab and guide on how to use Selenium paired with Beautiful Soup to parse and extract data from a website using Python.

beautifulsoup beautifulsoup4 bs4 firefox geckodriver node nodejs python react selenium selenium-python selenium-webdriver webscraper webscraping

Last synced: 10 Oct 2024