An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with article-extractor

A curated list of projects in awesome lists tagged with article-extractor.

https://github.com/extractus/article-extractor

Extract the main article from a given URL with Node.js

article article-extractor article-parser crawler extract nodejs readability scraper

Last synced: 27 Apr 2025

https://github.com/scotteh/php-goose

Readability / HTML Content / Article Extractor & Web Scraping library written in PHP

article article-extractor autoloader composer php php-goose readability scraper

Last synced: 05 Oct 2025

https://github.com/hipstermojo/paperoni

An article extractor in Rust

article-extractor readability rust

Last synced: 07 Apr 2025

https://github.com/fterh/sneakpeek

Reddit bot to preview and post hyperlinks as comments

article-extractor news-articles preview reddit reddit-bot

Last synced: 08 Jul 2025

https://github.com/inaridiy/webforai

The best HTML to Markdown library: ESM-native, with useful utilities, and simple, lightweight, and high quality.

article-extractor extractor html-to-markdown html2markdown html2md html2text readability scraping text-mining

Last synced: 06 Apr 2025

https://github.com/johnbumgarner/newshound

This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.

article-extracting article-extractor data-extraction data-mining data-science datascience news news-aggregator news-crawler newspaper-crawler python-newspaper python3 text-mining web-scraping webscraping

Last synced: 14 Jan 2026

https://github.com/metalwarrior665/actor-article-extractor-smart

Combines Apify's crawling system and article parsing with the unfluff library.

actor apify article-extractor scraper web-scraper

Last synced: 03 Sep 2025

https://github.com/bharathvaj-ganesan/artixtractor

Extract articles and blog posts from websites such as medium.com, inc42.com, etc.

article-extractor hacktoberfest nodejs

Last synced: 19 Jun 2025

https://github.com/gadzan/generatoc

Automatically generate a table of contents from the headings of an HTML document

article-extractor html-document ssr toc typescript

Last synced: 23 May 2026

https://github.com/mccallofthewild/alexandrias-revenge

🔥The bold new archive that can’t be burned, bulldozed or battering-rammed #PoweredByArweave

archive article-extractor arweave blockchain webarchive

Last synced: 21 Apr 2025

https://github.com/robinmillford/cortex-ai-multi-model-insights-hub

Cortex AI: Multi-Model Insights Hub is an advanced platform that leverages cutting-edge AI to empower your research, analysis, and data exploration. By integrating multiple Large Language Models (LLMs) with a sophisticated Retrieve-and-Generate (RAG) system

article-extractor chatbot data-analysis data-visualization deepseek-chat deepseek-r1 llama3 llm pdf-document-processor rag streamlit-webapp summarizer vector-database

Last synced: 28 Oct 2025

https://github.com/hemantwasthere/ai-sumz

Simplify your reading with Summarizer, an open-source article summarizer that transforms lengthy articles into clear and concise summaries

article-extractor rapidapi react redux-toolkit tailwindcss vite

Last synced: 12 Apr 2026

https://github.com/sters/extract-content

ExtractContent for PHP 7. A tool for extracting web articles.

article-extractor extract-content php

Last synced: 16 Jan 2026

https://github.com/jujulis18/smartdata_tracker

Scraping tool designed to cleanly extract the content of online articles (blogs, press, publications). It automates the collection of textual data, cleans the content (removing tags, ads, etc.), and supports structured export for later analysis (NLP, summarization, monitoring, etc.).

agent article-extractor google-cloud-platform mistral-api ner python scraping streamlit veille

Last synced: 31 Jan 2026

https://github.com/sahilg28/artisumm

Artisumm is a tool that delivers concise and accurate article summaries for quick information digestion.

article-extractor article-summarizer article-summary fronted-development javascript rapidapi reactjs tailwindcss webdevelopment

Last synced: 15 Apr 2026

https://github.com/RobinMillford/Cortex-AI-Multi-Model-Insights-Hub

This project creates a Retrieve-and-Generate (RAG) powered chatbot for summarizing and interacting with articles. The system processes articles provided as PDFs or URLs, extracts text, splits the content into chunks, generates embeddings, and stores them in a vector database

article-extractor chatbot llama3 llm pdf-document-processor rag streamlit summarizer vector-database

Last synced: 11 Oct 2025

https://github.com/parthapray/pdf_text_extraction_json_section_subsection

This repo contains code for extracting PDF text to JSON, capturing the section number, section title, section body content, and footnotes

article-extractor document extraction json pdf pymupdf-fitz regex text

Last synced: 10 May 2026