Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/codelucas/newspaper
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
- Host: GitHub
- URL: https://github.com/codelucas/newspaper
- Owner: codelucas
- License: mit
- Created: 2013-11-25T09:50:50.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2024-04-03T05:54:17.000Z (9 months ago)
- Last Synced: 2024-05-15T13:32:40.346Z (8 months ago)
- Topics: crawler, crawling, news, news-aggregator, python, scraper
- Language: Python
- Homepage: https://goo.gl/VX41yK
- Size: 17.5 MB
- Stars: 13,780
- Watchers: 386
- Forks: 2,093
- Open Issues: 521
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-projects - newspaper - News, full-text, and article metadata extraction in Python 3 (Python)
- awesome-data-science-viz - Newspaper - text, and article metadata extraction in Python 3. (Data / Aggregators)
- my-awesome-github-stars - codelucas/newspaper - newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs: (Python)
- awesome-python-resources - GitHub - 59% open · ⏱️ 02.09.2020): (Network)
- awesome-rainmana - codelucas/newspaper - newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs: (Python)
- awesome-starts - codelucas/newspaper - News, full-text, and article metadata extraction in Python 3. Advanced docs: (Python)
- starred-awesome - newspaper - News, full-text, and article metadata extraction in Python 3. Advanced docs: (Python)
README
Newspaper3k: Article scraping & curation
========================================

.. image:: https://badge.fury.io/py/newspaper3k.svg
    :target: http://badge.fury.io/py/newspaper3k.svg
    :alt: Latest version

.. image:: https://travis-ci.org/codelucas/newspaper.svg
    :target: http://travis-ci.org/codelucas/newspaper/
    :alt: Build status

.. image:: https://coveralls.io/repos/github/codelucas/newspaper/badge.svg?branch=master
    :target: https://coveralls.io/github/codelucas/newspaper
    :alt: Coverage status

Inspired by `requests`_ for its simplicity and powered by `lxml`_ for its speed:

    "Newspaper is an amazing python library for extracting & curating articles."
    -- `tweeted by`_ Kenneth Reitz, Author of `requests`_

    "Newspaper delivers Instapaper style article extraction." -- `The Changelog`_

.. _`tweeted by`: https://twitter.com/kennethreitz/status/419520678862548992
.. _`The Changelog`: http://thechangelog.com/newspaper-delivers-instapaper-style-article-extraction/

**Newspaper is a Python3 library**! Or, view our **deprecated and buggy** `Python2 branch`_

.. _`Python2 branch`: https://github.com/codelucas/newspaper/tree/python-2-head

A Glance:
---------

.. code-block:: pycon

    >>> from newspaper import Article

    >>> url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
    >>> article = Article(url)

.. code-block:: pycon

    >>> article.download()

    >>> article.html
    '