https://github.com/codelucas/newspaper

News, full-text, and article metadata extraction in Python 3. Advanced docs:
https://github.com/codelucas/newspaper

crawler crawling news news-aggregator python scraper


Newspaper3k: Article scraping & curation
========================================

.. image:: https://badge.fury.io/py/newspaper3k.svg
    :target: http://badge.fury.io/py/newspaper3k.svg
    :alt: Latest version

.. image:: https://travis-ci.org/codelucas/newspaper.svg
    :target: http://travis-ci.org/codelucas/newspaper/
    :alt: Build status

.. image:: https://coveralls.io/repos/github/codelucas/newspaper/badge.svg?branch=master
    :target: https://coveralls.io/github/codelucas/newspaper
    :alt: Coverage status

Inspired by `requests`_ for its simplicity and powered by `lxml`_ for its speed:

"Newspaper is an amazing python library for extracting & curating articles."
-- `tweeted by`_ Kenneth Reitz, Author of `requests`_

"Newspaper delivers Instapaper style article extraction." -- `The Changelog`_

.. _`tweeted by`: https://twitter.com/kennethreitz/status/419520678862548992
.. _`The Changelog`: http://thechangelog.com/newspaper-delivers-instapaper-style-article-extraction/

**Newspaper is a Python3 library**! Or, view our **deprecated and buggy** `Python2 branch`_

.. _`Python2 branch`: https://github.com/codelucas/newspaper/tree/python-2-head

A Glance:
---------

.. code-block:: pycon

    >>> from newspaper import Article

    >>> url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
    >>> article = Article(url)

.. code-block:: pycon

    >>> article.download()

    >>> article.html
    '