{"id":13449324,"url":"https://github.com/neegor/wanish","last_synced_at":"2025-03-22T22:32:42.347Z","repository":{"id":57492519,"uuid":"31414320","full_name":"neegor/wanish","owner":"neegor","description":"Open Source implementation of Summly","archived":false,"fork":false,"pushed_at":"2016-12-11T12:48:47.000Z","size":1944,"stargazers_count":47,"open_issues_count":1,"forks_count":15,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-10T18:46:50.743Z","etag":null,"topics":["parsing","python","readability","summly"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neegor.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-02-27T10:24:02.000Z","updated_at":"2024-08-27T03:26:33.000Z","dependencies_parsed_at":"2022-08-28T11:51:28.614Z","dependency_job_id":null,"html_url":"https://github.com/neegor/wanish","commit_stats":null,"previous_names":["reefeed/wanish"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neegor%2Fwanish","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neegor%2Fwanish/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neegor%2Fwanish/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neegor%2Fwanish/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/neegor","download_url":"https://codeload.github.com/neegor/wanish/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245
029053,"owners_count":20549641,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["parsing","python","readability","summly"],"created_at":"2024-07-31T06:00:35.646Z","updated_at":"2025-03-22T22:32:41.927Z","avatar_url":"https://github.com/neegor.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":".. image:: https://codeclimate.com/github/reefeed/wanish/badges/gpa.svg\n   :target: https://codeclimate.com/github/reefeed/wanish\n   :alt: Code Climate\n\nAbout\n-----\n\nThis package summarizes text by reducing an article to a few\nsentences that retain its main idea.\n\nBesides that, the package extracts the following from the document:\n\n1. Canonical URL of the article\n2. Title of the article\n3. URL of the image characterizing this article\n4. A clean HTML version of the document, stripped of excessive\n   information (headers, footers, navigation, advertisements, etc.)\n   and based on schema.org structured data\n\n`DEMO`_\n\nInstallation\n------------\n\n::\n\n    easy_install wanish\n    or\n    pip install wanish\n\nUsage\n-----\n\n.. 
code:: python\n\n    from wanish import Wanish\n    wanish = Wanish()\n    wanish.perform_url(document_url)\n\n    # getting doc's source canonical url\n    url = wanish.url\n    # getting document's title\n    title = wanish.title\n    # getting url of related image if document has it\n    image_url = wanish.image_url\n    # getting two-letter code of the document's language (en, de, es...)\n    language_code = wanish.language\n    # getting a clean html page of a document with article\n    clean_html = wanish.clean_html\n    # getting a short summarized description of the article reduced to several sentences (5 by default)\n    description = wanish.description\n\nAvailable keyword arguments for the *Wanish()* class (all optional):\n\n.. code:: python\n\n    wanish = Wanish(url=document_url,\n                    positive_keywords=[\"main\", \"story\"],\n                    negative_keywords=[\"banner\", \"adv\", \"similar\", \"top-ad\"],\n                    summary_sentences_qty=5,\n                    headers={'user-agent': 'test-purposes/0.0.1'})\n\n-  **url:** URL of a document, passed to the constructor. If set,\n   *self.perform\\_url(url)* is launched automatically after\n   initialization. Default is None.\n-  **positive\\_keywords:** A list of positive search patterns in classes\n   and ids, for example: *[\"main\", \"story\"]*. Default is None.\n-  **negative\\_keywords:** A list of negative search patterns in classes\n   and ids, for example: *[\"banner\", \"adv\", \"similar\", \"top-ad\"]*.\n   Default is None.\n-  **summary\\_sentences\\_qty:** Maximum number of sentences in the\n   summarized text of the document. Default is 5.\n-  **headers:** Dict of additional custom headers for the GET request\n   that obtains the web page of the article. Default is None.\n\nSpecial Thanks\n--------------\n\n-  https://github.com/nltk/nltk\n-  https://github.com/buriy/python-readability\n-  https://github.com/saffsd/langid.py\n\n.. 
_DEMO: http://reefeed.com","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneegor%2Fwanish","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneegor%2Fwanish","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneegor%2Fwanish/lists"}