{"id":13454742,"url":"https://github.com/Tjatse/node-readability","last_synced_at":"2025-03-24T06:31:20.241Z","repository":{"id":16871298,"uuid":"19631679","full_name":"Tjatse/node-readability","owner":"Tjatse","description":"Scrape/Crawl article from any site automatically. Make any web page readable, no matter Chinese or English.","archived":false,"fork":false,"pushed_at":"2018-08-01T06:37:53.000Z","size":587,"stargazers_count":343,"open_issues_count":7,"forks_count":36,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-03-20T04:34:54.927Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Tjatse.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-05-10T02:42:52.000Z","updated_at":"2024-11-25T02:24:43.000Z","dependencies_parsed_at":"2022-08-04T12:00:11.612Z","dependency_job_id":null,"html_url":"https://github.com/Tjatse/node-readability","commit_stats":null,"previous_names":["tjatse/read-art"],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tjatse%2Fnode-readability","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tjatse%2Fnode-readability/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tjatse%2Fnode-readability/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tjatse%2Fnode-readability/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Tjatse","download_url":"https://codeload.github.com/Tjatse/node-readability/t
ar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245222523,"owners_count":20580178,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T08:00:57.535Z","updated_at":"2025-03-24T06:31:19.866Z","avatar_url":"https://github.com/Tjatse.png","language":"JavaScript","readme":"read-art [![NPM version](https://badge.fury.io/js/read-art.svg)](http://badge.fury.io/js/read-art) [![Build Status](https://travis-ci.org/Tjatse/node-readability.svg?branch=master)](https://travis-ci.org/Tjatse/node-readability) [![js-standard-style](https://img.shields.io/badge/code%20style-standard-brightgreen.svg)](http://standardjs.com/)\n=========\n[![NPM](https://nodei.co/npm/read-art.png?downloads=true\u0026downloadRank=true\u0026stars=true)](https://nodei.co/npm/read-art/)\n\n1. Readability reference to Arc90's.\n2. Scrape article from any page (automatically).\n3. 
Make any web page readable, whether Chinese or English.\n\n\u003e *Quickly extracts article titles and content from web pages; well suited to node.js crawlers and to feeding ElasticSearch.*\n\n## Guide\n\n- [Features](https://github.com/Tjatse/node-readability/wiki/Handbook#features)\n- [Performance](https://github.com/Tjatse/node-readability/wiki/Handbook#perfs)\n- [Installation](https://github.com/Tjatse/node-readability/wiki/Handbook#ins)\n- [Usage](https://github.com/Tjatse/node-readability/wiki/Handbook#usage)\n- [Debug](https://github.com/Tjatse/node-readability/wiki/Handbook#debug)\n- [Score Rule](https://github.com/Tjatse/node-readability/wiki/Handbook#score_rule)\n- [Extract Selectors](https://github.com/Tjatse/node-readability/wiki/Handbook#selectors)\n- [Image Fallback](https://github.com/Tjatse/node-readability/wiki/Handbook#imgfallback)\n- [Threshold](https://github.com/Tjatse/node-readability/wiki/Handbook#threshold)\n- [Customize Settings](https://github.com/Tjatse/node-readability/wiki/Handbook#cus_sets)\n- [Output](https://github.com/Tjatse/node-readability/wiki/Handbook#output)\n- [Notes](https://github.com/Tjatse/node-readability/wiki/Handbook#notes)\n\n## How it works\n\nIn my case, the [spider](https://github.com/Tjatse/spider2) processes about **1500k documents per day**, with a maximum crawling speed of **1.2k /minute** and an **average of 1k /minute**. Memory cost is about **200 MB** per spider kernel, and accuracy is about 90%; the remaining 10% can be fixed by customizing [Score Rules](https://github.com/Tjatse/node-readability/wiki/Handbook#score_rule) or [Selectors](https://github.com/Tjatse/node-readability/wiki/Handbook#selectors). 
It's better than any other readability module.\n\u003e (4) Server info:\n\u003e * 20M fibre-optic bandwidth\n\u003e * 8 × Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz\n\u003e * 32 GB memory\n","funding_links":[],"categories":["Packages","包","Humanize","目录","Number"],"sub_categories":["Humanize","人性化"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTjatse%2Fnode-readability","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTjatse%2Fnode-readability","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTjatse%2Fnode-readability/lists"}