https://github.com/transitive-bullshit/text-summarization
Automagically generates summaries from html or text.
extractive-summarization extractive-text-summarization summarization summarize summary text
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/transitive-bullshit/text-summarization
- Owner: transitive-bullshit
- Created: 2019-11-04T22:38:00.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-02-10T05:18:52.000Z (over 2 years ago)
- Last Synced: 2025-03-28T04:41:56.948Z (6 months ago)
- Topics: extractive-summarization, extractive-text-summarization, summarization, summarize, summary, text
- Language: JavaScript
- Homepage:
- Size: 326 KB
- Stars: 66
- Watchers: 3
- Forks: 20
- Open Issues: 2
Metadata Files:
- Readme: readme.md
# text-summarization
> Automagically generates summaries from html or text.
[NPM](https://www.npmjs.com/package/text-summarization) [Build Status](https://travis-ci.com/transitive-bullshit/text-summarization) [JavaScript Style Guide](https://standardjs.com)
## Intro
This module powers Automagical's text summarization, which was [acquired by Verblio in 2018](https://www.verblio.com/blog/we-bought-a-company).
It provides the most powerful and comprehensive text summarization available on NPM.
## Features
- Uses a variety of metrics to generate quality extractive text summaries
- Handles html or text-based content
- Utilizes html structure as a signal of text importance
- Includes basic abstractive shortening of extracted sentences
- Usable as a node module or cli
- Thoroughly tested and used in production

## Install
This module is usable either as a CLI or as a module.
```bash
npm install --save text-summarization
```

## Usage
```js
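const summarize = require('text-summarization')
const fs = require('fs')

// `await` needs an async context in CommonJS, so wrap the calls in a function
async function main () {
  const html = fs.readFileSync('fixtures/automagical-1.html', 'utf8')
  const summary = await summarize({ html })
  console.log(JSON.stringify(summary, null, 2))
}

main()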
```

which outputs:

```
{
"extractive": [
"Why you should drop everything and try Automagical",
"Video content is significantly more engaging than text content",
"Go from blog post → video in 5 minutes.",
"Our builder is exceptionally easy to use.",
"For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical."
]
}
```

## CLI

```
npm install -g text-summarization
```

This installs a `summarize` binary globally.

```bash
Usage: summarize [options]

Options:
  -V, --version          output the version number
  -n, --num-sentences    number of sentences (defaults to variable length)
  -t, --title            title
  -c, --content-type     sets content type to html or text
  -d, --detailed         print detailed info for top sentences
  -D, --detailedAll      print detailed info for all sentences
  -m, --media            resolve links using iframely and return best matching media
  -P, --no-pretty-print  disable pretty-printing output
  -h, --help             output usage information
```

## Metrics

- tfidf overlap for base relative sentence importance
- html node boosts based on the tag a sentence appears in
- listicle boosts for lists like `2) second item`
- penalty for poor readability or really long sentences

Here's an example of a sentence's internal structure after normalization, processing, and scoring:

```js
{
"index": 8,
"sentence": {
"original": "4. For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical.",
"listItem": 4,
"actual": "For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical.",
"normalized": "for the cost of 1 highly produced video you can get a years worth of videos from automagical",
"tokenized": [
"cost",
"highly",
"produced",
"video",
"years",
"worth",
"videos",
"automagical"
]
},
"liScore": 1,
"nodeScore": 0.7,
"readabilityPenalty": 0,
"tfidfScore": 0.8019447657605553,
"score": 5.601944765760555
}
```

## Iframely

This module optionally supports using [iframely](https://iframely.com) to get social previews for any external links in the source html, adding the resulting images and summary text to the source pool of candidate sentences.
To enable this, set the `IFRAMELY_BASE_URL` and `IFRAMELY_API_KEY` environment variables.
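
As a minimal sketch, the check below shows one way a caller might confirm that both variables are set before relying on link previews. The helper name `hasIframelyConfig` is hypothetical and not part of this module, and treating both variables as required is an assumption rather than documented behavior.

```javascript
// Hypothetical helper (not part of text-summarization): reports whether the
// two environment variables named above are both set to non-empty values.
// Assumption: both are needed for iframely support to be active.
function hasIframelyConfig (env = process.env) {
  return Boolean(env.IFRAMELY_BASE_URL && env.IFRAMELY_API_KEY)
}

if (!hasIframelyConfig()) {
  console.warn('iframely is not configured; link previews will likely be skipped')
}
```

Passing the environment as a parameter keeps the check easy to exercise with a plain object instead of mutating `process.env`.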
## References
- [node-summary](https://github.com/jbrooksuk/node-summary)
- [natural nlp](https://github.com/NaturalNode/natural)
- [retext](https://github.com/wooorm/retext)
- [retext-readability](https://github.com/wooorm/retext-readability)
- [retext-simplify](https://github.com/wooorm/retext-simplify)
- [retext-redundant-acronyms](https://github.com/wooorm/retext-redundant-acronyms)
- [retext-repeated-words](https://github.com/wooorm/retext-repeated-words)

## License
MIT © [Travis Fischer](https://transitivebullsh.it)
Support my OSS work by following me on twitter