https://github.com/saasify-sh/synopsis

# synopsis

> Automagical AI-powered summarization for webpages and articles.

- Uses state-of-the-art AI to extract the core content of any webpage
- Uses a variety of metrics to generate quality extractive text summaries
- Summarizes HTML or text content
- Utilizes HTML structure as a signal of text importance
- Includes basic abstractive shortening of extracted sentences
- Hosted by [Saasify](https://saasify.sh)
- Thoroughly tested and used in production

## Examples

The following examples all use [HTTPie](https://httpie.org/), a more intuitive version of `curl`.
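
HTTPie sends `key=value` arguments as a JSON body by default, so the same request can be sketched from JavaScript. This is a minimal sketch, not part of the project: the helper names `buildSummarizeRequest` and `summarize` are hypothetical, and it assumes a runtime with a global `fetch` (Node 18+ or a browser) and that the endpoint returns a JSON array of sentences like the outputs shown below.

```js
// Endpoint taken from the HTTPie examples in this README.
const SUMMARIZE_ENDPOINT = 'https://ssfy.sh/dev/synopsis/summarize'

// Hypothetical helper: builds the fetch() arguments without touching the network,
// mirroring HTTPie's default JSON encoding of `url=...`.
function buildSummarizeRequest (articleUrl) {
  return {
    endpoint: SUMMARIZE_ENDPOINT,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: articleUrl })
    }
  }
}

// Hypothetical helper: performs the request and parses the JSON response.
// Assumes a global fetch is available.
async function summarize (articleUrl) {
  const { endpoint, options } = buildSummarizeRequest(articleUrl)
  const res = await fetch(endpoint, options)
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`)
  }
  return res.json()
}
```

Keeping the request construction separate from the network call makes it easy to inspect or log exactly what will be sent.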

### The Onion Example 1

*Input:* ([article](https://www.theonion.com/fun-toy-banned-because-of-three-stupid-dead-kids-1819565691))
```bash
http POST \
'https://ssfy.sh/dev/synopsis/summarize' \
'url=https://www.theonion.com/fun-toy-banned-because-of-three-stupid-dead-kids-1819565691'
```

*Output:*
```json
[
"Fun Toy Banned Because Of Three Stupid Dead Kids",
"So now we have to do a full recall and halt production on what was a really awesome toy.",
"But now I'll never see it again, all because three stupid idiots had to go and wreck everything.\"",
"\"She thought the broken shards were candy.",
"That's what you'd assume after breaking a plastic, inedible toy, right?",
"\"I considered this for a while, but then I decided no. No way.",
"If you're 11 years old, you should know that it's impossible to fly.",
"And poor Wizco's probably going to go bankrupt because of this shit."
]
```

### The Onion Example 2

*Input:* ([article](https://local.theonion.com/plan-to-get-laid-at-dragoncon-2001-fails-1819566152))
```bash
http POST \
'https://ssfy.sh/dev/synopsis/summarize' \
'url=https://local.theonion.com/plan-to-get-laid-at-dragoncon-2001-fails-1819566152'
```

*Output:*
```json
[
"Plan To Get Laid At DragonCon 2001 Fails",
"\"I know a lot of girls online, but that's not really the same,\" Melcher said.",
"\"I imagined some girl and I talking about the new Lord Of The Rings movie,\" Melcher said.",
"\"I guess girls aren't into dragons and superheroes as much as guys are,\" Melcher said.",
"\"Andy and I went to this Sailor Moon thing because we knew girls would be there,\" Melcher said.",
"\"Make no mistake—we do not like Sailor Moon.",
"The women, however, were only interested in talking about Sailor Moon.",
"\"This one girl asked me if I wrote fan fiction, and I said yes,\" Melcher said.",
"The following night, Melcher attended a party he had heard about in an online chat room."
]
```

### The Onion Example 3

*Input:* ([article](https://www.theonion.com/everyone-involved-in-pizzas-preparation-delivery-purc-1819564897))
```bash
http POST \
'https://ssfy.sh/dev/synopsis/summarize' \
'url=https://www.theonion.com/everyone-involved-in-pizzas-preparation-delivery-purc-1819564897'
```

*Output:*
```json
[
"Everyone Involved In Pizza's Preparation, Delivery, Buy Extremely High",
"After taking the order, Lindeman relayed it to co-worker and fellow stoner Greg Kanner.",
"At 1 a.m. Monday, the pizza came into material being for the first time.",
"\"After all, it's just pizza, right?",
"Also, Bickell and Wang had forgotten to include their apartment number with the order.",
"Fuck!\" Behr later described the prolonged Blount Street search as \"a serious fucking hassle.\"",
"\"They were seriously bitching me out,\" said Lindeman, who was royally baked at the time.",
"\"I was like, 'Dude, just chill, your pizza will be there any sec.'\"",
"Finally, at 3:10 a.m., more than three hours after the order was placed, the pizza reached its destination."
]
```

### Wait But Why Example

*Input:* ([article](https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html))
```bash
http POST \
'https://ssfy.sh/dev/synopsis/summarize' \
'url=https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html'
```

*Output:*
```json
[
"The AI Revolution: The Road to Superintelligence",
"The Far Future—Coming Soon",
"The Road to Superintelligence",
"What Is AI?",
"Where We Are Now—A World Running on ANI",
"The Road From ANI to AGI",
"Plagiarize the brain.",
"Try to make evolution do what it did before but for us this time.",
"Make this whole thing the computer’s problem, not ours.",
"All This Could Happen Soon"
]
```

## Metrics

Replace `/summarize` with `/detail` to see the full metrics for how the input was processed. These give you a better understanding of why the algorithm deemed certain sentences more important.

- tf-idf overlap for base relative sentence importance
- HTML node boosts for prominent tags such as headings and emphasized text
- listicle boosts for list items like `2) second item`
- penalty for poor readability or really long sentences
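
To make the first metric more concrete, here is a rough sketch of one way tf-idf weighting can produce a relative importance score per sentence. It is an assumption for illustration, not the project's actual implementation: the function name `tfidfSentenceScores` is hypothetical, each sentence is treated as a "document" for the IDF term, and scores are scaled so the strongest sentence gets 1.

```js
// Hypothetical sketch: scores each tokenized sentence by the summed tf-idf
// weight of its distinct terms, then scales the scores into the range [0, 1].
function tfidfSentenceScores (tokenizedSentences) {
  const n = tokenizedSentences.length
  const termCount = new Map() // occurrences of each term across the whole text
  const sentenceCount = new Map() // number of sentences containing each term

  for (const tokens of tokenizedSentences) {
    for (const term of tokens) {
      termCount.set(term, (termCount.get(term) || 0) + 1)
    }
    for (const term of new Set(tokens)) {
      sentenceCount.set(term, (sentenceCount.get(term) || 0) + 1)
    }
  }

  const raw = tokenizedSentences.map((tokens) => {
    let sum = 0
    for (const term of new Set(tokens)) {
      const tf = termCount.get(term)
      // Smoothed IDF so terms present in every sentence still contribute a little.
      const idf = Math.log(1 + n / sentenceCount.get(term))
      sum += tf * idf
    }
    return sum
  })

  // Scale relative to the best sentence; guard against empty or all-zero input.
  const max = Math.max(0, ...raw)
  return raw.map((value) => (max > 0 ? value / max : 0))
}
```

A sentence with no remaining tokens scores 0 under this scheme, and the other metrics in the list would then be applied on top of this base score.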

Here's an example of a sentence's internal structure after normalization, processing, and scoring:

```js
{
"index": 8,
"sentence": {
"original": "4. For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical.",
"listItem": 4,
"actual": "For the cost of 1 highly produced video, you can get a year's worth of videos from Automagical.",
"normalized": "for the cost of 1 highly produced video you can get a years worth of videos from automagical",
"tokenized": [
"cost",
"highly",
"produced",
"video",
"years",
"worth",
"videos",
"automagical"
]
},
"liScore": 1,
"nodeScore": 0.7,
"readabilityPenalty": 0,
"tfidfScore": 0.8019447657605553,
"score": 5.601944765760555
}
```
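
The `sentence` fields above suggest a pipeline of list-marker detection, normalization, and tokenization. The following is a minimal sketch that reproduces those fields for this one example. The function name `parseSentence`, the regular expressions, and the tiny stopword list are all assumptions chosen to match the sample, not the project's internals.

```js
// Hypothetical stopword list, just large enough to cover the example sentence.
const STOPWORDS = new Set(['for', 'the', 'of', 'you', 'can', 'get', 'a', 'from'])

// Hypothetical helper: splits off a leading list marker, normalizes the text,
// and tokenizes it into content words.
function parseSentence (original) {
  // Detect a leading list marker such as "4. " or "2) ".
  const match = original.match(/^\s*(\d+)[.)]\s+(.*)$/)
  const listItem = match ? parseInt(match[1], 10) : undefined
  const actual = match ? match[2] : original.trim()

  // Lowercase, drop everything except ASCII letters, digits, and whitespace,
  // then collapse runs of whitespace.
  const normalized = actual
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim()

  // Keep content words only: no stopwords and no purely numeric tokens.
  const tokenized = normalized
    .split(' ')
    .filter((word) => word && !STOPWORDS.has(word) && !/^\d+$/.test(word))

  return { original, listItem, actual, normalized, tokenized }
}
```

A real stopword list would be much larger, and non-ASCII text would need a more careful normalization step than the character filter used here.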

Support my OSS work by following me on Twitter.