https://github.com/jparkerweb/sentence-parse
📄 parse sentences from input text
- Host: GitHub
- URL: https://github.com/jparkerweb/sentence-parse
- Owner: jparkerweb
- Created: 2024-12-19T04:46:08.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-01-30T18:30:42.000Z (8 months ago)
- Last Synced: 2025-01-30T18:38:25.895Z (8 months ago)
- Topics: parse, segment, sentence, split, text
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/sentence-parse
- Size: 143 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 📄 Sentence Parse
A simple utility to parse text into sentences.
## Installation
```bash
npm install sentence-parse
```
## Usage
The parser can be used to split text into sentences with various options. Here's a basic example:
```javascript
import { readFile } from 'fs/promises';
import { join } from 'path';
import { parseSentences } from 'sentence-parse';

// Parse from a string
const text = "Hello world! This is a test.";
const sentences = await parseSentences(text);
console.log(sentences);
// Output: ["Hello world!", "This is a test."]

// Parse from a file
const fileText = await readFile(join(process.cwd(), 'text-file.txt'), 'utf8');
const fileSentences = await parseSentences(fileText);
console.log(fileSentences);
```
### Options
- **observeMultipleLineBreaks**: Treats two or more consecutive line breaks as a sentence boundary. Default is `false`.
- **removeStartLineSequences**: Removes specified sequences at the start of each line. Default is an empty array `[]`.
- **preserveHTMLBreaks**: Preserves HTML `<br>` and `<p>` tags as line breaks in the text. Default is `true`.
- **preserveListItems**: Preserves list items by adding a prefix to each `<li>` element.
- **listItemPrefix**: Specifies the prefix to use for list items when `preserveListItems` is `true`. Default is `'- '`.
- **excludeNonLetterSentences**: Excludes segments that contain no letters (only numbers, symbols, etc). Default is `false`.
## Examples
### Using observeMultipleLineBreaks
```javascript
import { parseSentences } from 'sentence-parse';
const text = "Hello world!\n\nThis is a test.";
const sentences = await parseSentences(text, { observeMultipleLineBreaks: true });
console.log(sentences);
// Output: ["Hello world!", "This is a test."]
```
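
To make the idea behind `observeMultipleLineBreaks` concrete, here is a minimal sketch of splitting text wherever two or more line breaks occur in a row. The helper `splitOnBlankLines` is hypothetical and is not part of `sentence-parse`; the library's actual implementation may differ.

```javascript
// Hypothetical helper (not part of sentence-parse): split text into blocks
// wherever two or more consecutive line breaks occur.
function splitOnBlankLines(text) {
  return text
    .split(/(?:\r?\n){2,}/) // two or more line breaks in a row (LF or CRLF)
    .map((block) => block.trim())
    .filter((block) => block.length > 0); // drop empty leading/trailing blocks
}

console.log(splitOnBlankLines("Hello world!\n\nThis is a test."));
// → ["Hello world!", "This is a test."]
```

A split like this matters most for text without closing punctuation, such as a heading followed by a blank line, where no period or exclamation mark marks the boundary.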
### Using removeStartLineSequences
```javascript
import { parseSentences } from 'sentence-parse';
const text = "> Hello world!\n> This is a test.";
const sentences = await parseSentences(text, { removeStartLineSequences: ['>'] });
console.log(sentences);
// Output: ["Hello world!", "This is a test."]
```
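
As a rough illustration of what removing start-of-line sequences involves, the sketch below strips any of the given sequences from the beginning of each line before sentence splitting would happen. `stripLineStarts` is a hypothetical helper written for this example, not the library's own code.

```javascript
// Hypothetical helper (not part of sentence-parse): remove any of the given
// sequences from the start of each line, ignoring leading whitespace.
function stripLineStarts(text, sequences) {
  return text
    .split('\n')
    .map((line) => {
      const trimmed = line.trimStart();
      const match = sequences.find((seq) => trimmed.startsWith(seq));
      // Leave lines that do not start with a listed sequence untouched.
      return match ? trimmed.slice(match.length).trimStart() : line;
    })
    .join('\n');
}

console.log(stripLineStarts("> Hello world!\n> This is a test.", ['>']));
// Prints:
// Hello world!
// This is a test.
```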
### Using HTML Options
```javascript
import { parseSentences } from 'sentence-parse';
const htmlText = `
  <p>Hello world!</p>
  <p>This is a test.</p>
  <ul>
    <li>First item</li>
    <li>Second item</li>
  </ul>
`;
const sentences = await parseSentences(htmlText, {
  preserveHTMLBreaks: true,
  preserveListItems: true,
  listItemPrefix: '* '
});
console.log(sentences);
// Output: ["Hello world!", "This is a test.", "* First item", "* Second item"]
```
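
The HTML options can be pictured as a preprocessing step that turns tags into line breaks and list prefixes. The sketch below is a naive, regex-based approximation of that idea, assuming simple, well-formed markup. `htmlToLines` is a hypothetical name, and `sentence-parse` may handle HTML quite differently.

```javascript
// Hypothetical helper (not part of sentence-parse): convert simple HTML into
// plain-text lines, prefixing list items and treating <br>, </p>, </li> as breaks.
function htmlToLines(html, listItemPrefix = '- ') {
  return html
    .replace(/<li\b[^>]*>/gi, () => '\n' + listItemPrefix) // start each item on a prefixed line
    .replace(/<br\s*\/?>|<\/(?:p|li)>/gi, '\n') // these tags end a line
    .replace(/<[^>]+>/g, '') // drop any remaining tags
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

const html = `
  <p>Hello world!</p>
  <p>This is a test.</p>
  <ul>
    <li>First item</li>
    <li>Second item</li>
  </ul>
`;
console.log(htmlToLines(html, '* '));
// → ["Hello world!", "This is a test.", "* First item", "* Second item"]
```

Regular expressions are not a robust way to process arbitrary HTML; a real implementation would more likely rely on a proper parser for nested or malformed markup.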
### Using excludeNonLetterSentences
```javascript
import { parseSentences } from 'sentence-parse';
const text = "Hello world! $4,000,000. This is a test.";
const sentences = await parseSentences(text, { excludeNonLetterSentences: true });
console.log(sentences);
// Output: ["Hello world!", "This is a test."]
```
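
One plausible way to decide whether a segment "contains no letters" is a Unicode-aware letter test. The sketch below assumes that interpretation. `hasLetter` and `keepLetterSegments` are hypothetical names, and the library may use a narrower check (for example, ASCII letters only).

```javascript
// Hypothetical helpers (not part of sentence-parse).
// True if the segment contains at least one letter in any script.
function hasLetter(segment) {
  return /\p{L}/u.test(segment);
}

// Keep only the segments that contain a letter.
function keepLetterSegments(segments) {
  return segments.filter(hasLetter);
}

console.log(keepLetterSegments(["Hello world!", "$4,000,000.", "This is a test."]));
// → ["Hello world!", "This is a test."]
```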
## Example
Check out `example/example.js` for a working example that parses sentences from a text file.
Run the example:
```bash
cd example
node example
```