{"id":27559965,"url":"https://github.com/willshiao/transcript-parser","last_synced_at":"2025-04-20T02:57:59.578Z","repository":{"id":57378983,"uuid":"55820717","full_name":"willshiao/transcript-parser","owner":"willshiao","description":"Parses plaintext speech/debate/radio transcripts into JavaScript objects.","archived":false,"fork":false,"pushed_at":"2020-05-01T05:51:34.000Z","size":186,"stargazers_count":2,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-20T02:57:55.795Z","etag":null,"topics":["hacktoberfest","javascript","nodejs","npm","parse","parser","transcript","transcript-parser"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/willshiao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-04-09T01:53:11.000Z","updated_at":"2024-10-04T18:32:45.000Z","dependencies_parsed_at":"2022-09-02T21:21:34.700Z","dependency_job_id":null,"html_url":"https://github.com/willshiao/transcript-parser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willshiao%2Ftranscript-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willshiao%2Ftranscript-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willshiao%2Ftranscript-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/willshiao%2Ftranscript-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/willshi
ao","download_url":"https://codeload.github.com/willshiao/transcript-parser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249841727,"owners_count":21333099,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hacktoberfest","javascript","nodejs","npm","parse","parser","transcript","transcript-parser"],"created_at":"2025-04-20T02:57:59.027Z","updated_at":"2025-04-20T02:57:59.572Z","avatar_url":"https://github.com/willshiao.png","language":"JavaScript","readme":"transcript-parser\n=================\n[![Build Status](https://travis-ci.org/willshiao/transcript-parser.svg?branch=master)](https://travis-ci.org/willshiao/transcript-parser)\n[![Coverage Status](https://coveralls.io/repos/github/willshiao/transcript-parser/badge.svg?branch=master)](https://coveralls.io/github/willshiao/transcript-parser?branch=master)\n[![npm](https://img.shields.io/npm/v/transcript-parser.svg?maxAge=2592000)](https://www.npmjs.com/package/transcript-parser)\n[![Known Vulnerabilities](https://snyk.io/test/github/willshiao/transcript-parser/badge.svg)](https://snyk.io/test/github/willshiao/transcript-parser)\n\n- [Description](#description)\n- [Usage](#usage)\n- [Config](#config)\n- [Documentation](#documentation)\n  * [\\.parseStream()](#parsestream)\n  * [\\.parseOneSync()](#parseonesync)\n  * [\\.parseOne()](#parseone)\n  * [\\.resolveAliasesSync()](#resolvealiasessync)\n  * [\\.resolveAliases()](#resolvealiases)\n- [Example](#example)\n\n\n## Description\n\nParses plaintext speech/debate/radio transcripts into JavaScript objects. 
It is still in early development. Pull requests are welcome.\n\nTests can be run with `npm test` and a benchmark can be run with `npm run benchmark`. For a full coverage report using [Istanbul](https://github.com/gotwarlost/istanbul), run `npm run travis-test`.\n\nTested for Node.js \u003e= v4.4.6\n\n## Usage\n\n`npm install transcript-parser`\n\n```node\n'use strict';\n\nconst fs = require('fs');\nconst TranscriptParser = require('transcript-parser');\nconst tp = new TranscriptParser();\n\n// Synchronous example\nconst parsed = tp.parseOneSync(fs.readFileSync('transcript.txt', 'utf8'));\nconsole.log(parsed);\n\n// Asynchronous example (read as 'utf8' so parseOne receives a string)\nfs.readFile('transcript.txt', 'utf8', (err, data) =\u003e {\n  if(err) return console.error('Error:', err);\n  tp.parseOne(data, (err, parsed) =\u003e {\n    if(err) return console.error('Error:', err);\n    console.log(parsed);\n  });\n});\n\n// Stream example\nconst stream = fs.createReadStream('transcript.txt', 'utf8');\ntp.parseStream(stream, (err, parsed) =\u003e {\n  if(err) return console.error('Error:', err);\n  console.log(parsed);\n});\n```\n\n\n## Config\n\nThe constructor for `TranscriptParser` accepts a settings object.\n\n- `removeActions`\n    + default: `true`\n    + Specifies if the parser should remove actions (e.g. 
`(APPLAUSE)`).\n- `removeAnnotations`\n    + default: `true`\n    + Specifies if the parser should remove annotations (surrounded by `[]`).\n- `removeTimestamps`\n    + default: `true`\n    + **Always treated as true if `removeAnnotations` is true**\n    + Specifies if the parser should remove timestamps (in the `[##:##:##]` format).\n- `removeUnknownSpeakers`\n    + default: `false`\n    + Specifies if the parser should remove lines that have no associated speaker.\n    + If false, lines that have no associated speaker will be stored under the key `none`.\n- `blacklist`\n    + default: `[]`\n    + A list of speakers (as strings) that the parser should ignore.\n- `aliases`\n    + default: `{}`\n    + An object with the real name as the key and an `Array` of the aliases' regular expressions as the value.\n    + Example: `{ \"Mr. Robot\": [ /[A-Z\\ ]*SLATER[A-Z\\ ]*/ ] }`\n        * Renames all speakers who match the regex to \"Mr. Robot\".\n- `regex` _(\u003e= v0.7.1)_\n  + `newLine`\n    * default: `/(?:\\r?\\n)+/` (`RegExp` literal)\n    * The regular expression used to match new line separators (CRLF, LF).\n    * Should be set to match multiple consecutive separators for the fastest parsing.\n    * Example: `/\\|/`\n      - Uses a single pipe (`|`) symbol to indicate a new line instead of the traditional LF or CRLF.\n\nSettings can be changed after object creation by changing the corresponding properties of `tp.settings`, where `tp` is an instance of `TranscriptParser`.\n\n\n## Documentation\n\n### .parseStream()\n\nThe `parseStream()` method parses a [`Stream`](https://nodejs.org/api/stream.html) and passes an object representing it to the callback.\n\nThis is the preferred method for parsing streams asynchronously, as it doesn't have to load the entire transcript into memory (unlike `parseOne()`).\n\n#### Syntax\n\n`tp.parseStream(stream, callback)`\n\n##### Parameters\n\n- `stream`\n    + The `Readable` stream to read.\n- `callback(err, data)`\n    + A callback to be executed on function 
completion or error.\n\n\n### .parseOneSync()\n\nThe `parseOneSync()` method parses a string and returns an object representing it.\n\n#### Syntax\n\n`tp.parseOneSync(transcript)`\n\n##### Parameters\n\n- `transcript`\n    + The transcript, as a `string`.\n\n\n### .parseOne()\n\nThe `parseOne()` method parses a string asynchronously and passes an object representing it to the callback.\n\n#### Syntax\n\n`tp.parseOne(transcript, callback)`\n\n##### Parameters\n\n- `transcript`\n    + The transcript, as a `string`.\n- `callback(err, data)`\n    + A callback to be executed on function completion or error.\n\n\n### .resolveAliasesSync()\n\nThe `resolveAliasesSync()` method resolves all aliases specified in the configuration passed to the `TranscriptParser`'s constructor (see above).\n\nIt also renames the names in the `order` list to match the new names in the transcript. Note that there is a significant performance penalty, so don't use this method unless you need it.\n\n#### Syntax\n\n`tp.resolveAliasesSync(data)`\n\n##### Parameters\n\n- `data`\n    + The transcript object after being parsed.\n\n\n### .resolveAliases()\n\nThe `resolveAliases()` method resolves all aliases specified in the configuration passed to the `TranscriptParser`'s constructor (see above).\n\nIt also renames the names in the `order` list to match the new names in the transcript. 
Note that there is a significant performance penalty, so don't use this method unless you need it.\n\n#### Syntax\n\n`tp.resolveAliases(data, callback)`\n\n##### Parameters\n\n- `data`\n    + The transcript object after being parsed.\n- `callback(err, resolved)`\n    + A callback to be executed on function completion or error.\n\n\n## Example\n\n### Input\n```\nA: I like Node.js.\nA: I also like C#.\nB: I like Node.js too!\nA: I especially like the Node Package Manager.\n```\n\n### Output\n```node\n{\n  speaker: {\n    A: [\n      'I like Node.js.',\n      'I also like C#.',\n      'I especially like the Node Package Manager.'\n    ],\n    B: ['I like Node.js too!']\n  },\n  order: ['A', 'A', 'B', 'A']\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillshiao%2Ftranscript-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwillshiao%2Ftranscript-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwillshiao%2Ftranscript-parser/lists"}