{"id":22025867,"url":"https://github.com/digitallinguistics/concordance","last_synced_at":"2025-05-07T09:40:16.563Z","repository":{"id":35138310,"uuid":"210680113","full_name":"digitallinguistics/concordance","owner":"digitallinguistics","description":"A Node.js library for performing concordance-related tasks on a corpus in DLx JSON format","archived":false,"fork":false,"pushed_at":"2023-05-06T20:53:13.000Z","size":111,"stargazers_count":2,"open_issues_count":17,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-18T21:31:57.017Z","etag":null,"topics":["corpora","corpus","corpus-linguistics","digital-linguistics","dlx","linguistics"],"latest_commit_sha":null,"homepage":"https://developer.digitallinguistics.io/concordance/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/digitallinguistics.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-24T19:19:17.000Z","updated_at":"2022-02-12T20:28:34.000Z","dependencies_parsed_at":"2022-08-17T22:35:29.450Z","dependency_job_id":null,"html_url":"https://github.com/digitallinguistics/concordance","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitallinguistics%2Fconcordance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitallinguistics%2Fconcordance/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitallinguistics%2Fconcordance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/digitallinguistic
s%2Fconcordance/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/digitallinguistics","download_url":"https://codeload.github.com/digitallinguistics/concordance/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252852345,"owners_count":21814339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["corpora","corpus","corpus-linguistics","digital-linguistics","dlx","linguistics"],"created_at":"2024-11-30T07:20:15.352Z","updated_at":"2025-05-07T09:40:16.532Z","avatar_url":"https://github.com/digitallinguistics.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Concordance\n\n[![GitHub releases](https://img.shields.io/github/v/release/digitallinguistics/concordance)][releases]\n[![status](https://github.com/digitallinguistics/concordance/workflows/tests/badge.svg)][actions]\n[![issues](https://img.shields.io/github/issues/digitallinguistics/concordance)][issues]\n[![npm downloads](https://img.shields.io/npm/dt/digitallinguistics/concordance)][npm]\n[![DOI](https://zenodo.org/badge/210680113.svg)][Zenodo]\n[![license](https://img.shields.io/github/license/digitallinguistics/concordance)][license]\n[![GitHub stars](https://img.shields.io/github/stars/digitallinguistics/concordance?style=social)][GitHub]\n\nThe Digital Linguistics (DLx) Concordance library is a Node.js library for creating a concordance of words in a [corpus][corpus] (a collection of texts in a language) which is formatted according to the [Data Format for Digital Linguistics][DaFoDiL] 
(\u003cabbr title='Data Format for Digital Linguistics'\u003eDaFoDiL\u003c/abbr\u003e) (a JSON-based format). It is useful for anybody doing research involving linguistic corpora. If your data are not yet in DaFoDiL format, there are several converters available [here][converters].\n\nThis library produces a tab-delimited file containing information about each token (instance) of the words specified. When the \u003cdfn\u003eKeyword in Context\u003c/dfn\u003e (\u003cabbr\u003eKWIC\u003c/abbr\u003e) option is enabled, each word is listed along with the immediately preceding and following context. An example of a partial concordance of the word _little_ in _The Three Little Pigs_ is shown in KWIC format below.\n\ntext | utterance | word |                        pre | token  | post                     |\n---- | --------- | ---- | -------------------------: | :----: | ------------------------ |\n3LP  | 1         | 14   | mother pig who had three   | little | pigs and not enough food |\n3LP  | 3         | 3    | The first                  | little | pig was very lazy.       |\n3LP  | 5         | 3    | The second                 | little | pig worked a little bit  |\n3LP  | 5         | 7    | second little pig worked a | little | bit harder but he was    |\n3LP  | 7         | 3    | The third                  | little | pig worked hard all day  |\n\n## Basic Usage\n\nThe following examples process any JSON files in the current directory and output a concordance file to `concordance.tsv` in Keyword in Context format. At a minimum, the concordance function requires a single argument: a wordform or list of wordforms to concordance.\n\nAs a module:\n\n```js\nconst concordance = require(`concordance`);\n\nconst wordforms = [`little`, `big`];\n\n// KWIC defaults to false, so enable it explicitly\nconcordance({ wordforms, KWIC: true });\n```\n\nOn the command line:\n\n```cmd\ndlx-conc -k --wordforms=little,big\n```\n\n**Note:** The Keyword in Context format is _not_ enabled by default. 
It must be enabled by passing the `-k` or `--kwic` flag.\n\n## Options\n\nThe available options are listed below.\n\nModule       | Command Line       | Default             | Description\n------------ | ------------------ | ------------------- | -----------\n`context`    | `-c, --context`    | `10`                | the number of words to show to either side of the token (if the `KWIC` option is set to `true`)\n`dir`        | `-d, --dir`        | `\".\"`               | the directory where the corpus is located\n`KWIC`       | `-k, --KWIC`       | `false`             | whether to create the concordance in Keyword in Context format; adds `pre` and `post` columns to the concordance if true\n`outputPath` | `-o, --outputPath` | `\"concordance.tsv\"` | path where the concordance file should be generated\n`wordforms`  | `-w, --wordforms`  | `[]`                | a string or list of strings of words to concordance (formatted as an array when using as a module, and as a comma-separated list when using on the command line)\n`wordlist`   | `-l, --wordlist`   | `undefined`         | path to a file containing a JSON array of words to concordance\n\n## Contributing\n\n[Report an issue or suggest a feature here.][issues]\n\nPull requests are very welcome. Please make sure you've [opened an issue][issues] for your change first.\n\nNo test suite was written for this library, but you can test the results with `npm test`. A test concordance will be generated at `test/concordance.tsv`.\n\n## About\n\nThis library is authored and maintained by [Daniel W. Hieber][me]. Please consider citing this library following the model below:\n\n\u003e Hieber, Daniel W. 2019. digitallinguistics/concordance. 
DOI:[10.5281/zenodo.3464144][Zenodo]\n\n[actions]:    https://github.com/digitallinguistics/concordance/actions\n[converters]: https://developer.digitallinguistics.io/#converters\n[corpus]:     https://en.wikipedia.org/wiki/Text_corpus\n[DaFoDiL]:    https://format.digitallinguistics.io/\n[GitHub]:     https://github.com/digitallinguistics/concordance\n[issues]:     https://github.com/digitallinguistics/concordance/issues\n[Jasmine]:    https://jasmine.github.io/\n[license]:    https://github.com/digitallinguistics/concordance/blob/master/LICENSE.md\n[me]:         https://danielhieber.com/\n[npm]:        https://www.npmjs.com/package/@digitallinguistics/concordance\n[releases]:   https://github.com/digitallinguistics/concordance/releases\n[Zenodo]:     https://zenodo.org/badge/latestdoi/210680113\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitallinguistics%2Fconcordance","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdigitallinguistics%2Fconcordance","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdigitallinguistics%2Fconcordance/lists"}