{"id":22401171,"url":"https://github.com/guillaumedd/dialign","last_synced_at":"2025-07-31T16:30:40.719Z","repository":{"id":41465435,"uuid":"101656101","full_name":"GuillaumeDD/dialign","owner":"GuillaumeDD","description":"Automatic and generic measures of verbal alignment in dyadic dialogue based on sequential pattern mining at the level of surface of text utterances","archived":false,"fork":false,"pushed_at":"2022-11-15T20:15:26.000Z","size":924,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-10-20T20:07:39.553Z","etag":null,"topics":["ai","dialogue","dialogue-transcript","nlp","sequential-patterns","verbal-alignment-measures"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GuillaumeDD.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-28T15:12:14.000Z","updated_at":"2023-03-02T21:02:55.000Z","dependencies_parsed_at":"2023-01-21T20:17:12.580Z","dependency_job_id":null,"html_url":"https://github.com/GuillaumeDD/dialign","commit_stats":null,"previous_names":[],"tags_count":3,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GuillaumeDD%2Fdialign","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GuillaumeDD%2Fdialign/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GuillaumeDD%2Fdialign/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GuillaumeDD%2Fdialign/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/GuillaumeDD","download_url":"https://codeload.github.com/GuillaumeDD/dialign/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228265845,"owners_count":17893836,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","dialogue","dialogue-transcript","nlp","sequential-patterns","verbal-alignment-measures"],"created_at":"2024-12-05T08:43:45.818Z","updated_at":"2025-07-31T16:30:40.712Z","avatar_url":"https://github.com/GuillaumeDD.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dialign #\n\nLatest versions:\n- scala : [1.1](CHANGELOG.md#11---2022-06-18) \\[[download link](https://github.com/GuillaumeDD/dialign/releases/download/v1.1/dialign-1.1.zip)\\]\n   + Previous versions: [1.0](CHANGELOG.md#10---2021-02-07)\n- python : see the [`dialign_python` project](https://github.com/akonoroshi/dialign_python)\n\n`dialign` is  a software that  provides automatic and generic  measures of\nverbal alignment and self-repetitions in  dyadic dialogue based on  sequential \npattern mining at the level of surface of text utterances.\n\nA good place to start can be found in the following paper \n(more information can be found in the [\"Citing `dialign`\" section](#citing-dialign)):\n- Dubuisson Duplessis, G.; Langlet, C.; Clavel, C.; Landragin, F., **Towards alignment strategies in human-agent interactions based on measures of lexical repetitions**, Lang Resources \u0026 Evaluation, 2021, 36p. 
\\[[HAL](https://hal.archives-ouvertes.fr/hal-03147824) \\| [DOI](https://dx.doi.org/10.1007/s10579-021-09532-w)\\]\n\n\n\nTable of contents:\n- [Framework](#framework)\n- [Installation](#installation)\n- [Usage](#usage)\n   + [Tutorial to use `dialign` for corpus studies](#dialign-for-corpus-studies)\n   + [Demonstration of `dialign-online` for interactive purposes](#dialign-online-for-interactive-purposes)\n- [Contributors](#contributors)\n- [Citing `dialign`](#citing-dialign)\n- [License](#license)\n\n## Framework\n`dialign` is based on the observation that the behaviours of dialogue\nparticipants tend to converge and automatically align at several levels\n(such as the lexical, syntactic and semantic ones). One consequence of\nsuccessful alignment at several levels between dialogue participants is\na certain repetitiveness in dialogue, leading to the development of a\nlexicon of fixed expressions. Indeed, dialogue\nparticipants tend to automatically establish and use fixed expressions\nthat become dialogue routines.\nMore concretely, the following excerpt of a dialogue between a human and an\nagent operated by a Wizard of Oz (WoZ) shows instances of shared lexical patterns in colour \n(from the [journal article](#citing-dialign)):\n\n![Excerpt of a dialogue between a human and an agent where instances of shared expressions are coloured.](./doc/img/example-dialogue.png)\n\n\n`dialign` provides a framework to quantify the interactive lexical alignment process \nand the self-repetition behaviour of dialogue participants (DPs) in dyadic textual \ndialogues. This framework focuses on lexical patterns occurring in dialogue utterances. \nIt distinguishes two main types of such patterns. The first type is shared lexical\npatterns between DPs, i.e., patterns that are initiated (or primed) by a DP,\nsubsequently adopted by the other DP and possibly reused during the dialogue by any DP. 
\nThese patterns are directly related to the interactive verbal\nalignment process, a particular type of on-the-fly linguistic adaptation. They\ncan be seen as shared dialogue routines at the lexical level. They are a way\nto verbally align and ultimately share a common language to improve understanding, \ncollaboration and social connection to a conversational partner. \nThe second type is lexical self-repetition. Contrary to the previous type which considers \npatterns that are shared between DPs, self-repetition considers each DP\nin isolation. Self-repetitions are lexical patterns appearing at least twice\nin the dialogue utterances of a given DP, independently of the other DP's\nutterances. Self-repetitions are directly related to the self-consistency of the\nlinguistic production of a given DP.\n\n![Idea of the framework: automatic building of the shared expression lexicon to derive verbal alignment measures](./doc/img/framework.png)\n\nThe main concept behind this model is the automatically built lexicon. For each\ndialogue transcript, three lexicons are automatically computed:\n- [the shared expression lexicon](examples/dialign-offline/#the-shared-expression-lexicon): keeps  track of shared  expressions and valuable features about these  expressions  (e.g.,  who  first produced  this  expression,  its frequency)\n- [one self-repetition lexicon per DP](examples/dialign-offline/#the-self-repetition-lexicons): keeps  track of self-repetitions and valuable features about these  patterns  (e.g., its frequency)\n\nLexicons and the dialogue transcript are leveraged by deriving offline and\nonline measures to quantify aspects of the verbal alignment process and the\nself-repetition behaviour of DPs. 
Offline measures are intended to be used for\npast dialogue interactions (e.g., corpus studies) while online measures are intended\nfor use in a dialogue system.\n\n`dialign` currently provides [out-of-the-box offline measures for corpus studies](#dialign-for-corpus-studies).\nOnline usage in a dialogue system is [available as a demonstration](#dialign-online-for-interactive-purposes).\n\n### Measures Provided by dialign\n\n`dialign` provides a set of measures to characterise both:\n1. the interactive verbal alignment process between dialogue participants, and\n2. the self-repetition behaviour of each participant.\n\nThese measures characterise the nature of these processes by addressing \nvarious informative aspects such as their variety, strength, complexity, stability, and \norientation. In a nutshell:\n- **variety**: the variety of shared expressions or self-repetitions emerging during a dialogue \n               relative to its length. It is directly related to the number of unique \n               expressions in a lexicon. \n- **strength**: the strength of repetition of the (shared) lexical patterns, i.e., how much the\n                patterns are reused.\n- **complexity**: the complexity indicates the variety of the types of lexical patterns. It is \n                  captured here by Shannon entropy measures. High entropy indicates the presence of\n                  a wide range of lexical patterns relative to their lengths in number of tokens (e.g., ranging \n                  from a single word to a full sentence). Conversely, low entropy indicates the predominance \n                  of one type of lexical pattern.\n- **extension** and **stability**: The extension and stability of the (shared) lexical patterns are related \n                                   to the size of the lexical patterns. The extension indicates the size of the \n                                   lexical patterns. 
The longer a pattern is, the more extended it is. \n                                   Extension is directly linked to the stability of the processes since the \n                                   more extended the patterns are, the more stable the processes are.\n- **orientation**: the orientation of the interactive alignment process, i.e., whether it is symmetric \n                   (both dialogue participants initiate and reuse the same number of shared lexical patterns)\n                   or asymmetric (one dialogue participant initiates and/or reuses more shared lexical patterns than the other).\n\n### Measures Characterising the Interactive Verbal Alignment Process\n#### Speaker-independent\n| Measure | Description |  Aspects |\n| :---:   | :---       |  :---:   |\n|  EV     |  Expression Variety (EV). The shared expression lexicon size normalized by the length of the dialogue (i.e., the total number of tokens in the dialogue).  | Variety | \n|  ER     |  Expression Repetition (ER). The proportion of tokens which DPs dedicate to the repetition of a shared expression.  | Strength |\n| ENTR    | Shannon entropy of the lengths in tokens of the shared expression instances. | Complexity |\n| L       | Average length in tokens of the shared expression instances. | Stability |\n| LMAX    | Maximum length in tokens of the shared expression instances. | Stability |\n\n#### Speaker-dependent\n\n| Measure | Description |  Aspects |\n| :---:   | :---       |  :---:   |\n|  IE_S     |  Initiated Expression (IE) for locutor S. Ratio of shared expressions initiated by locutor S.   | Orientation | \n|  ER_S     |  Expression Repetition (ER) for locutor S. Ratio of tokens produced by S belonging to an instance of a shared expression.  | Strength |\n\n### Measures Characterising Self-Repetition Behaviour of each Dialogue Participant\n| Measure | Description |  Aspects |\n| :---:   | :---       |  :---:   |\n|  SEV_S  |  Self-Expression Variety (SEV) for locutor S. 
For locutor S, the self-repetition lexicon size normalized by the total number of tokens produced by S in the dialogue.  | Variety | \n|  SER_S     |  Self-Expression Repetition (SER) for locutor S. The proportion of tokens which locutor S dedicates to self-repetition.| Strength |\n| SENTR_S    |  Shannon entropy of the lengths in tokens of the self-repetitions from S. | Complexity |\n| SL_S       |  Average length in tokens of the self-repetitions from S. | Stability |\n| SLMAX_S    |  Maximum length in tokens of the self-repetitions from S. | Stability |\n\n### Synthetic Presentation of the Provided Measures\n| Aspect      | Speaker-independent Measures (*) | Speaker-dependent Measures (**) |\n| :---:       | :---:                            | :---:                           |\n| Variety     | EV                               |  SEV_S                          | \n| Strength    | ER                               |  ER_S, SER_S                    |\n| Complexity  | ENTR                             |  SENTR_S                        |\n| Stability   | L, LMAX                          |  SL_S, SLMAX_S                  |\n| Orientation |  --                              |  IE_S                           |\n\n(*) All these measures are related to the interactive verbal alignment process\n\n(**) Measures starting with 'S' are related to the self-repetition behaviour, the others\n     are related to the interactive verbal alignment process\n\n## Installation ##\n\n### From JAR (preferred way) ###\n\nA ready-to-use JAR is available on GitHub. Check the [latest release](https://github.com/GuillaumeDD/dialign/releases)!\n\n### From source code (for developers) ###\n\nYou can generate the JAR with [SBT](http://www.scala-sbt.org/).\n\nFirst, clone the repository. 
Then, you can compile the code:\n\n\t$ sbt compile\n\nFinally, you can produce the JAR as follows \n(requires [sbt-assembly](https://github.com/sbt/sbt-assembly)):\n\n\t$ sbt assembly\n\nThe JAR file should then be found in the directory `dialign/target/scala-2.13/`.\n\n## Usage ##\n\n`dialign` is designed to be **easy to use** from the command line interface.\n\n### `dialign` for Corpus Studies\n\n`dialign` provides out-of-the-box offline measures for corpus studies.\n\n#### Tutorial\nA complete **walkthrough tutorial** is available in the \n[examples/dialign-offline/ directory](examples/dialign-offline/). \n\nIn this tutorial, you will:\n- learn [how to format your dialogue transcripts in `tsv` format](examples/dialign-offline/#input-data-dialogue-transcripts);\n- learn [how to run `dialign` on a single dialogue transcript](examples/dialign-offline/#running-dialign) (generalisation to a \n  full corpus is straightforward); and\n- understand the output files of `dialign` where you can find \n   + [the measures provided by `dialign` for the full corpus in `tsv` format](examples/dialign-offline/#output-files-at-the-level-of-the-corpus),\n   + [the shared expression lexicon in `tsv` format for each dialogue transcript](examples/dialign-offline/#the-shared-expression-lexicon), and\n   + [the self-repetition lexicons in `tsv` format for each dialogue transcript](examples/dialign-offline/#the-self-repetition-lexicons).\n\n\n#### CLI Usage Example ####\n\nSuppose that the dialogue files are in the input directory\n`input-directory/` and that the output should be written to the directory\n`output-directory/`. 
To run `dialign` with this configuration, proceed as\nfollows:\n```bash\njava -jar dialign.jar -i input-directory/ -o output-directory/\n```\n(here we assume that the dialogue files are encoded in UTF-8; if not, it\nis possible to specify a different encoding by adding the JVM option `-Dfile.encoding=ISO-8859-1`\nbefore `-jar`, where `ISO-8859-1` is the desired encoding)\n\n`dialign` can filter input dialogue files by prefix, suffix and\nextension. For instance, if the only input dialogue files to consider\nare files matching the following pattern: `dialogue-*-cleaned.tsv`, it\nis possible to use the following options with `dialign`:\n```bash\n# -p: required filename prefix\n# -s: required filename suffix\n# -e: extension (without the '.')\njava -jar dialign.jar -i input-directory/ -o output-directory/ \\\n\t-p \"dialogue-\" \\\n\t-s \"-cleaned\" \\\n\t-e \"tsv\"\n```\n\nMore options are available; see the usage note:\n```bash\njava -jar dialign.jar -h\n```\n\n### `dialign-online` for Interactive Purposes\nThis framework can also be embedded in an interactive system.\nTo demonstrate these capabilities, a complete tutorial is available in the \n[examples/dialign-online/ directory](examples/dialign-online/). 
\n\nIn this tutorial, you will:\n- learn [how to run `dialign-online` in interactive mode and export the transcript of the created dialogue](examples/dialign-online/#running-dialign-online-in-interactive-mode); and\n- learn [how to run `dialign-online` on a single dialogue transcript](examples/dialign-online/#running-dialign-online-on-a-dialogue-transcript) \n  in order to directly compute online metrics for each turn.\n\nA screenshot of this demonstration can be found below:\n\n![Demonstration of `dialign-online`](examples/dialign-online/screenshots/02.png)\n\n## Contributors ##\n\n- Guillaume Dubuisson Duplessis (2017, 2020, 2021, 2022)\n\n## Citing `dialign` ##\n\nIf you want to refer to the framework or to the `dialign` software, please cite \nthe following paper:\n- Dubuisson Duplessis, G.; Langlet, C.; Clavel, C.; Landragin, F., **Towards alignment strategies in human-agent interactions based on measures of lexical repetitions**, Lang Resources \u0026 Evaluation, 2021, 36p. \\[[HAL](https://hal.archives-ouvertes.fr/hal-03147824) \\| [DOI](https://dx.doi.org/10.1007/s10579-021-09532-w)\\]\n\nIf you want to refer to the study strictly limited to verbal alignment on a Human-Agent \nnegotiation task, please cite this paper:\n- Dubuisson Duplessis, G.; Clavel, C.; Landragin, F., **Automatic Measures to \n  Characterise Verbal Alignment in Human-Agent Interaction**, 18th Annual Meeting \n  of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2017, \n  pp. 71--81 \\[[See paper](http://www.sigdial.org/workshops/conference18/proceedings/pdf/SIGDIAL10.pdf) | [BIB](http://www.sigdial.org/workshops/conference18/proceedings/bib/SIGDIAL10.bib)\\]\n\n### Contact\nThe authors of this work would be happy to hear from you if you are\nusing this code! Please do not hesitate to contact us:\n- G. Dubuisson Duplessis \u003c[website](http://www.dubuissonduplessis.fr/contact.html)\u003e\n- C. Clavel \u003c[website](https://clavel.wp.imt.fr/)\u003e\n- F. 
Landragin \u003c[website](http://fred.landragin.free.fr/)\u003e\n\n## License ##\n\nCECILL-B - see the LICENSE file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguillaumedd%2Fdialign","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguillaumedd%2Fdialign","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguillaumedd%2Fdialign/lists"}