{"id":13857902,"url":"https://github.com/juba/rainette","last_synced_at":"2025-10-04T02:03:33.355Z","repository":{"id":50639981,"uuid":"153594739","full_name":"juba/rainette","owner":"juba","description":"R implementation of the Reinert text clustering method","archived":false,"fork":false,"pushed_at":"2024-05-05T17:56:08.000Z","size":16202,"stargazers_count":57,"open_issues_count":5,"forks_count":7,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-07T03:46:37.306Z","etag":null,"topics":["r","text-analysis","text-classification"],"latest_commit_sha":null,"homepage":"https://juba.github.io/rainette/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/juba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-18T09:02:45.000Z","updated_at":"2025-06-20T13:07:10.000Z","dependencies_parsed_at":"2024-05-05T18:52:12.706Z","dependency_job_id":null,"html_url":"https://github.com/juba/rainette","commit_stats":{"total_commits":510,"total_committers":4,"mean_commits":127.5,"dds":"0.20588235294117652","last_synced_commit":"a92b316a06fff1f36024abf7f9ead0032784d2d8"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/juba/rainette","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juba%2Frainette","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juba%2Frainette/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juba%2Frainette/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/juba%2Frainette/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/juba","download_url":"https://codeload.github.com/juba/rainette/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juba%2Frainette/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264505735,"owners_count":23618963,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["r","text-analysis","text-classification"],"created_at":"2024-08-05T03:01:50.146Z","updated_at":"2025-10-04T02:03:28.312Z","avatar_url":"https://github.com/juba.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"# Rainette  \n\n[![CRAN status](https://www.r-pkg.org/badges/version-ago/rainette)](https://cran.r-project.org/package=rainette)\n[![rainette status badge](https://juba.r-universe.dev/badges/rainette)](https://juba.r-universe.dev)\n[![DOI](https://zenodo.org/badge/153594739.svg)](https://zenodo.org/badge/latestdoi/153594739)\n![CRAN Downloads](https://cranlogs.r-pkg.org/badges/last-month/rainette)\n[![R build status](https://github.com/juba/rainette/workflows/R-CMD-check/badge.svg)](https://github.com/juba/rainette/actions?query=workflow%3AR-CMD-check)\n\u003c!-- [![Coverage status](https://codecov.io/gh/juba/rainette/branch/master/graph/badge.svg)](https://codecov.io/github/juba/rainette?branch=master) --\u003e\n\nRainette is an R package which implements a variant of the Reinert textual clustering method. 
This method is available in other software such as [Iramuteq](http://www.iramuteq.org/) (free software) or [Alceste](https://www.image-zafar.com/Logiciel.html) (commercial, closed source).\n\n## Features\n\n- Simple and double clustering algorithms\n- Plot functions and shiny interfaces to visualise and explore clustering results\n- Utility functions to split a corpus into segments or import a corpus in Iramuteq format\n\n## Installation\n\nThe package is installable from CRAN.\n\n```r\ninstall.packages(\"rainette\")\n```\n\nThe development version is installable from [R-universe](https://r-universe.dev).\n\n```r\ninstall.packages(\"rainette\", repos = \"https://juba.r-universe.dev\")\n```\n\n## Usage\n\nLet's start with an example corpus provided by the excellent [quanteda](https://quanteda.io) package.\n\n```r\nlibrary(quanteda)\nlibrary(rainette)\ndata_corpus_inaugural\n```\n\nFirst, we'll use `split_segments()` to split each document into segments of about 40 words (punctuation is taken into account).\n\n```r\ncorpus \u003c- split_segments(data_corpus_inaugural, segment_size = 40)\n```\n\nNext, we'll apply some preprocessing and compute a document-term matrix with `quanteda` functions.\n\n```r\ntok \u003c- tokens(corpus, remove_punct = TRUE)\ntok \u003c- tokens_remove(tok, stopwords(\"en\"))\ndtm \u003c- dfm(tok, tolower = TRUE)\ndtm \u003c- dfm_trim(dtm, min_docfreq = 10)\n```\n\nWe can then apply a simple clustering on this matrix with the `rainette()` function. We specify the number of clusters (`k`), and the minimum number of forms in each segment (`min_segment_size`). 
Segments that do not include enough forms will be merged with the following or previous one when possible.\n\n```r\nres \u003c- rainette(dtm, k = 6, min_segment_size = 15)\n```\n\nWe can use the `rainette_explor()` shiny interface to visualise and explore the different clusterings at each `k`.\n\n```r\nrainette_explor(res, dtm, corpus)\n```\n\n![rainette_explor() interface](man/figures/rainette_explor_plot_en.png)\n\nThe *Cluster documents* tab lets you browse and filter the documents in each cluster.\n\n![rainette_explor() documents tab](man/figures/rainette_explor_docs_en.png)\n\nWe can also directly generate the cluster description plot for a given `k` with `rainette_plot()`.\n\n```r\nrainette_plot(res, dtm, k = 5)\n```\n\nOr cut the tree at a chosen `k` and add a group membership variable to our corpus metadata.\n\n```r\ndocvars(corpus)$cluster \u003c- cutree(res, k = 5)\n```\n\nWe can also perform a double clustering, *i.e.* two simple clusterings produced with different `min_segment_size` values which are then \"crossed\" to generate more robust clusters. 
To do this, we use `rainette2()` on two `rainette()` results:\n\n```r\nres1 \u003c- rainette(dtm, k = 5, min_segment_size = 10)\nres2 \u003c- rainette(dtm, k = 5, min_segment_size = 15)\nres \u003c- rainette2(res1, res2, max_k = 5)\n```\n\nWe can then use `rainette2_explor()` to explore and visualise the results.\n\n```r\nrainette2_explor(res, dtm, corpus)\n```\n\n![rainette2_explor() interface](man/figures/rainette2_explor_en.png)\n\n## Tell me more\n\nTwo vignettes are available:\n\n- Introduction and usage vignette: [English](https://juba.github.io/rainette/articles/introduction_en.html), [French](https://juba.github.io/rainette/articles/introduction_usage.html)\n- Algorithms description vignette: [English](https://juba.github.io/rainette/articles/algorithms_en.html), [French](https://juba.github.io/rainette/articles/algorithmes.html)\n\n## Credits\n\nThis clustering method was created by Max Reinert, and is described in several articles, notably:\n\n- Reinert M., \"Une méthode de classification descendante hiérarchique : application à l'analyse lexicale par contexte\", *Cahiers de l'analyse des données*, Volume 8, Numéro 2, 1983. \u003chttp://www.numdam.org/item/?id=CAD_1983__8_2_187_0\u003e\n- Reinert M., \"Alceste une méthodologie d'analyse des données textuelles et une application: Aurelia De Gerard De Nerval\", *Bulletin de Méthodologie Sociologique*, Volume 26, Numéro 1, 1990. \u003chttps://doi.org/10.1177/075910639002600103\u003e\n- Reinert M., \"Une méthode de classification des énoncés d’un corpus présentée à l’aide d’une application\", *Les cahiers de l’analyse des données*, Tome 15, Numéro 1, 1990. \u003chttp://www.numdam.org/item/?id=CAD_1990__15_1_21_0\u003e\n\nThanks to Pierre Ratinaud, the author of [Iramuteq](http://www.iramuteq.org/), for releasing it as free and open-source software. 
Even though the R code has been almost entirely rewritten, it has been an invaluable resource for understanding the algorithms.\n\nMany thanks to [Sébastien Rochette](https://github.com/statnmap) for creating the hex logo.\n\nMany thanks to [Florian Privé](https://github.com/privefl/) for his work on rewriting and optimizing the Rcpp code.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuba%2Frainette","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuba%2Frainette","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuba%2Frainette/lists"}