{"id":26822667,"url":"https://github.com/dylanpieper/samesies","last_synced_at":"2025-04-26T10:47:20.960Z","repository":{"id":279436967,"uuid":"937624046","full_name":"dylanpieper/samesies","owner":"dylanpieper","description":"Compare Lists for Similarity","archived":false,"fork":false,"pushed_at":"2025-04-21T13:58:42.000Z","size":2237,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-25T22:38:43.148Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://dylanpieper.github.io/samesies/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dylanpieper.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-23T14:21:05.000Z","updated_at":"2025-04-21T13:56:02.000Z","dependencies_parsed_at":"2025-02-25T15:40:29.052Z","dependency_job_id":"83cfbb57-10ab-4768-9a00-b7a25641aeed","html_url":"https://github.com/dylanpieper/samesies","commit_stats":null,"previous_names":["dylanpieper/samesies"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylanpieper%2Fsamesies","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylanpieper%2Fsamesies/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylanpieper%2Fsamesies/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylanpieper%2Fsamesies/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/owners/dylanpieper","download_url":"https://codeload.github.com/dylanpieper/samesies/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250926213,"owners_count":21508917,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-30T08:20:11.112Z","updated_at":"2025-04-26T10:47:20.943Z","avatar_url":"https://github.com/dylanpieper.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# samesies \u003cimg src=\"man/figures/samesies-hex.png\" align=\"right\" width=\"140\"/\u003e\n\n[![CRAN status](https://www.r-pkg.org/badges/version/samesies)](https://cran.r-pkg.org/package=samesies) [![R-CMD-check](https://github.com/dylanpieper/samesies/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/dylanpieper/samesies/actions/workflows/R-CMD-check.yaml)\n\nCompare lists of texts, factors, or numerical values to measure their similarity. 
The motivating use case is evaluating the similarity of large language model (LLM) responses across models, providers, or prompts.\n\n## Installation\n\nYou can install `samesies` from CRAN with:\n\n``` r\ninstall.packages(\"samesies\")\n```\n\n## Basic Usage\n\n`samesies` provides three main functions for measuring similarity:\n\n### `same_text()`\n\nCompare similarity between multiple lists of character strings.\n\n``` r\nlibrary(samesies)\n\nr1 \u003c- list(\"R is a statistical computing software\", \n           \"R enables grammar of graphics using ggplot2\", \n           \"R supports advanced statistical models\")\nr2 \u003c- list(\"R is a full-stack programming language\",\n           \"R enables advanced data visualizations\", \n           \"R supports machine learning algorithms\")\n\ntex \u003c- same_text(r1, r2)\n```\n\nMethods available via [stringdist](https://github.com/markvanderloo/stringdist) (e.g., `method = \"osa\"`):\n\n-   Transformational Algorithms\n\n    -   **osa**: Adjacent transposition edits\n    -   **lv**: Basic edit distance\n    -   **dl**: Comprehensive edit distance with transpositions\n\n-   Structural Comparison\n\n    -   **hamming**: Position-wise character differences\n    -   **lcs**: Longest shared subsequence\n    -   **qgram**: Subsequence matching\n    -   **cosine**: Vector-space string similarity\n    -   **jaccard**: Set-based string comparison\n\n-   Linguistic Matching\n\n    -   **jw**: Prefix-weighted string matching\n    -   **soundex**: Phonetic encoding\n\n### `same_factor()`\n\nCompare similarity between multiple lists of categorical data.\n\n``` r\ncats1 \u003c- list(\"R\", \"R\", \"Python\")\ncats2 \u003c- list(\"R\", \"Python\", \"R\")\n\nfct \u003c- same_factor(cats1, cats2, \n                   levels = c(\"R\", \"Python\"))\n```\n\nMethods available (e.g., `method = \"exact\"`):\n\n-   **exact**: Exact matching\n-   **order**: Distances across ordered factor levels\n\n### `same_number()`\n\nCompare 
similarity between multiple lists of numeric values.\n\n``` r\nn1 \u003c- list(1, 2, 3)\nn2 \u003c- list(1, 2.1, 3.2)\n\nnum \u003c- same_number(n1, n2)\n```\n\nMethods available (e.g., `method = \"exact\"`):\n\n-   **exact**: Exact matching\n-   **raw**: Absolute difference\n-   **exp**: Exponential decay on the absolute difference\n-   **pct_diff**: Percentage difference\n-   **normalized**: Normalized difference (`max_diff` is computed automatically by default)\n\n``` r\nnum \u003c- same_number(n1, n2, \n                   method = \"normalized\", \n                   max_diff = 2.2)\n```\n\n-   **fuzzy**: Fuzzy matching with a dual tolerance system:\n    -   Uses both a relative (default 2%) and an absolute (default 0.05) tolerance threshold\n    -   Values whose difference falls within the larger of these two epsilon thresholds are considered exact matches (score of 1)\n    -   Similarity scores gradually decrease as the difference grows beyond the threshold\n\n``` r\nnum \u003c- same_number(n1, n2, \n                   method = \"fuzzy\", \n                   epsilon = 0.05,\n                   epsilon_pct = 0.02)\n```\n\n## More Lists\n\nWhen you input more than two lists, the functions compute pairwise comparisons across the lists.\n\n## Nested Lists\n\nNested lists are supported as long as they share the same names and lengths.\n\n## Methods\n\nAll three functions return `similar` objects that support the following methods:\n\n-   `print(x)`\n-   `summary(x)`\n-   `average_similarity(x, method = NULL)`\n-   `pair_averages(x, method = NULL)`\n\n## Accessing Object Data\n\nThe package uses S3 objects, allowing access to the underlying data:\n\n-   `$scores`: A list of similarity scores for each method and comparison pair\n-   `$summary`: A list of statistical summaries for each method and comparison pair\n-   `$methods`: The similarity methods used in the analysis\n-   `$list_names`: Names of 
the input lists\n-   `$raw_values`: The original input values\n-   `$digits`: Number of decimal places for rounding results in output\n\n## Credits\n\nThe Spiderman image in the hex logo is fan art created by the Reddit user [WistlerR15](https://www.reddit.com/r/Spiderman/comments/k3pcj3/remade_the_spiderman_meme_with_my_favorite/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylanpieper%2Fsamesies","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdylanpieper%2Fsamesies","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylanpieper%2Fsamesies/lists"}