{"id":13507344,"url":"https://github.com/OCannings/tf-idf","last_synced_at":"2025-03-30T07:32:46.189Z","repository":{"id":62430405,"uuid":"41534262","full_name":"OCannings/tf-idf","owner":"OCannings","description":"tf-idf elixir","archived":false,"fork":false,"pushed_at":"2020-03-03T16:48:13.000Z","size":159,"stargazers_count":17,"open_issues_count":2,"forks_count":5,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-10-06T08:07:39.918Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OCannings.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-08-28T07:47:59.000Z","updated_at":"2023-12-20T10:53:09.000Z","dependencies_parsed_at":"2022-11-01T20:30:36.469Z","dependency_job_id":null,"html_url":"https://github.com/OCannings/tf-idf","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCannings%2Ftf-idf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCannings%2Ftf-idf/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCannings%2Ftf-idf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OCannings%2Ftf-idf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OCannings","download_url":"https://codeload.github.com/OCannings/tf-idf/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222535179,"owners_count":16999233,"icon_url":"https://gith
ub.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T02:00:31.659Z","updated_at":"2024-11-01T06:31:47.628Z","avatar_url":"https://github.com/OCannings.png","language":"Elixir","funding_links":[],"categories":["Algorithms and Data structures"],"sub_categories":[],"readme":"![Travis CI Build Status](https://travis-ci.org/OCannings/tf-idf.svg?branch=master)\n\n# Tfidf\nAn Elixir implementation of tf-idf\n\n[Based on the blog post by Steven Loria](http://stevenloria.com/finding-important-words-in-a-document-using-tf-idf/)\n\n## What is tf-idf?\n\u003e tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. 
It is often used as a weighting factor in information retrieval and text mining.\n\n[tf-idf on Wikipedia](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)\n\n## Installation\n```elixir\ndefp deps do\n  [{:tfidf, \"~\u003e 0.1.0\"}]\nend\n```\n\n## Usage\n\n### Tfidf.calculate(word, text, corpus, tokenize_fn \\\\\\ \u0026tokenize(\u00261))\nCalculates the tf-idf for a given word within a text and a corpus (a list) of\ntexts.\n```elixir\niex\u003e Tfidf.calculate(\"dog\", \"nice dog dog\", [\"dog hat\", \"dog\", \"cat mat\", \"duck\"])\n0.19178804830118723\n```\nAn optional tokenizer function can be passed as the last argument to replace the default tokenizer:\n```elixir\niex\u003e Tfidf.calculate(\"dog\", \"nice,dog,dog\", [\"dog,hat\", \"dog\", \"cat,mat\", \"duck\"], \u0026String.split(\u00261, \",\"))\n0.19178804830118723\n```\n\n=====\n\n### Tfidf.calculate(word, tokenized_text, corpus)\nCalculates the tf-idf for a given word within a pre-tokenized list and a corpus\nmade up of pre-tokenized lists.\n\n```elixir\niex\u003e Tfidf.calculate(\"dog\", [\"nice\", \"dog\", \"dog\"], [[\"dog\", \"hat\"], [\"dog\"], [\"cat\", \"mat\"], [\"duck\"]])\n0.19178804830118723\n```\n\n=====\n\n### Tfidf.calculate_all(text, corpus, tokenize_fn \\\\\\ \u0026tokenize(\u00261))\nCalculates the tf-idf for every word in a given text and returns a list\nof `{word, score}` tuples.\n\n```elixir\niex\u003e Tfidf.calculate_all(\"nice dog\", [\"dog hat\", \"dog\", \"cat mat\", \"duck\"])\n[{\"nice\", 0.6931471805599453}, {\"dog\", 0.14384103622589042}]\n```\n\nAs with `Tfidf.calculate/4`, an optional tokenizer function can be passed\nas the last argument. 
This will be used in place of the default tokenizer.\n  \n```elixir\niex\u003e Tfidf.calculate_all(\"nice,dog\", [\"dog,hat\", \"dog\", \"cat,mat\", \"duck\"], \u0026String.split(\u00261, \",\"))\n[{\"nice\", 0.6931471805599453}, {\"dog\", 0.14384103622589042}]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOCannings%2Ftf-idf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOCannings%2Ftf-idf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOCannings%2Ftf-idf/lists"}