{"id":25449804,"url":"https://github.com/elchemista/rassifier","last_synced_at":"2026-04-28T18:03:35.066Z","repository":{"id":277496743,"uuid":"932606266","full_name":"elchemista/rassifier","owner":"elchemista","description":"Rassifier is an Elixir library that provides low-resource text classification powered by a Rust implementation using lrtc","archived":false,"fork":false,"pushed_at":"2025-02-14T07:38:13.000Z","size":11,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-14T08:31:29.248Z","etag":null,"topics":["classification","elixir","lrtc","rust","rustler"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elchemista.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-14T07:31:05.000Z","updated_at":"2025-02-14T07:38:16.000Z","dependencies_parsed_at":"2025-02-14T08:41:49.368Z","dependency_job_id":null,"html_url":"https://github.com/elchemista/rassifier","commit_stats":null,"previous_names":["elchemista/rassifier"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Frassifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Frassifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Frassifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Frassifier/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elchemista","download_url":"https://codeload.github.com/elchemista/rassifier/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239366245,"owners_count":19626656,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","elixir","lrtc","rust","rustler"],"created_at":"2025-02-17T21:19:12.377Z","updated_at":"2026-04-28T18:03:35.054Z","avatar_url":"https://github.com/elchemista.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Rassifier\n\n**Rassifier** is an Elixir library that provides low-resource text classification powered by a Rust implementation using [lrtc](https://github.com/jerryjliu/lrtc) (Low-Resource Text Classification). It allows you to load a small training dataset from a CSV file and classify short text queries using compression-based similarity measures.\n\nThe library supports customization of the compression level, the number of nearest neighbors (k), and the compression algorithm (e.g. `\"zstd\"`, `\"gzip\"`, `\"zlib\"`, `\"deflate\"`). Rassifier is integrated into your Elixir application via a GenServer-based worker that wraps the Rust NIF.\n\n---\n\n## Installation\n\n```elixir\ndef deps do\n  [\n    {:rassifier, \"~\u003e 0.1.1\", github: \"elchemista/rassifier\"}\n  ]\nend\n```\n\nThen, fetch the dependencies:\n\n```bash\nmix deps.get\n```\n\n---\n\n## Usage\n\n### CSV Training Data Format\n\nYour CSV file should have two columns:\n1. 
**user_input**: The training text (e.g., `\"create job\"`, `\"Yes\"`, `\"No\"`, etc.)\n2. **label**: The corresponding label (e.g., `create`, `agree`, `cancel`, `edit`, or `exit`)\n\n*Example CSV snippet:*\n\n```csv\nuser_input,label\n\"Yes\",agree\n\"create\",create\n\"create job\",create\n\"No\",cancel\n...\n```\n\n---\n\n### Starting the Classifier Worker\n\nThe Rassifier worker is implemented as a GenServer. You start it with configuration options for your training CSV file, compression level, number of neighbors, and algorithm.\n\n```elixir\ndefmodule MyApp do\n  def start_classifier do\n    {:ok, _pid} = Rassifier.Worker.start_link(\n      file_path: \"data/training_data.csv\",\n      level: 3,\n      k: 1,\n      algorithm: \"zstd\"\n    )\n  end\nend\n```\n\nWhen started, the worker calls the Rust NIF to load the training data and stores an opaque resource handle in its state.\n\n---\n\n### Classifying a Query\n\n#### Synchronous Classification\n\nOnce the worker is running, you can classify a query using a synchronous call:\n\n```elixir\ndefmodule MyApp do\n  def classify_query(query) do\n    label = Rassifier.Worker.classify(query)\n    IO.puts(\"The classification label is: #{label}\")\n    label\n  end\nend\n\n# Example usage:\nMyApp.start_classifier()\nMyApp.classify_query(\"create job\")\n```\n\nIf your training data is set up appropriately, `\"create job\"` should return the label `\"create\"`.\n\n#### Asynchronous Classification\n\nYou can also perform asynchronous classification using a cast. 
In this case, the worker sends the result back to the caller via a message.\n\n```elixir\ndefmodule MyApp do\n  def async_classify(query) do\n    GenServer.cast(Rassifier.Worker, {:classify, query, self()})\n    receive do\n      {:classified, label} -\u003e\n        IO.puts(\"Asynchronously classified label: #{label}\")\n    after\n      5000 -\u003e\n        IO.puts(\"No asynchronous result received\")\n    end\n  end\nend\n\n# Example usage:\nMyApp.start_classifier()\nMyApp.async_classify(\"No\")\n```\n\n---\n\n### Full Example\n\nBelow is a full example combining both synchronous and asynchronous calls:\n\n```elixir\ndefmodule Example do\n  def run do\n    # Start the classifier worker with the given training data.\n    {:ok, _pid} = Rassifier.Worker.start_link(\n      file_path: \"data/training_data.csv\",\n      level: 3,\n      k: 1,\n      algorithm: \"zstd\"\n    )\n\n    # Synchronous classification\n    label_sync = Rassifier.Worker.classify(\"create job\")\n    IO.puts(\"Synchronous result: #{label_sync}\")\n\n    # Asynchronous classification\n    GenServer.cast(Rassifier.Worker, {:classify, \"No\", self()})\n    receive do\n      {:classified, label_async} -\u003e\n        IO.puts(\"Asynchronous result: #{label_async}\")\n    after\n      5000 -\u003e\n        IO.puts(\"No asynchronous result received\")\n    end\n  end\nend\n\nExample.run()\n```\n\n---\n\n## Under the Hood\n\nRassifier uses a Rust NIF (via [Rustler](https://github.com/rusterlium/rustler)) to leverage a low-resource text classification method based on compression distances. 
The core functions provided by the Rust side are:\n\n- **`load(file_path, level, k, algorithm)`**:  \n  Reads training data from a CSV file and initializes the classifier with the specified compression level, number of nearest neighbors (k), and compression algorithm.\n\n- **`classify_query(resource, query)`**:  \n  Given a loaded resource and a query string, it returns the classification label by comparing the query against the training set using a compression-based distance metric.\n\nThe Elixir module `Rassifier` exposes these functions, while `Rassifier.Worker` wraps the resource in a GenServer for easier integration.\n\n---\n\n## Customization\n\n- **Training Data:**  \n  Ensure your CSV file has a balanced and unambiguous set of examples. For example, to distinguish `\"create\"` commands from simple `\"agree\"` responses, include short training examples such as `\"create\"`, `\"create job\"`, etc. in the **create** category.\n\n- **Compression Settings:**  \n  Experiment with different compression levels (e.g., from 1 to 9) and algorithms (`\"zstd\"`, `\"gzip\"`, `\"zlib\"`, `\"deflate\"`) to optimize classification accuracy based on your dataset.\n\n- **Nearest Neighbors (k):**  \n  Adjust the `k` parameter to determine how many nearest neighbors to consider. Increasing `k` may yield more robust results when your training data is noisy or sparse.\n\n---\n\n## Contributing\n\nContributions are welcome! Feel free to open issues or submit pull requests on [GitHub](https://github.com/elchemista/rassifier).\n\n---\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).\n\nThe [lrtc](https://github.com/jerryjliu/lrtc) project has its own license; please check it before use.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felchemista%2Frassifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felchemista%2Frassifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felchemista%2Frassifier/lists"}