{"id":22901749,"url":"https://github.com/mbuczko/clj-moderator","last_synced_at":"2025-12-12T01:17:54.154Z","repository":{"id":62433575,"uuid":"49164544","full_name":"mbuczko/clj-moderator","owner":"mbuczko","description":"Fancy scoring of input data","archived":false,"fork":false,"pushed_at":"2017-03-30T09:45:51.000Z","size":11,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-07T03:42:33.955Z","etag":null,"topics":["clojure","moderation"],"latest_commit_sha":null,"homepage":null,"language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mbuczko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-01-06T22:03:34.000Z","updated_at":"2023-10-25T13:02:57.000Z","dependencies_parsed_at":"2022-11-01T21:01:35.567Z","dependency_job_id":null,"html_url":"https://github.com/mbuczko/clj-moderator","commit_stats":null,"previous_names":["mbuczko/moderator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbuczko%2Fclj-moderator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbuczko%2Fclj-moderator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbuczko%2Fclj-moderator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mbuczko%2Fclj-moderator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mbuczko","download_url":"https://codeload.github.com/mbuczko/clj-moderator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":
"github","repositories_count":246591808,"owners_count":20801985,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","moderation"],"created_at":"2024-12-14T01:40:44.597Z","updated_at":"2025-12-12T01:17:49.060Z","avatar_url":"https://github.com/mbuczko.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Clojars Project](https://img.shields.io/clojars/v/mbuczko/moderator.svg)](https://clojars.org/mbuczko/moderator)\n\n# Semi-automatic content moderation\n\nEver wanted to score incoming data based on easily defined criteria? Here it comes: content moderator, a set of basic (and some more advanced) matchers that together form a pipeline. You put data in at one end and get a score out at the other.\n\nWhat matchers do we have in our toolbelt?\n\n - ```uppercase-matcher``` : returns the number of capital letters\n - ```content-size-matcher``` : returns the total number of characters\n - ```bad-words-matcher``` : lowercases incoming data, splits it into a set of words (stripped of common non-letter characters) and compares that set with ```:blacklist```\n - ```bad-email-matcher``` : lowercases incoming data, wraps it into a one-element set and compares it with ```:blacklist```\n - ```repeats-matcher``` : returns the number of consecutively repeated characters\n - ```bayes-matcher``` : returns the Bayes classification in binary form (0 if the input was classified as positive, 1 otherwise)\n\nNaïve Bayes classification is based on [judgr](https://github.com/danielfm/judgr), and two wrapper functions are exposed to train the classifier on input phrases:\n\n - ```negative``` : trains 
the classifier to treat the given phrase as negative\n - ```positive``` : trains the classifier to treat the given phrase as positive\n\nNote that the Polish extractor is used by default.\n\n## How matchers work\n\nEach matcher returns either a number (integer or float) or a set (the data structure). Here is where the magic happens: if the result is a number, it is checked against the ```:min``` and ```:max``` parameters, i.e. whether ```(:min \u003c= value \u003c= :max)``` holds. If it does, the ```:penalty``` value is added to the final score; otherwise the score is left unchanged. By default ```:min``` is 0 and ```:max``` is Integer/MAX_VALUE, so you only need to provide the bound that actually matters.\n\nIf the matcher returns a set instead of a number, the ```:blacklist``` set is checked for at least one of the returned elements. If it contains any of them, the ```:penalty``` is again added to the final score.\n\n## How to define a custom matcher\n\nThe easiest way is to use the ```defmatcher``` macro. As said before, a matcher is a function that returns either a number or a set, and that is the only rule a matcher has to obey:\n\n``` clojure\n(defmatcher my-matcher\n   (fn [input]\n      ;; process input and return a number or a set;\n      ;; counting characters here is only an illustration\n      (count input)))\n```\n\n## Examples\n\n``` clojure\n(require '[mbuczko.moderator.matchers :as m])\n\n(def blacklists {:content #{\"incomplete\", \"bullshit\"}\n                 :emails #{\"bad@boy.from.ru\"}})\n\n(def candidate {:title \"Amazing brand new Alfa-Romeo with A FEEEW minor glitches\"\n                :contact {:phone-numbers [\"1234\", \"55556\"]}\n                :description \"Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Bullshit.\"\n                :username \"bad@boy.from.ru\"})\n\n(-\u003e candidate\n    (m/content-size-matcher :penalty 20 :field [:title] :min 71)\n    (m/content-size-matcher :penalty 10 :field [:contact :phone-numbers] :max 0)\n    (m/uppercase-matcher    :penalty 20 :field [:description] :min 36)\n    (m/bad-words-matcher    :penalty 30 :field [:description] :blacklist (:content blacklists))\n    (m/bad-email-matcher    :penalty 20 :field [:username] :blacklist (:emails blacklists))\n    (m/repeats-matcher      :penalty 10 :field [:title] :min 2))\n```\n\nAs a result, a ```Candidate``` record is returned with three relevant keys: ```:body``` holding the original data, ```:scores``` holding a vector of applied penalties in the form ```[penalty field matcher-name]```, and ```:final``` holding the sum of all applied penalties.\n\nIn this example only three matchers add a penalty: the description contains the blacklisted word \"bullshit\", the username is a blacklisted address, and the title contains the repeated run \"EEE\". The title is only 56 characters long and the description has just 2 capital letters, so those two checks do not fire.\n\n``` clojure\n{:body {:title \"Amazing brand new Alfa-Romeo with A FEEEW minor glitches\",\n        :contact {:phone-numbers [\"1234\" \"55556\"]},\n        :description \"Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Bullshit.\",\n        :username \"bad@boy.from.ru\"},\n :scores [[30 [:description] bad-words-matcher]\n          [20 [:username] bad-email-matcher]\n          [10 [:title] repeats-matcher]],\n :final 60}\n```\n\nTo train the classifier to treat a phrase as negative:\n\n``` clojure\n;; Polish for \"Ala has a cat and the cat has Ala\"\n(m/negative \"Ala ma kota a kot ma Alę\")\n```\n\nNow, let's pass a related phrase through ```bayes-matcher```:\n\n``` clojure\n;; \"Ala lubi kota\" is Polish for \"Ala likes the cat\"\n(m/bayes-matcher {:text \"Ala lubi kota\"} :penalty 99 :field [:text] :min 1)\n```\n\nResult:\n\n``` clojure\n{:body {:text \"Ala lubi kota\"},\n :scores [[99 [:text] bayes-matcher]],\n :final 99}\n```\n\nAs expected, we got a penalty of 99: ```bayes-matcher``` classified the phrase as negative and returned 1, which satisfies ```:min 1```.\n\n## LICENSE\n\nCopyright © Michał Buczko\n\nLicensed under the EPL.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmbuczko%2Fclj-moderator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmbuczko%2Fclj-moderator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmbuczko%2Fclj-moderator/lists"}