{"id":22864960,"url":"https://github.com/mohsenim/persianp","last_synced_at":"2025-03-31T09:46:36.820Z","repository":{"id":203926920,"uuid":"162033829","full_name":"mohsenim/persianp","owner":"mohsenim","description":"A Processing Toolbox for Persian Texts","archived":false,"fork":false,"pushed_at":"2018-12-19T13:26:48.000Z","size":64261,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-06T14:25:27.114Z","etag":null,"topics":["chunker","lemmatizer","nlp","persian","postagger","text-analysis","tokenizer"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mohsenim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-12-16T19:29:48.000Z","updated_at":"2024-03-17T22:07:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"e326c28b-38f5-4936-a6db-24e961808bca","html_url":"https://github.com/mohsenim/persianp","commit_stats":null,"previous_names":["mohsenim/persianp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsenim%2Fpersianp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsenim%2Fpersianp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsenim%2Fpersianp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohsenim%2Fpersianp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mohsenim","download_url":"https://codeload.github.com/mohsenim/persianp/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246450402,"owners_count":20779406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chunker","lemmatizer","nlp","persian","postagger","text-analysis","tokenizer"],"created_at":"2024-12-13T11:32:11.518Z","updated_at":"2025-03-31T09:46:36.802Z","avatar_url":"https://github.com/mohsenim.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Persianp Processing Toolbox\n\nPersianp is a text processing toolbox written in Java for preprocessing Persian texts. It performs the following tasks:\n* Character-level normalization\n* Tokenization\n* Lemmatization\n* POS tagging\n* Stopword detection\n* Noun phrase chunking\n\n### Using Persianp from the command line\nMake sure the `res` folder is next to the `jar` file.\n\n```bash\n$ java -cp persianp-toolbox-1.0.jar com.persianp.nlp.process.Process -input inputfile.txt -output outputfile.txt -task (tokenize|tag|lemmatize|taglemmatize) [-nostopword] [-prop propertyFile.properties]\n```\n\nNP chunking is not currently supported from the command line.\n\n### Using the Persianp API\nAdd the API to your program's libraries. 
The following example shows how to use the toolbox.\n\n```java\nimport java.io.BufferedReader;\nimport java.io.InputStream;\nimport java.io.InputStreamReader;\nimport java.util.List;\nimport java.util.Properties;\n\nimport com.persianp.nlp.process.Process;\n\npublic class TestPersianp {\n\n    public static void main(String[] args) {\n        TestPersianp testPersianp = new TestPersianp();\n        testPersianp.process();\n    }\n\n    private void process() {\n        try {\n            // Load the toolbox configuration from the classpath.\n            Properties properties = new Properties();\n            properties.load(this.getClass().getClassLoader().getResourceAsStream(\"persianp.properties\"));\n            Process process = new Process(properties);\n            InputStream in = this.getClass().getClassLoader().getResourceAsStream(\"testText.txt\");\n            BufferedReader br = new BufferedReader(new InputStreamReader(in, \"UTF-8\"));\n            String line;\n            // Process the input one line at a time.\n            while ((line = br.readLine()) != null) {\n                process.process(line);\n\n                System.out.println(process.getText());\n                // Other accessors available after processing a line:\n//                process.getTokens();\n//                process.getTokensText();\n//                process.getTags();\n//                process.getChunkTag();\n//                process.getLemmas();\n//                process.getNonStopwordTokens();\n\n                int sentenceSize = process.getSentencesSize();\n                for (int j = 0; j \u003c sentenceSize; ++j) {\n                    // Per-sentence accessors:\n//                    List tokensText = process.getTokensTextInSentence(j);\n//                    List tags = process.getTagsInSentence(j);\n//                    List lemmas = process.getLemmasInSentence(j);\n                    List tokens = process.getTokensInSentence(j);\n                    // Print each token with its lemma and tag, separated by tabs.\n                    for (int k = 0; k \u003c tokens.size(); ++k) {\n                        System.out.println(tokens.get(k).getText() + \"\\t\\t\\t\" + tokens.get(k).getLemma() + \"\\t\\t\\t\" + tokens.get(k).getTag());\n                    }\n                }\n            }\n            in.close();\n            br.close();\n        } catch (Exception e){\n            e.printStackTrace();\n        }\n    }\n}\n\n```\n\n### 
More Information / Citing This Toolbox\nPlease cite the paper below if you use the Persianp toolbox in your research. It also provides more information about the toolbox.\n\n\u003e Mahdi Mohseni, Javad Ghofrani, and Heshaam Faili.\n\u003e \"Persianp: A Persian Text Processing Toolbox\".\n\u003e International Conference on Intelligent Text Processing and Computational Linguistics\n\u003e CICLing 2016: Computational Linguistics and Intelligent Text Processing, pp. 75-87.\n\nBibTeX citation:\n\n```\n@InProceedings{Persianp2016,\nauthor=\"Mohseni, Mahdi\nand Ghofrani, Javad\nand Faili, Heshaam\",\ntitle=\"Persianp: A Persian Text Processing Toolbox\",\nbooktitle=\"Computational Linguistics and Intelligent Text Processing\",\nyear=\"2018\",\npublisher=\"Springer International Publishing\",\npages=\"75--87\",\nisbn=\"978-3-319-75477-2\"\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohsenim%2Fpersianp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmohsenim%2Fpersianp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohsenim%2Fpersianp/lists"}