{"id":19532411,"url":"https://github.com/rain1024/linguistic_tools","last_synced_at":"2025-10-26T04:38:08.420Z","repository":{"id":242169271,"uuid":"806092526","full_name":"rain1024/linguistic_tools","owner":"rain1024","description":null,"archived":false,"fork":false,"pushed_at":"2024-06-08T04:42:02.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-08T17:07:44.613Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rain1024.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-26T11:01:51.000Z","updated_at":"2024-06-08T04:42:05.000Z","dependencies_parsed_at":"2024-11-11T01:51:00.366Z","dependency_job_id":"809b45d0-595c-4ff0-aedd-53ec111fd068","html_url":"https://github.com/rain1024/linguistic_tools","commit_stats":null,"previous_names":["rain1024/linguistic_tools"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rain1024%2Flinguistic_tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rain1024%2Flinguistic_tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rain1024%2Flinguistic_tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rain1024%2Flinguistic_tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rain1024","download_url":"https://codeload.github.com/rain1024/linguistic_tools/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240784234,"owners_count":19856978,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T01:50:51.154Z","updated_at":"2025-10-10T14:03:12.881Z","avatar_url":"https://github.com/rain1024.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Linguistic Tools\n\nThis project is a set of linguistic tools designed to assist my **lovely friend** 💕 with various language-related tasks.\n\n* [Filter Words](#filter-words)\n* [Count Sentences](#count-sentences)\n\n## Filter Words\n\nThe Filter Words tool filters the text of a Microsoft Word document using the words and word pairs listed in a query file, `query.txt`. 
This tool reads the paragraphs of the input document in the `inputs` folder, applies each query from `query.txt`, and saves the results to separate files in the `outputs` folder.\n\n### Usage\n\nTo use the Filter Words tool, follow these steps:\n\n**Step 1:** Place your Microsoft Word file in the `inputs` folder.\n\nExample:\n\n```\ninputs\n └── song_mon__nam_cao.docx\n```\n\n**Step 2:** Add your queries to `query.txt`, one per line. To query a pair of words, separate the two words with a comma.\n\nExample content for `query.txt`:\n\n```\nchớ\ncác\nkhông,đã\nđã,rồi\n```\n\n**Step 3:** Run the following command in your terminal:\n\n```\npython filter_words.py\n```\n\nThe script processes the input file and writes the filtered results to the `outputs` folder.\n\n### Output\n\nThe script saves one output file per query line to the `outputs` folder. For a word pair, the comma in the query becomes a hyphen in the file name.\n\nExample:\n\n```\noutputs\n ├── chớ.txt\n ├── các.txt\n ├── không-đã.txt\n └── đã-rồi.txt\n```\n\n## Count Sentences\n\nThe Count Sentences script counts the sentences in each Microsoft Word document (.docx) in the `inputs` folder. Here, a sentence means a non-empty paragraph: the script extracts each document's paragraphs, drops the empty ones, saves the remaining text to a temporary file, and prints the count for each document.\n\n### Input\n\nPlace your Microsoft Word files (.docx) in the `inputs` folder. The script automatically detects and processes every .docx file in this folder.\n\nExample:\n\n```\ninputs\n ├── document1.docx\n └── document2.docx\n```\n\n### Command Line Usage\n\nTo execute the script, run the following command in your terminal:\n\n```\npython count_sentences.py\n```\n\nThe script then processes each .docx file in the `inputs` folder.\n\n### Output\n\nThe script creates a temporary folder named `tmp` to store the output text files. 
Each output file corresponds to one input document and contains its non-empty paragraphs. The script also prints the number of sentences found in each document to the console.\n\nExample of the resulting `tmp` folder:\n\n```\ntmp\n ├── document1.txt\n └── document2.txt\n```\n\nConsole output:\n\n```\ninputs/document1.docx: 10 sentences\ninputs/document2.docx: 8 sentences\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frain1024%2Flinguistic_tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frain1024%2Flinguistic_tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frain1024%2Flinguistic_tools/lists"}