https://github.com/abrehan2/autocorrect-nlp
- Host: GitHub
- URL: https://github.com/abrehan2/autocorrect-nlp
- Owner: abrehan2
- Created: 2023-12-02T08:02:52.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-02T15:07:39.000Z (about 1 year ago)
- Last Synced: 2024-01-29T10:11:09.495Z (11 months ago)
- Topics: googlecolab, nlp, python
- Language: Jupyter Notebook
- Homepage:
- Size: 642 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Auto-Correct and String Manipulation
This repository contains Python code for text preprocessing and string manipulation, including auto-correction based on minimum edit distance computed with dynamic programming. The implemented tasks are outlined below:
---
## Part 1: Data Preprocessing
### Task 1.1: `process_data` function
- Reads a corpus (text file).
- Converts the corpus to lowercase.
- Returns a list of words.

### Task 1.2: `get_count` function
- Returns a dictionary where keys are words and values are their frequencies in the corpus.

### Task 1.3: Computing Word Probabilities
- Calculates the probability of each word appearing when a word is selected at random from the corpus, using the formula $P(w_i) = C(w_i) / M$, where $C(w_i)$ is the count of word $w_i$ and $M$ is the total number of words.

---
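The three Part 1 steps can be sketched as follows. This is a minimal illustration rather than the repository's actual code: `process_data` here takes the corpus text as a string instead of reading a file so the example runs on its own, the `\w+` tokenization rule is an assumption, and `get_probs` is a hypothetical name for the Task 1.3 step.

```python
import re
from collections import Counter


def process_data(text):
    """Lowercase the corpus text and return its words as a list.

    Assumption: a "word" is any run of word characters (regex \\w+).
    """
    return re.findall(r"\w+", text.lower())


def get_count(word_l):
    """Map each word to the number of times it occurs in word_l."""
    return dict(Counter(word_l))


def get_probs(word_count_dict):
    """Return P(w) = C(w) / M for every word; M is the total word count."""
    m = sum(word_count_dict.values())
    return {word: count / m for word, count in word_count_dict.items()}


corpus = "The cat sat. The dog sat."
words = process_data(corpus)
counts = get_count(words)
probs = get_probs(counts)

print(words)          # ['the', 'cat', 'sat', 'the', 'dog', 'sat']
print(counts["the"])  # 2
```

Because every probability is a count divided by the same total `M`, the values in `probs` sum to 1 (up to floating-point rounding).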
## Part 2: String Manipulations
- `delete_letter` function: Returns all possible strings with one character removed from a given word.
- `replace_letter` function: Returns all possible strings with one character replaced by a different letter.
- `insert_letter` function: Returns all possible strings with one additional character inserted into a given word.

---
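A minimal sketch of the three string manipulations listed above, assuming a lowercase ASCII alphabet (the README does not say which letters are used) and list return values; the repository's own implementations may differ in both respects.

```python
import string

# Assumption: candidate letters are the 26 lowercase ASCII letters.
LETTERS = string.ascii_lowercase


def delete_letter(word):
    """All strings obtained by removing exactly one character."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


def replace_letter(word):
    """All strings obtained by swapping one character for a different letter."""
    return sorted({
        word[:i] + c + word[i + 1:]
        for i in range(len(word))
        for c in LETTERS
        if c != word[i]
    })


def insert_letter(word):
    """All strings obtained by inserting one letter at any position.

    The list may contain duplicates (inserting 'a' before or after an
    existing 'a' gives the same string).
    """
    return [
        word[:i] + c + word[i:]
        for i in range(len(word) + 1)
        for c in LETTERS
    ]


print(delete_letter("nice"))      # ['ice', 'nce', 'nie', 'nic']
print(len(replace_letter("at")))  # 50
print(len(insert_letter("at")))   # 78
```

For a word of length `n`, this produces `n` deletions, `25 * n` replacements, and `26 * (n + 1)` insertions before de-duplication.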
## Part 3: Combining the Edits
- `edit_one_letter` function: Generates all strings that are one edit away from a word using the delete, replace, and insert operations.
- `edit_two_letters` function: Generates the set of strings that are two edits away from a given word by applying `edit_one_letter` twice.
- `get_corrections` function: Provides spelling suggestions based on minimum edit distance logic, prioritizing words with fewer edits.

---
## Part 4: Minimum Edit Distance
### Dynamic Programming Algorithm
- Computes the minimum number of edits required to convert one string into another.
- Fills a table with the edit distances between prefixes of the source and target strings, so that each subproblem is solved only once.

---
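The table-based computation described above can be sketched as follows. The function name `min_edit_distance` and the cost parameters are assumptions for illustration; the README does not state the costs, so the defaults here are 1 for every operation (classic Levenshtein distance), and a different replace cost can be passed in.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, rep_cost=1):
    """Minimum total cost of edits that turn source into target.

    D[i][j] holds the distance between source[:i] and target[:j].
    """
    m, n = len(source), len(target)
    D = [[0] * (n + 1) for _ in range(m + 1)]

    # First column: delete every character of the source prefix.
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + del_cost
    # First row: insert every character of the target prefix.
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + ins_cost

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Matching characters cost nothing to "replace".
            sub = 0 if source[i - 1] == target[j - 1] else rep_cost
            D[i][j] = min(
                D[i - 1][j] + del_cost,   # delete from source
                D[i][j - 1] + ins_cost,   # insert into source
                D[i - 1][j - 1] + sub,    # replace or keep
            )
    return D[m][n]


print(min_edit_distance("kitten", "sitting"))       # 3
print(min_edit_distance("play", "stay", rep_cost=2))  # 4
```

Each cell depends only on its left, upper, and upper-left neighbors, so the table is filled in `O(m * n)` time for strings of lengths `m` and `n`.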
### Usage
To effectively use this repository:
1. **Clone the Repository:**
```sh
git clone https://github.com/abrehan2/Autocorrect-Nlp.git
```