{"id":19580564,"url":"https://github.com/blurred-machine/sentence-inference","last_synced_at":"2026-05-18T03:04:05.358Z","repository":{"id":129505703,"uuid":"267093313","full_name":"blurred-machine/Sentence-Inference","owner":"blurred-machine","description":"For every given pair of sentences -- (sentence-1, sentence-2), we need to determine if sentence-2 can be logically inferred given sentence-1.","archived":false,"fork":false,"pushed_at":"2020-05-27T15:02:45.000Z","size":1514,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-09T05:18:13.874Z","etag":null,"topics":["jupyter-notebook","machinelearning","nlp","python","sentence-inference","text-classification","text-processing","wordtovec"],"latest_commit_sha":null,"homepage":"https://jovian.ml/paras009/sentence-inference","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blurred-machine.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-26T16:21:35.000Z","updated_at":"2024-08-20T18:03:59.000Z","dependencies_parsed_at":"2023-06-11T16:45:15.050Z","dependency_job_id":null,"html_url":"https://github.com/blurred-machine/Sentence-Inference","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blurred-machine%2FSentence-Inference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blurred-machine%2FSentence-Inference/tags","releases_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blurred-machine%2FSentence-Inference/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blurred-machine%2FSentence-Inference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blurred-machine","download_url":"https://codeload.github.com/blurred-machine/Sentence-Inference/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240849053,"owners_count":19867617,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["jupyter-notebook","machinelearning","nlp","python","sentence-inference","text-classification","text-processing","wordtovec"],"created_at":"2024-11-11T07:27:30.782Z","updated_at":"2025-09-25T09:32:03.395Z","avatar_url":"https://github.com/blurred-machine.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Sentence-Inference\nFor every given pair of sentences -- (sentence-1, sentence-2), we need to determine if sentence-2 can be logically inferred given sentence-1.\n\n## Dataset Description:\n* `Sentence1`: String column of human entered text, Sentence 1\n* `Sentence2`: String column of human entered text, Sentence 2\n* `gold_label`: Categorical column inferring logical relation between sentence1 and sentence2\n\n## Implementation \n* Length of document in sentence1: \n![Length of strings Sentence1](https://github.com/paras009/Sentence-Inference/blob/master/images/length_of_doc_s1.png)\n* Length of document in sentence2: \n![Length of strings 
Sentence2](https://github.com/paras009/Sentence-Inference/blob/master/images/length_of_doc_s2.png)\n* Heatmap of correlation between the features: \n![Heatmap](https://github.com/paras009/Sentence-Inference/blob/master/images/correlation_heatmap.png)\n* Bidirectional LSTM model performance (poor, because the dataset is small): \n![Loss](https://github.com/paras009/Sentence-Inference/blob/master/images/bidirectional_LSTM_model_performance_loss.png)\n![Accuracy](https://github.com/paras009/Sentence-Inference/blob/master/images/bidirectional_LSTM_model_performance_accuracy.png)\n* Selected model's performance in predicting the test `gold_label`:\n![MLPClassifier](https://github.com/paras009/Sentence-Inference/blob/master/images/MLPClassifier.PNG)\n\n## Inference\n* Since the dataset was very small, training a neural network was not a good idea, so I chose to move ahead with classical ML algorithms.\n* Working on a larger dataset can improve the learning.\n* Advanced NLP techniques can be implemented to find the semantic relationship between the two sentences and get a better result.\n* Due to lack of time I decided to follow this approach, but with more iterations during development the model's performance can increase significantly.\n* `Data Cleaning` was done reasonably well but can also be done using other approaches.\n* `Feature engineering` is an important part that requires good knowledge of NLP and can be worked on in the future.\n* Dimensionality reduction, based on experimentation with `PCA` or `t-SNE`, can be performed to optimize model performance and remove useless features.\n* `Hypothesis testing` can be used to make useful decisions about the features, i.e. whether or not they contribute to predicting the right `gold_label`.\n* `Word embedding` can be implemented to capture a better semantic relationship between words.\n* Working with better neural networks will be a better choice for this kind of problem, although `bidirectional LSTM` should perform well with a large 
dataset.\n* Finally, once we get good model performance on the data, we can apply hyperparameter tuning to the `bidirectional LSTM` model to extract the best performance out of it.\n* For any suggestions, contact me at [paras.varshney97@gmail.com](paras.varshney97@gmail.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblurred-machine%2Fsentence-inference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblurred-machine%2Fsentence-inference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblurred-machine%2Fsentence-inference/lists"}