https://github.com/blurred-machine/sentence-inference

For every given pair of sentences -- (sentence-1, sentence-2), we need to determine if sentence-2 can be logically inferred given sentence-1.

jupyter-notebook machinelearning nlp python sentence-inference text-classification text-processing wordtovec



Host: GitHub
URL: https://github.com/blurred-machine/sentence-inference
Owner: blurred-machine
License: mit
Created: 2020-05-26T16:21:35.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2020-05-27T15:02:45.000Z (about 6 years ago)
Last Synced: 2025-01-09T05:18:13.874Z (over 1 year ago)
Topics: jupyter-notebook, machinelearning, nlp, python, sentence-inference, text-classification, text-processing, wordtovec
Language: Jupyter Notebook
Homepage: https://jovian.ml/paras009/sentence-inference
Size: 1.44 MB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Sentence-Inference
For every given pair of sentences -- (sentence-1, sentence-2), we need to determine if sentence-2 can be logically inferred given sentence-1.

## Dataset Description:
* `Sentence1`: String column of human-entered text (sentence 1)
* `Sentence2`: String column of human-entered text (sentence 2)
* `gold_label`: Categorical column giving the logical relation between `Sentence1` and `Sentence2`
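
As a minimal sketch of this schema, the snippet below parses a few rows with Python's standard `csv` module. The three column names come from the description above; the sample sentences, the label values (`entailment`, `contradiction`, `neutral`), and the helper name `load_pairs` are illustrative assumptions and are not taken from the actual dataset.

```python
import csv
import io

# Hypothetical sample data; the real file and its label values may differ.
SAMPLE_CSV = """Sentence1,Sentence2,gold_label
A man is playing a guitar.,A person is making music.,entailment
A man is playing a guitar.,The man is asleep.,contradiction
A dog runs in a park.,The dog is chasing a ball.,neutral
"""

REQUIRED_COLUMNS = ("Sentence1", "Sentence2", "gold_label")


def load_pairs(handle):
    """Read sentence pairs from a CSV file object into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Keep only the three documented columns and strip stray whitespace.
    return [{c: row[c].strip() for c in REQUIRED_COLUMNS} for row in reader]


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
labels = sorted({p["gold_label"] for p in pairs})
```

Checking for the required columns up front makes a wrongly formatted file fail early, before any feature extraction runs.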

## Implementation
* Length of document in sentence1:
![Length of strings Sentence1](https://github.com/paras009/Sentence-Inference/blob/master/images/length_of_doc_s1.png)
* Length of document in sentence2:
![Length of strings Sentence2](https://github.com/paras009/Sentence-Inference/blob/master/images/length_of_doc_s2.png)
* Heatmap of correlation between the features:
![Heatmap](https://github.com/paras009/Sentence-Inference/blob/master/images/correlation_heatmap.png)
* Bidirectional LSTM model performance (poor, because the dataset is small):
![Loss](https://github.com/paras009/Sentence-Inference/blob/master/images/bidirectional_LSTM_model_performance_loss.png)
![Accuracy](https://github.com/paras009/Sentence-Inference/blob/master/images/bidirectional_LSTM_model_performance_accuracy.png)
* Performance of the selected model when predicting `gold_label` on the test set:
![MLPClassifier](https://github.com/paras009/Sentence-Inference/blob/master/images/MLPClassifier.PNG)
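
The length plots and the correlation heatmap above are built from simple numeric features. The following is a rough sketch of that idea using only the standard library: it computes a length per sentence and the Pearson correlation between feature columns. Whether the notebook measures length in words or characters is not stated here, so word count is an assumption, as are the sample pairs, the `len_diff` feature, and the function names.

```python
import math

# Hypothetical sentence pairs; the real dataset is not reproduced here.
pairs = [
    ("A man is playing a guitar.", "A person is making music."),
    ("Two kids are running on the beach at sunset.", "Children are outside."),
    ("A dog runs.", "An animal moves."),
    ("The woman in the red coat is reading a long book.", "A woman reads."),
]


def word_count(text):
    """Document length in whitespace-separated tokens (an assumption)."""
    return len(text.split())


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


features = {
    "len_s1": [word_count(s1) for s1, _ in pairs],
    "len_s2": [word_count(s2) for _, s2 in pairs],
}
features["len_diff"] = [
    a - b for a, b in zip(features["len_s1"], features["len_s2"])
]

# Correlation matrix as a nested dict; a heatmap would plot these values.
names = list(features)
corr = {a: {b: pearson(features[a], features[b]) for b in names} for a in names}
```

Pairs of features whose correlation is close to 1 or -1 carry nearly the same information, which is what a heatmap of `corr` makes visible at a glance.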

## Inference
* Because the dataset was very small, training a neural network was not a good idea, so I chose to move ahead with ML algorithms.
* Working with a larger dataset should improve learning.
* Advanced NLP techniques can be used to capture the semantic relationship between the two sentences and get better results.
* I followed this approach because of limited time; with further iterations during development, the model's performance could improve significantly.
* `Data Cleaning` was done fairly thoroughly, but other approaches could also be used.
* `Feature engineering` is an important part that requires good knowledge of NLP and can be improved in the future.
* Dimensionality reduction with `PCA` or `t-SNE`, chosen through experimentation, can be performed to optimize model performance and remove useless features.
* `Hypothesis testing` can help decide whether each feature contributes to predicting the right `gold_label`.
* `Word embedding` can be implemented to capture better semantic relationships between words.
* Better neural network architectures would be a good choice for this kind of problem, although a `bidirectional LSTM` should perform well with a large dataset.
* Finally, once the model performs well on the data, hyperparameter tuning can be applied to the `bidirectional LSTM` model to extract the best performance from it.
* For any suggestions, contact me at [paras.varshney97@gmail.com](mailto:paras.varshney97@gmail.com).
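
The `Hypothesis testing` point in the list above could be approached in several ways; one option is a permutation test, sketched below with the standard library only. It estimates how likely the observed difference in a feature's mean between two label groups would be if the feature were unrelated to the label. The word-overlap feature, its values, the label names, and the 0.05 threshold are all hypothetical and are not results from this project.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value for the null hypothesis that the feature
    has the same distribution in both groups.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign values to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical word-overlap scores, grouped by an assumed gold_label value.
overlap_entailment = [0.62, 0.71, 0.55, 0.68, 0.74, 0.60, 0.66, 0.70]
overlap_contradiction = [0.31, 0.42, 0.28, 0.39, 0.35, 0.44, 0.30, 0.37]

p_value = permutation_test(overlap_entailment, overlap_contradiction)
keep_feature = p_value < 0.05
```

A permutation test makes no assumption about the feature's distribution, which suits a small dataset; with more data, a parametric test would be cheaper to run.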
