Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Mphasis-ML-Marketplace/Mphasis-DeepInsights-Text-Summarizer
The Text Summarizer solution tackles the problem of information overload by condensing long documents into a few sentences. Neural-network-based models can automatically learn distributed representations for sentences and documents. This summarizer is built with transfer learning and Transformer-based models that use self-attention. The input can contain at most 512 words, and the output is 3 sentences (approximately 30 words).
- Host: GitHub
- URL: https://github.com/Mphasis-ML-Marketplace/Mphasis-DeepInsights-Text-Summarizer
- Owner: Mphasis-ML-Marketplace
- License: apache-2.0
- Created: 2020-12-24T04:43:27.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-01-05T07:02:28.000Z (almost 4 years ago)
- Last Synced: 2024-08-02T13:15:10.868Z (3 months ago)
- Language: Jupyter Notebook
- Size: 75.2 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Mphasis-DeepInsights-Text-Summarizer
The Text Summarizer solution tackles the problem of information overload by condensing long documents into a few sentences. Neural-network-based models can automatically learn distributed representations for sentences and documents. This summarizer is built with transfer learning and Transformer-based models that use self-attention. The input can contain at most 512 words, and the output is 3 sentences (approximately 30 words).
## Amazon SageMaker
### Input:
**Usage Methodology for the algorithm:**
The input has to be a '.txt' file with 'UTF-8' encoding. PLEASE NOTE: if your input .txt file is not 'UTF-8' encoded, the model will not perform as expected (a minimal validation sketch follows the list below).
1. To make sure that your input file is 'UTF-8' encoded, use 'Save As' and set the encoding to 'UTF-8'
2. The input can have a maximum of 512 words (SageMaker restriction)
3. The input should have at least 3 sentences (model limitation)
4. Supported content types: text/plain
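
The snippet below is a minimal pre-flight sketch (not part of the Mphasis solution) that checks the requirements above before a local input file is submitted: UTF-8 encoding, at most 512 words, and at least 3 sentences. The sentence count is a rough heuristic based on terminal punctuation, and the file name is an example.

```
# Hypothetical pre-flight check for the input requirements listed above.
import re

def validate_input(path):
    # 1. The file must decode as UTF-8
    with open(path, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("Input file is not UTF-8 encoded; re-save it with UTF-8 encoding.")

    # 2. At most 512 words (SageMaker restriction)
    words = text.split()
    if len(words) > 512:
        raise ValueError(f"Input has {len(words)} words; the maximum is 512.")

    # 3. At least 3 sentences (model limitation); rough count via terminal punctuation
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(sentences) < 3:
        raise ValueError("Input should contain at least 3 sentences.")

    return text

validate_input("input.txt")  # example usage with a sample input file name
```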
### Output:
Content type: text/plain
### Invoking endpoint
AWS CLI Command
If you are using real-time inferencing, please create the endpoint first and then use the following command to invoke it:

`aws sagemaker-runtime invoke-endpoint --endpoint-name "endpoint-name" --body fileb://input.txt --content-type text/plain --accept text/plain result.txt`
**Substitute the following parameters:**
* "endpoint-name" - name of the inference endpoint where the model is deployed
* input.txt - input file
* text/plain - MIME type of the given input file (above)
* result.txt - filename where the inference results are written to.
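
The same real-time invocation can also be made from Python with boto3's SageMaker runtime client. This is a sketch only, mirroring the placeholder parameters above ("endpoint-name", input.txt, result.txt); substitute your own values.

```
# Equivalent of the CLI command above using boto3 (placeholder names, adjust as needed).
import boto3

runtime = boto3.client("sagemaker-runtime")

with open("input.txt", "rb") as f:            # UTF-8 encoded plain-text input file
    body = f.read()

response = runtime.invoke_endpoint(
    EndpointName="endpoint-name",             # name of the deployed inference endpoint
    Body=body,
    ContentType="text/plain",                 # MIME type of the input
    Accept="text/plain",                      # MIME type expected for the output
)

summary = response["Body"].read().decode("utf-8")

with open("result.txt", "w", encoding="utf-8") as f:  # file where the summary is written
    f.write(summary)

print(summary)
```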
### Python
Batch transform snippet (a more detailed example can be found in the sample notebook):
```
sample_txt = 'location of input text file'                    # location of the input .txt file
transformer = model.transformer(1, 'ml.m5.xlarge')            # one ml.m5.xlarge instance
transformer.transform(sample_txt, content_type="text/plain")
transformer.wait()                                            # block until the transform job finishes
print("Batch Transform output saved to " + transformer.output_path)
```
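
Once the job finishes, the summary is written to S3 under `transformer.output_path`. The sketch below retrieves it with boto3; it continues from the snippet above and assumes SageMaker Batch Transform's default naming convention of appending `.out` to the input file name, with `input.txt` used as an example.

```
# Sketch: fetch the batch transform result from S3 (object key assumes the default
# "<input file name>.out" convention; adjust to match your actual input file).
import boto3

output_uri = transformer.output_path               # from the snippet above, e.g. s3://<bucket>/<prefix>
bucket, _, prefix = output_uri.replace("s3://", "").partition("/")

s3 = boto3.client("s3")
obj = s3.get_object(Bucket=bucket, Key=prefix.rstrip("/") + "/input.txt.out")
print(obj["Body"].read().decode("utf-8"))          # the 3-sentence summary
```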
## Resources
1. [Sample Notebook](text_summary_marketplace.ipynb)
2. [Sample Input](SampleInput)
3. [Sample Output](SampleOutput)