https://github.com/jwest951227/extractorChinese

NLP model for extracting Chinese data from documents

nltk pdfminer pdfplumber pypdf2 python sentence-transformers torch


# extract-chinese
Extract Chinese and English sentences from two documents and match the sentences that have the same meaning.
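
The README does not show how the Chinese and English sentences are separated. As a minimal sketch, assuming the text has already been read from the PDFs into a string, the splitting and language tagging could look like the following. The function names `split_sentences`, `is_chinese`, and `partition_by_language` are hypothetical and do not come from the repository, and the regex-based splitter is a simple stand-in for the nltk and jieba tooling the project installs.

```python
import re

# Sentence terminators: full-width Chinese marks anywhere, ASCII marks only
# when followed by whitespace or end of text (so "3.14" is not split).
_SENTENCE_END = re.compile(r"[。！？]+|[.!?]+(?=\s|$)")

# Basic CJK Unified Ideographs range; an assumption that covers common Chinese text.
_CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def split_sentences(text):
    """Split mixed Chinese/English text into sentences (hypothetical helper)."""
    sentences = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    tail = text[start:].strip()  # text after the last terminator, if any
    if tail:
        sentences.append(tail)
    return sentences


def is_chinese(sentence):
    """Treat a sentence as Chinese if it contains at least one CJK ideograph."""
    return _CJK_CHAR.search(sentence) is not None


def partition_by_language(sentences):
    """Return (chinese_sentences, english_sentences), preserving order."""
    chinese = [s for s in sentences if is_chinese(s)]
    english = [s for s in sentences if not is_chinese(s)]
    return chinese, english


if __name__ == "__main__":
    sample = "今天天气很好。我们去公园吧！The weather is nice today. Let's go to the park!"
    zh, en = partition_by_language(split_sentences(sample))
    print(zh)
    print(en)
```

The "contains any CJK character" rule is deliberately crude: an English sentence that quotes a single Chinese word would be tagged as Chinese, so a ratio-based check may be a better fit for real documents.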
# Getting Started
This is a Python project that extracts Chinese and English sentences from two PDFs.
It then matches the sentences by the cosine similarity score of their embedding vectors.
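
To illustrate the matching step, here is a minimal sketch of pairing sentences by cosine similarity. It assumes each sentence has already been turned into an embedding vector (in the project this is presumably done with sentence-transformers, but the model is not stated here, so toy vectors are used instead). The names `cosine_similarity` and `match_sentences` and the `threshold` value of 0.5 are illustrative assumptions, not the repository's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_sentences(zh_embeddings, en_embeddings, threshold=0.5):
    """For each Chinese embedding, find the most similar English embedding.

    Returns a list of (zh_index, en_index, score) tuples. Pairs whose best
    score falls below `threshold` (an assumed cutoff) are dropped.
    """
    matches = []
    if not en_embeddings:
        return matches
    for i, zh_vec in enumerate(zh_embeddings):
        scores = [cosine_similarity(zh_vec, en_vec) for en_vec in en_embeddings]
        best_j = max(range(len(scores)), key=scores.__getitem__)
        if scores[best_j] >= threshold:
            matches.append((i, best_j, scores[best_j]))
    return matches


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real sentence embeddings.
    zh = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    en = [[0.0, 2.0, 2.0], [3.0, 0.1, 0.0]]
    for zh_idx, en_idx, score in match_sentences(zh, en):
        print(f"zh[{zh_idx}] <-> en[{en_idx}] (cosine = {score:.3f})")
```

This greedy version lets two Chinese sentences map to the same English sentence. If a strict one-to-one alignment is needed, already matched English indices would have to be excluded or an assignment algorithm used instead.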

pip install pdfplumber
pip install nltk
pip install jieba
pip install sentence_transformers
...

Open a Python console and download the NLTK punkt tokenizer data:
>>> import nltk
>>> nltk.download('punkt')

Then set the required environment values.