https://github.com/jwest951227/extractorChinese
NLP model for extracting Chinese data from documents
- Host: GitHub
- URL: https://github.com/jwest951227/extractorChinese
- Owner: jwest951227
- Created: 2024-04-29T07:23:30.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-29T07:41:42.000Z (over 1 year ago)
- Last Synced: 2025-04-19T08:11:29.464Z (6 months ago)
- Topics: nltk, pdfminer, pdfplumber, pypdf2, python, sentence-transformers, torch
- Language: Python
- Homepage:
- Size: 22.6 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# extract-chinese
Extracts Chinese and English sentences from two documents and matches the sentences that have the same meaning.
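The extraction step can be sketched as follows. This is a minimal illustration, not the repository's code: the helper names `extract_pdf_text` and `split_chinese_sentences` are hypothetical, the PDF text is read with pdfplumber (one of the libraries the project installs), and splitting Chinese sentences on the full-width marks 。！？ with a regular expression is an assumed rule.

```python
import re
from typing import List

# Assumed sentence rule: a run of non-terminators followed by any terminators
# (full-width Chinese marks plus ASCII ! and ?).
_ZH_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page in a PDF (hypothetical helper)."""
    import pdfplumber  # imported lazily so the splitter below works without pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def split_chinese_sentences(text: str) -> List[str]:
    """Split Chinese text into sentences, keeping each terminator attached."""
    # PDF extraction breaks lines mid-sentence; Chinese puts no spaces between
    # words, so line breaks are dropped rather than replaced with spaces.
    joined = text.replace("\n", "")
    return [s.strip() for s in _ZH_SENTENCE.findall(joined) if s.strip()]
```

For the English document, `nltk.sent_tokenize` is a natural fit, which is presumably why the setup downloads the NLTK `punkt` data.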
# Getting Started
This is a Python project that extracts Chinese and English sentences from two PDFs and matches the sentences by the cosine similarity of their embeddings.

Install the dependencies:

```bash
pip install pdfplumber
pip install nltk
pip install jieba
pip install sentence_transformers
```

Then open a Python console and download the NLTK `punkt` tokenizer data:

```python
>>> import nltk
>>> nltk.download('punkt')
```

Finally, set the required environment variables.
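The matching step, pairing sentences by the cosine score of their embeddings, can be sketched in plain Python. This assumes the embeddings have already been computed (for example with sentence-transformers' `SentenceTransformer.encode`) and that each Chinese sentence is paired with its single highest-scoring English sentence above a threshold; the names `cosine` and `match_sentences` and the 0.5 default threshold are illustrative assumptions, not taken from the repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_sentences(
    zh_embs: Sequence[Sequence[float]],
    en_embs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not from the repository
) -> List[Tuple[int, int, float]]:
    """Greedily pair each Chinese sentence with its best-scoring English sentence.

    Returns (zh_index, en_index, score) tuples for pairs whose score >= threshold.
    """
    matches: List[Tuple[int, int, float]] = []
    if len(en_embs) == 0:
        return matches
    for i, z in enumerate(zh_embs):
        scores = [cosine(z, e) for e in en_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            matches.append((i, best, scores[best]))
    return matches
```

With sentence-transformers, the inputs could come from a multilingual model, e.g. `SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2").encode(sentences)`; the README does not say which model the project uses. Because the choice is greedy per Chinese sentence, two Chinese sentences can map to the same English sentence; a strict one-to-one alignment would need an extra step such as the Hungarian algorithm.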