https://github.com/techiaith/word2vec-cy
Model Iaith Fectorau Word2vec ar sail corpora ymchwil yr Uned Technolegau Iaith a gasglwyd o ffynonellau amrywiol at ddibenion ymchwil fel cynhyrchu modelau iaith. | A Word2vec vector language model based on the Language Technologies Unit's research corpora, collected from various sources for research purposes such as producing language models.
- Host: GitHub
- URL: https://github.com/techiaith/word2vec-cy
- Owner: techiaith
- License: apache-2.0
- Created: 2020-10-28T09:08:40.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-10-31T19:33:00.000Z (over 2 years ago)
- Last Synced: 2024-04-17T07:19:26.660Z (almost 2 years ago)
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Model Iaith Fectorau Cymraeg
*Welsh Vector Language Model*
Model Iaith Fectorau Word2vec ar sail adnoddau ymchwil yr Uned Technolegau Iaith a gasglwyd o ffynonellau amrywiol.
*A Word2vec vector language model based on the Language Technologies Unit's research resources, collected from various sources.*
Gweler https://github.com/techiaith/word2vec-cy/tags a chlicio ar 'Latest' i gael at y data.
*See https://github.com/techiaith/word2vec-cy/tags and click on 'Latest' to access the data.*
NODYN: Mae ffurfiau'r model hwn bellach i gyd mewn llythrennau bach.
*NOTE: The forms found in this model are now all in lower case.*
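Because every form in the model is lower-cased, a query containing capitals will not match. The sketch below shows one way to normalise a query before lookup. It is only an illustration: `lookup_lowercased` and `toy_vectors` are hypothetical names, and the two-dimensional vectors are made up. A plain dict stands in for the model here, since Gensim's `KeyedVectors` also supports `in` and `[]` for word lookup.

```python
def lookup_lowercased(vectors, word):
    """Return the vector for `word` after lower-casing it, or None if absent."""
    key = word.lower()
    return vectors[key] if key in vectors else None


# Toy stand-in for the real model: made-up 2-d vectors, not real data.
toy_vectors = {"athro": [0.1, 0.9], "cymru": [0.7, 0.3]}

print(lookup_lowercased(toy_vectors, "Cymru"))  # matches the lower-case entry
print(lookup_lowercased(toy_vectors, "ATHRO"))  # matches the lower-case entry
print(lookup_lowercased(toy_vectors, "Xyz"))    # not in the toy vocabulary
```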
I'w ddefnyddio gyda Gensim 4:
*To use with Gensim 4:*
`pip install gensim`
Yna:
*Then:*
```
from gensim.models import KeyedVectors

# Load the word vectors (memory-mapped, read-only)
wv = KeyedVectors.load("word2vec.wordvectors", mmap='r')
print("MODEL SIZE:", len(wv))  # 518,260

# Find words that are similar to 'athro' (= male teacher) plus 'dynes' (= 'woman'),
# whilst subtracting the vector associated with 'dynion' (= 'men')
similar_to_athro = wv.most_similar(positive=['athro', 'dynes'], negative=['dynion'], topn=10)

# The top result should be 'athrawes' (= female teacher), as subtracting 'dynion' subtracts
# both maleness and the plural aspect found in 'athrawon' (= 'teachers')
print(similar_to_athro)

# RESULTS
# [('athrawes', 0.6490613222122192),
#  ('addysgwr', 0.4838572144508362),
#  ('ymarferydd', 0.4762175381183624),
#  ('ymarferwr', 0.4626823663711548),
#  ('aseswr', 0.462118536233902),
#  ('tiwtor', 0.4528316557407379),
#  ('hyfforddai', 0.4441806972026825),
#  ('mentor', 0.43711039423942566),
#  ('asesydd', 0.4269064962863922),
#  ('prifathrawes', 0.4217046797275543)]
```
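To make the arithmetic behind the query above more concrete, here is a minimal pure-Python sketch of the general idea: unit-normalise each input vector, add the positive ones, subtract the negative ones, and rank the remaining words by cosine similarity. It only roughly mirrors what `most_similar` does and is not Gensim's implementation. The function `analogy_query` and the three-dimensional `toy` vectors (dimensions loosely meaning "teacher", "gender", "plural") are invented for illustration and are not taken from the model.

```python
import math


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def analogy_query(vectors, positive, negative, topn=3):
    """Rank words by cosine similarity to (sum of positives - sum of negatives)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    terms = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, sign in terms:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += sign * x
    query = unit(query)
    excluded = set(positive) | set(negative)  # input words are not returned
    scored = [
        (word, sum(a * b for a, b in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Made-up toy vectors: [teacher, gender (+1 female, -1 male), plural]
toy = {
    "athro": [1.0, -1.0, 0.0],
    "athrawes": [1.0, 1.0, 0.0],
    "athrawon": [1.0, 0.0, 1.0],
    "tiwtor": [1.0, 0.0, 0.0],
    "dynes": [0.0, 1.0, 0.0],
    "dynion": [0.0, -1.0, 1.0],
}

ranked = analogy_query(toy, positive=["athro", "dynes"], negative=["dynion"])
print(ranked)
```

With these toy vectors the top-ranked word is 'athrawes', while 'athrawon' scores lowest because subtracting 'dynion' removes the plural component.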
Ariannwyd creu'r model hwn gan Lywodraeth Cymru.
*The creation of this model was financed by the Welsh Government.*