https://github.com/operavaria/huncor2vec

Automation tools for the Hungarian Webcorpus 2.0
digital-humanities hungarian-language linguistics vectorization word2vec

Host: GitHub
URL: https://github.com/operavaria/huncor2vec
Owner: OperaVaria
License: gpl-3.0
Created: 2024-06-25T15:40:33.000Z (11 months ago)
Default Branch: main
Last Pushed: 2024-07-17T13:04:36.000Z (10 months ago)
Last Synced: 2024-12-28T00:42:49.264Z (5 months ago)
Topics: digital-humanities, hungarian-language, linguistics, vectorization, word2vec
Language: Python
Homepage: https://github.com/OperaVaria/huncor2vec
Size: 82 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: docs/README.md
- License: COPYING.md

README

# HunCor2Vec

This lightweight Python app provides automation tools to
retrieve material from the [Hungarian Webcorpus 2.0](https://hlt.bme.hu/en/resources/webcorpus2), train a Word2Vec
model on those texts, and evaluate the results.

The app features an easy-to-use command line menu structure,
implemented with the [pick](https://github.com/aisk/pick) package.

Training and querying tasks utilize the [gensim](https://github.com/piskvorky/gensim) library's Word2Vec module.

Available tools:

- Webcorpus 2.0 Scraper and Webcorpus 2.0 Downloader: retrieve all file links and automate the entire corpus file download process.
- Word2Vec Trainer: train a Word2Vec model on any multi-file corpus in plain-text or CoNLL-U format. Saving a model and resuming training are supported.
- Word2Vec Query: evaluate the trained model with the most common methods.
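
To illustrate what feeding a multi-file CoNLL-U corpus to a trainer can look like, here is a minimal sketch that is not taken from the HunCor2Vec source. The class name `ConlluSentences` and its parameters are hypothetical. It relies only on the standard CoNLL-U layout (tab-separated columns, `#` comment lines, blank lines between sentences) and yields one token list per sentence.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


class ConlluSentences:
    """Restartable iterable over the sentences of several CoNLL-U files.

    Hypothetical helper for illustration; not part of HunCor2Vec.
    """

    def __init__(self, paths: Iterable[str], column: int = 1,
                 lowercase: bool = True) -> None:
        self.paths = [Path(p) for p in paths]
        self.column = column        # 1 = FORM, 2 = LEMMA in CoNLL-U
        self.lowercase = lowercase

    def __iter__(self) -> Iterator[List[str]]:
        for path in self.paths:
            sentence: List[str] = []
            with path.open(encoding="utf-8") as handle:
                for raw in handle:
                    line = raw.rstrip("\r\n")
                    if not line.strip():
                        # A blank line closes the current sentence.
                        if sentence:
                            yield sentence
                            sentence = []
                        continue
                    if line.startswith("#"):
                        continue    # sentence-level comment or metadata
                    fields = line.split("\t")
                    # Skip multiword ranges ("1-2"), empty nodes ("1.1"),
                    # and rows too short to hold the requested column.
                    if not fields[0].isdigit() or len(fields) <= self.column:
                        continue
                    token = fields[self.column]
                    sentence.append(token.lower() if self.lowercase else token)
            # Flush a final sentence that has no trailing blank line.
            if sentence:
                yield sentence
```

Because the class implements `__iter__` instead of being a one-shot generator, the corpus can be traversed repeatedly, which multi-epoch training needs. An instance could presumably be passed as the `sentences` argument of gensim's `Word2Vec`; whether the app's own trainer works this way is an assumption.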

Tested on: Windows 11 and Lubuntu 22.04 LTS with Python version 3.10.11.

---

**[Contact](mailto:[email protected])**

[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
