Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/magwyz/mediaLexicometer
Tools to do lexicometry on media
https://github.com/magwyz/mediaLexicometer
Last synced: 3 months ago
JSON representation
Tools to do lexicometry on media
- Host: GitHub
- URL: https://github.com/magwyz/mediaLexicometer
- Owner: magwyz
- License: agpl-3.0
- Created: 2021-09-03T20:56:33.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-01-16T20:50:44.000Z (11 months ago)
- Last Synced: 2024-07-15T14:40:24.971Z (5 months ago)
- Language: Python
- Size: 53.7 KB
- Stars: 41
- Watchers: 4
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# Media Lexicometer
This repository contains a Python Django application and some scripts to:
- capture the audio streams from DVB-T channels thanks to DVB-T adapters,
- perform live speech to text on the channel audio streams,
- record the recognized words to a database,
- allow an user to query the database from a Web page and generate graphs as results.## Setup
### Install the dependancies
```
sudo apt install ffmpeg w-scan dvb-tools virtualenv tmux
```### Create the channel file
To scan the channels available in your location:
```
w_scan -f t -c FR -X > channels.conf
```To convert the `channels.conf` file to the v5 format:
```
dvb-format-convert -I ZAP -O DVBV5 -s dvb-t channels.conf channels_v5.conf
```### Create and setup your Python virtual environment
```
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```### Get the models
Download the spacy model from [here](https://github.com/explosion/spacy-models/releases/tag/fr_core_news_sm-3.1.0)
and decompress it in the main application folder.Download the Vosk model from [here](https://alphacephei.com/nsh/2020/10/21/french.html) and decompress the content
of the `vosk-model-fr-0.6-linto` directory in the archive in a `model` directory in the main application folder.### Add the channels to the database
```
cd mediaAnalysis
python3 manage.py syncdb
python3 manage.py createsuperuser
python3 manage.py runserver
```Go to the Django administration interface and log in with the admin credentials you've just created.
Then, create your channel records.### Create your channel scripts
Look at the example scripts in the `script` directory to create your own channel capture and analysis scripts.
The example scripts use two tmux sessions per multiplex to start `dvbv5-zap` and the `liveSpeechToText.py` script.
You are of course free to use a more appropriate way to run them in the background.## Usage
Per multiplex, you must run a `dvbv5-zap` instance and a `liveSpeechToText.py` instance (see previous section).
To run the query Web interface, use:
```
python3 manage.py runserver
```
Then, go to http://localhost:8000.