https://github.com/hyper-node/accessibilityspeecheval

Evaluating speech outputs for accessibility paper

Host: GitHub
URL: https://github.com/hyper-node/accessibilityspeecheval
Owner: Hyper-Node
License: mit
Created: 2023-08-23T15:28:22.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2023-09-15T15:24:11.000Z (over 2 years ago)
Last Synced: 2025-07-31T03:30:27.877Z (11 months ago)
Language: HTML
Size: 2.87 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.MathCatForPython

README

# Run the evaluation

1. `node extract_html_to_json.js`: extract the data from the HTML into JSON
2. Run the test that adds annotations in PHP (MediaWiki Math extension) and copy the results to `eval_data`
3. `python generateSpeechForMathML.py`: obtain speech from MathCAT
4. `node compareFormulas.js`: run the speech comparison metrics

# Accessibility Speech Evaluation

Comparison of speech outputs, using text similarity algorithms, for the paper on accessibility on Wikipedia.

Measures:

Character N-Gram:
- Creates character n-grams for each formula with a sliding window and calculates the similarity from their overlap.
- Consideration: It does not account for larger changes in word order and will not capture semantic similarity in all cases.
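
The sliding-window idea can be sketched as follows. This is a minimal illustration, not the code in `compareFormulas.js`: the function names, the default window size of 3, and the choice of the Dice coefficient as the overlap score are all assumptions made for the example.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Slide a window of width n over the text and collect every substring."""
    if len(text) < n:
        # Text shorter than the window: treat the whole text as one n-gram.
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the distinct n-grams of both strings (assumed score)."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


print(char_ngrams("abcd"))            # ['abc', 'bcd']
print(ngram_overlap("abcd", "bcde"))  # 0.5
```

Because only local windows are compared, moving a long phrase to another position changes few n-grams, which is the limitation noted in the consideration.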

Levenshtein-Distance:
- Levenshtein distance is a simple and effective measure of the minimum number of edit operations (insertions, deletions, substitutions) needed to transform one formula into another.
- Consideration: It treats all words or characters equally, which might not capture semantic or structural similarity.
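
A minimal sketch of the edit distance, using the standard dynamic-programming recurrence. The function names and the normalization to a 0..1 similarity are assumptions for illustration and are not taken from `compareFormulas.js`.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Distance scaled by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

Every edit costs exactly 1 here, so replacing "plus" with "minus" is penalized no differently from a harmless spelling variant.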

Jaccard Similarity:
- Captures overlap in word sets
- Consideration: It doesn't consider word order or frequency.
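
As a rough sketch, Jaccard similarity is the size of the intersection of the two word sets divided by the size of their union. The whitespace tokenization, lowercasing, and function name below are assumptions for the example, not details from the repository.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """|A intersect B| / |A union B| over the sets of lowercase words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


# Word order is ignored, so a reordered sentence still scores 1.0.
print(jaccard_similarity("x squared plus one", "one plus x squared"))  # 1.0
print(jaccard_similarity("a b c", "b c d"))                            # 0.5
```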

Cosine Similarity:
- Cosine similarity considers the angle between formula vectors, making it useful for capturing semantic similarity and ignoring word order.
- Consideration: It doesn't account for word repetitions or differences in formula length.
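
A minimal sketch of cosine similarity over word vectors. It assumes binary word-presence vectors, chosen because that variant ignores repetitions and length as described in the consideration. The actual vectorization in `compareFormulas.js` may differ, and the function name is hypothetical.

```python
import math


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between binary word-presence vectors (assumed encoding)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0  # the angle is undefined for an empty vector
    vocabulary = sorted(words_a | words_b)
    vec_a = [1 if word in words_a else 0 for word in vocabulary]
    vec_b = [1 if word in words_b else 0 for word in vocabulary]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    return dot / (norm_a * norm_b)


# Reordering or repeating the words leaves the score at (approximately) 1.
print(round(cosine_similarity("x plus y", "y plus x"), 6))
print(round(cosine_similarity("x plus y", "x plus y x plus y"), 6))
```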

TF-IDF Similarity:
- TF-IDF is useful for capturing word importance relative to a document or set of documents.
- Consideration: It might not handle short formula texts well, and it doesn't capture semantic or structural relationships.
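
One way to turn TF-IDF weights into a similarity score is to weight each word by term frequency times inverse document frequency and compare the resulting vectors with cosine similarity. The sketch below assumes a smoothed IDF of `log((1 + N) / (1 + df)) + 1` and whitespace tokenization. All names are hypothetical and the weighting used in `compareFormulas.js` may differ.

```python
import math
from collections import Counter


def tfidf_vectors(documents: list[str]) -> list[dict[str, float]]:
    """Map each document to {word: tf * idf}, with a smoothed IDF (assumed variant)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            word: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = ["x squared plus one", "x squared minus one", "the square root of y"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

With only a handful of words per formula, a single differing word shifts the score considerably, which illustrates the short-text limitation noted above.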

# MathCATForPython usage
Parts of the speech generation code in 'generateSpeechForMathML.py', the Rules,
and libmathcat.pyd are taken from MathCATForPython by Neil Soiffer.
