https://github.com/hyper-node/accessibilityspeecheval
Evaluating speech outputs for accessibility paper
- Host: GitHub
- URL: https://github.com/hyper-node/accessibilityspeecheval
- Owner: Hyper-Node
- License: MIT
- Created: 2023-08-23T15:28:22.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2023-09-15T15:24:11.000Z (over 2 years ago)
- Last Synced: 2025-07-31T03:30:27.877Z (11 months ago)
- Language: HTML
- Size: 2.87 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.MathCatForPython
# Run the evaluation
1. node extract_html_to_json.js: extract the data from HTML into JSON
2. Run the test that adds annotations in PHP (MediaWiki Math extension), then copy the results to eval_data
3. python generateSpeechForMathML.py: obtain speech from MathCAT
4. node compareFormulas.js: run the speech comparison metrics
# Accessibility Speech Evaluation
Comparison of speech outputs for the paper on accessibility on Wikipedia, using text similarity algorithms.
Measures:
Character N-Gram:
- Create character N-grams for each formula with a sliding window and calculate the overlap similarity between them.
- Consideration: It does not account for larger changes in word order and will not capture semantic similarity in all cases.
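
The sliding-window idea can be sketched in a few lines of Python. This is an illustrative sketch, not the code in compareFormulas.js: the function names, the default window size of 3, and the choice of the Dice coefficient as the overlap measure are all assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count the character n-grams produced by a sliding window of width n."""
    if len(text) < n:
        # Text shorter than the window: treat the whole text as one gram.
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over n-gram multisets (assumed overlap measure)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total
```

Because the window moves one character at a time, a small local edit only disturbs the few n-grams that touch it, while reordering whole phrases breaks only the grams at the phrase boundaries.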
Levenshtein Distance:
- Levenshtein distance is a simple and effective measure of the minimum number of edit operations (insertions, deletions, substitutions) needed to transform one formula into another.
- Consideration: It treats all words or characters equally, which might not capture semantic or structural similarity.
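
A minimal sketch of the edit-distance computation, using the standard dynamic-programming recurrence with two rows. The function names and the normalization by the longer length are assumptions made for illustration, not taken from the repository.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def levenshtein_similarity(a: Sequence, b: Sequence) -> float:
    """Scale the distance to [0, 1] by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

The same function works on characters (pass strings) or on words (pass `text.split()`), which changes what counts as a single edit.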
Jaccard Similarity:
- Captures the overlap between the word sets of two formulas.
- Consideration: It doesn't consider word order or frequency.
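
As a rough illustration, Jaccard similarity is the size of the intersection of the two word sets divided by the size of their union. The whitespace tokenization and lowercasing below are assumptions, and the function name is hypothetical.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Shared words divided by all distinct words across both texts."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(union)
```

Since sets discard order and duplicates, "x squared plus y" and "y plus x squared" score the same as two identical strings, which is the limitation noted above.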
Cosine Similarity:
- Cosine similarity measures the angle between formula vectors rather than their magnitude, making it useful for capturing similarity while ignoring word order.
- Consideration: Because the vectors are normalized, it doesn't account for differences in formula length, and with binary word vectors it also ignores word repetitions.
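
A small sketch of cosine similarity over bag-of-words vectors. It assumes raw word counts as vector components and whitespace tokenization; the repository may build its vectors differently, and the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    vec_a = Counter(a.lower().split())
    vec_b = Counter(b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Dividing by both norms is what makes the score insensitive to length: a text and the same text repeated twice point in the same direction and score approximately 1.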
TF-IDF Similarity:
- TF-IDF is useful for capturing word importance relative to a document or set of documents.
- Consideration: It might not handle short formula texts well, and it doesn't capture semantic or structural relationships.
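
The weighting can be sketched without external libraries. This is a hedged illustration: the smoothed IDF formula `log((1 + N) / (1 + df)) + 1`, the length-normalized term frequency, and the helper names `tfidf_vectors` and `cosine` are assumptions, not the repository's implementation.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(documents: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (word -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    # Smoothed IDF (assumed variant) so no weight is zero or undefined.
    idf = {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({w: (c / len(tokens)) * idf[w] for w, c in counts.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Usage: weights depend on the whole collection, not just the pair compared.
speech = ["x squared plus y", "x squared plus z", "the fraction a over b"]
vectors = tfidf_vectors(speech)
score = cosine(vectors[0], vectors[1])
```

Words that occur in few documents (here "y" and "z") receive higher weights than words shared across documents, so the score shifts with the collection used. With short formula texts, few words are available to estimate these weights, which is the limitation noted above.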
# MathCATForPython usage
Parts of the speech generation code in 'generateSpeechForMathML.py', the Rules,
and libmathcat.pyd are taken from MathCATForPython by Neil Soiffer.