{"id":50529808,"url":"https://github.com/dima-xd/tonara","last_synced_at":"2026-06-03T12:01:45.737Z","repository":{"id":361395321,"uuid":"1254375491","full_name":"dima-xd/tonara","owner":"dima-xd","description":"Pure-Dart Mandarin tone detection (tones 1–4) from raw PCM - pYIN pitch tracking + an MLP classifier hitting ~93.7% accuracy. No native code, no FFI.","archived":false,"fork":false,"pushed_at":"2026-05-30T13:45:08.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-05-30T14:03:12.181Z","etag":null,"topics":["audio","chinese","dart","dsp","flutter","language-learning","machine-learning","mandarin","pitch-detection","pyin","speech","tone-detection"],"latest_commit_sha":null,"homepage":"https://abuchi.lol","language":"Dart","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dima-xd.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-30T13:40:03.000Z","updated_at":"2026-05-30T13:45:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/dima-xd/tonara","commit_stats":null,"previous_names":["dima-xd/tonara"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/dima-xd/tonara","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dima-xd%2Ftonara","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dima-xd%2Ftonara/tags","r
eleases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dima-xd%2Ftonara/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dima-xd%2Ftonara/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dima-xd","download_url":"https://codeload.github.com/dima-xd/tonara/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dima-xd%2Ftonara/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33863264,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-03T02:00:06.370Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","chinese","dart","dsp","flutter","language-learning","machine-learning","mandarin","pitch-detection","pyin","speech","tone-detection"],"created_at":"2026-06-03T12:01:45.675Z","updated_at":"2026-06-03T12:01:45.724Z","avatar_url":"https://github.com/dima-xd.png","language":"Dart","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tonara\n\nPure-Dart **Mandarin Chinese tone detection** (tones 1–4) from raw PCM audio.\nNo native code, no FFI - only `dart:typed_data` and `dart:math`.\n\n## Pipeline\n\n```\nPCM audio\n -\u003e VAD (RMS gate)\n -\u003e pre-emphasis (y[n] = x[n] − 0.97·x[n−1])\n -\u003e Hann framing (frame 1024, hop 256)\n -\u003e pYIN pitch 
tracking (CMNDF + Beta(2,18) prior + Viterbi)\n -\u003e voiced F0 contour (octave/spike/edge cleanup)\n -\u003e semitone-relative contour (resample to N=20) + shape/duration features\n -\u003e learned MLP classifier (default - ~93.7% LOSO-CV, every tone ≥90%)\n or rule-based + KNN k=5 (opt-in, transparent - ~82%)\n -\u003e optional DTW reference comparison\n -\u003e ToneResult\n```\n\n## Install\n\n```yaml\ndependencies:\n tonara: ^0.1.0\n```\n\n## Usage\n\n```dart\nimport 'dart:typed_data';\nimport 'package:tonara/tonara.dart';\n\nfinal analyzer = TonaraAnalyzer(sampleRate: 16000);\n\n// Single syllable.\nfinal ToneResult result = analyzer.analyze(samples); // Float32List in [-1, 1]\nif (result.error == null) {\n print('Tone ${result.tone} (confidence ${result.confidence})');\n print(result.feedback);\n}\n\n// Compare against a reference recording.\nfinal scored = analyzer.analyzeWithReference(\n samples,\n reference: nativeSpeakerSamples,\n expectedTone: 2,\n);\nprint('similarity: ${scored.similarityScore}');\n\n// Real-time streaming - one ToneFrame per detected syllable.\nawait for (final ToneFrame frame in analyzer.stream(micChunks)) {\n print('syllable ${frame.syllableIndex}: tone ${frame.result.tone}');\n}\n```\n\n## The seven features\n\n| Feature | Meaning |\n| --- | --- |\n| `linearSlope` | overall least-squares slope |\n| `quadraticCoeff` | x² coefficient (positive ⇒ U-shape ⇒ tone 3) |\n| `midpointDip` | midpoint minus endpoint mean (negative ⇒ dip) |\n| `pitchRange` | max − min of the raw Hz contour |\n| `startToMidSlope` | slope of the first half |\n| `midToEndSlope` | slope of the second half |\n| `normalizedVariance` | variance of the z-scored contour |\n\n## Classification\n\nTwo classifiers are available; `TonaraAnalyzer(useModel: ...)` selects between\nthem (default `true`):\n\n- **Learned model** (`tone_model.dart`) - a two-hidden-layer MLP (32 -\u003e 48 -\u003e 24 -\u003e 4)\n over the semitone-relative contour plus shape/duration summary features.\n 
**~93.7% leave-one-speaker-out** on a corpus of 2500+ labeled Mandarin\n clips, every tone ≥90%. Default.\n- **Rule-based + KNN** (`classify`) - a transparent decision tree on 7 shape\n features with a k = 5 KNN fallback. ~82%. Use it when you want interpretable\n decisions or no embedded weights.\n\n### Rule-based decision tree (`useModel: false`)\n\nSlope/curvature features are measured on a normalized **[-1, 1]** x-axis, so the\nthresholds are independent of the contour length. The cut points were tuned\nagainst the training corpus (see below).\n\n1. `pitchRange \u003c 5` -\u003e tone 0 (neutral / unvoiced)\n2. `pitchRange \u003c 22 || normalizedVariance \u003c 0.08` -\u003e tone 1 (level - a level tone\n has the least movement; `pitchRange` in Hz is its only robust cue, since\n z-scoring inflates a flat contour's slope)\n3. `startToMidSlope \u003c −0.2 \u0026\u0026 midToEndSlope \u003e 0.4 \u0026\u0026 quadraticCoeff \u003e 0.4`\n -\u003e tone 3 (dip: does not rise in the first half, then rises)\n4. `linearSlope \u003c −0.4` -\u003e tone 4 (falling)\n5. `linearSlope \u003e 1.0 \u0026\u0026 startToMidSlope \u003e −0.1` -\u003e tone 2 (rising throughout)\n6. otherwise -\u003e KNN (k = 5) over 40 hand-tuned prototypes\n\n\u003e These differ from a naive reading of the original design in ways the data\n\u003e forced: (a) the x-axis is normalized so the `linearSlope`/`quadraticCoeff`\n\u003e thresholds are reachable at all; (b) tone 3 is separated from tone 2 by the\n\u003e **first-half** slope (a citation third tone also ends higher than it starts,\n\u003e so overall slope can't tell them apart); (c) a small pitch range means *level*\n\u003e tone 1, not tone 0.\n\n## Validation on real audio\n\nThe learned model was trained and validated on a corpus of **2500+ labeled\nsingle-syllable Mandarin recordings** (multiple native speakers; the tone and\nspeaker are encoded in each filename). 
The audio itself is not distributed -\nonly the trained weights ship, in `lib/src/tone_model.dart`. Drop your own\nlabeled `.wav` clips into `audio/train/` to retrain:\n\n```bash\ndart run tool/train_model.dart # prints LOSO-CV, regenerates tone_model.dart\n```\n\n`train_model.dart` reports honest accuracy via **leave-one-speaker-out\ncross-validation** (each speaker is classified by a model trained only on the\nothers), then ships weights trained on every speaker.\n\n**Learned model - 93.7% LOSO-CV** with **every tone above 90%**:\n\n| | t1 | t2 | t3 | t4 |\n|--|----|----|----|----|\n| accuracy | 98% | **90%** | **91%** | 96% |\n\nThe model is a two-hidden-layer MLP (32 -\u003e 48 -\u003e 24 -\u003e 4) over the\nsemitone-relative contour plus shape/duration summary features.\n\nTones 2 and 3 are the hard pair. Tone-3 citation recordings include both full\ndipping (˅) and *reduced* realizations - a low fall (no final rise, looks like\ntone 4) or a low rise (no initial fall, looks like tone 2). These\n\"half-third-tones\" are acoustically ambiguous from F0 alone, so the raw model\nmakes *confident* errors on the fuzzy tone-2/3 boundary that no amount of extra\nfeatures, network depth, or loss weighting could fix (all plateaued tone 3 at\n~88%). Because tones 1 and 4 carry large margins (98% / 96%), the classifier\napplies a **per-class decision bias** (`decisionBias` in `train_model.dart`)\nthat favours tones 2 and 3 at the boundary, pulling slack from tones 1/4 so all\nfour clear 90%. This is a deliberate balance choice, not a raw accuracy gain;\noverall sits at ~93.7%.\n\nThe **rule-based fallback** (`useModel: false`) reaches ~82%. 
Its main\nconfusions come from z-score normalization erasing the level-tone flatness cue.\nThe learned model avoids this by classifying the *semitone-relative* contour,\nwhich preserves both shape and the small magnitude of a level tone.\n\n## Pitch \u0026 preprocessing notes\n\n- **Pre-emphasis is off by default** (`applyPreEmphasis: false`). It is a\n high-pass that attenuates the fundamental and roughly halves voiced-frame\n detection, so the pitch path runs on the clean signal. Enable it only for\n spectral experiments.\n- The raw F0 contour is cleaned before feature extraction: octave-error repair,\n a 3-point median filter, and a one-frame edge trim (`refineF0`).\n- Real recordings vary widely in level; peak-normalise input before `analyze`\n (the harness does this) so the fixed RMS gate behaves consistently.\n\n## Development\n\n```bash\ndart pub get\ndart analyze\ndart test\ndart run example/main.dart\n```\n\n## Notes \u0026 limitations\n\n- The KNN prototypes in `lib/src/reference_data.dart` are hand-tuned from the\n phonetics literature, not trained on a corpus; classification is heuristic.\n- pYIN frequency resolution is sharpened by parabolic interpolation of the\n CMNDF minimum (roughly sub-hertz on a clean tone).\n- Pre-emphasis, when enabled, is applied only in the full pipeline; the\n single-frame `pyinFrame` entry point operates on whatever frame you pass it.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdima-xd%2Ftonara","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdima-xd%2Ftonara","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdima-xd%2Ftonara/lists"}