{"id":18963991,"url":"https://github.com/JaneliaSciComp/SongExplorer","last_synced_at":"2025-04-16T05:31:37.668Z","repository":{"id":50227893,"uuid":"204945609","full_name":"JaneliaSciComp/SongExplorer","owner":"JaneliaSciComp","description":"deep learning for acoustic signals","archived":false,"fork":false,"pushed_at":"2024-11-06T15:56:11.000Z","size":10453,"stargazers_count":22,"open_issues_count":6,"forks_count":5,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-11-06T16:47:44.710Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JaneliaSciComp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-TF.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-28T14:02:23.000Z","updated_at":"2024-11-06T15:56:15.000Z","dependencies_parsed_at":"2024-02-20T15:27:39.629Z","dependency_job_id":"417469c7-d237-427e-a141-04e5e6494668","html_url":"https://github.com/JaneliaSciComp/SongExplorer","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JaneliaSciComp%2FSongExplorer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JaneliaSciComp%2FSongExplorer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JaneliaSciComp%2FSongExplorer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JaneliaSciComp%2FSongExplorer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/JaneliaSciComp","download_url":"https://codeload.github.com/JaneliaSciComp/SongExplorer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223697476,"owners_count":17187809,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T14:22:37.575Z","updated_at":"2025-04-16T05:31:37.651Z","avatar_url":"https://github.com/JaneliaSciComp.png","language":"Python","readme":"[![](data/screenshot.png)](https://www.youtube.com/playlist?list=PLYXyXDkMwZip8x78RAyN6ee9NK42WBbKb)\n\nTable of Contents\n=================\n\n   * [Description](#description)\n   * [Public Domain Annotations](#public-domain-annotations)\n   * [Citations and Repositories](#citations-and-repositories)\n   * [Notation](#notation)\n   * [Installation](#installation)\n      * [Downloading Executables](#downloading-executables)\n      * [System Configuration](#system-configuration)\n      * [Scheduling Jobs](#scheduling-jobs)\n         * [Locally](#locally)\n         * [Another Workstation](#another-workstation)\n         * [An On-Premise Cluster](#an-on-premise-cluster)\n   * [Starting SongExplorer](#starting-songExplorer)\n   * [Tutorial](#tutorial)\n      * [Manually Annotating](#manually-annotating)\n      * [Training a Classifier](#training-a-classifier)\n      * [Quantifying Accuracy](#quantifying-accuracy)\n      * [Making Predictions](#making-predictions)\n      * [Fixing Mistakes](#fixing-mistakes)\n      * [Minimizing Annotation Effort](#minimizing-annotation-effort)\n   * [Advanced Usage](#advanced-usage)\n     
      * [Detecting Rare Sounds](#detecting-rare-sounds)
      * [Clustering Annotations](#clustering-annotations)
      * [Correcting False Alarms](#correcting-false-alarms)
      * [Correcting Misses](#correcting-misses)
      * [Double Checking Annotations](#double-checking-annotations)
      * [Measuring Generalization](#measuring-generalization)
      * [Searching Hyperparameters](#searching-hyperparameters)
      * [Limiting Ground Truth](#limiting-ground-truth)
      * [Examining Errors](#examining-errors)
      * [Ensemble Models](#ensemble-models)
      * [Transfer Learning](#transfer-learning)
      * [Autoencoding](#autoencoding)
      * [Testing Densely](#testing-densely)
      * [Discovering Novel Sounds](#discovering-novel-sounds)
      * [Overlapped Classes](#overlapped-classes)
   * [Unsupervised Methods](#unsupervised-methods)
   * [Scripting Automation](#scripting-automation)
   * [Training on Video](#training-on-video)
   * [Customizing with Plug-ins](#customizing-with-plug-ins)
      * [Loading Data](#loading-data)
      * [Video Filenames](#video-filenames)
      * [Event Detection](#event-detection)
      * [Double-Click Annotations](#double-click-annotations)
      * [Network Architecture](#network-architecture)
      * [Clustering Algorithm](#clustering-algorithm)
      * [Augmentation](#augmentation)
   * [Troubleshooting](#troubleshooting)
   * [Frequently Asked Questions](#frequently-asked-questions)
   * [Reporting Problems](#reporting-problems)
   * [Development](#development)
      * [Conda](#conda)
      * [Singularity](#singularity)
      * [Docker](#docker)
      * [System Tests](#system-tests)


# Description #

You have an audio recording, and you want to know where certain classes of
sounds are.  You train SongExplorer to recognize such words by manually giving
it a few examples.
It will then automatically calculate the probability,
over time, of when those words occur in all of your recordings.

Applications suitable for SongExplorer include quantifying the rate or pattern
of words emitted by a particular species, distinguishing a recording of one
species from another, and discerning whether individuals of the same species
produce different songs.

Under the hood is a deep convolutional neural network.  The input is the
raw audio stream, and the output is a set of probability waveforms
corresponding to each word of interest.

Training begins by first manually annotating a few sounds with however many
word labels naturally occur.  A classifier is then trained on this corpus of
ground truth, and a new recording is analyzed by it.  The words it
automatically finds are then displayed with predicted labels.  You manually
correct the mistakes, both re-labeling words it got wrong and labeling words
it missed.  These new annotations are added to the ground truth, and the
process of retraining the classifier and analyzing and correcting new
recordings is repeated until the desired accuracy is reached.


# Public Domain Annotations #

SongExplorer is open source and free for you to use.  However, SongExplorer is
not a static piece of software.  Its performance improves with additional
high-quality annotations.

Therefore, when you publish results based on SongExplorer, we request that you
make all of your primary data and annotations freely available in a recognized
data repository, such as [figshare](http://figshare.com),
[Dryad](http://datadryad.org), or [Zenodo](http://zenodo.org).  Many journals
already require deposition of raw data, but we strongly encourage you to also
provide your manual annotations.
These manual annotations will serve to
improve the performance of SongExplorer over time, helping both your own work
and that of everyone else.

Please let us know where you have deposited your raw
data and annotations by posting an issue to the [SongExplorer
repository](https://github.com/JaneliaSciComp/SongExplorer).  We will endeavor
to maintain a list of these recordings and annotations in the
[Citations and Repositories](#citations-and-repositories) section below.

In addition, consider donating your recordings to a library or museum, like the
Cornell Lab of Ornithology's [Macaulay Library](https://www.macaulaylibrary.org)
or the Museo de Ciencias Naturales de Madrid's [Fonoteca
Zoológica](https://www.fonozoo.com).

Publishing your trained models is also useful to others, for reproducibility
at a minimum but also as an example of what hyperparameters work well.  Simply
compress the entire "logs folder" along with the ".pb" folder inside containing
the learned parameters (see [Making Predictions](#making-predictions) below).
Be sure to use `delete-ckpts` to reduce its footprint on disk before
distributing it to others.

As an alternative, a list of hyperparameters could suffice, but only if it is
complete.  At least the following should be included:  the length of the context
window, a description of each layer including the kernel size, the number of
feature maps, strides and dilations if it is convolutional, the type of
non-linearity used (e.g. ReLU), whether and where dropout and batch
normalization were used, the batch size, the learning rate, the optimizer, the
number of training steps, the fraction of data withheld for validation, the
version of SongExplorer used, the number of trainable parameters (as a sanity
check), and, for the curious, the wall clock time that training took and the
hardware used (e.g.
GPU make and model).


# Citations and Repositories

Ribeiro TdS, Lollar MJ, Sprengelmeyer QD, Huang Y, Benson DM, Orr MS, Johnson ZC, Corbett-Detig RB, Pool JE (2024)  
Recombinant inbred line panels inform the genetic architecture and interactions of adaptive traits in Drosophila melanogaster  
[bioRxiv](https://www.biorxiv.org/content/10.1101/2024.05.14.594228v1)

Lollar MJ, Kim E, Stern DL, Pool JE (2024)  
Courtship song differs between African and European populations of Drosophila melanogaster and involves a strong effect locus  
[bioRxiv](https://www.biorxiv.org/content/10.1101/2024.05.14.594231v1)

Lee S (2024)  
Studying Aphid feeding behavior using the Electrical Penetration Graph  
Janelia Meyerhoff Undergraduate Scholars Symposium

D Ye, JT Walsh, IP Junker, Y Ding (2024)  
Changes in the cellular makeup of motor patterning circuits drive courtship song evolution in Drosophila  
[bioRxiv](https://www.biorxiv.org/content/10.1101/2024.01.23.576861v1.full)

T Wiesner (2023)  
How good is SongExplorer, a machine learning based classifier, at recognising the electrical signals of Apteronotus leptorhynchus?  
Bachelor's Thesis, Humboldt-Universität zu Berlin

HM Shiozaki, K Wang, JL Lillvis, M Xu, BJ Dickson, DL Stern (2023)  
Activity of nested neural circuits drives different courtship songs in Drosophila  
[Nat. Neurosci.](https://doi.org/10.1038/s41593-024-01738-9) [bioRxiv](https://www.biorxiv.org/content/10.1101/2022.12.14.520499v2.full)

JL Lillvis, K Wang, HM Shiozaki, M Xu, DL Stern, BJ Dickson (2023)  
Nested neural circuits generate distinct acoustic signals during Drosophila courtship  
[Current Biology](https://www.cell.com/current-biology/fulltext/S0960-9822%2824%2900015-0) [bioRxiv](https://www.biorxiv.org/content/10.1101/2023.08.30.555537v2.abstract) [figshare](https://doi.org/10.25378/janelia.24707544)

JL Lillvis, H Otsuna, X Ding, I Pisarev, T Kawase, J Colonell, K Rokicki, C Goina, R Gao, A Hu, K Wang, J Bogovic, DE Milkie, L Meienberg, BD Mensh, ES Boyden, S Saalfeld, PW Tillberg, BJ Dickson (2022)  
Rapid reconstruction of neural circuits using tissue expansion and light sheet microscopy  
[eLife](https://elifesciences.org/articles/81248) [bioRxiv](https://www.biorxiv.org/content/10.1101/2021.11.14.468535v2.abstract)

BJ Arthur, Y Ding, M Sosale, F Khalif, E Kim, P Waddell, S Turaga, DL Stern (2021)  
*SongExplorer*: A deep learning workflow for discovery and segmentation of animal acoustic communication signals  
[bioRxiv](https://www.biorxiv.org/content/10.1101/2021.03.26.437280v1) [figshare](https://figshare.com/articles/dataset/Audio_recordings_of_21_species_of_Drosophila_10_of_which_are_annotated/14328425)


# Notation #

Throughout this document, `Buttons` and `variables` in the SongExplorer
graphical user interface (GUI), as well as `code`, are highlighted with
backticks.  Files and paths are enclosed in double quotes ("...").  The dollar
sign ($) in code snippets signifies your computer terminal's command line.
Square brackets ([...]) in code indicate optional components, and angle
brackets (<...>) represent sections which you must customize.


# Installation #

SongExplorer can be run on all three major platforms.
Installation is as simple
as downloading a compressed binary file, unpacking it, and opening a file.  If
you have a Linux distribution other than Ubuntu, you might need to use a
container (e.g. Docker), as TensorFlow, the machine learning framework from
Google that SongExplorer uses, only supports Ubuntu.

Training your own classifier is fastest with a graphics processing unit (GPU).
On Linux and Windows you'll need an Nvidia card.  SongExplorer was built and
tested with CUDA version 12.1 on Ubuntu and 11.8 on Windows.  On Macs with
Apple silicon processors (i.e. the M series chips), the integrated GPU is
accessed via the Metal framework, which comes preinstalled on macOS.

## Downloading Executables ##

Download the ZIP file specific to your operating system from the Assets
section of SongExplorer's
[Releases](https://github.com/JaneliaSciComp/SongExplorer/releases) page on
GitHub.  Then extract its contents by either right-clicking on the icon, or
executing this command in a terminal:

    $ unzip songexplorer-<version>-<architecture>.zip

If there are multiple ZIP files ending in a three-digit number, download them
all and use a decompression program (e.g. 7-Zip on MS Windows) that can
automatically glue them back together.
GitHub has a 2 GB limit on file size,
so ZIP files must sometimes be split.

If on MS Windows you get an error about file paths being too long, edit the
registry as follows:  press Start and type "regedit" to launch the Registry
Editor; in the left sidebar navigate to "HKEY_LOCAL_MACHINE \ SYSTEM \
CurrentControlSet \ Control \ FileSystem"; on the right, double-click on
"LongPathsEnabled" and change the value to "1"; press OK; sign out of your
account and then back in; and try to decompress again, making sure to use
7-Zip.

If on MS Windows you get a permissions error when running SongExplorer, execute
the following in PowerShell:

    > Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

## System Configuration ##

SongExplorer is capable of training a classifier and making predictions on
recordings either locally on the host computer, or remotely on a workstation or
a cluster.  You specify how you want this to work by editing "configuration.py".

Inside you'll find many variables which control where SongExplorer does its work:

    $ grep _where= configuration.py
    default_where="local"
    detect_where=default_where
    misses_where=default_where
    train_where=default_where
    generalize_where=default_where
    xvalidate_where=default_where
    mistakes_where=default_where
    activations_where=default_where
    cluster_where=default_where
    accuracy_where=default_where
    delete_ckpts_where=default_where
    freeze_where=default_where
    ensemble_where=default_where
    classify_where=default_where
    ethogram_where=default_where
    compare_where=default_where
    congruence_where=default_where

Each operation (e.g. detect, train, classify, generalize, etc.) is dispatched
according to these `_where` variables.
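Because "configuration.py" is executed as ordinary Python, it can also be
loaded and inspected programmatically.  As an illustration, this hypothetical
sanity check (the `check_where_settings` helper is ours, not part of
SongExplorer) flags any `_where` variable whose value is not one of the three
recognized settings:

```python
# Hypothetical sanity check (not part of SongExplorer) showing that
# "configuration.py" is plain Python: execute it and flag any `_where`
# variable whose value is not one of the three recognized settings.
VALID_WHERE = {"local", "server", "cluster"}

def check_where_settings(config_text):
    """Return the _where variables whose values are not recognized."""
    namespace = {}
    exec(config_text, namespace)
    return {k: v for k, v in namespace.items()
            if k.endswith("_where") and v not in VALID_WHERE}

config = 'default_where="local"\ntrain_where=default_where\nclassify_where="cluster"\n'
print(check_where_settings(config))   # prints {} -- every setting is valid
```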
SongExplorer is shipped with each set to
"local" via the `default_where` variable at the top of the configuration file.
This value instructs SongExplorer to perform the task on the same machine as
used for the GUI.  You can change which computer is used to do the actual work
either globally through this variable, or by configuring the operation-specific
ones later in the file.  Other valid values for these variables are "server",
for a remote workstation that you can `ssh` into, and "cluster", for an
on-premise Beowulf-style cluster with a job scheduler.

Note that "configuration.py" must be a valid Python file.

## Scheduling Jobs ##

Irrespective of where you want to perform your compute, there are additional
variables that need to be tailored to your specific resources.

### Locally ###

When running locally SongExplorer uses a custom job scheduler,
[aitch](https://github.com/JaneliaSciComp/aitch), to manage the
resources required by different tasks.  Scheduling permits doing multiple
jobs at once, as well as queueing a bunch of jobs for offline analysis.  By
default, each task reserves all of your computer's CPU cores, GPU cards, and
memory, and so only one job can be run at a time.  To tailor resources
according to your particular data set, and thereby permit multiple jobs to be
run simultaneously, you need to specify for each kind of task how much of the
system's resources are actually required.

Here, for example, are the default settings for training a model locally:

    $ grep train_ configuration.py | head -4
    train_where=default_where
    train_ncpu_cores=-1
    train_ngpu_cards=-1
    train_ngigabytes_memory=-1

Let's break this down.  The variables ending in ncpu_cores, ngpu_cards, and
ngigabytes_memory specify, respectively, the number of CPU cores, number of GPU
cards, and number of gigabytes of memory needed, with -1 reserving everything
available.
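To picture what these reservations do, here is a minimal sketch of the
admission bookkeeping such scheduling implies.  The `admit` function is our own
illustration, not aitch's actual interface:

```python
# Illustrative bookkeeping for reservation-style scheduling (NOT aitch's real
# API): a job is admitted only if its declared cores, cards, and gigabytes all
# fit in what remains free.  A request of -1 claims everything available.
def admit(request, free):
    """Try to reserve (cores, cards, gigabytes); return the new free tuple,
    or None if the job must wait."""
    claim = tuple(f if r == -1 else r for r, f in zip(request, free))
    if all(c <= f for c, f in zip(claim, free)):
        return tuple(f - c for f, c in zip(free, claim))
    return None

free = (12, 1, 31)                 # 12 CPU cores, 1 GPU card, 31 GB of memory
free = admit((2, 0, 1), free)      # first training job: free is now (10, 1, 30)
print(admit((2, 0, 1), free))      # a second job still fits: prints (8, 1, 29)
print(admit((-1, -1, -1), (12, 1, 31)))  # -1 claims everything: prints (0, 0, 0)
```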
For the model in the [Tutorial](#tutorial) below, this is way
overkill, even on the most humble computer, as training only uses two CPU cores
and a gigabyte of memory.  So in this case you could set
`train_{ncpu_cores,ngpu_cards,ngigabytes_memory}` to 2, 0, and 1, respectively.
Doing so would then permit you to train multiple models at once.  Alternatively,
if you have a GPU, you could set
`train_{ncpu_cores,ngpu_cards,ngigabytes_memory}` to 2, 1, and 1 for this
network architecture.  As it happens though, training is quicker *without* a GPU
for this model.  Moreover, were these latter settings used on a machine with
just one GPU, you could only train one model at a time.

Note that these settings don't actually limit the job to that amount of
resources; they just limit how many jobs are running simultaneously.  It is
important not to overburden your computer with tasks, so don't underestimate
the resources required, particularly memory consumption.

To make an accurate assessment for your particular workflow, use the `top` and
`nvidia-smi` commands on Unix, the Task Manager on Windows, or the Activity
Monitor on Macs to monitor jobs while they are running.

    $ top
    top - 09:36:18 up 25 days,  1:18,  0 users,  load average: 11.40, 12.46, 12.36
    Tasks: 252 total,   1 running, 247 sleeping,   0 stopped,   4 zombie
    %Cpu(s):  0.7 us,  0.9 sy, 87.9 ni, 10.4 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 32702004 total,  3726752 free,  2770128 used, 26205124 buff/cache
    KiB Swap: 16449532 total, 16174964 free,   274568 used. 29211496 avail Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    21124 arthurb   20   0 55.520g 2.142g 320792 S 131.2  6.9   1:38.17 python3
        1 root      20   0  191628   3004   1520 S   0.0  0.0   1:20.57 systemd
        2 root      20   0       0      0      0 S   0.0  0.0   0:00.33 kthreadd

The output above shows that a `python3` command, which is how a training session
appears, is currently using 131.2% of a CPU core (i.e. about 1.3 cores) and 6.9%
of the 32702004 KiB of total system memory (so about 2.15 GiB).

Use the `nvidia-smi` command to similarly monitor the GPU card.  The same
`python3` command as above is currently using 4946 MiB of GPU memory and 67% of
the GPU cores.  Use the `watch` command to receive repeated updates (i.e.
`watch nvidia-smi`).

    $ nvidia-smi
    Fri Jan 31 09:35:13 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 980 Ti  Off  | 00000000:03:00.0 Off |                  N/A |
    | 22%   65C    P2   150W / 250W |   4957MiB /  6083MiB |     67%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0     21124      C   /usr/bin/python3                            4946MiB |
    +-----------------------------------------------------------------------------+

### Another Workstation ###

Using a lab or departmental server, or perhaps a colleague's workstation
remotely, is easiest if you run SongExplorer on it directly and then view the
GUI in your own personal workstation's internet browser.  To do this, simply
`ssh` into the server and install SongExplorer as described above.

Alternatively, you can run the GUI code (in addition to viewing its output) on
your own personal workstation and batch compute jobs to the remote server.
This is easiest if there is a shared file system between the two computers.
The advantage here is that less compute-intensive jobs (e.g. freeze, accuracy)
can be run on your workstation.
In this case:

* Store all SongExplorer related files on the share, including the uncompressed
ZIP file or container image, "configuration.py", and all of your data.

* Set the SONGEXPLORER_BIN environment variable on the remote machine as follows:

On macOS (and Linux), put this definition in your .bashrc file:

    export SONGEXPLORER_BIN='PATH=<path-to-unzipped-executable>/songexplorer/bin:$PATH'

For MS Windows, the equivalent is, in a PowerShell terminal with administrator
privileges:

    > $tbpath="<path-to-unzipped-executable>"
    > [Environment]::SetEnvironmentVariable("SONGEXPLORER_BIN",
            $tbpath + ";" +
            $tbpath + "\bin\songexplorer\src;" +
            $tbpath + "\Library\mingw-w64\bin;" +
            $tbpath + "\Library\usr\bin;" +
            $tbpath + "\Library\bin;" +
            $tbpath + "\Scripts;" +
            $tbpath + "\bin;",
            [EnvironmentVariableTarget]::Machine)
    > [Environment]::SetEnvironmentVariable("Path",
            $env:SONGEXPLORER_BIN + $env:Path,
            [EnvironmentVariableTarget]::Machine)

* Make the remote and local file paths match by creating a symbolic link.  For
  example, if on a Mac you use SMB to mount as "/Volumes/MyLab" an NFS drive
  whose path is "/groups/MyLab", then execute on your Mac `mkdir -p /groups &&
  ln -s /Volumes/MyLab/ /groups/MyLab`.

* You might need an RSA key pair.

* You might need to use the ssh flags `-i /ssh/id_rsa -o "StrictHostKeyChecking
no"` in "configuration.py".

If you do not have a shared file system, the SongExplorer image and
configuration file must be separately installed on both computers, and
you'll need to do all of the compute jobs remotely.

Lastly, update "configuration.py" with the user name and IP
address of the server.
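The symbolic-link trick in the bullets above is easy to verify: once the link
exists, both spellings of the path should resolve to the same folder.  A
minimal check, using temporary directories as stand-ins for the real mount
(Unix only; creating symlinks on Windows may require elevated privileges):

```python
import os, tempfile

# Verify the path-matching symlink trick: after `ln -s /Volumes/MyLab
# /groups/MyLab`, the cluster-style path resolves to the local mount.
# Temporary directories stand in for the real paths here.
mount = tempfile.mkdtemp()                          # stands in for /Volumes/MyLab
link = os.path.join(tempfile.mkdtemp(), "MyLab")    # stands in for /groups/MyLab
os.symlink(mount, link)

# Both spellings of the path now resolve to the same folder on disk.
assert os.path.realpath(link) == os.path.realpath(mount)
print("local and remote paths resolve to the same folder")
```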
As when doing compute locally, SongExplorer
uses a job scheduler on the server to manage resources.  The per-task
resources used are the same as specified for the local machine in
`<task>_{ncpu_cores,ngpu_cards,ngigabytes_memory}`.

    $ grep -A2 \'server configuration.py
    # URL of the 'server' computer
    server_username=<your-username>
    server_ipaddr=<the-hostname-of-your-server>

### An On-Premise Cluster ###

Submitting jobs to a cluster is similar to using a remote workstation, so read
the above section first.  You might even want to try batching to another
workstation first, as it can be easier to debug problems there than on a
cluster.

You use your own workstation to view the GUI in a browser, and can either run
the GUI code locally or on the cluster.  With the former you have the option to
submit only a portion of the compute jobs to the cluster, whereas with the
latter they must all be performed by the cluster.  Running the GUI code on the
cluster also requires that the cluster be configured to permit hosting a web
page.  Moreover, if your cluster charges a use fee, you'll be charged even when
the GUI is sitting idle.

As before, it is easiest if there is a shared file system, and if so, all files
need to be on it, and the local and remote file paths must be the same or made
to be the same with links.  The environment variables and aliases must also be
the same.

You'll likely need an RSA key pair, possibly need special `ssh` flags, and
definitely need to specify the IP address of the head node and the
corresponding job submission command and its flags.
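Conceptually, dispatching a cluster job then amounts to pasting these
configured pieces in front of the compute command.  The sketch below is a
hypothetical illustration (the variable values mirror the examples in
"configuration.py"; the actual command SongExplorer assembles may differ):

```python
# Hypothetical illustration of how a cluster job line could be assembled from
# the configuration variables; the exact command SongExplorer issues may differ.
cluster_cmd = "bsub -Ne -Pmylab"                      # job submission command
cluster_logfile_flag = "-oo"                          # where to write the log
train_cluster_flags = "-n 2 -gpu 'num=1' -q gpu_rtx"  # per-task resource flags

def cluster_job(task_flags, logfile, payload):
    """Paste the configured pieces in front of the actual compute job."""
    return f"{cluster_cmd} {task_flags} {cluster_logfile_flag} {logfile} {payload}"

print(cluster_job(train_cluster_flags, "train1.log", "<the-training-command>"))
```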
The best person to ask for help here is
your system administrator.

    $ grep -A4 \'cluster configuration.py
    # specs of the 'cluster'
    cluster_username=<your-username>
    cluster_ipaddr=<the-hostname-of-your-cluster's-head-node>
    cluster_cmd="bsub -Ne -Pmylab"  # e.g.
    cluster_logfile_flag="-oo"  # e.g.

The syntax used to specify the resources required is unique to the particular
scheduler your cluster uses and how it is configured.  SongExplorer was
developed and tested using the Load Sharing Facility (LSF) from IBM.  To support
any cluster scheduler (e.g. SGE, PBS, Slurm, etc.), SongExplorer ignores
`<task>_{ncpu_cores,ngpu_cards,ngigabytes_memory}` when `<task>_where` is set to
"cluster" and uses the variables `<task>_cluster_flags` instead to provide
maximum flexibility.  Instead of specifying the cores, GPUs, and RAM needed
explicitly, you give it the flags that the job submission command uses to
allocate those same resources.

    $ grep -E train_cluster_flags configuration.py
    train_cluster_flags="-n 2 -gpu 'num=1' -q gpu_rtx"  # e.g.

LSF, and maybe other schedulers, does not honor bash's optional variable
assignments, so one cannot use SONGEXPLORER_BIN as described in [Another
Workstation](#another-workstation) to specify the path to the SongExplorer
executable.  If modifying your .bashrc file to export a modified PATH is not an
option, then specify the `-env` flag in your configuration.py:

    cluster_cmd="bsub -env 'all, \
        PATH=<path-to-unzipped-executable>/songexplorer/bin:<paths-to-everything-else>'"


# Starting SongExplorer #

Launch SongExplorer's GUI by right-clicking on OPEN-WITH-TERMINAL.sh (or
RUN-WITH-POWERSHELL.ps1 on MS Windows).
Like this on the command line:

    $ ./OPEN-WITH-TERMINAL.sh
    INFO: detected 12 local_ncpu_cores, 1 local_ngpu_cards, 31 local_ngigabytes_memory
    SongExplorer version: 27 May 2022 b0c7d5b5452c
    arthurb-ws2:8080
    2020-08-09 09:30:02,377 Starting Bokeh server version 2.0.2 (running on Tornado 6.0.4)
    2020-08-09 09:30:02,381 User authentication hooks NOT provided (default user enabled)
    2020-08-09 09:30:02,387 Bokeh app running at: http://localhost:8080/gui
    2020-08-09 09:30:02,387 Starting Bokeh server with process id: 1189
    2020-08-09 09:30:15,054 404 GET /favicon.ico (10.60.1.47) 1.15ms
    2020-08-09 09:30:15,054 WebSocket connection opened
    2020-08-09 09:30:15,055 ServerConnection created

The SongExplorer GUI should automatically open in a new tab of your default
internet browser.  If not, manually navigate to the URL on the line printed
to the terminal immediately below the version information.  In the output
above this is "arthurb-ws2:8080", which is my computer's name, but for you
it will be different.  If that doesn't work, try "http://localhost:8080/gui".

On the left you'll see a set of empty panels (two large squares side by side
and three wide rectangles underneath) in which the sound recordings are
displayed and annotated.  In the middle are buttons and text boxes used to
train the classifier and make predictions with it, as well as a file browser
and a large grey text box with the network architecture.  On the right is this
instruction manual for easy reference, and below it the contents of
"configuration.py".

The first time you use SongExplorer many of the parameters will need to be
manually specified.  Their values are saved into "songexplorer.state.yml" and
are subsequently automatically filled in with their previous values.


# Tutorial #

SongExplorer provides two main workflows.
A supervised approach, in which you
iteratively train a model to output the probabilities over time of specific
words of your choosing (yellow curve below), and an unsupervised approach, in
which the recordings are such that labels can be applied automatically, with
the output being how those sounds cluster after a model is trained to
distinguish between them (pink curve).  This tutorial describes the former,
supervised workflow.  It's best to read it in its entirety, but you could also
skip to [Unsupervised Methods](#unsupervised-methods).  The blue curve below is
described in [Discovering Novel Sounds](#discovering-novel-sounds).  [Video
tutorials](https://www.youtube.com/playlist?list=PLYXyXDkMwZip8x78RAyN6ee9NK42WBbKb)
are also available.

<img src='src/gui/static/workflows.svg' width=400px alt=''>

Let's walk through the steps needed to train a classifier completely from
scratch.

By default, recordings need to be monaural 16-bit little-endian PCM-encoded WAV
files.  They should all be sampled at the same rate, which can be anything.
For this tutorial we supply you with *Drosophila melanogaster* data sampled at
2500 Hz.  For other file formats, see the [Loading Data](#loading-data) section
in [Customizing with Plug-ins](#customizing-with-plug-ins).

First, let's get some data bundled with SongExplorer into your home directory.
Using your computer's file browser, create a new folder called
"groundtruth-data" with a subfolder inside it called "round1", and copy into
there the WAV recording called "PS_20130625111709_ch3.wav" from
"<path-to-unzipped-executable>/songexplorer/bin/songexplorer/data".
Like this on the command line:

    $ ls -1 <path-to-unzipped-executable>/songexplorer/bin/songexplorer/data
    20161207T102314_ch1.wav-annotated-person1.csv
    20161207T102314_ch1.wav
    20190122T093303a-7.wav-annotated-person2.csv
    20190122T093303a-7.wav-annotated-person3.csv
    20190122T093303a-7.wav
    20190122T132554a-14.wav-annotated-person2.csv
    20190122T132554a-14.wav-annotated-person3.csv
    20190122T132554a-14.wav
    Antigua_20110313095210_ch26.wav
    PS_20130625111709_ch3.wav-annotated-person1.csv
    PS_20130625111709_ch3.wav

    $ mkdir -p ~/groundtruth-data/round1

    $ cd <path-to-unzipped-executable>/songexplorer/bin/songexplorer/data
    $ cp PS_20130625111709_ch3.wav ~/groundtruth-data/round1

## Manually Annotating ##

Now we need to annotate intervals in time to create a ground truth data set on
which to train a classifier.

First, enter into the `Ground Truth` text box the full path to the
"groundtruth-data" folder you created above (leaving off the "round1"
sub-folder).  You can either type it in by hand, or navigate to this folder in
the `File Browser` and click on the `Ground Truth` button.  You'll see the wide
pull-down menu to the left labeled "recording" briefly turn orange, and below
that will appear a table showing how many sounds were detected.

Then check that `labels to use` and `kinds to use` are both blank, pull down on
the `recording` menu on the far left, and select the WAV file in which you just
detected sounds.  The waveform will be plotted in the panel below.  Pan and
zoom through the recording with the buttons labeled with arrows below and to
the left.  The `Play` button can be used to listen to the sound.

To record a manual annotation, first find a sound that is an unambiguous
example of a particular word.
Type the word's name into one of the text boxes to the right of the pan and zoom controls and hit return to activate the corresponding counter to its left.  Either double-click or click-and-drag in the bottom half of the context window to create a custom time span for a new annotation.  Annotations can be deleted by double-clicking the gray boxes or clicking on the "undo" button above.

For this tutorial, choose the words "mel-pulse", "mel-sine", "ambient", and "other".  We use the syntax "A-B" here, where A is the species (mel being short for *D. melanogaster*) and B is the song type.  That syntax is not required though-- the labels can be anything of your choosing.  The GUI does have a feature, however, to split labels at the hyphen and display groups of words that share a common prefix or suffix.

## Training a Classifier ##

Once you have a few tens of examples for each word, it's time to train a classifier and make some predictions.  First, confirm that the annotations you just made were saved into an "-annotated.csv" file in the "groundtruth-data/round1" folder.

    $ tree groundtruth-data
    groundtruth-data
    └── round1
        ├── PS_20130625111709_ch3.wav-annotated-<timestamp>.csv
        ├── PS_20130625111709_ch3.wav-detected.csv
        ├── PS_20130625111709_ch3.wav-detect.log
        └── PS_20130625111709_ch3.wav

    $ tail -5 groundtruth-data/round1/PS_20130625111709_ch3.wav-annotated-<timestamp>.csv
    PS_20130625111709_ch3.wav,470151,470719,annotated,mel-sine
    PS_20130625111709_ch3.wav,471673,471673,annotated,mel-pulse
    PS_20130625111709_ch3.wav,471752,471752,annotated,mel-pulse
    PS_20130625111709_ch3.wav,471839,471839,annotated,mel-pulse
    PS_20130625111709_ch3.wav,492342,498579,annotated,ambient

Click on the `Make Predictions` button and then the `Train` button.  All of the parameters below that are *not* used in this step will be greyed out and disabled.
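As a sanity check, the per-label annotation counts can also be tallied outside the GUI.  A minimal sketch, assuming only the five-column CSV format shown above (here fed a few in-memory rows in place of a real file):

```python
# Tally annotations per label in a SongExplorer "-annotated" CSV.
# The columns are: WAV filename, start tic, stop tic, kind, label.
import csv
from collections import Counter
from io import StringIO

# A few rows in the format shown above, standing in for a real file.
sample = StringIO(
    "PS_20130625111709_ch3.wav,470151,470719,annotated,mel-sine\n"
    "PS_20130625111709_ch3.wav,471673,471673,annotated,mel-pulse\n"
    "PS_20130625111709_ch3.wav,492342,498579,annotated,ambient\n")

counts = Counter(row[4] for row in csv.reader(sample))
print(counts)
```

To run it on a real ground-truth folder, open each "-annotated-\*.csv" file and update the same `Counter`.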
If all of the required parameters are filled in, the `DoIt!` button in the upper right will in addition be enabled and turn red.

Using the `file browser`, set `Logs Folder` to a directory in which to put the trained model (e.g. "trained-classifier1") and confirm that `Ground Truth` is still set to the parent of the "round1" folder containing the WAV and CSV files (i.e. "groundtruth-data" in this case).  One hundred steps suffices for this amount of ground truth.  So that we can accurately monitor the progress, withhold 40% of the annotations to validate on, and do so every 10 steps.  Enter these values into the `# steps`, `validate %`, and `validate period` variables.  Check that the `labels to use` variable is set to "mel-pulse,mel-sine,ambient,other", and `kinds to use` is "annotated".  Note that the total number of annotations must exceed the size of the mini-batches, which is specified by the `mini-batch` variable.  The rest of the fields, most of which specify the network architecture, are filled in with default values the first time you ever use SongExplorer.  Any changes you make to them, along with all of the other text fields, are saved to a file named "songexplorer.state.yml" in the directory specified by "state_dir" in "configuration.py".

Click on the red `DoIt!` button to start training a classifier.  It will turn orange while the job is being asynchronously dispatched, and then back to grey.  "TRAIN trained-classifier1 (<jobid>)" will appear in the status bar.  Its font will initially be grey to indicate that it is pending, then turn black when it is running, and finally either blue if it successfully finished or red if it failed.

Output into the log directory are "train1.log", "train_1r.log", and "train_1r/".
The former two files contain error transcripts should any problems arise, and the latter folder contains checkpoint files prefixed with "ckpt-" which save the weights of the neural network at regular intervals.

With small data sets the network should take just a minute or so to train.  As your example set grows, you might want to monitor the training progress as it goes:

    $ watch tail trained-classifier1/train_1r.log
    Every 2.0s: tail trained-classifier1/train_1r.log      Fri Jun 3 14:37:31 2022

    39.697532,9,75.8,0.947476
    43.414184,10,84.4,0.871244
    Saving to "/home/arthurb/songexplorer/trained-classifier1/train_1r/ckpt-10"
    Confusion Matrix:
     ['mel-pulse', 'mel-sine', 'ambient']
     [[26  9  9]
     [ 0  4  0]
     [ 0  0  4]]
    45.067488,10,65.4 Validation 
    48.786851,11,79.7,0.811077

It is common for the accuracy, as measured on the withheld data and reported as "Validation accuracy" in the log file above, to be worse than the training accuracy.  If so, it is an indication that the classifier does not generalize well at that point.  With more training steps and more ground-truth data, though, the validation accuracy should rise to well above chance.

## Quantifying Accuracy ##

Measure the classifier's performance on the withheld annotations using the `Accuracy` button.  For the purposes of this tutorial, leave the `P/Rs` textbox (short for Precision / Recall ratio) set to the default value of "1.0" so that false positives and false negatives are equally weighted.
[In the future, if, say, minimizing false positives is important in your experiments, and you are tolerant of a few more false negatives, you can set it to, say, "10" (or "0.1" if vice versa).]

Output are the following charts and tables in the logs folder and the `train_1r` subdirectory therein:

* "train-validation-loss.pdf" shows the loss value and training and validation recalls as a function of the number of training steps, wall-clock time, and epochs.  Should the curves not quite plateau, choose a checkpoint to `restore from`, increase `# steps`, and train some more.  If you've changed any of the parameters, you'll need to first reset them as they were, which is made easy by selecting one of the original log files and pressing the `Copy` button.

* "confusion-matrix.pdf" shows the confusion matrix of the most accurate checkpoint.  Each annotation is placed in this two-dimensional grid according to the label it was manually assigned and the label it was automatically predicted to be.  For a perfect classifier this matrix would be diagonal-- that is, the upper left to lower right boxes would be bright yellow and all others dark purple.

  The checkpoint with the highest F1 score-- the product of the precision and recall divided by their sum, times two-- is taken to be the "most accurate checkpoint".  In this particular case, there is only one model, but in [Measuring Generalization](#measuring-generalization) and [Searching Hyperparameters](#searching-hyperparameters) we'll train multiple models.  In those cases, the *sum* of the confusion matrices with the best F1s for each model is shown here.  Multiple models are also created if `# replicates` is greater than 1.

  If there are ten or fewer labels, the squares are large enough to annotate them in the middle with the total number of annotations falling into each square.
The number in the upper right triangle of each square is the number of annotations in that square divided by the number of annotations in its row.  For boxes along the diagonal it is the recall for that label-- the percentage of true positives among all real events (true positives plus false negatives).  Similarly, in the lower left is the precision-- the percentage of true positives among all (both true and false) positives.  It is calculated by dividing the number of annotations in the square by the sum of the corresponding column.  In the title are the overall precision and recall, which are calculated as the average of the lower left and upper right numbers along the diagonal.

* In the left panel of "precision-recall.pdf" are the precision and recall for each label plotted separately.  These are simply the values in the corners of the boxes along the diagonal of the confusion matrix.  For a perfect classifier they would all be 100.  In this case all of the circles have black perimeters because this logs folder contains just a single trained model, but when there are multiple models in this logs folder each will have its own point here calculated from its individual confusion matrix.  The model-specific circles will be smaller and have white perimeters, with the larger circles outlined in black being the average across models.

  Similarly, in the right panel of "precision-recall.pdf" are the precision and recall for each model (as opposed to label) plotted separately.  Small circles with white perimeters show the label-specific accuracies and larger circles with black perimeters show the average across labels for each model.

  The legend to the right indicates the best checkpoint for each model.
Refer here when deciding which model to freeze for making predictions.

* "P-R-F1-label.pdf" plots validation precision, recall, and the F1 score over time in the top, middle, and bottom rows respectively, with a separate column for each label.  Check here to make sure that the accuracy of each label has converged.  If there is more than one model in this logs folder, the thin colored lines are the accuracies for each model, with the thick black line being the average across models.  "P-R-F1-model.pdf" is similar, but the columns are the models and the thin colored lines the labels.  Averages across all labels and models are plotted in "P-R-F1-average.pdf".

* "PvR.pdf" plots, separately for each label and model, the trajectory of the validation precision-versus-recall curve over the number of training steps.  The leftmost column and topmost row show the averages across models and labels, respectively, with the upper left plot showing the average across all models and labels.

* "train_1r/confusion-matrix.ckpt-\*.csv" shows the confusion matrices for each checkpoint.

* "train_1r/thresholds.ckpt-\*.csv" lists the label-specific probability thresholds that are used to achieve the precision-recall ratio specified in the `P/Rs` textbox.  One of these files is used when creating ethograms (see [Making Predictions](#making-predictions)).

* "train_1r/precision-recall.ckpt-\*.pdf" and "train_1r/sensitivity-specificity.ckpt-\*.pdf" show how the ratio of false positives to false negatives changes as the threshold used to call an event changes.  The areas underneath these curves are widely-cited metrics of performance.  The actual threshold used is indicated in red.
Higher thresholds result in more false negatives, and so would be further up and to the left on this curve.

* "train_1r/probability-density.ckpt-\*.pdf" shows, separately for each label, histograms of the values of the classifier's output taps across all of that label's annotations.  The difference between a given label's probability distribution and that of the second most probable label can be used as a measure of the classifier's confidence.  The dashed vertical black line is the threshold used for this label.

* The CSV files in the "train_1r/predictions.ckpt-\*" directory list the specific annotations in the withheld validation set which were misclassified (plus those that were correct).  The WAV files and time stamps therein can be used to look for patterns in the raw data (see [Examining Errors](#examining-errors)).

## Making Predictions ##

For the next round of manual annotations, we're going to have this newly trained classifier find sounds for us.  And we're going to do so with a different recording so that the classifier learns to be insensitive to experimental conditions.

First let's get some more data bundled with SongExplorer into your home directory.  Make a new subfolder in "groundtruth-data" called "round2".  Then copy "20161207T102314_ch1.wav" from "<path-to-unzipped-executable>/songexplorer/bin/songexplorer/data" into there.  Like this on the command line:

    $ mkdir groundtruth-data/round2

    $ cd <path-to-unzipped-executable>/songexplorer/bin/songexplorer/data
    $ cp 20161207T102314_ch1.wav ~/groundtruth-data/round2

Use the `Freeze` button to save the classifier's neural network graph structure and weight parameters into the file format that TensorFlow needs for inference.  You'll need to tell it which model to use by selecting the most accurate checkpoint file in the classifier's log files with the `File Browser` (i.e.
one\nof \"trained-classifier1/train_1r/ckpt-100.{index,data\\*}\" in this case).\nOutput into the log files directory are \"freeze.ckpt-\\*.log\" and\n\"frozen-graph.ckpt-\\*.log\" files for errors, and a \"frozen-graph.ckpt-\\*.pb/\"\nfolder containing the binary data.  This latter PB folder, or the\n\"saved_model.pb\" file therein, can in future be chosen as the model instead of\na checkpoint file.\n\nNow use the `Classify` button to generate probabilities over time for each\nannotated label.  Specify which recordings using the `File Browser` and the `WAV\nFiles` button.  Note that while the \"Checkpoint File\" button changed to \"PB\nFile\", you can leave the text box as is;  all SongExplorer needs is a filename from\nwhich it can parse \"ckpt-\\*\".  The probabilities for each label are stored in\nseparate WAV files, with the label appended as a suffix:\n\n    $ ls groundtruth-data/round2/\n    20161207T102314_ch1.wav-ambient.wav    20161207T102314_ch1.wav-mel-sine.wav\n    20161207T102314_ch1.wav-classify.log   20161207T102314_ch1.wav-other.wav\n    20161207T102314_ch1.wav-mel-pulse.wav  20161207T102314_ch1.wav\n\nDiscretize these probabilities using thresholds based on a set of\nprecision-recall ratios using the `Ethogram` button.  Choose one of the\n\"thresholds.ckpt-\\*.csv\" files in the log files folder using the `File Browser`.\nThese are created by the `Accuracy` button and the values therein are controlled\nby the `P/Rs` variable at the time you quantified the accuracy.  For convenience\nyou can also just leave this text box as it was when freezing or classifying;\nall SongExplorer needs is a filename in the logs folder from which in can parse\n\"ckpt-\\*\".  You'll also need to specify which \".wav\" files to threshold using\nthe `WAV Files` button.  
Again, for convenience, you can leave this as it was when classifying, as what is needed here is the ".wav" file of the raw recording, not those containing the label probabilities.

    $ ls -t1 groundtruth-data/round2/ | head -2
    20161207T102314_ch1.wav-ethogram.log
    20161207T102314_ch1.wav-predicted-1.0pr.csv

    $ head -5 groundtruth-data/round2/20161207T102314_ch1.wav-predicted-1.0pr.csv 
    20161207T102314_ch1.wav,19976,20008,predicted,mel-pulse
    20161207T102314_ch1.wav,20072,20152,predicted,mel-sine
    20161207T102314_ch1.wav,20176,20232,predicted,mel-pulse
    20161207T102314_ch1.wav,20256,20336,predicted,mel-sine
    20161207T102314_ch1.wav,20360,20416,predicted,mel-pulse

The resulting CSV files are in the same format as those generated when we manually annotated sounds earlier using the GUI.  Note that the fourth column distinguishes whether these sounds were annotated or predicted.

## Fixing Mistakes ##

Now that we have the predicted probability of each label over time, we can look for sounds that SongExplorer misclassified and add them to the ground truth with the correct annotations.

First, set `kinds to use` to "predicted" and confirm that `labels to use` is still "mel-pulse,mel-sine,ambient,other".  Then, choose the recording for which you just made predictions in the recordings pull-down menu.  Below the waveform you'll now see a stacked bar chart of the probability of each label.  Hover over this chart to see a tool-tip mapping the color to a label.  Pan and zoom through the recordings to find intervals for which the label with the highest probability is wrong.  Double-click or click-and-drag the correct label in the lower half of the waveform as described in [Manually Annotating](#manually-annotating).  These are the most important mistakes to correct.
While doing so, also keep an eye out for intervals in which all labels are similarly probable, which indicates that SongExplorer is confused.  Annotate these cases with the correct label as well.  All other times-- those for which the highest probability was correct-- should be ignored.

The most effective annotations are those that correct the classifier's mistakes, so don't spend much time, if any, annotating what it got right.

## Minimizing Annotation Effort ##

From here, we just keep alternating between training a new classifier and fixing its mistakes, using a new recording for each iteration, until mistakes become sufficiently rare.

Each time you train a new classifier, all of the existing "predicted.csv", "missed.csv", and word-probability WAV files are moved to an "oldfiles" sub-folder as they will be out of date.  You might want to occasionally delete these folders to conserve disk space:

    $ rm -r groundtruth-data/*/oldfiles*

Ideally a new model would be trained after each new annotation is made, so that subsequent time is not spent correcting a prediction (or lack thereof) that would no longer be made in error.  Training a classifier takes time though, so a balance must be struck with how quickly you alternate between annotating and training.

Since there are more annotations each time you train, use a proportionately smaller percentage of them for validation and a proportionately larger number of training steps.  You don't need more than ten-ish annotations for each word to confirm that the learning curves converge, and a hundred-ish suffice to quantify accuracy.
Since the learning curves generally don't converge until the entire data set has been sampled many times over, set `# steps` to be several fold greater than the number of annotations (shown in the table near the labels) divided by the `mini-batch` size, and check that it actually converges with the "train-validation-loss.pdf", "P-R-F1\*.pdf", and "PvR.pdf" figures generated by the `Accuracy` button.  If the accuracy converges before an entire epoch has been trained upon, use a smaller `learning rate`.

As the wall-clock time spent training is generally shorter with larger mini-batches, set `mini-batch` as high as the memory in your GPU will permit.  Multiples of 32 are generally faster.  The caveat here is that exceedingly large mini-batches can reduce accuracy, so make sure to compare with smaller ones.

One should make an effort to choose a recording at each step that is most different from the ones trained upon so far.  Doing so will produce a classifier that generalizes better.

Once a qualitatively acceptable number of errors in the ethograms is achieved, quantitatively measure your model's ability to generalize by leaving entire recordings out for validation (see [Measuring Generalization](#measuring-generalization)).  Use cross-validation to maximize accuracy by fine-tuning the hyperparameters (see [Searching Hyperparameters](#searching-hyperparameters)).  Then train a single model with nearly all of your annotations for use in your experiments.  Optionally, use `# replicates` to train multiple models with different batch orderings and initial weights to measure the variance.  These replicate models can also be combined into an ensemble model (see [Ensemble Models](#ensemble-models)) for even greater accuracy.
Finally, report accuracy on an entirely separate set of densely-annotated test data (see [Testing Densely](#testing-densely)).

# Advanced Usage #

## Detecting Rare Sounds ##

If your recordings are mostly ambient, panning and zooming through them to find sounds of interest to annotate can be quite laborious.  To make this easier, SongExplorer can threshold the recording to demarcate intervals in time which exceed a certain volume.

Click on the `Label Sounds` button and then `Detect`.  In the `File Browser`, navigate to a WAV file of interest and click on the `WAV Files` button.  Set the parameters as described below and then click `DoIt!`.

The eight parameters have default values that are suitable for the data in the tutorial above.  If it's not finding sounds for your data, try making the first number in `time σ` smaller and/or the first number in `freq ρ` bigger.  If rather it's too sensitive, do the opposite.  Conversely, if it's labelling ambient as a sound, make the second number in `time σ` bigger and the second number in `freq ρ` smaller.  `time smooth` and `freq smooth` will fill in small gaps between detected sounds and cull detected sounds that are too short.
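The effect of the two smoothing parameters can be illustrated with a toy sketch on lists of detected intervals (the interval values and parameter names below are made up for illustration; SongExplorer's actual implementation differs):

```python
# Toy illustration of smoothing: fill gaps shorter than min_gap tics
# between detected intervals, then cull intervals shorter than min_len.
def smooth(intervals, min_gap, min_len):
    merged = []
    for start, stop in intervals:
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], stop)   # fill the small gap
        else:
            merged.append((start, stop))
    return [(a, b) for a, b in merged if b - a >= min_len]

detected = [(100, 120), (122, 160), (300, 303)]
print(smooth(detected, min_gap=5, min_len=10))  # [(100, 160)]
```

Here the 2-tic gap between the first two intervals is filled, and the 3-tic-long third interval is culled.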
For further details, see the comments at the top of "src/detect-plugins/time-freq-threshold.py".

[Customizing with Plug-ins](#customizing-with-plug-ins) describes how to use arbitrary custom code of your choosing to detect events should this default algorithm not suit your data.

The result is a file of comma-separated values with the start and stop times (in tics) of sounds which exceeded a threshold in either the time or frequency domain, plus intervals which did not exceed either.

    $ grep -m 3 time groundtruth-data/round1/PS_20130625111709_ch3.wav-detected.csv
    PS_20130625111709_ch3.wav,2251,2252,detected,time
    PS_20130625111709_ch3.wav,2314,2316,detected,time
    PS_20130625111709_ch3.wav,2404,2405,detected,time

    $ grep -m 3 frequency groundtruth-data/round1/PS_20130625111709_ch3.wav-detected.csv
    PS_20130625111709_ch3.wav,113872,114032,detected,frequency
    PS_20130625111709_ch3.wav,158224,158672,detected,frequency
    PS_20130625111709_ch3.wav,182864,182960,detected,frequency

    $ grep -m 3 neither groundtruth-data/round1/PS_20130625111709_ch3.wav-detected.csv
    PS_20130625111709_ch3.wav,388,795,detected,neither
    PS_20130625111709_ch3.wav,813,829,detected,neither
    PS_20130625111709_ch3.wav,868,2201,detected,neither

You can jump from one such detected event to the next using the arrow buttons in the lower left that have an "L".  Check that the `kinds to use` and `labels to use` text boxes are set to "detected" and "time,frequency,neither", respectively.  This should have been done automatically when pressing the `Label Sounds` button above.

## Clustering Annotations ##

Once you have a set of annotated sounds-- whether manually annotated, predicted, or detected-- clustering them based on the hidden states of a trained classifier is a useful technique to look for patterns.
It also provides another quick alternative to panning and zooming, as sounds within each cluster can be clicked on to jump directly to that time in the recording.

First, use the `Activations` button to save the input to the neural network, as well as its hidden state activations and output logits, by classifying sounds with the trained network.  Select which sounds to classify using `kinds to use` and `labels to use`.  The time and amount of memory this takes depend directly on the number and dimensionality of the sounds.  To limit the problem to a manageable size one can use `max sounds` to randomly choose a subset of samples to cluster.  So that words with few samples are not obscured by those with many, one can randomly subsample the latter by setting `equalize ratio` to a small integer.  Output are three files in the `Ground Truth` directory: "activations.log", "activations-samples.log", and "activations.npz".  The two ending in ".log" report any errors, and the ".npz" file contains the actual data in binary format.

Now reduce the dimensionality of the hidden state activations to either two or three dimensions with the `Cluster` button.  By default, SongExplorer uses the UMAP algorithm ([McInnes, Healy, and Melville (2018)](https://arxiv.org/abs/1802.03426)), but tSNE and PCA can be used instead via a plugin (see [Clustering Algorithm](#clustering-algorithm)).  For now, leave the `neighbors` and `distance` parameters set to their default values.  A description of how they change the resulting clusters can be found in the aforementioned article.  Also leave the `PCA fraction` parameter at its default.  In the future, if you find clustering slow for larger data sets, UMAP can be preceded by PCA, and the fraction of coefficients that are retained is specified using `PCA fraction`.  Lastly, choose to cluster just the last hidden layer using the `layers` multi-select box.
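To sketch the idea behind the `PCA fraction` speed-up: the high-dimensional activations are projected onto their leading principal components before the slower clustering step runs.  A minimal numpy illustration (random data stands in for the contents of "activations.npz", and the 0.95 retained-variance fraction is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(500, 64))  # 500 sounds x 64 hidden units

# Center, then keep just enough leading principal components to retain
# a given fraction of the variance, before running e.g. UMAP on them.
centered = activations - activations.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
variance = s**2 / np.sum(s**2)
ncomps = int(np.searchsorted(np.cumsum(variance), 0.95)) + 1
reduced = centered @ vt[:ncomps].T
print(reduced.shape)  # (500, ncomps), with ncomps <= 64
```

UMAP (or tSNE) would then be run on `reduced` instead of the full-width activations.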
Output are two or three files in the `Ground Truth` directory: "cluster.log" contains any errors, "cluster.npz" contains binary data, and "cluster-pca.pdf" shows the results of the principal components analysis (PCA) if one was performed.

Finally, click on the `Visualize` button to render the clusters in the left-most panel.  Adjust the size and transparency of the markers using the `Dot Size` and `Dot Alpha` sliders respectively.  There should be some structure to the clusters even at this early stage of annotation.

To browse through your recordings, click on one of the more dense areas and a fuchsia circle (or sphere if the clustering was done in 3D) will appear.  In the right panel are now displayed snippets of sounds which are within that circle.  The size of the circle can be adjusted with the `Circle Radius` slider and the number of snippets displayed with `gui_snippet_n{x,y}` in "configuration.py".  The snippets should exhibit some similarity to one another since they are neighbors in the clustered space.  They will each be labeled "mel-pulse", "mel-sine", or "ambient" to indicate which threshold criterion they passed.  The color is the scale bar-- yellow is loud and purple is quiet.  Clicking on a snippet will show it in greater temporal context in the wide context panel below.

## Correcting False Alarms ##

If the precision of a classifier is low, that means it is predicting intervals in time as having a sound of interest when it shouldn't.  To home in on just this kind of mistake, you can set `kinds to use` to "predicted" and use the arrow buttons with an "L" to jump from one prediction to the next, correcting as you go.  Alternatively, you can cluster all of these predictions to look at them in aggregate.

First, click on the `Fix False Positives` button and double check that `kinds to use` is auto-populated with "annotated,predicted".
Then cluster as described in [Clustering Annotations](#clustering-annotations).

Select `predicted` and `ambient` from the `kind` and `no hyphen` pull-down menus, respectively, and then click on a dense part of the cluster plot.  Were the classifier perfect, all the snippets now displayed would look like background noise.  Click on one that doesn't.  To manually annotate it, simply double-click on the snippet to create a new annotation exactly where the gray box in the upper half of the context window is.  One can also double-click on this grey box itself.  In either case, it will be extended to the bottom half and your chosen label will be applied.  Double-click the snippet or either the upper or lower grey box again to change your mind.  If the temporal extent of the upper gray box is not quite right, create a new annotation by either double-clicking or click-and-dragging in the bottom half of the context window instead.  Similarly select `mel-` and `-pulse` from the `species` and `word` pull-down menus and correct any mistakes, and then `mel-` and `-sine`.

Keep in mind that the only words which show up in the clusters are those that exceed the chosen threshold.  Any mistakes you find in the snippets are hence strictly false positives.

## Correcting Misses ##

Conversely, if the recall of a classifier is low, it is *not* predicting intervals to have a sound when it should have.  One way to find this kind of mistake is to navigate to random predictions and look in the surrounding context for sounds that have not been predicted.  This method is cumbersome in that you have to scan through the recording.  A better way is to home in directly on detected sounds that don't exceed the probability threshold.

First click on the `Fix False Negatives` button.  Then detect sounds in a recording that you have classified using the `Detect` button as described in [Detecting Rare Sounds](#detecting-rare-sounds).
Then create a list of the subset of these sounds which were not assigned a label using the `Misses` button.  For the latter, you'll need to specify both the detected and predicted CSV files with the `File Browser` and the `WAV Files` button.  The result is another CSV file, this time ending in "missed.csv":

    $ head -5 groundtruth-data/round2/20161207T102314_ch1.wav-missed.csv 
    20161207T102314_ch1.wav,12849,13367,missed,other
    20161207T102314_ch1.wav,13425,13727,missed,other
    20161207T102314_ch1.wav,16105,18743,missed,other
    20161207T102314_ch1.wav,18817,18848,missed,other
    20161207T102314_ch1.wav,19360,19936,missed,other

Now visualize the hidden state activations: double check that the `label types` variable was auto-populated with "annotated,missed" and then use the `Activations`, `Cluster`, and `Visualize` buttons in turn.

Examine the false negatives by selecting `missed` in the `kind` pull-down menu and clicking on a dense cluster.  Were the classifier perfect, none of the snippets would be an unambiguous example of any of the labels you trained upon earlier.  Annotate any that are, and add new label types for sound events which fall outside the current categories.

## Double Checking Annotations ##

If a mistake is made while annotating-- say the wrong label is applied to a particular time interval-- and you notice this immediately, use the `Undo` button to correct it.

Sometimes though, mistakes might slip into the ground truth and a model is trained with them.  These latter mistakes can be corrected in a fashion similar to correcting false positives and false negatives.  Simply cluster the hidden state activations using the `Activations`, `Cluster`, and `Visualize` buttons as before, making sure that "annotated" is in `kinds to use`.  Then click on `annotated` in the `kind` pull-down menu and select one of your labels (e.g. `mel-` in `species` and `-pulse` in `word`).
Scan through the visualized clusters by clicking on several points and looking at the snippets therein.  If you find an error, simply choose the correct label in one of the text boxes below and then double-click on either the snippet itself or the corresponding gray box in the upper half of the wide context window.  If you want to remove the annotation entirely, choose a label with an empty text box and double-click.  In both cases, the entry in the original "annotated.csv" file is removed, and in the former case a new entry is created in the current "annotated.csv" file.  Should you make a mistake while correcting a mistake, simply `Undo` it, or double-click it again.  In this case, the original CSV entry remains deleted and the new one is modified in the current "annotated.csv" file.

## Measuring Generalization ##

Up to this point we have validated on a small portion of each recording.  Once you have annotated many recordings though, it is good to set aside entire recordings to validate on.  In this way we measure the classifier's ability to extrapolate to different microphones, individuals, or whatever other characteristics are unique to the withheld recordings.

To train one classifier with a single recording or set of recordings withheld for validation, first click on `Generalize` and then `Omit All`.  Use the `File Browser` to select either (1) specific WAV file(s), (2) a text file containing a list of WAV file(s) (either comma-separated or one per line), or (3) a directory with WAV and annotated CSV files.  Finally press the `Validation Files` button and `DoIt!`.

To train multiple classifiers, each of which withholds a single recording in a set you specify, click on `Omit One`.  Select the set as described above for `Omit All`.
The `DoIt!` button will then iteratively launch a job for each WAV
file that has been selected, storing the results in the same `Logs Folder` but in
separate files and subdirectories suffixed with the letter "w".

To train multiple classifiers, each of which withholds a portion of the
recordings in the set you specify, click on `Omit Some`.  Select the set as
before, and specify the number of partitions with `k-fold`.  For example, if
`k-fold` is "4", then four models will be trained, each with a different fourth
of the recordings in the chosen set withheld.

Of course, training multiple classifiers is quickest when done simultaneously
instead of sequentially.  If your model is small, you might be able to fit
several on a single GPU (see the `models_per_job` variable in
"configuration.py").  Otherwise, you'll need a machine with multiple GPUs,
access to a cluster, or patience.

A simple jitter plot of the accuracies on withheld recordings is included in
the output of the `Accuracy` button (right panel of "accuracy.pdf").  It will
likely be worse than that of a model trained on a portion of each recording.  If so,
label more data, or try modifying the hyperparameters ([Searching
Hyperparameters](#searching-hyperparameters)).

## Searching Hyperparameters ##

Achieving high accuracy is not just about annotating lots of data; it also
depends on choosing the right model architecture.  While there is a lot of art
in finding the best one, an intuition can be gained by
[systematically searching](https://github.com/google-research/tuning_playbook)
the space of hyperparameters.  SongExplorer by default uses convolutional neural
networks, and there are many free parameters by which to tune their architecture.
You configure them by editing the variables itemized below, and then use
cross-validation to compare different choices.
[Customizing with
Plug-ins](#customizing-with-plug-ins) describes how to use arbitrary custom
networks of your choosing should convolutions not suit your task.

* `context` is the temporal duration, in milliseconds, that the classifier
inputs.

* `shift by` is the asymmetry, in milliseconds, of `context` with respect to the
point in time that is annotated or being classified.  `shift by` divided by
`stride` (see below) should be an integer.  For positive values, the duration of
the context preceding the annotation is longer than that succeeding it.

* `optimizer` can be one of [Adadelta](https://arxiv.org/abs/1212.5701),
[AdaGrad](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf),
[Adam](https://arxiv.org/abs/1412.6980),
[Adamax](https://arxiv.org/abs/1412.6980),
[Ftrl](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf),
[Nadam](http://cs229.stanford.edu/proj2015/054_report.pdf),
[RMSProp](https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf),
or stochastic gradient descent (SGD).

* `learning rate` specifies the fraction of the gradient by which to change each weight
at each training step.  Set it such that the training accuracy curve in
"train-validation-loss.pdf" does not saturate until after at least one epoch of ground
truth has been trained upon.

The above apply to all architectures.
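
Several of these settings are linked by divisibility constraints.  A quick
sanity check can catch mistakes before training; the values below are
hypothetical, not recommendations (`window` and `stride` are described in the
convolutional bullets that follow):

```python
# Sanity checks for a few of the hyperparameter constraints described in
# this section.  All values here are hypothetical examples.
audio_tic_rate = 2500   # tics per second, from configuration.py
shift_by_ms = 1.6       # `shift by`
stride_ms   = 1.6       # `stride`
window_ms   = 6.4       # `window`

# `shift by` divided by `stride` should be an integer
assert (shift_by_ms / stride_ms) % 1 == 0

# `window` in tics should round down to a power of two
window_tics = int(window_ms / 1000 * audio_tic_rate)
assert window_tics > 0 and window_tics & (window_tics - 1) == 0
```
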
Specific to the default convolutional
network architecture plugin are:

* `representation` specifies whether to use the raw waveform directly, to make
a spectrogram of the waveform to input to the neural network, or to use a
mel-frequency cepstrum (see [Davis and Mermelstein 1980;
IEEE](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462.5073&rep=rep1&type=pdf)).
Waveforms do not make any assumptions about the data, and so can learn
arbitrary features that spectrograms and cepstrums might be blind to, but they need
more annotations to make the training converge.

* `window` is the length of the temporal slices, in milliseconds, that
constitute the spectrogram.  `window` / 1000 * `audio_tic_rate` is the
window length in tics and should round down to a power of two.

* `stride` is the time, in milliseconds, by which the `window`s in the
spectrogram are shifted.  1000 / `stride` must be an integer multiple of
the downsampling rate achieved by `stride after`.

* `range (Hz)` controls the frequency band of the spectrogram to use.
If left blank, the entire range from 0 to the Nyquist frequency is used
by the neural network.  If desired, specify a sub-band in Hertz with the
low and high frequency bounds separated by a hyphen.

* `mel & DCT` controls the frequency resolution of the mel-frequency cepstrum.
The first number specifies how many frequencies to use when resampling the
linear-frequency spectrogram into mel-frequency space, and the second is
how many of the lowest frequency coefficients to keep after the subsequent
discrete cosine transform.
The second number should always be less than
or equal to the first, and neither should greatly exceed the number of
frequencies in the original spectrogram, which is one plus half of the
`window` length in tics.

* `# conv layers` is an integer specifying the number of convolutional layers.
`dense layers` is a comma-separated list of integers specifying the number
of hidden units in a sequence of optional dense layers after the convolutions.
Leave it blank to not add any dense layers.

* `kernels` specifies the size of the convolutional kernels in the form
"T1xF,T2".  When `representation` is "waveform", the values before the comma
are ignored and 1D convolutions of width T2 are repeatedly applied until
the remaining unpadded tensor length is less than T2 or `# conv layers`
has been reached.  For "spectrogram" and "mel-cepstrum", the string before
the comma is the size of the 2D convolutions in time (T1) and frequency (F)
that are repeatedly used for each layer until the remaining unpadded tensor
size is smaller than this kernel in one or both dimensions.  Then full-height
(i.e. pan-frequency) 1D convolutions are repeatedly applied whose width in
time is T2.  No further convolutional layers are added once `# conv layers`
is reached, or once the width in time becomes less than the second number in
`kernels`.

* `# features` is the number of feature maps to use at each of the
corresponding stages in `kernels`.  See [LeCun *et al* (1989; Neural
Computation)](http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf).

* `stride time` and `stride freq` specify the layers, starting from zero, at
which to stride the convolutional kernels by two in the time and frequency
axes, respectively.  The format is a comma-separated list of integers,
possibly prefixed with an inequality, or a hyphenated range.
For example,
"1,2,3,5,6,7", "1-3,5-7", and "<=3,>=5" are all valid and equivalent.
Striding in time downsamples the output tic rate.

* `dilate time` and `dilate freq` similarly specify the layers at
which to dilate the convolutional kernels.  See [Yu and Koltun (2016;
arXiv)](https://arxiv.org/pdf/1511.07122.pdf).  Striding and dilation
cannot both be done in the same layer.

* If `pool kind` is other than none, a maximum or average pooling layer is
added before the final dense layer(s) whose size and stride are identical
and specified by `pool size`.  The latter is of the form "T,F" where T is
the size and stride in time and F that in frequency.

* `connection` specifies whether to use identity bypasses, which can help
models with many layers converge.  See [He, Zhang, Ren, and Sun (2015;
arXiv)](https://arxiv.org/abs/1512.03385).

* `dropout %` is the percentage of hidden units to omit on each forward pass
during training.  See [Srivastava, Hinton, *et al* (2014;
J. Machine Learning Res.)](http://jmlr.org/papers/v15/srivastava14a.html).
`dropout kind` specifies whether to omit entire feature maps at a time,
or to omit individual neurons separately.  See [Tompson, Goroshin, *et al*
(2014; arXiv:1411.4280v3)](https://arxiv.org/abs/1411.4280v3).

* `augment volume` is the range within which to uniformly pick a random
number by which to multiply each waveform.

* `augment noise` is the range within which to uniformly pick a random
number to use as the standard deviation of a Gaussian distribution which
is added to each waveform.

* `normalization` specifies whether to normalize each mini-batch of samples
before or after the non-linearity in each layer, either across the entire
batch dimension or in groups across the channel dimension.
See [Ioffe and
Szegedy (2015; arXiv:1502.03167v3)](https://arxiv.org/abs/1502.03167) and
[Wu and He (2018; arXiv:1803.08494v3)](https://arxiv.org/abs/1803.08494).

There is also:

* `weights seed` specifies whether to randomize the initial weights or not.  A
value of -1 results in different values for each fold.  They are also different
each time you run `x-validate`.  Any other number results in a set of initial
weights that is unique to that number across all folds and repeated runs.

* `batch seed` similarly specifies whether to randomize the order in which
samples are drawn from the ground-truth data set during training.  A value of
-1 results in a different order for each fold and run; any other number
results in a unique order specific to that number across folds and runs.

To perform a simple grid search for the optimal value of a particular
hyperparameter, first choose how many folds you want to partition your
ground-truth data into using `k-fold`.  More folds permit characterizing the
variance better, but take longer to train and also result in fewer annotations
with which to measure the accuracy.  Unlike `Omit Some` in [Measuring
Generalization](#measuring-generalization) above, the partitions here group
individual annotations, as opposed to entire recordings.  Ensure that you have
at least 10 annotations for each label in the validation set if using many
folds.  Then set the hyperparameter of interest to the first value you want to
test and use the name of the hyperparameter and its value as the `Logs
Folder` (e.g. "mb64" for a mini-batch size of 64).  Suffix any additional
hyperparameters of interest using underscores (e.g. "mb64_ks129_fm64" for a
kernel size of 129 and 64 feature maps).  If your model is small, use
`models_per_job` in "configuration.py" to train multiple folds on a GPU.  Click
the `X-Validate` button and then `DoIt!`.
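
Generating the `Logs Folder` names for such a grid is easily scripted.  A
sketch, sweeping a hypothetical set of mini-batch sizes with the other
hyperparameters held fixed in the suffix:

```python
# Build `Logs Folder` names for a 1D grid search over one hyperparameter,
# e.g. mini-batch size, with the fixed hyperparameters in the suffix.
def logdir_names(prefix, values, suffix):
    return [f"{prefix}{v}_{suffix}" for v in values]

print(logdir_names("mb", [64, 128, 256], "ks129_fm64"))
```

Each resulting name (e.g. "mb64_ks129_fm64") labels one cross-validated model.
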
One classifier will be trained for
each fold, using that fold as the validation set and the remaining folds for training.
Separate files and subdirectories are created in the `Logs Folder`,
suffixed by the fold number and the letter "k".  Plot training curves with the
`Accuracy` button, as before.  Repeat the above procedure for each of the remaining
hyperparameter values you want to try (e.g. "mb128_ks129_fm64",
"mb256_ks129_fm64", etc.).  Then use the `Compare` button to create a figure of
the cross-validation data over the hyperparameter values, specifying for the
`Logs Folder` the independent variable (e.g. "mb") suffixed with the fixed
hyperparameters of interest (e.g. "mb_ks129_fm64").  The output is three files:

* "[prefix]-compare-confusion-matrices.pdf" contains the summed confusion matrix
for each of the values tested.

* "[prefix]-compare-overall-params-speed.pdf" plots the accuracy, number of
trainable parameters, and training time for each model.

* "[prefix]-compare-precision-recall.pdf" shows the final error rates for each
model and wanted word.

Training multiple models like this with the same hyperparameters is not
only useful for quantifying the variability in the accuracy, but can also
be used to reduce that same variability by averaging the outputs across all
of the folds.  [Ensemble Models](#ensemble-models) describes how to
do just this by combining multiple models into a single one.

## Limiting Ground Truth ##

If no amount of hyperparameter optimization improves the accuracy to your
satisfaction, you might just need more ground truth annotations.  To predict
whether more ground truth will help, train a model using only a fraction of
what is currently available and compare its accuracy to that of a model which uses all
of it.
In this way you can extrapolate whether it would be worth spending the
time to annotate.

The cross-validation procedure used in the previous section ([Searching
Hyperparameters](#searching-hyperparameters)) makes limiting ground truth in
this way easy.  Simply specify a `k-fold` of "2" and compare it to a `k-fold` of,
say, "10".  The former's two models are each trained with a different 50% of the
data, and the latter's ten with 90%.  Were the amount of ground truth a limiting
factor in model accuracy, the accuracy would be higher for the 10-fold
models, and one could reasonably expect that annotating more ground truth would
increase it further.

Ground truth can be limited even further by entering negative numbers into the
`k-fold` textbox.  "-4", for example, would train four models, each of which uses
a different 25% of the data for training (compared to "4", for which each would
use 75%).  Similarly, "-10" would train ten models, each with 10%.  Only integers can be
used here; "-2" is the same as "2"; and "-1", "0", and "1" are nonsensical.
Also take care not to partition your data such that either the training or
validation set has only a small number of annotations for any label.

## Examining Errors ##

Mistakes can possibly be corrected if more annotations are made of similar
sounds.  To find such sounds, cluster the errors made on the ground-truth
annotations together with sounds detected in your recordings.  Then look for
localized hot spots of mistakes and make annotations therein.

SongExplorer provides two ways to generate lists of errors, which you'll need to
choose between.
The `Accuracy` button does so just for the validation data,
while `Activations` uses the entire ground truth or a randomly sampled subset
thereof.

As [mentioned earlier](#quantifying-accuracy), the `Accuracy` button creates a
"predictions/" folder in the `Logs Folder` containing CSV files itemizing whether
the sounds in the validation set were correctly or incorrectly classified.
Each CSV file corresponds to a sub-folder within the `Ground Truth` folder.  The
file format is similar to SongExplorer's other CSV files, with the difference being
that the penultimate column is the prediction and the final one the annotation.
To use these predictions, copy these CSV files into their corresponding
`Ground Truth` sub-folders.

    $ tail -n 10 trained-classifier1/predictions/round1-mistakes.csv 
    PS_20130625111709_ch3.wav,377778,377778,correct,mel-pulse,mel-pulse
    PS_20130625111709_ch3.wav,157257,157257,correct,mel-pulse,mel-pulse
    PS_20130625111709_ch3.wav,164503,165339,correct,ambient,ambient
    PS_20130625111709_ch3.wav,379518,379518,mistaken,ambient,mel-pulse
    PS_20130625111709_ch3.wav,377827,377827,correct,mel-pulse,mel-pulse
    PS_20130625111709_ch3.wav,378085,378085,correct,mel-pulse,mel-pulse
    PS_20130625111709_ch3.wav,379412,379412,mistaken,ambient,mel-pulse
    PS_20130625111709_ch3.wav,160474,161353,correct,ambient,ambient
    PS_20130625111709_ch3.wav,207780,208572,correct,mel-sine,mel-sine
    PS_20130625111709_ch3.wav,157630,157630,correct,mel-pulse,mel-pulse

Similarly, the `Activations` button creates an "activations.npz" file
containing the logits of the output layer (which is just a vector of word
probabilities), as well as the correct answer from the ground-truth
annotations.  To turn these data into a CSV file, use the `Mistakes` button.
In the `Ground Truth` sub-folders, CSV files are created for each WAV file, with
an extra column just like above.
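
The per-label error counts in these prediction CSVs are easy to tally.  A
sketch (the file name passed in would be one of the prediction CSVs above):

```python
import csv
from collections import Counter

# Count correct vs mistaken classifications per annotated label in a
# prediction CSV, whose last two columns are prediction and annotation.
def tally(csvfile):
    counts = Counter()
    with open(csvfile) as f:
        for wav, start, stop, result, prediction, annotation in csv.reader(f):
            counts[(annotation, result)] += 1
    return counts
```

Applied to the ten rows shown above, this would count, for example, five
correct and two mistaken "mel-pulse" annotations.
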
No need to copy any files here.

Now detect sounds in the ground-truth recordings for which you haven't done so
already.  Press the `Examine Errors` wizard and confirm that `kinds to use` is
set to "detected,mistaken", and save the hidden state activations, cluster, and
visualize as before.  Select `mistaken` in the `kind` pull-down menu to look
for a localized density.  View the snippets in any hot spots to examine the
shapes of waveforms that are mis-classified, i.e. the ones whose text label, which
is the prediction, does not match the waveform.  Then select `detected` in the
`kind` pull-down menu and manually annotate similar waveforms.  In principle they
should cluster at the same location.

## Testing Densely ##

The accuracy statistics reported in the confusion matrices described above are
limited to the points in time which are annotated.  If an annotation withheld
for validation does not elicit the maximum probability across all output taps
at the corresponding label, it is considered an error.  Quantifying accuracy in
this way is a bit misleading, as when a model is used to make ethograms, a
word-specific threshold is applied to the probabilities instead.  Moreover,
ethograms are made over the entire recording, not just at specific times of
interest.  To quantify a model's accuracy more precisely, then, as it would be
used in your experiments, a dense annotation is needed: one for which all
occurrences of any words of interest are annotated.

To quantify an ethogram's accuracy, first select a set of recordings in your
validation data that are collectively long enough to capture the variance in
your data set but short enough that you are willing to manually label every
word in them.  Then detect and cluster the sounds in these recordings using
the `Detect`, `Activations`, `Cluster`, and `Visualize` buttons as described
earlier.
Annotate every occurrence of each word of interest by jumping to the
beginning of each recording and panning all the way to the end.  Afterwards,
manually replace the timestamp in each resulting "annotated-<timestamp\>.csv"
file with the name of the annotator (e.g. "annotated-<name\>.csv").  If you
have more than one such dense CSV file per WAV file, retain the timestamp by
positioning it before "annotated" (e.g. "<timestamp\>-annotated-<name\>.csv").
Take your best model to date and make ethograms of these densely annotated
recordings using the `Classify` and `Ethogram` buttons as before.  Finally,
use the `Congruence` button to plot the fraction of false positives and
negatives, specifying which files you've densely annotated with `Ground
Truth` and either `Validation Files` or `Test Files` (a comma-separated list
of .wav files, a text file of .wav filenames, or a folder of .wav files;
see [Measuring Generalization](#measuring-generalization)).  Optionally,
specify with `convolve` the temporal resolution within which to consider two
predictions a hit.  The output is the following charts in the `Ground Truth`
folder and the sub-folders therein:

* "<sub-folder\>/\*-disjoint-everyone.csv" contains the intervals in time
which SongExplorer and the annotator(s) agreed upon for each WAV file.

* "<sub-folder\>/\*-disjoint-{label,tic}-only<annotator\>.csv" contains the
intervals which only one of them labeled.  There is a separate file for each
annotator, with SongExplorer's file name containing the precision-recall
ratio used when making the ethograms (e.g. "only1.0pr").  The difference
between the "label" and "tic" files is that for the former any overlap in
the predicted/annotated intervals is considered a perfect hit, whereas in
the latter the two intervals will be broken up into their constituent tics
in time if there is a partial overlap.
See the source code documentation in
"src/congruence.py" for more details.

* If there was more than one human annotator (see below),
"<sub-folder\>/\*-disjoint-{tic,label}-not<annotator\>.csv" contains the
intervals in time which everyone except the indicated annotator agreed upon.

* "congruence-{tic,label}-<word\>.csv" contains the aggregate statistics
across all WAV files.  Whereas the one row whose first column ends in
"pr" shows the congruence for the threshold calculated by `Accuracy` and
the desired precision-recall ratio on just the sparsely annotated points
in time, the other rows show how the congruence varies with the threshold.

* "congruence-{tic,label}-<word\>.\*pr[-venn].pdf" plots with bars and Venn
diagrams the "pr" row of the corresponding CSV file.

* "congruence-{tic,label}-<word\>.pdf" plots the congruence versus
threshold data contained in the corresponding CSV file.  The vertical
line in the left panel labeled "sparse P/R" corresponds to the
threshold calculated by `Accuracy` on just the annotated points in time.
"dense P/R" is the threshold at which the "only SongExplorer" and "only
<Annotator\>" lines cross (or "not SongExplorer" if there are multiple
human annotators).  These densely calculated thresholds are stored in a
new "thresholds-dense-<YYYYMMDDTHHMMSS\>.ckpt-\*.csv" file in the `Logs
Folder`.  Take note of the timestamp and use this file going forward
when making new ethograms, as these thresholds most accurately deliver
the desired precision-recall ratio.
The right panel re-plots the same
data by parameterizing precision and recall as a function of the threshold.
The area under this curve would be 1.0 for a perfect classifier.

If the congruence is not acceptable, iteratively adjust the hyperparameters
and/or add new annotations to your training set, train a new model, and
make new ethograms and congruence plots until it is.

Once the accuracy is acceptable on validation data, quantify the accuracy on a
densely annotated test set.  The network should never have been trained or
validated on these latter data before; otherwise the resulting accuracy could
be erroneously better.  Label every word of interest as before, make ethograms
with your best model, and plot the congruence with SongExplorer's predictions.
Hopefully the accuracy will be okay.  If not, and you want to change the
hyperparameters or add more training data, then the proper thing to do is to
use this test data as training or validation data going forward, and densely
annotate a new set of data to test against.

The congruence between multiple human annotators can be quantified using the
same procedure.  Simply create "annotated-<name\>.csv" files for each one.
The plots created by `Congruence` will include lines for the number of sounds
labeled by all annotators (including SongExplorer), by only each annotator, and
by all but a given annotator.
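
Tic-wise congruence of this kind can be sketched with sets of labeled tics.
This is a simplified illustration of the idea, not SongExplorer's actual
implementation (the tic sets are hypothetical):

```python
# Tic-wise agreement between annotators, each represented as the set of
# time tics they labeled with a given word.
def congruence(annotations):
    # tics labeled by everyone
    everyone = set.intersection(*annotations.values())
    # tics labeled by exactly one annotator
    only = {name: tics - set.union(*(t for n, t in annotations.items() if n != name))
            for name, tics in annotations.items()}
    return everyone, only

annotations = {"SongExplorer": {1, 2, 3, 4},   # hypothetical tic sets
               "annotator1":   {2, 3, 4, 5},
               "annotator2":   {3, 4, 5, 6}}
everyone, only = congruence(annotations)
print(everyone)
print(only)
```
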
If the annotators did not label the same portion
of each recording, choose "intersection" in `portion` to specify that the
congruence should be calculated only on the portion they have in common,
namely from the maximum of the minimum annotated time point across annotators to
the minimum of the maximum.

Much as one can examine the mistakes of a particular model with respect to
sparsely annotated ground truth by clustering with "mistaken" as one of the
`kinds to use`, one can look closely at the errors in congruence between a model
and a densely annotated test set by using
"everyone|{tic,label}-{only,not}{1.0pr,annotator1,annotator2,...}" as the kinds.
The `Congruence` button generates a number of "disjoint.csv" files:
"disjoint-everyone.csv" contains the intersection of intervals that SongExplorer
and all annotators agreed upon; "disjoint-only\*.csv" files contain the intervals
which only SongExplorer or one particular annotator labeled; "disjoint-not\*.csv"
contains those which were labeled by everyone except SongExplorer or a given
annotator.  Choose one or all of these kinds and then use the
`Activations`, `Cluster`, and `Visualize` buttons as before.

When the overall accuracy as judged by, for example, F1 is acceptable, but
there is an unacceptable balance between false positives and false negatives,
there are two ways to cope.  First, the probability waveforms generated by
`Classify` can be adjusted on a word-specific basis to account for known
uneven distributions in the prevalence of the words.  So for a given interval
of time, enter into `prevalences` a comma-separated list of *a priori*
expected durations for each entry in `labels to use` (e.g. 6,12,42 seconds
for a minute of mel-pulse, mel-sine, and ambient respectively); alternatively,
the relative probability for each word can be given (e.g.
0.1,0.2,0.7).
The probability of relatively rare words will then be decreased in comparison
to more common ones, and vice versa, thereby adjusting the precision-to-recall
ratio accordingly.

The alternative to adjusting probabilities is to adjust thresholds.
While this can be done by changing the `P/Rs` variable, doing so in this
way changes them equally for all words.  A word-specific re-balancing of
false negatives and false positives can be achieved using thresholds derived
from dense annotations.  To do so, when making ethograms on the test data set,
choose the "thresholds-dense.ckpt-\*.csv" file that was created when measuring
congruence on the validation data set, instead of the sparse ones created with
`Accuracy`.  In effect, this method measures the prevalence of
each word, which is possible given a dense annotation, and adjusts the
thresholds to achieve the desired precision-to-recall ratio in `P/Rs`.
Note that these dense-threshold CSV files are suffixed with a timestamp
so that they are not overwritten by successive runs of `Congruence`.
Take care to choose the correct one.

## Ensemble Models ##

The procedure used to optimize hyperparameters results in slightly different
models on repeated trainings due to the random initialization of weights
and sampling of ground-truth batches.  While initially this might seem
undesirable, it affords an opportunity to increase accuracy by averaging
across multiple models.  Generalization is nominally improved with such
combined models due to the reduction in variance.

First create multiple models using either (1) the `Train` button with `#
replicates` set to be greater than one, (2) the `X-Validate` button with
`k-fold` greater than one, or even (3) the `Omit One` button after having
annotated multiple recordings.
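
Averaging across models is conceptually simple.  A sketch of combining the
per-word probability waveforms of several models (the arrays here are
hypothetical):

```python
import numpy as np

# Average the output probability waveforms of several trained models.
# Each array has shape (num_tics, num_words); the ensemble prediction is
# their element-wise mean.
def ensemble(probabilities):
    return np.mean(probabilities, axis=0)

model1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # hypothetical model outputs
model2 = np.array([[0.7, 0.3], [0.4, 0.6]])
print(ensemble([model1, model2]))
```
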
Then use the `Ensemble` button and select
one checkpoint file from each of those models as a comma-separated list.
A new subfolder will be created in the `Logs Folder` with a graph folder
therein called "frozen-graph.ensemble.pb".  To create a corresponding
thresholds file, follow the steps for measuring congruence as described
in the previous section ([Testing Densely](#testing-densely)): (1) use this
model to classify a densely labeled recording, (2) make an ethogram using
the thresholds file of one of the constituent models, and (3) calculate the
`Congruence`.  A new thresholds file suffixed with "-dense" will be created.
Manually copy this file into the newly created ensemble folder, and use
it whenever classifying recordings with this ensemble model.

## Transfer Learning ##

Manual annotation can be a lot of work.  Fortunately, the effort spent doing
so can be reduced by leveraging someone else's work.  Let's say your colleague
Alice has trained a model to do a task similar to what you need.  You can take
her model, keep the first few layers intact with their learned weights, replace
the last couple of layers with randomly initialized ones of your own, and then
iteratively train and annotate as described above.  The features in the early
layers will already be quite rich, and so the new latter layers will not need
as much ground truth data to learn your task.  Moreover, if your colleagues Bob
and Carol also have similar trained models, you can combine all three in a
similar fashion.

SongExplorer comes with an architecture plugin (see [Customizing with
Plug-ins](#customizing-with-plug-ins) for details on how plugins work) called
"ensemble-transfer" that makes all this easy.  Modify the `architecture_plugin`
variable in your "configuration.py" file to be "ensemble-transfer".
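
The idea of keeping pretrained layers frozen and appending new ones can be
sketched framework-free.  This illustrates the concept only; it is not the
"ensemble-transfer" plugin itself, and the layer representation is hypothetical:

```python
import random

# Conceptual sketch of transfer learning: a "model" here is just a list of
# layers, each a dict with weights and a trainable flag.
def transfer(pretrained_layers, n_keep, n_new_units):
    # keep the first n_keep layers with their learned weights, frozen
    kept = [dict(layer, trainable=False) for layer in pretrained_layers[:n_keep]]
    # append new randomly initialized, trainable layers for the new task
    new = [{"weights": [random.gauss(0, 1) for _ in range(n)], "trainable": True}
           for n in n_new_units]
    return kept + new

alice = [{"weights": [0.1, 0.2], "trainable": True} for _ in range(4)]  # hypothetical
model = transfer(alice, n_keep=3, n_new_units=[8, 2])
```
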
Then in
the SongExplorer GUI specify (1) the checkpoint(s) of the pretrained model(s)
you want to use, (2) whether you want to update the weights of the pretrained
model(s) when you train with your data (or just the new layers), (3) how many of
the layers of the pretrained model(s) you want to use, (4) how many new
convolutional layers you want to add (for each layer: kernel time size x kernel
frequency size x num. features; e.g. "5x5x32,10x10x64"), (5) how many new dense
layers you want to add (for each layer: num. units; e.g. "128,32,8"), and (6)
the dropout rate (e.g. 50).  Then iteratively train and fix mistakes as before.

Note that the pretrained models do *not* necessarily have to use the same sampling
rate; the "ensemble-transfer" plugin will automatically insert a resampling
layer if necessary.

## Autoencoding ##

There is an unsupervised variation of transfer learning that is particularly
useful if you have a lot of unlabelled data.  You start by training a model
whose output layer is identical in shape to its input layer and in between there
is a layer with far fewer units.  The task is to reconstruct the input at the
output despite this bottleneck, thereby learning a compact representation of
the domain.  One of the hidden layers, typically the smallest one, is then used
as the input to a second model which is trained in a supervised manner with
the few labels you have.  Typically this second model is architecturally very
simple, maybe consisting of just a single layer.

To use this workflow in SongExplorer, manually create an "autoencoded.csv" file
for each recording which has a single row whose start and stop tics correspond
to the beginning and end of the file.  Something like this:

    PS_20130625111709_ch3.wav,1,2500,autoencoded,all

Set the `architecture_plugin` to "autoencoder", and devise an architecture
whose first few layers are similar to those you would normally use for a
supervised task.
These will then be automatically mirrored.

Choose "autoencoder" in the "loss" pull-down menu in the GUI.  Train and
search the hyperparameter space as usual.  The "train-validation-loss.pdf"
plots will not show the precision and recall curves, but will instead add the
validation loss for comparison with the training loss.  Similarly, the precision and
recall scatter plots are not drawn for `compare`; instead the validation
loss curves are overlaid and the size of the bottleneck is shown.

`classify` a withheld recording to see how faithful the reconstruction is.  If
it looks reasonable, change the `architecture_plugin` to "ensemble-transfer"
and proceed as described in [Transfer Learning](#transfer-learning).

## Discovering Novel Sounds ##

After amassing a sizeable amount of ground truth, one might wonder whether one
has manually annotated all of the types of words that exist in the recordings.
One way to check for any missed types is to look for hot spots in the clusters
of detected sounds that have no corresponding annotations.  Annotating known
types in these spots should improve generalization too.

First, use the `Detect` button to threshold the recordings that you want to
search for novel sounds.  Save their hidden state activations, along with
those of the manually annotated sounds, using the `Activations` button by
setting the label types to "annotated,detected".  Set `equalize ratio` to
a small integer as well, so that words with few samples are not obscured
by those with many.  Cluster and visualize as before.  Now rapidly and
alternately switch between `annotated` and `detected` in the `kind` pull-down
menu to find any differences in the density distributions.  Click on any new
hot spots you find in the detected clusters, and annotate sounds which are
labeled as detected but not annotated.
Create new word types as necessary.

## Overlapped Classes ##

So far we have assumed that songs of interest do not overlap, as is usually the
case in a laboratory setting with just a single animal or a small number of
animals.  In this case, the output taps of the neural network are considered
mutually exclusive and a soft-max function is used to ensure that the
probabilities across classes add to one at any given point in time.

This assumption quickly breaks down in field recordings, however, where multiple
individuals from multiple species may be present.  In this case, we need to
design the network's outputs such that each of them can be reinterpreted as
indicating whether a song is present or not, and to do so independently of each
other.  Specifically, the soft-max across all outputs is replaced with an
individual sigmoid function for each.

To train a network in this fashion, a second set of labels must be used to
indicate the absence of a class.  So, for example, "mel-sine" from the tutorial
above would be paired with "not_mel-sine", and similarly for "mel-pulse" and
"not_mel-pulse", etc. (but *not* "not_ambient"!).  These "not_" annotations
do not have to be intervals of time during which no sound at all is present
(i.e. ambient); they only need to contain no song of that class.  In fact,
it's best to make "not_" annotations which also include sounds, and they may
even contain song from another class.

Once you have annotated in this manner, select "overlapped" from the "loss"
pull-down menu.  Then enter the classes in `labels to use` as before, but do
*not* enter the "not_" labels there-- rather, SongExplorer will automatically
use the "not_" annotations in the groundtruth folder corresponding to those
classes in `labels to use` when "overlapped" is chosen.  Then train a model and
make predictions as before.
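The practical difference between the two output functions can be sketched numerically (plain Python for illustration; this is not SongExplorer code):

```python
import math

# one logit per class at a single point in time
logits = [2.0, -1.0, 0.5]

# mutually exclusive classes: the soft-max forces the class
# probabilities to sum to one (up to rounding)
exps = [math.exp(l) for l in logits]
softmax = [e / sum(exps) for e in exps]

# overlapped classes: an independent sigmoid per output, so each class
# is scored on its own and the probabilities need not sum to one
sigmoid = [1 / (1 + math.exp(-l)) for l in logits]
```

With these logits the soft-max probabilities sum to one while the sigmoid probabilities sum to about 1.77, which is why the overlapped loss can report multiple songs at the same instant.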
You should then find that the probability waveforms
do not necessarily add to one, and that the ethograms may be overlapped.

For maximum flexibility, in the case that one of your class names naturally
starts with "not_", the prefix used to indicate the absence of a class can be
specified in your configuration.py file.  See the variable named
"overlapped_prefix", which defaults to "not_".


# Unsupervised Methods #

Until now we have only considered the workflow of manually annotating
individual sounds.  Some data sets and experiments are amenable to
automatically labelling them *en masse*, however.  Say, for example, it is
known for sure that there is only a single species audible in each recording
and all that is needed is to identify the species.  In this and other similar
cases a sequence of actions already individually described above can be used.
First use the `Detect` button to find sounds in each recording.  Then manually
edit the resulting CSV files to change each "time" and "frequency" entry to
the name of the species.  Finally use the `Train` button to create a species
detector by setting `kinds to use` to "detected" and `labels to use` to the
species names.  You can now use the `Accuracy`, `Freeze`, `Classify`, and
`Ethogram` buttons as before to make predictions on new recordings.  How well
this works is subject to the signal-to-noise ratio of your data, of course;
you might need to carefully tune the parameters used when detecting sounds.

A simple model like this is useful for more than just classifying species.
It can also be used to look for differences in vocalizations between closely
related species, genetic mutants of the same species, or even individuals
within the same species.  To do so, use the species, mutant, or individual's
name as the label in the CSV files when training the model.
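The CSV relabeling step above can also be scripted.  A sketch, assuming the five-column row layout shown earlier (file, start tic, stop tic, kind, label); the glob pattern and the species label "mel" are hypothetical placeholders:

```python
import csv
import glob

def relabel(csvfile, species):
    # replace every "time" and "frequency" label with the species name
    with open(csvfile) as f:
        rows = list(csv.reader(f))
    for row in rows:
        if row and row[-1] in ("time", "frequency"):
            row[-1] = species
    with open(csvfile, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# adjust the pattern to wherever your detected CSV files live
for csvfile in glob.glob("groundtruth-data/*/*detected.csv"):
    relabel(csvfile, "mel")
```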
Then cluster
the hidden state activations of the already detected sounds using this same
model with the `Activations`, `Cluster`, and `Visualize` buttons as before.
Any non-overlapping clumps are evidence of unique vocalizations.

Finally, it should be noted that large amounts of ambient and "other"
annotations can in some cases be automatically labeled.  If you can
devise a way to position your microphone in a place and time for which no
natural sounds are made but for which the "room tone" is otherwise the
same, then the entire recording can be considered ambient.  Similarly,
even if there are natural sounds present, so long as none of them are from
any species of interest, the `Detect` button can be used to pick out
sounds that can all be considered "other".

# Scripting Automation #

For some tasks it may be easier to write code instead of using the GUI-- tasks
which require many tedious mouse clicks, for example, or simpler ones that must
be performed repeatedly.  To facilitate coding your analysis, SongExplorer is
structured such that each action button (`Detect`, `Misses`, `Activations`,
etc.) is backed by a Python script.  At the top of each script is
documentation showing how to call it.
Here, for example, is the interface for
`Detect`:

    $ head -n 13 <path-to-unzipped-executable>/songexplorer/bin/songexplorer/src/time-freq-threshold.py
    #!/usr/bin/env python3

    # threshold an audio recording in both the time and frequency spaces

    # e.g.
    # time-freq-threshold.py \
    #     --filename=`pwd`/groundtruth-data/round2/20161207T102314_ch1_p1.wav \
    #     --parameters={"time_sigma":"9,4", "time_smooth_ms":"6.4", \
    #                   "frequency_n_ms":"25.6", "frequency_nw":"4", \
    #                   "frequency_p":"0.1,1.0", "frequency_smooth_ms":"25.6", \
    #                   "time_sigma_robust":"median"} \
    #     --audio_tic_rate=2500 \
    #     --audio_nchannels=1 \
    #     --audio_read_plugin=load-wav \
    #     --audio_read_plugin_kwargs={}

The following Bash code directly calls this script to make predictions on a set
of recordings in different folders:

    $ wavfiles=(
               groundtruth-data/round1/PS_20130625111709_ch3.wav
               groundtruth-data/round2/20161207T102314_ch1.wav
               groundtruth-data/round3/Antigua_20110313095210_ch26.wav
               )

    $ for wavfile in "${wavfiles[@]}" ; do
          time-freq-threshold.py \
              --filename="$wavfile" \
              --parameters='{"time_sigma":"9,4", "time_smooth_ms":"6.4",
                             "frequency_n_ms":"25.6", "frequency_nw":"4",
                             "frequency_p":"0.1,1.0", "frequency_smooth_ms":"25.6",
                             "time_sigma_robust":"median"}' \
              --audio_tic_rate=2500 \
              --audio_nchannels=1 \
              --audio_read_plugin=load-wav \
              --audio_read_plugin_kwargs={}
      done

The above workflow could also easily be performed in Julia, Python, Matlab, or
any other language that can execute shell commands.
Alternatively, one can write a Python script which invokes SongExplorer's GUI
interface to programmatically fill text boxes with values and to push action
buttons:

    import sys
    import os
    from subprocess import run, PIPE, STDOUT

    # load the GUI
    sys.path.append(os.path.join(<path-to-unzipped-executable>,
                                 "songexplorer", "bin", "songexplorer", "src", "gui"))
    import model as M
    import view as V
    import controller as C

    # start the GUI
    M.init("configuration.py")
    V.init(None)
    C.init(None)

    # start the job scheduler; local_ncpu_cores, local_ngpu_cards, and
    # local_ngigabytes_memory are as set in configuration.py
    run(["hstart",
         str(local_ncpu_cores)+','+str(local_ngpu_cards)+','+str(local_ngigabytes_memory)])

    # set the needed textbox variables
    V.detect_parameters["time_sigma"].value = "9,4"
    V.detect_parameters["time_smooth_ms"].value = "6.4"
    V.detect_parameters["frequency_n_ms"].value = "25.6"
    V.detect_parameters["frequency_nw"].value = "4"
    V.detect_parameters["frequency_p"].value = "0.1,1.0"
    V.detect_parameters["frequency_smooth_ms"].value = "25.6"
    V.detect_parameters["time_sigma_robust"].value = "median"

    # repeatedly push the Detect button
    wavpaths_noext = [
                     "groundtruth-data/round1/PS_20130625111709_ch3",
                     "groundtruth-data/round2/20161207T102314_ch1",
                     "groundtruth-data/round3/Antigua_20110313095210_ch26",
                     ]
    for wavpath_noext in wavpaths_noext:
        V.wavtfcsvfiles.value = wavpath_noext+".wav"
        C.detect_actuate()

    # stop the job scheduler
    run(["hstop"], stdout=PIPE, stderr=STDOUT)

For more details see the system tests in
<path-to-unzipped-executable>/songexplorer/bin/songexplorer/test/tutorial.{sh,py}.
These two files implement, as Bash and Python scripts respectively, the entire
workflow presented in this [Tutorial](#tutorial), from
[Detecting
Sounds](#detecting-sounds) all the way to [Testing Densely](#testing-densely).


# Training on Video #

SongExplorer was conceived to analyze one-dimensional audio recordings.
However, in addition to being able to view accompanying 3D video recordings
as described in [Manually Annotating](#manually-annotating), one can also
train a model on them, either in conjunction with audio recordings
or alone.

The default plugin to load video data is "src/load-avi-mp4-mov.py", and
a blank template is "src/video-read-plugin.py".  These work similarly to
the corresponding plugins to load audio recordings.

See "src/video.py" for a network architecture plugin that inputs only video,
and "src/ensemble-concat-dense.py" for an architecture plugin which combines a
(possibly pretrained) audio-only model and a (possibly pretrained) video-only
model with a simple top model.


# Customizing with Plug-ins #

Several features of SongExplorer are engineered to be easy to customize with
bespoke code on the part of the user.  SongExplorer comes with
one or more scripts to provide alternate functionality for each of these
features and a blank template which the user can fill in for additional
functionality.  Which code to use is specified in variables ending in "plugin"
in "configuration.py".

## Loading Data ##

As described above, the default file format for audio recordings is WAV.
Should your data be in a different format, one option is to convert it all.
Alternatively, if you are fluent in Python, you can write a script which tells
SongExplorer how to load the raw data.  This script should define a function
called `audio_read` which inputs the full path to the recording as well as an
interval of time and possibly some keyword arguments, and returns the sampling
rate and the slice of the data during that interval.
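For a flavor, here is a minimal sketch of such a function for 16-bit WAV files using only the standard library.  The argument names here are assumptions; consult "src/audio-read-plugins/template.py" for the exact interface:

```python
import array
import wave

def audio_read(fullpath, start_tic=0, stop_tic=None, **kwargs):
    # return the sampling rate and the samples in [start_tic, stop_tic)
    with wave.open(fullpath) as w:
        rate = w.getframerate()
        if stop_tic is None:
            stop_tic = w.getnframes()
        w.setpos(start_tic)
        raw = w.readframes(stop_tic - start_tic)
    # assumes 16-bit samples; a real plugin would honor w.getsampwidth()
    return rate, array.array("h", raw)
```

Any custom pre-processing, e.g. a filter, could be applied to the samples before returning them.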
To see how this works,
take a look at "src/audio-read-plugins/load-wav.py", which is the default
plugin, and "src/audio-read-plugins/template.py", which is a blank
template.

A further advantage of loading data in a plugin script is that you can add
custom signal pre-processing.  Say for example there is artifactual noise in
a particular frequency band.  You can eliminate that before SongExplorer ever
sees the data by simply adding a filter to the plugin.  One particular use case
is ultrasonic vocalizations, which are frequently contaminated by low-frequency
audible noise.  SongExplorer comes with a plugin,
"src/audio-read-plugins/highpass-filter.py", which uses a Butterworth filter
with a customizable cutoff frequency to attenuate that noise.

To use an alternative plugin, change the "audio_read_plugin" variable in
"configuration.py" to the full path to the script, without the ".py" extension,
and "audio_read_plugin_kwargs" to a dictionary of keyword arguments.  The
specific format of the latter for each plugin is documented in a comment at
the top of each Python script.

Video loading is also done with a plugin.  The default is
"src/video-read-plugins/load-avi-mp4-mov.py", a blank template is in
"src/video-read-plugins/template.py", and the "configuration.py" file
variables are "video_read_plugin" and "video_read_plugin_kwargs".

## Video Filenames ##

By default, SongExplorer expects video files to have the same basename as the
corresponding audio file, and the extension to be one of AVI, MP4, or MOV.  If
this is not the case, one can provide a Python function which inputs a
directory and a WAV file and outputs the name of the video file to load.  The
name and location of the Python file containing this function is specified,
without the ".py" extension, as the "video_findfile_plugin" parameter in
"configuration.py".
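Such a function might look like the following sketch, which simply tries each recognized extension against the WAV file's basename.  The function name and return convention here are assumptions; consult the bundled plugins for the real interface:

```python
import os

def video_findfile(directory, wavfile):
    # return the video file whose basename matches the WAV file, or None
    base = os.path.splitext(wavfile)[0]
    for ext in (".avi", ".mp4", ".mov", ".AVI", ".MP4", ".MOV"):
        candidate = base + ext
        if os.path.exists(os.path.join(directory, candidate)):
            return candidate
    return None
```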
For examples, see
"src/video-findfile-plugins/same-basename.py" (the default) and
"src/video-findfile-plugins/maybe-1sec-off.py".

## Event Detection ##

By default, the Detect button thresholds in the time and frequency domains to
find sounds of interest.  See the source code in
"src/detect-plugins/time-freq-threshold.py" for exactly how this works.  Should
that not suit your needs, you can supply your own code instead.  Simply put in
a Python file a list called `detect_parameters` that specifies the
hyperparameters, a function called `detect_labels` which returns the strings
used to annotate the sounds, and a script which uses those parameters to
generate a "detected.csv" given a WAV file.  Then change the "detect_plugin"
variable in your "configuration.py" file to point to the full path of this
Python file, without the ".py" extension.  The buttons in the pinkish-orange
box in the GUI will change to reflect the different hyperparameters used by
this algorithm.  See the minimal example in "src/detect-plugins/template.py"
for a template.

## Double-Click Annotations ##

As explained in [Manually Annotating](#manually-annotating), new annotations can
be laid down by double-clicking a snippet, double-clicking on an event in the
upper half of the context window, clicking and horizontally dragging in the
lower half of the context window, or double-clicking in a blank portion of the
lower half of the context window.  The default behavior for the latter method
results in a point annotation at just a single sample tic.

While it is best to be as temporally precise as possible when annotating,
it can be quite slow to position the cursor exactly at an event.
Moreover,
unless the event really has no temporal extent, it is better to use a range
annotation, as SongExplorer will automatically augment these annotations
by randomly choosing a specific point in time within them each time they
are chosen for a batch of training examples.

To facilitate temporally-precise range annotations with a double-click gesture,
SongExplorer has a hook in the callback which you can supply with Python code
to customize the behavior.  The Python file must contain a list called
`doubleclick_parameters` which defines the required hyperparameters, and a
function called `doubleclick_annotation` which inputs a point in time and
outputs a range.  See "src/doubleclick-plugins/snap-to.py" for a plugin which
searches for a nearby peak in the waveform and lays down a range annotation of
a specified width.

The default double-click plugin is "src/doubleclick-plugins/point.py", a
template is in "src/doubleclick-plugins/template.py", and the corresponding
variable in "configuration.py" is "gui_context_doubleclick_plugin".

## Network Architecture ##

The default network architecture is a set of layered convolutions, the depth
and width of which can be configured as described above, with optional skip
connections, dropout layers, and batch normalization-- a ResNet, in other words
([He, Zhang, Ren, Sun (2015)](https://arxiv.org/abs/1512.03385)).  Also available is a
U-net architecture, in which the skip connections are mirrored around a
bottleneck ([Ronneberger, Fischer, Brox
(2015)](https://arxiv.org/abs/1505.04597)).

Should these not prove flexible enough, you can provide a plugin with your own
TensorFlow code that implements a whiz-bang architecture of arbitrary
design.  See the minimal example in "src/architecture-plugins/template.py" for
a template of how this works.
In brief, two objects must be supplied in a
Python file: (1) a list named `model_parameters` which defines the variable
names, titles, default values, etc. that appear in the GUI, and (2) a
function `create_model` which builds and returns the network graph.  Specify as
the "architecture_plugin" in "configuration.py" the full path to this file,
without the ".py" extension.  The buttons in the yellow box in the GUI will
change to reflect the different hyperparameters used by this architecture.  All
the workflows described above (detecting sounds, making predictions, fixing
mistakes, etc.) can be used with this custom network in an identical manner.

The default ResNet architecture is itself written as a plug-in, and can be
found in "src/architecture-plugins/res-net.py".  Similarly, the U-net plugin is
in "src/architecture-plugins/u-net.py".

## Clustering Algorithm ##

The method used to reduce the dimensionality of the activations for
visualization is also a plugin.  By default, the UMAP algorithm is used, but
also included are plugins for t-SNE ([van der Maaten and Hinton
(2008)](http://www.jmlr.org/papers/v9/vandermaaten08a.html)) and PCA.  To use
these alternatives, change "cluster_plugin" in "configuration.py" to "tSNE" or
"PCA", respectively.  To create your own plugin, write a script which defines a
list called `cluster_parameters`, inputs "activations.npz", and outputs
"cluster.npz".  See "src/cluster-plugins/template.py" for a template.

## Augmentation ##

Programmatically modifying your recordings to artificially increase the size of
your ground truth can lead to higher accuracy only if it alters them in a way
which would not require the labelled annotation to change.  As this is domain
specific, the data augmentation code is also a plugin.  The default code,
"src/augmentation-plugins/volume-noise-dc-reverse-invert.py", provides some
common rudimentary operations applicable to one-dimensional data.
To supply
additional alternatives, simply add to the list of `augmentation_parameters`
and supply corresponding signal processing code in the `augment` function.

# Troubleshooting #

* Sometimes using control-C to quit out of SongExplorer does not work.  In this
case, find its process ID with `ps auxc | grep -E '(songexplorer|bokeh)'` and
then `kill -9 <pid>`.  Errant jobs can be killed similarly.

* Full paths on MS Windows are limited to 260 characters.

* If SongExplorer doesn't start as usual, try deleting "songexplorer.state.yml"
  and trying again.

* If jobs launched with the red `DoIt!` button seem to hang, check if the
  computer is busy doing something with `top` (or `Activity Monitor` on macOS
  and `Task Manager` on MS Windows).  If it is not, open a terminal or shell and
  check SongExplorer's queue of jobs using `songexplorer/bin/hjobs`.  If necessary,
  use `hkill` to remove a job from the queue, and `hstop` to shut down the queue.


# Frequently Asked Questions #

* The `WAV,CSV Files` text box, being plural, can contain multiple
comma-separated filenames.  Select multiple files in the File Browser using
shift/command-click as you would in most other file browsers.


# Reporting Problems #

The code is hosted on [github](https://github.com/JaneliaSciComp/SongExplorer).
Please file an issue there for all bug reports and feature requests.
Pull requests are also welcomed!  For major changes it is best to file an
issue first so we can discuss implementation details.  Please work with us
to improve SongExplorer instead of forking your own version.


# Development #

Conda is the preferred development environment.  The Singularity and Docker
files provided use conda internally.

## Conda ##

Platform-specific conda installation instructions can be found at
[Conda](https://conda.io/projects/conda/en/latest/user-guide/install).  On Macs
you can also use [Homebrew](https://brew.sh/) to install conda.
On MS Windows,
miniforge by default uses a command prompt; execute `conda init powershell` to
use conda in PowerShell.

To build locally, first download the source code and install the conda
build command.  These steps only need to be done once:

    $ git clone https://github.com/JaneliaSciComp/SongExplorer
    $ conda install conda-build

Then build:

    $ conda build <path-to-songexplorer-repo>/install/conda/songexplorer -c conda-forge -c nvidia

To install directly from this local build:

    $ conda create --name songexplorer
    $ conda install -n songexplorer --use-local songexplorer -c conda-forge -c nvidia

Pay attention to the notice at the end demarcated with "*** IMPORTANT
!!! ***".  Follow the directions therein to install platform-specific
dependencies which are not in conda-forge.

If you have trouble using the GPU on Ubuntu 22.04, you might need to move a
library, set XLA_FLAGS in your environment, and install cuda-nvcc:

    $ conda activate songexplorer
    $ mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice/
    $ cp -p $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/
    $ echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib' \
            >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
    $ conda install -c nvidia cuda-nvcc

To upload to the [Janelia forge](https://anaconda.org/janelia):

    $ conda activate base
    $ conda install anaconda-client
    $ anaconda login
    $ anaconda upload -u janelia $CONDA_PREFIX/conda-bld/<architecture>/songexplorer-<version>-0.tar.bz2

Subsequently you can install directly from the Janelia forge, instead of your local
build:

    $ conda create --name songexplorer
    $ conda install songexplorer -n songexplorer -c janelia -c nvidia -c conda-forge

To upgrade to the latest version, first get the new version and delete the
local build, and then execute the above commands again:
    $ git -C <path-to-songexplorer-repo> pull
    $ conda env remove --name songexplorer
    $ conda build purge-all

To update the conda recipe, tag a new version and make a github release, then
update "meta.yaml" with the new version number and new hash for the tar.gz
asset:

    $ git tag v<version>
    $ git push --tags

    $ openssl sha256 <github-tar-gz-file>

To upload a tarball to Github, compress the conda environment and drag and drop
it into the assets section of the releases page:

    $ cd $CONDA_PREFIX/..
    $ tar czf songexplorer-<version>-<architecture>.tar.gz songexplorer
    $ cat songexplorer-<version>-<architecture>.tar.gz | \
          split --bytes=2GB - songexplorer-<version>-<architecture>.tar.gz.

After downloading, some users will need to re-install some pip dependencies
(e.g. tensorflow-metal on macOS) as they are not in general relocatable:

    $ pip3 install --target <path-to-unzipped-tarball>/songexplorer/lib/python3.10/site-packages tensorflow-metal

To make changes to the code, do so in the git repository while using the
dependent packages in the conda environment:

    $ conda activate songexplorer
    $ echo PATH=<path-to-git-repo>/src:$PATH >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
    $ echo PATH=<path-to-git-repo>/test:$PATH >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
    $ conda activate songexplorer

Or, skip activating a conda environment and depend on the packages in the tarball
instead:

    $ echo PATH=<path-to-git-repo>/src:$PATH >> ~/.bashrc
    $ echo PATH=<path-to-git-repo>/test:$PATH >> ~/.bashrc
    $ echo PATH=<path-to-extracted-tarball>/bin:$PATH >> ~/.bashrc

Currently, SongExplorer [hangs](https://github.com/microsoft/WSL/issues/7443)
for large models on Windows.
This happens despite updating to the latest version
of WSL:

    $ wsl --status
    $ wsl --update
    $ wsl --shutdown
    $ wsl --status

## Singularity ##

To build an image, change to a local (i.e. not NFS-mounted; e.g.
/opt/users) directory and:

    $ git clone https://github.com/JaneliaSciComp/SongExplorer.git
    $ rm -rf songexplorer/.git
    $ sudo singularity build -s [-B /opt:/mnt] [--no-cleanup] \
          songexplorer.img songexplorer/install/singularity.def

To confirm that the image works:

    $ singularity run --nv songexplorer.img
    >>> import tensorflow as tf
    >>> msg = tf.constant('Hello, TensorFlow!')
    >>> tf.print(msg)

Compress the image into a single file:

    $ sudo singularity build [--tmpdir <>] songexplorer.sif songexplorer.img

Next create an access token at cloud.sylabs.io and login using:

    $ singularity remote login SylabsCloud

Then push the image to the cloud:

    $ singularity sign songexplorer.sif
    $ singularity push songexplorer.sif library://bjarthur/janelia/songexplorer[:<version>]

The image can then be downloaded from the cloud.
You can either go to
[SongExplorer's cloud.sylabs.io
page](https://cloud.sylabs.io/library/bjarthur/janelia/songexplorer) and
click the Download button, or equivalently use the command line:

    $ singularity pull library://bjarthur/janelia/songexplorer:latest
    INFO:    Container is signed
    Data integrity checked, authentic and signed by:
      ben arthur (songexplorer) <arthurb@hhmi.org>, Fingerprint XXABCXXX

Put these definitions in your .bashrc file:

    export SONGEXPLORER_BIN="singularity exec [--nv] [-B <disk-drive>] \
        [--vm-cpu] [--vm-ram] <path-to-songexplorer_latest.sif>"
    alias songexplorer="$SONGEXPLORER_BIN songexplorer <path-to-configuration.py> 8080"

To use a copy of the SongExplorer source code outside of the container, set
SINGULARITYENV_PREPEND_PATH to the full path to SongExplorer's `src` directory in
your shell environment.  `source_path` in "configuration.py" must be set
similarly if using a remote workstation or a cluster.

## Docker ##

To start docker on Linux and set permissions:

    $ service docker start
    $ setfacl -m user:$USER:rw /var/run/docker.sock

To build a docker image and push it to docker hub:

    $ cd songexplorer
    $ docker build --file=install/dockerfile --tag=bjarthur/songexplorer[:<tag>] \
          [--no-cache=true] .
    $ [docker tag <image-id> bjarthur/songexplorer:<tag>]  # get image-id from `docker image ls`
    $ docker login
    $ docker push bjarthur/songexplorer[:<tag>]

To remove a tag from docker hub:

    $ docker run --rm lumir/remove-dockerhub-tag --user bjarthur \
          --password <password> bjarthur/songexplorer:<tag>

To pull an image from docker hub:

    $ docker pull bjarthur/songexplorer
    Using default tag: latest
    latest: Pulling from bjarthur/songexplorer
    Digest: sha256:466674507a10ae118219d83f8d0a3217ed31e4763209da96dddb03994cc26420
    Status: Image is up to date for bjarthur/songexplorer:latest

Put these definitions in your .bashrc file:

    export SONGEXPLORER_BIN="docker run \
        [-v <disk-drive>] [-u <userid>] [-w <working-directory>] \
        bjarthur/songexplorer"
    alias songexplorer="docker run \
        [-v <disk-drive>] [-u <userid>] [-w <working-directory>] \
        -e SONGEXPLORER_BIN -h=`hostname` -p 8080:8080 \
        bjarthur/songexplorer songexplorer <path-to-configuration.py> 8080"

Add to these definitions any directories you want to access using the `-v`
flag.  You might also need to use the `-u` flag to specify your username
or userid.  Optionally specify the current working directory with the
`-w` flag.

To quit out of SongExplorer you might need to open another terminal window and
issue the `stop` command:

    $ docker ps
    CONTAINER ID IMAGE             COMMAND               CREATED       STATUS ...
    6a26ad9d005e bjarthur/songexplorer "detect.sh /src/p..." 3 seconds ago Up 2 seconds ...

    $ docker stop 6a26ad9d005e

To make this easy, put this shortcut in your .bashrc file:

    alias dockerkill='docker stop $(docker ps --latest --format "{{.ID}}")'

On Windows docker runs within a virtual machine that is configured by default to
use only half of the available CPU cores and half of the memory.  This
configuration can be changed in the Preferences window.
Note that even when
SongExplorer is idle these resources will *not* be available to other programs,
including the operating system.

To monitor resource usage:

    $ docker stats

To run a container interactively, add "-i --tty".

## System Tests ##

SongExplorer comes with a comprehensive set of tests to facilitate easy validation
that everything works, both after you've first installed it as well as after any
changes have been made to the code.  The tests exercise both the Python GUI as
well as the Linux Bash interfaces.  To run them, simply execute "runtests":

    $ <path-to-unzipped-executable>/songexplorer/bin/songexplorer/test/runtests