{"id":15640629,"url":"https://github.com/yeyupiaoling/audioclassification-tensorflow","last_synced_at":"2025-04-30T08:14:25.630Z","repository":{"id":110642264,"uuid":"258370324","full_name":"yeyupiaoling/AudioClassification-Tensorflow","owner":"yeyupiaoling","description":"基于Tensorflow实现声音分类，博客地址：","archived":false,"fork":false,"pushed_at":"2020-05-08T13:15:54.000Z","size":114,"stargazers_count":101,"open_issues_count":0,"forks_count":22,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-30T08:14:20.888Z","etag":null,"topics":["audioclassification","tensorflow","urbansound8k"],"latest_commit_sha":null,"homepage":"https://blog.doiduoyi.com/articles/1587654005620.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yeyupiaoling.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-24T01:08:17.000Z","updated_at":"2025-04-16T18:16:47.000Z","dependencies_parsed_at":"2023-04-05T11:48:01.932Z","dependency_job_id":null,"html_url":"https://github.com/yeyupiaoling/AudioClassification-Tensorflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yeyupiaoling%2FAudioClassification-Tensorflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yeyupiaoling%2FAudioClassification-Tensorflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yeyupiaoling%2FAudioClassification-Tensorflow/releases","manifests_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yeyupiaoling%2FAudioClassification-Tensorflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yeyupiaoling","download_url":"https://codeload.github.com/yeyupiaoling/AudioClassification-Tensorflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251666361,"owners_count":21624298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audioclassification","tensorflow","urbansound8k"],"created_at":"2024-10-03T11:38:52.018Z","updated_at":"2025-04-30T08:14:25.609Z","avatar_url":"https://github.com/yeyupiaoling.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\nThis chapter shows how to use Tensorflow to train a classification model that tells different sounds apart. For example, if you need to identify a bird species from its call, you can use this approach. Without further ado, let's get started.\n\n# Environment Setup\nThis section covers installing librosa, PyAudio, and pydub; install any other dependencies as needed.\n - Python 3.7\n - Tensorflow 2.0\n\n## Installing librosa\nThe simplest way is to install it with pip:\n```shell\npip install pytest-runner\npip install librosa\n```\n\nIf the pip installation fails, install from source instead. Download the source from [https://github.com/librosa/librosa/releases/](https://github.com/librosa/librosa/releases/); Windows users can download the zip archive, which is easier to extract.\n```shell\npip install pytest-runner\n# Extract the tar.gz archive\ntar xzf librosa-\u003cversion\u003e.tar.gz\n# If you downloaded the zip archive, run this instead:\n# unzip librosa-\u003cversion\u003e.zip\ncd librosa-\u003cversion\u003e/\npython setup.py install\n```\n\nIf you hit the `libsndfile64bit.dll': error 0x7e` error, pin the version to 0.6.3, for example `pip install librosa==0.6.3`.\n\n## Installing PyAudio\nInstall it with pip:\n```shell\npip install pyaudio\n```\n 
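The installation compiles against a C++ library. If you are on Windows with Python 3.7, you can download a prebuilt whl package here: [https://github.com/intxcc/pyaudio_portaudio/releases](https://github.com/intxcc/pyaudio_portaudio/releases)\n\n## Installing pydub\nInstall it with pip:\n```shell\npip install pydub\n```\n\n# Training the Classification Model\nThe key tool for turning audio into training data is librosa, which makes it easy to compute the Mel spectrogram of an audio clip through `librosa.feature.melspectrogram()`. The output is a numpy array that Tensorflow can use directly for training and prediction. Readers can look up the details of the Mel spectrogram on their own. Mel-frequency cepstral coefficients (MFCCs), which are equally important, are used more often in speech recognition; the corresponding API is `librosa.feature.mfcc()`. The following code computes the Mel spectrogram of an audio file, where the `duration` parameter sets how many seconds of audio to load.\n```python\ny1, sr1 = librosa.load(data_path, duration=2.97)\nps = librosa.feature.melspectrogram(y=y1, sr=sr1)\n```\n\n## Creating the Training Data\nUsing the method above, we create the Tensorflow training data. Because the audio files used for classification are small and numerous, the best approach is to pack them into TFRecord files, which speeds up training. Create `create_data.py` to generate the TFRecord files.\n\nFirst generate the data lists, which the next step reads. `audio_path` is the path to the audio files; place the dataset under `dataset/audio` beforehand, with one folder per class, such as `dataset/audio/bird_calls/...`. Each clip must be at least 2.1 seconds long. You can use a different length to suit how the data is read, and every parameter that may need changing is marked with a comment. `list_path` is where the data lists are saved, and each line of a list has the format `audio_path\\tclass_label`. You can also adapt the following function to the way you store your data.\n```python\nimport os\n\nimport librosa\n\n\ndef get_data_list(audio_path, list_path):\n    sound_sum = 0\n    audios = os.listdir(audio_path)\n\n    f_train = open(os.path.join(list_path, 'train_list.txt'), 'w')\n    f_test = open(os.path.join(list_path, 'test_list.txt'), 'w')\n\n    for i in range(len(audios)):\n        sounds = os.listdir(os.path.join(audio_path, audios[i]))\n        for sound in sounds:\n            sound_path = os.path.join(audio_path, audios[i], sound)\n            # Note: librosa 0.10 and later rename the `filename` keyword to `path`\n            t = librosa.get_duration(filename=sound_path)\n            # [may need changing] skip clips shorter than 2.1 seconds\n            if t \u003e= 2.1:\n                # Every 100th clip goes to the test list, the rest to the training list\n                if sound_sum % 100 == 0:\n                    f_test.write('%s\\t%d\\n' % (sound_path, i))\n                else:\n                    f_train.write('%s\\t%d\\n' % (sound_path, i))\n                sound_sum += 1\n        print(\"Audio: %d/%d\" % (i + 1, len(audios)))\n\n    f_test.close()\n    f_train.close()\n\n\nif __name__ == '__main__':\n    get_data_list('dataset/audio', 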
'dataset')\n```\n\nWith the data lists in place, you can generate the TFRecord files, which produces `train.tfrecord` and `test.tfrecord`. I set the audio length to 2.04 seconds, and shorter clips are zero-padded. To use a different audio length, change the `wav_len` value and the `len(ps)` filter: `wav_len` is the audio length in samples, 16000 * seconds, and the `len(ps)` filter is the product of the dimensions of the Mel spectrogram shape.\n```python\nimport random\n\nimport librosa\nimport numpy as np\nimport tensorflow as tf\nfrom tqdm import tqdm\n\n\n# Wrap a value in a float list feature\ndef _float_feature(value):\n    if not isinstance(value, list):\n        value = [value]\n    return tf.train.Feature(float_list=tf.train.FloatList(value=value))\n\n\n# Wrap a value in an int64 list feature\ndef _int64_feature(value):\n    if not isinstance(value, list):\n        value = [value]\n    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))\n\n\n# Build one TFRecord example from the data and its label\ndef data_example(data, label):\n    feature = {\n        'data': _float_feature(data),\n        'label': _int64_feature(label),\n    }\n    return tf.train.Example(features=tf.train.Features(feature=feature))\n\n\n# Create the tfrecord data\ndef create_data_tfrecord(data_list_path, save_path):\n    with open(data_list_path, 'r') as f:\n        data = f.readlines()\n    with tf.io.TFRecordWriter(save_path) as writer:\n        for d in tqdm(data):\n            try:\n                path, label = d.replace('\\n', '').split('\\t')\n                wav, sr = librosa.load(path, sr=16000)\n                # Find the non-silent intervals\n                intervals = librosa.effects.split(wav, top_db=20)\n                wav_output = []\n                # [may need changing] audio length: 16000 * seconds\n                wav_len = int(16000 * 2.04)\n                for sliced in intervals:\n                    wav_output.extend(wav[sliced[0]:sliced[1]])\n                for i in range(5):\n                    # Randomly crop audio that is too long, zero-pad audio that is too short\n                    if len(wav_output) \u003e wav_len:\n                        l = len(wav_output) - wav_len\n                        r = random.randint(0, l)\n                        wav_output = wav_output[r:wav_len + r]\n                    else:\n                        wav_output.extend(np.zeros(shape=[wav_len - len(wav_output)], dtype=np.float32))\n                    wav_output = np.array(wav_output)\n                    # Convert to a Mel spectrogram\n                    
ps = librosa.feature.melspectrogram(y=wav_output, sr=sr, hop_length=256).reshape(-1).tolist()\n                    # [may need changing] product of the Mel spectrogram shape, see librosa.feature.melspectrogram(y=wav_output, sr=sr, hop_length=256).shape\n                    if len(ps) != 128 * 128: continue\n                    tf_example = data_example(ps, int(label))\n                    writer.write(tf_example.SerializeToString())\n                    if len(wav_output) \u003c= wav_len:\n                        break\n            except Exception as e:\n                print(e)\n\n\nif __name__ == '__main__':\n    create_data_tfrecord('dataset/train_list.txt', 'dataset/train.tfrecord')\n    create_data_tfrecord('dataset/test_list.txt', 'dataset/test.tfrecord')\n```\n\nUrbanSound8K is a widely used public dataset for research on automatic urban environmental sound classification. It contains 10 classes: air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music. Download it from [https://zenodo.org/record/1203745/files/UrbanSound8K.tar.gz](https://zenodo.org/record/1203745/files/UrbanSound8K.tar.gz). The function below generates the data lists for UrbanSound8K. To use this dataset, download it, extract it into the `dataset` directory, and replace the data list code with the following.\n```python\nimport os\n\nimport pandas as pd\n\n\n# Create the UrbanSound8K data lists\ndef get_urbansound8k_list(path, urbansound8k_cvs_path):\n    data_list = []\n    data = pd.read_csv(urbansound8k_cvs_path)\n    # Filter out clips shorter than 3 seconds\n    valid_data = data[['slice_file_name', 'fold', 'classID', 'class']][data['end'] - data['start'] \u003e= 3]\n    valid_data['path'] = 'fold' + valid_data['fold'].astype('str') + '/' + valid_data['slice_file_name'].astype('str')\n    for row in valid_data.itertuples():\n        data_list.append([row.path, row.classID])\n\n    f_train = open(os.path.join(path, 'train_list.txt'), 'w')\n    f_test = open(os.path.join(path, 'test_list.txt'), 'w')\n\n    # Every 100th clip goes to the test list, the rest to the training list\n    for i, data in enumerate(data_list):\n        sound_path = os.path.join('dataset/UrbanSound8K/audio/', data[0])\n        if i % 100 == 0:\n            f_test.write('%s\\t%d\\n' % (sound_path, data[1]))\n        else:\n            f_train.write('%s\\t%d\\n' % (sound_path, data[1]))\n\n    f_test.close()\n    
f_train.close()\n\n\nif __name__ == '__main__':\n    get_urbansound8k_list('dataset', 'dataset/UrbanSound8K/metadata/UrbanSound8K.csv')\n```\n\nCreate `reader.py` to read the TFRecord data during training. If you use a different audio length, change the size passed to `tf.io.FixedLenFeature` to the product of the dimensions of the Mel spectrogram shape.\n```python\nimport tensorflow as tf\n\ndef _parse_data_function(example):\n    # [may need changing] product of the Mel spectrogram shape\n    data_feature_description = {\n        'data': tf.io.FixedLenFeature([16384], tf.float32),\n        'label': tf.io.FixedLenFeature([], tf.int64),\n    }\n    return tf.io.parse_single_example(example, data_feature_description)\n\n\ndef train_reader_tfrecord(data_path, num_epochs, batch_size):\n    raw_dataset = tf.data.TFRecordDataset(data_path)\n    train_dataset = raw_dataset.map(_parse_data_function)\n    train_dataset = train_dataset.shuffle(buffer_size=1000) \\\n        .repeat(count=num_epochs) \\\n        .batch(batch_size=batch_size) \\\n        .prefetch(buffer_size=tf.data.experimental.AUTOTUNE)\n    return train_dataset\n\n\ndef test_reader_tfrecord(data_path, batch_size):\n    raw_dataset = tf.data.TFRecordDataset(data_path)\n    test_dataset = raw_dataset.map(_parse_data_function)\n    test_dataset = test_dataset.batch(batch_size=batch_size)\n    return test_dataset\n```\n\n## Training\nNow we can train the model. Create `train.py`. Converting the audio into a Mel spectrogram gives data shaped like a grayscale image, so we can treat it as image input and build a convolutional neural network, here ResNet50V2. We then define the optimizer and load the training and test data. `input_shape` is set to `(128, None, 1)` mainly to accommodate other audio lengths and to allow inputs of any length at prediction time. `class_dim` is the total number of classes.\n```python\nimport tensorflow as tf\nimport reader\nimport numpy as np\n\nclass_dim = 10\nEPOCHS = 100\nBATCH_SIZE = 32\n\nmodel = tf.keras.models.Sequential([\n    tf.keras.applications.ResNet50V2(include_top=False, weights=None, input_shape=(128, None, 1)),\n    tf.keras.layers.ActivityRegularization(l2=0.5),\n    tf.keras.layers.Dropout(rate=0.5),\n    tf.keras.layers.GlobalMaxPooling2D(),\n    tf.keras.layers.Dense(units=class_dim, activation=tf.nn.softmax)\n])\n\nmodel.summary()\n\n\n# Define the optimizer\noptimizer = 
tf.keras.optimizers.Adam(learning_rate=1e-3)\n\ntrain_dataset = reader.train_reader_tfrecord('dataset/train.tfrecord', EPOCHS, batch_size=BATCH_SIZE)\ntest_dataset = reader.test_reader_tfrecord('dataset/test.tfrecord', batch_size=BATCH_SIZE)\n```\n\nFinally, run the training, evaluating on the test set and saving the model every 200 batches. Note that the Mel spectrogram was flattened into a one-dimensional list when the TFRecord files were created, so the data must be reshaped back to its original shape with `reshape((-1, 128, 128, 1))` before it is fed to the model. If you use a different audio length, change this to match the shape of your Mel spectrogram.\n```python\nfor batch_id, data in enumerate(train_dataset):\n    # [may need changing] shape of the Mel spectrogram\n    sounds = data['data'].numpy().reshape((-1, 128, 128, 1))\n    labels = data['label']\n    # Run one training step\n    with tf.GradientTape() as tape:\n        predictions = model(sounds)\n        # Compute the loss\n        train_loss = tf.keras.losses.sparse_categorical_crossentropy(labels, predictions)\n        train_loss = tf.reduce_mean(train_loss)\n        # Compute the accuracy\n        train_accuracy = tf.keras.metrics.sparse_categorical_accuracy(labels, predictions)\n        train_accuracy = np.sum(train_accuracy.numpy()) / len(train_accuracy.numpy())\n\n    # Apply the gradients\n    gradients = tape.gradient(train_loss, model.trainable_variables)\n    optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n\n    if batch_id % 20 == 0:\n        print(\"Batch %d, Loss %f, Accuracy %f\" % (batch_id, train_loss.numpy(), train_accuracy))\n\n    if batch_id % 200 == 0 and batch_id != 0:\n        test_losses = list()\n        test_accuracies = list()\n        for d in test_dataset:\n            # [may need changing] shape of the Mel spectrogram\n            test_sounds = d['data'].numpy().reshape((-1, 128, 128, 1))\n            test_labels = d['label']\n\n            test_result = model(test_sounds)\n            # Compute the loss\n            test_loss = tf.keras.losses.sparse_categorical_crossentropy(test_labels, test_result)\n            test_loss = tf.reduce_mean(test_loss)\n            test_losses.append(test_loss)\n            # Compute the accuracy\n            test_accuracy = 
tf.keras.metrics.sparse_categorical_accuracy(test_labels, test_result)\n            test_accuracy = np.sum(test_accuracy.numpy()) / len(test_accuracy.numpy())\n            test_accuracies.append(test_accuracy)\n\n        print('=================================================')\n        print(\"Test, Loss %f, Accuracy %f\" % (\n            sum(test_losses) / len(test_losses), sum(test_accuracies) / len(test_accuracies)))\n        print('=================================================')\n\n        # Save the model\n        model.save(filepath='models/resnet50.h5')\n```\n\n\n# Prediction\nTraining leaves us with a saved model, which makes prediction straightforward. Silence is trimmed from the input audio, so the non-silent part must be at least 0.5 seconds long to avoid having too few features; this threshold is not fixed and can be changed freely. Before predicting, trim the silence from the audio and convert the trimmed audio into a Mel spectrogram. The first dimension of the input is the batch size, so to predict several clips at once you can put them in a list and predict them together. The output is the label with the highest predicted probability.\n\n```python\nimport librosa\nimport numpy as np\nimport tensorflow as tf\n\nmodel = tf.keras.models.load_model('models/resnet50.h5')\n\n# Load the audio and convert it to a Mel spectrogram\ndef load_data(data_path):\n    wav, sr = librosa.load(data_path, sr=16000)\n    # Keep only the non-silent intervals\n    intervals = librosa.effects.split(wav, top_db=20)\n    wav_output = []\n    for sliced in intervals:\n        wav_output.extend(wav[sliced[0]:sliced[1]])\n    assert len(wav_output) \u003e= 8000, \"less than 0.5 s of non-silent audio\"\n    wav_output = np.array(wav_output)\n    ps = librosa.feature.melspectrogram(y=wav_output, sr=sr, hop_length=256).astype(np.float32)\n    # Add the batch and channel dimensions\n    ps = ps[np.newaxis, ..., np.newaxis]\n    return ps\n\n\ndef infer(audio_path):\n    data = load_data(audio_path)\n    result = model.predict(data)\n    lab = tf.argmax(result, 1)\n    return lab\n\n\nif __name__ == '__main__':\n    # Audio file to predict\n    path = ''\n    label = infer(path)\n    print('Audio: %s, predicted label: %d' % (path, label))\n\n```\n\n\n# Other\nTo make it easier to record audio and build a dataset, two programs are provided. The first is `record_audio.py`, which records audio at a 44100 Hz sample rate, mono, 16-bit.\n```python\nimport pyaudio\nimport wave\nimport uuid\nfrom tqdm import tqdm\nimport os\n\ns = input('Enter the number of seconds to record: ')\n\nCHUNK = 1024\nFORMAT = pyaudio.paInt16\nCHANNELS = 1\nRATE = 
44100\nRECORD_SECONDS = int(s)\nWAVE_OUTPUT_FILENAME = \"save_audio/%s.wav\" % str(uuid.uuid1()).replace('-', '')\n\np = pyaudio.PyAudio()\n\nstream = p.open(format=FORMAT,\n                channels=CHANNELS,\n                rate=RATE,\n                input=True,\n                frames_per_buffer=CHUNK)\n\nprint(\"Recording started, please speak......\")\n\nframes = []\n\nfor i in tqdm(range(0, int(RATE / CHUNK * RECORD_SECONDS))):\n    data = stream.read(CHUNK)\n    frames.append(data)\n\nprint(\"Recording finished!\")\n\nstream.stop_stream()\nstream.close()\np.terminate()\n\nif not os.path.exists('save_audio'):\n    os.makedirs('save_audio')\n\nwf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')\nwf.setnchannels(CHANNELS)\nwf.setsampwidth(p.get_sample_size(FORMAT))\nwf.setframerate(RATE)\nwf.writeframes(b''.join(frames))\nwf.close()\n\nprint('File saved to: %s' % WAVE_OUTPUT_FILENAME)\nos.system('pause')\n```\n\nCreate `crop_audio.py`. Training uses 2.04-second clips by default, so the recordings are cut into 3-second segments, and the segments are saved in a folder named after the source audio file. Finally, build the data lists and generate the TFRecord files from these segments as described for the training data.\n```python\nimport os\nimport uuid\nimport wave\nfrom pydub import AudioSegment\n\n\n# Cut out part of the audio, with start and end given in seconds\ndef get_part_wav(sound, start_time, end_time, part_wav_path):\n    save_path = os.path.dirname(part_wav_path)\n    if not os.path.exists(save_path):\n        os.makedirs(save_path)\n    # pydub slices by milliseconds\n    start_time = int(start_time) * 1000\n    end_time = int(end_time) * 1000\n    word = sound[start_time:end_time]\n    word.export(part_wav_path, format=\"wav\")\n\n\ndef crop_wav(path, crop_len):\n    for src_wav_path in os.listdir(path):\n        wave_path = os.path.join(path, src_wav_path)\n        print(wave_path[-4:])\n        if wave_path[-4:] != '.wav':\n            continue\n        file = wave.open(wave_path)\n        # Total number of frames\n        a = file.getparams().nframes\n        # Sample rate\n        f = file.getparams().framerate\n        # Duration of the audio in seconds\n        t = int(a / f)\n        print('Total duration: %d s' % t)\n        # Load the audio\n        sound = AudioSegment.from_wav(wave_path)\n        for start_time in range(0, t, crop_len):\n   
         save_path = os.path.join(path, os.path.basename(wave_path)[:-4], str(uuid.uuid1()) + '.wav')\n            get_part_wav(sound, start_time, start_time + crop_len, save_path)\n\n\nif __name__ == '__main__':\n    crop_len = 3\n    crop_wav('save_audio', crop_len)\n```\n\n\nCreate `infer_record.py`, which records and classifies audio in a continuous loop. The recording time is set to 3 seconds so that enough audio remains for prediction after the silence is trimmed; you can change it to another length. Because each prediction takes little time, the program can roughly be regarded as recognizing audio in real time. This opens up some interesting applications. For example, you could place a microphone where birds often visit and detect bird calls as they happen; if your dataset is rich enough to cover the call of every species, you could even identify which species is calling. When a target species is detected, the program could trigger an action such as taking a photo.\n\n```python\nimport wave\nimport librosa\nimport numpy as np\nimport pyaudio\nimport tensorflow as tf\n\n# Load the network model\nmodel = tf.keras.models.load_model('models/resnet50.h5')\n\n# Recording parameters\nCHUNK = 1024\nFORMAT = pyaudio.paInt16\nCHANNELS = 1\nRATE = 16000\nRECORD_SECONDS = 3\nWAVE_OUTPUT_FILENAME = \"infer_audio.wav\"\n\n# Open the recording stream\np = pyaudio.PyAudio()\nstream = p.open(format=FORMAT,\n                channels=CHANNELS,\n                rate=RATE,\n                input=True,\n                frames_per_buffer=CHUNK)\n\n\n# Load the audio and convert it to a Mel spectrogram\ndef load_data(data_path):\n    wav, sr = librosa.load(data_path, sr=16000)\n    # Keep only the non-silent intervals\n    intervals = librosa.effects.split(wav, top_db=20)\n    wav_output = []\n    for sliced in intervals:\n        wav_output.extend(wav[sliced[0]:sliced[1]])\n    if len(wav_output) \u003c 8000:\n        raise Exception(\"less than 0.5 s of non-silent audio\")\n    wav_output = np.array(wav_output)\n    ps = librosa.feature.melspectrogram(y=wav_output, sr=sr, hop_length=256).astype(np.float32)\n    # Add the batch and channel dimensions\n    ps = ps[np.newaxis, ..., np.newaxis]\n    return ps\n\n\n# Record audio and save it to a wav file\ndef record_audio():\n    print(\"Recording started......\")\n\n    frames = []\n    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):\n        data = stream.read(CHUNK)\n        frames.append(data)\n\n    print(\"Recording finished!\")\n\n    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')\n    wf.setnchannels(CHANNELS)\n    wf.setsampwidth(p.get_sample_size(FORMAT))\n    wf.setframerate(RATE)\n    wf.writeframes(b''.join(frames))\n    wf.close()\n 
   return WAVE_OUTPUT_FILENAME\n\n\n# Predict\ndef infer(audio_data):\n    result = model.predict(audio_data)\n    lab = tf.argmax(result, 1)\n    return lab\n\n\nif __name__ == '__main__':\n    try:\n        while True:\n            # Record and load the data\n            data = load_data(record_audio())\n\n            # Get the prediction\n            label = infer(data)\n            print('Predicted label: %d' % label)\n    except Exception as e:\n        print(e)\n        stream.stop_stream()\n        stream.close()\n        p.terminate()\n```\n\n# Models\n\n| Model | Dataset | Download |\n| :---: | :---: | :---: |\n| Network weights | UrbanSound8K | [Download](https://resource.doiduoyi.com/#58c831s) |\n| Inference model | UrbanSound8K | [Download](https://resource.doiduoyi.com/#so8e51u) |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyeyupiaoling%2Faudioclassification-tensorflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyeyupiaoling%2Faudioclassification-tensorflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyeyupiaoling%2Faudioclassification-tensorflow/lists"}