Background

Our project needed speech-to-text transcription. After development was finished, the product team found that the recorded audio had a noticeable noise floor and asked us to reduce it, so I investigated noise suppression. During the investigation I found that WebRTC-based audio noise suppression is widely applicable and well regarded, so I studied that approach.

Process

  1. Record audio and obtain PCM data (done separately).
  2. Cache the PCM data in a ring buffer for easy access.
  3. Once the buffer holds more than a certain amount of data, read a fixed-length chunk from the ring buffer and denoise it.
  4. Save the denoised data to a file (done separately).

Implementation

The solution uses the SDK provided by the noiseReduction library, whose approach I also studied.

1. Save the recorded audio data into a ring buffer

Why buffer the data before denoising? The WebRTC noise suppression interface requires at least 10 ms of audio per call to guarantee the suppression quality. For a recording at a 16000 Hz sample rate (i.e. 16000 samples per second), 10 ms of data is 16000 / 1000 * 10 = 160 samples, i.e. 320 bytes (each sample is one int16_t, and sizeof(int16_t) = 2 bytes). To be able to call the denoising interface at a steady cadence, a buffer is necessary.
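The frame-size arithmetic above can be sketched with two small helpers (the function names are mine, for illustration only):

```c
#include <stdint.h>
#include <stddef.h>

/* Samples in one 10 ms frame: sample_rate / 1000 * 10, i.e. sample_rate / 100. */
static size_t samples_per_10ms(uint32_t sample_rate) {
    return sample_rate / 100;
}

/* Bytes in one 10 ms frame of 16-bit PCM: one int16_t per sample. */
static size_t bytes_per_10ms(uint32_t sample_rate) {
    return samples_per_10ms(sample_rate) * sizeof(int16_t);
}
```

At 16000 Hz this gives 160 samples and 320 bytes per 10 ms frame, matching the numbers above.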

The ring buffer provides four interfaces: write, read, length, and reset.

#include "ad_audio_buffer.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/// Ring buffer capacity
#define BUFFER_SIZE 10000
char audio_buffer[BUFFER_SIZE] = {0};
size_t read_index = 0;
size_t write_index = 0;
size_t data_length = 0;
/// Write length bytes of data into the buffer
void ad_writeBuffer(char *buffer, size_t length) {
    if (length > BUFFER_SIZE - data_length) {
        printf("Buffer overflow!\n");
        return;
    }
    size_t remaining_space = BUFFER_SIZE - write_index;
    if (length <= remaining_space) {
        memcpy(audio_buffer + write_index, buffer, length);
    } else {
        //wrap around: fill to the end, then continue from the start
        memcpy(audio_buffer + write_index, buffer, remaining_space);
        memcpy(audio_buffer, buffer + remaining_space, length - remaining_space);
    }
    write_index = (write_index + length) % BUFFER_SIZE;
    data_length += length;
}
/// Read length bytes of data from the buffer
void ad_readBuffer(char *output_buffer, size_t length) {
    if (length > data_length) {
        printf("Not enough data in buffer!\n");
        return;
    }
    //length <= data_length already holds, so a simple two-part copy is enough
    size_t contiguous = BUFFER_SIZE - read_index;
    if (length <= contiguous) {
        memcpy(output_buffer, audio_buffer + read_index, length);
    } else {
        //wrap around: read to the end, then continue from the start
        memcpy(output_buffer, audio_buffer + read_index, contiguous);
        memcpy(output_buffer + contiguous, audio_buffer, length - contiguous);
    }
    read_index = (read_index + length) % BUFFER_SIZE;
    data_length -= length;
}
/// Get the length of the data currently in the buffer
size_t ad_getBufferLength(void) {
    return data_length;
}
/// Clear and reset the buffer
void ad_resetBuffer(void) {
    memset(audio_buffer, 0, sizeof(audio_buffer));
    read_index = 0;
    write_index = 0;
    data_length = 0;
}
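To see the wrap-around cases concretely, here is a minimal self-contained sketch of the same two-memcpy logic on a tiny 10-byte buffer (the rb_* names are illustrative, not part of the code above):

```c
#include <string.h>
#include <stddef.h>

#define RB_CAP 10
static char rb_data[RB_CAP];
static size_t rb_r, rb_w, rb_len;

/* Write n bytes, wrapping at the end of the buffer (caller checks free space). */
static void rb_write(const char *src, size_t n) {
    size_t tail = RB_CAP - rb_w;
    if (n <= tail) {
        memcpy(rb_data + rb_w, src, n);
    } else {
        memcpy(rb_data + rb_w, src, tail);
        memcpy(rb_data, src + tail, n - tail);
    }
    rb_w = (rb_w + n) % RB_CAP;
    rb_len += n;
}

/* Read n bytes, wrapping at the end of the buffer (caller checks rb_len). */
static void rb_read(char *dst, size_t n) {
    size_t tail = RB_CAP - rb_r;
    if (n <= tail) {
        memcpy(dst, rb_data + rb_r, n);
    } else {
        memcpy(dst, rb_data + rb_r, tail);
        memcpy(dst + tail, rb_data, n - tail);
    }
    rb_r = (rb_r + n) % RB_CAP;
    rb_len -= n;
}
```

Writing 8 bytes, reading 6, then writing 3 more forces the write cursor past the end of the array; a subsequent 5-byte read crosses the boundary and comes back out in the right order.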
2. Read a fixed amount of data from the buffer and denoise it

A helper method reads a fixed amount of data from the buffer; each call takes 320 samples for denoising (any integer multiple of 160 works). The code is as follows:

/// Denoise the real-time audio data captured during recording
/// - Parameters:
///   - data: real-time PCM audio data
- (void)noiseRedutionWithData:(NSData *)data {
    char *buffer = (char *)[data bytes];
    size_t length = data.length;
    //Save the real-time recorded audio into the ring buffer
    ad_writeBuffer(buffer, length);
    //Repeatedly read data from the buffer and denoise it, until not enough data is left
    [self readBufferToNoiseReduction];
}
- (void)readBufferToNoiseReduction {
    //Take 320 samples, i.e. 640 bytes, each time (the sample count can be any integer multiple of 160)
    if (ad_getBufferLength() >= 320 * sizeof(int16_t)) {
        //Read data from the buffer
        int16_t dulBuffer[320] = {0};
        size_t length = sizeof(dulBuffer);
        ad_readBuffer((char *)dulBuffer, length);
        //Denoise; sample count samplesCount = length / sizeof(int16_t)
        [self.noiseHandler handlerProcess:dulBuffer sampleRate:16000 samplesCount:length / sizeof(int16_t) level:kModerate];
        //Save the denoised data to a .pcm file; convert the PCM file to .wav or .mp3 to play it back
        NSData *pcmData = [NSData dataWithBytes:dulBuffer length:length];
        [self.writeFileHandle writeData:pcmData error:nil];
        //Read again, until there is no longer enough data
        [self readBufferToNoiseReduction];
    }
}
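The buffer-and-drain pattern above can be sketched in plain C, with a stub standing in for the WebRTC call (RB_SIZE, denoise_stub, and feed_pcm are illustrative names, not part of the project):

```c
#include <stdint.h>
#include <stddef.h>

#define RB_SIZE 4096
#define FRAME_SAMPLES 320                    /* two 10 ms frames at 16 kHz */
#define FRAME_BYTES (FRAME_SAMPLES * sizeof(int16_t))

static char rb[RB_SIZE];
static size_t rb_head, rb_tail, rb_len;

/* Byte-wise ring push/pop, kept short for the sketch. */
static void rb_push(const char *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        rb[rb_tail] = src[i];
        rb_tail = (rb_tail + 1) % RB_SIZE;
    }
    rb_len += n;
}

static void rb_pop(char *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = rb[rb_head];
        rb_head = (rb_head + 1) % RB_SIZE;
    }
    rb_len -= n;
}

/* Stub standing in for WebRtcNs_Process: attenuate each sample. */
static void denoise_stub(int16_t *frame, size_t samples) {
    for (size_t i = 0; i < samples; i++) frame[i] /= 2;
}

/* Feed incoming PCM and drain every full frame; returns frames processed. */
static int feed_pcm(const char *pcm, size_t n) {
    int frames = 0;
    rb_push(pcm, n);
    while (rb_len >= FRAME_BYTES) {
        int16_t frame[FRAME_SAMPLES];
        rb_pop((char *)frame, FRAME_BYTES);
        denoise_stub(frame, FRAME_SAMPLES);
        frames++;                            /* real code would write the frame to a file here */
    }
    return frames;
}
```

Note how a 500-byte capture chunk produces no frame on the first call (640 bytes are needed), and the leftover bytes are carried forward into the next call; that is exactly why the ring buffer is needed.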

The implementation of the CAudioNoiseHandler denoising class is as follows:

#import <Foundation/Foundation.h>
#import "noise_suppression.h"
NS_ASSUME_NONNULL_BEGIN
enum nsLevel {
    kLow,
    kModerate,
    kHigh,
    kVeryHigh
};
@interface CAudioNoiseHandler : NSObject
- (int)handlerProcess:(int16_t *)buffer
           sampleRate:(uint32_t)sampleRate
         samplesCount:(unsigned long)samplesCount
                level:(enum nsLevel)level;
- (void)closeHandler;
@end
NS_ASSUME_NONNULL_END
#import "CAudioNoiseHandler.h"
#import "ad_audio_buffer.h"
@interface CAudioNoiseHandler(){
    NsHandle *_nshandle;
}
@end
@implementation CAudioNoiseHandler
/// Real-time denoising method; the data passed in must be at least 320 bytes long (at least 160 samples, 2 bytes per sample)
/// - Parameters:
///   - buffer: int16_t buffer data; NSData must first be converted to an int16_t * pointer
///   - sampleRate: sample rate, typically 16000
///   - samplesCount: sample count, computed from the data length in bytes: length / sizeof(int16_t), e.g. NSData.length / 2
///   - level: noise suppression level
- (int)handlerProcess:(int16_t *)buffer
           sampleRate:(uint32_t)sampleRate
         samplesCount:(unsigned long)samplesCount
                level:(enum nsLevel)level {
    if (buffer == NULL) return -1;
    if (samplesCount == 0) return -1;
    size_t samples = MIN(160, sampleRate / 100);
    if (samples == 0) return -1;
    uint32_t num_bands = 1;
    int16_t *input = buffer;
    size_t nTotal = (samplesCount / samples);
    //Real-time denoising is a continuous process; _nshandle must not be re-initialized
    //on every call, otherwise the suppression quality suffers
    if (_nshandle == NULL) {
        _nshandle = WebRtcNs_Create();
        if (0 != WebRtcNs_Init(_nshandle, (uint32_t)sampleRate)) {
            NSLog(@"WebRTC NS - initialization failed");
            return -1;
        }
        if (0 != WebRtcNs_set_policy(_nshandle, level)) {
            NSLog(@"WebRTC NS - set_policy failed");
            return -1;
        }
    }
    for (int i = 0; i < nTotal; i++) {
        int16_t *nsIn[1] = {input};
        int16_t *nsOut[1] = {input};
        WebRtcNs_Analyze(_nshandle, nsIn[0]);
        WebRtcNs_Process(_nshandle, (const int16_t *const *)nsIn, num_bands, nsOut);
        input += samples;
    }
    return 1;
}
- (void)closeHandler {
    WebRtcNs_Free(_nshandle);
    _nshandle = NULL;  //avoid reusing a dangling handle after the free
}
@end

Summary

  1. Since I don't use C often, I'm not very familiar with pointer handling and data-unit conversions, which slowed the investigation down somewhat.
  2. The _nshandle is the key object behind the noise suppression. At first I initialized _nshandle on every call to the denoising interface, which significantly degraded the suppression quality; in fact, for real-time denoising, _nshandle only needs to be initialized once.
  3. I hit several crashes during the investigation: incorrect sample-count calculations, freeing _nshandle too early (WebRtcNs_Free), and miscalculating the length of the data fed in each time.
  4. At first I didn't think of using a buffer at all; I only noticed that the captured chunks didn't meet the length requirement, and later found the ring buffer as the solution.

I hope this article is helpful; feel free to leave a comment with any questions.