Text-to-Speech & Speech-to-Text - 六虎

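1. Implementing text-to-speech

AVSpeechSynthesizer, the speech synthesis class in AVFoundation, uses the system's built-in engine, so it converts text to speech in real time efficiently and without any network request.

It is fairly simple to use, so let's go straight to the implementation steps.

Import the AVFoundation framework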
```
import AVFoundation
```
Create an AVSpeechSynthesizer instance

AVSpeechSynthesizer is the AVFoundation class for text-to-speech. The conversion is performed through an instance of it.
```
let speechSynthesizer = AVSpeechSynthesizer()
```

Configure an AVSpeechUtterance object

AVSpeechUtterance is the AVFoundation class that represents one piece of text to be spoken. It holds the text to convert together with attributes such as language, pitch, speaking rate, and voice gender.

AVSpeechSynthesisVoice is the class for language- and voice-related settings.

AVSpeechSynthesisVoice.speechVoices() returns an array of all available voices.

let speechUtterance = AVSpeechUtterance(string: text)
speechUtterance.rate = 0.5 // speaking rate
speechUtterance.pitchMultiplier = 1.2 // pitch
// Choose the voice by language and gender
let availableVoices = AVSpeechSynthesisVoice.speechVoices()
if let englishVoice = availableVoices.first(where: { $0.language == "en-US" && $0.gender == .female }) {
   speechUtterance.voice = englishVoice
} else {
   print("No suitable voice found")
}
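
The snippet above hard-codes an en-US voice. As a rough sketch of one alternative, the voice could be picked from the language of the text itself. This uses NLLanguageRecognizer from the NaturalLanguage framework, which the article does not use, and the helper names `primarySubtag` and `voice(matching:)` are made up for illustration. Detection on very short or mixed-language strings can be unreliable, so treat the result as a hint.

```swift
import AVFoundation
import NaturalLanguage

/// Hypothetical helper: returns the lowercased primary language subtag,
/// e.g. "zh" for "zh-Hans" or "en" for "en-US".
func primarySubtag(_ code: String) -> String {
    String(code.split(separator: "-").first ?? Substring(code)).lowercased()
}

/// Hypothetical helper: returns an installed voice whose language matches
/// the dominant language detected in `text`, or nil if none is found.
func voice(matching text: String) -> AVSpeechSynthesisVoice? {
    guard let detected = NLLanguageRecognizer.dominantLanguage(for: text) else {
        return nil
    }
    let wanted = primarySubtag(detected.rawValue)
    // Voice languages look like "en-US" or "zh-CN", so compare primary subtags only.
    return AVSpeechSynthesisVoice.speechVoices().first {
        primarySubtag($0.language) == wanted
    }
}
```

A caller would assign `speechUtterance.voice = voice(matching: text)` and fall back to a fixed voice when the result is nil.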

Play the speech

Once the AVSpeechUtterance object is configured, call speak(_:) on the AVSpeechSynthesizer instance to play it.
```
speechSynthesizer.speak(speechUtterance)
```

Playback control and state checks

State checks: whether speech is currently playing or paused

open var isSpeaking: Bool { get }
open var isPaused: Bool { get }

Pausing and stopping

open func stopSpeaking(at boundary: AVSpeechBoundary) -> Bool
open func pauseSpeaking(at boundary: AVSpeechBoundary) -> Bool

The enum that selects where speech stops

public enum AVSpeechBoundary : Int, @unchecked Sendable {
  // Stop immediately
  case immediate = 0
  // Stop after the current word has been spoken
  case word = 1
}

AVSpeechSynthesizerDelegate: observing speech playback

For example, it reports changes between four states: paused, cancelled, finished, and started.

// Paused
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didPause utterance: AVSpeechUtterance) {
}
// Finished
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
}
// didCancel, didStart
......

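Notes:

Calling stopSpeaking and reaching the natural end of playback both trigger the didFinish delegate method, so you have to tell the two cases apart yourself.
AVSpeechSynthesizer cannot detect the language of the input text on its own. When the text does not match the configured voice language, or contains emoji, the spoken output comes out wrong.
If the device is in silent mode and AVAudioSession is not configured separately, AVSpeechSynthesizer produces no sound.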
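
The first and third notes can be handled roughly as follows. This is only a sketch under assumptions: the class name `SpeechPlayer` and the `stoppedManually` flag are invented for illustration, and the `.playback` category is one reasonable choice for speech that should stay audible in silent mode, not something the article prescribes.

```swift
import AVFoundation

/// Hypothetical wrapper that remembers whether playback was stopped by the caller.
final class SpeechPlayer: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    /// Set just before stopSpeaking so the delegate can tell a manual stop
    /// from a natural finish.
    private(set) var stoppedManually = false

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String, language: String = "en-US") {
        // Assumption: .playback keeps speech audible when the silent switch is on.
        try? AVAudioSession.sharedInstance().setCategory(.playback, mode: .spokenAudio, options: [])
        stoppedManually = false
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        synthesizer.speak(utterance)
    }

    func stop() {
        stoppedManually = true
        _ = synthesizer.stopSpeaking(at: .immediate)
    }

    // Both callbacks go through the same check, whichever one the system delivers.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        handleEnd()
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didCancel utterance: AVSpeechUtterance) {
        handleEnd()
    }

    private func handleEnd() {
        print(stoppedManually ? "Stopped by the caller" : "Finished naturally")
        stoppedManually = false
    }
}
```

Keep the `SpeechPlayer` instance alive (for example as a property) for as long as speech should play, because it owns the synthesizer.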

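2. Implementing audio recording

The system offers several ways to record audio. The three common ones each have their own characteristics and are briefly introduced below:

`AVAudioRecorder` in the `AVFAudio` framework

Plain recording: it writes the recorded audio to a file in the specified sandbox directory, and the audio data cannot be accessed in real time.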

// Create the AVAudioRecorder instance
var audioRecorder: AVAudioRecorder?
func setupAudioRecorder() {
    // getDocumentsDirectory() is a helper not shown in this article;
    // it is assumed to return the URL of the app's Documents directory
    let audioFilename = getDocumentsDirectory().appendingPathComponent("recordedAudio.wav")
    let settings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVSampleRateKey: 44100.0,
        AVNumberOfChannelsKey: 1,
        AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
    ]
    do {
        let recorder = try AVAudioRecorder(url: audioFilename, settings: settings)
        recorder.delegate = self
        recorder.prepareToRecord()
        audioRecorder = recorder
    } catch {
        // Handle the failure to create the AVAudioRecorder instance
        print("Failed to create AVAudioRecorder: \(error.localizedDescription)")
    }
}
// Start and stop recording
func startRecording() {
    if let recorder = audioRecorder, !recorder.isRecording {
        recorder.record()
    }
}
func stopRecording() {
    if let recorder = audioRecorder, recorder.isRecording {
        recorder.stop()
    }
}
// Handle the end of a recording
extension YourViewController: AVAudioRecorderDelegate {
    func audioRecorderDidFinishRecording(_ recorder: AVAudioRecorder, successfully flag: Bool) {
        if flag {
            // Recording finished; process the file here, e.g. store its path or read the audio data
        } else {
            // Recording failed
        }
    }
}

`AVCaptureAudioDataOutput` in `AVFoundation`

Audio data buffers are delivered through a delegate, much like the delegate method used when capturing images from the camera.

init() {
    // Create the AVCaptureSession
    captureSession = AVCaptureSession()
    // Set up the audio input device (microphone)
    if let audioDevice = AVCaptureDevice.default(for: .audio) {
        do {
            let audioInput = try AVCaptureDeviceInput(device: audioDevice)
            if captureSession.canAddInput(audioInput) {
                captureSession.addInput(audioInput)
            }
        } catch {
            print("Error setting up audio input: \(error.localizedDescription)")
        }
    }
    // Set up the audio data output
    let audioOutput = AVCaptureAudioDataOutput()
    audioOutput.setSampleBufferDelegate(self, queue: DispatchQueue.global(qos: .utility))
    if captureSession.canAddOutput(audioOutput) {
        captureSession.addOutput(audioOutput)
    }
}
func startCapturing() {
    if !captureSession.isRunning {
        captureSession.startRunning()
    }
}
func stopCapturing() {
    if captureSession.isRunning {
        captureSession.stopRunning()
    }
}
// AVCaptureAudioDataOutputSampleBufferDelegate method that receives the audio buffers
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Handle the audio sampleBuffer here:
    // this is where the audio data can be read and processed
}

`AVAudioEngine` in `AVFoundation`

A concise approach: the audio data buffers are handed to you directly in a block callback.

let audioEngine = AVAudioEngine()
audioEngine.inputNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(bufferSize), format: audioEngine.inputNode.outputFormat(forBus: 0)) { [weak self] (buffer, time) in
    // Process the audio data
}
// Start the audio engine
try audioEngine.start()
func stopRecording() {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)
}
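
As an illustration of what "process the audio data" might look like inside the tap block, here is a small sketch that computes the RMS level of a buffer, for instance to drive a volume meter. The function name `rmsLevel(of:)` is invented for this example, and it assumes the tap delivers Float32 samples, which is what `floatChannelData` exposes; for other sample formats it simply returns 0.

```swift
import AVFoundation

/// Hypothetical helper: root-mean-square level of the first channel.
/// Returns 0 for empty buffers or buffers that do not hold Float32 samples.
func rmsLevel(of buffer: AVAudioPCMBuffer) -> Float {
    guard let channels = buffer.floatChannelData, buffer.frameLength > 0 else {
        return 0
    }
    let frameCount = Int(buffer.frameLength)
    let samples = UnsafeBufferPointer(start: channels[0], count: frameCount)
    // Sum of squared samples, then mean and square root.
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(frameCount)).squareRoot()
}
```

Inside the tap block this would be called as `let level = rmsLevel(of: buffer)`. The tap block does not run on the main thread, so hop to the main queue before updating any UI with the value.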

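Summary: audio processing is a very complex area, and in the end we chose AVAudioEngine for recording because its code is the most concise. Some articles say it responds more slowly than AVCaptureAudioDataOutput, but since we do little processing on the audio, the difference is not noticeable in practice.

3. Configuring AVAudioSession

Activate the AVAudioSession and set its Category and Option. The Category and Option values used here mean the following:

AVAudioSessionCategoryPlayAndRecord: supports both recording and playback. With no external audio device connected, sound plays through the receiver (earpiece) by default, and it is not affected by the phone's silent mode.

AVAudioSessionCategoryOptionDefaultToSpeaker: makes the audio session default to the built-in speaker (hands-free) instead of the receiver. It is only supported with AVAudioSessionCategoryPlayAndRecord.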

// AVAudioSession is global and is easily overridden by other code
let audioSession = AVAudioSession.sharedInstance()
try audioSession.setCategory(.playAndRecord, options: .defaultToSpeaker)
try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

Note:
Setting an invalid Category or Option can crash the app. AVAudioSession is also tied to the hardware, so your audio is affected whenever the Category or Option is changed, whether by other code in your own app or by another app.
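
One way to reduce both risks is to wrap the configuration in a function that catches errors and to call it again right before every recording or playback, instead of relying on an earlier setup still being in effect. The following is a minimal sketch of that idea; the function name `configureAudioSessionForSpeech` is made up for illustration.

```swift
import AVFoundation

/// Hypothetical helper: applies the play-and-record configuration and
/// reports failure through its return value instead of letting the error escape.
@discardableResult
func configureAudioSessionForSpeech() -> Bool {
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker])
        try session.setActive(true, options: .notifyOthersOnDeactivation)
        return true
    } catch {
        print("AVAudioSession configuration failed: \(error.localizedDescription)")
        return false
    }
}
```

Calling it immediately before `audioEngine.start()` or `speechSynthesizer.speak(_:)` keeps the session in the expected state even if other code changed it in the meantime.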

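4. Speech-to-text

The system's built-in Speech framework can recognize the content of spoken audio. By default, SFSpeechRecognizer needs a network connection to do so.

A quick look at the speech recognition classes

SFSpeechRecognizer (speech recognizer): handles speech recognition tasks. It takes the audio supplied in a request and, by default, sends it to Apple's servers for processing. It creates recognition tasks and reports which recognition languages are supported.

SFSpeechAudioBufferRecognitionRequest (speech recognition request): a subclass of SFSpeechRecognitionRequest that represents one recognition request. It defines the parameters of the recognition task.

SFSpeechRecognitionTask (speech recognition task): controls and manages the execution of a recognition task, including cancelling and finishing it.

SFSpeechRecognitionResult (speech recognition result): contains the recognized text, confidence scores, and related information.

Steps for implementing speech-to-text

Request permission

Add the usage description to Info.plist, then request authorization in code.
Privacy - Speech Recognition Usage Description (key: NSSpeechRecognitionUsageDescription)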

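SFSpeechRecognizer.requestAuthorization { [weak self] authStatus in
    switch authStatus {
    case .authorized:
        // The user granted permission; speech recognition can start
        break
    case .denied, .restricted:
        // Denied by the user or unavailable on this device; handle the error
        break
    case .notDetermined:
        // The user has not been asked yet
        break
    @unknown default:
        break
    }
}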

Import the framework and create the instances

import Speech
let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
let request = SFSpeechAudioBufferRecognitionRequest()

Feed the audio into the recognition request

Append the audio buffers captured by AVAudioEngine to the SFSpeechAudioBufferRecognitionRequest instance.

audioEngine.inputNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(bufferSize), format: audioEngine.inputNode.outputFormat(forBus: 0)) { [weak self] (buffer, time) in
     self?.request.append(buffer)
 }

Start the recognition task

The SFSpeechRecognizer instance starts recognition and returns an SFSpeechRecognitionTask. The recognition result (SFSpeechRecognitionResult) is delivered through the resultHandler parameter.

recognitionTask = speechRecognizer?.recognitionTask(with: request, resultHandler: { [weak self] result, error in
    // result is an optional SFSpeechRecognitionResult instance
})

Read the recognition result

An SFSpeechRecognitionResult may actually contain several candidates (exposed as var transcriptions: [SFTranscription]). We normally use the most reliable one, var bestTranscription: SFTranscription.
```
// Read the recognized text
let transcription = result.bestTranscription.formattedString
```

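Notes:

By default, SFSpeechAudioBufferRecognitionRequest needs a network connection to perform speech recognition.
When the requiresOnDeviceRecognition property of SFSpeechAudioBufferRecognitionRequest is set to true, speech-to-text also works offline. Be aware ⚠️ that this lowers recognition accuracy, so choose according to your actual use case.
If the language of the recorded audio does not match the language of the SFSpeechRecognizer, nothing is recognized.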
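
Putting the steps and notes together, the whole flow might look roughly like the sketch below. It is not the article's actual implementation: the class name `SpeechToText`, the `preferOnDevice` parameter, the `isRunning` flag, and the buffer size of 1024 are all assumptions made for illustration, and it assumes iOS 13 or later (needed for the on-device properties) with permissions already granted.

```swift
import AVFoundation
import Speech

/// Hypothetical wrapper around the recognition flow described above.
final class SpeechToText {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?
    private(set) var isRunning = false

    /// `preferOnDevice` trades accuracy for offline use, as noted above.
    func start(preferOnDevice: Bool, onText: @escaping (String) -> Void) throws {
        guard let recognizer = recognizer, recognizer.isAvailable, !isRunning else { return }

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        // Only require on-device recognition when this locale supports it.
        if preferOnDevice && recognizer.supportsOnDeviceRecognition {
            request.requiresOnDeviceRecognition = true
        }
        self.request = request

        // Feed microphone buffers into the request.
        let input = audioEngine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024, format: input.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
        isRunning = true

        task = recognizer.recognitionTask(with: request) { [weak self] result, error in
            if let result = result {
                onText(result.bestTranscription.formattedString)
            }
            // Stop on failure or once the final result has arrived.
            if error != nil || result?.isFinal == true {
                self?.stop()
            }
        }
    }

    func stop() {
        guard isRunning else { return }
        isRunning = false
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        request = nil
        task = nil
    }
}
```

Usage would be along the lines of `try speechToText.start(preferOnDevice: false) { text in print(text) }`, with the locale passed to SFSpeechRecognizer matching the language actually being spoken.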
