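ChatGPT is hugely popular, but its strength lies in three areas: NLU (natural language understanding), DM (dialogue management), and NLG (natural language generation). Speech recognition (Recognition) and spoken output (TTS) are missing. If your App integrates ChatGPT and also needs to speak the replies aloud, the TextToSpeech mechanism comes in handy.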
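1 Preface
For voice interaction, the Android SDK provides three things: the VoiceInteraction mechanism for voice interaction, the Recognition interface for speech recognition, and the TTS interface for spoken output.
The first has already been covered in an earlier article. This article focuses on the third, TTS; a later article will analyze the second, Android's standard Recognition mechanism.
Through the TextToSpeech mechanism, any App can conveniently use a built-in or third-party TTS Engine to play earcons (short notification sounds) and spoken prompts. The system can pick the default provider to carry out the request, or the App can name the target Engine it prefers.
The default TTS Engine can be found in the device settings and can be changed by the user: Settings -> Accessibility -> Text-to-speech output -> Preferred engine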
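The TextToSpeech mechanism has several advantages:
- For a requesting App that needs TTS: it does not need to care how TTS is implemented; the TextToSpeech API works out of the box
- For an Engine that provides TTS capability to others: it does not need to maintain complicated TTS sequencing and logic. It only has to plug into the framework defined by TextToSpeechService and need not care how the system connects the implementation with the request
This article covers three parts, namely calling TextToSpeech, implementing an Engine, and system scheduling, and walks through the whole flow from end to end.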
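2 Calling TextToSpeech
The TextToSpeech API is the caller-facing side of TTS and is fairly simple overall.
The main pieces are the TextToSpeech() constructor, which initializes the TTS interface, the OnInitListener callback invoked after initialization, and then speak() for playing TTS and playEarcon() for playing earcons.
What matters more is handling the four callback results of a playback request. Depending on the result, the App records state when playback starts, takes the next step when playback finishes, or manages audio focus when playback fails, and so on.
The earlier OnUtteranceCompletedListener was deprecated in API level 18; use UtteranceProgressListener instead, whose callbacks are more fine-grained.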
// TTSTest.kt
class TTSTest(context: Context) {
    private val tts: TextToSpeech = TextToSpeech(context) { initResult -> ... }

    init {
        tts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
            override fun onStart(utteranceId: String?) { ... }
            override fun onDone(utteranceId: String?) { ... }
            override fun onStop(utteranceId: String?, interrupted: Boolean) { ... }
            override fun onError(utteranceId: String?) { ... }
        })
    }

    fun testTextToSpeech(context: Context) {
        tts.speak(
            "你好,轿车",
            TextToSpeech.QUEUE_ADD,
            Bundle(),
            "xxdtgfsf"
        )
        tts.playEarcon(
            EARCON_DONE,
            TextToSpeech.QUEUE_ADD,
            Bundle(),
            "yydtgfsf"
        )
    }

    companion object {
        const val EARCON_DONE = "earCon_done"
    }
}
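To make the four callback results more concrete, here is a minimal plain-Java sketch with no Android dependencies. UtteranceTracker, its State enum, and the holdsAudioFocus flag are hypothetical names invented for illustration; in a real App these methods would be driven from the UtteranceProgressListener overrides, and giving up focus would go through the platform's audio APIs rather than a boolean.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: records what happened to each utterance so the app
// can decide on a next step and on when to give up audio focus.
class UtteranceTracker {
    enum State { STARTED, DONE, STOPPED, FAILED }

    private final Map<String, State> states = new HashMap<>();
    // Assumption: focus was requested before speak() was called.
    private boolean holdsAudioFocus = true;

    void onStart(String id) { states.put(id, State.STARTED); }
    void onDone(String id)  { finish(id, State.DONE); }
    void onStop(String id)  { finish(id, State.STOPPED); }
    void onError(String id) { finish(id, State.FAILED); }

    private void finish(String id, State end) {
        states.put(id, end);
        // Give up focus once no utterance is still playing.
        if (!states.containsValue(State.STARTED)) {
            holdsAudioFocus = false;
        }
    }

    State stateOf(String id) { return states.get(id); }
    boolean holdsAudioFocus() { return holdsAudioFocus; }
}
```

The point of the sketch is that every terminal callback, including the error path, passes through the same cleanup, so focus is not leaked when playback fails.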
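3 TextToSpeech system scheduling
3.1 init and binding
Start with the implementation of TextToSpeech() to see what preparation happens between the system and the TTS Engine before anything is spoken.
- The constructor triggers initTts(), which decides which Engine to connect to in the following order:
  - If a target Engine package was specified when the TTS instance was constructed, that Engine is tried first
  - Otherwise, the device's default Engine is fetched and connected to. This setting is TTS_DEFAULT_SYNTH, which TtsEngines reads from the system settings data in SettingsProvider
  - If the default does not exist or is not installed, the highest-ranked system Engine is fetched from TtsEngines and connected to. Highest-ranked means the first Engine in the list of all TTS Service implementations that belongs to the system image
- Every connection goes through connectToEngine(), which picks a different internal Connection implementation to connect() with, depending on where the call comes from:
  - If the call does not come from the system, DirectConnection is used
    - Its connect() is simple: it wraps the action INTENT_ACTION_TTS_SERVICE in an Intent and calls bindService(); AMS then performs the binding to the Engine, which is not expanded on here
  - Otherwise, SystemConnection is used. The reason is that the system may issue many TTS requests and cannot create a new connection every time the way other Apps do; it needs to cache and reuse connections
    - Specifically, it obtains the proxy of TextToSpeechManagerService, the system service registered under the name texttospeech that manages TTS Services, calls its createSession() to create a session, and keeps the ITextToSpeechSession proxy the session points to. The session is still built on AIDL. Inside, the TTS system service creates a dedicated TextToSpeechSessionConnection to bind and cache the Engine; the details are skipped here
  - Either way, once connected, the ITextToSpeechService interface instance of the concrete TTS Engine is stored, and the Connection instance is stored in mServiceConnection so the outer class can use it when speak() is called. Note that an asynchronous task, SetupConnectionAsyncTask, is also started at this point. It hands the Connection's own ITextToSpeechCallback Binder interface to the Engine so the Engine can call results back to the requester after processing
- If connect succeeds, the Connection is also stored in mConnectingServiceConnection so the connection can be released when TTS is no longer needed, and dispatchOnInit() passes SUCCESS to the requesting App
  - The implementation is simple: the result code is called back to the OnInitListener passed in at initialization
- If the connection fails, dispatchOnInit() is called with ERROR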
// TextToSpeech.java
public class TextToSpeech {
    public TextToSpeech(Context context, OnInitListener listener) {
        this(context, listener, null);
    }

    private TextToSpeech( ... ) {
        ...
        initTts();
    }

    private int initTts() {
        // Step 1: Try connecting to the engine that was requested.
        if (mRequestedEngine != null) {
            if (mEnginesHelper.isEngineInstalled(mRequestedEngine)) {
                if (connectToEngine(mRequestedEngine)) {
                    mCurrentEngine = mRequestedEngine;
                    return SUCCESS;
                }
                ...
            } else if (!mUseFallback) {
                ...
                dispatchOnInit(ERROR);
                return ERROR;
            }
        }

        // Step 2: Try connecting to the user's default engine.
        final String defaultEngine = getDefaultEngine();
        ...

        // Step 3: Try connecting to the highest ranked engine in the system.
        final String highestRanked = mEnginesHelper.getHighestRankedEngineName();
        ...

        dispatchOnInit(ERROR);
        return ERROR;
    }

    private boolean connectToEngine(String engine) {
        Connection connection;
        if (mIsSystem) {
            connection = new SystemConnection();
        } else {
            connection = new DirectConnection();
        }

        boolean bound = connection.connect(engine);
        if (!bound) {
            return false;
        } else {
            mConnectingServiceConnection = connection;
            return true;
        }
    }
}
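The three-step lookup order in initTts() can be modeled in a few lines of plain Java. This is a simplified sketch, not the AOSP code: EnginePicker and pick() are hypothetical names, the installed list is assumed to be ordered with system-image engines first, and the sketch leaves out both the mUseFallback check and the case where an engine is installed but the connection attempt fails.

```java
import java.util.List;

// Hypothetical model of the requested -> default -> highest-ranked fallback.
class EnginePicker {
    /**
     * @param requested     package named by the app, or null
     * @param defaultEngine the user's default engine from settings, or null
     * @param installed     installed engines, system-image engines first (assumption)
     * @return the package to connect to, or null when no engine is usable
     */
    static String pick(String requested, String defaultEngine, List<String> installed) {
        // Step 1: the engine the app asked for, if it is installed.
        if (requested != null && installed.contains(requested)) {
            return requested;
        }
        // Step 2: the user's default engine, if it is installed.
        if (defaultEngine != null && installed.contains(defaultEngine)) {
            return defaultEngine;
        }
        // Step 3: the highest-ranked engine, modeled here as the first entry.
        return installed.isEmpty() ? null : installed.get(0);
    }
}
```

For example, with the made-up packages com.sys.tts and com.vendor.tts installed, a request for a package that is not installed falls through to the default, and with neither a request nor a default the first installed engine is returned.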
The Connection inner class and its two subclasses are implemented as follows:
// TextToSpeech.java
public class TextToSpeech {
    ...
    private abstract class Connection implements ServiceConnection {
        private ITextToSpeechService mService;
        ...
        private final ITextToSpeechCallback.Stub mCallback =
                new ITextToSpeechCallback.Stub() {
                    public void onStop(String utteranceId, boolean isStarted)
                            throws RemoteException {
                        UtteranceProgressListener listener = mUtteranceProgressListener;
                        if (listener != null) {
                            listener.onStop(utteranceId, isStarted);
                        }
                    }

                    @Override
                    public void onSuccess(String utteranceId) { ... }

                    @Override
                    public void onError(String utteranceId, int errorCode) { ... }

                    @Override
                    public void onStart(String utteranceId) { ... }
                    ...
                };

        @Override
        public void onServiceConnected(ComponentName componentName, IBinder service) {
            synchronized(mStartLock) {
                mConnectingServiceConnection = null;

                mService = ITextToSpeechService.Stub.asInterface(service);
                mServiceConnection = Connection.this;

                mEstablished = false;
                mOnSetupConnectionAsyncTask = new SetupConnectionAsyncTask();
                mOnSetupConnectionAsyncTask.execute();
            }
        }
        ...
    }

    private class DirectConnection extends Connection {
        @Override
        boolean connect(String engine) {
            Intent intent = new Intent(Engine.INTENT_ACTION_TTS_SERVICE);
            intent.setPackage(engine);
            return mContext.bindService(intent, this, Context.BIND_AUTO_CREATE);
        }
        ...
    }

    private class SystemConnection extends Connection {
        ...
        boolean connect(String engine) {
            IBinder binder = ServiceManager.getService(Context.TEXT_TO_SPEECH_MANAGER_SERVICE);
            ...
            try {
                manager.createSession(engine, new ITextToSpeechSessionCallback.Stub() {
                    ...
                });
                return true;
            } ...
        }
        ...
    }
}
3.2 speak playback
Next, look at what the system actually does for the important speak() call.
First, the remote-interface call behind speak() is wrapped in an Action instance and handed to the connected Connection instance stored during init, which schedules it.
// TextToSpeech.java
public class TextToSpeech {
    ...
    private Connection mServiceConnection;

    public int speak(final CharSequence text, ... ) {
        return runAction((ITextToSpeechService service) -> {
            ...
        }, ERROR, "speak");
    }

    private <R> R runAction(Action<R> action, R errorResult, String method) {
        return runAction(action, errorResult, method, true, true);
    }

    private <R> R runAction( ... ) {
        synchronized (mStartLock) {
            ...
            return mServiceConnection.runAction(action, errorResult, method,
                    reconnect, onlyEstablishedConnection);
        }
    }

    private abstract class Connection implements ServiceConnection {
        public <R> R runAction( ... ) {
            synchronized (mStartLock) {
                try {
                    ...
                    return action.run(mService);
                } ...
            }
        }
    }
}
The Action itself first looks up in the mUtterances Map whether a local audio resource has been set for the target text:
- If so, it calls the TTS Engine's playAudio() to play it directly
- Otherwise, it calls speak(), the text-to-audio interface
// TextToSpeech.java
public class TextToSpeech {
    ...
    public int speak(final CharSequence text, ... ) {
        return runAction((ITextToSpeechService service) -> {
            Uri utteranceUri = mUtterances.get(text);
            if (utteranceUri != null) {
                return service.playAudio(getCallerIdentity(), utteranceUri, queueMode,
                        getParams(params), utteranceId);
            } else {
                return service.speak(getCallerIdentity(), text, queueMode,
                        getParams(params), utteranceId);
            }
        }, ERROR, "speak");
    }
    ...
}
What follows is the TextToSpeechService implementation.
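The lookup-then-dispatch behavior of speak() described in this section can be reproduced without Android. The sketch below is illustrative only: FakeEngine stands in for the AIDL ITextToSpeechService, SpeakDispatcher for the relevant part of TextToSpeech, and both names, along with the simplified signatures, are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the engine-side interface.
interface FakeEngine {
    int playAudio(String uri, String utteranceId);
    int speak(String text, String utteranceId);
}

// Hypothetical model of the text -> audio resource lookup done before dispatch.
class SpeakDispatcher {
    private final Map<String, String> utterances = new HashMap<>();

    // Registers a pre-recorded audio resource for a fixed text.
    void addSpeech(String text, String uri) {
        utterances.put(text, uri);
    }

    int speak(FakeEngine engine, String text, String utteranceId) {
        String uri = utterances.get(text);
        if (uri != null) {
            // A recording exists: play it and skip synthesis.
            return engine.playAudio(uri, utteranceId);
        }
        // No recording: ask the engine to synthesize the text.
        return engine.speak(text, utteranceId);
    }
}
```

Registering a text with addSpeech() and then calling speak() with the same text reaches playAudio(); any other text reaches the engine's speak().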
4 TextToSpeechService implementation
On receiving a request, TextToSpeechService sends a SpeechItem wrapping the speak or playAudio request to its internal SynthHandler. SynthHandler is bound to a HandlerThread named "SynthThread" that is started when TextToSpeechService is initialized.
- A speak request is wrapped for the Handler as a SynthesisSpeechItem
- A playAudio request is wrapped as an AudioSpeechItem
// TextToSpeechService.java
public abstract class TextToSpeechService extends Service {
    private final ITextToSpeechService.Stub mBinder =
            new ITextToSpeechService.Stub() {
        @Override
        public int speak(
                IBinder caller, CharSequence text, int queueMode,
                Bundle params, String utteranceId) {
            SpeechItem item = new SynthesisSpeechItem(
                    caller, Binder.getCallingUid(), Binder.getCallingPid(),
                    params, utteranceId, text);
            return mSynthHandler.enqueueSpeechItem(queueMode, item);
        }

        @Override
        public int playAudio( ... ) {
            SpeechItem item = new AudioSpeechItem( ... );
            ...
        }
        ...
    };
    ...
}
After SynthHandler receives the SpeechItem, it uses the queueMode value to decide whether to stop() existing playback first or simply continue queueing. To play, it wraps the play operation in a Message and sends it to the Handler.
// TextToSpeechService.java
private class SynthHandler extends Handler {
    ...
    public int enqueueSpeechItem(int queueMode, final SpeechItem speechItem) {
        UtteranceProgressDispatcher utterenceProgress = null;
        if (speechItem instanceof UtteranceProgressDispatcher) {
            utterenceProgress = (UtteranceProgressDispatcher) speechItem;
        }

        if (!speechItem.isValid()) {
            if (utterenceProgress != null) {
                utterenceProgress.dispatchOnError(TextToSpeech.ERROR_INVALID_REQUEST);
            }
            return TextToSpeech.ERROR;
        }

        if (queueMode == TextToSpeech.QUEUE_FLUSH) {
            stopForApp(speechItem.getCallerIdentity());
        } else if (queueMode == TextToSpeech.QUEUE_DESTROY) {
            stopAll();
        }

        Runnable runnable = new Runnable() {
            @Override
            public void run() {
                if (setCurrentSpeechItem(speechItem)) {
                    speechItem.play();
                    removeCurrentSpeechItem();
                } else {
                    speechItem.stop();
                }
            }
        };
        Message msg = Message.obtain(this, runnable);
        msg.obj = speechItem.getCallerIdentity();

        if (sendMessage(msg)) {
            return TextToSpeech.SUCCESS;
        } else {
            if (utterenceProgress != null) {
                utterenceProgress.dispatchOnError(TextToSpeech.ERROR_SERVICE);
            }
            return TextToSpeech.ERROR;
        }
    }
    ...
}
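The effect of queueMode on pending requests can be shown with a small queue model. This is a sketch under assumptions: SpeechQueue and Item are hypothetical names, only pending items are modeled (the real stopForApp() also stops an item from the same caller that is currently playing), and the two constants mirror the public values of TextToSpeech.QUEUE_FLUSH and TextToSpeech.QUEUE_ADD.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of per-caller flushing in a shared speech queue.
class SpeechQueue {
    static final int QUEUE_FLUSH = 0; // same value as TextToSpeech.QUEUE_FLUSH
    static final int QUEUE_ADD = 1;   // same value as TextToSpeech.QUEUE_ADD

    static final class Item {
        final String caller;
        final String text;

        Item(String caller, String text) {
            this.caller = caller;
            this.text = text;
        }
    }

    private final Deque<Item> pending = new ArrayDeque<>();

    void enqueue(int queueMode, Item item) {
        if (queueMode == QUEUE_FLUSH) {
            // Drop only this caller's pending items; other callers keep theirs.
            pending.removeIf(p -> p.caller.equals(item.caller));
        }
        pending.addLast(item);
    }

    // Returns the next item to play, or null when the queue is empty.
    Item next() {
        return pending.pollFirst();
    }
}
```

If appA queues "a1", appB queues "b1", and appA then sends "a2" with QUEUE_FLUSH, the queue plays "b1" followed by "a2": the flush is scoped to the caller that asked for it.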
play() delegates to playImpl(). For SynthesisSpeechItem, it takes the SynthesisRequest instance created at construction and a SynthesisCallback instance (here the implementation is PlaybackSynthesisCallback) and passes both to onSynthesizeText() for further processing; the former carries the request and the latter returns the results.
// TextToSpeechService.java
private abstract class SpeechItem {
    ...
    public void play() {
        synchronized (this) {
            if (mStarted) {
                throw new IllegalStateException("play() called twice");
            }
            mStarted = true;
        }
        playImpl();
    }
}

class SynthesisSpeechItem extends UtteranceSpeechItemWithParams {
    public SynthesisSpeechItem( ... String utteranceId, CharSequence text) {
        mSynthesisRequest = new SynthesisRequest(mText, mParams);
        ...
    }
    ...
    @Override
    protected void playImpl() {
        AbstractSynthesisCallback synthesisCallback;
        mEventLogger.onRequestProcessingStart();
        synchronized (this) {
            ...
            mSynthesisCallback = createSynthesisCallback();
            synthesisCallback = mSynthesisCallback;
        }

        TextToSpeechService.this.onSynthesizeText(mSynthesisRequest, synthesisCallback);
        if (synthesisCallback.hasStarted() && !synthesisCallback.hasFinished()) {
            synthesisCallback.done();
        }
    }
    ...
}
onSynthesizeText() is an abstract method. The Engine must override it to synthesize text into audio data; it is the core of the TTS feature.
- The Engine extracts the target text, parameters, and other information from SynthesisRequest and handles them accordingly, then reports data and timing back through the SynthesisCallback methods:
- Before synthesizing data, it calls start() to tell the system the sample rate of the generated audio, the PCM bit depth, the channel count, and so on. The PlaybackSynthesisCallback implementation creates a SynthesisPlaybackQueueItem for playback and hands it to AudioPlaybackHandler to queue and schedule
- Next, it passes the synthesized data back as byte[] through audioAvailable(). The QueueItem created in start() is fetched and the audio data is put into it so playback can begin
- Finally, it calls done() to signal that synthesis is complete
// PlaybackSynthesisCallback.java
class PlaybackSynthesisCallback extends AbstractSynthesisCallback {
    ...
    @Override
    public int start(int sampleRateInHz, int audioFormat, int channelCount) {
        mDispatcher.dispatchOnBeginSynthesis(sampleRateInHz, audioFormat, channelCount);
        int channelConfig = BlockingAudioTrack.getChannelConfig(channelCount);

        synchronized (mStateLock) {
            ...
            SynthesisPlaybackQueueItem item = new SynthesisPlaybackQueueItem(
                    mAudioParams, sampleRateInHz, audioFormat, channelCount,
                    mDispatcher, mCallerIdentity, mLogger);
            mAudioTrackHandler.enqueue(item);
            mItem = item;
        }
        return TextToSpeech.SUCCESS;
    }

    @Override
    public int audioAvailable(byte[] buffer, int offset, int length) {
        SynthesisPlaybackQueueItem item = null;
        synchronized (mStateLock) {
            ...
            item = mItem;
        }

        final byte[] bufferCopy = new byte[length];
        System.arraycopy(buffer, offset, bufferCopy, 0, length);
        mDispatcher.dispatchOnAudioAvailable(bufferCopy);

        try {
            item.put(bufferCopy);
        } ...
        return TextToSpeech.SUCCESS;
    }

    @Override
    public int done() {
        int statusCode = 0;
        SynthesisPlaybackQueueItem item = null;
        synchronized (mStateLock) {
            ...
            mDone = true;
            if (mItem == null) {
                if (mStatusCode == TextToSpeech.SUCCESS) {
                    mDispatcher.dispatchOnSuccess();
                } else {
                    mDispatcher.dispatchOnError(mStatusCode);
                }
                return TextToSpeech.ERROR;
            }
            item = mItem;
            statusCode = mStatusCode;
        }

        if (statusCode == TextToSpeech.SUCCESS) {
            item.done();
        } else {
            item.stop(statusCode);
        }
        return TextToSpeech.SUCCESS;
    }
    ...
}
The QueueItem stores and consumes audio data as shown below. In short, put() signals the Lock Condition that take() is waiting on so that take() resumes, and the data is finally written to AudioTrack for playback.
// SynthesisPlaybackQueueItem.java
final class SynthesisPlaybackQueueItem ... {
    void put(byte[] buffer) throws InterruptedException {
        try {
            mListLock.lock();
            long unconsumedAudioMs = 0;
            ...
            mDataBufferList.add(new ListEntry(buffer));
            mUnconsumedBytes += buffer.length;
            mReadReady.signal();
        } finally {
            mListLock.unlock();
        }
    }

    private byte[] take() throws InterruptedException {
        try {
            mListLock.lock();
            while (mDataBufferList.size() == 0 && !mStopped && !mDone) {
                mReadReady.await();
            }
            ...
            ListEntry entry = mDataBufferList.poll();
            mUnconsumedBytes -= entry.mBytes.length;
            mNotFull.signal();
            return entry.mBytes;
        } finally {
            mListLock.unlock();
        }
    }

    public void run() {
        ...
        final UtteranceProgressDispatcher dispatcher = getDispatcher();
        dispatcher.dispatchOnStart();

        if (!mAudioTrack.init()) {
            dispatcher.dispatchOnError(TextToSpeech.ERROR_OUTPUT);
            return;
        }

        try {
            byte[] buffer = null;
            while ((buffer = take()) != null) {
                mAudioTrack.write(buffer);
            }
        } ...
        mAudioTrack.waitAndRelease();
        dispatchEndStatus();
    }

    void done() {
        try {
            mListLock.lock();
            mDone = true;
            mReadReady.signal();
            mNotFull.signal();
        } finally {
            mListLock.unlock();
        }
    }
}
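The put()/take()/done() handshake above is a standard producer/consumer pattern built on a lock and a condition. The following standalone sketch reproduces just that pattern. AudioBufferQueue is a hypothetical name, back-pressure (the mNotFull condition) and the stopped state are left out, and awaitUninterruptibly() is used in place of await() purely to keep the example free of checked exceptions.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of the audio buffer hand-off.
class AudioBufferQueue {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition readReady = lock.newCondition();
    private final Queue<byte[]> buffers = new ArrayDeque<>();
    private boolean done;

    // Producer side: called for every synthesized chunk.
    void put(byte[] buffer) {
        lock.lock();
        try {
            buffers.add(buffer);
            readReady.signal(); // wake a consumer blocked in take()
        } finally {
            lock.unlock();
        }
    }

    // Producer side: no more chunks will arrive.
    void done() {
        lock.lock();
        try {
            done = true;
            readReady.signal();
        } finally {
            lock.unlock();
        }
    }

    // Consumer side: blocks until data arrives; returns null once the
    // queue has been drained and done() was called.
    byte[] take() {
        lock.lock();
        try {
            while (buffers.isEmpty() && !done) {
                readReady.awaitUninterruptibly();
            }
            return buffers.poll();
        } finally {
            lock.unlock();
        }
    }
}
```

A playback thread would loop on take() and write each chunk to the audio output until it receives null, which is the same shape as the run() loop shown earlier.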
While notifying the QueueItem, PlaybackSynthesisCallback also sends the data and results to the requesting App through the UtteranceProgressDispatcher interface.
// TextToSpeechService.java
interface UtteranceProgressDispatcher {
    void dispatchOnStop();
    void dispatchOnSuccess();
    void dispatchOnStart();
    void dispatchOnError(int errorCode);
    void dispatchOnBeginSynthesis(int sampleRateInHz, int audioFormat, int channelCount);
    void dispatchOnAudioAvailable(byte[] audio);
    public void dispatchOnRangeStart(int start, int end, int frame);
}
In fact, this interface is implemented by the UtteranceSpeechItem instance with which TextToSpeechService handles the speak request. It sends callbacks to the App that requested TTS through CallbackMap, which caches each ITextToSpeechCallback interface instance. (These callbacks are the Binder interfaces that TextToSpeech passed in through ITextToSpeechService during initialization and that were cached at that point.)
private abstract class UtteranceSpeechItem extends SpeechItem
        implements UtteranceProgressDispatcher {
    ...
    @Override
    public void dispatchOnStart() {
        final String utteranceId = getUtteranceId();
        if (utteranceId != null) {
            mCallbacks.dispatchOnStart(getCallerIdentity(), utteranceId);
        }
    }

    @Override
    public void dispatchOnAudioAvailable(byte[] audio) {
        final String utteranceId = getUtteranceId();
        if (utteranceId != null) {
            mCallbacks.dispatchOnAudioAvailable(getCallerIdentity(), utteranceId, audio);
        }
    }

    @Override
    public void dispatchOnSuccess() {
        final String utteranceId = getUtteranceId();
        if (utteranceId != null) {
            mCallbacks.dispatchOnSuccess(getCallerIdentity(), utteranceId);
        }
    }

    @Override
    public void dispatchOnStop() { ... }

    @Override
    public void dispatchOnError(int errorCode) { ... }

    @Override
    public void dispatchOnBeginSynthesis(int sampleRateInHz, int audioFormat, int channelCount) { ... }

    @Override
    public void dispatchOnRangeStart(int start, int end, int frame) { ... }
}

private class CallbackMap extends RemoteCallbackList<ITextToSpeechCallback> {
    ...
    public void dispatchOnStart(Object callerIdentity, String utteranceId) {
        ITextToSpeechCallback cb = getCallbackFor(callerIdentity);
        if (cb == null) return;
        try {
            cb.onStart(utteranceId);
        } ...
    }

    public void dispatchOnAudioAvailable(Object callerIdentity, String utteranceId, byte[] buffer) {
        ITextToSpeechCallback cb = getCallbackFor(callerIdentity);
        if (cb == null) return;
        try {
            cb.onAudioAvailable(utteranceId, buffer);
        } ...
    }

    public void dispatchOnSuccess(Object callerIdentity, String utteranceId) {
        ITextToSpeechCallback cb = getCallbackFor(callerIdentity);
        if (cb == null) return;
        try {
            cb.onSuccess(utteranceId);
        } ...
    }
    ...
}
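The per-caller routing that CallbackMap performs can be sketched with an ordinary map. This is an illustration only: CallbackRegistry and ProgressCallback are hypothetical names, a ConcurrentHashMap stands in for RemoteCallbackList, and Binder death handling and RemoteException are not modeled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ITextToSpeechCallback, reduced to one method.
interface ProgressCallback {
    void onStart(String utteranceId);
}

// Hypothetical model of routing progress events to the caller that asked.
class CallbackRegistry {
    private final Map<Object, ProgressCallback> byCaller = new ConcurrentHashMap<>();

    // Registers the callback for a caller; passing null removes it.
    void setCallback(Object caller, ProgressCallback cb) {
        if (cb == null) {
            byCaller.remove(caller);
        } else {
            byCaller.put(caller, cb);
        }
    }

    void dispatchOnStart(Object caller, String utteranceId) {
        ProgressCallback cb = byCaller.get(caller);
        if (cb == null) {
            return; // the caller never registered or has already gone away
        }
        cb.onStart(utteranceId);
    }
}
```

Keying the map by caller identity is what keeps one App's progress events from reaching another App that shares the same Engine.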
Calls on ITextToSpeechCallback are relayed through TextToSpeech to the requesting App's callback, which then carries out the follow-up actions mentioned in section 2.
// TextToSpeech.java
public class TextToSpeech {
    ...
    private abstract class Connection implements ServiceConnection {
        ...
        private final ITextToSpeechCallback.Stub mCallback =
                new ITextToSpeechCallback.Stub() {
                    @Override
                    public void onStart(String utteranceId) {
                        UtteranceProgressListener listener = mUtteranceProgressListener;
                        if (listener != null) {
                            listener.onStart(utteranceId);
                        }
                    }
                    ...
                };
    }
}

// TTSTest.kt
class TTSTest(context: Context) {
    init {
        tts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
            override fun onStart(utteranceId: String?) { ... }
            override fun onDone(utteranceId: String?) { ... }
            override fun onStop(utteranceId: String?, interrupted: Boolean) { ... }
            override fun onError(utteranceId: String?) { ... }
        })
    }
    ...
}
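5 Notes on usage and implementation
A few suggestions for the TTS requester:
- Remember to request audio focus of the appropriate type before speaking
- When the requesting App's Activity or Service is destroyed, for example in onDestroy(), call TextToSpeech's shutdown() to release the connection and resources
- addSpeech() can map a fixed text to an audio resource (for example, the few welcome phrases commonly played after a voice wake-up). Later requests for that text play the audio directly, skipping text-to-speech synthesis and improving efficiency
A few suggestions for the TTS Engine provider:
- The Engine implementation must cooperate properly with the SynthesisCallback. Note that done() may only be called after the callback's start() has run and before it has finished. Otherwise TTS reports one of these two errors:
  - Duplicate call to done()
  - done() was called before start() call
- The core job of a TTS Engine is to synthesize text into speech audio data. Once it has the data, the Engine can of course play it directly and not even send the audio back. It is better, however, to send the audio data back and let the system's AudioTrack play it: first, playback is then handled uniformly by the system; second, the requesting App can also obtain the audio data to cache and analyze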
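The start()/done() ordering rule from the first Engine suggestion above can be captured in a tiny guard. DoneGuard and its methods are hypothetical names for illustration; the same idea appears in playImpl() in section 4, which only calls done() when hasStarted() is true and hasFinished() is false.

```java
// Hypothetical guard an engine could consult before calling done().
class DoneGuard {
    private boolean started;
    private boolean finished;

    // Call right after SynthesisCallback.start() has succeeded.
    void markStarted() {
        started = true;
    }

    // Returns true only for the first call made after markStarted();
    // the engine should call done() only when this returns true.
    boolean tryFinish() {
        if (!started || finished) {
            return false;
        }
        finished = true;
        return true;
    }
}
```

With such a guard, a call before start() and a second call after completion both return false, which avoids the two error messages listed above.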
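6 Conclusion
As shown, the requesting App does not care about the implementation; a few TextToSpeech API calls are enough to speak text. A TTS implementation likewise only needs to follow the framework and callbacks defined by TextToSpeechService, and the system takes care of connecting it to the App.
Let's take a moment to go through the whole process again:
Flow:
- The requesting App calls the TextToSpeech constructor, and the system prepares for playback, for example by binding to and initializing the target TTS Engine through a Connection
- The requesting App supplies the target text and calls speak()
- TextToSpeech checks whether a local audio resource has been set for the target text; if not, it continues by calling speak() on the ITextToSpeechService AIDL interface through the Connection
- On receiving the call, TextToSpeechService wraps the request in a SynthesisRequest and creates a SynthesisCallback instance for returning results
- It then calls the core implementation onSynthesizeText() with both as arguments; that method parses the request and synthesizes the speech audio data
- After that, SynthesisCallback reports the key events before and after synthesis back to the system, most importantly to drive AudioTrack playback
- Meanwhile, the result of the speak request must be reported to the requesting App. This is relayed through UtteranceProgressDispatcher, which actually calls the ITextToSpeechCallback AIDL interface
- Finally, the callbacks set on TextToSpeech are notified through UtteranceProgressListener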
7 Recommended voice articles
- How to build in-car voice interaction: Google Voice Interaction gives you the answer
8 References / source code
- OnUtteranceCompletedListener deprecation
- TextToSpeech.java
- TextToSpeechManagerService.java
- TextToSpeechService.java