HarmonyOS AI Capabilities: Speech Recognition

This article aims to help you avoid some common pitfalls when implementing audio recording and speech recognition.
Result

[Screenshot: on the left, the simple UI layout with the recognized text; on the right, the test audio playing in NetEase Cloud Music]
Development Steps
IDE installation, project creation, and similar setup are skipped here. The app targets SDK API 6 and uses the JS UI framework.
1. Requesting Permissions
The AI speech recognition service itself requires no permission, but since we record audio with the microphone, the microphone permission must be requested.
Add the permission to the config.json configuration file:
"reqPermissions": ["name": "ohos.permission.MICROPHONE"]

Then explicitly request the microphone permission in MainAbility:
```java
@Override
public void onStart(Intent intent) {
    super.onStart(intent);
    requestPermission(); // request runtime permissions
}

private void requestPermission() {
    String[] permissions = {"ohos.permission.MICROPHONE"};
    List<String> applyPermissions = new ArrayList<>();
    for (String element : permissions) {
        if (verifySelfPermission(element) != 0) {
            if (canRequestPermission(element)) {
                applyPermissions.add(element);
            }
        }
    }
    requestPermissionsFromUser(applyPermissions.toArray(new String[0]), 0);
}
```

2. Creating the Audio Recording Utility Class
First, create an audio recording utility class, AudioCaptureUtils.
Audio recording uses the AudioCapturer class, and constructing an AudioCapturer in turn requires an AudioStreamInfo and an AudioCapturerInfo, so we declare fields for all three:
```java
private AudioStreamInfo audioStreamInfo;
private AudioCapturer audioCapturer;
private AudioCapturerInfo audioCapturerInfo;
```

Speech recognition places constraints on the recorded audio:
[Image: audio constraints for speech recognition]

So when recording audio, we must ensure:
1. A sampling rate of 16000 Hz
2. A single (mono) channel
3. Mandarin speech only (the only supported language)
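A quick sanity check on what these numbers mean for buffer sizes. A plain-Java sketch with no HarmonyOS APIs (the helper below is my own, for illustration only): at 16-bit mono 16000 Hz, the stream produces 32000 bytes per second, so the 1280-byte frames used later hold 40 ms of audio each.

```java
public class PcmMath {
    // Duration in milliseconds of a PCM buffer, given the stream parameters.
    static int frameDurationMs(int bufferBytes, int sampleRate, int channels, int bytesPerSample) {
        int bytesPerSecond = sampleRate * channels * bytesPerSample;
        return bufferBytes * 1000 / bytesPerSecond;
    }

    public static void main(String[] args) {
        // 16000 Hz, mono, 16-bit (2 bytes per sample) => 32000 bytes per second.
        System.out.println(frameDurationMs(1280, 16000, 1, 2)); // 40 (ms)
        System.out.println(frameDurationMs(640, 16000, 1, 2));  // 20 (ms)
    }
}
```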
To make AudioCaptureUtils reusable, the constructor takes the channel mask and sampling rate as parameters, and initializes the AudioStreamInfo and AudioCapturerInfo:
```java
// channelMask: channel configuration
// sampleRate: sampling rate
public AudioCaptureUtils(AudioStreamInfo.ChannelMask channelMask, int sampleRate) {
    this.audioStreamInfo = new AudioStreamInfo.Builder()
            .encodingFormat(AudioStreamInfo.EncodingFormat.ENCODING_PCM_16BIT)
            .channelMask(channelMask)
            .sampleRate(sampleRate)
            .build();
    this.audioCapturerInfo = new AudioCapturerInfo.Builder()
            .audioStreamInfo(audioStreamInfo)
            .build();
}
```

Initialize audioCapturer in an init function, configuring the sound effect at the same time; noise suppression is the default:
```java
// packageName: the app's package name
public void init(String packageName) {
    this.init(SoundEffect.SOUND_EFFECT_TYPE_NS, packageName);
}

// soundEffect: sound effect UUID
// packageName: the app's package name
public void init(UUID soundEffect, String packageName) {
    if (audioCapturer == null || audioCapturer.getState() == AudioCapturer.State.STATE_UNINITIALIZED) {
        audioCapturer = new AudioCapturer(this.audioCapturerInfo);
    }
    audioCapturer.addSoundEffect(soundEffect, packageName);
}
```

After initialization, we expose start, stop, and destroy methods to begin recording, stop recording, and release resources. Each simply delegates to the corresponding AudioCapturer method.
```java
public void stop() {
    this.audioCapturer.stop();
}

public void destroy() {
    this.audioCapturer.stop();
    this.audioCapturer.release();
}

public Boolean start() {
    if (audioCapturer == null) {
        return false;
    }
    return audioCapturer.start();
}
```

We also provide a method to read from the audio stream and a method to get the AudioCapturer instance:
```java
// buffers: the buffer the captured data is read into
// offset: offset within the buffer
// bytesLength: number of bytes to read
public int read(byte[] buffers, int offset, int bytesLength) {
    return audioCapturer.read(buffers, offset, bytesLength);
}

// Returns the underlying AudioCapturer instance
public AudioCapturer get() {
    return this.audioCapturer;
}
```

3. Creating the Speech Recognition Utility Class
With the audio recording utility in place, we next create a speech recognition utility class, AsrUtils.
Let's review the constraints and limits of speech recognition again:
[Image: speech recognition constraints and limits]

One hidden limitation is worth adding here: PCM frames may only be 640 or 1280 bytes long, so when reading the audio stream we can only use buffers of exactly 640 or 1280 bytes.
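Because of this frame-size limit, audio from other sources (for example, a file) may need to be re-chunked before being fed to the recognizer. A hedged sketch of one way to do this in plain Java (the zero-padding of the last frame is my own choice, not something the SDK mandates):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PcmChunker {
    // Splits an arbitrary PCM byte array into fixed-size frames,
    // zero-padding the final frame so every chunk is exactly frameSize bytes.
    static List<byte[]> toFrames(byte[] pcm, int frameSize) {
        List<byte[]> frames = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += frameSize) {
            // copyOfRange pads with zeros when the range runs past the array end
            frames.add(Arrays.copyOfRange(pcm, offset, offset + frameSize));
        }
        return frames;
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[3000]; // e.g. data read from a PCM file
        List<byte[]> frames = toFrames(pcm, 1280);
        System.out.println(frames.size());        // 3 (1280 + 1280 + 440 bytes of data)
        System.out.println(frames.get(2).length); // 1280 (last frame zero-padded)
    }
}
```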
Next, define some basic constants:
```java
// Sampling rate, fixed at 16000 Hz
private static final int VIDEO_SAMPLE_RATE = 16000;
// VAD end-of-speech wait time, default 2000 ms
private static final int VAD_END_WAIT_MS = 2000;
// VAD start-of-speech wait time, default 4800 ms
// Both VAD parameters affect recognition accuracy; the system defaults are used here
private static final int VAD_FRONT_WAIT_MS = 4800;
// Input timeout, 20000 ms
private static final int TIMEOUT_DURATION = 20000;
// PCM frame length, limited to 640 or 1280 bytes
private static final int BYTES_LENGTH = 1280;
// Thread pool parameters
private static final int CAPACITY = 6;
private static final int ALIVE_TIME = 3;
private static final int POOL_SIZE = 3;
```

Because audio must be recorded continuously in the background, we need a separate thread; we use Java's ThreadPoolExecutor class for this.
Define a thread pool instance and the other related fields:
```java
// Recording thread pool
private ThreadPoolExecutor poolExecutor;
/*
 * Custom state codes:
 *  -1: error
 *   0: initial
 *   1: initialized
 *   2: input started
 *   3: input ended
 *   5: recognition ended
 *   9: intermediate result available
 *  10: final result available
 */
public int state = 0;
// Recognition result
public String result;
// Whether recognition is running; PCM frames are written only while true
boolean isStarted = false;
// ASR client
private AsrClient asrClient;
// ASR listener
private AsrListener listener;
AsrIntent asrIntent;
// Audio recording utility
private AudioCaptureUtils audioCaptureUtils;
```
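The pool configuration used below (3 core threads, a bounded queue of 6, DiscardOldestPolicy) can be exercised in isolation. A minimal plain-Java sketch of the same setup, where a trivial counter task stands in for the recording runnable:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    // Submits taskCount trivial tasks to a pool configured like the one in AsrUtils
    // and returns how many completed before shutdown.
    static int runTasks(int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                3, 3,                    // POOL_SIZE core and max threads
                3, TimeUnit.SECONDS,     // ALIVE_TIME for idle threads
                new LinkedBlockingQueue<>(6),                  // CAPACITY
                new ThreadPoolExecutor.DiscardOldestPolicy()); // drop oldest queued task when full
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> done.incrementAndGet()); // stand-in for AudioCaptureRunnable
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        // 5 tasks fit within 3 threads + a queue of 6, so none are discarded.
        System.out.println(runTasks(5)); // 5
    }
}
```

DiscardOldestPolicy only matters once more than 9 tasks (3 running + 6 queued) are outstanding; in the article's app only one long-lived recording task is ever submitted at a time.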

Initialize these fields in the constructor:
```java
public AsrUtils(Context context) {
    // Create a mono, 16000 Hz audio recording utility
    this.audioCaptureUtils = new AudioCaptureUtils(AudioStreamInfo.ChannelMask.CHANNEL_IN_MONO, VIDEO_SAMPLE_RATE);
    // Initialize with the noise-suppression sound effect
    this.audioCaptureUtils.init("com.panda_coder.liedetector");
    // Clear the result
    this.result = "";
    // Create a new thread pool for the recording task
    poolExecutor = new ThreadPoolExecutor(POOL_SIZE, POOL_SIZE, ALIVE_TIME, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(CAPACITY), new ThreadPoolExecutor.DiscardOldestPolicy());
    if (asrIntent == null) {
        asrIntent = new AsrIntent();
        // Use a PCM stream as the audio source (a file could be used instead)
        asrIntent.setAudioSourceType(AsrIntent.AsrAudioSrcType.ASR_SRC_TYPE_PCM);
        asrIntent.setVadEndWaitMs(VAD_END_WAIT_MS);
        asrIntent.setVadFrontWaitMs(VAD_FRONT_WAIT_MS);
        asrIntent.setTimeoutThresholdMs(TIMEOUT_DURATION);
    }
    if (asrClient == null) {
        // Create the AsrClient
        asrClient = AsrClient.createAsrClient(context).orElse(null);
    }
    if (listener == null) {
        // Create the listener
        listener = new MyAsrListener();
    }
    // Initialize the AsrClient
    this.asrClient.init(asrIntent, listener);
}

// MyAsrListener implements the AsrListener interface
class MyAsrListener implements AsrListener {
    @Override
    public void onInit(PacMap pacMap) {
        HiLog.info(TAG, "====== init");
        state = 1;
    }

    @Override
    public void onBeginningOfSpeech() {
        state = 2;
    }

    @Override
    public void onRmsChanged(float v) {
    }

    @Override
    public void onBufferReceived(byte[] bytes) {
    }

    @Override
    public void onEndOfSpeech() {
        state = 3;
    }

    @Override
    public void onError(int i) {
        state = -1;
        if (i == AsrError.ERROR_SPEECH_TIMEOUT) {
            // Restart listening after a timeout
            asrClient.startListening(asrIntent);
        } else {
            HiLog.info(TAG, "======error code:" + i);
            asrClient.stopListening();
        }
    }

    // Note how the result is fetched here, as opposed to onIntermediateResults:
    // pacMap.getString(AsrResultKey.RESULTS_RECOGNITION)
    @Override
    public void onResults(PacMap pacMap) {
        state = 10;
        // Final result, e.g.
        // {"result":[{"confidence":0,"ori_word":"你 好 ","pinyin":"NI3 HAO3 ","word":"你好。"}]}
        String results = pacMap.getString(AsrResultKey.RESULTS_RECOGNITION);
        ZSONObject zsonObject = ZSONObject.stringToZSON(results);
        ZSONObject infoObject;
        if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
            infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
            String resultWord = infoObject.getString("ori_word").replace(" ", "");
            result += resultWord;
        }
    }

    // Intermediate results arrive via pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE)
    // and could be parsed the same way as in onResults; they are unused here.
    @Override
    public void onIntermediateResults(PacMap pacMap) {
        state = 9;
    }

    @Override
    public void onEnd() {
        state = 5;
        // Restart listening while recording is still in progress
        if (isStarted) {
            asrClient.startListening(asrIntent);
        }
    }

    @Override
    public void onEvent(int i, PacMap pacMap) {
    }

    @Override
    public void onAudioStart() {
        state = 2;
    }

    @Override
    public void onAudioEnd() {
        state = 3;
    }
}
```
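The ori_word extraction in onResults can be illustrated without the HarmonyOS ZSON classes. A hedged sketch using plain string handling (a real app should use a proper JSON parser; the helper below is my own, only to show the space-stripping step on the engine's payload format):

```java
public class AsrResultParser {
    // Pulls the value of "ori_word" out of the recognizer's JSON payload
    // and removes the spaces the engine inserts between characters.
    static String extractOriWord(String json) {
        String key = "\"ori_word\":\"";
        int start = json.indexOf(key);
        if (start < 0) {
            return "";
        }
        start += key.length();
        int end = json.indexOf('"', start);
        return json.substring(start, end).replace(" ", "");
    }

    public static void main(String[] args) {
        String payload = "{\"result\":[{\"confidence\":0,\"ori_word\":\"你 好 \","
                + "\"pinyin\":\"NI3 HAO3 \",\"word\":\"你好。\"}]}";
        System.out.println(extractOriWord(payload)); // 你好
    }
}
```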

The functions to start and stop recognition:
```java
public void start() {
    if (!this.isStarted) {
        this.isStarted = true;
        asrClient.startListening(asrIntent);
        poolExecutor.submit(new AudioCaptureRunnable());
    }
}

public void stop() {
    this.isStarted = false;
    asrClient.stopListening();
    audioCaptureUtils.stop();
}

// The recording task
private class AudioCaptureRunnable implements Runnable {
    @Override
    public void run() {
        byte[] buffers = new byte[BYTES_LENGTH];
        // Start recording
        audioCaptureUtils.start();
        while (isStarted) {
            // Read one frame of recorded PCM data
            int ret = audioCaptureUtils.read(buffers, 0, BYTES_LENGTH);
            if (ret <= 0) {
                HiLog.error(TAG, "======Error read data");
            } else {
                // Write the PCM frame to the recognition service.
                // If the buffer length is not 640 or 1280 bytes, it must be
                // re-chunked to one of those sizes first.
                asrClient.writePcm(buffers, BYTES_LENGTH);
            }
        }
    }
}
```

The recognition results are delivered through the listener callbacks, which accumulate them into result; callers retrieve them with getResult or getResultAndClear.
```java
public String getResult() {
    return result;
}

public String getResultAndClear() {
    // Use isEmpty(), not ==, to compare string contents
    if (this.result.isEmpty()) {
        return "";
    }
    String results = getResult();
    this.result = "";
    return results;
}
```

4. Creating a Simple JS UI That Calls Java Through a ServiceAbility
The hml markup:
```html
<div class="container">
    <div>
        <button class="btn" @touchend="start">开启</button>
        <button class="btn" @touchend="sub">订阅结果</button>
        <button class="btn" @touchend="stop">关闭</button>
    </div>
    <text class="title">语音识别内容:{{ text }}</text>
</div>
```

The styles:
```css
.container {
    flex-direction: column;
    justify-content: flex-start;
    align-items: center;
    width: 100%;
    height: 100%;
    padding: 10%;
}
.title {
    font-size: 20px;
    color: #000000;
    opacity: 0.9;
    text-align: left;
    width: 100%;
    margin: 3% 0;
}
.btn {
    padding: 10px 20px;
    margin: 3px;
    border-radius: 6px;
}
```

The JS logic:
```javascript
// Utility class for calling the Java ServiceAbility from JS
import jsCallJavaAbility from '../../common/JsCallJavaAbilityUtils.js';

export default {
    data: {
        text: ""
    },
    // Start recognition
    start() {
        jsCallJavaAbility.callAbility("ControllerAbility", 100, {}).then(result => {
            console.log(result);
        });
    },
    // Stop recognition
    stop() {
        jsCallJavaAbility.callAbility("ControllerAbility", 101, {}).then(result => {
            console.log(result);
        });
        jsCallJavaAbility.unSubAbility("ControllerAbility", 201).then(result => {
            if (result.code == 200) {
                console.log("unsubscribed successfully");
            }
        });
    },
    // Subscribe to result events from the Java side
    sub() {
        jsCallJavaAbility.subAbility("ControllerAbility", 200, (data) => {
            let text = data.data.text;
            text && (this.text += text);
        }).then(result => {
            if (result.code == 200) {
                console.log("subscribed successfully");
            }
        });
    }
}
```

The ServiceAbility:
```java
public class ControllerAbility extends Ability {
    AnswerRemote remote = new AnswerRemote();
    AsrUtils asrUtils;
    // Remote objects for subscribed events
    private static HashMap<Integer, IRemoteObject> remoteObjectHandlers = new HashMap<Integer, IRemoteObject>();

    @Override
    public void onStart(Intent intent) {
        HiLog.error(LABEL_LOG, "ControllerAbility::onStart");
        super.onStart(intent);
        // Initialize the speech recognition utility
        asrUtils = new AsrUtils(this);
    }

    @Override
    public void onCommand(Intent intent, boolean restart, int startId) {
    }

    @Override
    public IRemoteObject onConnect(Intent intent) {
        super.onConnect(intent);
        return remote.asObject();
    }

    class AnswerRemote extends RemoteObject implements IRemoteBroker {
        AnswerRemote() {
            super("");
        }

        @Override
        public boolean onRemoteRequest(int code, MessageParcel data, MessageParcel reply, MessageOption option) {
            Map<String, Object> zsonResult = new HashMap<String, Object>();
            String zsonStr = data.readString();
            ZSONObject zson = ZSONObject.stringToZSON(zsonStr);
            switch (code) {
                case 100:
                    // Code 100 from JS: start speech recognition
                    asrUtils.start();
                    break;
                case 101:
                    // Code 101 from JS: stop speech recognition
                    asrUtils.stop();
                    break;
                case 200:
                    // Code 200 from JS: subscribe to recognition results
                    remoteObjectHandlers.put(200, data.readRemoteObject());
                    // Periodically fetch results and push them to the JS UI
                    getAsrText();
                    break;
                default:
                    reply.writeString("service not defined");
                    return false;
            }
            reply.writeString(ZSONObject.toZSONString(zsonResult));
            return true;
        }

        @Override
        public IRemoteObject asObject() {
            return this;
        }
    }

    public void getAsrText() {
        new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(500);
                    Map<String, Object> zsonResult = new HashMap<String, Object>();
                    zsonResult.put("text", asrUtils.getResultAndClear());
                    ReportEvent(200, zsonResult);
                } catch (RemoteException | InterruptedException e) {
                    break;
                }
            }
        }).start();
    }

    private void ReportEvent(int remoteHandler, Object backData) throws RemoteException {
        MessageParcel data = MessageParcel.obtain();
        MessageParcel reply = MessageParcel.obtain();
        MessageOption option = new MessageOption();
        data.writeString(ZSONObject.toZSONString(backData));
        IRemoteObject remoteObject = remoteObjectHandlers.get(remoteHandler);
        remoteObject.sendRequest(100, data, reply, option);
        reply.reclaim();
        data.reclaim();
    }
}
```

That completes the simple speech recognition feature.
Demo video: https://www.bilibili.com/video/BV1E44y177hv/
Full source code: https://gitee.com/panda-coder/harmonyos-apps/tree/master/AsrDemo
