First Experience with MRCP
Background
I recently took over an MRCP-based service, so this post evaluates how it is implemented.
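The Protocol Illustrated
The following content is from [1]:
The MRCP version in common industry use today is v2, that is, MRCP with SIP as its signaling protocol.
MRCP is tightly coupled with SIP and RTP. The MRCPv2 specification is RFC 6787, which is worth reading in full when you have time: https://www.rfc-editor.org/rfc/rfc6787
The figure above shows the MRCP processing flow.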
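SIP Negotiation
The MRCP client and server begin with SIP negotiation. The actual packet capture is shown in the figure below:
The SIP INVITE carries resource:speechrecog, which tells the server that the client wants ASR (speech recognition). The server's 200 response returns the MRCPv2 TCP port and the new channel information to the client.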
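To make the negotiation result concrete, here is a minimal sketch of pulling the MRCPv2 control port and channel identifier out of the SDP carried in the 200 response. The SDP text is an illustrative sample modeled on the examples in RFC 6787, not taken from the capture above, and the function name `parse_mrcp_sdp` is hypothetical.

```python
def parse_mrcp_sdp(sdp: str) -> dict:
    """Extract the MRCPv2 control port, channel id and resource type from an SDP answer."""
    info = {"port": None, "channel": None, "resource": None}
    in_mrcp_media = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt>; only the TCP/MRCPv2 section is the control channel
            # (TLS deployments use TCP/TLS/MRCPv2, which this sketch does not handle)
            parts = line[2:].split()
            in_mrcp_media = len(parts) >= 3 and parts[2] == "TCP/MRCPv2"
            if in_mrcp_media:
                info["port"] = int(parts[1])
        elif in_mrcp_media and line.startswith("a=channel:"):
            # a=channel:<id>@<resource>, e.g. ...@speechrecog for ASR
            channel = line[len("a=channel:"):]
            info["channel"] = channel
            info["resource"] = channel.rpartition("@")[2]
    return info


# Illustrative SDP answer (assumed values, in the style of RFC 6787)
SAMPLE_ANSWER = """v=0
o=- 33580 575 IN IP4 192.0.2.11
s=-
c=IN IP4 192.0.2.11
t=0 0
m=application 32416 TCP/MRCPv2 1
a=setup:passive
a=connection:new
a=channel:32AECB234338@speechrecog
a=cmid:1
m=audio 48260 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=recvonly
a=mid:1
"""
```

The returned channel value is what later appears in the Channel-Identifier header of every MRCP message on that channel.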
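MRCP for ASR and TTS
Once the SIP INVITE / 200 / ACK exchange is complete, the MRCP part (ASR and TTS) begins: after receiving the 200 and replying with an ACK, the client can send an MRCP RECOGNIZE request.
The client sends MRCP RECOGNIZE to the server on the MRCP port negotiated in the SIP 200 response.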
MRCP Channel Flow
During the MRCP exchange the client may define a custom grammar with DEFINE-GRAMMAR, or skip that step and send MRCP RECOGNIZE directly. If DEFINE-GRAMMAR is used, it must be sent before RECOGNIZE.
MRCP DEFINE-GRAMMAR
First send DEFINE-GRAMMAR to the server to initialize the ASR parameters.
After the server answers with a COMPLETE response, the client can start recognition by sending MRCP RECOGNIZE.
MRCP RECOGNIZE
The client sends the following to the server:
Sending request: MRCP/2.0 -1 RECOGNIZE 3
Channel-Identifier:923f34202d1711ee@speechrecog
Content-Type:text/uri-list
Content-Length:27
builtin:speech/transcribe
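The log above shows -1 where the message length belongs, presumably because the client logs the request before the length is filled in. On the wire, the MRCPv2 start line carries the total message length in bytes, and that length includes the start line itself. The sketch below shows one way to compute it; `build_mrcp_request` is a hypothetical helper, and the `Name: value` header spacing is an assumption.

```python
CRLF = "\r\n"


def build_mrcp_request(method, request_id, headers, body=""):
    """Serialize an MRCPv2 request whose message-length field is self-consistent."""
    body_bytes = body.encode("utf-8")
    header_lines = [f"{name}: {value}" for name, value in headers]
    if body_bytes:
        header_lines.append(f"Content-Length: {len(body_bytes)}")
    # Headers end with an empty line, then the body follows
    rest = (CRLF.join(header_lines) + CRLF + CRLF).encode("utf-8") + body_bytes

    # The length field counts its own digits, so iterate until it stops changing
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}{CRLF}".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total


# The RECOGNIZE request from the log, rebuilt with a real length
request = build_mrcp_request(
    "RECOGNIZE",
    3,
    [
        ("Channel-Identifier", "923f34202d1711ee@speechrecog"),
        ("Content-Type", "text/uri-list"),
    ],
    "builtin:speech/transcribe" + CRLF,
)
```

The body here is 27 bytes (the URI plus a trailing CRLF), which matches the Content-Length in the logged request.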
On receiving the request, the server replies:
MRCP/2.0 83 3 200 IN-PROGRESS
Channel-Identifier:923f34202d1711ee@speechrecog
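Requests, responses and events all share the MRCP/2.0 prefix but order their fields differently: a request is `method request-id`, a response is `request-id status-code request-state`, and an event is `event-name request-id request-state`. A minimal sketch of telling them apart (the function name and the returned dict layout are my own choices, not part of any library):

```python
def parse_start_line(line: str) -> dict:
    """Classify an MRCPv2 start line as request, response or event."""
    parts = line.split()
    if len(parts) < 4 or not parts[0].startswith("MRCP/"):
        raise ValueError(f"not an MRCP start line: {line!r}")
    parsed = {"version": parts[0], "length": int(parts[1])}
    if len(parts) == 4:
        # MRCP/2.0 <length> <method> <request-id>
        parsed.update(kind="request", method=parts[2], request_id=int(parts[3]))
    elif len(parts) == 5 and parts[2].isdigit():
        # MRCP/2.0 <length> <request-id> <status-code> <request-state>
        parsed.update(kind="response", request_id=int(parts[2]),
                      status_code=int(parts[3]), request_state=parts[4])
    elif len(parts) == 5:
        # MRCP/2.0 <length> <event-name> <request-id> <request-state>
        parsed.update(kind="event", event=parts[2],
                      request_id=int(parts[3]), request_state=parts[4])
    else:
        raise ValueError(f"unexpected field count: {line!r}")
    return parsed


reply = parse_start_line("MRCP/2.0 83 3 200 IN-PROGRESS")
```

For the reply above this yields a response to request 3 with status 200 in state IN-PROGRESS, meaning the server has accepted the RECOGNIZE and recognition is running.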
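Sending and Receiving RTP Media
MRCP RECOGNITION-COMPLETE
When recognition finishes, the server sends a RECOGNITION-COMPLETE event to the client. The event body carries the ASR result.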
MRCP/2.0 562 RECOGNITION-COMPLETE 2 COMPLETE
Channel-Identifier:923f34202d1711ee@speechrecog
Completion-Cause:000 success
Content-Type:application/x-nlsml
Content-Length:377
<?xml version="1.0" encoding="UTF-8" ?>
<result>
<interpretation confidence="0.97">
<instance>卓卓卓卓卓卓卓卓卓</instance>
<input mode="speech">卓卓卓卓卓卓卓卓卓</input>
<detail>
<sessionId>bj1690528986127707_7lT7O6Ogaqu6F</sessionId>
<cost>251928</cost>
</detail>
</interpretation>
</result>
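The NLSML body is plain XML, so the client can read the recognized text and confidence with a standard XML parser. A minimal sketch using Python's standard library; the function name and the output fields are hypothetical, and the `<detail>` element, which looks vendor-specific, is ignored:

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_nlsml(payload: bytes) -> list:
    """Return one dict per <interpretation> with confidence, text and a no-input flag."""
    root = ET.fromstring(payload)
    results = []
    for node in root.iter():
        if _local(node.tag) != "interpretation":
            continue
        entry = {"confidence": None, "text": None, "no_input": False}
        confidence = node.get("confidence")
        if confidence is not None:
            entry["confidence"] = float(confidence)
        for child in node:
            if _local(child.tag) == "input":
                # <input><noinput/></input> marks a no-input-timeout result
                entry["no_input"] = any(_local(g.tag) == "noinput" for g in child)
                entry["text"] = (child.text or "").strip() or None
        results.append(entry)
    return results


# Body of the RECOGNITION-COMPLETE event shown above, as UTF-8 bytes
SAMPLE_NLSML = """<?xml version="1.0" encoding="UTF-8" ?>
<result>
<interpretation confidence="0.97">
<instance>卓卓卓卓卓卓卓卓卓</instance>
<input mode="speech">卓卓卓卓卓卓卓卓卓</input>
</interpretation>
</result>""".encode("utf-8")
```

Passing bytes rather than a decoded string lets the parser honor the encoding declared in the XML prolog.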
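Hands-On
Client
I already have a working MRCP server, so I start by deploying and testing the MRCP client.
Note: the client in [2] only supports SIP over UDP, so the MRCP server must first be configured to use UDP for SIP.
After downloading it, edit MrcpJavaClient.java and change the following fields: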
- audioPathLeft
- audioPathRight
- LocalHostIp
- MrcpServerIp
- MrcpServerPort
Result:
[INFO ] 2024-04-08 15:16:04,819 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 94 START-OF-INPUT 2 IN-PROGRESS
Channel-Identifier:db1321ecf57711ee@speechrecog
[INFO ] 2024-04-08 15:16:06,133 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 380 RECOGNITION-COMPLETE 2 COMPLETE
Channel-Identifier:db1321ecf57711ee@speechrecog
Completion-Cause:000 success
Content-Type:application/x-nlsml
Content-Length:195
<?xml version="1.0"?>
<result>
<interpretation confidence="0.99">
<instance>db1321ecf57711ee</instance>
<input mode="speech">又又又又又又。</input>
</interpretation>
</result>
[INFO ] 2024-04-08 15:16:12,673 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 324 RECOGNITION-COMPLETE 2 COMPLETE
Channel-Identifier:db1333daf57711ee@speechrecog
Completion-Cause:002 no-input-timeout
Content-Length:165
<?xml version="1.0"?>
<result>
<interpretation>
<instance>db1333daf57711ee</instance>
<input>
<noinput/>
</input>
</interpretation>
</result>
[INFO ] 2024-04-08 15:16:16,234 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 324 RECOGNITION-COMPLETE 4 COMPLETE
Channel-Identifier:db1321ecf57711ee@speechrecog
Completion-Cause:002 no-input-timeout
Content-Length:165
<?xml version="1.0"?>
<result>
<interpretation>
<instance>db1321ecf57711ee</instance>
<input>
<noinput/>
</input>
</interpretation>
</result>
[INFO ] 2024-04-08 15:16:22,753 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 324 RECOGNITION-COMPLETE 4 COMPLETE
Channel-Identifier:db1333daf57711ee@speechrecog
Completion-Cause:002 no-input-timeout
Content-Length:165
<?xml version="1.0"?>
<result>
<interpretation>
<instance>db1333daf57711ee</instance>
<input>
<noinput/>
</input>
</interpretation>
</result>
[INFO ] 2024-04-08 15:16:24,223 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 94 START-OF-INPUT 6 IN-PROGRESS
Channel-Identifier:db1333daf57711ee@speechrecog
[INFO ] 2024-04-08 15:16:24,960 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 371 RECOGNITION-COMPLETE 6 COMPLETE
Channel-Identifier:db1333daf57711ee@speechrecog
Completion-Cause:000 success
Content-Type:application/x-nlsml
Content-Length:186
<?xml version="1.0"?>
<result>
<interpretation confidence="0.99">
<instance>db1333daf57711ee</instance>
<input mode="speech">左左左。</input>
</interpretation>
</result>
[INFO ] 2024-04-08 15:16:26,336 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 324 RECOGNITION-COMPLETE 6 COMPLETE
Channel-Identifier:db1321ecf57711ee@speechrecog
Completion-Cause:002 no-input-timeout
Content-Length:165
<?xml version="1.0"?>
<result>
<interpretation>
<instance>db1321ecf57711ee</instance>
<input>
<noinput/>
</input>
</interpretation>
</result>
[WARN ] 2024-04-08 15:16:28,511 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpSocket$ReadThread.run(MrcpSocket.java:118)
java.net.SocketException: Socket closed
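The log interleaves events from two channels (db1321ecf57711ee and db1333daf57711ee), which makes it hard to follow by eye. As a reading aid, here is a small sketch that groups the logged events by channel and records each Completion-Cause. It assumes the log layout shown above ("Got an event:" followed by header lines); the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches the event start line as printed by the client log
EVENT_RE = re.compile(r"Got an event: MRCP/2\.0 (\d+) ([A-Z-]+) (\d+) ([A-Z-]+)")


def summarize_events(log_text: str) -> dict:
    """Group logged MRCP events by Channel-Identifier."""
    per_channel = defaultdict(list)
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = EVENT_RE.search(line)
        if match:
            # A new event starts; its headers follow on the next lines
            current = {"event": match.group(2),
                       "request_id": int(match.group(3)),
                       "cause": None}
        elif current is None:
            continue
        elif line.startswith("Channel-Identifier:"):
            per_channel[line.split(":", 1)[1].strip()].append(current)
        elif line.startswith("Completion-Cause:"):
            current["cause"] = line.split(":", 1)[1].strip()
    return dict(per_channel)


# Short excerpt in the same shape as the log above
SAMPLE_LOG = """Got an event: MRCP/2.0 94 START-OF-INPUT 2 IN-PROGRESS
Channel-Identifier:db1321ecf57711ee@speechrecog
Got an event: MRCP/2.0 324 RECOGNITION-COMPLETE 2 COMPLETE
Channel-Identifier:db1333daf57711ee@speechrecog
Completion-Cause:002 no-input-timeout
"""
```

Run over the full log, this makes it easier to see which requests on each channel ended in success and which ended in no-input-timeout.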
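Server
See the post "UniMRCP 1.7.0初体验" (a first look at UniMRCP 1.7.0) on this blog.
With the server set up, the recognition result looks like this: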
[INFO ] 2024-04-08 18:36:01,016 method:com.mrcp.yxp.protocol.mrcp.mrcp4j.client.MrcpChannel.handleMessage(MrcpChannel.java:203)
Got an event: MRCP/2.0 393 RECOGNITION-COMPLETE 2 COMPLETE
Channel-Identifier:c8525796f59311ee@speechrecog
Completion-Cause:000 success
Content-Type:application/x-nlsml
Content-Length:208
<?xml version="1.0"?>
<result>
<interpretation grammar="session:[email protected]" confidence="0.97">
<instance>one</instance>
<input mode="speech">one</input>
</interpretation>
</result>