Text To Sound

From Diustou Wiki

Latest revision as of 16:54, 6 February 2025

Text To Sound
Text To Sound 示意图.png
Information

Categories: Audio Module, Sensor

Brand: Diustou

Description
Features
  • Speech synthesis (text-to-speech) module

Interfaces

UART


Product Overview

  • Text To Sound is a Chinese speech synthesis (text-to-speech, TTS) module based on the SYN8086 chip. It receives the text to be spoken over a UART interface and converts it to speech with a clear, mellow voice. It supports GB2312, GBK, UTF-8, and Unicode encodings; recognizes numbers, phone numbers, dates and times, and units of measurement; and automatically determines the correct reading of polyphonic characters (characters with more than one pronunciation). Volume (10 levels), speech rate (30 levels), and pitch (10 levels) are adjustable; text control markers improve the accuracy of text processing; and playback can be controlled with synthesize, stop, pause, and resume commands. Several UART baud rates are available, and all specifications meet the requirements of applications in harsh outdoor environments. Typical applications include in-vehicle terminals, attendance terminals, bus stop announcements, and voice intercoms.

Product Description

  • Working Voltage: 5V
  • Communication Interface: UART
  • The module can be driven over a serial link from a microcontroller, or from a PC using a serial port assistant or the host computer software together with a USB-to-serial module. Each command is sent as a "5-byte header + text" frame, where the text is the content to be spoken.
  • Audio Output: Onboard amplifier circuit and speaker, no external load required.
  • Indicator Lights: One power indicator light and two communication indicator lights onboard.
  • Supported Baud Rates: 9600bps, 57600bps, 115200bps, selectable via DIP switch.
    • Text To Sound 产品说明1.png
  • Low Power Mode: Supports Deep Sleep mode. The chip can be put into Deep Sleep mode using control commands.
  • Supported Voice Types:
    • Supports synthesis of any Chinese text.
    • Supports English letters; English words are spelled out letter by letter.
  • Supported Encodings: GB2312, GBK, Unicode, and UTF-8. See the Encoding Description (http://public.voicetx.com/zh/home/encoding) for details.
  • Text Synthesis Capacity:
    • UTF-8: up to 2000 bytes of text.
    • GB2312, GBK, Unicode little-endian, and Unicode big-endian: up to 4000 bytes of text.
  • Intelligent Text Processing: For common formats such as numbers, phone numbers, dates and times, measurement symbols, etc., the chip can correctly recognize and process the text based on built-in text matching rules.
    • For example, "2012-05-01 10:36:28" is read as "二零一二年五月一日十点三十六分二十八秒" (May 1, 2012, 10:36:28); "火车的速度是622km/h" (the train's speed is 622 km/h) is read as "火车的速度是六百二十二公里每小时" (the train's speed is six hundred twenty-two kilometers per hour); and "-12℃" is read as "零下十二摄氏度" (minus twelve degrees Celsius).
  • Polyphonic Character and Surname Processing: For text containing polyphonic characters, such as "银行行长穿过人行道向骑着自行车的银行职员行走过去" (the bank president walked across the sidewalk toward the bank clerk riding a bicycle), where 行 is read either háng or xíng, the chip analyzes the text, determines the reading of each polyphonic character, and synthesizes the correct pronunciation.
    • For example, in "他是一位姓朴的朴素的韩国艺人。" (He is an unpretentious Korean entertainer surnamed Piao), the first "朴" (the surname) is read "piao2" and the second (in 朴素, "plain") is read "pu3".
  • Volume, Speech Rate, and Pitch Adjustment: Supports 10 levels of volume control, 30 levels of speech rate, and 10 levels of pitch adjustment.
  • Prompt Tones: 93 built-in prompt tones for notifications, alarms, and similar uses across industries and settings. See the Prompt Tone Description (http://public.voicetx.com/zh/home/prompt_cn) for details.
    • Polyphonic Prompt Tones: 14
    • Compatible Prompt Tones: 24
    • Ringtone Prompt Tones: 19
    • General Prompt Tones: 19
    • Alarm Prompt Tones: 8
    • Card-Swipe and Special Prompt Tones: 7
    • Customer Prompt Tones: 2
  • Multiple Speakers: 8 Chinese voices are provided (two male, two female, one effects voice, one girl, and two boys), selectable with the [m?] control marker.
    • Text To Sound 产品说明2.png
  • Text Control Markers: Control markers sent within a synthesis command adjust speech rate, pitch, and volume. They can also improve text-processing accuracy, for example by setting sentence prosody, how numbers are read, the surname-reading strategy, or how "1" is read in numbers. See the Control Marker Description (http://public.voicetx.com/zh/home/mark_cn) for details.
  • Multiple Playback Controls: Control commands include synthesizing text, stopping synthesis, pausing synthesis, resuming synthesis, status inquiry, and entering Deep Sleep mode. The controller sends control commands through the communication interface to control the chip.
  • Status Queries: The chip's working status can be read in several ways: from the level of the status pin, from the status bytes the chip returns automatically, or by sending a status query command and reading the reply.
  • Reference Materials: Provides host computer software and some reference examples for development boards.
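The text capacity limits listed above (2000 bytes for UTF-8, 4000 bytes for the other encodings) matter when a microcontroller assembles text at run time. The C sketch below is a minimal illustration, not code from the reference examples. The encoding names are hypothetical labels for this sketch only; the byte values the chip expects are defined in the UART Command Description. The sketch also shows how to shorten an over-long UTF-8 string without splitting a multi-byte character, since most Chinese characters take three bytes in UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the supported encodings (not the chip's byte codes). */
typedef enum {
    TTS_ENC_GB2312,
    TTS_ENC_GBK,
    TTS_ENC_UNICODE_LE,
    TTS_ENC_UNICODE_BE,
    TTS_ENC_UTF8
} tts_encoding;

/* Maximum text length in bytes, as listed on this page. */
size_t tts_max_text_bytes(tts_encoding enc)
{
    return enc == TTS_ENC_UTF8 ? 2000u : 4000u;
}

/* True if a non-empty text of text_len bytes is within the limit. */
bool tts_text_fits(tts_encoding enc, size_t text_len)
{
    return text_len > 0 && text_len <= tts_max_text_bytes(enc);
}

/* Length of the longest prefix of a UTF-8 string that is at most `limit`
   bytes and does not end inside a multi-byte character.
   UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
size_t tts_utf8_prefix_len(const unsigned char *s, size_t len, size_t limit)
{
    if (len <= limit)
        return len;
    size_t n = limit;
    /* s[n] is the first byte that would be cut off. While it is a
       continuation byte, the character straddles the cut, so back up. */
    while (n > 0 && (s[n] & 0xC0u) == 0x80u)
        n--;
    return n;
}
```

A caller could send the first tts_utf8_prefix_len(text, len, tts_max_text_bytes(TTS_ENC_UTF8)) bytes, then send the remainder in a later frame after the chip reports that it is idle. GB2312 and GBK text would need a similar boundary check for their two-byte characters, which this sketch does not cover.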

User Guide

Command Frame Format

  • Text To Sound 命令说明.png
  • Command frames use the format "frame header 0xFD + data-area length + data area".
  • All commands and data sent from the host to the chip must be wrapped in frames before transmission.
  • Within a frame, the gap between consecutive bytes must not exceed 30 ms; consecutive frames must be more than 30 ms apart.
  • If the module receives another valid synthesis command frame while it is synthesizing text, it immediately stops the current text and starts synthesizing the new one.
  • When playing several texts back to back, wait about 1 ms after receiving the "chip idle" byte (0x4F), which signals that the previous frame has finished playing, before sending the next frame.
  • For more commands, see the UART Command Description (http://public.voicetx.com/en/home/uart).
    • Note: This module does not support the following commands in the "UART Command Description": Chapter 3.5 "Voice Synthesis Command with Background Music", Chapter 4 "Custom Text-related Commands", and Chapter 5 "MP3-related Commands".
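The frame layout above can be sketched in C as follows. This is a minimal illustration based on assumptions and is not the reference example code. It reads the "5-byte header" mentioned on this page as 0xFD, a two-byte data-area length (high byte first), a command byte, and a parameter byte, with the length counting every byte after the length field. The command value 0x01 for text synthesis and the parameter value 0x00 for GB2312 are also assumptions. Check all of these against the UART Command Description.

```c
#include <stddef.h>
#include <stdint.h>

#define TTS_FRAME_HEADER   0xFDu  /* frame header byte (from this page) */
#define TTS_CMD_SYNTHESIZE 0x01u  /* assumption: "synthesize text" command word */
#define TTS_ENC_GB2312     0x00u  /* assumption: parameter byte for GB2312 text */

/* Builds one synthesis frame into out:
 *   0xFD | len_hi | len_lo | command | parameter | text bytes
 * where len (big-endian, assumed) counts command + parameter + text.
 * Returns the total frame size, or 0 if out is too small or the text is
 * too long for a 16-bit length field. */
size_t tts_build_frame(uint8_t cmd, uint8_t param,
                       const uint8_t *text, size_t text_len,
                       uint8_t *out, size_t out_cap)
{
    if (text_len > 0xFFFFu - 2u)
        return 0;
    size_t data_len = text_len + 2u;   /* command + parameter + text */
    size_t total = 3u + data_len;      /* header + two length bytes + data */
    if (total > out_cap)
        return 0;

    out[0] = TTS_FRAME_HEADER;
    out[1] = (uint8_t)(data_len >> 8);
    out[2] = (uint8_t)(data_len & 0xFFu);
    out[3] = cmd;
    out[4] = param;
    for (size_t i = 0; i < text_len; i++)
        out[5 + i] = text[i];
    return total;
}
```

With the GB2312 bytes of "你好" (C4 E3 BA C3), tts_build_frame(TTS_CMD_SYNTHESIZE, TTS_ENC_GB2312, text, 4, buf, sizeof buf) produces the 9-byte frame FD 00 06 01 00 C4 E3 BA C3. Building the whole frame in a buffer and writing it with a single UART call is a simple way to keep every inter-byte gap well under 30 ms.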

Host Computer

Hardware Preparation

  • PC x1
  • Text To Sound x1
  • CP2102 USB-to-TTL serial module x1
    • The host software only recognizes CP2102-based USB-to-TTL modules. With any other adapter, it reports: "Serial port does not exist or is occupied. If your computer does not have a serial port, please use a USB to serial port device."

Hardware Connection

  • Text To Sound 上位机1.png

Interface Introduction

  • Text To Sound 上位机2.png
  • Text control for the voice synthesis function:
  • 1. Send Text: Used to input and edit the text to be sent. You can also load built-in demo texts or prompt tones. Click "Clear" to start over.
  • 2. Log: Shows user actions, chip responses, and other information. Options control whether hexadecimal values are shown with a "0x" prefix and whether the bytes returned by the chip are explained. The log can also be cleared.
  • 3. Communication Port Settings: Used to select the port number and set the communication baud rate. The default baud rate is 115200bps.
  • 4. Demo Text Settings: Divided into two modes, [Chinese Synthesis] and [Chinese-English Synthesis]. This module requires the [Chinese-English Synthesis] mode. Note: This module does not support the playback of high-quality lead-in tones.
  • 5. TTS Attribute Settings: Used to set the related attributes and parameters of voice synthesis when demonstrating or evaluating the function.
  • 6. Custom Functions: This module does not have this function.
  • 7. Control Operations: Used to perform various control operations related to the text-to-speech synthesis function, such as synthesize, stop, pause, etc.
  • 8. Direct Encoding Send: Hexadecimal text encodings can be pasted directly into the send-encoding area. Clicking "Add Command Header and Play (Automatic)" prepends the synthesis command frame header that matches the selected encoding and sends the result to the TTS module.
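The "Direct Encoding Send" function can be approximated in C. The code parses a pasted hexadecimal string and prepends a synthesis header. This is a hypothetical sketch, not the host software's code. It accepts bytes separated by spaces or commas, with an optional 0x prefix. The header layout is assumed: 0xFD, a two-byte big-endian length counting the command byte, parameter byte, and text; a command byte of 0x01; and an encoding byte supplied by the caller.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses text such as "C4 E3 BA C3" or "0xC4,0xE3" into bytes.
   Returns the number of bytes written, or -1 on malformed input or overflow. */
static long parse_hex(const char *s, uint8_t *out, size_t cap)
{
    size_t n = 0;
    while (*s != '\0') {
        if (isspace((unsigned char)*s) || *s == ',') { s++; continue; }
        if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) { s += 2; continue; }
        int hi = hex_digit(s[0]);
        int lo = hex_digit(s[1]);   /* s[1] is at most the terminating '\0' */
        if (hi < 0 || lo < 0 || n >= cap) return -1;
        out[n++] = (uint8_t)((hi << 4) | lo);
        s += 2;
    }
    return (long)n;
}

/* Builds 0xFD | len_hi | len_lo | 0x01 (assumed) | encoding | parsed bytes.
   Returns the frame length, or -1 on error. */
long tts_frame_from_hex(const char *hex, uint8_t encoding, uint8_t *out, size_t cap)
{
    if (cap < 5) return -1;
    long text_len = parse_hex(hex, out + 5, cap - 5);
    if (text_len <= 0 || text_len > 0xFFFF - 2) return -1;
    size_t data_len = (size_t)text_len + 2u;   /* command + parameter + text */
    out[0] = 0xFD;
    out[1] = (uint8_t)(data_len >> 8);
    out[2] = (uint8_t)(data_len & 0xFFu);
    out[3] = 0x01;                              /* assumed synthesis command */
    out[4] = encoding;
    return (long)(data_len + 3u);
}
```

Parsing directly into the buffer after the 5 header bytes avoids a second copy. The header is filled in only once the text length is known.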

Button Description

  • Text To Sound 上位机3.png

Operation Steps

  • Install Driver
    • Download the CP2102 Universal Windows Driver (https://wiki.diustou.com/cn/%E6%96%87%E4%BB%B6:CP210x_Universal_Windows_Driver.zip), unzip it, and install the driver. Then right-click "My Computer" on the desktop, open Device Manager, and note the port number listed under Ports.
  • Communication Port Settings and Baud Rate Configuration Method
    • Open the Host Computer Software (https://wiki.diustou.com/cn/%E6%96%87%E4%BB%B6:%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90%E8%8A%AF%E7%89%87_pc%E7%AB%AF%E6%BC%94%E7%A4%BA%E7%A8%8B%E5%BA%8F.rar). Select the port number created by the driver installed in the previous step, and select the baud rate (default 115200 bps). The module supports three baud rates: 9600 bps, 57600 bps, and 115200 bps. These are set in hardware with the BAUD0 and BAUD1 DIP switches on the board (0 = low level, 1 = high level).
    • Text To Sound 产品说明1.png
  • Input Text
    • Edit text in the "Send Text" box, or automatically load the chip's introduction text, prompt tone text, etc., into the "Send Text" input box by clicking [Demo Text] or [Prompt Tone].
  • Set Synthesis Parameters
    • You can select a voice through the software and adjust the speech rate, pitch, and volume (click "Restore Defaults" to reset these settings to their default values).
  • Send Synthesis Command
    • Click the "Synthesize" button to hear the text in the "Send Text" input box synthesized into audio output.

Troubleshooting

  • Running the "Voice Synthesis Chip_PC Demo Program" may fail with the error shown below. The program depends on the ActiveX control MSCOMM32.OCX, and the error means that registering this control failed: MSCOMM32.OCX was loaded, but the call to DllRegisterServer failed.
    • Text To Sound 上位机4.png
  • Solution:
    • Move the entire host software folder, which includes MSCOMM32.OCX, to the desktop. In this example, the desktop path is "C:\Users\yousi\Desktop"; yours will contain your own user name.
      • Text To Sound 上位机8.png
    • Press the Win+X keys on the keyboard to bring up common commands and select "Windows PowerShell (Admin) (A)".
      • Text To Sound 上位机5.png
    • In the PowerShell window, enter: regsvr32 "C:\Users\yousi\Desktop\语音合成芯片_pc端演示程序\1-文档及工具\MSCOMM32.OCX"
      • Text To Sound 上位机6.png
    • Press Enter. A success message confirms that the control has been registered.
      • Text To Sound 上位机7.png
    • If the PC host program now starts without errors, it is ready to use. If registration fails for other controls, register them the same way. The required controls are in the "C:\Users\yousi\Desktop\语音合成芯片_pc端演示程序\1-文档及工具" directory.

Arduino

Hardware Connection

  • Text To Sound arduino1.png

Download Example

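  • Open the program in the arduino_SYN8086 folder of the Reference Example (https://wiki.diustou.com/cn/%E6%96%87%E4%BB%B6:TTS_demo%E8%BD%AF%E4%BB%B6.zip).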

  • Change the baud rate in serial2.begin(115200) to match the baud rate selected on the module.
  • Edit the text to be spoken and parameters such as the speaker and speech rate. For more parameters, see the Control Marker Description (http://public.voicetx.com/zh/home/mark_cn).
    • Text To Sound arduino2.png
  • Upload the modified program.
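Before the text is framed and written to the serial port, a sketch may prepend a speaker marker to select the voice. The C helper below is a hypothetical illustration and is not taken from arduino_SYN8086. It assumes only the [m?] marker syntax described above. The speaker number used in the examples is a placeholder; the valid numbers are listed in the Control Marker Description. Arduino sketch files are normally saved as UTF-8, so string literals such as "[m3]你好" become UTF-8 bytes. For this reason, the 2000-byte UTF-8 limit is used as the check, and the frame must then declare UTF-8 encoding.

```c
#include <stddef.h>
#include <stdio.h>

#define TTS_UTF8_MAX_TEXT 2000u   /* UTF-8 text capacity (from this page) */

/* Writes "[m<speaker>]<text>" into out.
   Returns the composed length in bytes, or -1 if the result does not fit
   in out or exceeds max_bytes. The speaker number is not validated. */
int tts_compose(char *out, size_t cap, int speaker, const char *text, size_t max_bytes)
{
    int n = snprintf(out, cap, "[m%d]%s", speaker, text);
    if (n < 0 || (size_t)n >= cap || (size_t)n > max_bytes)
        return -1;
    return n;
}
```

For example, tts_compose(msg, sizeof msg, 3, "你好", TTS_UTF8_MAX_TEXT) yields "[m3]你好", which is 10 bytes when the source file is UTF-8. The composed bytes then become the text part of a synthesis frame.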

STM32

Hardware Connection

  • Text To Sound stm1.png

Download Example

  • Open the program in the STM32F103C8T6_DEMO_SYN8086 folder of the Reference Example (https://wiki.diustou.com/cn/%E6%96%87%E4%BB%B6:TTS_demo%E8%BD%AF%E4%BB%B6.zip).
  • Change the baud rate in USART3_Init(115200) to match the baud rate selected on the module.
  • Edit the text to be spoken and parameters such as the speaker and speech rate. For more parameters, see the Control Marker Description (http://public.voicetx.com/zh/home/mark_cn).
    • Text To Sound stm2.png
  • Compile the modified program and flash the resulting .HEX file to the board.
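When the STM32 program speaks several texts in a row, the timing rules from the Command Frame Format section apply. Frames must be more than 30 ms apart, and the next text should be sent about 1 ms after the chip returns the idle byte 0x4F. One way to keep this bookkeeping independent of the HAL is a small state machine fed with a millisecond tick (for example, from SysTick) and with each byte received on USART3. The sketch below is hypothetical and is not part of STM32F103C8T6_DEMO_SYN8086. The names are illustrative; only 0x4F, 30 ms, and 1 ms come from this page.

```c
#include <stdbool.h>
#include <stdint.h>

#define TTS_IDLE_BYTE     0x4Fu  /* "chip idle" byte returned by the chip */
#define TTS_FRAME_GAP_MS  30u    /* frames must be more than 30 ms apart */
#define TTS_AFTER_IDLE_MS 1u     /* recommended delay after 0x4F */

typedef struct {
    bool     frame_sent;     /* at least one synthesis frame has been sent */
    bool     idle_seen;      /* 0x4F received since the last frame */
    uint32_t idle_at_ms;     /* tick at which 0x4F arrived */
    uint32_t last_frame_ms;  /* tick at which the last frame finished sending */
} tts_pacer;

void tts_pacer_init(tts_pacer *p)
{
    p->frame_sent = false;
    p->idle_seen = false;
    p->idle_at_ms = 0;
    p->last_frame_ms = 0;
}

/* Call for every byte received from the module (e.g. from the RX handler). */
void tts_pacer_on_rx(tts_pacer *p, uint8_t byte, uint32_t now_ms)
{
    if (byte == TTS_IDLE_BYTE) {
        p->idle_seen = true;
        p->idle_at_ms = now_ms;
    }
}

/* Call right after the last byte of a synthesis frame has been sent. */
void tts_pacer_on_sent(tts_pacer *p, uint32_t now_ms)
{
    p->frame_sent = true;
    p->idle_seen = false;
    p->last_frame_ms = now_ms;
}

/* True when the next synthesis frame may be sent. Unsigned subtraction keeps
   the comparisons correct when the millisecond counter wraps around. */
bool tts_pacer_can_send(const tts_pacer *p, uint32_t now_ms)
{
    if (!p->frame_sent)
        return true;
    if (!p->idle_seen)
        return false;
    if ((uint32_t)(now_ms - p->idle_at_ms) < TTS_AFTER_IDLE_MS)
        return false;
    return (uint32_t)(now_ms - p->last_frame_ms) > TTS_FRAME_GAP_MS;
}
```

If tts_pacer_on_rx runs in the UART interrupt while the main loop calls tts_pacer_can_send, real firmware would need to protect the shared struct, for example by briefly disabling interrupts around the check. This rule applies only to queuing texts. Per the frame-format notes, a new synthesis frame sent while speech is still playing simply replaces the current text.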

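References

  • CP2102 Universal Windows Driver: https://wiki.diustou.com/cn/%E6%96%87%E4%BB%B6:CP210x_Universal_Windows_Driver.zip
  • SYN8086 Chip Manual: http://www.doc.voicetx.com/zh/home/SYN8086
  • Host Computer Software: https://wiki.diustou.com/cn/%E6%96%87%E4%BB%B6:%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90%E8%8A%AF%E7%89%87_pc%E7%AB%AF%E6%BC%94%E7%A4%BA%E7%A8%8B%E5%BA%8F.rar
  • Reference Example: https://wiki.diustou.com/cn/%E6%96%87%E4%BB%B6:TTS_demo%E8%BD%AF%E4%BB%B6.zip
  • Serial Port Debugging Assistant: https://wiki.diustou.com/cn/%E6%96%87%E4%BB%B6:%E4%B8%B2%E5%8F%A3%E8%B0%83%E8%AF%95%E5%8A%A9%E6%89%8B.zip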

FAQ

Question:
Can it recognize voice? What is it used for?
Answer:

No. The module performs speech synthesis (text to speech), not speech recognition. You send it text, and it speaks that text. Typical uses include bus stop announcements, queue number calling, and meal pickup calling.


Question:
Can it record audio? Can it convert to MP3 files? Can it save recordings?
Answer:

No. It cannot record audio or produce MP3 files. It speaks text in real time as soon as the text is received, and nothing is saved.


Question:
How do I use it? Is it complicated?
Answer:

Text can be sent either from the accompanying PC software or over a serial link from a microcontroller. The microcontroller code is simple if you understand serial communication. With the PC software, you only need to connect the hardware; no prior experience is required.


Question:
How do I control it with a microcontroller? Can it synthesize English?
Answer:

The microcontroller sends frames in the "5-byte header + text" format to the module over the serial port, and the module speaks the text. Only Chinese is supported; English words are read letter by letter.


Question:
Can it only be tested on a computer?
Answer:

No. Besides testing it from a PC, you can use it with a microcontroller. The PC also just sends data over a serial port, so any device with a serial interface can drive the module, provided the signals are at TTL level.



Contact Diustou

Our working hours are: 09:00-18:00 (UTC+8 Monday to Saturday)