Text To Sound
Latest revision as of 16:54, 6 February 2025
Product Overview
- Text To Sound is a Chinese speech synthesis module based on the SYN8086 chip. It receives the text to be synthesized over its UART interface and converts it to speech (text-to-speech, TTS) with a clear, mellow tone. It supports the GB2312, GBK, UTF-8, and Unicode encodings, recognizes numbers, phone numbers, dates and times, and measurement symbols, and automatically chooses the correct reading of polyphonic characters (characters with more than one pronunciation). Volume is adjustable in 10 levels, speech rate in 30 levels, and pitch in 10 levels. Text control markers improve the accuracy of text processing, and playback controls include synthesize, stop, pause, and resume. UART communication works at several selectable baud rates, and the module's specifications meet the requirements of harsh outdoor environments. Typical applications include in-vehicle terminals, attendance terminals, bus stop announcements, and voice intercoms.
Product Description
- Working Voltage: 5V
- Communication Interface: UART
- The module produces voice output over serial communication, driven either by a microcontroller or by a PC running a serial port assistant or the host computer software through a USB-to-serial module. Each command is sent as "5-byte header + text", where the text is the content to be spoken.
- Audio Output: Onboard amplifier circuit and speaker, no external load required.
- Indicator Lights: One power indicator light and two communication indicator lights onboard.
- Supported Baud Rates: 9600bps, 57600bps, 115200bps, selectable via DIP switch.
- Low Power Mode: Supports Deep Sleep mode. The chip can be put into Deep Sleep mode using control commands.
- Supported Voice Types:
- Supports synthesis of any Chinese text.
- Supports synthesis of English letters, pronouncing English words as individual letters.
- Supported Encoding Types: Supports GB2312, GBK, Unicode, and UTF-8 encoding methods. See Encoding Description for details.
- Text Synthesis Capacity:
- UTF-8 Encoding: Supports text synthesis of up to 2000 bytes.
- GB2312, GBK, UNICODE Little Endian, UNICODE Big Endian Encoding: Supports text synthesis of up to 4000 bytes.
- Intelligent Text Processing: For common formats such as numbers, phone numbers, dates and times, measurement symbols, etc., the chip can correctly recognize and process the text based on built-in text matching rules.
- For example, "2012-05-01 10:36:28" is read as "二零一二年五月一日十点三十六分二十八秒" (May 1, 2012, 10:36:28), "The speed of the train is 622km/h" is read as "火车的速度是六百二十二公里每小时" (The speed of the train is 622 kilometers per hour), "-12℃" is read as "零下十二摄氏度" (Minus 12 degrees Celsius).
- Polyphonic Character and Surname Processing: For text that contains polyphonic characters (characters with more than one pronunciation), such as a sentence meaning "The bank president walked across the sidewalk towards the bank employee riding a bicycle", the chip automatically analyzes the text, determines the pronunciation of each such character in context, and synthesizes the correct reading.
- For example, "他是一位姓朴的朴素的韩国艺人。" (He is a plain, unpretentious Korean entertainer surnamed Piao.) In this sentence, the first "朴" is pronounced "piao2" and the second "pu3".
- Volume, Speech Rate, and Pitch Adjustment: Supports 10 levels of volume control, 30 levels of speech rate, and 10 levels of pitch adjustment.
- Prompt Tones: Integrates 93 types of prompt tones, which can be used for information reminders, alarms, and other functions in different industries and occasions. See Prompt Tone Description for details.
- Polyphonic Prompt Tones: 14
- Compatible Prompt Tones: 24
- Ringtone Prompt Tones: 19
- Prompt Tones: 19
- Alarm Prompt Tones: 8
- Card Swipe Prompt Tones (Special): 7
- Customer Prompt Tones: 2
- Supports Multiple Speakers: Provides 8 Chinese speakers including two males, two females, one effect processor, one girl, and two boys, which can be switched by using special markers [m?].
- Multiple Text Control Markers: Text control markers can be sent through the "Synthesis Command" to adjust speech rate, pitch, and volume. Control markers can also improve the accuracy of text processing, for example by setting the rhythm of sentences, the pronunciation of numbers, the surname pronunciation strategy, or the pronunciation of "1" in numbers. See Control Marker Description for details.
- Multiple Playback Controls: Control commands include synthesizing text, stopping synthesis, pausing synthesis, resuming synthesis, status inquiry, and entering Deep Sleep mode. The controller sends control commands through the communication interface to control the chip.
- Querying the Chip's Working Status: Supports multiple ways to query the chip's working status, including: querying the status pin level, receiving automatic return transmissions from the chip, and sending query commands to obtain return transmissions of the chip's working status.
- Reference Materials: Provides host computer software and some reference examples for development boards.
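The byte limits listed under "Text Synthesis Capacity" mean that long text has to be divided before it is sent. The following is a minimal Python sketch of one way to do that, not code from the reference examples. The function name `split_text` and the mapping of "UNICODE Little Endian" and "UNICODE Big Endian" to Python's `utf-16-le` and `utf-16-be` codecs are assumptions made for illustration.

```python
from typing import List

# Byte limits per synthesis request, taken from the capacity list above.
# Assumption: the chip's "UNICODE" encodings correspond to UTF-16 codecs.
MAX_BYTES = {
    "utf-8": 2000,
    "gb2312": 4000,
    "gbk": 4000,
    "utf-16-le": 4000,  # "UNICODE Little Endian"
    "utf-16-be": 4000,  # "UNICODE Big Endian"
}


def split_text(text: str, encoding: str = "utf-8") -> List[bytes]:
    """Encode text and split it into chunks that fit the byte limit.

    Chunks are cut between characters, so a multi-byte character is
    never split across two chunks.
    """
    limit = MAX_BYTES[encoding]
    chunks: List[bytes] = []
    current = b""
    for char in text:
        encoded = char.encode(encoding)
        if len(current) + len(encoded) > limit:
            chunks.append(current)
            current = b""
        current += encoded
    if current:
        chunks.append(current)
    return chunks
```

This sketch cuts only at character boundaries. It does not look for sentence ends or control markers such as [m?], so a marker that happens to straddle a cut would be split; cutting at punctuation would be a natural refinement.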
User Guide
Command Frame Format
- The following command frame format is supported: "Frame header FD + Data area length + Data area".
- All commands and data sent from the host computer to the chip need to be encapsulated in "frames" before transmission.
- Within the same frame of data, the transmission interval between each byte should not exceed 30ms; the transmission interval between frames must exceed 30ms.
- When the module is synthesizing text, if it receives another valid synthesis command frame, the chip will immediately stop synthesizing the current text and switch to synthesizing the newly received text.
- When playing text continuously, wait for the "chip idle" byte (0x4F), which indicates that the previous frame has finished playing, and then wait approximately 1ms more before sending the next frame.
- For more command descriptions, please refer to the UART Command Description.
- Note: This module does not support the following commands in the "UART Command Description": Chapter 3.5 "Voice Synthesis Command with Background Music", Chapter 4 "Custom Text-related Commands", and Chapter 5 "MP3-related Commands".
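The frame format and the continuous-playback rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the vendor's implementation: the two-byte, high-byte-first length field is inferred from the "5-byte header" wording, and the command byte `0x01` and encoding byte `0x01` are placeholder values that should be checked against the UART Command Description. The names `build_frame`, `synthesis_frame`, and `play_in_sequence` are hypothetical.

```python
import time
from typing import Callable, Iterable, Optional

FRAME_HEADER = 0xFD  # frame header, from the format above
CHIP_IDLE = 0x4F     # "chip idle" byte, from the list above

# Assumed values; verify them against the UART Command Description.
CMD_SYNTHESIZE = 0x01
ENCODING_GBK = 0x01


def build_frame(data_area: bytes) -> bytes:
    """Wrap a data area as: header FD + data area length + data area."""
    if not 0 < len(data_area) <= 0xFFFF:
        raise ValueError("data area must be 1..65535 bytes")
    # Assumption: the length field is two bytes, high byte first.
    return bytes([FRAME_HEADER]) + len(data_area).to_bytes(2, "big") + data_area


def synthesis_frame(text: str) -> bytes:
    """Build a synthesis frame: 5 header bytes followed by GBK text."""
    data_area = bytes([CMD_SYNTHESIZE, ENCODING_GBK]) + text.encode("gbk")
    return build_frame(data_area)


def play_in_sequence(
    frames: Iterable[bytes],
    write: Callable[[bytes], object],
    read_byte: Callable[[], Optional[int]],
    timeout_s: float = 60.0,
) -> None:
    """Send frames one by one, waiting for the idle byte between them.

    write sends raw bytes to the module; read_byte returns the next
    received byte as an int, or None if nothing arrived yet.
    """
    for frame in frames:
        write(frame)
        deadline = time.monotonic() + timeout_s
        while read_byte() != CHIP_IDLE:
            if time.monotonic() > deadline:
                raise TimeoutError("no chip-idle byte received")
        time.sleep(0.001)  # roughly 1 ms after the idle byte
```

Passing `write` and `read_byte` as callables keeps the sketch independent of any particular serial library; with a serial port object they would typically wrap its write and single-byte read methods.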
Host Computer
Hardware Preparation
- PC x1
- Text To Sound x1
- CP2102 USB to TTL x1
- The host software recognizes only the CP2102 USB to TTL adapter. With any other adapter it reports the error: "Serial port does not exist or is occupied. If your computer does not have a serial port, please use a USB to serial port device."
Hardware Connection
Interface Introduction
- Text control for the voice synthesis function:
- 1. Send Text: Used to input and edit the text to be sent. You can also load built-in demo texts or prompt tones. Click "Clear" to start over.
- 2. Log: Displays current user actions, chip responses, and other information. You can choose whether to display hexadecimal values with "0x" and whether to return explanations. You can also clear the log.
- 3. Communication Port Settings: Used to select the port number and set the communication baud rate. The default baud rate is 115200bps.
- 4. Demo Text Settings: Divided into two modes, [Chinese Synthesis] and [Chinese-English Synthesis]. This module requires the [Chinese-English Synthesis] mode. Note: This module does not support the playback of high-quality lead-in tones.
- 5. TTS Attribute Settings: Used to set the related attributes and parameters of voice synthesis when demonstrating or evaluating the function.
- 6. Custom Functions: This module does not have this function.
- 7. Control Operations: Used to perform various control operations related to the text-to-speech synthesis function, such as synthesize, stop, pause, etc.
- 8. Direct Encoding Send: Paste hexadecimal text encoding directly into the send encoding area. Click "Add Command Header and Play (Automatic)" to have the software add a voice synthesis command frame header to the text encoding, based on the selected encoding, and send it to the TTS module.
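To prepare input for the "Direct Encoding Send" area described in item 8, the text first has to be converted to hexadecimal bytes in the chosen encoding. A minimal Python sketch follows; the function name `to_hex_encoding` is hypothetical, and the space-separated, uppercase layout is an assumption about what the host software accepts.

```python
def to_hex_encoding(text: str, encoding: str = "gbk") -> str:
    """Return the encoded text as space-separated uppercase hex bytes."""
    return " ".join(f"{byte:02X}" for byte in text.encode(encoding))


# Example: produce the hex form of a short Chinese phrase.
print(to_hex_encoding("你好"))
```

The result contains only the text bytes. The command frame header is left out on purpose, since the software adds it when "Add Command Header and Play (Automatic)" is clicked.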
Button Description
Operation Steps
- Install Driver
- Download the CP2102 Universal Windows Driver, unzip it, and install the driver. After installation, right-click on "My Computer" on the desktop, go to Device Manager, and check the port number under Ports.
- Communication Port Settings and Baud Rate Configuration Method
- Open the Host Computer Software, select the port number that appeared after installing the driver in the previous step, and set the baud rate. The default is 115200bps. The module supports three communication baud rates: 9600bps, 57600bps, and 115200bps. Select the module's baud rate with the BAUD0 and BAUD1 DIP switches on the board (0 represents low level, 1 represents high level); the baud rate set in the software must match it.
- Input Text
- Edit text in the "Send Text" box, or automatically load the chip's introduction text, prompt tone text, etc., into the "Send Text" input box by clicking [Demo Text] or [Prompt Tone].
- Set Synthesis Parameters
- You can select a voice through the software and adjust the speech rate, pitch, and volume (click "Restore Defaults" to reset these settings to their default values).
- Send Synthesis Command
- Click the "Synthesize" button to hear the text in the "Send Text" input box synthesized into audio output.
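The same steps can be scripted instead of using the host software. The sketch below assumes the third-party pyserial package (`pip install pyserial`) and a frame that has already been built according to the Command Frame Format; the function names `check_baud_rate` and `send_frame` and the port name in the usage comment are placeholders.

```python
# Baud rates the module supports, as selected by the DIP switches.
SUPPORTED_BAUD_RATES = (9600, 57600, 115200)


def check_baud_rate(baud_rate: int) -> int:
    """Return baud_rate if the module supports it, else raise ValueError."""
    if baud_rate not in SUPPORTED_BAUD_RATES:
        raise ValueError(
            f"unsupported baud rate {baud_rate}; "
            f"choose one of {SUPPORTED_BAUD_RATES}"
        )
    return baud_rate


def send_frame(port: str, frame: bytes, baud_rate: int = 115200) -> None:
    """Open the serial port and write one complete command frame."""
    import serial  # pyserial, imported here so the check above works without it

    with serial.Serial(port, check_baud_rate(baud_rate), timeout=1) as link:
        link.write(frame)


# Usage (port name is a placeholder; use the one shown in Device Manager):
# send_frame("COM3", frame_bytes, baud_rate=115200)
```

The baud rate passed here has to match the DIP switch setting on the board, otherwise the module will not decode the frame.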
Error Solution
- Running the "Voice Synthesis Chip_PC Demo Program" may fail with an ActiveX registration error. The program requires the MSCOMM32.OCX control, and registering it failed: the message reports that MSCOMM32.OCX was loaded but the DllRegisterServer call failed.
- Solution:
- Move the entire host software folder to the desktop (in this example, "C:\Users\yousi\Desktop"). The folder includes "MSCOMM32.OCX".
- Press the Win+X keys on the keyboard to bring up common commands and select "Windows PowerShell (Admin) (A)".
- In the PowerShell window, enter "regsvr32 "C:\Users\yousi\Desktop\语音合成芯片_pc端演示程序\1-文档及工具\MSCOMM32.OCX"".
- Press Enter to confirm. A success message means the control was registered.
- The PC host program should now start without errors and work normally. If registration fails for other controls, register them the same way. The required controls are located in the "C:\Users\yousi\Desktop\语音合成芯片_pc端演示程序\1-文档及工具" directory.
Arduino
Hardware Connection
Download Example
- Open the program in the arduino_SYN8086 folder from the Reference Example.
- Modify the baud rate function according to the actual baud rate used: serial2.begin(115200).
- Modify the text to be spoken, as well as parameters such as voice, speech rate, etc. For more parameters, see Control Markup Instructions.
- Upload the modified program.
STM32
Hardware Connection
Download Example
- Open the program in the STM32F103C8T6_DEMO_SYN8086 folder from the Reference Example.
- Modify the baud rate function according to the actual baud rate used: USART3_Init(115200).
- Modify the text to be spoken, as well as parameters such as voice, speech rate, etc. For more parameters, see Control Markup Instructions.
- Compile the modified program and download the .HEX file after compilation is complete.
Reference Materials
- CP2102 Universal Windows Driver
- SYN8086 Chip Manual
- Host Computer Software
- Reference Example
- Serial Port Debugging Assistant
FAQ