Difference between revisions of "Text To Sound"

Latest revision as of 16:54, 6 February 2025

Text To Sound

Information
Categories: Audio Module Sensor
Brand: Diustou

Description

Features

Text To Sound

Interfaces

UART

Product Overview

Text To Sound is a Chinese speech synthesis module based on the SYN8086 chip. It receives the text to be synthesized over its UART interface and converts it to speech (text-to-speech, TTS), producing a clear, mellow voice. It supports the GB2312, GBK, UTF-8, and Unicode text encodings; recognizes numbers, phone numbers, dates and times, and measurement symbols; and automatically selects the correct reading of polyphonic characters (characters with more than one pronunciation). Volume is adjustable over 10 levels, speech rate over 30 levels, and pitch over 10 levels. Text control markers improve the accuracy of text processing, and the playback controls cover synthesis, stop, pause, and resume. The UART supports several selectable baud rates. All specifications meet the requirements for use in harsh outdoor environments. Typical applications include in-vehicle terminals, attendance terminals, bus stop announcements, and voice intercoms.

Product Description

Working Voltage: 5V
Communication Interface: UART
The module produces speech in response to serial commands, sent either by a microcontroller or by a PC running a serial port assistant or the host computer software through a USB-to-serial module. Each command has the format "5-byte header + text", where the text is the content to be played.
Audio Output: Onboard amplifier circuit and speaker, no external load required.
Indicator Lights: One power indicator light and two communication indicator lights onboard.
Supported Baud Rates: 9600bps, 57600bps, 115200bps, selectable via DIP switch.
Low Power Mode: Supports Deep Sleep mode. The chip can be put into Deep Sleep mode using control commands.
Supported Voice Types:
- Supports synthesis of any Chinese text.
- Supports synthesis of English letters, pronouncing English words as individual letters.
Supported Encoding Types: GB2312, GBK, Unicode, and UTF-8. See Encoding Description for details.
Text Synthesis Capacity:
- UTF-8 Encoding: Supports text synthesis of up to 2000 bytes.
- GB2312, GBK, UNICODE Little Endian, UNICODE Big Endian Encoding: Supports text synthesis of up to 4000 bytes.
Intelligent Text Processing: For common formats such as numbers, phone numbers, dates and times, measurement symbols, etc., the chip can correctly recognize and process the text based on built-in text matching rules.
- For example, "2012-05-01 10:36:28" is read as "二零一二年五月一日十点三十六分二十八秒" (May 1, 2012, 10:36:28), "The speed of the train is 622km/h" is read as "火车的速度是六百二十二公里每小时" (The speed of the train is 622 kilometers per hour), "-12℃" is read as "零下十二摄氏度" (Minus 12 degrees Celsius).
Polyphonic Character and Surname Processing: For text containing characters with more than one pronunciation, such as the Chinese sentence translated as "The bank president walked across the sidewalk towards the bank employee riding a bicycle", the chip automatically analyzes the text, determines the reading of each polyphonic character, and synthesizes the correct pronunciation.
- For example, "他是一位姓朴的朴素的韩国艺人。" (He is a plain, unassuming Korean entertainer surnamed Piao.) In this sentence, the first "朴" is pronounced "piao2" and the second "pu3".
Volume, Speech Rate, and Pitch Adjustment: Supports 10 levels of volume control, 30 levels of speech rate, and 10 levels of pitch adjustment.
Prompt Tones: Integrates 93 types of prompt tones, which can be used for information reminders, alarms, and other functions in different industries and occasions. See Prompt Tone Description for details.
- Polyphonic Prompt Tones: 14
- Compatible Prompt Tones: 24
- Ringtone Prompt Tones: 19
- Prompt Tones: 19
- Alarm Prompt Tones: 8
- Card Swipe Prompt Tones (Special): 7
- Customer Prompt Tones: 2
Supports Multiple Speakers: Provides 8 Chinese speakers including two males, two females, one effect processor, one girl, and two boys, which can be switched by using special markers [m?].
Multiple Text Control Markers: Text control markers can be sent through the "Synthesis Command" to adjust speech rate, pitch, and volume. Control markers can also improve the accuracy of text processing, for example by setting the rhythm of sentences, the pronunciation of numbers, the surname pronunciation strategy, or the pronunciation of "1" in numbers. See the Control Marker Description (http://public.voicetx.com/zh/home/mark_cn) for details.
Multiple Playback Controls: Control commands include synthesizing text, stopping synthesis, pausing synthesis, resuming synthesis, status inquiry, and entering Deep Sleep mode. The controller sends control commands through the communication interface to control the chip.
Querying the Chip's Working Status: The working status can be determined in several ways: reading the level of the status pin, receiving the status bytes that the chip returns automatically, or sending a query command and reading the chip's reply.
Reference Materials: Provides host computer software and some reference examples for development boards.
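
Text longer than the Text Synthesis Capacity limits listed above has to be sent as several frames. The following Python sketch shows one way to split a string into encoded chunks that stay within those limits without cutting a character in half. It is an illustration only: the function name split_text is hypothetical, the mapping of "Unicode" to Python's UTF-16 codecs is an assumption, and so is the reading that the limit applies to the encoded text alone.

```python
# Per-frame byte limits taken from the Text Synthesis Capacity entry above.
# ASSUMPTION: "UNICODE Little/Big Endian" corresponds to UTF-16 LE/BE.
MAX_BYTES = {
    "utf-8": 2000,
    "gb2312": 4000,
    "gbk": 4000,
    "utf-16-le": 4000,
    "utf-16-be": 4000,
}


def split_text(text: str, encoding: str = "utf-8") -> list[bytes]:
    """Split text into encoded chunks that each fit the byte limit.

    Each character is encoded on its own and appended only if the current
    chunk stays within the limit, so no character is ever split.
    """
    limit = MAX_BYTES[encoding]
    chunks: list[bytes] = []
    current = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)  # raises if ch is not in the charset
        if current and len(current) + len(encoded) > limit:
            chunks.append(bytes(current))
            current = bytearray()
        current += encoded
    if current:
        chunks.append(bytes(current))
    return chunks
```

In practice it is probably better to break at punctuation rather than at an arbitrary character, since a cut in the middle of a sentence may affect how the chip phrases the speech.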

User Guide

Command Frame Format

The following command frame format is supported: "Frame header FD + Data area length + Data area".
All commands and data sent from the host computer to the chip need to be encapsulated in "frames" before transmission.
Within the same frame of data, the transmission interval between each byte should not exceed 30ms; the transmission interval between frames must exceed 30ms.
When the module is synthesizing text, if it receives another valid synthesis command frame, the chip will immediately stop synthesizing the current text and switch to synthesizing the newly received text.
When playing several texts back to back, wait for the "chip idle" byte (0x4F), which indicates that the previous frame has finished playing, and then wait approximately 1ms before sending the next frame.
For more command descriptions, please refer to the UART Command Description.
- Note: This module does not support the following commands in the "UART Command Description": Chapter 3.5 "Voice Synthesis Command with Background Music", Chapter 4 "Custom Text-related Commands", and Chapter 5 "MP3-related Commands".
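
As a rough illustration of the frame layout described above, the Python sketch below assembles "Frame header FD + Data area length + Data area" into bytes. Several details are assumptions not stated on this page and should be checked against the UART Command Description: that the length field is two bytes in big-endian order, that the data area begins with a one-byte command word and a one-byte encoding parameter (which would account for the "5-byte header"), and the concrete values 0x01, 0x00, and 0x01 used below. The name build_frame is hypothetical.

```python
FRAME_HEADER = 0xFD  # frame header named in the format description

# ASSUMPTION: these command and parameter values are placeholders for
# illustration; confirm them in the UART Command Description.
CMD_SYNTHESIZE = 0x01
ENCODING_PARAM = {"gb2312": 0x00, "gbk": 0x01}


def build_frame(text: str, encoding: str = "gbk") -> bytes:
    """Wrap text in a synthesis command frame.

    Assumed layout: FD | length (2 bytes, big-endian) | command | param | text
    where length counts every byte of the data area (command + param + text).
    """
    payload = text.encode(encoding)
    data_area = bytes([CMD_SYNTHESIZE, ENCODING_PARAM[encoding]]) + payload
    if len(data_area) > 0xFFFF:
        raise ValueError("data area does not fit in a 2-byte length field")
    return bytes([FRAME_HEADER]) + len(data_area).to_bytes(2, "big") + data_area
```

The returned bytes would be written to the serial port in a single call, which helps keep the gap between bytes of one frame well under the 30ms limit.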

Host Computer

Hardware Preparation

PC x1
Text To Sound x1
CP2102 USB to TTL x1
- The host software only recognizes a CP2102 USB to TTL adapter. With any other adapter it reports the error: "Serial port does not exist or is occupied. If your computer does not have a serial port, please use a USB to serial port device."

Hardware Connection

Interface Introduction

Text control for the voice synthesis function:
1. Send Text: Used to input and edit the text to be sent. You can also load built-in demo texts or prompt tones. Click "Clear" to start over.
2. Log: Displays current user actions, chip responses, and other information. You can choose whether to display hexadecimal values with "0x" and whether to return explanations. You can also clear the log.
3. Communication Port Settings: Used to select the port number and set the communication baud rate. The default baud rate is 115200bps.
4. Demo Text Settings: Divided into two modes, [Chinese Synthesis] and [Chinese-English Synthesis]. This module requires the [Chinese-English Synthesis] mode. Note: This module does not support the playback of high-quality lead-in tones.
5. TTS Attribute Settings: Used to set the related attributes and parameters of voice synthesis when demonstrating or evaluating the function.
6. Custom Functions: This module does not have this function.
7. Control Operations: Used to perform various control operations related to the text-to-speech synthesis function, such as synthesize, stop, pause, etc.
8. Direct Encoding Send: Paste hexadecimal text encoding directly into the send encoding area, then click "Add Command Header and Play (Automatic)". The software adds a voice synthesis command frame header that matches the selected encoding and sends the result to the TTS module.

Button Description

Operation Steps

Install Driver
- Download the CP2102 Universal Windows Driver, unzip it, and install the driver. After installation, right-click on "My Computer" on the desktop, go to Device Manager, and check the port number under Ports.
Communication Port Settings and Baud Rate Configuration Method
- Open the Host Computer Software, select the port number that appeared after installing the driver in the previous step, and set the baud rate. The default is 115200bps. The module supports three baud rates: 9600bps, 57600bps, and 115200bps. The baud rate is selected in hardware with the BAUD0 and BAUD1 DIP switches on the board, where 0 represents low level and 1 represents high level.
Input Text
- Edit text in the "Send Text" box, or automatically load the chip's introduction text, prompt tone text, etc., into the "Send Text" input box by clicking [Demo Text] or [Prompt Tone].
Set Synthesis Parameters
- You can select a voice through the software and adjust the speech rate, pitch, and volume (click "Restore Defaults" to reset these settings to their default values).
Send Synthesis Command
- Click the "Synthesize" button to hear the text in the "Send Text" input box synthesized into audio output.

Error Solution

Running the "Voice Synthesis Chip_PC Demo Program" may fail with an ActiveX registration error stating that MSCOMM32.OCX was loaded but the DllRegisterServer call failed. This means the MSCOMM32.OCX control still needs to be registered.
Solution:
- Move the entire host software folder to the desktop, for example "C:\Users\yousi\Desktop" (the user name in the path will differ on your computer). The folder includes "MSCOMM32.OCX".
- Press Win+X to open the quick-access menu and select "Windows PowerShell (Admin) (A)".
- At the prompt, enter "regsvr32 "C:\Users\yousi\Desktop\语音合成芯片_pc端演示程序\1-文档及工具\MSCOMM32.OCX"", adjusting the path to match your own.
- Press Enter. A success message indicates that the control was registered.
- The PC host program should now start without errors. If registration fails for other controls, register them the same way. The required controls are located in the "C:\Users\yousi\Desktop\语音合成芯片_pc端演示程序\1-文档及工具" directory.

Arduino

Hardware Connection

Download Example

Open the program in the arduino_SYN8086 folder from the Reference Example.

Set the baud rate in the initialization call to match the baud rate actually in use: serial2.begin(115200).
Modify the text to be spoken, as well as parameters such as voice and speech rate. For more parameters, see the Control Marker Description.
Upload the modified program.

STM32

Hardware Connection

Download Example

Open the program in the STM32F103C8T6_DEMO_SYN8086 folder from the Reference Example.
Set the baud rate in the initialization call to match the baud rate actually in use: USART3_Init(115200).
Modify the text to be spoken, as well as parameters such as voice and speech rate. For more parameters, see the Control Marker Description.
Compile the modified program and download the .HEX file after compilation is complete.

Reference Materials

FAQ

Question:

Can it recognize voice? What is it used for?

Answer:

It cannot perform voice recognition; it is for voice synthesis, which converts text to sound. You send text to the module, and it plays the corresponding sound. It is used in scenarios such as bus stop announcements, queue number calling, and meal pickup number calling.

Question:

Can it record audio? Can it convert to MP3 files? Can it save recordings?

Answer:

It cannot record audio, nor can it convert to MP3 files. It is a real-time playback module that plays sound immediately upon receiving text and cannot save recordings.

Question:

How do I use it? Is it complicated?

Answer:

You can send text using the accompanying computer software or from a microcontroller over serial communication. The microcontroller program is simple as long as you understand serial communication. For computer control, just connect the module and it is ready to use; no prior knowledge is required.

Question:

How do I control it with a microcontroller? Can it synthesize English?

Answer:

The microcontroller sends data to the voice module over the serial port in the format "5-byte header + text", and the module plays the corresponding speech. It synthesizes Chinese only; English text is read out letter by letter.

Question:

Can it only be tested on a computer?

Answer:

No. The module can be tested on a computer and can also be driven by a microcontroller. Any device capable of serial communication can be used, provided it works at TTL levels.
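
As an illustration of driving the module from any serial-capable host, the Python sketch below sends a sequence of already-built command frames one after another, following the guidance in Command Frame Format: wait for the "chip idle" byte (0x4F), pause about 1ms, then send the next frame. The port object is an assumption: anything with write(bytes) and read(size) methods whose read returns an empty result on timeout, such as a serial port opened with a short read timeout. The names wait_for_idle and send_frames are hypothetical.

```python
import time
from typing import Iterable, Protocol

CHIP_IDLE = 0x4F  # "chip idle" byte named in Command Frame Format


class SerialLike(Protocol):
    """Minimal interface assumed of the serial port object."""

    def write(self, data: bytes) -> object: ...
    def read(self, size: int = 1) -> bytes: ...


def wait_for_idle(port: SerialLike, timeout_s: float) -> bool:
    """Read bytes until the chip-idle byte arrives or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        data = port.read(1)
        if data and data[0] == CHIP_IDLE:
            return True
    return False


def send_frames(
    port: SerialLike,
    frames: Iterable[bytes],
    idle_timeout_s: float = 30.0,
    settle_s: float = 0.001,
) -> int:
    """Send frames in order and return how many were sent.

    Stops early if the chip does not report idle within idle_timeout_s.
    """
    sent = 0
    for frame in frames:
        port.write(frame)  # one write keeps the bytes of a frame together
        sent += 1
        if not wait_for_idle(port, idle_timeout_s):
            break
        time.sleep(settle_s)  # roughly 1ms pause before the next frame
    return sent
```

Any other bytes the chip returns before 0x4F are simply skipped here; a fuller implementation would interpret them according to the UART Command Description.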

Contact Diustou

Our working hours are 09:00-18:00 (UTC+8), Monday to Saturday.
