Recording the voice
Follow these guidelines to maximise the quality of the AI voice:
- Record at least 1 minute of audio
- Keep the audio consistent
- Replicate the performance you give in the rest of the video
- Find a good balance for the volume
How to prepare your voice training samples
When creating your custom voice, you will need one or more training samples (recordings). Record them using the same equipment (microphone, etc.) and on the same set as the rest of the video.
Must haves:
- The samples need the same audio grading/mixing that the final video will have.
- The samples need an emotional range that reflects the rest of your video; the delivery can't be overly monotone.
Nice to haves:
- If you intend to use the AI voice for name reading, include a small set of actual name readings in the training data, e.g. 5-10 greetings. This helps increase consistency.
<aside>
Training sample requirements
- Accepted file types: wav, mp3
- Minimum length (per sample): 10 seconds
- Maximum length (per sample): 10 minutes
</aside>
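The requirements above can be checked before uploading. The following is a minimal sketch, not part of any official tooling: the names `check_sample` and `wav_duration_seconds` are hypothetical, and the limits are simply the ones listed above. Only WAV durations are read here, because the Python standard library has no MP3 reader; for MP3 files the duration is assumed to come from another tool.

```python
import wave
from pathlib import Path

# Limits taken from the training sample requirements above.
ACCEPTED_EXTENSIONS = {".wav", ".mp3"}
MIN_SECONDS = 10
MAX_SECONDS = 10 * 60


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (standard library only)."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_sample(filename, duration_seconds):
    """Return a list of problems; an empty list means the sample passes.

    Hypothetical helper: the duration is passed in so that MP3 lengths
    measured with another tool can be checked the same way.
    """
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if duration_seconds < MIN_SECONDS:
        problems.append("shorter than 10 seconds")
    if duration_seconds > MAX_SECONDS:
        problems.append("longer than 10 minutes")
    return problems
```

For a WAV recording, `check_sample("take1.wav", wav_duration_seconds("take1.wav"))` would combine the two helpers. This only checks file type and length; it says nothing about audio consistency, volume, or emotional range, which still need to be judged by ear.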
Ensuring high-quality voice generation
The quality of the generated audio depends mainly on two factors:
- The quality of the original training data
- The creative itself (e.g. how the generated audio is implemented in the script).
The more weaknesses there are in these two areas, the less lifelike the end result will be.