Infer & Download models

Inferencing

Inference is a simple process that takes three easy steps: the program takes an input audio file and transforms it into the voice the model was trained on, replicating the vocal characteristics, intonation, and style of that voice.

Advanced settings section

Applio has an “advanced options” section that lets you modify several settings for your result. This section describes each of them:

  • Export Format: Select the format to export the audio.

  • Split Audio: Cuts the audio into parts, runs inference on each part, and then joins the results together.

  • Autotune: Applies a soft autotune to your inferences. Recommended for singing conversions.

  • Clean Audio: Cleans your audio output using noise detection algorithms. Recommended for speech audio.

  • Upscale Audio: Upscales the audio to a higher quality. Recommended for low-quality audio.

  • Clean Strength: The higher the value, the more the audio is cleaned, but the more compressed it will sound.

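  • Pitch: Adjusts the pitch of the output in semitones. Use negative values to lower the pitch (toward a male voice) and positive values to raise it (toward a female voice). For a male-to-female conversion, use +12; for female-to-male, use -12.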

  • Filter Radius: Applies median filtering to the extracted pitch. The value is the filter radius; filtering can reduce breathiness and help avoid artifacts.

  • Search Feature Ratio: Controls how strongly the index file influences the result. A higher ratio pulls the output closer to the training dataset but can introduce artifacts, so it is usually best to leave it at the default.

  • Volume Envelope: Replaces the volume envelope of the output with that of the input, or blends the two.

  • Protect Voiceless Consonants: Safeguards distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts.

  • Hop Length: Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference and training but tend to yield higher pitch accuracy.

  • Pitch Extraction Algorithm: Select between rmvpe, crepe, or another algorithm.

  • Embedder Model (optional): Select the embedder model (hubert or contentvec).
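
To make the Pitch and Filter Radius settings more concrete, here is a minimal Python sketch. It assumes the pitch value is in semitones (12 per octave) and models Filter Radius as a plain median filter over a pitch contour. The function names are hypothetical and this is not Applio's actual code, only an illustration of the two ideas.

```python
from statistics import median


def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift given in semitones."""
    return 2.0 ** (semitones / 12.0)


def median_filter(pitch: list[float], radius: int) -> list[float]:
    """Replace each value with the median of its neighbours within `radius` frames.

    Assumption: a generic median filter; the exact window Applio uses may differ.
    """
    if radius < 1:
        return list(pitch)
    smoothed = []
    for i in range(len(pitch)):
        lo = max(0, i - radius)
        hi = min(len(pitch), i + radius + 1)
        smoothed.append(median(pitch[lo:hi]))
    return smoothed


print(semitones_to_ratio(12))   # 2.0: one octave up doubles the frequency
print(semitones_to_ratio(-12))  # 0.5: one octave down halves it

# A pitch contour (in Hz) with a single spurious jump at index 2.
contour = [220.0, 221.0, 440.0, 222.0, 223.0]
print(median_filter(contour, radius=1))  # the 440.0 outlier is smoothed away
```

A shift of 12 semitones is exactly one octave, which is why it is the usual starting point for conversions between male and female voices. The median filter shows why a larger radius smooths out isolated pitch glitches, at the cost of flattening fast, genuine pitch changes.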

Download models

Automatic Download

To download a voice model, go to the “download” section and enter the link to the file.

Applio supports links from the following platforms:

  • Google Drive
  • Hugging Face
  • Discord
  • Applio Web
  • Yandex
  • Pixeldrain
  • Mediafire
  • Mega

Manual Upload

In the same download section, there is a box for uploading files. Unzip your .zip file and drop the extracted files into the box one at a time.
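
If you would rather unpack the archive with a script before dropping the files into the upload box, a minimal Python sketch follows. The function name and the archive and folder names are placeholders, and the .pth and .index extensions are an assumption about what a typical voice model archive contains.

```python
import zipfile
from pathlib import Path


def extract_model_files(zip_path: str, dest_dir: str) -> list[Path]:
    """Unpack an archive and return the extracted model-related files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Assumption: the files to upload are the model weights and the index.
    wanted = {".pth", ".index"}
    return sorted(
        p for p in dest.rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )


# Example with placeholder names:
# for path in extract_model_files("my_model.zip", "my_model"):
#     print(path)
```

The returned paths are the files you would then drop into the upload box one by one.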

FAQ about making inferences

What should I do if my output audio sounds robotic?

  • Use better-quality input audio.
  • If you trained the model yourself, it may need more training or may be overtrained.
  • Remove the reverb, double vocals, and noise from your acapella; see the UVR 5 guide or the MVSEP guide.
  • The model's dataset may have contained noise; in that case, clean the dataset.
  • Try adjusting the advanced settings.