| title | source | author | published | created | description | tags |
|---|---|---|---|---|---|---|
| Create a python 3.10 conda env (you could also use virtualenv) | shenwei |
Summary:
Today my main goal was to install F5-TTS (https://github.com/SWivid/F5-TTS), a text-to-speech tool that runs locally. As far as I know, it was developed by several students from Jiao Tong University. A few technical points from the installation are worth noting.
The first is installing Conda. Conda is a package and environment manager that can create independent, isolated environments, each with its own Python version and packages, which is handy for data science and machine learning work. The Miniconda installation guide is here: https://www.anaconda.com/docs/getting-started/miniconda/install#windows-installation I used the following command to download the Miniconda installer for Windows (in Windows PowerShell, curl is an alias for Invoke-WebRequest, so type curl.exe there instead):
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output .\Downloads\Miniconda3-latest-Windows-x86_64.exe
After that, I followed the installation steps from the F5-TTS README:
Installation
Create a separate environment if needed
# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts
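To double-check that the new environment is the one actually running, a small Python check like the one below can be run inside the activated environment. It is a sketch of my own, not part of F5-TTS. It assumes the environment name f5-tts from the command above and relies on conda activate setting the CONDA_DEFAULT_ENV variable. The name check_env is hypothetical.

```python
import os
import sys

# Assumption: the environment is named "f5-tts", as in the conda create command above.
EXPECTED_ENV = "f5-tts"


def check_env(expected_env: str = EXPECTED_ENV) -> list:
    """Return a list of problems with the current interpreter (empty if all is well)."""
    problems = []
    # The setup above asks for Python 3.10.
    if sys.version_info[:2] != (3, 10):
        problems.append(
            f"expected Python 3.10, got {sys.version_info.major}.{sys.version_info.minor}"
        )
    # `conda activate` sets CONDA_DEFAULT_ENV to the active environment's name.
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    return problems


if __name__ == "__main__":
    issues = check_env()
    print("OK" if not issues else "\n".join(issues))
```

Saved as a file and run with python, it prints OK or the mismatches it found.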
Install PyTorch matching your device
NVIDIA GPU
# Install pytorch with your CUDA version, e.g.
pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
AMD GPU
# Install pytorch with your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2
Intel GPU
# Install pytorch with your XPU version, e.g.
# Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit must be installed
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu
# Intel GPU support is also available through IPEX (Intel® Extension for PyTorch)
# IPEX does not require the Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit
# See: https://pytorch-extension.intel.com/installation?request=platform
Apple Silicon
# Install the stable pytorch, e.g.
pip install torch torchaudio
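Whichever wheel you install, it helps to confirm which backend PyTorch can actually see before running F5-TTS. The sketch below is my own helper (pick_device is a hypothetical name, not an F5-TTS function). It checks torch.cuda, which covers both NVIDIA (CUDA) and AMD (ROCm) builds. It checks torch.xpu for Intel GPUs only when that module exists, and the MPS backend for Apple Silicon. If none of these is available, it falls back to cpu.

```python
def pick_device() -> str:
    """Return the best available PyTorch device string, falling back to "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment yet.
        return "cpu"
    # NVIDIA (CUDA) and AMD (ROCm) builds both report through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"
    # Intel GPUs: torch.xpu only exists in newer PyTorch builds, so look it up defensively.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    # Apple Silicon uses the Metal Performance Shaders (MPS) backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("PyTorch device:", pick_device())
```

If this prints cpu on a machine with an NVIDIA GPU, the installed wheel is probably a CPU-only build, or the GPU driver is missing. Reinstalling with the CUDA command above is a reasonable first thing to try.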
Then choose one of the following:
1. As a pip package (if just for inference)
pip install f5-tts
2. As a local editable install (if you also want to do training or fine-tuning)
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# git submodule update --init --recursive  # (optional, only if you need BigVGAN)
pip install -e .
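To confirm the install worked, a sketch like the one below can be used. Both options register package metadata, including the editable install. The distribution name f5-tts and the launcher name f5-tts_infer-gradio come from the commands in this post. installed_version is a hypothetical helper name.

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "f5-tts") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print("f5-tts:", version if version else "not installed")
    # pip puts console scripts such as the Gradio launcher in the environment's Scripts/bin folder.
    print("f5-tts_infer-gradio on PATH:", shutil.which("f5-tts_infer-gradio") is not None)
```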
One problem I ran into during this process was that ffmpeg could not be found. The error message was:
ffmpeg was not found but is required to load audio files from filename
I later found some information online and solved the problem. The fix is to download the FFmpeg package and add its folder to the system's PATH environment variable:
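- Download an FFmpeg build for Windows from https://www.gyan.dev/ffmpeg/builds/ (one of the Windows build sites linked from the official FFmpeg download page)
- Extract all files and move the 3 .exe files from the bin folder (ffmpeg.exe, ffplay.exe, ffprobe.exe) to a C:\ffmpeg folder

- Add C:\ffmpeg to the system PATH environment variable, then open a new terminal so the change takes effect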
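Before launching F5-TTS again, you can check whether ffmpeg is now visible. The error above means the program could not start ffmpeg from PATH, so the sketch below uses Python's shutil.which to look it up in a similar way. C:\ffmpeg is the folder from the steps above. ffmpeg_status is a hypothetical helper name.

```python
import shutil
from typing import Optional

# Folder used in the steps above; change it if you put ffmpeg somewhere else.
FFMPEG_DIR = r"C:\ffmpeg"


def ffmpeg_status(extra_dir: Optional[str] = FFMPEG_DIR) -> str:
    """Report whether ffmpeg is reachable via PATH and, if not, whether it sits in extra_dir."""
    # On Windows, shutil.which also tries the extensions listed in PATHEXT (e.g. .exe).
    on_path = shutil.which("ffmpeg")
    if on_path:
        return f"ok: {on_path}"
    # shutil.which can search an explicit directory instead of PATH.
    if extra_dir and shutil.which("ffmpeg", path=extra_dir):
        return f"found in {extra_dir}, but that folder is not on PATH yet"
    return "missing: ffmpeg is not on PATH"


if __name__ == "__main__":
    print(ffmpeg_status())
```

If it reports the middle case, the PATH edit has not been picked up yet. Terminals that were open before the variable was changed keep the old PATH.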

Launch Web UI - Gradio App
Currently supported features:
- Basic TTS with Chunk Inference
- Multi-Style / Multi-Speaker Generation
- Voice Chat powered by Qwen2.5-3B-Instruct
- Custom inference with more language support
# Launch a Gradio app (web interface)
f5-tts_infer-gradio
# Specify the port/host
f5-tts_infer-gradio --port 7860 --host 0.0.0.0
# Launch a share link
f5-tts_infer-gradio --share
Then open http://127.0.0.1:7860/ in a browser to use the Gradio web UI.
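If the page does not load, you can check whether the app is running at all by requesting the URL from Python. This is a standard-library-only sketch (ui_is_up is a hypothetical helper). It assumes the default local address, which also answers when the app was started with --host 0.0.0.0.

```python
import urllib.request


def ui_is_up(url: str = "http://127.0.0.1:7860/", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError/HTTPError (both OSError subclasses), refused connections and timeouts.
        return False


if __name__ == "__main__":
    print("Gradio UI reachable:", ui_is_up())
```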
I tried voice cloning. You first provide a reference voice clip, and the app then generates speech for the text you enter in that reference voice. I tried it, and the result was very good.
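The same reference-voice generation can also be run without the web UI. The F5-TTS README documents a command-line entry point, f5-tts_infer-cli, with --ref_audio, --ref_text and --gen_text options. The sketch below only assembles that command from Python. The file name and texts are placeholders, and you should check the flag names against f5-tts_infer-cli --help for your installed version.

```python
import subprocess
from typing import List


def build_infer_cmd(ref_audio: str, ref_text: str, gen_text: str) -> List[str]:
    """Assemble an f5-tts_infer-cli call (flag names as shown in the F5-TTS README)."""
    return [
        "f5-tts_infer-cli",
        "--ref_audio", ref_audio,  # reference voice clip
        "--ref_text", ref_text,    # transcript of the reference clip
        "--gen_text", gen_text,    # text to speak in the reference voice
    ]


def run_infer(cmd: List[str]) -> None:
    # Passing a list (not one string) avoids shell quoting problems with spaces.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder inputs for illustration only; replace them with your own clip and text.
    cmd = build_infer_cmd("my_reference.wav", "Transcript of the reference clip.", "Hello from F5-TTS.")
    print(cmd)
    # run_infer(cmd)  # uncomment once the clip exists; on CPU this can take a while
```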

There is one catch, though. Because I had not set up GPU acceleration, the whole generation ran on the CPU, so CPU usage was very high and generation was relatively slow. I haven't had time to try running it on the GPU yet; maybe I will tomorrow.