nexus/raw/Daily notes/2025-03-14.md at 65803c911d95e8a633ead193b44bf32438cb0028

ishenwei/nexus


admin 65803c911d Structural changes

2026-04-14 12:19:28 +08:00



Summary:

Today I mainly worked on installing F5-TTS (https://github.com/SWivid/F5-TTS), a text-to-speech (TTS) tool that runs locally. As far as I know, it was developed by several students from Jiaotong University. I managed to install it, and a few technical points are worth noting.

The first is installing conda. Conda is a toolkit for creating independent, isolated environments. According to Anaconda, whether you want to build data science or machine learning models, deploy your work to production, or securely manage a team of engineers, Anaconda provides the tools you need. Its documentation also covers managing an organization's users and resources. The installation docs for Miniconda (a minimal conda installer) are here: https://www.anaconda.com/docs/getting-started/miniconda/install#windows-installation

I used the following command to download the Miniconda installer for Windows:

```
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output .\Downloads\Miniconda3-latest-Windows-x86_64.exe
```
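Once Miniconda is installed, you can check that conda is reachable from the current shell. Miniconda bundles its own Python, so a small script like the sketch below will run. The helpers are hypothetical and not part of conda. The sketch assumes `conda --version` prints a line of the form `conda X.Y.Z`. On Windows, conda may only be on PATH inside the Anaconda Prompt, depending on the options chosen in the installer.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_conda_version(output: str) -> Optional[str]:
    """Pull the version number out of `conda --version` output such as 'conda 24.11.3'."""
    match = re.search(r"conda\s+(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def conda_version() -> Optional[str]:
    """Return the conda version reachable from this shell, or None if conda is not on PATH."""
    exe = shutil.which("conda")  # on Windows this also finds conda.bat via PATHEXT
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Search both streams rather than assuming where the version line is printed.
    return parse_conda_version(result.stdout + result.stderr)


if __name__ == "__main__":
    found = conda_version()
    print(f"conda {found}" if found else "conda is not on PATH in this shell")
```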

After that, I followed the F5-TTS installation steps:

Installation

Create a separate environment if needed

```
# Create a python 3.10 conda env (you could also use virtualenv)
conda create -n f5-tts python=3.10
conda activate f5-tts
```
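To confirm that the environment from the step above is the one actually in use, a sketch like this compares the active conda environment and the running Python version against the expected `f5-tts` and 3.10. It assumes the environment was activated by name, in which case conda sets `CONDA_DEFAULT_ENV` to that name. The function name is hypothetical.

```python
import os
import sys
from typing import List, Optional, Tuple


def env_problems(
    active_env: Optional[str],
    python_version: Tuple[int, int],
    expected_env: str = "f5-tts",
    expected_python: Tuple[int, int] = (3, 10),
) -> List[str]:
    """List mismatches between the running environment and what the install steps expect."""
    problems = []
    if active_env != expected_env:
        problems.append(f"active conda env is {active_env!r}, expected {expected_env!r}")
    if python_version != expected_python:
        have = ".".join(map(str, python_version))
        want = ".".join(map(str, expected_python))
        problems.append(f"Python {have} is running, expected {want}")
    return problems


if __name__ == "__main__":
    # When activated by name, `conda activate` sets CONDA_DEFAULT_ENV to that name.
    issues = env_problems(os.environ.get("CONDA_DEFAULT_ENV"), tuple(sys.version_info[:2]))
    print("environment looks right" if not issues else "\n".join(issues))
```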

Install the PyTorch build that matches your device

NVIDIA GPU

```
# Install pytorch with your CUDA version, e.g.
pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
```

AMD GPU

```
# Install pytorch with your ROCm version (Linux only), e.g.
pip install torch==2.5.1+rocm6.2 torchaudio==2.5.1+rocm6.2 --extra-index-url https://download.pytorch.org/whl/rocm6.2
```

Intel GPU

```
# Install pytorch with your XPU version, e.g.
# Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit must be installed
pip install torch torchaudio --index-url https://download.pytorch.org/whl/test/xpu

# Intel GPU support is also available through IPEX (Intel® Extension for PyTorch)
# IPEX does not require the Intel® Deep Learning Essentials or Intel® oneAPI Base Toolkit
# See: https://pytorch-extension.intel.com/installation?request=platform
```

Apple Silicon

```
# Install the stable pytorch, e.g.
pip install torch torchaudio
```
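PyTorch wheels from the indexes above carry a local version tag after `+`, for example `2.4.0+cu124` or `2.5.1+rocm6.2`. This means `torch.__version__` shows which backend the installed build targets. The sketch below parses that tag. The tag-to-backend table is an assumption based on the examples above, and the helper names are hypothetical. On Windows, a version ending in `+cpu` is a CPU-only build, which is what the default `pip install torch` from PyPI gives you there.

```python
import importlib.util
from typing import Optional, Tuple

# Assumed prefixes of local version tags, e.g. "cu124", "rocm6.2", "xpu", "cpu".
BACKENDS = {"cu": "CUDA", "rocm": "ROCm", "xpu": "Intel XPU", "cpu": "CPU only"}


def parse_torch_version(version: str) -> Tuple[str, Optional[str]]:
    """Split '2.4.0+cu124' into ('2.4.0', 'cu124'); a version without '+' has no tag."""
    base, _, tag = version.partition("+")
    return base, (tag or None)


def backend_of(tag: Optional[str]) -> str:
    """Map a local version tag to a readable backend name."""
    if tag is None:
        return "no tag (backend depends on platform and index)"
    for prefix, name in BACKENDS.items():
        if tag.startswith(prefix):
            return name
    return f"unknown ({tag})"


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("torch is not installed in this environment")
    else:
        import torch

        base, tag = parse_torch_version(str(torch.__version__))
        print(f"torch {base}: {backend_of(tag)}")
```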

Then choose one of the following:

1. As a pip package (if you only need inference)

```
pip install f5-tts
```

2. As a local editable install (if you also plan to train or fine-tune)

```
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# git submodule update --init --recursive  # (optional, only if you need BigVGAN)
pip install -e .
```
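After either install option, a short check can confirm that the f5-tts distribution is present and whether it was installed in editable mode. This sketch relies on the `direct_url.json` metadata file from PEP 610, which recent pip versions write for directory installs such as `pip install -e .`. Older or non-pip installers may not write that file, in which case the sketch reports a regular install. The helper name is hypothetical.

```python
import json
from importlib.metadata import PackageNotFoundError, distribution
from typing import Optional


def install_info(dist_name: str) -> Optional[dict]:
    """Return {'version': ..., 'editable': bool} for an installed distribution, or None."""
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    editable = False
    raw = dist.read_text("direct_url.json")  # None if the file is absent, e.g. a plain PyPI install
    if raw:
        direct_url = json.loads(raw)
        editable = bool(direct_url.get("dir_info", {}).get("editable", False))
    return {"version": dist.version, "editable": editable}


if __name__ == "__main__":
    info = install_info("f5-tts")
    if info is None:
        print("f5-tts is not installed in this environment")
    else:
        mode = "editable (pip install -e .)" if info["editable"] else "regular"
        print(f"f5-tts {info['version']}, {mode} install")
```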

It ran, but one problem I ran into was that ffmpeg could not be found. The error message was:

```
ffmpeg was not found but is required to load audio files from filename
```

I later found information online and solved the problem: ffmpeg has to be downloaded separately, and its folder must be added to the system PATH environment variable.

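1. Download a Windows build of ffmpeg from https://www.gyan.dev/ffmpeg/builds/ (one of the Windows builds linked from the official ffmpeg.org download page).
2. Extract the archive and move the three executables in its bin folder (ffmpeg.exe, ffprobe.exe, ffplay.exe) to C:\ffmpeg.
3. Add C:\ffmpeg to the system Path environment variable.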
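PATH changes only apply to programs started after the change, so run the check below from a newly opened terminal. It is a minimal sketch that reports whether C:\ffmpeg (the folder used in the steps above) is one of the PATH entries, and where `ffmpeg` resolves. The helper name is hypothetical.

```python
import os
import shutil


def path_contains(directory: str, path_value: str) -> bool:
    """True if `directory` is one of the entries in a PATH-style string."""
    # normpath drops trailing separators; normcase makes the match case-insensitive on Windows.
    target = os.path.normcase(os.path.normpath(directory))
    for entry in path_value.split(os.pathsep):
        if entry and os.path.normcase(os.path.normpath(entry)) == target:
            return True
    return False


if __name__ == "__main__":
    ffmpeg_dir = r"C:\ffmpeg"  # folder used in the steps above
    current_path = os.environ.get("PATH", "")
    print("C:\\ffmpeg on PATH:", path_contains(ffmpeg_dir, current_path))
    print("ffmpeg resolves to:", shutil.which("ffmpeg") or "not found")
```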

Launch Web UI - Gradio App

Currently supported features:

Basic TTS with Chunk Inference
Multi-Style / Multi-Speaker Generation
Voice Chat powered by Qwen2.5-3B-Instruct
Custom inference with more language support

```
# Launch a Gradio app (web interface)
f5-tts_infer-gradio

# Specify the port/host
f5-tts_infer-gradio --port 7860 --host 0.0.0.0

# Launch a share link
f5-tts_infer-gradio --share
```

Then open http://127.0.0.1:7860/ in a browser to use the Gradio web UI.
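To confirm from a script that the app is serving, a sketch like this probes the local URL using only the standard library. It assumes the default address above. On first start, the app may need time to download and load models, so an early probe can fail even when nothing is wrong. The function name is hypothetical.

```python
import urllib.request


def is_up(url: str = "http://127.0.0.1:7860/", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError (connection refused), HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Gradio app reachable:", is_up())
```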

I tried voice cloning. You first provide a reference audio clip, and the app then generates speech in that voice for the text you enter. The result was very good.
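The same kind of voice cloning can also be scripted with the command-line tool that ships with the package. The sketch below only builds the argument list. Two details come from my reading of the F5-TTS README: the flag names (`--ref_audio`, `--ref_text`, `--gen_text`), and the behavior that an empty `--ref_text` makes the tool transcribe the reference clip automatically. Check both with `f5-tts_infer-cli --help` for the installed version. The file name and text in the example are hypothetical.

```python
from typing import List


def build_infer_cmd(ref_audio: str, gen_text: str, ref_text: str = "") -> List[str]:
    """Build an f5-tts_infer-cli command line for one voice-cloning run (flags assumed from the README)."""
    return [
        "f5-tts_infer-cli",
        "--ref_audio", ref_audio,  # reference clip whose voice is cloned
        "--ref_text", ref_text,    # its transcript; "" is assumed to trigger automatic transcription
        "--gen_text", gen_text,    # text to synthesize in the reference voice
    ]


if __name__ == "__main__":
    # Hypothetical reference clip and text; replace with your own.
    cmd = build_infer_cmd("my_voice.wav", "Hello, this is a test of F5-TTS.")
    print(cmd)
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the generation inside the activated environment.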

One caveat: I haven't set up GPU acceleration yet, so the whole generation ran on the CPU. CPU usage was very high during generation, and it was relatively slow. I haven't had time to try running it on the GPU yet; maybe I will tomorrow.
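Whether generation runs on the CPU comes down to which device PyTorch can see. The sketch below shows a common order of checks: CUDA, then Intel XPU, then Apple MPS, falling back to CPU. It illustrates those PyTorch calls and is not F5-TTS's own device-selection code. If it prints `cpu` on a machine with an NVIDIA GPU, that usually points to a CPU-only PyTorch build or a missing GPU driver.

```python
import importlib.util


def pick_device() -> str:
    """Return the first available accelerator: 'cuda', 'xpu', 'mps', else 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment at all
    import torch

    # ROCm builds of PyTorch also report AMD GPUs through the torch.cuda API.
    if torch.cuda.is_available():
        return "cuda"
    # torch.xpu is missing in older releases, so look it up defensively.
    xpu_available = getattr(getattr(torch, "xpu", None), "is_available", None)
    if callable(xpu_available) and xpu_available():
        return "xpu"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("PyTorch would run on:", device)
    if device == "cpu":
        print("No GPU is visible to PyTorch; check the installed build and GPU driver.")
```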
