That will also install the matching `wfloat-sherpa-onnx` dependency from PyPI.
+<audio controls src="./sample.wav">
+<a href="./sample.wav">Sample dialogue</a>
+</audio>
 
-When installing from this repo locally:
+[Sample dialogue](sample.wav)
+
+## Install
 
 ```bash
-pip install ./packages/wfloat-python
+pip install wfloat
 ```
 
 ## Usage
@@ -27,20 +33,115 @@ import wfloat
 model = wfloat.load("wfloat/wfloat-tts")
 
 result = model.generate(
-    text="The signal is clean. Start the recording.",
-    voice_id="narrator_woman",
-    emotion="neutral",
-    intensity=0.5,
-    speed=1.0,
+    text="No, no, that's not possible. The formula should have crystallized, but it adapted instead. Do you realize what that means for the rest of my work?",
+    voice_id="mad_scientist_woman",
+    emotion="surprise",
+    intensity=0.7,
 )
 
 result.audio.save("out.wav")
 ```
 
-## Notes
+For multi-speaker dialogue:
+
+```python
+import wfloat
+
+model = wfloat.load("wfloat/wfloat-tts")
+
+result = model.generate_dialogue(
+    segments=[
+        {
+            "voice_id": "wise_elder_man",
+            "text": "Rain taps against the tavern shutters as you step inside.",
+            "emotion": "neutral",
+            "intensity": 0.5,
+        },
+        {
+            "voice_id": "strong_hero_man",
+            "text": "You're late. Two bandits stole the king's map over three hours ago.",
+            "emotion": "fear",
+            "intensity": 0.6,
+        },
+        {
+            "voice_id": "strong_hero_man",
+            "text": "They fled north, up into the woods.",
+            "emotion": "neutral",
+            "intensity": 0.5,
+        },
+    ],
+    silence_between_segments_sec=0.35,
+)
+
+result.audio.save("dialogue.wav")
+```
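
The `segments` argument in the dialogue example above is plain Python data (a list of dicts with `voice_id`, `text`, `emotion`, and `intensity` keys), so it can be assembled programmatically. The sketch below is illustrative only and is not part of `wfloat`: `make_segments` is a hypothetical helper, and its default `emotion` and `intensity` values are assumptions copied from the example rather than documented defaults.

```python
def make_segments(lines, emotion="neutral", intensity=0.5):
    """Build segment dicts from (voice_id, text) pairs.

    Hypothetical helper: the key names mirror the README example,
    and the defaults are assumptions, not library defaults.
    """
    segments = []
    for voice_id, text in lines:
        segments.append(
            {
                "voice_id": voice_id,
                "text": text,
                "emotion": emotion,
                "intensity": intensity,
            }
        )
    return segments


segments = make_segments(
    [
        ("wise_elder_man", "Rain taps against the tavern shutters as you step inside."),
        ("strong_hero_man", "They fled north, up into the woods."),
    ]
)
```

The resulting list could then be passed as `segments=segments` to `model.generate_dialogue(...)`, as in the example above.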
+
+You can also generate a WAV from the command line:
+
+```bash
+wfloat generate \
+  --text "Hello world!" \
+  --out out.wav \
+  --voice-id mad_scientist_woman \
+  --emotion surprise \
+  --intensity 0.7 \
+  --silence-padding-sec 0
+```
+
+For the full CLI help:
+
+```bash
+wfloat generate --help
+```
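
If the CLI is driven from a script, one option is to build the argument list in Python and hand it to `subprocess`. This is a minimal sketch, not an official wrapper: `build_generate_command` is a hypothetical helper, and it uses only the flags that appear in the command above.

```python
import shlex


def build_generate_command(text, out, voice_id, emotion, intensity):
    """Return an argv list for `wfloat generate` (hypothetical helper).

    Only flags shown in the README's CLI example are used.
    """
    return [
        "wfloat", "generate",
        "--text", text,
        "--out", out,
        "--voice-id", voice_id,
        "--emotion", emotion,
        "--intensity", str(intensity),
    ]


cmd = build_generate_command(
    "Hello world!", "out.wav", "mad_scientist_woman", "surprise", 0.7
)

# Print a shell-quoted form for inspection. To actually run it:
#   import subprocess; subprocess.run(cmd, check=True)
print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting problems when the text contains quotes or spaces.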
+
+The first load downloads the model assets. After that, the package uses the
+cached local copy.
+
+## Speaker IDs
+
+Use `voice_id` string names or numeric `sid` values:
+
+| Speaker | SID |
+| --- | ---: |
+| `skilled_hero_man` | 0 |
+| `skilled_hero_woman` | 1 |
+| `fun_hero_man` | 2 |
+| `fun_hero_woman` | 3 |
+| `strong_hero_man` | 4 |
+| `strong_hero_woman` | 5 |
+| `mad_scientist_man` | 6 |
+| `mad_scientist_woman` | 7 |
+| `clever_villain_man` | 8 |
+| `clever_villain_woman` | 9 |
+| `narrator_man` | 10 |
+| `narrator_woman` | 11 |
+| `wise_elder_man` | 12 |
+| `wise_elder_woman` | 13 |
+| `outgoing_anime_man` | 14 |
+| `outgoing_anime_woman` | 15 |
+| `scary_villain_man` | 16 |
+| `scary_villain_woman` | 17 |
+| `news_reporter_man` | 18 |
+| `news_reporter_woman` | 19 |
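
For code that needs to translate between the two forms, the table above can be mirrored as a small lookup. This is a sketch under the assumption that the table is the complete list; `VOICE_IDS`, `VOICE_TO_SID`, and `sid_for` are hypothetical names, not part of the `wfloat` API.

```python
# Speaker names in SID order, copied from the table above.
VOICE_IDS = [
    "skilled_hero_man", "skilled_hero_woman",
    "fun_hero_man", "fun_hero_woman",
    "strong_hero_man", "strong_hero_woman",
    "mad_scientist_man", "mad_scientist_woman",
    "clever_villain_man", "clever_villain_woman",
    "narrator_man", "narrator_woman",
    "wise_elder_man", "wise_elder_woman",
    "outgoing_anime_man", "outgoing_anime_woman",
    "scary_villain_man", "scary_villain_woman",
    "news_reporter_man", "news_reporter_woman",
]

# Reverse lookup: name -> numeric SID (the list index).
VOICE_TO_SID = {name: sid for sid, name in enumerate(VOICE_IDS)}


def sid_for(voice_id):
    """Return the numeric SID for a voice name, or raise ValueError."""
    try:
        return VOICE_TO_SID[voice_id]
    except KeyError:
        raise ValueError(f"unknown voice_id: {voice_id!r}") from None
```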
+
+## Emotions
+
+Supported emotion labels:
+
+- `neutral`
+- `joy`
+- `sadness`
+- `anger`
+- `fear`
+- `surprise`
+- `dismissive`
+- `confusion`
+
+`intensity` must be between `0.0` and `1.0`.
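
Callers may want to check these constraints before invoking the model, for example when the values come from user input. The following is a minimal sketch: `EMOTIONS` and `check_style` are hypothetical names, and treating both `0.0` and `1.0` as allowed is an assumption, since the text above does not say whether the range is inclusive.

```python
# Emotion labels copied from the list above.
EMOTIONS = frozenset(
    {
        "neutral", "joy", "sadness", "anger",
        "fear", "surprise", "dismissive", "confusion",
    }
)


def check_style(emotion, intensity):
    """Validate an (emotion, intensity) pair and return it normalized.

    Assumes the 0.0 to 1.0 range is inclusive at both ends.
    """
    if emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    if not 0.0 <= intensity <= 1.0:
        raise ValueError(f"intensity out of range: {intensity!r}")
    return emotion, float(intensity)
```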
+
+## More
 
-- `wfloat` does not build or bundle native libraries.
-- Low-level bindings come from the installed `wfloat-sherpa-onnx` dependency,
-  which provides `import sherpa_onnx`.
-- The public API is intentionally high-level; low-level native config objects
-  are re-exported only for advanced use.
+- Docs: https://docs.wfloat.com
+- Model card, voices, emotions, and samples: https://huggingface.co/Wfloat/wfloat-tts
+- Web package: https://github.com/wfloat/wfloat-web