
Commit f167df8

apappascs authored and ilayaperumalg committed
docs: enhance transcription API documentation
Signed-off-by: Alexandros Pappas <[email protected]>
1 parent c0cc32c commit f167df8

1 file changed: 104 additions and 2 deletions
@@ -1,5 +1,107 @@
 [[Transcription]]
 = Transcription API
 
-Spring AI provides support for OpenAI's Transcription API.
-When additional providers for Transcription are implemented, a common `AudioTranscriptionModel` interface will be extracted.
+Spring AI provides a unified API for Speech-to-Text transcription through the `TranscriptionModel` interface. This allows you to write portable code that works across different transcription providers.
+
+== Supported Providers
+
+- xref:api/audio/transcriptions/openai-transcriptions.adoc[OpenAI's Whisper API]
+- xref:api/audio/transcriptions/azure-openai-transcriptions.adoc[Azure OpenAI Whisper API]
+
+== Common Interface
+
+All transcription providers implement the following shared interface:
+
+=== TranscriptionModel
+
+The `TranscriptionModel` interface provides methods for converting audio to text:
+
+[source,java]
+----
+public interface TranscriptionModel extends Model<AudioTranscriptionPrompt, AudioTranscriptionResponse> {
+
+    /**
+     * Transcribes the audio from the given prompt.
+     */
+    AudioTranscriptionResponse call(AudioTranscriptionPrompt transcriptionPrompt);
+
+    /**
+     * A convenience method for transcribing an audio resource.
+     */
+    default String transcribe(Resource resource) {
+        AudioTranscriptionPrompt prompt = new AudioTranscriptionPrompt(resource);
+        return this.call(prompt).getResult().getOutput();
+    }
+
+    /**
+     * A convenience method for transcribing an audio resource with options.
+     */
+    default String transcribe(Resource resource, AudioTranscriptionOptions options) {
+        AudioTranscriptionPrompt prompt = new AudioTranscriptionPrompt(resource, options);
+        return this.call(prompt).getResult().getOutput();
+    }
+}
+----
+
+=== AudioTranscriptionPrompt
+
+The `AudioTranscriptionPrompt` class encapsulates the input audio and options:
+
+[source,java]
+----
+Resource audioFile = new FileSystemResource("/path/to/audio.mp3");
+AudioTranscriptionPrompt prompt = new AudioTranscriptionPrompt(
+    audioFile,
+    options
+);
+----
+
+=== AudioTranscriptionResponse
+
+The `AudioTranscriptionResponse` class contains the transcribed text and metadata:
+
+[source,java]
+----
+AudioTranscriptionResponse response = model.call(prompt);
+String transcribedText = response.getResult().getOutput();
+AudioTranscriptionResponseMetadata metadata = response.getMetadata();
+----
+
+== Writing Provider-Agnostic Code
+
+One of the key benefits of the shared transcription interface is the ability to write code that works with any transcription provider without modification. The actual provider (OpenAI, Azure OpenAI, etc.) is determined by your Spring Boot configuration, allowing you to switch providers without changing application code.
+
+=== Basic Service Example
+
+The shared interface allows you to write code that works with any transcription provider:
+
+[source,java]
+----
+@Service
+public class TranscriptionService {
+
+    private final TranscriptionModel transcriptionModel;
+
+    public TranscriptionService(TranscriptionModel transcriptionModel) {
+        this.transcriptionModel = transcriptionModel;
+    }
+
+    public String transcribeAudio(Resource audioFile) {
+        return transcriptionModel.transcribe(audioFile);
+    }
+
+    public String transcribeWithOptions(Resource audioFile, AudioTranscriptionOptions options) {
+        AudioTranscriptionPrompt prompt = new AudioTranscriptionPrompt(audioFile, options);
+        AudioTranscriptionResponse response = transcriptionModel.call(prompt);
+        return response.getResult().getOutput();
+    }
+}
+----
+
+This service works seamlessly with OpenAI, Azure OpenAI, or any other transcription provider, with the actual implementation determined by your Spring Boot configuration.
+
+== Provider-Specific Features
+
+While the shared interface provides portability, each provider also offers specific features through provider-specific options classes (e.g., `OpenAiAudioTranscriptionOptions`, `AzureOpenAiAudioTranscriptionOptions`). These classes implement the `AudioTranscriptionOptions` interface while adding provider-specific capabilities.
+
+For detailed information about provider-specific features, see the individual provider documentation pages.
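
One practical consequence of the "Writing Provider-Agnostic Code" section added in this commit is that a service depending only on the shared interface can be exercised without any real provider. The following is a minimal sketch of that idea, not code from the commit or from Spring AI. To stay runnable without Spring on the classpath, it declares its own simplified stand-in `TranscriptionModel` (a single `transcribe` method taking a `String` path rather than a `Resource`). The `CannedTranscriptionModel` class and the `ProviderAgnosticSketch` wrapper are hypothetical names introduced purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ProviderAgnosticSketch {

    /**
     * Simplified local stand-in for the shared interface described in the diff.
     * Assumption: only the convenience method is modeled, and it takes a String
     * path instead of a Spring Resource so the sketch has no dependencies.
     */
    public interface TranscriptionModel {
        String transcribe(String audioPath);
    }

    /** Mirrors the shape of the TranscriptionService in the diff: it sees only the interface. */
    public static class TranscriptionService {

        private final TranscriptionModel transcriptionModel;

        public TranscriptionService(TranscriptionModel transcriptionModel) {
            this.transcriptionModel = transcriptionModel;
        }

        public String transcribeAudio(String audioPath) {
            return transcriptionModel.transcribe(audioPath);
        }
    }

    /** Hypothetical in-memory "provider" that returns pre-registered transcripts. */
    public static class CannedTranscriptionModel implements TranscriptionModel {

        private final Map<String, String> transcripts = new HashMap<>();

        /** Registers a transcript for a path and returns this model for chaining. */
        public CannedTranscriptionModel with(String audioPath, String text) {
            transcripts.put(audioPath, text);
            return this;
        }

        @Override
        public String transcribe(String audioPath) {
            String text = transcripts.get(audioPath);
            if (text == null) {
                throw new IllegalArgumentException("No canned transcript for " + audioPath);
            }
            return text;
        }
    }

    public static void main(String[] args) {
        // The service is wired with the canned model exactly as it would be with a real provider.
        TranscriptionModel canned = new CannedTranscriptionModel().with("meeting.mp3", "Hello, world.");
        TranscriptionService service = new TranscriptionService(canned);
        System.out.println(service.transcribeAudio("meeting.mp3"));

        // Swapping the "provider" changes only the constructor argument, not the service.
        TranscriptionService other = new TranscriptionService(path -> "transcript of " + path);
        System.out.println(other.transcribeAudio("meeting.mp3"));
    }
}
```

In a real application the constructor argument would be whichever `TranscriptionModel` bean the Spring Boot configuration provides. The service class itself would not change, which is the portability claim the new documentation makes.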
