[Inference Providers] Async calls for text-to-video with fal.ai #1292

Open
wants to merge 20 commits into base: main
Conversation

@hanouticelina (Contributor) commented Mar 17, 2025

What does this PR do?

This PR adds asynchronous polling to fal.ai text-to-video generation, which makes it possible to run inference with models that take more than 2 minutes to produce results. The other motivation behind this PR is to align the Python and JS clients; the Python side has already been merged into main: huggingface/huggingface_hub#2927

Main Changes

  • Replaced the static baseUrl property with a makeBaseUrl() function across all providers. This is needed to customize the base URL per task: we want to use FAL_AI_API_BASE_URL_QUEUE for text-to-video only. I'm not convinced it's the simplest or best way to do that.
  • Added a pollFalResponse() helper for text-to-video (similar to what is done with BFL for text-to-image).
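For illustration, the first change could look roughly like the sketch below. This is only a hedged sketch: the constant values, the Task type, and the exact makeBaseUrl() signature are assumptions for the example, not copied from the PR.

```typescript
// Hypothetical sketch of a task-aware base URL helper. The URL values and
// the Task union here are illustrative assumptions, not the PR's actual code.
type Task = "text-to-image" | "text-to-video";

const FAL_AI_API_BASE_URL = "https://fal.run";
const FAL_AI_API_BASE_URL_QUEUE = "https://queue.fal.run";

function makeBaseUrl(task: Task): string {
	// Only text-to-video goes through the queue endpoint, so that
	// long-running generations can be polled asynchronously.
	return task === "text-to-video" ? FAL_AI_API_BASE_URL_QUEUE : FAL_AI_API_BASE_URL;
}
```

A function (rather than a static property) lets each provider vary the URL per task without changing the shared call sites.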

Any refactoring suggestions are welcome! I'm willing to spend some additional time to make provider-specific updates easier to implement and better align our two clients 🙂

By the way, I did not update the VCR tests, as we discussed that it'd be best to remove the VCR for text-to-video. Should we remove them here?
EDIT: removed the text-to-video tests in f8a6386.

I've tested it locally with tencent/HunyuanVideo, for which generation takes more than 2 minutes, and it works fine:

[video attachment: fal-ai-output.mp4]

@SBrandeis (Contributor) left a comment


Looks good 🤩

Comment on lines 77 to 116
```typescript
async function pollFalResponse(res: FalAiOutput, args: TextToVideoArgs, options?: Options): Promise<Blob> {
	const requestId = res.request_id;
	if (!requestId) {
		throw new InferenceOutputError("No request ID found in the response");
	}
	let status = res.status;

	const { url, info } = await makeRequestOptions(args, { ...options, task: "text-to-video" });
	const baseUrl = url?.split("?")[0] || "";
	const query = url?.includes("_subdomain=queue") ? "?_subdomain=queue" : "";

	const statusUrl = `${baseUrl}/requests/${requestId}/status${query}`;
	const resultUrl = `${baseUrl}/requests/${requestId}${query}`;

	// Poll the status endpoint until the queued generation completes.
	while (status !== "COMPLETED") {
		await delay(1000);
		const statusResponse = await fetch(statusUrl, { headers: info.headers });
		if (!statusResponse.ok) {
			throw new Error(`HTTP error! status: ${statusResponse.status}`);
		}
		status = (await statusResponse.json()).status;
	}

	// Fetch the final result, validate its shape, then download the video.
	const resultResponse = await fetch(resultUrl, { headers: info.headers });
	const result = await resultResponse.json();
	const isValidOutput =
		typeof result === "object" &&
		!!result &&
		"video" in result &&
		typeof result.video === "object" &&
		!!result.video &&
		"url" in result.video &&
		typeof result.video.url === "string" &&
		isUrl(result.video.url);
	if (!isValidOutput) {
		throw new InferenceOutputError("Expected { video: { url: string } }");
	}
	const urlResponse = await fetch(result.video.url);
	return await urlResponse.blob();
}
```
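As an aside, the loop in the diff above follows a generic poll-until-complete pattern. A minimal sketch of that pattern, with a bounded number of attempts (the names pollUntil and delay, and the defaults, are illustrative assumptions, not part of this PR):

```typescript
// Generic poll-until-complete sketch. Illustrative only; the PR's actual
// polling lives in pollFalResponse and has no attempt cap.
function delay(ms: number): Promise<void> {
	return new Promise((resolve) => setTimeout(resolve, ms));
}

async function pollUntil<T>(
	check: () => Promise<T>,
	isDone: (value: T) => boolean,
	intervalMs = 1000,
	maxAttempts = 120
): Promise<T> {
	for (let attempt = 0; attempt < maxAttempts; attempt++) {
		const value = await check();
		if (isDone(value)) return value;
		await delay(intervalMs);
	}
	throw new Error(`Polling did not complete after ${maxAttempts} attempts`);
}
```

Capping attempts avoids an unbounded loop if the provider never reports COMPLETED; the PR's version instead relies on the caller's patience, which may be worth revisiting.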
Contributor

I think it would make sense to move this util into the src/providers/fal-ai.ts file

Contributor Author

ah yes, we should probably do the same for:

async function pollBflResponse(url: string, outputType?: "url" | "blob"): Promise<Blob> {

Contributor Author

Addressed in cf2d1ac. I had to move FalAiOutput to src/providers/fal-ai.ts to avoid an import cycle; maybe we should move every provider-specific output type to its respective provider file, wdyt?

Contributor

> maybe we should move every provider-specific output type to their respective provider files, wdyt ?

Agree with this, and in particular their output type plus the logic to validate and parse the response, same as the get_response's in Python. Out of scope for this PR, though.

@Wauplin (Contributor) left a comment


🔥

5 participants