Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
audio		audio
img		img
ref		ref
.gitignore		.gitignore
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

Repository files navigation

title	emoji	colorFrom	colorTo	sdk	sdk_version	app_file	pinned
Image To Audio	📢	gray	yellow	streamlit	1.29.0	app.py	false

The Image Reader 📢

The Image Reader 📢 - Playground

This application analyzes the uploaded image, generates an imaginative phrase, and then converts it into audio.

For image_to_audio following technologies were used:
- Image Reader:
  - HuggingFace image-to-text task used with Salesforce/blip-image-captioning-base pretrained model. Which produces a small description about the image.
  - Salesforce/blip-image-captioning-base
    - BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- Generate an imaginative phrase:
  - OpenAI GPT-3.5-Turbo used to produce an imaginative narrative from the description generated earlier.
  - The phrase generated with more than 40 words.
  - GPT-3.5 Turbo
- text-to-audio:
  - suno/bark-small used to generate the audio version of the imaginative narrative earlier.
  - suno/bark-small
    - BARK: Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying.

About

image_to_audio

Report repository

Releases

No releases published

Packages

No packages published

Languages

Python 100.0%