Florence Tool CLI

Overview

The Florence Tool CLI provides a command-line interface for processing images using the Florence-2 model. This tool allows users to apply various visual and text-based tasks, such as object detection, captioning, and OCR, on individual images or entire folders of images.

Features

Model Loading: Load and run the Florence-2 model from a local path or Hugging Face hub.
Task Variety: Supports a wide range of tasks, including captioning, object detection, dense region captioning, OCR, and more.
Batch Processing: Efficiently process images in batches from a folder.
Recursive Search: Optionally process images within subdirectories.
Customizable Output: Save results in JSON, CSV, or plain text formats with optional suffixes and overwrite modes.
Flexible Image Handling: Specify the image file extensions to process, allowing for flexibility in file types.

Installation

Clone the repository:

git clone https://github.com/bigdata-pw/florence-tool.git
cd florence-tool

Install dependencies:
```
pip install -r requirements.txt
```
Install the tool:
```
pip install -e .
```

Usage

Command-Line Interface

You can use the tool directly from the command line by running the following command:

florence-tool run [OPTIONS]

Options

--hf-hub-or-path (required): Path or Hugging Face hub model identifier for Florence-2.
--device: Device to run the model on (e.g., "cuda:0" or "cpu"). Default is "cuda:0".
--dtype: Torch dtype to use (e.g., "float16", "float32", "bfloat16"). Default is "float16".
--task (required): Task to run (e.g., "<CAPTION>", "<OD>", etc.).
--image: Path to an image file.
--folder: Path to a folder containing images.
--wds: WebDataset.
--output-dir: Directory to save the results.
--text-input: Optional text input for tasks that require it.
--max-new-tokens: Maximum number of new tokens to generate. Default is 1024.
--num-beams: Number of beams for beam search. Default is 3.
--output-format: Format to save the results (json, csv, or txt). Default is json.
--recursive: Process subdirectories if specified.
--suffix: Suffix to use for the output file.
--overwrite: Flag to overwrite existing files. If not specified, appends/updates the files.
--image-extensions: Comma-separated list of image file extensions to include (e.g., "jpg,png,jpeg"). Default is "jpg,png".
--batch-size: Number of images to process in a batch. Default is 1.
--num-workers: Number of Dataloader workers. Default is 4, overriden to 0 on Windows.
--prefetch-factor: Prefetch factor for Dataloader workers. Default is 4.
--image-key: WebDataset image key.
--no-check: Skips task type check for models with added task types - task will be processed as the default pure_text type. This is intended for testing purposes, new task types can be added to the check, just create an issue or PR.

Examples

Processing a Single Image

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --image /path/to/image.jpg --output-dir /path/to/output/

Processing Images in a Folder

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<OD>" --folder /path/to/folder/ --output-dir /path/to/output/

Processing a WebDataset

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "shard-{00000..00069}.tar" --output-dir /path/to/output/

Processing a WebDataset (streaming)

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "pipe:aws s3 cp s3://data/shard-{00000..00069}.tar -" --output-dir /path/to/output/

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "pipe:aws s3 cp s3://data/shard-{00000..00069}.tar --endpoint-url https://00000000000000000000000000000000.r2.cloudflarestorage.com -" --output-dir /path/to/output/

Processing Images Recursively

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<OCR>" --folder /path/to/folder/ --output-dir /path/to/output/ --recursive

Custom Image Extensions

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<DENSE_REGION_CAPTION>" --folder /path/to/folder/ --image-extensions jpg,png,jpeg --output-dir /path/to/output/

Saving Results in CSV Format

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<REGION_PROPOSAL>" --folder /path/to/folder/ --output-dir /path/to/output/ --output-format csv

Overwriting Existing Files

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --suffix captions --overwrite

Processing in Batches

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --batch-size 4

Running on CPU

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --device "cpu"

Tasks

<OCR>
<OCR_WITH_REGION>
<CAPTION>
<DETAILED_CAPTION>
<MORE_DETAILED_CAPTION>
<OD>
<DENSE_REGION_CAPTION>
<CAPTION_TO_PHRASE_GROUNDING>
<REFERRING_EXPRESSION_SEGMENTATION>
<REGION_TO_SEGMENTATION>
<OPEN_VOCABULARY_DETECTION>
<REGION_TO_CATEGORY>
<REGION_TO_DESCRIPTION>
<REGION_TO_OCR>
<REGION_PROPOSAL>

Third Party Task Types

Supported by MiaoshouAI/Florence-2-base-PromptGen-v1.5

<GENERATE_TAGS>
<MIXED_CAPTION>

Supported by MiaoshouAI/Florence-2-base-PromptGen

<GENERATE_PROMPT>

Development

Code Structure

florence_tool.py: Main class that implements the Florence-2 model handling and processing logic.
cli.py: Command-line interface built with Click.
modeling: Directory containing model configuration and processing scripts.

Running Locally

To run the CLI locally without installing:

python -m florence_tool.cli run [OPTIONS]

Contributing

Contributions are welcome! Please submit a pull request or open an issue if you have ideas or find a bug.

License

This project is licensed under the Apache 2.0 License.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src/florence_tool		src/florence_tool
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Florence Tool CLI

Overview

Features

Installation

Usage

Command-Line Interface

Options

Examples

Processing a Single Image

Processing Images in a Folder

Processing a WebDataset

Processing a WebDataset (streaming)

Processing Images Recursively

Custom Image Extensions

Saving Results in CSV Format

Overwriting Existing Files

Processing in Batches

Running on CPU

Tasks

Third Party Task Types

Development

Code Structure

Running Locally

Contributing

License

About

Uh oh!

Uh oh!

Languages

License

bigdata-pw/florence-tool

Folders and files

Latest commit

History

Repository files navigation

Florence Tool CLI

Overview

Features

Installation

Usage

Command-Line Interface

Options

Examples

Processing a Single Image

Processing Images in a Folder

Processing a WebDataset

Processing a WebDataset (streaming)

Processing Images Recursively

Custom Image Extensions

Saving Results in CSV Format

Overwriting Existing Files

Processing in Batches

Running on CPU

Tasks

Third Party Task Types

Development

Code Structure

Running Locally

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Uh oh!

Languages