The Florence Tool CLI provides a command-line interface for processing images with the Florence-2 model. It runs vision and vision-language tasks, such as object detection, captioning, and OCR, on individual images, entire folders of images, or WebDataset shards.
- Model Loading: Load and run the Florence-2 model from a local path or Hugging Face hub.
- Task Variety: Supports a wide range of tasks, including captioning, object detection, dense region captioning, OCR, and more.
- Batch Processing: Efficiently process images in batches from a folder.
- Recursive Search: Optionally process images within subdirectories.
- Customizable Output: Save results in JSON, CSV, or plain text formats with optional suffixes and overwrite modes.
- Flexible Image Handling: Specify the image file extensions to process, allowing for flexibility in file types.
To install the tool from source:
- Clone the repository:
    git clone https://github.com/bigdata-pw/florence-tool.git
    cd florence-tool
- Install dependencies:
    pip install -r requirements.txt
- Install the tool:
    pip install -e .
After installation, invoke the tool from the command line:
florence-tool run [OPTIONS]
- --hf-hub-or-path (required): Path or Hugging Face hub model identifier for Florence-2.
- --device: Device to run the model on (e.g., "cuda:0" or "cpu"). Default is "cuda:0".
- --dtype: Torch dtype to use (e.g., "float16", "float32", "bfloat16"). Default is "float16".
- --task (required): Task to run (e.g., "<CAPTION>", "<OD>").
- --image: Path to an image file.
- --folder: Path to a folder containing images.
- --wds: WebDataset shard path or URL pattern (e.g., "shard-{00000..00069}.tar").
- --output-dir: Directory to save the results.
- --text-input: Optional text input for tasks that require it.
- --max-new-tokens: Maximum number of new tokens to generate. Default is 1024.
- --num-beams: Number of beams for beam search. Default is 3.
- --output-format: Format to save the results (json, csv, or txt). Default is json.
- --recursive: Process subdirectories if specified.
- --suffix: Suffix to use for the output file.
- --overwrite: Flag to overwrite existing files. If not specified, existing files are appended to or updated.
- --image-extensions: Comma-separated list of image file extensions to include (e.g., "jpg,png,jpeg"). Default is "jpg,png".
- --batch-size: Number of images to process in a batch. Default is 1.
- --num-workers: Number of DataLoader workers. Default is 4, overridden to 0 on Windows.
- --prefetch-factor: Prefetch factor for DataLoader workers. Default is 4.
- --image-key: WebDataset image key.
- --no-check: Skips the task type check for models with added task types; the task is processed as the default pure_text type. This is intended for testing purposes. New task types can be added to the check; just create an issue or PR.
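When the tool is driven from a script, it can help to assemble the options above programmatically. The following is a minimal sketch, not part of florence-tool: `build_command` is a hypothetical helper that turns keyword arguments into the dashed flag names listed above, treating `True` values as value-less flags such as --recursive or --overwrite.

```python
import shlex


def build_command(hf_hub_or_path, task, **options):
    """Assemble a `florence-tool run` argument list.

    Hypothetical helper (not part of florence-tool). Keyword names use
    underscores (batch_size) and are mapped to dashed flags (--batch-size).
    """
    argv = ["florence-tool", "run",
            "--hf-hub-or-path", hf_hub_or_path,
            "--task", task]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Value-less flags such as --recursive or --overwrite.
            argv.append(flag)
        elif value is not None and value is not False:
            # Options that take a value, e.g. --batch-size 4.
            argv.extend([flag, str(value)])
    return argv


cmd = build_command(
    "microsoft/Florence-2-large",
    "<CAPTION>",
    folder="/path/to/folder/",
    output_dir="/path/to/output/",
    batch_size=4,
    recursive=True,
)
# Print a shell-safe rendering; the task token is quoted automatically.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems: an unquoted task token such as <CAPTION> would otherwise be read by the shell as input and output redirection.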
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --image /path/to/image.jpg --output-dir /path/to/output/
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<OD>" --folder /path/to/folder/ --output-dir /path/to/output/
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "shard-{00000..00069}.tar" --output-dir /path/to/output/
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "pipe:aws s3 cp s3://data/shard-{00000..00069}.tar -" --output-dir /path/to/output/
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "pipe:aws s3 cp s3://data/shard-{00000..00069}.tar --endpoint-url https://00000000000000000000000000000000.r2.cloudflarestorage.com -" --output-dir /path/to/output/
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<OCR>" --folder /path/to/folder/ --output-dir /path/to/output/ --recursive
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<DENSE_REGION_CAPTION>" --folder /path/to/folder/ --image-extensions jpg,png,jpeg --output-dir /path/to/output/
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<REGION_PROPOSAL>" --folder /path/to/folder/ --output-dir /path/to/output/ --output-format csv
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --suffix captions --overwrite
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --batch-size 4
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --device "cpu"
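The --wds examples above use brace notation to name a range of shards. Assuming the tool hands the pattern to the WebDataset library unchanged, the range is expanded into one URL per shard. The sketch below reimplements that expansion for a single numeric range purely for illustration; `expand_shards` is a hypothetical name, and the real expansion is done by the library.

```python
import re


def expand_shards(pattern):
    """Expand one {start..end} numeric range, preserving zero padding."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        # No range present: the pattern names a single shard.
        return [pattern]
    start, end = match.group(1), match.group(2)
    width = len(start)  # keep the padding width of the start value
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        prefix + str(index).zfill(width) + suffix
        for index in range(int(start), int(end) + 1)
    ]


shards = expand_shards("shard-{00000..00069}.tar")
print(len(shards), shards[0], shards[-1])

# The same expansion applies inside a pipe: command.
piped = expand_shards("pipe:aws s3 cp s3://data/shard-{00000..00069}.tar -")
print(piped[0])
```

The pattern in the examples therefore covers 70 shards, from shard-00000.tar to shard-00069.tar.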
<OCR>
<OCR_WITH_REGION>
<CAPTION>
<DETAILED_CAPTION>
<MORE_DETAILED_CAPTION>
<OD>
<DENSE_REGION_CAPTION>
<CAPTION_TO_PHRASE_GROUNDING>
<REFERRING_EXPRESSION_SEGMENTATION>
<REGION_TO_SEGMENTATION>
<OPEN_VOCABULARY_DETECTION>
<REGION_TO_CATEGORY>
<REGION_TO_DESCRIPTION>
<REGION_TO_OCR>
<REGION_PROPOSAL>
Supported by MiaoshouAI/Florence-2-base-PromptGen-v1.5
<GENERATE_TAGS>
<MIXED_CAPTION>
Supported by MiaoshouAI/Florence-2-base-PromptGen
<GENERATE_PROMPT>
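Because loading the model is slow, a script may want to catch a mistyped task token before launching the tool. The sketch below is an illustrative pre-check built only from the task tokens listed above; `normalize_task` and `classify_task` are hypothetical helpers and do not reproduce the tool's own task type check.

```python
# Task tokens copied from the list above.
BASE_TASKS = {
    "<OCR>", "<OCR_WITH_REGION>", "<CAPTION>", "<DETAILED_CAPTION>",
    "<MORE_DETAILED_CAPTION>", "<OD>", "<DENSE_REGION_CAPTION>",
    "<CAPTION_TO_PHRASE_GROUNDING>", "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<REGION_TO_SEGMENTATION>", "<OPEN_VOCABULARY_DETECTION>",
    "<REGION_TO_CATEGORY>", "<REGION_TO_DESCRIPTION>", "<REGION_TO_OCR>",
    "<REGION_PROPOSAL>",
}
# Tokens listed above as supported by the MiaoshouAI PromptGen fine-tunes.
FINE_TUNE_TASKS = {"<GENERATE_TAGS>", "<MIXED_CAPTION>", "<GENERATE_PROMPT>"}


def normalize_task(task):
    """Upper-case the name and add angle brackets if they are missing."""
    token = task.strip().upper()
    if not token.startswith("<"):
        token = "<" + token
    if not token.endswith(">"):
        token += ">"
    return token


def classify_task(task):
    """Return 'base', 'fine-tune', or 'unknown' for a task name."""
    token = normalize_task(task)
    if token in BASE_TASKS:
        return "base"
    if token in FINE_TUNE_TASKS:
        return "fine-tune"
    return "unknown"


for name in ["caption", "<GENERATE_TAGS>", "<CAPTON>"]:
    print(name, "->", classify_task(name))
```

A token reported as unknown is either a typo or a task added by a custom model; in the latter case the --no-check option described earlier is the relevant escape hatch.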
- florence_tool.py: Main class that implements the Florence-2 model handling and processing logic.
- cli.py: Command-line interface built with Click.
- modeling: Directory containing model configuration and processing scripts.
To run the CLI locally without installing:
python -m florence_tool.cli run [OPTIONS]
Contributions are welcome! Please submit a pull request or open an issue if you have ideas or find a bug.
This project is licensed under the Apache 2.0 License.