diff --git a/README.md b/README.md
index b4069bc..c701d85 100644
--- a/README.md
+++ b/README.md
@@ -79,6 +79,28 @@ see: [documentation of get_idolsankaku_tags](https://dghs-realutils.deepghs.org/
 
 ### Generic Object Detection
 
+We use official YOLO models for general-purpose object detection.
+
+![object_detection](https://github.com/deepghs/realutils/blob/gh-pages/main/_images/yolo_demo.plot.py.svg)
+
+We can use `detect_by_yolo` for generic object detection:
+
+```python
+from realutils.detect import detect_by_yolo
+
+print(detect_by_yolo('yolo/unsplash_aJafJ0sLo6o.jpg'))
+# [((450, 317, 567, 599), 'person', 0.9004617929458618)]
+print(detect_by_yolo('yolo/unsplash_n4qQGOBgI7U.jpg'))
+# [((73, 101, 365, 409), 'vase', 0.9098997116088867), ((441, 215, 659, 428), 'vase', 0.622944176197052), ((5, 1, 428, 377), 'potted plant', 0.5178268551826477)]
+print(detect_by_yolo('yolo/unsplash_vUNQaTtZeOo.jpg'))
+# [((381, 103, 676, 448), 'bird', 0.9061452150344849)]
+print(detect_by_yolo('yolo/unsplash_YZOqXWF_9pk.jpg'))
+# [((315, 100, 690, 532), 'horse', 0.9453459978103638), ((198, 181, 291, 256), 'horse', 0.917123556137085), ((145, 173, 180, 249), 'horse', 0.7972317337989807), ((660, 138, 701, 170), 'horse', 0.4843617379665375)]
+```
+
+More models are hosted in the [huggingface repository](https://huggingface.co/deepghs/yolos).
+An online demo is provided as well; you can try [it](https://huggingface.co/spaces/deepghs/yolos) out.
+
 ### Face Detection
 
 We use YOLO models from [akanametov/yolo-face](https://github.com/akanametov/yolo-face) for face detection.
@@ -103,4 +125,70 @@ print(detect_real_faces('yolo/multiple.jpg'))
 
 More models are hosted on [huggingface repository](https://huggingface.co/deepghs/yolo-face).
 An online demo are provided as well, you can try [it](https://huggingface.co/spaces/deepghs/yolo-face) out.
 
+### Feature Extractor
+
+We support a DINOv2-based image feature extractor, like this:
+
+```python
+from realutils.metrics import get_dinov2_embedding
+
+embedding = get_dinov2_embedding('unsplash_0aLd44ICcpg.jpg')
+print(embedding.shape)
+# (768,)
+```
+
+You can compute cosine similarities between such embeddings to measure the visual similarity of images (a minimal sketch is included at the end of this README).
+
+### Image-Text Models
+
+We support both CLIP and SigLIP for multimodal alignment operations, like this:
+
+* CLIP
+
+```python
+from realutils.metrics.clip import classify_with_clip
+
+print(classify_with_clip(
+    images=[
+        'xlip/1.jpg',
+        'xlip/2.jpg'
+    ],
+    texts=[
+        'a photo of a cat',
+        'a photo of a dog',
+        'a photo of a human',
+    ],
+))
+# array([[0.98039913, 0.00506729, 0.01453355],
+#        [0.05586662, 0.02006196, 0.92407143]], dtype=float32)
+```
+
+* SigLIP
+
+```python
+from realutils.metrics.siglip import classify_with_siglip
+
+print(classify_with_siglip(
+    images=[
+        'xlip/1.jpg',
+        'xlip/2.jpg',
+    ],
+    texts=[
+        'a photo of a cat',
+        'a photo of 2 cats',
+        'a photo of 2 dogs',
+        'a photo of a woman',
+    ],
+))
+# array([[1.3782851e-03, 2.7010253e-01, 9.7517688e-05, 3.6702781e-09],
+#        [3.3248414e-06, 2.2294161e-07, 1.9753381e-09, 2.2561464e-06]],
+#       dtype=float32)
+```
+
+For more details, you can take a look at:
+
+* [Documentation of realutils.metrics.clip](https://dghs-realutils.deepghs.org/main/api_doc/metrics/clip.html)
+* [Models of CLIP](https://huggingface.co/deepghs/clip_onnx)
+* [Documentation of realutils.metrics.siglip](https://dghs-realutils.deepghs.org/main/api_doc/metrics/siglip.html)
+* [Models of SigLIP](https://huggingface.co/deepghs/siglip_onnx)
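+
+As promised above, here is a minimal sketch of comparing two DINOv2 embeddings with cosine similarity. This is not part of the library API: the image paths are placeholders, and only `get_dinov2_embedding` is taken from the examples above.
+
+```python
+import numpy as np
+
+from realutils.metrics import get_dinov2_embedding
+
+# extract embeddings for two local images (placeholder paths)
+emb_x = get_dinov2_embedding('image_x.jpg')
+emb_y = get_dinov2_embedding('image_y.jpg')
+
+# cosine similarity: dot product of the L2-normalized vectors;
+# values close to 1.0 indicate visually similar images
+similarity = float(np.dot(emb_x, emb_y) / (np.linalg.norm(emb_x) * np.linalg.norm(emb_y)))
+print(similarity)
+```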
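+
+In the same spirit, the tuples returned by `detect_by_yolo` in the Generic Object Detection section are easy to visualize. A rough sketch with Pillow, again using a placeholder input path:
+
+```python
+from PIL import Image, ImageDraw
+
+from realutils.detect import detect_by_yolo
+
+image = Image.open('image.jpg')
+draw = ImageDraw.Draw(image)
+# each detection is ((x0, y0, x1, y1), label, confidence)
+for (x0, y0, x1, y1), label, confidence in detect_by_yolo('image.jpg'):
+    draw.rectangle((x0, y0, x1, y1), outline='red', width=3)
+    draw.text((x0, y0), f'{label} {confidence:.2f}', fill='red')
+image.save('image_with_boxes.jpg')
+```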