Anton Obukhov PRO

toshas

Organizations

CompVis Community, ZeroGPU Explorers, Photogrammetry and Remote Sensing Lab of ETH Zurich, Social Post Explorers, Hugging Face Discord Community

toshas's activity

posted an update 24 days ago
Introducing ⇆ Marigold-DC — our training-free zero-shot approach to monocular Depth Completion with guided diffusion! If you have ever wondered how else a long denoising diffusion schedule can be useful, we have an answer for you!

Depth Completion addresses sparse, incomplete, or noisy measurements from photogrammetry or sensors like LiDAR. Sparse points aren’t just hard for humans to interpret — they also hinder downstream tasks.

Traditionally, depth completion was framed as image-guided depth interpolation. We leverage Marigold, a diffusion-based monodepth model, to reframe it as sparse-depth-guided depth generation. How the turntables! Check out the paper anyway 👇

🌎 Website: https://marigolddepthcompletion.github.io/
🤗 Demo: prs-eth/marigold-dc
📕 Paper: https://arxiv.org/abs/2412.13389
👾 Code: https://github.com/prs-eth/marigold-dc

Team ETH Zürich: Massimiliano Viola ( @mviola ), Kevin Qu ( @KevinQu7 ), Nando Metzger ( @nandometzger ), Bingxin Ke ( @Bingxin ), Alexander Becker, Konrad Schindler, and Anton Obukhov ( @toshas ). We thank Hugging Face for their continuous support.
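
To make the problem setup concrete, here is a minimal sketch of the simplest way to combine a relative depth prediction with sparse metric measurements: a least-squares fit of one global scale and shift. This is not the Marigold-DC method, which guides the diffusion process itself; it is only a naive baseline for illustration. The function names and the flat-list data layout are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def fit_scale_shift(
    relative: List[float], sparse: List[Optional[float]]
) -> Tuple[float, float]:
    """Fit scale s and shift t so that s * relative[i] + t approximates sparse[i].

    `relative` holds a dense relative depth prediction (flattened image).
    `sparse` holds metric measurements at the same pixels, None where missing.
    """
    pairs = [(r, d) for r, d in zip(relative, sparse) if d is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two sparse measurements")

    sx = sum(r for r, _ in pairs)
    sy = sum(d for _, d in pairs)
    sxx = sum(r * r for r, _ in pairs)
    sxy = sum(r * d for r, d in pairs)

    # Closed-form solution of ordinary least squares for a line.
    den = n * sxx - sx * sx
    if abs(den) < 1e-12:
        raise ValueError("relative depth is constant at the measured pixels")
    scale = (n * sxy - sx * sy) / den
    shift = (sy - scale * sx) / n
    return scale, shift


def complete_depth(
    relative: List[float], sparse: List[Optional[float]]
) -> List[float]:
    """Return a dense metric depth map by applying the fitted scale and shift."""
    scale, shift = fit_scale_shift(relative, sparse)
    return [scale * r + shift for r in relative]


if __name__ == "__main__":
    relative = [0.0, 0.25, 0.5, 0.75, 1.0]   # dense, but only up to scale and shift
    sparse = [None, 2.0, None, 4.0, None]    # two metric measurements
    print(complete_depth(relative, sparse))
```

A single global scale and shift cannot correct local errors in the prediction, which is one reason to inject the sparse measurements into the generation process rather than aligning afterwards.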
posted an update 7 months ago
Join us at our remaining CVPR presentations this week! Members of PRS-ETH will be around to connect with you and discuss our presented and ongoing works:

💐 Marigold: Discover our work on sharp diffusion-based computer vision techniques, presented in Orals 3A track on "3D from Single View", Thu, June 20, 9:00-9:15 AM. Also, drop by Poster Session 3 later that day for more tangible matters! 🌚
Project page: https://marigoldmonodepth.github.io/
Paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
Collection: https://huggingface.co/collections/prs-eth/marigold-6669e9e3d3ee30f48214b9ba
Space: prs-eth/marigold-lcm
Diffusers 🧨 tutorial: https://huggingface.co/docs/diffusers/using-diffusers/marigold_usage

⚙️ Point2CAD: Learn about our mechanical CAD model reconstruction from point clouds, presented in Poster Session 1, Wed, June 19, 10:30 AM - 12:00 PM.
Project page: https://www.obukhov.ai/point2cad.html
Paper: Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds (2312.04962)

🎭 DGInStyle: Explore our generative data synthesis approach as a cost-efficient alternative to real and synthetic data, presented in the Workshop on Synthetic Data for Computer Vision, Tue, June 18, at Summit 423-425.
Details and schedule: https://syndata4cv.github.io/
Project page: https://dginstyle.github.io/
Paper: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control (2312.03048)
Model: yurujaja/DGInStyle
reacted to sayakpaul's post with ❤️🔥🚀 8 months ago
🧨 Diffusers 0.28.0 is out 🔥

It features the first non-generative pipeline of the library -- Marigold 🥁

Marigold shines at performing Depth Estimation and Surface Normal Estimation. It was contributed by @toshas , one of the authors of Marigold.

This release also features a massive refactor (led by @DN6 ) of the from_single_file() method, highlighting our efforts to make our library more amenable to community features 🤗

Check out the release notes here:
https://github.com/huggingface/diffusers/releases/tag/v0.28.0
posted an update 9 months ago
Another gem from our lab — DGInStyle! We use Stable Diffusion to generate semantic segmentation data for autonomous driving and train domain-generalizable networks.

📟 Website: https://dginstyle.github.io
🧾 Paper: https://arxiv.org/abs/2312.03048
🤗 Hugging Face Paper: DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control (2312.03048)
🤗 Hugging Face Model: yurujaja/DGInStyle
🐙 Code: https://github.com/yurujaja/DGInStyle

In a nutshell, our pipeline overcomes the resolution loss of Stable Diffusion latent space and the style bias of ControlNet, as shown in the attached figures. This allows us to generate sufficiently high-quality pairs of images and semantic masks to train domain-generalizable semantic segmentation networks.

Team: Yuru Jia ( @yurujaja ), Lukas Hoyer, Shengyu Huang, Tianfu Wang ( @Tianfwang ), Luc Van Gool, Konrad Schindler, and Anton Obukhov ( @toshas ).
reacted to osanseviero's post with 🔥🤗❤️ 10 months ago
Diaries of Open Source. Part 10 🚀

🌼Marigold-LCM: A super fast SOTA Depth Estimator
Demo: prs-eth/marigold-lcm
Original paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
Model: https://hf.co/prs-eth/marigold-lcm-v1-0

🌟Quiet-STaR: A self-teaching technique via internal monologue
Paper: Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking (2403.09629)
GitHub: https://github.com/ezelikman/quiet-star
Tweetutorial: https://twitter.com/ericzelikman/status/1768663835106513041

🖼️ WebSight v0.2: An image-to-code dataset containing Tailwind CSS, images in screenshots, and more!
Dataset: HuggingFaceM4/WebSight
Paper: Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset (2403.09029)
Blog: https://hf.co/blog/websight

🕵️Agent-FLAN - effective agent tuning for LLMs
Paper: Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (2403.12881)
Model: internlm/Agent-FLAN-7b
Dataset: internlm/Agent-FLAN
Website: https://internlm.github.io/Agent-FLAN/

🔥HPT, a family of multimodal LLMs from HyperGAI
Blog post: https://hypergai.com/blog/introducing-hpt-a-family-of-leading-multimodal-llms
Model: HyperGAI/HPT
GitHub: https://github.com/hyperGAI/HPT

🌏Models and datasets around the world
- Tess-70B, a MiQu-70B fine-tune with high-quality data migtissera/Tess-70B-v1.6
- UNI, a model trained on 100 million pathology images from 100k+ slides MahmoodLab/UNI
- CONCH, a VLM trained on 1.17 million pathology image-text pairs MahmoodLab/CONCH
posted an update 10 months ago
Introducing Marigold-LCM 🌼 - a FAST version of the now popular state-of-the-art depth estimator! Thanks to latent consistency distillation, it retains the precision of the original Marigold but reaches the solution in just a few steps!

Check out the teaser video attached below and play with the new demo - it accepts videos now! Also, meet the new team member: Tianfu Wang ( @Tianfwang )

🤗 Demo: prs-eth/marigold-lcm
🤗 Model: https://huggingface.co/prs-eth/marigold-lcm-v1-0
🤗 Original Marigold post: https://huggingface.co/posts/toshas/656973498012745
🤗 Paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
🌐 Website: https://marigoldmonodepth.github.io
👾 Code: https://github.com/prs-eth/marigold
👾 Code: pip install diffusers
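
For readers who want to try the `pip install diffusers` route, here is a hedged sketch of running depth estimation through the Diffusers Marigold pipeline. It assumes a Diffusers version that ships `MarigoldDepthPipeline` (0.28.0 or later) and an installed `torch`. The function name `estimate_depth` is made up for this example, and the checkpoint id is left as a parameter, because which id loads with this pipeline class should be checked against the model card or the Diffusers tutorial.

```python
def estimate_depth(image_path_or_url: str, checkpoint: str):
    """Run a Marigold depth checkpoint on one image and return a colorized depth map.

    `checkpoint` is a Hugging Face model id chosen by the caller (an assumption of
    this sketch: it must be compatible with MarigoldDepthPipeline).
    """
    # Imported lazily so that defining this function does not require the packages.
    import torch
    import diffusers

    # Half precision only on GPU; fall back to float32 on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
        checkpoint, torch_dtype=dtype
    ).to(device)

    # load_image accepts a local path or a URL and returns a PIL image.
    image = diffusers.utils.load_image(image_path_or_url)

    # The pipeline output carries the depth map in `.prediction`.
    result = pipe(image)

    # visualize_depth returns a list of PIL images, one per input image.
    visualization = pipe.image_processor.visualize_depth(result.prediction)
    return visualization[0]
```

A call such as `estimate_depth("photo.jpg", "prs-eth/marigold-lcm-v1-0").save("depth.png")` illustrates the intended use; the file names are placeholders, and the checkpoint id is the one linked in this post, which may differ from the id used in the current Diffusers documentation.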
reacted to osanseviero's post with ❤️ 12 months ago
I finished my model merging experiment day. 🤗 I would love your thoughts on this.

What did I do? I merged Mistral Instruct 0.1 and 0.2 models using different merging techniques:
- SLERP: spherical linear interpolation (most popular method)
- MoE: replace some forward layers with MoE layers; using a random gate for now
- Frankenmerge: also known as passthrough, but that isn't very cool. It concatenates some specified layers, ending up with a different number of params. In my case, I went from 7B to 9B.
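
As a rough illustration of the first technique, here is a minimal sketch of spherical linear interpolation between two weight vectors, using only the standard library. It is not mergekit's implementation; treating a weight tensor as a flat list of floats and falling back to plain linear interpolation for degenerate inputs are assumptions of this example.

```python
import math
from typing import List


def lerp(w0: List[float], w1: List[float], t: float) -> List[float]:
    """Plain linear interpolation, used as a fallback for degenerate cases."""
    return [(1.0 - t) * a + t * b for a, b in zip(w0, w1)]


def slerp(w0: List[float], w1: List[float], t: float, eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two flattened weight vectors.

    t = 0 returns w0, t = 1 returns w1; intermediate values follow the arc
    between the two directions instead of the straight line between them.
    """
    norm0 = math.sqrt(sum(a * a for a in w0))
    norm1 = math.sqrt(sum(b * b for b in w1))
    if norm0 < eps or norm1 < eps:
        return lerp(w0, w1, t)

    # Angle between the two vectors, with the cosine clamped for safety.
    cos_omega = sum(a * b for a, b in zip(w0, w1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)

    # Nearly parallel or antiparallel vectors: the formula is ill-conditioned.
    if abs(sin_omega) < eps:
        return lerp(w0, w1, t)

    c0 = math.sin((1.0 - t) * omega) / sin_omega
    c1 = math.sin(t * omega) / sin_omega
    return [c0 * a + c1 * b for a, b in zip(w0, w1)]


if __name__ == "__main__":
    # Halfway between two orthogonal unit vectors stays on the unit circle.
    print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

For the two orthogonal unit vectors in the example, plain linear interpolation at t = 0.5 would give a vector of norm about 0.71, while the spherical version keeps the norm at 1.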

Note: merging is not building an ensemble of models. You can read more about merging techniques at https://huggingface.co/blog/mlabonne/merge-models

Results
I built the 3 models using mergekit (running in an HF Space); it took less than an hour to do all three: osanseviero/mistral-instruct-merges-659ebf35ca0781acdb86bb0a

I'm doing a quick check with the OpenLLM Leaderboard.
🚨The OpenLLM Leaderboard is more suitable for pre-trained models than instruct models, but I still thought it would be interesting to look at the insights🚨

You can look at the attached image. Some interesting things
- All three models performed somewhere between 0.1 and 0.2 - congrats to the 140 people who got it right in https://twitter.com/osanseviero/status/1745071548866736171
- Frankenmerge terribly sucked with GSM8K. It seems that adding some Mistral 0.1 layers actually degraded the performance a lot - this is worse than even 0.1!
- Otherwise, frankenmerge was decent across HellaSwag, MMLU, and especially TruthfulQA
- MoE is using random gating, so I expected something right in between 0.1 and 0.2, which was the case

What do I do with this?
Not sure tbh! I think doing proper MT bench evals would be nice. I also think all of us should give a nice GH star to mergekit because it's awesome. I would love to have the time to do end-to-end ablation studies, but cool new things are coming up. Let me know if you have any thoughts on the results.
reacted to their post with 🤯🤗❤️ about 1 year ago
Introducing Marigold 🌼 - a universal monocular depth estimator, delivering incredibly sharp predictions in the wild! Based on Stable Diffusion, it is trained with synthetic depth data only and excels in zero-shot adaptation to real-world imagery. Check it out:

🤗 Hugging Face Space: https://huggingface.co/spaces/toshas/marigold
🤗 Hugging Face Model: https://huggingface.co/Bingxin/Marigold
🤗 Hugging Face Paper: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation (2312.02145)
🌐 Website: https://marigoldmonodepth.github.io
👾 Code: https://github.com/prs-eth/marigold
👾 Code: pip install diffusers (check comments to this post for details!)
📄 Paper: https://arxiv.org/abs/2312.02145

Brought to you by the fantastic team from the Photogrammetry and Remote Sensing group of ETH Zurich: Bingxin Ke ( @Bingxin ), Anton Obukhov ( @toshas ), Shengyu Huang, Nando Metzger ( @nandometzger ), Rodrigo Caye Daudt, and Konrad Schindler.