Social Post Explorers

Recent Activity

social-post-explorers's activity

danielhanchen 
posted an update about 18 hours ago
We fixed many bugs in Phi-4 & uploaded fixed GGUF + 4-bit versions! ✨

Our fixed versions are even higher on the Open LLM Leaderboard than Microsoft's!

GGUFs: unsloth/phi-4-GGUF
Dynamic 4-bit: unsloth/phi-4-unsloth-bnb-4bit

You can also now finetune Phi-4 for free on Colab: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb

Read our blogpost for more details on bug fixes etc: https://unsloth.ai/blog/phi4
as-cle-bert 
posted an update 1 day ago
Hi HuggingFace community!🤗

I recently released PrAIvateSearch v2.0-beta.0 (https://github.com/AstraBert/PrAIvateSearch), my privacy-first, AI-powered, user-centered and data-safe application aimed at providing a local and open-source alternative to big AI search engines such as SearchGPT or Perplexity AI.

We have several key changes:

- New chat UI built with NextJS
- DuckDuckGo API used for web search instead of Google
- Qwen/Qwen2.5-1.5B-Instruct as the language model, served through an API built with FastAPI
- Crawl4AI crawler used for web scraping
- Optimizations in the data workflow inside the application

Read more in my blog post 👉 https://huggingface.co/blog/as-cle-bert/search-the-web-with-ai

Have fun and feel free to leave feedback about how to improve the application!✨
davanstrien 
posted an update 1 day ago
The data-is-better-together/fineweb-c dataset is growing!

This week a few more languages have got 1,000 annotations for the educational quality of data from HuggingFaceFW/fineweb-2.

Why should you care?

The quality of pre-training data can have a big impact on the performance of downstream language models trained on that data (HuggingFaceFW/blogpost-fineweb-v1).

Being able to filter by educational quality is one way of improving the quality of the data you use for training an LLM. Very importantly, this approach can also reduce the amount of data needed for pretraining.
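
As a rough illustration of what filtering by educational quality can look like, here is a minimal sketch. The field names `text` and `edu_score`, the sample records and the threshold of 3 are all assumptions made for this example; they are not the actual fineweb-c schema.

```python
from typing import Iterable

# Hypothetical records: the field names and scores are made up for illustration.
documents = [
    {"text": "Photosynthesis converts light into chemical energy.", "edu_score": 4},
    {"text": "Click here to win a free phone!!!", "edu_score": 0},
    {"text": "A short introduction to fractions for beginners.", "edu_score": 3},
    {"text": "Celebrity gossip roundup of the week.", "edu_score": 1},
]


def filter_by_quality(docs: Iterable[dict], min_score: int) -> list:
    """Keep only documents whose educational-quality score meets the threshold."""
    return [doc for doc in docs if doc["edu_score"] >= min_score]


kept = filter_by_quality(documents, min_score=3)
print(f"kept {len(kept)} of {len(documents)} documents")
```

Raising `min_score` trades data volume for quality, which is the lever the paragraph above refers to.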

Why not use an LLM?

LLMs can be used to annotate educational quality for a subset of data. This data can then be used to train a smaller encoder-only model to label the full dataset. However, this may not work well for languages other than English. This is where fineweb-c (community) comes in.

The community is annotating the educational quality of fineweb2 data. Currently 114 languages have some annotations. These annotations will enable a number of things:

- Evaluate whether an LLM can label the educational quality for texts in that language well
- Directly be used for training quality classifiers
- Help discover other rules and heuristics for refining fineweb2 further for different languages.
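
The first use case, checking how well an LLM labels educational quality in a given language, comes down to comparing its labels with the community annotations. The sketch below computes plain accuracy and Cohen's kappa in pure Python. The label names and the six sample labels are hypothetical and only illustrate the calculation.

```python
from collections import Counter


def accuracy(human: list, model: list) -> float:
    """Fraction of texts where the model label matches the human label."""
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohen_kappa(human: list, model: list) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(human)
    p_observed = accuracy(human, model)
    human_counts = Counter(human)
    model_counts = Counter(model)
    # Chance agreement: probability that both raters pick the same label independently.
    p_expected = sum(
        (human_counts[label] / n) * (model_counts[label] / n)
        for label in set(human) | set(model)
    )
    if p_expected == 1:
        return 1.0  # Both raters always use the same single label.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels for six texts; names and values are illustrative only.
community_labels = ["good", "none", "good", "minimal", "none", "good"]
llm_labels = ["good", "none", "minimal", "minimal", "none", "none"]

print(f"accuracy: {accuracy(community_labels, llm_labels):.2f}")
print(f"kappa:    {cohen_kappa(community_labels, llm_labels):.2f}")
```

Kappa is reported alongside accuracy because quality labels tend to be imbalanced, and raw accuracy can look good even when agreement is mostly due to chance.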

This week the following languages were done:

Swedish thanks to: @Lauler @AntonVic @ohallstrom @bjarlestam @menbom @Ekgren @apsod

Ukrainian thanks to: @hannayukhymenko @robinhad @realPivo @RabotiahovDmytro @reciprocate

Assamese thanks to: @moyoor97 @Arpanjyoti @nawaf-helmi123 @pahigogoi1 @aelhence @kishorekashyap

Want to learn more? https://huggingface.co/blog/davanstrien/fineweb2-community

Contribute yourself here: data-is-better-together/fineweb-c
nyuuzyou 
posted an update 1 day ago
🗂️ I don't think the collections feature of Hugging Face is widely used, even though it's an excellent way to organize and discover interesting resources. To do my bit to change that, I've created two carefully curated collections that combine both my original work and other valuable datasets:

Educational Datasets
- Mostly English-Russian, but other languages are also included
- Extended by my new Begemot.ai dataset (2.7M+ Russian education records) nyuuzyou/begemot

Link: nyuuzyou/educational-datasets-677c268978ac1cec96cc3605

Anime & Art

- Extensive art-focused collection, including my new datasets:
- Buzzly.art (2K artworks) nyuuzyou/buzzlyart
- Paintberri (60K+ pieces) nyuuzyou/paintberri
- Itaku.ee (924K+ items) nyuuzyou/itaku
- Extended with other amazing datasets from the community

Link: nyuuzyou/anime-and-art-677ae996682a389fccd892c3

Collections should become a more common feature - hopefully this will encourage others to create and share their own curated collections. By organizing related datasets into these themed collections, I hope to make it easier for researchers and developers to discover and use these valuable resources.
MonsterMMORPG 
posted an update 3 days ago
Famous IC-Light - Relight Images - Advanced Gradio APP with Windows, RunPod, Massed Compute and Free Kaggle Account Installers Published


Installers are shared here : https://www.patreon.com/posts/famous-ic-light-119566071

1-Click to install and use on Windows, RunPod, Massed Compute and a free Kaggle account notebook

Everything works perfectly, with a more advanced Gradio app than the one published in the official repo: https://github.com/lllyasviel/IC-Light

Moreover, I started another experimental product training for a client: FLUX DreamBooth / fine-tuning via the Kohya SS GUI. The GPU is an L40S and the batch size is 7. Config name: Batch_Size_7_48GB_GPU_46250MB_29.1_second_it_Tier_1.json

Full workflow, step by step tutorial and configs : https://youtu.be/FvpWy1x5etM

Check out the attached images in full resolution for more info
Tonic 
posted an update 3 days ago
Microsoft just released Phi-4, check it out here: Tonic/Phi-4

hope you like it :-)
danielhanchen 
posted an update 4 days ago
MonsterMMORPG 
posted an update 7 days ago
SANA: Ultra HD Fast Text to Image Model from NVIDIA Step by Step Tutorial on Windows, Cloud & Kaggle — Generate 2048x2048 Images

Below is the YouTube link for a step-by-step tutorial and a 1-click installer with a very advanced Gradio app for using the newest text-to-image SANA model locally on your Windows PC and on cloud services such as Massed Compute, RunPod and free Kaggle.

https://youtu.be/KW-MHmoNcqo

The tutorial above covers the newest SANA 2K model, and I predict a SANA 4K model will be published as well. The SANA 2K model is 4 megapixels, so it can generate the following aspect ratios and resolutions very well:

“1:1”: (2048, 2048), “4:3”: (2304, 1792), “3:4”: (1792, 2304),
“3:2”: (2432, 1664), “2:3”: (1664, 2432), “16:9”: (2688, 1536),
“9:16”: (1536, 2688), “21:9”: (3072, 1280), “9:21”: (1280, 3072),
“4:5”: (1792, 2240), “5:4”: (2240, 1792)
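
To make the list above easier to use, here is a small sketch that stores it as a Python dictionary, confirms that every entry is roughly 4 megapixels, and picks the listed ratio closest to an arbitrary input size. It assumes the pairs are in (width, height) order, which matches the ratio names. The helper names `megapixels` and `closest_ratio` are made up for this example and are not part of the SANA pipeline.

```python
# Resolutions listed in the post for the SANA 2K model, assumed to be (width, height).
SANA_2K_RESOLUTIONS = {
    "1:1": (2048, 2048), "4:3": (2304, 1792), "3:4": (1792, 2304),
    "3:2": (2432, 1664), "2:3": (1664, 2432), "16:9": (2688, 1536),
    "9:16": (1536, 2688), "21:9": (3072, 1280), "9:21": (1280, 3072),
    "4:5": (1792, 2240), "5:4": (2240, 1792),
}


def megapixels(size: tuple) -> float:
    """Pixel count of a (width, height) pair, in millions of pixels."""
    width, height = size
    return width * height / 1_000_000


def closest_ratio(width: int, height: int) -> str:
    """Return the listed ratio name whose width/height is nearest to the input."""
    target = width / height

    def distance(name: str) -> float:
        w, h = SANA_2K_RESOLUTIONS[name]
        return abs(w / h - target)

    return min(SANA_2K_RESOLUTIONS, key=distance)


for name, size in SANA_2K_RESOLUTIONS.items():
    print(f"{name:>5}: {size[0]}x{size[1]} = {megapixels(size):.2f} MP")

print(closest_ratio(1920, 1080))
```

For a 1920x1080 input, the nearest listed entry is the 16:9 one (2688x1536).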

I have developed an amazing Gradio app with many new features:

VAE auto-offloading to reduce VRAM usage significantly, which does not exist in the official pipeline

Gradio app built upon the official pipeline with improvements, so it works perfectly

Batch size works perfectly

Number of images works perfectly

Multi-line prompting works perfectly

Aspect ratios for both the 1K and 2K models work perfectly

Randomized seed works perfectly

1-Click installers for Windows (using Python 3.10 and an isolated VENV), RunPod, Massed Compute and even a free Kaggle account notebook

With the proper latest libraries, it works at perfect speed on Windows too

Automatically saves every generated image into the correct folder

🔗 Full Instructions, Configs, Installers, Information and Links Shared Post (the one used in the tutorial) ⤵️
▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-116474081

🔗 SECourses Official Discord 9500+ Members ⤵️
▶️ https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

as-cle-bert 
posted an update 7 days ago
Are you using Obsidian to write your notes?
If the answer is yes, then this post might be for you!✅
I recently created 𝐨𝐛𝐬𝐢𝐝𝐢𝐚𝐧-𝐝𝐢𝐠𝐞𝐬𝐭, a Google Gemini-powered application that gives you feedback on style and contents of the documents you have been working on🧠

Repo 👉 https://github.com/AstraBert/obsidian-digest
PyPi Package 👉 https://pypi.org/project/obsidian-digest/

The app is available as:
- 𝐜𝐨𝐦𝐦𝐚𝐧𝐝-𝐥𝐢𝐧𝐞 𝐭𝐨𝐨𝐥: install it as a python package with 𝗽𝗶𝗽, and execute it from terminal anytime!📦
- 𝐃𝐢𝐬𝐜𝐨𝐫𝐝 𝐁𝐨𝐭 𝐛𝐮𝐢𝐥𝐭 𝐟𝐫𝐨𝐦 𝐬𝐨𝐮𝐫𝐜𝐞 𝐜𝐨𝐝𝐞: clone the GitHub repo, install the needed dependencies through 𝗰𝗼𝗻𝗱𝗮, and run the bot: you will get hourly messages with suggestions and considerations about your activity on Obsidian in the previous hour🤖
- 𝐃𝐢𝐬𝐜𝐨𝐫𝐝 𝐁𝐨𝐭 𝐝𝐞𝐩𝐥𝐨𝐲𝐞𝐝 𝐥𝐨𝐜𝐚𝐥𝐥𝐲 𝐰𝐢𝐭𝐡 𝐝𝐨𝐜𝐤𝐞𝐫 𝐜𝐨𝐦𝐩𝐨𝐬𝐞: clone the GitHub repo and launch 𝗱𝗼𝗰𝗸𝗲𝗿 𝗰𝗼𝗺𝗽𝗼𝘀𝗲 𝘂𝗽. Docker builds an image on the fly with all the needed dependencies and scripts, and runs them. You'll have the same functionalities as the ones from source code, but with a way easier deployment process🐋

Go check out the GitHub repo for more info 👉 https://github.com/AstraBert/obsidian-digest

Have fun!✨
as-cle-bert 
posted an update 9 days ago
🎉𝐄𝐚𝐫𝐥𝐲 𝐍𝐞𝐰 𝐘𝐞𝐚𝐫 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬🎉

Hi HuggingFacers🤗, I decided to ship early this year, and here's what I came up with:

𝐏𝐝𝐟𝐈𝐭𝐃𝐨𝐰𝐧 (https://github.com/AstraBert/PdfItDown) - If you're like me, and you have your whole RAG pipeline optimized for PDFs, but not for other data formats, here is your solution! With PdfItDown, you can convert Word documents, presentations, HTML pages, markdown sheets and (why not?) CSVs and XMLs to PDF format, for seamless integration with your RAG pipelines. Built upon MarkItDown by Microsoft.
GitHub Repo 👉 https://github.com/AstraBert/PdfItDown
PyPi Package 👉 https://pypi.org/project/pdfitdown/

𝐒𝐞𝐧𝐓𝐫𝐄𝐯 𝐯𝟏.𝟎.𝟎 (https://github.com/AstraBert/SenTrEv/tree/v1.0.0) - If you need to evaluate the 𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 performance of your 𝘁𝗲𝘅𝘁 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 models, I have good news for you🥳🥳
The new release for 𝐒𝐞𝐧𝐓𝐫𝐄𝐯 now supports 𝗱𝗲𝗻𝘀𝗲 and 𝘀𝗽𝗮𝗿𝘀𝗲 retrieval (thanks to FastEmbed by Qdrant) with 𝘁𝗲𝘅𝘁-𝗯𝗮𝘀𝗲𝗱 𝗳𝗶𝗹𝗲 𝗳𝗼𝗿𝗺𝗮𝘁𝘀 (.docx, .pptx, .csv, .html, .xml, .md, .pdf) and new 𝗿𝗲𝗹𝗲𝘃𝗮𝗻𝗰𝗲 𝗺𝗲𝘁𝗿𝗶𝗰𝘀!
GitHub repo 👉 https://github.com/AstraBert/SenTrEv
Release Notes 👉 https://github.com/AstraBert/SenTrEv/releases/tag/v1.0.0
PyPi Package 👉 https://pypi.org/project/sentrev/

Happy New Year and have fun!🥂
1aurent 
posted an update 11 days ago
as-cle-bert 
posted an update 12 days ago
Hi HF Community!🤗

As my last 2024 contribution, I decided to write an article about a Competitive Debate Championship simulation I ran with 5 LLMs as competitors and 2 as judges:

https://huggingface.co/blog/as-cle-bert/debate-championship-for-llms

The article covers code, analyses and results, and you can find everything to reproduce this tournament in the GitHub repo 👉 https://github.com/AstraBert/DebateLLM-Championship

I also released a dataset related to the data (motions, arguments, topics, winners...) collected during the tournament 👉 as-cle-bert/DebateLLMs

Happy reading and happy new yeAIr!🎉
nyuuzyou 
posted an update 12 days ago
🎮 ALLSTAR.GG Dataset - nyuuzyou/allstar

A collection of 47,896 gaming clips featuring:
- High-quality gameplay captures with various clip lengths and resolutions
- Complete metadata including user IDs, clip titles, and game parameters
- Content captured from Counter-Strike 2 competitive matches
- Full game statistics and technical parameters
nyuuzyou 
posted an update 14 days ago
🎨 KLING AI Dataset - nyuuzyou/klingai

A collection of 12,782 AI-generated media items featuring:
- High-quality image and video generations at various resolutions
- Complete metadata including user IDs, prompts, and generation parameters
- Content generated using text-to-image, text-to-video, and image-to-video modalities
- Full generation settings and technical parameters
nyuuzyou 
posted an update 15 days ago
CS2 Highlights Video Dataset - nyuuzyou/cs2-highlights

A collection of 4,857 high-quality Counter-Strike 2 gameplay highlights featuring:

- Professional and competitive gameplay recordings at 1080p resolution
- Complete metadata including Steam IDs and clip titles
- Preview thumbnails for all videos
- Both 60 FPS (842 clips) and 120 FPS (4,015 clips) content
- Gameplay from Faceit and official competitive modes

This extensive highlights collection provides a valuable resource for developing and evaluating video-based AI applications, especially in esports and competitive gaming contexts. Released under Creative Commons Zero (CC0) license.
davanstrien 
posted an update 15 days ago
🇸🇰 Hovorte po slovensky? Help build better AI for Slovak!

We only need 90 more annotations to include Slovak in the next Hugging Face FineWeb2-C dataset (data-is-better-together/fineweb-c) release!

Your contribution will help create better language models for 5+ million Slovak speakers.

Annotate here: data-is-better-together/fineweb-c.

Read more about why we're doing it: https://huggingface.co/blog/davanstrien/fineweb2-community
as-cle-bert 
posted an update 16 days ago
MonsterMMORPG 
posted an update 18 days ago
CogVideoX1.5-5B-I2V, the best open-source image-to-video model, is pretty decent and optimized for low-VRAM machines at high resolution. Its native resolution is 1360px and it generates up to 10 seconds (161 frames). The audio was generated with a new open-source audio model.

Full YouTube tutorial for CogVideoX1.5-5B-I2V : https://youtu.be/5UCkMzP2VLE

1-Click Windows, RunPod and Massed Compute installers : https://www.patreon.com/posts/112848192

https://www.patreon.com/posts/112848192 - installs into Python 3.11 VENV

Official Hugging Face repo of CogVideoX1.5-5B-I2V : THUDM/CogVideoX1.5-5B-I2V

Official github repo : https://github.com/THUDM/CogVideo

Used prompts to generate videos txt file : https://gist.github.com/FurkanGozukara/471db7b987ab8d9877790358c126ac05

Demo images shared in : https://www.patreon.com/posts/112848192

I used 1360x768px images at 16 FPS and 81 frames = 5 seconds

+1 frame coming from initial image
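
The frame arithmetic above can be written out explicitly. This is a minimal sketch: the function names are made up for this example, and the default of 16 FPS is the value quoted in the post.

```python
def frame_count(seconds: int, fps: int = 16) -> int:
    """Generated frames plus the one frame taken from the input image."""
    return seconds * fps + 1


def duration_seconds(frames: int, fps: int = 16) -> float:
    """Clip length once the initial-image frame is excluded."""
    return (frames - 1) / fps


print(frame_count(5))        # 81 frames for a 5-second clip
print(frame_count(10))       # 161 frames for a 10-second clip
print(duration_seconds(81))  # 5.0 seconds
```

The same formula explains the 161-frame limit mentioned for 10-second clips.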

Also I have enabled all the optimizations shared on Hugging Face

pipe.enable_sequential_cpu_offload()

pipe.vae.enable_slicing()

pipe.vae.enable_tiling()

quantization = int8_weight_only - you need TorchAO and DeepSpeed works great on Windows with Python 3.11 VENV

Used audio model : https://github.com/hkchengrex/MMAudio

1-Click Windows, RunPod and Massed Compute Installers for MMAudio : https://www.patreon.com/posts/117990364

https://www.patreon.com/posts/117990364 - Installs into Python 3.10 VENV

Used very simple prompts - it fails when there is a human in the input video, so use text-to-audio in such cases

I also tested some VRAM usages for CogVideoX1.5-5B-I2V

Resolutions and their VRAM requirements - may work on lower VRAM GPUs too but slower

512x288 - 41 frames : 7700 MB
576x320 - 41 frames : 7900 MB
576x320 - 81 frames : 8850 MB
704x384 - 81 frames : 8950 MB
768x432 - 81 frames : 10600 MB
896x496 - 81 frames : 12050 MB
960x528 - 81 frames : 12850 MB
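
The measurements above can serve as a lookup table for choosing a setting that fits a given GPU. The sketch below is only a rough guide built from the reported numbers. The helper `largest_fitting` is hypothetical, and, as noted above, lower-VRAM GPUs may still work at slower speed.

```python
# (width, height, frames, VRAM in MB) as reported in the post.
VRAM_MEASUREMENTS = [
    (512, 288, 41, 7700),
    (576, 320, 41, 7900),
    (576, 320, 81, 8850),
    (704, 384, 81, 8950),
    (768, 432, 81, 10600),
    (896, 496, 81, 12050),
    (960, 528, 81, 12850),
]


def largest_fitting(vram_mb: int, frames: int = 81):
    """Return the highest-resolution measured setting within the budget, or None."""
    candidates = [
        m for m in VRAM_MEASUREMENTS if m[2] == frames and m[3] <= vram_mb
    ]
    # Rank candidates by pixel count; default=None covers the "nothing fits" case.
    return max(candidates, key=lambda m: m[0] * m[1], default=None)


print(largest_fitting(12 * 1024))            # budget of 12288 MB, 81 frames
print(largest_fitting(8 * 1024, frames=41))  # budget of 8192 MB, 41 frames
```

With a 12288 MB budget the largest measured 81-frame setting is 896x496. With 8192 MB, none of the measured 81-frame settings fits, so only the 41-frame rows apply.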




as-cle-bert 
posted an update 18 days ago
Hi HuggingFacers!🤶🏼

As my last 2024 project, I've dropped a Discord Bot that knows a lot about Pokémon🦋

GitHub 👉 https://github.com/AstraBert/Pokemon-Bot
Demo Space 👉 as-cle-bert/pokemon-bot

The bot integrates:
- Chat features (Cohere's Command-R) with RAG functionalities (hybrid search and reranking with Qdrant) and chat memory (managed through PostgreSQL) to produce information about Pokémon
- Image-based search to identify Pokémon from their images (via Qdrant)
- Card package random extraction and description
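
The post does not say how the dense and sparse results are combined in the hybrid search. Reciprocal rank fusion (RRF) is one common way to do it, sketched here in plain Python rather than through the Qdrant client. The function name, the constant `k = 60` and the sample result lists are assumptions for illustration and are not taken from the bot's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge several ranked lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense and a sparse retriever.
dense_hits = ["pikachu", "raichu", "pichu"]
sparse_hits = ["pikachu", "eevee", "raichu"]

print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Documents that appear near the top of both lists rise to the top of the fused list, which is the intuition behind combining a dense model such as LaBSE with a sparse one such as SPLADE.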

HuggingFace🤗, as usual, plays the most important role in the application stack, with the following models:

- sentence-transformers/LaBSE
- prithivida/Splade_PP_en_v1
- facebook/dinov2-large

And datasets:

- Karbo31881/Pokemon_images
- wanghaofan/pokemon-wiki-captions
- TheFusion21/PokemonCards

Have fun!🍕