Clelia (Astra) Bertelli PRO

as-cle-bert

AI & ML interests

Biology + Artificial Intelligence = ❤️ | AI for sustainable development, sustainable development for AI | Researching Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source

Recent Activity

upvoted an article 1 day ago
Search the Web with AI

Organizations

Social Post Explorers, Hugging Face Discord Community, GreenFit AI

as-cle-bert's activity

replied to their post about 17 hours ago

Thank you so much for letting me know! This is indeed a very interesting role :)

posted an update 1 day ago
Hi HuggingFace community! 🤗

I recently released PrAIvateSearch v2.0-beta.0 (https://github.com/AstraBert/PrAIvateSearch), my privacy-first, AI-powered, user-centered and data-safe application aimed at providing a local and open-source alternative to big AI search engines such as SearchGPT or Perplexity AI.

We have several key changes:

- New chat UI built with NextJS
- DuckDuckGo API used for web search instead of Google
- Qwen/Qwen2.5-1.5B-Instruct as the language model, served through an API built with FastAPI
- Crawl4AI crawler used for web scraping
- Optimizations in the data workflow inside the application

Read more in my blog post 👉 https://huggingface.co/blog/as-cle-bert/search-the-web-with-ai

Have fun and feel free to leave feedback about how to improve the application! ✨
upvoted an article 1 day ago
posted an update 7 days ago
Are you using Obsidian to write your notes?
If the answer is yes, then this post might be for you! ✅
I recently created obsidian-digest, a Google Gemini-powered application that gives you feedback on the style and content of the documents you have been working on 🧠

Repo 👉 https://github.com/AstraBert/obsidian-digest
PyPI Package 👉 https://pypi.org/project/obsidian-digest/

The app is available as:
- Command-line tool: install it as a Python package with pip and run it from the terminal at any time! 📦
- Discord bot built from source code: clone the GitHub repo, install the required dependencies through conda, and run the bot. You will get hourly messages with suggestions and considerations about your Obsidian activity in the previous hour 🤖
- Discord bot deployed locally with Docker Compose: clone the GitHub repo and launch docker compose up. Docker builds an image on the fly with all the required dependencies and scripts, and runs them. You get the same functionality as the version built from source code, with a much easier deployment process 🐋

Go check out the GitHub repo for more info 👉 https://github.com/AstraBert/obsidian-digest

Have fun! ✨
replied to their post 9 days ago

Hi, and thanks a lot for the clarification! 🥰

Just a note from my side: in the article I specify that there is a difference between "open weights" and "open source" models, and I link this blog post for a deeper explanation of the difference: https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/. I never claimed (and never would claim) that Llama is open source, let alone free software (see the introduction to this article of mine on privacy and data "stealing" risks: https://huggingface.co/blog/as-cle-bert/build-an-ai-powered-search-engine-from-scratch).

I would also have gladly used DeepSeek, if it had been available on HuggingChat! :)

Nevertheless, I highly appreciate your comment, and I will certainly be more careful about using the terms "open" and "open source" in the future. Thanks! ✨

replied to their post 9 days ago

Both PdfItDown and SenTrEv only work with text for now; support for images will be added in future releases :)
For text extraction, I use PyPDF + LangChain.

posted an update 9 days ago
🎉 Early New Year releases 🎉

Hi HuggingFacers 🤗, I decided to ship early this year, and here's what I came up with:

PdfItDown (https://github.com/AstraBert/PdfItDown) - If you're like me and have your whole RAG pipeline optimized for PDFs but not for other data formats, here is your solution! With PdfItDown, you can convert Word documents, presentations, HTML pages, Markdown files and (why not?) CSVs and XMLs to PDF format, for seamless integration with your RAG pipelines. Built upon MarkItDown by Microsoft.
GitHub Repo 👉 https://github.com/AstraBert/PdfItDown
PyPI Package 👉 https://pypi.org/project/pdfitdown/

SenTrEv v1.0.0 (https://github.com/AstraBert/SenTrEv/tree/v1.0.0) - If you need to evaluate the retrieval performance of your text embedding models, I have good news for you 🥳🥳
The new release of SenTrEv now supports dense and sparse retrieval (thanks to FastEmbed by Qdrant) with text-based file formats (.docx, .pptx, .csv, .html, .xml, .md, .pdf) and new relevance metrics!
GitHub Repo 👉 https://github.com/AstraBert/SenTrEv
Release Notes 👉 https://github.com/AstraBert/SenTrEv/releases/tag/v1.0.0
PyPI Package 👉 https://pypi.org/project/sentrev/

Happy New Year and have fun! 🥂
reacted to nroggendorff's post with ➕ 10 days ago
hey nvidia, can you send me a gpu?
comment or react if you want ~~me~~ to get one too. 👉👈
posted an update 12 days ago
Hi HF Community! 🤗

As my last 2024 contribution, I decided to write an article about a Competitive Debate Championship simulation I ran with 5 LLMs as competitors and 2 as judges:

https://huggingface.co/blog/as-cle-bert/debate-championship-for-llms

The article covers code, analyses and results, and you can find everything to reproduce this tournament in the GitHub repo 👉 https://github.com/AstraBert/DebateLLM-Championship

I also released a dataset related to the data (motions, arguments, topics, winners...) collected during the tournament 👉 as-cle-bert/DebateLLMs

Happy reading and happy new yeAIr! 🎉
upvoted an article 12 days ago
published an article 12 days ago