[Bug] [ChatKnowledge] document embedding failed'source' #2169

Open
3 of 15 tasks
sunnf opened this issue Dec 1, 2024 · 2 comments
Labels
bug Something isn't working Waiting for reply

Comments

@sunnf

sunnf commented Dec 1, 2024

Search before asking

  • I have searched the issues and found no similar issues.

Operating system information

Linux

Python version information

3.10

DB-GPT version

main

Related scenes

  • Chat Data
  • Chat Excel
  • Chat DB
  • Chat Knowledge
  • Model Management
  • Dashboard
  • Plugins

Installation Information

Device information

(base) root@autodl-container-3ea342bd74-8563f1df:/usr/lib# nvidia-smi
Sun Dec 1 18:15:44 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:44:00.0 Off |                  N/A |
|  0%   25C    P8             24W /  350W |    8804MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|

Models information

LLM:vicuna-7b-v1.5 Embedding model:text2vec-large-chinese

What happened

The error occurs when uploading a PDF document.
[image]

What you expected to happen

INFO: 111.41.138.94:0 - "POST /knowledge/ai/arguments HTTP/1.1" 200 OK
2024-12-01 18:08:14 autodl-container-3ea342bd74-8563f1df dbgpt.app.knowledge.api[1646] INFO /document/list params: ai, doc_name=None doc_ids=None doc_type=None status=None page=1 page_size=18
INFO: 111.41.138.94:0 - "POST /knowledge/ai/document/list HTTP/1.1" 200 OK
2024-12-01 18:10:59 autodl-container-3ea342bd74-8563f1df dbgpt.app.knowledge.api[1646] INFO Received params: ai, doc_ids=[8] model_name=None pre_separator=None separators=None chunk_size=None chunk_overlap=None
current session:<sqlalchemy.orm.session.Session object at 0x7fb9c809b490>
2024-12-01 18:10:59 autodl-container-3ea342bd74-8563f1df dbgpt.serve.rag.connector[1646] INFO VectorStore:<class 'dbgpt.storage.knowledge_graph.community_summary.CommunitySummaryKnowledgeGraph'>
2024-12-01 18:10:59 autodl-container-3ea342bd74-8563f1df dbgpt.serve.rag.service.service[1646] INFO begin save document chunks, doc:Understanding AI Technology.pdf
2024-12-01 18:10:59 autodl-container-3ea342bd74-8563f1df dbgpt.serve.rag.service.service[1646] INFO async doc persist sync, doc:Understanding AI Technology.pdf
INFO: 111.41.138.94:0 - "POST /knowledge/ai/document/sync HTTP/1.1" 200 OK
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 0 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 1 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 2 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 3 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 4 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 5 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 6 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 7 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 8 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 9 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 10 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 11 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 12 extract text success
2024-12-01 18:11:02 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 13 extract text success
2024-12-01 18:11:02 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 14 extract text success
2024-12-01 18:11:02 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 15 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 16 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 17 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 18 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 19 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.serve.rag.service.service[1646] ERROR document embedding, failed:Understanding AI Technology.pdf, 'source'
微信图片_20241201180406
Snipaste_2024-12-01_18-03-46

How to reproduce

INFO: 111.41.138.94:0 - "POST /knowledge/ai/document/sync HTTP/1.1" 200 OK
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 0 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 1 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 2 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 3 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 4 extract text success
2024-12-01 18:11:00 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 5 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 6 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 7 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 8 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 9 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 10 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 11 extract text success
2024-12-01 18:11:01 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 12 extract text success
2024-12-01 18:11:02 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 13 extract text success
2024-12-01 18:11:02 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 14 extract text success
2024-12-01 18:11:02 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 15 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 16 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 17 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 18 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.component[1646] INFO /root/autodl-tmp/DB-GPT/pilot/data/ai/Understanding AI Technology.pdf page 19 extract text success
2024-12-01 18:11:03 autodl-container-3ea342bd74-8563f1df dbgpt.serve.rag.service.service[1646] ERROR document embedding, failed:Understanding AI Technology.pdf, 'source'

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@sunnf sunnf added bug Something isn't working Waiting for reply labels Dec 1, 2024
@Aries-ckt
Collaborator

Does this happen only with PDF files? Could you try a Word (.docx) or Markdown document?

@Aries-ckt
Collaborator

This PR fixes the problem: #2170.
Pull the latest main branch and try again.
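
Note on reading the log: the error line ends with a bare `'source'`, which is what Python produces when a `KeyError` for the key `source` is formatted with `str()`. The sketch below only illustrates that formatting behaviour. The `describe_failure` helper and the `metadata` dict are hypothetical and are not taken from the DB-GPT code base; whether the real failure is a missing `source` key in chunk metadata is an assumption, not something the log confirms.

```python
def describe_failure(doc_name: str, exc: Exception) -> str:
    # Hypothetical helper that mimics the shape of the log line in this issue.
    return f"document embedding, failed:{doc_name}, {exc}"


# Hypothetical chunk metadata that lacks a "source" key (an assumption).
metadata = {"page": 0}

try:
    metadata["source"]
except KeyError as exc:
    # str(KeyError("source")) is just the quoted key, so the message loses
    # the exception type.
    message = describe_failure("Understanding AI Technology.pdf", exc)
    # repr() keeps the exception type, which is easier to diagnose.
    detailed = f"document embedding, failed:Understanding AI Technology.pdf, {exc!r}"

print(message)
print(detailed)
```

If you hit a similarly terse message, logging the exception with `repr()` or a full traceback usually makes the failing lookup easier to locate.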
