1 | 1 | {
2 | | - "cells": [
3 | | - {
4 | | - "cell_type": "markdown",
5 | | - "id": "a0b3171b",
6 | | - "metadata": {},
7 | | - "source": [
8 | | - "# Langsmith Integrations\n",
9 | | - "\n",
10 | | - "[Langsmith](https://docs.smith.langchain.com/) in a platform for building production-grade LLM applications from the langchain team. It helps you with tracing, debugging and evaluting LLM applications.\n",
11 | | - "\n",
12 | | - "The langsmith + ragas integrations offer 2 features\n",
13 | | - "1. View the traces of ragas `evaluator` \n",
14 | | - "2. Use ragas metrics in langchain evaluation - (soon)\n",
15 | | - "\n",
16 | | - "\n",
17 | | - "### Tracing ragas metrics\n",
18 | | - "\n",
19 | | - "since ragas uses langchain under the hood all you have to do is setup langsmith and your traces will be logged.\n",
20 | | - "\n",
21 | | - "to setup langsmith make sure the following env-vars are set (you can read more in the [langsmith docs](https://docs.smith.langchain.com/#quick-start)\n",
22 | | - "\n",
23 | | - "```bash\n",
24 | | - "export LANGCHAIN_TRACING_V2=true\n",
25 | | - "export LANGCHAIN_ENDPOINT=https://api.smith.langchain.com\n",
26 | | - "export LANGCHAIN_API_KEY=<your-api-key>\n",
27 | | - "export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to \"default\"\n",
28 | | - "```\n",
29 | | - "\n",
30 | | - "Once langsmith is setup, just run the evaluations as your normally would"
31 | | - ]
32 | | - },
33 | | - {
34 | | - "cell_type": "code",
35 | | - "execution_count": 1,
36 | | - "id": "39375103",
37 | | - "metadata": {},
38 | | - "outputs": [
39 | | - {
40 | | - "name": "stderr",
41 | | - "output_type": "stream",
42 | | - "text": [
43 | | - "Found cached dataset fiqa (/home/jjmachan/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)\n"
44 | | - ]
45 | | - },
46 | | - {
47 | | - "data": {
48 | | - "application/vnd.jupyter.widget-view+json": {
49 | | - "model_id": "dc5a62b3aebb45d690d9f0dcc783deea",
50 | | - "version_major": 2,
51 | | - "version_minor": 0
52 | | - },
53 | | - "text/plain": [
54 | | - " 0%| | 0/1 [00:00<?, ?it/s]"
55 | | - ]
56 | | - },
57 | | - "metadata": {},
58 | | - "output_type": "display_data"
59 | | - },
60 | | - {
61 | | - "name": "stdout",
62 | | - "output_type": "stream",
63 | | - "text": [
64 | | - "evaluating with [context_ relevancy]\n"
65 | | - ]
66 | | - },
67 | | - {
68 | | - "name": "stderr",
69 | | - "output_type": "stream",
70 | | - "text": [
71 | | - "100%|████████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.90s/it]\n"
72 | | - ]
73 | | - },
74 | | - {
75 | | - "name": "stdout",
76 | | - "output_type": "stream",
77 | | - "text": [
78 | | - "evaluating with [faithfulness]\n"
79 | | - ]
80 | | - },
81 | | - {
82 | | - "name": "stderr",
83 | | - "output_type": "stream",
84 | | - "text": [
85 | | - "100%|████████████████████████████████████████████████████████████| 1/1 [00:21<00:00, 21.01s/it]\n"
86 | | - ]
87 | | - },
88 | | - {
89 | | - "name": "stdout",
90 | | - "output_type": "stream",
91 | | - "text": [
92 | | - "evaluating with [answer_relevancy]\n"
93 | | - ]
94 | | - },
95 | | - {
96 | | - "name": "stderr",
97 | | - "output_type": "stream",
98 | | - "text": [
99 | | - "100%|████████████████████████████████████████████████████████████| 1/1 [00:07<00:00, 7.36s/it]\n"
100 | | - ]
101 | | - },
102 | | - {
103 | | - "data": {
104 | | - "text/plain": [
105 | | - "{'ragas_score': 0.1837, 'context_ relevancy': 0.0707, 'faithfulness': 0.8889, 'answer_relevancy': 0.9403}"
106 | | - ]
107 | | - },
108 | | - "execution_count": 1,
109 | | - "metadata": {},
110 | | - "output_type": "execute_result"
111 | | - }
112 | | - ],
113 | | - "source": [
114 | | - "from datasets import load_dataset\n",
115 | | - "from ragas.metrics import context_relevancy, answer_relevancy, faithfulness\n",
116 | | - "from ragas import evaluate\n",
117 | | - "\n",
118 | | - "\n",
119 | | - "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
120 | | - "\n",
121 | | - "result = evaluate(\n",
122 | | - " fiqa_eval[\"baseline\"].select(range(3)), \n",
123 | | - " metrics=[context_relevancy, faithfulness, answer_relevancy]\n",
124 | | - ")\n",
125 | | - "\n",
126 | | - "result"
127 | | - ]
128 | | - },
129 | | - {
130 | | - "cell_type": "markdown",
131 | | - "id": "8ce1c649",
132 | | - "metadata": {},
133 | | - "source": [
134 | | - "Voila! Now you can head over to your project and see the traces\n",
135 | | - "\n",
136 | | - "\n",
137 | | - "this shows the langsmith tracing dashboard overview\n",
138 | | - "\n",
139 | | - "\n",
140 | | - "this shows the traces for the faithfullness metrics. As you can see being able to view the reasons why the metric gives the score is helpful in figuring out how to improving it."
141 | | - ]
142 | | - }
143 | | - ],
144 | | - "metadata": {
145 | | - "kernelspec": {
146 | | - "display_name": "Python 3 (ipykernel)",
147 | | - "language": "python",
148 | | - "name": "python3"
149 | | - },
150 | | - "language_info": {
151 | | - "codemirror_mode": {
152 | | - "name": "ipython",
153 | | - "version": 3
154 | | - },
155 | | - "file_extension": ".py",
156 | | - "mimetype": "text/x-python",
157 | | - "name": "python",
158 | | - "nbconvert_exporter": "python",
159 | | - "pygments_lexer": "ipython3",
160 | | - "version": "3.10.12"
161 | | - }
| 2 | + "cells": [
| 3 | + {
| 4 | + "cell_type": "markdown",
| 5 | + "id": "a0b3171b",
| 6 | + "metadata": {},
| 7 | + "source": [
| 8 | + "# Langsmith Integrations\n",
| 9 | + "\n",
| 10 | + "[Langsmith](https://docs.smith.langchain.com/) in a platform for building production-grade LLM applications from the langchain team. It helps you with tracing, debugging and evaluting LLM applications.\n",
| 11 | + "\n",
| 12 | + "The langsmith + ragas integrations offer 2 features\n",
| 13 | + "1. View the traces of ragas `evaluator` \n",
| 14 | + "2. Use ragas metrics in langchain evaluation - (soon)\n",
| 15 | + "\n",
| 16 | + "\n",
| 17 | + "### Tracing ragas metrics\n",
| 18 | + "\n",
| 19 | + "since ragas uses langchain under the hood all you have to do is setup langsmith and your traces will be logged.\n",
| 20 | + "\n",
| 21 | + "to setup langsmith make sure the following env-vars are set (you can read more in the [langsmith docs](https://docs.smith.langchain.com/#quick-start)\n",
| 22 | + "\n",
| 23 | + "```bash\n",
| 24 | + "export LANGCHAIN_TRACING_V2=true\n",
| 25 | + "export LANGCHAIN_ENDPOINT=https://api.smith.langchain.com\n",
| 26 | + "export LANGCHAIN_API_KEY=<your-api-key>\n",
| 27 | + "export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to \"default\"\n",
| 28 | + "```\n",
| 29 | + "\n",
| 30 | + "Once langsmith is setup, just run the evaluations as your normally would"
| 31 | + ]
| 32 | + },
| 33 | + {
| 34 | + "cell_type": "code",
| 35 | + "execution_count": 1,
| 36 | + "id": "39375103",
| 37 | + "metadata": {},
| 38 | + "outputs": [
| 39 | + {
| 40 | + "name": "stderr",
| 41 | + "output_type": "stream",
| 42 | + "text": [
| 43 | + "Found cached dataset fiqa (/home/jjmachan/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)\n"
| 44 | + ]
162 | 45 | },
163 | | - "nbformat": 4,
164 | | - "nbformat_minor": 5
| 46 | + {
| 47 | + "data": {
| 48 | + "application/vnd.jupyter.widget-view+json": {
| 49 | + "model_id": "dc5a62b3aebb45d690d9f0dcc783deea",
| 50 | + "version_major": 2,
| 51 | + "version_minor": 0
| 52 | + },
| 53 | + "text/plain": [
| 54 | + " 0%| | 0/1 [00:00<?, ?it/s]"
| 55 | + ]
| 56 | + },
| 57 | + "metadata": {},
| 58 | + "output_type": "display_data"
| 59 | + },
| 60 | + {
| 61 | + "name": "stdout",
| 62 | + "output_type": "stream",
| 63 | + "text": [
| 64 | + "evaluating with [context_ relevancy]\n"
| 65 | + ]
| 66 | + },
| 67 | + {
| 68 | + "name": "stderr",
| 69 | + "output_type": "stream",
| 70 | + "text": [
| 71 | + "100%|████████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.90s/it]\n"
| 72 | + ]
| 73 | + },
| 74 | + {
| 75 | + "name": "stdout",
| 76 | + "output_type": "stream",
| 77 | + "text": [
| 78 | + "evaluating with [faithfulness]\n"
| 79 | + ]
| 80 | + },
| 81 | + {
| 82 | + "name": "stderr",
| 83 | + "output_type": "stream",
| 84 | + "text": [
| 85 | + "100%|████████████████████████████████████████████████████████████| 1/1 [00:21<00:00, 21.01s/it]\n"
| 86 | + ]
| 87 | + },
| 88 | + {
| 89 | + "name": "stdout",
| 90 | + "output_type": "stream",
| 91 | + "text": [
| 92 | + "evaluating with [answer_relevancy]\n"
| 93 | + ]
| 94 | + },
| 95 | + {
| 96 | + "name": "stderr",
| 97 | + "output_type": "stream",
| 98 | + "text": [
| 99 | + "100%|████████████████████████████████████████████████████████████| 1/1 [00:07<00:00, 7.36s/it]\n"
| 100 | + ]
| 101 | + },
| 102 | + {
| 103 | + "data": {
| 104 | + "text/plain": [
| 105 | + "{'ragas_score': 0.1837, 'context_ relevancy': 0.0707, 'faithfulness': 0.8889, 'answer_relevancy': 0.9403}"
| 106 | + ]
| 107 | + },
| 108 | + "execution_count": 1,
| 109 | + "metadata": {},
| 110 | + "output_type": "execute_result"
| 111 | + }
| 112 | + ],
| 113 | + "source": [
| 114 | + "from datasets import load_dataset\n",
| 115 | + "from ragas.metrics import context_relevancy, answer_relevancy, faithfulness\n",
| 116 | + "from ragas import evaluate\n",
| 117 | + "\n",
| 118 | + "\n",
| 119 | + "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
| 120 | + "\n",
| 121 | + "result = evaluate(\n",
| 122 | + " fiqa_eval[\"baseline\"].select(range(3)),\n",
| 123 | + " metrics=[context_relevancy, faithfulness, answer_relevancy],\n",
| 124 | + ")\n",
| 125 | + "\n",
| 126 | + "result"
| 127 | + ]
| 128 | + },
| 129 | + {
| 130 | + "cell_type": "markdown",
| 131 | + "id": "8ce1c649",
| 132 | + "metadata": {},
| 133 | + "source": [
| 134 | + "Voila! Now you can head over to your project and see the traces\n",
| 135 | + "\n",
| 136 | + "\n",
| 137 | + "this shows the langsmith tracing dashboard overview\n",
| 138 | + "\n",
| 139 | + "\n",
| 140 | + "this shows the traces for the faithfullness metrics. As you can see being able to view the reasons why the metric gives the score is helpful in figuring out how to improving it."
| 141 | + ]
| 142 | + }
| 143 | + ],
| 144 | + "metadata": {
| 145 | + "kernelspec": {
| 146 | + "display_name": "Python 3 (ipykernel)",
| 147 | + "language": "python",
| 148 | + "name": "python3"
| 149 | + },
| 150 | + "language_info": {
| 151 | + "codemirror_mode": {
| 152 | + "name": "ipython",
| 153 | + "version": 3
| 154 | + },
| 155 | + "file_extension": ".py",
| 156 | + "mimetype": "text/x-python",
| 157 | + "name": "python",
| 158 | + "nbconvert_exporter": "python",
| 159 | + "pygments_lexer": "ipython3",
| 160 | + "version": "3.10.12"
| 161 | + }
| 162 | + },
| 163 | + "nbformat": 4,
| 164 | + "nbformat_minor": 5
165 | 165 | }
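The notebook in this commit configures LangSmith tracing with bash `export` statements. The sketch below shows the same four variables set from inside Python, for example in the first cell of the notebook. It assumes that, as with most environment-driven configuration, the variables only need to be in place before the first traced LangChain call made by `ragas.evaluate`. The API key and project values are placeholders.

```python
import os

# Python equivalent of the notebook's bash `export` lines.
# Assumption: these only need to be set before the first traced call.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

# setdefault keeps any key/project already exported in the shell.
# "<your-api-key>" and "<your-project>" are placeholders, not real values;
# per the notebook, the project defaults to "default" if left unset.
os.environ.setdefault("LANGCHAIN_API_KEY", "<your-api-key>")
os.environ.setdefault("LANGCHAIN_PROJECT", "<your-project>")
```

Using `setdefault` for the key and project avoids clobbering values that were already exported in the shell before Jupyter started.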
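The `ragas_score` shown in the notebook's `execute_result` output (0.1837) matches the harmonic mean of the three per-metric scores. The notebook does not state this; it is an inference from the printed numbers. The check below uses only the standard library. The harmonic mean is dominated by the lowest input, which explains why a single weak metric (context relevancy, 0.0707) pulls the aggregate far below the other two scores.

```python
from statistics import harmonic_mean

# Per-metric scores copied from the notebook's execute_result output.
scores = {
    "context_relevancy": 0.0707,
    "faithfulness": 0.8889,
    "answer_relevancy": 0.9403,
}

# Harmonic mean: n / sum(1 / x_i), so the smallest score dominates the result.
ragas_score = harmonic_mean(list(scores.values()))
print(round(ragas_score, 4))  # 0.1837
```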