diff --git a/docs/trulens_eval/intro.md b/docs/trulens_eval/intro.md
index e50562ea4..c90d5bb44 100644
--- a/docs/trulens_eval/intro.md
+++ b/docs/trulens_eval/intro.md
@@ -14,24 +14,24 @@ To quickly play around with the TruLens Eval library:
 
 Langchain:
 
-[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/quickstart.ipynb).
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/colab/quickstarts/langchain_quickstart_colab.ipynb)
+[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/quickstart.ipynb).
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/colab/quickstarts/langchain_quickstart_colab.ipynb)
 
-[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/quickstart.py).
+[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/quickstart.py).
 
 Llama Index:
 
-[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/frameworks/llama_index/llama_index_quickstart.ipynb).
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/colab/quickstarts/llama_index_quickstart_colab.ipynb)
+[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/frameworks/llama_index/llama_index_quickstart.ipynb).
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/colab/quickstarts/llama_index_quickstart_colab.ipynb)
 
-[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/llama_index_quickstart.py)
+[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/llama_index_quickstart.py)
 
 No Framework:
 
-[no_framework_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/no_framework_quickstart.ipynb).
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/colab/quickstarts/no_framework_quickstart_colab.ipynb)
+[no_framework_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/no_framework_quickstart.ipynb).
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/colab/quickstarts/no_framework_quickstart_colab.ipynb)
 
-[no_framework_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/no_framework_quickstart.py)
+[no_framework_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/no_framework_quickstart.py)
 
 ### 💡 Contributing
diff --git a/trulens_eval/README.md b/trulens_eval/README.md
index 42b73100c..e3cbe1cd3 100644
--- a/trulens_eval/README.md
+++ b/trulens_eval/README.md
@@ -14,24 +14,24 @@ To quickly play around with the TruLens Eval library:
 
 Langchain:
 
-[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/quickstart.ipynb).
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/colab/quickstarts/langchain_quickstart_colab.ipynb)
+[langchain_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/quickstart.ipynb).
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/colab/quickstarts/langchain_quickstart_colab.ipynb)
 
-[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/quickstart.py).
+[langchain_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/quickstart.py).
 
 Llama Index:
 
-[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/frameworks/llama_index/llama_index_quickstart.ipynb).
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/colab/quickstarts/llama_index_quickstart_colab.ipynb)
+[llama_index_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/frameworks/llama_index/llama_index_quickstart.ipynb).
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/colab/quickstarts/llama_index_quickstart_colab.ipynb)
 
-[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/llama_index_quickstart.py)
+[llama_index_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/llama_index_quickstart.py)
 
 No Framework:
 
-[no_framework_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/no_framework_quickstart.ipynb).
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/colab/quickstarts/no_framework_quickstart_colab.ipynb)
+[no_framework_quickstart.ipynb](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/no_framework_quickstart.ipynb).
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/colab/quickstarts/no_framework_quickstart_colab.ipynb)
 
-[no_framework_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.5.0/trulens_eval/examples/no_framework_quickstart.py)
+[no_framework_quickstart.py](https://github.com/truera/trulens/blob/releases/rc-trulens-eval-0.6.0/trulens_eval/examples/no_framework_quickstart.py)
 
 ### 💡 Contributing
diff --git a/trulens_eval/examples/all_tools.py b/trulens_eval/examples/all_tools.py
index 7b1b8ebf4..0811e7069 100644
--- a/trulens_eval/examples/all_tools.py
+++ b/trulens_eval/examples/all_tools.py
@@ -2,7 +2,7 @@
 # coding: utf-8
 
 # # Quickstart
-# 
+#
 # In this quickstart you will create a simple LLM Chain and learn how to log it and get feedback on an LLM response.
 
 # ## Setup
@@ -10,18 +10,13 @@
 # For this quickstart you will need Open AI and Huggingface keys
 
 import os
-
 os.environ["OPENAI_API_KEY"] = "..."
 os.environ["HUGGINGFACE_API_KEY"] = "..."
 
 # ### Import from LangChain and TruLens
 
 # Imports main tools:
-from trulens_eval import Feedback
-from trulens_eval import Huggingface
-from trulens_eval import Tru
-from trulens_eval import TruChain
-
+from trulens_eval import TruChain, Feedback, Huggingface, Tru
 tru = Tru()
 
 # Imports from langchain to build app. You may need to install langchain first
@@ -29,12 +24,11 @@
 # ! pip install langchain>=0.0.170
 from langchain.chains import LLMChain
 from langchain.llms import OpenAI
-from langchain.prompts.chat import ChatPromptTemplate
+from langchain.prompts.chat import ChatPromptTemplate, PromptTemplate
 from langchain.prompts.chat import HumanMessagePromptTemplate
-from langchain.prompts.chat import PromptTemplate
 
 # ### Create Simple LLM Application
-# 
+#
 # This example uses a LangChain framework and OpenAI LLM
 
 full_prompt = HumanMessagePromptTemplate(
@@ -71,12 +65,10 @@
 
 # ## Instrument chain for logging with TruLens
 
-truchain = TruChain(
-    chain,
+truchain = TruChain(chain,
     app_id='Chain1_ChatApplication',
     feedbacks=[f_lang_match],
-    tags="prototype"
-)
+    tags = "prototype")
 
 # Instrumented chain can operate like the original:
 llm_response = truchain(prompt_input)
@@ -85,54 +77,57 @@
 
 # ## Explore in a Dashboard
 
-tru.run_dashboard()  # open a local streamlit app to explore
+tru.run_dashboard() # open a local streamlit app to explore
 
 # tru.stop_dashboard() # stop if needed
 
 # Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.
 
 # ### Chain Leaderboard
-# 
+#
 # Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics including cost and average feedback value across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up.
-# 
+#
 # Note: Average feedback values are returned and printed in a range from 0 (worst) to 1 (best).
-# 
+#
 # ![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png)
-# 
+#
 # To dive deeper on a particular chain, click "Select Chain".
-# 
+#
 # ### Understand chain performance with Evaluations
-# 
+#
 # To learn more about the performance of a particular chain or LLM model, we can select it to view its evaluations at the record level. LLM quality is assessed through the use of feedback functions. Feedback functions are extensible methods for determining the quality of LLM responses and can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance and more.
-# 
+#
 # The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.
-# 
+#
 # ![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png)
-# 
+#
 # ### Deep dive into full chain metadata
-# 
+#
 # Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by tru_chain.
-# 
+#
 # ![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png)
-# 
+#
 # If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.
 
 # Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard.
 
 # ## Or view results directly in your notebook
 
-tru.get_records_and_feedback(app_ids=[]
-                            )[0] # pass an empty list of app_ids to get all
+tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all
 
 # # Logging
-# 
+#
 # ## Automatic Logging
-# 
+#
 # The simplest method for logging with TruLens is by wrapping with TruChain and including the tru argument, as shown in the quickstart.
-# 
+#
 # This is done like so:
 
-truchain = TruChain(chain, app_id='Chain1_ChatApplication', tru=tru)
+truchain = TruChain(
+    chain,
+    app_id='Chain1_ChatApplication',
+    tru=tru
+)
 truchain("This will be automatically logged.")
 
 # Feedback functions can also be logged automatically by providing them in a list to the feedbacks arg.
@@ -140,21 +135,21 @@
 truchain = TruChain(
     chain,
     app_id='Chain1_ChatApplication',
-    feedbacks=[f_lang_match],  # feedback functions
+    feedbacks=[f_lang_match], # feedback functions
     tru=tru
 )
 truchain("This will be automatically logged.")
 
 # ## Manual Logging
-# 
+#
 # ### Wrap with TruChain to instrument your chain
 
 tc = TruChain(chain, app_id='Chain1_ChatApplication')
 
 # ### Set up logging and instrumentation
-# 
+#
 # Making the first call to your wrapped LLM Application will now also produce a log or "record" of the chain execution.
-# 
+#
 
 prompt_input = 'que hora es?'
 gpt3_response, record = tc.call_with_record(prompt_input)
@@ -171,21 +166,22 @@
 # Capturing app feedback such as user feedback of the responses can be added with one call.
 
 thumb_result = True
-tru.add_feedback(
-    name="👍 (1) or 👎 (0)", record_id=record.record_id, result=thumb_result
-)
+tru.add_feedback(name="👍 (1) or 👎 (0)",
+                 record_id=record.record_id,
+                 result=thumb_result)
 
 # ### Evaluate Quality
-# 
+#
 # Following the request to your app, you can then evaluate LLM quality using feedback functions. This is completed in a sequential call to minimize latency for your application, and evaluations will also be logged to your local machine.
-# 
+#
 # To get feedback on the quality of your LLM, you can use any of the provided feedback functions or add your own.
-# 
+#
 # To assess your LLM quality, you can provide the feedback functions to `tru.run_feedback()` in a list provided to `feedback_functions`.
-# 
+#
 
 feedback_results = tru.run_feedback_functions(
-    record=record, feedback_functions=[f_lang_match]
+    record=record,
+    feedback_functions=[f_lang_match]
 )
 print(feedback_results)
 
@@ -194,9 +190,9 @@
 tru.add_feedbacks(feedback_results)
 
 # ### Out-of-band Feedback evaluation
-# 
+#
 # In the above example, the feedback function evaluation is done in the same process as the chain evaluation. The alternative approach is the use the provided persistent evaluator started via `tru.start_deferred_feedback_evaluator`. Then specify the `feedback_mode` for `TruChain` as `deferred` to let the evaluator handle the feedback functions.
-# 
+#
 # For demonstration purposes, we start the evaluator here but it can be started in another process.
 
 truchain: TruChain = TruChain(
@@ -213,60 +209,55 @@
 
 # # Out-of-the-box Feedback Functions
 # See: 
-# 
+#
 # ## Relevance
-# 
+#
 # This evaluates the *relevance* of the LLM response to the given text by LLM prompting.
-# 
+#
 # Relevance is currently only available with OpenAI ChatCompletion API.
-# 
+#
 # ## Sentiment
-# 
+#
 # This evaluates the *positive sentiment* of either the prompt or response.
-# 
+#
 # Sentiment is currently available to use with OpenAI, HuggingFace or Cohere as the model provider.
-# 
+#
 # * The OpenAI sentiment feedback function prompts a Chat Completion model to rate the sentiment from 1 to 10, and then scales the response down to 0-1.
 # * The HuggingFace sentiment feedback function returns a raw score from 0 to 1.
 # * The Cohere sentiment feedback function uses the classification endpoint and a small set of examples stored in `feedback_prompts.py` to return either a 0 or a 1.
-# 
+#
 # ## Model Agreement
-# 
+#
 # Model agreement uses OpenAI to attempt an honest answer at your prompt with system prompts for correctness, and then evaluates the agreement of your LLM response to this model on a scale from 1 to 10. The agreement with each honest bot is then averaged and scaled from 0 to 1.
-# 
+#
 # ## Language Match
-# 
+#
 # This evaluates if the language of the prompt and response match.
-# 
+#
 # Language match is currently only available to use with HuggingFace as the model provider. This feedback function returns a score in the range from 0 to 1, where 1 indicates match and 0 indicates mismatch.
-# 
+#
 # ## Toxicity
-# 
+#
 # This evaluates the toxicity of the prompt or response.
-# 
+#
 # Toxicity is currently only available to be used with HuggingFace, and uses a classification endpoint to return a score from 0 to 1. The feedback function is negated as not_toxicity, and returns a 1 if not toxic and a 0 if toxic.
-# 
+#
 # ## Moderation
-# 
+#
 # The OpenAI Moderation API is made available for use as feedback functions. This includes hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic. Each is negated (ex: not_hate) so that a 0 would indicate that the moderation rule is violated. These feedback functions return a score in the range 0 to 1.
-# 
+#
 # # Adding new feedback functions
-# 
+#
 # Feedback functions are an extensible framework for evaluating LLMs. You can add your own feedback functions to evaluate the qualities required by your application by updating `trulens_eval/feedback.py`. If your contributions would be useful for others, we encourage you to contribute to TruLens!
-# 
+#
 # Feedback functions are organized by model provider into Provider classes.
-# 
+#
 # The process for adding new feedback functions is:
 # 1. Create a new Provider class or locate an existing one that applies to your feedback function. If your feedback function does not rely on a model provider, you can create a standalone class. Add the new feedback function method to your selected class. Your new method can either take a single text (str) as a parameter or both prompt (str) and response (str). It should return a float between 0 (worst) and 1 (best).
 
-from trulens_eval import Feedback
-from trulens_eval import Provider
-from trulens_eval import Select
-from trulens_eval import Tru
-
+from trulens_eval import Provider, Feedback, Select, Tru
 
 class StandAlone(Provider):
-
     def my_custom_feedback(self, my_text_field: str) -> float:
         """
         A dummy function of text inputs to float outputs.
@@ -279,18 +270,19 @@ def my_custom_feedback(self, my_text_field: str) -> float:
         """
         return 1.0 / (1.0 + len(my_text_field) * len(my_text_field))
 
-
 # 2. Instantiate your provider and feedback functions. The feedback function is wrapped by the trulens-eval Feedback class which helps specify what will get sent to your function parameters (For example: Select.RecordInput or Select.RecordOutput)
 
 my_standalone = StandAlone()
-my_feedback_function_standalone = Feedback(
-    my_standalone.my_custom_feedback
-).on(my_text_field=Select.RecordOutput)
+my_feedback_function_standalone = Feedback(my_standalone.my_custom_feedback).on(
+    my_text_field=Select.RecordOutput
+)
 
 # 3. Your feedback function is now ready to use just like the out of the box feedback functions. Below is an example of it being used.
 
 tru = Tru()
 feedback_results = tru.run_feedback_functions(
-    record=record, feedback_functions=[my_feedback_function_standalone]
+    record=record,
+    feedback_functions=[my_feedback_function_standalone]
 )
 tru.add_feedbacks(feedback_results)
+
diff --git a/trulens_eval/examples/llama_index_quickstart.py b/trulens_eval/examples/llama_index_quickstart.py
index d5f1e982d..05ffc029a 100644
--- a/trulens_eval/examples/llama_index_quickstart.py
+++ b/trulens_eval/examples/llama_index_quickstart.py
@@ -2,11 +2,11 @@
 # coding: utf-8
 
 # # Quickstart
-# 
+#
 # In this quickstart you will create a simple Llama Index App and learn how to log it and get feedback on an LLM response.
 
 # ## Setup
-# 
+#
 # ### Install dependencies
 # Let's install some of the dependencies for this notebook if we don't have them already
 
@@ -17,29 +17,23 @@
 # For this quickstart, you will need Open AI and Huggingface keys
 
 import os
-
 os.environ["OPENAI_API_KEY"] = "..."
 os.environ["HUGGINGFACE_API_KEY"] = "..."
 
 # ### Import from LlamaIndex and TruLens
 
 # Imports main tools:
-from trulens_eval import Feedback
-from trulens_eval import feedback
-from trulens_eval import Tru
-from trulens_eval import TruLlama
-
+from trulens_eval import TruLlama, Feedback, Tru, feedback
 tru = Tru()
 
 # ### Create Simple LLM Application
-# 
+#
 # This example uses LlamaIndex which internally uses an OpenAI LLM.
 
 # LLama Index starter example from: https://gpt-index.readthedocs.io/en/latest/getting_started/starter_example.html
-# In order to run this, download into data/ Paul Graham's Essay 'What I Worked On' from https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/data/paul_graham_essay.txt 
+# In order to run this, download into data/ Paul Graham's Essay 'What I Worked On' from https://github.com/jerryjliu/llama_index/blob/main/examples/paul_graham_essay/data/paul_graham_essay.txt
 
-from llama_index import SimpleDirectoryReader
-from llama_index import VectorStoreIndex
+from llama_index import VectorStoreIndex, SimpleDirectoryReader
 
 documents = SimpleDirectoryReader('data').load_data()
 index = VectorStoreIndex.from_documents(documents)
@@ -74,11 +68,9 @@
 
 # ## Instrument chain for logging with TruLens
 
-tru_query_engine = TruLlama(
-    query_engine,
+tru_query_engine = TruLlama(query_engine,
     app_id='LlamaIndex_App1',
-    feedbacks=[f_lang_match, f_qa_relevance, f_qs_relevance]
-)
+    feedbacks=[f_lang_match, f_qa_relevance, f_qs_relevance])
 
 # Instrumented query engine can operate like the original:
 llm_response = tru_query_engine.query("What did the author do growing up?")
@@ -87,41 +79,41 @@
 
 # ## Explore in a Dashboard
 
-tru.run_dashboard()  # open a local streamlit app to explore
+tru.run_dashboard() # open a local streamlit app to explore
 
 # tru.stop_dashboard() # stop if needed
 
 # Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.
 
 # ### Leaderboard
-# 
+#
 # Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics including cost and average feedback value across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up.
-# 
+#
 # Note: Average feedback values are returned and printed in a range from 0 (worst) to 1 (best).
-# 
+#
 # ![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png)
-# 
+#
 # To dive deeper on a particular chain, click "Select Chain".
-# 
+#
 # ### Understand chain performance with Evaluations
-# 
+#
 # To learn more about the performance of a particular chain or LLM model, we can select it to view its evaluations at the record level. LLM quality is assessed through the use of feedback functions. Feedback functions are extensible methods for determining the quality of LLM responses and can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance and more.
-# 
+#
 # The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.
-# 
+#
 # ![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png)
-# 
+#
 # ### Deep dive into full chain metadata
-# 
+#
 # Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by tru_chain.
-# 
+#
 # ![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png)
-# 
+#
 # If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.
 
 # Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard.
 
 # ## Or view results directly in your notebook
 
-tru.get_records_and_feedback(app_ids=[]
-                            )[0] # pass an empty list of app_ids to get all
+tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all
+
diff --git a/trulens_eval/examples/no_framework_quickstart.py b/trulens_eval/examples/no_framework_quickstart.py
index d2e8bbc5f..fa842a84a 100644
--- a/trulens_eval/examples/no_framework_quickstart.py
+++ b/trulens_eval/examples/no_framework_quickstart.py
@@ -2,7 +2,7 @@
 # coding: utf-8
 
 # # Quickstart
-# 
+#
 # In this quickstart you will create a simple text to text application and learn how to log it and get feedback.
 
 # ## Setup
@@ -10,57 +10,40 @@
 # For this quickstart you will need Open AI and Huggingface keys
 
 import os
-
 os.environ["OPENAI_API_KEY"] = "..."
 os.environ["HUGGINGFACE_API_KEY"] = "..."
 
 import openai
-
 openai.api_key = os.environ["OPENAI_API_KEY"]
 
 # ### Import from TruLens
 
 # Imports main tools:
-from trulens_eval import Feedback
-from trulens_eval import Huggingface
-from trulens_eval import Tru
-
+from trulens_eval import Feedback, Huggingface, Tru
 tru = Tru()
 
 # ### Create Simple Text to Text Application
-# 
+#
 # This example uses a bare bones OpenAI LLM, and a non-LLM just for demonstration purposes.
 
-
 def llm_standalone(prompt):
     return openai.ChatCompletion.create(
-        model="gpt-3.5-turbo",
-        messages=[
-            {
-                "role":
-                    "system",
-                "content":
-                    "You are a question and answer bot, and you answer super upbeat."
-            }, {
-                "role": "user",
-                "content": prompt
-            }
+    model="gpt-3.5-turbo",
+    messages=[
+        {"role": "system", "content": "You are a question and answer bot, and you answer super upbeat."},
+        {"role": "user", "content": prompt}
         ]
     )["choices"][0]["message"]["content"]
-
 
 import hashlib
-
-
 def simple_hash_callable(prompt):
     h = hashlib.shake_256(prompt.encode('utf-8'))
     return str(h.hexdigest(20))
-
 
 # ### Send your first request
 
-prompt_input = "How good is language AI?"
-prompt_output = llm_standalone(prompt_input)
+prompt_input="How good is language AI?"
+prompt_output=llm_standalone(prompt_input)
 prompt_output
 
 simple_hash_callable(prompt_input)
@@ -76,13 +59,8 @@ def simple_hash_callable(prompt):
 # ## Instrument the callable for logging with TruLens
 
 from trulens_eval import TruBasicApp
-
-basic_app = TruBasicApp(
-    llm_standalone, app_id="Happy Bot", feedbacks=[f_sentiment]
-)
-hash_app = TruBasicApp(
-    simple_hash_callable, app_id="Hasher", feedbacks=[f_sentiment]
-)
+basic_app = TruBasicApp(llm_standalone, app_id="Happy Bot", feedbacks=[f_sentiment])
+hash_app = TruBasicApp(simple_hash_callable, app_id="Hasher", feedbacks=[f_sentiment])
 
 response, record = basic_app.call_with_record(prompt_input)
 
@@ -90,7 +68,7 @@ def simple_hash_callable(prompt):
 
 # ## Explore in a Dashboard
 
-tru.run_dashboard()  # open a local streamlit app to explore
+tru.run_dashboard() # open a local streamlit app to explore
 
 # tru.stop_dashboard() # stop if needed
 
@@ -98,5 +76,5 @@ def simple_hash_callable(prompt):
 
 # ## Or view results directly in your notebook
 
-tru.get_records_and_feedback(app_ids=[]
-                            )[0] # pass an empty list of app_ids to get all
+tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all
+
diff --git a/trulens_eval/examples/quickstart.py b/trulens_eval/examples/quickstart.py
index 81e23a464..a92a008d7 100644
--- a/trulens_eval/examples/quickstart.py
+++ b/trulens_eval/examples/quickstart.py
@@ -2,7 +2,7 @@
 # coding: utf-8
 
 # # Quickstart
-# 
+#
 # In this quickstart you will create a simple LLM Chain and learn how to log it and get feedback on an LLM response.
 
 # ## Setup
@@ -10,18 +10,13 @@
 # For this quickstart you will need Open AI and Huggingface keys
 
 import os
-
 os.environ["OPENAI_API_KEY"] = "..."
 os.environ["HUGGINGFACE_API_KEY"] = "..."
 
 # ### Import from LangChain and TruLens
 
 # Imports main tools:
-from trulens_eval import Feedback
-from trulens_eval import Huggingface
-from trulens_eval import Tru
-from trulens_eval import TruChain
-
+from trulens_eval import TruChain, Feedback, Huggingface, Tru
 tru = Tru()
 
 # Imports from langchain to build app. You may need to install langchain first
@@ -29,12 +24,11 @@
 # ! pip install langchain>=0.0.170
 from langchain.chains import LLMChain
 from langchain.llms import OpenAI
-from langchain.prompts.chat import ChatPromptTemplate
+from langchain.prompts.chat import ChatPromptTemplate, PromptTemplate
 from langchain.prompts.chat import HumanMessagePromptTemplate
-from langchain.prompts.chat import PromptTemplate
 
 # ### Create Simple LLM Application
-# 
+#
 # This example uses a LangChain framework and OpenAI LLM
 
 full_prompt = HumanMessagePromptTemplate(
@@ -71,12 +65,10 @@
 
 # ## Instrument chain for logging with TruLens
 
-truchain = TruChain(
-    chain,
+truchain = TruChain(chain,
     app_id='Chain1_ChatApplication',
     feedbacks=[f_lang_match],
-    tags="prototype"
-)
+    tags = "prototype")
 
 # Instrumented chain can operate like the original:
 llm_response = truchain(prompt_input)
@@ -85,41 +77,41 @@
 
 # ## Explore in a Dashboard
 
-tru.run_dashboard()  # open a local streamlit app to explore
+tru.run_dashboard() # open a local streamlit app to explore
 
 # tru.stop_dashboard() # stop if needed
 
 # Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.
 
 # ### Chain Leaderboard
-# 
+#
 # Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics including cost and average feedback value across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up.
-# 
+#
 # Note: Average feedback values are returned and printed in a range from 0 (worst) to 1 (best).
-# 
+#
 # ![Chain Leaderboard](https://www.trulens.org/Assets/image/Leaderboard.png)
-# 
+#
 # To dive deeper on a particular chain, click "Select Chain".
-# 
+#
 # ### Understand chain performance with Evaluations
-# 
+#
 # To learn more about the performance of a particular chain or LLM model, we can select it to view its evaluations at the record level. LLM quality is assessed through the use of feedback functions. Feedback functions are extensible methods for determining the quality of LLM responses and can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance and more.
-# 
+#
 # The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.
-# 
+#
 # ![Evaluations](https://www.trulens.org/Assets/image/Leaderboard.png)
-# 
+#
 # ### Deep dive into full chain metadata
-# 
+#
 # Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by tru_chain.
-# 
+#
 # ![Explore a Chain](https://www.trulens.org/Assets/image/Chain_Explore.png)
-# 
+#
 # If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.
 
 # Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard.
 
 # ## Or view results directly in your notebook
 
-tru.get_records_and_feedback(app_ids=[]
-                            )[0] # pass an empty list of app_ids to get all
+tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all
+
diff --git a/trulens_eval/trulens_eval/__init__.py b/trulens_eval/trulens_eval/__init__.py
index c14432a31..c64055805 100644
--- a/trulens_eval/trulens_eval/__init__.py
+++ b/trulens_eval/trulens_eval/__init__.py
@@ -37,7 +37,7 @@
 - `utils/python.py` `utils/text.py`
 """
 
-__version__ = "0.5.0"
+__version__ = "0.6.0"
 
 from trulens_eval.feedback import Feedback
 from trulens_eval.feedback import Huggingface