The Evaluation & Testing framework for LLMs & ML models
Control risks of performance, bias and security issues in AI models
Install Giskard 🐢
Install the latest version of Giskard from PyPI using pip:
pip install "giskard[llm]" -U
We officially support Python 3.9, 3.10 and 3.11.
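If you want a script or notebook to fail fast on an unsupported interpreter, a small guard like the one below is enough. It is a sketch that hard-codes the 3.9 to 3.11 range stated above (an assumption that will go stale as official support changes) and uses only the standard library; is_supported is a hypothetical helper, not part of Giskard.

```python
import sys

# Supported range as stated in this README (inclusive); update if support changes.
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version falls inside the supported range."""
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if __name__ == "__main__":
    if not is_supported():
        print(
            f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
            "is outside the officially supported range (3.9 to 3.11)."
        )
```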
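Giskard is an open-source Python library that automatically detects performance, bias & security issues in AI applications. The library covers LLM-based applications such as RAG agents, all the way to traditional ML models for tabular data.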
Scan: Automatically assess your LLM-based agents for performance, bias & security issues ⤵️
Issues detected include:
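- Hallucinations
- Harmful content generation
- Prompt injection
- Robustness issues
- Sensitive information disclosure
- Stereotypes & discrimination
- many more...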
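To make the "robustness issues" item more concrete, here is a toy probe that re-asks each question with a meaning-preserving change (casing, spacing) and flags answers that change. This is not how Giskard's scan is implemented: the perturbations, robustness_probe and the brittle_model stub are all hypothetical, and predict here takes a plain list of strings rather than the DataFrame that Giskard's wrapper expects.

```python
from typing import Callable, Dict, List

# Hypothetical meaning-preserving perturbations (illustration only).
PERTURBATIONS: Dict[str, Callable[[str], str]] = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "extra_whitespace": lambda s: "  " + s.replace(" ", "  ") + "  ",
}


def robustness_probe(
    predict: Callable[[List[str]], List[str]], questions: List[str]
) -> List[dict]:
    """Return one record per (perturbation, question) pair whose answer changed."""
    baseline = predict(questions)
    issues = []
    for name, perturb in PERTURBATIONS.items():
        answers = predict([perturb(q) for q in questions])
        for question, before, after in zip(questions, baseline, answers):
            if before != after:
                issues.append({"perturbation": name, "question": question,
                               "original": before, "perturbed": after})
    return issues


def brittle_model(questions: List[str]) -> List[str]:
    """Hypothetical model that (wrongly) depends on the exact casing of 'CO2'."""
    return ["yes" if "CO2" in q else "no" for q in questions]


if __name__ == "__main__":
    for issue in robustness_probe(brittle_model, ["Does CO2 warm the planet?"]):
        print(issue)
```

Exact string comparison only suits deterministic toy models; real LLM answers vary in wording, so a practical check would compare meaning rather than raw text.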
RAG Evaluation Toolkit (RAGET): Automatically generate evaluation datasets & evaluate RAG application answers ⤵️
If you're testing a RAG application, you can get an even more in-depth assessment using RAGET, Giskard's RAG Evaluation Toolkit.
- RAGET can automatically generate a list of question, reference_answer and reference_context from the knowledge base of the RAG. You can then use this generated test set to evaluate your RAG agent.
- RAGET computes scores for each component of the RAG agent. The scores are computed by aggregating the correctness of the agent’s answers on different question types.
- Here is the list of components evaluated with RAGET:
  - Generator: the LLM used inside the RAG to generate the answers
  - Retriever: fetches relevant documents from the knowledge base according to a user query
  - Rewriter: rewrites the user query to make it more relevant to the knowledge base or to account for chat history
  - Router: filters the user's query based on their intent
  - Knowledge Base: the set of documents given to the RAG to generate the answers
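The idea of scoring each component by aggregating answer correctness over question types can be sketched as follows. The question type names and the type-to-component mapping in COMPONENTS_BY_QUESTION_TYPE are illustrative assumptions, not RAGET's actual configuration, and component_scores is a hypothetical helper rather than part of the Giskard API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative assumption: which RAG components each question type exercises.
# RAGET's real mapping may differ; check the Giskard documentation.
COMPONENTS_BY_QUESTION_TYPE: Dict[str, Tuple[str, ...]] = {
    "simple": ("Generator", "Retriever", "Knowledge Base"),
    "complex": ("Generator", "Knowledge Base"),
    "distracting": ("Generator", "Retriever", "Rewriter"),
    "situational": ("Generator",),
    "conversational": ("Rewriter",),
}


def component_scores(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Average answer correctness per component.

    results holds one (question_type, is_correct) pair per test question.
    A component's score is the fraction of correct answers among the
    questions whose type exercises that component.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for question_type, is_correct in results:
        for component in COMPONENTS_BY_QUESTION_TYPE.get(question_type, ()):
            total[component] += 1
            correct[component] += int(is_correct)
    return {component: correct[component] / total[component] for component in total}


if __name__ == "__main__":
    graded = [("simple", True), ("simple", False), ("complex", True), ("conversational", False)]
    print(component_scores(graded))
```

With this mapping a component is only scored if at least one question type exercises it, so the Router receives no score here; the real toolkit may map or weight question types differently.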
Giskard works with any model, in any environment and integrates seamlessly with your favorite tools ⤵️
Contents
- 🤸♀️ Quickstart
- 👋 Community
🤸♀️ Quickstart
1. 🏗️ Build an LLM agent
Let's build an agent that answers questions about climate change, based on the 2023 Climate Change Synthesis Report by the IPCC.
Before starting, let's install the required libraries (LangChain's FAISS vector store also needs the faiss-cpu package):
pip install langchain tiktoken faiss-cpu "pypdf<=3.17.0"
from langchain import OpenAI, FAISS, PromptTemplate
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Prepare vector store (FAISS) with IPCC report
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, add_start_index=True)
loader = PyPDFLoader("https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf")
db = FAISS.from_documents(loader.load_and_split(text_splitter), OpenAIEmbeddings())
# Prepare QA chain
PROMPT_TEMPLATE = """You are the Climate Assistant, a helpful AI assistant made by Giskard.
Your task is to answer common questions on climate change.
You will be given a question and relevant excerpts from the IPCC Climate Change Synthesis Report (2023).
Please provide short and clear answers based on the provided context. Be polite and helpful.
Context:
{context}
Question:
{question}
Your answer:
"""
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["question", "context"])
climate_qa_chain = RetrievalQA.from_llm(llm=llm, retriever=db.as_retriever(), prompt=prompt)
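For intuition about what the chain above does at query time, the sketch below mimics its three steps with the standard library only: split the report into overlapping chunks, retrieve the chunks most similar to the question, and stuff them into the prompt's {context} slot. It rests on simplifying assumptions: fixed-size character windows stand in for RecursiveCharacterTextSplitter (which tries separators such as paragraph and line breaks first), word-count cosine similarity stands in for OpenAI embeddings plus FAISS, and TEMPLATE is a trimmed version of PROMPT_TEMPLATE. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

# Trimmed stand-in for PROMPT_TEMPLATE above (same {context} and {question} slots).
TEMPLATE = "Context:\n{context}\nQuestion:\n{question}\nYour answer:\n"


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 4) -> List[str]:
    """Return the k chunks most similar to the question (ties keep input order)."""
    q = vectorize(question)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(question: str, chunks: List[str], k: int = 4) -> str:
    """Stuff the retrieved chunks into the template's {context} slot."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    docs = [
        "Sea levels are rising due to warming.",
        "The report was published in 2023.",
        "Methane is a greenhouse gas.",
    ]
    print(build_prompt("Why are sea levels rising?", docs, k=1))
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk, which is the same motivation behind chunk_overlap=100 in the splitter above.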
2. 🔎 Scan your model for issues
Next, wrap your agent to prepare it for Giskard's scan:
import giskard
import pandas as pd
def model_predict(df: pd.DataFrame):
    """Wraps the LLM call in a simple Python function.
    The function takes a pandas.DataFrame containing the input variables needed
    by your model, and must return a list of the outputs (one for each row).
    """
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]
# Don’t forget to fill the `name` and `description`: they are used by Giskard
# to generate domain-specific tests.
giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change Question Answering",
    description="This model answers any question about climate change based on IPCC reports",
    feature_names=["question"],
)
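Before running a full scan, it can help to check offline that the wrapper honours the contract in its docstring (one output per input row). The sketch below swaps in a hypothetical FakeChain for climate_qa_chain so no API key is needed, and uses a plain dict in place of the DataFrame; a real pandas.DataFrame should behave the same here because the wrapper only reads df["question"]. check_one_output_per_row is also hypothetical. Run it in a scratch session, since it redefines climate_qa_chain and model_predict.

```python
from typing import Callable, List


class FakeChain:
    """Hypothetical offline stand-in for climate_qa_chain (no API key needed)."""

    def run(self, inputs: dict) -> str:
        # Echo the query so each output can be traced back to its input row.
        return f"Answer to: {inputs['query']}"


climate_qa_chain = FakeChain()


def model_predict(df) -> List[str]:
    """Same body as the wrapper above: one output per row of df["question"]."""
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]


def check_one_output_per_row(predict: Callable, df) -> None:
    """Raise ValueError if predict(df) breaks the one-string-per-row contract."""
    outputs = predict(df)
    n_rows = len(df["question"])
    if not isinstance(outputs, list):
        raise ValueError(f"expected a list, got {type(outputs).__name__}")
    if len(outputs) != n_rows:
        raise ValueError(f"expected {n_rows} outputs, got {len(outputs)}")
    if not all(isinstance(o, str) for o in outputs):
        raise ValueError("every output should be a string")


if __name__ == "__main__":
    sample = {"question": ["What is the IPCC?", "Is sea level rising?"]}
    check_one_output_per_row(model_predict, sample)
    print(model_predict(sample))
```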
✨✨✨Then run Giskard's magical scan✨✨✨
scan_results = giskard.scan(giskard_model)
Once the scan completes, you can display the results directly in your notebook:
display(scan_results)
# Or save it to a file
scan_results.to_html("scan_results.html")
If you're fa


