The Evaluation & Testing framework for LLMs & ML models

Control risks of performance, bias and security issues in AI models

 

 

Install Giskard 🐢

 

Install the latest version of Giskard from PyPI using pip:

pip install "giskard[llm]" -U

We officially support Python 3.9, 3.10 and 3.11.

Try in Colab 📙

 

Open Colab notebook


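Giskard is an open-source Python library that automatically detects performance, bias and security issues in AI applications. The library covers LLM-based applications such as RAG agents, all the way to traditional ML models for tabular data.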

Scan: Automatically assess your LLM-based agents for performance, bias & security issues ⤵️

 

Issues detected include:

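  • Hallucinations
  • Harmful content generation
  • Prompt injection
  • Robustness issues
  • Sensitive information disclosure
  • Stereotypes & discrimination
  • many more...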

Scan Example

RAG Evaluation Toolkit (RAGET): Automatically generate evaluation datasets & evaluate RAG application answers ⤵️

 

If you're testing a RAG application, you can get an even more in-depth assessment using RAGET, Giskard's RAG Evaluation Toolkit.

  • RAGET can automatically generate a list of question, reference_answer and reference_context entries from the knowledge base of the RAG. You can then use this generated test set to evaluate your RAG agent.

  • RAGET computes scores for each component of the RAG agent. The scores are computed by aggregating the correctness of the agent’s answers on different question types.

    • Here is the list of components evaluated with RAGET:
      • Generator: the LLM used inside the RAG to generate the answers
      • Retriever: fetches relevant documents from the knowledge base according to a user query
      • Rewriter: rewrites the user query to make it more relevant to the knowledge base or to account for chat history
      • Router: filters the user's query based on their intent
      • Knowledge Base: the set of documents given to the RAG to generate the answers
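The per-component aggregation described above can be pictured with a small, self-contained sketch. This is not RAGET's implementation: the mapping from question types to components, the sample results and the helper name `component_scores` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical mapping: which components each question type is assumed to exercise.
QUESTION_TYPE_TO_COMPONENTS = {
    "simple": ["Generator", "Retriever", "Router"],
    "complex": ["Generator"],
    "distracting element": ["Generator", "Retriever", "Rewriter"],
    "conversational": ["Rewriter"],
}


def component_scores(results):
    """Average answer correctness per component.

    `results` is a list of (question_type, is_correct) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # component -> [correct, seen]
    for question_type, is_correct in results:
        for component in QUESTION_TYPE_TO_COMPONENTS.get(question_type, []):
            totals[component][0] += int(is_correct)
            totals[component][1] += 1
    return {name: correct / seen for name, (correct, seen) in totals.items()}


# Made-up evaluation results, for illustration only.
sample_results = [
    ("simple", True),
    ("simple", False),
    ("complex", True),
    ("conversational", False),
]
scores = component_scores(sample_results)
print(scores)
```

A low score for one component (here the hypothetical Rewriter) points at where to look first, which is the practical value of scoring by component rather than reporting one overall accuracy.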

Test Suite Example

Giskard works with any model, in any environment and integrates seamlessly with your favorite tools ⤵️

 

🤸‍♀️ Quickstart

 

1. 🏗️ Build an LLM agent

 

Let's build an agent that answers questions about climate change, based on the 2023 Climate Change Synthesis Report by the IPCC.

Before starting, let's install the required libraries:

pip install langchain tiktoken "pypdf<=3.17.0"

from langchain import OpenAI, FAISS, PromptTemplate
from langchain.embeddings import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Prepare vector store (FAISS) with IPCC report
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, add_start_index=True)
loader = PyPDFLoader("https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf")
db = FAISS.from_documents(loader.load_and_split(text_splitter), OpenAIEmbeddings())

# Prepare QA chain
PROMPT_TEMPLATE = """You are the Climate Assistant, a helpful AI assistant made by Giskard.
Your task is to answer common questions on climate change.
You will be given a question and relevant excerpts from the IPCC Climate Change Synthesis Report (2023).
Please provide short and clear answers based on the provided context. Be polite and helpful.

Context:
{context}

Question:
{question}

Your answer:
"""

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["question", "context"])
climate_qa_chain = RetrievalQA.from_llm(llm=llm, retriever=db.as_retriever(), prompt=prompt)
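
To get a feel for what the chain sends to the LLM, the template can be filled by hand. This is only a sketch: it assumes the retrieved excerpts are simply joined and substituted into `{context}`, the excerpt strings and the `render_prompt` helper are made up for illustration, and the template is abbreviated so the snippet runs on its own without an API key.

```python
# Abbreviated copy of the prompt template, so the sketch is self-contained.
PROMPT_TEMPLATE = """You are the Climate Assistant, a helpful AI assistant made by Giskard.

Context:
{context}

Question:
{question}

Your answer:
"""


def render_prompt(question, excerpts):
    """Join retrieved excerpts and substitute both template variables."""
    context = "\n\n".join(excerpts)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Hypothetical excerpts standing in for what the retriever would return.
excerpts = [
    "Excerpt A about observed warming.",
    "Excerpt B about emission pathways.",
]
rendered = render_prompt("Is sea level rise avoidable?", excerpts)
print(rendered)
```

Printing the rendered prompt is a quick way to check that both input variables, `question` and `context`, are present in the template before spending tokens on a real call.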

2. 🔎 Scan your model for issues

 

Next, wrap your agent to prepare it for Giskard's scan:

import giskard
import pandas as pd

def model_predict(df: pd.DataFrame):
    """Wraps the LLM call in a simple Python function.

    The function takes a pandas.DataFrame containing the input variables needed
    by your model, and must return a list of the outputs (one for each row).
    """
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]

# Don't forget to fill in the `name` and `description`: they are used by Giskard
# to generate domain-specific tests.
giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change Question Answering",
    description="This model answers any question about climate change based on IPCC reports",
    feature_names=["question"],
)
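
Before running a scan, it can help to check the wrapper's contract offline: one DataFrame in, one list of outputs out, with one entry per row. The sketch below assumes a hypothetical `StubChain` class standing in for `climate_qa_chain`, so no OpenAI call is made; the sample questions are made up.

```python
import pandas as pd


class StubChain:
    """Hypothetical stand-in for `climate_qa_chain`; makes no API calls."""

    def run(self, inputs):
        return f"(stub answer to: {inputs['query']})"


climate_qa_chain = StubChain()


def model_predict(df: pd.DataFrame):
    """Same shape as the wrapper above: one output per input row."""
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]


# The column name must match the entry in `feature_names`.
questions = pd.DataFrame(
    {"question": ["What causes sea level rise?", "Is warming human-caused?"]}
)
outputs = model_predict(questions)
print(outputs)
```

If the returned list is shorter or longer than the DataFrame, the wrapper is not returning one output per row as the docstring above requires.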

✨✨✨Then run Giskard's magical scan✨✨✨

scan_results = giskard.scan(giskard_model)

Once the scan completes, you can display the results directly in your notebook:

display(scan_results)

# Or save it to a file
scan_results.to_html("scan_results.html")

If you're fa