trulens-evaluation-workflow

Name: trulens-evaluation-workflow
Author: truera

Systematically evaluate your LLM application with TruLens

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Testing
Views: 20

How to use

1. Install the TruLens Evaluation Workflow skill in your Claude/Copilot agent environment.
2. Answer the diagnostic question about your application type: name the framework your system is built on (LangChain, LangGraph/Deep Agents, LlamaIndex, or Custom).
3. Choose the evaluation metric set that fits your case: for RAG applications use the RAG Triad (Context Relevance, Groundedness, Answer Relevance); for agents use Agent GPA (Tool Selection, Tool Calling, Execution Efficiency); for simple applications Answer Relevance is enough.
4. If your agent includes an explicit planning stage, enable the additional Plan Quality and Adherence metrics.
5. Optionally add supplementary metrics such as Coherence, Conciseness, or Harmlessness if you want a deeper evaluation.
6. The skill then walks you through code instrumentation, test data curation, and feedback function configuration so you can run a full evaluation cycle on your LLM application.
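The metric choices in steps 3 to 5 can be sketched as a small lookup. This is only an illustration of the decision logic, not part of the skill or of the TruLens API: the function name `select_metrics`, its parameters, and the `"rag"` / `"agent"` / `"simple"` labels are hypothetical, and the metric names are plain strings taken from the steps above.

```python
# Hypothetical helper: maps an application type to the metric names listed
# in the steps above. It does not call TruLens; it only returns names.

RAG_TRIAD = ["Context Relevance", "Groundedness", "Answer Relevance"]
AGENT_GPA = ["Tool Selection", "Tool Calling", "Execution Efficiency"]
PLANNING_METRICS = ["Plan Quality", "Adherence"]
OPTIONAL_METRICS = {"Coherence", "Conciseness", "Harmlessness"}


def select_metrics(app_kind, has_planning=False, extras=()):
    """Return the ordered list of metric names for an application.

    app_kind: "rag", "agent", or "simple" (assumed labels).
    has_planning: only applies to agents with an explicit planning stage.
    extras: optional supplementary metrics to append.
    """
    if app_kind == "rag":
        metrics = list(RAG_TRIAD)
    elif app_kind == "agent":
        metrics = list(AGENT_GPA)
        if has_planning:
            metrics += PLANNING_METRICS
    elif app_kind == "simple":
        metrics = ["Answer Relevance"]
    else:
        raise ValueError(f"unknown application kind: {app_kind!r}")

    for extra in extras:
        if extra not in OPTIONAL_METRICS:
            raise ValueError(f"unknown supplementary metric: {extra!r}")
        if extra not in metrics:  # avoid listing a metric twice
            metrics.append(extra)
    return metrics
```

For example, `select_metrics("agent", has_planning=True, extras=["Coherence"])` returns the three Agent GPA metrics followed by Plan Quality, Adherence, and Coherence. Each returned name would still need to be wired to an actual feedback function during the configuration step.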
