phoenix-evals

Name: phoenix-evals
Author: Arize-ai

Build and run evaluators for AI/LLM applications using Phoenix.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: Arize-ai
Category: Security
Views: 19

How to use

Install Phoenix Evals for your chosen language by running the setup-python or setup-typescript script available in the documentation.
Define what you want to evaluate: review the evaluators-overview section to choose the metrics and evaluation criteria that fit your application.
Choose the model that will act as the judge (judge model): consult the guidelines in fundamentals-model-selection to pick a suitable LLM.
Build an evaluator: use pre-built evaluators if they fit your needs, or create your own code-based evaluator (evaluators-code) or LLM-based evaluator (evaluators-llm) for more complex scenarios.
Validate the evaluator's accuracy: run validation-evaluators to check that your evaluator agrees with human ratings and behaves reliably.
Run experiments on your data: use evaluate-dataframe to process large datasets, or experiments-running to run a full experiment with results analysis, and error-analysis to identify problems.
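
The build and validate steps above can be sketched in plain Python. This is a minimal illustration and does not use the Phoenix Evals API: the evaluator `contains_citation`, the helper `validate`, the "pass"/"fail" labels, and the sample data are all hypothetical names and values chosen for the example. It shows a code-based evaluator being compared against human labels, which is the idea behind the validation step.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Tuple


def contains_citation(output: str) -> str:
    """Hypothetical code-based evaluator (not part of Phoenix).

    Returns "pass" if the output contains a non-empty bracketed
    reference such as [1] or [docs], otherwise "fail".
    """
    return "pass" if re.search(r"\[[^\]]+\]", output) else "fail"


def validate(
    evaluator: Callable[[str], str],
    labeled: Iterable[Tuple[str, str]],
) -> Tuple[float, Counter]:
    """Compare evaluator verdicts with human labels.

    `labeled` yields (output, human_label) pairs. Returns the fraction
    of examples where the evaluator agrees with the human, plus a
    Counter keyed by (human_label, evaluator_label).
    """
    pairs = list(labeled)
    if not pairs:
        raise ValueError("need at least one labeled example")
    confusion = Counter((human, evaluator(output)) for output, human in pairs)
    agreed = sum(n for (human, pred), n in confusion.items() if human == pred)
    return agreed / len(pairs), confusion


# Illustrative hand-labeled examples (made up for this sketch).
LABELED = [
    ("Paris is the capital of France [1].", "pass"),
    ("Paris is the capital of France.", "fail"),
    ("See [docs] for details.", "pass"),
    ("Trust me, it works.", "fail"),
    ("As noted by Smith (2020), this holds.", "pass"),  # human accepts this style
]

agreement, confusion = validate(contains_citation, LABELED)
print(f"agreement: {agreement:.2f}")
for (human, pred), count in sorted(confusion.items()):
    print(f"human={human} evaluator={pred}: {count}")
```

Inspecting the disagreement cells of the confusion counts, rather than only the overall agreement rate, points to where the evaluator's criteria differ from the human's; here the parenthetical citation is accepted by the human but rejected by the evaluator.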

Related skills

accessibility-compliance

by wshobson

Implement WCAG 2.2 compliant interfaces with mobile accessibility, inclusive design patterns, and assistive technology support. Use when auditing accessibility, implementing ARIA patterns, building for screen readers, or ensuring inclusive user experiences.

Security

2173

obsidian

by gapmiss

Comprehensive guidelines for Obsidian.md plugin development including all 27 ESLint rules, TypeScript best practices, memory management, API usage (requestUrl vs fetch), UI/UX standards, and submission requirements. Use when working with Obsidian plugins, main.ts files,

Security

14111

youtube-watcher

by openclaw

Fetch and read transcripts from YouTube videos. Use when you need to summarize a video, answer questions about its content, or extract information from it.

Security

2231

payload

by payloadcms

Use when working with Payload CMS projects (payload.config.ts, collections, fields, hooks, access control, Payload API). Use when debugging validation errors, security issues, relationship queries, transactions, or hook behavior.

Security

50171

better-auth-best-practices

by novuhq

Skill for integrating Better Auth - the comprehensive TypeScript authentication framework.

Security

1148

zendesk

by vm0-ai

Zendesk Support REST API for managing tickets, users, organizations, and support operations. Use this skill to create tickets, manage users, search, and automate customer support workflows.

Security

11100