eval-harness

Name: eval-harness
Author: affaan-m


Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles

Installation

Pick a client and clone the repository into its skills directory.


Quick info

Category: Data Science
Views: 27



How to use

1. Activate the eval-harness skill in a Claude Code session. It then has access to tools for reading, writing, and editing files and for running bash commands.
2. Define a capability eval before implementing anything: create a markdown block [CAPABILITY EVAL: feature-name] containing a task description, a checklist of success criteria, and the expected output.
3. For each eval, choose a grader type: code-based (bash, grep, npm test) for checking deterministic conditions, or model-based (Claude) for judging open-ended outputs. Then write the bash commands or the evaluation prompt.
4. Run evals continuously during development: after every code change, check that the capability evals pass and that the regression evals have not dropped below their previous score.
5. Track results in the format X/Y passed. If a regression appears, fix it immediately instead of moving on.
6. Use pass@k metrics to measure agent reliability. pass@k is the probability that at least one of k attempts passes the eval. If an eval passes in 8 out of 10 attempts, its per-attempt success rate (the pass@1 estimate) is 0.8; the higher these scores, the more reliable the agent.
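The workflow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's own implementation: the names CapabilityEval, shell_grader, run_suite, format_report, is_regression and pass_at_k are hypothetical; only code-based graders are modeled (a model-based grader would call Claude and is left out); and a regression is assumed to mean that fewer evals pass than in the previous run. pass_at_k uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k) for the probability that at least one of k attempts succeeds, given c successes in n recorded attempts.

```python
# Hypothetical eval-harness sketch: names and structure are illustrative assumptions,
# not the skill's actual implementation.
import subprocess
from collections.abc import Callable
from dataclasses import dataclass, field
from math import comb


@dataclass
class CapabilityEval:
    """Mirrors a [CAPABILITY EVAL: feature-name] block: task, checklist, expected output."""
    name: str
    task: str
    criteria: list[str]
    expected: str
    # Code-based grader: returns True when the deterministic check passes.
    grader: Callable[[], bool] = field(repr=False)


def shell_grader(cmd: list[str]) -> Callable[[], bool]:
    """Code-based grader: passes iff the command (e.g. ["npm", "test"]) exits with status 0."""
    def run() -> bool:
        return subprocess.run(cmd, capture_output=True).returncode == 0
    return run


def run_suite(evals: list[CapabilityEval]) -> tuple[int, int]:
    """Run each eval once and return (passed, total)."""
    passed = sum(1 for e in evals if e.grader())
    return passed, len(evals)


def format_report(result: tuple[int, int]) -> str:
    """Render a result in the 'X/Y passed' format."""
    passed, total = result
    return f"{passed}/{total} passed"


def is_regression(current: tuple[int, int], previous: tuple[int, int]) -> bool:
    """Assumed rule: a regression means fewer evals pass than in the previous run."""
    return current[0] < previous[0]


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k (probability that at least one of k attempts
    succeeds) from n recorded attempts, c of which succeeded."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the attempts contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    suite = [
        CapabilityEval("always-ok", "demo", ["exit 0"], "pass", lambda: True),
        CapabilityEval("always-fails", "demo", ["exit 0"], "pass", lambda: False),
    ]
    result = run_suite(suite)
    print(format_report(result))
    print(is_regression(result, previous=(2, 2)))
    print(pass_at_k(10, 8, 1), pass_at_k(10, 8, 10))
```

With 8 successes in 10 attempts, pass_at_k(10, 8, 1) is 0.8, the per-attempt success rate, while pass_at_k(10, 8, 10) is 1.0 because a set of all 10 attempts necessarily includes a success. In practice, shell_grader(["npm", "test"]) would plug an existing test command in as a code-based grader.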

Related skills

skill-installer

by openai

Install Codex skills into $CODEX_HOME/skills from a curated list or a GitHub repo path. Use when a user asks to list installable skills, install a curated skill, or install a skill from another repo (including private repos).

Data Science

23118

nano-banana-pro

by garg-aayush

Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, edit, modify, change, alter, or update images. Also use when user references an existing image file and asks to modify it in any way (e.g.,

Data Science

535772

docx

by anthropics

Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content,

Data Science

39142

skill-creator

by anthropics

Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Claude's capabilities with specialized knowledge, workflows, or tool integrations.

Data Science

59147

threejs

by mrgoonie

Build 3D web apps with Three.js (WebGL/WebGPU). Use for 3D scenes, animations, custom shaders, PBR materials, VR/XR experiences, games, data visualizations, product configurators.

Data Science

1743

arxiv-search

by langchain-ai

Search arXiv preprint repository for papers in physics, mathematics, computer science, quantitative biology, and related fields

Data Science

76172