agentic-eval

Name: agentic-eval
Author: github

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: github
Category: Testing
Views: 38

About this skill

Patterns and techniques for evaluating and improving AI agent outputs. Use this skill when:
- Implementing self-critique and reflection loops
- Building evaluator-optimizer pipelines for quality-critical generation
- Creating test-driven code refinement workflows
- Designing rubric-based or LLM-as-judge evaluation systems
- Adding iterative improvement to agent outputs (code, reports, analysis)
- Measuring and improving agent response quality

How to use

1. Install the skill in your agent environment by importing the agentic-eval module from the GitHub repository.
2. Define the evaluation criteria for your task: a list of concrete conditions the output must satisfy (e.g. "the code must be free of syntax errors", "the report must include a summary").
3. Configure the reflection loop by passing it the task, the list of criteria, and a maximum number of iterations (typically 2-3). The agent first generates an output, then evaluates it.
4. In each iteration the agent compares its output against the criteria and receives PASS/FAIL feedback for each condition. If every criterion passes, the process ends.
5. If any criterion fails, the agent analyzes the feedback and automatically revises the output to address the gaps identified.
6. Repeat steps 4-5 until the output fully meets the criteria or the maximum number of iterations is reached. Return the final, improved output.
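The loop described in the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's actual API: the names `Criterion`, `evaluate`, and `reflection_loop` are hypothetical, and the `draft` and `add_missing` functions are plain stand-ins for the model calls an agent would make when generating and revising.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Criterion:
    """Hypothetical criterion: a name plus a check returning True for PASS."""
    name: str
    check: Callable[[str], bool]


def evaluate(output: str, criteria: List[Criterion]) -> Dict[str, str]:
    """Return PASS/FAIL feedback for each criterion."""
    return {c.name: "PASS" if c.check(output) else "FAIL" for c in criteria}


def reflection_loop(
    task: str,
    criteria: List[Criterion],
    generate: Callable[[str], str],
    revise: Callable[[str, str, List[str]], str],
    max_iterations: int = 3,
) -> Tuple[str, Dict[str, str]]:
    """Generate once, then revise up to max_iterations times until all checks pass."""
    output = generate(task)
    feedback = evaluate(output, criteria)
    for _ in range(max_iterations):
        if all(verdict == "PASS" for verdict in feedback.values()):
            break  # every criterion is met, so stop early
        failed = [name for name, verdict in feedback.items() if verdict == "FAIL"]
        output = revise(task, output, failed)
        feedback = evaluate(output, criteria)
    return output, feedback


# Stand-ins for the agent's generate and revise steps (assumptions for the demo).
def draft(task: str) -> str:
    return "Findings: latency rose this week."


def add_missing(task: str, output: str, failed: List[str]) -> str:
    if "has summary" in failed:
        output = "Summary: " + task + "\n" + output
    return output


criteria = [
    Criterion("has summary", lambda text: text.startswith("Summary:")),
    Criterion("non-empty", lambda text: bool(text.strip())),
]

final_output, report = reflection_loop(
    "weekly latency report", criteria, draft, add_missing, max_iterations=3
)
```

In this sketch the first draft fails the "has summary" check, one revision fixes it, and the loop exits before the iteration limit. Keeping the checks as plain functions means the same loop could use either deterministic tests or a judge model's verdict behind `check`.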

Related skills

nextjs-developer

by zenobi-us

Expert Next.js developer mastering Next.js 14+ with App Router and full-stack features. Specializes in server components, server actions, performance optimization, and production deployment with focus on building fast, SEO-friendly applications.

Testing

166226

vitest

by antfu

Vitest fast unit testing framework powered by Vite with Jest-compatible API. Use when writing tests, mocking, configuring coverage, or working with test filtering and fixtures.

Testing

1236

playwright-cli

by microsoft

Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.

Testing

45103

polymarket-trader

by openclaw

Query Polymarket prediction markets - trending events, crypto, politics, sports, and search

Testing

14142

performing-penetration-testing

by jeremylongshore

This skill enables automated penetration testing of web applications. It uses the penetration-tester plugin to identify vulnerabilities, including OWASP Top 10 threats, and suggests exploitation techniques. Use this skill when the user requests a \

Testing

1546

creating-financial-models

by anthropics

This skill provides an advanced financial modeling suite with DCF analysis, sensitivity testing, Monte Carlo simulations, and scenario planning for investment decisions

Testing

25137