nemo-evaluator-sdk

Name: nemo-evaluator-sdk
Author: davila7

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when you need scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with a container-first architecture.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security

How to use

Install the tool with pip: run pip install nemo-evaluator-launcher in your Python environment.
Configure your NVIDIA API key by setting the NGC_API_KEY environment variable to your own value (e.g. export NGC_API_KEY=nvapi-your-key-here).
Create a config.yaml configuration file containing the API endpoint of the model you want to test (e.g. Llama 3.1 8B) and the list of benchmarks to run (such as ifeval, MMLU, GSM8K). Specify an output directory for the results.
Run the evaluation with nemo-evaluator-launcher run --config-dir . --config-name config. The tool automatically downloads the benchmarks and runs the tests against the configured model.
List the available benchmarks and harnesses with nemo-evaluator-launcher ls tasks to choose the ones that fit your needs.
When the evaluation finishes, review the results in the ./results directory. They contain detailed performance metrics for the model on each benchmark.
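
The run step above can be wrapped in a small pre-flight script. The following is a minimal sketch, not part of the tool itself: it assumes only the CLI name, the NGC_API_KEY variable, and the run flags quoted in the steps, while the helper names build_run_command, preflight, and main are hypothetical and exist only for illustration.

```python
import os
import shutil
import subprocess

# CLI name as quoted in the steps above.
LAUNCHER = "nemo-evaluator-launcher"


def build_run_command(config_dir=".", config_name="config"):
    """Return the argv list for the `run` step described in the steps above."""
    return [LAUNCHER, "run", "--config-dir", config_dir, "--config-name", config_name]


def preflight(env=None):
    """Return a list of problems; an empty list means the run can be attempted."""
    env = os.environ if env is None else env
    problems = []
    # The steps require the NVIDIA API key to be exported before running.
    if not env.get("NGC_API_KEY"):
        problems.append("NGC_API_KEY is not set")
    # The launcher must have been installed with pip so that it is on PATH.
    if shutil.which(LAUNCHER) is None:
        problems.append(f"{LAUNCHER} not found on PATH (pip install {LAUNCHER})")
    return problems


def main():
    """Hypothetical entry point: check prerequisites, then launch the evaluation."""
    issues = preflight()
    if issues:
        raise SystemExit("\n".join(issues))
    # check=True raises CalledProcessError if the launcher exits non-zero.
    subprocess.run(build_run_command(), check=True)
```

Calling main() from the directory that holds config.yaml would run the same command as the run step, but fail early with a readable message if the key or the launcher is missing.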
