Toolverse

llama-cpp

by zechenzhangAGI

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
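
The memory reduction from quantization can be approximated with simple arithmetic. The sketch below is illustrative only: the helper name `weight_memory_gb` is hypothetical, the 7-billion parameter count is an assumed example, and the formula ignores GGUF metadata, per-tensor mixed precision, and the runtime KV cache, so real file sizes and memory use will be somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Idealized size of the model weights in gigabytes (10^9 bytes).

    Assumes every weight is stored at exactly `bits_per_weight` bits,
    which real quantization schemes only approximate.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 7e9  # assumed example: a 7B-parameter model
    # 16 bits is a typical unquantized baseline; 8 and 1.5 are the
    # ends of the quantization range quoted in the description.
    for bits in (16, 8, 4, 1.5):
        print(f"{bits:>4} bits/weight: {weight_memory_gb(n_params, bits):.2f} GB")
```

Under these assumptions, halving the bits per weight halves the weight footprint, which is why a 4-bit variant of a model can fit on hardware that cannot hold the 16-bit original.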

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Category: Security
Views: 252

How to use

  1. Install llama.cpp: on macOS/Linux, run brew install llama.cpp, or clone the repository from GitHub (github.com/ggerganov/llama.cpp) and run make. On a Mac with Apple Silicon, add the LLAMA_METAL=1 flag; for an AMD GPU, use LLAMA_HIP=1. Build option names have changed between llama.cpp releases, so check the build instructions in your checkout if a flag is not recognized.

  2. Download a model in GGUF format from HuggingFace, e.g. Llama-2-7B-Chat-GGUF. Use the huggingface-cli download command, passing the model name and the quantization variant (e.g. Q4_K_M). Save models in the models/ directory.

  3. Run a simple inference: use llama-cli with the -m flag pointing to the model path, -p with the question or instruction, and -n setting the maximum number of response tokens (e.g. 256).

  4. For an interactive conversation, add the --interactive flag, which lets you ask multiple questions without restarting the program.

  5. For advanced use, run server mode, which exposes the model through an API; details are in the README documentation.
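
Steps 2 to 4 can be sketched as a small Python helper that assembles the command lines. This is a minimal sketch, not the skill's own tooling: the function names are hypothetical, and the repository owner (`TheBloke`) and file name (`llama-2-7b-chat.Q4_K_M.gguf`) are assumptions filled in for illustration, since the steps above name only the model and the quantization variant. The script prints the commands instead of running them, so it works without llama.cpp installed.

```python
import shlex

# Assumed values for illustration; substitute the repository and file you use.
REPO_ID = "TheBloke/Llama-2-7B-Chat-GGUF"   # assumption: repository owner
GGUF_FILE = "llama-2-7b-chat.Q4_K_M.gguf"   # assumption: Q4_K_M file name
MODEL_DIR = "models"


def download_cmd(repo_id=REPO_ID, filename=GGUF_FILE, local_dir=MODEL_DIR):
    """Step 2: download one GGUF file into the models/ directory."""
    return ["huggingface-cli", "download", repo_id, filename,
            "--local-dir", local_dir]


def prompt_cmd(model_path, prompt, max_tokens=256):
    """Step 3: single-prompt inference with a cap on generated tokens."""
    return ["llama-cli", "-m", model_path, "-p", prompt, "-n", str(max_tokens)]


def interactive_cmd(model_path):
    """Step 4: interactive session using the flag named in the steps."""
    return ["llama-cli", "-m", model_path, "--interactive"]


if __name__ == "__main__":
    model_path = f"{MODEL_DIR}/{GGUF_FILE}"
    commands = (
        download_cmd(),
        prompt_cmd(model_path, "Explain GGUF in one sentence."),
        interactive_cmd(model_path),
    )
    for cmd in commands:
        # shlex.join quotes arguments so each line can be pasted into a shell.
        print(shlex.join(cmd))
```

Keeping each command as an argument list means it can be passed directly to `subprocess.run` once the tools are installed, without shell quoting concerns. Flag availability may differ between llama.cpp versions, so `llama-cli --help` is the authority for your build.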
