speculative-decoding

by davila7

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: davila7
Category: Security
Views: 1

About this skill

Accelerate LLM inference using speculative decoding, Medusa's multiple decoding heads, and lookahead decoding. Use when optimizing inference speed (a reported 1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.

How to use

  1. Install the required libraries: pip install transformers accelerate.
  2. For Medusa (multi-head decoding), clone the repository: git clone https://github.com/FasterDecoding/Medusa, change into the directory, and run pip install -e .
  3. For Lookahead Decoding, clone https://github.com/hao-ai-lab/LookaheadDecoding, change into the folder, and install the package with pip install -e .
  4. Optionally, install vLLM for advanced serving: pip install vllm.
  5. Load the target model (large, slow) and the draft model (small, fast) with AutoModelForCausalLM from transformers.
  6. Run speculative decoding by passing both models to the generation function. The reported speedup is 1.5-3.6×, with no changes to the model architecture; the actual gain depends on how often the target model accepts the draft model's tokens.
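The draft-and-verify loop behind step 6 can be sketched in plain Python. This is a toy illustration, not the skill's own code: `target_model` and `draft_model` are hypothetical stand-ins (simple deterministic functions rather than real LLMs), and `speculative_decode` and its parameter `k` are names chosen for this example. The sketch uses greedy verification, so the output matches what the target model would produce on its own. With real models in transformers, the usual route is to pass the draft model to `generate` through its `assistant_model` argument.

```python
from typing import Callable, List, Tuple

# A "model" here maps a token sequence to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(tokens: List[int]) -> int:
    # Hypothetical large model: a fixed rule standing in for a real LLM.
    return (tokens[-1] * 3 + len(tokens)) % 11


def draft_model(tokens: List[int]) -> int:
    # Hypothetical small model: agrees with the target except when the
    # input length is a multiple of 4, where it guesses wrong on purpose.
    guess = target_model(tokens)
    return (guess + 1) % 11 if len(tokens) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], max_new_tokens: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    target_passes = 0
    while len(tokens) < limit:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks the proposal. A real implementation
        #    scores all positions in one forward pass; this toy version
        #    loops but counts it as a single pass.
        target_passes += 1
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # keep the target's token either way
            if guess != expected:
                break  # first mismatch: discard the rest of the proposal
        else:
            # Every draft token was accepted, so the same pass also
            # yields one extra token from the target, if there is room.
            if len(tokens) + len(accepted) < limit:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens, target_passes


prompt = [1, 2]
output, passes = speculative_decode(target_model, draft_model, prompt, max_new_tokens=12)
assert output == greedy_decode(target_model, prompt, 12)
print(f"{passes} target passes for 12 new tokens")
```

The speedup comes from the target model running once per batch of accepted tokens rather than once per token, so a draft model that rarely agrees with the target yields little or no benefit.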
