Toolverse
gptq

by davila7

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author: davila7
Category: Data Science

About this skill

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.
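
The 4× figure follows from storage width alone: 16 bits per weight in FP16 versus 4 bits after quantization. A minimal sketch of that arithmetic, assuming only the weights are counted (the helper name weight_memory_gb is hypothetical, and the per-group scales and zero-points that GPTQ stores, the KV cache, and activations are all ignored, so real usage is somewhat higher):

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes).

    Assumption: ignores quantization metadata (scales, zero-points),
    the KV cache, and activations.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts taken from the model sizes named in the description.
for name, params in [("70B", 70e9), ("405B", 405e9)]:
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: FP16 ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB, ratio {fp16 / int4:.0f}x")
```

Under these assumptions a 70B model drops from about 140 GB of weights to about 35 GB, and a 405B model from about 810 GB to about 202.5 GB, so the largest models still need more than one GPU even at 4 bits.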

How to use

  1. Install AutoGPTQ and its dependencies: run pip install auto-gptq transformers accelerate. On Linux you can add Triton support for faster kernels: pip install "auto-gptq[triton]".

  2. Load a pre-quantized model from the Hugging Face Hub. Use the AutoGPTQForCausalLM class and its from_quantized() method, passing the model name (e.g. "TheBloke/Llama-2-7B-Chat-GPTQ") and the target device (device="cuda:0").

  3. Load the tokenizer for the chosen model with AutoTokenizer.from_pretrained(), using the same model name.

  4. Prepare the input text, encode it with the tokenizer, and pass it to the model to generate a response. generate() returns token IDs (not logits), which you decode back into text with the tokenizer.

  5. To fine-tune the model, combine GPTQ with PEFT in a QLoRA-style setup: the peft library trains small LoRA adapters on top of the frozen quantized weights, so fine-tuning adds little extra memory.

  6. Choose between GPTQ and the alternatives: if you need better accuracy (under 1% loss), consider AWQ; if 8-bit quantization is enough, use bitsandbytes.
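
Steps 2 to 4 can be sketched as a single function. This is a minimal sketch, not the skill's own code: the function name generate_with_gptq and the max_new_tokens default are assumptions for illustration, and it presumes auto-gptq and transformers are installed and a CUDA GPU is available. The library imports are deferred into the function body so the definition itself loads without a GPU.

```python
def generate_with_gptq(
    prompt: str,
    model_id: str = "TheBloke/Llama-2-7B-Chat-GPTQ",
    device: str = "cuda:0",
    max_new_tokens: int = 64,  # assumed default, not from the original steps
) -> str:
    """Load a pre-quantized GPTQ model and generate a completion for `prompt`."""
    # Deferred imports: both packages are heavy and need to be installed first
    # (see step 1).
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    # Step 3: tokenizer loaded from the same repository as the model.
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Step 2: load the already-quantized weights onto the target device.
    model = AutoGPTQForCausalLM.from_quantized(model_id, device=device)

    # Step 4: encode the prompt, generate token IDs, decode them back to text.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_with_gptq("Explain GPTQ in one sentence.") downloads the model on first use and returns the prompt followed by the generated continuation as one string.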
