blip-2-vision-language

by davila7

Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.

Installation

Pick a client and clone the repository into its skills directory.

Quick info

Author
davila7
Category
Security

How to use

  1. Install the required libraries: pip install transformers accelerate torch Pillow. Transformers is the recommended installation path and includes the components needed to work with BLIP-2.

  2. Load a BLIP-2 model from the Hugging Face Transformers library. You can choose a variant with a different LLM backend (OPT 2.7B, OPT 6.7B, Flan-T5 XL, or Flan-T5 XXL), depending on the compute resources you have available.

  3. Prepare the image you want to analyze. BLIP-2 accepts images in any format the Pillow library can open (JPG, PNG, and so on).

  4. For image captioning, pass the image to the model with no additional input text. The model generates a natural-language description of the image content.

  5. For visual question answering (VQA), pass both the image and a text question. The model combines visual analysis with natural-language reasoning to produce an answer.

  6. For more advanced multimodal scenarios, you can pair multiple images with questions or hold a conversation about an image. In practice, conversational context is carried across turns by including the earlier questions and answers in the prompt.
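
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's own code: it assumes the `Salesforce/blip2-opt-2.7b` checkpoint from the Hugging Face Hub, and the helper names (`build_vqa_prompt`, `load_blip2`, `generate_text`, `main`) are hypothetical. The "Question: ... Answer:" prompt layout is a commonly used convention for the OPT-based checkpoints; other variants may respond better to different prompts.

```python
import sys

# Assumed checkpoint; other variants (OPT 6.7B, Flan-T5 XL/XXL) use different Hub ids.
MODEL_ID = "Salesforce/blip2-opt-2.7b"


def build_vqa_prompt(question, history=()):
    """Build a VQA prompt; earlier (question, answer) pairs carry the conversation context."""
    turns = [f"Question: {q} Answer: {a}." for q, a in history]
    turns.append(f"Question: {question} Answer:")
    return " ".join(turns)


def load_blip2(model_id=MODEL_ID):
    """Load processor and model (step 2). Heavy imports are kept local to this function."""
    import torch
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision on GPU to save memory; full precision on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32
    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id, torch_dtype=dtype)
    return processor, model.to(device), device, dtype


def generate_text(processor, model, device, dtype, image, prompt=None, max_new_tokens=30):
    """Caption the image when prompt is None (step 4), otherwise answer the prompt (step 5)."""
    if prompt is None:
        inputs = processor(images=image, return_tensors="pt")
    else:
        inputs = processor(images=image, text=prompt, return_tensors="pt")
    inputs = inputs.to(device, dtype)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Depending on the transformers version, the decoded text may also contain the prompt.
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


def main(image_path):
    from PIL import Image

    image = Image.open(image_path).convert("RGB")  # step 3
    processor, model, device, dtype = load_blip2()

    caption = generate_text(processor, model, device, dtype, image)
    print("Caption:", caption)

    first_q = "What is shown in the image?"
    first_a = generate_text(processor, model, device, dtype, image, build_vqa_prompt(first_q))
    print("Answer:", first_a)

    # Step 6: a follow-up question that reuses the earlier turn as context.
    follow_up = build_vqa_prompt("What color is it?", history=[(first_q, first_a)])
    print("Follow-up:", generate_text(processor, model, device, dtype, image, follow_up))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it with an image path as the only argument (for example `python blip2_demo.py photo.jpg`, where the script name is a placeholder). The first run downloads several gigabytes of weights, so a smaller test image and a GPU are advisable.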
