This post briefly introduces how to develop LLM-based image-to-text generation with LlamaIndex.
Development environment setup
Install the following packages with pip install (a combined one-line command is shown after the list).
llama-index
llama-index-llms-huggingface
llama-index-embeddings-fastembed
fastembed
unstructured[md]
chromadb
llama-index-vector-stores-chroma
llama-index-llms-groq
einops
accelerate
sentence-transformers
llama-index-llms-mistralai
llama-index-llms-openai
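For convenience, the whole list can be installed with a single command (package names taken exactly from the list above; the brackets in unstructured[md] may need quoting depending on your shell):

pip install llama-index llama-index-llms-huggingface llama-index-embeddings-fastembed fastembed "unstructured[md]" chromadb llama-index-vector-stores-chroma llama-index-llms-groq einops accelerate sentence-transformers llama-index-llms-mistralai llama-index-llms-openai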
Coding
Let's generate text that describes an input image (./cat.jpg, loaded in the code below).
import os, torch
from os import path
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

# LlamaIndex imports (not used in this captioning snippet).
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding  # requires llama-index-embeddings-huggingface, which is not in the install list above
from llama_index.llms.huggingface import (
    HuggingFaceInferenceAPI,
    HuggingFaceLLM,
)

print(f'GPU={torch.cuda.is_available()}')

# Load the Moondream2 pre-trained model and its tokenizer.
model_id = "vikhyatk/moondream2"
revision = "2024-04-02"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)

# Load the image, encode it with the model, and generate a description.
image = Image.open('./cat.jpg')
enc_image = model.encode_image(image)
image_caption = model.answer_question(
    enc_image, "Describe this image in detail with transparency.", tokenizer
)
print(image_caption)
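Since the image only needs to be encoded once, the encoded image can be reused for further visual Q&A. The short sketch below is not part of the original post; it uses the same moondream2 calls as above, and the questions are arbitrary examples.

# Reuse the encoded image for additional questions (example questions only).
for question in ["How many animals are in the image?", "What is the dog doing?"]:
    answer = model.answer_question(enc_image, question, tokenizer)
    print(f"Q: {question}")
    print(f"A: {answer}")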
Run
If the output looks similar to the following, it worked.
In a verdant field, a black and white dog lies on its side, its head resting on the lush green grass. A brown and white cat sits upright on its hind legs, its gaze fixed on the dog. The dog's tail is wagging, and the cat's tail is curled up in a playful manner. The background is a serene landscape of trees and bushes, with the sun casting dappled shadows on the grass.
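To tie this back to LlamaIndex, the generated caption can be indexed and searched like any other text. The following is a minimal sketch, not from the original post, assuming the llama-index and fastembed packages from the install list; the embedding model name and the query string are illustrative choices, and image_caption comes from the snippet above.

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.fastembed import FastEmbedEmbedding

# Use a local fastembed embedding model so no API key is needed (model choice is an example).
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Wrap the caption in a Document and build an in-memory vector index.
index = VectorStoreIndex.from_documents(
    [Document(text=image_caption, metadata={"source": "./cat.jpg"})]
)

# Retrieve caption nodes relevant to a text query (no LLM call is involved here).
retriever = index.as_retriever()
for result in retriever.retrieve("What animals are in the picture?"):
    print(result.score, result.node.text)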