Sunday, May 12, 2024

Image-to-Text with LlamaIndex and an LLM

This post briefly summarizes how to build LLM-based image-to-text with LlamaIndex.

Development environment setup
Install the following packages with pip install.
llama-index
llama-index-llms-huggingface
llama-index-embeddings-fastembed
fastembed
Unstructured[md]
chromadb
llama-index-vector-stores-chroma
llama-index-llms-groq
einops
accelerate
sentence-transformers
llama-index-llms-mistralai
llama-index-llms-openai
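The list above can be installed in a single command. This is only a convenience sketch of the same list; note that `Unstructured[md]` resolves to the `unstructured` package with its markdown extra, and pip package names are case-insensitive.

```shell
pip install llama-index llama-index-llms-huggingface \
    llama-index-embeddings-fastembed fastembed "unstructured[md]" \
    chromadb llama-index-vector-stores-chroma llama-index-llms-groq \
    einops accelerate sentence-transformers \
    llama-index-llms-mistralai llama-index-llms-openai
```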

Coding
We will generate text that describes the following image.

Code and run the following.
import os, torch
from os import path
from PIL import Image

from llama_index.core import VectorStoreIndex
from llama_index.core import SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.llms.huggingface import (
    HuggingFaceInferenceAPI,
    HuggingFaceLLM,
)
from transformers import AutoModelForCausalLM, AutoTokenizer

print(f'GPU={torch.cuda.is_available()}')

# Load the Moondream2 pre-trained model.
model_id = "vikhyatk/moondream2"
revision = "2024-04-02"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)  # tokenizer

# Load the image, pass it to the model, and generate a caption.
image = Image.open('./cat.jpg')
enc_image = model.encode_image(image)
image_caption = model.answer_question(enc_image, "Describe this image in detail with transparency.", tokenizer)
print(image_caption)

If the output looks like the following, it worked.
In a verdant field, a black and white dog lies on its side, its head resting on the lush green grass. A brown and white cat sits upright on its hind legs, its gaze fixed on the dog. The dog's tail is wagging, and the cat's tail is curled up in a playful manner. The background is a serene landscape of trees and bushes, with the sun casting dappled shadows on the grass.
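The snippet above captions a single image. The same two calls can be wrapped into a small helper that captions a whole batch of images; this is a sketch under the assumption that `model` and `tokenizer` are the moondream2 objects loaded above (the helper name `caption_images` is ours, not part of any library).

```python
def caption_images(model, tokenizer, images,
                   prompt="Describe this image in detail."):
    """Caption a mapping of name -> PIL.Image with a moondream2-style model.

    `model` is expected to expose encode_image() and answer_question()
    exactly as in the snippet above. Returns a dict of name -> caption.
    """
    captions = {}
    for name, image in images.items():
        enc_image = model.encode_image(image)  # vision-encoder pass
        captions[name] = model.answer_question(enc_image, prompt, tokenizer)
    return captions
```

With the objects from the snippet above, `caption_images(model, tokenizer, {"cat.jpg": Image.open("./cat.jpg")})` would return the same kind of caption text, keyed by file name.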

