Sunday, May 12, 2024

Image-to-Text with LlamaIndex and an LLM

This post briefly summarizes how to build LLM-based image-to-text with LlamaIndex.

Development environment setup
Install the following packages with pip install.
llama-index
llama-index-llms-huggingface
llama-index-embeddings-fastembed
fastembed
Unstructured[md]
chromadb
llama-index-vector-stores-chroma
llama-index-llms-groq
einops
accelerate
sentence-transformers
llama-index-llms-mistralai
llama-index-llms-openai
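The list above can be installed in a single command. This is only a convenience sketch of the same list; note that `Unstructured[md]` resolves to the `unstructured` package with its markdown extra, and pip package names are case-insensitive.

```shell
pip install llama-index llama-index-llms-huggingface \
    llama-index-embeddings-fastembed fastembed "unstructured[md]" \
    chromadb llama-index-vector-stores-chroma llama-index-llms-groq \
    einops accelerate sentence-transformers \
    llama-index-llms-mistralai llama-index-llms-openai
```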

Coding
We will generate text that describes the following image.

Code and run the following.
import os, torch
from os import path
from PIL import Image

from llama_index.core import VectorStoreIndex
from llama_index.core import SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.llms.huggingface import (
    HuggingFaceInferenceAPI,
    HuggingFaceLLM,
)
from transformers import AutoModelForCausalLM, AutoTokenizer

print(f'GPU={torch.cuda.is_available()}')

# Load the Moondream2 pre-trained model.
model_id = "vikhyatk/moondream2"
revision = "2024-04-02"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)  # tokenizer

# Load the image, pass it to the model, and generate a caption.
image = Image.open('./cat.jpg')
enc_image = model.encode_image(image)
image_caption = model.answer_question(enc_image, "Describe this image in detail with transparency.", tokenizer)
print(image_caption)

If the output looks like the following, it worked.
In a verdant field, a black and white dog lies on its side, its head resting on the lush green grass. A brown and white cat sits upright on its hind legs, its gaze fixed on the dog. The dog's tail is wagging, and the cat's tail is curled up in a playful manner. The background is a serene landscape of trees and bushes, with the sun casting dappled shadows on the grass.
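The snippet above captions a single image. The same two calls can be wrapped into a small helper that captions a whole batch of images; this is a sketch under the assumption that `model` and `tokenizer` are the moondream2 objects loaded above (the helper name `caption_images` is ours, not part of any library).

```python
def caption_images(model, tokenizer, images,
                   prompt="Describe this image in detail."):
    """Caption a mapping of name -> PIL.Image with a moondream2-style model.

    `model` is expected to expose encode_image() and answer_question()
    exactly as in the snippet above. Returns a dict of name -> caption.
    """
    captions = {}
    for name, image in images.items():
        enc_image = model.encode_image(image)  # vision-encoder pass
        captions[name] = model.answer_question(enc_image, prompt, tokenizer)
    return captions
```

With the objects from the snippet above, `caption_images(model, tokenizer, {"cat.jpg": Image.open("./cat.jpg")})` would return the same kind of caption text, keyed by file name.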

