Daddy Makers: July 2017

Tuesday, July 11, 2017

Installing and Running the Ubuntu App on Windows 10

This post briefly describes how to install and run Ubuntu on Windows 10. It draws on the following article.

Getting Ubuntu on Windows 10 is now (almost) as easy as downloading an app

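There are several ways to use Ubuntu and Windows on the same computer:

Multi-boot installation
Installing Ubuntu in VMware
Installing Ubuntu in Docker

Multi-boot gives the best performance, but the installation is extremely cumbersome. With VMware, you must first install a heavyweight virtual machine and then install Ubuntu on it. Docker is easy to install but, as with a virtual machine, you have to accept some performance loss.

Microsoft recently released a way to install Ubuntu as an app that runs in a sandbox. Because it installs like any other app, setup is easy. The sandbox is also a kind of virtual machine, so there is some performance loss. Still, many people are surprised that an Ubuntu app is offered in the app store at all. It is worth trying at least once, so the details are recorded here.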

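Monday, July 10, 2017

Deep Learning GAN-Based Image-to-Image Model

This post introduces image-to-image translation based on the deep learning GAN (Generative Adversarial Network) model. See the following link for the related paper. For an overview of neural network types and concepts, see here.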

Image-to-Image Translation with Conditional Adversarial Networks

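Introduction

The paper proposes conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks learn the mapping from an input image to an output image. The approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

GAN-based image-to-image

Details

A GAN is a generative model that learns a mapping from a random noise vector z to an output image y, G : z → y. In contrast, a conditional GAN learns a mapping from an observed image x and a random noise vector z to y, G : {x, z} → y. The generator G is trained to produce outputs that an adversarially trained discriminator D cannot distinguish from "real" images, while D is trained to detect the generator's "fake" images as well as possible. The following figure shows the training process.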

Training conditional GAN

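G is trained iteratively to minimize the objective, while the adversarial D tries to maximize the same objective.

$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D)$$

$$\mathcal{L}_{GAN}(G, D) = \mathbb{E}_{y \sim p_{data}(y)}[\log D(y)] + \mathbb{E}_{x \sim p_{data}(x),\, z \sim p_z(z)}[\log(1 - D(G(x, z)))]$$

Here $\mathcal{L}_{GAN}$ is the unconditional variant, in which the discriminator does not observe x. In the conditional objective $\mathcal{L}_{cGAN}$, D additionally receives x, so the two terms use $D(x, y)$ and $D(x, G(x, z))$.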
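To make the objective concrete, the following minimal sketch evaluates the unconditional value L_GAN on hand-picked discriminator outputs. The function name `gan_value` and the sample probabilities are assumptions made for illustration; they do not come from the paper or its code.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(y)] + E[log(1 - D(G(x, z)))].

    d_real: discriminator outputs D(y) on real images, each in (0, 1)
    d_fake: discriminator outputs D(G(x, z)) on generated images, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = gan_value([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates the two well scores higher (closer to 0).
confident = gan_value([0.9, 0.95], [0.1, 0.05])
print(confused, confident)
```

D pushes this value up toward 0, while G pushes it down by making D(G(x, z)) large.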

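The following are examples of images generated with an image-to-image network trained this way.

Quality of the results for different losses

Code

The following Jupyter notebook code implements a GAN architecture.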

from google.colab import drive
drive.mount('/content/drive')

!cp "/content/drive/MyDrive/Colab Notebooks/data/img_align_celeba.zip" "."
!unzip "./img_align_celeba.zip" -d "./GAN/"

import glob
import os

import matplotlib.pyplot as plt
from PIL import Image

# Path to the images
pth_to_imgs = "./GAN/img_align_celeba"
imgs = glob.glob(os.path.join(pth_to_imgs, "*"))

# Show nine images
for i in range(9):
    plt.subplot(3, 3, i+1)
    img = Image.open(imgs[i])
    plt.imshow(img)
plt.show()

import torch
import torchvision.transforms as tf
from torchvision.datasets import ImageFolder
from torch.utils.data.dataloader import DataLoader

# Image preprocessing
transforms = tf.Compose([
    tf.Resize(64),
    tf.CenterCrop(64),
    tf.ToTensor(),
    tf.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Build the dataset with ImageFolder()
# root is the top-level directory and transform is the preprocessing
dataset = ImageFolder(
    root="./GAN",
    transform=transforms
)
loader = DataLoader(dataset, batch_size=128, shuffle=True)

import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Layers that make up the generator
        self.gen = nn.Sequential(
            nn.ConvTranspose2d(100, 512, kernel_size=4, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.ConvTranspose2d(512, 256, kernel_size=4,
                               stride=2, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.ConvTranspose2d(256, 128, kernel_size=4,
                               stride=2, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=4,
                               stride=2, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=4,
                               stride=2, padding=1, bias=False),
            nn.Tanh()
        )

    def forward(self, x):
        return self.gen(x)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # Layers that make up the discriminator
        self.disc = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4,
                      stride=2, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4,
                      stride=2, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, kernel_size=4,
                      stride=2, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, kernel_size=4,
                      stride=2, padding=1, bias=False),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2),
            nn.Conv2d(512, 1, kernel_size=4),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.disc(x)

def weights_init(m):
    # Get the layer type
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # Initialize convolution layers
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        # Initialize batch normalization layers
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create the generator and initialize its weights
G = Generator().to(device)
G.apply(weights_init)

# Create the discriminator and initialize its weights
D = Discriminator().to(device)
D.apply(weights_init)

import tqdm
from torch.optim.adam import Adam

G_optim = Adam(G.parameters(), lr=0.0001, betas=(0.5, 0.999))
D_optim = Adam(D.parameters(), lr=0.0001, betas=(0.5, 0.999))

for epochs in range(50):
    iterator = tqdm.tqdm(enumerate(loader, 0), total=len(loader))

    for i, data in iterator:
        D_optim.zero_grad()

        # Targets: 1 for real images, 0 for generated images
        label = torch.ones_like(
            data[1], dtype=torch.float32).to(device)
        label_fake = torch.zeros_like(
            data[1], dtype=torch.float32).to(device)

        # Feed the real images to the discriminator
        real = D(data[0].to(device))

        # Discriminator loss on the real images
        Dloss_real = nn.BCELoss()(torch.squeeze(real), label)
        Dloss_real.backward()

        # Generate fake images
        noise = torch.randn(label.shape[0], 100, 1, 1, device=device)
        fake = G(noise)

        # Feed the fake images to the discriminator
        # (detach so that this step does not update the generator)
        output = D(fake.detach())

        # Discriminator loss on the fake images
        Dloss_fake = nn.BCELoss()(torch.squeeze(output), label_fake)
        Dloss_fake.backward()

        # Update the discriminator with the total loss
        Dloss = Dloss_real + Dloss_fake
        D_optim.step()

        # Train the generator: it wants D to label its fakes as real
        G_optim.zero_grad()
        output = D(fake)
        Gloss = nn.BCELoss()(torch.squeeze(output), label)
        Gloss.backward()
        G_optim.step()

        iterator.set_description(f"epoch:{epochs} iteration:{i} D_loss:{Dloss} G_loss:{Gloss}")

torch.save(G.state_dict(), "Generator.pth")
torch.save(D.state_dict(), "Discriminator.pth")

with torch.no_grad():
    G.load_state_dict(
        torch.load("./Generator.pth", map_location=device))

    # Pick one random point in the feature (latent) space
    feature_vector = torch.randn(1, 100, 1, 1).to(device)

    # Generate an image
    pred = G(feature_vector).squeeze()
    pred = pred.permute(1, 2, 0).cpu().numpy()
    pred = (pred + 1) / 2  # map the tanh output from [-1, 1] to [0, 1] for display

    plt.imshow(pred)
    plt.title("predicted image")
    plt.show()
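As a sanity check on the generator above, the following sketch traces how the five ConvTranspose2d layers grow a 1 x 1 noise input into a 64 x 64 image. It uses the standard output-size formula for a transposed convolution with dilation 1 and no output padding; the helper name `conv_transpose_out` is an assumption for illustration.

```python
def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) of the five ConvTranspose2d layers in Generator
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the noise vector has shape (100, 1, 1)
sizes = []
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)
print(sizes)  # [4, 8, 16, 32, 64]
```

The final size of 64 matches the `Resize(64)` and `CenterCrop(64)` preprocessing, so real and generated images reach the discriminator with the same shape.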

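Conclusion

GANs perform remarkably well at imitating human cognitive activity. In the arts in particular, GANs are an intriguing learning method that is attracting a great deal of attention.

References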

Machine Learning Workshop for ART (tutorial)

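Tuesday, July 4, 2017

Deep Learning-Based 3D Object Recognition with VoxNet

This post introduces VoxNet, which uses voxels, a common approach to recognizing objects in 3D point clouds. It draws on the Dimatura VoxNet reference.

1. Introduction
VoxNet approximates a 3D point cloud with voxels and recognizes shapes with a CNN.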
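The voxel approximation can be illustrated with a short NumPy sketch that turns a point cloud into a binary occupancy grid. This is not the VoxNet code itself: the function name `voxelize`, the grid size of 32, and the uniform scaling are assumptions chosen for illustration.

```python
import numpy as np

def voxelize(points, grid_size=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid."""
    points = np.asarray(points, dtype=np.float64)
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    if extent == 0:
        extent = 1.0  # avoid division by zero for a degenerate cloud
    # Scale uniformly into [0, grid_size - 1] and take integer cell indices.
    idx = np.floor((points - mins) / extent * (grid_size - 1)).astype(int)
    grid = np.zeros((grid_size,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1  # mark occupied cells
    return grid

rng = np.random.default_rng(0)
cloud = rng.random((1000, 3))
grid = voxelize(cloud)
print(grid.shape, int(grid.sum()))
```

A 3D CNN can then consume the grid like a single-channel volume, at the cost of storing many empty cells.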

2. Installation
VoxNet is built on Theano and Lasagne. It also requires the path.py and scikit-learn modules. Install it as follows.

git clone git@github.com:dimatura/voxnet.git
cd voxnet
pip install --editable .

3. Training
The data used is ModelNet 10, obtained from the 3D ShapeNets project. The .mat files are used to make voxelization easier. Get the dataset as follows.

# scripts/download_shapenet10.sh
wget http://3dshapenets.cs.princeton.edu/3DShapeNetsCode.zip 
unzip 3DShapeNetsCode
python convert_shapenet10.py 3DShapeNets

Run training as follows.

cd scripts/
python train.py config/shapenet10.py shapenet10_train.tar

The training results are saved to the metrics.jsonl file.

4. Testing
Run the test as follows.

python test.py config/shapenet10.py shapenet10_test.tar --out-fname out.npz

Run the visualization as follows.

python output_viz.py out.npz shapenet10_test.tar out.html

References

VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition

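Sunday, July 2, 2017

Analysis of PointNet, Deep Learning-Based 3D Vision Object Recognition

This post analyzes PointNet, a recently published deep learning technique for 3D vision object recognition. It draws on the references below.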

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas)
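Recent technology trends in deep learning-based Scan to BIM

PointNet classification results

Introduction

PointNet is based on a paper presented at CVPR 2017 and available on arXiv. To recognize 3D point clouds obtained from scanning, many researchers convert the point cloud to a voxel format. Voxels are good for summarizing a point cloud, but they leave a lot of empty space, which is inefficient. PointNet instead exploits the permutation invariance of the input points. It provides an architecture for object classification, segmentation, and semantic parsing of scanned scenes. PointNet is simple, yet it recognizes objects in 3D point clouds very efficiently. For part segmentation, PointNet was trained on the ShapeNet Part dataset.

ShapeNet Part database

Concept
Existing approaches either segment point clouds using their statistical properties or convert them to a grid format and apply a CNN. Applying a CNN requires converting the irregular input into a regular form such as an image grid or 3D voxels. For this reason, most 3D point cloud segmentation methods convert the data to image grids or 3D voxels before feeding it to a CNN. The drawback is that the data becomes unnecessarily voluminous.

Before getting into PointNet, the point cloud data (PCD) must first be defined.
{Pi | i = 1, ..., n}
Pi = (x, y, z)

For classification, PointNet outputs k scores, one for each of the k candidate classes. For segmentation, the model outputs n x m scores, where n is the number of points in the PCD and m is the number of semantic subcategories.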

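The properties of point sets that the network relies on are as follows.

Unordered: the points have no specific order
Interaction among points: neighboring points influence one another
Invariance under transformations: the result must be invariant to transformations

The network has three key modules: a max pooling layer that acts as a symmetric function to aggregate information from all points, a structure that combines local and global information, and two joint alignment networks that align the input points and the point features.

Symmetry function for unordered input. There are three strategies for making the model invariant to input order: sort the input into a canonical order, treat the input as a sequence and train an RNN on it, or use a symmetric function to aggregate the information from each point. A symmetric function takes n vectors as input and outputs a new vector that is invariant to the input order.

Symmetry Function for unordered input
Sorting sounds like a good solution, but no stable ordering exists for high-dimensional data. For example, points in a high-dimensional space could be projected onto the one-dimensional real line and then sorted, but the inverse mapping cannot recover the original data. Applying an MLP to the sorted points also brings little improvement in performance.

The RNN idea treats the point cloud as a sequential signal and trains the RNN on randomly permuted sequences. An RNN is relatively robust to input order for short sequences, but it is hard to scale to thousands of elements or more, as in a point cloud. So this is not a good approach either.

PointNet's idea is to approximate a general function defined on a point set by applying a symmetric function to the transformed elements of the set.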

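$$f(\{x_1, \dots, x_n\}) \approx g(h(x_1), \dots, h(x_n)) \tag{1}$$

Here h is approximated by an MLP, and g by the composition of a single-variable function and a max pooling function. Empirically, this works well.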
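The construction in equation (1) can be sketched in a few lines of NumPy: a shared per-point function h followed by max pooling as the symmetric function g. The layer widths and random weights below are assumptions for illustration, not the trained PointNet parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 16))   # hypothetical weights of a tiny shared MLP
W2 = rng.standard_normal((16, 32))

def h(points):
    """Per-point feature extractor: the same MLP is applied to every point."""
    hidden = np.maximum(points @ W1, 0.0)  # ReLU
    return np.maximum(hidden @ W2, 0.0)

def f(points):
    """Set function: max pooling over the point axis plays the role of g."""
    return h(points).max(axis=0)

cloud = rng.standard_normal((128, 3))
shuffled = cloud[rng.permutation(len(cloud))]
same = np.allclose(f(cloud), f(shuffled))
print(same)
```

Because the maximum over a set does not depend on the order of its elements, shuffling the points leaves the pooled feature unchanged.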

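Local and Global Information Aggregation
The output in the form of a vector [f1, ..., fK] is a global signature of the input set. It is easy to train an SVM or MLP on this global shape feature. Segmentation, however, needs both local and global information. After the global feature is computed, it is concatenated with each point's feature to obtain new per-point features. In this way the network can make predictions that depend on both global semantics and local geometry.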
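The concatenation step can be sketched with NumPy as follows. The feature widths of 64 and 1024 follow the architecture described later in this post; the random values and variable names are placeholders chosen for illustration.

```python
import numpy as np

n = 1024
rng = np.random.default_rng(0)
point_feat = rng.standard_normal((n, 64))      # per-point (local) features
high_feat = rng.standard_normal((n, 1024))     # per-point features before pooling

global_feat = high_feat.max(axis=0)            # (1024,) global signature
tiled = np.tile(global_feat, (n, 1))           # repeat it for every point
combined = np.concatenate([point_feat, tiled], axis=1)
print(combined.shape)  # (1024, 1088)
```

Every row of `combined` carries the same global part, so a per-point classifier sees both the point's own feature and a summary of the whole shape.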

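Joint Alignment Network
The semantic labeling of a point cloud must be invariant to geometric transformations of the shape. To achieve this, a mini-network (T-net) that predicts an affine transformation is defined and applied to the input point cloud.

Installation

PointNet uses TensorFlow 1.0.1, h5py, CUDA 8.0, cuDNN 5.1, and Ubuntu 14.04.
If h5py is not installed, install the packages as follows. PointNet uses HDF5; see here for details.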

sudo apt-get install libhdf5-dev
sudo pip install h5py

With Anaconda, it can be installed as follows.

conda install h5py

Download the PointNet source from GitHub.

git clone https://github.com/charlesq34/pointnet.git

Training the Network

Single-object classification training
Run the following inside the downloaded pointnet folder. If a CUDA error occurs with the GPU version of TensorFlow, try running with the CPU version.

python train.py

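Screen during network training

Log files are saved in the log folder. The point clouds are downloaded as ModelNet40 data in HDF5 files (416MB). Each point cloud contains 2048 points.
The training results can be viewed with TensorBoard as follows.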

tensorboard --logdir log

After training, the accuracy can be evaluated as follows.

python evaluate.py --visu

If you have your own point clouds, the utility functions in utils/data_prep_util.py can read and write HDF5 files.
Note that part segmentation training requires downloading the ShapeNetPart data (about 1.08GB) as follows.

cd part_seg
sh download_data.sh

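The training process and results are as follows.

The following is the accuracy for the recognized objects.

The point clouds used during training are also saved as images and can be inspected.

Semantic segmentation training and evaluation
Run the following for semantic segmentation.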
cd pointnet/sem_seg
./download_data.sh

Visit the following link, download stanford3dDataset_v1.2_Aligned_Version.zip (4.09GB), and extract it into the pointnet/data folder.

Stanford Large scale 3D dataset

To prepare the HDF5 data, run the following commands to parse and restructure the downloaded data.
python collect_indoor3d_data.py
python gen_indoor3d_h5.py

Point cloud data parsing using collect_indoor3d_data.py

Converting PCD to npy(numpy data file in stanford_indoor3d)

npy data structure (x, y, z, r, g, b, label)

Running gen_indoor3d_h5

Output of gen_indoor3d_h5 (data/indoor3d_sem_seg_hdf5_data)

If running gen_indoor3d_h5.py raises the error "unsupported operand type(s) for +: 'range' and 'list'", modify the source code as follows.
return np.concatenate([data, dup_data], 0), list(range(N))+list(sample)
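The error comes from a Python 2 and Python 3 difference: in Python 3, `range()` returns a lazy range object rather than a list, so it cannot be added to a list directly. The short sketch below reproduces the failure and the fix; the values of `N` and `sample` are made up for illustration.

```python
import random

N = 5
sample = random.Random(0).choices(range(N), k=3)  # hypothetical duplicated indices

try:
    range(N) + sample  # worked in Python 2, where range() returned a list
    failed = False
except TypeError:
    failed = True

# Converting both operands to lists works on Python 3.
indices = list(range(N)) + list(sample)
print(failed, indices[:N])
```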

Start training.
python train.py --log_dir log6 --test_area 6

The following training results then appear.

The trained model is saved as .ckpt files under the folder where the logs are written.

The trained model can be loaded and used as follows.
saver.restore(sess, MODEL_PATH)

To do so, enter the following command. The entire prediction took about 2 minutes.
python batch_inference.py --model_path log6/model.ckpt --dump_dir log6/dump --output_filelist log6/output_filelist.txt --room_data_filelist meta/area6_data_label.txt --visu

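The previously saved trained model is then loaded, and predictions are made for the given input.

Input point clouds listed in meta/area6_data_label.txt

Execution process

As a result, OBJ files containing the predictions are generated in the log6/dump folder.

The following are the semantic segmentation results generated in that folder. The color codes are as follows.

Floor = blue, ceiling = green, wall = sky blue, column = dark pink, beam = yellow, chair = red, table = light pink, door = light yellow, board = gray, clutter such as light fixtures = dark gray

Color code per label

Creating point cloud input data requires a labeling tool. See the following link for point cloud labeling tools.

The input point cloud and the predicted point cloud are each saved to a file. The results look as follows.

Input point cloud

Predicted point cloud after semantic segmentation

The objects are clearly separated by color.

Semantic segmentation results

The input point clouds cover 96 spaces in Area 6, including offices, open spaces, and hallways. Semantic segmentation of each space took 1.25 seconds on average (120 s / 96).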

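Data Structure and Procedure Analysis

There are six training data files in total. Each stores a point cloud dataset (3 x 2048 x 2048) and a label dataset (2048). There are two test data files. The file format is HDF5.

Training dataset (3 x 2048 x 2048) HDF5 file

The following programs are run to prepare the input data.
python collect_indoor3d_data.py
python gen_indoor3d_h5.py

To feed in other data, let us analyze the structure. Running collect_indoor3d_data.py reads, in order, the annotation index file, the point cloud text files, and the label point cloud files, and saves them in numpy format.

Annotation index file (anno_paths.txt) and the folder holding the point clouds it points to

Point cloud file referenced by the annotation index file (hallway_1.txt)

Per-label point cloud for that space (ceiling_1.txt; stored redundantly)

The following compares the full point cloud file hallway_1.txt with the label point cloud files stored in the subfolder. As the figure shows, the label point cloud files duplicate points from the full file. Collecting all the label point cloud files yields the same points as the full point cloud file for that space.

Full point cloud and floor/door point clouds

The labeled point clouds for each space are saved in numpy format in the following folder. The data format is X, Y, Z, R, G, B, label. As shown below, the raw coordinates may contain negative values. The points are translated so that the minimum coordinate becomes the origin.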

-15.873 1.828 -0.005 64 72 59
-15.826 1.827 -0.005 70 76 62

Labels are stored per folder. To represent them in the point cloud data, the following classes are defined in advance and converted to numbers when saved.

Class definitions (meta/class_names.txt)

The following code implements the conversion into this data structure.

# collect_indoor3d_data.py
for anno_path in anno_paths:   # data paths listed in sem_seg\meta\anno_paths.txt
    os.makedirs(os.path.join(anno_path), exist_ok=True) # create the data folder if it is missing
    try:
        elements = anno_path.split('/')
        out_filename = elements[-3]+'_'+elements[-2]+'.npy' # numpy file name such as Area_1_hallway_1.npy
        indoor3d_util.collect_point_label(anno_path, os.path.join(output_folder, out_filename), 'numpy')
    except Exception:
        print(anno_path, 'ERROR!!')

# indoor3d_util.py
def collect_point_label(anno_path, out_filename, file_format='txt'):
    """ Convert the original dataset (each line is XYZRGBL). Converts the point cloud of each space stored in files.
    Args:
        anno_path: path to annotations. e.g. Area_1/office_2/Annotations/
        out_filename: path to save the points and labels (each line is XYZRGBL)
        file_format: txt or numpy, the type of file to save
    Note:
        The points are shifted before saving so that no coordinate is negative.
    """
    points_list = []

    for f in glob.glob(os.path.join(anno_path, '*.txt')): # convert each per-label point cloud text file
        cls = os.path.basename(f).split('_')[0] # class name taken from the file name, e.g. ceiling_1.txt
        if cls not in g_classes:
            cls = 'clutter'
        points = np.loadtxt(f) # load the points
        labels = np.ones((points.shape[0],1)) * g_class2label[cls] # numeric label column
        points_list.append(np.concatenate([points, labels], 1)) # append the N x 7 block to the list

    data_label = np.concatenate(points_list, 0)
    xyz_min = np.amin(data_label, axis=0)[0:3]
    data_label[:, 0:3] -= xyz_min

    if file_format=='numpy':
        np.save(out_filename, data_label) # saved in the order x, y, z, r, g, b, label
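The core of this conversion, appending a label column and shifting the coordinates, can be tried on a tiny hand-made example. The two floor points are the sample rows shown earlier; the door point and the label ids in `class2label` are assumptions for illustration and are not read from meta/class_names.txt.

```python
import numpy as np

# Two hypothetical label groups of one room: rows are X, Y, Z, R, G, B.
floor = np.array([[-15.873, 1.828, -0.005, 64, 72, 59],
                  [-15.826, 1.827, -0.005, 70, 76, 62]])
door = np.array([[-14.100, 2.500, 1.200, 120, 90, 60]])
class2label = {'floor': 1, 'door': 6}  # assumed label ids for illustration

parts = []
for name, pts in (('floor', floor), ('door', door)):
    labels = np.full((pts.shape[0], 1), class2label[name])
    parts.append(np.concatenate([pts, labels], axis=1))  # N x 7

data_label = np.concatenate(parts, axis=0)
data_label[:, 0:3] -= data_label[:, 0:3].min(axis=0)  # shift so the minimum is the origin
print(data_label.shape, data_label[:, 0:3].min(axis=0))
```

After the shift, every coordinate is non-negative and the smallest value on each axis is exactly zero, while the color and label columns are untouched.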

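The result is an npy file like the following.

Part of the npy data

The npy files are used for training and evaluation. indoor3d_util.py implements the following code. The point and color data stored in the npy files are normalized to the range 0 to 1 before being used for training and evaluation.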
def room2blocks_wrapper_normalized(data_label_filename, num_point, block_size=1.0, stride=1.0, random_sample=False, sample_num=None, sample_aug=1):
    if data_label_filename[-3:] == 'npy':
        data_label = np.load(data_label_filename)
    return room2blocks_plus_normalized(data_label, num_point, block_size, stride,
                                       random_sample, sample_num, sample_aug)

gen_indoor3d_h5 converts the data to HDF5 files for training. It converts the npy file of each scanned space listed in all_data_label.txt to h5.

Area_1_hallway_1.npy -> ply_data_all_...h5

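Network Architecture Analysis

Single-object deep learning architecture

The PointNet architecture is as follows. It takes n points (a point cloud) as input, applies input and feature transformations, and then aggregates the point features by max pooling. The output is classification scores for k classes. The architecture consists of a classification network and a segmentation network; the segmentation network extends the classification network. Batchnorm (batch normalization) is used for all layers with ReLU. Dropout is applied only to the last mlp (multi-layer perceptron) of the classification network.

PointNet architecture (mlp = multi-layer perceptron; Batchnorm is used with ReLU; dropout layers are used for the last mlp of the classification network)

The classification network is structured as follows (from an analysis of train.py).
1. Input points
   The 3D coordinates of n points are fed in as input points (32x1024x3).
   batch size = 32. num point = 1024.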

   1) weights = 256x9
   2) biases = 9
   3) transform = tf.matmul(net, weights)
   4) transform = tf.nn.bias_add(transform, biases) # shape=(32,9)

        weights = tf.get_variable('weights', [256, 3*K],
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)
        biases = tf.get_variable('biases', [3*K],
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)

   5) transform = tf.reshape(transform, [batch_size, 3, K]) # shape=(32,3,3)

   6) point cloud x transform.
     point_cloud=(32x1024x3). point_cloud_transformed=(32x1024x3)
   point_cloud_transformed = tf.matmul(point_cloud, transform)
   7) input_image = tf.expand_dims(point_cloud_transformed, -1) # (32x1024x3x1)
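2. nx3 > input transform > nx3 > mlp(64x64)
   1) The T-Net predicts a 3x3 transform tensor
   2) The transform is applied by matrix multiplication
   3) The transformed n x 3 data is passed to mlp (64, 64) and output as an n x 64 tensor
   4) The n x 64 tensor computed through the feature transform is output (outputs = 32x1024x1x64)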
   net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)
   net = tf_util.conv2d(net, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv2', bn_decay=bn_decay)

3. nx64 > feature transform > nx64 > mlp (64,128,1024) > nx1024
    The mlp (64, 128, 1024) transforms the features and outputs an n x 1024 tensor
    net = tf_util.conv2d(net_transformed, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv3', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv4', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv5', bn_decay=bn_decay) #net=32x1024x1x1024

4. max pooling 1024 global feature

    Max pooling outputs the aggregated 1024-dimensional global feature vector (num_point = 1024)
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='tmaxpool')
5. mlp (512,256,k)
    The mlp (512, 256, k) outputs the score vector for the k classes (k=40 for ModelNet40)

    net = tf.reshape(net, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='fc1', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,
                          scope='dp1')
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='fc2', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,
                          scope='dp2')

6. fc3 (32,40)
    net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')

7. Solver
    pred = net, ops['pred'] = pred
    optimizer = Adam(learning_rate). Minimize(loss, batch)

8. Learning. epoch = 50. batch_size = 32
            summary, step, _, loss_val, pred_val = sess.run([ops['merged'],
                ops['step'], ops['train_op'], ops['loss'], ops['pred']], feed_dict=feed_dict)

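The segmentation network is structured as follows.
1. Steps 1 to 4 of the classification network graph are reused
2. The n x 1088 tensor (n x 64 point features concatenated with the 1024-dimensional global feature) passes through mlp (512, 256, 128) and is output as the n x 128 point feature tensor
3. The n x 128 tensor passes through mlp (128, m) and is output as an n x m tensor

Semantic segmentation deep learning architecture

Semantic segmentation is carried out as follows.

1. Read the points and labels from /indoor3d_sem_seg_hdf5_data/ply_data_all_3.h5, append them to the data_batches (23585x4096x9) and label_batches (23585x4096) lists, and split them into train_data and test_data.
2. placeholder_inputs(24, 4096) creates pointclouds_pl and labels_pl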
3. conv1(64, [1,9]). net(24, 4096, 1, 64)
4. conv2(64, [1,1]). net(24, 4096, 1, 64)
5. conv3(64, [1,1]). net(24, 4096, 1, 64)
6. conv4(128, [1,1]). net(24, 4096, 1, 128)
7. conv5(1024, [1,1]). points_feat1(24, 4096, 1, 1024)
8. max pool([4096, 1]). pc_feat1(24, 1, 1, 1024)
9. fully connect(256). pc_feat1(24, 256)
10. fully connect(128). pc_feat1(24, 128)
11. reshape([24,1,1,-1]), then tile([1,4096,1,1]). pc_feat1_expand(24,4096,1,128)
12. concat([points_feat1, pc_feat1_expand]). points_feat1_concat(24,4096,1,1152)
13. conv6(512, [1,1]). net(24,4096,1,512)
14. conv7(256, [1,1]). net(24,4096,1,256)
15. dropout(0.7).
16. conv8(13, [1,1]). net(24,4096,1,13)

The training procedure is as follows.
1. num_batches = point cloud data size / BATCH_SIZE(24)
2. batch learning. num_batches=845
3. for batch_idx in range(num_batches):
   1) set feed_dict. current_data=(20291,4096,9); each slice fed to pointclouds_pl is (24,4096,9)
        feed_dict = {ops['pointclouds_pl']: current_data[start_idx:end_idx, :, :],
                     ops['labels_pl']: current_label[start_idx:end_idx],
                     ops['is_training_pl']: is_training,}
   2) learn
        summary, step, _, loss_val, pred_val = sess.run([ops['merged'], ops['step'], ops['train_op'], ops['loss'], ops['pred']], feed_dict=feed_dict)
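The batch arithmetic above can be checked with a few lines of plain Python. The sample count of 20291 and the batch size of 24 are the values reported in this section; the variable names are chosen for illustration.

```python
BATCH_SIZE = 24
num_samples = 20291  # number of training blocks reported above

num_batches = num_samples // BATCH_SIZE
# (start_idx, end_idx) pairs used to slice current_data and current_label
slices = [(i * BATCH_SIZE, (i + 1) * BATCH_SIZE) for i in range(num_batches)]
dropped = num_samples - num_batches * BATCH_SIZE
print(num_batches, slices[0], slices[-1], dropped)
```

Integer division gives 845 full batches, and the last 11 blocks that do not fill a batch are skipped in each epoch.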

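Source Code and Execution Analysis

train.py, written in Python with TensorFlow, is 261 lines long in total. For a deeper understanding, let us go through the main parts of the source code and trace its execution.

Execution analysis (PyCharm and PDB)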

MAX_NUM_POINT = 2048  # maximum number of input points

NUM_CLASSES = 40  # number of classes
BN_INIT_DECAY = 0.5  # initial batch normalization decay

# Prepare the training and test point cloud data
TRAIN_FILES = provider.getDataFiles(
    os.path.join(BASE_DIR, 'data/modelnet40_ply_hdf5_2048/train_files.txt'))
TEST_FILES = provider.getDataFiles(
    os.path.join(BASE_DIR, 'data/modelnet40_ply_hdf5_2048/test_files.txt'))

def train():
    with tf.Graph().as_default():
        with tf.device('/gpu:'+str(GPU_INDEX)):  # use the GPU for network computation
            pointclouds_pl, labels_pl = MODEL.placeholder_inputs(BATCH_SIZE, NUM_POINT)

            # Build the prediction model
            pred, end_points = MODEL.get_model(pointclouds_pl, is_training_pl, bn_decay=bn_decay)
            loss = MODEL.get_loss(pred, labels_pl, end_points)

            # Build the accuracy op
            correct = tf.equal(tf.argmax(pred, 1), tf.to_int64(labels_pl))
            accuracy = tf.reduce_sum(tf.cast(correct, tf.float32)) / float(BATCH_SIZE)

            # Define the optimizer according to the chosen option
            if OPTIMIZER == 'momentum':
                optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=MOMENTUM)
            elif OPTIMIZER == 'adam':
                optimizer = tf.train.AdamOptimizer(learning_rate)  # Adam optimizer
            train_op = optimizer.minimize(loss, global_step=batch)  # op that minimizes the loss

        sess.run(init, {is_training_pl: True})  # initialize the variables in the session

        for epoch in range(MAX_EPOCH):  # train epoch by epoch
            train_one_epoch(sess, ops, train_writer)  # training
            eval_one_epoch(sess, ops, test_writer)  # evaluation

            if epoch % 10 == 0:
                save_path = saver.save(sess, os.path.join(LOG_DIR, "model.ckpt"))  # save the model

def train_one_epoch(sess, ops, train_writer):
    np.random.shuffle(train_file_idxs)  # shuffle the order of the training files

    for fn in range(len(TRAIN_FILES)):
        current_label = np.squeeze(current_label)  # label values

        for batch_idx in range(num_batches):  # loop over the batches
            rotated_data = provider.rotate_point_cloud(current_data[start_idx:end_idx, :, :])  # rotation augmentation
            jittered_data = provider.jitter_point_cloud(rotated_data)  # jitter augmentation
            feed_dict = {ops['pointclouds_pl']: jittered_data,
                         ops['labels_pl']: current_label[start_idx:end_idx],
                         ops['is_training_pl']: is_training,}  # values to feed
            summary, step, _, loss_val, pred_val = sess.run([ops['merged'], ops['step'],
                ops['train_op'], ops['loss'], ops['pred']], feed_dict=feed_dict)  # run the session

def eval_one_epoch(sess, ops, test_writer):
    for fn in range(len(TEST_FILES)):
        current_label = np.squeeze(current_label)
        num_batches = file_size // BATCH_SIZE
        for batch_idx in range(num_batches):
            feed_dict = {ops['pointclouds_pl']: current_data[start_idx:end_idx, :, :],
                         ops['labels_pl']: current_label[start_idx:end_idx],
                         ops['is_training_pl']: is_training}
            summary, step, loss_val, pred_val = sess.run([ops['merged'], ops['step'],  # prediction
                ops['loss'], ops['pred']], feed_dict=feed_dict)
            pred_val = np.argmax(pred_val, 1)
            correct = np.sum(pred_val == current_label[start_idx:end_idx])
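The two augmentation steps called in `train_one_epoch`, rotation and jitter, can be sketched in NumPy as below. This is an illustrative reimplementation, not the code in provider.py: the function names, the choice of Y as the up axis, and the `sigma` and `clip` defaults are assumptions.

```python
import numpy as np

def rotate_about_up_axis(batch, rng):
    """Rotate each (N, 3) cloud in a (B, N, 3) batch by a random angle about the Y axis."""
    out = np.empty_like(batch)
    for k in range(batch.shape[0]):
        angle = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(angle), np.sin(angle)
        rot = np.array([[c, 0.0, s],
                        [0.0, 1.0, 0.0],
                        [-s, 0.0, c]])
        out[k] = batch[k] @ rot
    return out

def jitter(batch, rng, sigma=0.01, clip=0.05):
    """Add clipped Gaussian noise to every coordinate."""
    noise = np.clip(sigma * rng.standard_normal(batch.shape), -clip, clip)
    return batch + noise

rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 1024, 3))
rotated = rotate_about_up_axis(batch, rng)
augmented = jitter(rotated, rng)
print(augmented.shape)
```

The rotation keeps each point's distance from the origin and its height unchanged, and the jitter moves each coordinate by at most `clip`, so the object's class is preserved while the network sees a slightly different cloud each epoch.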

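The actual PointNet network definitions are located under the following folder.

Among them, pointnet_cls and pointnet_seg are the modules that build the classification and segmentation networks. pointnet_cls.py is 98 lines long, and pointnet_seg.py is 115 lines long. The explanations are given as comments, so compare them with the network structure described earlier.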

# pointnet_cls.py
def placeholder_inputs(batch_size, num_point):
    pointclouds_pl = tf.placeholder(tf.float32, shape=(batch_size, num_point, 3))
    labels_pl = tf.placeholder(tf.int32, shape=(batch_size))
    return pointclouds_pl, labels_pl

def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output Bx40 """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}

    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)
    point_cloud_transformed = tf.matmul(point_cloud, transform)  # transform the points
    input_image = tf.expand_dims(point_cloud_transformed, -1)  # add a dimension

    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)  # conv1. 64 filters, 1 x 3 convolution
    net = tf_util.conv2d(net, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv2', bn_decay=bn_decay)  # conv2. 64 filters, 1 x 1 convolution

    with tf.variable_scope('transform_net2') as sc:
        transform = feature_transform_net(net, is_training, bn_decay, K=64)  # feature transform
    end_points['transform'] = transform
    net_transformed = tf.matmul(tf.squeeze(net, axis=[2]), transform)  # apply it with matmul
    net_transformed = tf.expand_dims(net_transformed, [2])  # expand the tensor

    net = tf_util.conv2d(net_transformed, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv3', bn_decay=bn_decay)  # conv3. 64 filters, 1 x 1 convolution
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv4', bn_decay=bn_decay)  # conv4. 128 filters, 1 x 1 convolution
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv5', bn_decay=bn_decay)  # conv5. 1024 filters, 1 x 1 convolution

    # Symmetric function: max pooling
    net = tf_util.max_pool2d(net, [num_point,1],  # max pooling
                             padding='VALID', scope='maxpool')

    net = tf.reshape(net, [batch_size, -1])  # reshape
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='fc1', bn_decay=bn_decay)  # fc1 512
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,  # dp1. dropout 0.7
                          scope='dp1')
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='fc2', bn_decay=bn_decay)  # fc2 256
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,
                          scope='dp2')  # dp2. dropout 0.7
    net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')  # fc3. 40

    return net, end_points

def get_loss(pred, label, end_points, reg_weight=0.001):
    """ pred: B*NUM_CLASSES,
        label: B, """
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred, labels=label)  # softmax cross entropy
    classify_loss = tf.reduce_mean(loss)  # mean over the batch

    transform = end_points['transform']  # B x K x K
    K = transform.get_shape()[1].value  # size of the feature transform
    mat_diff = tf.matmul(transform, tf.transpose(transform, perm=[0,2,1]))
    mat_diff -= tf.constant(np.eye(K), dtype=tf.float32)
    mat_diff_loss = tf.nn.l2_loss(mat_diff)  # penalty that keeps the transform close to orthogonal

    return classify_loss + mat_diff_loss * reg_weight
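The regularization term in `get_loss` penalizes feature transforms that are far from orthogonal. The NumPy sketch below mirrors that computation, using the fact that `tf.nn.l2_loss(t)` equals half the sum of the squared entries of t. The function name `orthogonality_loss` and the test matrices are assumptions for illustration.

```python
import numpy as np

def orthogonality_loss(transform):
    """Half the squared Frobenius norm of (A @ A^T - I), summed over the batch.

    transform: (B, K, K) batch of predicted feature transforms.
    """
    k = transform.shape[1]
    mat_diff = transform @ np.transpose(transform, (0, 2, 1)) - np.eye(k)
    return 0.5 * np.sum(mat_diff ** 2)

identity = np.tile(np.eye(4), (2, 1, 1))   # perfectly orthogonal transforms
scaled = 2.0 * identity                    # not orthogonal: A @ A^T = 4I
print(orthogonality_loss(identity), orthogonality_loss(scaled))
```

An orthogonal transform gives zero penalty, so the term nudges the 64 x 64 feature transform toward a rotation-like matrix that does not discard information.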

# pointnet_seg.py
def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output BxNx50 """
    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)
    point_cloud_transformed = tf.matmul(point_cloud, transform)
    input_image = tf.expand_dims(point_cloud_transformed, -1)

    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)
    # Up to this point the structure is the same as the classification network.
    # The steps that build concat_feat (per-point features joined with the
    # global feature) are omitted from this excerpt.

    net = tf_util.conv2d(concat_feat, 512, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv6', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 256, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv7', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv8', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv9', bn_decay=bn_decay)

    net = tf_util.conv2d(net, 50, [1,1],
                         padding='VALID', stride=[1,1], activation_fn=None,
                         scope='conv10')
    net = tf.squeeze(net, [2])  # B x N x C

    return net, end_points

# sem_seg\train.py --log_dir log6 --test_area 6
BASE_DIR = os.path.dirname(os.path.abspath(__file__))   # set up the paths
ROOT_DIR = os.path.dirname(BASE_DIR)

parser = argparse.ArgumentParser()   # command-line options
parser.add_argument('--gpu', type=int, default=0, help='GPU to use [default: GPU 0]')
...
parser.add_argument('--test_area', type=int, default=6, help='Which area to use for test, option: 1-6 [default: 6]')

MAX_NUM_POINT = 4096   # number of points
NUM_CLASSES = 13            # classify into 13 classes

ALL_FILES = provider.getDataFiles('indoor3d_sem_seg_hdf5_data/all_files.txt')   # input index file
room_filelist = [line.rstrip() for line in open('indoor3d_sem_seg_hdf5_data/room_filelist.txt')]

# Load ALL data
for h5_filename in ALL_FILES:
    data_batch, label_batch = provider.loadDataFile(h5_filename)   # load the data and label file
    data_batch_list.append(data_batch)
    label_batch_list.append(label_batch)

# Set up the training and test data and labels
train_data = data_batches[train_idxs,...]
train_label = label_batches[train_idxs]
test_data = data_batches[test_idxs,...]
test_label = label_batches[test_idxs]

def get_learning_rate(batch):
    learning_rate = tf.train.exponential_decay(
                        BASE_LEARNING_RATE, # Base learning rate.
                        batch * BATCH_SIZE, # Current index into the dataset.
                        DECAY_STEP,          # Decay step.
                        DECAY_RATE,          # Decay rate.
                        staircase=True)
    learning_rate = tf.maximum(learning_rate, 0.00001) # CLIP THE LEARNING RATE!!
    return learning_rate

def get_bn_decay(batch):
    bn_momentum = tf.train.exponential_decay(
                      BN_INIT_DECAY,
                      batch*BATCH_SIZE,
                      BN_DECAY_DECAY_STEP,
                      BN_DECAY_DECAY_RATE,
                      staircase=True)
    bn_decay = tf.minimum(BN_DECAY_CLIP, 1 - bn_momentum)
    return bn_decay

def train():
    with tf.Graph().as_default():
        with tf.device('/gpu:'+str(GPU_INDEX)):
            pointclouds_pl, labels_pl = placeholder_inputs(BATCH_SIZE, NUM_POINT)
            is_training_pl = tf.placeholder(tf.bool, shape=())

            # Get model and loss
            pred = get_model(pointclouds_pl, is_training_pl, bn_decay=bn_decay)
            loss = get_loss(pred, labels_pl)
            tf.summary.scalar('loss', loss)

            correct = tf.equal(tf.argmax(pred, 2), tf.to_int64(labels_pl))
            accuracy = tf.reduce_sum(tf.cast(correct, tf.float32)) / float(BATCH_SIZE*NUM_POINT)
            tf.summary.scalar('accuracy', accuracy)

            # Training
            learning_rate = get_learning_rate(batch)
            tf.summary.scalar('learning_rate', learning_rate)
            if OPTIMIZER == 'momentum':
                optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=MOMENTUM)
            elif OPTIMIZER == 'adam':
                optimizer = tf.train.AdamOptimizer(learning_rate)
            train_op = optimizer.minimize(loss, global_step=batch)

            # Add ops to save and restore all the variables.
            saver = tf.train.Saver()

        merged = tf.summary.merge_all()
        train_writer = tf.summary.FileWriter(os.path.join(LOG_DIR, 'train'),
                                  sess.graph)
        test_writer = tf.summary.FileWriter(os.path.join(LOG_DIR, 'test'))

        for epoch in range(MAX_EPOCH):
            log_string('**** EPOCH %03d ****' % (epoch))
            sys.stdout.flush()

            train_one_epoch(sess, ops, train_writer)
            eval_one_epoch(sess, ops, test_writer)

            # Save the variables to disk.
            if epoch % 10 == 0:
                save_path = saver.save(sess, os.path.join(LOG_DIR, "model.ckpt")) # save the trained model
                log_string("Model saved in file: %s" % save_path)

def train_one_epoch(sess, ops, train_writer):
    """ ops: dict mapping from string to tf ops """
    for batch_idx in range(num_batches):
        if batch_idx % 100 == 0:
            print('Current batch/total batch num: %d/%d'%(batch_idx,num_batches))
        start_idx = batch_idx * BATCH_SIZE
        end_idx = (batch_idx+1) * BATCH_SIZE

        feed_dict = {ops['pointclouds_pl']: current_data[start_idx:end_idx, :, :],
                     ops['labels_pl']: current_label[start_idx:end_idx],
                     ops['is_training_pl']: is_training,}
        summary, step, _, loss_val, pred_val = sess.run([ops['merged'], ops['step'], ops['train_op'], ops['loss'], ops['pred']],
                                         feed_dict=feed_dict)
        train_writer.add_summary(summary, step)
        pred_val = np.argmax(pred_val, 2)
        correct = np.sum(pred_val == current_label[start_idx:end_idx])
        total_correct += correct
        total_seen += (BATCH_SIZE*NUM_POINT)
        loss_sum += loss_val

def eval_one_epoch(sess, ops, test_writer):
    """ ops: dict mapping from string to tf ops """
    for batch_idx in range(num_batches):
        start_idx = batch_idx * BATCH_SIZE
        end_idx = (batch_idx+1) * BATCH_SIZE

        feed_dict = {ops['pointclouds_pl']: current_data[start_idx:end_idx, :, :],
                     ops['labels_pl']: current_label[start_idx:end_idx],
                     ops['is_training_pl']: is_training}
        summary, step, loss_val, pred_val = sess.run([ops['merged'], ops['step'], ops['loss'], ops['pred']],
                                      feed_dict=feed_dict)
        test_writer.add_summary(summary, step)
        pred_val = np.argmax(pred_val, 2)
        correct = np.sum(pred_val == current_label[start_idx:end_idx])

# sem_seg\ python -m pdb batch_inference.py --model_path ./log6/model.ckpt --dump_dir ./log6/dump --output_filelist ./log6/output_filelist.txt --room_data_filelist ./meta/area6_data_label.txt --visu
# Run with -m pdb to step through and analyze the execution.

def evaluate():
    is_training = False

    with tf.device('/gpu:'+str(GPU_INDEX)):
        pointclouds_pl, labels_pl = placeholder_inputs(BATCH_SIZE, NUM_POINT)
        is_training_pl = tf.placeholder(tf.bool, shape=())   # create the placeholder

        # simple model
        pred = get_model(pointclouds_pl, is_training_pl)   # build the model
        loss = get_loss(pred, labels_pl)
        pred_softmax = tf.nn.softmax(pred)

        # Add ops to save and restore all the variables.
        saver = tf.train.Saver()

    # Create the session
    config = tf.ConfigProto()
    sess = tf.Session(config=config)

    # Restore the saved trained model.
    saver.restore(sess, MODEL_PATH)
    total_correct = 0
    total_seen = 0
    fout_out_filelist = open(FLAGS.output_filelist, 'w')
    for room_path in ROOM_PATH_LIST:
        out_data_label_filename = os.path.basename(room_path)[:-4] + '_pred.txt'
        out_data_label_filename = os.path.join(DUMP_DIR, out_data_label_filename)
        out_gt_label_filename = os.path.basename(room_path)[:-4] + '_gt.txt'
        out_gt_label_filename = os.path.join(DUMP_DIR, out_gt_label_filename)
        print(room_path, out_data_label_filename)
        # Run semantic segmentation prediction for each room point cloud file
        a, b = eval_one_epoch(sess, ops, room_path, out_data_label_filename, out_gt_label_filename)

def eval_one_epoch(sess, ops, room_path, out_data_label_filename, out_gt_label_filename):
    total_seen_class = [0 for _ in range(NUM_CLASSES)]
    total_correct_class = [0 for _ in range(NUM_CLASSES)]
    current_data, current_label = indoor3d_util.room2blocks_wrapper_normalized(room_path, NUM_POINT) # Read and normalize the room point cloud file at room_path. The file is npy (numpy) format and stores xyzrgbl (l = label).

    for batch_idx in range(num_batches):
        start_idx = batch_idx * BATCH_SIZE
        end_idx = (batch_idx+1) * BATCH_SIZE
        cur_batch_size = end_idx - start_idx

        feed_dict = {ops['pointclouds_pl']: current_data[start_idx:end_idx, :, :],
                     ops['labels_pl']: current_label[start_idx:end_idx],
                     ops['is_training_pl']: is_training}
        loss_val, pred_val = sess.run([ops['loss'], ops['pred_softmax']],
                                      feed_dict=feed_dict)

        if FLAGS.no_clutter:
            pred_label = np.argmax(pred_val[:,:,0:12], 2) # BxN
        else:
            pred_label = np.argmax(pred_val, 2) # BxN
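        # Save the room's original points and the predicted-label points to OBJ files
        for b in range(BATCH_SIZE):
            pts = current_data[start_idx+b, :, :]
            l = current_label[start_idx+b,:]
            pred = pred_label[b, :]  # predicted labels for this block (used as pred[i] below)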
            for i in range(NUM_POINT):
                color = indoor3d_util.g_label2color[pred[i]]
                color_gt = indoor3d_util.g_label2color[current_label[start_idx+b, i]]
                if FLAGS.visu:
                    fout.write('v %f %f %f %d %d %d\n' % (pts[i,6], pts[i,7], pts[i,8], color[0], color[1], color[2]))
                    fout_gt.write('v %f %f %f %d %d %d\n' % (pts[i,6], pts[i,7], pts[i,8], color_gt[0], color_gt[1], color_gt[2]))
                fout_data_label.write('%f %f %f %d %d %d %f %d\n' % (pts[i,6], pts[i,7], pts[i,8], pts[i,3], pts[i,4], pts[i,5], pred_val[b,i,pred[i]], pred[i]))

To analyze the execution flow, debug the script with pdb. The predicted results and the original (ground-truth) data are saved to ..._pred.txt and ..._gt.txt, respectively.
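
As a rough sketch of how a dumped prediction file could be inspected, the snippet below parses rows in the `x y z r g b confidence label` layout that `fout_data_label.write` produces above, then counts points per predicted label. The function name `summarize_pred_file` and the sample rows are made up for illustration; they are not real PointNet output.

```python
import io
from collections import Counter

import numpy as np


def summarize_pred_file(f):
    """Count points per predicted label in a ..._pred.txt style file.

    Assumes 8 columns per row: x y z r g b confidence label.
    """
    rows = np.loadtxt(f, ndmin=2)
    labels = rows[:, 7].astype(int)
    confidence = rows[:, 6]
    return Counter(labels.tolist()), float(confidence.mean())


# Hypothetical sample rows standing in for a real ..._pred.txt file.
sample = io.StringIO(
    "0.10 0.20 0.00 120 110 100 0.95 1\n"
    "0.15 0.25 0.00 121 111 101 0.85 1\n"
    "0.10 0.20 2.50 200 200 200 0.60 0\n"
)
counts, mean_conf = summarize_pred_file(sample)
print(counts, mean_conf)
```

Passing an open file handle or a path instead of the `StringIO` object should work the same way, since `np.loadtxt` accepts both.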

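Next, step through the eval_one_epoch() function in the debugger.

The raw input point cloud is normalized and processed by the room2blocks function into the shape [12][1][4096]. This data is what gets fed to the network. In this case, the input point cloud is data[536617][3], while the normalized result is block_data_list[12][1][4096].

The input current_data has the structure [1][4096][9]. The labels are stored in current_label[start_idx:end_idx], and their size is 4096.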
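
To make the [1][4096][9] layout more concrete, here is a simplified sketch of turning an XYZRGB array into one 4096-point, 9-channel block. It is not the actual `indoor3d_util` implementation: the helper `make_block`, the centering on the mean, and the channel layout (XYZ centered in x/y, RGB scaled to [0, 1], XYZ divided by the room extent) are assumptions for illustration.

```python
import numpy as np

NUM_POINT = 4096


def make_block(points_xyzrgb, room_max_xyz, num_point=NUM_POINT, seed=0):
    """Build one (num_point, 9) block from an (N, 6) XYZRGB array.

    Assumed channels: 0-2 XYZ centered in x/y, 3-5 RGB in [0, 1],
    6-8 XYZ divided by the room extent.
    """
    rng = np.random.default_rng(seed)
    n = points_xyzrgb.shape[0]
    # Sample with replacement only when there are fewer points than needed.
    idx = rng.choice(n, num_point, replace=n < num_point)
    pts = points_xyzrgb[idx].astype(np.float32)

    block = np.zeros((num_point, 9), dtype=np.float32)
    block[:, 0:3] = pts[:, 0:3]
    block[:, 0:2] -= pts[:, 0:2].mean(axis=0)   # center x and y
    block[:, 3:6] = pts[:, 3:6] / 255.0         # scale colors to [0, 1]
    block[:, 6:9] = pts[:, 0:3] / room_max_xyz  # room-normalized position
    return block


# Hypothetical room: 10000 random points in a 5 m cube with random colors.
rng = np.random.default_rng(1)
room = np.hstack([rng.uniform(0, 5, (10000, 3)),
                  rng.integers(0, 256, (10000, 3))])
block = make_block(room, room[:, 0:3].max(axis=0))
batch = block[np.newaxis, ...]  # add the batch dimension
print(batch.shape)
```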

Because is_training is False, the session run executes as prediction. The loss and prediction values are computed as follows.
         loss_val, pred_val = sess.run([ops['loss'], ops['pred_softmax']],
                                      feed_dict=feed_dict)

On lines 149 and 157 of the source, the predicted labels are compared with the input labels, and the number of points where the two match is accumulated through correct as follows.
        correct = np.sum(pred_label == current_label[start_idx:end_idx,:])
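
The comparison can be tried in isolation with plain NumPy. The following is a minimal sketch with a made-up softmax output for one block of 5 points and 3 classes; only the `argmax` and `np.sum` pattern mirrors the code above.

```python
import numpy as np

# Hypothetical softmax output: 1 block, 5 points, 3 classes.
pred_val = np.array([[[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1],
                      [0.2, 0.5, 0.3]]])
current_label = np.array([[0, 1, 2, 1, 1]])

pred_label = np.argmax(pred_val, 2)            # shape (B, N)
correct = np.sum(pred_label == current_label)  # matching points in this batch

total_correct = 0
total_seen = 0
total_correct += correct
total_seen += pred_label.size
print(total_correct / total_seen)
```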

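Reliability tests
A. Code variable test
To test the reliability of the deep learning model, I injected values into a few variables in the code and tested it. Here, when running semantic prediction with batch_inference, current_label (which is read only for the accuracy comparison) was set to 0 to check whether prediction still works. The prediction is not affected by this variable and runs without major problems.

Original input point cloud (left) and semantic segmentation result (right)

B. Test with self-scanned point cloud data
I tested point cloud data S, which we scanned ourselves inside the Sewon Sangga building. The space contains ducts, pipes, and equipment, and the point cloud is incomplete, with shadow (occluded) regions on the walls, floor, and ceiling.

To feed in the test point cloud, I copied area_sewon_1.txt, which holds the scanned points, into the pointnet/data folder and converted it to the following xyzrgb format.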
-15.609 39.505 2.214 71 64 54
-15.634 39.518 2.198 68 64 52
-15.622 39.514 2.195 70 61 52
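
As a hedged sketch of this conversion step, the snippet below reads xyzrgb rows like the ones above and appends a placeholder label column, giving the xyzrgbl layout that the npy files use. Shifting the minimum corner to the origin and using 0 as a dummy label are assumptions here, and the output file name is hypothetical.

```python
import io

import numpy as np

# The three sample rows above, standing in for area_sewon_1.txt.
sample_txt = io.StringIO(
    "-15.609 39.505 2.214 71 64 54\n"
    "-15.634 39.518 2.198 68 64 52\n"
    "-15.622 39.514 2.195 70 61 52\n"
)
xyzrgb = np.loadtxt(sample_txt, ndmin=2)  # shape (N, 6)

# Assumption: move the minimum corner of the cloud to the origin.
xyzrgb[:, 0:3] -= xyzrgb[:, 0:3].min(axis=0)

# Append a placeholder label column (0) to get xyzrgbl.
dummy_label = np.zeros((xyzrgb.shape[0], 1))
xyzrgbl = np.hstack([xyzrgb, dummy_label])
print(xyzrgbl.shape)

# np.save("area_sewon_1.npy", xyzrgbl)  # hypothetical output path
```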

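The points are fed in completely random order. This differs greatly from the existing training and test point clouds, which were fed in segment by segment. The code does appear to normalize properties such as the extent of the point cloud, but this part is unclear even in the paper. The prediction result for this input is shown below, and it is poor. Given this, the existing training data (the Stanford scan data) should also be shuffled randomly and tested.

C. Test with randomly ordered Stanford building point cloud data
The Stanford building point cloud is shuffled into random order and then run through prediction.

Shuffled input point cloud data

Semantic segmentation prediction result

As the result shows, the shuffled point cloud input yields the same prediction result.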
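
A minimal sketch of such a shuffle, assuming the points and labels are held in NumPy arrays: the same permutation is applied to both, so every point keeps its own label. The tiny arrays are made up for illustration. The unchanged prediction is consistent with PointNet aggregating per-point features through a symmetric max-pooling step, which does not depend on point order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cloud: 4 points x 3 coordinates, point i = [3i, 3i+1, 3i+2].
points = np.arange(12, dtype=np.float64).reshape(4, 3)
labels = np.array([0, 1, 2, 3])

# Apply one shared permutation to points and labels.
perm = rng.permutation(points.shape[0])
shuffled_points = points[perm]
shuffled_labels = labels[perm]
print(shuffled_points, shuffled_labels)
```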

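D. Test after removing untrained data from point cloud S, to analyze the cause of the test B result
For test B, we can hypothesize that prediction fails because the data differs from the Stanford scan data. The differences between the two datasets are as follows.
1. The Stanford point cloud data has little missing data or shadowing, has a uniform point density, and has its origin close to 0,0,0. The indoor chairs, beds, walls, floors, ceilings, attachments, and so on are all classes that were trained.
2. Point cloud S contains equipment, ducts, and pipes, and is an incomplete point cloud with many shadow regions. In addition, noise far away from the main cloud makes the overall bounding extent very large.

With this in mind, I deleted all ceiling-mounted equipment from point cloud S, then extracted and processed only the kinds of points covered by training to prepare the test point cloud as follows.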
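
The post does not say which tool was used for this cleanup. As a programmatic analogue only, the sketch below removes distant noise with per-axis percentile bounds and drops everything above a chosen height. The helper names `crop_outliers` and `drop_above`, the percentile values, and the 2.5 m cutoff are all assumptions for illustration.

```python
import numpy as np


def crop_outliers(points, lower_pct=1.0, upper_pct=99.0):
    """Keep points whose XYZ lies inside per-axis percentile bounds."""
    xyz = points[:, 0:3]
    lo = np.percentile(xyz, lower_pct, axis=0)
    hi = np.percentile(xyz, upper_pct, axis=0)
    mask = np.all((xyz >= lo) & (xyz <= hi), axis=1)
    return points[mask]


def drop_above(points, z_max):
    """Remove points above z_max, e.g. ceiling-mounted equipment."""
    return points[points[:, 2] <= z_max]


# Hypothetical cloud: 1000 points in a 5 m cube plus 2 far-away noise points.
rng = np.random.default_rng(0)
cloud = rng.uniform(0, 5, (1000, 6))
noise = np.full((2, 6), 500.0)
cloud = np.vstack([cloud, noise])

cleaned = crop_outliers(cloud)
trimmed = drop_above(cleaned, z_max=2.5)
print(cloud.shape, cleaned.shape, trimmed.shape)
```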

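I then fed point cloud S in and ran prediction.

The result is much improved over test B.

Prediction result (blue = floor, sky blue = wall, green = ceiling, yellow = beam, red = chair)

E. Test with a point cloud generated from a 3D model
A point cloud file is generated from a 3D model at 10 cm spacing with 50% noise, and then fed in for testing.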
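
The post does not describe the generation procedure in detail. One possible reading, sketched below for a single horizontal surface, is to sample a grid at 0.1 m spacing and jitter every coordinate by up to 50% of the spacing. The helper `sample_plane` and this interpretation of "50% noise" are assumptions.

```python
import numpy as np


def sample_plane(width, depth, z, spacing=0.1, noise_ratio=0.5, seed=0):
    """Sample a horizontal plane on a grid and add uniform jitter.

    Jitter magnitude is at most spacing * noise_ratio per axis.
    """
    rng = np.random.default_rng(seed)
    nx = int(round(width / spacing)) + 1
    ny = int(round(depth / spacing)) + 1
    xs = np.linspace(0.0, width, nx)
    ys = np.linspace(0.0, depth, ny)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, z)], axis=1)
    jitter = rng.uniform(-1.0, 1.0, pts.shape) * spacing * noise_ratio
    return pts + jitter


# Hypothetical 1 m x 1 m floor patch at z = 0.
floor = sample_plane(1.0, 1.0, z=0.0)
print(floor.shape)
```

A full model would repeat this per face (walls, ceiling) and concatenate the results.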

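Input point cloud model

The prediction result is as follows. The floor, ceiling, and walls are segmented correctly, but some parts are classified as other segments such as beams.

Semantic segmentation result

F. Test with a registered LiDAR-scanned point cloud
This test uses a registered point cloud from LiDAR scans.

The prediction result follows. Parts of the floor and walls were not predicted correctly.

G. Test with the full registered LiDAR scan point cloud
This test uses the registered point cloud of the full LiDAR scan.

The prediction result is as follows. Because MEP was not part of training, it is segmented as beam, the most similar class.

PointNet scan data test results

The PointNet tests led to the following observations, most of which point to problems.

Untrained kinds of points are segmented with the most similar label
Incomplete point clouds are not predicted correctly
Prediction tends to fail when the point density or data volume is too large
Prediction accuracy is high for point clouds generated from 3D models
In scanned point clouds, some walls are not distinguished from beams and columns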

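Error solutions
GPU memory allocation error when training a TensorFlow deep learning model
When training a TensorFlow deep learning model in GPU mode, a memory allocation error like the following can occur (in my case, it occurred on an NVIDIA 960M).
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

In this case, reducing batch_size or num_units during training, as shown below, shrinks the data copied into GPU memory and resolves the problem. For reference, the default PointNet batch_size is 32, and it is set as an argument in train.py.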

python train.py --batch_size=16

Choosing batch_size involves a trade-off between GPU memory resources and performance, so it has to be judged empirically. For more details, see the Batch Size in Deep Learning link.
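
A back-of-the-envelope calculation shows why halving batch_size helps. The sketch below estimates the size of a single float32 per-point feature map, assuming 4096 points per block and a 1024-channel layer. It ignores gradients, the other layers, and framework overhead, so real usage is considerably higher; the helper name `activation_mib` is made up.

```python
def activation_mib(batch_size, num_point, channels, bytes_per_value=4):
    """Size in MiB of one (batch, points, channels) float32 tensor."""
    return batch_size * num_point * channels * bytes_per_value / (1024 ** 2)


for bs in (32, 16, 8):
    print(bs, activation_mib(bs, 4096, 1024))
```

The estimate scales linearly with batch_size, which is why reducing it is the first thing to try on a GPU with little memory.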

If the memory error still occurs, the only remaining option is to switch to CPU mode for training. Note that training may then take 10 times longer or more than on the GPU.
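
One common way to force CPU execution, sketched here as an assumption rather than the method used in this post, is to hide the CUDA devices through an environment variable before TensorFlow is imported.

```python
import os

# Hide all CUDA devices; this must run before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf  # import afterwards; TensorFlow then sees no GPU
```

Because the training code pins ops with tf.device('/gpu:...'), it would probably also need soft placement enabled (for example tf.ConfigProto(allow_soft_placement=True)) or the device string changed to '/cpu:0'.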

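For reference, PointNet trained without major problems on a GTX 1070 with 8 GB of GPU memory. The NVIDIA 960M has 2 GB of memory, and PointNet uses more than that.

tf_sampling_so.so module error
Compile in each folder inside the PointNet folder as shown below.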

$ cd tf_ops/sampling

$ sh tf_sampling_compile.sh

$ cd ../grouping

$ sh tf_grouping_compile.sh

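If a TensorFlow include file error occurs, modify the sh file as follows. The CUDA and TensorFlow paths (highlighted in green in the original post) must be adjusted to your own development environment.

#!/bin/bash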

/usr/local/cuda-10.2/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

# TF1.2

g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -I /home/ktw/.local/lib/python3.8/site-packages/tensorflow/include/ -I /usr/local/cuda-10.2/include -lcudart -L /usr/local/cuda-10.2/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0
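
To find the TensorFlow include path for the -I option on your own machine, a small helper like the one below can locate the installed package without importing it. The function name `find_tf_include_dir` is made up, and it assumes the headers sit in an include folder inside the package, as in the path above. When TensorFlow imports cleanly, tf.sysconfig.get_include() reports the same location directly.

```python
import importlib.util
import os


def find_tf_include_dir():
    """Return <tensorflow package dir>/include, or None if not installed."""
    spec = importlib.util.find_spec("tensorflow")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, "include")


print(find_tf_include_dir())
```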

Debugging
To dump the npy data, I wrote the following simple Python program.

import numpy as np
ds = np.load('/home/ktw/Documents/pointnet/data/stanford_indoor3d/Area_6_hallway_1.npy')

# Print each row (one point: x y z r g b label)
for r in ds:
    print(r)
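
Printing every row gets long for a full room, so a summary is often easier to read. The sketch below uses a small made-up array in place of the loaded file and assumes the xyzrgbl column order, reporting the shape, the coordinate range, and the number of points per label.

```python
import numpy as np

# Hypothetical stand-in for np.load('.../Area_6_hallway_1.npy').
ds = np.array([
    [0.1, 0.2, 0.0, 120, 110, 100, 2],
    [0.4, 0.2, 0.0, 118, 111, 99, 2],
    [0.1, 0.9, 2.7, 200, 201, 199, 0],
])

print(ds.shape, ds.dtype)
print("xyz min:", ds[:, 0:3].min(axis=0))
print("xyz max:", ds[:, 0:3].max(axis=0))

# Count points per label (last column).
labels, counts = np.unique(ds[:, 6].astype(int), return_counts=True)
label_hist = dict(zip(labels.tolist(), counts.tolist()))
print(label_hist)
```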

Debugging in the PyCharm IDE makes it more convenient to inspect the data structure. The execution and data dump results are as follows.


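P.S. Company work and administrative tasks have come flooding back, and my research is falling behind. Once it slips, a week or a month goes by quickly. When this happens in the middle of focused research, getting started again is not easy. I don't know what riches and glory all this is supposed to bring. Isn't researching and developing industrial technology a researcher's core duty, and a proper use of tax money? If I gave in to the temptation to live easily, like a salesman who sells things others made and walks away, body and mind would be comfortable, but I think the end of my life as a researcher would be deeply regrettable.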