This post briefly summarizes resources on how to create training data for LLMs.
References
- Question-Generation: Generating multiple choice questions from text using Machine Learning
- question-answer-generation: Question-answer generation from text
- question_extractor: Generate question/answer training pairs out of raw text.
- qag-web: Website of Question Answer Generation
- GenQuest-RAG: A question generation application that uses RAG and the Weaviate vector store to retrieve relevant contexts and generate more useful answer-aware questions
- question-generator
- lamini-ai/docs-to-qa
- microsoft/llm-data-creation: Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators"
- facebookresearch/QA-Overlap: Code to support the paper "Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets"
- nalinrajendran/synthetic-LLM-QA-dataset-generator: Create synthetic datasets for training and testing large language models (LLMs) in a question-answering (QA) context
- night-chen/ToolQA: ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels (easy/hard) across eight real-life scenarios
- longluu/Medical-QA-LLM: Train LLMs on extractive Question-Answering in biomedical domain
- brmson/dataset-factoid-webquestions: WebQuestions QA Benchmarking Dataset
- ad-freiburg/large-qa-datasets: A collection of large question answering datasets
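
Several of the projects above generate question/answer training pairs from raw text. As a rough illustration of that general idea (not the method of any listed repository), the sketch below splits text into chunks, asks a pluggable function for QA pairs per chunk, and serializes the result as JSONL. All names here (`split_into_chunks`, `build_qa_records`, `to_jsonl`, `toy_qa_fn`) and the record fields are assumptions made for this example; `toy_qa_fn` is only a placeholder where a real LLM call would go.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: takes a passage, returns (question, answer) pairs.
QAFn = Callable[[str], List[Tuple[str, str]]]


def split_into_chunks(text: str, max_chars: int = 500) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of roughly max_chars."""
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_qa_records(text: str, qa_fn: QAFn, max_chars: int = 500) -> List[Dict[str, str]]:
    """Run qa_fn on every chunk and collect one record per QA pair."""
    records: List[Dict[str, str]] = []
    for chunk in split_into_chunks(text, max_chars):
        for question, answer in qa_fn(chunk):
            records.append({"context": chunk, "question": question, "answer": answer})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def toy_qa_fn(passage: str) -> List[Tuple[str, str]]:
    """Placeholder generator (assumption): a real pipeline would call an LLM here."""
    first_sentence = passage.split(".")[0].strip()
    return [("What does the passage state first?", first_sentence)]


if __name__ == "__main__":
    sample = "Seoul is the capital of South Korea. It is large.\n\nBusan is a port city."
    print(to_jsonl(build_qa_records(sample, toy_qa_fn, max_chars=30)))
```

Keeping the generator behind a plain callable means the chunking and serialization can be tested without a model, and the placeholder can later be swapped for whichever question generation approach from the list fits the task.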