[2022-02-07] Today's NLP

2022. 2. 7. 10:30ㆍpaper-of-the-day

Joint Speech Recognition and Audio Captioning

Speech samples recorded in both indoor and outdoor environments are often contaminated with secondary audio sources. Most end-to-end monaural speech recognition systems either remove these background sounds using speech enhancement or train noise-robust models. For better model interpretability and holistic understanding, we aim to bring together the growing field of automated audio captioning (AAC) and the thoroughly studied automatic speech recognition (ASR). The goal of AAC is to generate natural language descriptions of contents in audio samples. We propose several approaches for end-to-end joint modeling of ASR and AAC tasks and demonstrate their advantages over traditional approaches, which model these tasks independently. A major hurdle in evaluating our proposed approach is the lack of labeled audio datasets with both speech transcriptions and audio captions. Therefore we also create a multi-task dataset by mixing the clean speech Wall Street Journal corpus with multiple levels of background noises chosen from the AudioCaps dataset. We also perform extensive experimental evaluation and show improvements of our proposed methods as compared to existing state-of-the-art ASR and AAC methods.
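The abstract says the multi-task dataset is built by mixing clean speech with background noise at multiple levels, but it does not say how those levels are defined. A minimal sketch, assuming the level is a target signal-to-noise ratio (SNR) in dB and that audio is a plain list of float samples; the function names `mean_power` and `mix_at_snr` are hypothetical and not taken from the paper:

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of float samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB.

    Assumption: the "noise level" is an SNR target. The paper may define
    its mixing levels differently.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise clip if it is shorter than the speech, then trim it.
    if len(noise) < len(speech):
        repeats = -(-len(speech) // len(noise))  # ceiling division
        noise = list(noise) * repeats
    noise = noise[: len(speech)]

    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise clip is silent")

    # Target noise power is p_speech / 10^(snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Calling `mix_at_snr` with several `snr_db` values on the same speech and noise pair gives one noisy copy per level, each of which would keep the original transcription and the noise clip's caption as its two labels.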

A Survey on Retrieval-Augmented Text Generation

Recently, retrieval-augmented text generation attracted increasing attention of the computational linguistics community. Compared with conventional generation models, retrieval-augmented text generation has remarkable advantages and particularly has achieved state-of-the-art performance in many NLP tasks. This paper aims to conduct a survey about retrieval-augmented text generation. It firstly highlights the generic paradigm of retrieval-augmented generation, and then it reviews notable approaches according to different tasks including dialogue response generation, machine translation, and other generation tasks. Finally, it points out some important directions on top of recent methods to facilitate future research.
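The generic paradigm the survey refers to can be pictured as two steps: retrieve relevant text from an external memory, then condition the generator on the input plus what was retrieved. A toy sketch under loud assumptions: the retriever here is a bag-of-words cosine similarity, the `[SEP]` separator is arbitrary, and the generator is left out entirely. None of these choices come from the survey, and all function names are hypothetical:

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (whitespace tokenization, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, memory, k=1):
    """Return the k memory entries most similar to the query."""
    q = bow(query)
    ranked = sorted(memory, key=lambda doc: cosine(q, bow(doc)), reverse=True)
    return ranked[:k]


def build_augmented_input(query, memory, k=1):
    """Concatenate the query with retrieved text; a generator would consume this."""
    return " [SEP] ".join([query] + retrieve(query, memory, k))
```

Real systems typically swap the toy retriever for a sparse or dense index and may integrate the retrieved text in other ways than concatenation; this sketch only shows the retrieve-then-generate shape.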

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension

Question Answering (QA) is a task in which a machine understands a given document and a question to find an answer. Despite impressive progress in the NLP area, QA is still a challenging problem, especially for non-English languages due to the lack of annotated datasets. In this paper, we present the Japanese Question Answering Dataset, JaQuAD, which is annotated by humans. JaQuAD consists of 39,696 extractive question-answer pairs on Japanese Wikipedia articles. We finetuned a baseline model which achieves 78.92% for F1 score and 63.38% for EM on test set. The dataset and our experiments are available at this https URL.
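For readers unfamiliar with the two reported metrics, here is a minimal sketch of exact match (EM) and F1 for extractive QA. It follows the usual SQuAD-style definitions, with one assumption that is mine and not confirmed by the abstract: answers are compared at the character level, since Japanese text has no whitespace word boundaries. The dataset's own evaluation script may tokenize and normalize differently.

```python
from collections import Counter


def exact_match(prediction, reference):
    """1.0 if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


def f1_score(prediction, reference):
    """Overlap F1 between prediction and reference.

    Assumption: each character is one token (a stand-in for a real
    Japanese tokenizer).
    """
    pred_tokens = list(prediction.strip())
    ref_tokens = list(reference.strip())
    # Multiset intersection counts each shared character at most
    # as often as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores such as those quoted in the abstract are averages of these per-question values over the test set, usually taking the maximum over reference answers when a question has more than one.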
