-
#1 Collaborative Storytelling with Human Actors and AI Narrators: An Event Report Analysis
Analysis of using GPT-3 as a co-narrator in live improvisational theatre. Covers methodology, audience/performer feedback, and implications for human-AI creative collaboration.
-
#2 audio-novel - Technical Documentation and Resources
Comprehensive technical documentation and resources about audio-novel technology and applications.
-
#3 Cross-Modal Audio Retrieval with Natural Language Queries
Research on retrieving audio using free-form natural language queries, introducing new benchmarks and baselines for cross-modal audio retrieval.
-
#4 AudioBoost: Enhancing Audiobook Discovery in Spotify Search via LLM-Generated Synthetic Queries
Analysis of AudioBoost, a system using LLMs to generate synthetic queries from audiobook metadata to improve retrieval and query suggestions in Spotify's cold-start scenario.
-
#5 Audiobook-CC: A Framework for Controllable Long-Context Multicast Audiobook Generation
Analysis of Audiobook-CC, a novel speech synthesis framework for generating coherent, emotionally expressive multicast audiobooks with fine-grained control and long-context modeling.
-
#6 Audiobook-CC: A Framework for Controllable Long-Context Multicast Audiobook Generation
Analysis of Audiobook-CC, a novel TTS framework for generating coherent, emotionally expressive, and contextually consistent multicast audiobooks with fine-grained control.
-
#7 Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
Spotify's 2T-HGNN system, which uses Graph Neural Networks for audiobook recommendations, achieving +46% start rate and +23% streaming rate improvements.
-
#8 VINA: Learning to Ground Instructional Articles in Videos through Narrations
A novel approach for weakly-supervised temporal grounding of procedural steps in instructional videos using multi-modal alignment of frames, narrations, and step descriptions from wikiHow.
-
#9 MAMLCon: Meta-Learning for Continual Few-Shot Spoken Word Classification
A novel meta-learning approach (MAMLCon) that mitigates catastrophic forgetting in continual few-shot learning for spoken word classification, outperforming existing methods like OML.
-
#10 Mobile Audiobooks for EFL Listening Comprehension: A Framework for College Students
Analysis of integrating mobile audiobooks to develop listening comprehension skills in EFL college students, covering advantages, selection criteria, teaching phases, and assessment.
-
#11 Movie101v2: An Improved Benchmark for Automatic Movie Narration Generation
Analysis of Movie101v2, a large-scale bilingual dataset for movie narration, including its three-stage task roadmap, baseline evaluations, and future research directions.
-
#12 Effect of Music and Lyrics on Spoken-Word Recognition: Analysis and Implications
An analysis of research investigating how background music with and without lyrics impacts spoken-word recognition, with implications for social settings and future work.
-
#13 WonderFlow: Narration-Centric Design of Animated Data Videos
An interactive authoring tool that simplifies the creation of animated data videos by linking narration to chart animations and providing structure-aware animation effects.
-
#14 Narration Generation for Cartoon Videos: Task Formalization, Dataset, and Models
This paper introduces the novel task of narration generation for videos, presents a dataset from Peppa Pig, and proposes models for timing and content generation.
-
#15 A Phonetic Model of Non-Native Spoken Word Processing: Analysis and Insights
Analysis of a computational model exploring phonetic perception's role in non-native word processing, challenging traditional phonological explanations.
-
#16 Phonetic and Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval
A two-stage framework for embedding spoken words with both phonetic and semantic information, enabling advanced spoken document retrieval beyond simple term matching.
-
#17 Prosody Analysis of Audiobooks: NLP Models for Enhanced Text-to-Speech
Research on predicting prosody attributes (pitch, volume, rate) from narrative text using NLP and language models, improving TTS for audiobook generation.
-
#18 Classifying Unreliable Narrators with Large Language Models
Research on computational identification of unreliable narrators using LLMs, featuring the TUNa dataset and classification of intra-narrational, inter-narrational, and inter-textual unreliability.
-
#19 Weakly-Supervised Action Detection Guided by Audio Narration
A research paper exploring how to use noisy audio narration as weak supervision to train video action detection models, reducing annotation costs while leveraging multimodal features.
-
#20 MultiActor-Audiobook: Zero-Shot Generation with Multiple Speakers
A zero-shot approach for generating expressive audiobooks using multimodal speaker personas and LLM-based script instructions, eliminating costly training and manual annotation.
-
#21 MultiActor-Audiobook: Zero-Shot Generation with Faces and Voices
A technical analysis of MultiActor-Audiobook, a novel zero-shot system for generating expressive audiobooks using multimodal speaker personas and LLM-based script instructions.
Last updated: 2026-01-08 15:31:31