- #1 Collaborative Storytelling with Human Actors and AI Narrators: An Event Report Analysis
  Analysis of using GPT-3 as a co-narrator in live improvisational theatre. Covers methodology, audience/performer feedback, and implications for human-AI creative collaboration.
- #2 Assessing Language Models' Worldview for Fiction Generation
  Analysis of LLMs' ability to maintain consistent fictional worlds, revealing limitations in narrative coherence and state retention for creative writing.
- #3 audio-novel - Technical Documentation and Resources
  Comprehensive technical documentation and resources about audio-novel technology and applications.
- #4 Cross-Modal Audio Retrieval with Natural Language Queries
  Research on retrieving audio using free-form natural language queries, introducing new benchmarks and baselines for cross-modal audio retrieval.
- #5 Conceptualising the Audiobook Experience: A Theoretical Framework
  An analysis of the theoretical framework for conceptualizing differences between reading printed books and listening to audiobooks, emphasizing mobile listening practices.
- #6 AudioBoost: Enhancing Audiobook Discovery in Spotify Search via LLM-Generated Synthetic Queries
  Analysis of AudioBoost, a system using LLMs to generate synthetic queries from audiobook metadata to improve retrieval and query suggestions in Spotify's cold-start scenario.
- #7 AudioBoost: Enhancing Audiobook Discovery in Spotify Search via LLM-Generated Synthetic Queries
  Analysis of AudioBoost, a system using Large Language Models to generate synthetic queries for improving audiobook retrievability in Spotify's search engine during cold-start scenarios.
- #8 Audiobook-CC: A Framework for Controllable Long-Context Multicast Audiobook Generation
  Analysis of Audiobook-CC, a novel speech synthesis framework for generating coherent, emotionally expressive multicast audiobooks with fine-grained control and long-context modeling.
- #9 Audiobook-CC: A Framework for Controllable Long-Context Multicast Audiobook Generation
  Analysis of Audiobook-CC, a novel TTS framework for generating coherent, emotionally expressive, and contextually consistent multicast audiobooks with fine-grained control.
- #10 Analysis of 'Digital Audiobooks: New Media, Users, and Experiences' - A Media Studies Perspective
  A critical analysis of the book review on 'Digital Audiobooks' exploring mediatization theory, post-phenomenology, and the evolving landscape of audio-based literature consumption.
- #11 End-to-End Automatic Speech Translation of Audiobooks: Corpus, Models & Analysis
  Analysis of end-to-end speech-to-text translation models on an augmented audiobook corpus, exploring training scenarios and model efficiency.
- #12 Music and Levels of Narration in Film: A Narratological Analysis
  An in-depth analysis of film music through the lens of narratology, exploring how music functions across different narrative levels in cinema.
- #13 Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
  Spotify's 2T-HGNN system, which uses Graph Neural Networks for audiobook recommendations and reports a 46% increase in start rate and a 23% increase in streaming rate.
- #14 VINA: Learning to Ground Instructional Articles in Videos through Narrations
  A novel approach for weakly-supervised temporal grounding of procedural steps in instructional videos using multi-modal alignment of frames, narrations, and step descriptions from wikiHow.
- #15 J-MAC: Japanese Multi-Speaker Audiobook Corpus for Speech Synthesis
  Analysis of J-MAC corpus construction methodology, technical contributions, evaluation results, and future directions for expressive audiobook speech synthesis.
- #16 MAMLCon: Meta-Learning for Continual Few-Shot Spoken Word Classification
  A novel meta-learning approach (MAMLCon) that mitigates catastrophic forgetting in continual few-shot learning for spoken word classification, outperforming existing methods like OML.
- #17 Mobile Audiobooks for EFL Listening Comprehension: A Framework for College Students
  Analysis and framework for integrating Mobile Audiobooks (MABs) to develop listening comprehension skills in EFL college students, covering advantages, selection, implementation, and assessment.
- #18 Mobile Audiobooks for EFL Listening Comprehension: A Framework for College Students
  Analysis and framework for integrating mobile audiobooks to develop listening comprehension skills in EFL college students, covering advantages, selection, implementation, and assessment.
- #19 Movie101v2: An Improved Benchmark for Automatic Movie Narration Generation
  Analysis of Movie101v2, a large-scale bilingual dataset for movie narration, including its three-stage task roadmap, baseline evaluations, and future research directions.
- #20 Effect of Music and Lyrics on Spoken-Word Recognition: Analysis and Implications
  An analysis of research investigating how background music with and without lyrics impacts spoken-word recognition, with implications for social settings and future work.
- #21 WonderFlow: Narration-Centric Design of Animated Data Videos
  An interactive authoring tool that simplifies the creation of animated data videos by linking narration to chart animations and providing structure-aware animation effects.
- #22 Narration Generation for Cartoon Videos: Task Formalization, Dataset, and Models
  A research paper introducing the task of automatic narration generation for videos, presenting a new dataset from Peppa Pig, and proposing models for timing and content generation.
- #23 A Phonetic Model of Non-Native Spoken Word Processing: Analysis and Insights
  Analysis of a computational model exploring phonetic perception's role in non-native word processing, challenging traditional phonological explanations.
- #24 Phonetic and Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval
  A two-stage framework for embedding spoken words with both phonetic and semantic information, enabling advanced spoken document retrieval beyond simple term matching.
- #25 Prosody Analysis of Audiobooks: NLP Models for Enhanced Text-to-Speech
  Research on predicting prosody attributes (pitch, volume, rate) from narrative text using NLP and language models, improving TTS for audiobook generation.
- #26 Classifying Unreliable Narrators with Large Language Models
  Research on computational identification of unreliable narrators using LLMs, featuring the TUNa dataset and classification of intra-narrational, inter-narrational, and inter-textual unreliability.
- #27 Weakly-Supervised Action Detection Guided by Audio Narration
  A research paper exploring how to use noisy audio narration as weak supervision to train action detection models, reducing annotation costs while leveraging multimodal video features.
- #28 MultiActor-Audiobook: Zero-Shot Generation with Multiple Speakers
  A zero-shot approach for generating expressive audiobooks using multimodal speaker personas and LLM-based script instructions, eliminating costly training and manual annotation.
- #29 MultiActor-Audiobook: Zero-Shot Generation with Faces and Voices
  A technical analysis of MultiActor-Audiobook, a novel zero-shot system for generating expressive audiobooks using multimodal speaker personas and LLM-based script instructions.
Last updated: 2026-02-22 12:01:10