1. Introduction & Problem Statement
Spotify's expansion into audiobooks created a classic cold-start problem. The platform's search and recommendation systems, optimized for years of music and podcast interactions, suffered from a severe retrievability bias against the new content type. Users were not accustomed to searching for audiobooks, and the systems lacked sufficient interaction data to accurately rank them against established content. This created a vicious cycle: low visibility led to few interactions, which in turn reinforced poor ranking. The core challenge was twofold: 1) Inspiring users to type exploratory, topic-based queries for audiobooks (e.g., "psychological thrillers set in Scandinavia") instead of specific titles, and 2) Augmenting retrieval systems to effectively handle these broad, exploratory queries for which little real user data existed.
2. The AudioBoost System
AudioBoost is Spotify's engineered response to this cold-start challenge. It is not merely a ranking tweak but a systemic intervention using synthetic data to bootstrap discovery.
2.1 Core Methodology
The system leverages the rich, structured metadata associated with each audiobook (title, author, publisher, genre, synopsis, tropes). This metadata is the seed for generation.
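As a concrete anchor, here is a minimal Python sketch of such a metadata record. The field names mirror the attributes listed above, but the class, example title, and values are illustrative assumptions, not Spotify's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class AudiobookMetadata:
    """Illustrative metadata record used as the seed for synthetic query generation."""
    title: str
    author: str
    publisher: str
    genres: list[str]
    synopsis: str
    tropes: list[str] = field(default_factory=list)

# Fictional example record, for illustration only.
example = AudiobookMetadata(
    title="The Silent Circuit",
    author="A. Example",
    publisher="Example Press",
    genres=["science fiction"],
    synopsis="An AI awakens aboard a derelict station and questions its purpose.",
    tropes=["rogue AI", "first contact"],
)
```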
2.2 Synthetic Query Generation with LLMs
A Large Language Model (LLM) is prompted to generate multiple plausible user search queries conditioned on this metadata. For example, given metadata for a sci-fi audiobook about AI, the LLM might generate queries like: "best AI dystopian novels," "sci-fi books about consciousness," "futuristic stories about technology." This process artificially creates the "long-tail" of search traffic that would naturally develop over time.
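A minimal sketch of this generation step is below, reusing the `AudiobookMetadata` record from §2.1. The prompt wording and the parsing logic are assumptions for illustration, and `call_llm` is a hypothetical hook standing in for whatever model and serving stack is actually used.

```python
def build_prompt(meta: AudiobookMetadata, n_queries: int = 10) -> str:
    """Construct an instruction asking the LLM for plausible exploratory queries."""
    return (
        "You are generating search queries for an audiobook catalog.\n"
        f"Title: {meta.title}\nAuthor: {meta.author}\n"
        f"Genres: {', '.join(meta.genres)}\nTropes: {', '.join(meta.tropes)}\n"
        f"Synopsis: {meta.synopsis}\n\n"
        f"Write {n_queries} diverse queries a user might type to discover this "
        "audiobook by topic, genre, trope, or mood (not by exact title). "
        "Return one query per line."
    )

def generate_synthetic_queries(meta: AudiobookMetadata, call_llm) -> list[str]:
    """call_llm: a hypothetical callable (prompt -> completion text) for any LLM backend."""
    raw = call_llm(build_prompt(meta))
    # Keep non-empty lines, strip list markers, lowercase, and de-duplicate in order.
    seen, queries = set(), []
    for line in raw.splitlines():
        q = line.strip().lstrip("-*0123456789. ").lower()
        if q and q not in seen:
            seen.add(q)
            queries.append(q)
    return queries
```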
2.3 Dual-Indexing Strategy
The genius of AudioBoost lies in its dual application (a minimal code sketch of both paths follows this list):
- Query AutoComplete (QAC): Synthetic queries are injected as suggestions, directly influencing user behavior by planting exploratory search ideas.
- Search Retrieval Engine: The same synthetic queries are indexed against the audiobook, improving its match score for similar real user queries, thereby increasing its retrievability.
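Below is that sketch, with both stores reduced to in-memory dictionaries. The data structures and function names are assumptions for illustration; production QAC and retrieval systems naturally involve far more machinery (ranking, blending with lexical and semantic retrieval, and so on).

```python
from collections import defaultdict

# Store 1: QAC suggestions keyed by prefix (here, a naive prefix map).
qac_suggestions: dict[str, set[str]] = defaultdict(set)
# Store 2: retrieval index mapping each synthetic query to audiobook ids.
query_to_audiobooks: dict[str, set[str]] = defaultdict(set)

def index_synthetic_queries(audiobook_id: str, synthetic_queries: list[str]) -> None:
    """Register each synthetic query in both the QAC and retrieval stores."""
    for q in synthetic_queries:
        query_to_audiobooks[q].add(audiobook_id)
        for i in range(1, len(q) + 1):  # every prefix of the query feeds autocomplete
            qac_suggestions[q[:i]].add(q)

def autocomplete(prefix: str, k: int = 5) -> list[str]:
    """Return up to k suggested completions for what the user has typed so far."""
    return sorted(qac_suggestions.get(prefix.lower(), set()))[:k]

def retrieve(query: str) -> set[str]:
    """Exact-match lookup against the synthetic-query index."""
    return query_to_audiobooks.get(query.lower(), set())
```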
Key Results at a Glance
- Audiobook Impressions: +0.7%
- Audiobook Clicks: +1.22%
- Exploratory Query Completions: +1.82%
Source: Online A/B Test, AudioBoost System
3. Technical Implementation & Evaluation
3.1 Offline Evaluation Metrics
Before the live test, the quality and utility of synthetic queries were assessed offline. Metrics likely included:
- Query Relevance: Human or model-based evaluation of whether a generated query is a plausible search for the associated audiobook.
- Retrievability Coverage: Measuring the increase in the number of audiobooks that appear in top-K search results for a basket of test queries after indexing synthetic data (one way to compute this is sketched after the list).
- Diversity & Novelty: Ensuring generated queries cover a broad range of search intents (topic, genre, trope, mood) beyond obvious title/author matches.
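As flagged in the Retrievability Coverage bullet, one way to operationalize that metric is sketched below. The top-K search function and the query basket are stand-ins, since the paper does not spell out the exact offline protocol.

```python
from typing import Callable, Iterable

def retrievability_coverage(
    query_basket: Iterable[str],
    search_top_k: Callable[[str, int], list[str]],  # (query, k) -> ranked audiobook ids
    catalog_ids: set[str],
    k: int = 10,
) -> float:
    """Fraction of the audiobook catalog that appears at least once in the
    top-k results over a basket of test queries."""
    surfaced: set[str] = set()
    for q in query_basket:
        surfaced.update(search_top_k(q, k))
    return len(surfaced & catalog_ids) / max(len(catalog_ids), 1)

# Coverage is computed before and after indexing synthetic queries; the lift
# (coverage_after - coverage_before) quantifies the gain in retrievability.
```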
3.2 Online A/B Test Results
The ultimate validation was a controlled online A/B test. The treatment group experienced search with AudioBoost enabled. The results were statistically significant and operationally meaningful:
- +0.7% Audiobook Impressions: More audiobooks were surfaced in search results.
- +1.22% Audiobook Clicks: Users engaged with these audiobook results more.
- +1.82% Exploratory Query Completions: Critically, users adopted the system-suggested exploratory queries at a higher rate, proving the behavioral nudge worked.
3.3 Key Performance Indicators (KPIs)
The chosen KPIs are expertly aligned with the business and product goals: Discovery (Impressions), Engagement (Clicks), and Query Behavior Shift (Exploratory Completions).
4. Core Insights & Analyst Perspective
Core Insight: Spotify's AudioBoost is a masterclass in applied AI pragmatism. It reframes the cold-start problem not as a lack of data, but as a lack of signal. Instead of waiting for users to generate that signal organically (a losing proposition for a new catalog), it uses LLMs to simulate user intent at scale, effectively bootstrapping the marketplace. This is a more sophisticated evolution of traditional content-based filtering, supercharged by generative AI's ability to understand and mimic human language nuances.
Logical Flow: The system's logic is elegantly circular and self-reinforcing. Metadata → Synthetic Queries → Improved QAC & Retrieval → User Engagement → Real Data → Improved Models. It's an engineered shortcut to the network effects that platforms like Spotify rely on. This approach is reminiscent of techniques in computer vision like CycleGAN (Zhu et al., 2017), which learns to translate between domains (e.g., horses to zebras) without paired examples. Similarly, AudioBoost "translates" between the domain of audiobook metadata and the domain of user search intent, without relying on paired (query, audiobook) interaction data at the outset.
Strengths & Flaws: The primary strength is its immediate deployability and impact, as shown by the positive A/B test. It's a low-risk, high-reward intervention that works within existing infrastructure (QAC, retrieval index). However, the approach has inherent flaws. First, it risks creating an "echo chamber of synthesis"—if the LLM's query generation is biased or limited, it could narrow, rather than expand, the discovery landscape. Second, it potentially decouples retrieval from genuine user interest in the short term; a book may be retrieved for a synthetic query no real user cares about. Third, as noted by research from institutions like the Stanford HAI, over-reliance on synthetic data can lead to model collapse or unexpected drift if not carefully managed with real data feedback loops.
Actionable Insights: For product leaders, the takeaway is clear: Generative AI is your ultimate cold-start weapon. The blueprint is replicable across domains—new product categories, new geographic markets, new content formats. The key is to focus on the quality and diversity of the generative process. Invest in prompt engineering, curation, and validation of synthetic outputs as a first-class engineering task. Furthermore, plan for the obsolescence of the system; the goal of AudioBoost should be to accelerate the collection of real data so that the synthetic layer can be gradually phased out or down-weighted, transitioning to a fully organic discovery ecosystem. This is not a permanent crutch, but a strategic accelerator.
5. Technical Details & Mathematical Framework
While the paper does not delve into complex formulas, the core retrieval enhancement can be conceptualized. Let $R(q, d)$ be the relevance score of document (audiobook) $d$ for query $q$ in the original model. In a cold-start, for an audiobook $d_a$ and an exploratory query $q_e$, $R(q_e, d_a)$ is low due to sparse data.
AudioBoost generates a set of synthetic queries $Q_s = \{q_{s1}, q_{s2}, ..., q_{sn}\}$ for $d_a$. The retrieval system is then augmented such that the new relevance score $R'(q, d)$ considers matches to these synthetic queries. A simplified view could be:
$R'(q_e, d_a) = R(q_e, d_a) + \lambda \cdot \sum_{q_s \in Q_s} \text{sim}(q_e, q_s) \cdot I(d_a, q_s)$
Where:
- $\text{sim}(q_e, q_s)$ is a semantic similarity score between the user's exploratory query and a synthetic query (e.g., from an embedding model).
- $I(d_a, q_s)$ is an indicator or strength of association between $d_a$ and $q_s$ (established by the LLM generation).
- $\lambda$ is a blending parameter controlling the influence of the synthetic signal, which should decay as real data accumulates.
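To make this blending concrete, here is a minimal sketch under the same simplifications, using cosine similarity over query embeddings. The embedding representation and parameter values are placeholders; the paper does not prescribe this exact formulation.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity, used as sim(q_e, q_s) over query embeddings."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def boosted_relevance(
    base_score: float,                 # R(q_e, d_a) from the existing ranker
    query_emb: np.ndarray,             # embedding of the user's exploratory query
    synthetic_embs: list[np.ndarray],  # embeddings of the synthetic queries for d_a
    association: list[float],          # I(d_a, q_s): association strength per query
    lam: float = 0.3,                  # lambda, to be decayed as real data accumulates
) -> float:
    """R'(q_e, d_a) = R(q_e, d_a) + lambda * sum_s sim(q_e, q_s) * I(d_a, q_s)."""
    boost = sum(cosine(query_emb, e) * w for e, w in zip(synthetic_embs, association))
    return base_score + lam * boost
```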
6. Analysis Framework: A Non-Code Case Study
Scenario: A new streaming platform "StreamFlow" launches a stand-up comedy specials category. It faces the same cold-start problem as Spotify with audiobooks.
Applying the AudioBoost Framework:
- Identify Metadata: For each comedy special: Comedian name, special title, tags (e.g., observational, political, surreal), transcript keywords, recording year, audience vibe (raucous, intimate).
- Define Query Generation Prompts: Engineer LLM prompts like: "Given a comedy special by [Comedian] titled [Title] with tags [Tags], generate 10 diverse search queries a user might type to find similar comedy content. Include queries about style, topic, mood, and comparable comedians."
- Generate & Index: For a special tagged "political satire" and "2020s," the LLM generates: "funny political commentary," "best satire on current events," "comedians like [Comedian]," "stand-up about modern society." These queries are then indexed.
- Dual Application: These queries appear as suggestions when a user starts typing "comedy about...". They also help retrieve this special when a user searches for "satirical news shows."
- Measure & Iterate: Track KPIs: Comedy special impressions, play starts, and usage of generated query suggestions. Use this real data to fine-tune the LLM prompt and gradually reduce the $\lambda$ parameter for older specials as they accumulate watches.
7. Future Applications & Research Directions
The AudioBoost paradigm opens several compelling future avenues:
- Cross-Modal & Multi-Modal Retrieval: Extending beyond text queries. Could synthetic audio snippets (e.g., "play something that sounds like this") or visual mood boards be generated from metadata to bootstrap voice or visual search?
- Personalized Synthetic Generation: Moving from one-size-fits-all synthetic queries to generating queries conditioned on individual user profiles. For example, for a user who listens to history podcasts, generate audiobook queries like "historical biographies with deep research" instead of generic ones.
- Dynamic & Adaptive Synthesis: Instead of static batch generation, creating a system where the synthetic query generation model continuously adapts based on which synthetic queries actually lead to user engagement, yielding a self-improving loop (a minimal sketch follows this list).
- Mitigating Synthetic Bias: A major research direction is developing methods to audit and ensure the diversity and fairness of LLM-generated queries to prevent the amplification of societal or catalog biases in the discovery process. Techniques from algorithmic fairness research will be crucial here.
- Application in Enterprise Search: This method is directly applicable to internal company search engines for new document repositories, knowledge bases, or product catalogs, where initial user search behavior is unknown.
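On the dynamic-synthesis direction above, one possible realization of the feedback loop is sketched below: synthetic queries that accumulate impressions without engagement are retired. The counters and thresholds are assumptions, not anything proposed in the paper.

```python
from collections import Counter

impressions: Counter = Counter()   # times a synthetic query was shown (QAC or retrieval)
engagements: Counter = Counter()   # times showing it led to a click or play

def record(query: str, engaged: bool) -> None:
    """Log one exposure of a synthetic query and whether it drove engagement."""
    impressions[query] += 1
    if engaged:
        engagements[query] += 1

def prune_synthetic_queries(queries: list[str],
                            min_impressions: int = 500,
                            min_ctr: float = 0.002) -> list[str]:
    """Keep queries that are either still unproven or performing above a CTR floor."""
    kept = []
    for q in queries:
        shown = impressions[q]
        if shown < min_impressions or engagements[q] / shown >= min_ctr:
            kept.append(q)
    return kept
```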
8. References
- Azad, H. K., & Deepak, A. (2019). Query expansion techniques for information retrieval: A survey. Information Processing & Management, 56(5), 1698-1735.
- Jiang, J. Y., et al. (2021). Understanding and predicting user search mindset. ACM Transactions on Information Systems.
- Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2223-2232).
- Stanford Institute for Human-Centered Artificial Intelligence (HAI). (2023). On the Risks and Challenges of Synthetic Data.
- Palumbo, E., Penha, G., Liu, A., et al. (2025). AudioBoost: Increasing Audiobook Retrievability in Spotify Search with Synthetic Query Generation. In Proceedings of the EARL Workshop@RecSys.
- Bennett, P. N., et al. (2012). Modeling the impact of short- and long-term behavior on search personalization. In Proceedings of the 35th international ACM SIGIR conference.