Table of Contents
- 1 Introduction
- 2 Methodology
- 3 Technical Implementation
- 4 Results and Analysis
- 5 Case Study Framework
- 6 Future Applications
- 7 Critical Analysis
- 8 References
1 Introduction
Unreliable narrators present a significant challenge in computational linguistics, particularly as first-person accounts proliferate across digital platforms. This research bridges narratological literary theory with modern natural language processing techniques to develop automated classification systems for narrator reliability. The work addresses critical gaps in trust assessment for personal narratives across domains including social media, reviews, and professional communications.
2 Methodology
2.1 TUNA Dataset
The TUNA (Taxonomy of Unreliable Narrators Annotation) dataset comprises expert-annotated narratives from multiple domains: blog posts, subreddit discussions, hotel reviews, and literary works. The dataset includes 1,200 annotated instances with multi-dimensional reliability labels.
2.2 Unreliability Classification Framework
Three distinct unreliability types are defined: Intra-narrational (internal inconsistencies and verbal tics), Inter-narrational (contradictions between primary and secondary narrators), and Inter-textual (conflicts with external factual knowledge).
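As a minimal sketch of how an annotated instance might be represented under this taxonomy, assuming one binary label per unreliability type (the field names are hypothetical, not the released schema):

```python
# Hypothetical shape of one TUNA-style instance; field names are
# illustrative and may not match the dataset's actual schema.
example_instance = {
    "id": "hotel_review_0421",
    "domain": "hotel_review",       # blog post, subreddit, hotel review, or literary work
    "text": "The room was absolutely perfect, though I suppose the bed "
            "could have been more comfortable...",
    "labels": {
        "intra_narrational": 1,     # internal inconsistencies, hedging, verbal tics
        "inter_narrational": 0,     # contradicted by a secondary narrator
        "inter_textual": 0,         # conflicts with external factual knowledge
    },
}
```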
2.3 Experimental Setup
Experiments employed both open-weight (Llama-2, Mistral) and proprietary (GPT-4, Claude-2) LLMs in few-shot, fine-tuning, and curriculum learning configurations. The curriculum learning approach progressively exposed models to increasingly complex reliability patterns.
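A minimal sketch of the curriculum idea follows, assuming difficulty is approximated by the hardest unreliability dimension an instance activates; the heuristic and function names are illustrative, not the authors' implementation.

```python
def difficulty(instance: dict) -> int:
    """Crude difficulty proxy: intra-narrational-only examples come first,
    then inter-narrational, then knowledge-heavy inter-textual ones."""
    order = ["intra_narrational", "inter_narrational", "inter_textual"]
    active = [i for i, name in enumerate(order) if instance["labels"].get(name)]
    return max(active) if active else 0


def curriculum_stages(dataset: list[dict], stages: int = 3):
    """Yield cumulative training pools, easiest examples first, so each
    stage keeps earlier examples while adding harder ones."""
    ranked = sorted(dataset, key=difficulty)
    for stage in range(1, stages + 1):
        cutoff = len(ranked) * stage // stages
        yield ranked[:cutoff]
```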
3 Technical Implementation
3.1 Mathematical Framework
The reliability classification problem is formalized with Bayes' rule: $P(R \mid T) = \frac{P(T \mid R)\,P(R)}{P(T)}$, where $R$ denotes the reliability label and $T$ the textual features. Feature extraction relies on transformer attention: $\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$, where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension.
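As a concrete reference, the scaled dot-product attention above can be written in a few lines of NumPy; this is a generic sketch of the standard formula, not the paper's implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)     # (..., seq_q, seq_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                  # (..., seq_q, d_v)
```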
3.2 Model Architectures
Dual-encoder architectures process narrative content and contextual cues separately before combining them in fusion layers. The models incorporate multi-task learning objectives that jointly optimize for the three unreliability types.
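A compact sketch of this design in PyTorch is shown below; the layer sizes, mean-pooling, and concatenation-based fusion are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DualEncoderClassifier(nn.Module):
    """Separate encoders for narrative text and contextual cues,
    fused and fed to one binary head per unreliability type."""
    def __init__(self, dim: int = 256, nhead: int = 4):
        super().__init__()
        self.narrative_encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=nhead, batch_first=True)
        self.context_encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=nhead, batch_first=True)
        self.fusion = nn.Linear(2 * dim, dim)
        # One head per unreliability type (multi-task objective).
        self.heads = nn.ModuleDict({
            name: nn.Linear(dim, 1)
            for name in ("intra_narrational", "inter_narrational", "inter_textual")
        })

    def forward(self, narrative_emb, context_emb):
        n = self.narrative_encoder(narrative_emb).mean(dim=1)   # pool tokens
        c = self.context_encoder(context_emb).mean(dim=1)
        fused = torch.relu(self.fusion(torch.cat([n, c], dim=-1)))
        return {name: head(fused) for name, head in self.heads.items()}
```

Training would then typically sum one binary cross-entropy loss per head, realizing the joint multi-task objective over the three unreliability types.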
4 Results and Analysis
4.1 Performance Metrics
The best-performing configurations achieved F1 scores of 0.68 for intra-narrational, 0.59 for inter-narrational, and 0.52 for inter-textual classification. The results demonstrate a progressive increase in difficulty across the three unreliability types, with inter-textual classification proving most challenging because it depends on external knowledge.
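The per-type scores are standard binary F1; the snippet below sketches how one of them could be computed with scikit-learn, using hypothetical gold labels and predictions purely for illustration.

```python
from sklearn.metrics import f1_score

# Hypothetical gold labels and predictions for the intra-narrational dimension.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(f"intra-narrational F1: {f1_score(y_true, y_pred):.2f}")
```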
4.2 Comparative Analysis
Fine-tuned open-weight models outperformed few-shot proprietary models on intra-narrational tasks, while proprietary models maintained an advantage on inter-textual classification, which requires broader world knowledge.
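For the few-shot configurations, prompting a proprietary model might look roughly like the sketch below; the wording and in-context examples are illustrative assumptions, not the authors' actual template.

```python
# Illustrative few-shot prompt for narrator-reliability classification.
FEW_SHOT_PROMPT = """\
Classify the narrator of each text as RELIABLE or UNRELIABLE and, if
unreliable, name the type: intra-narrational, inter-narrational, or inter-textual.

Text: "I never lie, although yesterday I may have bent the truth a little."
Answer: UNRELIABLE (intra-narrational)

Text: "The hotel is five minutes from the Eiffel Tower in central Berlin."
Answer: UNRELIABLE (inter-textual)

Text: "{narrative}"
Answer:"""

print(FEW_SHOT_PROMPT.format(narrative="The staff were helpful, I think."))
```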
5 Case Study Framework
Scenario: Hotel review analysis
Text: "The room was absolutely perfect, though I suppose the bed could have been more comfortable and the view wasn't exactly what I expected. The staff were helpful, I think."
Analysis: This review exhibits intra-narrational unreliability through hedging phrases ("I suppose," "I think") and contradictory assessments ("absolutely perfect" versus complaints about the bed and the view), which reduce narrator credibility despite the positive overall tone.
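In the same spirit, the toy heuristic below flags hedge phrases as weak signals of intra-narrational unreliability; the phrase list and threshold are arbitrary illustrative choices, not the paper's feature set.

```python
HEDGES = ["i suppose", "i think", "i guess", "not exactly", "sort of", "maybe"]

def hedge_signals(text: str) -> dict:
    """Count hedge phrases and return a crude intra-narrational flag."""
    lowered = text.lower()
    hits = [h for h in HEDGES if h in lowered]
    return {
        "hedges_found": hits,
        "hedge_count": len(hits),
        "possibly_unreliable": len(hits) >= 2,   # arbitrary threshold
    }

review = ("The room was absolutely perfect, though I suppose the bed could have "
          "been more comfortable and the view wasn't exactly what I expected. "
          "The staff were helpful, I think.")
print(hedge_signals(review))   # flags "i suppose", "i think", "not exactly"
```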
6 Future Applications
Potential applications include automated credibility assessment for online content moderation, educational tools for writing improvement, forensic linguistics for legal testimony analysis, and enhanced conversational AI systems capable of detecting user uncertainty or deception.
7 Critical Analysis
Core Insight: This research represents a bold but fundamentally flawed attempt to quantify literary theory through computational methods. The authors' ambition to bridge narratology and NLP is commendable, but their approach suffers from oversimplification of complex psychological phenomena.
Logical Flow: The paper follows a conventional ML research structure—problem definition, dataset creation, experimentation, results. However, the logical leap from literary theory to computational labels lacks rigorous validation. Like early attempts at sentiment analysis that reduced complex emotions to positive/negative binaries, this work risks creating a Procrustean bed where nuanced narrative devices are forced into rigid categories.
Strengths & Flaws: The TUNA dataset is the paper's crown jewel—expert-annotated, multi-domain, and publicly available. This addresses a critical gap in narrative analysis resources. However, the classification performance (F1 scores 0.52-0.68) reveals fundamental limitations. The models struggle particularly with inter-textual unreliability, echoing challenges noted in the CycleGAN paper where domain adaptation works better for superficial than semantic features. The curriculum learning approach shows promise but feels underdeveloped compared to progressive training techniques used in vision-language models like CLIP.
Actionable Insights: Future work should incorporate psycholinguistic features beyond textual patterns—prosodic cues for spoken narratives, writing rhythm analysis, and cross-cultural narrative conventions. The field should look to cognitive psychology frameworks like Theory of Mind for modeling narrator intentionality. Most critically, researchers must address the ethical implications: automated reliability assessment could become a dangerous tool for discrediting marginalized voices if not developed with careful consideration of cultural and contextual factors.
8 References
- Booth, W.C. (1961). The Rhetoric of Fiction.
- Nünning, A. (2015). Handbook of Narratology.
- Hansen, P.K. (2007). Reconsidering the Unreliable Narrator.
- Zhu, J.-Y., et al. (2017). Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks.
- Radford, A., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision.