Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs
- Weijia Xu,
- Nebojsa Jojic,
- Sudha Rao,
- Chris Brockett,
- Bill Dolan
Proceedings of the National Academy of Sciences of the United States of America
With rapid advances in large language models (LLMs), these models are increasingly applied to creative content ideation and generation. A critical question emerges: can current LLMs provide ideas that are diverse enough to truly bolster collective creativity?
We examine two state-of-the-art LLMs, GPT-4 and LLaMA-3, on story generation and discover that LLM-generated stories often consist of plot elements that are echoed across a number of generations. This repetition makes the outputs less unique and more deterministic and predictable. To quantify this phenomenon, we introduce the Sui Generis (Latin for “of its own kind”) score, an automatic metric that estimates how unlikely a plot element is to appear in alternative storylines generated by the same LLM. It helps quantify creativity at the narrative level, not just by counting unique words or topics, and it also correlates with human judgement of surprise, a factor in how we experience stories.
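As a rough illustration of the idea behind such a score, the sketch below estimates how unlikely a plot element is by checking how many alternative storylines echo it and taking the negative log of that fraction. This is not the paper's implementation: the function names, the word-overlap matcher, and the add-one smoothing are all assumptions made for illustration, and a realistic matcher would need to compare plot elements semantically and not by shared words.

```python
import math
from typing import Callable, Sequence


def token_overlap_match(element: str, story: str, threshold: float = 0.6) -> bool:
    """Hypothetical stand-in matcher: share of the element's words found in the story."""
    element_words = set(element.lower().split())
    story_words = set(story.lower().split())
    if not element_words:
        return False
    return len(element_words & story_words) / len(element_words) >= threshold


def surprise_score(
    element: str,
    alternatives: Sequence[str],
    matches: Callable[[str, str], bool] = token_overlap_match,
) -> float:
    """Negative log of the smoothed fraction of alternatives that echo the element.

    Assumed formulation for illustration only; higher means the element is rarer.
    """
    echoed = sum(1 for story in alternatives if matches(element, story))
    # Add-one smoothing keeps the score finite when no alternative echoes the element.
    probability = (echoed + 1) / (len(alternatives) + 2)
    return -math.log(probability)


# Toy alternative storylines, standing in for generations sampled from one model.
alternatives = [
    "the knight finds a hidden door",
    "the knight finds a hidden door in the wall",
    "a storm sinks the ship",
]
common = surprise_score("knight finds hidden door", alternatives)
rare = surprise_score("dragon opens bakery", alternatives)
```

Under these assumptions, an element echoed by most alternatives receives a low score and an element that no alternative repeats receives a high one, which matches the intuition that frequently echoed plot elements are less "of their own kind".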
Evaluating 100 short stories, we find that LLM-generated stories often contain combinations of idiosyncratic plot elements echoed frequently across generations, while the original human-written stories are far more diverse and rarely recreated or even echoed in pieces. Our experiments with the Sui Generis score demonstrate the lack of plot-level diversity in LLM-generated stories, in contrast to the more varied and unique elements found in human-written stories. Human creativity brings something truly unique to the pages of a story that LLMs can’t recreate.