
Want to talk to someone from 1930? Now you can, thanks to AI

Meet Talkie: an AI frozen in 1930, built to think, speak, and reason like someone who has never heard of the modern world.

Published on May 7, 2026


Igor Omilaev - © Unsplash

Team IO+ selects and features the most important news stories on innovation and technology, carefully curated by our editors.

Have you ever wondered what it would be like to have a conversation with someone from a century ago — not a reenactor, not a historian, but someone who genuinely doesn't know how the story ends? A new AI model allows you to do that. Talkie simulates a conversational partner from the early 20th century, and unlike conventional models that scrape the entire internet to formulate their answers, it can only retrieve information up to December 31, 1930.

This 'vintage' language model aims to reflect the knowledge, culture, and linguistic nuances of that specific era. The project, led by Nick Levine, David Duvenaud, and Alec Radford, received compute support from the AI lab Anthropic. Beyond serving as a chatbot of the past, Talkie can help researchers probe AI's forecasting capabilities. Moreover, by freezing the model's knowledge in 1930, researchers can observe how it handles concepts it was never explicitly taught.

This approach moves away from the standard 'more data is better' philosophy common in Silicon Valley. Instead, it focuses on the quality and temporal specificity of information. The resulting system offers a distinct perspective on the world, completely unburdened by the digital age or modern historical hindsight. The model is publicly accessible, and everyone can engage in a conversation.

Acquiring data efficiently

Building Talkie came with a major challenge: finding the right data to train it on. Unlike most AI models, which learn from modern text found online, Talkie needed to learn from old books, magazines, and physical archives — pulling together a massive 260 billion words of historical content.

The first big problem was turning scanned pages into usable text. This is done through a process called OCR (Optical Character Recognition), which is essentially software that "reads" images of text. The trouble is that most OCR tools are built for clean, modern documents and struggle with the faded, inconsistently formatted pages of century-old materials.

Early attempts were poor — the text the software produced was so error-prone that the AI was only learning a third as effectively as it would from perfectly transcribed material. The team improved this significantly by applying automated text-cleaning techniques, raising learning efficiency to 70%.
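The article doesn't describe the team's actual cleaning pipeline, but a minimal sketch of rule-based OCR cleanup might look like the following (the function name and heuristics here are illustrative assumptions, not the project's tooling):

```python
import re

def clean_ocr_text(raw: str) -> str:
    """Apply simple rule-based fixes to noisy OCR output from scanned pages.

    Illustrative sketch only; a real pipeline would be far more involved.
    """
    text = raw
    # Rejoin words hyphenated across line breaks ("ex-\nchange" -> "exchange").
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Normalize the long s (ſ), common in older typefaces, to a modern s.
    text = text.replace("ſ", "s")
    # Collapse runs of spaces and tabs left over from column layouts.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop lines that are mostly non-alphabetic junk (smudges, page furniture).
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        alpha = sum(c.isalpha() for c in stripped)
        if alpha / len(stripped) >= 0.5:
            kept.append(stripped)
    return "\n".join(kept)
```

For example, a scan fragment like `"The stock ex-\nchange cloſed early.\n#@%$!*&^\nPrices fell."` would come out as two clean sentences with the junk line removed.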

To close the final gap, the team is building its own custom OCR tool designed specifically for historical documents. Beyond accuracy, this also addresses another subtle problem: ensuring that no modern text accidentally slips into the training data. Since Talkie is meant to reflect the world as it was understood before 1931, even something as small as a modern-style date stamp on a scanned document could compromise that. Keeping the data strictly historical is what allows the model to remain an authentic window into the past.
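A crude version of that contamination check might simply reject any document mentioning a year after the cutoff. The sketch below is an assumption for illustration (the names and the heuristic are hypothetical; the team's actual tooling is not described in this detail):

```python
import re

# Match four-digit years from 1000 to 2099 on word boundaries.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def is_period_safe(text: str, cutoff_year: int = 1930) -> bool:
    """Return False if the text mentions any year after the cutoff.

    A deliberately blunt filter: even a modern scanner's date stamp
    like "2019-04-12" would flag the document for review.
    """
    for match in YEAR_PATTERN.finditer(text):
        if int(match.group()) > cutoff_year:
            return False
    return True
```

A filter this aggressive would also discard legitimate pre-1931 documents that happen to discuss future dates, which is one reason a production system would need something more nuanced.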

Training an AI model without modern bias

According to the team, one of the most creative aspects of building Talkie was shaping its personality and conversational style. Most modern AI models are refined using feedback from real human conversations — a process that naturally bakes in today's values, communication norms, and sensibilities. For Talkie, that would be a problem, since the goal is for the model to think and speak like someone from the 1930s, not like a contemporary chatbot with a vintage coat of paint.

To get around this, the team avoided modern training material entirely. Instead, they turned to period-appropriate sources — etiquette guides, cookbooks, dictionaries, and poetry collections from the early 20th century — to teach the model how people of that era communicated and what they considered proper or normal. It's a subtle but powerful distinction: rather than telling the AI how to behave, they let the culture of the time speak for itself.

The team is also working with historians to build detailed historical personas — essentially character profiles grounded in what a real person of that period would have known, believed, and had access to. This stops the model from accidentally drawing on modern knowledge or reasoning in ways that would have been foreign to someone living in 1930.

The result is an AI that doesn't just mimic old-fashioned language, but genuinely reasons within the boundaries of its era — making it feel far more like a window into the past than a modern system playing dress-up.

Why Talkie is a useful experiment

There are multiple reasons Talkie is a useful AI experiment. On the scientific side, a historically bounded AI turns out to be a surprisingly useful research tool. Because the model only knows what was known before 1931, researchers can test whether it would have been able to anticipate major events that followed, such as the economic collapse of the Great Depression or key scientific discoveries of the mid-20th century. It's a bit like a controlled experiment in hindsight: if the patterns were there in the data, could an AI have spotted them?

On the human side, Talkie opens up something harder to quantify but equally compelling — a way to engage with how people actually thought and reasoned in a different era. Rather than reading about the past, users can interact with a system that reflects its logic from the inside. This has obvious appeal for education and historical research, offering a more immersive way to explore a particular moment in time.

There's also a broader technical question the project helps answer: how capable can an AI be when it's trained on a smaller, more focused dataset rather than the vast, sprawling content of the modern internet? Most cutting-edge models owe much of their ability to the sheer scale and variety of their training data. Talkie tests what's possible when you strip that away and work within tight constraints — and what that means for building effective AI in specialized fields where web data simply isn't available or appropriate.

What comes next

Talkie is currently available as a 13-billion-parameter model — smaller than giants like GPT-4 but already showing solid reasoning in early tests. Unsurprisingly, it lags behind modern models on general knowledge benchmarks, though this is entirely expected given that it's missing nearly a century of human history by design. The team is working to address this by scaling the training data beyond 1 trillion tokens and expanding beyond English-language sources to incorporate a more global historical perspective.

The team's near-term goal is to release a significantly more powerful version by summer 2026, targeting capabilities roughly on par with GPT-3.5 — while still maintaining the strict pre-1931 knowledge cutoff. Getting there means continuing to improve their custom OCR system and raising the quality and size of the training corpus. The broader ambition, though, goes beyond building a better historical chatbot. If a model can reach a high level of reasoning using only pre-1931 data, it demonstrates that the modern internet isn't the only road to capable AI — opening the door to specialized models built around other historical periods or technical domains where web data is scarce or simply not the right fit.