The company is long gone, its leaders have faded into history, but Enron’s emails are forever.
During its 2002 investigation of the bankruptcy of Enron, the US Federal Energy Regulatory Commission (FERC) checked the energy company’s emails: more than 600,000 messages sent by 158 employees, mostly senior management.
The collected missives—a mixture of high-level business negotiations, discussions between managers and their spouses about holiday plans, and many, many requests to be unsubscribed from mailing lists—formed part of the evidence that led FERC to conclude the company had in fact engaged in illegal price manipulation, and the US Department of Justice to press criminal charges against former CEOs Kenneth Lay and Jeff Skilling.
After its investigation, the commission determined that releasing the emails was in the public interest and dumped them on a website.
Though ostensibly for research and academic use, the trove was so messy and unwieldy that it was effectively useless—until an MIT computer science professor named Leslie Kaelbling bought the data for $10,000 and handed it over to colleagues who cleaned it up, took out duplicates, organized the remaining 200,000 messages into folders, and released it into the world.
“What was weird was that the data itself was in the public domain, but we still had to pay a company for the service of giving it to us on a disk,” Kaelbling said. “After that, we just gave it away for free.”
If Enron went down for defrauding the public, the company has unwittingly repaid a small part of its debt to society through the gift of its emails.
The Enron Corpus, as the collection is known, has been used in more than 100 projects since that research team presented it to the public in 2004. As the biggest public collection of natural written language in an organizational setting, it has been used to study everything from statistics to artificial intelligence to email attachment habits. An online art project by two Brooklyn artists will send every single one of the emails to your personal inbox, a process that, depending on the frequency of emails you request, will take anywhere from seven days to seven years.
Making all this data public has allowed all kinds of research into corporate behavior that just wouldn’t be possible without it. The downside is that these emails, which are also used as training data for artificial intelligence projects (the Enron Corpus was the training set used for the prototype of Gmail’s “smart compose” feature, though not its final version), represent a small and atypical slice of society. That’s an entry point for bias to creep into algorithms and other automated processes. I’m sure there’s plenty more to be done with and learned from the Enron Corpus. We just shouldn’t consider it to be the be-all and end-all of how people communicate.
(I found this story while doing a Google News search on Jeff Skilling following the news of his release. So credit the Enron Corpus for finding a way to spread the word about itself, too.)