Sniffing Out Fraud, Word by Word

Regulators and auditors need all the help they can get to spot underhanded accounting. Here’s a text-based tool that can X-ray annual and quarterly reports.
The essentials

Queen’s researchers Lynnette Purda of the School of Business and David Skillicorn of the School of Computing have developed a linguistic-based statistical method that flags suspicious words and phrases — indicators of fraud — in the management discussion and analysis (MD&A) section of company annual and quarterly reports. Their method was able to spot 80 percent of frauds. The tool can help securities regulators, auditors, and investors flag potentially fraudulent reports.

When Livent founder Garth Drabinsky was convicted of fraud in 2009, creditors and the auditing community held their breath, waiting for the other shoe to drop. Early this month, it finally did: the Ontario Superior Court of Justice ruled that Livent’s former auditor, Deloitte, had failed to detect clear signs of fraud, and ordered the auditor to pay $84.8 million in damages to the theatre company’s creditors.

Such legal judgments are rare, but the ruling highlights a sobering reality: much financial fraud goes undetected and, when it is exposed, it is usually because an insider came forward.

“Corporate fraud is a very large problem,” says Lynnette Purda, associate professor and RBC Fellow of Finance at Queen’s School of Business. “A study a few years ago found that most fraud tends to be identified internally by whistle blowers or the media. But auditors, lawyers, regulators: they’re usually not the ones to first identify the fraud. ... The fact that the people we specifically charge with looking for fraud are not the ones that are revealing it tells us that our traditional tools are lacking in some way.” (The study is “Who Blows the Whistle on Corporate Fraud?” by Dyck, Morse, and Zingales.)

Purda believes she has a new and nontraditional tool to do the job. Developed with David Skillicorn, a professor in Queen’s School of Computing, the text-based statistical method flags suspicious words and phrases — indicators of fraud — in the management discussion and analysis (MD&A) section of company annual and interim reports.

Similar methods have been used in linguistics and psychology to analyze, say, the tone of a piece of writing. Such methods are only now being applied to the business world. Linguistic-based software, for example, has been developed that scans employee emails for evidence of illicit activity.

“It’s not necessarily straightforward because a lot of words that we use in business have a very different meaning than if they were used in ordinary speech,” says Purda. “What we wanted to do is ask, ‘How well do various detection methods work for business?’”

More than a "bag of words"

Up to now, text-based methods to identify business fraud have been based on one of two approaches. The “bag of words” approach relies on a pre-defined list of words thought to be associated with a particular sentiment such as negativity, optimism, or deceptiveness. The approach is less than ideal since many words that suggest deception in common use have a more benign meaning in a business context.
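To make the weakness concrete, here is a minimal Python sketch of a bag-of-words scorer. The word list, the sample sentence, and the function names are all invented for illustration and are not taken from the research. The sketch simply measures what share of a passage's words appear on a fixed list.

```python
import re

# Hypothetical "negative" word list, for illustration only.
NEGATIVE_WORDS = {"liability", "loss", "risk", "decline"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexicon_score(text, lexicon):
    """Return the share of tokens in `text` that appear in `lexicon`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)

# A routine accounting sentence (made up) that a general-purpose list
# would still count as negative in tone.
sample = "The company recorded a tax liability and hedged its currency risk."
print(lexicon_score(sample, NEGATIVE_WORDS))
```

Two of the sentence's eleven words are on the list, so it scores as fairly negative even though it describes ordinary bookkeeping. That is the kind of false signal a fixed list can produce in a business setting.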

The second approach, the one used by Purda and Skillicorn, starts with a blank slate and uses data mining techniques to analyze fraudulent financial statements in search of suspicious words or phrases. Applied to a large number of statements, the approach reveals a linguistic pattern that can be used to classify documents as fraudulent or truthful.

“What’s useful about our approach is that you don’t have to identify ahead of time what you think would be suspicious,” Purda says. “It’s also useful because you can continually update this approach, so that more observations lead to perhaps new words being flagged. And it’s not just the appearance of a word but often the appearance of certain words in combination.”

To build their tool, Purda and Skillicorn began with a sample of MD&A statements in quarterly and annual reports issued by firms that were subject to an Accounting and Auditing Enforcement Release (a civil action brought by the U.S. Securities and Exchange Commission). They created a table of word frequencies from these statements, then sorted the words from most to least predictive of fraud. Some words were related to merger activity (acquisition, acquired), potential legal problems (settlement, legal, judgments), or financing activities (debt, lease). Others were fairly innocuous, but the variation in their frequencies stood out. The final step was to analyze the top 200 words using an algorithm to detect patterns and assign a “probability of truth.” In their research, Purda says, this method spotted 80 percent of frauds.
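The three steps described above (build a word-frequency table, rank the words, score a document) can be sketched in a few dozen lines of Python. This is a toy under loud assumptions: the article does not name the ranking criterion or the pattern-detection algorithm, so the sketch substitutes a simple gap in average word frequency for the ranking and a small naive Bayes classifier for the scoring. The four sample "statements", the labels, and every function name are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def relative_frequencies(text):
    """Map each word to its share of the document's tokens."""
    tokens = tokenize(text)
    total = len(tokens) or 1
    return {word: n / total for word, n in Counter(tokens).items()}

def rank_words(docs, labels, top_k=200):
    """Rank words by the gap in mean relative frequency between classes.

    `labels` uses 1 for fraudulent and 0 for truthful documents.
    This criterion is an assumption, not the researchers' method.
    """
    tables = [relative_frequencies(doc) for doc in docs]
    n = {y: labels.count(y) for y in (0, 1)}
    if not n[0] or not n[1]:
        raise ValueError("need at least one document of each class")
    gaps = {}
    for word in set().union(*tables):
        mean = {
            y: sum(t.get(word, 0.0) for t, lab in zip(tables, labels) if lab == y) / n[y]
            for y in (0, 1)
        }
        gaps[word] = mean[1] - mean[0]
    # Largest absolute gap first; ties broken alphabetically.
    ranked = sorted(gaps.items(), key=lambda item: (-abs(item[1]), item[0]))
    return ranked[:top_k]

def train_naive_bayes(docs, labels, words):
    """Fit smoothed per-class word log-probabilities over `words` only."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(t for t in tokenize(doc) if t in words)
    log_prob = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(words)  # add-one smoothing
        log_prob[y] = {w: math.log((counts[y][w] + 1) / total) for w in words}
    log_prior = {y: math.log(labels.count(y) / len(labels)) for y in (0, 1)}
    return log_prob, log_prior

def probability_of_truth(text, log_prob, log_prior):
    """Return the posterior probability that `text` is truthful (class 0)."""
    score = dict(log_prior)
    for token in tokenize(text):
        if token in log_prob[0]:
            for y in (0, 1):
                score[y] += log_prob[y][token]
    top = max(score.values())
    weight = {y: math.exp(s - top) for y, s in score.items()}
    return weight[0] / (weight[0] + weight[1])

# Made-up training statements: 1 = fraudulent, 0 = truthful.
docs = [
    "acquisition of the subsidiary increased debt and a legal settlement is pending",
    "the acquired business faces legal judgments and higher lease debt",
    "sales grew on steady demand and costs were stable",
    "steady demand supported sales and margins were stable",
]
labels = [1, 1, 0, 0]

ranked = rank_words(docs, labels, top_k=8)
words = {word for word, _ in ranked}
log_prob, log_prior = train_naive_bayes(docs, labels, words)

p_suspect = probability_of_truth("debt rose after the acquisition and a legal settlement", log_prob, log_prior)
p_routine = probability_of_truth("sales were stable on steady demand", log_prob, log_prior)
print(ranked)
print(p_suspect, p_routine)
```

With only four short documents the numbers mean nothing in themselves. The point is the shape of the pipeline: the suspicious vocabulary is learned from labeled statements rather than chosen in advance, and retraining on new statements can change which words make the list.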

By including quarterly reports in their sample, the researchers were able to benchmark how a firm communicated its financial results and track when a firm moved from truthful reporting to misrepresentation and fraud. They found that from the quarter directly preceding the fraud to the first fraudulent report, there was a dramatic and significant drop in the report’s probability of truth.
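The benchmarking idea can be pictured as comparing each report's score with the one before it. The Python sketch below assumes a firm's reports have already been scored. The quarter labels, the scores, the 0.3 threshold, and the function name are hypothetical and chosen only to show the mechanics.

```python
def flag_truth_drops(history, min_drop=0.3):
    """Return (quarter, drop) pairs where the probability of truth fell
    by at least `min_drop` relative to the previous report.

    `history` is a chronological list of (quarter_label, probability)
    pairs. The default threshold is an arbitrary illustrative value.
    """
    flagged = []
    for (_, prev_p), (quarter, p) in zip(history, history[1:]):
        drop = prev_p - p
        if drop >= min_drop:
            flagged.append((quarter, round(drop, 3)))
    return flagged

# Made-up scores for one firm's consecutive quarterly reports.
history = [("2001Q1", 0.82), ("2001Q2", 0.79), ("2001Q3", 0.41), ("2001Q4", 0.38)]
print(flag_truth_drops(history))
```

In this invented series only the third quarter is flagged, because that is where the score falls sharply against the firm's own earlier reports. The later quarter scores just as low but shows no new change from the one before it.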

“If you think about misrepresentation, there is a lot of latitude in how some things are interpreted and there is discretion in using some accounting models, so it might not go from truth immediately to fraud,” Purda says. “There is probably a gradual path from truth to slight exaggeration to aggressive accounting, and eventually to misrepresentation. Benchmarking and noting that change seems to help the predictive power of our model.”

Can these anti-fraud tools be gamed?

As for the ability to game the system and simply avoid known suspicious terms, Purda says it’s not that simple. “People always say, ‘If you tell me these 50 words that are used to identify fraud, then I’ll just avoid using those 50 words.’ Our method is more sophisticated in that it can be continually updated and it’s not just an appearance of words but often it’s words in combinations or the frequency of use that will show up.”
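A small Python sketch shows why dropping a single word may not be enough when combinations count. It assumes, purely for illustration, that a detector tracks which pairs of watched words appear together in a passage. The watch list, the sentence, and the function names are invented.

```python
import re
from itertools import combinations

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def pair_features(text, watch_words):
    """Return the set of alphabetically ordered pairs of watched words
    that both appear somewhere in `text`."""
    present = sorted(set(tokenize(text)) & set(watch_words))
    return set(combinations(present, 2))

# Hypothetical watch list and a sentence that avoids "settlement".
watch = {"acquisition", "debt", "settlement"}
text = "The acquisition was financed with new debt."
print(pair_features(text, watch))
```

The writer has steered clear of one watched word, yet the pairing of the other two still registers. A method that is retrained over time could also start watching different words altogether.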

Purda sees growing support among regulators and hedge funds for the use of textual analysis. It can make the work of auditors more efficient by helping them look more closely at firms whose financial statements have already been flagged. Hedge fund investors, for their part, regularly try to identify firms with something to hide before anyone else does, and then bet against those firms.

What intrigues Purda is why this text-based tool works in a business context. “When we talk about textual analysis from a psychology or linguistics point of view, generally studies have been done where I’m telling you a story and I am intentionally lying,” she says. “But when you think about an annual report, hundreds of people may be involved in writing it, and investor relations, lawyers, and auditors all review it. It’s not clear that all of those people are aware of the deception. So why is this method still able to work?” 

Alan Morantz

Smith School of Business

Goodes Hall, Queen's University
Kingston, Ontario
Canada K7L 3N6
