Sentiment Modeling in Finance Requires Specialized Training

Bryan Healey
Apr 7, 2021

Sentiment in financial and corporate speech is a unique beast. The straightforward rules of language can often produce a reasonable-looking result, but they tend to miss the nuances of the industry.

A simple yet illustrative example:

We laid off 15% of retail personnel in Q1, which is less than expected, but with another round of layoffs set to come this month.

Intuition suggests that escalating layoffs are a negative, and running the above text through a conventionally trained sentiment model does produce a mildly negative sentiment score (-0.173).

And yet, from the standpoint of corporate earnings, this may be a positive: cutting personnel reduces expenses during a period of weaker consumer demand after the holiday season. Any good financial sentiment effort therefore requires specialized training and tuning to fit the industry use case.

We took a two-pronged approach to building our in-house sentiment model at Aiera. The first and most important step is collecting and cleaning our training data. This process is typically laborious, so to expedite it we built a context-sensitive framework of rules. The framework quickly identifies text segments across our corpus of event transcripts that are likely to be positive or negative.

For example, consider this excerpt:

Sales to the chemical processing market accounted for 21% of our revenue at $27.8 million in the fourth quarter. This represents a solid increase of 19.2% from $23.3 million in the same period last year, due to a 14.3% increase in volume combined with a 4.2% increase in the average selling price.

Using this rules framework, we can determine that the excerpt above is likely positive, because the term "revenue" appears within an acceptable contextual distance of the phrase "solid increase".
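The article does not publish its rule definitions. What follows is therefore only a minimal Python sketch of one kind of proximity rule. The `ProximityRule` class, the term lists, the 15-token window, and the tokenizer are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ProximityRule:
    """Hypothetical rule: a subject term near a direction phrase implies a label."""
    subject_terms: Tuple[str, ...]      # e.g. financial metrics such as "revenue"
    direction_phrases: Tuple[str, ...]  # e.g. "solid increase"
    label: str                          # "positive" or "negative"
    max_distance: int                   # max token gap between the two matches


def tokenize(text: str) -> List[str]:
    # Lowercase words and numbers; "$27.8" becomes "27.8", "19.2%" becomes "19.2".
    return re.findall(r"[a-z]+|\d+(?:\.\d+)?", text.lower())


def find_positions(tokens: List[str], phrase: str) -> List[int]:
    # Start index of every occurrence of a (possibly multi-word) phrase.
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def rule_matches(rule: ProximityRule, tokens: List[str]) -> bool:
    subjects = [p for t in rule.subject_terms for p in find_positions(tokens, t)]
    directions = [p for d in rule.direction_phrases for p in find_positions(tokens, d)]
    return any(abs(s - d) <= rule.max_distance for s in subjects for d in directions)


def label_segment(text: str, rules: List[ProximityRule]) -> Optional[str]:
    tokens = tokenize(text)
    labels = {r.label for r in rules if rule_matches(r, tokens)}
    # No match, or conflicting matches, yields no candidate label.
    return labels.pop() if len(labels) == 1 else None


# Hypothetical rule set; a production framework would hold many more rules.
RULES = [
    ProximityRule(("revenue", "sales", "margin"),
                  ("solid increase", "strong growth", "record"), "positive", 15),
    ProximityRule(("revenue", "sales", "margin"),
                  ("decline", "declined", "decrease", "headwinds"), "negative", 15),
]

EXCERPT = (
    "Sales to the chemical processing market accounted for 21% of our revenue "
    "at $27.8 million in the fourth quarter. This represents a solid increase of "
    "19.2% from $23.3 million in the same period last year, due to a 14.3% "
    "increase in volume combined with a 4.2% increase in the average selling price."
)

print(label_segment(EXCERPT, RULES))  # positive
```

In the excerpt, "revenue" sits 11 tokens before "solid increase", which falls inside the assumed 15-token window. Any segment that matches rules with conflicting labels is simply left unlabeled rather than guessed.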

With this framework, we can quickly identify many thousands of likely positive and negative samples. Each sample can then be manually reviewed for affirmation (or exclusion) before we train our classification model.
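The review step can be pictured as a filter over rule-proposed candidates. The following is a minimal sketch under assumed data shapes. The `Candidate` record and the `review` mapping of segment IDs to True (affirmed) or False (excluded) are hypothetical, since the article does not describe its review tooling.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    segment_id: str
    text: str
    label: str  # "positive" or "negative", as proposed by the rules framework


def affirmed_training_set(candidates: List[Candidate],
                          review: Dict[str, bool]) -> List[Candidate]:
    """Keep only the candidates a reviewer affirmed.

    Candidates marked False are excluded, and candidates missing from
    `review` have not been reviewed yet, so they are held back as well.
    """
    return [c for c in candidates if review.get(c.segment_id) is True]


def label_counts(samples: List[Candidate]) -> Counter:
    # Class balance of the verified set (positive vs. negative).
    return Counter(c.label for c in samples)


CANDIDATES = [
    Candidate("a", "Revenue saw a solid increase of 19.2%.", "positive"),
    Candidate("b", "Record attendance at the shareholder picnic.", "positive"),
    Candidate("c", "Margin declined on input-cost headwinds.", "negative"),
    Candidate("d", "Sales declined slightly in Europe.", "negative"),
]
REVIEW = {"a": True, "b": False, "c": True}  # "d" not yet reviewed

kept = affirmed_training_set(CANDIDATES, REVIEW)
print(dict(label_counts(kept)))  # {'positive': 1, 'negative': 1}
```

Holding back unreviewed candidates keeps unverified rule output out of the training data, in line with the manual affirmation step described above.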

Using this process, we have identified and verified over 140K positive text segments and over 90K negative text segments. Each segment is paired with critical metadata such as speaker, company, topics/KPIs, and any associated numerical values. From there, we construct our custom classification model using word embeddings so that it scores text more accurately.
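The article does not specify its model architecture, so the following swaps in one common embedding-based baseline: averaged word vectors fed to a logistic regression trained by gradient descent. The three-dimensional `EMBEDDINGS` table and the training sentences are hypothetical toy data. The sketch also omits the metadata features the article mentions; in practice those would be appended to the feature vector.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical toy embeddings; real vectors would be learned and much larger.
EMBEDDINGS: Dict[str, List[float]] = {
    "increase": [0.9, 0.1, 0.0], "growth": [0.8, 0.2, 0.1],
    "solid": [0.7, 0.0, 0.2], "decline": [-0.9, 0.1, 0.0],
    "weak": [-0.7, 0.2, 0.1], "headwinds": [-0.8, 0.0, 0.3],
    "revenue": [0.0, 0.9, 0.1], "margin": [0.1, 0.8, 0.0],
}
DIM = 3


def embed(text: str) -> List[float]:
    # Average the vectors of known words; unknown words are ignored.
    vecs = [EMBEDDINGS[w] for w in re.findall(r"[a-z]+", text.lower()) if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[str, int]], epochs: int = 200, lr: float = 0.5):
    """Batch gradient descent on log loss; label 1 = positive, 0 = negative."""
    data = [(embed(t), y) for t, y in samples]
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * DIM, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        n = len(data)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def score(model, text: str) -> float:
    # Probability that the text is positive.
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, embed(text))) + b)


TRAIN = [
    ("solid increase in revenue", 1),
    ("strong growth in margin", 1),
    ("revenue decline on weak demand", 0),
    ("margin headwinds persisted", 0),
]

model = train(TRAIN)
print(round(score(model, "revenue growth"), 3))
```

Averaging embeddings discards word order, which is one reason a production model would likely use richer contextual features.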

Using our adjusted model, the layoff text from the beginning of this article produces the following, more accurate score:

positive: 0.4019

This model can be applied at any level of granularity, including individual sentences. Because the model has access to contextual features, it can classify a sentence as positive or negative even when the preceding or following sentences are needed to do so accurately.
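The following minimal sketch shows one way to give a per-sentence scorer access to its neighbours: split a passage into sentences and attach a window of surrounding sentences to each one. The splitter, the `LEXICON`, and the fallback rule in `score_sentence` are hypothetical stand-ins. The sketch does not reproduce how the article's model actually uses context. It only shows how context can decide a sentence that carries no signal of its own.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon standing in for a trained model.
LEXICON = {"increase": 1.0, "growth": 1.0, "decline": -1.0, "weak": -1.0}


def split_sentences(text: str) -> List[str]:
    # Naive split on ., ! or ? followed by whitespace; decimals like 19.2 survive.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def with_context(sentences: List[str], window: int = 1) -> List[Dict]:
    # Pair each sentence with up to `window` sentences before and after it.
    return [
        {
            "sentence": s,
            "before": sentences[max(0, i - window):i],
            "after": sentences[i + 1:i + 1 + window],
        }
        for i, s in enumerate(sentences)
    ]


def lexicon_score(text: str) -> float:
    hits = [LEXICON[w] for w in re.findall(r"[a-z]+", text.lower()) if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def score_sentence(item: Dict, context_weight: float = 0.5) -> float:
    own = lexicon_score(item["sentence"])
    if own != 0.0:
        return own
    # No signal in the sentence itself: fall back on a down-weighted context score.
    context = " ".join(item["before"] + item["after"])
    return context_weight * lexicon_score(context)


SAMPLE = "Revenue rose on a strong increase in volume. We expect the same next quarter."
for item in with_context(split_sentences(SAMPLE)):
    print(item["sentence"], score_sentence(item))
```

Here the second sentence ("We expect the same next quarter.") contains no polarity words on its own. It receives a positive score only because its predecessor does.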

We continue to refine our training data and tighten our model as we work to help investors quickly identify areas of focus throughout earnings season.

Additionally, we intend to pair this textual analysis with audio-based tonal analysis, so that investors can find interesting and informative deltas between the words said and the spoken tone (e.g., positive text delivered in a distressed voice).
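Once both signals exist, finding these deltas reduces to comparing two scores per segment. The following minimal sketch assumes that text sentiment and vocal tone are both expressed on a common [-1, 1] scale. The `Segment` type, the tone scores, and the 0.5 threshold are hypothetical, since the article does not describe the planned audio model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    segment_id: str
    text_score: float  # text sentiment in [-1, 1]
    tone_score: float  # hypothetical audio tone score in [-1, 1]


def divergent_segments(segments: List[Segment], threshold: float = 0.5) -> List[Segment]:
    """Return segments whose text and tone disagree by at least `threshold`,
    largest gap first."""
    gap = lambda s: abs(s.text_score - s.tone_score)
    return sorted((s for s in segments if gap(s) >= threshold), key=gap, reverse=True)


SEGMENTS = [
    Segment("a", text_score=0.6, tone_score=-0.4),  # upbeat words, distressed voice
    Segment("b", text_score=0.3, tone_score=0.2),   # words and tone agree
    Segment("c", text_score=-0.2, tone_score=0.4),  # cautious words, confident voice
]
print([s.segment_id for s in divergent_segments(SEGMENTS)])  # ['a', 'c']
```

Sorting by the size of the gap surfaces the most striking mismatches first, which suits a workflow where investors skim for areas of focus.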
