Finding Topics That Matter

Bryan Healey
May 11, 2021

Topic modeling for financial text must be highly adaptive. If a company announces a new product or feature in an earnings call or conference, we must be able to recognize it as a new topic without any prior training. The model must also have a sense of historical consistency, so that topics of importance to a company persist for as long as they are relevant, and it must detect changes in relevance along the way. To accomplish this, we take a holistic view of each document in its entirety and use the rules and structure of language to guide identification.

The first step in this process is to identify the topic(s) of any single sentence. To do this, we combine part-of-speech and entity tagging with our internal ontology data and KPI mappings to identify critical mentions:

Using our ontology workflow, we can resolve the proper noun and organization entity AWS found in the sentence to a known organization within the company Amazon, and to several KPI mappings (AWS Sales, AWS EBIT, and AWS EBITDA). Feeding these values into our topic identification heuristic, AWS is correctly identified as the primary (and only) topic of that sentence. However, we do not require such a resolution to select a topic.

By first relying on linguistic structure to find these likely candidates for topics, rather than using a model trained on more traditional labeled data, we can quickly identify candidates even if there is no prior understanding of the selection (no corresponding ontology element or KPI).
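As a rough illustration of this idea (a sketch under stated assumptions, not the production heuristic), the snippet below groups consecutive nouns and proper nouns into candidate phrases and scores them. The input is assumed to be already tagged as (token, part of speech, entity label) triples; the `ONTOLOGY` dictionary, the `Candidate` type, and the scoring weights are all hypothetical. An ontology match only adds to the score, so an unknown proper noun still surfaces as a candidate.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical ontology: surface form -> (parent company, KPI mappings).
ONTOLOGY = {
    "AWS": ("Amazon", ["AWS Sales", "AWS EBIT", "AWS EBITDA"]),
}


@dataclass
class Candidate:
    text: str
    score: float
    company: Optional[str] = None
    kpis: List[str] = field(default_factory=list)


def topic_candidates(tagged: List[Tuple[str, str, str]]) -> List[Candidate]:
    """Return candidate topics for one sentence, best first.

    tagged: (token, pos, entity_label) triples; entity_label is "" if none.
    The weights below are illustrative assumptions.
    """
    candidates: List[Candidate] = []
    phrase: List[str] = []
    has_propn = False
    has_entity = False

    def flush() -> None:
        nonlocal phrase, has_propn, has_entity
        if phrase:
            text = " ".join(phrase)
            # Base score for any noun phrase, plus bonuses for linguistic cues.
            score = 1.0 + (1.0 if has_propn else 0.0) + (1.0 if has_entity else 0.0)
            company, kpis = ONTOLOGY.get(text, (None, []))
            if company is not None:
                score += 1.0  # ontology match helps but is not required
            candidates.append(Candidate(text, score, company, list(kpis)))
        phrase, has_propn, has_entity = [], False, False

    for token, pos, ent in tagged:
        if pos in ("NOUN", "PROPN"):
            phrase.append(token)
            has_propn = has_propn or pos == "PROPN"
            has_entity = has_entity or bool(ent)
        else:
            flush()  # any other part of speech ends the current phrase
    flush()
    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Illustrative, hand-tagged input (not taken from a real transcript).
sentence = [("AWS", "PROPN", "ORG"), ("delivered", "VERB", ""),
            ("strong", "ADJ", ""), ("growth", "NOUN", "")]
for candidate in topic_candidates(sentence):
    print(candidate)
```

In practice the triples would come from a tagger, and the weights would need tuning against real transcripts.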

Now that we have a method for finding topic candidates in a single sentence, we will pull back and use that data to find the core topics for a document, in priority order; in this case, within an event transcript. We first calculate the primary topic(s) for every sentence within the document:

Using companion metadata (the position of each mention in the document by sentence and paragraph, who is speaking, the tone of the speaker, the frequency of the topic [with an appropriate decay in influence], the prior prevalence of the topic, and so on), we can craft an algorithm for ordering and ranking topics and eliminating those without top-level relevance.
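One way such a ranking could look is sketched below. It is an assumption-laden toy, not the actual algorithm: the function name `rank_topics`, the weighting formula, and the parameters `decay` and `min_score` are all hypothetical. Here "decay" is interpreted as each repeated mention of a topic counting for less than the one before, earlier sentences are weighted more heavily, and a prior-prevalence value boosts topics seen in past documents.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def rank_topics(
    mentions: List[Tuple[str, int, float]],
    n_sentences: int,
    prior: Optional[Dict[str, float]] = None,
    decay: float = 0.5,
    min_score: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank document topics from per-sentence mentions.

    mentions: (topic, sentence_index, speaker_weight) in document order.
    prior: optional prior prevalence per topic (0.0 means no history).
    All weights here are illustrative assumptions.
    """
    prior = prior or {}
    seen: Dict[str, int] = defaultdict(int)
    scores: Dict[str, float] = defaultdict(float)

    for topic, idx, speaker_weight in mentions:
        # Earlier sentences count more: weight falls linearly from 1.0 to 0.5.
        position_weight = 1.0 - 0.5 * idx / max(n_sentences - 1, 1)
        # Each repeat of the same topic contributes geometrically less.
        repeat_weight = decay ** seen[topic]
        scores[topic] += speaker_weight * position_weight * repeat_weight
        seen[topic] += 1

    # Boost topics that were already prevalent in earlier documents.
    boosted = {t: s * (1.0 + prior.get(t, 0.0)) for t, s in scores.items()}
    ranked = sorted(boosted.items(), key=lambda kv: kv[1], reverse=True)
    # Drop topics that do not reach top-level relevance.
    return [(t, s) for t, s in ranked if s >= min_score]


# Illustrative input: AWS is mentioned twice, "weather" once near the end.
mentions = [("AWS", 0, 1.0), ("AWS", 2, 1.0), ("weather", 4, 0.5)]
print(rank_topics(mentions, n_sentences=5, prior={"AWS": 0.2}))
```

Other signals from the list above, such as speaker tone, could enter as additional multiplicative weights in the same loop.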

Once we have identified and ordered these topics, we perform value-added analysis on this data, including rolling up prominent topics for an equity, adjusting the priority of a topic over time, and other equity correlations.
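To make "adjusting priority over time" concrete, here is one simple possibility: an exponentially weighted average of a topic's score across successive events for the same equity. This is only a sketch. The function `roll_topic_priority`, the smoothing factor `alpha`, and the sample topic names are assumptions for illustration, and a topic absent from an event is treated as scoring zero there.

```python
from typing import Dict, List, Tuple


def roll_topic_priority(
    history: List[Dict[str, float]], alpha: float = 0.5
) -> List[Tuple[str, float]]:
    """Blend per-event topic scores into a rolling priority, best first.

    history: one {topic: score} dict per event, oldest first.
    alpha: weight given to the newest event (hypothetical default).
    """
    rolling: Dict[str, float] = {}
    for scores in history:
        topics = set(rolling) | set(scores)
        # Topics missing from this event decay toward zero; new ones ramp up.
        rolling = {
            t: alpha * scores.get(t, 0.0) + (1.0 - alpha) * rolling.get(t, 0.0)
            for t in topics
        }
    return sorted(rolling.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative history: a new topic appears and overtakes an older one.
history = [
    {"AWS": 2.0},
    {"AWS": 2.0, "NewProduct": 1.0},
    {"NewProduct": 3.0},
]
print(roll_topic_priority(history))
```

A smaller `alpha` makes established topics persist longer, while a larger one reacts faster to newly announced products or features.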

We are further experimenting with topic analysis across speakers, by sector and sub-sector, for macro influence, and more. Soon, you will be able to do detailed, self-directed analysis across our database of topic information, to help with your research and investment process.
