Financial Semantic Textual Similarity: A New Dataset and Model

Shanshan Yang; Steve Yang; Feng Mai

doi:10.1109/CIFEr62890.2024.10772793

Conference proceeding

IEEE Symposium on Computational Intelligence for Financial Engineering and Economics, pp.1-8

10/22/2024

Abstract

We introduce FinSTS, a novel dataset for financial semantic textual similarity (STS), comprising 4,000 sentence pairs from earnings calls and SEC filings. To improve models for the Financial STS task, we propose an active learning (AL) algorithm that efficiently selects informative sentence pairs for annotation by GPT-4 and creates high-quality training data. Using this approach, we train FinSentenceBERT, a model that generates semantic embeddings specifically for financial text. FinSentenceBERT establishes a new performance benchmark on FinSTS, outperforming models that use basic pooling strategies or are fine-tuned on general datasets. Surprisingly, a general SBERT model trained using our AL approach surpasses even models based on FinBERT, a language model pre-trained on financial text. Our research contributes a specialized dataset, model, and methodology that advance semantic understanding in the financial domain, with potential applications to other specialized domains.
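As a rough illustration of the ideas in the abstract, the sketch below scores sentence pairs by the cosine similarity of mean-pooled token vectors (one example of a basic pooling strategy) and then picks the pairs to send for annotation. The abstract does not state which selection criterion the authors' active learning algorithm uses, so the uncertainty rule here (scores closest to a midpoint) is only an assumed stand-in. The function names, the `midpoint` parameter, and the toy vectors are all hypothetical and do not come from the paper.

```python
import math


def mean_pool(token_vectors):
    """Average token vectors into one sentence vector (a basic pooling strategy)."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_for_annotation(pairs, scores, budget, midpoint=0.5):
    """Assumed uncertainty rule: keep the `budget` pairs scored closest to `midpoint`."""
    ranked = sorted(zip(pairs, scores), key=lambda item: abs(item[1] - midpoint))
    return [pair for pair, _ in ranked[:budget]]


# Toy token embeddings (made-up numbers; a real model would produce these).
embeddings = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 1.0], [1.0, 1.0]],
    "c": [[1.0, 0.0], [1.0, 0.0]],
    "d": [[0.0, 1.0], [0.0, 1.0]],
}
pairs = [("a", "b"), ("c", "d"), ("a", "c")]

# Score every unlabeled pair with the current (here: untrained) similarity function.
scores = [
    cosine_similarity(mean_pool(embeddings[left]), mean_pool(embeddings[right]))
    for left, right in pairs
]

# Send only the most ambiguous pair to the annotator.
selected = select_for_annotation(pairs, scores, budget=1)
print(selected)
```

In a fuller loop, the selected pairs would be labeled (the abstract names GPT-4 as the annotator), added to the training set, and the embedding model retrained before the next selection round.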

Semantics

Active learning

Adaptation models

Analytical models

Benchmark testing

BERT

Biological system modeling

Representation learning

Supervised learning

Text processing

Text similarity

Training data

Unsupervised learning

Vectors

Details

Title: Financial Semantic Textual Similarity: A New Dataset and Model
Creators: Shanshan Yang - Stevens Institute of Technology
Steve Yang - Stevens Institute of Technology
Feng Mai - University of Iowa, Department of Business Analytics, Iowa City, IA, USA
Resource Type: Conference proceeding
Publication Details: IEEE Symposium on Computational Intelligence for Financial Engineering and Economics, pp.1-8
DOI: 10.1109/CIFEr62890.2024.10772793
eISSN: 2640-7701
Publisher: IEEE
Language: English
Date published: 10/22/2024
Academic Unit: Business Analytics
Record Identifier: 9984757993202771
