Financial Semantic Textual Similarity: A New Dataset and Model
Conference proceeding

Shanshan Yang, Steve Yang and Feng Mai
IEEE Symposium on Computational Intelligence for Financial Engineering and Economics, pp. 1-8
10/22/2024
DOI: 10.1109/CIFEr62890.2024.10772793

Abstract

We introduce FinSTS, a novel dataset for financial semantic textual similarity (STS), comprising 4,000 sentence pairs from earnings calls and SEC filings. To improve models for the Financial STS task, we propose an active learning (AL) algorithm that efficiently selects informative sentence pairs for annotation by GPT-4 and creates high-quality training data. Using this approach, we train FinSentenceBERT, a model that generates semantic embeddings specifically for financial text. FinSentenceBERT establishes a new performance benchmark on FinSTS, outperforming models that use basic pooling strategies or are fine-tuned on general datasets. Surprisingly, a general SBERT model trained using our AL approach surpasses even models based on FinBERT, a language model pre-trained on financial text. Our research contributes a specialized dataset, model, and methodology that advance semantic understanding in the financial domain, with potential applications to other specialized domains.
Semantics; Active learning; Adaptation models; Analytical models; Benchmark testing; BERT; Biological system modeling; Representation learning; Supervised learning; Text processing; Text similarity; Training data; Unsupervised learning; Vectors
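
The abstract compares FinSentenceBERT against models that use basic pooling strategies. As a rough illustration of what such a baseline might look like, the sketch below averages token vectors into a sentence vector (mean pooling) and scores a sentence pair by cosine similarity. This is not the authors' implementation: the function names and the toy token vectors are hypothetical, and a real system would take token embeddings from a pretrained language model.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy token embeddings (made-up numbers, not taken from FinSTS or any model).
sentence_a = [[1.0, 0.0], [0.0, 1.0]]
sentence_b = [[2.0, 0.0], [0.0, 2.0]]

# Similarity score for the pair; the two pooled vectors point the same way,
# so the score is close to 1.
score = cosine_similarity(mean_pool(sentence_a), mean_pool(sentence_b))
print(round(score, 6))
```

In an STS benchmark, scores like this are typically compared against human or model-assigned similarity labels with a rank correlation; which metric the paper uses is not stated in this record.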
