Conference proceeding
Financial Semantic Textual Similarity: A New Dataset and Model
IEEE Symposium on Computational Intelligence for Financial Engineering and Economics, pp.1-8
10/22/2024
DOI: 10.1109/CIFEr62890.2024.10772793
Abstract
We introduce FinSTS, a novel dataset for financial semantic textual similarity (STS), comprising 4,000 sentence pairs from earnings calls and SEC filings. To improve models for the Financial STS task, we propose an active learning (AL) algorithm that efficiently selects informative sentence pairs for annotation by GPT-4 and creates high-quality training data. Using this approach, we train FinSentenceBERT, a model that generates semantic embeddings specifically for financial text. FinSentenceBERT establishes a new performance benchmark on FinSTS, outperforming models that use basic pooling strategies or are fine-tuned on general datasets. Surprisingly, a general SBERT model trained using our AL approach surpasses even models based on FinBERT, a language model pre-trained on financial text. Our research contributes a specialized dataset, model, and methodology that advance semantic understanding in the financial domain, with potential applications to other specialized domains.
Details
- Title
- Financial Semantic Textual Similarity: A New Dataset and Model
- Creators
- Shanshan Yang - Stevens Institute of Technology
- Steve Yang - Stevens Institute of Technology
- Feng Mai - University of Iowa, Department of Business Analytics, Iowa City, IA, USA
- Resource Type
- Conference proceeding
- Publication Details
- IEEE Symposium on Computational Intelligence for Financial Engineering and Economics, pp.1-8
- Publisher
- IEEE
- DOI
- 10.1109/CIFEr62890.2024.10772793
- eISSN
- 2640-7701
- Language
- English
- Date published
- 10/22/2024
- Academic Unit
- Business Analytics
- Record Identifier
- 9984757993202771