Student R&D · Applied ML · Market Sentiment

Financial trend forecasting leveraging sentiment analysis.

A student research project exploring how traditional indicators, SEC filings, news sentiment, and deep learning architectures can work together to predict stock price movements and inform algorithmic trading strategies.

LSTM & Transformer time series · News & SEC sentiment features · AAPL & JNJ case study

LSTM R² ≈ 0.93 on AAPL · LSTM R² ≈ 0.85 on JNJ · ARIMA baseline underperformed

Prototype · Market Sentiment Lab

Forecasting daily price moves using signals + narratives.

Historical prices, technical indicators, and VADER sentiment scores are combined into feature sequences for supervised time-series models that output next-day closing prices.

Assets

AAPL (Tech)

JNJ (Healthcare)

Models

LSTM regressor

Transformer regressor

ARIMA baseline

Objective

The goal of this student-led project was to build a forecasting pipeline that predicts next-day closing prices for selected equities. By combining structured price data, engineered financial indicators, and sentiment extracted from public disclosures and news, we aimed to understand how narrative signals affect forecasts from traditional quantitative models.

The work compares three families of models—ARIMA, LSTM, and Transformer—on identical datasets, evaluating them with MAE, MSE, RMSE, R², residuals, and simulated cumulative returns for basic trading heuristics.
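The error metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's evaluation code; the function name `regression_metrics` and the dictionary keys are assumptions made for this example.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, and R² for two equal-length sequences.

    Hypothetical helper for illustration; not taken from the project code.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    rmse = math.sqrt(mse)                          # same units as the price

    # R² compares the model's squared error with that of predicting the mean.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}
```

Because R² is measured against a constant-mean predictor, it can look high on strongly trending price series, so it is worth reading alongside MAE and RMSE rather than on its own.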

Assets & horizon

Daily OHLCV data is combined with hand-built technical indicators and sentiment scores from SEC filings and news headlines.

Each timestep in the training sequences includes:

• Normalized price and volume
• SMA, EMA, MACD, RSI, Bollinger bands, and stochastic oscillators
• SEC sentiment score (interpolated for non-filing days)
• News sentiment score from up to five headlines
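The two sentiment features in the list above could be aligned to trading days roughly as follows. This is a sketch under stated assumptions: the page does not say which interpolation method was used, so linear interpolation is assumed here, and the helper names `average_headline_sentiment` and `interpolate_sparse` are hypothetical. Scores are taken to be precomputed numbers (for example VADER compound scores), with `None` marking days without a filing.

```python
def average_headline_sentiment(scores, max_headlines=5):
    """Average the first `max_headlines` scores; 0.0 (neutral) if there are none.

    Assumption: headlines are combined by a plain mean.
    """
    used = scores[:max_headlines]
    return sum(used) / len(used) if used else 0.0


def interpolate_sparse(values):
    """Fill None gaps linearly between known points.

    Before the first and after the last known value, the nearest known
    value is held constant.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one known value is required")

    filled = list(values)
    # Leading and trailing gaps: hold the nearest known score.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]

    # Interior gaps: straight line between neighbouring known scores.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled
```

One design point to keep in mind: interpolating between two filings uses the later filing's score on days before it was published. Carrying the last known score forward instead would avoid that look-ahead.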

Feature snapshot

Example features fed into the sequence models:

SMA(20), EMA(20)

MACD & signal line

RSI(14)

Stochastic %K, %D

Bollinger band width

Log returns

News sentiment score

SEC sentiment score
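Several of the listed indicators can be computed from a list of closing prices with the standard library alone. The sketch below is illustrative rather than the project's feature code. Assumptions: the EMA is seeded with the first value, the MACD periods (12, 26, 9) are the conventional defaults rather than values stated on this page, and the RSI uses simple averages of gains and losses instead of Wilder's smoothing.

```python
import math


def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = []
    for v in values:
        # The first value seeds the average; later values are blended in.
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (MACD line, signal line) as fast EMA minus slow EMA."""
    line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    return line, ema(line, signal)


def rsi(closes, period=14):
    """RSI from simple averages of gains and losses over `period` changes."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        recent = changes[i - period:i]  # the `period` changes ending at day i
        avg_gain = sum(c for c in recent if c > 0) / period
        avg_loss = sum(-c for c in recent if c < 0) / period
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window
        else:
            out[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


def log_returns(closes):
    """Daily log return ln(P_t / P_{t-1}); None for the first day."""
    return [None] + [math.log(b / a) for a, b in zip(closes, closes[1:])]
```

The leading `None` values mark days where an indicator is not yet defined; those rows would need to be dropped or filled before building training sequences.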

Models

Three architectures are treated as next-day regressors (no classification). Training, validation, and test splits are strictly chronological to respect time-series ordering.
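A chronological split can be sketched as below. The split fractions are placeholders chosen for illustration, since the page does not state the proportions used, and `chronological_split` is a hypothetical helper name.

```python
def chronological_split(rows, train_frac=0.7, val_frac=0.15):
    """Split time-ordered rows into train, validation, and test slices.

    Rows are never shuffled, so every validation row is later than every
    training row, and every test row is later than every validation row.
    The default fractions are assumptions, not values from the project.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1:
        raise ValueError("fractions must lie in [0, 1)")
    if train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")

    n = len(rows)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]
```

If normalization statistics are fitted, fitting them on the training slice only keeps the validation and test periods unseen.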

ARIMA baseline

Classic ARIMA on closing prices only. Provides a smooth baseline but fails to react to regime changes and sentiment shifts.

Transformer regressor

A lightweight Transformer with positional embeddings over 5-day windows. It captures some temporal structure but is more sensitive to noise.
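Turning a daily feature table into 5-day windows with a next-day target might look like the sketch below. The helper name `make_windows` is hypothetical, and the page states the 5-day window only for the Transformer, so reusing it for other models would be an assumption.

```python
def make_windows(features, targets, window=5):
    """Pair each `window`-day block of feature rows with the next day's target.

    features[t] is the feature row for day t and targets[t] is the closing
    price on day t. Each sample covers days end-window .. end-1 and is
    labelled with the close on day `end`.
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")

    X, y = [], []
    for end in range(window, len(features)):
        X.append(features[end - window:end])
        y.append(targets[end])
    return X, y
```

Each sample sees only days strictly before its target, which keeps the next-day forecast free of same-day information.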

LSTM sequence model

A stacked LSTM with dropout and Adam optimization. After tuning, it delivers the best fit on both assets, with R² around 0.93 for AAPL and 0.85 for JNJ.

Results

On both AAPL and JNJ, the LSTM tracks the trend and local swings much more closely than ARIMA. ARIMA behaves like a slowly drifting line, while the LSTM learns to follow rallies, drawdowns, and sideways ranges.

The gap between the two curves is largest around sharp moves—earnings, macro news, or sector shocks—where sentiment features give the LSTM extra signal that ARIMA never sees.

Cumulative returns from a simple strategy using model forecasts versus actual returns, highlighting where learned models can add value vs. naive baselines.
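The page does not spell out the trading rule behind this comparison. As one plausible reading, the sketch below assumes a long-or-flat heuristic: hold the asset for the next day when the model forecasts a higher close than today's, otherwise stay in cash. The function name `cumulative_returns` and the rule itself are assumptions, and transaction costs are ignored.

```python
def cumulative_returns(closes, forecasts):
    """Cumulative return paths for a forecast-driven rule and buy-and-hold.

    closes[t]    : actual closing price on day t
    forecasts[t] : forecast of the close on day t+1, made on day t

    Assumed rule: be long for day t+1 if forecasts[t] > closes[t], else flat.
    """
    strategy, buy_hold = [], []
    strategy_equity = 1.0
    buy_hold_equity = 1.0

    for t in range(len(closes) - 1):
        daily_return = closes[t + 1] / closes[t] - 1.0
        position = 1.0 if forecasts[t] > closes[t] else 0.0

        strategy_equity *= 1.0 + position * daily_return
        buy_hold_equity *= 1.0 + daily_return

        # Store growth relative to the starting capital of 1.0.
        strategy.append(strategy_equity - 1.0)
        buy_hold.append(buy_hold_equity - 1.0)
    return strategy, buy_hold
```

Comparing the two paths shows where the forecasts kept the rule out of losing days and where they caused it to miss gains.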

Actual closing prices versus ARIMA and LSTM predictions for AAPL and JNJ. The LSTM tracks the underlying trend and local swings closely, while ARIMA tends toward a smoothed baseline and the Transformer is more volatile.

Classroom R&D that led to a sentiment analysis system.

What started as a student research initiative has evolved into a working prototype for sentiment-aware forecasting.

An applied forecasting student project.