You are given a dataset of customer reviews for a product. Each review is a plain-text string. Your task is to build a PyTorch model that, given a review, extracts the exact span of text that best expresses the customer’s overall sentiment (the “sentiment span”).
Model architecture:
Load a pre-trained BERT-base-uncased encoder from Hugging Face.
Feed the review text through BERT to obtain contextual token embeddings.
Add two independent linear classification heads on top of BERT:
Start-head: outputs one logit per token; a softmax over the sequence converts these into the probability that each token is the first token of the sentiment span.
End-head: outputs one logit per token; a softmax over the sequence converts these into the probability that each token is the last token of the sentiment span.
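The architecture above could be sketched roughly as follows, assuming the Hugging Face `transformers` package. The class name `SentimentSpanModel` and the optional `encoder` argument (which lets a caller inject a pre-built encoder instead of downloading pre-trained weights) are our own choices, not part of the spec.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class SentimentSpanModel(nn.Module):
    """BERT encoder with two independent per-token linear heads."""

    def __init__(self, model_name: str = "bert-base-uncased", encoder=None):
        super().__init__()
        # Allow injecting a pre-built encoder (e.g. for testing);
        # otherwise load the pre-trained weights from the Hub.
        self.bert = encoder if encoder is not None else BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size   # 768 for BERT-base
        self.start_head = nn.Linear(hidden, 1)  # start-of-span logit per token
        self.end_head = nn.Linear(hidden, 1)    # end-of-span logit per token

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        seq = out.last_hidden_state                # (batch, seq_len, hidden)
        start_logits = self.start_head(seq).squeeze(-1)  # (batch, seq_len)
        end_logits = self.end_head(seq).squeeze(-1)      # (batch, seq_len)
        return start_logits, end_logits
```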
During inference, select the (start, end) position pair with the highest combined softmax probability, subject to the constraint start ≤ end.
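The constrained selection step can be sketched as below: score every (start, end) pair by its joint log-probability, mask the pairs where end < start, and take the argmax. The function name `best_span` is our own.

```python
import torch

def best_span(start_logits: torch.Tensor, end_logits: torch.Tensor):
    """start_logits, end_logits: 1-D tensors over the tokens of one review.
    Returns the (start, end) pair maximizing the joint probability, start <= end."""
    start_logp = torch.log_softmax(start_logits, dim=-1)
    end_logp = torch.log_softmax(end_logits, dim=-1)
    # scores[i, j] = log P(start = i) + log P(end = j)
    scores = start_logp.unsqueeze(1) + end_logp.unsqueeze(0)
    # Invalidate pairs with end < start (strictly below the diagonal).
    invalid = torch.tril(torch.ones_like(scores, dtype=torch.bool), diagonal=-1)
    scores = scores.masked_fill(invalid, float("-inf"))
    flat = scores.argmax().item()
    n = scores.size(0)
    return flat // n, flat % n
```

Note that the argmax of the joint score is unaffected by the log-softmax normalization constants, so raw logits would give the same span.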
Training:
The loss is the average of the cross-entropy losses for the start and end positions, each computed against the gold token index.
Fine-tune all BERT parameters together with the two new heads.
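The training objective above amounts to a few lines; a minimal sketch (the name `span_loss` is our own):

```python
import torch
import torch.nn.functional as F

def span_loss(start_logits, end_logits, start_positions, end_positions):
    """Logits: (batch, seq_len). Positions: (batch,) gold token indices.
    Returns the average of the start and end cross-entropy losses."""
    start_loss = F.cross_entropy(start_logits, start_positions)
    end_loss = F.cross_entropy(end_logits, end_positions)
    return (start_loss + end_loss) / 2
```

Because all BERT parameters are fine-tuned along with the heads, a single optimizer over `model.parameters()` (e.g. AdamW with a small learning rate such as 2e-5, a common choice for BERT fine-tuning) suffices.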
Deliverables:
Complete PyTorch implementation (model, training loop, and inference function).
The script must accept a list of raw review strings and return the extracted sentiment span for each review.
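A sketch of the requested raw-string interface follows. It assumes a fast tokenizer (e.g. `BertTokenizerFast`) so that `offset_mapping` can map predicted token positions back to character offsets in the original text; for brevity it uses a greedy variant of span selection (best start, then best end at or after it) rather than the full joint argmax, and it does not mask special tokens. The name `extract_spans` and its parameters are our own.

```python
import torch

def extract_spans(reviews, tokenizer, model, max_length=128):
    """reviews: list of raw strings. model: callable mapping
    (input_ids, attention_mask) -> (start_logits, end_logits),
    each of shape (batch, seq_len). Returns one substring per review."""
    enc = tokenizer(reviews, return_tensors="pt", padding=True,
                    truncation=True, max_length=max_length,
                    return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")  # (batch, seq_len, 2) char offsets
    with torch.no_grad():
        start_logits, end_logits = model(enc["input_ids"], enc["attention_mask"])
    spans = []
    for i, text in enumerate(reviews):
        # Greedy: best start token, then best end token at or after it.
        start = start_logits[i].argmax().item()
        end = start + end_logits[i, start:].argmax().item()
        char_start = offsets[i, start, 0].item()
        char_end = offsets[i, end, 1].item()
        spans.append(text[char_start:char_end])
    return spans
```

Keeping the model as a plain callable here decouples the interface from any particular encoder, so the same function works for the fine-tuned BERT model or a stub during testing.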