In-Context Learning for LLMs

Researching how large language models infer rules, manage ambiguity, and adapt to new tasks through context alone.

University

Smith College

Date

Spring 2026

This Smith College Data Science capstone project with 99P Labs explored how large language models learn and reason from context. The team studied in-context learning, or the ability of a model to adapt to a new task using only the examples, rules, and information provided in a prompt. Rather than retraining models, the project focused on how context itself shapes model behavior and where that behavior breaks down.

The research tested models across several task domains, each designed to expose a different form of reasoning failure. In turn-based prompting, the team used a Truth-or-Dare setup to study whether models could keep the order of play, follow the rules, and remember whose turn it was as the conversation went on. In rule-inference tasks, the team tested whether models could infer character-level transformations, identify palindrome rules, and recognize when the examples were too ambiguous to support a single answer.
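To make the ambiguity point concrete, here is a minimal sketch of how a set of examples can fit more than one rule. The candidate rules, the function names, and the sample strings are all assumptions chosen for illustration; they are not the project's actual task set or evaluation code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical candidate rules (illustrative only, not the project's rule set).
CANDIDATE_RULES: Dict[str, Callable[[str], str]] = {
    "identity": lambda s: s,
    "reverse": lambda s: s[::-1],
    "uppercase": lambda s: s.upper(),
    "drop_first_char": lambda s: s[1:],
}


def consistent_rules(examples: List[Tuple[str, str]]) -> List[str]:
    """Return the names of all candidate rules that reproduce every example."""
    return [
        name
        for name, rule in CANDIDATE_RULES.items()
        if all(rule(x) == y for x, y in examples)
    ]


def predict_or_abstain(examples: List[Tuple[str, str]], query: str) -> Optional[str]:
    """Answer only if every consistent rule gives the same output for the query.

    Returns None (abstain) when the surviving rules disagree or none fits.
    """
    outputs = {CANDIDATE_RULES[name](query) for name in consistent_rules(examples)}
    if len(outputs) == 1:
        return outputs.pop()
    return None


# A palindrome example cannot distinguish "identity" from "reverse".
ambiguous = [("level", "level")]
print(consistent_rules(ambiguous))           # both rules survive
print(predict_or_abstain(ambiguous, "abc"))  # rules disagree, so abstain
```

In this sketch, a palindrome used as the only example leaves two rules standing, so the right response to a non-palindrome query is to abstain. A model that defaults to the simplest plausible rule would instead pick one and answer.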

The project also examined relational reasoning through fictional social and family scenarios. These tests asked models to navigate messy relationships, conflicting rules, inverse retrieval, and multi-step dependencies. This part of the research reflected a broader real-world challenge: AI systems often need to reason across human relationships, roles, histories, and shifting context, but they may oversimplify situations when several constraints matter at once.
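As a rough illustration of what inverse retrieval and multi-step dependencies mean here, the sketch below stores a few parent-child facts and answers forward, inverse, and two-hop questions. The names and the family structure are invented for this example and do not come from the project's test scenarios.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical facts stated in one direction only: (parent, child).
PARENT_OF = [("Ana", "Ben"), ("Ana", "Cleo"), ("Ben", "Dev")]

children: Dict[str, Set[str]] = defaultdict(set)  # forward lookup
parents: Dict[str, Set[str]] = defaultdict(set)   # inverse lookup

for parent, child in PARENT_OF:
    children[parent].add(child)
    parents[child].add(parent)


def grandparents(person: str) -> Set[str]:
    """Two-hop query: parents of the person's parents."""
    return {gp for p in parents.get(person, set()) for gp in parents.get(p, set())}


def siblings(person: str) -> Set[str]:
    """Other children of any of the person's parents."""
    found = {c for p in parents.get(person, set()) for c in children.get(p, set())}
    return found - {person}


print(children["Ana"])      # forward: who are Ana's children?
print(parents["Dev"])       # inverse: who is Dev's parent?
print(grandparents("Dev"))  # multi-step: who is Dev's grandparent?
```

The facts are written only as "X is a parent of Y", yet the inverse and two-hop questions require reading them in the other direction or chaining them. A program does this mechanically; a model working from context alone has to do the same bookkeeping implicitly.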

Across the project, the team found that models often defaulted to simple rules that seemed plausible from the available context. That shortcut worked when a task was clear-cut, but it failed when a task was ambiguous, complex, or required tracking several rules at once. The models often answered with confidence even when the correct response was to abstain or to say that the answer could not be determined. Larger models performed better overall, but scale did not eliminate these failure patterns.

The broader takeaway is that AI reliability depends not only on model capability, but also on how information is structured and presented. The project points toward “teaching patches,” or lightweight context-level interventions that guide models toward better reasoning without retraining them. These could include clearer state tracking, better organization of rules and relationships, explicit uncertainty checks, or prompt structures that require models to compare possible explanations before choosing an answer.
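One possible shape for such a patch, sketched for a turn-based game, is a prompt builder that states the game state explicitly, numbers the rules, and asks for an uncertainty check before the answer. The function name, the section labels, and the wording of the instructions are assumptions made for this sketch; the team's actual interventions are not described in detail here.

```python
from typing import List


def build_patched_prompt(
    players: List[str], turns_taken: int, rules: List[str], question: str
) -> str:
    """Assemble a prompt with explicit state, numbered rules, and an uncertainty check.

    Hypothetical helper: the section names and phrasing are illustrative only.
    """
    if not players:
        raise ValueError("players must not be empty")

    # Derive turn state from the turn count so the model does not have to.
    current = players[turns_taken % len(players)]
    next_up = players[(turns_taken + 1) % len(players)]

    lines = [
        "STATE",
        f"Turn order: {', '.join(players)}",
        f"Turns completed: {turns_taken}",
        f"Current turn: {current}",
        f"Next turn: {next_up}",
        "",
        "RULES",
    ]
    lines.extend(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    lines.extend(
        [
            "",
            "BEFORE ANSWERING",
            "List every explanation that fits the information above.",
            "If more than one fits, reply: cannot be determined.",
            "",
            f"QUESTION: {question}",
        ]
    )
    return "\n".join(lines)


prompt = build_patched_prompt(
    players=["Ava", "Bo", "Cy"],
    turns_taken=4,
    rules=["Players alternate in the listed order.", "A player may not skip a turn."],
    question="Whose turn is it?",
)
print(prompt)
```

The design choice is to move bookkeeping out of the model's implicit memory and into the prompt itself, so that state tracking and the option to abstain are both visible in the context rather than left for the model to infer.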
