PairCoder++: Pair Programming as a Universal Paradigm for Verified Code-Driven Multimodal and Structured-Artifact Generation 1d 12h ago
The Grammar Does the Work: Functional vs. Lexical Dependency Length Minimization Across Universal Dependencies 1d 12h ago
TokenScope: Token-Level Explainability and Interpretability for Code-Oriented Tasks in Large Language Models 1d 12h ago
Breaking Safety at the Token Boundary: How BPE Tokenization Creates Exploitable Gaps in LLM Alignment 1d 12h ago
Prompt Framing Distorts Count-Based Evaluation of LLM Error Detection: Evidence from Numeric Anchoring 1d 12h ago
RusFinChain: A Russian Benchmark for Verifiable Chain-of-Thought Reasoning in Finance with Fuzzy-Aligned Evaluation 1d 12h ago
IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs 1d 12h ago
Grounded Optimization: A Layered Engineering Framework for Reducing LLM Hallucination in Automated Personal Document Rewriting 1d 12h ago
Can Language Models Actually Retrieve In-Context? Drowning in Documents at Million Token Scale 1d 12h ago
Beyond Skepticism: Evaluating LLMs' Pedagogical Intent Reasoning with the Adaptive Pedagogical Vigilance Framework 1d 12h ago
When Does Generating More Help? Disentangling Fixed-Source Synthesis from Source Expansion in Synthetic Data Scaling 1d 12h ago
Rethinking Speech-LLM Integration for ASR: Effective Joint Speech-Text Training by Interleaving 1d 12h ago
Safe Alone, Unsafe Together: Safeguarding Against Implicit Toxicity When Benign Images Combine 2d 12h ago
Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs 2d 12h ago
"Don't Say It!": Constraints, Compliance, and Communication when Language Models Play Taboo 2d 12h ago
Harnessing the Latent Space: From Steering Vectors to Model Calibrators for Control and Trust 2d 12h ago
Benchmarking Frontier LLMs on Arabic Cultural and Sociolinguistic Knowledge: A Cross-Evaluation Framework with Human SME Ground Truth 2d 12h ago
Identifying and Resolving Pitfalls of Knowledge-Based VQA Benchmarks: Auditing, Repairing, and Augmenting 2d 12h ago
Structural Pattern Mining in Inka Khipus: Unsupervised Clustering, Provenance Classification, and a Computational Validation of the Santa Valley Match 2d 12h ago
DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning 2d 12h ago
Beyond Perplexity: A Behavioral Evaluation Framework for Deployment-Memory Claims in LLM Test-Time Training 2d 12h ago