
arxiv-cl

Latest items from arxiv-cl on ClawDigest — updated automatically as new stories are ingested.

PairCoder++: Pair Programming as a Universal Paradigm for Verified Code-Driven Multimodal and Structured-Artifact Generation

1d 12h ago

The Grammar Does the Work: Functional vs. Lexical Dependency Length Minimization Across Universal Dependencies

1d 12h ago

TUDUM: A Turkish-Thinking Reasoning Pipeline for Qwen3.5-27B

1d 12h ago

TokenScope: Token-Level Explainability and Interpretability for Code-Oriented Tasks in Large Language Models

1d 12h ago

Safeguarding LLM Agents from Misalignment through Provenance Analysis

1d 12h ago

Kara: Efficient Reasoning LLM Serving via Sliding-Window KV Cache Compression

1d 12h ago

Breaking Safety at the Token Boundary: How BPE Tokenization Creates Exploitable Gaps in LLM Alignment

1d 12h ago

Prompt Framing Distorts Count-Based Evaluation of LLM Error Detection: Evidence from Numeric Anchoring

1d 12h ago

Mapping Text to Multiplex Graph: Prompt Compression as Lévy Walk-Guided Graph Pruning

1d 12h ago

Office Comprehension Benchmark

1d 12h ago

RuleChef: Grounding LLM Task Knowledge in Human-Editable Rules

1d 12h ago

TurnNat: Automatic Evaluation of Turn-Taking Naturalness in Dyadic Spoken Dialogue

1d 12h ago

RusFinChain: A Russian Benchmark for Verifiable Chain-of-Thought Reasoning in Finance with Fuzzy-Aligned Evaluation

1d 12h ago

Multi-Objective Exploration and Preference Optimization via Mutual Information

1d 12h ago

MultAttnAttrib: Training-Free Multimodal Attribution in Long Document Question Answering

1d 12h ago

IsoSci: A Benchmark of Isomorphic Cross-Domain Science Problems for Evaluating Reasoning versus Knowledge Retrieval in LLMs

1d 12h ago

FaithMed: Training LLMs For Faithful Evidence-Based Medical Reasoning

1d 12h ago

Grounded Optimization: A Layered Engineering Framework for Reducing LLM Hallucination in Automated Personal Document Rewriting

1d 12h ago

Comparing Architectures for Supervised Political Scaling

1d 12h ago

From Monolingual to Multilingual: Evaluating Mamba for ASR in South African Languages

1d 12h ago

Parameter Golf: What Really Works?

1d 12h ago

Can Language Models Actually Retrieve In-Context? Drowning in Documents at Million Token Scale

1d 12h ago

Beyond Skepticism: Evaluating LLMs Pedagogical Intent Reasoning with the Adaptive Pedagogical Vigilance Framework

1d 12h ago

When Does Generating More Help? Disentangling Fixed-Source Synthesis from Source Expansion in Synthetic Data Scaling

1d 12h ago

Rethinking Speech-LLM Integration for ASR: Effective Joint Speech-Text Training by Interleaving

1d 12h ago

PARTREP: Learning What to Repeat for Decoder-only LLMs

1d 12h ago

On the Limits of Steering Vectors for Preference-Aligned Generation

1d 12h ago

A Task-State Representation for Long-Horizon Mobile GUI Agents

2d 12h ago

Dual-Confidence Contrastive Decoding for Retrieval-Augmented Generation

2d 12h ago

Safe Alone, Unsafe Together: Safeguarding Against Implicit Toxicity When Benign Images Combine

2d 12h ago

Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs

2d 12h ago

Multi-Turn Agentic Scientific Literature Search via Workflow Induction

2d 12h ago

"Don't Say It!": Constraints, Compliance, and Communication when Language Models Play Taboo

2d 12h ago

Auditing Forgetting in Limited Memory Language Models

2d 12h ago

Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications

2d 12h ago

Persona Without Substrate: Regime-Dependence and the LLM Individuation Problem

2d 12h ago

Controllable Narrative Rendering for Enhanced Assisted Writing

2d 12h ago

Harnessing the Latent Space: From Steering Vectors to Model Calibrators for Control and Trust

2d 12h ago

Benchmarking Frontier LLMs on Arabic Cultural and Sociolinguistic Knowledge: A Cross-Evaluation Framework with Human SME Ground Truth

2d 12h ago

Hate Speech Detection in Turkish and Arabic Languages: A Comprehensive Study

2d 12h ago

Readable but Not Controllable: Neuron-Level Evidence for Medical LLM Hallucination

2d 12h ago

Identifying and Resolving Pitfalls of Knowledge-Based VQA Benchmarks: Auditing, Repairing, and Augmenting

2d 12h ago

ALEE: Any-Language Evaluation of Embeddings via English-Centric Minimal Pairs

2d 12h ago

Structural Pattern Mining in Inka Khipus: Unsupervised Clustering, Provenance Classification, and a Computational Validation of the Santa Valley Match

2d 12h ago

SLIM-RL: Risk-Budgeted Random-Masking RL for Diffusion LLMs Without Trajectory Slicing

2d 12h ago

LV-ROVER: Multi-Stream Tesseract Voting for Maltese Paragraph OCR

2d 12h ago

SEFORA: Student Essays with Feedback Corpus and LLM Feedback Evaluation Framework

2d 12h ago

DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning

2d 12h ago

Beyond Perplexity: A Behavioral Evaluation Framework for Deployment-Memory Claims in LLM Test-Time Training

2d 12h ago

A Mechanistic View of Authority Hierarchy in LLM Sycophancy

2d 12h ago
