Published: 2025-07-27


Journal of Data Science and Information Technology

ISSN 2998-3592

Structured Language Interpretation Using Small Language Models for Real-Time Systems

Authors

  • Karthik Perikala, Senior Principal Software Engineer, The Home Depot, United States

Keywords

Structured Language Interpretation, Small Language Models, Real-Time NLP Systems, Production AI, Low-Latency Inference

Abstract

Structured language interpretation, the transformation of short natural language inputs into machine-readable representations, is a foundational capability for modern AI-driven systems. Typical tasks include entity extraction, attribute identification, normalization, and schema-constrained output generation, all of which enable deterministic downstream processing.
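As a minimal sketch of what schema-constrained output can look like in practice, the Python snippet below validates and normalizes a JSON string of the kind a model might emit. The schema (an `entity` string plus an `attributes` mapping), the `Interpretation` type, and the `parse_interpretation` helper are hypothetical names chosen for illustration; the paper does not specify a schema or an implementation.

```python
import json
from dataclasses import dataclass

# Hypothetical schema for illustration only: exactly these two top-level fields.
ALLOWED_FIELDS = {"entity", "attributes"}


@dataclass(frozen=True)
class Interpretation:
    entity: str
    attributes: dict


def parse_interpretation(raw: str) -> Interpretation:
    """Parse a model's JSON output and reject anything outside the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    if set(data) != ALLOWED_FIELDS:
        raise ValueError("output fields do not match the schema")

    entity = data["entity"]
    attributes = data["attributes"]
    if not isinstance(entity, str) or not entity.strip():
        raise ValueError("entity must be a non-empty string")
    if not isinstance(attributes, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in attributes.items()
    ):
        raise ValueError("attributes must map strings to strings")

    # Normalization step: trim whitespace and lowercase keys and values.
    return Interpretation(
        entity=entity.strip().lower(),
        attributes={k.strip().lower(): v.strip().lower() for k, v in attributes.items()},
    )
```

Because every accepted output has the same shape, downstream code can consume it without further defensive checks, which is the sense in which structured output supports deterministic processing.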

Large Language Models (LLMs) have demonstrated strong performance on structured language tasks, benefiting from scale and broad contextual reasoning. However, these capabilities come with increased inference latency, token-dependent execution time, and variable operational cost when deployed at scale.

In latency-sensitive production environments, interpretation components are often required to operate within strict millisecond-level latency budgets. Even moderate tail-latency inflation can violate end-to-end service objectives and degrade system responsiveness. As a result, LLM-based approaches are frequently unsuitable for request paths that demand predictable millisecond-scale execution.
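To illustrate why tail latency, rather than average latency, determines whether a budget is met, the following Python sketch checks a set of latency samples against a budget at a chosen percentile. The function names, the nearest-rank percentile method, and the sample values are assumptions made for this example and are not taken from the paper's evaluation.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def within_budget(samples_ms, budget_ms, p=99):
    """Return True if the p-th percentile latency does not exceed the budget."""
    return percentile(samples_ms, p) <= budget_ms


# Made-up samples: most requests are fast, a few are slow.
samples_ms = [10] * 97 + [40, 250, 300]
mean_ms = sum(samples_ms) / len(samples_ms)
```

With these made-up samples the mean is well under a 100 ms budget, yet `within_budget(samples_ms, 100)` returns `False` because the 99th percentile is 250 ms: a small fraction of slow requests is enough to break the budget.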

This paper examines the use of Small Language Models (SLMs) for real-time structured language interpretation. By constraining model capacity, task scope, and output structure, SLMs enable bounded execution behavior with latency measured in tens to low hundreds of milliseconds, while preserving semantic accuracy for well-defined language tasks.

We evaluate this approach under sustained production-like workloads using normalized latency and throughput metrics. Results demonstrate that SLM-based structured language interpretation can consistently operate within millisecond-level latency envelopes, making it practical for high-throughput, real-time systems.




How to Cite

Perikala, K. (2025). Structured Language Interpretation Using Small Language Models for Real-Time Systems. Journal of Data Science and Information Technology, 2(2), 1-6. https://doi.org/10.55124/jdit.v2i2.272