Tuesday, 02 January 2024 12:17 GMT

Shunya Labs Releases SAML, The World's First Open Standard For Stuttering Annotation


(MENAFN- ForPressRelease) Gurugram, India, 26 May 2026: Shunya Labs has announced the release of SAML (Stuttering Annotation Markup Language), the world's first open standard for annotating stuttering in speech data. Developed as a free and open-source XML specification, SAML addresses one of the most persistent challenges in stuttering research: the absence of a unified annotation framework across datasets, laboratories, and clinical institutions.

Stuttering affects more than 80 million people globally, roughly 1% of the world's population, yet research in the field has remained fragmented for years. People who stutter are 5 times more likely to face employment discrimination and 80% of cases begin before the age of five, underscoring the need for earlier, more accurate intervention. Different institutions and research groups often rely on their own annotation systems and documentation methods, making it difficult to compare datasets, reproduce findings, or scale collaborative work across organisations.

SAML introduces a common, interoperable annotation framework designed specifically for stuttering research. For the first time, researchers, clinicians, and developers have access to a shared standard that can be used consistently across datasets, tools, and workflows. The specification extends the W3C Speech Synthesis Markup Language (SSML) standard and includes full schema validation, ensuring interoperability and conformance across tools.

The specification also supports identifying the main types of dysfluency, including repetitions, prolongations, and blocks, along with their subtypes, and features a duration-based clinical severity scale from 0 to 8 that provides consistent, objective measurements. Multi-speaker support enables annotation of conversation transcripts with speaker IDs, making SAML suitable for group and dialogue research as well as individual clinical assessment.
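To make the idea of a typed, speaker-aware annotation concrete, here is a minimal Python sketch of how one dysfluency record might be modelled and serialised to XML with the standard library. It is an illustration under stated assumptions, not the SAML schema: the element names (`transcript`, `repetition`, `block`, and so on), the attributes (`speaker`, `subtype`, `severity`), and the `Dysfluency`/`to_xml` names are all hypothetical. The severity score is taken as an input in the 0 to 8 range; the duration-based rule that SAML uses to assign it is defined in the specification and is not reproduced here.

```python
# Hypothetical sketch: element, attribute, and function names are
# illustrative assumptions, NOT taken from the published SAML schema.
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# The three main dysfluency types named in the announcement.
DYSFLUENCY_TYPES = {"repetition", "prolongation", "block"}


@dataclass
class Dysfluency:
    speaker: str   # speaker ID, for multi-speaker conversation transcripts
    kind: str      # one of DYSFLUENCY_TYPES
    subtype: str   # illustrative subtype label, e.g. "part-word"
    text: str      # the annotated stretch of speech
    severity: int  # clinical severity score on a 0-8 scale (supplied, not computed)

    def __post_init__(self):
        # Reject values outside the assumed vocabulary and score range.
        if self.kind not in DYSFLUENCY_TYPES:
            raise ValueError(f"unknown dysfluency type: {self.kind}")
        if not 0 <= self.severity <= 8:
            raise ValueError("severity must be in the range 0-8")


def to_xml(items):
    """Serialise a list of Dysfluency records under a hypothetical root element."""
    root = ET.Element("transcript")
    for d in items:
        el = ET.SubElement(root, d.kind, {
            "speaker": d.speaker,
            "subtype": d.subtype,
            "severity": str(d.severity),
        })
        el.text = d.text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml([Dysfluency("S1", "repetition", "part-word", "b-b-ball", 2)]))
```

Validating the type and score when a record is created, rather than at serialisation time, mirrors the role that schema validation plays in the real specification: malformed annotations are rejected as early as possible.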

To lower the barrier to adoption, SAML also introduces a shorthand notation system, a human-friendly annotation format that converts seamlessly to validated SAML XML via a CLI tool and Python library. Annotators can begin working in plain shorthand without writing XML directly, reducing the technical overhead of adoption. Compatibility with widely used speech analysis platforms such as Audacity and Praat makes it easier to integrate SAML within existing research and clinical environments.

Sourav Bandyopadhyay, Co-Founder and Chief Scientist, Shunya Labs, said, "Despite decades of work in speech research, stuttering annotation has remained highly inconsistent across institutions and datasets. SAML was created to provide a shared framework that researchers, clinicians, and developers can adopt across studies and tools. Standardisation is critical if speech research is expected to become interoperable, reproducible, and collaborative at scale."

To support broader adoption, Shunya Labs has made the complete specification, annotator guidelines, training resources, schema documentation, a validation suite of 28 test cases, and conversion tools freely available to the research community. The company has also committed to keeping all resources free of cost, reflecting its position that access to quality research infrastructure should not depend on institutional budgets.

The release addresses a longstanding gap in speech research infrastructure, where the absence of shared annotation standards has historically limited interoperability, reproducibility, and large-scale collaboration. By introducing a common framework for stuttering annotation, SAML simplifies dataset exchange, reduces inconsistencies in labelling practices, and improves compatibility across research environments and speech technologies.


About Shunya Labs

Shunya Labs is an advanced voice AI platform delivering real-time, on-premise solutions for speech, language, and reasoning. The company develops high-accuracy speech recognition models optimised for Indian and multilingual environments, enabling secure, privacy-first deployments across cloud, edge, and air-gapped systems for enterprises and public institutions.




ForPressRelease

Legal Disclaimer:
MENAFN provides the information “as is” without warranty of any kind. We do not accept any responsibility or liability for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this article. If you have any complaints or copyright issues related to this article, kindly contact the provider above.
