Using the Web Speech API

I had the chance to work on website accessibility in 2025, and one feature I had a lot of fun learning about was SpeechSynthesisUtterance. It is part of the Web Speech API: it takes in a string, and the browser outputs that string as audio. Most modern browsers have this API built in, which makes it accessible to a wide variety of web developers. The browsers I worked and tested in were Edge, Chrome, and Firefox. Of the three, Chrome and Edge offer more voice options for output.
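Since "most" browsers is not "all" browsers, a page may want to check for support before wiring anything up. Here is a minimal sketch of that check. The helper name `supportsSpeech` is my own invention for illustration and is not part of the API.

```javascript
// Hypothetical helper: report whether a global scope exposes the two
// speech synthesis entry points used in this post.
function supportsSpeech(scope = globalThis) {
  return 'speechSynthesis' in scope && 'SpeechSynthesisUtterance' in scope;
}

if (supportsSpeech()) {
  console.log('Speech synthesis is available.');
} else {
  console.log('Speech synthesis is not available here.');
}
```

Passing the scope in as a parameter is only there so the check can be tried against a stand-in object. In a page you would call it with no arguments.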

My Intro to the SpeechSynthesisUtterance API

When I was younger, I remember using the text-to-speech options and programs in earlier versions of Windows for things like video commentary and just having fun with friends. This part of the modern Web Speech API is very customizable and fun to use. The properties available for customization include:

  • Pitch
  • Rate
  • Voice
  • Volume
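
As a rough sketch of how those four properties might be set, the helpers below keep each value inside the range the Web Speech API specification defines: pitch from 0 to 2, rate from 0.1 to 10, and volume from 0 to 1. The names `clamp`, `pickVoice`, and `configureUtterance` are hypothetical ones I made up for illustration; they are not part of the API.

```javascript
// Hypothetical helper: keep a number inside [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: return the first voice whose language tag starts
// with the given prefix (for example "en"), or null if none matches.
function pickVoice(voices, langPrefix) {
  return voices.find((v) => v.lang.startsWith(langPrefix)) || null;
}

// Hypothetical helper: apply settings to an utterance, or to any object
// that uses the same property names.
function configureUtterance(utterance, { pitch = 1, rate = 1, volume = 1, voice = null } = {}) {
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.volume = clamp(volume, 0, 1);
  if (voice) {
    utterance.voice = voice;
  }
  return utterance;
}

// In a browser, usage might look like this:
//   const utterance = new SpeechSynthesisUtterance('Hello world!');
//   const voice = pickVoice(window.speechSynthesis.getVoices(), 'en');
//   configureUtterance(utterance, { pitch: 1.2, rate: 0.9, volume: 0.8, voice });
//   window.speechSynthesis.speak(utterance);
```

One thing to watch for: in some browsers `getVoices()` can return an empty list until the `voiceschanged` event has fired, which is why `pickVoice` falls back to `null` and the browser's default voice is used.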

These utterance properties let us deliver voices with different characteristics for different pieces of content. Developers can also listen for events to better handle interactivity in an application that uses SpeechSynthesisUtterance. Some of the most useful events include:

  • Start
  • End
  • Pause
  • Resume
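
To give an idea of how those four events could drive some interactivity, here is a small sketch that turns them into a simple state ("speaking", "paused", or "idle") that the rest of a page could react to. The helper name `trackSpeechState` and the state labels are my own assumptions, not something the API provides.

```javascript
// Hypothetical helper: call onChange with "speaking", "paused", or "idle"
// whenever the utterance fires one of the four events listed above.
function trackSpeechState(utterance, onChange) {
  const stateFor = {
    start: 'speaking',
    resume: 'speaking',
    pause: 'paused',
    end: 'idle',
  };
  for (const [eventName, state] of Object.entries(stateFor)) {
    utterance.addEventListener(eventName, () => onChange(state));
  }
}

// In a browser, usage might look like this:
//   const utterance = new SpeechSynthesisUtterance('Hello world!');
//   trackSpeechState(utterance, (state) => {
//     document.body.dataset.speech = state; // e.g. style a button with CSS
//   });
//   window.speechSynthesis.speak(utterance);
```

Keeping the event-to-state mapping in one object makes it easy to see at a glance which events mean speech is actually playing.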

Using SpeechSynthesisUtterance On Page Content

I wanted to show SpeechSynthesisUtterance performing a simple task so that developers can get an idea of how to incorporate it into their own work. In my example, double-clicking a paragraph makes the browser speak that paragraph's text through the Web Speech API. If you double-click a paragraph while speech is already playing, the browser switches to the newly selected content. Hitting the spacebar ends the speech entirely. To get that behavior, I had to suppress the spacebar's default action (page scrolling).

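<script>
    // Entry point to the browser's speech synthesis controller.
    const synth = window.speechSynthesis;

    // Paragraphs inside .entry-content, plus every h1 and h2 on the page.
    const speakables = document.querySelectorAll('.entry-content p, h1, h2');
    speakables.forEach(s => {
        s.addEventListener('dblclick', function () {
            // `this` is the element that was double-clicked.
            const t = this.textContent;
            try {
                speak(t);
            } catch (error) {
                // Catches synchronous failures, such as a browser without speech support.
                console.log('SpeechSynthesisUtterance Error - ' + error);
            }
        });
    });

    function speak(text = 'Hello world!') {
        // Stop anything already playing so the new text starts right away.
        if (synth.speaking) {
            synth.cancel();
        }
        const utterance = new SpeechSynthesisUtterance(text);
        synth.speak(utterance);
    }

    document.addEventListener('keydown', function (event) {
        // Only take over the spacebar while speech is playing,
        // so it keeps its normal behavior the rest of the time.
        if (event.code === 'Space' && synth.speaking) {
            event.preventDefault(); // block the default page scroll
            synth.cancel();
        }
    });
</script>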

Here is a link to the actual code in action. Hopefully you will find this useful!