Portfolio Project: JavaScript Text-to-Speech App: Using the Web Speech API

The Web Speech API is a powerful tool that enables web developers to incorporate speech synthesis (text-to-speech) and speech recognition features into their applications.

In this tutorial, we will focus on the speech synthesis aspect, creating a simple JavaScript Text-to-Speech application using the Web Speech API.


Prerequisites

To follow this tutorial, you will need:

A basic understanding of HTML, CSS, and JavaScript
A modern web browser that supports the Web Speech API (e.g., Google Chrome, Microsoft Edge, or Mozilla Firefox)

Setting up the HTML structure

First, let's create a simple HTML structure for our text-to-speech application. Create a new HTML file called index.html and add the following code:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>JavaScript Text-to-Speech</title>
    <link rel="stylesheet" href="styles.css" />
  </head>
  <body>
    <div class="container">
      <h1>JavaScript Text-to-Speech</h1>
      <textarea id="text-input" placeholder="Type your text here"></textarea>
      <button id="speak-btn">Speak!</button>
    </div>
    <script src="app.js"></script>
  </body>
</html>

Add some basic CSS styles

Create a new CSS file named styles.css and add the following styles to make the UI more appealing:

* {
  box-sizing: border-box;
}

body {
  font-family: Arial, sans-serif;
  display: flex;
  justify-content: center;
  align-items: center;
  height: 100vh;
  margin: 0;
  background-color: #f3f3f3;
}

.container {
  width: 80%;
  max-width: 600px;
  background-color: #fff;
  padding: 30px;
  border-radius: 10px;
  border: 1px solid #eee;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
}

textarea {
  width: 100%;
  height: 150px;
  padding: 10px;
  font-size: 16px;
  border-radius: 5px;
  border: 1px solid #ccc;
  resize: none;
}

button {
  display: block;
  width: 100%;
  padding: 10px;
  font-size: 18px;
  background-color: #007bff;
  color: #fff;
  border: none;
  border-radius: 5px;
  cursor: pointer;
  margin-top: 20px;
}

button:hover {
  background-color: #0056b3;
}

Now you should have a nice-looking application with no functionality.

Next, let's dive into the heart of the application and add the magic with JavaScript. 👇

Implementing the Text-to-Speech functionality

Now, let's create the JavaScript functionality. Create a new JavaScript file named app.js and add the following code:

// Initialize when the page loads
window.addEventListener("DOMContentLoaded", () => {
  if ("speechSynthesis" in window) {
    document.getElementById("speak-btn").addEventListener("click", () => {
      // Get text from input
      const textInput = document.getElementById("text-input");
      const text = textInput.value.trim();
      
      // Show alert if no text is written
      if (!text) {
        alert("Please enter some text");
        return;
      }
      
      // This creates our speech request and the content we want to be spoken
      const utterance = new SpeechSynthesisUtterance(text);

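      // Optional: Customize voice properties
      // Choose a voice (the list can still be empty right after page load,
      // in which case null lets the browser use its default voice)
      utterance.voice = speechSynthesis.getVoices()[0] || null;
      utterance.rate = 1; // Set the speech rate (0.1 to 10)
      utterance.pitch = 1; // Set the speech pitch (0 to 2)
      utterance.volume = 1; // Set the speech volume (0 to 1)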

      // Speak the text
      speechSynthesis.speak(utterance);
    });
  } else {
    alert("Your browser does not support the Web Speech API");
  }
});

Let's explain a couple of pieces here in case the comments in the code don't reveal all.

The line const utterance = new SpeechSynthesisUtterance(text); creates our speech request along with the content we want to be spoken. The request can be customized, as the code shows, by setting properties such as pitch and rate.

We can then "speak" by calling speechSynthesis.speak() and passing it our utterance.

Everything else is either optional configuration or code that makes our app interactive.
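
The API only accepts settings within fixed ranges: rate from 0.1 to 10, pitch from 0 to 2, and volume from 0 to 1. If you plan to let users change these values, it can help to clamp them first. Here is a minimal sketch of that idea; clamp and applyVoiceSettings are hypothetical helper names, not part of the Web Speech API.

```javascript
// Hypothetical helper (not part of the Web Speech API):
// keeps a number inside the inclusive range [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copies rate, pitch, and volume onto an utterance,
// keeping each value inside the range the API accepts.
// Settings that are left out default to 1.
function applyVoiceSettings(utterance, { rate = 1, pitch = 1, volume = 1 } = {}) {
  utterance.rate = clamp(rate, 0.1, 10);
  utterance.pitch = clamp(pitch, 0, 2);
  utterance.volume = clamp(volume, 0, 1);
  return utterance;
}

// Possible usage inside the click handler (browser only):
//   const utterance = new SpeechSynthesisUtterance(text);
//   applyVoiceSettings(utterance, { rate: 1.2, pitch: 0.8 });
//   speechSynthesis.speak(utterance);
```

Keeping the helper separate from the click handler means the same limits apply no matter where the values come from.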

Play around with the index value in utterance.voice = speechSynthesis.getVoices()[0]; and listen to the variety of voices.

Run console.log(speechSynthesis.getVoices()) in the browser console to dive in and see which option is perfect for you. The first value is localized based on the user's location settings.
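
The voice list can be long, so narrowing it down by language makes it easier to explore. The sketch below shows one way to do that; voicesForLanguage is a hypothetical helper name, and it relies only on the name and lang properties that every voice object exposes.

```javascript
// Hypothetical helper: keeps only the voices whose language tag starts
// with the given prefix, so "en" matches both "en-US" and "en-GB".
function voicesForLanguage(voices, langPrefix) {
  const prefix = langPrefix.toLowerCase();
  return voices.filter((voice) => voice.lang.toLowerCase().startsWith(prefix));
}

// Possible usage in the browser console:
//   console.table(
//     voicesForLanguage(speechSynthesis.getVoices(), "en").map((voice) => ({
//       name: voice.name,
//       lang: voice.lang,
//     }))
//   );
```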

We will add the ability to choose a voice in the next optional section.

The best way to learn about the API is by playing with the configuration. You can find all the settings in the Mozilla docs.

Demo

Here's the working code up to now:

Optional: Add a Voice Selection

We will add a select input element to the HTML structure to allow users to choose a voice for the speech synthesis. Update the HTML file to include the new select element:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>JavaScript Text-to-Speech</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <div class="container">
        <h1>JavaScript Text-to-Speech</h1>
        <!-- New Select options -->
        <select id="voice-select">
        <!-- Voice options will be populated by JavaScript -->
        </select>
        <textarea id="text-input" placeholder="Type your text here"></textarea>
        <button id="speak-btn">Speak!</button>
    </div>
    <script src="app.js"></script>
</body>
</html>

And to keep it looking tasty, add the stylings for this input into your styles.css:

select {
  width: 100%;
  padding: 5px;
  font-size: 16px;
  border-radius: 5px;
  border: 1px solid #ccc;
  resize: none;
  margin-bottom: 10px;
}

Now let's update our JavaScript to include our voice selection. First, let's update the code so we can see the different voice options.

Add this to the top of our app (inside the DOMContentLoaded event listener, just after the "speechSynthesis" in window check).

// Get the voice-select element
const voiceSelect = document.getElementById("voice-select");

// Function to populate the voice options
function populateVoiceList() {
  const voices = speechSynthesis.getVoices();

  // Clear old options so repeated calls don't create duplicates
  voiceSelect.innerHTML = "";

  voices.forEach((voice) => {
    const option = document.createElement("option");
    option.textContent = `${voice.name} (${voice.lang})`;
    option.setAttribute("data-lang", voice.lang);
    option.setAttribute("data-name", voice.name);
    voiceSelect.appendChild(option);
  });
}

// Populate right away for browsers that already have the voices loaded
populateVoiceList();

// Repopulate when the voices become available or change
if (speechSynthesis.onvoiceschanged !== undefined) {
  speechSynthesis.onvoiceschanged = populateVoiceList;
}
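
Some browsers return an empty list from getVoices() until the voices have finished loading, which is why the code listens for the voiceschanged event. If you need the same "wait until voices exist" behavior elsewhere, it could be wrapped in a small helper. This is only a sketch: whenVoicesReady is a hypothetical name, and it takes the synthesis object as a parameter so it can be tried with a stand-in object.

```javascript
// Hypothetical helper: runs `callback` with the voice list as soon as the
// list is non-empty. `synth` is expected to look like window.speechSynthesis,
// with a getVoices() method and an onvoiceschanged handler slot.
function whenVoicesReady(synth, callback) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    callback(voices);
    return;
  }

  // Voices are not loaded yet: wait for the browser to announce them.
  synth.onvoiceschanged = () => {
    const loaded = synth.getVoices();
    if (loaded.length > 0) {
      synth.onvoiceschanged = null; // Stop listening after the first load
      callback(loaded);
    }
  };
}

// Possible usage (browser only):
//   whenVoicesReady(speechSynthesis, populateVoiceList);
```

Note that this helper overwrites any existing onvoiceschanged handler and stops listening after the first load, so it is an alternative to the assignment above, not something to use alongside it.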

Now we need to use the selected option. Inside the click handler, replace the line utterance.voice = speechSynthesis.getVoices()[0]; with the following:

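// Get the selected voice (the list is empty until the voices have loaded)
const selectedOption = voiceSelect.selectedOptions[0];
const selectedVoiceName = selectedOption
  ? selectedOption.getAttribute("data-name")
  : null;
const selectedVoice = speechSynthesis
  .getVoices()
  .find((voice) => voice.name === selectedVoiceName);
// Fall back to the browser's default voice if nothing matches
utterance.voice = selectedVoice || null;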
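
The lookup by name can come back empty, for example when the voices have not loaded yet. One way to handle that is to fall back to another voice explicitly. The sketch below assumes a hypothetical helper called pickVoice; it uses the default flag that voice objects carry to find the browser's preferred voice.

```javascript
// Hypothetical helper: returns the voice with the given name. If there is
// no match, it falls back to the voice flagged as default, then to the
// first voice in the list, and finally to null when the list is empty.
function pickVoice(voices, name) {
  return (
    voices.find((voice) => voice.name === name) ||
    voices.find((voice) => voice.default) ||
    voices[0] ||
    null
  );
}

// Possible usage inside the click handler (browser only):
//   utterance.voice = pickVoice(speechSynthesis.getVoices(), selectedVoiceName);
```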

Here's a demo of the full code:

What next?

Tutorials are fun, but don't stop after following along.

Change some styling and make it your own!

Try adding some <input type="range"> elements to configure the pitch and rate dynamically. It's a good way to make sure you understand how to work with the API.
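
If you want a starting point for the sliders, here is a rough sketch. It assumes two range inputs with the hypothetical ids rate-input and pitch-input, which are not part of the HTML in this tutorial, and the helper names readNumber and configureFromSliders are made up for illustration.

```javascript
// Hypothetical helper: reads a number from an input-like object and
// returns `fallback` when the input is missing or its value is not a number.
function readNumber(input, fallback) {
  const value = input ? Number.parseFloat(input.value) : NaN;
  return Number.isNaN(value) ? fallback : value;
}

// Hypothetical helper: copies the slider values onto an utterance.
// Assumes markup along these lines:
//   <input type="range" id="rate-input" min="0.5" max="2" step="0.1" value="1">
//   <input type="range" id="pitch-input" min="0" max="2" step="0.1" value="1">
function configureFromSliders(utterance, doc) {
  utterance.rate = readNumber(doc.getElementById("rate-input"), 1);
  utterance.pitch = readNumber(doc.getElementById("pitch-input"), 1);
  return utterance;
}

// Possible usage inside the click handler (browser only):
//   configureFromSliders(utterance, document);
//   speechSynthesis.speak(utterance);
```

Reading the sliders at click time keeps things simple: there is no extra state to track, and each new utterance picks up whatever the sliders show at that moment.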

If you customize or make your version of this app, leave your CodePen links below so I can see them. 🙌
