Building a Talk ChatGPT App with React Native Expo, NestJS, Google Text-to-Speech, OpenAI and TS
Beto, August 28, 2023 · 22,296 views
Learn how to create a voice-enabled ChatGPT app using React Native with Expo for the frontend and NestJS for the backend. You'll learn how to record voice, convert speech to text locally, send the text to a NestJS server, and use the OpenAI and Google Text-to-Speech APIs to generate and play audio responses.
If you want to build a conversational app that talks with ChatGPT and plays back responses as audio, this tutorial is for you. You'll learn the architecture, key dependencies, and practical coding steps to get a working prototype.
What's inside
- Introduction to the voice ChatGPT app concept and demo
- Overview of the app architecture with React Native frontend and NestJS backend
- Using local speech-to-text on the client to avoid API costs
- Sending text to NestJS server to handle OpenAI ChatGPT conversation
- Using Google Text-to-Speech API on the server to generate MP3 audio
- Sending the MP3 audio back to the client and playing it
- Managing conversation state simply with one active chat
- Creating the Expo React Native project with TypeScript
Introduction to the voice ChatGPT app concept and demo
I open with a demo of the app where pressing a red button starts recording your voice. The app converts your speech to text, sends it to a backend, and receives a ChatGPT response. The response is converted to speech using Google Text-to-Speech and played back as audio on the device.
This shows a full voice conversation loop with ChatGPT, demonstrating how you can talk to the AI and hear its answers. The demo highlights the core user experience and sets the stage for building the app.
Overview of the app architecture with React Native frontend and NestJS backend
The app consists of a React Native client built with Expo that handles voice recording and speech-to-text locally. The backend is a NestJS server that communicates with OpenAI and Google APIs.
The architecture separates concerns: the client handles voice input and playback, while the server manages API keys, conversation logic, and text-to-speech conversion. This keeps sensitive keys secure and off the client.
Using local speech-to-text on the client to avoid API costs
Instead of sending raw audio to a cloud speech-to-text API (which can be costly), the app uses a React Native dependency to convert speech to text locally on the device. This approach leverages modern device capabilities and reduces expenses.
The client captures voice, transcribes it to text on the device, then sends only the text to the backend. This keeps requests small and avoids per-request transcription fees.
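To make the "send only the text" step concrete, here is a minimal sketch of how the client could prepare that request once a transcript is ready. The `/talk` path, the `{ text }` body shape, and the `buildTalkRequest` name are assumptions for illustration, not the tutorial's actual API.

```typescript
// Sketch only: the "/talk" path and the { text } body shape are
// assumptions for illustration, not the tutorial's actual API.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

/**
 * Turn a finished transcript into a JSON POST description.
 * Returns null when the transcript is empty, so silence never
 * triggers a network call.
 */
function buildTalkRequest(
  baseUrl: string,
  transcript: string,
): PreparedRequest | null {
  const text = transcript.trim();
  if (text.length === 0) return null;

  return {
    // Strip trailing slashes so "http://host/" and "http://host" behave the same.
    url: `${baseUrl.replace(/\/+$/, "")}/talk`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  };
}
```

The result can be handed to `fetch` as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping the builder separate from the network call makes the payload easy to inspect while debugging.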
Sending text to NestJS server to handle OpenAI ChatGPT conversation
Once the client has the text, it sends it to the NestJS backend via an API call. The server uses OpenAI’s ChatGPT API to process the message and generate a response.
The server keeps track of the conversation history to maintain context, though in this demo it handles only one conversation at a time for simplicity. This backend logic abstracts the OpenAI API usage away from the client.
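A single in-memory conversation can be sketched roughly as follows. The `role`/`content` message shape is the one OpenAI's chat completions endpoint accepts in its `messages` array; the `SingleConversation` class and the system prompt are hypothetical and not taken from the tutorial.

```typescript
type ChatRole = "system" | "user" | "assistant";

interface ChatMessage {
  role: ChatRole;
  content: string;
}

/**
 * Holds the one active conversation in memory (hypothetical helper).
 * Nothing is persisted, so a server restart clears the history.
 */
class SingleConversation {
  private readonly systemPrompt: string;
  private history: ChatMessage[] = [];

  constructor(systemPrompt: string) {
    this.systemPrompt = systemPrompt;
  }

  /** Record what the user just said. */
  addUserMessage(text: string): void {
    this.history.push({ role: "user", content: text });
  }

  /** Record the model's reply so the next request keeps its context. */
  addAssistantMessage(text: string): void {
    this.history.push({ role: "assistant", content: text });
  }

  /** Message list for the next request: system prompt first, then every turn in order. */
  toMessages(): ChatMessage[] {
    return [{ role: "system", content: this.systemPrompt }, ...this.history];
  }

  /** Start over with an empty history. */
  reset(): void {
    this.history = [];
  }
}
```

In a NestJS app an object like this would typically live inside a provider, so every request handled by the server sees the same history. That matches the one-conversation-at-a-time simplification, and it is also why the approach does not extend to multiple users without further work.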
Using Google Text-to-Speech API on the server to generate MP3 audio
After receiving the ChatGPT text response, the server calls Google Cloud’s Text-to-Speech API to convert the text into an MP3 audio file. This audio file represents the spoken reply from ChatGPT.
The server then sends this MP3 file back to the client for playback. Google's API is used here for its natural-sounding synthesized voices.
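For reference, here is a hedged sketch of the payload involved, based on the REST form of Google Cloud Text-to-Speech (`text:synthesize`). Whether the server in the video calls the REST endpoint or a client library is not stated, so treat the helper names and the `en-US` default as assumptions.

```typescript
// REST endpoint for Google Cloud Text-to-Speech synthesis.
const SYNTHESIZE_URL =
  "https://texttospeech.googleapis.com/v1/text:synthesize";

interface SynthesizeRequest {
  input: { text: string };
  voice: { languageCode: string };
  audioConfig: { audioEncoding: "MP3" };
}

/** Build the JSON body that asks for an MP3 rendering of `text`. */
function buildSynthesizeRequest(
  text: string,
  languageCode: string = "en-US", // assumed default, not from the tutorial
): SynthesizeRequest {
  return {
    input: { text },
    voice: { languageCode },
    audioConfig: { audioEncoding: "MP3" },
  };
}

/**
 * Pull the base64-encoded audio out of a parsed response body.
 * The API returns it in the `audioContent` field.
 */
function extractAudioBase64(response: unknown): string {
  if (typeof response === "object" && response !== null) {
    const audio = (response as Record<string, unknown>).audioContent;
    if (typeof audio === "string" && audio.length > 0) return audio;
  }
  throw new Error("Text-to-Speech response did not contain audioContent");
}
```

The body would be sent as an authenticated POST to `SYNTHESIZE_URL`. Because the audio comes back as base64 text, the server can either decode it into MP3 bytes or forward the string to the client unchanged.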
Sending the MP3 audio back to the client and playing it
The React Native app receives the MP3 audio from the server and plays it using native audio playback capabilities. This completes the voice interaction loop, letting users hear ChatGPT’s answers.
The app also saves the last audio message locally so users can replay it if desired. This simple replay feature enhances usability without adding backend complexity.
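The save-then-replay behavior can be sketched independently of any audio package. Everything named below (`AudioAdapter`, `FileAdapter`, `LastReplyPlayer`) is hypothetical, and the sketch assumes the server delivers the MP3 as a base64 string; in a real Expo app the two adapters would wrap the audio and file-system libraries the project uses.

```typescript
// Hypothetical adapters, injected so the replay logic does not depend
// on a specific audio or file-system library.
interface AudioAdapter {
  play(uri: string): Promise<void>;
}

interface FileAdapter {
  writeBase64(uri: string, base64: string): Promise<void>;
}

class LastReplyPlayer {
  private readonly audio: AudioAdapter;
  private readonly files: FileAdapter;
  private readonly targetUri: string;
  private hasSavedReply = false;

  constructor(audio: AudioAdapter, files: FileAdapter, targetUri: string) {
    this.audio = audio;
    this.files = files;
    this.targetUri = targetUri;
  }

  /** Save the newest MP3 over the previous one, then play it. */
  async playNew(audioBase64: string): Promise<void> {
    await this.files.writeBase64(this.targetUri, audioBase64);
    this.hasSavedReply = true;
    await this.audio.play(this.targetUri);
  }

  /** Replay the saved reply; resolves to false when nothing is saved yet. */
  async replay(): Promise<boolean> {
    if (!this.hasSavedReply) return false;
    await this.audio.play(this.targetUri);
    return true;
  }
}
```

Writing every reply to the same file means only the latest message is ever kept, which mirrors the single replay button described above and avoids any cleanup logic.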
Managing conversation state simply with one active chat
To keep the demo straightforward, the backend handles only one conversation at a time and does not store conversation history in a database or cloud storage. The last message and audio are saved on the client side.
I mention that you could extend this by adding databases or AWS S3 storage for full conversation history and audio archives, but that would increase complexity beyond this tutorial’s scope.
Creating the Expo React Native project with TypeScript
I walk through creating a new Expo project from the command line using the blank TypeScript template.
This sets up the React Native app foundation to build the voice chat interface. Expo simplifies development and testing on mobile devices.