Week 4 Summary - Deployment Time!
Deployment time!
This week, our focus was to integrate our app into the IT infrastructure at the Red Cross, to enable deployment and demo testing in the real world! We also got our app working smoothly on Android devices.
We kicked off the week by enabling Android compatibility. Previously, our app had only been tested on iPhones, which caused some problems on Android devices. The main challenge was how we processed audio files and used each device's built-in audio player: iOS uses AVPlayer, while Android uses ExoPlayer. The solution was a new playback routine that makes sure one audio file finishes playing before the next one starts. So now it works seamlessly on both iPhones and Androids - great stuff!
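For the curious, here is a minimal sketch of that sequential-playback idea. It is framework-agnostic and purely illustrative: the `AudioPlayer` interface, `playFile` and `PlaybackQueue` names are hypothetical stand-ins for whatever wrapper the app actually uses around AVPlayer/ExoPlayer.

```typescript
// Illustrative sketch only - not the app's actual player code.
// `AudioPlayer` / `playFile` are hypothetical stand-ins for the native
// player wrapper (AVPlayer on iOS, ExoPlayer on Android).

interface AudioPlayer {
  // Resolves once the clip has finished playing.
  playFile(uri: string): Promise<void>;
}

class PlaybackQueue {
  private queue: string[] = [];
  private playing = false;

  constructor(private player: AudioPlayer) {}

  // Add a clip and start draining the queue if nothing is playing yet.
  enqueue(uri: string): void {
    this.queue.push(uri);
    if (!this.playing) {
      void this.drain();
    }
  }

  // Play clips strictly one after another, so they never overlap.
  private async drain(): Promise<void> {
    this.playing = true;
    while (this.queue.length > 0) {
      const next = this.queue.shift()!;
      await this.player.playFile(next); // wait for the current clip to end
    }
    this.playing = false;
  }
}
```

Calling `enqueue` for each incoming clip is then enough to guarantee ordered, non-overlapping playback on both platforms.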
With the Android issues sorted at the beginning of the week, we spent the rest of the week deploying our app on Microsoft Azure, shifting our backend from running locally to running in the Azure environment. This is a necessity for the Red Cross to demo-test our mobile application within their actual organization. However, this deployment has been trickier than expected - here’s a snapshot:
- Our server is deployed to Microsoft Azure App Service via GitHub Actions, which went very smoothly. This is done by adding a deploy-server.yml workflow file directly to the repo and by adding repository Action secrets on GitHub containing the Azure credentials from the Red Cross IT Department.
- The client runs locally on the device and successfully connects to the server, and the server handles all our AI functionality with our connected API keys.
- Audio files are processed through our APIs, and the resulting files with translation or community guidance responses are uploaded to an Azure Blob Storage container, which is cloud storage for binary files such as audio and images (a sketch of the upload step follows below). Unfortunately, we have yet to figure out how to reliably send these audio files back to the client for playback on the mobile device.
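To illustrate the upload step from the last bullet, here is a minimal sketch using the @azure/storage-blob SDK for Node.js. The container name, connection string variable and blob naming are placeholders rather than our actual configuration.

```typescript
import { BlobServiceClient } from "@azure/storage-blob";

// Minimal sketch: upload a processed audio Buffer to a blob container.
// Connection string, container name and blob name are placeholders.
async function uploadAudio(audio: Buffer, blobName: string): Promise<string> {
  const service = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING!
  );
  const container = service.getContainerClient("audio-responses");
  await container.createIfNotExists();

  const blob = container.getBlockBlobClient(blobName);
  // uploadData accepts a Buffer directly, so no temporary file is needed.
  await blob.uploadData(audio, {
    blobHTTPHeaders: { blobContentType: "audio/wav" },
  });
  return blob.url; // URL the server can later use to fetch the clip
}
```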
Some tech-nerdy explanations:
The major problem during the deployment to Azure has been rebuilding and rewriting our server functionality to handle uploading, accessing and downloading the audio files. Previously, everything was managed in the local computer's file system, but moving the server and storage to Azure meant that a lot of these functions had to be revised.
A specific problem is handling audio files directly in memory, since the Whisper API needs a proper file to transcribe the audio. Here we use Buffers (a temporary holding spot for data being moved from one place to another). This also lets us skip uploading the file to Blob Storage only to download it again before putting it through the API. However, a Buffer is not a file, which is what the Whisper API expects. To get around this, we use Form-Data, which lets us build file-like objects that hold the binary content directly in memory, with no need for local storage. The Buffer is first converted to .wav format and then appended to the Form-Data object. With the help of Axios (instead of OpenAI's official Node.js library) we can then send this audio “file” to the Whisper API and get the transcription back.

The transcription is then processed through the rest of the APIs, and finally, once TTS (Text-to-Speech) is done, the translated audio Buffer is uploaded to Azure Blob Storage, where it waits to be played and is deleted right after playback. Azure Blob Storage also has an SDK for Node.js, which helped the development process.
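Here is a condensed sketch of that in-memory transcription step. The endpoint, model name and multipart field names follow OpenAI's public transcription API, but the function name, WAV handling and environment variable are simplified placeholders, not our exact server code.

```typescript
import axios from "axios";
import FormData from "form-data";

// Sketch of the in-memory transcription step: the audio never touches disk.
// `wavBuffer` is assumed to already be a valid .wav file held in a Buffer.
async function transcribe(wavBuffer: Buffer): Promise<string> {
  const form = new FormData();
  // Append the Buffer as a "file-like" part; filename/contentType make the
  // multipart body look like a real file to the Whisper endpoint.
  form.append("file", wavBuffer, {
    filename: "audio.wav",
    contentType: "audio/wav",
  });
  form.append("model", "whisper-1");

  const response = await axios.post(
    "https://api.openai.com/v1/audio/transcriptions",
    form,
    {
      headers: {
        ...form.getHeaders(), // multipart boundary headers from form-data
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      maxBodyLength: Infinity, // allow larger audio payloads
    }
  );
  return response.data.text; // the transcription text
}
```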
Moving into next week, our primary goal is to get the app fully functional in the Azure environment. Once this is achieved, we can fully focus on adding new cool features and getting the app running in the Red Cross organization as soon as possible!
Until next week!
/Team