Week 3 Summary

Image by Röda Korset 145d ago update

Good evening!

The third week of the AI for Impact Summer Talent Program is done, and it has brought a lot of valuable insights and learnings. This friday, we had a check-in meeting with the other teams in the program where everyone presented their work so far. It was loads of fun and we are very impressed with everyone's work so far! The knowledge sharing between the teams is really helpful and rewarding. 

We have made a lot of progress this week with our mobile app for the Red Cross. Starting the week, we focused on fine-tuning our translation model. We realized that the OpenAI Whisper (Speech-to-Text) model had a hard time transcribing the audio files without the relevant context of the talk when using our new script that automatically cuts the audio as soon as there is a natural pause while talking. We solved this issue by prompting Whisper with the previous transcriptions of the talk, to make the model understand the context and therefore easily predict what to transcribe from the next audio files. We also made some code updates to improve the speed of the model and to correct some button-bugs. 

We have successfully added chat-bubbles to show the text on the screen. When the user talks, the text pops up in a chat-bubble as it gets transcribed, to ensure that it is correct. As the user continues to talk, the chat-bubble gets filled with more text. When the user is done and wants it translated, the stop-button gets pressed, and the chat-bubble turns into the translated language simultaneously as the translated audio is played. This way, the second user is able to read the translated text too when hearing the translated audio. Moving on, it works the same for the second user when speaking back in the other language. Our app is displaying the conversation with chat-bubbles on both sides, like a normal conversation-app, ensuring that all the Swedish speaker says is displayed on the right side in blue color, and what the Ukrainian speaker says is displayed on the left side in white color. An additional feature is that all chat-bubbles are clickable, so when clicking on one, it instantly changes the text to the opposite language, so that each user can read the whole conversation in its own language. 

Our second major feature we have completed this week is the Community Guide. Via a menu button in the top left corner, users can switch to a new page where they can access the Community Guide. In the Community Guide refugees can ask questions regarding Swedish society, either by recording their question or by typing it in the search bar. The question is answered by OpenAI's GPT-4 API, and the response is both displayed on the screen in text and played as audio in the same language the question was asked. The GPT-4 model is prompted with a task description and guidelines on how to behave like a community guide for Ukrainian migrants. We have also implemented a toggle function similar to the one in the translation feature. When clicking on either a question or an answer, the text is translated to Swedish/Ukrainian. This is beneficial if a Ukrainian refugee wants to ask or double check a Swedish Red Cross volunteer about something that the Community Guide has answered.

Finally, our last progress of this week has been to get the deployment pipeline process started. We have had meetings with the Swedish Red Cross IT Department and gotten access to their IT infrastructure, including Microsoft Azure. This will enable us to deploy and try demo versions of the mobile app directly via their IT infrastructure, facilitating the future implementation and hand-over of our product after the summer. We will dig deeper into this deployment pipeline next week.