App functionality: Semantic similarity
A common need at Vinnova is finding projects that match a given input, such as a free-text query or the description of another project. A related need is choosing the right evaluator (bedömare) for a project. Both can be addressed with semantic similarity: the texts are embedded as vectors, and the similarity between those vectors is computed. For example, a user can ask for projects related to “mental health and therapy with horses”, and the app returns the projects most similar to that query. The same approach applies when the user inputs a project description to find closely related projects, or needs to decide which evaluator is best suited to assess a proposal.
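The ranking step described above can be illustrated with a minimal sketch. It is not the app's implementation: the embed function below is a toy word-count stand-in for a real embedding model (which the text does not name), and the project identifiers and abstracts are hypothetical. Only the overall flow is meant to carry over: embed the query, embed each project text, and sort by cosine similarity.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse word-count vector.

    Assumption for illustration only; a real system would use a trained
    embedding model that also captures meaning beyond shared words.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_projects(query: str, projects: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (project_id, similarity) pairs, most similar first."""
    query_vec = embed(query)
    scored = [
        (project_id, cosine_similarity(query_vec, embed(abstract)))
        for project_id, abstract in projects.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical project abstracts, invented for this example.
projects = {
    "P-001": "Equine-assisted therapy with horses for adolescent mental health",
    "P-002": "Battery recycling processes for electric vehicles",
    "P-003": "Digital tools for mental health support in primary care",
}

ranking = rank_projects("mental health and therapy with horses", projects)
for project_id, score in ranking:
    print(f"{project_id}: {score:.2f}")
```

The same function covers the other two use cases if the query is replaced by a full project description, or if the candidate texts describe evaluators rather than projects.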
To speed up these features, we use a vector database called Chroma. Chroma stores the project abstracts as vectors, optionally together with the text itself and other metadata (such as the original portfolio and the project identification number), and the stored vectors are retrieved by querying the database. A vector database is static unless it is updated, either manually or through a pipeline. Because new project proposals keep arriving in the Vinnova database, our Chroma database is updated every night at midnight.
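The idea of a scheduled update can be sketched as follows. This is an assumption-laden illustration rather than the actual pipeline, which the text does not describe: InMemoryVectorStore is a hypothetical in-memory stand-in and does not use Chroma's API, and sync_new_projects, the row layout, and the toy embedding function are all invented for the example. It shows one plausible strategy, namely embedding and adding only the projects that are not yet in the store.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VectorRecord:
    """One stored item: the vector plus the original text and its metadata."""
    vector: list[float]
    document: str
    metadata: dict[str, str]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database, keyed by project id."""
    records: dict[str, VectorRecord] = field(default_factory=dict)

    def upsert(self, record_id: str, vector: list[float],
               document: str, metadata: dict[str, str]) -> None:
        self.records[record_id] = VectorRecord(vector, document, metadata)


def sync_new_projects(store: InMemoryVectorStore,
                      source_rows: list[dict[str, str]],
                      embed_fn: Callable[[str], list[float]]) -> list[str]:
    """Embed and store only the rows whose id is not in the store yet.

    Returns the ids that were added, so a scheduled job can log them.
    """
    added = []
    for row in source_rows:
        if row["id"] in store.records:
            continue  # already embedded on an earlier run
        store.upsert(
            record_id=row["id"],
            vector=embed_fn(row["abstract"]),
            document=row["abstract"],
            metadata={"portfolio": row["portfolio"]},
        )
        added.append(row["id"])
    return added


def toy_embed(text: str) -> list[float]:
    """Placeholder embedding (text length only); not a real model."""
    return [float(len(text))]


# Hypothetical rows, as they might be read from the project database.
rows_day_1 = [
    {"id": "P-001", "abstract": "Therapy with horses", "portfolio": "Health"},
    {"id": "P-002", "abstract": "Battery recycling", "portfolio": "Energy"},
]
rows_day_2 = rows_day_1 + [
    {"id": "P-003", "abstract": "Digital mental health tools", "portfolio": "Health"},
]

store = InMemoryVectorStore()
first_run = sync_new_projects(store, rows_day_1, toy_embed)
second_run = sync_new_projects(store, rows_day_2, toy_embed)
print(first_run, second_run)
```

In a deployment, a function like this would be triggered by a scheduler at midnight. Skipping ids that are already stored keeps each run proportional to the number of new proposals, at the cost of not picking up edits to existing abstracts.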