Multimodal semantic search

Data Science

A new way of querying data and extracting comprehensive information

Industries:

Retail & FMCG - Industrial - Life Science

Solutions:

Data Science

Technologies:

PyTorch - Python

Overview

The daily use of search engines on business data (about customers, products, internal procedures…) is now widespread among both business users and customers. This widespread use has raised expectations of search and has driven important advances, such as semantic search, that go beyond keyword-search capabilities. However, both keyword and semantic search involve important trade-offs:

Relying only on keywords may lose important parts of the context
Relying only on semantic search may lose important keywords
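The two trade-offs above can be made concrete with a toy comparison. This is a minimal sketch, not Quantyca's implementation: the documents, the three-dimensional "embeddings" and the query vectors are hypothetical values chosen by hand, standing in for the output of a real embedding model. Keyword overlap misses a paraphrase, while two near-identical embeddings cannot tell apart documents that differ only by an exact code.

```python
import math

# Hypothetical toy corpus and hand-made 3-d "embeddings"; a real system would
# obtain the vectors from an embedding model.
DOCS = {
    "d1": "lightweight trail running shoes",
    "d2": "error code E42 in pump controller manual",
    "d3": "error code E17 in pump controller manual",
}
EMBEDDINGS = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.0, 0.2, 0.9],  # d2 and d3 mean almost the same thing,
    "d3": [0.0, 0.2, 0.9],  # so their (toy) embeddings coincide
}

def keyword_score(query, doc):
    """Count the query terms that literally appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Paraphrase: no shared keywords with d1, but a close (toy) query embedding.
q1, q1_vec = "sneakers for jogging", [0.85, 0.15, 0.05]
# Exact code: the keyword "e42" separates d2 from d3; the embeddings cannot.
q2, q2_vec = "error E42", [0.0, 0.25, 0.85]

for doc_id in DOCS:
    print(doc_id,
          keyword_score(q1, DOCS[doc_id]), round(cosine(q1_vec, EMBEDDINGS[doc_id]), 3),
          keyword_score(q2, DOCS[doc_id]), round(cosine(q2_vec, EMBEDDINGS[doc_id]), 3))
```

The first query finds d1 only through its embedding; the second finds d2 rather than d3 only through its keyword.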

In addition, it is increasingly necessary to search, and return results from, unstructured data such as images or videos, especially on e-commerce sites or in document research.

Multimodal Semantic Search represents a new approach to information search, specifically designed to simultaneously handle keywords, semantic context and scraping of structured and unstructured data, to provide a customised, timely and complete search experience.

Challenges

The main problems that can arise with a traditional approach to searching for information via search engines are:

time-consuming searches for information in company documentation, which is often scattered and not centralised
time spent auditing internal and third-party documents during verification and investigation activities
results from a site's search engine (corporate or e-commerce) that are not relevant to the query
incomplete extraction of information from images, or extraction that does not serve the search questions

Solution

Quantyca has developed an approach and architecture for implementing a multimodal search engine on unstructured data, refining the ML algorithm on each customer's specific, up-to-date data. The approach suits internal information search, customer-facing interfaces and search within e-commerce sites.

In recent years, natural language processing (NLP) and computer vision (CV) models have advanced considerably, cross-pollinating the strengths of these two fields of ML and opening up a wide range of applications. In particular, Quantyca used Large Language Models (LLMs) to develop this solution: machine learning models that use deep learning algorithms to process and generate text and other content based on knowledge acquired from huge amounts of data. LLMs are among the most successful applications of transformer models. An LLM consists of a neural network with many parameters, pre-trained on large amounts of unlabelled text using self-supervised learning. Each neuron in each layer of the network can receive input from other neurons and produce an output, which is determined by the neuron's weights; these weights are adjusted during training.
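To illustrate the last two sentences, the sketch below shows a single artificial neuron and one gradient-descent update of its weights. It is a deliberately tiny, hypothetical example in plain Python: a real LLM has billions of parameters arranged in transformer layers and is trained with a framework such as PyTorch, but the principle that outputs depend on weights, and that training nudges those weights to reduce an error, is the same. The sigmoid activation, squared-error loss and learning rate are illustrative choices.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_step(inputs, weights, bias, target, lr=0.5):
    """One gradient-descent step on the loss 0.5 * (output - target)**2.

    Returns the updated weights and bias."""
    y = neuron(inputs, weights, bias)
    # Derivative of the loss w.r.t. the pre-activation z for a sigmoid output.
    grad_z = (y - target) * y * (1.0 - y)
    new_weights = [w - lr * grad_z * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * grad_z
    return new_weights, new_bias

x, w, b = [1.0, 0.5], [0.2, -0.4], 0.0
before = neuron(x, w, b)             # z = 0, so the output is 0.5
w, b = train_step(x, w, b, target=1.0)
after = neuron(x, w, b)              # the update moves the output towards 1.0
print(before, after)
```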

LLMs can recognise, summarise, translate, predict and generate text and other content, and can process large amounts of data, which improves accuracy in classification, question understanding and answer generation.

Figure: Quantyca Multimodal Semantic Search diagram

Quantyca used this class of models to recognise, summarise, translate, extract and generate text and other content based on knowledge gained from, and refined on, its customers' data. The benefits of this LLM-based approach include:

the reduction of manual labour and costs in information search and verification operations
increased availability of information within business divisions
more customised search and greater end-customer satisfaction

Quantyca has developed a cross-use-case architecture that is able to bring into production and monitor models trained on customer data in a timely manner, providing users with an accurate and effective hybrid (image and text) search engine to meet different types of needs.

The complete process

1. DOCUMENTATION PARSING AND SCRAPING

Preparation of unstructured data and extraction of text, tables and images using OCR and image-extraction algorithms
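Once OCR and image extraction have run, their output has to be organised for indexing. The sketch below assumes the extraction tool already returns a list of elements tagged as text, table or image; the ParsedElement schema and the helper names are hypothetical, and OCR itself is not shown. Tables are serialised into "header: value" text so they can be embedded alongside ordinary passages, while images are kept as references for an image index.

```python
from dataclasses import dataclass

@dataclass
class ParsedElement:
    """One element extracted from a source document (hypothetical schema)."""
    kind: str        # "text", "table" or "image"
    content: object  # str for text, list of rows for tables, file path for images
    source: str      # document identifier
    page: int

def table_to_text(rows):
    """Serialise a table (first row = header) into 'header: value' lines."""
    header, *body = rows
    return "\n".join("; ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in body)

def route_elements(elements):
    """Split parsed elements into text passages and image references for indexing."""
    passages, images = [], []
    for el in elements:
        meta = {"source": el.source, "page": el.page}
        if el.kind == "text":
            passages.append({"text": el.content, **meta})
        elif el.kind == "table":
            passages.append({"text": table_to_text(el.content), **meta})
        elif el.kind == "image":
            images.append({"path": el.content, **meta})
        else:
            raise ValueError(f"unknown element kind: {el.kind}")
    return passages, images

elements = [
    ParsedElement("text", "Tighten the housing bolts before start-up.", "manual.pdf", 3),
    ParsedElement("table", [["part", "torque"], ["M6", "10 Nm"]], "manual.pdf", 3),
    ParsedElement("image", "manual_p3_fig1.png", "manual.pdf", 3),
]
passages, images = route_elements(elements)
print(passages[1]["text"])  # part: M6; torque: 10 Nm
```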

2. QUERY GENERATION

The user's 'natural language' query is translated into a query that can be passed to the model
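The page does not say how the natural-language query is translated; in practice an LLM may rewrite it. As a simpler, rule-based stand-in, the sketch below turns a request into a hypothetical structured query that keeps the full text for semantic embedding, extracts keywords for lexical matching, and flags code-like tokens (letters followed by digits) as terms that must match exactly. The stop-word list and the output fields are assumptions for illustration.

```python
import re

# Small illustrative stop-word list (an assumption; real systems use
# language-specific lists or let a model decide what matters).
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "on", "to", "and", "with",
              "how", "do", "i", "what", "is", "are", "me", "show"}

def build_query(user_text):
    """Turn a natural-language request into a hypothetical structured query."""
    tokens = re.findall(r"[a-z0-9]+", user_text.lower())
    keywords = [t for t in tokens if t not in STOP_WORDS]
    # Code-like tokens such as product or error codes must match literally.
    must_match = [t for t in keywords if re.fullmatch(r"[a-z]+\d+", t)]
    return {"semantic_text": user_text.strip(),
            "keywords": keywords,
            "must_match": must_match}

print(build_query("How do I reset error E42 on the pump?"))
```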

3. VECTORIZATION

The data and queries are vectorised and passed to the model, which produces a set of candidate results; these are clustered and selected for the response according to their 'distance' from the concept or query
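The distance-based selection and clustering described in this step can be sketched as follows. This assumes the vectors have already been produced by an embedding model (the two-dimensional vectors in the example are hypothetical) and uses cosine distance, a common but not the only choice. The grouping step is a simple greedy "leader" clustering chosen for brevity; the page does not say which clustering method Quantyca uses.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for vectors pointing the same way, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query_vec, index, k=3, max_distance=None):
    """Return up to k (id, distance) pairs from `index` (id -> vector), closest first."""
    scored = sorted((cosine_distance(query_vec, vec), item_id) for item_id, vec in index.items())
    if max_distance is not None:
        scored = [(d, i) for d, i in scored if d <= max_distance]
    return [(item_id, d) for d, item_id in scored[:k]]

def group_results(ids, index, threshold=0.1):
    """Greedy leader clustering: each item joins the first group whose leader is within threshold."""
    groups = []
    for item_id in ids:
        for group in groups:
            if cosine_distance(index[group[0]], index[item_id]) <= threshold:
                group.append(item_id)
                break
        else:
            groups.append([item_id])
    return groups

index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
hits = nearest([1.0, 0.0], index, k=3, max_distance=0.5)
print(hits)
print(group_results(["a", "b", "c"], index))
```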

4. HYBRID APPROACH

Keywords and semantic context are used together to minimise the distance to textual data and reference images, so that the entire available database is explored in depth
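One common way to combine the two signals is a weighted blend of normalised keyword and semantic scores; reciprocal rank fusion is a frequently used alternative. The sketch below shows the weighted blend under stated assumptions: the scores are hypothetical, `alpha` (the weight of the semantic side) is an illustrative parameter, and this is not necessarily the scheme Quantyca uses. With `alpha=1.0` the two near-identical manuals tie, and the keyword signal is what separates them.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1]; if all scores are equal they map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_rank(keyword_scores, semantic_scores, alpha=0.5):
    """Rank items by alpha * semantic + (1 - alpha) * keyword, after normalisation.

    Items missing from one side get 0 on that side; ties are broken by id."""
    kw = min_max(keyword_scores) if keyword_scores else {}
    sem = min_max(semantic_scores) if semantic_scores else {}
    ids = set(kw) | set(sem)
    combined = {i: alpha * sem.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}
    return sorted(combined, key=lambda i: (-combined[i], i))

# Hypothetical scores for the query "error E42".
keyword_scores = {"manual_E42": 2, "manual_E17": 1, "brochure": 0}
semantic_scores = {"manual_E42": 0.80, "manual_E17": 0.80, "brochure": 0.30}

print(hybrid_rank(keyword_scores, semantic_scores))             # blended ranking
print(hybrid_rank(keyword_scores, semantic_scores, alpha=1.0))  # semantic only: the manuals tie
```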

5. RESPONSE

The results are assembled into a textual, graphical or hybrid response and displayed to the end user
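The final assembly step can be sketched as choosing the response type from the modalities that were retrieved. The result format below (dicts with `type`, `content` and `source`) is a hypothetical assumption, and in a full system an LLM might also write a summary answer from the passages, which is not shown here.

```python
def assemble_response(results):
    """Build a textual, graphical or hybrid response from ranked results.

    Each result is assumed (hypothetically) to be a dict with 'type'
    ('text' or 'image'), 'content' and 'source'."""
    texts = [r for r in results if r["type"] == "text"]
    images = [r for r in results if r["type"] == "image"]
    if texts and images:
        kind = "hybrid"
    elif images:
        kind = "graphical"
    elif texts:
        kind = "textual"
    else:
        kind = "empty"
    return {
        "kind": kind,
        # Keep the source next to each passage so users can verify it.
        "passages": [f'{r["content"]} [{r["source"]}]' for r in texts],
        "images": [r["content"] for r in images],
    }

response = assemble_response([
    {"type": "text", "content": "Reset via menu 3.", "source": "manual.pdf p.12"},
    {"type": "image", "content": "fig_12.png", "source": "manual.pdf p.12"},
])
print(response)
```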