The daily use of search engines on business data (customers, products, internal procedures…) is now widespread among both business users and customers. This widespread use has generated growing expectations of search and has driven important advances, such as semantic search, that go beyond keyword-based capabilities. However, both keyword and semantic search involve important trade-offs:

  • Relying only on keywords may lose important parts of the context
  • Relying only on semantic similarity may miss important keywords

In addition, it is increasingly necessary to search and return information from unstructured data such as images and videos, especially on e-commerce sites or in document research.

Multimodal Semantic Search represents a new approach to information retrieval, designed to simultaneously handle keywords, semantic context, and the extraction of structured and unstructured data, providing a customised, timely and complete search experience.
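One simple way to combine the two signals discussed above is a weighted blend of a lexical match score and an embedding similarity score. The sketch below is a minimal illustration under toy assumptions (whitespace tokenisation, hand-supplied vectors, a blending weight `alpha`); it is not Quantyca's actual scoring function.

```python
# Minimal sketch of hybrid (keyword + semantic) scoring.
# All function names and the toy vectors are illustrative assumptions.
import math

def keyword_score(query, document):
    """Fraction of query terms that literally appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & d_terms) / len(q_terms)

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query, document, q_vec, d_vec, alpha=0.5):
    """Blend lexical and semantic signals; alpha weights the keyword part."""
    lexical = keyword_score(query, document)
    semantic = cosine_similarity(q_vec, d_vec)
    return alpha * lexical + (1 - alpha) * semantic
```

In such a scheme, `alpha` can be tuned per use case: closer to 1 when exact terms (product codes, proper names) matter, closer to 0 when conceptual matches matter more.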


The main problems that can arise from a traditional approach to searching for information via search engines are:

  • time-consuming searches for information in company documentation, which is often scattered and not centralised
  • time spent auditing both internal and third-party documents during verification and investigation activities
  • search engine results on a site (corporate or e-commerce) that are not consistent with the query
  • extraction of information from images that is incomplete or does not fully address the research questions


Quantyca has developed an approach and architecture for implementing a multimodal search engine on unstructured data, refining the ML algorithms to search specific, up-to-date customer data. This suits use cases such as internal information search, customer-facing interfaces and search within e-commerce sites.


In recent years, NLP and computer vision (CV) models have undergone considerable development, cross-pollinating the strengths of the two fields of ML and providing extraordinary potential for a variety of uses. In particular, Quantyca has used Large Language Models (LLMs) to develop this solution: these are machine learning models that use deep learning algorithms to process and generate text and other content based on knowledge acquired from huge amounts of data. These models are among the most successful applications of transformer models. LLMs consist of a neural network with many parameters, pre-trained on large amounts of unlabelled text using self-supervised learning. Each neuron in each layer of the network receives input from other neurons and produces an output, determined by weights that are adjusted during model training.

LLMs are able to recognise, summarise, translate, predict and generate text and other content, and can process large amounts of data, leading to improved accuracy in classification tasks, question understanding and answer generation.


[Image: Quantyca Multimodal Semantic Search schema]


Quantyca used this class of models to recognise, summarise, translate, extract and generate text and other content based on knowledge gained and refined from its customers' data. The machine learning models used employ deep learning algorithms to process and generate text and other content. The benefits of LLMs include:

  • the reduction of manual labour and costs in information search and verification operations
  • increased availability of information within business divisions
  • the customisation of search and the satisfaction of the end customer

Quantyca has developed a cross-use-case architecture able to bring models trained on customer data into production and monitor them in a timely manner, providing users with an accurate and effective hybrid (image and text) search engine that meets different types of needs.

[Image: Quantyca Multimodal Semantic Search schema]

The complete pipeline

  1. Unstructured data is optimised, and text, tables and images are extracted using OCR and image-extraction algorithms
  2. The natural-language query entered by the user is translated into a query to be passed to the model
  3. The data and queries are vectorised and passed to the model, generating a set of candidate results that are clustered and used in the response according to their 'distance' from the query concept
  4. Keywords and context are used to minimise the distance to textual data and reference images, so as to explore the entire available database in depth
  5. The results are assembled into a textual, graphical or hybrid response and displayed to the end user


Optimising the search experience

  • Reduced effort in searching for timely information
  • Dissemination of a Q&A approach to business and customer search
  • Reduced operating costs
  • Greater development agility

Need personalised advice? Contact us to find the best solution!

