Semantic Models in Vector Search
Over the last few years, artificial intelligence has become a recurring part of both our professional and personal lives. It is increasingly present in our daily tasks, thanks to hundreds of thousands of AI-based applications.
Natural language processing (NLP) and natural language generation (NLG) systems are AI-based programs built to understand, for example, how humans speak or write. These systems have made it possible to create semantic models that learn to perform specific tasks, such as predicting the end of the sentence you’re typing. NLP and NLG are revolutionising the search experience through vector-based search, using algorithms that capture the semantic meaning and context of queries and documents.
Building these systems requires considerable time and resources, because each specific task needs a large, well-labelled dataset. That’s why the starting point is a foundation model: a large AI model trained on a vast corpus of unlabelled data, often through self-supervised learning, that can power a wide variety of downstream tasks.
Open-source or Closed-source Foundation Models
There are two types of foundation models:
- Closed-source foundation models: Usually offered as end-to-end applications or accessed through API integrations.
- Open-source foundation models: Usually distributed through model hubs that host the models, on top of which applications or APIs can be built.
Empathy Platform uses open-source foundation models for semantic, vector-based search for several reasons:
- Price: Closed-source foundation models are quite expensive, while open-source ones are more cost-effective.
- Practicality: Open-source models are more general, as they have not been trained by a specific company for specific tasks.
- Ethics: There is more information and transparency about the data used to train the models on which APIs and applications are built.
- Privacy and Integrity: Do these models respect privacy? Was consent for data usage expressly given? While there is no way to know for certain, Empathy Platform establishes privacy and consent controls that reinforce customers’ trust and brands’ reputations.
Fine-tuning a Semantic Model
Semantic models can be extended to any domain through fine-tuning. They are trained on proprietary tuning data, that is, specific, well-labelled domain information that adapts the model to particular tasks.
Empathy Platform’s foundation models are fine-tuned with query-click and query-product combinations, creating semantic associations from anonymous, session-based customer interactions collected with consent integrity. As a result, our domain-based model ensures privacy and integrity.
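To make the idea of query-click pairs more concrete, here is a minimal Python sketch of how consented, session-based interactions might be turned into training pairs for fine-tuning. The `Interaction` record, its `consent` flag, and the `build_training_pairs` helper are hypothetical illustrations, not Empathy Platform’s actual schema or pipeline, and the threshold of two distinct sessions is an arbitrary assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interaction:
    """Hypothetical anonymised event: an ephemeral session key, never a user ID."""
    session_id: str
    query: str
    clicked_product: Optional[str]  # None when the query led to no click
    consent: bool  # hypothetical flag: did the data subject consent to this use?


def build_training_pairs(events, min_sessions=2):
    """Return (query, product) pairs clicked in at least `min_sessions` distinct sessions.

    Events without consent or without a click are discarded first,
    and session keys are used only for counting, never returned.
    """
    sessions_per_pair = defaultdict(set)
    for event in events:
        if not event.consent or event.clicked_product is None:
            continue
        key = (event.query.strip().lower(), event.clicked_product)
        sessions_per_pair[key].add(event.session_id)
    return sorted(
        pair for pair, sessions in sessions_per_pair.items()
        if len(sessions) >= min_sessions
    )


events = [
    Interaction("s1", "Running Shoes", "sku-1", True),
    Interaction("s2", "running shoes", "sku-1", True),
    Interaction("s3", "running shoes", "sku-2", False),  # no consent: dropped
    Interaction("s4", "red dress", None, True),           # no click: dropped
]
print(build_training_pairs(events))  # [('running shoes', 'sku-1')]
```

Requiring a pair to recur across independent sessions is one simple way to keep one-off or noisy clicks out of the tuning data; a suitable threshold would depend on traffic volume.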
Privacy and Integrity with Semantic Models
Creating a foundation model requires a huge amount of training data, which raises consent-integrity and privacy problems. When working with already-trained models, there is no way to ensure or track consent and privacy integrity. Open-source foundation models require consent from the individuals whose actions feed the models.
Empathy.co is establishing privacy controls that act as a firewall against legal and reputational risks for brands: if there is no data-subject consent, there is no data integrity. In addition to fine-tuning models for specific use cases, Empathy.co runs an ePrivacy Stress Test to safeguard retailers’ reputations. AI should offer opportunities based on confidentiality, not compromise trust.
How Empathy Platform Applies Semantic Models
Using NLP foundation models fine-tuned with our own proprietary data sets, we created the Semantics API, which identifies semantic similarities between queries, as well as between products in a catalogue. The Semantics API leverages vector search to complement keyword search, yielding faster, more relevant results.
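Identifying semantic similarity between queries or between products usually comes down to comparing embedding vectors. The minimal sketch below assumes each query already has an embedding (the three-dimensional vectors are invented for illustration; a fine-tuned encoder would produce far larger ones) and ranks neighbours by cosine similarity with a brute-force scan. Cosine similarity is a common choice assumed here, not a documented detail of the Semantics API, and production systems typically use an approximate nearest-neighbour index rather than scanning every vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec, index, k=2):
    """Brute-force k-nearest neighbours by cosine similarity.

    `index` maps an item (a query or a product) to its embedding.
    """
    scored = [(item, cosine(query_vec, vec)) for item, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings of past queries.
query_index = {
    "sneakers": [0.9, 0.1, 0.0],
    "trainers": [0.85, 0.15, 0.05],
    "running shoes": [0.8, 0.2, 0.1],
    "evening dress": [0.0, 0.1, 0.95],
}

# Hypothetical embedding of the misspelled query "sneekers".
top = most_similar([0.88, 0.12, 0.02], query_index)
print([item for item, _ in top])  # ['sneakers', 'trainers']
```

Because the comparison happens in vector space rather than on characters, a misspelled or differently worded query can still land next to the queries it means.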
Vector search helps shoppers overcome many frustrating situations, such as zero or partial results, misspellings, or too few results. It is also being used to improve search effectiveness and relevance. By combining the strengths of keyword-based and vector-based indexing, Empathy.co is working on a hybrid search solution that can effectively handle long-tail scenarios and improve the relevance of search results.
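One common way to combine keyword and vector retrieval is Reciprocal Rank Fusion (RRF), which merges ranked lists using only each document’s position in them. It is shown here purely as an illustrative technique, not as Empathy.co’s hybrid approach; the product IDs are made up, and `k = 60` is a widely used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (ranks start at 1); documents are returned by descending total score.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results for a long-tail query such as "comfy shoes for nurses".
keyword_hits = ["sku-7", "sku-3"]          # exact term matches only
vector_hits = ["sku-3", "sku-9", "sku-7"]  # semantically related products

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # ['sku-3', 'sku-7', 'sku-9']
```

Because RRF only needs ranks, a product found by just one retriever still appears in the merged list: with an empty keyword list, the fused result simply follows the vector ranking, which is one way a hybrid setup can avoid zero-result pages.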