Hi! Salve! Namaste! Γεια σου! 你好! مرحبا

LUCIAN GRUIA

GDG Meetup – Effective Data Chunking Strategies in RAG (ContinualBot, 2024)

Info

  • Project type: Tech meetup, live coding
  • Date: 16th of May 2024
  • Duration: 1h
  • My role: Speaker
  • Topics: Data chunking, semantics
  • Keywords: Vector database, embeddings
  • Skills developed: Public speaking

Description

In May 2024, I delivered a talk titled "Effective Data Chunking Strategies in RAG" at Google Developer Group (GDG) Bucharest, at an event organized by Digital Stack and Avira Romania.

The talk introduced the fundamental concepts of Retrieval-Augmented Generation (RAG) and compared several data chunking strategies, highlighting the key strengths and weaknesses of each.
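
The slides are not reproduced here, so as a hedged illustration only, the sketch below shows two widely used chunking strategies in Python: fixed-size character windows with overlap, and sentence-aware packing. The function names and parameters (`fixed_size_chunks`, `sentence_chunks`, `size`, `overlap`, `max_chars`) are hypothetical, and splitting sentences with a regex is a simplifying assumption; real pipelines usually measure chunks in tokens rather than characters.

```python
import re


def fixed_size_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters; each window shares
    `overlap` characters with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last window reached the end of the text
            break
    return chunks


def sentence_chunks(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `max_chars`
    characters; a single longer sentence becomes a chunk on its own."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Fixed-size windows are cheap and predictable but can cut a sentence in half; the overlap softens this at the cost of storing some text twice. Sentence packing keeps units of meaning intact but yields uneven chunk sizes.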

The audience consisted of experienced Java and Kotlin developers, as well as cybersecurity software engineers from Gen, one of the largest players in the industry.

The talk's brief:

A RAG system pairs a user's prompt with a relevant data chunk by retrieving the chunk in the database that is most semantically similar to the prompt. That chunk then becomes the context for the prompt: when both are passed to the LLM, the system can provide a relevant answer grounded in that context.
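
To make the retrieval step concrete, here is a minimal Python sketch under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model (so it measures word overlap rather than true semantic similarity), and the chunks sit in a plain list instead of a vector database. It ranks chunks by cosine similarity to the prompt and prepends the best one as context; the prompt template in `build_llm_input` is also hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector.
    A real RAG system would call a neural embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(prompt: str, chunks: list[str]) -> str:
    """Return the chunk whose vector is most similar to the prompt's."""
    query = embed(prompt)
    return max(chunks, key=lambda chunk: cosine(query, embed(chunk)))


def build_llm_input(prompt: str, chunks: list[str]) -> str:
    """Prepend the retrieved chunk to the user's prompt as context."""
    context = retrieve(prompt, chunks)
    return f"Context:\n{context}\n\nQuestion:\n{prompt}"


if __name__ == "__main__":
    chunks = [
        "Qdrant stores vectors and supports similarity search.",
        "MariaDB is a relational database.",
        "Bucharest is the capital of Romania.",
    ]
    print(build_llm_input("Which database supports vector similarity search?", chunks))
```

Replacing `embed` with a real embedding model and the list with a vector database keeps the same shape: embed, rank by similarity, then assemble the prompt for the LLM.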

The main topics explored in the workshop:

  • Processing highly complex, unstructured data.
  • Extracting meaning from data.
  • Practical knowledge representation techniques for handling large-scale context.
  • Real-time lexical and semantic search.
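
The brief does not say how lexical and semantic results were combined, so the following is only a sketch of one common approach, not necessarily the one demonstrated in the talk: a tiny inverted index for lexical matching, plus reciprocal rank fusion (RRF) to merge its ranking with a semantic ranking. All function names are hypothetical, the semantic ranking in the usage example is made up to stand in for vector-search results, and `k = 60` is simply a commonly used RRF default.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(chunks: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the IDs of the chunks that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for token in tokenize(text):
            index[token].add(chunk_id)
    return index


def lexical_rank(query: str, index: dict[str, set[str]]) -> list[str]:
    """Rank chunk IDs by how many distinct query tokens they contain."""
    hits: dict[str, int] = defaultdict(int)
    for token in set(tokenize(query)):
        for chunk_id in index.get(token, ()):
            hits[chunk_id] += 1
    return sorted(hits, key=lambda cid: (-hits[cid], cid))


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each ID scores the sum of 1 / (k + rank) over the
    lists it appears in (ranks start at 1); ties are broken by ID."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


if __name__ == "__main__":
    chunks = {
        "c1": "Qdrant supports vector search",
        "c2": "MariaDB supports SQL queries",
        "c3": "Notes from the Bucharest meetup",
    }
    lexical = lexical_rank("supports vector search", build_inverted_index(chunks))
    semantic = ["c2", "c3", "c1"]  # hypothetical output of a vector search
    print(reciprocal_rank_fusion([lexical, semantic]))  # ['c2', 'c1', 'c3']
```

RRF only needs ranks, not raw scores, which is why it is a convenient way to merge lexical and vector results whose scores live on different scales.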

During the workshop, I live-coded examples and walked through use cases I encountered while building my framework, ContinualBot.com.

I thoroughly enjoyed my time at the meetup, especially the engaging questions during the Q&A session and the quality networking after the event.

I also appreciated the GDG's commitment to organizing such professional events. The meetup was an opportunity to expand my network within the Romanian software development community, where I had the pleasure of meeting many intelligent and inspiring people.

Resources

Tech stack

  • Programming Languages: Java, Python, JavaScript
  • Databases: MariaDB, Qdrant
  • OS: Windows, Linux
