Ciklum Speaker’s Corner – Architecting Scalable AI RAG Systems: From Startup to Enterprise. A Live Coding Session (Ciklum, 2024)
Info
- Project type Public Speaking, Tech Presentation
- Date 16th of April 2024
- My Role Speaker, Moderator
- Registered 510 attendees from 21 countries
- Topics NLP, LLM, RAG, Cloud, AWS, Azure, Java, Python, JavaScript, AI QA, Chatbots, FAISS, PgVector, Qdrant, Docker
- Keywords LLM Wrapper, RAG, Semantic Search, Portability, Scalability, Enterprise
- Skills developed Presentation, Speaking
- Event URL Ciklum's website
- Event recording (YouTube)
Description
This is the 4th edition of the Ciklum Speaker's Corner that I have spoken at, and by far the most complex, requiring extensive preparation. This time, the initiative was mine: I wanted to take the event to the next level and turn it into a coding festival rather than just a live coding session.
My previous Ciklum Speaker's Corner events:
- AI SoA (Oct 2022)
- Building Full Stack AI Chatbots With Java (Feb 2023)
- Neural Network That Can Learn: Creating From Scratch (Sept 2023)
During this session, we built a RAG system, starting with a Java LLM wrapper using Spring AI with OpenAI integration, running locally on Windows.
We then transitioned to Python with LangChain, running a Llama 2 LLM locally on Linux with a FAISS vector DB. Subsequently, we packaged everything into a Docker image and deployed it on AWS, demonstrating infrastructure-as-code capabilities and the portability of our architecture.
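The retrieve-then-generate flow of this phase can be illustrated with a minimal, dependency-free sketch. It is a hypothetical stand-in for the LangChain + FAISS pipeline, not the session's actual code: the toy character-frequency "embedding" replaces a real embedding model, the flat L2 search mimics what FAISS does at scale, and the final prompt would in the live demo have been sent to the local Llama 2 model.

```python
import math

# Toy "embedding": character-frequency vector over a fixed alphabet.
# A real pipeline would use a sentence-embedding model instead.
def embed(text):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    text = text.lower()
    return [text.count(ch) / max(len(text), 1) for ch in alphabet]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class TinyVectorStore:
    """Flat L2 index, mimicking faiss.IndexFlatL2 on a toy scale."""
    def __init__(self):
        self.chunks, self.vectors = [], []

    def add(self, chunk):
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query, k=1):
        q = embed(query)
        scored = sorted(
            (euclidean(q, v), c) for v, c in zip(self.vectors, self.chunks)
        )
        return [chunk for _, chunk in scored[:k]]

store = TinyVectorStore()
store.add("FAISS is a library for efficient similarity search.")
store.add("Docker packages applications into portable images.")

def answer(question):
    # Retrieve the closest chunk and stuff it into the prompt.
    context = store.search(question, k=1)[0]
    # In the live demo, this prompt went to a local Llama 2 model.
    return f"Answer using only this context:\n{context}\nQ: {question}"
```

Swapping FAISS for PgVector or Qdrant, as we did in later phases, only changes the store; the surrounding flow stays the same, which is exactly the portability point of the session.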
In the next phase, we again used LangChain, this time with JavaScript locally on macOS, replacing the FAISS vector DB with PgVector to leverage a well-known DBMS (Postgres).
We then took the same Docker container, initially built and deployed on AWS, and shifted everything to Azure. We switched the vector DB to Qdrant and pivoted from Llama back to OpenAI. At this stage, we delved deeper into chat history, context awareness, and semantic search.
Between each phase, we introduced theoretical concepts (such as data chunking, embeddings, retrieval, vector databases, semantic search, and neural network quantization) and reviewed the steps we had just taken, providing technical explanations and the motivations behind our choices.
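Of those concepts, data chunking is the easiest to show concretely. This is a minimal fixed-size chunker with overlap, written as an illustration of the idea rather than the exact splitter used in the session; the sizes are arbitrary, and the overlap exists so that information on a chunk boundary still appears whole in at least one chunk.

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split text into fixed-size chunks; the overlap preserves
    context across chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Each chunk starts (chunk_size - overlap) characters after the previous one.
chunks = chunk_text("a" * 250, chunk_size=100, overlap=20)
```

Each chunk is then embedded and stored in the vector DB, so retrieval granularity, and ultimately answer quality, depends directly on these two parameters.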
Afterwards, we covered the QA aspect, focusing on methodologies and best practices for data protection, enterprise privacy, and digital assurance in AI development.
Finally, we introduced fundamental concepts on how users can interact with the LLM and RAG through prompt engineering, covering key ideas such as Context, Tone, Temperature, prompt tactics, Chain-of-Thought, and Thread-of-Thought.
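These prompt-engineering ideas can be made concrete with a small template. The sketch below is illustrative, not the prompts from the session: it combines context, tone, and a chain-of-thought tactic into one prompt, and wraps it in an OpenAI-style chat request where the model name and temperature value are assumptions chosen for the example.

```python
def build_cot_prompt(context, question, tone="concise"):
    """Combine context, tone, and a chain-of-thought tactic into one prompt."""
    return (
        f"Context: {context}\n"
        f"Tone: answer in a {tone} style.\n"
        f"Question: {question}\n"
        # Chain-of-thought tactic: ask the model to reason step by step.
        "Let's think step by step before giving the final answer."
    )

# Illustrative OpenAI-style chat request; field values are assumptions.
request = {
    "model": "gpt-4",        # illustrative model name
    "temperature": 0.2,      # low temperature -> more deterministic output
    "messages": [{
        "role": "user",
        "content": build_cot_prompt(
            "RAG combines retrieval with generation.",
            "Why add a vector DB?",
        ),
    }],
}
```

Temperature controls sampling randomness at the API level, while tone and the step-by-step instruction shape the answer purely through the prompt text itself.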
There were several strategic goals I aimed to achieve with this event, and I designed it to cover them all:
- Demonstrating cutting-edge technology is not common for an outsourcing company. I saw this as an ideal opportunity to show that we are not only up to date with the latest technologies but also capable of coordinating and executing practical, effective solutions that work in real life.
- We demonstrated not only individual expertise but also our ability to function as "one team". This aligns with our company vision, and I found it a great opportunity to show how we make this happen in practice.
- As a Technology Lead for AI at Ciklum, I am frequently questioned by clients and prospects about our team's capabilities. Hence, I chose to prepare video content rather than just slide presentations or PDFs.
- We highlighted our proficiency in languages other than Python. The community often assumes that all AI-related coding is done in Python. While partially true, this is not entirely accurate: many tasks can be accomplished in other languages, which we demonstrated in this session with Java and JavaScript. This is beneficial for potential colleagues who may not have a Python background, and for clients hesitant about adopting AI due to their unfamiliarity with Python.
- We demonstrated that infrastructure is not a concern: we can support all major on-prem operating systems, and we showcased two major cloud platforms, AWS and Azure.
- We highlighted the new QA challenges that emerge with the rise of AI and how we tackle them.
This event was incredibly challenging and interesting, involving great collaboration across many departments and countries. I am grateful to work in such a collaborative environment and to have the opportunity to pioneer these technological innovations.
References
- Video recording (Ciklum's YouTube channel)
- Presentation slides
- Follow-up materials (including Git repos)
- Neural network that can learn: creating from scratch
- Building full stack AI chatbots with Java | Live coding session
Tech stack
- Programming Languages: Java 17, Python 3.11, JavaScript
- Databases: FAISS, Postgres, Qdrant
- Infrastructure: Windows, macOS, Linux, Docker, AWS, Azure