Building a Graduate School Recommender with RAG + Agentic AI: From Hackathon to Production
🎯 Project Overview
While learning about RAG (Retrieval-Augmented Generation) and agentic LLM techniques, I joined a hackathon to put these concepts into practice. This article walks through how I built a graduate school lab recommender system, from concept to deployment.
📁 GitHub Repository: https://github.com/elecsonJ/team07-lab-recommender
🎪 Hackathon Background and Motivation
Learning Objectives
- RAG System Implementation: Convert PDF data to vectors for similarity-based search
- Agentic AI Experience: Build intelligent agents that choose different strategies based on context
- End-to-End Service: Complete pipeline from API integration to web deployment
Project Concept
Prospective graduate students often struggle to find labs that match their research interests. Instead of simple keyword matching, our goal was to build an AI-powered recommendation system that understands what a student is actually asking for.
🏗️ System Architecture
1. RAG (Retrieval-Augmented Generation) Core Structure
User Query Input
↓
Vector Embedding Conversion (text-embedding-3-small)
↓
Cosine Similarity Calculation (matching with 31 professors)
↓
Top-K Professor Selection
↓
GPT-4o-mini Detailed Recommendation Generation
↓
Quality Assessment & Result Return
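The retrieval step in the middle of this pipeline (cosine similarity followed by Top-K selection) can be sketched in plain Python. This is a minimal illustration, not the project's code: the `top_k_professors` name and the toy 3-dimensional vectors are hypothetical, whereas the real system compares 1536-dimensional text-embedding-3-small vectors, typically through a vector store.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_professors(
    query_vec: List[float], professor_vecs: Dict[str, List[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Score every professor against the query and keep the K most similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in professor_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy example: 3-dimensional vectors stand in for real embeddings
professors = {
    "Prof. A": [1.0, 0.0, 0.0],
    "Prof. B": [0.7, 0.7, 0.0],
    "Prof. C": [0.0, 0.0, 1.0],
}
print(top_k_professors([1.0, 0.1, 0.0], professors, k=2))  # Prof. A ranks first, then Prof. B
```

A brute-force loop like this is perfectly adequate for 31 professors; a vector store does the same job internally and would be preferable (with an approximate index) for a much larger corpus.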
2. Agentic AI Philosophy Implementation
The core of our system is adaptive strategy selection based on context:
- Primary Strategy: Generate a response using local RAG search
- Quality Assessment: Score that response; if its quality is unsatisfactory, fall through to the next strategy
- Secondary Strategy: Switch to web search for additional information
- Final Response: Generate a high-quality answer that integrates the information from both sources
Instead of running one fixed pipeline, the system evaluates its own output and changes strategy when that output falls short, which is the defining characteristic of an agent as opposed to a simple rule-based system.
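The four-step loop above can be sketched as a small control function. This is a hypothetical sketch rather than the repository's implementation: the strategies are passed in as callables (`rag_answer`, `assess_quality`, and `web_answer` are placeholder names), and the quality score is assumed to be a number compared against a tunable `threshold`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentResult:
    answer: str
    strategy: str  # which strategy produced the final answer
    score: float   # quality score assigned to the RAG draft


def answer_query(
    query: str,
    rag_answer: Callable[[str], str],
    assess_quality: Callable[[str, str], float],
    web_answer: Callable[[str, str], str],
    threshold: float,
) -> AgentResult:
    # 1. Primary strategy: answer from the local RAG index
    draft = rag_answer(query)
    # 2. Quality assessment of the draft
    score = assess_quality(query, draft)
    if score >= threshold:
        return AgentResult(draft, "rag", score)
    # 3 + 4. Secondary strategy: web search, combined with the RAG draft
    return AgentResult(web_answer(query, draft), "rag+web", score)
```

Passing the strategies in as functions keeps the decision logic testable with stubs and makes it straightforward to add another strategy later.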
💾 Data and Limitations
Practical Constraints
Due to hackathon time constraints and data collection challenges, we could only use data on 31 professors from Seoul National University’s College of Medicine.
Ideal Vision
Our original goal was to collect professor data from all universities nationwide to:
- Cover all departments and research fields
- Build a universal graduate school recommendation system
- Create a comprehensive service not limited to specific universities
Scalability Potential
The current architecture is data-agnostic: it can be extended to other universities and fields simply by replacing the input data.
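One way to keep a pipeline data-agnostic is to normalize every source into a fixed record schema before embedding, so that adding a university only means producing more records. The sketch below assumes the source is a JSON list of professor objects; the `ProfessorRecord` fields and the `to_embedding_text` format are hypothetical, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfessorRecord:
    # Hypothetical schema; adjust the fields to whatever the extraction step produces
    name: str
    university: str
    department: str
    lab: str
    research_topics: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    contact: str = ""

    def to_embedding_text(self) -> str:
        """Flatten the record into the text that gets embedded."""
        return (
            f"{self.name} ({self.university}, {self.department}) - {self.lab}\n"
            f"Research topics: {', '.join(self.research_topics)}\n"
            f"Methods: {', '.join(self.methods)}"
        )


def load_records(json_text: str) -> List[ProfessorRecord]:
    """Parse a JSON list of professor objects into records."""
    return [ProfessorRecord(**item) for item in json.loads(json_text)]
```

Because downstream code only ever sees `ProfessorRecord` objects, a new data source needs a new loader, not changes to retrieval or generation.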
🛠️ Technical Implementation
1. Data Processing Pipeline
PDF Data Extraction:
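```python
# Extract professor information from a PDF
import pdfplumber
from openai import AzureOpenAI

client = AzureOpenAI()  # endpoint, API key, and API version come from environment variables

def extract_professor_info(pdf_path):
    with pdfplumber.open(pdf_path) as pdf:
        # Join the text of all pages; extract_text() can return None for image-only pages
        raw_text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    # Convert to structured data using GPT-4o-mini
    # (process_with_gpt wraps a client.chat.completions.create call; not shown here)
    structured_data = process_with_gpt(raw_text)
    return structured_data
```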
Vector Embedding Generation:
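```python
from langchain_openai import AzureOpenAIEmbeddings

# Endpoint, API key, and API version are read from the AZURE_OPENAI_ENDPOINT,
# AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION environment variables
embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=1536,  # the model's default output size
)
```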
2. Hybrid Search System
RAG Search:
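```python
# vector_store (a LangChain vector store over the professor embeddings) and
# llm (a GPT-4o-mini chat model) are created at application startup
def rag_search(query, k=5):
    # Select the top-K professors by vector similarity
    similar_docs = vector_store.similarity_search(query, k=k)
    context = "\n\n".join(doc.page_content for doc in similar_docs)
    # Generate personalized recommendations with GPT-4o-mini
    response = llm.invoke(
        f"User question: {query}\n\nRecommend labs based on these professors:\n{context}"
    )
    return response.content  # invoke() returns a message object; .content holds the text
```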
Web Search Fallback:
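```python
def web_search_fallback(query, rag_response, quality_score, threshold):
    # Keep the RAG answer if it passes the quality check
    if quality_score >= threshold:
        return rag_response
    # Otherwise run a real-time web search using the Tavily API
    # (enhance_query, search_web, and generate_response_with_web_data are helpers not shown here)
    web_results = search_web(enhance_query(query))
    return generate_response_with_web_data(web_results)
```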
3. Conversational Interface
Conversation Context Management:
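```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ConversationHistory:
    queries: List[str] = field(default_factory=list)
    responses: List[str] = field(default_factory=list)

    def get_context(self, last_n=3):
        # Build context from the last N query/response pairs
        pairs = zip(self.queries[-last_n:], self.responses[-last_n:])
        return "\n".join(f"User: {q}\nAI: {r}" for q, r in pairs)
```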
Query Classification System:
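```python
def classify_query(query):
    # Automatically distinguish new questions from follow-up questions
    # (is_followup_question and enhance_with_context are helpers not shown here)
    if is_followup_question(query):
        # Enrich the follow-up with earlier context so retrieval sees the full intent
        return enhance_with_context(query)
    return query
```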
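The two helpers used by `classify_query` are not shown in the post. As a rough, hypothetical stand-in, a keyword heuristic can flag follow-ups and fold earlier questions into the retrieval query. Unlike the function above, these versions take the previous queries explicitly, and the cue lists are assumptions; an LLM-based classifier would catch phrasings that such lists miss.

```python
import re
from typing import List

# Hypothetical cues that often signal a follow-up question
FOLLOWUP_PHRASES = ("what about", "how about", "among them", "among those")
FOLLOWUP_WORDS = {"he", "she", "his", "her", "him", "they", "them", "those", "these", "specifically"}


def is_followup_question(query: str, previous_queries: List[str]) -> bool:
    """Heuristic: a follow-up needs prior history plus a referring cue."""
    if not previous_queries:
        return False
    text = query.lower()
    words = set(re.findall(r"[a-z']+", text))
    return any(p in text for p in FOLLOWUP_PHRASES) or bool(words & FOLLOWUP_WORDS)


def enhance_with_context(query: str, previous_queries: List[str], last_n: int = 2) -> str:
    """Prepend the most recent questions so the retriever sees the topic being refined."""
    return " ".join(previous_queries[-last_n:] + [query])


history = ["I'm interested in cancer research"]
follow_up = "What about nanotechnology specifically?"
if is_followup_question(follow_up, history):
    print(enhance_with_context(follow_up, history))
```

Matching whole words rather than substrings avoids false hits such as "her" inside "research".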
🌐 Deployment and Service
Streamlit Cloud Deployment
To turn our hackathon prototype into a publicly accessible service, we deployed it on Streamlit Cloud.
Deployment Process:
- Push code to GitHub
- Let Streamlit Cloud build the app automatically
- Configure environment variables (API keys) securely
- Launch the live service
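Because the API keys live in the deployment's secret settings rather than in the GitHub repository, it helps to verify them once at startup. A minimal sketch, assuming the app reads its keys from environment variables (Streamlit's secrets management exposes root-level secrets as environment variables as well as through `st.secrets`); the names in `REQUIRED_VARS` are assumptions, not necessarily the ones the project uses.

```python
import os
from typing import Dict, Iterable

# Hypothetical variable names; match them to whatever the app actually reads
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "TAVILY_API_KEY",
)


def load_config(required: Iterable[str] = REQUIRED_VARS) -> Dict[str, str]:
    """Return the required settings, failing fast if any are missing or empty."""
    names = list(required)
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Failing at startup produces one clear error in the deployment logs instead of a confusing failure on a user's first query.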
User Experience Optimization:
- Responsive UI for mobile/desktop compatibility
- Real-time conversational interface
- Query classification result visualization
- Detailed recommendation reasoning
📈 Project Results and Learning
Technical Achievements
1. First-time Technology Experiences:
- Chatbot service implementation using OpenAI API
- RAG system design and implementation
- Translating Agentic AI philosophy into actual code
- Complete web service deployment process
2. Data Quality Optimization:
- Built PDF extraction → structuring → vectorization pipeline
- Achieved 95.8% data completeness
- Clear separation of research topics and technical methods
3. Hybrid Search Effectiveness:
- Fast primary filtering with vector search
- Complementary information collection via web search
- Context-adaptive strategy selection
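The post does not say how the 95.8% completeness figure was computed. One common definition, assumed here for illustration, is the share of (record, field) slots that hold a non-empty value. The `completeness` function and the sample field names are hypothetical.

```python
from typing import Dict, List, Sequence


def completeness(records: List[Dict[str, object]], fields: Sequence[str]) -> float:
    """Fraction of (record, field) slots that contain a non-empty value."""
    if not records or not fields:
        return 0.0
    filled = sum(
        1
        for record in records
        for name in fields
        if record.get(name) not in (None, "", [], {})
    )
    return filled / (len(records) * len(fields))


# Two records, three fields, one empty slot: 5 of 6 slots filled
sample = [
    {"name": "A", "lab": "A Lab", "contact": ""},
    {"name": "B", "lab": "B Lab", "contact": "b@example.com"},
]
print(completeness(sample, ["name", "lab", "contact"]))
```

Under this definition, 95.8% would mean roughly 4 empty slots per 100; reporting the definition alongside the number makes it comparable across datasets.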
User Experience Improvements
Natural Conversation Flow:
User: "I'm interested in cancer research"
AI: "For cancer research at SNU Medical School, I recommend Prof. Park Jaehyun's Immune Cancer Therapy Lab and Prof. Kang Kunwook's Molecular Imaging and Therapy Lab..."
User: "What about nanotechnology specifically?"
AI: "Among the professors mentioned, Prof. Kang Kunwook specifically conducts cancer treatment research using nanotechnology. His main research focus is targeted imaging and therapy using nanoparticles..."
Personalized Recommendations:
- Semantic understanding beyond simple keyword matching
- User intent recognition and customized responses
- Specific contact information and lab details
🔮 Expansion Possibilities and Future Vision
Short-term Improvements
- Data Expansion: Add data from other universities and departments
- Advanced Filtering: Research funding scale, graduate career paths, advisor styles
- Personalization: User profile-based long-term recommendations
Long-term Vision
- Nationwide Integrated Platform: Integrate all university lab information
- Real-time Updates: Automatic updates of papers, grants, hiring information
- Community Features: Senior-junior matching, lab reviews
💡 Key Learnings and Insights
1. Practical RAG Application
Implementing RAG beyond theory made me realize the critical importance of search quality. Simple vector similarity alone is insufficient; domain-specific preprocessing and postprocessing are essential.
2. Potential of Agentic AI
I experienced the power of systems that adapt to context rather than relying on a single strategy. Even with a simple switching rule, letting the system judge its own output and change course goes a step beyond a fixed if-else pipeline.
3. Importance of Deployment
No matter how good the technology is, it’s meaningless if actual users can’t access it. Simple deployment through Streamlit Cloud transformed our project into a ‘finished product’.
4. Reality of Data
Perfect data doesn’t exist, and valuable services can be created even with limited data. What matters isn’t the quantity of data but its quality and how it’s utilized.
🎉 Conclusion
Implementing RAG and Agentic AI and shipping a complete end-to-end service during this hackathon was an invaluable experience.
In particular:
- Theory to Practice: Implementing technologies I’d only read about into actual code
- Problem-solving Experience: Addressing real-world issues like data scarcity, API limitations, deployment challenges
- User Perspective: Considering actual user needs beyond technical implementation
I look forward to continuing practical yet technically challenging projects like this. I will keep exploring ways AI technology can contribute to solving real societal problems.
Project Links:
- 📁 GitHub: https://github.com/elecsonJ/team07-lab-recommender
- 🏷️ Tech Stack: RAG, Agentic AI, LangChain, Streamlit, GPT-4o-mini, Vector Embeddings
Questions or feedback are always welcome! Please leave a comment. 😊