Bachelor’s or Master’s degree in Computer Science, Information Systems, or a related field.
7+ years of experience in data engineering, data warehousing, or big data environments, with sound knowledge of feature engineering and feature derivation algorithms.
Proficiency in SQL, Spark (PySpark preferred), and modern scheduling tools, with strong Python programming skills and practical experience in data processing and AI model support in big data environments.
Experience working with on-premises data lakes and data warehouses.
Solid understanding of data modeling, pipeline orchestration, and performance optimization.
Proven experience supporting analytics and machine learning workflows (data prep, feature stores, deployment).
5+ years’ experience using modern data storage built on FOSS tools, such as Delta Lake tables, Chroma, and Neo4j.
5+ years’ experience developing customized feature encoding algorithms (e.g., dimensionality reduction, word embeddings) and using orchestration tools (e.g., Dagster, Prefect) to deliver automated, efficient data pipelines.
Experience with prompt engineering, LLM API usage, and the fundamental principles of mainstream Large Language Models (LLMs); familiar with the end-to-end process of model fine-tuning, quantization, and deployment.
Familiar with the full RAG (Retrieval-Augmented Generation) pipeline, including text chunking, embedding model tuning, and vector database retrieval optimization.
Experience in end-to-end AI agent development, with familiarity with mainstream agent frameworks and platforms (e.g., LangChain, Dify) and local model runtimes such as Ollama.
Proficient in Linux operating systems, with experience in system configuration and troubleshooting.
Strong communication skills, with the ability to work directly with business stakeholders and present solutions.
Fluent in Chinese and English.
Self-motivated and self-driven.