Big Data Engineer

Vietnam, Hanoi / Full-time / Hybrid

About the position

We are seeking a Senior Big Data Engineer with a strong background in managing pipelines for structured and unstructured data, who thrives in a fast-paced, AI-focused environment. You will be instrumental in building and scaling our data lake architecture, which supports a system designed to fuel intelligent AI agents for data collection, labeling, and analytical reasoning. The work includes integrating vector databases and optimizing for retrieval-augmented generation (RAG) workflows deployed on AWS Bedrock and other AI stacks.

Responsibilities

  • Design and implement scalable ingestion pipelines for structured/unstructured data using AWS and Databricks Unity Catalog.
  • Build and maintain high-throughput ETL/ELT pipelines with Apache Airflow and Databricks.
  • Architect and manage data modeling, storage, and indexing strategies in PostgreSQL and RDS, ensuring compatibility with AI retrieval systems.
  • Integrate and manage vector databases to support fast semantic and embedding-based search in RAG pipelines.
  • Collaborate with AI engineers to ensure seamless compatibility with LangGraph and LangSmith agent systems.
  • Implement robust data validation, lineage, and governance systems using Unity Catalog.
  • Optimize performance across distributed compute environments (Databricks, EC2).
  • Deploy and maintain Lambda-based microservices for scalable, real-time data ingestion and enrichment.

Requirements

  • 5+ years working with big data systems in production environments.
  • Proven expertise with Databricks, Unity Catalog, and Apache Spark.
  • Proficiency in Airflow, the AWS stack (Lambda, EC2, RDS), and cloud-based data lake architectures.
  • Strong SQL and database design skills (PostgreSQL preferred).
  • Working knowledge of vector databases and similarity-search libraries (e.g., Chroma, Pinecone, FAISS).
  • Solid understanding of data lifecycle management in ML/AI contexts.
  • Bonus: Familiarity with LangGraph, LangSmith, LangChain, or similar agent orchestration tools.

Preferred Qualifications

  • Experience with AI agent pipelines or large-scale ML model support.
  • Strong focus on data observability, security, and lineage tracking.
  • Hands-on experience with RAG architecture, including vector storage and semantic retrieval.
  • Exposure to AWS Bedrock and model deployment orchestration.

About ActiveFence

ActiveFence is the leading provider of security and safety solutions for online experiences, safeguarding more than 3 billion users, top foundation models, and the world’s largest enterprises and tech platforms every day.

As a trusted ally to major technology firms and Fortune 500 brands that build user-generated and GenAI products, ActiveFence empowers security, AI, and policy teams with low-latency Real-Time Guardrails and a continuous Red Teaming program that pressure-tests systems with adversarial prompts and emerging threat techniques. Powered by deep threat intelligence, unmatched harmful-content detection, and coverage of 117+ languages, ActiveFence enables organizations to deliver engaging and trustworthy experiences at global scale while operating safely and responsibly across all threat landscapes.