Anna Dai

Data Scientist | Machine Learning & Data Systems

trivago

Biography

Anna Dai is a Data Scientist at trivago working on large-scale machine learning and data systems for travel search and pricing analytics. Her work focuses on scalable data pipelines, applied machine learning, and improving production systems that support decision-making across millions of travel searches.

Previously, she conducted research in natural language processing at the EPFL NLP Lab under Professor Antoine Bosselut, where her work explored responsible and secure machine learning systems. She holds an MSc in Data Science from Duke University. Anna is particularly interested in applied machine learning research that bridges academic ideas and real-world systems, especially in areas such as large-scale modeling, efficient data processing, and responsible AI.

Interests
  • Large-scale Data Systems
  • Natural Language Processing
  • Applied Machine Learning
  • Responsible AI
Education
  • MSc in Data Science, 2023

    Duke University, USA

  • BA in Business Economics, 2016

    University of California, Los Angeles (UCLA), USA

Experience

Data Scientist
trivago
January 2024 – Present Düsseldorf, Germany
Built large-scale machine learning and data systems for travel pricing and search analytics, including scalable quantile estimation and production ML pipelines

Research Assistant
EPFL - NLP Lab
January 2023 – January 2024 Lausanne, Switzerland
Developed LLM-based workflows for skill extraction and matching across job postings, course catalogs, and resumes

Graduate Research and Teaching Assistant
Duke University
January 2022 – January 2023 Durham, United States

Research at CEI Lab: Implemented baseline models from literature and evaluated defenses against adversarial model extraction in vertical federated learning systems

Teaching at MIDS and Fuqua: Held weekly office hours for Data Engineering Systems, Data Analysis at Scale in the Cloud, and Fraud Analytics courses

Data Science Intern
Stitch Fix
January 2022 – January 2022 San Francisco, United States
Explored incorporating external fashion trends (an NLP task) into historical-data-based forecasting models

Senior Tax Consultant, Tax Staff
EY
January 2017 – January 2021 San Francisco, United States
Led data-driven R&D tax credit analyses using large client datasets across financial and technology sectors

Teaching Assistant
UCLA
January 2014 – January 2016 Los Angeles, United States
Assisted teaching in taxation and business law courses, leading office hours and grading exams and assignments

Publications

ModelGuard: Information-Theoretic Defense Against Model Extraction Attacks
Proposed a novel defense against adaptive model extraction attacks that perturbs predictions using information theory.

Projects

Drug Diversion Detection (Capstone Keynote)
For my master’s capstone project, I worked with my team of four to develop a machine learning model to detect drug diversion by anesthesiologists at Duke University Hospitals.
Predict the Look @ SFIX
As a member of the Algorithms team at Stitch Fix, I worked on a project to predict fashion trends based on external data sources to better inform our buyers and designers.

Contact

Please reach out to me if you have any questions or would like to collaborate!