jkisolo.com

Uber's Michelangelo: A Comprehensive ML Platform Overview

Introduction to Michelangelo

Michelangelo is Uber's machine learning (ML) platform for training and deploying large numbers of models in production. It is designed to cover the complete ML workflow and supports a range of methods, including classical ML, time series forecasting, and deep learning. Its applications range from generating marketplace forecasts and handling customer support inquiries to estimating arrival times (ETAs) and powering Uber's One-Click Chat feature, which uses natural language processing (NLP) models integrated within the driver application.

The Motivation Behind Michelangelo

Around 2015, Uber's ML engineers recognized hidden technical debt in their machine learning systems, akin to the challenges described in their technical debt series. Uber's engineers could build custom, one-off systems that interfaced with ML models, but these solutions added to the technical debt and were not sustainable in a large engineering organization. They also found no existing systems that could reliably, uniformly, and reproducibly create and manage training and prediction data at scale.

This realization prompted the development of Michelangelo, which builds on Uber's data lake of transactional and logged data. The platform supports both offline (batch) and online (real-time) predictions. For offline use, containerized Spark jobs generate batch predictions. For online use, models are deployed to a prediction service cluster, typically made up of hundreds of machines, which handles client requests for individual or batched predictions.

Understanding the Functionality

For model management, the platform tracks metadata for every experiment: runtime statistics, model configurations, lineage, feature distributions, evaluation metrics, learned parameters, and summary statistics.
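To make the kinds of metadata listed above concrete, here is a minimal sketch of a per-model record and a lineage lookup. The class name `ModelRecord`, its field names, and the `lineage` helper are hypothetical and chosen for illustration only; they do not describe Michelangelo's actual schema or storage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelRecord:
    """Hypothetical metadata record for one trained model (not Uber's schema)."""
    model_id: str
    parent_id: Optional[str] = None  # lineage: the model this one was derived from
    config: Dict[str, Any] = field(default_factory=dict)
    runtime_stats: Dict[str, float] = field(default_factory=dict)
    feature_distributions: Dict[str, Dict[str, float]] = field(default_factory=dict)
    evaluation_metrics: Dict[str, float] = field(default_factory=dict)
    learned_parameters: Dict[str, Any] = field(default_factory=dict)


def lineage(store: Dict[str, ModelRecord], model_id: str) -> List[str]:
    """Walk parent links from a model back to its earliest recorded ancestor."""
    chain: List[str] = []
    current: Optional[str] = model_id
    while current is not None:
        if current in chain:
            raise ValueError("cycle detected in model lineage")
        chain.append(current)
        current = store[current].parent_id
    return chain
```

Keeping lineage as a simple parent pointer is one possible design; it is enough to answer "which model was this retrained from?" without a separate graph store.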

Michelangelo can deploy multiple models within a single serving container, which allows smooth transitions from older to newer model versions and side-by-side A/B testing of different models.
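The idea of hosting several models in one container and splitting traffic between them can be sketched as follows. This is an illustration under stated assumptions, not Michelangelo's implementation: the `ServingContainer` class, its method names, and the hash-based bucketing are all hypothetical choices made for the example.

```python
import hashlib
from typing import Any, Callable, Dict, List, Optional, Tuple


class ServingContainer:
    """Hypothetical container that hosts several models and routes requests."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[Dict[str, Any]], Any]] = {}
        self._split: List[Tuple[str, float]] = []  # (model_id, cumulative share)

    def deploy(self, model_id: str, predict_fn: Callable[[Dict[str, Any]], Any]) -> None:
        """Load a model alongside any models already being served."""
        self._models[model_id] = predict_fn

    def set_traffic_split(self, weights: Dict[str, float]) -> None:
        """Assign each deployed model a share of traffic, e.g. 90% old, 10% new."""
        unknown = set(weights) - set(self._models)
        if unknown:
            raise KeyError(f"models not deployed: {sorted(unknown)}")
        total = sum(weights.values())
        cumulative = 0.0
        self._split = []
        for model_id, weight in sorted(weights.items()):
            cumulative += weight / total
            self._split.append((model_id, cumulative))

    def route(self, entity_id: str) -> str:
        """Deterministically map an entity to a model so it always sees the same one."""
        digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        for model_id, upper in self._split:
            if bucket < upper:
                return model_id
        return self._split[-1][0]  # guard against floating-point rounding

    def predict(self, entity_id: str, features: Dict[str, Any],
                model_id: Optional[str] = None) -> Tuple[str, Any]:
        """Score with an explicitly requested model, or with the routed one."""
        chosen = model_id if model_id is not None else self.route(entity_id)
        return chosen, self._models[chosen](features)
```

Hashing the entity ID rather than choosing at random keeps each entity on the same model for the duration of an experiment, and shifting the weights over time gives a gradual rollout from an older version to a newer one.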

Updates and Enhancements

The current version of the platform has adopted Spark's ML pipeline serialization, supplemented with a streamlined interface for online serving that supports a lightweight single-example scoring method. This is particularly useful for use cases with tight Service Level Agreements (SLAs), such as fraud detection and prevention, because it avoids the overhead typically associated with Spark SQL's Catalyst optimizer. While Spark's ML pipelines have certain limitations, these can be addressed using Kafka Streaming.
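The single-example scoring idea can be illustrated without Spark at all. The sketch below is plain Python, and the names `Stage`, `OnlinePipeline`, `score_instance`, and `score_instances` are hypothetical; it only shows the shape of an interface in which each pipeline stage maps one feature dictionary to another, so that an online request never has to be wrapped in a DataFrame and planned as a query.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Stage:
    """Hypothetical pipeline stage that adds output columns to a single row."""

    def __init__(self, fn: Callable[[Row], Row]) -> None:
        self._fn = fn

    def score_instance(self, row: Row) -> Row:
        out = dict(row)            # copy so the caller's input is not mutated
        out.update(self._fn(row))  # append this stage's output columns
        return out


class OnlinePipeline:
    """Chains stages and scores one example at a time, or a small batch."""

    def __init__(self, stages: List[Stage]) -> None:
        self._stages = stages

    def score_instance(self, row: Row) -> Row:
        for stage in self._stages:
            row = stage.score_instance(row)
        return row

    def score_instances(self, rows: List[Row]) -> List[Row]:
        # The batch path reuses the single-example path.
        return [self.score_instance(r) for r in rows]


# Illustrative stages: a feature scaler followed by a toy linear model.
scale = Stage(lambda r: {"amount_scaled": r["amount"] / 100.0})
model = Stage(lambda r: {"score": 0.5 * r["amount_scaled"] + 0.25})
pipeline = OnlinePipeline([scale, model])
```

Calling `pipeline.score_instance({"amount": 200.0})` runs both stages on one dictionary and returns it with `amount_scaled` and `score` added. The same stage definitions could in principle also back a batch transform, which is the appeal of keeping one pipeline for offline and online use.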

The shift to native Spark serialization and deserialization at the level of individual pipeline stages makes model persistence more flexible and more compatible across environments.

Conclusion: The Future of Michelangelo

With these updates to the Michelangelo platform, Uber's machine learning stack can accommodate a broader array of use cases. These include experimenting with and training models in Uber's Data Science Workbench, a distributed Jupyter notebook environment that can be used within Michelangelo, as well as end-to-end deep learning workflows using TFTransformers.

This video explores the story behind Michelangelo, detailing its development and impact on Uber's machine learning initiatives.

In this video, Min Cai discusses the past, present, and future of the Michelangelo ML platform, highlighting its evolution and future potential.
