Machine learning has become an integral part of how large businesses operate, and many companies are now building their own machine learning platforms. These platforms are largely based on open-source technologies; however, a few functions require custom solutions, so companies invest in building internal components for their machine learning platforms. In this article, we take a look at a few of them.


Uber

Launched in 2017, Uber's Michelangelo had already been in the works for two years. The goal behind creating a proprietary ML-as-a-service platform was to make scaling AI as easy as requesting a ride. In the first quarter of 2020, the ride-hailing company completed 1,658 million trips, which meant it was sitting on a treasure trove of rich data.

Initially, Uber relied mostly on separate predictive models or smaller systems for individual problems. However, these were short-term solutions that couldn't keep up with the speed at which Uber's AI systems were growing. As a result, Michelangelo was born. It is now deployed across multiple Uber data centers, serving predictions for many of the company's highest-loaded online services.


Credit: Uber

The platform is built on several open-source systems, including HDFS, Spark, Cassandra, MLlib, XGBoost, Samza, and TensorFlow. Alongside these, Uber has also developed some of Michelangelo's components in-house.

Horovod: This is an open-source distributed training framework for TensorFlow, PyTorch and MXNet. Its job is to make distributed deep learning fast and easy. It uses ring-allreduce and requires minimal changes to user code: with Horovod, a training script can be extended to run on hundreds of GPUs with just a few lines of Python. Horovod can be installed on-premises and on cloud platforms. It can also run on Apache Spark, allowing data processing and model training to be unified into a single pipeline. Once configured, the same infrastructure can be used to train models in any framework, switching between TensorFlow, PyTorch and MXNet as needed.
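The ring-allreduce at the heart of Horovod has each of the N workers exchange gradient chunks only with its ring neighbor: N-1 scatter-reduce steps followed by N-1 allgather steps, so bandwidth use stays roughly constant as workers are added. A toy single-process simulation of the algorithm (illustrative only; Horovod runs this across processes with MPI/NCCL):

```python
def ring_allreduce(worker_grads):
    """Single-process toy simulation of ring-allreduce (not
    Horovod's implementation). Mutates worker_grads so every
    worker ends up holding the element-wise sum of all workers'
    gradient vectors."""
    n = len(worker_grads)
    length = len(worker_grads[0])
    # Chunk i covers indices [bounds[i], bounds[i + 1]).
    bounds = [length * i // n for i in range(n + 1)]

    # Phase 1: scatter-reduce. After n - 1 ring steps, worker w
    # owns the fully reduced chunk (w + 1) % n.
    for step in range(n - 1):
        for w in range(n):
            c = (w - step) % n            # chunk worker w passes on
            dst = (w + 1) % n             # right-hand ring neighbor
            for i in range(bounds[c], bounds[c + 1]):
                worker_grads[dst][i] += worker_grads[w][i]

    # Phase 2: allgather. Circulate each completed chunk around
    # the ring so every worker receives every reduced chunk.
    for step in range(n - 1):
        for w in range(n):
            c = (w + 1 - step) % n        # completed chunk to forward
            dst = (w + 1) % n
            for i in range(bounds[c], bounds[c + 1]):
                worker_grads[dst][i] = worker_grads[w][i]
    return worker_grads
```

Each gradient element crosses the network roughly twice in total, which is why the approach scales so well compared with a central parameter server.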

Ludwig: This is also an open-source deep learning toolkit from Uber, built on TensorFlow. It allows users to train and test deep learning models without writing any code. It is an AutoML platform that provides a set of model architectures that can be combined to create an end-to-end model for a given use case. It supports tasks like text classification, sentiment analysis, image classification, machine translation and image captioning, among others.
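Ludwig models are driven by a declarative configuration rather than code. A minimal sketch of such a config for text classification might look like the following; the structure follows Ludwig's documented input_features/output_features schema, but the column names (review, sentiment) are hypothetical:

```yaml
# Hypothetical Ludwig config: a no-code text classifier.
# Column names are made up for illustration.
input_features:
  - name: review
    type: text
output_features:
  - name: sentiment
    type: category
```

Training would then be a single CLI call along the lines of `ludwig train --config config.yaml --dataset reviews.csv`.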


Netflix

Netflix started out as a DVD rental service in 1997 and has since grown into an over-the-top (OTT) streaming giant with over 209 million subscribers. At the height of the pandemic last year, Netflix US added up to 500 new shows. One of the main catalysts for its growth has been its recommendation system, considered one of the best in the industry. The personalized recommendation algorithm plays a major role in customer retention and has helped Netflix pocket savings of up to $1 billion annually. Moreover, over 80% of the shows people watch on Netflix are discovered through its recommendation system.

Some of the internal components developed by the machine learning teams at Netflix are:

Metaflow: Developed over a period of four years, Metaflow is a full-stack framework for data science; Netflix open-sourced it in 2019. It allows the OTT company to define machine learning workflows, test them, scale them out in the cloud, and ultimately deploy them to production. It is a user-friendly Python/R library that helps scientists and engineers create and manage real-world data science projects. Metaflow gives data scientists the ability to choose the right modeling approach, manage data, and easily build workflows, while ensuring that the resulting project runs robustly on production infrastructure.

Metaflow and AWS integration

It was originally developed by Netflix to increase the productivity of data scientists working on a variety of projects – from classic statistics to cutting-edge deep learning. It has been adopted by several companies outside of Netflix to power their machine learning in production.
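The core idea is that a workflow is an ordinary Python class: steps are methods, each step names its successor, and instance attributes act as artifacts carried between steps. The toy runner below illustrates that step-chaining pattern; it is a deliberately simplified sketch, not Metaflow's actual FlowSpec API:

```python
class ToyFlowRunner:
    """Minimal sketch of a Metaflow-style flow: steps are methods,
    each step declares its successor via next(), and instance
    attributes act as artifacts shared between steps.
    (Illustrative only; not Metaflow's real API.)"""

    def next(self, step_method):
        self._next = step_method

    def run(self):
        self._next = self.start          # every flow begins at start()
        trace = []
        while self._next is not None:
            step = self._next
            self._next = None            # terminal unless step sets a successor
            trace.append(step.__name__)
            step()
        return trace


class TrainFlow(ToyFlowRunner):
    def start(self):
        self.data = [1, 2, 3, 4]         # artifact available to later steps
        self.next(self.train)

    def train(self):
        self.model_mean = sum(self.data) / len(self.data)
        self.next(self.end)

    def end(self):
        pass                             # no successor: the flow stops here
```

In real Metaflow, the same structure also buys you artifact versioning, retries, and cloud scale-out without changing the flow code.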

Polynote: Netflix has a polyglot notebook with first-class Scala support called Polynote. It offers Apache Spark integration and multi-language interoperability across Scala, Python, SQL and others. Polynote gives data scientists and machine learning engineers a notebook environment that seamlessly integrates Netflix's JVM-based ML platform with Python's popular machine learning and visualization libraries.


Airbnb

Until 2016, Airbnb struggled with ML models in production, which not only took a long time to develop but were also inconsistent. In addition, there were significant discrepancies between offline and online data. In view of these challenges, Airbnb developed its own machine learning platform called Bighead. Built on Python and Spark, Bighead ties together multiple open-source and internal projects to avoid the incidental complexity of ML workflows. The production cycle, the training environment, and the data collection and transformation processes are standardized, and each model is reproducible and iterable. Some of the components developed in-house at Airbnb include:

Credit: Airbnb

Zipline: This is Airbnb's data management platform designed specifically for machine learning use cases. It helps define features, populate training sets, and enable feature sharing. It effectively solves the problem of inconsistency between offline and online datasets, and has let Airbnb deploy better controls and monitoring.
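The way a feature platform eliminates offline/online skew is by defining each feature exactly once and routing both the batch training path and the online serving path through that single definition. A toy sketch of that idea (illustrative only; not Zipline's API, and the feature names are made up):

```python
# Each feature is a named function over a user's raw event log.
# Defining it once means training and serving can never disagree.
FEATURE_DEFS = {
    "booking_count": lambda events: sum(1 for e in events if e["type"] == "booking"),
    "view_count": lambda events: sum(1 for e in events if e["type"] == "view"),
}


def compute_features(events):
    """Single code path shared by batch (training) and online (serving)."""
    return {name: fn(events) for name, fn in FEATURE_DEFS.items()}


def training_rows(logs_by_user):
    """Offline: build one training row per user from historical logs."""
    return {user: compute_features(events) for user, events in logs_by_user.items()}


def online_features(user_events):
    """Online: compute features for one user at request time,
    using exactly the same definitions as the training path."""
    return compute_features(user_events)
```

Because both paths call compute_features, a feature changed for training is automatically changed for serving too, which is the consistency guarantee the article describes.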


Redspot: This is a hosted, containerized, multi-tenant Jupyter notebook service in which each user's environment runs in its own Docker container. It lets users customize their notebook environment without affecting other users.

Deep Thought: This is a shared REST API service for online inference that supports all frameworks integrated into the machine learning platform. It provides standardized logging, alerting and dashboards for monitoring and analyzing model performance.


Spotify

Launched in 2008, Spotify quickly established itself as home to one of the world's largest music catalogs. Moving beyond rudimentary recommendation features, Spotify has evolved over the years to offer features like uniquely generated playlists and Discover Weekly. All of these are made possible by standardizing best practices and building tools that bridge the gaps between data, machine learning and backend engineering through a machine learning platform.

Credit: Spotify

Scio: This is a Scala API for the Apache Beam Java SDK, built by Spotify. A company blog says it is heavily inspired by Scalding and Spark. It offers a good balance between productivity and performance, access to the larger ecosystem of Java infrastructure, and functional, type-safe code.

Zoltar: This is a common library for serving TensorFlow and XGBoost models in production. It helps load predictive machine learning models into a JVM and offers several key abstractions. Zoltar can be used to load a serialized model, feed it input data, and serve model predictions.

Apollo: This is a set of Java libraries used for writing microservices. It includes features such as an HTTP server and a URI routing system that make it easy to implement RESTful services. It has three main parts: apollo-api, apollo-core and apollo-http-service.
