Constructing Effective Machine Learning Operations for Businesses
Chapter 1: Understanding MLOps
In my professional journey, I've come to realize that the cornerstone of successful AI implementation lies in deploying machine learning models into production effectively; this is what unlocks their commercial value at scale. Achieving it, however, is a complex task that requires integrating diverse technologies and teams, and often a cultural transformation within the organization. The discipline that brings all of this together is commonly referred to as MLOps.
It's essential to note that there is no universal MLOps strategy. In this discussion, I present a versatile blueprint for MLOps that can serve as a foundation or a method to refine your existing processes. While the MLOps journey can be intricate, I recommend viewing it as a critical first step in incorporating AI into your business rather than a secondary aspect.
MLOps: Beyond Just Technology
Before we delve into the details, I want to share some (non-technical) insights based on my observations of various MLOps implementations. MLOps encompasses more than mere technology; it fundamentally relies on three pivotal elements: Investment, Culture, and Technology. Organizations that acknowledge all three components from the beginning tend to achieve greater success with their strategies. A frequent error I observe is businesses investing heavily in solutions without considering the necessary cultural adjustments. This lack of foresight can severely jeopardize your strategy, wasting resources and eroding confidence among executives or investors.
Culture
Introducing a new cultural framework within any organization is no small task and requires the full backing of its workforce. A common mistake I've witnessed is when companies hastily swap out old tools for new, more appealing ones without contemplating the cultural impact. This tactic often breeds dissatisfaction and leads to these tools being underutilized or misapplied.
Conversely, organizations that successfully manage cultural shifts actively involve end-users in the development of the MLOps strategy and assign them roles that promote a sense of ownership. They also provide crucial support and training to enhance user skills, thus encouraging participation in these initiatives.
A technically superior solution may still falter without deliberate cultural change, because it is ultimately people who must adopt and use the technology.
Technology
For brevity, I define technology as the combination of technical infrastructure and data management services. An effective MLOps strategy is underpinned by a mature data ecosystem. Utilizing data management tools, data scientists should be empowered to securely access data for model development while adhering to regulatory requirements.
From a technical infrastructure standpoint, we should enable data scientists and machine learning engineers to access the necessary hardware and software essential for creating and delivering AI products. For many organizations, harnessing cloud infrastructure is a vital enabler for this.
Investment
There are no shortcuts to successful MLOps, especially regarding investment. An effective MLOps strategy should prioritize funding for both personnel and technology. A common issue I encounter with clients is the inclination to build an MLOps strategy around a single data scientist due to budget limitations. In such scenarios, I often recommend reevaluating or at least moderating expectations.
At the outset, it’s crucial to determine the extent and duration of your investment in innovation. Indeed, ongoing investment is essential if you want AI to become integral to your operations and realize its associated benefits.
For insights on developing AI strategies, consider reading my article on crafting AI strategies using Wardley Maps.
A High-level Blueprint for MLOps
Now that we've established the foundation, let’s explore some technical components of MLOps.
The first video, "A Primer on Machine Learning Operations (MLOps)," provides an introduction to the concept and its importance in business strategy.
Model Development Laboratory
The model development process is inherently unpredictable and iterative. Organizations that do not acknowledge this reality will struggle to create effective AI strategies. In fact, model development is often the most chaotic aspect of the workflow, characterized by experimentation, repetition, and frequent setbacks. These elements are crucial for exploring new solutions and fostering innovation. So what do data scientists require? The freedom to experiment, innovate, and collaborate.
There is a prevailing notion that data scientists should follow best practices in software engineering while writing code. While I agree with this principle, it’s essential to recognize that there is an appropriate time and place for everything. I contend that model development labs may not be the best environment for enforcing this. Instead of attempting to suppress this chaos, we should embrace it as an integral part of the workflow and utilize tools that help manage it. An effective model development lab should facilitate this. Let’s explore some potential components.
Experimentation & Prototyping — JupyterLab
JupyterLab serves as a versatile integrated development environment (IDE) for crafting preliminary models and proof-of-concept projects. It offers notebooks, scripts, and a command line interface, all features familiar to data scientists.
As an open-source tool, JupyterLab integrates seamlessly with Python and R, covering most contemporary data science model development workloads within a single lab IDE.
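To make the experimentation stage more concrete, below is the kind of quick, throwaway prototype a data scientist might iterate on in a JupyterLab notebook. It is a minimal sketch: the dataset, estimator, and metric are illustrative choices, not part of any prescribed workflow.

```python
# A disposable baseline of the sort you would iterate on in a notebook cell.
# Dataset, estimator, and metric are illustrative; substitute your own problem.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Quick sanity check that there is signal worth pursuing, with no tuning yet.
model = RandomForestClassifier(n_estimators=200, random_state=42)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Baseline ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The point at this stage is speed of iteration; polishing comes later, once an idea has proven its worth.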
Environment Management — Anaconda
Effective environment management can enhance subsequent MLOps workflow stages by ensuring safe access to open-source libraries and making the development environment reproducible. Anaconda, through its conda package manager, enables data scientists to create virtual environments and install the libraries and packages needed for model development via a straightforward command-line interface (CLI).
Anaconda also provides repository mirroring, which evaluates open-source packages for secure commercial use, although the associated risks of third-party management should be noted. Utilizing virtual environments is crucial for managing the experimental phase, offering a contained setting for all packages and dependencies related to a particular experiment.
Version Control & Collaboration — GitHub Desktop
Collaboration is vital for a successful model development lab, and GitHub Desktop is an effective tool to support this. Data scientists can create a repository for each lab through GitHub Desktop. Each repository stores the model development notebook or script, along with an environment.yml file that tells Anaconda how to recreate, on another machine, the environment in which the notebook was developed.
Combining JupyterLab, Anaconda, and GitHub provides data scientists with a secure space to experiment, innovate, and collaborate.
Model Pipeline Development
In conversations with clients who are in the early phases of their MLOps maturity, there seems to be a prevailing idea that data scientists develop models and then "hand them off" to machine learning engineers for "production." This approach is ineffective and often results in losing valuable machine learning engineers, as no one wants to deal with someone else’s messy code. It is unreasonable to expect engineers to clean up after others.
Instead, organizations should cultivate a culture where data scientists are responsible for developing models within data labs and subsequently formalizing them into end-to-end model pipelines. Here’s why:
- Data scientists have a deeper understanding of their models than anyone else. Assigning them the responsibility for creating the model pipeline enhances efficiency.
- This approach instills a culture of software engineering best practices at each stage of development.
- Machine learning engineers can then focus on value-added aspects of their role, such as resource provisioning, scaling, and automation, rather than refactoring someone else's work.
While building end-to-end pipelines may seem intimidating initially, there are tools designed to help data scientists achieve this.
Model Pipeline Build — Kedro
Kedro is an open-source Python framework, originally developed at QuantumBlack (McKinsey's AI and analytics firm), that helps data scientists construct model pipelines.
Kedro offers a standard template for building end-to-end model pipelines while adhering to software engineering best practices. The framework encourages data scientists to develop modular, reproducible, and maintainable code. A data scientist who completes the Kedro workflow ends up with a pipeline that can be deployed to a production environment with far less rework. The overarching concepts include the following (a minimal node-and-pipeline sketch follows the list):
- Project Template: Kedro provides a structured and user-friendly project template that enhances organization, collaboration, and efficiency.
- Data Catalog: The Data Catalog in Kedro serves as a registry for all data sources that the project can utilize, offering a straightforward method to define data storage.
- Pipelines: Kedro organizes data processing into a pipeline of interconnected tasks, enforcing a clear code structure and visualizing data flow and dependencies.
- Nodes: In Kedro, a Node represents a wrapper for a Python function that identifies the inputs and outputs of that function, acting as the building blocks of a Kedro pipeline.
- Configuration: Kedro manages various configurations for different environments (development, production, etc.) without hardcoding settings into the code.
- I/O: In Kedro, input/output operations are separated from actual computations, increasing code testability and modularity, and facilitating transitions between different data sources.
- Modularity and Reusability: Kedro promotes a modular coding style, resulting in reusable, maintainable, and testable code.
- Testing: Kedro integrates with PyTest, a Python testing framework, making it easy to write tests for your pipeline.
- Versioning: Kedro supports versioning for both data and code, allowing for the reproduction of any previous state of your pipeline.
- Logging: Kedro provides a standardized logging system to monitor events and changes.
- Hooks and Plugins: Kedro allows for hooks and plugins, which extend the framework's capabilities as per project requirements.
- Integration with Other Tools: Kedro can be integrated with various tools like Jupyter Notebook, Dask, Apache Spark, and more to facilitate different aspects of a data science workflow.
All Kedro projects adhere to this basic template, and enforcing this standard across your data science teams will ensure reproducibility and maintainability.
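To ground the Nodes and Pipelines concepts above, here is a minimal sketch of two nodes wired into a Kedro pipeline. The dataset names (companies, preprocessed_companies, model_input_table) and the employees column are hypothetical; in a real project they would be registered in the Data Catalog.

```python
# Minimal sketch of Kedro nodes and a pipeline (not a full project).
# Dataset names are hypothetical and would be registered in the Data Catalog.
import pandas as pd
from kedro.pipeline import node, pipeline


def preprocess_companies(companies: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows before feature engineering."""
    return companies.dropna()


def create_model_input(preprocessed: pd.DataFrame) -> pd.DataFrame:
    """Derive a simple feature from the cleaned table."""
    preprocessed = preprocessed.copy()
    preprocessed["is_large"] = preprocessed["employees"] > 250
    return preprocessed


# Each node declares a function plus its named inputs and outputs; Kedro
# resolves the dependency graph and runs the nodes in the correct order.
data_processing = pipeline(
    [
        node(preprocess_companies, inputs="companies", outputs="preprocessed_companies"),
        node(create_model_input, inputs="preprocessed_companies", outputs="model_input_table"),
    ]
)
```

Because each node only declares named inputs and outputs, Kedro can infer the execution order and visualize the dependency graph, which is what makes the pipeline easy to reason about and to hand over to production.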
For more in-depth insights into the Kedro framework, please refer to the Kedro Documentation.
Registry & Storage — Data Version Control (DVC)
Registry and storage play a crucial role in ensuring reproducibility in machine learning, a key consideration for any business looking to adopt ML. Machine learning models comprise code, data, model artifacts, and environments—all of which must be traceable for reproducibility.
DVC is a tool designed for version control and tracking of models and data. While GitHub could serve as an alternative, it has limitations in storing large objects, which can be problematic for extensive datasets or models. DVC extends Git's capabilities, providing the same version control while allowing for the storage of larger datasets and models in a DVC repository, which can be local or cloud-based.
In commercial contexts, there are clear security advantages to versioning code in a Git repository while storing model artifacts and data separately in a controlled environment. Remember, model reproducibility will grow increasingly important as regulations surrounding the commercial use of AI tighten. Reproducibility enhances auditability.
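To illustrate how a versioned artifact is retrieved, DVC exposes a Python API alongside its CLI. The sketch below assumes a hypothetical repository, file path, and revision tag; the key idea is that a single Git revision pins both the code and the data it was trained on.

```python
# Retrieve a specific, versioned copy of a dataset tracked with DVC.
# The repo URL, file path, and revision tag are placeholders.
import pandas as pd
import dvc.api

with dvc.api.open(
    "data/training_set.csv",
    repo="https://github.com/your-org/churn-model",  # Git repo holding the DVC metadata
    rev="v1.2.0",                                    # Git tag pinning code and data together
) as f:
    training_set = pd.read_csv(f)

print(training_set.shape)
```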
Model Pipeline Deployment — Docker
Deployment is not merely a standalone task but rather a carefully orchestrated combination of tools, activities, and processes; Docker integrates all these elements for model deployment. Crucial for complex machine learning applications with numerous dependencies, Docker guarantees consistency across any machine by encapsulating the application along with its environment.
The process begins with a Dockerfile, which Docker then uses to create an image—a ready-to-use model pipeline suitable for any Docker-enabled machine. When combined with Kedro's pipeline functionality, Docker can effectively deploy both model retraining and inference pipelines, ensuring reproducibility throughout all stages of the machine learning workflow.
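For teams that script this step, Docker's Python SDK (docker-py) can build and run the image programmatically; the same could of course be done with the Docker CLI. The image tag and port mapping below are illustrative, and a Dockerfile is assumed to sit in the project root.

```python
# Build the pipeline image from the project's Dockerfile and run it locally.
# Image tag and port mapping are illustrative.
import docker

client = docker.from_env()

# Build an image from the Dockerfile in the current directory.
image, build_logs = client.images.build(path=".", tag="model-pipeline:latest")

# Run the container, exposing the inference API on port 8000.
container = client.containers.run(
    "model-pipeline:latest",
    ports={"8000/tcp": 8000},
    detach=True,
)
print(container.status)
```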
Model Monitoring & Retraining Pipeline — MLflow
Over time, machine learning models can experience performance declines due to concept drift or data drift. It’s essential to monitor when our models' performance starts to falter and to retrain them as needed. MLflow offers us this capability through its tracking API. This tracking API should be integrated into the model training and inference pipelines established by data scientists. Although I have highlighted MLflow for tracking within the model monitoring and retraining pipeline, tracking can also be conducted in the model development lab, particularly for experiment tracking.
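As a rough sketch of how the tracking API might be woven into a training pipeline, the example below logs parameters, a metric, and the fitted model to MLflow. The experiment name, stand-in dataset, and choice of metric are assumptions for illustration.

```python
# Track a training run so performance can be compared over time.
# Experiment name, dataset, and metric are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("churn-model-retraining")  # hypothetical experiment name

# Stand-in data so the sketch runs end to end; a real retraining pipeline
# would load the versioned training set instead.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    # Everything logged here is visible in the MLflow UI, so a drop in the
    # tracked metric can trigger investigation or retraining.
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("roc_auc", auc)
    mlflow.sklearn.log_model(model, "model")
```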
The Inference Endpoint
Since the inference pipeline is described by a Dockerfile, we can build a Docker image of the pipeline and expose it as an API endpoint for any application. Where the image is deployed will depend on the use case, but that discussion falls outside the scope of this article.
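One common pattern (not the only one, and not prescribed by this stack) is for the container to run a small FastAPI app that loads the trained model and exposes a prediction route. The model path, feature schema, and route below are placeholders.

```python
# A minimal inference service that the Docker image could run.
# Model path, feature schema, and route are placeholders.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("artifacts/model.joblib")  # baked into the image at build time


class Features(BaseModel):
    tenure_months: float
    monthly_spend: float


@app.post("/predict")
def predict(features: Features) -> dict:
    """Score a single record and return the churn probability."""
    proba = model.predict_proba([[features.tenure_months, features.monthly_spend]])[0, 1]
    return {"churn_probability": float(proba)}
```

The container's entrypoint would then launch this app with an ASGI server such as uvicorn.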
Roles & Responsibilities
Assigning clear roles and responsibilities within MLOps is critical for its success. The multifaceted nature of MLOps, which spans various disciplines, necessitates a clear delineation of roles. This ensures that every task is performed efficiently, fosters accountability, and facilitates quicker issue resolution. Ultimately, clear delegation minimizes confusion and overlap, creating a more efficient and harmonious working environment—much like a finely-tuned machine, where each cog plays its part flawlessly.
Data Scientists
Role: Data scientists play a central role in MLOps strategies, focusing on model development. This includes initial experiments, prototyping, and establishing modeling pipelines for validated models.
Responsibilities: Data scientists ensure models comply with best practices in machine learning and align with business objectives. Beyond lab activities, they collaborate with business stakeholders to identify impactful solutions. A lead data scientist should set the operational rhythm and best practices for the data labs.
Machine Learning Engineers
Role: Machine learning engineers manage the technical infrastructure of MLOps, seeking innovative solutions, developing strategies alongside data scientists, and enhancing process efficiencies.
Responsibilities: They ensure the functionality of the technical infrastructure, monitor component performance to control costs, and ensure production models meet demand at the necessary scale.
Data Governance Professionals
Role: Data governance professionals uphold security and data privacy policies, playing a crucial role in the secure transfer of data within the MLOps framework.
Responsibilities: While data governance is a collective responsibility, these professionals develop policies and conduct regular checks and audits to ensure compliance. They stay updated on regulations and ensure adherence from all data consumers.
Conclusion
Navigating the landscape of MLOps requires intentional planning, the right mix of technology and talent, and an organizational culture that embraces change and learning.
Although the journey may seem complex, employing a well-structured blueprint and treating MLOps as a comprehensive, iterative process rather than a one-off project can yield tremendous value from your AI strategies. However, keep in mind that no single approach is suitable for every scenario. It is vital to adapt your strategy to meet your specific needs and remain flexible in response to evolving circumstances.
The second video, "What Is Machine Learning Operations (MLOps)? Full Guide," offers a comprehensive overview of MLOps and its significance in modern business practices.