MLOps is a practice that automates and standardizes the entire machine learning lifecycle, from model development and training through deployment and monitoring, enabling models to run continuously in production environments.
## "Building a Model" and "Operating a Model" Are Different Jobs

Even if you can build a highly accurate model in a Jupyter Notebook, keeping it running stably in production requires an entirely different skill set. Updating training data, retraining models, version control, A/B testing, detecting performance degradation: managing all of this manually will break down regardless of team size.

MLOps applies the DevOps philosophy to machine learning, but it faces challenges that ordinary software deployment does not: three artifacts (code, data, and model weights) must be version-controlled simultaneously; model performance degrades over time as the input data distribution shifts (drift); and experiments must be reproducible.

## Components of an MLOps Pipeline

**Data Pipeline**: Automates the collection, preprocessing, and validation of training data. Since data quality directly determines model quality, this is the most critical layer.

**Experiment Tracking**: Tools like MLflow, Weights & Biases, and Comet record hyperparameters, learning curves, and evaluation metrics, making experiments reproducible.

**Model Registry**: Stores trained models with versioning and manages the promotion flow from staging to production.

**Serving**: Exposes models as APIs. Inference engines such as vLLM, TensorRT-LLM, and Triton Inference Server are commonly used.

**Monitoring**: Tracks not only inference latency and error rates, but also data drift (shifts in the input data distribution) and model drift (gradual accuracy degradation over time). A mechanism that automatically triggers retraining when a threshold is exceeded is also common.

## MLOps in the Age of LLMs

The rise of LLMs has produced a derivative discipline called "LLMOps." New operational challenges have emerged that traditional MLOps did not face, including prompt version control, evaluation of RAG pipelines, guardrail configuration, and inference-cost optimization. The toolchain has also expanded to include LLM-specific offerings such as LangSmith, Braintrust, and Arize AI.
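The drift monitoring described above is often implemented as a statistical comparison between the training distribution and live inputs. A common choice is the population stability index (PSI); the following is a minimal pure-Python sketch (function and variable names are illustrative, and the 0.1/0.25 thresholds are conventional rules of thumb, not universal constants):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    PSI < 0.1 is commonly read as 'no significant drift';
    PSI > 0.25 is often treated as a retraining trigger."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bucket_fractions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # small floor avoids log(0) for empty buckets
        return [max(c / len(xs), 1e-6) for c in counts]

    e = bucket_fractions(expected)
    a = bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [i / 100 for i in range(100)]              # training distribution
live_same = [i / 100 for i in range(100)]          # same distribution: no drift
live_shift = [0.5 + i / 200 for i in range(100)]   # inputs shifted upward
```

A monitoring job would compute `psi(train, live_window)` on a schedule and fire the retraining pipeline when the value crosses the chosen threshold.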


A2A (Agent-to-Agent Protocol), published by Google in April 2025, is a communication protocol that enables different AI agents to discover each other's capabilities, delegate tasks, and synchronize state.
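To illustrate the capability-discovery idea: in A2A, an agent advertises its capabilities in a machine-readable "card" that other agents can read before delegating work. The sketch below is a simplified, hypothetical Python mirror of that idea; in the actual protocol the card is a JSON document with a richer schema, and `AgentCard`/`find_agent` are names invented here:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AgentCard:
    """Simplified stand-in for an A2A agent card: who the agent is,
    where to reach it, and which skills it advertises."""
    name: str
    url: str
    skills: List[str] = field(default_factory=list)

def find_agent(cards: List[AgentCard], skill: str) -> Optional[AgentCard]:
    """Capability discovery: pick the first agent advertising `skill`."""
    return next((c for c in cards if skill in c.skills), None)

cards = [
    AgentCard("translator", "https://a.example", ["translate"]),
    AgentCard("summarizer", "https://b.example", ["summarize"]),
]
```

Task delegation would then be an HTTP call to the selected agent's `url`, with state synchronized over the protocol's message exchange.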

Acceptance testing is a testing method that verifies, from the perspective of the product owner and stakeholders, whether developed features meet business requirements and user stories.
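Acceptance tests are typically phrased against a user story rather than internal units, often in a given/when/then shape. A minimal sketch in plain Python, where the `Cart` class is a hypothetical system under test:

```python
class Cart:
    """Hypothetical system under test."""
    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)

def test_customer_sees_order_total():
    # Story: "As a customer, I can see the running total of my cart."
    # Given a cart with two items
    cart = Cart()
    cart.add("book", 12.0)
    cart.add("pen", 3.0)
    # When the customer views the total
    total = cart.total()
    # Then it reflects both items
    assert total == 15.0
```

The point is the framing: the test name and comments trace back to the story, so a product owner can read it as a checkable requirement.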

Agent Skills are reusable instruction sets defined to enable AI agents to perform specific tasks or areas of expertise, functioning as modular units that extend the capabilities of an agent.
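One way to picture a skill as a modular unit is a small record pairing a trigger description with the instructions an agent loads when the skill applies. This is a hypothetical sketch, not any particular framework's API; `Skill` and `select_skills` are names invented here, and the keyword-overlap routing is deliberately naive:

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Skill:
    """Illustrative shape of a reusable skill: a name, a short
    description used for routing, and the instructions injected
    into the agent's context when the skill is selected."""
    name: str
    description: str
    instructions: str

def select_skills(skills: List[Skill], task: str) -> List[Skill]:
    """Naive routing: attach skills whose description words appear in the task."""
    words = task.lower()
    return [s for s in skills
            if any(w in words for w in s.description.lower().split())]

skills = [
    Skill("pdf-reports", "pdf documents", "Extract the text layer before summarizing."),
    Skill("sql-access", "sql queries", "Use read-only credentials; never DROP."),
]
```

Real agent runtimes typically route with an LLM judgment over the descriptions rather than string matching, but the modular structure is the same.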

Local LLM / SLM Deployment Comparison: AI Utilization Without Cloud API Dependency

Agentic AI is a general term for AI systems that interpret goals and autonomously repeat the cycle of planning, executing, and verifying actions without requiring step-by-step human instruction.
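The plan/execute/verify cycle described above can be sketched as a generic loop. Here `plan`, `execute`, and `verify` are caller-supplied callables standing in for LLM calls and tool use; the function and the toy "doubling" instantiation below are illustrative, not any specific framework:

```python
def agentic_loop(goal, plan, execute, verify, max_steps=5):
    """Generic agentic cycle: plan an action toward `goal`, execute it,
    verify the result, and stop when verification passes or the step
    budget runs out (returns None on budget exhaustion)."""
    history = []
    for _ in range(max_steps):
        action = plan(goal, history)
        result = execute(action)
        history.append((action, result))
        if verify(goal, result):
            return result
    return None

# Toy instantiation: reach at least 10 by repeatedly doubling a counter.
state = {"x": 1}
plan = lambda goal, history: "double"          # trivial planner
def execute(action):
    state["x"] *= 2
    return state["x"]
verify = lambda goal, result: result >= goal   # goal check
# agentic_loop(10, plan, execute, verify) runs the cycle until x >= 10.
```

The step budget (`max_steps`) matters in practice: it is the simplest guardrail against an agent looping indefinitely on an unachievable goal.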