Course Outline

Introduction to AIOps with Open Source Tools

  • Overview of AIOps concepts and benefits
  • The role of Prometheus and Grafana in the observability stack
  • The place of machine learning in AIOps: predictive versus reactive analytics

Setting Up Prometheus and Grafana

  • Installing and configuring Prometheus for time series data collection
  • Creating dashboards in Grafana using real-time metrics
  • Exploring exporters, relabeling, and service discovery

Data Preprocessing for Machine Learning

  • Extracting and transforming Prometheus metrics
  • Preparing datasets for anomaly detection and forecasting
  • Using Grafana transformations or Python pipelines
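
As an illustration of the extraction step, the sketch below flattens a Prometheus range-query response (the JSON "matrix" result returned by the HTTP API) into plain rows that a Python pipeline could feed into a model. The function name `matrix_to_rows`, the sample metric, and the sample values are assumptions made up for this example; only the response layout comes from Prometheus.

```python
import json
from datetime import datetime, timezone


def matrix_to_rows(payload):
    """Flatten a Prometheus range-query response into (series, time, value) rows."""
    body = json.loads(payload) if isinstance(payload, str) else payload
    if body.get("status") != "success":
        raise ValueError("query did not succeed")
    data = body["data"]
    if data["resultType"] != "matrix":
        raise ValueError("expected a matrix result")

    rows = []
    for series in data["result"]:
        labels = series["metric"]
        name = labels.get("__name__", "")
        rest = ",".join(
            f'{key}="{value}"'
            for key, value in sorted(labels.items())
            if key != "__name__"
        )
        series_id = f"{name}{{{rest}}}"
        for timestamp, raw_value in series["values"]:
            # Sample values arrive as strings; float() also accepts "NaN" and "+Inf".
            when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
            rows.append((series_id, when, float(raw_value)))
    return rows


# Hypothetical sample payload, shaped like a range-query response.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "node_load1", "instance": "host-a:9100"},
                "values": [[1700000000, "0.42"], [1700000060, "0.57"]],
            }
        ],
    },
}

rows = matrix_to_rows(SAMPLE)
for row in rows:
    print(row)
```

Keeping one row per sample makes it straightforward to load the result into a dataframe or write it to CSV later; resampling and gap handling would still need to be added before training.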

Applying Machine Learning for Anomaly Detection

  • Fundamental machine learning models for outlier detection (e.g., Isolation Forest, One-Class SVM)
  • Training and evaluating models on time series data
  • Visualizing anomalies within Grafana dashboards
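
To give a feel for outlier detection on a metric series, here is a minimal sketch using a much simpler statistical baseline than the models named above: a modified z-score based on the median and the median absolute deviation (MAD). It is not Isolation Forest or One-Class SVM, and the function name, the threshold of 3.5, and the sample values are assumptions chosen for illustration.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return the indices of points whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    return [
        index
        for index, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical load-average samples with one obvious spike.
load = [0.40, 0.42, 0.41, 0.43, 0.39, 0.41, 2.50, 0.42, 0.40]
flagged = mad_anomalies(load)
print(flagged)
```

A baseline like this can serve as a reference point when evaluating a trained model: if the model does not beat it on labelled incidents, the added complexity is hard to justify.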

Forecasting Metrics with Machine Learning

  • Building simple forecasting models (ARIMA, Prophet, and an introduction to LSTMs)
  • Predicting system load or resource usage
  • Leveraging predictions for early alerting and scaling decisions
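
The idea of early alerting from a forecast can be sketched with a plain least-squares trend line, which stands in here for the heavier models listed above (it is not ARIMA, Prophet, or an LSTM). It is similar in spirit to what PromQL's `predict_linear` does. The function names, the 90% limit, and the sample data are assumptions for illustration only.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all timestamps are identical")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def seconds_until(times, values, limit):
    """Estimate seconds until the fitted trend reaches `limit`.

    Returns 0.0 if the trend is already at or above the limit,
    and None if the trend is flat or decreasing.
    """
    slope, intercept = fit_line(times, values)
    fitted_now = slope * times[-1] + intercept
    if fitted_now >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - intercept) / slope - times[-1]


# Hypothetical disk-usage percentages sampled once a minute.
timestamps = [0, 60, 120, 180]
disk_used = [70.0, 72.0, 74.0, 76.0]
remaining = seconds_until(timestamps, disk_used, limit=90.0)
print(remaining)
```

An estimate like this could be compared against a lead time (for example, the time needed to provision extra capacity) to decide whether to alert or scale ahead of the breach.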

Integrating Machine Learning with Alerting and Automation

  • Defining alert rules based on machine learning output or predefined thresholds
  • Using Alertmanager and notification routing
  • Triggering scripts or automation workflows upon anomaly detection
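
One way to connect alerts to automation is a small handler for the JSON payload that Alertmanager sends to a webhook receiver. The sketch below only decides which action to run for each firing alert; the playbook mapping, the alert names, and the action names are hypothetical, and running the actions themselves is left out.

```python
import json

# Hypothetical mapping from alert name to an automation action.
PLAYBOOK = {
    "DiskFillingUp": "expand_volume",
    "LatencyAnomaly": "restart_service",
}


def plan_actions(payload, playbook=PLAYBOOK):
    """Return (action, instance) pairs for the firing alerts in a webhook payload."""
    body = json.loads(payload) if isinstance(payload, (str, bytes)) else payload
    actions = []
    for alert in body.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts need no remediation
        labels = alert.get("labels", {})
        action = playbook.get(labels.get("alertname"))
        if action is not None:
            actions.append((action, labels.get("instance", "unknown")))
    return actions


# Hypothetical payload, reduced to the fields this sketch reads.
SAMPLE_WEBHOOK = {
    "version": "4",
    "status": "firing",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "LatencyAnomaly", "instance": "api-1"}},
        {"status": "resolved", "labels": {"alertname": "DiskFillingUp", "instance": "db-1"}},
        {"status": "firing", "labels": {"alertname": "NoPlaybookEntry", "instance": "web-1"}},
    ],
}

planned = plan_actions(SAMPLE_WEBHOOK)
print(planned)
```

Separating the decision from the execution keeps the logic easy to test and makes it possible to log or review planned actions before anything is changed on a live system.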

Scaling and Operationalizing AIOps

  • Integrating external observability tools (e.g., ELK stack, Moogsoft, Dynatrace)
  • Operationalizing machine learning models within observability pipelines
  • Best practices for implementing AIOps at scale

Summary and Next Steps

Requirements

  • A solid understanding of system monitoring and observability concepts
  • Practical experience using Grafana or Prometheus
  • Familiarity with Python and foundational machine learning principles

Audience

  • Observability engineers
  • Infrastructure and DevOps teams
  • Monitoring platform architects and site reliability engineers (SREs)

Duration

  • 14 hours
