★★ Intermediate

11. Monitor / Log Analytics

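Azure Monitor sits at the center of Azure operations. Logs are aggregated in a Log Analytics Workspace (LAW), Application Insights provides APM, and Action Groups deliver alert notifications. A typical setup combines all of them.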

Key components

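| Term | Role | AWS equivalent |
| --- | --- | --- |
| Log Analytics Workspace | Storage for logs and metrics | CloudWatch Logs |
| Diagnostic Setting | The setting that sends a given resource's logs to the LAW | Configured per service |
| Application Insights | Application APM (requests, exceptions, dependency tracking) | X-Ray + CloudWatch Application Insights |
| Metric Alert | Threshold alert on a metric | CloudWatch Alarm |
| Action Group | Definition of notification targets (email, webhook, Logic App) | SNS Topic |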

Log Analytics Workspace

resource "azurerm_log_analytics_workspace" "main" {
  name                = "law-myapp-prd"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "PerGB2018"   # pay-as-you-go
  retention_in_days   = 30            # 30-730 days

  daily_quota_gb = 10   # cap ingestion at 10 GB per day (runaway protection)

  tags = local.common_tags
}
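
Retention and pricing: Log Analytics bills on ingestion volume plus retention period, and retention beyond 30 days is charged separately. An unintended flood of logs can easily cause a surprise bill of tens of thousands of yen per month or more, so guard against it with daily_quota_gb.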
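
To keep the two cost controls visible and hard to misconfigure, they can be lifted into validated input variables. The following is a minimal sketch; the variable names `law_retention_days` and `law_daily_quota_gb` are hypothetical and do not appear in the original configuration, and the lower bound on the quota is a design choice of this sketch, not a provider requirement.

```hcl
# Hypothetical input variables; the names are illustrative only.
variable "law_retention_days" {
  type        = number
  default     = 30
  description = "Log Analytics retention in days."

  validation {
    # Mirrors the 30-730 day range noted in the workspace example.
    condition     = var.law_retention_days >= 30 && var.law_retention_days <= 730
    error_message = "law_retention_days must be between 30 and 730."
  }
}

variable "law_daily_quota_gb" {
  type        = number
  default     = 10
  description = "Daily ingestion cap in GB."

  validation {
    # Assumption of this sketch: always require an explicit positive cap.
    condition     = var.law_daily_quota_gb > 0
    error_message = "law_daily_quota_gb must be a positive number."
  }
}
```

In the workspace resource, the literals would then become `retention_in_days = var.law_retention_days` and `daily_quota_gb = var.law_daily_quota_gb`.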

Diagnostic Settings

A Diagnostic Setting routes each Azure resource's activity logs and resource logs to the LAW. Without one, nothing is collected.

# Send Key Vault access logs to the LAW
resource "azurerm_monitor_diagnostic_setting" "kv" {
  name                       = "kv-to-law"
  target_resource_id         = azurerm_key_vault.main.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id

  enabled_log {
    category = "AuditEvent"
  }
  enabled_log {
    category = "AzurePolicyEvaluationDetails"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

# Storage Account (Blob) logs
resource "azurerm_monitor_diagnostic_setting" "storage_blob" {
  name                       = "blob-to-law"
  target_resource_id         = "${azurerm_storage_account.data.id}/blobServices/default/"
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id

  enabled_log {
    category_group = "audit"
  }
}
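
When many resources need the same routing, one resource block with `for_each` can replace a series of near-identical definitions. This is a sketch under assumptions: the local `diag_targets` is hypothetical, and the `allLogs` category group is assumed to be supported by every target in the map, which is not true for all resource types. It is meant as an alternative to the individual blocks above, not an addition, since Azure rejects duplicate settings that send the same category to the same workspace.

```hcl
locals {
  # Hypothetical map of short names to resource IDs.
  # Keys must be static so Terraform can resolve them at plan time.
  diag_targets = {
    kv   = azurerm_key_vault.main.id
    blob = "${azurerm_storage_account.data.id}/blobServices/default/"
  }
}

resource "azurerm_monitor_diagnostic_setting" "to_law" {
  for_each = local.diag_targets

  name                       = "${each.key}-to-law"
  target_resource_id         = each.value
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id

  enabled_log {
    # Assumption: each target exposes the "allLogs" category group.
    category_group = "allLogs"
  }
}
```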

Application Insights

resource "azurerm_application_insights" "main" {
  name                = "appi-myapp-prd"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  application_type    = "web"
  workspace_id        = azurerm_log_analytics_workspace.main.id   # workspace-based (recommended)
  retention_in_days   = 90

  tags = local.common_tags
}

# Pass the connection string to a Container App or Function App
resource "azurerm_container_app" "api" {
  # ...
  template {
    container {
      env {
        name  = "APPLICATIONINSIGHTS_CONNECTION_STRING"
        value = azurerm_application_insights.main.connection_string
      }
    }
  }
}
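
For the Function App case, the connection string can be set through `site_config` instead of an environment variable block. The sketch below assumes a Linux Function App; the resource name `worker`, the app name, and the `azurerm_service_plan.main` reference are hypothetical and not part of the original configuration.

```hcl
resource "azurerm_linux_function_app" "worker" {
  name                = "func-myapp-prd" # hypothetical name
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  # Assumed to be defined elsewhere in the configuration.
  service_plan_id            = azurerm_service_plan.main.id
  storage_account_name       = azurerm_storage_account.data.name
  storage_account_access_key = azurerm_storage_account.data.primary_access_key

  site_config {
    # Wires the Functions host to the Application Insights instance.
    application_insights_connection_string = azurerm_application_insights.main.connection_string
  }
}
```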

Alerts and Action Groups

# Notification group
resource "azurerm_monitor_action_group" "ops" {
  name                = "ag-ops"
  resource_group_name = azurerm_resource_group.main.name
  short_name          = "ops"

  email_receiver {
    name          = "oncall"
    email_address = "oncall@example.com"
  }

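  # Note: a raw Slack incoming webhook expects Slack's own JSON format, not the
  # Azure Monitor alert payload, so a relay (e.g. a Logic App) is usually needed.
  webhook_receiver {
    name        = "slack"
    service_uri = var.slack_webhook_url
  }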
}

# Metric alert (5xx response count on the Container App)
resource "azurerm_monitor_metric_alert" "api_5xx" {
  name                = "api-5xx-high"
  resource_group_name = azurerm_resource_group.main.name
  scopes              = [azurerm_container_app.api.id]
  description         = "More than 100 5xx responses in 5 minutes"
  severity            = 2

  criteria {
    metric_namespace = "Microsoft.App/containerApps"
    metric_name      = "Requests"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 100

    dimension {
      name     = "statusCodeCategory"
      operator = "Include"
      values   = ["5xx"]
    }
  }

  window_size        = "PT5M"
  frequency          = "PT1M"

  action {
    action_group_id = azurerm_monitor_action_group.ops.id
  }
}

# Log-query-based alert (KQL)
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "errors" {
  name                = "log-errors-spike"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  evaluation_frequency = "PT5M"
  window_duration      = "PT15M"
  scopes               = [azurerm_log_analytics_workspace.main.id]
  severity             = 2

  criteria {
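    # Scoped to the workspace, so use the workspace table and column names
    # (AppTraces / SeverityLevel); "Count" counts the rows the query returns.
    query                   = "AppTraces | where SeverityLevel == 3"
    time_aggregation_method = "Count"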
    threshold               = 50
    operator                = "GreaterThan"

    failing_periods {
      minimum_failing_periods_to_trigger_alert = 1
      number_of_evaluation_periods             = 1
    }
  }

  action {
    action_groups = [azurerm_monitor_action_group.ops.id]
  }
}
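
A percentage-based condition such as "more than 5% of requests failed" cannot be expressed as a single count threshold, but a log query alert can compute the ratio itself. The following is a sketch only: the resource name and the 5% threshold are illustrative, and it assumes request telemetry lands in the workspace table `AppRequests` with a boolean `Success` column, as is the case for workspace-based Application Insights.

```hcl
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "failure_rate" {
  name                = "log-failure-rate" # hypothetical name
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  evaluation_frequency = "PT5M"
  window_duration      = "PT15M"
  scopes               = [azurerm_log_analytics_workspace.main.id]
  severity             = 2

  criteria {
    # Percentage of failed requests within the evaluation window.
    query = <<-KQL
      AppRequests
      | summarize FailedPct = 100.0 * countif(Success == false) / count()
    KQL

    # A measure column is required for any aggregation other than "Count".
    time_aggregation_method = "Average"
    metric_measure_column   = "FailedPct"
    threshold               = 5 # illustrative: alert above 5% failures
    operator                = "GreaterThan"
  }

  action {
    action_groups = [azurerm_monitor_action_group.ops.id]
  }
}
```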
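
Minimum set: (1) a Log Analytics Workspace (30-day retention, capped with daily_quota_gb); (2) Diagnostic Settings on the important resources; (3) one Application Insights instance; (4) an Action Group for ops plus two or three alerts covering 5xx responses and failed requests. That alone reaches the practical minimum.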
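
The failed-request alert from the minimum set could look roughly like the sketch below, which targets the Application Insights resource rather than the Container App. The resource name and the threshold of 10 are illustrative assumptions; `requests/failed` is the standard Application Insights failed-request metric.

```hcl
resource "azurerm_monitor_metric_alert" "failed_requests" {
  name                = "appi-failed-requests" # hypothetical name
  resource_group_name = azurerm_resource_group.main.name
  scopes              = [azurerm_application_insights.main.id]
  description         = "More than 10 failed requests in 5 minutes"
  severity            = 2

  criteria {
    metric_namespace = "microsoft.insights/components"
    metric_name      = "requests/failed"
    aggregation      = "Count"
    operator         = "GreaterThan"
    threshold        = 10 # illustrative threshold
  }

  window_size = "PT5M"
  frequency   = "PT1M"

  action {
    action_group_id = azurerm_monitor_action_group.ops.id
  }
}
```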