★★ Intermediate

11. Cloud Logging / Monitoring

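The operations foundation of GCP: Cloud Logging aggregates logs, Cloud Monitoring handles metrics and alerts, and Cloud Trace provides distributed tracing. All of them are enabled by default in every project, so most of the work is per-service configuration.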

The four operations siblings

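Service | Role | AWS equivalent
Cloud Logging | Log aggregation and search (stored automatically in the _Default bucket) | CloudWatch Logs
Cloud Monitoring | Metrics, dashboards, alerts | CloudWatch Metrics / Alarms
Cloud Trace | Distributed tracing | X-Ray
Error Reporting | Exception aggregation | -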

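What sets GCP apart is that simply using a service sends its logs and metrics to Cloud Logging / Monitoring automatically. Unlike AWS, you rarely end up in a situation where no logs are captured until you configure them service by service.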

Log Sink (exporting logs)

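A sink routes the logs that would otherwise land only in Cloud Logging's _Default bucket to another destination: a different log bucket, Cloud Storage, BigQuery, or Pub/Sub.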

# Dedicated log bucket (365-day retention)
resource "google_logging_project_bucket_config" "audit" {
  project        = "myapp-prd"
  location       = "asia-northeast1"
  retention_days = 365
  bucket_id      = "audit-logs"
}

# Route only audit logs to the audit bucket
resource "google_logging_project_sink" "audit" {
  name        = "audit-to-bucket"
  destination = "logging.googleapis.com/projects/myapp-prd/locations/asia-northeast1/buckets/${google_logging_project_bucket_config.audit.bucket_id}"
  filter      = "logName:\"cloudaudit.googleapis.com\""

  unique_writer_identity = true
}

# Grant the sink's writer identity permission to write to the log bucket
resource "google_project_iam_member" "audit_sink_writer" {
  project = "myapp-prd"
  role    = "roles/logging.bucketWriter"
  member  = google_logging_project_sink.audit.writer_identity
}

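# Export to Cloud Storage for long-term archiving
# (google_storage_bucket.logs is assumed to be defined elsewhere; the sink's
# writer_identity also needs roles/storage.objectCreator on that bucket)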
resource "google_logging_project_sink" "archive" {
  name        = "archive-to-gcs"
  destination = "storage.googleapis.com/${google_storage_bucket.logs.name}"
  filter      = "severity >= WARNING"

  unique_writer_identity = true
}
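
The archive sink above uses unique_writer_identity = true, so its writer identity still needs write access to the destination bucket, and the text above also lists BigQuery as a possible destination. The following is a minimal sketch of both, not a definitive setup. It reuses google_storage_bucket.logs as referenced by the archive sink; the dataset name "logs", the sink name "errors-to-bigquery", and the severity filter are hypothetical choices made for illustration.

```hcl
# Let the archive sink's writer identity create objects in the GCS bucket.
resource "google_storage_bucket_iam_member" "archive_sink_writer" {
  bucket = google_storage_bucket.logs.name
  role   = "roles/storage.objectCreator"
  member = google_logging_project_sink.archive.writer_identity
}

# Hypothetical dataset for log analysis (name and location are assumptions).
resource "google_bigquery_dataset" "logs" {
  project    = "myapp-prd"
  dataset_id = "logs"
  location   = "asia-northeast1"
}

# Route error-level logs to BigQuery.
resource "google_logging_project_sink" "bigquery" {
  name        = "errors-to-bigquery"
  destination = "bigquery.googleapis.com/projects/myapp-prd/datasets/${google_bigquery_dataset.logs.dataset_id}"
  filter      = "severity >= ERROR"

  unique_writer_identity = true

  bigquery_options {
    # Write to date-partitioned tables instead of one table per day.
    use_partitioned_tables = true
  }
}

# The BigQuery sink's writer identity needs to create and append to tables.
resource "google_bigquery_dataset_iam_member" "bigquery_sink_writer" {
  project    = "myapp-prd"
  dataset_id = google_bigquery_dataset.logs.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = google_logging_project_sink.bigquery.writer_identity
}
```

Without these IAM bindings the sinks are created successfully, but nothing is written to the destinations.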

Log retention and cost

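The _Default bucket in Cloud Logging retains logs for 30 days by default. Extending the retention period adds a storage charge proportional to ingested volume × months retained.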

resource "google_logging_project_bucket_config" "default" {
  project        = "myapp-prd"
  location       = "global"
  retention_days = 30           # adjust as needed (1-3650 days)
  bucket_id      = "_Default"
}
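
_Required cannot be changed: the _Required bucket, which stores Admin Activity and System Event audit logs, has a fixed 400-day retention that cannot be modified. Unlike _Default, logs stored in _Required are not billed.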
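
To make the "ingested volume × months retained" rule concrete, here is a rough estimate written as Terraform locals. It is only a sketch: the ingestion volume and the per-GiB storage price are placeholder assumptions (check the current pricing page), and all variable names are hypothetical.

```hcl
variable "ingest_gib_per_month" {
  type    = number
  default = 100 # hypothetical ingestion volume
}

variable "retention_days" {
  type    = number
  default = 90
}

variable "storage_usd_per_gib_month" {
  type    = number
  default = 0.01 # assumed price; verify against current pricing
}

locals {
  # Months of data kept beyond the default 30 days.
  extra_months = max(var.retention_days - 30, 0) / 30

  # Steady state: each month's ingest is stored for extra_months.
  extra_storage_usd_per_month = var.ingest_gib_per_month * local.extra_months * var.storage_usd_per_gib_month
}

output "extra_storage_usd_per_month" {
  value = local.extra_storage_usd_per_month
}
```

With these placeholder values, 100 GiB × 2 extra months × 0.01 comes to roughly 2 USD per month on top of ingestion charges.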

Alerts (AlertPolicy)

resource "google_monitoring_alert_policy" "api_5xx" {
  display_name = "API 5xx error rate"
  combiner     = "OR"

  conditions {
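    # ALIGN_RATE converts request_count to requests per second, so this
    # condition fires on an absolute 5xx rate. A true percentage would need
    # denominator_filter / denominator_aggregations.
    display_name = "5xx > 10 req/s for 5m"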

    condition_threshold {
      filter          = "metric.type=\"run.googleapis.com/request_count\" resource.type=\"cloud_run_revision\" metric.label.response_code_class=\"5xx\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 10

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.id]

  alert_strategy {
    auto_close = "1800s"
  }

  documentation {
    content   = "The API 5xx error rate is high. Check the logs."
    mime_type = "text/markdown"
  }
}
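
The condition above thresholds on the 5xx request rate itself. If the goal is an error percentage, one option is a ratio condition that divides 5xx requests by all requests. The sketch below assumes grouping by the Cloud Run service_name label; the resource name api_5xx_ratio and the 5% threshold are illustrative.

```hcl
resource "google_monitoring_alert_policy" "api_5xx_ratio" {
  display_name = "API 5xx error ratio"
  combiner     = "OR"

  conditions {
    display_name = "5xx / all requests > 5% for 5m"

    condition_threshold {
      # Numerator: 5xx responses only.
      filter = "metric.type=\"run.googleapis.com/request_count\" resource.type=\"cloud_run_revision\" metric.label.response_code_class=\"5xx\""
      # Denominator: all responses.
      denominator_filter = "metric.type=\"run.googleapis.com/request_count\" resource.type=\"cloud_run_revision\""

      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.05

      # Both sides must use the same alignment period and end up with the
      # same labels, hence the identical aggregation blocks.
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_RATE"
        cross_series_reducer = "REDUCE_SUM"
        group_by_fields      = ["resource.label.service_name"]
      }

      denominator_aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_RATE"
        cross_series_reducer = "REDUCE_SUM"
        group_by_fields      = ["resource.label.service_name"]
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.id]
}
```

A ratio is less noisy than an absolute rate on services whose traffic varies a lot, but it can flap when traffic is very low, so pairing it with a minimum-traffic condition may be worth considering.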

# Alert based on a log query (log-based metric)
resource "google_logging_metric" "errors" {
  name   = "app-errors"
  filter = "resource.type=\"cloud_run_revision\" severity=\"ERROR\""

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}

resource "google_monitoring_alert_policy" "error_spike" {
  display_name = "Error log spike"
  combiner     = "OR"

  conditions {
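    display_name = "errors > 50 in 5m"

    condition_threshold {
      # The Monitoring API requires a resource.type restriction in the filter.
      filter          = "metric.type=\"logging.googleapis.com/user/${google_logging_metric.errors.name}\" resource.type=\"cloud_run_revision\""
      duration        = "0s"
      comparison      = "COMPARISON_GT"
      threshold_value = 50

      aggregations {
        # Sum the DELTA counter over a 5-minute window so the threshold is a
        # count per 5 minutes rather than a per-second rate.
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_SUM"
      }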
    }
  }

  notification_channels = [google_monitoring_notification_channel.email.id]
}

Notification channels

resource "google_monitoring_notification_channel" "email" {
  display_name = "Ops Email"
  type         = "email"
  labels = {
    email_address = "ops@example.com"
  }
}

resource "google_monitoring_notification_channel" "slack" {
  display_name = "Ops Slack"
  type         = "slack"
  labels = {
    channel_name = "#alerts"
  }
  sensitive_labels {
    auth_token = var.slack_token
  }
}

resource "google_monitoring_notification_channel" "pagerduty" {
  display_name = "Oncall PagerDuty"
  type         = "pagerduty"
  sensitive_labels {
    service_key = var.pd_integration_key
  }
}
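
The Slack and PagerDuty channels above read var.slack_token and var.pd_integration_key, which the listing does not declare. A minimal sketch of the declarations follows, assuming the values are supplied from outside the repository (for example via TF_VAR_ environment variables); the descriptions are illustrative.

```hcl
variable "slack_token" {
  description = "Token used by the Slack notification channel"
  type        = string
  sensitive   = true
}

variable "pd_integration_key" {
  description = "PagerDuty integration (service) key"
  type        = string
  sensitive   = true
}
```

Note that sensitive = true only redacts the values in plan and apply output; they are still stored in the Terraform state, so the state backend needs to be protected accordingly.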

Uptime Check

Monitors liveness by sending periodic HTTP requests to a URL from outside. Roughly equivalent to an AWS Route 53 health check.

resource "google_monitoring_uptime_check_config" "api" {
  display_name = "api-uptime"
  timeout      = "10s"
  period       = "60s"

  http_check {
    path           = "/health"
    port           = "443"
    use_ssl        = true
    validate_ssl   = true
    accepted_response_status_codes {
      status_class = "STATUS_CLASS_2XX"
    }
  }

  monitored_resource {
    type = "uptime_url"
    labels = {
      project_id = "myapp-prd"
      host       = "api.myapp.com"
    }
  }

  selected_regions = ["ASIA_PACIFIC", "USA"]
}

resource "google_monitoring_alert_policy" "api_uptime" {
  display_name = "API down"
  combiner     = "OR"

  conditions {
    display_name = "Uptime check failing"
    condition_threshold {
      filter          = "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" resource.type=\"uptime_url\" metric.label.check_id=\"${google_monitoring_uptime_check_config.api.uptime_check_id}\""
      duration        = "60s"
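      # REDUCE_COUNT_FALSE yields the number of failing checker locations,
      # so fire when more than one location reports a failure.
      comparison      = "COMPARISON_GT"
      threshold_value = 1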

      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_COUNT_FALSE"
        group_by_fields      = ["resource.label.host"]
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.pagerduty.id]
}
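
Minimum setup: ① set the _Default bucket to 30-90 days of retention; ② two or three alerts on key metrics for Cloud Run, Cloud SQL, and similar services; ③ Email + Slack notifications; ④ an Uptime Check on public APIs. That gives you the visibility needed for basic operations.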
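
Item ② mentions Cloud SQL, for which no alert is shown above. As one possible starting point, here is a sketch of a CPU utilization alert. The 80% threshold and the resource name sql_cpu are assumptions, and it reuses the email and slack notification channels defined earlier.

```hcl
resource "google_monitoring_alert_policy" "sql_cpu" {
  display_name = "Cloud SQL CPU high"
  combiner     = "OR"

  conditions {
    display_name = "CPU utilization > 80% for 5m"

    condition_threshold {
      # cpu/utilization is reported as a fraction between 0 and 1.
      filter          = "metric.type=\"cloudsql.googleapis.com/database/cpu/utilization\" resource.type=\"cloudsql_database\""
      duration        = "300s"
      comparison      = "COMPARISON_GT"
      threshold_value = 0.8

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  notification_channels = [
    google_monitoring_notification_channel.email.id,
    google_monitoring_notification_channel.slack.id,
  ]
}
```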