
Prometheus WAL data corruption in segment! Data inconsistency found #6453

@222luhang

Description

ts=2025-11-06T11:59:47.768Z caller=dedupe.go:112 component=remote level=error remote_name=76a2b4 url=http://localhost:8428/api/v1/write msg="error tailing WAL" err="segment 12: unexpected non-zero byte in page term bytes"
ts=2025-11-06T12:00:03.730Z caller=dedupe.go:112 component=remote level=warn remote_name=76a2b4 url=http://localhost:8428/api/v1/write msg="Ignoring error reading to end of segment, may have dropped data" segment=12 err="segment 12: unexpected non-zero byte in page term bytes"
ts=2025-11-06T13:00:02.244Z caller=db.go:1090 level=error component=tsdb msg="compaction failed" err="WAL truncation in Compact: create checkpoint: read segments: corruption in segment /juicefs/wal/00000012 at 196608: unexpected non-zero byte in padded page"

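When using JuiceFS as the storage backend for Prometheus, Prometheus reported data validation errors, and the underlying storage appears to be corrupted.
The Prometheus documentation states a strong dependence on POSIX semantics. Could this be related? The relevant passage is: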

CAUTION: Non-POSIX compliant filesystems are not supported for Prometheus' local storage as unrecoverable corruptions may happen. NFS filesystems (including AWS's EFS) are not supported. NFS could be POSIX-compliant, but most implementations are not. It is strongly recommended to use a local filesystem for reliability.

JuiceFS version: juicefs version 1.3.0+2025-07-03.30190ca. It was installed via curl -sSL https://d.juicefs.com/install | sh -, with a local ext2 filesystem chosen as the backing store for the bucket data.
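
Both error messages in the log above complain about non-zero bytes where the WAL reader expects zero padding, and the reported offset 196608 equals 6 × 32768, i.e. it falls exactly on a page boundary if pages are 32 KiB. A minimal sketch for checking a copied segment offline might look like the following. It assumes the documented Prometheus WAL framing (32 KiB pages, a 7-byte fragment header of type, big-endian length and CRC32, fragment type 0 meaning "rest of page is zero padding", and the type stored in the low 3 bits of the first byte). The function name find_padding_corruption is hypothetical, and the sketch does not verify checksums, so it is not a substitute for Prometheus' own reader.

```python
import struct
from typing import Optional

PAGE_SIZE = 32 * 1024  # assumed WAL page size (32 KiB)
HEADER_SIZE = 7        # assumed header: 1-byte type + 2-byte length + 4-byte CRC32
TYPE_MASK = 0x07       # assumed: low 3 bits of the first byte hold the fragment type


def find_padding_corruption(data: bytes, page_size: int = PAGE_SIZE) -> Optional[int]:
    """Return the offset of the first non-zero byte found inside page padding.

    Returns None if every padded region is all zeros. CRC32 values are
    skipped, not verified.
    """
    pos = 0
    while pos < len(data):
        frag_type = data[pos] & TYPE_MASK
        if frag_type == 0:
            # Type 0: everything up to the end of the current page must be zero.
            page_end = min((pos // page_size + 1) * page_size, len(data))
            for i in range(pos + 1, page_end):
                if data[i] != 0:
                    return i
            pos = page_end
            continue
        if frag_type > 4:
            raise ValueError(f"unknown fragment type {frag_type} at offset {pos}")
        if pos + HEADER_SIZE > len(data):
            return None  # truncated header at the end of the file
        # Skip the header and the fragment payload (length is big-endian uint16).
        (length,) = struct.unpack_from(">H", data, pos + 1)
        pos += HEADER_SIZE + length
    return None
```

To try it, read a copy of the affected segment (for example /juicefs/wal/00000012 from the log) into bytes and pass it to the function. Comparing the result for the copy on JuiceFS against a copy of the same segment on a local filesystem, if one exists, could help narrow down where the stray bytes come from.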

Labels: needs-more-info (This issue requires more information to address)
