Issues caused by a read-only filesystem on the master node in RocketMQ HA mode


Preface

A friend recently asked me about the problems that arise when the master node's filesystem becomes read-only in RocketMQ's HA mode. This post analyzes, from the source code, how a read-only filesystem on the master affects message storage on the broker, HA master-slave replication, and the clients (the source version is 4.9.1; the test environment is configured with synchronous flush and synchronous replication).


I. Message storage on the broker

1. Writing to memory

When the master's filesystem turns read-only, as long as the latest commitlog mappedFile still has free space, messages sent by producers are written into memory (the mmapped page cache) as usual. But once that mappedFile fills up, the broker tries to create a new mappedFile; because the filesystem is read-only, the creation fails with an exception, and from that point on messages can no longer be written to memory.

private void init(final String fileName, final int fileSize) throws IOException {
    this.fileName = fileName;
    this.fileSize = fileSize;
    this.file = new File(fileName);
    this.fileFromOffset = Long.parseLong(this.file.getName());
    boolean ok = false;

    ensureDirOK(this.file.getParent());

    try {
        this.fileChannel = new RandomAccessFile(this.file, "rw").getChannel();
        this.mappedByteBuffer = this.fileChannel.map(MapMode.READ_WRITE, 0, fileSize);
        TOTAL_MAPPED_VIRTUAL_MEMORY.addAndGet(fileSize);
        TOTAL_MAPPED_FILES.incrementAndGet();
        ok = true;
    } catch (FileNotFoundException e) {
        log.error("Failed to create file " + this.fileName, e);
        throw e;
    } catch (IOException e) {
        log.error("Failed to map file " + this.fileName, e);
        throw e;
    } finally {
        if (!ok && this.fileChannel != null) {
            this.fileChannel.close();
        }
    }
}
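A side note on the `fileFromOffset` line above: commitlog mapped files are named after their starting physical offset, zero-padded to 20 digits, which is why `init()` can recover the offset with `Long.parseLong`. A minimal sketch (the 1 GB file size below is RocketMQ's default `mappedFileSizeCommitLog`):

```java
public class MappedFileNameDemo {
    // A commitlog mapped file's name is its starting physical offset,
    // zero-padded to 20 digits; init() parses it back with Long.parseLong.
    public static String fileNameFor(long startOffset) {
        return String.format("%020d", startOffset);
    }

    public static void main(String[] args) {
        long mappedFileSize = 1024L * 1024 * 1024;      // default: 1 GB per commitlog file
        String second = fileNameFor(mappedFileSize);     // file covering offsets [1 GB, 2 GB)
        System.out.println(second);                      // 00000000001073741824
        System.out.println(Long.parseLong(second));      // 1073741824
    }
}
```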

2. Flushing to disk

Since the broker is configured for synchronous flush, the flush service here is GroupCommitService. After a producer's message has been written to memory, the flush runs next, and during the flush the following error is logged:

GroupCommitService - Error occurred when force data to disk.
java.io.IOException: Read-only file system
        at java.nio.MappedByteBuffer.force0(Native Method) ~[na:1.8.0_242]
        at java.nio.MappedByteBuffer.force(MappedByteBuffer.java:203) ~[na:1.8.0_242]
        at org.apache.rocketmq.store.MappedFile.flush(MappedFile.java:281) ~[rocketmq-store-4.7.0.jar:4.7.0]
        at org.apache.rocketmq.store.MappedFileQueue.flush(MappedFileQueue.java:430) [rocketmq-store-4.7.0.jar:4.7.0]
        at org.apache.rocketmq.store.CommitLog$GroupCommitService.doCommit(CommitLog.java:1432) [rocketmq-store-4.7.0.jar:4.7.0]
        at org.apache.rocketmq.store.CommitLog$GroupCommitService.run(CommitLog.java:1459) [rocketmq-store-4.7.0.jar:4.7.0]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_242]

The core flush method is shown below. Because the filesystem is read-only, the flush fails; note, however, that the source code merely logs the exception at ERROR level and still advances flushedPosition, so in this situation the flush result returned to the caller looks like a success.

/**
 * @return The current flushed position
 */
public int flush(final int flushLeastPages) {
    if (this.isAbleToFlush(flushLeastPages)) {
        if (this.hold()) {
            int value = getReadPosition();

            try {
                // We only append data to fileChannel or mappedByteBuffer, never both.
                if (writeBuffer != null || this.fileChannel.position() != 0) {
                    this.fileChannel.force(false);
                } else {
                    this.mappedByteBuffer.force();
                }
            } catch (Throwable e) {
                log.error("Error occurred when force data to disk.", e);
            }

            this.flushedPosition.set(value);
            this.release();
        } else {
            log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
            this.flushedPosition.set(getReadPosition());
        }
    }
    return this.getFlushedPosition();
}
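To make the swallowed flush failure concrete, here is a deliberately simplified model (not RocketMQ code) of the flush path above: force() throws, the exception is only logged, and flushedPosition still advances, so the caller cannot tell that nothing reached the disk:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class FlushSwallowDemo {
    private final AtomicInteger flushedPosition = new AtomicInteger(0);
    private final int readPosition;       // stand-in for getReadPosition()
    private final boolean diskReadOnly;   // simulate the read-only filesystem

    public FlushSwallowDemo(int readPosition, boolean diskReadOnly) {
        this.readPosition = readPosition;
        this.diskReadOnly = diskReadOnly;
    }

    private void force() throws IOException {
        if (diskReadOnly) throw new IOException("Read-only file system");
    }

    // Models MappedFile#flush: the failure is only logged, and
    // flushedPosition is set regardless, so callers see "success".
    public int flush() {
        int value = readPosition;
        try {
            force();
        } catch (Throwable e) {
            System.out.println("Error occurred when force data to disk. " + e);
        }
        flushedPosition.set(value);   // advanced even though force() failed
        return flushedPosition.get();
    }

    public static void main(String[] args) {
        FlushSwallowDemo mf = new FlushSwallowDemo(4096, true);
        System.out.println(mf.flush());   // 4096: the flush "succeeds"
    }
}
```

This is exactly why GroupCommitService reports a normal flush status even though the data never hit the disk.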

3. HA data replication

Let's first briefly review the HA replication process (for a detailed walkthrough see my earlier article RocketMQ源码分析之主从数据复制, on master-slave replication):
(1) The master starts and listens for slave connections
(2) The slave starts and connects to the master
(3) The slave sends the master the physical offset from which it wants to pull data
(4) The master packages the data starting at the requested offset and sends it to the slave
(5) The slave reads the data from the master and wakes ReputMessageService to build the consumequeue
Here we focus on where the commitlog data the master returns to the slave comes from. From the steps above, the master's WriteSocketService is responsible for pushing commitlog data to the slave, so that is where to start. In its run method, the key fragment is the following: it fetches the data the slave wants based on nextTransferFromWhere and sends it to the slave via transferData.

SelectMappedBufferResult selectResult =
    HAConnection.this.haService.getDefaultMessageStore().getCommitLogData(this.nextTransferFromWhere);
if (selectResult != null) {
    int size = selectResult.getSize();
    if (size > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize()) {
        size = HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize();
    }

    long thisOffset = this.nextTransferFromWhere;
    this.nextTransferFromWhere += size;

    selectResult.getByteBuffer().limit(size);
    this.selectMappedBufferResult = selectResult;

    // Build Header
    this.byteBufferHeader.position(0);
    this.byteBufferHeader.limit(headerSize);
    this.byteBufferHeader.putLong(thisOffset);
    this.byteBufferHeader.putInt(size);
    this.byteBufferHeader.flip();

    this.lastWriteOver = this.transferData();
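The header built in the fragment above is a fixed 12-byte prefix: an 8-byte physical offset followed by a 4-byte body size. A small sketch of how it can be encoded and decoded with a ByteBuffer (HaHeaderDemo and its method names are my own; only the long-then-int layout comes from the code above):

```java
import java.nio.ByteBuffer;

public class HaHeaderDemo {
    // phyOffset (long, 8 bytes) + bodySize (int, 4 bytes)
    public static final int HEADER_SIZE = 8 + 4;

    // Builds the header WriteSocketService puts in front of each batch.
    public static ByteBuffer buildHeader(long thisOffset, int size) {
        ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);
        header.putLong(thisOffset);
        header.putInt(size);
        header.flip();   // ready for reading/sending
        return header;
    }

    public static void main(String[] args) {
        ByteBuffer h = buildHeader(1073741824L, 32768);
        // The slave reads the two fields back in the same order.
        System.out.println(h.getLong());   // 1073741824
        System.out.println(h.getInt());    // 32768
    }
}
```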

As the getCommitLogData method shows, HA replication reads the data from the master's memory, so data that was successfully written into the master's memory can still be replicated to the slave even while the filesystem is read-only.

public SelectMappedBufferResult getCommitLogData(final long offset) {
    if (this.shutdown) {
        log.warn("message store has shutdown, so getPhyQueueData is forbidden");
        return null;
    }

    return this.commitLog.getData(offset);
}

public SelectMappedBufferResult getData(final long offset) {
    return this.getData(offset, offset == 0);
}

public SelectMappedBufferResult getData(final long offset, final boolean returnFirstOnNotFound) {
    int mappedFileSize = this.defaultMessageStore.getMessageStoreConfig().getMappedFileSizeCommitLog();
    MappedFile mappedFile = this.mappedFileQueue.findMappedFileByOffset(offset, returnFirstOnNotFound);
    if (mappedFile != null) {
        int pos = (int) (offset % mappedFileSize);
        SelectMappedBufferResult result = mappedFile.selectMappedBuffer(pos);
        return result;
    }

    return null;
}
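The offset arithmetic in getData is worth spelling out: `offset % mappedFileSize` is the position inside a mapped file, and the file itself starts at the offset rounded down to a multiple of the file size. A sketch with the default 1 GB commitlog file size (the helper names are mine):

```java
public class OffsetLookupDemo {
    // Which mapped file holds a given physical offset: the file whose name
    // is the offset rounded down to a multiple of mappedFileSize.
    public static long fileStartOffset(long offset, long mappedFileSize) {
        return offset - (offset % mappedFileSize);
    }

    // Position of the offset inside that mapped file.
    public static int posInFile(long offset, long mappedFileSize) {
        return (int) (offset % mappedFileSize);
    }

    public static void main(String[] args) {
        long size = 1024L * 1024 * 1024;                   // default: 1 GB per commitlog file
        long offset = size + 500;                          // 500 bytes into the second file
        System.out.println(fileStartOffset(offset, size)); // 1073741824
        System.out.println(posInFile(offset, size));       // 500
    }
}
```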

After the master's filesystem recovers, restarting the master loses the data that only ever lived in memory, so the master ends up behind the slave. You will then see "Slave fall behind master: XXX bytes" in the master's log, where the value is negative.
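Why the logged value goes negative can be shown with simplified numbers (the offsets below are made up for illustration; roughly, the log reports the master's max physical offset minus the offset the slave has reached):

```java
public class FallBehindDemo {
    // Simplified model of the "Slave fall behind master" gap:
    // master's max commitlog offset minus the slave's offset.
    public static long fallBehind(long masterMaxOffset, long slaveOffset) {
        return masterMaxOffset - slaveOffset;
    }

    public static void main(String[] args) {
        // Before the failure both sides had replicated up to 2_000_000 bytes;
        // after restarting, the master loses its unflushed in-memory tail.
        long masterAfterRestart = 1_500_000;
        long slaveOffset = 2_000_000;
        System.out.println(fallBehind(masterAfterRestart, slaveOffset)); // -500000
    }
}
```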

4. Building the consumequeue

RocketMQ has two services around the consumequeue: ReputMessageService builds consumequeue entries, and FlushConsumeQueueService flushes them to disk.
While the master's filesystem is read-only but messages are still being written to memory, ReputMessageService builds a consumequeue entry for each such message. It first computes where the entry should be written in the consumequeue file; if the target file already exists, it calls appendMessage, shown below:

public boolean appendMessage(final byte[] data) {
    int currentPos = this.wrotePosition.get();

    if ((currentPos + data.length) <= this.fileSize) {
        try {
            this.fileChannel.position(currentPos);
            this.fileChannel.write(ByteBuffer.wrap(data));
        } catch (Throwable e) {
            log.error("Error occurred when append message to mappedFile.", e);
        }
        this.wrotePosition.addAndGet(data.length);
        return true;
    }

    return false;
}
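To see the flavor of failure appendMessage hits, here is a small stand-in experiment: a genuinely read-only filesystem is hard to reproduce portably, so a channel opened read-only plays that role; the write is rejected and no data is persisted (the class and method names are mine):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.NonWritableChannelException;

public class ReadonlyWriteDemo {
    // Attempts a write through a read-only channel; the write throws
    // instead of persisting anything, mirroring the Read-only file
    // system IOException appendMessage runs into.
    public static String tryWrite(File f) throws Exception {
        try (FileChannel ch = new RandomAccessFile(f, "r").getChannel()) {
            ch.write(ByteBuffer.wrap("msg".getBytes()));
            return "ok";
        } catch (NonWritableChannelException e) {
            return "write rejected";
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("commitlog", ".tmp");
        System.out.println(tryWrite(f));   // prints: write rejected
        f.delete();
    }
}
```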

Because the filesystem is read-only, the this.fileChannel.write(ByteBuffer.wrap(data)); call throws. Note that appendMessage, too, swallows the exception, advances wrotePosition anyway, and returns true. The following error appears in the log:

ReputMessageService - Error occurred when append message to mappedFile.
java.io.IOException: Read-only file system
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_242]
        at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) ~[na:1.8.0_242]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_242]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_242]
        at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) ~[na:1.8.0_242]
        at org.apache.rocketmq.store.MappedFile.appendMessage(MappedFile.java:240) ~[rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.ConsumeQueue.putMessagePositionInfo(ConsumeQueue.java:474) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.ConsumeQueue.putMessagePositionInfoWrapper(ConsumeQueue.java:398) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.DefaultMessageStore.putMessagePositionInfo(DefaultMessageStore.java:1478) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.DefaultMessageStore$CommitLogDispatcherBuildConsumeQueue.dispatch(DefaultMessageStore.java:1538) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.DefaultMessageStore.doDispatch(DefaultMessageStore.java:1472) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.DefaultMessageStore$ReputMessageService.doReput(DefaultMessageStore.java:1910) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.DefaultMessageStore$ReputMessageService.run(DefaultMessageStore.java:1968) [rocketmq-store-4.9.1.jar:4.9.1]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_242]

If, however, the target consumequeue file needs to be created first, creation fails with the same exception as the commitlog mappedFile creation described earlier.

When there are consumequeue entries in memory that have not yet been flushed, FlushConsumeQueueService is responsible for flushing them promptly; since the filesystem is read-only, the following error appears in the log:

ERROR FlushConsumeQueueService - Error occurred when force data to disk.
java.io.IOException: Read-only file system
        at sun.nio.ch.FileDispatcherImpl.force0(Native Method) ~[na:1.8.0_242]
        at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:80) ~[na:1.8.0_242]
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:388) ~[na:1.8.0_242]
        at org.apache.rocketmq.store.MappedFile.flush(MappedFile.java:285) ~[rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.MappedFileQueue.flush(MappedFileQueue.java:430) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.ConsumeQueue.flush(ConsumeQueue.java:325) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.DefaultMessageStore$FlushConsumeQueueService.doFlush(DefaultMessageStore.java:1806) [rocketmq-store-4.9.1.jar:4.9.1]
        at org.apache.rocketmq.store.DefaultMessageStore$FlushConsumeQueueService.run(DefaultMessageStore.java:1832) [rocketmq-store-4.9.1.jar:4.9.1]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_242]

The failing code is the same flush method shown earlier; here it is this.fileChannel.force(false); that throws:

/**
 * @return The current flushed position
 */
public int flush(final int flushLeastPages) {
    if (this.isAbleToFlush(flushLeastPages)) {
        if (this.hold()) {
            int value = getReadPosition();

            try {
                // We only append data to fileChannel or mappedByteBuffer, never both.
                if (writeBuffer != null || this.fileChannel.position() != 0) {
                    this.fileChannel.force(false);
                } else {
                    this.mappedByteBuffer.force();
                }
            } catch (Throwable e) {
                log.error("Error occurred when force data to disk.", e);
            }

            this.flushedPosition.set(value);
            this.release();
        } else {
            log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
            this.flushedPosition.set(getReadPosition());
        }
    }
    return this.getFlushedPosition();
}

II. Clients

1. Producer

After the master's filesystem turns read-only, producers keep sending messages. As long as the latest commitlog mappedFile on the master is not full, the data is still written to memory normally; combined with the earlier analysis of flushing and HA replication (both failures are swallowed), the producer still receives the SEND_OK status.

2. Consumer

After the master's filesystem turns read-only, if the master process is killed, consumers will consume data from the slave.


Copyright notice: this is an original article by qq_25145759, licensed under CC 4.0 BY-SA; please include a link to the original source and this notice when reposting.