Flink: Writing Kafka Data to MySQL over JDBC, with JdbcSink Source Code Notes





1. Overview

Relational databases are still the primary store for business data in the enterprise, and MySQL is the typical example. Although big-data jobs rarely interact with MySQL directly, the computed results ultimately have to be consumed by external applications, and the store those applications read from is very often MySQL.



2. Usage



2.1 Add the Dependencies (mind the driver version)

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-jdbc_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.47</version>
    </dependency>
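
Note: the mysql-connector-java 5.1.x driver above matches MySQL 5.x servers. If your server is MySQL 8.x, use an 8.0.x connector instead, whose driver class is com.mysql.cj.jdbc.Driver.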



2.2 源码

JdbcSink.sink() processes records in batches and needs four parameters (a minimal call sketch follows this list):

sql: the DML query (e.g. insert, update, upsert)

statementBuilder: fills in the values for the ? placeholders

executionOptions: configures execution settings such as batch size and maximum retries

connectionOptions: configures the connection, e.g. the JDBC URL, driver, and credentials
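
A minimal sketch of how the four arguments line up in the call (the values here are illustrative; section 2.3 shows a complete job):

    // Sketch only: mapping the four parameters onto JdbcSink.sink(...).
    SinkFunction<Event> sink = JdbcSink.sink(
            "INSERT INTO event (user, url, timestamp) VALUES (?, ?, ?)",  // sql
            (PreparedStatement ps, Event e) -> {                          // statementBuilder
                ps.setString(1, e.getUser());
                ps.setString(2, e.getUrl());
                ps.setLong(3, e.getTimestamp());
            },
            JdbcExecutionOptions.builder()                                // executionOptions
                    .withBatchSize(1000)
                    .withBatchIntervalMs(200)
                    .withMaxRetries(5)
                    .build(),
            new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()     // connectionOptions
                    .withUrl("jdbc:mysql://localhost:3306/demo")          // illustrative URL
                    .withDriverName("com.mysql.jdbc.Driver")
                    .withUsername("root")
                    .withPassword("***")
                    .build());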

  

The core processing logic:

GenericJdbcSinkFunction: invoke() simply hands each record to the batching output format:

    @Override
    public void invoke(T value, Context context) throws IOException {
        outputFormat.writeRecord(value);
    }
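
GenericJdbcSinkFunction also implements CheckpointedFunction; in the connector versions this walkthrough is based on, snapshotState() flushes the pending batch on every checkpoint, along these lines:

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        outputFormat.flush();
    }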

In JdbcBatchingOutputFormat, writeRecord() adds the record to the current batch and, once batchCount reaches the configured batch size, triggers flush():

    @Override
    public final synchronized void writeRecord(In record) throws IOException {
        checkFlushException();

        try {
            addToBatch(record, jdbcRecordExtractor.apply(record));
            batchCount++;
            if (executionOptions.getBatchSize() > 0
                    && batchCount >= executionOptions.getBatchSize()) {
                flush();
            }
        } catch (Exception e) {
            throw new IOException("Writing records to JDBC failed.", e);
        }
    }
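
flush() itself is not quoted here; it retries attemptFlush() up to the configured maximum number of retries before surfacing the failure, which is what withMaxRetries(...) in JdbcExecutionOptions controls.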

Besides flushing when the batch size is reached, the open() method of JdbcBatchingOutputFormat also schedules a recurring flush at the configured batch interval:

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        super.open(taskNumber, numTasks);
        jdbcStatementExecutor = createAndOpenStatementExecutor(statementExecutorFactory);
        if (executionOptions.getBatchIntervalMs() != 0 && executionOptions.getBatchSize() != 1) {
            this.scheduler =
                    Executors.newScheduledThreadPool(
                            1, new ExecutorThreadFactory("jdbc-upsert-output-format"));
            this.scheduledFuture =
                    this.scheduler.scheduleWithFixedDelay(
                            () -> {
                                synchronized (JdbcBatchingOutputFormat.this) {
                                    if (!closed) {
                                        try {
                                            flush();
                                        } catch (Exception e) {
                                            flushException = e;
                                        }
                                    }
                                }
                            },
                            executionOptions.getBatchIntervalMs(),
                            executionOptions.getBatchIntervalMs(),
                            TimeUnit.MILLISECONDS);
        }
    }
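
Note the guard condition: the timer is created only when getBatchIntervalMs() != 0 and getBatchSize() != 1, because an interval of 0 disables time-based flushing and a batch size of 1 already flushes on every record. The scheduled task synchronizes on the output format, so it cannot race with writeRecord(); if the background flush fails, the exception is stored in flushException and rethrown to the writer thread by checkFlushException().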



2.3 Example Code

The POJO:

import java.sql.Timestamp;

public class Event {
    public String user;
    public String url;
    public long timestamp;

    public Event() {
    }

    public Event(String user, String url, Long timestamp) {
        this.user = user;
        this.url = url;
        this.timestamp = timestamp;
    }

    @Override
    public int hashCode() {
        return super.hashCode();
    }

    public String getUser() {
        return user;
    }

    public void setUser(String user) {
        this.user = user;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return "Event{" +
                "user='" + user + '\'' +
                ", url='" + url + '\'' +
                ", timestamp=" + new Timestamp(timestamp) +
                '}';
    }
}
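
Note that the public no-argument constructor plus public fields and getters/setters are what let Flink treat Event as a POJO type for serialization.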


The Flink job that writes to MySQL:


(1) read from Kafka

(2) flatMap each line into an Event

(3) sink the stream into MySQL

public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.42.102:9092,192.168.42.103:9092,192.168.42.104:9092");
        properties.setProperty("group.id", "consumer-group");
        properties.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("auto.offset.reset", "latest");
        DataStreamSource<String> stream = env.addSource(new
                FlinkKafkaConsumer<String>(
                "clicks",
                new SimpleStringSchema(),
                properties
        ));

        SingleOutputStreamOperator<Event> flatMap = stream.flatMap(new FlatMapFunction<String, Event>() {
            @Override
            public void flatMap(String s, Collector<Event> collector) throws Exception {
                String[] split = s.split(",");
                collector.collect(new Event(split[0], split[1], Long.parseLong(split[2])));
            }
        });

        JdbcStatementBuilder<Event> jdbcStatementBuilder = new JdbcStatementBuilder<Event>() {
            @Override
            public void accept(PreparedStatement preparedStatement, Event event) throws SQLException {
                preparedStatement.setString(1, event.getUser());
                preparedStatement.setString(2, event.getUrl());
                preparedStatement.setLong(3,event.getTimestamp());
            }
        };
        JdbcConnectionOptions jdbcConnectionOptions = new JdbcConnectionOptions.JdbcConnectionOptionsBuilder().withUrl("jdbc:mysql://192.168.0.23/demo")
                // for MySQL 5.7, use "com.mysql.jdbc.Driver"
                .withDriverName("com.mysql.jdbc.Driver")
                .withUsername("root")
                .withPassword("C/hEz8ygYA4g459j/TsQHg==htdata")
                .build();

        JdbcExecutionOptions executionOptions = JdbcExecutionOptions.builder()
                .withBatchSize(1000)
                .withBatchIntervalMs(200)
                .withMaxRetries(5)
                .build();
        flatMap.addSink(JdbcSink.sink(
                "INSERT INTO event (user, url, timestamp) VALUES (?, ?, ?)",
                jdbcStatementBuilder,
                executionOptions,
                jdbcConnectionOptions));

        env.execute();
    }
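
This assumes the demo database already contains an event table whose user and url columns are string-typed and whose timestamp column is BIGINT, matching the three placeholders in the INSERT statement.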

Results

./kafka-console-producer.sh  --broker-list 192.168.42.102:9092,192.168.42.103:9092,192.168.42.104:9092 --topic clicks
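
Each line typed into the producer must match the comma-split parsing in the flatMap, i.e. user,url,timestamp, for example (illustrative values): Mary,./home,1662021955000.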




Copyright notice: this article is an original work by javahelpyou, licensed under the CC 4.0 BY-SA license. Please include a link to the original source and this notice when reposting.