Java分布式跟踪系统Zipkin(二):Brave源码分析-Tracer和Span

  • Post author:
  • Post category:java


所有博文均在个人独立博客

http://blog.mozhu.org

首发,欢迎访问!

Brave是Java版的Zipkin客户端,它将收集的跟踪信息,以Span的形式上报给Zipkin系统。

(Zipkin是基于Google的一篇论文,名为Dapper,Dapper在荷兰语里是“勇敢的”的意思,这也是Brave的命名的原因)

Brave目前版本为4.9.1,兼容zipkin1和2的协议,github地址:

https://github.com/openzipkin/brave

我们一般不会手动编写Trace相关的代码,Brave提供了一些开箱即用的库,来帮助我们对某些特定的库类来进行追踪,比如servlet,springmvc,mysql,okhttp3,httpclient等,这些都可以在下面页面中找到:


https://github.com/openzipkin/brave/tree/master/instrumentation

我们先来看看一个简单的Demo来演示下Brave的基本使用,这对我们后续分析Brave的原理和其他类库的使用有很大帮助

TraceDemo

package tracing;

import brave.Span;
import brave.Tracer;
import brave.Tracing;
import brave.context.log4j2.ThreadContextCurrentTraceContext;
import brave.propagation.B3Propagation;
import brave.propagation.ExtraFieldPropagation;
import zipkin2.codec.SpanBytesEncoder;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.Sender;
import zipkin2.reporter.okhttp3.OkHttpSender;

import java.util.concurrent.TimeUnit;

public class TraceDemo {

    public static void main(String[] args) {
        Sender sender = OkHttpSender.create("http://localhost:9411/api/v2/spans");
        AsyncReporter asyncReporter = AsyncReporter.builder(sender)
                .closeTimeout(500, TimeUnit.MILLISECONDS)
                .build(SpanBytesEncoder.JSON_V2);

        Tracing tracing = Tracing.newBuilder()
                .localServiceName("tracer-demo")
                .spanReporter(asyncReporter)
                .propagationFactory(ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "user-name"))
                .currentTraceContext(ThreadContextCurrentTraceContext.create())
                .build();

        Tracer tracer = tracing.tracer();
        Span span = tracer.newTrace().name("encode").start();
        try {
            doSomethingExpensive();
        } finally {
            span.finish();
        }


        Span twoPhase = tracer.newTrace().name("twoPhase").start();
        try {
            Span prepare = tracer.newChild(twoPhase.context()).name("prepare").start();
            try {
                prepare();
            } finally {
                prepare.finish();
            }
            Span commit = tracer.newChild(twoPhase.context()).name("commit").start();
            try {
                commit();
            } finally {
                commit.finish();
            }
        } finally {
            twoPhase.finish();
        }


        sleep(1000);

    }
    
    private static void doSomethingExpensive() {
        sleep(500);
    }

    private static void commit() {
        sleep(500);
    }

    private static void prepare() {
        sleep(500);
    }

    private static void sleep(long milliseconds) {
        try {
            TimeUnit.MILLISECONDS.sleep(milliseconds);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

启动Zipkin,然后运行TraceDemo,在Zipkin的UI界面中能查到两条跟踪信息

点击第一条跟踪信息,可以看到有一条Span(encode),耗时500ms左右
encode跟踪信息

本条跟踪信息对应的代码片段为:

Tracer tracer = tracing.tracer();
Span span = tracer.newTrace().name("encode").start();
try {
    doSomethingExpensive();
} finally {
    span.finish();
}

由Tracer创建一个新的Span,名为encode,然后调用start方法开始计时,之后运行一个比较耗时的方法doSomethingExpensive,最后调用finish方法结束计时,完成并记录一条跟踪信息。

这段代码实际上向Zipkin上报的数据为:

[
  {
    "traceId": "16661f6cb5d58903",
    "id": "16661f6cb5d58903",
    "name": "encode",
    "timestamp": 1510043590522358,
    "duration": 499867,
    "binaryAnnotations": [
      {
        "key": "lc",
        "value": "",
        "endpoint": {
          "serviceName": "tracer-demo",
          "ipv4": "192.168.99.1"
        }
      }
    ]
  }
]

然后我们再来看第二条稍微复杂的跟踪信息,可以看到一条名为twoPhase的Span,总耗时为1000ms,它有2个子Span,分别名为prepare