okhttp3源码解析(9)-Okio
前言
上两篇文章写了下okhttp3的缓存功用,里边用到了许多okio的内容,在网络请求的部分也用到了许多,我觉得仍是有必要研讨下的。一开端看okhttp3源码的时分,我也不知道怎样下手,后边发现不如直接就从最简略的运用开端,看到什么研讨什么。下面的okio我也计划这么做。
其实我再写文章前找了些材料看了下,现已有作者写了很好的文章了,读者能够先看下他人的,图文并茂。
Android IO 结构 Okio 的完成原理,究竟哪里 OK? Android IO 结构 Okio 的完成原理,怎么检测超时? 值得一用的IO神器Okio
我这就只能说是自己看源码的了解了,比较僵硬,比较无聊,仅供参考。
ps. 这篇文章时间跨度有点长了,中心慢慢写得,有点当地好啰嗦,有的当地又有点急了,写得也特别特别长,原本应该分好几篇文章的,建议按目录按需看。
Okio运用事例
和okhttp3一开端研讨相同,咱们从Okio的用法开端,我从okhttp的源码里边找了一些比如,能够先看下:
// DiskLruCache
synchronized void rebuildJournal() throws IOException {
// ...
BufferedSink writer = Okio.buffer(fileSystem.sink(journalFileTmp));
try {
writer.writeUtf8(MAGIC).writeByte('\n');
writer.writeUtf8(VERSION_1).writeByte('\n');
writer.writeDecimalLong(appVersion).writeByte('\n');
writer.writeDecimalLong(valueCount).writeByte('\n');
writer.writeByte('\n');
for (Entry entry : lruEntries.values()) {
if (entry.currentEditor != null) {
writer.writeUtf8(DIRTY).writeByte(' ');
writer.writeUtf8(entry.key);
writer.writeByte('\n');
} else {
writer.writeUtf8(CLEAN).writeByte(' ');
writer.writeUtf8(entry.key);
entry.writeLengths(writer);
writer.writeByte('\n');
}
}
} finally {
writer.close();
}
// ...
}
// Http1Codec
private class UnknownLengthSource extends AbstractSource {
private boolean inputExhausted;
// ...
@Override public long read(Buffer sink, long byteCount)
throws IOException {
if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
if (closed) throw new IllegalStateException("closed");
if (inputExhausted) return -1;
long read = super.read(sink, byteCount);
if (read == -1) {
inputExhausted = true;
endOfInput(true, null);
return -1;
}
return read;
}
// ...
}
找了两个比如,一个读一个写,别离来自DiskLruCache和Http1Codec,一个是对IO文件的处理,一个是对socket流的处理,okio首要也是用在这两部分。这儿的比如看起来还不是很明显,咱们再来看下okio简略的运用:
// OKio写文件
private static void writeFileByOKio() {
try (Sink sink = Okio.sink(new File(path));
BufferedSink bufferedSink = Okio.buffer(sink)) {
bufferedSink.writeUtf8("write" + "\n" + "success!");
} catch (IOException e) {
e.printStackTrace();
}
}
//OKio读文件
private static void readFileByOKio() {
try (Source source = Okio.source(new File(path));
BufferedSource bufferedSource = Okio.buffer(source)) {
for (String line; (line = bufferedSource.readUtf8Line()) != null; ) {
System.out.println(line);
}
} catch (IOException e) {
e.printStackTrace();
}
}
关于Java的IO来说,这儿看起来仍是比较简洁的,Java要读取个文件要一层一层的封装Stream,相似下面代码:
fileStream = new FileInputStream(path);
binStream = new BufferedInputStream(fileStream);
dataInputStream = new DataInputStream(binStream);
dataInputStream.readInt();
实践上便是BufferedSource和BufferedSink两个,把大部分活都干了,而Java的要将BufferedInputStream/BufferedOutputStream转成专门处理的类去干活,各有各的优点吧,可是关于咱们搬砖的来说okio必定用起来简略,这点就够了。
Okio类剖析
从上面比如能够看出,Okio类便是okio的一个入口,用来供给source和sink,能够看下它的源码,全是static的办法,大致归类下就三种:
- 生成sink
- 生成source
- 生成buffer
生成sink和source有几种办法: file、path、socket、inputStream\outputStream,其实终究都是通过stream的办法创建的:
private static Sink sink(final OutputStream out, final Timeout timeout) {
// ...
return new Sink() {
@Override public void write(Buffer source, long byteCount) throws IOException {
// 检查巨细参数是否正确
checkOffsetAndCount(source.size, 0, byteCount);
while (byteCount > 0) {
// 同步超时检测
timeout.throwIfReached();
// segment是缓存的最小单位,运用双向链表保存,能够同享
Segment head = source.head;
// head.limit按注释解释,是其时segment下一个可读取字节的方位->所以相减得到长度
int toCopy = (int) Math.min(byteCount, head.limit - head.pos);
out.write(head.data, head.pos, toCopy);
// 这个head都会pop了还有必要加上读取字节数吗?也或许没有pop啊!得加上
head.pos += toCopy;
byteCount -= toCopy;
source.size -= toCopy;
// 意思便是head这个segment读取完了吧
if (head.pos == head.limit) {
source.head = head.pop();
// SegmentPool会暂时贮存读取完的Segment,当然呗同享出去的不会进入SegmentPool
SegmentPool.recycle(head);
}
}
}
// 暂时忽略其他办法
};
}
private static Source source(final InputStream in, final Timeout timeout) {
// ...
return new Source() {
@Override public long read(Buffer sink, long byteCount) throws IOException {
if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
if (byteCount == 0) return 0;
try {
timeout.throwIfReached();
// 没有则从SegmentPool拿,又就通过双向链表移到终究(head的pre便是终究)
Segment tail = sink.writableSegment(1);
// Segment缓存种剩余能写入的最大空间,tail.limit下一个能写入的方位
int maxToCopy = (int) Math.min(byteCount, Segment.SIZE - tail.limit);
// 把数据读取到tail里边,tail.limit是offset
int bytesRead = in.read(tail.data, tail.limit, maxToCopy);
if (bytesRead == -1) return -1;
// 这个是Segment下一个能写入的方位
tail.limit += bytesRead;
// 这个是sink的总巨细
sink.size += bytesRead;
return bytesRead;
} catch (AssertionError e) {
if (isAndroidGetsocknameError(e)) throw new IOException(e);
throw e;
}
}
// 暂时忽略其他办法
};
}
能够看到sink和source办法都是自己创建了一个实例进行回来,首要重写了其间的read和write办法,这两个办法里边呈现了许多许多东西,这些呈现的东西便是咱们接下来要研讨的,首要有下面几个内容:
- Buffer,缓存
- timeout,超时检测
- Segment,分段缓存
- SegmentPool,缓存池
注释解释了一些,可是仍是得体系的研讨下,下面咱们开端研讨。
Okio承继结构
在解析Okio的详细内容前,我觉得仍是有必要先看下它的承继结构,这儿拿他人的图看下,如有侵权请联络啊!
这是Java IO的结构:
这是Okio的结构:
这儿掉了一个Buffer类的,Buffer承继了BufferedSource和BufferedSink接口,也便是说Buffer既是sink又是source,当然它也有好多自己的功用。
根据上面的图,首要功用就在BufferedSource和BufferedSink里边,它们的详细完成则是RealBufferedSource和RealBufferedSink,关于Segment和SegmentPool的内容上面也没有,下面独立讲解。
下面内容想了挺久,怎样去安排这样一篇博文,怎样更顺畅去看源码,终究我决定由小及大的办法来讲,先从最小的segment开端,到BufferedSource和BufferedSink,终究到buffer,下面看内容。
Segment
Segment大致便是一个数据类,是双向链表的一个节点,供给了一些操作segment的的办法,下面咱们先看数据域:
/** The size of all segments in bytes. */
static final int SIZE = 8192;
/** Segments will be shared when doing so avoids {@code arraycopy()} of this many bytes. */
static final int SHARE_MINIMUM = 1024;
final byte[] data;
/** The next byte of application data byte to read in this segment. */
int pos;
/** The first byte of available data ready to be written to. */
int limit;
/** True if other segments or byte strings use the same byte array. */
boolean shared;
/** True if this segment owns the byte array and can append to it, extending {@code limit}. */
boolean owner;
/** Next segment in a linked or circularly-linked list. */
Segment next;
/** Previous segment in a circularly-linked list. */
Segment prev;
其实注释写的十分清楚了,SIZE是segment的最大容量,SHARE_MINIMUM是同享的数组最短长度(即一个segment能被分红八份),data表明数据,pos和limit用来符号数据部分区间,shared符号是否被同享,owner表明这个segment是否拥有这个data数组(拥有才能够写、追加,被同享的不能写),pre和next是双向指针。
数据域看完了,大致便是一个数据类,下面看下功用。
同享
Segment供给了同享的操作,有两种,一种是同享data数组的同享,一种是深度拷贝,同享出去的对象彻底是一个克隆体:
final Segment sharedCopy() {
shared = true;
return new Segment(data, pos, limit, true, false);
}
final Segment unsharedCopy() {
return new Segment(data.clone(), pos, limit, false, true);
}
看到这儿,应该就能了解shared和owner两个标志了吧,榜首个函数总data是数组,不会进行赋值,而是会被新的segment持有引证,深度拷贝同享的shared为false、owner为true,就表明它是一个全新的segment仅仅数据内容和仿制的相同。
segment操作办法
剩余的办法便是对segment的操作了,有点相似stack,这儿完成了pop和push办法来对链表操作,留意下这些操作是头插法。
spilt办法很有意思,咱们这儿看下,也有利于了解pop和push:
public final Segment split(int byteCount) {
if (byteCount <= 0 || byteCount > limit - pos) throw new IllegalArgumentException();
Segment prefix;
// 分了两种状况,一个是直接同享了,一个是创建个新的segment
if (byteCount >= SHARE_MINIMUM) {
prefix = sharedCopy();
} else {
prefix = SegmentPool.take();
// 将pos开端的前byteCount个数据仿制到新segment
System.arraycopy(data, pos, prefix.data, 0, byteCount);
}
// 确认新segment地尾端
prefix.limit = prefix.pos + byteCount;
// 把其时segment越过被split的byteCount个字节,即按pos读的时分越过一部分
pos += byteCount;
// 头插法,新的segment放入链表头部
prev.push(prefix);
return prefix;
}
SegmentPool
都提到Segment了,SegmentPool当然得提下了,这个类很短,就六十多行代码,它唯一的意图便是把用不到的segment运用起来,避免重复创建对象带来的损耗,类是线程安全的。
A collection of unused segments, necessary to avoid GC churn and zero-fill. This pool is a thread-safe static singleton.
SegmentPool就两个办法: take和recycle,三个变量,MAX_SIZE界说了SegmentPool的最大容量(64 * 1024 = 8 * 8192,即默许八个segment),next变量指向了下一个能够运用的segment,byteCount变量记载了这个SegmentPool实践运用的字节数。
take办法
咱们先来看下take办法,用来取的一个segment以供运用:
static Segment take() {
synchronized (SegmentPool.class) {
if (next != null) {
Segment result = next;
next = result.next;
result.next = null;
byteCount -= Segment.SIZE;
return result;
}
}
return new Segment(); // Pool is empty. Don't zero-fill while holding a lock.
}
这儿便是链表操作,把next取出来,next的next作为新的next保存到SegmentPool中。假如next为空,就创建一个新的Segment,这儿写在同步代码外,还加了句注释,我看得不是很懂,意思是在next为空的时分,recycle的同步代码不要往里边push值吗?
recycle办法
static void recycle(Segment segment) {
// 不能乱了SegmentPool的链表
if (segment.next != null || segment.prev != null) throw new IllegalArgumentException();
// segment被同享,其间的data还在运用,不能复用,也不会被GC收回,所以不必管
if (segment.shared) return; // This segment cannot be recycled.
synchronized (SegmentPool.class) {
if (byteCount + Segment.SIZE > MAX_SIZE) return; // Pool is full.
// 这儿按segment最大数量算,所以SegmentPool中的segment个数是整数
byteCount += Segment.SIZE;
segment.next = next;
segment.pos = segment.limit = 0;
next = segment;
}
}
recycle办法也很短,首要便是做了一些判别,就把新放进来的segment放到了链表头上。
BufferedSource和BufferedSink
上面提到了Buffer类承继了BufferedSource和BufferedSink,所以在讲Buffer之前,咱们先来看下这两个办法。从上面的承继结构咱们知道了,BufferedSource和BufferedSink承继自Sink和Source,但其功用由RealBufferedSource和RealBufferedSink完成。下面咱们一层一层看。
Sink和Source接口
Sink和Source便是两个最简略的接口,下面看下源码:
public interface Sink extends Closeable, Flushable {
void write(Buffer source, long byteCount) throws IOException;
@Override void flush() throws IOException;
Timeout timeout();
@Override void close() throws IOException;
}
public interface Source extends Closeable {
long read(Buffer sink, long byteCount) throws IOException;
Timeout timeout();
@Override void close() throws IOException;
}
删掉了注释,实践上Sink和Source便是读写加上了timeout的超时操控和Closeable的符号,Sink由于要输出加上了Flushable接口。
BufferedSource和BufferedSink接口
BufferedSource和BufferedSink都是抽象接口,仅仅界说了一些办法,两者大体相似,也有不同,下面先来看BufferedSink。
BufferedSink接口
BufferedSink办法特别多,咱们这归类下,首要有四种:
- Sink接口的flush办法
- getter办法: buffer、outputStream
- 一系列的writeXXX办法,取得BufferedSink
- 两个emit办法,也回来BufferedSink
flush办法、buffer办法、outputStream办法都比较好了解,BufferedSink接口的首要功用就在这些writeXXX办法上,下面大致搜集下write能够写什么:
- ByteString、byte[]、Source、String(Utf8)、codePoint(UTF-8)、
- String(Charset)、Byte、Short(Big-Endian)、ShortLe(Little-Endian低位在低地址端)、
- Int、IntLe、Long、LongLe、DecimalLong(保存为十进制字符串)、HexadecimalUnsignedLong(保存为十六进制字符串)
大致便是根本的类型BufferedSink都能够直接写入,这儿就不翻开讲了,毕竟功用在RealBufferedSink中完成的,再来看下两个emit办法。实践上这办法的注释写的十分清楚:
BufferedSink b0 = new Buffer();
BufferedSink b1 = Okio.buffer(b0);
BufferedSink b2 = Okio.buffer(b1);
b2.writeUtf8("hello");
assertEquals(5, b2.buffer().size());
assertEquals(0, b1.buffer().size());
assertEquals(0, b0.buffer().size());
b2.emit();
assertEquals(0, b2.buffer().size());
assertEquals(5, b1.buffer().size());
assertEquals(0, b0.buffer().size());
b1.emit();
assertEquals(0, b2.buffer().size());
assertEquals(0, b1.buffer().size());
assertEquals(5, b0.buffer().size());
这儿说什么都不如看比如来的多,关于emitCompleteSegments也有注释,可是如同更难了解一些:
/**
* Writes complete segments to the underlying sink, if one exists. Like {@link #flush}, but
* weaker. Use this to limit the memory held in the buffer to a single segment. Typically
* application code will not need to call this: it is only necessary when application code writes
* directly to this {@linkplain #buffer() sink's buffer}. <pre>{@code
*/
BufferedSink b0 = new Buffer();
BufferedSink b1 = Okio.buffer(b0);
BufferedSink b2 = Okio.buffer(b1);
b2.buffer().write(new byte[20_000]);
assertEquals(20_000, b2.buffer().size());
assertEquals( 0, b1.buffer().size());
assertEquals( 0, b0.buffer().size());
b2.emitCompleteSegments();
assertEquals( 3_616, b2.buffer().size());
assertEquals( 0, b1.buffer().size());
assertEquals(16_384, b0.buffer().size()); // This example assumes 8192 byte segments.
结合英语了解下,大致意思便是这个emit是彻底提交,无论封装多少层,终究都是提交到原始的Buffer上,可是这个提交是看segment的,只有完整的segment才会被emit,及抵达SIZE(8192)的,没抵达的不提交。而且这个根本是RealBufferedSink中的writeXXX就处理了,用户除非操作了原始的buffer,否则是用不着的。
BufferedSource接口
BufferedSource接口和BufferedSink接口相似,没了flush办法,也多了几种办法:
- 判别读取完的exhausted办法
- getter办法: buffer、inputStream
- 一系列的readXXX办法,取得数据
- 用来越过的skip办法
- 用来判别要读取数据是否超出长度的require和request办法
- 用来判别一行前缀的select办法
- 一系列的indexOf办法
- 用来比较规模内持平的rangeEquals办法
exhausted办法便是source内没数据了,require和request办法都是判别取的数据是否超长,request会报反常,request则是回来false,readXXX办法和上面的writeXXX相似,skip办法便是越过,没什么好讲的,这儿就讲下后边三种办法。
关于select办法,也是看注释:
/**
* Finds the first string in {@code options} that is a prefix of this buffer, consumes it from
* this buffer, and returns its index. If no byte string in {@code options} is a prefix of this
* buffer this returns -1 and no bytes are consumed.
*/
Options FIELDS = Options.of(
ByteString.encodeUtf8("depth="),
ByteString.encodeUtf8("height="),
ByteString.encodeUtf8("width="));
Buffer buffer = new Buffer()
.writeUtf8("width=640\n")
.writeUtf8("height=480\n");
assertEquals(2, buffer.select(FIELDS));
assertEquals(640, buffer.readDecimalLong());
assertEquals('\n', buffer.readByte());
assertEquals(1, buffer.select(FIELDS));
assertEquals(480, buffer.readDecimalLong());
assertEquals('\n', buffer.readByte());
这儿最好看下英文注释,一开端我以为是用来筛选的,后边发现了解错了,这个是对prefix的一种识别,回来界说在options中的index,options里边没有就回来-1。
indexOf办法也没什么好说的,便是对byte、ByteString、Element(多个匹配字符任选)的匹配,支撑fromIndex和toIndex之类的。
rangeEquals办法和indexOf相似,都是在规模内判别是否持平,不过rangeEquals办法回来的是true or false。
RealBufferedSource和RealBufferedSink
尽管说RealBufferedSource和RealBufferedSink终究完成了BufferedSource和BufferedSink接口,可是翻开源码,咱们却能够发现里边的操作根本都是交给了Buffer去处理的,Buffer咱们下一节看,这儿咱们就来看下这两个类加了Real之后有什么不同。
RealBufferedSource
这儿先来看下数据域和构造函数:
public final Buffer buffer = new Buffer();
public final Source source;
boolean closed;
RealBufferedSource(Source source) {
if (source == null) throw new NullPointerException("source == null");
this.source = source;
}
RealBufferedSource内保存了外部传来的source,buffer是直接new出来的,closed变量是通过调用close办法操控的,当其被符号为true的时分就不能运用了:
@Override public void close() throws IOException {
if (closed) return;
closed = true;
source.close();
buffer.clear();
}
@Override public boolean isOpen() {
return !closed;
}
这儿的close办法是从Channel接口过来的,Channel还有个办法isOpen,其实Channel这个接口是从BufferedSource来的:
RealBufferedSource -> BufferedSource -> ReadableByteChannel -> Channel
这儿有个ReadableByteChannel办法,前面讲BufferedSource时,BufferedSource并没有完成它(三个办法),所以越过了,现在咱们持续看下ReadableByteChannel接口:
public interface ReadableByteChannel extends Channel {
public int read(ByteBuffer dst) throws IOException;
}
比BufferedSource许多readXXX多了一个read办法,而且这个是从ByteBuffer读取的,这个ByteBuffer很长,咱们只需知道它是一个Buffer就行了。
public abstract class ByteBuffer extends Buffer implements Comparable<ByteBuffer> { ... }
看完从ReadableByteChannel和Channel来的办法,RealBufferedSource就剩余来自Source和BufferedSource接口的办法了,并没有其自己的额外办法。
下面咱们挑个办法讲下,首要仍是得看Buffer的内容:
@Override public long read(Buffer sink, long byteCount) throws IOException {
if (sink == null) throw new IllegalArgumentException("sink == null");
if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
if (closed) throw new IllegalStateException("closed");
// buffer没数据,向buffet里边读取一个segment的长度
if (buffer.size == 0) {
long read = source.read(buffer, Segment.SIZE);
if (read == -1) return -1;
}
// 考虑下byteCount > buffer.size的时分,按buffer.size读,所以read在外部调用的时分需求调用多次(套循环)
long toRead = Math.min(byteCount, buffer.size);
// 从buffer读取到sink中去
return buffer.read(sink, toRead);
}
这个是承继自Source的办法,其他的也大差不差,都是先判别下,然后交给buffer去做,读者能够看下源码,我这就不写了。
RealBufferedSink
RealBufferedSink和上面介绍的RealBufferedSource根本一模相同,仅仅source换成了sink,终究也是通过Buffer去完成的,这儿就看下几个不相同的办法。
emitCompleteSegments办法在BufferedSink中提到过,意思是提交写满了数据的segment到最底层的sink,它会在每个writeXXX的终究调用,这儿看下它的详细完成:
@Override public BufferedSink writeHexadecimalUnsignedLong(long v) throws IOException {
if (closed) throw new IllegalStateException("closed");
buffer.writeHexadecimalUnsignedLong(v);
// writeXXX的终究都会调用
return emitCompleteSegments();
}
@Override public BufferedSink emitCompleteSegments() throws IOException {
if (closed) throw new IllegalStateException("closed");
// 得到的是segment最大值的整数倍
long byteCount = buffer.completeSegmentByteCount();
// 所以这儿写入到sink的是写满的segment
if (byteCount > 0) sink.write(buffer, byteCount);
return this;
}
flush办法是RealBufferedSource中没有的,咱们这儿也看下:
@Override public void flush() throws IOException {
if (closed) throw new IllegalStateException("closed");
if (buffer.size > 0) {
// 悉数写入到sink
sink.write(buffer, buffer.size);
}
// 改写
sink.flush();
}
close办法也有改变,留意下:
@Override public void close() throws IOException {
if (closed) return;
// Emit buffered data to the underlying sink. If this fails, we still need
// to close the sink; otherwise we risk leaking resources.
Throwable thrown = null;
try {
// 先把缓存数据写入再关闭,出错了会丢失数据也没办法
if (buffer.size > 0) {
sink.write(buffer, buffer.size);
}
} catch (Throwable e) {
thrown = e;
}
try {
sink.close();
} catch (Throwable e) {
if (thrown == null) thrown = e;
}
closed = true;
// 先试试关闭后,再把两个会反常的当地的反常抛出
if (thrown != null) Util.sneakyRethrow(thrown);
}
Buffer
讲了那么多总算到Buffer这个最中心的类了!提到buffer,为什么要用buffer,这个就涉及到计算机的操作体系了,拜访磁盘和网卡等 IO 操作需求通过体系调用来履行,频频调用操作体系会带来上下文切换的功用损耗。以读取为例,一次性读取一些数据并保存起来(buffer),在需求的时分从buffer里读取,这样就能下降功用损耗。 当然,这是我总结的话,或许不太准,大致意思便是这样。
其实在Java里边也用到了buffer,也便是BufferedInputStream和BufferedOutputStream,里边是通过一个byte数组完成的,Okio的buffer则不相同,里边通过Segment双向链表完成buffer,并且buffer还支撑同享(上面提到了,能够同享segment中的data数组),同享能够减少双流仿制时不必要的开销。
Buffer的结构
Buffer有两千多行,一个一个办法去讲有点不现实,和上面相同,咱们先看Buffer承继的办法,再来看它自己的办法,并对它自己的办法分类,这样就能够把Buffer看成如下结构:
- Source
- Sink
- Channel
- ReadableByteChannel
- WritableByteChannel
- BufferedSource
- BufferedSink
- 内部类: UnsafeCursor,及四个获取UnsafeCursor的办法
- 加密: md5、sha1等
- snapshot: 取得缓存数据的ByteString
- copy、write、read、getByte(pos)、clear、skip、rangeEquals,对缓存操作
- 获取参数: size、completeSegmentByteCount(segment剩余空间)、segmentSizes
- selectPrefix,判别前缀格局(配合options取得index)
- writableSegment(minimumCapacity),获取能够够容量的segment
看着有允许大,简略的就不说了,咱们下面慢慢说。
Source、Sink、三个Channel接口
先来看前五个接口的,这几个接口简略,办法数比较少,先来看下Channel接口的完成:
@Override public boolean isOpen() { return true; }
@Override public void close() {}
就这?没错,这两个办法是空的,而ReadableByteChannel和WritableByteChannel就有点东西了:
// ReadableByteChannel
@Override public int read(ByteBuffer sink) throws IOException {
Segment s = head;
if (s == null) return -1;
// 最大读入其时segment能剩余的字节数
int toCopy = Math.min(sink.remaining(), s.limit - s.pos);
sink.put(s.data, s.pos, toCopy);
s.pos += toCopy;
size -= toCopy; // size是Buffer剩余的size
// 其时segment读取完,pop并存到SegmentPool去
if (s.pos == s.limit) {
head = s.pop();
SegmentPool.recycle(s);
}
return toCopy;
}
// WritableByteChannel
@Override public int write(ByteBuffer source) throws IOException {
if (source == null) throw new IllegalArgumentException("source == null");
int byteCount = source.remaining();
int remaining = byteCount;
while (remaining > 0) {
// 从SegmentPool取一个能够写入的segment,至少能写一个字节
Segment tail = writableSegment(1);
// 最多能够写入segment剩余的字节数
int toCopy = Math.min(remaining, Segment.SIZE - tail.limit);
source.get(tail.data, tail.limit, toCopy);
remaining -= toCopy;
tail.limit += toCopy; // limit便是下一个能够写入的方位
}
size += byteCount;
return byteCount;
}
不是很杂乱,首要就在sink.put(..)和source.get(…)办法,这个是ByteBuffer的功用,咱们后边再巧,不过也好了解。再来看下Source和Sink的详细完成:
// Source
@Override public long read(Buffer sink, long byteCount) {
if (sink == null) throw new IllegalArgumentException("sink == null");
if (byteCount < 0) throw new IllegalArgumentException("byteCount < 0: " + byteCount);
if (size == 0) return -1L;
if (byteCount > size) byteCount = size;
// 便是从外来的Buffer写入到本Buffer
sink.write(this, byteCount);
return byteCount;
}
// Sink
@Override public void write(Buffer source, long byteCount) {
// 很长一段注释。。。goals: don't waste CPU and don't waste memory.
if (source == null) throw new IllegalArgumentException("source == null");
if (source == this) throw new IllegalArgumentException("source == this");
checkOffsetAndCount(source.size, 0, byteCount);
while (byteCount > 0) {
// 这时分数据只在source其时segment中
// 假如其时buffer终究一个segment能包容这些数据,那就没有必要移动去移动segment了
// 假如其时buffer终究一个segment包容不了或许不能修正里边data,那仍是得拼接曩昔
// 而且由于source其时segment中或许还有不需求读的数据,要把这个segment按byteCount一分为二,前面部分拼接曩昔
// Is a prefix of the source's head segment all that we need to move?
if (byteCount < (source.head.limit - source.head.pos)) {
// 双向链表,head的prev便是tail啊,写入要在尾部追加
Segment tail = head != null ? head.prev : null;
// tail(segment)是其data一切者,即能够修正data数据
if (tail != null && tail.owner
// 剩余空间加上byteCount后不超过segment最大长度,limit表明下一个可写方位,pos表明下一个可读方位
// 假如其时segment前面部分share出去了,前面部分都不能写入(0 - limit),没被share是不是或许segment前后都有空余空间?
&& (byteCount + tail.limit - (tail.shared ? 0 : tail.pos) <= Segment.SIZE)) {
// Our existing segments are sufficient. Move bytes from source's head to our tail.
source.head.writeTo(tail, (int) byteCount); // 充裕的,能够写入
source.size -= byteCount;
size += byteCount;
// byteCount在source和tail其时segment都允许的状况下,仿制一下就行了,这儿就over了
return;
} else {
// 留意这儿是在byteCount小于source其时segment巨细的if里边,这儿把source.head分红两个segment
// 榜首部分长byteCount,下面还在while中会进行处理
// We're going to need another segment. Split the source's head
// segment in two, then move the first of those two to this buffer.
source.head = source.head.split((int) byteCount);
}
}
// segment功率中心来了!直接把source的segment拼接到其时Buffer链表终究就行了
// Remove the source's head segment and append it to our tail.
Segment segmentToMove = source.head;
long movedByteCount = segmentToMove.limit - segmentToMove.pos;
// 从source中移除改segment
source.head = segmentToMove.pop();
if (head == null) {
// head为空的时分直接作为链表头,首尾指针指向自己(构成闭环,tail和head都是自己)
head = segmentToMove;
head.next = head.prev = head;
} else {
// 在终究拼接就行
Segment tail = head.prev;
tail = tail.push(segmentToMove);
// 消除一下拼接后中心的冗余,由于拼接后tail前一个的空余容量或许比tail的巨细更大,这时分tail是不必要的,能够被收回
tail.compact();
}
// 处理下巨细记载,byteCount被修正会影响while循环
source.size -= movedByteCount;
size += movedByteCount;
byteCount -= movedByteCount;
}
}
// 默许不超时检测
@Override public Timeout timeout() { return Timeout.NONE; }
Source的read办法很简略,Sink的timeout办法默许也是回来一个不超时检测的Timeout,也就Sink的write办法杂乱些,作者还在前面加了很长的注释(大致便是让别糟蹋内存、CPU、合理运用segment之类的),上面注释写的很清楚了,不过我仍是想再流通的简述下,便于了解。
当咱们从一个buffer往另一个buffer里边写的时分,由于segment的存在,咱们只需把segment从source的链表移动到sink的链表中去就行了,可是这要考虑下特殊状况,榜首个是移动曩昔的segment或许能够与前一个segment合并,需求做下合并操作; 第二个便是byteCount比较小,直接就能仿制到sink终究一个segment里边去; 第三个便是尽管byteCount比较小,能够仿制曩昔,可是sink终究一个segment没有修正data数组的能力(!owner),仿制不曩昔,只能拼接曩昔; 第四个便是source的终究一个segment或许包括两部分数据,一部分要读取的,一部分不需求读取的,这时分就要将这个segment一分为二,把要读取的那部分移动曩昔就行了;
看完这段代码,真便是牛逼!这个便是okio高效的中心吧。
BufferedSource和BufferedSink的完成
上面有一节咱们专门讲到了BufferedSource和BufferedSink,介绍了里边办法的功用,不过终究详细完成都是Buffer完成的,下面挑几个我觉得有意思的办法完成看下。
BufferedSource接口完成
readInt办法
从readXXX根本类型里边挑一个Int看一下,我觉得仍是有代表性的:
@Override public int readInt() {
// int型四字节
if (size < 4) throw new IllegalStateException("size < 4: " + size);
Segment segment = head;
int pos = segment.pos;
int limit = segment.limit;
// If the int is split across multiple segments, delegate to readByte().
if (limit - pos < 4) {
// 不行四字节,降级分红四个一字节的byte读取,通过移位合成4byte的int,readByte里边会处理segment的切换
return (readByte() & 0xff) << 24
| (readByte() & 0xff) << 16
| (readByte() & 0xff) << 8
| (readByte() & 0xff);
}
// 从其时segment读取四个字节
byte[] data = segment.data;
int i = (data[pos++] & 0xff) << 24
| (data[pos++] & 0xff) << 16
| (data[pos++] & 0xff) << 8
| (data[pos++] & 0xff);
size -= 4;
if (pos == limit) {
// 刚好读完一个segment
head = segment.pop();
SegmentPool.recycle(segment);
} else {
segment.pos = pos;
}
return i;
}
// 高低位反向,在嵌入式中仍是挺常见的吧
@Override public int readIntLe() {
return Util.reverseBytesInt(readInt());
}
其实仍是挺简略的,特别是和上面Sink的write办法比起来,不过仍是要留意下segment的边界状况。
readDecimalLong办法
readDecimalLong办法是一个很有意思的办法,前面咱们讲过了,会把数按十进制的字符串读取,下面看下完成:
@Override public long readDecimalLong() {
if (size == 0) throw new IllegalStateException("size == 0");
// This value is always built negatively in order to accommodate Long.MIN_VALUE.
long value = 0; // 值,一致按负数算,由于Long.MIN_VALUE绝对值大一
int seen = 0; // 其时读了几位
boolean negative = false; // 负数
boolean done = false;
// long 的规模 -2,147,483,648 到 2,147,483,647
long overflowZone = Long.MIN_VALUE / 10; // 加终究一位前假如溢出了,就不需求持续下去了
long overflowDigit = (Long.MIN_VALUE % 10) + 1; // 单个字符对应溢出的数字,即终究一位不能大于该数
do {
Segment segment = head;
byte[] data = segment.data;
int pos = segment.pos;
int limit = segment.limit;
for (; pos < limit; pos++, seen++) {
byte b = data[pos];
if (b >= '0' && b <= '9') {
// 规划成取负数
int digit = '0' - b;
// 由于是负数,所以更小便是溢出了,溢出后拿了个新Buffer拼出字符串抛出反常,假如有负号,越过负号对应字符
// Detect when the digit would cause an overflow.
if (value < overflowZone || value == overflowZone && digit < overflowDigit) {
Buffer buffer = new Buffer().writeDecimalLong(value).writeByte(b);
if (!negative) buffer.readByte(); // Skip negative sign.
throw new NumberFormatException("Number too large: " + buffer.readUtf8());
}
value *= 10;
value += digit;
} else if (b == '-' && seen == 0) {
// 榜首个方位或许是负号,long负数的规模比正数规模大一
negative = true;
overflowDigit -= 1;
} else {
// 第0个方位呈现了其他字符,直接便是反常
if (seen == 0) {
throw new NumberFormatException(
"Expected leading [0-9] or '-' character but was 0x" + Integer.toHexString(b));
}
// Set a flag to stop iteration. We still need to run through segment updating below.
done = true;
break;
}
}
if (pos == limit) {
head = segment.pop();
SegmentPool.recycle(segment);
} else {
// 读完一位
segment.pos = pos;
}
} while (!done && head != null);
// 减去读取的长度,根据负号符号对存在负数里的成果处理
size -= seen;
return negative ? value : -value;
}
这儿便是一个从字符串里读取long型数据的功用,里边验证数据规模的逻辑仍是挺有意思的,这个负数和减一用的很奇妙。
read(byte[]…)办法
上面讲到了从ReadableByteChannel承继来的read(ByteBuffer)办法,可是关于Buffer里边大部分的read办法其实都是从下面办法得到的成果:
@Override public int read(byte[] sink, int offset, int byteCount) {
checkOffsetAndCount(sink.length, offset, byteCount);
Segment s = head;
if (s == null) return -1;
// 得到head能够仿制的容量
int toCopy = Math.min(byteCount, s.limit - s.pos);
System.arraycopy(s.data, s.pos, sink, offset, toCopy);
s.pos += toCopy;
size -= toCopy;
if (s.pos == s.limit) {
head = s.pop();
SegmentPool.recycle(s);
}
return toCopy;
}
indexOf办法
查找一般很考验算法,okio这儿正好有一个,那仍是得好好研讨下。
@Override public long indexOf(ByteString bytes, long fromIndex) throws IOException {
if (bytes.size() == 0) throw new IllegalArgumentException("bytes is empty");
if (fromIndex < 0) throw new IllegalArgumentException("fromIndex < 0");
Segment s;
long offset;
// TODO(jwilson): extract this to a shared helper method when can do so without allocating.
findSegmentAndOffset: {
// Pick the first segment to scan. This is the first segment with offset <= fromIndex.
s = head;
if (s == null) {
// No segments to scan!
return -1L;
// 榜首步,先找到对应fromIndex的segment,二分法,哪边更小从哪边操作
} else if (size - fromIndex < fromIndex) {
// We're scanning in the back half of this buffer. Find the segment starting at the back.
offset = size;
// 从终究一个segment开端算,offset从size开端减去这个segment的长度,假如offset更小了,s便是fromIndex对应的segment
while (offset > fromIndex) {
s = s.prev;
offset -= (s.limit - s.pos);
}
} else {
// We're scanning in the front half of this buffer. Find the segment starting at the front.
offset = 0L;
// 和前面相似,nextOffset每次加上一个segment长度,假如加上后比formIndex更大了,那s便是formIndex对应的segment
for (long nextOffset; (nextOffset = offset + (s.limit - s.pos)) < fromIndex; ) {
s = s.next;
offset = nextOffset;
}
}
}
// 第二步,从对应的segment中寻觅
// Scan through the segments, searching for the lead byte. Each time that is found, delegate to
// rangeEquals() to check for a complete match.
byte b0 = bytes.getByte(0);
int bytesSize = bytes.size();
// 被查数据榜首个字符所在方位的最大值,offset是其时segment榜首个方位(指在Buffer中)
long resultLimit = size - bytesSize + 1;
// 榜首层循环意图是让查找能够通过多个segment
while (offset < resultLimit) {
// Scan through the current segment.
byte[] data = s.data;
// 在该segment中的限制,第二个参数指在该segment中序号,那就不能大于segment长度
int segmentLimit = (int) Math.min(s.limit, s.pos + resultLimit - offset);
// 换算到segment内对应的pos(fromIndex - offset是偏移量,s.pos是segment内数据开端方位)
for (int pos = (int) (s.pos + fromIndex - offset); pos < segmentLimit; pos++) {
// 假如榜首个字符匹配上了,就用rangeEquals去比对后边的数据,假如匹配上了就把在Buffer里边的方位回来回去
// rangeEquals内会跨越segment进行查找
if (data[pos] == b0 && rangeEquals(s, pos + 1, bytes, 1, bytesSize)) {
// offset + (pos - s.pos),offset是segment在Buffer的方位,加上数据在其时segment的便宜
return pos - s.pos + offset;
}
}
// Not in this segment. Try the next one.
offset += (s.limit - s.pos);
fromIndex = offset;
s = s.next;
}
return -1L;
}
这儿看得有允许晕了,首要得明确Buffer中的方位(offset、fromIndex)和在segment中的方位(pos、limit),首要根据fromIndex确认从哪个segment开端比较,确认后再运用segment中的方位进行比较榜首位,榜首位比较对上了,就调用rangeEquals进行下一步比较,它里边会支撑跨segment比较,成功了会回来true or false,大致便是这样。
rangeEquals办法
既然上面提到了rangeEquals办法,咱们这儿也看下,比较简略:
private boolean rangeEquals(
Segment segment, int segmentPos, ByteString bytes, int bytesOffset, int bytesLimit) {
// 其时segment终究方位
int segmentLimit = segment.limit;
byte[] data = segment.data;
// 比较到bytes终究一位就算成功了
for (int i = bytesOffset; i < bytesLimit; ) {
// 抵达segment终究,进入下一个segment比较
if (segmentPos == segmentLimit) {
segment = segment.next;
data = segment.data;
segmentPos = segment.pos;
segmentLimit = segment.limit;
}
if (data[segmentPos] != bytes.getByte(i)) {
return false;
}
segmentPos++;
i++;
}
return true;
}
BufferedSink接口完成
看完BufferedSource接口的完成,咱们再来看下BufferedSink接口的完成,和上面相同,咱们选几个办法看下。
write(ByteString)办法
前面BufferedSource接口中的readByteString比较简略,咱们直接越过了,可是BufferedSink接口的ByteString仍是有点东西的,值得一看:
@Override public Buffer write(ByteString byteString) {
if (byteString == null) throw new IllegalArgumentException("byteString == null");
byteString.write(this);
return this;
}
// ByteString内write办法
void write(Buffer buffer) {
buffer.write(data, 0, data.length);
}
// Buffer的办法
@Override public Buffer write(byte[] source, int offset, int byteCount) {
if (source == null) throw new IllegalArgumentException("source == null");
checkOffsetAndCount(source.length, offset, byteCount);
int limit = offset + byteCount;
while (offset < limit) {
Segment tail = writableSegment(1);
int toCopy = Math.min(limit - offset, Segment.SIZE - tail.limit);
System.arraycopy(source, offset, tail.data, tail.limit, toCopy);
offset += toCopy;
tail.limit += toCopy;
}
size += byteCount;
return this;
}
这儿ByteString居然反过来又调用Buffer的办法进行写入,这不是多此一举么?其实仍是有意义的,这儿ByteString的数据没有向外供给,而是通过外部对象把自己数据写到外部,这样更安全吧。
Buffer write(byte[] source, offset, byteCount)办法
上面提到了Buffer的write(byte[] source, offset, byteCount)办法,这儿也说一下,实践便是供给一个数组去写入,很实用。许多当地都是通过调用它完成功用的,比如RealBufferedSink中、outputStream办法中。
在okhttp源码中用了大量的write(byte[] source),其终究完成也是在这儿:
@Override public Buffer write(byte[] source) {
if (source == null) throw new IllegalArgumentException("source == null");
return write(source, 0, source.length);
}
writeUtf8和writeUtf8CodePoint办法
关于UTF-8的写入比较杂乱,这儿就用writeUtf8简略讲讲。我也是现找材料学的,比必定正确,可是能多学点东西总有优点:
@Override public Buffer writeUtf8(String string, int beginIndex, int endIndex) {
// 反常处理。。。忽略
// UTF-16任何字符对应的数字都用两个字节(65536)来保存,UTF-8有或许是用一个字节表明一个字符,也或许是两个,三个,最多四个
// UTF-8: 0xxxxxxx(1byte,128,ascii码),110xxxxx 10xxxxxx(2byte,2048),1110xxxx 10xxxxxx 10xxxxxx(3byte,65536)
// UTF-16没有标志位,容错性高; UTF-8常用于网络传输,UTF-16用于贮存
// Transcode a UTF-16 Java String to UTF-8 bytes. 将UTF-16转为UTF-8
for (int i = beginIndex; i < endIndex;) {
// charAt会按字符取,即c是两字节得UTF-16(65536)
int c = string.charAt(i);
// 转成1byte的UTF-8,即ASCII码
if (c < 0x80) {
Segment tail = writableSegment(1);
byte[] data = tail.data;
// 其时segment中其时字节得方位
int segmentOffset = tail.limit - i;
// 其时segment能包容得字符个数
int runLimit = Math.min(endIndex, Segment.SIZE - segmentOffset);
// Emit a 7-bit character with 1 byte.
data[segmentOffset + i++] = (byte) c; // 0xxxxxxx
// 试着持续当ASCII码写,提高功率
// Fast-path contiguous runs of ASCII characters. This is ugly, but yields a ~4x performance
// improvement over independent calls to writeByte().
while (i < runLimit) {
c = string.charAt(i);
if (c >= 0x80) break;
data[segmentOffset + i++] = (byte) c; // 0xxxxxxx
}
int runSize = i + segmentOffset - tail.limit; // Equivalent to i - (previous i).
tail.limit += runSize;
size += runSize;
// 两字节状况,[128, 2048),将c写入到两个byte中去
} else if (c < 0x800) {
// xxxx xxx(x xx)(xx xxxx)
// Emit a 11-bit character with 2 bytes.
writeByte(c >> 6 | 0xc0); // 110xxxxx
writeByte(c & 0x3f | 0x80); // 10xxxxxx
i++;
// 三字节状况
} else if (c < 0xd800 || c > 0xdfff) {
// (xxxx) (xxxx xx)(xx xxxx)
// Emit a 16-bit character with 3 bytes.
writeByte(c >> 12 | 0xe0); // 1110xxxx
writeByte(c >> 6 & 0x3f | 0x80); // 10xxxxxx
writeByte(c & 0x3f | 0x80); // 10xxxxxx
i++;
// 四个字节状况。
// 假如字符是署理对,则它被编码为四个字节的序列。署理对是一对Unicode字符,用于表明UTF-16编码中BMP之外的字符。
} else {
// c is a surrogate. Make sure it is a high surrogate & that its successor is a low
// surrogate. If not, the UTF-16 is invalid, in which case we emit a replacement character.
int low = i + 1 < endIndex ? string.charAt(i + 1) : 0;
if (c > 0xdbff || low < 0xdc00 || low > 0xdfff) {
writeByte('?');
i++;
continue;
}
// UTF-16 high surrogate: 110110xxxxxxxxxx (10 bits)
// UTF-16 low surrogate: 110111yyyyyyyyyy (10 bits)
// Unicode code point: 00010000000000000000 + xxxxxxxxxxyyyyyyyyyy (21 bits)
int codePoint = 0x010000 + ((c & ~0xd800) << 10 | low & ~0xdc00);
// Emit a 21-bit character with 4 bytes.
writeByte(codePoint >> 18 | 0xf0); // 11110xxx
writeByte(codePoint >> 12 & 0x3f | 0x80); // 10xxxxxx
writeByte(codePoint >> 6 & 0x3f | 0x80); // 10xxyyyy
writeByte(codePoint & 0x3f | 0x80); // 10yyyyyy
i += 2;
}
}
return this;
}
有点懵逼,UTF-8的格局找了找材料,三个字节和四个字节的不是很清楚,可是大致功用好了解,便是把UTF-16的字符转成UTF-8的字符,运用charAt能得到字符的int型,然后根据规模去生成UTF-8,或许是一个字符,或许是两个字符,乃至三个四个字符,按格局生成字节就行了。
writeInt办法
对根本类型的写入比较简略,按字节写入segment的data里边就行了,没什么好说的。
@Override public Buffer writeInt(int i) {
Segment tail = writableSegment(4);
byte[] data = tail.data;
int limit = tail.limit;
data[limit++] = (byte) ((i >>> 24) & 0xff);
data[limit++] = (byte) ((i >>> 16) & 0xff);
data[limit++] = (byte) ((i >>> 8) & 0xff);
data[limit++] = (byte) (i & 0xff);
tail.limit = limit;
size += 4;
return this;
}
writeDecimalLong办法
和前面readDecimalLong办法相似,
@Override public Buffer writeDecimalLong(long v) {
if (v == 0) {
// Both a shortcut and required since the following code can't handle zero.
return writeByte('0');
}
boolean negative = false;
if (v < 0) {
v = -v;
if (v < 0) { // Only true for Long.MIN_VALUE.
return writeUtf8("-9223372036854775808");
}
negative = true;
}
// 也是凶猛哦
// Binary search for character width which favors matching lower numbers.
int width = //
v < 100000000L
? v < 10000L
? v < 100L
? v < 10L ? 1 : 2
: v < 1000L ? 3 : 4
: v < 1000000L
? v < 100000L ? 5 : 6
: v < 10000000L ? 7 : 8
: v < 1000000000000L
? v < 10000000000L
? v < 1000000000L ? 9 : 10
: v < 100000000000L ? 11 : 12
: v < 1000000000000000L
? v < 10000000000000L ? 13
: v < 100000000000000L ? 14 : 15
: v < 100000000000000000L
? v < 10000000000000000L ? 16 : 17
: v < 1000000000000000000L ? 18 : 19;
if (negative) {
// 负数加一位符号位
++width;
}
Segment tail = writableSegment(width);
byte[] data = tail.data;
// 循环取余写入,终究写入符号位
int pos = tail.limit + width; // We write backwards from right to left.
while (v != 0) {
int digit = (int) (v % 10);
data[--pos] = DIGITS[digit];
v /= 10;
}
if (negative) {
data[--pos] = '-';
}
tail.limit += width;
this.size += width;
return this;
}
看起来还说挺好了解的,获取长度那里尽管看起来长了点,如同还没什么好办法。
Buffer本身办法
在上面现已对Buffer本身的办法做了个总结,大致如下:
- 内部类: UnsafeCursor,及四个获取UnsafeCursor的办法
- 加密: md5、sha1等
- snapshot: 取得缓存数据的ByteString
- copy、write、read、getByte(pos)、clear、skip、rangeEquals,对缓存操作
- 获取参数: size、completeSegmentByteCount(segment剩余空间)、segmentSizes
- selectPrefix,判别前缀格局(配合options取得index)
- writableSegment(minimumCapacity),获取能够够容量的segment
UnsafeCursor是这个Buffer的一个操作类,比较中心,仍是得讲一下的。
加密就越过了,关于ByteString的内容能够简略讲讲,学下原理就能够了,内容好多。
对缓存的操作,不太想讲了,都是对segment的处理,在RealBufferedSource和RealBufferedSink以及Buffer内看得够多了。
获取参数的办法看一下就好,selectPrefix以及writableSegment仍是能够看下的。
内部类: UnsafeCursor,及四个获取UnsafeCursor的办法
先来看下UnsafeCursor自己的介绍吧,大致iiu是对缓存区根底数据的不安全处理器,这个安全问题需求用户自己留意,代码注释里边还有很长的比如,这儿就不贴出来了。
A handle to the underlying data in a buffer. This handle is unsafe because it does not enforce its own invariants. Instead, it assumes a careful user who has studied Okio’s implementation details and their consequences. 缓冲区中根底数据的handle。该handle是不安全的,由于它不强制履行自己的不变量。相反,它假定一个细心的用户现已研讨了 Okio 的完成细节及其后果。
变量域
先来看下变量域,先来了解下这些成员变量代表的意义:
// 操控的buffer
public Buffer buffer;
// 可读可写?
public boolean readWrite;
// 其时segment
private Segment segment;
// 其时偏移,最大为buffer.size
public long offset = -1L;
// references the segment's internal byte array
public byte[] data;
// is the segment's start,在buffer里边的方位
public int start = -1;
// is the segment's end,在buffer里边的方位
public int end = -1;
下面再来了解办法,next和close比较简略就不写了。
seek办法
public final int seek(long offset) {
if (offset < -1 || offset > buffer.size) {
throw new ArrayIndexOutOfBoundsException(
String.format("offset=%s > size=%s", offset, buffer.size));
}
// 规模外
if (offset == -1 || offset == buffer.size) {
this.segment = null;
this.offset = offset;
this.data = null;
this.start = -1;
this.end = -1;
return -1;
}
// Navigate to the segment that contains `offset`. Start from our current segment if possible.
long min = 0L;
long max = buffer.size;
Segment head = buffer.head;
Segment tail = buffer.head;
if (this.segment != null) {
// 取得其时segment榜首个方位在buffer中的偏移
long segmentOffset = this.offset - (this.start - this.segment.pos);
// 根据其时segment方位修正min或max,缩小规模,以其时segment为界分两部分
if (segmentOffset > offset) {
// Set the cursor segment to be the 'end'
max = segmentOffset;
tail = this.segment;
} else {
// Set the cursor segment to be the 'beginning'
min = segmentOffset;
head = this.segment;
}
}
// 从小的部分开端寻觅offset所在segment
Segment next;
long nextOffset;
if (max - offset > offset - min) {
// Start at the 'beginning' and search forwards
next = head;
nextOffset = min;
while (offset >= nextOffset + (next.limit - next.pos)) {
nextOffset += (next.limit - next.pos);
next = next.next;
}
} else {
// Start at the 'end' and search backwards
next = tail;
nextOffset = max;
while (nextOffset > offset) {
next = next.prev;
nextOffset -= (next.limit - next.pos);
}
}
// If we're going to write and our segment is shared, swap it for a read-write one.
if (readWrite && next.shared) {
// 假如该segment被同享了,就仿制一个不被同享的segment,并在下面代码代替原本的
Segment unsharedNext = next.unsharedCopy();
if (buffer.head == next) {
buffer.head = unsharedNext;
}
next = next.push(unsharedNext);
next.prev.pop();
}
// 更新Buffer的数据域
// Update this cursor to the requested offset within the found segment.
this.segment = next;
this.offset = offset;
this.data = next.data;
this.start = next.pos + (int) (offset - nextOffset);
this.end = next.limit;
return end - start;
}
seek办法不是很杂乱,意图便是根据offset找到对应的segment,并在这个segment找到offset的方位,详细看上面注释。
resizeBuffer办法
public final long resizeBuffer(long newSize) {
if (buffer == null) {
throw new IllegalStateException("not attached to a buffer");
}
if (!readWrite) {
throw new IllegalStateException("resizeBuffer() only permitted for read/write buffers");
}
long oldSize = buffer.size;
if (newSize <= oldSize) {
if (newSize < 0) {
throw new IllegalArgumentException("newSize < 0: " + newSize);
}
// 从终究一个segment开端pop并收回,直到终究一个segment正好能包容newSize
// Shrink the buffer by either shrinking segments or removing them.
for (long bytesToSubtract = oldSize - newSize; bytesToSubtract > 0; ) {
Segment tail = buffer.head.prev;
int tailSize = tail.limit - tail.pos;
if (tailSize <= bytesToSubtract) {
// tail被pop,回来的是tail的next,即head,或许为null
buffer.head = tail.pop();
SegmentPool.recycle(tail);
bytesToSubtract -= tailSize;
} else {
// 把终究一个segment可写的方位缩到刚好newSize的方位
tail.limit -= bytesToSubtract;
break;
}
}
// Seek to the end. 重置数据域
this.segment = null;
this.offset = newSize;
this.data = null;
this.start = -1;
this.end = -1;
} else if (newSize > oldSize) {
// 逐一增加segment使之抵达newSize要求
// Enlarge the buffer by either enlarging segments or adding them.
boolean needsToSeek = true;
for (long bytesToAdd = newSize - oldSize; bytesToAdd > 0; ) {
// 先验证终究一个segment能写入的巨细,能完成bytesToAdd就增加bytesToAdd,否则就填满segment
// writableSegment会验证是否写满了segment,不行会自动增加和切换到新segment
Segment tail = buffer.writableSegment(1);
int segmentBytesToAdd = (int) Math.min(bytesToAdd, Segment.SIZE - tail.limit);
tail.limit += segmentBytesToAdd;
bytesToAdd -= segmentBytesToAdd;
// 只履行一次,一开端的时分跳到末尾segment
// If this is the first segment we're adding, seek to it.
if (needsToSeek) {
this.segment = tail;
this.offset = oldSize;
this.data = tail.data;
this.start = tail.limit - segmentBytesToAdd;
this.end = tail.limit;
needsToSeek = false;
}
}
}
buffer.size = newSize;
return oldSize;
}
resizeBuffer办法和它的姓名相同,便是来对buffer巨细进行操控的,会根据传入的size对segment进行收回或许创建。
expandBuffer办法
public final long expandBuffer(int minByteCount) {
if (minByteCount <= 0) {
throw new IllegalArgumentException("minByteCount <= 0: " + minByteCount);
}
if (minByteCount > Segment.SIZE) {
throw new IllegalArgumentException("minByteCount > Segment.SIZE: " + minByteCount);
}
if (buffer == null) {
throw new IllegalStateException("not attached to a buffer");
}
if (!readWrite) {
throw new IllegalStateException("expandBuffer() only permitted for read/write buffers");
}
long oldSize = buffer.size;
// 或许是原本终究一个segment也或许是新的segment,不过没关系
Segment tail = buffer.writableSegment(minByteCount);
// 终究一个segment能包容的巨细
int result = Segment.SIZE - tail.limit;
tail.limit = Segment.SIZE;
// 实践扩容,result >= minByteCount
buffer.size = oldSize + result;
// Seek to the old size.
this.segment = tail;
// offset抵达oldSize时,现已在新的segment了
this.offset = oldSize;
this.data = tail.data;
this.start = Segment.SIZE - result;
this.end = Segment.SIZE;
return result;
}
这儿便是给一个minByteCount来扩容,看代码这个minByteCount应该需求比Segment.SIZE小,它会发生两种成果,一个便是原本的tail能够包容minByteCount,就不必切换segment了,假如原本的tail包容不了minByteCount,那就会进入到下一个segment,这时分原本tail后边能够写入的部分就会被越过,所以当seek到oldSize的时分会抵达新的segment起点处。
snapshot办法
/** Returns an immutable copy of this buffer as a byte string. */
public final ByteString snapshot() {
if (size > Integer.MAX_VALUE) {
throw new IllegalArgumentException("size > Integer.MAX_VALUE: " + size);
}
return snapshot((int) size);
}
/**
* Returns an immutable copy of the first {@code byteCount} bytes of this buffer as a byte string.
*/
public final ByteString snapshot(int byteCount) {
if (byteCount == 0) return ByteString.EMPTY;
return new SegmentedByteString(this, byteCount);
}
这儿两个snapshot办法便是回来SegmentedByteString的,SegmentedByteString和ByteString内容还许多,这儿就简略了解下功用,不做深入探讨。
ByteString注释: An immutable sequence of bytes. Byte strings compare lexicographically as a sequence of unsigned bytes. That is, the byte string ff sorts after 00. This is counter to the sort order of the corresponding bytes, where -1 sorts before 0. 不可变的字节序列。 字节字符串按字典次序比较为无符号字节序列。也便是说,字节字符串 ff 排序在 00 之后。这与相应字节的排序次序相反,其间 -1 排序在 0 之前。
上面是ByteString自己的概述,简略来说便是ByteString是用来处理不可变的字节序列的,功率比较高,Buffer是用来处理可变序列的。ByteString内部运用byte数组贮存数据,而不是用segment,供给了许多高功率的办法来对数据处理,比如加密(base64、md5、sha1等)、substring、rangeEquals、startsWith、indexOf、lastIndexOf等,感觉便是相似String吧。
SegmentedByteString注释: An immutable byte string composed of segments of byte arrays. This class exists to implement efficient snapshots of buffers. It is implemented as an array of segments, plus a directory in two halves that describes how the segments compose this byte string. 由字节数组段组成的不可变字节字符串。此类的存在是为了完成高效的缓冲区快照。它被完成为一个段数组,加上一个分为两半的目录,描述了段怎么组成这个字节字符串。
这儿只拿了SegmentedByteString前面部分的注释,大致意思便是它是一个有segment组成的不可变的字符串,它有两个数据域: segments用于保存数据,directory用来操控数据。segments是一个二维数组,directory是个一维数组,directory分为两部分,榜首部分表明segments榜首层数组中贮存字符串的累加长度,第二部分表明segments榜首层数组中贮存字符串的开端方位,不是很好了解,这儿用注释中的比如看一下就懂了:
Suppose we have a byte string, [A, B, C, D, E, F, G, H, I, J, K, L, M] that is stored across three byte arrays: [x, x, x, x, A, B, C, D, E, x, x, x], [x, F, G], and [H, I, J, K, L, M, x, x, x, x, x, x] the complete directory is [5, 7, 13, 4, 1, 0]. 前半部分表明每个array中累加的字符串长度,后半部分表明每个array中开端的方位 需求留意的是array的次序是有序的,array中的数据是接连的
SegmentedByteString其他办法就一个toByteString需求看下,它会把数据转成ByteString,再由ByteString对外供给各种功用。
selectPrefix办法
前面BufferedSource中,咱们讲到了select(Options options)办法,便是供给一个options数组,然后对每一行数据的前缀格局进行校验,然后回来校验到的options内i的index,其时那里的index便是通过Buffer来完成的。
selectPrefix办法比较长,还和Options有关,Options也比较长,这儿就不贴代码了,可是其间的原理倒是很有学习效果,一起来看下吧!
Options类
Options类比较有意思,代码很长,但就做了两件事,一个是通过of创建Options,一个便是通过buildTrieRecursive递归生成trie树。
of办法会对传入的options进行排序去重,数据暂时保存在list里边,然后调用buildTrieRecursive去生成trie树,数据终究保存在trie树里边。这儿最好先了解下trie树,它的根节点是空字符,然后从根节点能够根据前缀找到字符串,还挺实用的,用于查找很便利。
buildTrieRecursive是一个递归的办法,能够将保存在list中的字符串转成trie树,保存的时分有必定的格局,原理差不多就这样,先来看下注释:
Builds a trie encoded as an int array. Nodes in the trie are of two types: SELECT and SCAN. SELECT nodes are encoded as: – selectChoiceCount: the number of bytes to choose between (a positive int) – prefixIndex: the result index at the current position or -1 if the current position is not a result on its own – a sorted list of selectChoiceCount bytes to match against the input string – a heterogeneous list of selectChoiceCount result indexes (>= 0) or offsets (< 0) of the next node to follow. Elements in this list correspond to elements in the preceding list. Offsets are negative and must be multiplied by -1 before being used. SCAN nodes are encoded as: – scanByteCount: the number of bytes to match in sequence. This count is negative and must be multiplied by -1 before being used. – prefixIndex: the result index at the current position or -1 if the current position is not a result on its own – a list of scanByteCount bytes to match – nextStep: the result index (>= 0) or offset (< 0) of the next node to follow. Offsets are negative and must be multiplied by -1 before being used. This structure is used to improve locality and performance when selecting from a list of options. 构建一个编码为 int 数组的 trie。 trie 中的节点有两种类型:SELECT 和 SCAN。 SELECT 节点编码为: – selectChoiceCount:要挑选的字节数(正整数) – prefixIndex:其时方位的成果索引,假如其时方位本身不是成果,则为 -1 – 排序列表要与输入字符串匹配的 selectChoiceCount 字节 – selectChoiceCount 成果索引 (>= 0) 或要跟从的下一个节点的偏移量 (< 0) 的异构列表。此列表中的元素对应于前面列表中的元素。偏移量为负数,运用前有必要乘以 -1。 SCAN 节点编码为: – scanByteCount:按次序匹配的字节数。该计数为负数,运用前有必要乘以 -1。 – prefixIndex:其时方位的成果索引,假如其时方位本身不是成果,则为 -1 – 要匹配的 scanByteCount 字节列表 – nextStep:成果索引 (>= 0) 或偏移量 (< 0)要遵从的下一个节点。偏移量为负数,运用前有必要乘以 -1。此结构用于提高从选项列表中进行挑选时的局部性和功用。
原本不想看这个buildTrieRecursive办法的,可是想想如同学源码便是来学东西的,什么最能学到东西,不便是算法吗?所以仍是研讨下:
private static void buildTrieRecursive(
long nodeOffset, // 在Buffer中的偏移
Buffer node, // 要写入的Buffer
int byteStringOffset, // 在buffer中现已按次序写入字符的偏移
List<ByteString> byteStrings, // 对应的options列表(通过排序和去重)
int fromIndex, // 上面options列表开端坐标
int toIndex, // 上面options列表完毕坐标
List<Integer> indexes) // 上面options列表中每个方位在options中原始坐标index
{
if (fromIndex >= toIndex) throw new AssertionError();
// byteStringOffset是buffer中现已按次序写入字符的偏移,必定不能超过该规模内的字符串长度还长
// 现已写好: abc,规模内字符: abcdefg,abcg,abcf
for (int i = fromIndex; i < toIndex; i++) {
if (byteStrings.get(i).size() < byteStringOffset) throw new AssertionError();
}
ByteString from = byteStrings.get(fromIndex);
ByteString to = byteStrings.get(toIndex - 1);
int prefixIndex = -1;
// 榜首个字符串长度和偏移相同长,记载下index,越过它看剩余的
// If the first element is already matched, that's our prefix.
if (byteStringOffset == from.size()) {
prefixIndex = indexes.get(fromIndex);
fromIndex++;
from = byteStrings.get(fromIndex);
}
// 终究一个和榜首个在byteStringOffset偏移上不持平,即有分叉
if (from.getByte(byteStringOffset) != to.getByte(byteStringOffset)) {
// If we have multiple bytes to choose from, encode a SELECT node.
int selectChoiceCount = 1;
// 由于现已排序好了,只需按次序两两比较,不相同了就多了一种分支
for (int i = fromIndex + 1; i < toIndex; i++) {
if (byteStrings.get(i - 1).getByte(byteStringOffset)
!= byteStrings.get(i).getByte(byteStringOffset)) {
selectChoiceCount++;
}
}
// 计算子部分需求的长度,先往下看懂格局,intCount(node)或许是避免node中其他字符,我觉得是0,由于每次传过来的node都是新建的
// 加2是下面两行,(selectChoiceCount * 2)表明每个节点和它对应的index
// Compute the offset that childNodes will get when we append it to node.
long childNodesOffset = nodeOffset + intCount(node) + 2 + (selectChoiceCount * 2);
// 写入其时层的分支数,正数
node.writeInt(selectChoiceCount);
// 写入其时部分最契合的index
node.writeInt(prefixIndex);
// 写入榜首个option,并按次序写入分叉的option
// 了解一下: 便是写入一层树结构,同一层的数据必定是不相同的才会分叉,前面byteStringOffset相同的进入下一层
for (int i = fromIndex; i < toIndex; i++) {
byte rangeByte = byteStrings.get(i).getByte(byteStringOffset);
if (i == fromIndex || rangeByte != byteStrings.get(i - 1).getByte(byteStringOffset)) {
// 终究8位,即一个int
node.writeInt(rangeByte & 0xff);
}
}
// 下一层
Buffer childNodes = new Buffer();
int rangeStart = fromIndex;
while (rangeStart < toIndex) {
// 遍历一下,找到和rangeStart前byteStringOffset一致的数据,即[rangeStart, rangeEnd)
byte rangeByte = byteStrings.get(rangeStart).getByte(byteStringOffset);
int rangeEnd = toIndex;
for (int i = rangeStart + 1; i < toIndex; i++) {
if (rangeByte != byteStrings.get(i).getByte(byteStringOffset)) {
rangeEnd = i;
break;
}
}
// 左闭右开,所以rangeStart便是终究一个,这儿是递归出口
if (rangeStart + 1 == rangeEnd
&& byteStringOffset + 1 == byteStrings.get(rangeStart).size()) {
// The result is a single index.
node.writeInt(indexes.get(rangeStart));
} else {
// 负一是格局,记载下偏移到什么当地,把上面计算到的某个节点的下一层递归
// The result is another node.
node.writeInt((int) (-1 * (childNodesOffset + intCount(childNodes))));
// 留意(byteStringOffset + 1),下一层的根据是下一个字符是否相同
buildTrieRecursive(
childNodesOffset,
childNodes,
byteStringOffset + 1,
byteStrings,
rangeStart,
rangeEnd,
indexes);
}
// 进入该层下一个节点的下面一层(前byteStringOffset相同的是一层)
rangeStart = rangeEnd;
}
// 递归完毕后,把下一层的数据写到其时层的Buffer里边
node.write(childNodes, childNodes.size());
} else {
// 开端到完毕在前byteStringOffset字符都相同,先找到相同的个数
// If all of the bytes are the same, encode a SCAN node.
int scanByteCount = 0;
for (int i = byteStringOffset, max = Math.min(from.size(), to.size()); i < max; i++) {
if (from.getByte(i) == to.getByte(i)) {
scanByteCount++;
} else {
break;
}
}
// 不是很了解,按下面写入的格局算的(2, scanByteCount, 1(终究if中两种状况都有加一))
// intCount(node)表明什么?已有数据的长度,每次递归不都是新建的,等于0吗?保险起见?
// Compute the offset that childNodes will get when we append it to node.
long childNodesOffset = nodeOffset + intCount(node) + 2 + scanByteCount + 1;
// 用负数写入相同的个数,和上面selectByteCount区别
node.writeInt(-scanByteCount);
node.writeInt(prefixIndex);
// 写入前面相同的字符
for (int i = byteStringOffset; i < byteStringOffset + scanByteCount; i++) {
node.writeInt(from.getByte(i) & 0xff);
}
// 终究一个了
if (fromIndex + 1 == toIndex) {
// The result is a single index.
if (byteStringOffset + scanByteCount != byteStrings.get(fromIndex).size()) {
throw new AssertionError();
}
// 终究一个option对应的index
node.writeInt(indexes.get(fromIndex));
} else {
// 越过scanByteCount个数据后,后边剩余的给下一层处理
// The result is another node.
Buffer childNodes = new Buffer();
node.writeInt((int) (-1 * (childNodesOffset + intCount(childNodes))));
buildTrieRecursive(
childNodesOffset,
childNodes,
byteStringOffset + scanByteCount,
byteStrings,
fromIndex,
toIndex,
indexes);
node.write(childNodes, childNodes.size());
}
}
}
加了许多注释仍是不太好了解,大致来看便是按层遍历一个树,榜首层便是对榜首个字符的比较,以此类推,假定是第i层,就要对这层的数据比较第i位,每层分两种状况:这一层第i位都相同(乃至接连N层相同),这层第i位不同(有j种不同,那该节点下一层就有j个节点)。总而言之,便是该节点后边不相同了就递归,终究把数据写入到总的一个buffer中。
大致便是这样,算法这东西说清楚有点难,仍是代码好了解,上面的写入格局我也不太明白,不过没关系持续看的去,慢慢就懂了。
selectPrefix原理
看完了Options类,再回过来看selectPrefix办法,总算就不是一头雾水了,现在一切options数据都保存在了trie树里边,只需对options树进行遍历,按格局把数据读出来就行了。
int selectPrefix(Options options, boolean selectTruncated) {
Segment head = this.head;
if (head == null) {
if (selectTruncated) return -2; // A result is present but truncated.
return options.indexOf(ByteString.EMPTY);
}
Segment s = head;
byte[] data = head.data;
int pos = head.pos;
int limit = head.limit;
int[] trie = options.trie;
int triePos = 0;
int prefixIndex = -1;
navigateTrie:
while (true) {
// Options中两种状况,n个字符接连相同、或许直接子树个数
int scanOrSelect = trie[triePos++];
// -1是初始化的默许值
int possiblePrefixIndex = trie[triePos++];
if (possiblePrefixIndex != -1) {
prefixIndex = possiblePrefixIndex;
}
int nextStep;
if (s == null) {
break;
} else if (scanOrSelect < 0) {
// Scan形式,要先读取接连的一段字符串
// Scan: take multiple bytes from the buffer and the trie, looking for any mismatch.
int scanByteCount = -1 * scanOrSelect;
int trieLimit = triePos + scanByteCount;
// 对前面接连的scanByteCount个方位进行比较
while (true) {
// 取segment中数据进行比较
int b = data[pos++] & 0xff;
// 在scanByteCount中不相同了,就直接回来prefixIndex
if (b != trie[triePos++]) return prefixIndex; // Fail 'cause we found a mismatch.
boolean scanComplete = (triePos == trieLimit);
// 切换segment
// Advance to the next buffer segment if this one is exhausted.
if (pos == limit) {
s = s.next;
pos = s.pos;
data = s.data;
limit = s.limit;
// segment数据读取完了
if (s == head) {
if (!scanComplete) break navigateTrie; // We were exhausted before the scan completed.
s = null; // We were exhausted at the end of the scan.
}
}
// 在scanByteCount中全比对上了,就退出Scan这个代码块,进入外层下一个循环
if (scanComplete) {
// nextStep是下一个节点数据,而且必定不是scan形式了,可是记载的是负的偏移
// Options中: node.writeInt((int) (-1 * (childNodesOffset + intCount(childNodes))));
nextStep = trie[triePos];
break;
}
}
} else {
// Select形式,selectChoiceCount是分支树,后边的数据便是这一层的数据共selectChoiceCount个
// Select: take one byte from the buffer and find a match in the trie.
int selectChoiceCount = scanOrSelect;
// 遍历这一层
int b = data[pos++] & 0xff;
int selectLimit = triePos + selectChoiceCount;
while (true) {
// 这一层都匹配失利了,那就失利了啊,没有必要持续了
if (triePos == selectLimit) return prefixIndex; // Fail 'cause we didn't find a match.
// 匹配到这一个分支,break进入下一层
if (b == trie[triePos]) {
// nextStep是下一层循环的偏移,是一个负数
// Options中: node.writeInt((int) (-1 * (childNodesOffset + intCount(childNodes))));
nextStep = trie[triePos + selectChoiceCount];
break;
}
triePos++;
}
// Advance to the next buffer segment if this one is exhausted.
if (pos == limit) {
s = s.next;
pos = s.pos;
data = s.data;
limit = s.limit;
if (s == head) {
s = null; // No more segments! The next trie node will be our last.
}
}
}
// 失利状况现已return,nextStep上面现已提到了是负数,进入下一层循环,大于等于0的时分便是完毕的时分,存的index
// Select时: node.writeInt(indexes.get(rangeStart));
// Scan时: node.writeInt(indexes.get(fromIndex));
if (nextStep >= 0) return nextStep; // Found a matching option.
// 上面注释有,-nextStep便是下一个偏移所在方位
triePos = -nextStep; // Found another node to continue the search.
}
// We break out of the loop above when we've exhausted the buffer without exhausting the trie.
if (selectTruncated) return -2; // The buffer is a prefix of at least one option.
return prefixIndex; // Return any matches we encountered while searching for a deeper match.
}
麻了麻了,总算算了解清楚了。两种形式scan和select,格局如下:
- scan榜首位是该层分支个数(正数),第二位是默许的prefixIndex(-1,树的终究一个便是正数index),然后接该层的数据,再接上这一层的数据,终究一位两种状况:下一层的偏移、每终究一个节点的index。
- select榜首位是要越过字符的个数(负数用于区别),第二位和上面相同,然后是要越过的字符,没有数据,终究一位两种状况(下一个方位必定是select形式): 下一层的偏移、每终究一个节点的index。
然后便是这两种形式的混合,构成了整个trie树。
writableSegment办法
writableSegment办法比起上面几个内容就简略多了,直接双向链表拿终究的segment,看看容量够不行,不行从SegmentPool拿个新的放终究并回来。
/**
* Returns a tail segment that we can write at least {@code minimumCapacity}
* bytes to, creating it if necessary.
*/
Segment writableSegment(int minimumCapacity) {
if (minimumCapacity < 1 || minimumCapacity > Segment.SIZE) throw new IllegalArgumentException();
if (head == null) {
head = SegmentPool.take(); // Acquire a first segment.
return head.next = head.prev = head;
}
Segment tail = head.prev;
if (tail.limit + minimumCapacity > Segment.SIZE || !tail.owner) {
tail = tail.push(SegmentPool.take()); // Append a new empty segment to fill up.
}
return tail;
}