String类中的hashCode()分析

  • Post author:
  • Post category:其他


hashCode()这个方法是Object中定义的,因此,不管Java中的每个类都会显示或者隐含的调用hashCode。本文将从以下几个方面对hashCode进行阐述。

  • 什么是hashcode?
  • JDK中String类中的hashCode的详细定义;
  • 简单阐述一下为什么hashCode的系数为31;
  1. 什么是hashcode?

维基百科的定义:In the Java programming language, every class implicitly or explicitly provides a hashCode() method, which digests the data stored in an instance of the class into a single hash value (a 32-bit signed integer). This hash is used by other code when storing or manipulating the instance – the values are intended to be evenly distributed for varied inputs in order to use in clustering. This property is important to the performance of hash tables and other data structures that store objects in groups (“buckets”) based on their computed hash values.

hashcode其实就是对象存在内存中的一个索引,指示对象应该存在哪个位置,根据这个位置可以取出这个对象,但是它必须要减少冲突,并且在存储和操作的时候尽可能高效。

2. JDK中String类中的hashCode的详细定义

 /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

/**
     * Returns a hash code for this string. The hash code for a
     * <code>String</code> object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using <code>int</code> arithmetic, where <code>s[i]</code> is the
     * <i>i</i>th character of the string, <code>n</code> is the length of
     * the string, and <code>^</code> indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
	int h = hash;
        int len = count;
	if (h == 0 && len > 0) {
	    int off = offset;
	    char val[] = value;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

String类覆盖了Object类中的hashCode方法,从代码的注释中可以看得出,一个String的对象的hashCode值的计算方法为:s[0]*31^(n-1) + s[1]*31^(n-2) + … + s[n-1]。其中,s[i]表示是第i个字符,n是字符串的长度。这里是取字符的ASC2码的值进行计算的。




3.为什么hashCode的系数为31?

前面介绍过hashcode是为了存储和操作的时候更高效,hash算法有个通用的毛病就是会冲突,为了让冲突的概率更低,研究表明,使用素数作为系统,他们冲突的概率会比使用合数的冲突的概率低得多。至于为什么选择31而不是别的素数,我就无从得知啦。



版权声明:本文为jim8757原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。