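Android SO: Loading, Linking, and Initialization

A First Look at the Linker

When doing NDK development in Android Studio, we first declare native methods in a Java class, then implement the corresponding functions in a cpp file, and finally build the cpp code into an so according to CMakeLists.txt. The Java class then calls System.loadLibrary to load the so file for the matching architecture from the lib directory of the APK into memory. How does this method load the so file into memory? To answer that, we need to dig into the source code and follow its internal logic.

Tracing the Java Layer

The Java-layer entry point for loading an so file is System.loadLibrary, so we start tracing the source from there.

The loadLibrary entry point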

//http://androidxref.com/4.4.4_r1/xref/libcore/luni/src/main/java/java/lang/System.java
public static void loadLibrary(String libName)

//http://androidxref.com/4.4.4_r1/xref/libcore/luni/src/main/java/java/lang/Runtime.java
void loadLibrary(String libraryName, ClassLoader loader)
->private String doLoad(String name, ClassLoader loader)
->private static native String nativeLoad(String filename, ClassLoader loader, String ldLibraryPath)

Tracing the Native Layer

nativeLoad is a native method. Searching for its definition shows that it lives in java_lang_Runtime.cc.

//http://androidxref.com/4.4.4_r1/xref/art/runtime/native/java_lang_Runtime.cc
NATIVE_METHOD(Runtime, nativeLoad, "(Ljava/lang/String;Ljava/lang/ClassLoader;Ljava/lang/String;)Ljava/lang/String;")
->static jstring Runtime_nativeLoad(JNIEnv* env, jclass, jstring javaFilename, jobject javaLoader, jstring javaLdLibraryPath)

//http://androidxref.com/4.4.4_r1/xref/art/runtime/jni_internal.cc
bool JavaVMExt::LoadNativeLibrary(const std::string& path, ClassLoader* class_loader, std::string& detail)

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/dlfcn.cpp
void* dlopen(const char* filename, int flags)

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp
soinfo* do_dlopen(const char* name, int flags)

do_dlopen() is located in linker.cpp; from here on we are inside the linker's implementation details.

Inside the Linker

Loading the so

do_dlopen is defined as follows:

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp
soinfo *do_dlopen(const char *name, int flags)
{
    if ((flags & ~(RTLD_NOW | RTLD_LAZY | RTLD_LOCAL | RTLD_GLOBAL)) != 0)
    {
        DL_ERR("invalid flags to dlopen: %x", flags);
        return NULL;
    }
    set_soinfo_pool_protection(PROT_READ | PROT_WRITE);
    soinfo *si = find_library(name);
    if (si != NULL)
    {
        si->CallConstructors();
    }
    set_soinfo_pool_protection(PROT_READ);
    return si;
}

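This function first rejects any flags other than RTLD_NOW, RTLD_LAZY, RTLD_LOCAL, and RTLD_GLOBAL. It then loads the so through find_library. If loading succeeds, it calls si->CallConstructors() to run the so's initialization, and finally returns a pointer of type soinfo.

Tracing find_library gives: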
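The flag check at the top of do_dlopen can be tried in isolation. The following is a minimal sketch, assuming a POSIX system that provides <dlfcn.h>; the helper name has_only_supported_dlopen_flags is made up for illustration and is not part of the linker.

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical helper: returns true when `flags` contains only the bits
// that the do_dlopen shown above accepts.
bool has_only_supported_dlopen_flags(int flags) {
    const int supported = RTLD_NOW | RTLD_LAZY | RTLD_LOCAL | RTLD_GLOBAL;
    // Clearing every supported bit must leave nothing behind.
    return (flags & ~supported) == 0;
}
```

Any bit outside the four supported flags makes the masked value non-zero, which is the condition under which do_dlopen reports "invalid flags to dlopen" and returns NULL.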

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp
static soinfo* find_library(const char* name) 
->static soinfo *find_library_internal(const char *name)
{
    if (name == NULL)
    {
        return somain;
    }

    soinfo *si = find_loaded_library(name);
    if (si != NULL)
    {
        if (si->flags & FLAG_LINKED)
        {
            return si;
        }
        DL_ERR("OOPS: recursive link to \"%s\"", si->name);
        return NULL;
    }

    TRACE("[ '%s' has not been loaded yet.  Locating...]", name);
    si = load_library(name);
    if (si == NULL)
    {
        return NULL;
    }

    // At this point we know that whatever is loaded @ base is a valid ELF
    // shared library whose segments are properly mapped in.
    TRACE("[ init_library base=0x%08x sz=0x%08x name='%s' ]",
        si->base, si->size, si->name);

    if (!soinfo_link_image(si))
    {
        munmap(reinterpret_cast<void *>(si->base), si->size);
        soinfo_free(si);
        return NULL;
    }

    return si;
}

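find_library (through find_library_internal) first checks with find_loaded_library whether the so is already loaded. If it is not, load_library loads the so and soinfo_link_image then links it.

Tracing load_library gives: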

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp
static soinfo *load_library(const char *name)
{
    // Open the file.
    int fd = open_library(name);
    if (fd == -1)
    {
        DL_ERR("library \"%s\" not found", name);
        return NULL;
    }

    // Read the ELF header and load the segments.
    ElfReader elf_reader(name, fd);
    if (!elf_reader.Load())
    {
        return NULL;
    }

    const char *bname = strrchr(name, '/');
    soinfo *si = soinfo_alloc(bname ? bname + 1 : name);
    if (si == NULL)
    {
        return NULL;
    }
    si->base = elf_reader.load_start();
    si->size = elf_reader.load_size();
    si->load_bias = elf_reader.load_bias();
    si->flags = 0;
    si->entry = 0;
    si->dynamic = NULL;
    si->phnum = elf_reader.phdr_count();
    si->phdr = elf_reader.loaded_phdr();
    return si;
}

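This function first opens the so file with open_library to obtain a file descriptor, then creates an ElfReader object and calls elf_reader.Load() to read and map the file. Finally, it copies the values gathered by elf_reader into si and returns it.

Tracing elf_reader.Load gives: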
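load_library names the soinfo after the last path component of the library. A minimal sketch of that expression as a standalone function follows; library_basename is a hypothetical name chosen for this example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper mirroring `bname ? bname + 1 : name` in load_library:
// returns the part of `path` after the last '/', or `path` itself when
// there is no '/'.
const char* library_basename(const char* path) {
    assert(path != nullptr);
    const char* slash = std::strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

With this, "/data/app/lib/arm/libfoo.so" and "libfoo.so" both yield "libfoo.so", which is the name later matched by find_loaded_library.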

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
bool ElfReader::Load()
{
    return ReadElfHeader() &&
           VerifyElfHeader() &&
           ReadProgramHeader() &&
           ReserveAddressSpace() &&
           LoadSegments() &&
           FindPhdr();
}

Next we trace each of these functions in turn to analyze the so loading flow.

Tracing ReadElfHeader gives:

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
Elf32_Ehdr header_;
bool ElfReader::ReadElfHeader()
{
    ssize_t rc = TEMP_FAILURE_RETRY(read(fd_, &header_, sizeof(header_)));
    if (rc < 0)
    {
        DL_ERR("can't read file \"%s\": %s", name_, strerror(errno));
        return false;
    }
    if (rc != sizeof(header_))
    {
        DL_ERR("\"%s\" is too small to be an ELF executable", name_);
        return false;
    }
    return true;
}

This function reads sizeof(header_) bytes from the file into header_, whose type is Elf32_Ehdr. A read error or a short read makes it fail.

Tracing VerifyElfHeader gives:

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
bool ElfReader::VerifyElfHeader()
{
    // Check the ELF magic number
    if (header_.e_ident[EI_MAG0] != ELFMAG0 ||
        header_.e_ident[EI_MAG1] != ELFMAG1 ||
        header_.e_ident[EI_MAG2] != ELFMAG2 ||
        header_.e_ident[EI_MAG3] != ELFMAG3)
    {
        DL_ERR("\"%s\" has bad ELF magic", name_);
        return false;
    }
	
    // Check that this is a 32-bit so
    if (header_.e_ident[EI_CLASS] != ELFCLASS32)
    {
        DL_ERR("\"%s\" not 32-bit: %d", name_, header_.e_ident[EI_CLASS]);
        return false;
    }
    // Check for little-endian byte order
    if (header_.e_ident[EI_DATA] != ELFDATA2LSB)
    {
        DL_ERR("\"%s\" not little-endian: %d", name_, header_.e_ident[EI_DATA]);
        return false;
    }
    // Check that the file type is a shared object
    if (header_.e_type != ET_DYN)
    {
        DL_ERR("\"%s\" has unexpected e_type: %d", name_, header_.e_type);
        return false;
    }
    // Check the ELF version
    if (header_.e_version != EV_CURRENT)
    {
        DL_ERR("\"%s\" has unexpected e_version: %d", name_, header_.e_version);
        return false;
    }
    // Check the target machine architecture
    if (header_.e_machine !=
#ifdef ANDROID_ARM_LINKER
        EM_ARM
#elif defined(ANDROID_MIPS_LINKER)
        EM_MIPS
#elif defined(ANDROID_X86_LINKER)
        EM_386
#endif
    )
    {
        DL_ERR("\"%s\" has unexpected e_machine: %d", name_, header_.e_machine);
        return false;
    }

    return true;
}

This function constrains the values of the ELF header fields and returns true only when all of them meet the requirements.

Tracing ReadProgramHeader gives:
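The same kind of check can be written against a raw byte buffer, which is handy for quick experiments outside the linker. This is a minimal sketch, not the linker's code: the constants are copied from the ELF specification instead of coming from <elf.h>, the function name is hypothetical, and it only covers the magic, class, byte order, and e_type checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values from the ELF specification (normally provided by <elf.h>).
constexpr unsigned char kElfClass32 = 1;   // ELFCLASS32
constexpr unsigned char kElfData2Lsb = 1;  // ELFDATA2LSB
constexpr uint16_t kEtDyn = 3;             // ET_DYN
constexpr size_t kElf32HeaderSize = 52;    // sizeof(Elf32_Ehdr)

// Hypothetical helper: checks the identification bytes and e_type of a
// raw 32-bit little-endian ELF header.
bool looks_like_elf32_shared_object(const unsigned char* buf, size_t len) {
    if (buf == nullptr || len < kElf32HeaderSize) return false;
    // e_ident[EI_MAG0..EI_MAG3] must be 0x7f 'E' 'L' 'F'.
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F') return false;
    if (buf[4] != kElfClass32) return false;   // e_ident[EI_CLASS]
    if (buf[5] != kElfData2Lsb) return false;  // e_ident[EI_DATA]
    // e_type is a little-endian 16-bit field at byte offset 16.
    const uint16_t e_type = static_cast<uint16_t>(buf[16] | (buf[17] << 8));
    return e_type == kEtDyn;
}
```

Reading e_type byte by byte keeps the sketch independent of the host's byte order; the linker can read the struct field directly because it only accepts files matching its own architecture.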

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
bool ElfReader::ReadProgramHeader()
{
    phdr_num_ = header_.e_phnum;

    // Like the kernel, we only accept program header tables that
    // are smaller than 64KiB.
    if (phdr_num_ < 1 || phdr_num_ > 65536 / sizeof(Elf32_Phdr))
    {
        DL_ERR("\"%s\" has invalid e_phnum: %d", name_, phdr_num_);
        return false;
    }

    Elf32_Addr page_min = PAGE_START(header_.e_phoff);
    Elf32_Addr page_max = PAGE_END(header_.e_phoff + (phdr_num_ * sizeof(Elf32_Phdr)));
    Elf32_Addr page_offset = PAGE_OFFSET(header_.e_phoff);

    phdr_size_ = page_max - page_min;

    void *mmap_result = mmap(NULL, phdr_size_, PROT_READ, MAP_PRIVATE, fd_, page_min);
    if (mmap_result == MAP_FAILED)
    {
        DL_ERR("\"%s\" phdr mmap failed: %s", name_, strerror(errno));
        return false;
    }

    phdr_mmap_ = mmap_result;
    phdr_table_ = reinterpret_cast<Elf32_Phdr *>(reinterpret_cast<char *>(mmap_result) + page_offset);
    return true;
}

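This function first reads the number of program header table entries and checks that the table size is valid. It then maps the page-aligned region of the file that contains the program header table into memory, and finally sets phdr_table_ to point at the table by adding the in-page offset to the mapping address.

Tracing ReserveAddressSpace gives: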
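The page arithmetic in ReadProgramHeader is easier to follow with concrete numbers. The sketch below assumes a 4096-byte page and re-creates PAGE_START, PAGE_OFFSET, and PAGE_END as plain functions; PhdrWindow and phdr_window are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPageSize = 4096;  // assumed page size

constexpr uint32_t page_start(uint32_t x) { return x & ~(kPageSize - 1); }
constexpr uint32_t page_offset(uint32_t x) { return x & (kPageSize - 1); }
constexpr uint32_t page_end(uint32_t x) { return page_start(x + (kPageSize - 1)); }

// Hypothetical result type: the mmap window that covers the table.
struct PhdrWindow {
    uint32_t file_offset;   // page-aligned file offset passed to mmap
    uint32_t length;        // number of bytes mapped
    uint32_t table_offset;  // where the table starts inside the mapping
};

// Computes page_min, page_max - page_min, and page_offset as in
// ReadProgramHeader.
constexpr PhdrWindow phdr_window(uint32_t e_phoff, uint32_t phnum, uint32_t phentsize) {
    return PhdrWindow{page_start(e_phoff),
                      page_end(e_phoff + phnum * phentsize) - page_start(e_phoff),
                      page_offset(e_phoff)};
}
```

For a typical 32-bit so with e_phoff = 52 and eight 32-byte entries, the table occupies file bytes 52 to 308, so a single page is mapped from file offset 0 and the table begins 52 bytes into that mapping.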

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
bool ElfReader::ReserveAddressSpace()
{
    Elf32_Addr min_vaddr;
    load_size_ = phdr_table_get_load_size(phdr_table_, phdr_num_, &min_vaddr);
    if (load_size_ == 0)
    {
        DL_ERR("\"%s\" has no loadable segments", name_);
        return false;
    }

    uint8_t *addr = reinterpret_cast<uint8_t *>(min_vaddr);
    int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *start = mmap(addr, load_size_, PROT_NONE, mmap_flags, -1, 0);
    if (start == MAP_FAILED)
    {
        DL_ERR("couldn't reserve %d bytes of address space for \"%s\"", load_size_, name_);
        return false;
    }

    load_start_ = start;
    load_bias_ = reinterpret_cast<uint8_t *>(start) - addr;
    return true;
}

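This function computes the total size of the PT_LOAD segments that need to be loaded into memory and reserves an address range of that size with an anonymous PROT_NONE mapping.

Tip
load_bias_ is an offset. Suppose the lowest PT_LOAD segment in the so file starts at virtual address addr = 0x100, while the actual load address is start = 0x1000. To access the data at virtual address 0x300, the real address is 0x300 + start - addr, that is, 0x300 + 0x1000 - 0x100. load_bias_ is the difference start - addr; to get the real in-memory address, just add this offset to the virtual address.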
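The tip's arithmetic can be written down directly. A minimal sketch with hypothetical helper names; the linker itself keeps the equivalent value in load_bias_.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: difference between where the image really starts
// and the lowest virtual address recorded in the file. Unsigned
// arithmetic is used on purpose, so the result may wrap around.
constexpr uint32_t compute_load_bias(uint32_t actual_start, uint32_t min_vaddr) {
    return actual_start - min_vaddr;
}

// Hypothetical helper: turns a virtual address from the file into the
// address it has in memory.
constexpr uint32_t to_runtime_address(uint32_t vaddr, uint32_t load_bias) {
    return vaddr + load_bias;
}
```

With the numbers from the tip, compute_load_bias(0x1000, 0x100) is 0xF00 and to_runtime_address(0x300, 0xF00) is 0x1200.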

Tracing phdr_table_get_load_size, which ReserveAddressSpace calls, gives:

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
size_t phdr_table_get_load_size(const Elf32_Phdr *phdr_table,
                                size_t phdr_count,
                                Elf32_Addr *out_min_vaddr,
                                Elf32_Addr *out_max_vaddr)
{
    Elf32_Addr min_vaddr = 0xFFFFFFFFU;
    Elf32_Addr max_vaddr = 0x00000000U;

    // Flag recording whether a PT_LOAD segment was found
    bool found_pt_load = false;
    // Walk every entry in the program header table
    for (size_t i = 0; i < phdr_count; ++i)
    {
        // Pointer to the current entry
        const Elf32_Phdr *phdr = &phdr_table[i];

        // Skip anything that is not PT_LOAD; only PT_LOAD segments are loaded into memory
        if (phdr->p_type != PT_LOAD)
        {
            continue;
        }
        found_pt_load = true;

        // Track the lowest start address among all PT_LOAD segments
        if (phdr->p_vaddr < min_vaddr)
        {
            min_vaddr = phdr->p_vaddr;
        }

        // Track the highest end address among all PT_LOAD segments
        if (phdr->p_vaddr + phdr->p_memsz > max_vaddr)
        {
            max_vaddr = phdr->p_vaddr + phdr->p_memsz;
        }
    }

    // If no PT_LOAD segment was found, the minimum address is 0
    if (!found_pt_load)
    {
        min_vaddr = 0x00000000U;
    }

    // Page-align both bounds
    min_vaddr = PAGE_START(min_vaddr);
    max_vaddr = PAGE_END(max_vaddr);

    if (out_min_vaddr != NULL)
    {
        *out_min_vaddr = min_vaddr;
    }
    if (out_max_vaddr != NULL)
    {
        *out_max_vaddr = max_vaddr;
    }
    return max_vaddr - min_vaddr;
}

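This function finds the minimum and maximum addresses covered by all PT_LOAD segments in the so file and, after page alignment, returns the size of the range they occupy.

Tracing LoadSegments gives: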
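A worked example makes the size computation concrete. The sketch below assumes a 4096-byte page and replaces Elf32_Phdr with a two-field struct holding only what the calculation needs; LoadSegment and load_span are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kPageSize = 4096;  // assumed page size

constexpr uint32_t page_start(uint32_t x) { return x & ~(kPageSize - 1); }
constexpr uint32_t page_end(uint32_t x) { return page_start(x + (kPageSize - 1)); }

// Simplified stand-in for the PT_LOAD entries of a program header table.
struct LoadSegment {
    uint32_t vaddr;  // p_vaddr
    uint32_t memsz;  // p_memsz
};

// Returns the page-aligned span covering all segments and, optionally,
// the page-aligned minimum address.
uint32_t load_span(const LoadSegment* segs, size_t count, uint32_t* out_min_vaddr) {
    uint32_t min_vaddr = count == 0 ? 0 : UINT32_MAX;
    uint32_t max_vaddr = 0;
    for (size_t i = 0; i < count; ++i) {
        if (segs[i].vaddr < min_vaddr) min_vaddr = segs[i].vaddr;
        const uint32_t end = segs[i].vaddr + segs[i].memsz;
        if (end > max_vaddr) max_vaddr = end;
    }
    min_vaddr = page_start(min_vaddr);
    max_vaddr = page_end(max_vaddr);
    if (out_min_vaddr != nullptr) *out_min_vaddr = min_vaddr;
    return max_vaddr - min_vaddr;
}
```

For two segments at 0x0 (0x2345 bytes) and 0x3e80 (0x400 bytes), the highest end address is 0x4280, which rounds up to 0x5000, so five pages are reserved even though the segments leave a gap between them.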

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
bool ElfReader::LoadSegments()
{
    for (size_t i = 0; i < phdr_num_; ++i)
    {
        const Elf32_Phdr *phdr = &phdr_table_[i];

        if (phdr->p_type != PT_LOAD)
        {
            continue;
        }

        // Segment addresses in memory.
        Elf32_Addr seg_start = phdr->p_vaddr + load_bias_;
        Elf32_Addr seg_end = seg_start + phdr->p_memsz;

        Elf32_Addr seg_page_start = PAGE_START(seg_start);
        Elf32_Addr seg_page_end = PAGE_END(seg_end);

        Elf32_Addr seg_file_end = seg_start + phdr->p_filesz;

        // File offsets.
        Elf32_Addr file_start = phdr->p_offset;
        Elf32_Addr file_end = file_start + phdr->p_filesz;

        Elf32_Addr file_page_start = PAGE_START(file_start);
        Elf32_Addr file_length = file_end - file_page_start;

        if (file_length != 0)
        {
            void *seg_addr = mmap((void *)seg_page_start,
                                  file_length,
                                  PFLAGS_TO_PROT(phdr->p_flags),
                                  MAP_FIXED | MAP_PRIVATE,
                                  fd_,
                                  file_page_start);
            if (seg_addr == MAP_FAILED)
            {
                DL_ERR("couldn't map \"%s\" segment %d: %s", name_, i, strerror(errno));
                return false;
            }
        }

        // if the segment is writable, and does not end on a page boundary,
        // zero-fill it until the page limit.
        if ((phdr->p_flags & PF_W) != 0 && PAGE_OFFSET(seg_file_end) > 0)
        {
            memset((void *)seg_file_end, 0, PAGE_SIZE - PAGE_OFFSET(seg_file_end));
        }

        seg_file_end = PAGE_END(seg_file_end);

        // seg_file_end is now the first page address after the file
        // content. If seg_end is larger, we need to zero anything
        // between them. This is done by using a private anonymous
        // map for all extra pages.
        if (seg_page_end > seg_file_end)
        {
            void *zeromap = mmap((void *)seg_file_end,
                                 seg_page_end - seg_file_end,
                                 PFLAGS_TO_PROT(phdr->p_flags),
                                 MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE,
                                 -1,
                                 0);
            if (zeromap == MAP_FAILED)
            {
                DL_ERR("couldn't zero fill \"%s\" gap: %s", name_, strerror(errno));
                return false;
            }
        }
    }
    return true;
}

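This function iterates over every PT_LOAD segment, maps each one from the file into the address range reserved by ReserveAddressSpace, and performs extra work such as zero-filling.

Tracing FindPhdr gives: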
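The address calculations in LoadSegments can be separated from the mmap and memset calls. The following sketch, assuming a 4096-byte page, only computes what would be mapped and zeroed for one segment; SegmentPlan and plan_segment are hypothetical names and nothing is actually mapped.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kPageSize = 4096;  // assumed page size

constexpr uint32_t page_start(uint32_t x) { return x & ~(kPageSize - 1); }
constexpr uint32_t page_offset(uint32_t x) { return x & (kPageSize - 1); }
constexpr uint32_t page_end(uint32_t x) { return page_start(x + (kPageSize - 1)); }

// Hypothetical description of the work done for one PT_LOAD segment.
struct SegmentPlan {
    uint32_t map_addr;     // seg_page_start: address of the file mapping
    uint32_t map_length;   // file_length: bytes mapped from the file
    uint32_t map_offset;   // file_page_start: file offset of the mapping
    uint32_t zero_tail;    // bytes cleared after the file content (writable only)
    uint32_t anon_addr;    // start of the anonymous zero-filled mapping
    uint32_t anon_length;  // 0 when no extra pages are needed
};

SegmentPlan plan_segment(uint32_t p_vaddr, uint32_t p_offset, uint32_t p_filesz,
                         uint32_t p_memsz, bool writable, uint32_t load_bias) {
    const uint32_t seg_start = p_vaddr + load_bias;
    const uint32_t seg_end = seg_start + p_memsz;
    const uint32_t seg_file_end = seg_start + p_filesz;
    const uint32_t file_page_start = page_start(p_offset);

    SegmentPlan plan{};
    plan.map_addr = page_start(seg_start);
    plan.map_offset = file_page_start;
    plan.map_length = p_offset + p_filesz - file_page_start;
    // A writable segment that ends mid-page has the rest of that page cleared.
    if (writable && page_offset(seg_file_end) > 0) {
        plan.zero_tail = kPageSize - page_offset(seg_file_end);
    }
    // Pages beyond the file content (typically .bss) come from anonymous memory.
    plan.anon_addr = page_end(seg_file_end);
    if (page_end(seg_end) > plan.anon_addr) {
        plan.anon_length = page_end(seg_end) - plan.anon_addr;
    }
    return plan;
}
```

For a writable segment with p_vaddr = 0x3e80, p_offset = 0x2e80, p_filesz = 0x200, p_memsz = 0x2400 and a load bias of 0x40000000, the file mapping starts at 0x40003000 for 0x1080 bytes, 0xf80 bytes are cleared after the file content, and two anonymous pages follow at 0x40005000.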

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
bool ElfReader::FindPhdr()
{
    const Elf32_Phdr *phdr_limit = phdr_table_ + phdr_num_;

    // If there is a PT_PHDR, use it directly.
    for (const Elf32_Phdr *phdr = phdr_table_; phdr < phdr_limit; ++phdr)
    {
        if (phdr->p_type == PT_PHDR)
        {
            return CheckPhdr(load_bias_ + phdr->p_vaddr);
        }
    }

    // Otherwise, check the first loadable segment. If its file offset
    // is 0, it starts with the ELF header, and we can trivially find the
    // loaded program header from it.
    for (const Elf32_Phdr *phdr = phdr_table_; phdr < phdr_limit; ++phdr)
    {
        if (phdr->p_type == PT_LOAD)
        {
            if (phdr->p_offset == 0)
            {
                Elf32_Addr elf_addr = load_bias_ + phdr->p_vaddr;
                const Elf32_Ehdr *ehdr = (const Elf32_Ehdr *)(void *)elf_addr;
                Elf32_Addr offset = ehdr->e_phoff;
                return CheckPhdr((Elf32_Addr)ehdr + offset);
            }
            break;
        }
    }

    DL_ERR("can't find loaded phdr for \"%s\"", name_);
    return false;
}

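This function locates the program header table in memory. If there is a PT_PHDR segment, which records the location and size of the program header table itself, its address is used directly; otherwise the table is found through the ELF header at the start of the first PT_LOAD segment, provided that segment's file offset is 0. CheckPhdr then verifies that the table lies within memory covered by a loaded PT_LOAD segment.

In summary, loading an so works from the file's own metadata: the ELF header is read and verified, the program header table is located, and its entries are walked so that the so is loaded according to the information in its PT_LOAD segments.

Linking the so

Tracing soinfo_link_image gives: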
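CheckPhdr itself is not listed in this article. The sketch below shows the kind of containment test such a check amounts to, under the assumption that the table must fall entirely inside the file-backed part of one PT_LOAD segment; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for a PT_LOAD entry.
struct LoadSegment {
    uint32_t vaddr;   // p_vaddr
    uint32_t filesz;  // p_filesz
};

// Hypothetical check: is [table_addr, table_addr + table_size) fully
// inside the loaded, file-backed range of some segment?
bool phdr_table_is_mapped(uint32_t table_addr, uint32_t table_size, uint32_t load_bias,
                          const LoadSegment* segs, size_t count) {
    const uint32_t table_end = table_addr + table_size;
    for (size_t i = 0; i < count; ++i) {
        const uint32_t seg_start = segs[i].vaddr + load_bias;
        const uint32_t seg_end = seg_start + segs[i].filesz;
        if (seg_start <= table_addr && table_end <= seg_end) {
            return true;
        }
    }
    return false;
}
```

A table at load_bias + 0x34 with eight 32-byte entries passes when the first segment covers file bytes 0 to 0x2345, while an address outside every segment is rejected.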

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp
static bool soinfo_link_image(soinfo *si)
{
    /* "base" might wrap around UINT32_MAX. */
    Elf32_Addr base = si->load_bias;
    const Elf32_Phdr *phdr = si->phdr;
    int phnum = si->phnum;
    bool relocating_linker = (si->flags & FLAG_LINKER) != 0;

   //----------------------------------

    /* Extract dynamic section */
    size_t dynamic_count;
    Elf32_Word dynamic_flags;
    // ① Find the PT_DYNAMIC segment
    phdr_table_get_dynamic_section(phdr, phnum, base, &si->dynamic,
                                   &dynamic_count, &dynamic_flags);
    if (si->dynamic == NULL)
    {
        if (!relocating_linker)
        {
            DL_ERR("missing PT_DYNAMIC in \"%s\"", si->name);
        }
        return false;
    }
    else
    {
        if (!relocating_linker)
        {
            DEBUG("dynamic = %p", si->dynamic);
        }
    }

    //----------------------------------
    return true;
}

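As shown above, the function first obtains the so's load bias along with the pointer to the program header table and its entry count, and then uses phdr_table_get_dynamic_section to get a pointer to the PT_DYNAMIC segment.

phdr_table_get_dynamic_section is as follows: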

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker_phdr.cpp
void phdr_table_get_dynamic_section(const Elf32_Phdr *phdr_table,
                                    int phdr_count,
                                    Elf32_Addr load_bias,
                                    Elf32_Dyn **dynamic,
                                    size_t *dynamic_count,
                                    Elf32_Word *dynamic_flags)
{
    const Elf32_Phdr *phdr = phdr_table;
    const Elf32_Phdr *phdr_limit = phdr + phdr_count;

    for (phdr = phdr_table; phdr < phdr_limit; phdr++)
    {
        if (phdr->p_type != PT_DYNAMIC)
        {
            continue;
        }

        *dynamic = reinterpret_cast<Elf32_Dyn *>(load_bias + phdr->p_vaddr);
        if (dynamic_count)
        {
            *dynamic_count = (unsigned)(phdr->p_memsz / 8);
        }
        if (dynamic_flags)
        {
            *dynamic_flags = phdr->p_flags;
        }
        return;
    }
    *dynamic = NULL;
    if (dynamic_count)
    {
        *dynamic_count = 0;
    }
}

Back in soinfo_link_image, we continue the analysis.

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp
static bool soinfo_link_image(soinfo *si)
{
    //----------------------------------
    // ② Iterate over the entries of the PT_DYNAMIC segment and handle each according to d_tag
    uint32_t needed_count = 0;
    for (Elf32_Dyn *d = si->dynamic; d->d_tag != DT_NULL; ++d)
    {
        DEBUG("d = %p, d[0](tag) = 0x%08x d[1](val) = 0x%08x", d, d->d_tag, d->d_un.d_val);
        switch (d->d_tag)
        {
        case DT_HASH:
            // Hash table
            si->nbucket = ((unsigned *)(base + d->d_un.d_ptr))[0];
            si->nchain = ((unsigned *)(base + d->d_un.d_ptr))[1];
            si->bucket = (unsigned *)(base + d->d_un.d_ptr + 8);
            si->chain = (unsigned *)(base + d->d_un.d_ptr + 8 + si->nbucket * 4);
            break;
        case DT_STRTAB:
            // String table
            si->strtab = (const char *)(base + d->d_un.d_ptr);
            break;
        case DT_SYMTAB:
            // Symbol table
            si->symtab = (Elf32_Sym *)(base + d->d_un.d_ptr);
            break;
        case DT_PLTREL:
            if (d->d_un.d_val != DT_REL)
            {
                DL_ERR("unsupported DT_RELA in \"%s\"", si->name);
                return false;
            }
            break;
        case DT_JMPREL:
            // PLT relocation table
            si->plt_rel = (Elf32_Rel *)(base + d->d_un.d_ptr);
            break;
        case DT_PLTRELSZ:
            // Number of entries in the PLT relocation table
            si->plt_rel_count = d->d_un.d_val / sizeof(Elf32_Rel);
            break;
        case DT_REL:
            // Relocation table
            si->rel = (Elf32_Rel *)(base + d->d_un.d_ptr);
            break;
        case DT_RELSZ:
            // Number of entries in the relocation table
            si->rel_count = d->d_un.d_val / sizeof(Elf32_Rel);
            break;
        case DT_PLTGOT:
            // Global offset table (GOT), related to PLT lazy binding
            /* Save this in case we decide to do lazy binding. We don't yet. */
            si->plt_got = (unsigned *)(base + d->d_un.d_ptr);
            break;
        case DT_DEBUG:
            // Debugging support
            //  Set the DT_DEBUG entry to the address of _r_debug for GDB
            //  if the dynamic table is writable
            if ((dynamic_flags & PF_W) != 0)
            {
                d->d_un.d_val = (int)&_r_debug;
            }
            break;
        case DT_RELA:
            DL_ERR("unsupported DT_RELA in \"%s\"", si->name);
            return false;
        case DT_INIT:
            // Initialization function
            si->init_func = reinterpret_cast<linker_function_t>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_INIT) found at %p", si->name, si->init_func);
            break;
        case DT_FINI:
            // Finalization (destructor) function
            si->fini_func = reinterpret_cast<linker_function_t>(base + d->d_un.d_ptr);
            DEBUG("%s destructors (DT_FINI) found at %p", si->name, si->fini_func);
            break;
        case DT_INIT_ARRAY:
            // init_array: list of initialization functions
            si->init_array = reinterpret_cast<linker_function_t *>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_INIT_ARRAY) found at %p", si->name, si->init_array);
            break;
        case DT_INIT_ARRAYSZ:
            // Number of init_array entries
            si->init_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_FINI_ARRAY:
            // fini_array: list of finalization functions
            si->fini_array = reinterpret_cast<linker_function_t *>(base + d->d_un.d_ptr);
            DEBUG("%s destructors (DT_FINI_ARRAY) found at %p", si->name, si->fini_array);
            break;
        case DT_FINI_ARRAYSZ:
            // Number of fini_array entries
            si->fini_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_PREINIT_ARRAY:
            // Pre-initialization functions; these mostly appear only in executables and are ignored for an so
            si->preinit_array = reinterpret_cast<linker_function_t *>(base + d->d_un.d_ptr);
            DEBUG("%s constructors (DT_PREINIT_ARRAY) found at %p", si->name, si->preinit_array);
            break;
        case DT_PREINIT_ARRAYSZ:
            // Number of preinit_array entries
            si->preinit_array_count = ((unsigned)d->d_un.d_val) / sizeof(Elf32_Addr);
            break;
        case DT_TEXTREL:
            si->has_text_relocations = true;
            break;
        case DT_SYMBOLIC:
            si->has_DT_SYMBOLIC = true;
            break;
        case DT_NEEDED:
            // A dependency of this so; only counted here
            ++needed_count;
            break;
            //----------------------------------
        }
    }

    //----------------------------------
    soinfo **needed = (soinfo **)alloca((1 + needed_count) * sizeof(soinfo *));
    soinfo **pneeded = needed;

    // ③ Walk the PT_DYNAMIC segment again and load the so's dependencies
    for (Elf32_Dyn *d = si->dynamic; d->d_tag != DT_NULL; ++d)
    {
        if (d->d_tag == DT_NEEDED)
        {
            const char *library_name = si->strtab + d->d_un.d_val;
            DEBUG("%s needs %s", si->name, library_name);
            soinfo *lsi = find_library(library_name);
            if (lsi == NULL)
            {
                strlcpy(tmp_err_buf, linker_get_error_buffer(), sizeof(tmp_err_buf));
                DL_ERR("could not load library \"%s\" needed by \"%s\"; caused by %s",
                       library_name, si->name, tmp_err_buf);
                return false;
            }
            *pneeded++ = lsi;
        }
    }
    *pneeded = NULL;

    //----------------------------------
    // ④ Process relocations using the PLT relocation table and the relocation table
    if (si->plt_rel != NULL)
    {
        DEBUG("[ relocating %s plt ]", si->name);
        if (soinfo_relocate(si, si->plt_rel, si->plt_rel_count, needed))
        {
            return false;
        }
    }
    if (si->rel != NULL)
    {
        DEBUG("[ relocating %s ]", si->name);
        if (soinfo_relocate(si, si->rel, si->rel_count, needed))
        {
            return false;
        }
    }

    //----------------------------------
    // ⑤ Mark the so as linked
    si->flags |= FLAG_LINKED;
    DEBUG("[ finished linking %s ]", si->name);

    if (si->has_text_relocations)
    {
        /* All relocations are done, we can protect our segments back to
         * read-only. */
        if (phdr_table_protect_segments(si->phdr, si->phnum, si->load_bias) < 0)
        {
            DL_ERR("can't protect segments for \"%s\": %s",
                   si->name, strerror(errno));
            return false;
        }
    }

    /* We can also turn on GNU RELRO protection */
    if (phdr_table_protect_gnu_relro(si->phdr, si->phnum, si->load_bias) < 0)
    {
        DL_ERR("can't enable GNU RELRO protection for \"%s\": %s",
               si->name, strerror(errno));
        return false;
    }

    notify_gdb_of_load(si);
    return true;
}

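This function first locates the PT_DYNAMIC segment, then iterates over its entries and handles each according to its d_tag. It then loads the so's dependencies into memory, processes relocations according to the relocation tables, and finally sets the so's linked flag to indicate that linking is complete.

soinfo_relocate performs the relocations. Tracing it gives: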
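The effect of the most common relocation types can be illustrated without any linker data structures. The sketch below is a simplification under stated assumptions: a byte buffer stands in for the mapped so, symbol addresses are already resolved, and RelocKind, Reloc, and apply_relocations are hypothetical names. It follows the REL convention, in which the addend is the value already stored at the target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, architecture-neutral view of four relocation types.
enum class RelocKind { JumpSlot, GlobDat, Abs32, Relative };

struct Reloc {
    uint32_t offset;    // byte offset of the target word inside the image
    RelocKind kind;
    uint32_t sym_addr;  // resolved symbol address (unused for Relative)
};

// Applies the relocations to `image`; `base` plays the role of si->base.
// Returns false if a target word would fall outside the buffer.
bool apply_relocations(unsigned char* image, size_t image_size, uint32_t base,
                       const Reloc* relocs, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        const Reloc& r = relocs[i];
        if (r.offset > image_size || image_size - r.offset < sizeof(uint32_t)) {
            return false;
        }
        uint32_t word;
        std::memcpy(&word, image + r.offset, sizeof(word));  // implicit addend
        switch (r.kind) {
        case RelocKind::JumpSlot:
        case RelocKind::GlobDat:
            word = r.sym_addr;   // overwrite with the symbol address
            break;
        case RelocKind::Abs32:
            word += r.sym_addr;  // symbol address plus addend
            break;
        case RelocKind::Relative:
            word += base;        // rebase an image-internal address
            break;
        }
        std::memcpy(image + r.offset, &word, sizeof(word));
    }
    return true;
}
```

The real linker writes through the computed address directly; the memcpy and bounds check here only keep the sketch safe to run on an ordinary buffer.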

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp
static int soinfo_relocate(soinfo *si, Elf32_Rel *rel, unsigned count,
                           soinfo *needed[])
{
    // Get the symbol table and the string table
    Elf32_Sym *symtab = si->symtab;
    const char *strtab = si->strtab;
    Elf32_Sym *s;
    Elf32_Rel *start = rel;
    soinfo *lsi;

    // ① Iterate over every entry in the relocation table
    for (size_t idx = 0; idx < count; ++idx, ++rel)
    {
        // Relocation type
        unsigned type = ELF32_R_TYPE(rel->r_info);
        // Symbol table index of the relocation's symbol
        unsigned sym = ELF32_R_SYM(rel->r_info);
        // Address to relocate, computed from the load bias
        Elf32_Addr reloc = static_cast<Elf32_Addr>(rel->r_offset + si->load_bias);
        Elf32_Addr sym_addr = 0;
        char *sym_name = NULL;

        DEBUG("Processing '%s' relocation at index %d", si->name, idx);
        if (type == 0)
        { // R_*_NONE
            continue;
        }
        if (sym != 0)
        {
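            // sym is non-zero, so the relocation refers to a symbol; st_name locates its name in the string table
            sym_name = (char *)(strtab + symtab[sym].st_name);
            // Look the symbol up by name; on success lsi is set to the library that defines it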
            s = soinfo_do_lookup(si, sym_name, &lsi, needed);
            if (s == NULL)
            {
                // Not found: fall back to this so's own symbol table entry, which is only acceptable for a weak reference
                s = &symtab[sym];
                if (ELF32_ST_BIND(s->st_info) != STB_WEAK)
                {
                    DL_ERR("cannot locate symbol \"%s\" referenced by \"%s\"...", sym_name, si->name);
                    return -1;
                }

                // For an unresolved weak symbol, only the following relocation types are allowed
                switch (type)
                {
#if defined(ANDROID_ARM_LINKER)
                case R_ARM_JUMP_SLOT:
                case R_ARM_GLOB_DAT:
                case R_ARM_ABS32:
                case R_ARM_RELATIVE: /* Don't care. */
#elif defined(ANDROID_X86_LINKER)
                case R_386_JMP_SLOT:
                case R_386_GLOB_DAT:
                case R_386_32:
                case R_386_RELATIVE: /* Dont' care. */
#endif /* ANDROID_*_LINKER */
                    /* sym_addr was initialized to be zero above or relocation
                       code below does not care about value of sym_addr.
                       No need to do anything.  */
                    break;

#if defined(ANDROID_X86_LINKER)
                case R_386_PC32:
                    sym_addr = reloc;
                    break;
#endif /* ANDROID_X86_LINKER */

#if defined(ANDROID_ARM_LINKER)
                case R_ARM_COPY:
                    /* Fall through.  Can't really copy if weak symbol is
                       not found in run-time.  */
#endif /* ANDROID_ARM_LINKER */
                default:
                    DL_ERR("unknown weak reloc type %d @ %p (%d)",
                           type, rel, (int)(rel - start));
                    return -1;
                }
            }
            else
            {
                // The symbol was found
#if 0
                if ((base == 0) && (si->base != 0)) {
                        /* linking from libraries to main image is bad */
                    DL_ERR("cannot locate \"%s\"...",
                           strtab + symtab[sym].st_name);
                    return -1;
                }
#endif
                // Symbol address = st_value plus the load bias of the library that defines it
                sym_addr = static_cast<Elf32_Addr>(s->st_value + lsi->load_bias);
            }
            count_relocation(kRelocSymbol);
        }
        else
        {
            // sym is 0: this relocation does not use a symbol
            s = NULL;
        }

        // ② Apply the relocation according to its type
        switch (type)
        {
#if defined(ANDROID_ARM_LINKER)
        case R_ARM_JUMP_SLOT:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO JMP_SLOT %08x <- %08x %s", reloc, sym_addr, sym_name);
            // Write the resolved symbol address directly to the relocation target
            *reinterpret_cast<Elf32_Addr *>(reloc) = sym_addr;
            break;
        case R_ARM_GLOB_DAT:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO GLOB_DAT %08x <- %08x %s", reloc, sym_addr, sym_name);
            // Write the resolved symbol address directly to the relocation target
            *reinterpret_cast<Elf32_Addr *>(reloc) = sym_addr;
            break;
        case R_ARM_ABS32:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO ABS %08x <- %08x %s", reloc, sym_addr, sym_name);
            // Add the resolved symbol address to the value already stored at the relocation target, then write it back
            *reinterpret_cast<Elf32_Addr *>(reloc) += sym_addr;
            break;
        case R_ARM_REL32:
            count_relocation(kRelocRelative);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO REL32 %08x <- %08x - %08x %s",
                       reloc, sym_addr, rel->r_offset, sym_name);
            *reinterpret_cast<Elf32_Addr *>(reloc) += sym_addr - rel->r_offset;
            break;
#elif defined(ANDROID_X86_LINKER)
        case R_386_JMP_SLOT:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO JMP_SLOT %08x <- %08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr *>(reloc) = sym_addr;
            break;
        case R_386_GLOB_DAT:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO GLOB_DAT %08x <- %08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr *>(reloc) = sym_addr;
            break;
#elif defined(ANDROID_MIPS_LINKER)
        case R_MIPS_REL32:
            count_relocation(kRelocAbsolute);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO REL32 %08x <- %08x %s",
                       reloc, sym_addr, (sym_name) ? sym_name : "*SECTIONHDR*");
            if (s)
            {
                *reinterpret_cast<Elf32_Addr *>(reloc) += sym_addr;
            }
            else
            {
                *reinterpret_cast<Elf32_Addr *>(reloc) += si->base;
            }
            break;
#endif /* ANDROID_*_LINKER */

#if defined(ANDROID_ARM_LINKER)
        case R_ARM_RELATIVE:
#elif defined(ANDROID_X86_LINKER)
        case R_386_RELATIVE:
#endif /* ANDROID_*_LINKER */
            count_relocation(kRelocRelative);
            MARK(rel->r_offset);
            if (sym)
            {
                DL_ERR("odd RELATIVE form...");
                return -1;
            }
            TRACE_TYPE(RELO, "RELO RELATIVE %08x <- +%08x", reloc, si->base);
            *reinterpret_cast<Elf32_Addr *>(reloc) += si->base;
            break;

#if defined(ANDROID_X86_LINKER)
        case R_386_32:
            count_relocation(kRelocRelative);
            MARK(rel->r_offset);

            TRACE_TYPE(RELO, "RELO R_386_32 %08x <- +%08x %s", reloc, sym_addr, sym_name);
            *reinterpret_cast<Elf32_Addr *>(reloc) += sym_addr;
            break;

        case R_386_PC32:
            count_relocation(kRelocRelative);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO R_386_PC32 %08x <- +%08x (%08x - %08x) %s",
                       reloc, (sym_addr - reloc), sym_addr, reloc, sym_name);
            *reinterpret_cast<Elf32_Addr *>(reloc) += (sym_addr - reloc);
            break;
#endif /* ANDROID_X86_LINKER */

#ifdef ANDROID_ARM_LINKER
        case R_ARM_COPY:
            if ((si->flags & FLAG_EXE) == 0)
            {
                /*
                 * http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
                 *
                 * Section 4.7.1.10 "Dynamic relocations"
                 * R_ARM_COPY may only appear in executable objects where e_type is
                 * set to ET_EXEC.
                 *
                 * TODO: FLAG_EXE is set for both ET_DYN and ET_EXEC executables.
                 * We should explicitly disallow ET_DYN executables from having
                 * R_ARM_COPY relocations.
                 */
                DL_ERR("%s R_ARM_COPY relocations only supported for ET_EXEC", si->name);
                return -1;
            }
            count_relocation(kRelocCopy);
            MARK(rel->r_offset);
            TRACE_TYPE(RELO, "RELO %08x <- %d @ %08x %s", reloc, s->st_size, sym_addr, sym_name);
            if (reloc == sym_addr)
            {
                Elf32_Sym *src = soinfo_do_lookup(NULL, sym_name, &lsi, needed);

                if (src == NULL)
                {
                    DL_ERR("%s R_ARM_COPY relocation source cannot be resolved", si->name);
                    return -1;
                }
                if (lsi->has_DT_SYMBOLIC)
                {
                    DL_ERR("%s invalid R_ARM_COPY relocation against DT_SYMBOLIC shared "
                           "library %s (built with -Bsymbolic?)",
                           si->name, lsi->name);
                    return -1;
                }
                if (s->st_size < src->st_size)
                {
                    DL_ERR("%s R_ARM_COPY relocation size mismatch (%d < %d)",
                           si->name, s->st_size, src->st_size);
                    return -1;
                }
                memcpy((void *)reloc, (void *)(src->st_value + lsi->load_bias), src->st_size);
            }
            else
            {
                DL_ERR("%s R_ARM_COPY relocation target cannot be resolved", si->name);
                return -1;
            }
            break;
#endif /* ANDROID_ARM_LINKER */

        default:
            DL_ERR("unknown reloc type %d @ %p (%d)",
                   type, rel, (int)(rel - start));
            return -1;
        }
    }
    return 0;
}

This completes the analysis of the so linking process.
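The arithmetic behind the most common relocation cases can be tried out in isolation. The following is a minimal sketch, not linker code: `ToyReloc`, `RelocKind`, `read_slot`, `write_slot` and `apply_reloc` are hypothetical names introduced only for illustration, the "image" is a plain byte buffer, and the addresses are simulated 32-bit values. It mirrors the JUMP_SLOT/GLOB_DAT case (store the symbol address), the ABS32 case (add the symbol address to the stored addend) and the RELATIVE case (add the load base to the stored addend).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical relocation kinds for this sketch. They mirror the ARM cases
// discussed above but are not the real R_ARM_* constants.
enum class RelocKind { JumpSlot, GlobDat, Abs32, Relative };

// Hypothetical, simplified relocation entry.
struct ToyReloc {
    uint32_t offset;  // offset of the 32-bit slot inside the image
    RelocKind kind;
};

// Read a 32-bit slot; memcpy avoids alignment and aliasing problems.
inline uint32_t read_slot(const std::vector<uint8_t>& image, uint32_t offset) {
    assert(offset + sizeof(uint32_t) <= image.size());
    uint32_t value = 0;
    std::memcpy(&value, image.data() + offset, sizeof(value));
    return value;
}

// Write a 32-bit slot.
inline void write_slot(std::vector<uint8_t>& image, uint32_t offset, uint32_t value) {
    assert(offset + sizeof(uint32_t) <= image.size());
    std::memcpy(image.data() + offset, &value, sizeof(value));
}

// Apply one relocation. `base` plays the role of si->base and `sym_addr`
// the role of the resolved symbol address.
inline void apply_reloc(std::vector<uint8_t>& image, const ToyReloc& r,
                        uint32_t base, uint32_t sym_addr) {
    switch (r.kind) {
    case RelocKind::JumpSlot:
    case RelocKind::GlobDat:
        // slot = S
        write_slot(image, r.offset, sym_addr);
        break;
    case RelocKind::Abs32:
        // slot = A + S, where A is the addend already stored in the slot
        write_slot(image, r.offset, read_slot(image, r.offset) + sym_addr);
        break;
    case RelocKind::Relative:
        // slot = A + B, no symbol involved
        write_slot(image, r.offset, read_slot(image, r.offset) + base);
        break;
    }
}
```

For example, with a simulated base of 0x40000000 and a symbol resolved to 0x50001234, a slot holding the addend 0x20 becomes 0x40000020 after a RELATIVE relocation, while a slot holding 0x10 becomes 0x50001244 after an ABS32 relocation.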

so initialization

Tracing into si->CallConstructors() leads to the following function:

//http://androidxref.com/4.4.4_r1/xref/bionic/linker/linker.cpp
void soinfo::CallConstructors()
{
    // Already initialized: do not run the constructors again
    if (constructors_called)
    {
        return;
    }

    constructors_called = true;

    if ((flags & FLAG_EXE) == 0 && preinit_array != NULL)
    {
        // The GNU dynamic linker silently ignores these, but we warn the developer.
        PRINT("\"%s\": ignoring %d-entry DT_PREINIT_ARRAY in shared library!",
              name, preinit_array_count);
    }

    // If the dynamic section is not empty, initialize the dependency libraries first
    if (dynamic != NULL)
    {
        for (Elf32_Dyn *d = dynamic; d->d_tag != DT_NULL; ++d)
        {
            if (d->d_tag == DT_NEEDED)
            {
                const char *library_name = strtab + d->d_un.d_val;
                TRACE("\"%s\": calling constructors in DT_NEEDED \"%s\"", name, library_name);
                find_loaded_library(library_name)->CallConstructors();
            }
        }
    }

    TRACE("\"%s\": calling constructors", name);

    // Call init_func (DT_INIT)
    CallFunction("DT_INIT", init_func);
    // Call every function in init_array (DT_INIT_ARRAY), in forward order
    CallArray("DT_INIT_ARRAY", init_array, init_array_count, false);
}

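so initialization therefore comes down to two calls: CallFunction invokes the function registered under DT_INIT (the _init function), and CallArray then invokes each entry of the DT_INIT_ARRAY function-pointer array. Before either call, the constructors of every DT_NEEDED dependency are run, and the constructors_called flag ensures that each library is initialized only once.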
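The resulting call order is easy to check with a small model. This is a sketch under stated assumptions rather than the linker's implementation: `ToyLibrary`, `InitFn` and `init_log` are hypothetical names, a `std::vector` of pointers stands in for the DT_NEEDED lookup through `find_loaded_library`, and null entries are simply skipped.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the toy initializers run.
inline std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

using InitFn = void (*)();

// Hypothetical stand-in for soinfo, reduced to what initialization needs.
struct ToyLibrary {
    std::string name;
    std::vector<ToyLibrary*> needed;   // stands in for the DT_NEEDED entries
    InitFn init_func = nullptr;        // stands in for DT_INIT
    std::vector<InitFn> init_array;    // stands in for DT_INIT_ARRAY
    bool constructors_called = false;

    void CallConstructors() {
        // Initialize each library at most once. Setting the flag before
        // recursing also stops cycles between dependencies.
        if (constructors_called) {
            return;
        }
        constructors_called = true;

        // Dependencies first.
        for (ToyLibrary* dep : needed) {
            assert(dep != nullptr);
            dep->CallConstructors();
        }

        // Then DT_INIT, then DT_INIT_ARRAY in forward order.
        if (init_func != nullptr) {
            init_func();
        }
        for (InitFn fn : init_array) {
            if (fn != nullptr) {
                fn();
            }
        }
    }
};
```

If a library `libapp.so` lists `libdep.so` as a dependency, calling `CallConstructors()` on `libapp.so` runs the initializers of `libdep.so` first, then the `init_func` of `libapp.so`, then its `init_array` entries in order. A second call is a no-op.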

References

<Unidbg逆向工程 原理与实践>

