C++ Memory Corruption (std::string) - part 4

 

Summary

This is the next part of the C++ memory corruption series*. In this post, we'll look at corrupting the std::string object on Linux and see what exploitation primitives we can gain.

* https://blog.infosectcbr.com.au/2020/08/c-memory-corruption-part-1.html

* https://blog.infosectcbr.com.au/2022/01/c-memory-corruption-stdvector-part-2.html 

* https://blog.infosectcbr.com.au/2022/03/c-memory-corruption-stdlist-part-3.html

Author: Dr Silvio Cesare

Introduction

Memory corruption bugs are common in C++ programs. However, there is much more literature on exploiting C programs than C++ programs. C++ introduces classes, objects, and data structures that can all be used effectively for building exploitation primitives. In this post, we'll look at corrupting the std::string class and see what specific primitives we can obtain.

std::string

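The examples in this post assume the libstdc++ implementation that ships with GCC on 64-bit Linux; other standard libraries lay out the object differently. In this implementation, the object stored in memory for a basic string starts with the backing pointer to the string contents. The next member is the size of the string contents.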
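
To make the layout concrete, here is a minimal sketch that copies the first two machine words out of a std::string object and checks whether the contents are stored inside the object itself. The names StringWords, peek, stored_inline and dump are hypothetical helpers, not part of any library, and the word order (pointer first, size second) is an assumption that holds for libstdc++ on 64-bit Linux but not necessarily elsewhere.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Assumed layout (libstdc++, 64-bit): word 0 = backing pointer, word 1 = size.
struct StringWords {
    std::uintptr_t ptr;   // first word of the object
    std::uintptr_t size;  // second word of the object
};

// Copy the first two machine words out of a std::string object.
StringWords peek(const std::string &s) {
    static_assert(sizeof(std::string) >= sizeof(StringWords),
                  "unexpected std::string size");
    StringWords w;
    std::memcpy(&w, static_cast<const void *>(&s), sizeof w);
    return w;
}

// True when the contents live inside the object (short string optimization).
bool stored_inline(const std::string &s) {
    auto base = reinterpret_cast<std::uintptr_t>(&s);
    auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= base && data < base + sizeof s;
}

// Print the raw words next to what the public API reports.
void dump(const std::string &s) {
    StringWords w = peek(s);
    std::printf("sizeof=%zu ptr=%#lx data()=%p size=%lu inline=%d\n",
                sizeof s, (unsigned long)w.ptr,
                static_cast<const void *>(s.data()),
                (unsigned long)w.size, (int)stored_inline(s));
}
```

Calling dump on a short string such as "12345678" and on a longer one, for example std::string(64, 'x'), should show the pointer word matching data() in both cases, with only the short string stored inline.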

These two members are very useful targets in a memory corruption. If we can modify the backing pointer, we might be able to construct an arbitrary read/write primitive, because every access to the string contents then goes through the pointer we control.

If we can modify the size member, then we might be able to access memory beyond the end of the string contents, at offsets relative to the backing pointer. This effectively gives us a relative read/write primitive.

Corrupting the Backing Pointer

In this attack, we'll corrupt the backing pointer of the string. The backing pointer is the first member of the object. We'll make it point to an arbitrary address by corrupting p[0] and then leak the contents of that new memory address. We effectively build an arbitrary read primitive.

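#include <cstdio>
#include <cstdlib>
#include <string>

static char victim[] = "secret";

int
main()
{
	std::string str;
	// View the string object as raw machine words.
	unsigned long *p = (unsigned long *)&str;

	str = "12345678";
	// Simulated corruption: redirect the backing pointer (word 0).
	p[0] = (unsigned long)&victim;
	// c_str() now returns the address of victim.
	printf("%s\n", str.c_str());

	// exit() does not run the destructor of the local str, which
	// would otherwise operate on the corrupted pointer.
	exit(1);
}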
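
The same corruption can be turned into a write. The following is a minimal sketch, again assuming the libstdc++ layout where word 0 of the object is the backing pointer. The helper write_through and the buffer target are hypothetical names introduced for illustration. The helper restores the original pointer before returning, because the string's destructor would otherwise operate on the fake pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

static char target[] = "AAAAAAAA";  // hypothetical victim buffer

// Redirect the backing pointer at dst, write through the string,
// then put the real pointer back.
void write_through(std::string &str, char *dst,
                   const char *bytes, std::size_t n) {
    std::uintptr_t saved;
    std::uintptr_t fake = reinterpret_cast<std::uintptr_t>(dst);
    void *obj = static_cast<void *>(&str);

    std::memcpy(&saved, obj, sizeof saved);  // remember the real pointer
    std::memcpy(obj, &fake, sizeof fake);    // simulated corruption of word 0

    // Ordinary element writes now land in dst, not in the string.
    for (std::size_t i = 0; i < n && i < str.size(); i++)
        str[i] = bytes[i];

    std::memcpy(obj, &saved, sizeof saved);  // restore before destruction
}
```

For example, with std::string s = "12345678", calling write_through(s, target, "BBBB", 4) should leave target holding "BBBBAAAA" while s itself still reads "12345678". The number of bytes written is capped by the string's size member, which is one reason the two corruptions are often combined.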

Corrupting the String Size

The size of the string contents is the second member of the object. We can make this size larger than it should be, which allows us to access memory out of bounds beyond the original string contents.

In the following code, the write to p[1] simulates the memory corruption of the size member. The subsequent accesses through s.str[i] go out of bounds of the original 8-byte string, but they fall within the new corrupted size. Assuming libstdc++ on x86-64, the short string is stored in a buffer 16 bytes into the 32-byte string object, so indices 16 to 23 land on the adjacent victim member. We effectively build a relative read primitive.

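#include <cstdio>
#include <cstdlib>
#include <string>

#define N (0x40)

struct struct_s {
	std::string str;
	unsigned long victim;	// sits directly after the string object
};

static struct struct_s s;

int
main()
{
	// View the string object as raw machine words.
	unsigned long *p = (unsigned long *)&s.str;
	unsigned long x = 0;

	s.str = std::string("12345678");
	s.victim = 0x1122334455667788;

	// Simulated corruption: enlarge the size member (word 1).
	p[1] = N;
	// Read 8 bytes past the string's buffer, one byte at a time.
	for (int i = 16; i < (16 + 8); i++) {
		x <<= 8;
		x |= (unsigned char)s.str[i];
	}
	printf("%lx\n", x);
	exit(1);
}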
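
The effect of the size member on bounds checking is easiest to see with at(), which always compares the index against the stored size. The sketch below is a variation on the listing above under the same libstdc++ assumption that word 1 of the object is the size. Pair, at_throws, set_size_word and leak_neighbour are hypothetical names. The offset of the neighbouring member is computed from data() rather than hard-coded, and the real size is restored afterwards so the object is left in a consistent state.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

struct Pair {
    std::string str;
    unsigned long victim;  // neighbour we want to leak
};

// True if at(index) is rejected by the bounds check.
bool at_throws(const std::string &s, std::size_t index) {
    try {
        (void)s.at(index);
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}

// Overwrite word 1 of the object (assumed to be the size member).
void set_size_word(std::string &s, std::uintptr_t fake_size) {
    std::memcpy(reinterpret_cast<char *>(&s) + sizeof(std::uintptr_t),
                &fake_size, sizeof fake_size);
}

// Leak p.victim through bounds-checked reads on the short string p.str.
unsigned long leak_neighbour(Pair &p) {
    std::size_t real = p.str.size();
    // Distance from the string's buffer to the neighbouring member.
    std::size_t off = reinterpret_cast<const char *>(&p.victim) - p.str.data();

    set_size_word(p.str, off + sizeof p.victim);  // simulated corruption
    unsigned char raw[sizeof p.victim];
    for (std::size_t i = 0; i < sizeof raw; i++)
        raw[i] = static_cast<unsigned char>(p.str.at(off + i));
    set_size_word(p.str, real);                   // undo the corruption

    unsigned long value;
    std::memcpy(&value, raw, sizeof value);
    return value;
}
```

Before the corruption, at_throws(p.str, 16) should report true for an 8-character string. After the size word is enlarged, the same checked access succeeds and leak_neighbour returns the value stored in victim.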

Conclusion

This short blog post presents another technique for corrupting a C++ object to give an attacker a useful primitive. In this edition, we looked at corrupting std::string objects and built arbitrary and relative read primitives.

Comments

  1. Hey, it is worth mentioning that the std::string layout is not standardized, i.e. it may differ between implementations, and indeed it does. Also, both the libstdc++ (GCC) and libcxx (LLVM) implementations have a "short string optimization" where, if the string is small enough (and this size differs between GCC's and LLVM's implementations), its fields are interpreted differently and the underlying data is stored within the string structure (TL;DR: Yeah, std::string is a union).

    Your examples work as expected in recent GCC versions, but if you recompile these listings with clang and its standard library implementation (adding a -stdlib=libc++ flag) the resulting programs will give different output. It would be great if you could update your post to include OS/compiler versions and expected outputs so that people can reproduce it (and be sure what was expected).

    For what it's worth, LLVM's libcxx implementation also has an "alternate string layout" version where it uses a different std::string layout that supposedly may be faster in some codebases (one can use that through the -D_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT -D_LIBCPP_ABI_UNSTABLE -stdlib=libc++ flags iirc).

    Generally, while strings may look obvious to implement, there is a lot of engineering behind them and behind the "small string optimization" that makes things faster. If one wants to learn more about those, I would recommend going through these references:
    - https://github.com/elliotgoodrich/SSO-23
    - https://www.youtube.com/watch?v=kPR8h4-qZdk
