Need of R-Value References & Move Semantics in C++11

Before C++11, C++ already had references, which we now call L-Value references. So why were R-Value References and Move Semantics introduced in C++11? What was the problem with the existing references in C++? In this tutorial, we will discuss the problem with temporary objects and references in old C++, so that we get a clear picture of the problems that R-Value References & Move Semantics in C++11 are trying to solve.

What is L-Value and L-Value Reference in C++? #

L-values in C++ are expressions that refer to an identifiable location in memory. The name comes from "left value", because they can appear on the left side of an assignment. They persist beyond a single expression, and unless they are const you can assign new values to them. In simple terms, an L-value is something that has a name and an address, like a variable. Let’s use an example.

// Here x is an L-Value
int x = 10;

// A reference to variable x,
// i.e. an L-Value Reference
int & ref = x;

In the expression int x = 10;, the variable x is an L-Value because it refers to a location in memory that can be identified and accessed. Also, the variable x exists beyond the expression in which it was declared. We can even take the address of the variable x, therefore it is an L-value. A reference to an L-value is called an L-value reference, like the variable ref, which is an L-value reference.

However, the value 10 in the expression int x = 10; is not an L-value: it does not exist beyond the expression, and we cannot take its address. Such values are called R-values. Old C++ had no reference type meant specifically for R-values (only a const L-value reference could bind to one), but C++11 adds R-value references for this purpose. We will discuss that in detail later.
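The difference can be checked directly in code. The following is only a small sketch; the function name lvalueDemo and the variables p and cref are made up for illustration, and the two lines that would not compile are left as comments.

```cpp
#include <cassert>

// Returns true if every check about x and its references holds.
bool lvalueDemo() {
    int x = 10;            // x is an L-value: it has a name and an address
    int* p = &x;           // OK: we can take the address of an L-value
    int& ref = x;          // OK: an L-value reference binds to an L-value
    ref = 20;              // writes through the reference into x

    // int* q = &10;       // error: 10 is an R-value, there is no address to take
    // int& bad = 10;      // error: a non-const L-value reference cannot bind to an R-value
    const int& cref = 10;  // OK: a const L-value reference may bind to an R-value

    return *p == 20 && &ref == &x && cref == 10;
}
```

Calling lvalueDemo() from main() returns true: the reference and the pointer both refer to the same object x, while 10 can only be reached through a const reference.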

The old L-value references were good and helped a lot in improving the performance of code. However, they could not handle temporary objects efficiently. This limitation caused performance problems when large objects were used as temporary objects. By large objects, we mean objects that own resources which are expensive to copy, such as heap memory. Because there was no mechanism to move the resources of a temporary object directly, temporary objects were often copied unnecessarily, leading to significant overhead.

To solve this problem, R-value references and move semantics were introduced in C++11. Now, first, we will understand this problem with L-value references in old C++ with some examples, and then we will learn how R-value references and move semantics in C++11 solve this problem.

The Problem of Unnecessary Copies of Temporary Objects in C++ #

Let’s consider a class Student that contains an integer age and a character pointer name. The character pointer will store the name of the student. As the class contains a pointer, the default copy behavior (shallow copy) would simply copy the pointer address. This is undesirable because when the original object is destroyed, the pointer in the copied object becomes a dangling pointer, and the same memory would later be freed a second time. To prevent this, we implement a deep copy using a copy constructor and an assignment operator.
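To see what a shallow copy actually does, here is a minimal sketch. ShallowStudent and copySharesBuffer are hypothetical names used only for this illustration; the struct deliberately has no copy constructor and no destructor, so the compiler-generated copy is used and the buffer is freed by hand exactly once.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical struct with NO user-defined copy constructor or destructor,
// so copying it just copies the pointer value.
struct ShallowStudent {
    int age;
    char* name;
};

// Returns true if the copy shares one buffer with the original.
bool copySharesBuffer() {
    ShallowStudent a{21, new char[5]};   // room for "Mark" plus the terminator
    std::strcpy(a.name, "Mark");

    ShallowStudent b = a;                // shallow copy: b.name == a.name
    bool shared = (b.name == a.name);

    b.name[0] = 'C';                     // changing b also changes a
    shared = shared && std::strcmp(a.name, "Cark") == 0;

    delete[] a.name;                     // free the single buffer exactly once;
                                         // b.name is now dangling and must not be used
    return shared;
}
```

Both objects point at the same characters, which is exactly the situation the deep-copying copy constructor is meant to avoid.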

We will begin by defining our Student class and implementing a copy constructor and assignment operator that deep copy the name pointer.

#include <iostream>
#include <cstring>
#include <vector>

class Student {
private:
    int age;
    char* name;

public:
    // Constructor
    Student(int age, const char* name) {
        this->age = age;
        this->name = new char[strlen(name) + 1];
        strcpy(this->name, name);
    }

    // Copy constructor for deep copying
    Student(const Student& other) {
        age = other.age;

        // Allocate new memory
        name = new char[strlen(other.name) + 1]; 
        
        // Deep copy
        strcpy(name, other.name);
        std::cout << "Copy constructor called\n";
    }

    // Assignment operator for deep copying
    Student& operator=(const Student& other) {
        // Avoid self-assignment
        if (this != &other) {  
            // Release old memory
            delete[] name;  
            age = other.age;
            // Allocate new memory
            name = new char[strlen(other.name) + 1];  
            // Deep copy
            strcpy(name, other.name);  
        }
        std::cout << "Assignment operator called\n";
        return *this;
    }

    // Destructor
    ~Student() {
        delete[] name;
    }
};

In the above example:

  • Copy Constructor: Allocates new memory for name and copies the contents from the other object.
  • Assignment Operator: First deletes the old memory to prevent memory leaks, then allocates new memory and copies the contents from the source object.

However, both of these are costly operations because we are allocating memory on the heap and copying data. This can become a performance bottleneck when dealing with temporary objects or frequent copying.

Now we will show some problems where unnecessary copies of objects of class Student will be created due to the lack of new features like R-value references and move semantics in C++. Let’s discuss these problems with separate examples.

Problem 1: Poor Efficiency of Containers Prior to C++11 #

Containers like std::vector, std::map, and others benefit greatly from move semantics. Since C++11, when resizing, inserting, or re-allocating, containers can move objects instead of copying them. Before C++11, they had no choice but to copy.

For example, consider a std::vector of type Student. This vector can hold objects of type Student. Initially, we reserve capacity for 2 elements. Now, suppose we push back 4 Student objects into the vector. Internally, the vector will need to reallocate its storage to accommodate the extra elements.

std::vector<Student> vec;
vec.reserve(2);

// Insert 4 elements into the vector
vec.push_back(Student(31, "John"));
vec.push_back(Student(34, "Mathew"));
vec.push_back(Student(44, "Sanjay"));
vec.push_back(Student(44, "Raj"));

With a standard library that doubles the capacity on reallocation (such as GCC's libstdc++), the output will be:

Copy constructor called
Copy constructor called
Copy constructor called
Copy constructor called
Copy constructor called
Copy constructor called

Four of these six calls come from push_back() copying each temporary Student into the vector. The other two happen on the third push_back(): the vector has run out of capacity, so it copies the two existing Student objects into a new, larger memory block. This causes the copy constructor of the Student class to be called for each object being copied. If the Student class contains a pointer to dynamically allocated memory (on the heap), the copy constructor becomes a relatively expensive operation, because it must allocate new memory and copy the data, and the old objects must then be destroyed.

The copy operation can become costly when the object has dynamically allocated memory, as it will:

  1. Allocate new memory on the heap.
  2. Copy the data from the original object into the new memory.
  3. Delete the old memory when necessary.

This can slow down the program, especially when many objects are copied, as is often the case when a vector resizes.

Before the introduction of rvalue references and move semantics in C++11, this inefficiency was a common issue in C++ because every time the vector reallocated, it copied every element, calling the copy constructor, allocating new memory, and then destroying the old objects. Since C++11, by implementing a move constructor and move assignment operator in the Student class, we can prevent these unnecessary copies of Student objects when the vector reallocates. The move constructor should be marked noexcept, because std::vector only moves a copyable type during reallocation when its move constructor cannot throw. The internal data is then moved between objects, which is more efficient than copying.
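As a preview, the idea can be sketched as follows. MovableStudent, its copies and moves counters, and fillVector are hypothetical names added for this illustration; they are not part of the Student class above. The sketch repeats the four push_back() calls and counts which constructor the vector ends up using.

```cpp
#include <cassert>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical variant of Student with a move constructor and counters.
class MovableStudent {
    int age;
    char* name;
public:
    static int copies;  // number of copy-constructor calls
    static int moves;   // number of move-constructor calls

    MovableStudent(int age, const char* name)
        : age(age), name(new char[std::strlen(name) + 1]) {
        std::strcpy(this->name, name);
    }

    // Deep copy: allocates a new buffer and copies the characters.
    MovableStudent(const MovableStudent& other)
        : age(other.age), name(new char[std::strlen(other.name) + 1]) {
        std::strcpy(name, other.name);
        ++copies;
    }

    // Move: takes over the pointer and leaves the source empty.
    // noexcept allows std::vector to use it during reallocation.
    MovableStudent(MovableStudent&& other) noexcept
        : age(other.age), name(other.name) {
        other.name = nullptr;
        ++moves;
    }

    ~MovableStudent() { delete[] name; }  // delete[] on nullptr is harmless
};

int MovableStudent::copies = 0;
int MovableStudent::moves = 0;

// Repeats the four push_back calls from the example above.
void fillVector() {
    MovableStudent::copies = 0;
    MovableStudent::moves = 0;

    std::vector<MovableStudent> vec;
    vec.reserve(2);
    vec.push_back(MovableStudent(31, "John"));
    vec.push_back(MovableStudent(34, "Mathew"));
    vec.push_back(MovableStudent(44, "Sanjay"));  // forces a reallocation
    vec.push_back(MovableStudent(44, "Raj"));
}
```

After fillVector() runs, MovableStudent::copies is still 0. The value of MovableStudent::moves depends on how the standard library grows the vector, but no heap allocation or character copying happens for any of those moves.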

Important Note: We will learn about R-Value References and Move Constructor in upcoming articles. The idea here is to explain the problems with the existing version of C++ and the need for Move Semantics.

Problem 2: Passing Large Object by Value to Functions #

Suppose we have a std::vector of type Student, and it contains thousands of Student objects. Now, we want to pass this vector to a function by value, intending for the function to take full ownership of the vector. In older versions of C++, this would result in unnecessary copying of the entire vector, which is inefficient. Consider the following code example:

void HandleData(std::vector<Student> vecObj)
{
    // Do some work with the vector
}

int main()
{
    std::vector<Student> vec;
    vec.reserve(1000);

    // Insert thousands of elements into the vector
    vec.emplace_back(31, "John");
    vec.emplace_back(34, "Mathew");
    vec.emplace_back(44, "Sanjay");
    // ....... Add 100s of Student Objects ......

    // Pass vector by copy to function
    HandleData(vec);

    return 0;
}

In this example, when we call the HandleData() function and pass the vector by value, it creates a copy of the vector inside the HandleData() function. Essentially, all the Student objects from the original vector are copied into a new vector, which is inefficient for large datasets.

We can confirm this by observing how many times the copy constructor is called. With only the three elements shown above, the output will be:

Copy constructor called
Copy constructor called
Copy constructor called

In this case, the copy constructor is called once for every element when we pass the vector by value, so a new vector is created by deep copying. This is not desirable since our intent was to transfer ownership of the vector to the HandleData() function, but instead, we created a copy, which is a costly operation.

With modern C++ features like rvalue references and move semantics, we can prevent this unnecessary copying and efficiently transfer ownership of the vector to the HandleData() function.

To achieve this, we can use std::move() to convert the vector into an rvalue, which signals that we are transferring ownership of the resource. Here’s how it looks:

HandleData(std::move(vec));

In this case, no copy constructor will be called. Instead, the vector will be moved into the parameter of the HandleData() function. After the function call, the original vector (vec) is left empty because ownership of its data has been transferred.

Without rvalue references and move semantics, passing a large vector by value would result in a deep copy, which is inefficient. By using std::move() and move constructors, we can transfer ownership of the vector efficiently, improving performance, especially for large datasets.
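The difference between the two calls can be measured with a small sketch. Item, handleData, Result, and compareCalls are hypothetical names used only here; Item stands in for Student and counts its own copies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts how often it is copied.
struct Item {
    std::string name;
    static int copies;
    explicit Item(const char* n) : name(n) {}
    Item(const Item& other) : name(other.name) { ++copies; }
};
int Item::copies = 0;

// Takes the vector by value, so the caller decides: copy or move.
std::size_t handleData(std::vector<Item> vecObj) {
    return vecObj.size();
}

struct Result {
    int copiesByValue;  // element copies when passing vec
    int copiesByMove;   // element copies when passing std::move(vec)
    bool sourceEmpty;   // is vec empty after being moved from?
};

Result compareCalls() {
    std::vector<Item> vec;
    vec.reserve(3);
    vec.emplace_back("John");       // constructed in place, no copies
    vec.emplace_back("Mathew");
    vec.emplace_back("Sanjay");

    Item::copies = 0;
    handleData(vec);                // copies all three elements
    int byValue = Item::copies;

    Item::copies = 0;
    handleData(std::move(vec));     // hands over the buffer, no element copies
    int byMove = Item::copies;

    return {byValue, byMove, vec.empty()};
}
```

compareCalls() reports 3 element copies for the plain call, 0 for the std::move() call, and an empty source vector afterwards.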

Problem 3: Copying Temporary Objects #

Consider a scenario where we have a function that returns a Student object. When this function returns a Student, a temporary object is created, which then gets copied to the variable in the calling function. This creates an unnecessary copy.

Student getStudent()
{
    // Local object created
    Student temp(20, "Mark Sen");
    // Returned by value
    return temp;
}

int main()
{
    Student s1 = getStudent();  // Copying happens here
}

In the above code:

  • A Student object temp is created inside the getStudent() function.
  • When returning, this local object is copied into s1 in main().
  • The copy constructor is called, leading to unnecessary memory allocation and copying.

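When the compiler does not elide the copy, and assuming the constructor and destructor also print a message (the Student class shown earlier prints only from the copy constructor and the assignment operator), the output of the above code will be: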

Constructor called
Copy constructor called
Destructor called
Destructor called

This shows that we are creating a temporary object and copying it, which is unnecessary, especially if this function is called millions of times. Every time, memory is allocated and then deallocated, leading to performance overhead.

With modern compilers, this particular copy is usually removed by Return Value Optimization, although that is not guaranteed in every case.

Return Value Optimization vs Move Semantics

Return Value Optimization (RVO) and Named Return Value Optimization (NRVO) were present in earlier versions of C++ (before C++11), and they do optimize many cases of returning objects from functions by eliminating unnecessary copies. However, RVO/NRVO is not guaranteed in all scenarios, and there are several cases where it cannot be applied or doesn’t solve the problem completely.

Before C++17, the standard permitted RVO (Return Value Optimization) but did not require it, so whether it happened depended on the compiler and its settings.

For example,

Student create() {
    Student obj(20, "Mark Sen");
    return obj; // NRVO *may* happen, but it's not guaranteed
}

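In C++98/11/14, the compiler might eliminate the copy here, but this behavior was not guaranteed and depended on the specific compiler implementation. From C++17 onwards, copy elision is guaranteed by the standard only when the function returns an unnamed temporary (a prvalue), for example return Student(20, "Mark Sen");. In that case the object is constructed directly in the memory where the function’s return value will be stored, without invoking the copy or move constructor. For a named local object such as obj in the above example, elision (NRVO) remains optional even in C++17. Move semantics covers the cases that elision misses: since C++11, when the copy is not elided, a returned local object is moved if the class has a move constructor, and copied only if it has none.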
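The two kinds of return statement can be compared with a small sketch. Probe, makeUnnamed, and makeNamed are hypothetical names for this illustration; Probe only counts which constructor is called. Whether NRVO is applied in makeNamed() is up to the compiler, so the sketch makes no claim about the exact number of moves there.

```cpp
#include <cassert>

// Hypothetical type that counts copy and move constructions.
struct Probe {
    static int copies;
    static int moves;
    Probe() {}
    Probe(const Probe&) { ++copies; }
    Probe(Probe&&) noexcept { ++moves; }
};
int Probe::copies = 0;
int Probe::moves = 0;

// Returns an unnamed temporary (a prvalue).
// Since C++17, no copy or move constructor is called for this return.
Probe makeUnnamed() {
    return Probe();
}

// Returns a named local object.
// NRVO may remove the move, but it is optional. If it is not applied,
// the move constructor is used, never the copy constructor.
Probe makeNamed() {
    Probe obj;
    return obj;
}
```

Initializing a variable from either function leaves Probe::copies at 0. The named case either costs nothing (NRVO applied) or costs a cheap move, which is the gap that move semantics fills.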

There were also other reasons why move semantics was needed in modern C++.

Solution: R-Value References & Move Semantics #

To avoid this unnecessary copying of temporary objects, R-Value References and Move Semantics were introduced in C++11. Using move constructors and move assignment operators, we can transfer ownership of resources from one object to another without copying the underlying data. This is particularly useful when working with temporary objects, where we can “move” the resources instead of copying them.
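What "transferring ownership without copying the underlying data" means can be shown with a minimal sketch of a move assignment operator. The class Name and the function moveReusesBuffer are hypothetical and exist only for this illustration. Copying is disabled here on purpose so that only moves are possible.

```cpp
#include <cassert>
#include <cstring>
#include <utility>

// Hypothetical owner of a heap-allocated string.
class Name {
    char* data;
public:
    explicit Name(const char* s) : data(new char[std::strlen(s) + 1]) {
        std::strcpy(data, s);
    }

    Name(const Name&) = delete;             // no copying in this sketch
    Name& operator=(const Name&) = delete;

    // Move constructor: take the pointer, leave the source empty.
    Name(Name&& other) noexcept : data(other.data) { other.data = nullptr; }

    // Move assignment operator.
    Name& operator=(Name&& other) noexcept {
        if (this != &other) {
            delete[] data;          // release what we currently own
            data = other.data;      // take over the source's buffer
            other.data = nullptr;   // the source no longer owns it
        }
        return *this;
    }

    ~Name() { delete[] data; }

    const char* get() const { return data; }
};

// Returns true if move assignment reused the source's buffer.
bool moveReusesBuffer() {
    Name a("Mark Sen");
    Name b("placeholder");
    const char* original = a.get();  // address of a's buffer before the move

    b = std::move(a);                // calls the move assignment operator

    return b.get() == original       // b now owns the very same buffer
        && a.get() == nullptr        // a was left empty
        && std::strcmp(b.get(), "Mark Sen") == 0;
}
```

No new memory is allocated and no characters are copied during the assignment; only a pointer changes hands.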

Move semantics drastically reduce the number of memory allocations and deallocations, which is especially important in performance-critical applications. Imagine code that generates millions of temporary objects in a day. Without move semantics, millions of unnecessary copies would occur, each with its own heap allocation and deallocation, slowing down the system.

Summary #

The purpose of this article was to show the problem with temporary objects and old-style references in C++ (i.e., L-Value References). In upcoming articles, we will discuss R-Value References and Move Semantics from C++11. We will also discuss how things work under the hood. Stay tuned!

Author: Varun

A Software Developer with 20 Years of Experience in C/C++