Why does C++ disallow the creation of valid pointers from a valid address and type?


If you know two pieces of information:

  1. A memory address.
  2. The type of the object stored in that address.

Then you logically have all you need to reference that object:

#include <iostream>
using namespace std;

int main()
{
    int x = 1, y = 2;
    int* p = (&x) + 1;
    if ((long)&y == (long)p)
        cout << "p now contains &y\n";
    if (*p == y)
        cout << "it also dereferences to y\n";
}

However, this isn't legal per the C++ standard. It works in several compilers I tried, but it's Undefined Behavior.

The question is: why?


If you know two pieces of information, that doesn't necessarily mean the compiler knows it, nor does it necessarily trust you :-)

Try turning on optimisation options and watch the code die horribly.

@HenriMenke Doesn't mean it's valid ;)

I wonder: does this question (or the answers to it) change at all if &y is not in the program? It seems that y might not need to "exist at an address", much less "one after" x. That is, if the behavior weren't UB it would need to be defined, and this would force compilers to codify and honor several assumptions.

@Rakete1111 That is true but I thought the question was about compiler errors.

Because what you think you know may not be true.

However, this isn't legal per the C++ standard. -- I'm not seeing where any of this code violates the C++ standard.

@PaulMcKenzie assuming two variables are adjacent in memory is a bad assumption to make. The standard makes no guarantees about how variables are stored.

Requiring this to work essentially bans putting local variables in registers, which is effectively a requirement for a serious compiler.

@PaulMcKenzie: *p==y is definitely UB.

@geza Yes, UB, but where does this code violate the C++ standard? Returning a pointer to a local variable is also UB, but it does not violate the C++ standard.

@PaulMcKenzie: okay :) People maybe mean different things when they say "legal per the C++ standard", "violates the C++ standard".

Actually, what do you mean by "why": "in what way does the standard undefine this" or "why does the standard make that choice"?

4 Answers

4

If you want to treat pointers as a numeric type, first use std::uintptr_t, not long: long is not guaranteed to be wide enough to hold a pointer value. That's the first problem with the code, but not the one you're talking about.
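
As a minimal sketch of the integer conversion mentioned above, assuming the implementation provides std::uintptr_t (the type is optional in the standard), an address can be converted with reinterpret_cast and converted back. The helper names address_of_int, same_address and round_trip are illustrative and do not come from the question.

```cpp
#include <cstdint>

// Hypothetical helper: converts an int pointer to an integer type
// that is wide enough to hold any object pointer value.
std::uintptr_t address_of_int(const int* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Hypothetical helper: compares two pointers by their integer representations.
bool same_address(const int* a, const int* b) {
    return address_of_int(a) == address_of_int(b);
}

// Round trip: converting the integer back yields a pointer equal to the original.
int* round_trip(int* p) {
    return reinterpret_cast<int*>(reinterpret_cast<std::uintptr_t>(p));
}
```

Two addresses comparing equal as integers says nothing about whether dereferencing either pointer is allowed; that is the separate issue the rest of this answer covers.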

It works in several compilers I tried, but it's Undefined Behavior.

The question is: why?

You are definitely invoking undefined behavior by trying to compare two distinctly unrelated pointers:

  • &x + 1
  • &y

The pointer &x+1 is a one-past-the-end pointer. The standard allows you to have such a pointer, but the behavior is only defined when you use it to compare against pointers based on x. The behavior is not defined if you compare it with anything else.

The compiler is free to put y anywhere it chooses, including in a register. As such, there is no guarantee that &y and &x+1 are related.
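
For contrast, here is a hedged sketch of the use a one-past-the-end pointer is intended for: marking the end of a range that starts at the same object. The function names count_until and count_single are made up for this illustration.

```cpp
#include <cstddef>

// Counts the ints in [first, last) by walking a pointer up to the
// one-past-the-end position. The end pointer is compared, never dereferenced.
std::size_t count_until(const int* first, const int* last) {
    std::size_t n = 0;
    for (const int* p = first; p != last; ++p) {
        ++n;
    }
    return n;
}

// For pointer arithmetic a single object behaves like an array of one element,
// so &x + 1 is a valid end marker for x.
std::size_t count_single() {
    int x = 1;
    return count_until(&x, &x + 1);
}
```

Here &x + 1 is only ever compared with a pointer derived from x, which is the case the standard defines.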

6

If you know address and type of an object and your implementation has relaxed pointer safety [basic.stc.dynamic.safety §4], then it should be legal to just access the object at that address through an appropriate lvalue I think.

The problem is that the standard does not guarantee that local variables of the same type are allocated contiguously with addresses increasing in order of declaration. So you cannot derive the address of y based on that computation you do with the address of x. Thus your assumptions do not hold, you may know the type, but you do not know the address of the object you're talking about. Furthermore, pointer arithmetic would lead to undefined behavior if you go more than one element past an object ([expr.add]). So while (&x) + 1 is not undefined behavior yet, just the act of even computing (&x) + 2 would be…
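
If adjacency is what you need, one way to get it, sketched here under the assumption that the two values can live in one array, is to make them elements of the same array. The name read_neighbour and the array xy are hypothetical.

```cpp
// Hypothetical rewrite of the question's example: x and y become
// elements of one array, so their relative layout is guaranteed.
int read_neighbour() {
    int xy[2] = {1, 2};   // xy[0] plays the role of x, xy[1] the role of y
    int* p = &xy[0] + 1;  // points at xy[1], a real element of the same array
    return *p;            // well-defined: reads the value stored in xy[1]
}
```

Array elements are contiguous and ordered by index, so the pointer arithmetic stays inside one object and the dereference is valid.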

13

The code is legal per the C++ standard (i.e. should compile), but as you already noted the behaviour is undefined. This is because the order of variable declaration does not imply that they will be arranged in memory in the same way.

10

It wreaks havoc with optimizations.

void f(int* x);

int g() {
    int x = 1, y = 2;
    f(&x);
    return y;
}

If you can validly "guess" the address of y from x's address, then the call to f may modify y and so the return statement must reload the value of y from memory.

Now consider a typical function with more local variables and more calls to other functions, where you'd have to save the value of every variable to memory before each call (because the called function may inspect them) and reload them after each call (because the called function may have modified them).
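
The difference can be sketched by comparing two variants, assuming callee definitions that are not part of the original answer (touch_pair, touch_one, g_array and g_separate are all hypothetical names). When both values are elements of one array, the callee may legitimately reach the neighbour, so the caller has to reload it. With separate variables it cannot, so the compiler is free to keep y in a register or fold it to a constant.

```cpp
// Hypothetical callee: writes through the pointer it was given and to the next element.
void touch_pair(int* p) {
    p[0] = 10;
    p[1] = 20;  // valid only because the caller passes a pointer into a two-element array
}

int g_array() {
    int xy[2] = {1, 2};
    touch_pair(xy);
    return xy[1];  // must be read again after the call: the callee can legally modify it
}

// Hypothetical callee: only touches the object it was given.
void touch_one(int* p) {
    *p = 10;
}

int g_separate() {
    int x = 1, y = 2;
    touch_one(&x);
    return y;  // y's address never escapes, so a compiler may simply return 2
}
```

Whether a particular compiler actually performs this optimisation depends on its settings; the point is only that the rule permits it.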

