Archive for the ‘Inside .NET’ Category

Inside .NET - The mystery of conv.ovf.<to type>.un…

Friday, March 27th, 2009

I'm sure you've all read the ECMA-335 document; but for those of you who don't know, this is the document that specifies all the gory details of how .NET works.

When you got to Partition III, page 62, in your bedtime reading, I expect you read it, thought nothing very much of it, but then went: Hmmm? That's a little odd. Why is it needed exactly?

Because it's a little strange, and I just can't figure out why it's really needed.

Of course, this could be because I'm being a bit slow. If so, please just leave a comment explaining what it's about to put me out of my confusion - I'm not losing sleep over it, but it's close...

The problem is all about conversion instructions. When you write something like this:

int i = ...
byte b = (byte)i;

Then the CIL instruction generated for line 2 will be conv.u1, which tells the CIL runtime to convert whatever is at the top of the stack into a u1, which is the same as the C# byte type.

Simple.

Now, if you had this C#:

checked {
  int i = ...
  byte b = (byte)i;
}

Then the CIL instruction generated for line 3 will be conv.ovf.u1, which tells the CIL runtime to do the same conversion as last time, but with an overflow check as well, so if i doesn't fit into a byte then an OverflowException will be thrown.

Still simple.

And, furthermore, if you have this in C#:

checked {
  uint i = ...
  byte b = (byte)i;
}

Then the CIL for line 3 will be conv.ovf.u1.un, which tells the CIL runtime to do another checked conversion, but this time it is told that the source type is unsigned (the .un part of the CIL instruction).

Which still looks good, obvious and simple.
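The three casts above can be tried side by side. This is only a minimal sketch: the class and method names (ConversionDemo, Truncate, CheckedFromInt, CheckedFromUInt, Overflows) are made up for illustration, and the op-codes named in the comments are the ones described in this post, not something the program inspects.

```csharp
using System;

public static class ConversionDemo
{
    // Unchecked narrowing (conv.u1 according to the post): keeps the low 8 bits.
    public static byte Truncate(int i) => unchecked((byte)i);

    // Checked narrowing from a signed source (conv.ovf.u1 according to the post).
    public static byte CheckedFromInt(int i) => checked((byte)i);

    // Checked narrowing from an unsigned source (conv.ovf.u1.un according to the post).
    public static byte CheckedFromUInt(uint i) => checked((byte)i);

    // Returns true if running the conversion throws OverflowException.
    public static bool Overflows(Func<byte> conversion)
    {
        try
        {
            conversion();
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Truncate(300));                           // 44 (300 - 256)
        Console.WriteLine(Overflows(() => CheckedFromInt(300)));    // True
        Console.WriteLine(Overflows(() => CheckedFromInt(-1)));     // True
        Console.WriteLine(Overflows(() => CheckedFromUInt(200u)));  // False
        Console.WriteLine(Overflows(() => CheckedFromUInt(300u)));  // True
    }
}
```

Running the compiled assembly through a disassembler such as ildasm should show which conversion instruction was emitted for each method.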

But the thing is - it just isn't needed.

As part of the JIT, the runtime has to do a full stack analysis, and as part of that stack analysis the runtime will already know exactly what type is in the source stack entry. Therefore it can figure out whether it's a signed or unsigned int without any help from the CIL instruction.

And because there is a very small range of types that are allowed to be used as the operand to these conversion instructions, and the range of supported instructions is fixed (i.e. no extensibility is allowed), I cannot think of any situation where these conv.ovf.<to type>.un instructions would actually be needed. The runtime could always just use the conv.ovf.<to type> instruction and figure out if it's signed or unsigned from the stack analysis.

Inside .NET - 2 byte compare op-codes

Wednesday, March 25th, 2009

A small anomaly of the CIL instruction set is that the ceq, cgt, cgt.un, clt and clt.un instructions are 2-byte op-codes.

2-byte op-codes would normally be used for instructions that are not commonly used, as they (obviously) take up more space than 1-byte op-codes.

But the group of op-codes listed above are commonly used, and there appears to be plenty of space in the 1-byte op-code space for them to only use 1 byte - e.g. 0xBB - 0xBF.
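The sizes are easy to check from C# itself, since System.Reflection.Emit.OpCodes exposes each op-code's Size and Value. A small sketch (the CompareOpCodeSizes class and its Describe helper are just names picked for this example):

```csharp
using System;
using System.Reflection.Emit;

public static class CompareOpCodeSizes
{
    // The five comparison op-codes discussed in this post.
    public static readonly OpCode[] CompareOps =
    {
        OpCodes.Ceq, OpCodes.Cgt, OpCodes.Cgt_Un, OpCodes.Clt, OpCodes.Clt_Un
    };

    // Formats an op-code as "name: N byte(s), 0xVALUE".
    public static string Describe(OpCode op) =>
        $"{op.Name}: {op.Size} byte(s), 0x{unchecked((ushort)op.Value):X4}";

    public static void Main()
    {
        foreach (OpCode op in CompareOps)
        {
            Console.WriteLine(Describe(op));
        }

        // A 1-byte op-code for comparison.
        Console.WriteLine(Describe(OpCodes.Add));
    }
}
```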

So why do they use 2 bytes?

Inside .NET - Partially Constructed Generic Types

Friday, March 20th, 2009

Given these classes:

class B<T, U> {
}
 
class D : B<int, string> {
}
 
class E<T> : B<T, string> {
}

The result of:

typeof(B<,>)

Is a generic type definition, which can be used to make a fully constructed type using Type.MakeGenericType(typeT,typeU).

The result of:

typeof(D).BaseType

Is a fully constructed generic type: B<int,string>

But what's the result of:

typeof(E<>).BaseType

It's a half-constructed type: B<T, string>

Obviously, you can't instantiate an object of this type, as T is undefined; but you might expect that you could call MakeGenericType(typeT) with a single parameter, which will make a fully constructed type. But you can't.

Type has two properties that together tell you what kind of generic type you've got:

  • Type.IsGenericTypeDefinition
  • Type.ContainsGenericParameters

You can only make a fully constructed type from a type where IsGenericTypeDefinition is true, and you can only instantiate a class where ContainsGenericParameters is false.

Which leaves the middle ground, which is inhabited by these half-constructed types, where IsGenericTypeDefinition is false, but ContainsGenericParameters is true, so they can't be instantiated, and they can't be used to make fully-constructed generic types.

You can, however, call GetGenericTypeDefinition(), which returns what you would expect. You can also call GetGenericArguments(): called on type B<T, string>, it returns the second type argument as string, as expected, while the first type argument is returned as T, with its Type.IsGenericParameter property set to true. The base class of this type T is the base-class constraint of the type parameter (note that this does not include any interface constraints).
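The three kinds of generic type described above can be told apart with a few lines of reflection. This is a sketch using the classes from this post; GenericKinds, Describe and CanMake are names invented for the example.

```csharp
using System;

class B<T, U> { }
class D : B<int, string> { }
class E<T> : B<T, string> { }

public static class GenericKinds
{
    // Reports the two properties that classify a generic type.
    public static string Describe(Type t) =>
        $"{t}: IsGenericTypeDefinition={t.IsGenericTypeDefinition}, " +
        $"ContainsGenericParameters={t.ContainsGenericParameters}";

    // Returns false if MakeGenericType rejects the type because it is
    // not a generic type definition.
    public static bool CanMake(Type t, params Type[] typeArguments)
    {
        try
        {
            t.MakeGenericType(typeArguments);
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Type half = typeof(E<>).BaseType;   // the half-constructed B<T, string>

        Console.WriteLine(Describe(typeof(B<,>)));        // definition: True, True
        Console.WriteLine(Describe(typeof(D).BaseType));  // fully constructed: False, False
        Console.WriteLine(Describe(half));                // half-constructed: False, True

        Type[] args = half.GetGenericArguments();
        Console.WriteLine(args[0].IsGenericParameter);    // True (this is T)
        Console.WriteLine(args[1] == typeof(string));     // True

        Console.WriteLine(CanMake(typeof(B<,>), typeof(int), typeof(string)));  // True
        Console.WriteLine(CanMake(half, typeof(int)));                          // False
    }
}
```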

One question this leaves me with is why I can't call MakeGenericType() on a half-constructed generic type, if I provide the missing type argument(s)...
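In the meantime, one possible workaround is to go back to the generic type definition and supply the full argument list yourself, keeping the arguments that are already fixed. The sketch below is an assumption about how you might do that, not anything the framework provides: HalfConstructed.Close is a hypothetical helper, and it only fills type parameters that appear directly in the argument list (not ones nested inside another argument such as List<T>).

```csharp
using System;
using System.Linq;

class B<T, U> { }
class E<T> : B<T, string> { }

public static class HalfConstructed
{
    // Hypothetical helper: replaces the open type parameters of a
    // half-constructed type, in order, with the supplied types, then
    // builds the result from the generic type definition.
    public static Type Close(Type halfConstructed, params Type[] missing)
    {
        int next = 0;
        Type[] args = halfConstructed.GetGenericArguments()
            .Select(a => a.IsGenericParameter ? missing[next++] : a)
            .ToArray();

        return halfConstructed.GetGenericTypeDefinition().MakeGenericType(args);
    }

    public static void Main()
    {
        Type half = typeof(E<>).BaseType;            // B<T, string>
        Type closed = Close(half, typeof(int));      // B<int, string>

        Console.WriteLine(closed == typeof(B<int, string>));  // True
    }
}
```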

Inside .NET - What is MaxStack for?

Friday, March 13th, 2009

Every method defined in .NET has a header, and in this header is a maxstack value:

.assembly Hello {}
.method public static void Main() cil managed
{
  .entrypoint
  .maxstack 1
  ldstr "Hello, world!"
  call void [mscorlib]System.Console::WriteLine(string)
  ret
}

It is specified explicitly in a fat (Microsoft's naming, not mine) header, and is implicitly 8 in a tiny header.
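If you want to see the value for a compiled method, reflection exposes it as MethodBody.MaxStackSize. A minimal sketch follows (MaxStackReader, Add and MaxStackOf are names made up for the example). The number printed depends on what the compiler emitted for the method, so no particular output is claimed here.

```csharp
using System;
using System.Reflection;

public static class MaxStackReader
{
    // A small method whose header we inspect.
    public static int Add(int a, int b) => a + b;

    // Returns the maxstack value recorded for a public method of the given type.
    public static int MaxStackOf(Type type, string methodName)
    {
        MethodBody body = type.GetMethod(methodName).GetMethodBody();
        return body.MaxStackSize;
    }

    public static void Main()
    {
        Console.WriteLine(MaxStackOf(typeof(MaxStackReader), nameof(Add)));
    }
}
```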

But what is it for?

Well, the documentation states that it specifies the maximum number of entries the stack will contain during execution of this function.

Good, excellent, very useful, you might think - now the JIT/execution engine doesn't have to figure out how much stack space to allocate, just use this value.

But no, it's not that simple. Maxstack specifies the maximum number of stack entries, not the maximum number of stack bytes. In .NET the size of a stack entry is essentially unbounded, as value-types are stored directly on the stack, unlike reference types, which have a reference (of known size) stored on the stack.

A value-type can be as large as you like: System.Drawing.Rectangle is 16 bytes, and you can define your own to be multiple kilobytes, or larger (I'm not saying this is recommended, but it's certainly possible).
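To put numbers on that, here is a deliberately simplified sketch. Rect is a stand-in for System.Drawing.Rectangle (four ints), Huge is an invented 4096-byte struct, and TwoEntryStackBytes is a made-up helper that just adds up marshalled sizes. A real runtime would also have to deal with alignment and slot rounding, so treat the figures as illustrative only.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-in for System.Drawing.Rectangle: four 32-bit fields, 16 bytes.
[StructLayout(LayoutKind.Sequential)]
struct Rect
{
    public int X, Y, Width, Height;
}

// An invented value-type padded out to 4096 bytes.
[StructLayout(LayoutKind.Sequential, Size = 4096)]
struct Huge
{
    public int First;
}

public static class StackSizing
{
    // Bytes needed for a two-entry stack holding one T1 and one T2.
    // In every case below the entry count (maxstack) would be 2.
    public static int TwoEntryStackBytes<T1, T2>() =>
        Marshal.SizeOf<T1>() + Marshal.SizeOf<T2>();

    public static void Main()
    {
        Console.WriteLine(TwoEntryStackBytes<int, int>());   // 8
        Console.WriteLine(TwoEntryStackBytes<int, Rect>());  // 20
        Console.WriteLine(TwoEntryStackBytes<int, Huge>());  // 4100
    }
}
```

Same number of entries each time, very different number of bytes.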

This means that Dot Net Anywhere has to do a full stack analysis of each method to calculate the number of bytes required for the stack, and essentially ignores the maxstack value (it actually does use it, but not for anything important, and could easily manage without - see the source for details - JIT.c).

Which makes me wonder - why is it even there? It appears to serve no useful purpose.

If you happen to know the answer, please let me know...