Sunday, 30 June 2013

C# NEW Features in Microsoft .Net Framework 4.0

Covariance and Contravariance

Covariance and contravariance are best introduced with an example, and the best is in the framework. In System.Collections.Generic, IEnumerable<T> and IEnumerator <T> represent, respectively, an object that’s a sequence of T’s and the enumerator (or iterator) that does the work of iterating the sequence. These interfaces have done a lot of heavy lifting for a long time, because they support the implementation of the foreach loop construct. In C# 3.0, they became even more prominent because of their central role in LINQ and LINQ to Objects—they’re the .NET interfaces to represent sequences.

So if you have a class hierarchy with, say, an Employee type and a Manager type that derives from it (managers are employees, after all), then what would you expect the following code to do?

IEnumerable<Manager> ms = GetManagers();

IEnumerable<Employee> es = ms;

It seems as though one ought to be able to treat a sequence of Managers as though it were a sequence of Employees. But in C# 3.0, the assignment will fail; the compiler will tell you there’s no conversion. After all, it has no idea what the semantics of IEnumerable<T> are. This could be any interface, so for any arbitrary interface IFoo<T>, why would an IFoo<Manager> be more or less substitutable for an IFoo<Employee>?

In C# 4.0, though, the assignment works because IEnumerable<T>, along with a few other interfaces, has changed, an alteration enabled by new support in C# for covariance of type parameters.

IEnumerable<T> is eligible to be more special than the arbitrary IFoo<T> because, though it’s not obvious at first glance, members that use the type parameter T (GetEnumerator in IEnumerable<T> and the Current property in IEnumerator<T>) actually use T only in the position of a return value. So you only get a Manager out of the sequence, and you never put one in.

In contrast, think of List<T>. Making a List<Manager> substitutable for a List<Employee> would be a disaster, because of the following:

List<Manager> ms = GetManagers();

List<Employee> es = ms; // Suppose this were possible

es.Add(new EmployeeWhoIsNotAManager()); // Uh oh

As this shows, once you think you’re looking at a List<Employee>, you can insert any employee. But the list in question is actually a List<Manager>, so inserting a non-Manager must fail. You’ve lost type safety if you allow this. List<T> cannot be covariant in T.

The new language feature in C# 4.0, then, is the ability to define types, such as the new IEnumerable<T>, that admit conversions among themselves when the type parameters in question bear some relationship to one another. This is what the .NET Framework developers who wrote IEnumerable<T> used, and this is what their code looks like (simplified, of course):

public interface IEnumerable<out T> { /* ... */ }

Notice the out keyword modifying the definition of the type parameter, T. When the compiler sees this, it will mark T as covariant and check that, in the definition of the interface, all uses of T are up to snuff (in other words, that they’re used in out positions only—that’s why this keyword was picked).

Why is this called covariance? Well, it’s easiest to see when you start to draw arrows. To be concrete, let’s use the Manager and Employee types. Because there’s an inheritance relationship between these classes, there’s an implicit reference conversion from Manager to Employee:

Manager → Employee

And now, because of the annotation of T in IEnumerable<out T>, there’s also an implicit reference conversion from IEnumerable<Manager> to IEnumerable<Employee>. That’s what the annotation provides for:

IEnumerable<Manager> → IEnumerable<Employee>

This is called covariance, because the arrows in each of the two examples point in the same direction. We started with two types, Manager and Employee. We made new types out of them, IEnumerable<Manager> and IEnumerable<Employee>. The new types convert the same way as the old ones.

Contravariance is when this happens backward. You might anticipate that this could happen when the type parameter, T, is used only as input, and you’d be right. For example, the System namespace contains an interface called IComparable<T>, which has a single method called CompareTo:

public interface IComparable<in T> {

  bool CompareTo(T other);

}

If you have an IComparable<Employee>, you should be able to treat it as though it were an IComparable<Manager>, because the only thing you can do is put Employees in to the interface. Because a manager is an employee, putting a manager in should work, and it does. The in keyword modifies T in this case, and this scenario functions correctly:

IComparable<Employee> ec = GetEmployeeComparer();

IComparable<Manager> mc = ec;

This is called contravariance because the arrow got reversed this time:

Manager → Employee
IComparable<Manager> ← IComparable<Employee>

So the language feature here is pretty simple to summarize: You can add the keyword in or out whenever you define a type parameter, and doing so gives you free extra conversions. There are some limitations, though.

First, this works with generic interfaces and delegates only. You can’t declare a generic type parameter on a class or struct in this manner. An easy way to rationalize this is that delegates are very much like interfaces that have just one method, and in any case, classes would often be ineligible for this treatment because of fields. You can think of any field on the generic class as being both an input and an output, depending on whether you write to it or read from it. If those fields involve type parameters, the parameters can be neither covariant nor contravariant.

Second, whenever you have an interface or delegate with a covariant or contravariant type parameter, you’re granted new conversions on that type only when the type arguments, in the usage of the interface (not its definition), are reference types. For instance, because int is a value type, the IEnumerator<int> doesn’t convert to IEnumerator <object>, even though it looks like it should:

IEnumerator <int> image: right arrow with slash  IEnumerator <object>

The reason for this behavior is that the conversion must preserve the type representation. If the int-to-object conversion were allowed, calling the Current property on the result would be impossible, because the value type int has a different representation on the stack than an object reference does. All reference types have the same representation on the stack, however, so only type arguments that are reference types yield these extra conversions.

Very likely, most C# developers will happily use this new language feature—they’ll get more conversions of framework types and fewer compiler errors when using some types from the .NET Framework (IEnumerable<T>, IComparable<T>, Func<T>, Action<T>, among others). And, in fact, anyone designing a library with generic interfaces and delegates is free to use the new in and out type parameters when appropriate to make life easier for their users.

By the way, this feature does require support from the runtime—but the support has always been there. It lay dormant for several releases, however, because no language made use of it. Also, previous versions of C# allowed some limited conversions that were contravariant. Specifically, they let you make delegates out of methods that had compatible return types. In addition, array types have always been covariant. These existing features are distinct from the new ones in C# 4.0, which actually let you define your own types that are covariant and contravariant in some of their type parameters.
Dynamic Dispatch

On to the interop features in C# 4.0, starting with what is perhaps the biggest change.

C# now supports dynamic late-binding. The language has always been strongly typed, and it continues to be so in version 4.0. Microsoft believes this makes C# easy to use, fast and suitable for all the work .NET programmers are putting it to. But there are times when you need to communicate with systems not based on .NET.

Traditionally, there were at least two approaches to this. The first was simply to import the foreign model directly into .NET as a proxy. COM Interop provides one example. Since the original release of the .NET Framework, it has used this strategy with a tool called TLBIMP,  which creates new .NET proxy types you can use directly from C#.

LINQ-to-SQL, shipped with C# 3.0, contains a tool called SQLMETAL, which imports an existing database into C# proxy classes for use with queries. You’ll also find a tool that imports Windows Management Instrumentation (WMI) classes to C#. Many technologies allow you to write C# (often with attributes) and then perform interop using your handwritten code as basis for external actions, such as LINQ-to-SQL, Windows Communication Foundation (WCF) and serialization.

The second approach abandons the C# type system entirely—you embed strings and data in your code. This is what you do whenever you write code that, say, invokes a method on a JScript object or when you embed a SQL query in your ADO.NET application. You’re even doing this when you defer binding to run time using reflection, even though the interop in that case is with .NET itself.

The dynamic keyword in C# is a response to dealing with the hassles of these other approaches. Let’s start with a simple example—reflection. Normally, using it requires a lot of boilerplate infrastructure code, such as:

object o = GetObject();

Type t = o.GetType();

object result = t.InvokeMember("MyMethod",

  BindingFlags.InvokeMethod, null,

  o, new object[] { });

int i = Convert.ToInt32(result);

With the dynamic keyword, instead of calling a method MyMethod on some object using reflection in this manner, you can now tell the compiler to please treat o as dynamic and delay all analysis until run time. Code that does that looks like this:

dynamic o = GetObject();

int i = o.MyMethod();

It works, and it accomplishes the same thing with code that’s much less convoluted.

The value of this shortened, simplified C# syntax is perhaps more clear if you look at the ScriptObject class that supports operations on a JScript object. The class has an InvokeMember method that has more and different parameters, except in Silverlight, which actually has an Invoke method (notice the difference in the name) with fewer parameters. Neither of these are the same as what you’d need to invoke a method on an IronPython or IronRuby object or on any number of non-C# objects you might come into contact with.

In addition to objects that come from dynamic languages, you’ll find a variety of data models that are inherently dynamic and have different APIs supporting them, such as HTML DOMs, the System.Xml DOM and the XLinq model for XML. COM objects are often dynamic and can benefit from the delay to run time of some compiler analysis.

Essentially, C# 4.0 offers a simplified, consistent view of dynamic operations. To take advantage of it, all you need to do is specify that a given value is dynamic, ensuring that analysis of all operations on the value will be delayed until run time.


By Lingraj Gowda

No comments:

Post a Comment