jordan.terrell
Just trying to make sense of things...

LINQ: Distinct() does not work as expected

Thursday, 2 October 2008 10:20 by jordan.terrell

A few days ago I was using LINQ to filter a distinct (only unique items) list of custom objects.  I needed the distinction to be based on a property of type System.String.  Prior to this, I have never used or looked at the Distinct query operator - thus I expected it to have overloads that let me pass in a selector (Func<T, TRet>) to select the property to make the distinction on.

Unfortunately, it has no such overload.

However, what it did have was an overload that took an IEqualityComparer<T>.  As much as that felt like a cop-out (not a very LINQ-ish approach), I figured it would have to do the job.  There was one other overload that basically defaulted to EqualityComparer<T>.Default - this goes through a series of type checks to determine the best comparer for the type T being compared.  If the type T implements the IEquatable<T> interface, it will use that for the comparison.  Great!  I'll just implement that interface on my custom type, and be on my way!

Not so fast! It didn't work - for some reason it never invoked the IEquatable<T>.Equals method implemented on my custom type.  My next step was to implement as custom IEqualityComparer<T> to make the comparison based on the the property value.

That didn't work either! At this point I was quite confused, and then out of sheer desperation I put a breakpoint on the other method defined on IEqualityComparer<T>: GetHashCode.

"You've got to be kidding me!" - I believe those were the words going through my mind.  The Distinct query operator uses hash codes provided by an IEqualityComparer<T> to determine the distinct set of items.  Upon further inspection, the Distinct operator doesn't really use GetHashCode, it defers to another Set class to filter the items - that class uses the GetHashCode method.

Not what I expected.  Needless to say I was a bit disappointed.  Not only was there not an standard overload for me to pass in a selection lambda, but now I couldn't even use the Equals method on IEquatable<T> OR IEqualityComparer<T>!

I wasn't about to implement a custom GetHashCode method - I believe the annotations in the book Framework Design Guidelines basically says that the current implementation of GetHashCode is flawed.  Plus, for it to work in my case I would have to return the hash code for the property I wanted compared, not the hash code for the custom type.  All of this left a very sour taste in my mouth.

I did find an alternative though - using a combination of the GroupBy, First, and Select query operators, I was able to get what I wanted:

   1: var distinctItems = items
   2:     .GroupBy(x => x.PropertyToCompare)
   3:     .Select(x => x.First());

 

It looses quite a bit in meaning, but it accomplishes the end goal.  At some point I think I'm going to create an Extension Method that adds another Distinct() overload that takes a selection lambda - something that should have been there to being with!

Tags:   ,
Categories:   .NET | Programming
Actions:   E-mail | del.icio.us | Permalink | Comments (7) | Comment RSSRSS comment feed