LINQ's Distinct() on a particular property – Dev

The best answers to the question “LINQ's Distinct() on a particular property” in the category Dev.

QUESTION:

I am playing with LINQ to learn about it, but I can’t figure out how to use Distinct when I do not have a simple list (a simple list of integers is pretty easy to do, this is not the question). What I if want to use Distinct on a list of an Object on one or more properties of the object?

Example: If an object is Person, with Property Id. How can I get all Person and use Distinct on them with the property Id of the object?

Person1: Id=1, Name="Test1"
Person2: Id=1, Name="Test1"
Person3: Id=2, Name="Test2"

How can I get just Person1 and Person3? Is that possible?

If it’s not possible with LINQ, what would be the best way to have a list of Person depending on some of its properties in .NET 3.5?

ANSWER:

EDIT: This is now part of MoreLINQ.

What you need is a “distinct-by” effectively. I don’t believe it’s part of LINQ as it stands, although it’s fairly easy to write:

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
    (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}

So to find the distinct values using just the Id property, you could use:

var query = people.DistinctBy(p => p.Id);

And to use multiple properties, you can use anonymous types, which implement equality appropriately:

var query = people.DistinctBy(p => new { p.Id, p.Name });

Untested, but it should work (and it now at least compiles).

It assumes the default comparer for the keys though – if you want to pass in an equality comparer, just pass it on to the HashSet constructor.

ANSWER:

What if I want to obtain a distinct list based on one or more properties?

Simple! You want to group them and pick a winner out of the group.

List<Person> distinctPeople = allPeople
  .GroupBy(p => p.PersonId)
  .Select(g => g.First())
  .ToList();

If you want to define groups on multiple properties, here’s how:

List<Person> distinctPeople = allPeople
  .GroupBy(p => new {p.PersonId, p.FavoriteColor} )
  .Select(g => g.First())
  .ToList();

Note: Certain query providers are unable to resolve that each group must have at least one element, and that First is the appropriate method to call in that situation. If you find yourself working with such a query provider, FirstOrDefault may help get your query through the query provider.

Note2: Consider this answer for an EF Core (prior to EF Core 6) compatible approach. https://stackoverflow.com/a/66529949/8155

ANSWER:

You could also use query syntax if you want it to look all LINQ-like:

var uniquePeople = from p in people
                   group p by new {p.ID} //or group by new {p.ID, p.Name, p.Whatever}
                   into mygroup
                   select mygroup.FirstOrDefault();

ANSWER:

Use:

List<Person> pList = new List<Person>();
/* Fill list */

var result = pList.Where(p => p.Name != null).GroupBy(p => p.Id).Select(grp => grp.FirstOrDefault());

The where helps you filter the entries (could be more complex) and the groupby and select perform the distinct function.