The General Problem
One of the things that I've always struggled with in SQL is the group by clause. When you use a group by statement in SQL, you have access to only those values you use in the grouping, or the results of aggregate functions. In order to get any of the other values from the original records, you have to learn tricks like using the XML capabilities, or joining the sub-query back to the original table. These are cumbersome to create, and difficult to read and figure out, even if you know the "trick".
The grouping is different in LINQ. When you use a group by statement in LINQ, you end up with a list of collections. The collections have the original objects with the original data in them, with the Key property on the collection being the value that was grouped by (hooray for smart collections!). This gives us all sorts of possibilities with the things we can because we have all the original data, grouped the way we want.
A Specific Instance
I regularly would like to generate a CSV list of the values in a "group by" clause. Let's imagine that you have a table of the FirstName and LastName of each of the Presidents of the United States. And maybe you'd like to see how many first names are repeated more than once.
Using SQL, you can very easily get the first names that appear more than once:
select FirstName
from President
group by FirstName
having count(*) > 1
With one small addition, you can even list how many times each first name is used:
select FirstName, count(*) as Total
from President
group by FirstName
having count(*) > 1
It gets really difficult, however, when you want to get a list (preferably comma-separated) of the last names to go along with those first names. It requires that you use the (fairly recent) XML capabilities in SQL Server, or else join back to the original table in some way
Since LINQ actually gives you a collection of the original objects when you group things, you can do much more complex, nifty things without having to "cheat".
So lets see what the equivalents are using LINQ. Here is the query to produce the equivalent list of first names that appear more than once:
from president in Presidents
group president by president.FirstName into bag
where bag.Count() > 1
select bag.Key
To get the count of how many times each first name is used, you have to introduce an anonymous type, but again it's standard LINQ-type stuff:
from president in Presidents
group president by president.FirstName into bag
where bag.Count() > 1
select new { FirstName = bag.Key, Total = bag.Count() }
Here's where it gets fun: since it's LINQ, we have full access to the bag variable that we've created, which is a collection of President records. Since I want to create a CSV string of some of the data inside that collection, I can use a LINQ statement inside the anonymous type to gather that data (LINQ-ception?):
from president in Presidents
group president by president.FirstName into bag
where bag.Count() > 1
select new
{
FirstName = bag.Key,
LastNames = string.Join(
",",
(from person in bag select person.LastName))
}
Now I finally get what I want: a list of first names, followed by a CSV list of last names they go with. The only "trick" is remember that you have full access to all the original records/objects in the grouped collection, unlike SQL.