Yield is one of the coolest, yet underused, new .NET 2.0 features that you may not be familiar with. Yes, I'm talking about a 2.0 feature - even though 3.5 just came out, it is my experience that a large number of developers have still not embraced all that 2.0 gave us. Hopefully this article will help make this powerful, yet initially confusing, feature more approachable.
When you "foreach" over an object (array, collection, etc), it must implement IEnumerable or IEnumerable<T> - the generic version which returns strongly typed objects (for the rest of this article, I'll just refer to IEnumerable).
All arrays, and most collections (ArrayList, List<T>) implement IEnumerable, which is why you can "foreach" over them.
If ALL you need to do to a collection is cycle through its members - you only need to reference it as an IEnumerable. That explains why the following code is perfectly valid:
IEnumerable<int> numbers = new int[] { 1, 2, 3 }; foreach (int i in numbers) { Console.WriteLine(i); }
It is always better to depend on abstract types (interfaces, abstract base classes) rather than concrete types. This allows you to change the implementation of a method without impacting any of its callers.
Consider this example where CallingMethod gets a set of numbers from MethodBeingCalled. It really doesn't care if it gets an array or a List or whatever. It just does a foreach over it, which means it only needs the functionality exposed by the IEnumerable interface.
public void CallingMethod() { List<int> numbers = MethodBeingCalled(); foreach (int i in numbers) { Console.WriteLine(i); } } private List<int> MethodBeingCalled() { List<int> numbers = new List<int>(); numbers.AddRange(new int[] { 1, 2, 3 }); return numbers; }
The problem with the current design is that the CallingMethod declares the list of numbers List<int>. That is a specific implementation detail. If CallingMethod needed to add more numbers to the list after it was returned from MethodBeingCalled, or if individual numbers were being accessed by their index, then it would make sense to declare it as a List<T> (or better, IList<T>). But it really only needs to cycle through the numbers in order. If MethodBeingCalled wanted to change its implementation to return an int[] instead, it would have to change its signature, and CallingMethod would also have to change. However, if CallingMethod had just declared its local variable as IEnumerable<int> (because that is the only functionality it truly depends on), then it would not have to change if MethodBeingCalled changes - as long as it still returns something that implements IEnumerable<int>.
So what does foreach do? It asks the IEnumerable if it has any items. If so, it gets the next item and uses it within the loop. It then asks the IEnumerable again if it has more items, gets the next one, and so on, until there are no more items. The important thing to note here is that foreach never works with the entire collection all it once - it only deals with one item at a time.
How can we use all of this information? If you have a method that returns a collection of data, and you can be sure that callers are only going to be interested in looping over each item in the collection, one at a time (which is very common), you can declare your method's return value as IEnumerable<T>.
When you declare your method's return value as IEnumerable<T>, something magic happens: the C# compiler will now let you use the yield keyword within the body! So why does that matter? Because it allows you to take advantage of the way that foreach really works. Remember, foreach will just ask for one value at a time. When it does, it will call into your method - your method will run until it encounters a yield return statement. When it reaches a yield return statement, that value is immediately returned to the caller. In the next iteration of the caller's loop, it will ask again if there is another value, your method will start executing again, but this time it will start on the line after the last yield statement that was run. Yes, you read that right: execution alternates between the calling method and the method being called, all the way through the loop. This is completely different from the normal way that methods interact, where one method gives up control to another method, which doesn't return until it is completely finished executing.
This has at least 2 very important implications for the method that returns the IEnumerable<T>:1) It does not need to build up a big list of all of the items it plans to return, holding them all in memory at the same time. If the list of items is very large, holding them all in memory at once can be a significant burden.
2) It does not necessarily need to do the work to produce all of the items in the list. If the caller breaks out of the foreach loop before the entire collection has been traversed - all of the work needed to produce the remaining items in the colleciton is avoided. Without using the yield keyword, you would have to build up the entire collection to return, even if the caller was only going to look at the first few items.
To illustrate the advantage of this approach, I included sample code at the bottom. You can create a new Console Application and paste this over the default Program.cs.
A method, GetCombinations returns a collection of 2 numbers combined. The caller then loops through the collection, looking for a specific combination: "13".
First, run it using GetCombinations. This is implemented the way most people would write a method that returns a collection, in .NET 1.1. The output willl look something like this (the indented output is from within GetCombinations):
BEFORE call to GetCombinations Beginning execution of GetCombinations Producing 33 Producing 23 Producing 13 Producing 32 Producing 22 Producing 12 Producing 31 Producing 21 Producing 11 Ending execution of GetCombinationsAFTER call to GetCombinationsChecking 33Checking 23Checking 13Found!Elapsed: 00:00:08.9931583
Now, comment out the line that calls GetCombinations, and uncomment the line that calls BetterGetCombinations. It is important to note that BetterGetCombinations uses the exact same algorithm as GetCombinations - given the same input, they will both return the exact same list of strings, in the same order (I've emphasized the lines that changed, in the sample code below). But since the caller is only looking for a specific combination, not all combinations need to be produced. Consider the output:
BEFORE call to GetCombinationsAFTER call to GetCombinations Beginning execution of BetterGetCombinations <-- notice that BetterGetCombinations doesn't really start until the foreach loop - NOT when the method was first called!
Producing 33Checking 33 Producing 23Checking 23 Producing 13Checking 13Found!Elapsed: 00:00:02.9976438
With the improved implementation that took advantage of the yield keyword, the program was able to finish its job in less than half the time! It also used much less memory, as it never had to store all 9 strings in a collection. Now imagine the potential impact if GetCombinations returned a collection with thousands of entries!
Paste the code below over the default Program.cs in a new Console Application:
using System; using System.Collections.Generic; using System.Diagnostics; using System.Threading; namespace YieldDemo { class Program { static void Main() { Stopwatch timer = Stopwatch.StartNew(); Console.WriteLine("BEFORE call to GetCombinations"); //1) First try running using the GetCombinations that returns a List<string> List<string> combinations = GetCombinations(3, 3); //2) After observing the output, comment out the above line, uncomment the next line, and re-run // IEnumerable<string> combinations = BetterGetCombinations(3, 3); Console.WriteLine("AFTER call to GetCombinations"); foreach (string combination in combinations) { Console.WriteLine("Checking " + combination); if (combination == "13") { Console.WriteLine("Found!"); break; } } timer.Stop(); Console.WriteLine("Elapsed: " + timer.Elapsed); } static List<string> GetCombinations(int left, int right) { Console.WriteLine("\tBeginning execution of GetCombinations"); List<string> output = new List<string>(); int currentLeft; int currentRight = right; while (currentRight > 0) { currentLeft = left; while (currentLeft > 0) { string newCombinationToProduce = currentLeft.ToString() + currentRight; Console.WriteLine("\tProducing " + newCombinationToProduce); Thread.Sleep(1000); // simulate expensive process to produce a combination output.Add(newCombinationToProduce); --currentLeft; } --currentRight; } Console.WriteLine("\tEnding execution of GetCombinations"); return output; } static IEnumerable<string> BetterGetCombinations(int left, int right) { Console.WriteLine("\tBeginning execution of BetterGetCombinations"); // NOTE The local List is not necessary in this implementation int currentLeft; int currentRight = right; while (currentRight > 0) { currentLeft = left; while (currentLeft > 0) { string newCombinationToProduce = currentLeft.ToString() + currentRight; Console.WriteLine("\tProducing " + newCombinationToProduce); Thread.Sleep(1000); // simulate expensive process to produce a combination yield return newCombinationToProduce; // insteading of adding to a list, just return the value --currentLeft; } --currentRight; } Console.WriteLine("\tEnding execution of BetterGetCombinations"); } } }
Powered by: newtelligence dasBlog 2.1.8209.14743
Special thanks to LosTechies.com
Site theme based on the essence design by Jelle Druyts
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.