The yield keyword has always been one of the most misunderstood keywords in C#, so I will try to explain it with a simple example. Understanding it can help you get better performance out of your app.
➽Create the Console App
☞Let's first assume that we have a simple .NET Core 3.1 console application that handles a list of students.
The Student class will have the following structure:
    public class Student
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
Seems simple, right? Don't worry, it will stay that way 😉
☞Next step: create a list of students from the Main method of the console app:
    static void Main(string[] args)
    {
        CreateStudents();
    }
☞So we have a simple function called CreateStudents. It calls GetStudents, which takes one parameter (the number of students to create), then loops over the result and prints each student to the screen as long as the student ID is less than 1000:
    // Requires: using System; using System.Collections.Generic;
    public static void CreateStudents()
    {
        var students = GetStudents(1000000);
        foreach (var student in students)
        {
            if (student.Id < 1000)
                Console.WriteLine($"Id: {student.Id}, Name: {student.Name}");
            else
                break; // stop at the first student whose Id is 1000 or more
        }
    }

    static IEnumerable<Student> GetStudents(int count)
    {
        // Builds the whole list up front, before the caller sees any student.
        var students = new List<Student>();
        for (int i = 0; i < count; i++)
        {
            students.Add(new Student() { Id = i, Name = $"Student Number {i}" });
        }
        return students;
    }
☞The GetStudents function just creates a new student list, fills it with new students, and returns it.
☞Let's run the application and see how it behaves. Remember that we only want to print students with an ID less than 1000 ...
☞By setting a breakpoint 🔴 right after the call to GetStudents, notice that a million students are created before the foreach loop is entered to print only the first 1000. So do we really need to create a million students to print only 1000 😖?
Of course not ..
➥What we want is to avoid creating all million records, adding them to a list, and returning that list fully hydrated. Wouldn't it be great if we could return only what we actually need?
➽Apply Yield Return
So let's apply some modifications to the GetStudents function ...
    static IEnumerable<Student> GetStudents(int count)
    {
        for (int i = 0; i < count; i++)
        {
            // Each student is created only when the caller asks for the next item.
            yield return new Student() { Id = i, Name = $"Student Number {i}" };
        }
    }
☞Let's run the application now with the same breakpoint set and see what happens ..
☞We now hit the foreach loop with zero students created!
☞Let's set another breakpoint on the yield return line inside the GetStudents function to see what is really going on, then use Step Over (F10) in Visual Studio to continue debugging ...
☞The debugger steps over (students) in the foreach loop, then over the (in) keyword, and then jumps to the yield return line in the GetStudents function without first entering the body of the foreach in the CreateStudents function ...
☞After that we are inside the foreach loop, where we enter the if statement and the first student's data is printed to the console ...
☞Continue stepping over with F10 ... you will hit the **in** keyword of the foreach, then enter the yield return line again ...
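The back-and-forth seen in the debugger can also be reproduced without breakpoints. The following is only a minimal sketch: YieldTrace, Produce, Run and Log are hypothetical names that are not part of the student example, and the producer yields plain integers to keep things short. It records each step in a list so the order can be inspected.

```csharp
using System;
using System.Collections.Generic;

public static class YieldTrace
{
    // Records each step so the order can be inspected afterwards.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical producer: yields 'count' integers, logging before each one.
    public static IEnumerable<int> Produce(int count)
    {
        Log.Add("producer started");
        for (int i = 0; i < count; i++)
        {
            Log.Add($"producing {i}");
            yield return i; // execution pauses here until the next item is requested
        }
        Log.Add("producer finished");
    }

    public static void Run()
    {
        Log.Clear();
        var numbers = Produce(2); // none of the producer's code has run yet
        Log.Add("after call");
        foreach (var n in numbers)
        {
            Log.Add($"consuming {n}");
        }
    }

    public static void Main()
    {
        Run();
        foreach (var line in Log)
        {
            Console.WriteLine(line);
        }
    }
}
```

The log starts with "after call", and only then "producer started", followed by alternating "producing" and "consuming" lines. This matches the jumps between the foreach and the yield return line described above.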
➽So what are we really doing here ?
☞We are lazily iterating over the collection of students: we request only one student at a time.
☞Both versions give the same result, which is printing the first 1000 students on the screen.
☞The difference between them is that yield return brings a large improvement in performance and memory usage, which we will measure later..
☞We are basically creating an iterator that hands out the items of our collection one by one.
Our foreach loop is just syntactic sugar for a statement like this:
    var studentsEnumerator = students.GetEnumerator();
    while (studentsEnumerator.MoveNext())
    {
        var student = studentsEnumerator.Current;
    }
This is roughly what it compiles down to.
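A slightly fuller sketch of that expansion: for a generic IEnumerable<T>, the compiler-generated code also disposes the enumerator when the loop ends, including when it ends early through break. The names ForeachExpansion, GetIds and SumBelow are made up for this illustration, and integers stand in for students.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // Stand-in for the article's GetStudents; yields ids only, to keep it short.
    public static IEnumerable<int> GetIds(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    // Hand-written version of:
    //   foreach (var id in ids) { if (id < limit) sum += id; else break; }
    public static int SumBelow(IEnumerable<int> ids, int limit)
    {
        int sum = 0;
        using (IEnumerator<int> enumerator = ids.GetEnumerator())
        {
            while (enumerator.MoveNext())     // runs the iterator up to its next yield return
            {
                int id = enumerator.Current;  // the value that was just yielded
                if (id >= limit)
                {
                    break;
                }
                sum += id;
            }
        } // the enumerator is disposed here, even though we stopped early
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumBelow(GetIds(1000000), 4)); // 0 + 1 + 2 + 3 = 6
    }
}
```

Each call to MoveNext resumes the iterator method where it last paused, which is why only the requested items are ever produced.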
➥So, because we never iterate over the entire collection, we never have to create a million students in a list. With yield, students are created one at a time as the loop asks for them, so only the first 1000 or so are ever created (strictly 1001, because the student with ID 1000 is created just before the loop breaks).
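One way to check this claim without a debugger is to count how many items each version actually creates. This is a rough sketch under stated assumptions: CreationCounter, GetEager, GetLazy and CountCreated are hypothetical names, and integers are used in place of Student objects.

```csharp
using System;
using System.Collections.Generic;

public static class CreationCounter
{
    // How many items the producer has built since the last reset.
    public static int Created;

    // Eager version: builds the full list before returning it.
    public static IEnumerable<int> GetEager(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            Created++;
            list.Add(i);
        }
        return list;
    }

    // Lazy version: builds one item per request.
    public static IEnumerable<int> GetLazy(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Created++;
            yield return i;
        }
    }

    // Same loop shape as CreateStudents: stop at the first id that is not below the limit.
    public static int CountCreated(Func<int, IEnumerable<int>> producer, int count, int limit)
    {
        Created = 0;
        foreach (var id in producer(count))
        {
            if (id >= limit)
            {
                break;
            }
        }
        return Created;
    }

    public static void Main()
    {
        Console.WriteLine(CountCreated(GetEager, 1000000, 1000)); // 1000000
        Console.WriteLine(CountCreated(GetLazy, 1000000, 1000));  // 1001
    }
}
```

The lazy count is 1001 rather than 1000 because the item with id 1000 has to be produced before the loop can see it and break.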
➽Test the difference using BenchmarkDotNet :
● Now, let's measure the application's performance to see if this is really making a difference..
● For this test we will use the BenchmarkDotNet NuGet package, which is available on NuGet under the name BenchmarkDotNet.
● We will just make simple edits to our code so that benchmark statistics can be collected ...
● We will create a class called YieldBenchmarks that contains both the old version of our functions and the new version using yield return. The class is decorated with the [MemoryDiagnoser] attribute, and the old and new versions of the CreateStudents function are each decorated with the [Benchmark] attribute. The code will be as follows:
    using System;
    using System.Collections.Generic;
    using BenchmarkDotNet.Attributes;

    [MemoryDiagnoser]
    public class YieldBenchmarks
    {
        // Old version: the list is fully built before the loop starts.
        [Benchmark]
        public void CreateStudents()
        {
            var students = GetStudents(1000000);
            foreach (var student in students)
            {
                if (student.Id < 1000)
                    Console.WriteLine($"Id: {student.Id}, Name: {student.Name}");
                else
                    break;
            }
        }

        static IEnumerable<Student> GetStudents(int count)
        {
            var students = new List<Student>();
            for (int i = 0; i < count; i++)
            {
                students.Add(new Student() { Id = i, Name = $"Student Number {i}" });
            }
            return students;
        }

        // New version: students are produced one at a time with yield return.
        [Benchmark]
        public void CreateStudentsYield()
        {
            var students = GetStudentsYield(1000000);
            foreach (var student in students)
            {
                if (student.Id < 1000)
                    Console.WriteLine($"Id: {student.Id}, Name: {student.Name}");
                else
                    break;
            }
        }

        static IEnumerable<Student> GetStudentsYield(int count)
        {
            for (int i = 0; i < count; i++)
            {
                yield return new Student() { Id = i, Name = $"Student Number {i}" };
            }
        }
    }
And the Main function will look like this:
    // Requires: using BenchmarkDotNet.Running;
    static void Main(string[] args)
    {
        var result = BenchmarkRunner.Run<YieldBenchmarks>();
    }
● Now let's run the application in Release mode ...
● Notice that BenchmarkRunner runs each function multiple times to gather its statistics before it reports the results.
● Now notice the large difference in elapsed time, and the huge difference in memory usage: 133.5 MB vs 220 KB!
This was just a simple explanation of the yield keyword. Next, we will look at async yield .. Wait for it :)