Essential PowerShell: Understanding foreach

Of all the statements and commands available in PowerShell, there is one in particular that I have found causes more confusion than others for newcomers to the language: foreach.  In PowerShell, foreach is both a statement and an alias for the ForEach-Object cmdlet.  This means that you can use it as a statement like this:

foreach ($command in Get-Command -CommandType All) { $command } 

or as an alias like this:

Get-Command -CommandType All | foreach { $_ }

While you might think that both of these examples do exactly the same thing, they do not.  Both examples will iterate through a collection of objects and execute the internal script block once for each object.  In this case, both examples are simply outputting the objects in the internal script.  Their output will be the same, but how they go about getting that output is different.  It is important to understand these differences and the implications that they have on performance and memory when writing scripts using PowerShell.  Let’s talk about the memory implications first.

The foreach statement does not use pipelining.  Instead, the expression to the right of the in keyword is evaluated to completion before anything else is done.  For our example above, the Get-Command cmdlet is called and the results are completely loaded into memory before the interior script block is executed.  This means you have to have enough memory to store all of the objects when you run the script.  This usually isn’t a problem, but as my friend Dmitry Sotnikov points out on his blog, in some cases it can definitely be an issue.
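The evaluate-first behavior is easy to observe with a small sketch.  Get-LoggedNumber and $log are hypothetical names introduced here for illustration only; the function records each object as it is produced so that the order of events can be inspected afterwards.

```powershell
# Hypothetical helper: $log records the order in which things happen.
$log = New-Object 'System.Collections.Generic.List[string]'

function Get-LoggedNumber {
    foreach ($i in 1..3) {
        $log.Add("produce $i")   # note when the object is produced
        $i                       # emit the object
    }
}

# The foreach statement runs Get-LoggedNumber to completion
# before the loop body executes for the first time.
foreach ($number in Get-LoggedNumber) {
    $log.Add("consume $number")
}

$log.ToArray() -join ', '
# produce 1, produce 2, produce 3, consume 1, consume 2, consume 3
```

All three objects are produced before the first one is consumed, which is why the whole collection has to fit in memory.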

In contrast, the foreach alias, or ForEach-Object cmdlet, does use pipelining.  When the second example is used, Get-Command is called and it starts to return the commands one at a time.  As each object is returned out of the Get-Command cmdlet, it is sent into the pipeline and execution continues in the next section of the pipeline.  In this case, the foreach alias gets executed and the object is run through the process script block of ForEach-Object.  Once the process script block completes, the object is discarded and the next object is returned from Get-Command.  Since only one object is passing through the pipeline at a time, memory usage is minimal.
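A minimal sketch makes the streaming behavior visible.  Get-StreamedNumber and $streamLog are hypothetical names used only for this illustration; the function logs each object as it is produced, and the ForEach-Object script block logs each object as it is consumed.

```powershell
# Hypothetical helper: $streamLog records the order in which things happen.
$streamLog = New-Object 'System.Collections.Generic.List[string]'

function Get-StreamedNumber {
    foreach ($i in 1..3) {
        $streamLog.Add("produce $i")   # note when the object is produced
        $i                             # emit the object into the pipeline
    }
}

# Each object is handed to ForEach-Object as soon as it is emitted.
Get-StreamedNumber | ForEach-Object {
    $streamLog.Add("consume $_")
}

$streamLog.ToArray() -join ', '
# produce 1, consume 1, produce 2, consume 2, produce 3, consume 3
```

Production and consumption are interleaved, so only one object needs to be held at a time.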

This would seem to indicate that script authors should always prefer the foreach alias, or ForEach-Object cmdlet, over the foreach statement, but according to Bruce Payette, author of PowerShell in Action and development lead for PowerShell, foreach can perform faster than ForEach-Object in some cases.  He states, “in the bulk-read case, however, there are some optimizations that the foreach statement does that allow it to perform significantly faster than the ForEach-Object cmdlet”.  If that’s the case, how does a script author decide which is the right command for the job?  How will those optimizations influence a decision to choose the foreach statement over the foreach alias?  How much faster is significantly faster?  Let’s take a closer look at the performance and what considerations need to be made.

I ran a test on my local machine to compare the performance of the two examples I listed above.  For this test I used the Get-Date cmdlet to record the time immediately before and after each example ran, and I took the difference between the two to determine how much time had elapsed.  I ran each example through 10 iterations, discarded the highest and lowest elapsed times, and averaged the remaining 8.  The results confirmed what Bruce Payette said.  The average runtime for the foreach statement example was 13.9 seconds and the average runtime for the foreach alias example was 15.9 seconds, a difference of 2 seconds (about 13 percent).  This is consistent with the internal optimizations in the foreach statement improving performance when compared to the foreach alias.
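The measurement procedure described above can be sketched roughly as follows.  This is a reconstruction under assumptions, not the script that produced the numbers in this article: Measure-TrimmedAverage is a hypothetical function name, and piping the output to Out-Null is my own choice to keep the console quiet.

```powershell
# Hypothetical helper: times a script block several times with Get-Date,
# drops the fastest and slowest runs, and averages the rest (in seconds).
function Measure-TrimmedAverage {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock]$ScriptBlock,

        [ValidateRange(3, 1000)]
        [int]$Iterations = 10
    )

    $elapsed = foreach ($i in 1..$Iterations) {
        $start = Get-Date
        & $ScriptBlock | Out-Null              # assumption: output is discarded
        ((Get-Date) - $start).TotalSeconds
    }

    # Sort ascending, then skip the first (lowest) and last (highest) values.
    $sorted  = @($elapsed | Sort-Object)
    $trimmed = $sorted[1..($sorted.Count - 2)]

    ($trimmed | Measure-Object -Average).Average
}
```

To compare the two examples, you could call it once with { foreach ($command in Get-Command -CommandType All) { $command } } and once with { Get-Command -CommandType All | ForEach-Object { $_ } }.  Expect the absolute numbers to differ from machine to machine.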

So, it seems pretty simple.  Use the foreach statement when you either already have the array of objects that you want to process or when your collection of objects will be small enough that it can be loaded into memory all at once, right?  Well, that depends on what aspect of performance is most important to you.

One of the many beautiful things about PowerShell is the support for the pipeline and how objects are passed through (and out of) the pipeline one at a time.  If you’re working with an application that displays the objects output by a script’s pipeline, such as PowerGUI, you may care more about how quickly those objects start to appear than about the total time required to output all of them.  Whatever portion of the 13.9 seconds was used to load the objects into the collection may seem like an eternity to wait for the first object to be displayed when you could instead watch thousands of objects appear one after another over 15.9 seconds.  Perspective is everything when you’re talking about performance.
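The difference in time to first object can be illustrated with a deliberately slow producer.  This is only a sketch: Get-SlowItem is a hypothetical function, and the 200 millisecond delay is an arbitrary value chosen to make the effect visible.

```powershell
# Hypothetical slow producer: emits three objects, 200 ms apart.
function Get-SlowItem {
    foreach ($i in 1..3) {
        Start-Sleep -Milliseconds 200
        $i
    }
}

# foreach statement: the body first runs only after all items are collected.
$watch = [System.Diagnostics.Stopwatch]::StartNew()
$firstFromStatement = $null
foreach ($item in Get-SlowItem) {
    if ($null -eq $firstFromStatement) {
        $firstFromStatement = $watch.ElapsedMilliseconds
    }
}

# ForEach-Object: the script block runs as soon as the first item is emitted.
$watch = [System.Diagnostics.Stopwatch]::StartNew()
$firstFromPipeline = $null
Get-SlowItem | ForEach-Object {
    if ($null -eq $firstFromPipeline) {
        $firstFromPipeline = $watch.ElapsedMilliseconds
    }
}

"First object seen after $firstFromStatement ms (statement) vs $firstFromPipeline ms (pipeline)"
```

With the statement, the first object is seen only after all three delays have elapsed; with the pipeline, it is seen after roughly one delay, even though the total running time is about the same.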

Hopefully this will lift some of the confusion that you might otherwise face when using foreach in your scripts!

Kirk out.

P.S. This is the first of two articles discussing foreach in PowerShell.  After reading this article, I recommend you read the second part as well, entitled “Essential PowerShell: Understanding foreach (addendum)”.
