Log-MD Output Data: Just a Matter of Time

Log-MD is an excellent tool both for auditing your Windows security logging settings as well as to export those logs in an easily-consumed .csv format. For the purpose of this blog post let’s assume your system already passes all config settings and you’d like to take your MS Excel analysis of the report.csv output to the next level. Let’s say you want to summarize the event IDs in your dataset up to the minute and hour levels, to get more of a feel of the timeline of events. To do this you’ll need the ‘Time’ field recognized by Excel as a date/time format. In version 0.2.0 of Log-MD the default output is not interpreted by Excel as a date/time format but rather general text. This means we lose out on Excel’s built-in time-based slicing and filtering features.

Let’s walk through a specific example. The third column of report.csv is labeled ‘Time’ and its data contains the date/time stamp. Humans can easily recognize the format of this field just by looking at it: Dates come first in YYYY-MM-DD format, then the letter ‘T’, then the time in HH:MM:SS:mmm (mmm being milliseconds) format. When you select all the values in this field and try to format them in one of the date styles (Short Date, Long Date, Time, etc.), it doesn’t actually take effect in the display of the cells. From Excel’s perspective this is just general alphanumeric text. It’s not quite smart enough to see the date format before the ‘T’ and the time format after the ‘T’. It acts the same as if you changed any other text field (e.g. Computer Name) to a Date – it just leaves the contents as they are.

It turns out all we need to do is to massage the Time column data with just two tweaks, and we can get it into a format that Excel does natively recognize as a date/time format. We need to convert the ‘T’ to a space, and we need to use a different separator for milliseconds in the time side. But we need to be careful when doing a find-replace, because there may be data in other fields that might get caught in our regex expression if it’s not carefully crafted. Here is a regex expression (PCRE/Perl syntax) that seems to work well.

Find:  (?<date>[\d]{4}-[\d]{2}-[\d]{2})T(?<time>[\d]{2}:[\d]{2}:[\d]{2}):(?<millisec>[\d]{1,3})

Replace with:   $+{date} $+{time}\.$+{millisec}

There are a number of ways to perform this regex operation on your report.csv file. If you have Notepad++ simply use its Find and Replace feature, putting the first regex expression into the Find field and the second regex above into Replace. This can also be used with the sed command if for some reason you’re correlating these report.csv files through a Linux system. Full explanation of using regex is beyond the scope of this blog post. For more on regular expressions (“regex”) see this excellent site.

After you’ve done the find-replace, then next time you open the report.csv in Excel you’ll find the Time column has already been recognized as a date field. The text that is displayed in each cell of column C appears to have omitted the date and just displays the time. But when you select a single cell and look in the formula bar you can see the entire date and time values are still there (see cell C2). So Excel has the full value stored in its memory, it’s just displaying a portion of it. This is what we want; now Excel understands that this field is a date/time type.

One point that may be confusing here is that Excel is displaying the minute, second and millisecond values in column C (MM:SS:sss where sss is the milliseconds). Most of the time when you see numbers separated by a colon in this style, it is referring to HH:MM:SS format. Don’t let this confuse you; for example, line 2 of our sample data set did NOT occur at 3:37am in the morning, it was actually 9:03am at 37.4 seconds into that particular minute. Again, this can be confirmed by selecting that cell and viewing the full date/time value in the Formula Bar.

Now that Excel understands this to be a date/time field, let’s create a pivot table to demonstrate (Insert menu -> Pivot Table). For purposes of this blog post we’ll set as our goal to report on which event IDs occurred per minute per hour in this sample data set.

Let’s drag the Time and the Event ID fields down into the Rows field of our pivot table, and let’s select a random text field such as Hash_of_image or PID that will consistently produce a value (any value, even “N/A”!) to go into the Sum Values pivot table field. The reason we take a random field for Sum Values is so that Excel can count the quantity of non-empty values. Basically what we’re doing is counting the number of times each event ID hit during each time slice (per second in this screenshot). As long as we choose a field that has some text in every record, this will give us a count of records. Be careful not to select a numeric value field here though, such as Event ID. Excel will default to adding up the numerical values instead of counting them, and (for example) the sum of event IDs is not particularly useful.

Which events occurred per second is not particularly useful given our stated goal. We could try adding up all the sum totals of each event ID manually, but the problem is we have no way of distinguishing between 9:18:01 and 10:18:01 when they are grouped like this. This default grouping is only giving us how many of each event matched that particular combination of minutes, seconds and milliseconds without respect to any other time slice such as hours.

We’ll start by grouping just by the hour, and let the minutes, seconds and milliseconds get lost inside the sum total. Excel can do the heavy lifting for us now that it knows the Time field is a date/time type. All we need to do is to group the Time field by hour. Notice in the above screenshot we have cell A4 highlighted. This is one of the Time field values that we’re looking to group. (We could just as well have used cell A6, A9, A12, etc.) In the PivotTable Tools menu we click Group Selection, and change the grouping from Months to Hours.

Now we have a succinct summary of event IDs that occurred in each hourly window of our sample data set. All we have left to do is add an additional level of grouping on the minutes. Just open up the same grouping properties window, and select Minutes in addition to Hours (it is a multi-select field).

Voila, we get event IDs per minute in pivot table’s intuitive collapsible tree interface.