Wednesday, December 2, 2015

Crafting Queries and Extracting Data from Event Logs using Microsoft Log Parser

   During a recent engagement, while hunting for threats in a client's environment, I was tasked with analyzing over a terabyte's worth of security (Security.evtx) event logs.  A terabyte of logs amounts to a lot of logs.  We are talking close to a thousand log files, each containing approximately 400,000 events, from dozens of Windows servers, including multiple domain controllers.  Did I say a lot of logs?

   Unfortunately, this wasn't the only task of the engagement, so I needed to go through these logs quickly because, as in most engagements, time is against you.

   When you only have a few logs to look at, one of my tools of choice on the Windows side is Event Log Explorer.  Event Log Explorer is great: a robust, popular GUI tool with excellent filtering capabilities.  On the Linux side, I have used Log2timeline to convert dozens of evtx files to CSV and then filtered the CSV for the data I was looking for.  But this was another animal, a different beast.  This beast needed a tool that could parse a very large amount of logs and filter for specific events within the data.  The answer to the problem came in the form of a tiny tool simply called Log Parser.

   Log Parser is a free tool designed by Microsoft.  You can download it from the Microsoft Download Center.  The documentation on the download page describes the tool in this manner: “Log Parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files.”  That one-liner perfectly sums up why the tool is so powerful, yet not as popular as other tools.  Log Parser provides query access to data.  What does that mean?  It means that if you want to parse data with this tool, you have to be somewhat comfortable with the Structured Query Language (SQL).  The tool will only cough up data if it is fed SQL-like queries.  The use of SQL-like queries for filtering data is what gives the tool its power and control, while at the same time being a stopping point, and a deal breaker, for anyone not comfortable with SQL.

   The purpose of this article is to explain the basic queries required to get you started, and in the process to show the power of the tool and how it helped me make small rocks out of big rocks.

Installing the Tool:

  The download is an msi that installs through a standard graphical installer, much like many other tools.  Once installed, the tool runs from the command line only.  For the purposes of this article, I will be using a security log extracted from a Windows Server 2008 R2 domain controller that I own and use for testing such as this.  If you want to follow along, you can extract the Security.evtx log from a similar server, or even from your Windows 7 machine.  The log is located under \Windows\System32\winevt\Logs.
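
  If you would rather have Windows export the log for you instead of copying the file by hand, the built-in wevtutil command can do it.  A minimal sketch, run from an elevated prompt:

wevtutil epl Security C:\Users\carlos\Desktop\Test\Security.evtx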

The Test:

   Log Parser is a command-line-only utility.  To get started, open a command prompt and navigate to the Log Parser installation directory, located under C:\Program Files (x86)\Log Parser 2.2.
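
   Like so:

cd "C:\Program Files (x86)\Log Parser 2.2"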



   The security log that I will be using for the write-up is called LosDC.evtx.  The log contains exactly 5,731 entries.  It is not a large log, but it contains the data that we need to illustrate the usage of the tool.  I extracted the log and placed it on my Windows 7 examination machine in a directory on the Desktop called “Test.”



   Now, the most basic SQL query that one can run looks something like this: “select * from LosDC.evtx”.  It is called a select statement.  The ‘select’, as you suspected, selects data that matches your criteria from the columns in your log.  In this instance we are not doing any matching yet; we are simply telling the tool to select everything, by using an asterisk “*”, from the LosDC.evtx log.  The tool also needs to know what kind of file it is looking at.  You tell the tool that it is reading data from an event log with the -i:evt parameter, like so:

LogParser.exe "select * from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt



   This query will send every record to standard output, ten rows at a time, pausing for a keypress between batches.  A lot of data is going to scroll across the screen, and it is very difficult to make any use of it at this point.  The one positive that comes from this command is that you can begin to see the names of the columns in the event log, like “TimeGenerated”, “EventID”, and so on.
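
   If you just want a quick peek instead, Log Parser supports the TOP keyword to limit how many rows come back.  A minimal sketch against the same log (adding -rtp:-1 to any query suppresses the keypress pause):

LogParser.exe "select top 10 TimeGenerated, EventID from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt -rtp:-1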

    An easier way to see the columns in the event log is by using the datagrid output feature, which sends the data to a GUI, like so:

LogParser.exe "select * from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt -o:datagrid



   Thanks to the GUI it is now easier to see the TimeGenerated and EventID columns.  Also, I want to point out the “Strings” column, which contains data that is very valuable to us.  The majority of the important data that we are after is going to be contained in this column.  So let us take a closer look at it. 

   If we build upon our last query and replace the asterisk "*" with the name of a specific column, the tool will send only the data matching our criteria to standard output, like so:

LogParser.exe "select strings from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt



   Notice that the tool is now displaying only the information found in the Strings column.  The data is displayed in a pipe-delimited format.  Field number 5 contains the username of the account, field number 8 contains the logon type, and field number 18 contains the source IP of the system that was used to authenticate against the domain controller.

You have probably seen this data displayed in a prettier manner by Event Log Explorer. 



   Yet it is, in fact, the same data, and Log Parser has the ability to extract it from hundreds of log files quickly and efficiently.  But to accomplish this we have to continue adding to our query.  In my recent case I was looking for the username, the logon type, and the source IP of all successful logins.  As mentioned earlier, this data is stored in field 5, field 8, and field 18 of the Strings column.  To extract it, we need to craft a query that pulls these specific fields out of the Strings column.  To accomplish that, we have to introduce a Log Parser function called extract_token.  The extract_token function gives Log Parser the ability to extract data from delimited columns like the Strings column.

  To extract the data from the fifth delimited field in the strings column we need to add this to our query:

extract_token(strings,5,'|') AS User

   Let me break this down.  extract_token is the function.  Inside the parentheses we tell the function to go into the Strings column and pull out the fifth field delimited by a pipe “|”.  “AS User” is used so that once the data is pulled out of the Strings column, it is displayed in a new column with the new name of “User”.  It is like telling the function, “Hey, display this as 'User'.”
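
   If you want to test the function on its own before building the full query, something like this should work against our sample log.  The where clause, which we will dissect in a moment, limits the output to logon events; other event IDs lay out the Strings column differently, so field 5 is only a username for logon events:

LogParser.exe "select extract_token(strings,5,'|') AS User from C:\Users\carlos\Desktop\Test\LosDC.evtx where eventid in (4624)" -i:evt -o:datagrid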

   To pull the data from the eighth field in the Strings column, we use this function:

extract_token(strings,8,'|') AS LogonType

   And finally to pull the data from the eighteenth field in the Strings column, we use this function:

extract_token(strings,18,'|') AS SourceIP

   We put it all together with the following query:

LogParser.exe "select TimeGenerated, EventID, extract_token(strings,5,'|') AS User, extract_token(strings,8,'|') AS LogonType, extract_token(strings,18,'|') AS SourceIP into C:\Users\carlos\Desktop\Test
\LosDC_4624_logons.csv from C:\Users\carlos\Desktop\Test\LosDC.evtx where eventid in (4624)" -i:evt -o:csv



   The select statement is now selecting the TimeGenerated and EventID columns, followed by the three extract_token functions that pull the data from the Strings column.  Into is an optional clause that redirects the output to a file, here named LosDC_4624_logons.csv in the Test directory.  From specifies the file to be queried, which is the LosDC.evtx log.  Where is also an optional clause; it limits the output to rows matching the criteria described, which in this query is rows whose eventid column contains 4624.  The -o:csv is another output format, like the datagrid, except this one sends the data to a CSV file rather than a GUI.
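
   The same clause structure supports aggregates, which is handy for triage.  Here is a sketch that tallies 4624 logons per account in our sample log (Log Parser lets you reuse the Logons alias in the order by clause):

LogParser.exe "select extract_token(strings,5,'|') AS User, count(*) AS Logons from C:\Users\carlos\Desktop\Test\LosDC.evtx where eventid in (4624) group by extract_token(strings,5,'|') order by Logons desc" -i:evt -o:datagrid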

   Here is an example of what you can gather from the resulting CSV file.  This is what you would see if you were to sort the data in the CSV file by user.





   Notice the times and the source IP used by user “larry” when he used the RDP protocol (logon type 10) to remotely log in to his system.
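
   Since the logon type comes back as a string, you can also push that filter into the query itself.  A sketch that keeps only the RDP logons; LosDC_RDP_logons.csv is just a name I picked for this example:

rem LosDC_RDP_logons.csv is an example output name, not from the case
LogParser.exe "select TimeGenerated, extract_token(strings,5,'|') AS User, extract_token(strings,18,'|') AS SourceIP into C:\Users\carlos\Desktop\Test\LosDC_RDP_logons.csv from C:\Users\carlos\Desktop\Test\LosDC.evtx where eventid in (4624) and extract_token(strings,8,'|') = '10'" -i:evt -o:csv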

Cool, Right?

   I want to point out that this log only contained 5,731 entries and that the data redirected to the CSV file consisted of 1,418 lines.  That data was parsed and redirected in less than 0.2 seconds.



   That is another example of the power of the tool.  Keep in mind that when you are parsing gigabytes' worth of logs, the resulting CSV files are going to be enormous.  Below is an Explorer screenshot displaying the amount of security event logs from one of the servers in my case (the server name has been removed to protect the innocent).



   The sample data from that server was 40GB.  It was made up of 138 files, each with approximately 416,000 records.
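
   One detail that makes bulk runs like that possible: the from clause accepts wildcards, so a single query can sweep every evtx file in a directory.  A sketch, where the C:\Cases paths are made up for illustration:

rem the C:\Cases paths below are example paths, not from the case
LogParser.exe "select TimeGenerated, EventID, extract_token(strings,5,'|') AS User, extract_token(strings,8,'|') AS LogonType, extract_token(strings,18,'|') AS SourceIP into C:\Cases\Server_4624_logons.csv from C:\Cases\Logs\*.evtx where eventid in (4624)" -i:evt -o:csv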

   The tool parsed all of that data in only 23 minutes.



   It searched 60 million records and created a CSV file with over 700,000 lines.  Although you can certainly open a file that size in Excel or LibreOffice Calc, it is probably not a good idea.  Don't forget that you can search the CSV file directly from the command prompt with find.  Here is an example of searching the CSV file for user "larry" to quickly see which machines user "larry" used to authenticate on the domain.
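
   Something along these lines, using the CSV we created earlier (the /i switch makes the match case-insensitive):

find /i "larry" C:\Users\carlos\Desktop\Test\LosDC_4624_logons.csv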



   And there you have it!

Conclusion:


   This is a free and powerful tool that allows you to query very large amounts of data for specific criteria contained within your many event log files.  If this procedure helped your investigation, we would like to hear from you.  You can leave a comment or reach me on Twitter: @carlos_cajigas




