Saturday, December 5, 2015

Mounting the VMFS File System of an ESXi Server Using Linux

  It won't happen very often when you will find yourself holding in your hand a hard drive that belonged to an ESXi server.  These servers usually house production machines that just don't get shutdown very often.  Why the decision has been made to turn it off is one that I am sure was not made lightly.  Whatever the scenario is, it is what it is.  It wasn't your call, but the client decided to shut down their ESXi server and subsequently shipped it to you for analysis.  Now you have the drive in your hand and you have been tasked with extracting the Virtual Machines out of the drive for analysis.

   The underlying file system of an ESXi server is the VMFS file system.  It stands for Virtual Machine File System.  VMFS is Vmware, Inc.'s clustered file system used by the company's flagship server visualization suite, vSphere. It was developed to store virtual machine disk images, including snapshots. Multiple servers can read/write the same file system simultaneously while individual virtual machine files are locked (Source Wikipedia).

   As of the date of this writing, not all of the big forensic suites have the ability to read this file system.  And I can understand why, as is extremely difficult for the commercial suites to offer support for all available file systems.  Fortunately for us, it is very possible to read this file system using Linux. 

   The purpose of this article is to go over the steps required to mount the VMFS file system of the drive from an ESXi server.  Once access to the file system has been accomplished, we will acquire a Virtual Machine stored on the drive. 

Installing the Tools:

   For you to be able to accomplish the task, you will have to make sure that you have vmfs-tools installed on your Linux examination machine.  You can get the package from the repositories by running $ sudo apt-get install vmfs-tools.  Vmfs-tools is included by default in LosBuntu.  LosBuntu is our own Ubuntu 14.04 distribution that can be downloaded here.  If you download and boot your machine with LosBuntu, you will be able to follow along and have the exact same environment described in this write-up.   

The Test:

   To illustrate the steps of mounting the partition containing the VMFS file system on the drive, I will use a 2TB hard drive with ESXi 6.0 installed on it.  This drive is from an ESXi server that I own.  The ESXi server drive is currently housing some virtual machines that we will be able to see, once the file system is mounted.  I booted an examination machine with a live version on LosBuntu and connected the drive to the machine.  LosBuntu’s default behavior is to never auto-mount drives.   

   Now, fire up the terminal and let's begin the first step of identifying the drive.  Usually the first step involves running fdisk, so that we can identify which physical assignment was given to the drive.  Running $ sudo fdisk –l lists the physical drives attached to the system, the flag -l tells fdisk to list the partition table.  Sudo gives fdisk superuser privileges for the operations.  Press enter and type the root password (if needed, pw is "mtk").

$ sudo fdisk -l

   Not show on the screen is /dev/sda, which is my first internal drive, therefore /dev/sdb should the drive of the ESXi server.  The output of fdisk give us a warning that /dev/sdb may have been partitioned with GPT and fdisk was unable to read the partition table.  Fdisk is telling us to use parted, so let’s do that.  The following parted command will hopefully get us closer to what we need.

$ sudo parted /dev/sdb print

  From the output, we can see that yes, it is indeed a GPT partitioned drive, containing multiple partitions.  The last displayed partition, which is actually partition number three, looks to be the largest partition of them all.  Although parted was able to read the partition table, it was unable to identify the file system contained in partition three.  We currently have a strong suspicion that /dev/sdb is our target drive containing our target partition, but it would be nice to have confirmation.  Let's run one more command.

$ sudo blkid -s TYPE /dev/sdb*

   Blkid is a command that has the ability to print or display block device attributes.  The flag -s TYPE will print the file system type of the partitions contained in /dev/sdb.  We used an asterisk “*” after sdb so that blkid can show us the file system types of all partitions located in physical device sdb like sdb1, sdb2, sdb3 and so on. 

   Finally, we can now see that /dev/sdb3 is the partition that contains the VMFS volume. 

   To mount the file system we are going to have to call upon vmfs-fuse, which is one of the commands contained within the vmfs-tools package built into LosBuntu.  But before we call upon vmfs-fuse, we need to create a directory to mount the VMFS volume.  Type $ sudo mkdir /mnt/vmfs to create our mount point.

   Mount the VMFS file system contained in /dev/sdb3 to /mnt/vmfs with the below command

$ sudo vmfs-fuse /dev/sdb3 /mnt/vmfs/

   As you can see, the execution of the command simply gave us our prompt back.  As my friend Gene says.  “You will not get a pat on the back telling you that you ran your command correctly or that it ran successfully, so we need to go check.”  True and amusing at the same time…

   Check the contents of /mnt/vmfs by first elevating our privileges to root, with $ sudo su and then by listing its contents with # ls -l /mnt/vmfs.

   Great! We can read the volume and we see that we have many directories belonging to Virtual Machines.  From here you can remain in the terminal and navigate to any of these directories, or you can fire up nautilus and have a GUI to navigate.  The following command will open nautilus at the location of your mount point as root.  It is important to open nautilus as root so that your GUI can have the necessary permissions to navigate the vmfs mount point that was created by root. 

# nautilus /mnt/vmfs

   Insert another drive to your examination machine and copy out any of the Virtual Machines that are in scope.    

   Another option would be to make a forensic image of the Virtual Machine.  For example, we can navigate to the Server2008R2DC01 directory, which houses the Domain Controller used on the previous write-up about examining Security logs.  Find that article here.

   In this specific instance, this Virtual Machine does not contain snapshots.  This means that the Server2008R2DC01-flat.vmdk is the only virtual disk in this directory responsible for storing the data on disk about this server.  If the opposite were true, you would have to collect all of the delta-snapshot.vmdk files to put back together at a later time. 

   The Server2008R2DC01-flat.vmdk file is a raw representation of the disk.  It is not compressed and can be read and mounted directly.  The partition table can be read with the sleuthkit tool mmls.  Mmls is a tool that can display the partition layout of volumes.  Type the following into the terminal and press enter.  The flag -a is to show allocated volumes, and the flag -B is to include a column with the partition sizes in bytes.

# mmls -aB Server2008R2DC01-flat.vmdk

   You can see that the 50GB NTFS file system starts at sector offset 206848.

   If you want to acquire this virtual disk in E01 format, add the flat-vmdk file to Guymager as a special device and acquire it to another drive.

And there you have it!


   Using free and open source tools you have been able to mount and acquire images of Virtual Machines contained in the file system of a drive belonging to an ESXi server.  If this procedure helped your investigation, we would like to hear from you.  You can leave a comment or reach me on twitter: @carlos_cajigas

Wednesday, December 2, 2015

Crafting Queries and Extracting Data from Event Logs using Microsoft Log Parser

   During a recent engagement, while hunting for threats in a client's environment, I got tasked with having to analyze over a terabyte worth of security (Security.evtx) event logs.  A terabyte worth of logs amounts to, a lot of logs.  We are talking close to a thousand logs, each containing approximately 400,000 events from dozens of Windows servers, including multiple domain controllers.  Did I say, a lot of logs? 

   Unfortunately, this wasn't the only task of the engagement, so I needed to go through these logs and I needed to do it quickly.  I needed to do it quickly because like in most engagements, time is against you.

   When you only have a few logs to look at, one of my tools of choice on the Windows side is Event Log Explorer.  Event Log Explorer is great.  It is a robust, popular, GUI tool with excellent filtering capabilities.  On the Linux side, I have used Log2timeline to convert dozens of evtx files to CSV and then filter the CSV file for the data that I was looking for.  But this was another animal, a different beast.  This beast needed a tool that could parse a very large amounts of logs and have the ability to filter for specific events within the data.  The answer to the problem came in the form of a tiny tool simply called Log Parser.

   Log Parser is a free tool designed by Microsoft.  You can download the tool here.  According to the documentation from the site the tool is described in this manner. “Log Parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files.”  That one-liner perfectly sums up why the tool is so powerful, yet not as popular as other tools.  Log parser provides query access to data.  What does that mean?  This means that if you want to parse data with this tool you have to be somewhat comfortable with the Structured Query Language (SQL).  The tool will only cough up data if it is fed SQL like queries.  The use of SQL like queries for filtering data is what gives the tool its power and control, while at the same time becoming a stopping point and a deal breaker for anyone not comfortable with SQL queries.   

   The purpose of this article is to attempt to explain the basic queries required to get you started with the tool and in the process show the power of the tool and how it helped me make small rocks out of big rocks.  

Installing the Tools:

  The tool is downloaded from here in the form of an msi.  It installs using a graphical installation, very much like many other tools.  Once installed the tool runs from the command line only.  For the purposes of the article, I will be using a security log extracted from a Windows Server 2008R2 Domain Controller that I own, and use for testing such as this.  If you want to follow along, you can extract the Security.evtx log from a similar server or even your Windows 7 machine.  The log is located under \Windows\System32\winevt\Logs.

The Test:

   Log Parser is a command line only utility.  To get started open up a command prompt and navigate to the Log Parser installation directory located under C:\Program Files (x86)\Log Parser 2.2. 

   The security log that I will be using for the write-up is called LosDC.evtx.  The log contains exactly 5,731 entries.  It is not a large log, but it contains the data that we need to illustrate the usage of the tool.  I extracted the log and placed it on my Windows 7 examination machine in a directory on the Desktop called “Test.”

   Now, the most basic SQL query that one can run looks something like this.  It is called a select statement.  “select * from LosDC.evtx”  The ‘select’, as you suspected, selects data that matches your criteria from the columns in your log.  In this instance we are not doing any matching yet, we are simply telling the tool to select everything by using an asterisk “*” from the LosDC.evtx log.  The tool needs to know what kind of file it is looking at.  You tell the tool that is it reading data from an event log with the -i:evt parameter, like so:

LogParser.exe "select * from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt

   This query will send the first 10 lines of the file to standard output.  A lot of data is going to be sent to the screen.  It is very difficult to make any use of this data at this point.  The only positive that can come from this command is that you can begin to see the names of the columns in the event log like “TimeGenerated”, “EventID”, and so on.

    An easier way to see the columns in the event log is by using the datagrid output feature, which sends the data to a GUI, like so:

LogParser.exe "select * from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt -o:datagrid

   Thanks to the GUI it is now easier to see the TimeGenerated and EventID columns.  Also, I want to point out the “Strings” column, which contains data that is very valuable to us.  The majority of the important data that we are after is going to be contained in this column.  So let us take a closer look at it. 

   If we build upon our last query and we now replace the asterisk "*" with the name of a specific column, the tool will now send the data matching our criteria to standard output, like so:

LogParser.exe "select strings from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt

   Notice that the tool is now displaying only the information that is found in the strings column.  The data is displayed in a delimited format.  The data is being delimited by pipes.  Field number 5 contains the username of the account, field number 8 contains the Log-On type, and field number 18 contains the source IP of the system that was used to authenticate against the domain controller. 

You have probably seen this data displayed in a prettier manner by Event Log Explorer. 

   Yet, is in fact the same data, and Log Parser has the ability to extract this data from hundreds of log files quickly and efficiently.  But to accomplish this we have to continue adding to our query.  In my recent case I was looking for the username, the log-on type, and source IP of all successful logins.  As mentioned earlier, this data was being stored in field 5, field 8, and field 18 of the Strings column.  To extract that data we need to craft a query that could extract these specific fields from the Strings column.  To accomplish that, we have to introduce a Log Parser function called extract_token.  The extract_token function gives Log Parser the ability to extract data from delimited columns like the Strings column.  

  To extract the data from the fifth delimited field in the strings column we need to add this to our query:

extract_token(strings,5,'|') AS User

   Let me break this down, extract_token is the function.  We open parenthesis and inside of the parenthesis we tell the function to go into the strings column and pull out the fifth field that is delimited by a pipe “|” and then we close parenthesis.  “AS User” is used so that once the data is pulled out of the Strings column, it is displayed in a new column with the new name of “User”.  It is like telling the function “Hey, display this as 'User'.”  

   To pull the data from the eighth field in the Strings column, we use this function:

extract_token(strings,8,'|') AS LogonType

   And finally to pull the data from the eighteenth field in the Strings column, we use this function:

extract_token(strings,18,'|') AS SourceIP

   We put it all together with the following query:

LogParser.exe "select TimeGenerated, EventID, extract_token(strings,5,'|') AS User, extract_token(strings,8,'|') AS LogonType, extract_token(strings,18,'|') AS SourceIP into C:\Users\carlos\Desktop\Test
\LosDC_4624_logons.csv from C:\Users\carlos\Desktop\Test\LosDC.evtx where eventid in (4624)" -i:evt -o:csv

   The select statement is now selecting the TimeGenerated and EventID columns, followed by the three extract_token functions to pull the data from the Strings column.  Into is an optional clause that specifies that the data be redirected to a file named  LosDC_4624_logons.csv in the Test directory.  From specifies the file to be queried, which is the LosDC.evtx log.  Where is also an optional clause which specifies data values to be displayed based on the criteria described.  The criteria described in this query is 4624 events contained in the eventid column.  The -o:csv is another output format like the datagrid, except this one sends the data to a csv file rather than a GUI.

   This is an example of what you can gather from the resulting CSV file.  This is what you would see if you were to sort the data in the CSV file by user.

   Notice the times, and source IP that was used by user “larry” when he used the RDP protocol (Logon Type 10) to remotely log-in to his system. 

Cool, Right?

   I want to point out that this log only contained 5731 entries and that the data redirected to the CSV file consisted of 1,418 lines.  That data was parsed and redirected in less than 0.2 seconds

   That is another example of the power of the tool.  Keep in mind that when you are parsing gigabytes worth of logs, the resulting CSV files are going to be enormous.  Below is an explorer screenshot displaying the amount of security event logs from one the servers in my case (Server name has been removed to protect the innocent).

   The sample data from that server was 40GB.  It was made up of 138 files each with approximately 416,000 records in each log.

   The tool parsed all of that that data in only 23 minutes.

   It searched 60 million records and created a CSV file with over 700,000 lines.  Although you can certainly open a CSV file with 700,000 lines in Excel or LibreOffice Calc, it is probably not a good idea.  Don't forget that you can search the CSV file directly from the command prompt with find.  Here is an example of searching the CSV file for user "larry" to quickly see which machines user "larry" used to authenticate on the Domain.  

   And there you have it!


   This is a free and powerful tool that allows you to query very large amounts of data for specific criteria contained within the tables of your many event log files.  If this procedure helped your investigation, we would like to hear from you.  You can leave a comment or reach me on twitter: @carlos_cajigas