Saturday, December 5, 2015

Mounting the VMFS File System of an ESXi Server Using Linux

  It won't happen very often that you find yourself holding a hard drive that belonged to an ESXi server.  These servers usually house production machines that just don't get shut down very often.  Why the decision was made to turn it off is one that I am sure was not made lightly.  Whatever the scenario is, it is what it is.  It wasn't your call, but the client decided to shut down their ESXi server and subsequently shipped it to you for analysis.  Now you have the drive in your hand and you have been tasked with extracting the Virtual Machines from the drive for analysis.

   The underlying file system of an ESXi server is the VMFS file system.  It stands for Virtual Machine File System.  VMFS is VMware, Inc.'s clustered file system used by the company's flagship server virtualization suite, vSphere.  It was developed to store virtual machine disk images, including snapshots.  Multiple servers can read/write the same file system simultaneously while individual virtual machine files are locked (Source: Wikipedia).

   As of the date of this writing, not all of the big forensic suites have the ability to read this file system.  And I can understand why, as it is extremely difficult for the commercial suites to offer support for all available file systems.  Fortunately for us, it is very possible to read this file system using Linux.

   The purpose of this article is to go over the steps required to mount the VMFS file system of the drive from an ESXi server.  Once access to the file system has been accomplished, we will acquire a Virtual Machine stored on the drive. 

Installing the Tools:

   To accomplish the task, you will have to make sure that you have vmfs-tools installed on your Linux examination machine.  You can get the package from the repositories by running the command below.  Vmfs-tools is included by default in LosBuntu.  LosBuntu is our own Ubuntu 14.04 distribution that can be downloaded here.  If you download and boot your machine with LosBuntu, you will be able to follow along in the exact same environment described in this write-up.
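$ sudo apt-get install vmfs-tools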

The Test:

   To illustrate the steps of mounting the partition containing the VMFS file system, I will use a 2TB hard drive with ESXi 6.0 installed on it.  This drive is from an ESXi server that I own.  The drive is currently housing some virtual machines that we will be able to see once the file system is mounted.  I booted an examination machine with a live version of LosBuntu and connected the drive to the machine.  LosBuntu's default behavior is to never auto-mount drives.

   Now, fire up the terminal and let's begin with the first step of identifying the drive.  Usually the first step involves running fdisk, so that we can identify which physical assignment was given to the drive.  Running $ sudo fdisk -l lists the physical drives attached to the system; the flag -l tells fdisk to list the partition tables.  Sudo gives fdisk superuser privileges for the operations.  Press enter and type the root password (if needed, the password is "mtk").

$ sudo fdisk -l



   Not shown on the screen is /dev/sda, which is my first internal drive, therefore /dev/sdb should be the drive from the ESXi server.  The output of fdisk gives us a warning that /dev/sdb may have been partitioned with GPT and that fdisk was unable to read the partition table.  Fdisk is telling us to use parted, so let's do that.  The following parted command will hopefully get us closer to what we need.

$ sudo parted /dev/sdb print



  From the output, we can see that yes, it is indeed a GPT partitioned drive, containing multiple partitions.  The last displayed partition, which is actually partition number three, looks to be the largest partition of them all.  Although parted was able to read the partition table, it was unable to identify the file system contained in partition three.  We currently have a strong suspicion that /dev/sdb is our target drive containing our target partition, but it would be nice to have confirmation.  Let's run one more command.

$ sudo blkid -s TYPE /dev/sdb*

   Blkid is a command that prints block device attributes.  The flag -s TYPE limits the output to the file system type of the partitions contained on /dev/sdb.  We used an asterisk "*" after sdb so that blkid shows us the file system types of all partitions located on physical device sdb, like sdb1, sdb2, sdb3, and so on.



   Finally, we can now see that /dev/sdb3 is the partition that contains the VMFS volume. 

   To mount the file system we are going to call upon vmfs-fuse, which is one of the commands contained within the vmfs-tools package built into LosBuntu.  But before we call upon vmfs-fuse, we need to create a directory to serve as our mount point:

$ sudo mkdir /mnt/vmfs

   Mount the VMFS file system contained in /dev/sdb3 to /mnt/vmfs with the below command:

$ sudo vmfs-fuse /dev/sdb3 /mnt/vmfs/



   As you can see, the execution of the command simply gave us our prompt back.  As my friend Gene says, "You will not get a pat on the back telling you that you ran your command correctly or that it ran successfully, so we need to go check."  True and amusing at the same time…

   Check the contents of /mnt/vmfs by first elevating our privileges to root with sudo su, and then listing the contents:

$ sudo su
# ls -l /mnt/vmfs



   Great! We can read the volume and we see that we have many directories belonging to Virtual Machines.  From here you can remain in the terminal and navigate to any of these directories, or you can fire up nautilus and have a GUI to navigate.  The following command will open nautilus at the location of your mount point as root.  It is important to open nautilus as root so that your GUI can have the necessary permissions to navigate the vmfs mount point that was created by root. 

# nautilus /mnt/vmfs



   Insert another drive to your examination machine and copy out any of the Virtual Machines that are in scope.    
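   As a minimal sketch, assuming the destination drive has been mounted read-write at /mnt/evidence (a hypothetical mount point; substitute your own), the copy can be done straight from the root shell:

# cp -R /mnt/vmfs/Server2008R2DC01 /mnt/evidence/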

   Another option would be to make a forensic image of the Virtual Machine.  For example, we can navigate to the Server2008R2DC01 directory, which houses the Domain Controller used in the previous write-up about examining Security logs.  Find that article here.



   In this specific instance, this Virtual Machine does not contain snapshots.  This means that Server2008R2DC01-flat.vmdk is the only virtual disk in this directory responsible for storing this server's on-disk data.  If the opposite were true, you would also have to collect all of the -delta.vmdk snapshot files to put the disk back together at a later time.

   The Server2008R2DC01-flat.vmdk file is a raw representation of the disk.  It is not compressed and can be read and mounted directly.  The partition table can be read with the Sleuth Kit tool mmls, which displays the partition layout of volumes.  Type the following into the terminal and press enter.  The flag -a shows allocated volumes, and the flag -B includes a column with the partition sizes in bytes.

# mmls -aB Server2008R2DC01-flat.vmdk



   You can see that the 50GB NTFS file system starts at sector offset 206848.

   If you want to acquire this virtual disk in E01 format, add the -flat.vmdk file to Guymager as a special device and acquire it to another drive.
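   If you would rather stay in the terminal, libewf's ewfacquire can also produce the E01.  This is a sketch rather than the command used in this write-up; ewfacquire prompts interactively for the acquisition parameters, and -t sets the target path (hypothetical here) for the resulting E01:

# ewfacquire -t /mnt/evidence/Server2008R2DC01 Server2008R2DC01-flat.vmdk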



And there you have it!

Conclusion:

   Using free and open source tools you have been able to mount and acquire images of Virtual Machines contained in the file system of a drive belonging to an ESXi server.  If this procedure helped your investigation, we would like to hear from you.  You can leave a comment or reach me on twitter: @carlos_cajigas





Wednesday, December 2, 2015

Crafting Queries and Extracting Data from Event Logs using Microsoft Log Parser

   During a recent engagement, while hunting for threats in a client's environment, I was tasked with analyzing over a terabyte worth of security (Security.evtx) event logs.  A terabyte worth of logs amounts to a lot of logs.  We are talking close to a thousand logs, each containing approximately 400,000 events, from dozens of Windows servers, including multiple domain controllers.  Did I say, a lot of logs?

   Unfortunately, this wasn't the only task of the engagement, so I needed to go through these logs quickly because, like in most engagements, time is against you.

   When you only have a few logs to look at, one of my tools of choice on the Windows side is Event Log Explorer.  Event Log Explorer is great.  It is a robust, popular GUI tool with excellent filtering capabilities.  On the Linux side, I have used Log2timeline to convert dozens of evtx files to CSV and then filter the CSV file for the data that I was looking for.  But this was another animal, a different beast.  This beast needed a tool that could parse very large amounts of logs and filter for specific events within the data.  The answer to the problem came in the form of a tiny tool simply called Log Parser.

   Log Parser is a free tool designed by Microsoft.  You can download the tool here.  The documentation from the site describes the tool in this manner: “Log Parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files.”  That one-liner perfectly sums up why the tool is so powerful, yet not as popular as other tools.  Log Parser provides query access to data.  What does that mean?  It means that if you want to parse data with this tool, you have to be somewhat comfortable with the Structured Query Language (SQL).  The tool will only cough up data if it is fed SQL-like queries.  The use of SQL-like queries for filtering data is what gives the tool its power and control, while at the same time being a stopping point and a deal breaker for anyone not comfortable with SQL queries.

   The purpose of this article is to attempt to explain the basic queries required to get you started with the tool and in the process show the power of the tool and how it helped me make small rocks out of big rocks.  

Installing the Tools:

  The tool is downloaded from here in the form of an msi.  It installs using a graphical installer, very much like many other tools.  Once installed, the tool runs from the command line only.  For the purposes of the article, I will be using a security log extracted from a Windows Server 2008R2 Domain Controller that I own and use for testing such as this.  If you want to follow along, you can extract the Security.evtx log from a similar server or even your Windows 7 machine.  The log is located under \Windows\System32\winevt\Logs.
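  If you are pulling the log from a live machine, one option is the built-in wevtutil export-log (epl) command.  This is a sketch, assuming an elevated command prompt; the destination path is one you choose:

wevtutil epl Security C:\Users\carlos\Desktop\Test\Security.evtx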

The Test:

   Log Parser is a command line only utility.  To get started open up a command prompt and navigate to the Log Parser installation directory located under C:\Program Files (x86)\Log Parser 2.2. 



   The security log that I will be using for the write-up is called LosDC.evtx.  The log contains exactly 5,731 entries.  It is not a large log, but it contains the data that we need to illustrate the usage of the tool.  I extracted the log and placed it on my Windows 7 examination machine in a directory on the Desktop called “Test.”



   Now, the most basic SQL query that one can run looks something like this.  It is called a select statement: “select * from LosDC.evtx”.  The ‘select’, as you suspected, selects data that matches your criteria from the columns in your log.  In this instance we are not doing any matching yet; we are simply telling the tool to select everything by using an asterisk “*” from the LosDC.evtx log.  The tool needs to know what kind of file it is looking at.  You tell the tool that it is reading data from an event log with the -i:evt parameter, like so:

LogParser.exe "select * from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt



   This query will send the results to standard output, 10 rows at a time, pausing for a keypress between batches.  A lot of data is going to be sent to the screen, and it is very difficult to make any use of it at this point.  The only positive that can come from this command is that you can begin to see the names of the columns in the event log, like “TimeGenerated”, “EventID”, and so on.

    An easier way to see the columns in the event log is by using the datagrid output feature, which sends the data to a GUI, like so:

LogParser.exe "select * from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt -o:datagrid



   Thanks to the GUI it is now easier to see the TimeGenerated and EventID columns.  Also, I want to point out the “Strings” column, which contains data that is very valuable to us.  The majority of the important data that we are after is going to be contained in this column.  So let us take a closer look at it. 

   If we build upon our last query and we now replace the asterisk "*" with the name of a specific column, the tool will now send the data matching our criteria to standard output, like so:

LogParser.exe "select strings from C:\Users\carlos\Desktop\Test\LosDC.evtx" -i:evt



   Notice that the tool is now displaying only the information found in the Strings column.  The data is displayed in a delimited format, with pipes as the delimiter.  Field number 5 contains the username of the account, field number 8 contains the logon type, and field number 18 contains the source IP of the system that was used to authenticate against the domain controller.

You have probably seen this data displayed in a prettier manner by Event Log Explorer. 



   Yet it is in fact the same data, and Log Parser has the ability to extract it from hundreds of log files quickly and efficiently.  But to accomplish this we have to continue adding to our query.  In my recent case I was looking for the username, the logon type, and the source IP of all successful logins.  As mentioned earlier, this data was stored in field 5, field 8, and field 18 of the Strings column.  To extract it, we need to craft a query that pulls these specific fields from the Strings column.  To accomplish that, we have to introduce a Log Parser function called extract_token, which gives Log Parser the ability to extract data from delimited columns like the Strings column.

  To extract the data from the fifth delimited field in the strings column we need to add this to our query:

extract_token(strings,5,'|') AS User

   Let me break this down: extract_token is the function.  We open a parenthesis, and inside we tell the function to go into the Strings column and pull out the fifth field delimited by a pipe “|”, and then we close the parenthesis.  “AS User” is used so that once the data is pulled out of the Strings column, it is displayed in a new column named “User”.  It is like telling the function, “Hey, display this as 'User'.”

   To pull the data from the eighth field in the Strings column, we use this function:

extract_token(strings,8,'|') AS LogonType

   And finally to pull the data from the eighteenth field in the Strings column, we use this function:

extract_token(strings,18,'|') AS SourceIP

   We put it all together with the following query:

LogParser.exe "select TimeGenerated, EventID, extract_token(strings,5,'|') AS User, extract_token(strings,8,'|') AS LogonType, extract_token(strings,18,'|') AS SourceIP into C:\Users\carlos\Desktop\Test
\LosDC_4624_logons.csv from C:\Users\carlos\Desktop\Test\LosDC.evtx where eventid in (4624)" -i:evt -o:csv



   The select statement is now selecting the TimeGenerated and EventID columns, followed by the three extract_token functions to pull the data from the Strings column.  Into is an optional clause that redirects the data to a file named LosDC_4624_logons.csv in the Test directory.  From specifies the file to be queried, which is the LosDC.evtx log.  Where is also an optional clause, which limits the results to rows matching the criteria described; the criteria in this query is 4624 events contained in the eventid column.  The -o:csv is another output format like the datagrid, except this one sends the data to a CSV file rather than a GUI.

   This is an example of what you can gather from the resulting CSV file.  This is what you would see if you were to sort the data in the CSV file by user.





   Notice the times and the source IP that was used by user “larry” when he used the RDP protocol (Logon Type 10) to remotely log in to his system.
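   If RDP logons were all you cared about, you could push that filter into the query itself.  The variant below is a sketch I did not run for this write-up; it assumes Log Parser will accept the extract_token function inside the where clause:

LogParser.exe "select TimeGenerated, extract_token(strings,5,'|') AS User, extract_token(strings,18,'|') AS SourceIP from C:\Users\carlos\Desktop\Test\LosDC.evtx where eventid in (4624) and extract_token(strings,8,'|') = '10'" -i:evt -o:datagrid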

Cool, Right?

   I want to point out that this log only contained 5,731 entries and that the data redirected to the CSV file consisted of 1,418 lines.  That data was parsed and redirected in less than 0.2 seconds.



   That is another example of the power of the tool.  Keep in mind that when you are parsing gigabytes worth of logs, the resulting CSV files are going to be enormous.  Below is an explorer screenshot displaying the number of security event logs from one of the servers in my case (server name has been removed to protect the innocent).



   The sample data from that server was 40GB.  It was made up of 138 files, each containing approximately 416,000 records.

   The tool parsed all of that data in only 23 minutes.



   It searched 60 million records and created a CSV file with over 700,000 lines.  Although you can certainly open a CSV file with 700,000 lines in Excel or LibreOffice Calc, it is probably not a good idea.  Don't forget that you can search the CSV file directly from the command prompt with find.  Here is an example of searching the CSV file for user "larry" to quickly see which machines user "larry" used to authenticate on the Domain.  
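   A representative invocation, assuming the CSV was written to the Test directory as above, looks like this; the /i flag makes the search case-insensitive:

find /i "larry" C:\Users\carlos\Desktop\Test\LosDC_4624_logons.csv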



   And there you have it!

Conclusion:


   This is a free and powerful tool that allows you to query very large amounts of data for specific criteria contained within the tables of your many event log files.  If this procedure helped your investigation, we would like to hear from you.  You can leave a comment or reach me on twitter: @carlos_cajigas  





Tuesday, November 10, 2015

Creating a Virtual Machine of a Windows 10 Disk Image Using a Linux Live Distro

   The process of converting a full physical acquisition of a hard disk into a fully functioning virtual machine (VM) has been covered many times.  Probably because interacting with a machine the same way that your suspect did, just prior to the machine being seized, is a technique that, in my opinion, is underused but still very valuable.  There are things that can be learned about the habits of your suspect that may only be discovered by taking the time to look at your seized data in a live manner.

   To accomplish this, one tool that I still hear people talking about on the Windows side is LiveView.  At the time that I tried using it, the tool required that a raw image of the disk be used.  This meant taking the time to convert your E01 to a raw image, which took time and wasted space. 

   Alternatives to LiveView, are discussed in great detail by Jimmy Weg, on his blog justaskweg.com.  Jimmy even wrote an article on going from a write blocked drive to a VM, which I found very useful.

   Lucky for us, going from a write blocked drive to a VM can also be accomplished in Linux, and is something that I have discussed and covered previously.   

   In this article, I want to talk about booting a disk image of a Windows 10 machine.  For the purposes of this article I used a live Linux distribution of LosBuntu.  LosBuntu is our own Ubuntu 14.04 distribution that can be downloaded here.

The Plan:

   The plan is to use a live version of LosBuntu and boot your machine from it.  Whether you boot LosBuntu from a DVD or a flash drive, the process should be the same.  Select a machine that is powerful and has plenty of RAM.  Aside from the fact that LosBuntu already has xmount installed, another benefit of using a live distribution is complete segregation.  Any malware that you catch or any action that you wish reversed can be dealt with by simply shutting down the machine.

Installing the Tools:

   The tools that we will be using during the process are xmount and VMware Workstation Player 12 (VMware).  Xmount comes preinstalled in the live version of LosBuntu, but if you choose to install it yourself, find it here: https://pinguin.lu/pkgserver.  VMware can be downloaded for free here.

   To install VMware, issue the below command.  When prompted, enter the root password, which is “mtk” without the quotes.

$ sudo bash VMware-Player-12.0.1-3160714.x86_64.bundle



   Use the VMware installer graphical user interface to complete the installation.



The Test:

   To illustrate the steps of converting a disk image of a Windows 10 machine to a VM, I will be using a previously acquired disk image of a Windows 10 operating system from a 512GB SSD that I use for testing. 


           
   The acquisition of the disk was done using the E01 format with best compression and 4000MB chunks.  The image compressed down to about 33GB, spanned across 8 segments.  Due to the compression, the disk image occupies only 33GB worth of space, rather than the 512GB it would have taken had we used the RAW format during acquisition.  That is a lot of saved space, thanks to the compression!  Great.

   Let us now turn our attention to the point of the write-up: converting this E01 to a virtual machine.  To accomplish this feat, we are going to summon the powers of xmount.  Xmount is a very powerful tool written by Dan Gillen that has the ability to convert on-the-fly between multiple input and output hard disk image types.  In other words, xmount can take our E01 image and convert it to a raw image (DD), on-the-fly, all while maintaining the integrity of the data.

   Xmount can also turn a DD or an E01 into a VMDK (VMware virtual disk) and redirect writes to a cache file.  This makes it possible, for example, to use VMware to boot an operating system contained in a read-only DD or E01 image.

   For us to pull off the trick of turning an E01 into a VM, we are going to pass xmount the following instructions.  Enter this command into the terminal:

$ sudo xmount --in ewf Win10.E?? --out vmdk --cache /mnt/cache/win10.cache /mnt/vmdk/

   Xmount is the command to crossmount.  --in ewf lets xmount know that we are passing it an image in the E01 format; Win10.E?? is the E01 image.  In this example we have more than one segment, so we must use “E??” as the file extension to match all of the segment files.  --out vmdk tells xmount to convert the E01 to a VMDK; --cache /mnt/cache/win10.cache is the name of the cache file that will store all of the writes made by the operating system; and /mnt/vmdk/ is a previously created mount point for the vmdk file.  Sudo gives xmount superuser privileges for the operations.
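   If the cache directory and mount point do not already exist on your live system, create them before running xmount.  Keep in mind that on a live distribution anything created under /mnt lives in RAM, so if you expect heavy writes, mount writable storage there first:

$ sudo mkdir -p /mnt/cache /mnt/vmdk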



   If you received your prompt back without any errors, then it may be safe to assume that you issued the command correctly.  At this point, you have the E01 converted to a vmdk that is ready to be opened in VMware.

   Now, fire up VMware and go through the process of creating a Windows 10 VM.  This write up assumes that you know the process, so we will not bore you with steps of how to set up a VM.  If needed, a web search on the topic will reveal multiple articles on accomplishing that specific task.




   As you go through the process of creating your Windows 10 VM, I recommend that you give the VM 4GB of RAM and 2 cores.  I would also recommend that you uncheck the box labeled “connect at power on” for your network adapter.  This is your call, but I choose not to allow suspect machines to connect to the internet.



   Finish setting up your machine and get back to the home screen.



   We are almost ready to fire up the machine, but before we do that, we have to make some final tweaks.  An important one is adding the vmdk file to the virtual machine.  Click on “edit the virtual machine settings” and remove the disk assigned to the VM.



   Add the vmdk file that we previously mounted at /mnt/vmdk/.




   Lastly, we need to edit the vmx configuration file so that VMware knows that it needs to handle an image containing GPT/UEFI settings.  This is a very important step; if you omit it, you will likely get a “no operating system found” error.  Open the vmx file with your favorite text editor and add a line at the bottom of the file that reads firmware = "efi"
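   You can also append the line from the terminal.  The vmx file name below is hypothetical; use the one VMware created for your VM:

$ echo 'firmware = "efi"' >> Win10.vmx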



   Once this is done, go back to VMware and start your VM.



   If everything went according to plan, you should now have a fully functioning VM, revealing all of the settings and unique configurations applied by your suspect to his/her machine.  Feel free to navigate to your heart's content.  Any edits that you make will be written to the cache file and will survive reboots.  If you need to edit the registry, go ahead; the cache file will save the edits.  Feel free to take screenshots or do anything that you need without having to worry about changing the integrity of the image.  No changes will be made to the image, as E01s are read-only files.  When you are done with the machine, shut it down.  If you used LosBuntu as a live distribution, then the OS on your internal drive will also be untouched.

And there you have it.

Conclusion:


   This is a completely free and quick way to see your suspect's system in a live manner, all while preserving the integrity of your data.  If this procedure helped your investigation, we would like to hear from you.  You can leave a comment or reach me on twitter: @carlos_cajigas  



Tuesday, April 28, 2015

Acquiring an Image of an Amazon EC2 Linux Instance

As cloud services continue gaining popularity and become more affordable, more people are learning about what is available and are increasingly opting in to the idea of having computers in the cloud.  This became evident during a recent conversation with my old friend JJ, @jxor2378.  I called JJ to get his opinion on what an ideal password cracking rig would be.  Without hesitation, JJ answered, “Why would you invest in buying the hardware, when you can just rent it!”  And he was right; for what I needed, using cloud computing services was in fact a good match.

It is not like I didn't know about the concept of cloud computing; it was simply that, because I hadn't had a need for it, I hadn't taken the time to do my computing in the cloud just yet.  By that afternoon, that had changed.  JJ recommended that I play with Amazon's Elastic Compute Cloud, also known as their EC2 service.

Paraphrased from their website, http://aws.amazon.com/ec2/: Amazon's EC2 is a service that provides resizable compute capacity, designed to make cloud computing easier.  The web service interface allows you to obtain and configure capacity with minimal friction, and gives you complete control of your computing resources.  You can quickly scale capacity, both up and down, as your computing requirements change, and you only pay for the capacity that you actually use.

That last line is the neat thing about the service.  You only pay for what you use, and they even offer you a chance to try their service for free.  It only took a few minutes to get an instance up and running.  Once I had access to the instance, I couldn't help but wonder how I would go about analyzing it for forensic artifacts.

In this article we are going to go over the steps of acquiring an image of a Linux Ubuntu Server Amazon EC2 instance.  For the purposes of this article I used a live Linux distribution of LosBuntu.  LosBuntu is our own Ubuntu 14.04 distribution that can be downloaded here.

Installing the Tools:

The tools that we will be using during the process are ssh, dd and netcat.  All of these tools come preinstalled in the Live version of LosBuntu, so there is no need to install anything else.

The Plan:

The plan is to fire up an EC2 instance, remotely log in to it, and then go through the steps of acquiring an image of the instance back to a remote location of your choice.  Let's get started.
  
The Test:

To set up your instance, you can use your current Amazon username and password to log in to aws.amazon.com.  Once authenticated, navigate to the EC2 dashboard and click on create instance.  The instance that we will use for the test is the Ubuntu Server 14.04 LTS t2.micro instance that you can try for free.



For added security, we created a public/private key pair that can be used to securely SSH into the instance.  Using a key to SSH into a system offers a bit more security than a username and password combination alone.  I named the key ec2key and downloaded it.


Once you have downloaded the key, simply launch the instance with all of the defaults.  The service shines due to the amount of customization that you can do to your instance, but that is beyond the scope of this article.  For now, all defaults will work.

Once you have your instance running, locate the “Connect” icon and click it.


A set of instructions on how to connect will appear on your screen, including the username and public IP address of the instance.



That's it.  That is all the information that is needed to SSH into the instance.  We now have what we need, so let's take care of some final things.  For the command in the screenshot to work, if your key is not in your current working directory, you will need to provide the path to it.  Also, the permissions of the key must be set so that no other users or groups can read it; the command $ sudo chmod 400 yourkey.pem will take care of that requirement.

Now type the below command to SSH into the server.  SSH is the remote login tool that will establish a secure encrypted connection to the server; the “-i” option points SSH to your identity file (key); ubuntu is the username; and @52.11.255.55 is the public IP of the instance.  Remember to change the key to the name that you provided and use the IP of your instance.

$ ssh -i ec2key.pem ubuntu@52.11.255.55


We are now logged on to the EC2 instance.  From here you can remotely control the system and do whatever it is that you intend to do with it.  The possibilities are endless, but there is one caveat...  Since we set up the instance for remote access using a key, any time that we want to access the instance we are going to have to pass it this key so that access can be granted.  This means that if you, for example, want to use the Remote Desktop Protocol (RDP) to get a GUI on your screen, you may have to do it by authenticating to the instance first using an SSH tunnel.

Which is exactly what we are going to do to acquire the image of this instance.  We are going to establish a second SSH connection to a second remote server and pass the entire contents of the instance's hard drive through an SSH tunnel.  This can be accomplished by standing up a second EC2 instance with enough space to store your image, or you can use an existing publicly accessible server that you control.

If you read the article titled “Analyzing Plain Text Log Files Using Linux Ubuntu” then you may know that I like to run a publicly accessible server to transfer data and serve files.  So I took advantage of the SSH access to this server that I control, and authenticated back into it from the EC2 instance using an SSH tunnel.

An SSH tunnel is an encrypted tunnel created through an SSH protocol connection that can be used to transfer unencrypted traffic over the network through an encrypted channel.  The purpose here is to use the SSH tunnel to securely transfer the entire contents of the hard drive using dd and netcat, even though netcat itself does not use encryption.  All of the contents of the drive from the EC2 instance will travel encrypted through the tunnel back to my forensic machine in my lab.

You do not need to go through the trouble of setting up a public server for this.  A second EC2 instance will also work, but you will then have to transfer the now acquired image back to you.  Accessing a server that you control kills two birds with one stone.

Using its own terminal window on the EC2 instance, type the below command to set up the SSH tunnel back to your server.  SSH is the command; -p 5678 tells SSH to use a non-default port to connect to the server you control; -N is used so SSH does not execute any remote commands, which is useful when just forwarding ports; and -L specifies the port on the local host that is to be forwarded to the given host and port on the remote side.  Secretuser is the user on my server and 432.123.456.1 is the public IP of the server that will be receiving the data.

$ ssh -p 5678 -N -L 4444:127.0.0.1:4444 secretuser@432.123.456.1
secretuser@432.123.456.1's password:

If everything went well and you entered the password correctly, this shell window is simply going to hang and will not show any output.  From this point forward, any data that is sent to localhost on port 4444 is going to be redirected to the server back in my lab.

On that server you will now need to set up a netcat listening session with the below command.  Nc is the netcat command; -l is to listen; -p is the port to listen on; and we are piping the data to pv and redirecting it to a file titled ec2image.dd.  Pv is a neat utility that measures the data passing through the pipe.  A visual of the data coming in helps in determining whether things are going according to plan.

$ nc -l -p 4444 | pv > ec2image.dd

Finally, on the EC2 instance, run dd to image the instance's hard drive and pipe it to netcat using the below command.  Remember that you are piping to netcat on localhost, port 4444.

$ sudo dd if=/dev/xvda bs=4k | nc -w 3 127.0.0.1 4444

This is an illustration of how the data should flow.

# Image courtesy of Freddy Chid @fchidsey0144 

After about 30 minutes it had sent over 4GB of data to my server located many states away from the EC2 instance.

$ nc -l -p 4444 | pv > ec2image.dd
4.39GB 0:32:28 [2.51MB/s] [           <=>                                      ]

It finished in less than an hour and transferred the entire 8GB (default) bit-by-bit image of the hard drive from the instance.

$ nc -l -p 4444 | pv > ec2image.dd
8GB 0:57:30 [2.37MB/s] [                                  <=>               ]

Speeds will vary depending on bandwidth.  In reference to image verification, since this was a live acquisition, doing a hash comparison at this point will not be of much value.  At the very least, check and compare that the size of your dd image matches the number of bytes contained in /dev/xvda on the instance.  This can be accomplished by comparing the output of fdisk -l /dev/xvda against the size of the acquired dd.

Check the size of the hard drive on the instance:

$ sudo fdisk -l /dev/xvda
Disk /dev/xvda: 8589 MB, 8589934592 bytes

Check the size of the dd on your server:

$ ls -l
total 8388612
-rw-r--r-- 1 secretuser secretgroup 8589934592 Apr 17 18:20 ec2image.dd

We have a match.  And there you have it.  You have acquired an image of an Ubuntu 14.04 Server running on Amazon EC2.
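If you want a stronger check than a size comparison, one option is to hash the stream as it leaves the instance.  This is a sketch, not part of the original procedure; it assumes bash on the instance (process substitution is a bash feature) and a hypothetical path for the hash file:

$ sudo dd if=/dev/xvda bs=4k | tee >(sha256sum > /home/ubuntu/xvda.sha256) | nc -w 3 127.0.0.1 4444

Running sha256sum against ec2image.dd on the receiving server should then produce the same digest.  Since the source is live, this verifies the transfer, not that the disk was static during acquisition.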

Conclusion:


Amazon offers a full set of developer API tools for EC2 that might offer an easier way of accomplishing this task.  If in a pinch, and if you have an evening to spare, know that at least you have this option available to get the job done.  If this procedure helped your investigation, we would like to hear from you.  You can leave a comment or reach me on twitter: @carlos_cajigas