Wednesday, September 10, 2014

Using Curl to Retrieve VirusTotal Malware Reports in BASH

   If you are in the DFIR world, there is a good chance that you often find yourself either submitting suspicious files to VirusTotal (VT) for scanning, or searching their database for suspicious hashes.  For these tasks and other neat features, VT offers a useful web interface where you can accomplish this.  If submitting one file or searching one hash at a time is enough for you, then their web interface should suffice for your needs.  Find the web interface at www.virustotal.com.

   If you are looking for a little bit more functionality or the ability to scan a set of suspicious hashes, you may want to look into using their public API.  VirusTotal's public API, among other things, allows you to access malware scan reports without the need to use their web interface.  Access to their API gives one the ability to build scripts that can have direct access to the information generated and stored by them. 

   To have access to the API you need to join their community and get your own API key.  The key is free and getting one is as simple as creating an account with them.  After joining their community you can locate your personal API key in your community profile. 

   In this article we will go through the process of communicating with their API from the Bourne-Again Shell (BASH) using the program curl.  We will communicate with the API through HTTP POST requests.  We will discuss a few curl commands, and once we become familiar with them, we will incorporate them into a script to automate the process.  The commands and the script came from a tip that I got from my co-worker John Brown.  He gave me permission to talk about his tip and to publish his script.

   BASH is the default terminal shell in Ubuntu.  For the purposes of this article I used a VMware Player virtual machine with Ubuntu 14.04 installed on it.

Installing the tools:

   All of the tools that we will use are already included in Ubuntu by default.  You will not need to download and install any other tools.  If you want to follow along, make sure that you have your VT API key available.  Also, we are going to need suspicious hashes.  Feel free to use your own hashes, or copy these two md5 hashes that I will use for the article: e4736f7f320f27cec9209ea02d6ac695 and 7f16d6f96912db0289f76ab1cde3420b.  One of the hashes belongs to a fake antivirus piece of malware that I use for testing, and the other one is a hash of a text file that contains no malicious code.  One of the hashes will return hits; the other one will not.  Let's get started.
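   Before sending anything to VT, it can help to confirm that a string at least looks like an MD5 hash, which is always 32 hexadecimal characters.  The following is a minimal sketch of such a check.  The function name is_md5 is a hypothetical helper made up for this example; it is not part of John's script.

```shell
# is_md5 is a hypothetical helper name used only for this illustration.
# It succeeds when its argument is exactly 32 hexadecimal characters.
is_md5() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{32}$'
}

# Try it on the two hashes from the article and on one obviously bad string.
for candidate in e4736f7f320f27cec9209ea02d6ac695 7f16d6f96912db0289f76ab1cde3420b not-a-hash; do
    if is_md5 "$candidate"; then
        echo "$candidate looks like an MD5 hash"
    else
        echo "$candidate does not look like an MD5 hash"
    fi
done
```

   The check only looks at the shape of the string.  It says nothing about whether VT knows the hash.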

The test:

   Open a Terminal window.  In Ubuntu you can accomplish this by pressing Ctrl-Alt-T at the same time or by going to the Dash Home and typing in “terminal”.

   In order to communicate with the VT database to retrieve a file scan report we are going to need two things.  First, we need to know the URL to send the POST request to.  That URL will be “https://www.virustotal.com/vtapi/v2/file/report”.  Second, we will need to feed the curl command two parameters: your API key and a resource.  The API key will be the key that was given to you upon joining the VT community, and the resource will be the md5 hash of the file in question.

   The following curl command should satisfy all of those requirements.

$ curl -s -X POST 'https://www.virustotal.com/vtapi/v2/file/report' --form apikey="c6e8f956YOURAPIKEY06a82eab47a0cb8cbYOURAPIKEY53aa118ba0db1faeb67" --form resource="e4736f7f320f27cec9209ea02d6ac695"


   curl is the program that sends the POST request to the VT URL.  The -s option tells curl to be silent, so it does not print the progress meter.  The -X option names the request method, which in this instance is POST.  The --form apikey= parameter carries your API key, and --form resource= carries the MD5 hash of the aforementioned fake antivirus file.  These are my results.
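   As an aside, once the request works it can be convenient to wrap it in a small function so that the key does not have to be pasted into every command.  This is only a sketch under a few assumptions: the function name vt_report and the environment variable VT_API_KEY are hypothetical names chosen for this example, and the request itself is the same curl command shown above.

```shell
# Sketch only: vt_report and VT_API_KEY are hypothetical names for this example.
# The function sends the same POST request as the curl command in the article.
vt_report() {
    # Refuse to send anything if the key has not been exported.
    if [ -z "${VT_API_KEY:-}" ]; then
        echo "VT_API_KEY is not set" >&2
        return 1
    fi
    # $1 is the hash to look up; the key comes from the environment.
    curl -s -X POST 'https://www.virustotal.com/vtapi/v2/file/report' \
        --form apikey="$VT_API_KEY" \
        --form resource="$1"
}

# Usage (needs network access and a valid key):
#   export VT_API_KEY="your-key-here"
#   vt_report e4736f7f320f27cec9209ea02d6ac695
```

   Keeping the key in an environment variable also keeps it out of any script that you later share.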


   It looks like our fake antivirus file’s hash was located in the database and the file had been previously scanned by VT.  Lots of data with positive hits was returned.  The scan report contains no line breaks, so it was sent to our terminal window in a format that is difficult to read.  Let's see if we can fix that.  Notice that the results from each individual antivirus solution end with a curly brace followed by a comma: “},”.  Armed with this information, let’s add a newline character after each one of those braces to separate the output so that we can see it better.  Run the same command as above, but this time let’s pipe it to sed 's|\},|\}\n|g'

$ curl -s -X POST 'https://www.virustotal.com/vtapi/v2/file/report' --form apikey="c6e8f9563e6YOURAPIKEY82eab47a0cb8cb9454824YOURAPIKEYba0db1faeb67" --form resource="e4736f7f320f27cec9209ea02d6ac695" | sed 's|\},|\}\n|g'

   The sed command changes our standard output by replacing every }, with a } followed by a newline.  The \ before each brace is there so that sed treats the brace as a literal character, and \n in the replacement is the escape sequence for a newline.  These are my results.


   We can now start to see information that we can work with.  From here you can redirect this data to a file or continue using grep, sed, awk and/or any other command line magic that you can throw at this output to continue editing it to your needs.  Personally, I am interested in the bottom area of the screen, the part that says "positives": 23,.  This tells me that this hash was recognized by 23 different antivirus engines in the VT database.  This is the data that I may need to pay attention to during an investigation.  That sed command was just an example of how to manipulate the output.
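   To experiment with this kind of filtering without spending API requests, the same ideas can be tried on a small local string.  The sample below is made up for illustration and only imitates the shape of a report; it is not real VT output, and the engine names are placeholders.  The function names split_report and positives_count are also hypothetical.  The braces are written without backslashes here, which GNU sed accepts as well.

```shell
# Made-up sample in the same general shape as a report; not real VT output.
sample='{"scans": {"EngineA": {"detected": true}, "EngineB": {"detected": false}}, "total": 2, "positives": 1, "md5": "e4736f7f320f27cec9209ea02d6ac695"}'

# Break the single long line after every "}," (GNU sed understands \n here).
split_report() {
    sed 's|},|}\n|g'
}

# Keep only the number that follows "positives": in the input.
positives_count() {
    grep -o '"positives": [0-9]*' | grep -o '[0-9][0-9]*$'
}

printf '%s\n' "$sample" | split_report
printf '%s\n' "$sample" | positives_count
```

   With a real report, either function can be placed at the end of the curl pipeline in the same way.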

   The next command uses a combination of awk and sed pipes to filter the output down to a final set of data that we felt comfortable working with.

$ curl -s -X POST 'https://www.virustotal.com/vtapi/v2/file/report' --form apikey="c6e8f9563e63YOURAPIKEY2eab47a0cb8cb9454824YOURAPIKEYba0db1faeb67" --form resource="e4736f7f320f27cec9209ea02d6ac695" | awk -F 'positives\":' '{print "VT Hits" $2}' | awk -F ' ' '{print $1$2$3$6$7}' | sed 's|["}]||g'



   The first awk command uses the string positives": as the field delimiter and prints the string “VT Hits” followed by the second field, which begins with the 23 positive hits.  The second awk command uses a space as the delimiter and prints the first, second, third, sixth and seventh fields, which keeps the hit count and extracts the string md5 and the md5 hash of the file from the output.  The last sed command simply removes any quotes and curly braces from the resulting output.  These are my results.



VTHits23,md5:e4736f7f320f27cec9209ea02d6ac695

   The end result is a string of data that tells us the number of antivirus solutions that recognize the file as malicious, plus the md5 hash of the file so that we know which file is the suspicious one.
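   To see what each stage of that pipeline contributes, the stages can be run one at a time on a local string.  The sample below is a simplified, made-up tail of a report line.  It assumes that the fields after "positives" appear in the order sha256, md5, which is what the field numbers in the command imply; the total and sha256 values are placeholders.  The delimiter is written without the backslash, which gives awk the same string.

```shell
# Simplified, made-up tail of a report line; "total" and "sha256" are placeholders.
sample='{"total": 54, "positives": 23, "sha256": "SHA256PLACEHOLDER", "md5": "e4736f7f320f27cec9209ea02d6ac695"}'

# Stage 1: split on positives": and put "VT Hits" in front of what follows.
stage1=$(printf '%s\n' "$sample" | awk -F 'positives":' '{print "VT Hits" $2}')

# Stage 2: split on spaces and glue fields 1, 2, 3, 6 and 7 together.
stage2=$(printf '%s\n' "$stage1" | awk -F ' ' '{print $1$2$3$6$7}')

# Stage 3: delete every double quote and closing curly brace.
stage3=$(printf '%s\n' "$stage2" | sed 's|["}]||g')

# Show the output of each stage on its own line.
printf '%s\n' "$stage1" "$stage2" "$stage3"
```

   Because the second awk command selects fields by position, the pipeline depends on the order of the fields in the report.  If that order ever changes, the field numbers have to change with it.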

   If by now you are thinking that this command is way too long to remember, or even to type again, then you are like me.  For this reason, John has made a script available that automates this exact process and is extremely easy to use.  Find the script here.

   After making the script executable, run the script and give it a hash value as an argument.  It will use the same command as above and will search the VT database for the hash that you fed it as an argument.  Run the script like this.

$ ./grabVThash.sh e4736f7f320f27cec9209ea02d6ac695

   These are my results.


   Same results as above.  The script automates the process of sending hash values to VT and sends the results to the screen.  It can also take a file containing multiple hashes as its input.  It will send 4 hashes per minute to VT, as this is the limit that VT sets for its public API.  You will need to add your API key to the script.
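   The idea of pacing requests can be sketched in a few lines.  This is not John's script, only an illustration of one way to stay under 4 requests per minute by waiting 15 seconds between lookups.  The names process_hash_file, lookup_hash and VT_DELAY are hypothetical, and lookup_hash is a stand-in that only prints the hash; in practice its body would be replaced with the curl pipeline shown earlier.

```shell
# Illustration only; this is not the grabVThash.sh script from the article.
# VT_DELAY, lookup_hash and process_hash_file are hypothetical names.

# Seconds to wait between requests: 60 seconds / 4 requests = 15 seconds.
VT_DELAY="${VT_DELAY:-15}"

# Stand-in for the real lookup; replace the body with the curl pipeline.
lookup_hash() {
    echo "lookup: $1"
}

# Read one hash per line from the file named in $1 and look each one up,
# pausing VT_DELAY seconds before every request except the first.
process_hash_file() {
    first=1
    while IFS= read -r hash || [ -n "$hash" ]; do
        # Skip blank lines.
        [ -z "$hash" ] && continue
        if [ "$first" -eq 0 ]; then
            sleep "$VT_DELAY"
        fi
        first=0
        lookup_hash "$hash"
    done < "$1"
}

# Usage:
#   process_hash_file hashes.txt
```

   Sleeping between requests rather than after each one means a file with a single hash returns immediately.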

Conclusion:

   VT gives us access to its database by allowing us to build scripts that can have direct access to the information generated and stored by them.  The script that we published is just one of many ways that we can add ease of access to the data stored by VT.  
   If this procedure helped your investigation, we would like to hear from you.  You can leave a comment or reach me on Twitter: @carlos_cajigas