Wednesday, December 31, 2014

Analyzing Plain Text Log Files Using Linux Ubuntu

   Analyzing large amounts of plain text log files for indications of wrongdoing is not an easy task, especially if it is something that you are not accustomed to doing all the time.  Fortunately, getting decent at it can be accomplished with a little bit of practice.  There are many ways to go about analyzing plain text log files, but in my opinion a combination of a few built-in tools under Linux and a BASH terminal can crunch results out of the data very quickly.

   I recently decided to stand up an SFTP server at home, so that I could store and share files across my network.  In order to access the server from the public internet, I created strong passwords for my users, forwarded the ports on my router and went out for the weekend.  I accessed the server from the outside and shared some data.  From the time that I fired it up to the time I returned, only 30 hours passed.  I came back home to discover that the monitor attached to my server was going crazy trying to keep up with showing me large amounts of unauthorized log-in attempts.  It became evident that I was under attack.  Oh my!

   After some time and a little bit of playing around, I was able to get the situation under control.  Once the matter was resolved, I couldn't wait to get my hands on the server logs. 

   In this write-up, we will go over a few techniques that can be used to analyze plain text log files for evidence and indications of wrongdoing.  I chose to use the logs from my SFTP server so that we can see what a real attack looks like.  In this instance, the logs are auth.log files from a BSD install, recovered from the /var/log directory.  Whether they are auth.log files, IIS, FTP, Apache, Firewall, or even a list of credit cards and phone numbers, as long as the logs are plain text files, the methodology followed in this write-up will apply to all and should be repeatable.  For the purposes of the article I used a VMware Player Virtual Machine with Ubuntu 14.04 installed on it.  Let's get started.

Installing the Tools:

  The tools that we will be using for the analysis are cat, head, grep, cut, sort, and uniq.  All of these tools come preinstalled in a default installation of Ubuntu, so there is no need to install anything else. 

The test:

  The plan is to go through the process of preparing data for analysis and then the process of analyzing it.  Let's set up a working folder that we can use to store the working copy of the logs.  Go to your desktop, right click on it, select “create new folder”, and name it “Test”.

  This will be the directory that we will use for our test.  Once created, locate the log files that you wish to analyze and place them in this directory.  Tip: Do not mix logs from different systems or different formats into one directory.  Also, if your logs are compressed (ex: zip, gzip), uncompress them prior to analysis.

   Open a Terminal Window.  In Ubuntu you can accomplish this by pressing Ctrl-Alt-T at the same time.  Once the terminal window is open, we need to navigate to the previously created Test folder on the desktop.  Type the following into the terminal.

$ cd /home/carlos/Desktop/Test/

   Replace “carlos” with the name of the user account you are currently logged on as.  After doing so, press enter.  Next, type ls -l followed by enter to list the data (logs) inside of the Test directory.  The flag -l uses a long listing format.

   Notice that inside of the Test directory, we have 8 log files.  Use the size of the log (listed in bytes) as a starting point to get an idea of how much data each one of the logs may contain.  As a rule, log files store data in columns, often separated by a delimiter.  Some examples of delimiters are commas, like in csv files, spaces or even tabs.  Taking a peek at the first few lines of a log is one of the first things that we can do to get an idea of the number of columns in the log and the delimiter used.  Some logs, like the IIS logs, contain a header.  This header indicates exactly what each one of the columns is responsible for storing.  This makes it easy to quickly identify which column is storing the external IP, port, or whatever else you wish to find inside of the logs.  Let's take a look at the first few lines stored inside of the log titled auth.log.0.  Type the following into the terminal and press enter.

$ cat auth.log.0 | head

   Cat is the command that prints file data to standard output, and auth.log.0 is the filename of the log that we are reading with cat.  The “|” is known as a pipe.  A pipe is a technique in Linux for passing the output of one program to the input of another.  Head is the command to list the first few lines of a file.  Explanation: What we are doing with this command is using the tool cat to send the data contained in the log to the terminal screen, but rather than sending all of the data in the log, we are “piping” the data to the tool head, which displays only the first few lines of the file.  By default it displays ten lines.
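
   As a side note, head can also read a file directly, and its -n flag changes how many lines are shown.  The sketch below is only an illustration: it builds a throwaway directory with a made-up twelve-line auth.log.0 (the host name, user names, and the 192.0.2.10 documentation address are all invented, not taken from my logs) so that it can be run anywhere without touching real data.

```shell
# Throwaway working directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample log: twelve lines in the same general shape as an auth.log entry.
for i in 1 2 3 4 5 6 7 8 9 10 11 12; do
  printf 'Dec 28 10:00:%02d host sshd[100]: Invalid user test%d from 192.0.2.10\n' "$i" "$i"
done > auth.log.0

# The form used in this article: cat piped to head, ten lines by default.
cat auth.log.0 | head
default_count=$(cat auth.log.0 | head | wc -l)

# head reading the file itself, limited to three lines with -n.
first_three=$(head -n 3 auth.log.0)
printf '%s\n' "$first_three"
```

   Both forms read the same file.  I kept the cat version in the article because it makes it easy to keep adding tools to the end of the pipe.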

   As you can see, this log contains multiple lines, and each one of the lines has multiple columns.  The columns account for date, time, and even descriptions of authentication attempts and failures.  We can also see that each one of these columns is separated by a space, which can be used as a delimiter.  Notice that some of the lines include the strings “Invalid user” and “Failed password”.  Right away, we have identified two strings that we can use to search across all logs for instances of either one of these strings.  By searching for these strings across the logs we should be able to identify instances of when a specific user and/or IP attempted to authenticate against our server.   

   Let's use the “Invalid user” string as an example and build upon our previous command.  Type the following into the terminal and press enter.

$ cat * | grep 'Invalid user' | head

   Just like in our previous command, cat is the command that prints the file data to standard output.  The asterisk “*” after cat is used to tell cat to send every file in the current directory to standard output.  This means that cat was told to send all of the data contained in all eight logs to the terminal screen, but rather than print the data to the screen, all of the data was passed (piped) over to grep so that grep can search the data for the string 'Invalid user'.  Grep is a very powerful string searching tool that has many useful options and features worth learning.  Lastly the data is once again piped to head so that we can see the first ten lines of the output.  This was done for display purposes only, otherwise over 12,000 lines containing the string 'Invalid user' would have been printed to the terminal, yikes!
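
   If you want the number of matching lines rather than the lines themselves, grep can count them with its -c flag.  Here is a minimal sketch, again using a throwaway directory and two made-up log files (the entries and the documentation-range IP addresses are invented for illustration).  Notice that grep is case sensitive by default, so the lowercase “invalid user” in the sample “Failed password” line is not counted.

```shell
# Throwaway working directory with two made-up logs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > auth.log.0 <<'EOF'
Dec 28 10:00:01 host sshd[100]: Invalid user admin from 192.0.2.10
Dec 28 10:00:03 host sshd[100]: Failed password for invalid user admin from 192.0.2.10 port 4242 ssh2
Dec 28 10:00:09 host sshd[101]: Accepted password for carlos from 198.51.100.7 port 5151 ssh2
EOF

cat > auth.log.1 <<'EOF'
Dec 27 09:15:20 host sshd[90]: Invalid user oracle from 203.0.113.5
Dec 27 09:15:22 host sshd[90]: Invalid user test from 203.0.113.5
EOF

# cat sends every file in the directory down the pipe; grep -c prints how many
# lines matched instead of printing the lines themselves.
invalid_total=$(cat * | grep -c 'Invalid user')
echo "Lines containing 'Invalid user': $invalid_total"
```

   Knowing the count up front is a quick way to decide whether you need head at the end of the command or not.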

   Ok, back to the task at hand.  Look at the 10th column of the output, the last column.  See the IP address of where the attempts are coming from?  Let's say that you were interested in seeing just that information from all of the logs and wanted to filter only for the tenth column, which contains the IP addresses.  This is accomplished with the command cut.  Let's continue to build on the command.  Type the following and press enter.

$ cat * | grep 'Invalid user' | cut -d " "  -f 10 | head

   In this command, after the data is searched for 'Invalid user' it is piped over to cut so that it may print only the tenth column.  The flag -d tells cut to use a space as a delimiter.  The space is put in between quotes so that the shell passes it to cut as the delimiter instead of treating it as a separator between arguments.  The flag -f tells cut to print the tenth column only.  Head was again used for display purposes only.  Next, let’s see all of the IPs in the logs by adding sort and uniq to our command.
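
   One thing to watch for: cut treats every single space as a delimiter.  Traditional syslog timestamps pad single-digit days with an extra space (for example “Jan  1”), which pushes the IP over to the eleventh column on those lines.  The sketch below shows the effect on two made-up lines and one possible workaround, squeezing repeated spaces with tr -s before cut.  The tr step is my own addition here and is not part of the commands used in the rest of this article.

```shell
# Two made-up log lines; the only difference is the day of the month.
line_two_digit='Dec 28 10:00:01 host sshd[100]: Invalid user admin from 192.0.2.10'
line_one_digit='Jan  1 10:00:01 host sshd[100]: Invalid user admin from 192.0.2.10'

# Two-digit day: the IP is the tenth space-separated field.
ip_a=$(printf '%s\n' "$line_two_digit" | cut -d " " -f 10)

# One-digit day: the double space creates an empty field, so field 10 is now "from".
ip_b=$(printf '%s\n' "$line_one_digit" | cut -d " " -f 10)

# Squeezing runs of spaces into one space first puts the IP back in field 10.
ip_c=$(printf '%s\n' "$line_one_digit" | tr -s ' ' | cut -d " " -f 10)

echo "two-digit day:            $ip_a"
echo "one-digit day:            $ip_b"
echo "one-digit day with tr -s: $ip_c"
```

   If your output ever shows the word “from” mixed in with the IP addresses, this is the likely reason.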

$ cat * | grep 'Invalid user' | cut -d " "  -f 10 | sort | uniq

   In this command, head is dropped and sort and uniq are added.  As you imagined, sort will sort the text, and uniq is responsible for omitting repeated lines.  Uniq only collapses repeats that sit right next to each other, which is why sort has to run first.  This is nice, but it leaves us wanting more.  If you wanted to see how many times each one of these IPs attempted to authenticate against the server, the flag -c of uniq will count each instance of the repeated text, like so.
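
   To see why the order of sort and uniq matters, here is a tiny sketch with three made-up lines (documentation-range addresses, not real attackers).  Without sort, the two copies of the same IP are not next to each other, so uniq leaves both in place.

```shell
# Three made-up lines: the first and last are the same address.
unsorted_count=$(printf '%s\n' 192.0.2.10 203.0.113.5 192.0.2.10 | uniq | wc -l)

# Sorting first puts the duplicates side by side, so uniq can collapse them.
sorted_count=$(printf '%s\n' 192.0.2.10 203.0.113.5 192.0.2.10 | sort | uniq | wc -l)

echo "lines left by uniq alone:     $unsorted_count"
echo "lines left by sort then uniq: $sorted_count"
```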

$ cat * | grep 'Invalid user' | cut -d " "  -f 10 | sort | uniq -c | sort -nr

   In this command, the instances of each IP found in the logs were counted by uniq and then sorted again by sort.  The flag -n does a numeric sort on the counts and the flag -r reverses the order, so the IPs with the most attempts are shown first.
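
   If you would like to try the whole pipeline without a set of real logs, the sketch below runs it against a small made-up auth.log.0 in a throwaway directory.  The log entries, the file name invalid_user_by_ip.txt, and the documentation-range IP addresses are all assumptions for the sake of the example.  It also saves the ranking to a file and uses awk, a tool not covered in this article, just to pull the IP off of the first line.

```shell
# Throwaway directory with one made-up log (documentation-range IPs, not real attackers).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > auth.log.0 <<'EOF'
Dec 28 10:00:01 host sshd[100]: Invalid user admin from 203.0.113.5
Dec 28 10:00:02 host sshd[101]: Invalid user test from 192.0.2.10
Dec 28 10:00:03 host sshd[102]: Invalid user oracle from 203.0.113.5
Dec 28 10:00:04 host sshd[103]: Invalid user guest from 198.51.100.7
Dec 28 10:00:05 host sshd[104]: Invalid user ftp from 192.0.2.10
Dec 28 10:00:06 host sshd[105]: Invalid user root1 from 203.0.113.5
EOF

# Same pipeline as above: count attempts per IP, busiest first.
ranking=$(cat * | grep 'Invalid user' | cut -d " " -f 10 | sort | uniq -c | sort -nr)
printf '%s\n' "$ranking"

# Save the ranking so it can go into a report or be compared later.
printf '%s\n' "$ranking" > "$workdir/invalid_user_by_ip.txt"

# awk is used here only to pull the IP off the first (busiest) line.
top_ip=$(printf '%s\n' "$ranking" | head -n 1 | awk '{print $2}')
echo "Most persistent IP: $top_ip"
```

   Swapping 'Invalid user' for another string works the same way, but remember to check which column holds the value you are after, since it can change from one type of log line to another.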

   And there you have it.  Now we can see who was most persistent at trying to get pictures of my dog from my SFTP server. 

   Keep practicing.  Hopefully this helped you in getting started with the basics of cat, grep, cut, sort, and uniq. 


   This is a quick and powerful way to search for specific patterns of text in a single plain text file or in many files.  If this procedure helped your investigation, I would like to hear from you.  You can leave a comment or reach me on Twitter: @carlos_cajigas

Suggested Reading:

- Ready to dive to the next level of command line fun?  Check out @jackcr's article where he implements the use of a for loop to look for strings inside of a memory dump.  Awesome!  Find it here.

- Check out Ken's blog and his experience with running a Linux SSH honeypot.  Find it here.