Analyzing large amounts of plain text log files for indications of wrongdoing is not an easy task, especially if it is not something that you do all the time. Fortunately, getting decent at it takes only a little bit of practice. There are many ways to go about analyzing plain text log files, but in my opinion a combination of a few built-in Linux tools and a BASH terminal can crunch results out of the data very quickly.
   I recently decided to stand up an SFTP server at home so that I could store and share files across my network. To reach the server from the public internet, I created strong passwords for my users, forwarded the ports on my router, and went out for the weekend. I accessed the server from the outside and shared some data. From the time that I fired it up to the time I returned, only 30 hours passed. I came back home to discover that the monitor attached to my server was going crazy trying to keep up with a flood of unauthorized log-in attempts. It became evident that I was under attack. Oh my!
   After
some time and a little bit of playing around, I was able to get the situation
under control.  Once the matter was
resolved, I couldn't wait to get my hands on the server logs.  
   In this write-up, we will go over a few techniques that can be used to analyze plain text log files for evidence and indications of wrongdoing. I chose to use the logs from my SFTP server so that we can see what a real attack looks like. In this instance, the logs are auth.log files from a BSD install, recovered from the /var/log directory. Whether they are auth.log files, IIS, FTP, Apache, or firewall logs, or even a list of credit cards and phone numbers, as long as the logs are plain text files, the methodology followed in this write-up applies and should be repeatable. For the purposes of the article I used a VMware Player virtual machine with Ubuntu 14.04 installed on it. Let's get started.
Installing the Tools:
  The
tools that we will be using for the analysis are cat, head, grep, cut, sort,
and uniq.  All of these tools come
preinstalled in a default installation of Ubuntu, so there is no need to
install anything else.  
The test:
  The plan is to prepare the data for analysis and then analyze it. Let's set up a working folder to store the working copy of the logs. Go to your desktop, right-click on it, select “create new folder”, and name it “Test”.
  This
will be the directory that we will use for our test.  Once created, locate the log files that you
wish to analyze and place them in this directory.  Tip: Do not mix logs from different systems
or different formats into one directory. 
Also, if your logs are compressed (ex: zip, gzip), uncompress them prior
to analysis.
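If your logs happen to be gzip-compressed, something like the following sketch could expand them. It assumes the compressed files end in .gz, and the function name decompress_logs is one I made up for illustration.

```shell
# Sketch: expand every .gz file in the current directory, keeping the originals.
# Assumption: the compressed logs use gzip and carry a .gz extension.
decompress_logs() {
    local f
    for f in *.gz; do
        [ -e "$f" ] || continue       # no .gz files present: the glob stays literal
        gunzip -c "$f" > "${f%.gz}"   # e.g. writes auth.log.1 from auth.log.1.gz
    done
}
```

After running decompress_logs inside the Test folder, move the .gz originals somewhere else, since the commands later in this write-up read every file in the directory.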
   Open a terminal window. In Ubuntu you can do this by pressing Ctrl+Alt+T. Once the terminal window is open, we need to navigate to the previously created Test folder on the desktop. Type the following into the terminal.
$ cd /home/carlos/Desktop/Test/
   Replace “carlos” with the name of
the user account you are currently logged on as.  After doing so, press enter.  Next, type ls -l followed by enter to list
the data (logs) inside of the Test directory. 
The flag -l uses a long listing format.
   Notice that inside of the Test directory, we have 8 log files. Use the size of each log (listed in bytes) as a starting point to get an idea of how much data it may contain. As a rule, log files store data in columns, often separated by a delimiter. Some examples of delimiters are commas, as in csv files, spaces, or even tabs. Taking a peek at the first few lines of a log is one of the first things that we can do to get an idea of the number of columns in the log and the delimiter used. Some logs, like IIS logs, contain a header. This header indicates exactly what each one of the columns stores, which makes it easy to quickly identify which column holds the external IP, port, or whatever else you wish to find inside of the logs. Let's take a look at the first few lines stored inside of the log titled auth.log.0. Type the following into the terminal and press enter.
$ cat auth.log.0 | head
   Cat is the command that prints file data to standard output, and auth.log.0 is the filename of the log that we are reading with cat. The “|” is known as a pipe. A pipe is a mechanism in Linux for passing the output of one process to the input of another. Head is the command that lists the first few lines of a file. Explanation: with this command we use cat to send the data contained in the log toward the terminal screen, but rather than printing all of it, we “pipe” the data to head, which displays only the first few lines (ten by default).
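As a side note, head can also read the file directly, and its -n flag changes how many lines are shown. Here is a minimal sketch, wrapped in a helper that I am calling first_lines (a made-up name) so that it is easy to reuse.

```shell
# Sketch: show the first N lines of a file without going through cat.
# Args: $1 = file to read, $2 = number of lines (falls back to 10, head's default)
first_lines() {
    head -n "${2:-10}" "$1"
}
```

For example, first_lines auth.log.0 5 would print only the first five lines of the log.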
   As
you can see, this log contains multiple lines, and each one of the lines has
multiple columns.  The columns account
for date, time, and even descriptions of authentication attempts and failures.  We can also see that each one of these
columns is separated by a space, which can be used as a delimiter.  Notice that some of the lines include the
strings “Invalid user” and “Failed password”. 
Right away, we have identified two strings that we can use to search
across all logs for instances of either one of these strings.  By searching for these strings across the
logs we should be able to identify instances of when a specific user and/or IP
attempted to authenticate against our server.   
   Let's
use the “Invalid user” string as an example and build upon our previous
command.  Type the following into the
terminal and press enter.
$ cat * | grep 'Invalid user' | head
   Just
like in our previous command, cat is the command that prints the file data to
standard output.  The asterisk “*” after
cat is used to tell cat to send every file in the current directory to standard
output.  This means that cat was told to
send all of the data contained in all eight logs to the terminal screen, but
rather than print the data to the screen, all of the data was passed (piped)
over to grep so that grep can search the data for the string 'Invalid
user'.  Grep is a very powerful string
searching tool that has many useful options and features worth learning.  Lastly the data is once again piped to head so
that we can see the first ten lines of the output.  This was done for display purposes only,
otherwise over 12,000 lines containing the string 'Invalid user' would have
been printed to the terminal, yikes!
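If you are curious how many lines match before printing any of them, the matches can be counted instead. The sketch below uses grep's -c flag. The helper name count_matches is hypothetical, and -F tells grep to treat the search text as a plain string rather than a regular expression.

```shell
# Sketch: count the lines containing a fixed string across one or more files.
# Args: $1 = text to search for, remaining args = log files
count_matches() {
    local pattern=$1
    shift
    # cat merges the files first, so grep reports one total, not one count per file
    cat "$@" | grep -F -c -- "$pattern"
}
```

From inside the Test folder, count_matches 'Invalid user' * would report the total for all of the logs.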
   Ok, back to the task at hand. Look at the tenth column of the output, the last column. See the IP address where the attempts are coming from? Let's say that you were interested in seeing just that information from all of the logs, filtering for only the tenth column, which contains the IP addresses. This is accomplished with the command cut. Let's continue to build on the command. Type the following and press enter.
$ cat * | grep 'Invalid user' | cut -d " "  -f 10 | head
   In this command, after the data is searched for 'Invalid user', it is piped over to cut so that only the tenth column is printed. The flag -d tells cut to use a space as the delimiter. The space is put in between quotes so that the shell hands it to cut as an argument instead of treating it as a separator between arguments. The flag -f, followed by 10, tells cut to print the tenth column only. Head was again used for display purposes only. Next, let's see all of the IPs in the logs by adding sort and uniq to our command.
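One thing to watch for: cut treats every single space as a column boundary. Syslog-style timestamps usually pad single-digit days with an extra space (for example “Jan  2”), and on those lines the IP lands in column 11 instead of 10. A possible workaround, sketched below, squeezes runs of spaces with tr -s before cut sees them. The helper name extract_ips is made up for illustration, and the sketch assumes the same line layout as the logs in this write-up.

```shell
# Sketch: pull the IP column out of 'Invalid user' lines, tolerating padded dates.
# Args: one or more log files
extract_ips() {
    # tr -s ' ' collapses repeated spaces into one, so the IP stays in field 10
    cat "$@" | grep 'Invalid user' | tr -s ' ' | cut -d ' ' -f 10
}
```

Running extract_ips * | head from the Test folder would give the same kind of output as the command above.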
$ cat * | grep 'Invalid user' | cut -d " "  -f 10 | sort | uniq
   In this command, head is dropped and sort and uniq are added. As you imagined, sort sorts the text, and uniq omits repeated lines. Uniq only collapses repeats that sit next to each other, which is why the data is sorted first. This is nice, but it leaves us wanting more. If you want to see how many times each one of these IPs attempted to authenticate against the server, the flag -c of uniq counts each instance of the repeated text, like so.
$ cat * | grep 'Invalid user' | cut -d " "  -f 10 | sort | uniq -c | sort -nr
   In this command, the instances of each IP found in the logs were counted by uniq and then sorted again by sort. The flag -n does a numeric sort, and the flag -r reverses the order so that the highest counts are shown first.
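If you find yourself running this pipeline often, it could be wrapped in a small function. The sketch below is the same chain of commands with head added at the end to keep only the top few results. The name top_offenders and its argument order are my own choices for illustration.

```shell
# Sketch: list the N addresses with the most 'Invalid user' lines.
# Args: $1 = how many addresses to show, remaining args = log files
top_offenders() {
    local limit=$1
    shift
    cat "$@" | grep 'Invalid user' | cut -d " " -f 10 \
        | sort | uniq -c | sort -nr | head -n "$limit"
}
```

From the Test folder, top_offenders 5 * would print the five busiest addresses, each preceded by its count.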
   And
there you have it.  Now we can see who
was most persistent at trying to get pictures of my dog from my SFTP server.  
   Keep
practicing.  Hopefully this helped you in
getting started with the basics of cat, grep, cut, sort, and uniq.  
Conclusion:
   This is a quick and powerful way to search for specific patterns of text in a single plain text file or in many files. If this procedure helped your investigation, I would like to hear from you. You can leave a comment or reach me on Twitter: @carlos_cajigas
Suggested Reading:
- Ready to dive to the next level of command line fun? Check out @jackcr's article where he implements the use of a for loop to look for strings inside of a memory dump. Awesome! Find it here.