Archive for December, 2009

2009: A review

Well, the year is nearly over and it seems everyone is in a reflective mode so I thought I’d join in. And I’m glad I did, didn’t really just how turbulent year I’ve had. I’d better (on pain of death) start with the none technical, as it is around 12 months since I got engaged to my long-time girlfriend.

Back to the technical: The InfoSanity blog went live in February with the first post. Originally I was far from confident that I would be able to keep up blogging as I had a ‘fear’ of social media and web2.0, but nearly a year on I’m still here and despite some peaks and troughs posting articles regularly. I’ve found it a great platform for getting ideas out of my head and into practice, hopefully I’ve managed to be of benefit to others in the process.

Lab environment: February was also when I purchased the server for my virtual lab environment. This has got be the best buy of the year, providing a solid framework for testing and experimenting with everything else I have done this year. Lab environments also seem to be one of the areas that gathers a lot of interest from others, the two posts discussing configuration of virtual networks and guest systems were InfoSanity’s most popular posts this year by a good margin. In the process of improving my lab environment I also read Thomas Wilhelm’s Professional Penetration Testing book and reviewed it for the Ethical Hacker Network, for which I’m indebted to Don for organising.

Wireless: Included in my long list of purchases this year was an Alfa AWUS036H wireless card and a BU-353 GPS Reciever. This resulted in a basic attempt to write a utility to create maps from the results of wardriving with Kismet, whilst the short development time of the project was enjoyable it was promptly shelved once people introduced me to Jabra’s excellent giskismet. It also resulted in the creation of the still to be field-tested, James Bond-esque warwalking case.

Honeypots: Whilst I had had Nepenthes honeypot system running before the turn of the year, I hadn’t really worked with it in earnest until the first post on the subject in February, and subsequent statistic utilities. These posts also became the topic for my first experience with public speaking, for local (and rapidly expanding) technical group, SuperMondays. As the technology has improved the honeypot system has recently been migrated over to Nepenthes’ spiritual successor Dionaea. Over the year I have also had the pleasure and privilege of talking with Markus Koetter (lead dev of Nepenthes and Dionaea) and Lukas Rist (lead dev of Glastopf), these guys *really* know their stuff.

Public Speaking: As mentioned above I gave my first public talk for SuperMondays, discussing Nepenthes honeypots and the information that can be gathered from them. Unfortunately (or thankfully) there is only limited footage available for the session as the camera’s battery ran out of juice. My second session was for a group of Northumbria University’s Forensics and Ethical Hacking students as an ‘expert speaker’, and I still think they mistook me for someone else. This time a recording was available thanks to a couple of the students, full review and audio available here. My public speaking is still far from perfect, coming out at a rapid fire pace, but I’m over my initial dread and actually quite enjoy it. Hopefully they’ll be additional opportunities in the future.

Friends and Contacts: Throughout the year I have ended up in contact with some excellent and interesting people; from real-world network events like SuperMondays and Cloudcamp, old school discussions in forums (EH-Net) and IRC channels, to the ‘2.0’ of Twitter (@infosanity btw). Along with good debates and discussions I’d also like to think I’ve made some good friendships, too many people to name (and most wouldn’t want to be associated 😉 ) but you know who you are.

So that’s the year in brief, couple of smaller activities along the way, from investigating newly released attack vectors to trying my hand at lock picking. In hindsight it has been one hell of a year, and with some of the side projects in the pipeline I’m expecting 2010 to be even better. Onwards and upwards.

— Andrew Waite

Categories: Honeypot, InfoSec, Lab, Presentation

Fuzzy hashing, memory carving and malware identification

2009/12/15 Comments off

I’ve recently been involved in a couple of discussions for different ways for identifying malware. One of the possibilities that has been brought up a couple of times is fuzzy hashing, intended to locate files based on similarities to known files. I must admit that I don’t fully understand the maths and logic behind creating fuzzy hash signatures or comparing them. If you’re curious Dustin Hurlbut has released a paper on the subject, Hurlbut’s abstract does a better job of explaining the general idea behind fuzzy hashing.

Fuzzy hashing allows the discovery of potentially incriminating documents that may not be located using traditional hashing methods. The use of the fuzzy hash is much like the fuzzy logic search; it is looking for documents that are similar but not exactly the same, called homologous files. Homologous files have identical strings of binary data; however they are not exact duplicates. An example would be two identical word processor documents, with a new paragraph added in the middle of one. To locate homologous files, they must be hashed traditionally in segments to identify the strings of identical data.

I have previously experimented with a tool called ssdeep, which implements the theory behind fuzzy hashing. To use ssdeep to find files similar to known malicious files you can run ssdeep against the known samples to generate a signature hash, then run ssdeep against the files you are searching, comparing with the previously generated sample.

One scenarios I’ve used ssdeep for in the past is to try and group malware samples collected by malware honeypot systems based on functionality. In my attempts I haven’t found this to be a promising line of research, as different malware can typically have the same and similar functionality most of the samples showed a high level of comparison whether actually related or not.

Another scenario that I had developed was running ssdeep against a clean WinXP install with a malicious binary. In the tests I had run I haven’t found this to be a useful process, given the disk capacity available to modern systems running ssdeep against a large HDD can be a time consuming process. It can also generate a good number of false positives when run against the OS.

After recently reading Leon van der Eijk’s post on malware carving I have been mulling a method for combining techniques to improve fuzzy hashing’s ability to identify malicious files, while reducing the number of false positives and workload required for an investigator. The theory was that, while any unexpected files on a system are not desirable, if they aren’t running in memory then they are less threatening than those that are active.

To test the theory I infected an XP SP2 victim with a sample of Blaster that had been harvested by my Dionaea honeypot and dumped the RAM following Leon’s methodology. Once the image was dissected by foremost I ran ssdeep against extracted resources. Ssdeep successfully identified the malicious files with a 100% comparison to the maliciuos sample. So far so good.

With my previous experience with ssdeep I ran a control test, repeating the procedure against the dumped memory of a completely clean install. Unsurprisingly the comparison did not find a similar 100% match, however it did falsely flag several files and artifacts with a 90%+ comparison so there is still a significant risk of false positives.

From the process I have learnt a fair deal (reading and understanding Leon’s methodolgy was no comparison to putting it into practice) but don’t intend to utilise the methods and techniques attempted in real-world scenarios any time soon. Similar, and likely faster, results can be achieved by following Leon’s process completely and running the files carved by Foremost against an anti-virus scan.

Being able to test scenarios similar to this was the main reason for me to build up the my test and development lab which I have described previously. In particular, if I had run the investigation on physical hardware I would likely not have rebuilt the environment for the control test with a clean system, losing the additional data for comparison, virtualisation snap shots made re-running the scenario trivial.

–Andrew Waite

P.S. Big thanks to Leon for writing up the memory capture and carving process used as a foundation for testing this scenario.

Analysis: Honeypot Datasets

Earlier this week Markus released two anonymised data sets from live Dionaea installations. The full write-up and data sets can be found on the newly migrated news feed here. Perhaps unsurprisingly I couldn’t help but run the data through my statistics scripts to get a quick idea of  what was seen by the sensors.

This caused some immediate problems, before the data was released Markus had contacted me to point out/complain that the performance from my script is ideal. Performance wasn’t an issue I had encountered, but the database from the sensor I run is ~1MB, the smaller of the released data sets is ~300MB, with the larger being 4.1GB. I immediately tried to rectify the problem and am proud to report,…

I failed miserably. I had tried to move some of the counting and loops from the python code and migrate to more complex SQL queries, working on the theory that working with large datasets should be more efficient within databases as they are designed for working with sets of data. Theory was proved false, actually increasing run-time by about 20%, so I won’t be releasing the changes. Good job I’ve never claimed to be a developer. All this being said, the script still crunches through the raw data in 30seconds and 3minutes respectively.

Without further ado, the Berlin data-set:

Statistics engine written by Andrew Waite –

Number of submissions: 2726
Number of unique samples: 133
Number of unique source IPs: 639

First sample seen: 2009-11-05 12:02:48.104760
Last sample seen: 2009-12-07 11:13:55.930130
SystemrRunning: 31 days, 23:11:07.825370
Average daily submissions: 87.935483871

Most recent submissions:
2009-12-07 11:13:55.930130,,, ae8705a7b4bf8c13e5d8214d374e6c34
2009-12-07 11:12:59.389940,, ftp://1:1@, 14a09a48ad23fe0ea5a180bee8cb750a
2009-12-07 11:10:27.296370,, tftp://, df51e3310ef609e908a6b487a28ac068
2009-12-07 10:55:24.607140,, tftp://, df51e3310ef609e908a6b487a28ac068
2009-12-07 10:43:48.872170,, ftp://1:1@, 14a09a48ad23fe0ea5a180bee8cb750a

And Paris:

Statistics engine written by Andrew Waite –

Number of submissions: 749518
Number of unique samples: 2064
Number of unique source IPs: 30808

First sample seen: 2009-11-30 03:10:24.591650
Last sample seen: 2009-12-07 08:46:23.657530
SystemrRunning: 7 days, 5:35:59.065880
Average daily submissions: 107074.0

Most recent submissions:
2009-12-07 08:46:23.657530,,, d45895e3980c96b077cb4ed8dc163db8
2009-12-07 08:46:20.985190,,, 94e689d7d6bc7c769d09a59066727497
2009-12-07 08:46:21.000540,,, 908f7f11efb709acac525c03839dc9e5
2009-12-07 08:46:18.398500,,, ed12bcac6439a640056b4795d22608da
2009-12-07 08:46:15.753080,,, 94e689d7d6bc7c769d09a59066727497

Still need to dig further into the data, they’ll be another post in the making if I uncover anything interesting…

— Andrew Waite

Categories: Dionaea, Honeypot, Malware

Starting out with Glastopf

I’ve been lax in writing up my initial experience with Glastopf. For those new to Glastopf, initially created by Lukas Rist as part of the Google summer of code program in collaboration with the Honeynet Project and Thorsten Holz.

I must admit that I found the installation of Glastopf to be a complete nightmare. Although this is mostly due to my systems lack of some of the Python pre-requisites that I needed to compile from source, which in turn had other unmet pre-requisites, which in turn… you get the idea. But I did manage to get my install complete eventually, and have learnt a few things in the process, so it can’t be all bad.

At this point I also need to thank the guys from the #glastopf irc channel on freenode. The advice and suggestions provided made the job much easier than it could have been, and simplified my initial testing of the system once working.

My Glastopf system has been running a couple of weeks and I’m starting to take a closer look logs being recorded. I’m not entirely sure what I was expecting as a result of the install, I must confess to being a little disappointed so far, but as I’m no expert in the realm of web applications the findings may mean more to those with more insight.

Overall I have logged several scans for various resources, I’m assuming looking for vulnerabilities in installed services. Nothing too unexpected for example scans for Roundcube mail or phpMyAdmin installations.

I have also found some links to inocious, legitimate online resources. Again I am no expert with web attacks (one of my motivations for installing a web honeypot in the first place was to learn more about them), but I am assuming that this was to test the effect of a particular attack vector before providing host systems with malicious URLs in the logs for an unsuccessful attack. If anyone knows I’m wrong, or can provide a better explanation I’d appreciate a heads up.

With this installation the InfoSanity honeytrap environment is slowly expanding to show a wider and more indepth understanding of live attack vectors targetting production systems.

— Andrew Waite

Categories: Honeypot, Web App Security

New dionaea statistics script

Following on from my work with gathering statistics from the Honeypot systems that I run I have released a limited alpha of a new script/tool that I am working on. The tool provides access to common result sets from the sqlite database, without the requirement for remembering the database architecture  and entering lengthy SQL statements by hand.

Disclaimer first: the tool doesn’t do anything outrageously new, and most of the SQL queries have been borrowed from Markus’ post on SQL logging with Dionaea when the feature was first introduced. However I have found the script makes my analysis of the honeypot logs simpler and quicker, and I’ve a positive reaction from a limited few that have had a copy of the script before this post. Hopefully it will be of use others.

Usage is relatively simple, shown below:

Dionaea database query collection
Author: Andrew Waite –

Inspiration from article:

/path/to/python –query #
Where # is:
1:      Port Attack Frequency
2:      Attacks over a day
3:      Popular Malware Downloads
4:      Busy Attackers
5:      Popular Download Locations
6:      Connections in last 24 hours

The script can be found here. There is still a good level of work to be undertaken to tidy up the output, potentially allowing for output in different formats, and I also want to add additional and more complex queries as time progresses. If you have any success,  failure, comments or suggests please let me know.

— Andrew Waite

Categories: Dionaea, Honeypot, InfoSec, Python