Using Gmvault to Retrieve and Store Gmail Messages for Forensic Investigation

The cloud is an ever-growing repository for email storage. One of the more popular email services is Gmail, with its 15 GB of free storage and easy access for any user with an Internet connection. Because of the great number of email accounts, the potential for large amounts of data, and the lack of direct income from those accounts, Google throttles backups to lessen the burden on its servers worldwide.

This blog post is the start of a series of articles that will review Gmail collection options for computer forensic purposes. Kivu initiated a project to find the most efficient and defensible process to collect Gmail account information. The methods tested were Microsoft Outlook, Gmvault, X1 Social Discovery and Google scripts.

All four programs were run through two Gmail collection processes, with a focus on:

  • Discovering how the program stores emails.
  • Identifying whether the program encounters throttling and, if so, how it deals with it.
  • Determining if current forensic tools can process the emails collected.
  • Measuring how long the program takes to process the email, and the level of examiner involvement necessary.

Kivu employees created two Google email accounts for this analysis. Each account held over 30,000 individual emails, which is enough for Google throttling to occur and for differences in speed to become apparent. The data included attachments as well as multi-recipient emails, to cover a wide range of cases and test how the programs collect and sort variations in emails. Our first blog post focuses on Gmvault.

What is Gmvault and How Does It Work?

Gmvault is a third-party Gmail backup application that can be downloaded at Gmvault.org. Gmvault uses the IMAP protocol to retrieve and store Gmail messages for backup and onsite storage. Gmvault has built-in protocols that help bypass most of the common issues with retrieving email from Google. The process can be scripted to run on a set schedule, ensuring a constant backup in case of disaster. The file system database created by Gmvault can be uploaded to any other Gmail account for either consolidation or migration.

During forensic investigation, Gmvault can be used to collect Gmail account data with minimal examiner contact with the collected messages. The program requires user interaction with the account twice – once to allow application access to the account and again at the end to remove the access previously granted. Individual emails can be viewed without worrying about changing metadata, such as Read Status, and/or Folders/Labels because this information is stored in a separate file with a .meta file extension.
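
As a minimal sketch of how an examiner might read a collected message together with its metadata, the following Python function loads an .eml file and the .meta file beside it. It assumes the sync was run uncompressed, that the .meta file shares the .eml file's base name, and that it contains JSON; the function name and any key names you look up in the result are illustrative, not taken from Gmvault's documentation.

```python
import json
from pathlib import Path


def load_message_with_meta(eml_path):
    """Return (raw_bytes, metadata_dict) for one collected message.

    Assumption: the metadata lives next to the message in a file with the
    same base name and a .meta extension, and that file holds JSON.
    Both files are only read, so nothing on the evidence drive is modified.
    """
    eml_path = Path(eml_path)
    meta_path = eml_path.with_suffix(".meta")
    raw = eml_path.read_bytes()
    with meta_path.open("r", encoding="utf-8") as fh:
        meta = json.load(fh)
    return raw, meta
```

Working from a copy of the database, or from a write-blocked drive, keeps this kind of inspection clear of the original evidence.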

How to Use Gmvault for Forensic Investigation

Gmvault needs very little user input and can be initiated with this command:

$> gmvault sync [email address]

We suggest using the following options:

$> gmvault sync -d [Destination Directory] --no-compression [email address]

“-d” lets the user change where the download will go, so the extracted data can be written directly to an evidence drive (default: User\cloud\gmvault-db).

“--no-compression” downloads plain .eml files rather than the gzip-compressed default. Compression comes with a rare chance of data corruption during both the compression and decompression processes so, unless size is an issue, it is better to use the “--no-compression” option. Download speed is unaffected by compression, although compressed files are roughly 50% of the uncompressed size.
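
When several accounts need to be collected, the suggested command can be assembled in a script. The sketch below is one way to do that in Python; the function names are hypothetical, it uses only the "-d" and "--no-compression" options described above, and it assumes the gmvault executable is on the PATH (on Windows, where Gmvault is started from a batch file, the call may need adjusting).

```python
import subprocess


def build_sync_command(email_address, db_dir, compress=False):
    """Assemble the gmvault sync command line as an argument list."""
    cmd = ["gmvault", "sync", "-d", str(db_dir)]
    if not compress:
        # Store plain .eml files instead of the compressed default.
        cmd.append("--no-compression")
    cmd.append(email_address)
    return cmd


def run_sync(email_address, db_dir, compress=False):
    """Run the sync and return the process exit code.

    Assumption: a 'gmvault' executable can be found on the PATH.
    """
    completed = subprocess.run(build_sync_command(email_address, db_dir, compress))
    return completed.returncode
```

Passing the arguments as a list avoids quoting problems when the destination directory contains spaces.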

Next, sign in to the Gmail account to authorize Gmvault access. The program will create 3 folders in the destination drive you set, and emails will be stored by month. The process is largely automated, and Gmvault manages Google throttling. It accomplishes this by disconnecting from Google, waiting a predetermined number of seconds and retrying. If this fails 4 times, the email is skipped, and Gmvault moves on to the next set of emails. When finished with the email backup, Gmvault checks for chats and downloads them as well.
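
The throttling behavior described above (disconnect, wait, retry, and skip after four failures) follows a common retry pattern. The Python sketch below illustrates that pattern only; it is not Gmvault's code, and the callables, the 5-second wait, and the use of ConnectionError to signal throttling are all assumptions made for the example.

```python
import time


def fetch_with_retries(fetch, reconnect, max_attempts=4, wait_seconds=5.0,
                       sleep=time.sleep):
    """Call fetch(); on a connection failure, reconnect, wait and retry.

    Returns the fetched result, or None when every attempt failed and the
    batch is skipped. 'fetch' and 'reconnect' are hypothetical callables
    supplied by the caller; 'sleep' is injectable so the wait can be stubbed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                return None  # give up; the caller records the batch as skipped
            reconnect()
            sleep(wait_seconds)
    return None
```

Returning None rather than raising lets a collection loop move on to the next set of emails and pick up the skipped ones on a later run.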

When Gmvault is finished, a summary of the sync is displayed in the command shell. Gmvault also checks whether any emails have been deleted from the account and removes them from the database. This should not be a problem for an initial collection, but it needs to be noted for any further syncs of the same account. The summary shows the total time for the sync, the number of emails quarantined, the number of reconnects, the number of emails that could not be fetched, and the number of emails returned by Gmail as blank.

To obtain the emails that could not be fetched by Gmvault, simply run the same command line again:

$> gmvault sync -d [Destination Directory] --no-compression [email address]

Gmvault checks whether each email is already in the database, skips those that are, and then downloads the items skipped during the previous sync. Recovering all skipped emails may take up to 10 runs, but the process can probably be completed within 5 minutes.
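
One way to confirm that a repeat run actually recovered messages is to count the stored .eml files before and after it. The Python sketch below does this per folder; it assumes an uncompressed sync and that each month's messages sit in their own folder somewhere under the destination directory, and the function name is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_eml_by_folder(db_dir):
    """Count .eml files under db_dir, keyed by the name of their parent folder."""
    counts = Counter()
    for path in Path(db_dir).rglob("*.eml"):
        counts[path.parent.name] += 1
    return counts
```

Comparing sum(counts.values()) between runs, and against the totals Gmvault reports in its summary, shows whether any messages are still missing.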

Be sure to remove authorization once the collection is complete.

You should now have all of the emails from the account in .eml format, stored by date in multiple folders. Gmvault can then be used to export these files into a more usable storage system. The database can be exported as offlineimap, dovecot, maildir or mbox (default). Here’s how:

gmvault-shell> gmvault export -d [Destination Directory] [Export Directory]
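
A quick check that an mbox export is readable by other tools can be made with Python's standard mailbox module. The sketch below counts the messages in one mbox file and lists their subjects; the function name is hypothetical, and the path you pass depends on how the export laid out its files, which is not assumed here.

```python
import mailbox


def summarize_mbox(mbox_path):
    """Return (message_count, subjects) for a single mbox file."""
    # create=False makes a wrong path raise an error instead of creating a file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        subjects = [str(msg.get("Subject", "")) for msg in box]
        return len(subjects), subjects
    finally:
        box.close()
```

Running this against a working copy of the export, rather than the evidence copy, avoids any question about the original files being opened for writing.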

Following are the Pros and Cons of Using Gmvault:

Pros:

  • Easy to set up and run
  • Counts total emails/collected emails to quickly know if emails are missing
  • 50% compression
  • Can be scripted to collect multiple accounts

Cons:

  • No friendly UI
  • Needs further processing to get to a user-friendly deliverable
  • Will sometimes not retrieve the last few emails
