Frequently Asked Questions

Data Throughput

How much data per second can I put through StorReduce?

The short answer is: up to 2 GB per second per server.

The throughput a StorReduce server achieves depends on how well the data deduplicates and on the speed of the server machine(s).

Below are the sustained throughputs you can achieve for various cloud servers. Please note that on Amazon EC2 smaller instances tend to be network I/O bound.

Amazon EC2

Machine Type   99% dedupe   97% dedupe   95% dedupe   90% dedupe
c3.8xlarge     905 MB/s     824 MB/s     756 MB/s     616 MB/s
c3.4xlarge     237 MB/s     237 MB/s     237 MB/s     236 MB/s
c3.2xlarge     118 MB/s     118 MB/s     118 MB/s     118 MB/s
c3.xlarge      83 MB/s      83 MB/s      83 MB/s      79 MB/s
c3.large       59 MB/s      55 MB/s      48 MB/s      38 MB/s

For comparison, here are the specifications for these EC2 instance types. Detailed specifications are available from Amazon Web Services.

Machine Type   Cores   RAM (GB)   SSD (GB)   Network
c3.8xlarge     32      60         2 x 320    10 Gigabit
c3.4xlarge     16      30         2 x 160    High
c3.2xlarge     8       15         2 x 80     High
c3.xlarge      4       7.5        2 x 40     Moderate
c3.large       2       3.75       2 x 16     Moderate

Microsoft Azure

Machine Type   99% dedupe   97% dedupe   95% dedupe   90% dedupe
Standard G5    748 MB/s     467 MB/s     411 MB/s     226 MB/s
Standard D14   411 MB/s     363 MB/s     295 MB/s     192 MB/s
Standard D4    231 MB/s     168 MB/s     170 MB/s     144 MB/s
Standard D3    106 MB/s     91 MB/s      82 MB/s      67 MB/s
Standard D2    59 MB/s      47 MB/s      41 MB/s      31 MB/s

For comparison, here are the specifications for these Azure instance types. Detailed specifications are available from Microsoft Azure.

Machine Type   Cores   RAM (GB)   SSD (GB)
Standard G5    32      448        6,144
Standard D14   16      112        800
Standard D4    8       28         400
Standard D3    4       14         200
Standard D2    2       7          100

Notes:

  • The StorReduce server is capable of sustaining these speeds indefinitely, 24 hours a day, not just for certain scheduled hours.

  • These numbers are for clients using multipart uploads; higher numbers may be possible using single-part uploads.

Why is the throughput less for lower deduplication ratios?

With a lower deduplication ratio, more of the incoming data is unique, so the StorReduce server has more data to compress and write to back-end storage. More work per byte received means lower throughput.
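As a rough illustration, the sketch below computes how much unique data reaches back-end storage for every terabyte ingested. It assumes that a 99% deduplication ratio means 99% of incoming bytes duplicate data already stored; the function names are hypothetical.

```python
def unique_fraction(dedupe_ratio: float) -> float:
    """Share of ingested bytes that are new and must be compressed and stored."""
    if not 0.0 <= dedupe_ratio <= 1.0:
        raise ValueError("dedupe_ratio must be between 0 and 1")
    return 1.0 - dedupe_ratio


def unique_gb_per_tb(dedupe_ratio: float) -> float:
    """Unique GB written to back-end storage for every 1,000 GB ingested."""
    return 1000.0 * unique_fraction(dedupe_ratio)


if __name__ == "__main__":
    for ratio in (0.99, 0.97, 0.95, 0.90):
        print(f"{ratio:.0%} dedupe: {unique_gb_per_tb(ratio):.0f} GB unique per TB ingested")
```

At 90% deduplication the server has ten times as much unique data to compress and store as at 99% (100 GB versus 10 GB per TB ingested), which is consistent with the lower figures in the 90% columns of the tables above.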

How do I obtain this performance on my server?

The most important things you can do to get good throughput are:

  1. Have enough clients running in parallel: an Amazon EC2 c3.8xlarge (32 core) instance can easily handle 60+ simultaneous S3 API connections. Depending on the client software, you may need many client instances to saturate the server.

  2. If using multipart uploads, use a relatively large part size, e.g. 100MB rather than 5MB. Larger parts mean fewer requests per object, which reduces per-request overhead such as setting up TCP connections.
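The two tips above can be sketched in Python: the example plans the parts of a multipart upload and sends them concurrently through a thread pool. It assumes the Amazon S3 multipart limits (every part except the last at least 5 MiB, at most 10,000 parts per upload) also apply when talking to StorReduce, and `upload_part` is a hypothetical stand-in for whatever call your client uses to send one UploadPart request and return its ETag.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # Amazon S3 minimum for every part except the last
MAX_PARTS = 10_000       # Amazon S3 maximum number of parts per upload

Part = Tuple[int, int, int]  # (part_number, byte_offset, length)


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> List[Part]:
    """Split an object into multipart-upload parts of part_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    count = max(1, -(-object_size // part_size))  # ceiling division
    if count > MAX_PARTS:
        raise ValueError(f"{count} parts exceeds the {MAX_PARTS}-part limit")
    return [
        (n + 1, n * part_size, min(part_size, object_size - n * part_size))
        for n in range(count)
    ]


def upload_parts(parts: List[Part], upload_part: Callable[[Part], str],
                 max_workers: int = 16) -> List[Tuple[int, str]]:
    """Send parts concurrently and return (part_number, etag) pairs.

    upload_part is a hypothetical callable that sends one part and returns
    its ETag. pool.map yields results in input order, so the list comes back
    sorted by part number, as CompleteMultipartUpload expects.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        etags = list(pool.map(upload_part, parts))
    return [(part[0], etag) for part, etag in zip(parts, etags)]


if __name__ == "__main__":
    ten_gib = 10 * 1024 * MIB
    print(len(plan_parts(ten_gib, 5 * MIB)))    # 2048 parts (requests)
    print(len(plan_parts(ten_gib, 100 * MIB)))  # 103 parts (requests)

    def fake_upload(part: Part) -> str:
        return f"etag-{part[0]}"  # placeholder for a real UploadPart call

    print(upload_parts(plan_parts(ten_gib), fake_upload)[:2])
```

With 5 MiB parts a 10 GiB object needs 2,048 requests instead of 103, and the 10,000-part limit caps objects at about 48.8 GiB. Raising `max_workers`, or running several such clients, is one way to reach the number of simultaneous connections mentioned above.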

If you are not getting the throughput you expect from your StorReduce server, please contact us or send us a chat message; we’d be happy to help.

Is it realistic to achieve a 99% deduplication ratio?

Certain types of data can routinely achieve a 99% deduplication ratio. For backup data from full daily backups (e.g. backup tapes), after the initial upload we can expect an average daily delta (change rate) of 1% or less.

Backup services commonly see a delta of only 0.3% each day. Over time the deduplication ratio approaches 100% minus this daily delta, so a 97% to 99% deduplication ratio for uploading full daily backups is realistic.
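The long-run behaviour described above can be checked with a simplified model: every full backup is the same size, the first upload is entirely new, and each later backup adds only the daily delta as new data. The model and function name are illustrative assumptions, not measurements.

```python
def dedupe_ratio_after(days: int, daily_delta: float) -> float:
    """Overall deduplication ratio after `days` full daily backups.

    Simplified model: each backup has the same size S, the first upload is
    entirely new, and every later backup adds daily_delta * S of new data.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    ingested = float(days)                   # total data sent, in units of S
    stored = 1.0 + (days - 1) * daily_delta  # unique data kept, in units of S
    return 1.0 - stored / ingested


if __name__ == "__main__":
    for delta in (0.01, 0.003):
        for days in (30, 90, 365):
            ratio = dedupe_ratio_after(days, delta)
            print(f"daily delta {delta:.1%}, {days:3d} days: {ratio:.1%}")
```

With a 1% daily delta this gives about 95.7% after 30 days and about 98.7% after a year; with a 0.3% delta it reaches about 99.4% after a year. As the number of days grows, the ratio tends towards 1 minus the daily delta.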

StorReduce Capabilities

Can I use StorReduce for private or hybrid cloud?

Yes. StorReduce will work in any private cloud with an object store.

Can StorReduce deduplicate files across multiple S3 buckets?

Yes. StorReduce provides global deduplication across multiple S3 buckets.
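Global deduplication means a single index of data chunks is shared by every bucket, so identical data is stored once no matter which bucket it arrives in. The toy sketch below illustrates the idea; the fixed 4-byte chunks, SHA-256 hashing and class name are assumptions chosen for readability and do not describe StorReduce's internals.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # unrealistically small so the example is easy to follow


class GlobalChunkStore:
    """Toy content-addressed store whose chunk index is shared by all buckets."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}                   # hash -> chunk data
        self.objects: Dict[Tuple[str, str], List[str]] = {}  # (bucket, key) -> hashes

    def put(self, bucket: str, key: str, data: bytes) -> int:
        """Store an object and return the number of bytes that were new."""
        hashes: List[str] = []
        new_bytes = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen chunks are stored
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            hashes.append(digest)
        self.objects[(bucket, key)] = hashes
        return new_bytes

    def get(self, bucket: str, key: str) -> bytes:
        """Rehydrate an object by joining its chunks in order."""
        return b"".join(self.chunks[h] for h in self.objects[(bucket, key)])


if __name__ == "__main__":
    store = GlobalChunkStore()
    print(store.put("bucket-a", "monday.tar", b"AAAABBBBCCCC"))   # 12 new bytes
    print(store.put("bucket-b", "tuesday.tar", b"AAAABBBBDDDD"))  # 4 new bytes
    print(store.get("bucket-b", "tuesday.tar"))
```

The second object lands in a different bucket, yet only its last chunk is new, because the chunk index is global rather than per bucket.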

Does StorReduce’s deduplication slow down access to my data?

  • The StorReduce server is capable of sustaining very high data transfer rates, up to 900 megabytes per second continuously (24/7).

  • For cloud-based access to data, typically StorReduce will add only around 10 milliseconds of latency. This makes little or no difference to the speed of most cloud operations.

  • For backups originating on-premise, StorReduce can potentially speed up the process of sending your data to Amazon S3 because it will only send deduplicated data.
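To illustrate the last point, the sketch below estimates how long a backup takes to cross a WAN link when only its unique share is sent. The 1 TB backup, 1 Gbit/s link and 97% ratio are example figures chosen for illustration, and the model ignores protocol overhead and compression.

```python
def transfer_minutes(data_bytes: float, link_bits_per_second: float,
                     dedupe_ratio: float = 0.0) -> float:
    """Minutes to send data when only the unique (1 - dedupe_ratio) share is sent.

    Assumes the link runs at its full rate; protocol overhead and
    compression are ignored.
    """
    if not 0.0 <= dedupe_ratio <= 1.0:
        raise ValueError("dedupe_ratio must be between 0 and 1")
    unique_bytes = data_bytes * (1.0 - dedupe_ratio)
    return unique_bytes * 8 / link_bits_per_second / 60


if __name__ == "__main__":
    backup = 1e12  # 1 TB (decimal) full backup
    link = 1e9     # 1 Gbit/s link
    print(f"without deduplication: {transfer_minutes(backup, link):.1f} min")
    print(f"at 97% deduplication:  {transfer_minutes(backup, link, 0.97):.1f} min")
```

Under these assumptions the transfer drops from about 133 minutes to about 4 minutes.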

Is my data secure when using StorReduce?

StorReduce supports multiple levels of encryption to keep your data secure:

  1. Data at rest can be encrypted using Amazon S3 server-side encryption (SSE).
  2. Client-side encryption can be used to encrypt data as it passes through a StorReduce server, using either:
    • AWS Key Management Service (KMS), or
    • A server supporting the KMIP protocol. When StorReduce is deployed on-premise, this will encrypt all data before it leaves the customer site.
  3. Data in flight is secured via HTTPS by default.

All data access via StorReduce is controlled through an AWS-compatible policy engine, allowing sophisticated access control rules.
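As an illustration of an AWS-style policy, the sketch below builds a read-only policy for one bucket as a Python dictionary and prints it as JSON. It follows standard AWS IAM policy grammar; whether StorReduce's policy engine accepts every element shown is an assumption, and the bucket name is hypothetical.

```python
import json
from typing import Any, Dict


def read_only_policy(bucket: str) -> Dict[str, Any]:
    """Build an AWS-style policy that allows listing and reading one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # s3:ListBucket applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {   # s3:GetObject applies to the objects inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_policy("example-backups"), indent=2))
```

Splitting the statements this way matches how AWS scopes these actions: listing targets the bucket ARN, while reading targets object ARNs.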

Can I get my data back out if I want to stop using StorReduce?

Yes. Because StorReduce has an S3 interface, you can simply copy your data out to any cloud or on-premise location, at any time.

Can I use StorReduce on-premise if I want to minimize my bandwidth for migrating data to the cloud?

Yes. You can install StorReduce software on your existing on-premise hardware for this purpose. Your data is then immediately accessible in the cloud through a second, cloud-based StorReduce instance. At the end of the migration you can remove your on-premise StorReduce software.

Where is the index for my data stored?

The index for your data is held entirely in the cloud, and your data is accessible through an S3 interface to cloud services such as search and Hadoop.

Can my data that is already deduplicated on-premise be deduplicated by StorReduce?

No. You need to rehydrate it before you put it through StorReduce to obtain our deduplication benefit.

Technical Questions

How do I log in for the first time?

Once the StorReduce appliance is running you can log in to the StorReduce dashboard. Connect via a Web browser to the public DNS name or public IP address for your appliance. Most browsers will be automatically redirected to the StorReduce dashboard URL. If you are not redirected, add /storreduce-dashboard to the end of the URL you are browsing to.

Log in with the following credentials:

Amazon EC2

User Name: root
Password: [your EC2 Instance ID, e.g., i-12345678]

Microsoft Azure

User Name: root
Password: storreduce

Google Compute Engine

User Name: root
Password: [the password displayed by the Deployment Manager (see STORREDUCE_USER_PASSWORD metadata value)]

The server typically starts in under 2 minutes, but on some cloud providers first-time startup can take up to 10 minutes.
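If you are scripting access to the dashboard, the dashboard URL can be derived from the appliance's public DNS name or IP address. This is a small sketch: the `https` default is an assumption based on HTTPS being enabled by default, and the host names in the example are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DASHBOARD_PATH = "/storreduce-dashboard"


def dashboard_url(address: str, scheme: str = "https") -> str:
    """Return the dashboard URL for an appliance's public DNS name or IP address."""
    if "://" not in address:
        address = f"{scheme}://{address}"  # assumed default scheme
    parts = urlsplit(address)
    # Replace any existing path with the dashboard path; drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, DASHBOARD_PATH, "", ""))


if __name__ == "__main__":
    print(dashboard_url("203.0.113.10"))
    print(dashboard_url("http://storreduce.example.com/"))
```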

How do I SSH in to my StorReduce appliance?

You should not normally need to SSH in to the StorReduce appliance; however, access is available.

Amazon EC2

Use the key pair set up during instance creation, and SSH in as user ‘ec2-user’.

Microsoft Azure

Use the username and SSH key pair set up during instance creation.

Google Compute Engine

Connect via the Google Compute Engine web console (See the “SSH” button at the top of the instance screen).

What does StorReduce use SSD storage for?

StorReduce creates a global index of your data. SSD-based storage is used to ensure very fast access to this index.

The index is ephemeral and can be rebuilt entirely from data stored in the object store.

We recommend choosing a machine type that supports local SSD storage when deploying on-cloud.

How long does it take for StorReduce to rebuild its index data?

On-Cloud

When you shut down a cloud instance, or the instance dies, its instance-based SSD storage is deleted. On startup, StorReduce can rebuild your index data from the activity log stored in the object store, but for a large amount of data this can take a long time.

To speed up this process, StorReduce is able to store snapshots of index data on the object store. When starting a new instance StorReduce will go back to the latest snapshot and replay any further transactions from the log.
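The snapshot-plus-replay approach can be sketched as follows. The log format, sequence numbers and `Snapshot` type are hypothetical simplifications; the point is that starting from a snapshot leaves only the transactions logged after it to replay, while producing the same index as a full rebuild.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (sequence_number, key, value); value None means delete.
LogEntry = Tuple[int, str, Optional[str]]


@dataclass
class Snapshot:
    last_seq: int  # highest log sequence number already reflected in `index`
    index: Dict[str, str] = field(default_factory=dict)


def rebuild_index(log: List[LogEntry],
                  snapshot: Optional[Snapshot] = None) -> Tuple[Dict[str, str], int]:
    """Rebuild the index from an optional snapshot plus the log after it.

    Returns the index and the number of log entries replayed. The log is
    assumed to be ordered by sequence number.
    """
    index = dict(snapshot.index) if snapshot else {}
    start = snapshot.last_seq if snapshot else 0
    replayed = 0
    for seq, key, value in log:
        if seq <= start:
            continue  # already included in the snapshot
        if value is None:
            index.pop(key, None)
        else:
            index[key] = value
        replayed += 1
    return index, replayed


if __name__ == "__main__":
    log = [(1, "a", "x"), (2, "b", "y"), (3, "a", None), (4, "c", "z")]
    print(rebuild_index(log))                                     # full replay
    print(rebuild_index(log, Snapshot(2, {"a": "x", "b": "y"})))  # from snapshot
```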

On-Premises

Unlike on-cloud deployments, on-premises deployments have persistent SSD storage. This means the StorReduce appliance can be stopped without losing its index data and having to rebuild it.

What does StorReduce use persistent mechanical storage for (e.g. an EBS volume)?

StorReduce uses a slower mechanical storage volume for the parts of the index that are accessed less frequently and therefore have less impact on performance than the SSD-backed index.
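The general idea of splitting an index across fast and slow storage can be illustrated with a toy two-tier structure. This is a hypothetical sketch: the least-recently-used demotion rule and the `TieredIndex` class are assumptions for illustration, and StorReduce's actual placement policy is not described here.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredIndex:
    """Toy index: a small fast tier (standing in for SSD) backed by a larger
    slow tier (standing in for a mechanical volume)."""

    def __init__(self, fast_capacity: int) -> None:
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        self.fast: "OrderedDict[str, str]" = OrderedDict()  # most recently used last
        self.slow: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.slow.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._demote_overflow()

    def get(self, key: str) -> Optional[str]:
        if key in self.fast:
            self.fast.move_to_end(key)  # refresh recency
            return self.fast[key]
        if key in self.slow:
            value = self.slow.pop(key)  # promote on access
            self.fast[key] = value
            self._demote_overflow()
            return value
        return None

    def _demote_overflow(self) -> None:
        # Move least recently used entries to the slow tier once the fast tier is full.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value


if __name__ == "__main__":
    idx = TieredIndex(fast_capacity=2)
    for key in ("a", "b", "c"):
        idx.put(key, key.upper())
    print(sorted(idx.slow))  # ['a']: least recently used, demoted
    print(idx.get("a"))      # A (promoted back to the fast tier)
    print(sorted(idx.slow))  # ['b']
```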

Why use the ‘c3’ instance types on Amazon EC2?

The c3 series of instances provides local SSD instance storage as well as fairly good CPU performance, both of which are important for StorReduce to achieve good throughput.

AWS Command Line Interface (CLI) and StorReduce

Past versions of the AWS CLI (prior to April 2015) had issues communicating with non-AWS services such as StorReduce.

The latest AWS command-line interface has resolved these issues and now works correctly with the latest StorReduce server software.

Boto / Boto3 and StorReduce

Boto and Boto3 are Python SDKs for AWS. Boto3 and the AWS CLI are both built on the same underlying library, botocore.

Past versions (prior to April 2015) had issues communicating with non-AWS services such as StorReduce. Current versions of these clients have resolved these issues and should work correctly with the latest StorReduce server software.