StorReduce Insight Deployment Guide

Introduction

StorReduce Insight unlocks and indexes the data stored within backups, providing insight into data that was previously locked away. It does this by analysing backup data stored in cloud storage or private object stores, including data that has been deduplicated by StorReduce. Once the data is indexed, a user can:

  • instantly search across all the indexed backups, including full-text searching (not just searching on file names or metadata)

  • extract and download any version of any file directly from a backup archive, instantly via a Web browser, without the need to restore the backup

  • run cloud-based services on the backup data for analytics, machine learning, pattern identification, or any other purpose for which a cloud-based service exists

All indexed data is made available via Elasticsearch, for complex and scalable querying and data analysis.

Functionality

Backup Products

StorReduce Insight examines and indexes backup archives created by Veritas NetBackup. Insight is designed to work optimally with the StorReduce deduplication engine which is integrated with NetBackup.

Support for other archive formats is coming soon. Get in touch with us to enquire about the backup product or format you use.

Object Storage

StorReduce Insight, in combination with the StorReduce deduplication server, allows indexing of backup data stored in a variety of cloud and private object store systems.

Elasticsearch

Information about backup archives and their contents is stored in Elasticsearch, and the Elasticsearch API is made available for external software (including cloud services) to search for files within backups. This can include full-text searching for those files selected to be full-text indexed. Records found in Elasticsearch include a link to download the full original file via the Insight server.

The Insight Virtual Appliance comes with Elasticsearch and Kibana pre-installed and configured, ready to use. However, for larger-scale deployments we recommend using an external instance of Elasticsearch and Kibana, for example an instance of Amazon’s Elasticsearch service.
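
As a rough illustration of how external software might search the index through the Elasticsearch API, the sketch below builds a standard `query_string` search request. It makes several assumptions that are not confirmed by this guide: the index is taken to be named `archived-file` (borrowed from the Kibana index pattern of that name), Elasticsearch is reached on port 9200 with Basic Auth, and the host name and credentials are placeholders.

```python
import base64
import json
import urllib.request

# Assumption: the index name matches the 'archived-file' Kibana index pattern.
INDEX = "archived-file"


def build_search_request(base_url, username, password, text, size=10):
    """Build (but do not send) a full-text search request for Elasticsearch."""
    body = {"size": size, "query": {"query_string": {"query": text}}}
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and credentials; replace with your own.
    request = build_search_request(
        "http://insight.server.name:9200", "root", "secret", "quarterly report"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        hits = json.load(response)["hits"]["hits"]
    for hit in hits:
        # Field names inside _source depend on the Insight index mapping.
        print(hit["_source"])
```

The request is built separately from being sent so that the URL, body and headers can be inspected before any network call is made.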

Scale-out

Multiple Insight Virtual Appliances can be deployed in a scale-out cluster to allow more rapid indexing of backup archive data. New servers can be added to or removed from an Insight cluster at any time as required.

Security

The Insight Virtual Appliance includes an authentication service which requires users to log in using a StorReduce user account and password, obtained from a StorReduce server.

This service is known as the Insight Reverse Proxy service. It sits in front of the Insight, Elasticsearch and Kibana services, ensuring that only authorized users can access the indexed data.

Deployment

The Insight Virtual Appliance is currently available as an AMI for deployment on Amazon EC2, or as a VMware OVA file for deployment on-premises. StorReduce and StorReduce Insight can be deployed entirely on-premises, entirely in the cloud, or as a mixture of the two.

Please contact us to request a free trial for your preferred platform.

Prerequisites

StorReduce deduplication server: StorReduce Insight requires a StorReduce instance (either a scale-out cluster or single server edition). You can deploy a StorReduce scale-out cluster using the StorReduce Cluster Edition.

S3-Compatible Storage: You should have an S3-Compatible storage bucket containing NetBackup archives to use StorReduce Insight. This guide focuses on deploying StorReduce Insight for use with backup archives stored through a StorReduce deduplication server.

NetBackup: You should have a NetBackup instance for performing backups to your S3-compatible storage (or as a minimum, some existing NetBackup archives previously stored to object storage). Support for other archive formats will be coming soon.

External Elasticsearch (Optional): For indexing large amounts of backup data we recommend setting up a separate Elasticsearch cluster (on-premises) or using a scalable Elasticsearch cloud service such as the Amazon Elasticsearch Service.

Step 1: Deploy an Insight Virtual Appliance

Cloud Deployment to Amazon EC2

Please follow the StorReduce Insight Appliance Amazon EC2 Deployment Guide. After completing the guide please return here to continue the configuration steps below.

On-premises Deployment using a VMware OVA

Please follow the StorReduce Insight Appliance VMware Deployment Guide. After completing the guide please return here to continue the configuration steps below.

Important Note: The default Unix root password for the VMware instance is ‘storreduce’. You will need this to SSH in to the server or to log in via the console.

Local Storage

StorReduce Insight stores all persistent data in Elasticsearch. This includes settings for the Insight cluster, and information about the backup archives which have been indexed.

If an external Elasticsearch instance is being used then only minimal local storage is required for the Insight Virtual Appliance, for storing log files.

If the built-in Elasticsearch instance is used then the StorReduce Insight appliance is pre-configured to store Elasticsearch index data on a single large disk mounted at /mnt/storage. This data should be backed up periodically.
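
When the built-in Elasticsearch instance is used, it can be helpful to keep an eye on how full the index disk is. The following is a minimal sketch using only the Python standard library; the 80% threshold is an arbitrary example value, not a StorReduce recommendation.

```python
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # /mnt/storage is the mount point named in this guide.
    percent = usage_percent("/mnt/storage")
    print(f"/mnt/storage is {percent:.1f}% full")
    if percent > 80:  # arbitrary example threshold
        print("Index disk is filling up; consider adding storage.")
```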

Firewall Setup

If your deployment environment has a firewall, please configure it to allow TCP connections to/from Insight server(s) on the following ports:

Incoming Connections to Insight Virtual Appliance

Incoming Port | Service | Who can connect
------------- | ------- | ---------------
22 | SSH | Administrators
80 | StorReduce Insight (HTTP) | Administrators & users
443 | StorReduce Insight (HTTPS) | Administrators & users
5601 | Kibana (HTTP) | Administrators & users
9200 | Elasticsearch (HTTP) | Cloud-based services

Outgoing Connections from Insight Virtual Appliance

Outgoing Port | Service | Source Insight component | Destination
------------- | ------- | ------------------------ | -----------
80 | StorReduce (HTTP) | Insight indexing service | StorReduce Server
443 | StorReduce (HTTPS) | Insight indexing service | StorReduce Server
8080 | StorReduce Admin Interface | Insight authentication service | StorReduce Server
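
One way to confirm that the firewall rules are in effect is to attempt a TCP connection to each port. The sketch below checks the incoming ports listed above from a client machine; the host name is a placeholder. The same function could be run on the Insight server against the StorReduce server's ports 80, 443 and 8080 to check the outgoing rules.

```python
import socket

# Incoming ports for the Insight Virtual Appliance, as listed in this guide.
INCOMING_PORTS = {
    22: "SSH",
    80: "Insight (HTTP)",
    443: "Insight (HTTPS)",
    5601: "Kibana",
    9200: "Elasticsearch",
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    host = "insight.server.name"  # hypothetical DNS name
    for port, service in INCOMING_PORTS.items():
        state = "open" if port_is_open(host, port) else "unreachable"
        print(f"{port:>5}  {service:<16} {state}")
```

A successful connection only shows that the port is reachable; it does not verify that the expected service is answering on it.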

Outgoing Connections for Updates to Insight Virtual Appliance

In order to perform software updates to the Insight Virtual Appliance we recommend allowing outgoing connections on port 443. Updates are performed by fetching data from Amazon S3 and from Docker Hub.

Step 2: Configure the Authentication Service

The Insight server includes an authentication service that fetches user account information from a StorReduce deduplication server. To configure this service to talk to StorReduce, SSH to the Insight server and run each of the following commands:

sudo storreduce-insight-reverseproxy configure <STORREDUCE SERVER>
sudo storreduce-insight-reverseproxy restart

Where <STORREDUCE SERVER> is the DNS name or IP address of your StorReduce deduplication server (either a standalone server or one of the servers within a cluster). Use just the DNS name; do not include ‘http://’ on the front.

This command configures the Authentication Service (a reverse proxy) on the virtual machine to connect to StorReduce and perform authentication based on the user accounts stored within StorReduce. Any user that can log in to the StorReduce Web Dashboard can access StorReduce Insight and related services.

By default the StorReduce deduplication server comes with a root user whose password is storreduce (if deployed on VMware) or the EC2 instance ID (if deployed on Amazon EC2). This username and password can be used to log in to Insight, Kibana and Elasticsearch.
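
Because the configure command expects a bare DNS name or IP address, a small helper can strip an accidental ‘http://’ prefix or trailing slash before the commands are run. This is only an illustrative sketch; the helper names and the example host are hypothetical, and the commands it prints are the two shown in this step.

```python
def normalize_server(value):
    """Strip any scheme and path so only the DNS name or IP address remains."""
    value = value.strip()
    if "://" in value:
        value = value.split("://", 1)[1]
    return value.split("/", 1)[0]


def configure_commands(server):
    """Return the two commands from this step for the given StorReduce server."""
    name = normalize_server(server)
    return [
        f"sudo storreduce-insight-reverseproxy configure {name}",
        "sudo storreduce-insight-reverseproxy restart",
    ]


if __name__ == "__main__":
    # Hypothetical server address pasted with a scheme and trailing slash.
    for command in configure_commands("http://storreduce.example.com/"):
        print(command)
```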

Step 3: (Optional) Configure an External Elasticsearch Instance

A StorReduce Insight AMI or VM image comes complete with Elasticsearch and Kibana, and is pre-configured to use this internal Elasticsearch instance. Please skip this step unless you wish to use an external Elasticsearch instance.

To configure Insight to talk to an external Elasticsearch cluster or service, SSH to the Insight server and edit the file /etc/insight/flags using vi (or nano):

sudo vi /etc/insight/flags

Please edit the following values in the flags file:

  • elastic_url: the URL endpoint for the external Elasticsearch instance, including ‘http://’ or ‘https://’

  • elastic_username: the user name to connect to the external Elastic instance.

  • elastic_password: the password corresponding to the username defined for elastic_username (from either StorReduce or the external Elastic instance).
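
Since elastic_url must include the ‘http://’ or ‘https://’ prefix, a quick check before editing the flags file can catch a malformed value. The sketch below validates only the URL shape; it does not read or write /etc/insight/flags, whose exact syntax is not described here, and the example URLs are placeholders.

```python
from urllib.parse import urlsplit


def is_valid_elastic_url(url):
    """Return True if url has an http or https scheme and a host part."""
    parts = urlsplit(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    # Hypothetical endpoints for illustration.
    for candidate in ("https://search.example.com:9200", "search.example.com"):
        verdict = "ok" if is_valid_elastic_url(candidate) else "missing http:// or https://"
        print(f"{candidate}: {verdict}")
```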

When using an external Elasticsearch instance, the useful Kibana index patterns and searches that come pre-configured with the Insight Appliance will need to be configured manually. A script is available to assist with this process; please contact StorReduce for instructions.

Step 4: Configure Insight

After deploying a StorReduce Insight appliance and configuring the Authentication service, the remaining configuration is performed via a Web browser.

  1. Browse to your Insight VM (via http or https) to view the Insight Web Dashboard.

  2. You will be asked to authenticate. Enter a valid StorReduce user name and password, e.g. user ‘root’ with the default password ‘storreduce’ (if deployed on VMware) or the EC2 instance ID of the StorReduce server (if deployed on Amazon EC2).

    Authenticate screenshot

    StorReduce Insight will tell you that it must be configured before it can be used:

    Settings only mode

  3. Click on the Settings tab at the top right and fill out the settings.

    Insight Settings - Required Fields

    Please enter values for the following fields:

    • Insight Endpoint URL: The URL of the Insight appliance, e.g. http://insight.server.name

    • Data Source Endpoint URL: The URL of the StorReduce server storing backups, e.g. http://storreduce.server.name

    • Access Key Id: Obtained from the StorReduce Web dashboard, ‘Users’ tab

    • Secret Access Key: Corresponding to the access key ID

    • Buckets to Index: One or more bucket names from the StorReduce server, comma-separated, e.g. ‘bucket1,bucket2’

    Advanced users can configure the values under Bucket Lister, Archive Extractor and Full-text Indexing sections to optimize performance.

  4. Click on the “Save Cluster Settings” button at the bottom of the page to save the configuration to Elasticsearch.

  5. Restart the Insight service for the settings to take effect and to begin indexing. To do this, SSH into the Insight server machine and run the following command:

    sudo storreduce-insight restart
    
  6. Refresh the Insight Web Dashboard in your browser and browse to the ‘Status’ page. You should see activity as Insight begins indexing the archives it finds in the specified StorReduce buckets.

    Insight Status

    To search, click on the ‘StorReduce Insight’ logo on the Insight Web Dashboard to bring up the Insight Search page, which can be used to search for specific archived or backed-up files within the data indexed by Insight.

    Insight Search

    To view the log file from the Insight server (in real time), SSH to the Insight server and enter the following command:

    sudo docker logs -f insight
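
Before entering values on the Settings page in step 3, it can help to sanity-check them. The sketch below checks that all five required fields have a value and splits the comma-separated ‘Buckets to Index’ value into individual names. It is a pre-check run on your own machine, not part of Insight, and whether Insight itself tolerates spaces around the commas is not stated in this guide. The sample values are hypothetical.

```python
# Required fields as named on the Insight Settings page.
REQUIRED_FIELDS = [
    "Insight Endpoint URL",
    "Data Source Endpoint URL",
    "Access Key Id",
    "Secret Access Key",
    "Buckets to Index",
]


def parse_buckets(value):
    """Split a comma-separated bucket list, dropping blanks and stray spaces."""
    return [name.strip() for name in value.split(",") if name.strip()]


def missing_fields(settings):
    """Return the required field names that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not str(settings.get(f, "")).strip()]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    settings = {
        "Insight Endpoint URL": "http://insight.server.name",
        "Data Source Endpoint URL": "http://storreduce.server.name",
        "Access Key Id": "EXAMPLEKEYID",
        "Secret Access Key": "",
        "Buckets to Index": "bucket1,bucket2",
    }
    print("Missing:", missing_fields(settings))
    print("Buckets:", parse_buckets(settings["Buckets to Index"]))
```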
    

Step 5: Configure Kibana

Kibana is the preferred tool for performing complex queries and reporting on the backup data indexed by Insight. Kibana comes pre-configured with index patterns and basic searches for use with Insight. To finish configuring Kibana for use please perform the following steps:

  1. Browse to your Insight appliance on port 5601 to bring up Kibana. When you first browse to Kibana it will ask you to configure an index pattern:

    Kibana screenshot

  2. Set a default index pattern: Click on the ‘archived-file’ index pattern, then click on the star icon on the right to make ‘archived-file’ the default index pattern.

    Kibana screenshot

  3. Update the ‘url’ field to provide the correct DNS name for your Insight appliance:

    • Click on ‘Management’, then ‘Index Patterns’, then ‘archived-file’

    • Enter ‘url’ into the filter search box near the top

    • On the result row for Url, click on the edit button on the right to edit the field definition

    Kibana screenshot

    • On the ‘Url Template’ field, replace the placeholder of ‘insight.name.here’ with the correct DNS name for the Insight server. Leave the ‘http://’ on the front and the ‘{{rawValue}}’ on the back. Do not enter a slash (‘/’) after the DNS name since a slash is already included in the value. Here is an example of a correctly configured Url Template for an Insight server running on Amazon EC2:

      http://ec2-34-216-43-25.us-west-2.compute.amazonaws.com{{rawValue}}

    • Click the ‘Update Field’ button at the bottom of the page to save your changes.

    Kibana screenshot

  4. Reduce the default number of search results returned: This is optional but recommended as it will speed up the fetching of search results in Kibana:

    • Click on ‘Management’, then ‘Advanced Settings’

    • Change the ‘discover:sampleSize’ setting from the default of 500 down to 50.

    Kibana screenshot
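
The Url Template rule in step 3 (keep ‘http://’ at the front, keep ‘{{rawValue}}’ at the back, and add no slash after the DNS name) can be expressed as a small helper. This is a sketch for checking the value you intend to paste into Kibana; the function name is hypothetical.

```python
def build_url_template(dns_name):
    """Return the Kibana Url Template for the given Insight server DNS name."""
    name = dns_name.strip()
    if "://" in name:
        # Drop any scheme the caller included; the template always uses http://.
        name = name.split("://", 1)[1]
    # No trailing slash: the raw value already starts with one.
    name = name.rstrip("/")
    return "http://" + name + "{{rawValue}}"


if __name__ == "__main__":
    print(build_url_template("ec2-34-216-43-25.us-west-2.compute.amazonaws.com/"))
```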

After configuration you can click on ‘Discover’ and click the ‘Open’ button to see the available pre-configured searches.

Kibana screenshot

If you select ‘Archived Files’ you can then type a search query and find files within the backup archives that have been indexed by Insight:

Kibana screenshot

Configuring Kibana when using an external Elasticsearch instance: When using an external Elasticsearch instance, Kibana Index patterns and searches will need to be configured manually. A script is available to assist with this process; please contact StorReduce for instructions.

Step 6: Check for Updates

After configuring Insight, please finish by checking for updates to ensure you have the latest features and fixes.

To update StorReduce Insight, SSH into each machine in the Insight cluster and run:

sudo storreduce-insight update

If you see a message saying that the Insight software was updated, restart the Insight server:

sudo storreduce-insight restart

Note that the above steps must be performed on every Insight server in the cluster.
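
For clusters with several servers, the update command can be run on each one in turn over SSH. The sketch below assumes the `ssh` client is installed locally and that you can log in to each server; the host names are placeholders. It deliberately does not try to detect the "updated" message, because its exact wording is not given here, so the restart decision is left to the operator.

```python
import subprocess


def update_command(host):
    """Return the ssh argument list that runs the update command on one server."""
    # -t requests a terminal so that sudo can prompt for a password if needed.
    return ["ssh", "-t", host, "sudo", "storreduce-insight", "update"]


if __name__ == "__main__":
    servers = ["insight-1.example.com", "insight-2.example.com"]  # hypothetical hosts
    for host in servers:
        print(f"--- {host} ---")
        subprocess.run(update_command(host), check=True)
        # Read the output: if it reports that the software was updated,
        # run 'sudo storreduce-insight restart' on that server.
```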

Advanced Deployment Options

Adding Additional Insight Servers to a Cluster

Adding additional Insight servers to a cluster enables faster indexing of archive data. An Insight cluster spreads the work over all servers in the cluster by using a single Elasticsearch instance for co-ordination.

To add an additional Insight server to a cluster:

  1. Deploy an additional Insight server instance using the Amazon AMI or the VMware OVA file.

  2. SSH in to the new Insight server and configure the Authentication Service just as for the first Insight server, by running the following commands:

    sudo storreduce-insight-reverseproxy configure <STORREDUCE SERVER>
    sudo storreduce-insight-reverseproxy restart
    

    where <STORREDUCE SERVER> is the DNS name or IP address of the same StorReduce server used when configuring the first Insight server.

  3. Configure the new Insight server’s connection to Elastic, either to the Elastic instance on the first Insight server, or to the external Elastic instance if one was used. To do this, edit the file /etc/insight/flags using vi (or nano):

    sudo vi /etc/insight/flags
    

    The first Insight server will have Elasticsearch exposed on port 9200, protected via Basic Auth with credentials based on the User Management system of StorReduce. Please edit the following values in the flags file:

    • elastic_url: the URL endpoint for Elastic, either on the first Insight server (http://<first-server-address>:9200) or the endpoint of the external Elastic service.

    • elastic_username - a valid username defined in StorReduce, or the user name to connect to the external Elastic instance.

    • elastic_password - the password corresponding to the username defined for elastic_username (from either StorReduce or the external Elastic instance).

    • poller_disabled - set to ‘true’ for all except the first Insight server in the cluster

    Note that only one server in the cluster should have the poller_disabled setting set to false. Having more than one Insight server in the cluster with poller_disabled set to false will cause polling of S3 to occur more frequently than configured since each of these servers will request polling independently.

  4. After saving the edited /etc/insight/flags file, restart the new Insight server by typing:

    sudo storreduce-insight restart
    
  5. On startup the new Insight server will fetch its settings from Elasticsearch and be a part of the Insight cluster. Browse to the new Insight server’s Web dashboard (on port 80 or 443) and click on the Status tab to see the utilization of workers within this server as part of the overall cluster.
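
The poller_disabled rule from step 3 (exactly one server in the cluster with the value set to false) is easy to get wrong as servers are added. The following sketch checks a hand-maintained record of each server's setting; it does not read the flags files itself, and the server names are hypothetical.

```python
def servers_with_poller_enabled(poller_disabled):
    """Return the servers whose poller_disabled value is False (poller running)."""
    return sorted(name for name, disabled in poller_disabled.items() if not disabled)


def poller_config_ok(poller_disabled):
    """Return True when exactly one server has poller_disabled set to false."""
    return len(servers_with_poller_enabled(poller_disabled)) == 1


if __name__ == "__main__":
    # Maps each server to its poller_disabled value (hypothetical cluster).
    cluster = {"insight-1": False, "insight-2": True, "insight-3": False}
    if poller_config_ok(cluster):
        print("Poller configuration looks correct.")
    else:
        enabled = servers_with_poller_enabled(cluster)
        print("Expected one polling server, found:", ", ".join(enabled) or "none")
```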

Configuring a Custom SSL/TLS Certificate

StorReduce Insight is accessible via HTTPS on port 443 through the Insight Authentication Service. This is a reverse proxy that sits in front of the Insight, Elasticsearch and Kibana servers and handles TLS connections.

Initially a self-signed certificate will be used for HTTPS/TLS. If you have a server certificate issued by a valid CA then this can be used by copying the certificate and key files onto the Insight appliance, and editing the configuration file /etc/storreduce/storreduce-insight-reverseproxy/config.yml to update the following fields:

  • tls_cert_paths: “,,,/path/to/tls_certificate.pem”

  • tls_key_paths: “,,,/path/to/tls_key.pem”

Once complete, please run storreduce-insight-reverseproxy restart for the changes to take effect.
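
Before restarting the reverse proxy, it may be worth confirming that the certificate and key files are readable and belong together. The sketch below uses Python's standard `ssl` module for this. It is an independent check, not part of the StorReduce tooling, and it says nothing about how the reverse proxy itself parses the files. The paths shown are the placeholders from the configuration example above.

```python
import ssl


def cert_and_key_load(cert_path, key_path):
    """Return True if the ssl module can load the certificate with its private key."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Raises if a file is missing, malformed, or the key does not match.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
        return True
    except (ssl.SSLError, OSError) as exc:
        print(f"Could not load certificate/key: {exc}")
        return False


if __name__ == "__main__":
    # Placeholder paths; substitute the locations you copied the files to.
    ok = cert_and_key_load("/path/to/tls_certificate.pem", "/path/to/tls_key.pem")
    print("Certificate and key load correctly." if ok else "Check the files before restarting.")
```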