Data Management

Cloud Kotta is designed to make data management simple yet secure and cost effective. The web interface provides methods to upload data, browse public datasets, and view datasets generated by analyses run on the system.

Note

To move protected datasets into Cloud Kotta, contact the Administrators.

Upload

It is common to upload tarballs of source code, Python virtual environments, and snippets of data to be used in analysis steps. To upload data to Cloud Kotta, select the Upload button from the Data Dropdown.
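For example, a source tree can be packaged into a tarball with Python's standard tarfile module before uploading (the directory and archive names here are hypothetical):

    import tarfile

    # Package a local source directory into a gzipped tarball for upload.
    # "my_analysis" and "my_analysis.tar.gz" are placeholder names.
    with tarfile.open("my_analysis.tar.gz", "w:gz") as tar:
        tar.add("my_analysis", arcname="my_analysis")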

The Upload data page has a button that opens a file dialog, allowing you to select a file from your local machine. Once a file has been selected, click the Upload button to upload it to your personal folder on the Cloud Kotta system. Once uploaded, the data is stored on reliable S3 storage with server-side encryption. These files can then be referred to in job descriptions as Inputs.
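The storage behavior described above matches standard S3 server-side encryption. As a rough sketch of what an encrypted S3 upload looks like with boto3 (bucket and key names are hypothetical, and this is not necessarily how Cloud Kotta itself is implemented):

    import boto3

    s3 = boto3.client("s3")

    # Store the file with AES-256 server-side encryption enabled.
    # The bucket and key are placeholders, not Cloud Kotta's actual layout.
    s3.upload_file(
        "my_analysis.tar.gz",
        "example-bucket",
        "uploads/my_user/my_analysis.tar.gz",
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )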

Note

Any data that has been uploaded via this system is treated as public to all authenticated users within Cloud Kotta. This allows users to share code and datasets.

Note

Since uploads are done through an HTTP POST operation, the data does NOT travel over an encrypted connection. If you require a secure transfer, please contact the Administrators.

Browse

Once you have uploaded data, select the Browse button from the Data Dropdown to browse all files that you have uploaded. To select a particular file to be used as part of the Inputs to an application, right-click the name of the file and select Copy link address. You can paste the link address, which has an s3:// prefix, into the inputs field.
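Such a link takes the familiar S3 URI form; for example (this path is hypothetical):

    s3://example-bucket/uploads/my_user/my_analysis.tar.gz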

Note

If you upload a file with the same name as an existing one, the previous file will be overwritten, and all subsequent invocations will fetch the latest uploaded file. Versioning is not currently supported.
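If you need to keep multiple revisions of a file, one client-side workaround is to embed a timestamp in the filename before uploading (a minimal sketch; the naming scheme is just a suggestion):

    import os
    import time

    def timestamped_name(path):
        """Return the filename with a timestamp inserted,
        e.g. data.csv -> data-20160616-151253.csv."""
        base, ext = os.path.splitext(os.path.basename(path))
        stamp = time.strftime("%Y%m%d-%H%M%S")
        return f"{base}-{stamp}{ext}"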

Note

There is currently no supported method for viewing the contents of an uploaded file.

Results

Outputs generated by applications are stored with the same privileges and encryption as uploaded data files. However, the links provided on the jobs page are short-lived URLs that can be used to view, download, or share the files for 1 hour.
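These short-lived links behave like standard S3 presigned URLs. As a sketch of how such a URL can be generated with boto3 (the bucket and key names are hypothetical, and Cloud Kotta's own implementation may differ):

    import boto3

    s3 = boto3.client("s3")

    # Generate a URL granting read access to one object for 1 hour.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-bucket", "Key": "outputs/<job-id>/shuf100.txt"},
        ExpiresIn=3600,  # seconds; matches the 1-hour lifetime described above
    )
    print(url)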

For example:

Here’s a job that returns a file shuf100.txt containing the integers 1-100 in shuffled order: https://turingcompute.net/jobs/b4d3e33f-8aaa-47c6-9621-eaebaf7ce0de

Under the outputs for this job, there is a row that includes shuf100.txt. The URL for this file is a signed URL, which looks like this:

https://klab-jobs.s3.amazonaws.com/outputs/1837edda-2996-4070-9174-e8c991d7c693/shuf100.txt?Signature=Bt1SssrsocVvBs9AqqGGh21aY9Q%3D&Expires=1466117573&AWSAccessKeyId=ASIAI67RMICSNZNPAATA&x-amz-meta-owner=Yadu&x-amz-security-token=FQoDYXdzEKz//////////wEaDMQ3ARW03kbE3oUKmSKZA2UEq%2BDjP1WznNS0jEn/nv2F8jJH/B2bBzR2I8jnmqmdWZfEcSkXjomWshXsgH78Y8oOeIg7jtr%2Bg9wrUUDcM%2BJl0prIscXVZbzXPO8UnQndByEvFwKZwYqGGuzOIoPEdAmychT/DB9Q4NBnBsizowCc8sFNioFvcpyyUqkiIyS4dSilutp1%2BFG5hnoge2%2BKqaYXd2howSpC3Iewo3YI0ETySaUvfW6WAM9uWv/i6PQlYINrzudoM06lvcQHkRWmtZkWG%2B9c/TeLnwkQzAl97yvUDw5U8VF1U1vg3K8nc9rI4vIB5O5O/6o6xcMrj7U3%2Bve8W2FCJEcrr/K845KL2AAJKXKnqNAbntEDo9/XLC%2BD6SnGySTW9RNY7yT4MECvIjdkXCR63euDCVHhiXlII4OOpqmpgqZihtyq6NsA7Uj2cKBvB219ojDjy4QRC7BHx3yQdkC15VA95MAkj6sSKqwzBZdzNd3DivJYfDBDjICOC0ozfs/knMUs7aEOA5RCSlUsoSrS2oPjzubRh2p/fgKwxkUO4J5IjMMo9uqLuwU%3D

This link has long since expired, which is the intended behavior. However, an authenticated user can always load the job page again and get a fresh URL to shuf100.txt.
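The Expires query parameter in a signed URL of this form is a Unix timestamp, so you can check client-side whether a link is still valid (a small sketch using only the Python standard library):

    import time
    from urllib.parse import parse_qs, urlparse

    def is_still_valid(signed_url):
        """Return True if the signed URL's Expires timestamp is in the future."""
        params = parse_qs(urlparse(signed_url).query)
        expires = int(params["Expires"][0])
        return time.time() < expires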