Advanced: Self-hosted Remotes
Register Self-hosted Remotes
You can push to and pull from your own S3 and (S)FTP resources. The repo API stores all the necessary connection details, so data consumers do not have to change anything on their side if you change the remote storage.
settings = \
{
    'name': 'pipe-name',
    'protocol': 's3',
    'location': 'bucket-name',
    'options': {
        'remotepath': 's3://bucket-name/'
    },
    'credentials': {
        'aws_access_key_id': 'AAA',
        'aws_secret_access_key': 'BBB'
    }
}
# create the pipe, or update it if it already exists
d6tpipe.upsert_pipe(api, settings)
Most resources in d6tpipe are managed through a REST API-style interface, so you use the API client to define resources such as remotes, pipes and permissions. That way you can switch between local and server deployment without changing your code.
Parameters
name (str): unique id
protocol (str): [s3, ftp, sftp]
location (str): s3 bucket, ftp server name/ip
credentials (json): credentials for pulling. s3: aws_access_key_id, aws_secret_access_key. ftp: username, password
options (json): any options to be shared across pipes
    remotepath (str): path where data is located. If you connect to a Databolt Pipe server that is managed on your behalf
    dir (str): read/write from/to this subdir (auto created)
schema (json): any parameters you want to pass to the reader
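The parameters above can be checked locally before a settings dictionary is sent to the API. The following is a minimal sketch, not part of d6tpipe: `validate_settings` is a hypothetical helper, and treating `name`, `protocol` and `location` as required is an assumption based on the templates in this section.

```python
# Hypothetical helper, not part of d6tpipe: basic sanity checks on a settings dict.
ALLOWED_PROTOCOLS = {'s3', 'ftp', 'sftp'}  # protocols listed in the parameters above
REQUIRED_KEYS = ('name', 'protocol', 'location')  # assumption: treated as required here


def validate_settings(settings):
    """Return a list of problems found in `settings`; an empty list means none were found."""
    problems = []
    # required keys must be present and be non-empty strings
    for key in REQUIRED_KEYS:
        value = settings.get(key)
        if not isinstance(value, str) or not value:
            problems.append("missing or empty '%s'" % key)
    # protocol must be one of the documented values
    protocol = settings.get('protocol')
    if protocol is not None and protocol not in ALLOWED_PROTOCOLS:
        problems.append("unsupported protocol '%s'" % protocol)
    # json-typed parameters are represented as Python dicts
    for key in ('credentials', 'options', 'schema'):
        if key in settings and not isinstance(settings[key], dict):
            problems.append("'%s' should be a dict" % key)
    return problems


settings = {
    'name': 'pipe-name',
    'protocol': 's3',
    'location': 'bucket-name',
    'options': {'remotepath': 's3://bucket-name/'},
    'credentials': {'aws_access_key_id': 'AAA', 'aws_secret_access_key': 'BBB'},
}
print(validate_settings(settings))  # prints []
```

A check like this only catches structural mistakes such as a misspelled protocol; it says nothing about whether the credentials are valid.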
Templates
# s3
settings = \
{
    'name': 'pipe-name',
    'protocol': 's3',
    'location': 'bucket-name',
    'options': {
        'remotepath': 's3://bucket-name/'
    },
    'credentials': {
        'aws_access_key_id': 'AAA',
        'aws_secret_access_key': 'BBB'
    }
}
d6tpipe.upsert_pipe(api, settings)
# ftp
settings = \
{
    'name': 'yourftp',
    'protocol': 'ftp',
    'location': 'ftp.domain.com',
    'options': {
        'remotepath': '/'
    },
    'credentials': {'username': 'name', 'password': 'secure'}
}
d6tpipe.upsert_pipe(api, settings)
Access Control
You can have separate read and write credentials:
settings = \
{
    'name': 'remote-name',
    'protocol': 's3',
    'location': 'bucket-name',
    'credentials': {
        'read': {
            'aws_access_key_id': 'AAA',
            'aws_secret_access_key': 'BBB'
        },
        'write': {
            'aws_access_key_id': 'AAA',
            'aws_secret_access_key': 'BBB'
        }
    }
}
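A small builder can keep the two key pairs apart so they are not mixed up when the dictionary is written by hand. This is a sketch under assumptions: `build_s3_settings` is a hypothetical helper, not a d6tpipe function, and it only assembles the dictionary layout shown above. Whether a remote may be registered with read credentials alone is also an assumption.

```python
# Hypothetical helper, not part of d6tpipe: assemble settings with separate read/write keys.
def build_s3_settings(name, bucket, read_keys, write_keys=None):
    """Build a settings dict in the read/write layout shown above.

    read_keys and write_keys are (access_key_id, secret_access_key) tuples.
    If write_keys is None, only the 'read' entry is included (assumption).
    """
    def as_creds(keys):
        key_id, secret = keys
        return {'aws_access_key_id': key_id, 'aws_secret_access_key': secret}

    credentials = {'read': as_creds(read_keys)}
    if write_keys is not None:
        credentials['write'] = as_creds(write_keys)
    return {
        'name': name,
        'protocol': 's3',
        'location': bucket,
        'credentials': credentials,
    }


# placeholder values; real keys should not live in source code
settings = build_s3_settings(
    'remote-name', 'bucket-name',
    read_keys=('READ_ID', 'READ_SECRET'),
    write_keys=('WRITE_ID', 'WRITE_SECRET'),
)
```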
Keeping Credentials Safe
Don’t Commit Credentials To Source
In practice you wouldn’t want to keep credentials in the source code as in the examples above. It’s better to load the settings from a JSON, YAML or INI file into a Python dictionary that you can pass to the REST API. Alternatively, for server-based setups, you can work with REST tools like Postman.
Here is a recipe for loading settings from json and yaml files.
# create file
(api.repopath/'.creds.json').touch()
# edit file in `api.repo` folder. NB: you don't have to use double quotes in the json but you have to use spaces for tabs
print(api.repo)
# load settings and create
settings = d6tpipe.utils.loadjson(api.repopath/'.creds.json')['pipe-name']
d6tpipe.upsert_pipe(api, settings)
# or if you prefer yaml
(api.repopath/'.creds.yaml').touch()
settings = d6tpipe.utils.loadyaml(api.repopath/'.creds.yaml')['pipe-name']
d6tpipe.upsert_pipe(api, settings)
See example templates in https://github.com/d6t/d6tpipe/tree/master/docs
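Where even a local credentials file is undesirable, the secrets can come from environment variables instead. The following is a minimal sketch using only the Python standard library. `load_pipe_settings` is a hypothetical helper, and the file layout (one top-level key per pipe name, as in the recipe above) is an assumption chosen for illustration. `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the variable names used by AWS tooling, not a d6tpipe convention. Unlike the d6tpipe loader described above, the standard `json` module requires strict JSON with double quotes.

```python
import json
import os
import tempfile
from pathlib import Path


def load_pipe_settings(path, pipe_name, env=os.environ):
    """Load settings for `pipe_name` from a JSON file and fill in S3 keys from `env`.

    Assumed file layout: {"pipe-name": {...settings without secrets...}}.
    Credentials already present in the file are left untouched.
    """
    with open(path) as f:
        settings = json.load(f)[pipe_name]
    key_id = env.get('AWS_ACCESS_KEY_ID')
    secret = env.get('AWS_SECRET_ACCESS_KEY')
    # only add credentials when both variables are set and the file has none
    if 'credentials' not in settings and key_id and secret:
        settings['credentials'] = {
            'aws_access_key_id': key_id,
            'aws_secret_access_key': secret,
        }
    return settings


# demonstration with a temporary file and a fake environment
with tempfile.TemporaryDirectory() as tmp:
    creds_file = Path(tmp) / '.creds.json'
    creds_file.write_text(json.dumps({
        'pipe-name': {'name': 'pipe-name', 'protocol': 's3', 'location': 'bucket-name'}
    }))
    fake_env = {'AWS_ACCESS_KEY_ID': 'AAA', 'AWS_SECRET_ACCESS_KEY': 'BBB'}
    settings = load_pipe_settings(creds_file, 'pipe-name', env=fake_env)

print(settings['credentials']['aws_access_key_id'])  # prints AAA
```

The resulting dictionary can then be passed to d6tpipe.upsert_pipe(api, settings) as in the recipe above.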