Register and Administer Pipes¶
Why register pipes?¶
To have a pipe point to a remote data storage that you control, you have to register a pipe. Normally creating remote data storage is a complex task. But with d6tpipe it is exceptionally quick by using managed pipes which automatically take care of storage, authentication, permissioning etc, see security for details. To register self-hosted pipes, see Advanced Pipes.
Register Managed Pipes¶
To register a new pipe you need pipe settings that specify how remote data is stored and returned. The minimum requirement is the name which creates a free remote data repo. [todo: make d6tpipe default protocol]
import d6tpipe
api = d6tpipe.api.APIClient()
response, data = d6tpipe.upsert_pipe(api, {'name': 'your-pipe'})
Customizing Pipe Settings¶
Pipes not only store data but also control how data is stored, returned and provide additional meta data.
Settings Parameters
name
(str): unique nameprotocol
(str): storage protocol (d6tfree
,``s3``,``ftp``,``sftp``)options
(json):dir
(str): read/write from/to this subdirectory (auto created)include
(str): only include files with this pattern, eg*.csv
or multiple with*.csv|*.xls
exclude
(str): exclude files with this pattern
schema
(json): see Schema
Pipe Setting Inheritance¶
You can and should have multipe pipes that connect to the same underlying remote file storage but return different files. Why? Say you are a data vendor, you can create different pipes for different subscriptions without having to individually manage them. Say the vendors data files are:
dataA\\monthly*.csv
dataA\\daily*.csv
dataB\\reports*.xlsx
Those are 3 different datasets, so you should define 3 separate pipes: vendor-monthly
, vendor-daily
, vendor-reports
. To avoid having to specify the same settings multiple times, you can create a parent pipe vendor-parent
from which the child pipes can inherit settings.
settings_parent = {
'name': 'vendor-parent',
'protocol': 'd6tfree'
}
settings_monthly = {
'name': 'vendor-monthly',
'parent': 'vendor-parent',
'options': {'dir':'dataA', 'include':'monthly*.csv'}
}
settings_daily = {
'name': 'vendor-daily',
'parent': 'vendor-parent',
'options': {'dir':'dataA', 'include':'daily*.csv'}
}
settings_reports = {
'name': 'vendor-reports',
'parent': 'vendor-parent',
'options': {'dir':'dataB', 'include':'reports*.xlsx'}
}
This way the vendor can push files to vendor-parent
and then grant clients individual access to vendor-monthly
, vendor-daily
, vendor-reports
based on which product they have subscribed to.
Updating Pipe Settings¶
Here is how you update an existing pipe with more advanced settings. This will either add the setting if it didn’t exist before or overwrite the setting if it existed.
settings = \
{
'name': 'pipe-name',
'remote': 'remote-name',
'options': {
'dir': 'some/folder',
'include': '*.csv|*.xls',
'exclude': 'backup*.csv|backup*.xls'
},
'schema': {
'pandas': {
'sep': ',',
'encoding': 'utf8'
}
}
}
# update an existing pipe with new settings
response, data = d6tpipe.upsert_pipe(api, settings)
Administer Pipes with repo API¶
You can run any CRUD operations you can normally run on any REST API.
# listing pipes
api.list_pipes() # names_only=False shows all details
# CRUD
response, data = api.cnxn.pipes.post(request_body=settings)
response, data = api.cnxn.pipes._('pipe-name').get()
response, data = api.cnxn.pipes._('pipe-name').put(request_body=new_settings)
response, data = api.cnxn.pipes._('pipe-name').patch(request_body=new_settings)
response, data = api.cnxn.pipes._('pipe-name').delete()
# using pipe object
response, data = pipe.cnxnpipe.get()
response, data = pipe.cnxnpipe.put(request_body=all_settings)
response, data = pipe.cnxnpipe.patch(request_body=mod_settings)
response, data = pipe.cnxnpipe.delete()