Pull Files from Remote to Local

Connecting to Pipes

To pull and push files, you first connect to a data pipe. A data pipe lets you manage remote and local data files.

import d6tpipe
api = d6tpipe.api.APIClient()
api.list_pipes() # show available pipes

pipe = d6tpipe.Pipe(api, 'pipe-name') # connect to a pipe

Show remote files

To show files in the remote storage, run pipe.scan_remote().

pipe = d6tpipe.Pipe(api, 'pipe-name')
pipe.scan_remote() # show remote files

Pulling Files to Local

Pulling files will download files from the remote data repo to the local data repo. Typically you have to write a lot of code to download files and sync remote data sources. With d6tstack you can sync pull with just a few lines of python.

pipe = d6tpipe.pipe.Pipe(api, 'pipe-name')
pipe.pull_preview() # preview
pipe.pull() # execute

Your files are now stored locally in a central location and conveniently accessible. See Accessing Pipe Files to learn how to use files after you have pulled them.

Which files are pulled?

Only files that you don’t have or that were modified are downloaded. You can manually control which files are downloaded or force download individual files, see advanced topics.

Advanced Topics

Pull Modes

You can control which files are pulled/pushed

  • default: modified and new files
  • new: new files only
  • mod: modified files only
  • all: all files, good for resetting a pipe
pipe = d6tpipe.pipe.Pipe(api, 'test', mode='all') # set mode
pipe.pull() # pull all files
pipe.setmode('all') # dynamically changing mode

Useful Pipe Operations

Below is a list of useful functions. See the reference Module Index for details.

# advanced pull options
pipe.pull(['a.csv']) # force pull on selected files
pipe.pull(include='*.csv',exclude='private*.xlsx') # apply file filters

# other useful operations
api.list_local_pipes() # list pipes pulled
pipe.files() # show synced files
pipe.scan_remote() # show files in remote
pipe.scan_remote(sortby='modified_at') # sorted by modified date
pipe.is_synced() # any changes?
pipe.remove_orphans() # delete orphan files
pipe.delete_files() # reset local repo

Using Multipe Pipes

If you work with multiple data sources, you can connect to multiple pipes.

pipe2 = d6tpipe.Pipe(api, 'another-pipe-name') # connect to multiple

# todo: how to sync pipe1 files to pipe2?