civis.io.civis_to_multifile_csv

civis.io.civis_to_multifile_csv(sql, database, job_name=None, api_key=None, client=None, credential_id=None, include_header=True, compression='none', delimiter='|', max_file_size=None, unquoted=False, prefix=None, polling_interval=None, hidden=True)[source]

Unload the result of SQL query and return presigned urls.

This function is intended for unloading large queries/tables from redshift as it uses a ‘PARALLEL ON’ S3 unload. It returns a similar manifest file to conventional S3 UNLOAD statements except the CSV parts are accessible via both files endpoint IDs and presigned S3 urls.

Parameters
sqlstr

The SQL select string to be executed.

databasestr or int

Execute the query against this database. Can be the database name or ID.

job_namestr, optional

A name to give the job. If omitted, a random job name will be used.

api_keyDEPRECATED str, optional

Your Civis API key. If not given, the CIVIS_API_KEY environment variable will be used.

clientcivis.APIClient, optional

If not provided, an civis.APIClient object will be created from the CIVIS_API_KEY.

credential_idstr or int, optional

The database credential ID. If None, the default credential will be used.

include_header: bool, optional

If True include a key in the returned dictionary containing a list of column names. Default: True.

compression: str, optional

Type of compression to use, if any. One of 'none', 'zip', or 'gzip'. Default 'none'.

delimiter: str, optional

Which delimiter to use, if any. One of ',', '     ', or '|'. Default: '|'.

max_file_size: int, optional

Maximum number of Megabytes each created file will be.

unquoted: bool, optional

Whether or not to quote fields. Default: False.

prefix: str, optional

A user specified filename prefix for the output file to have. Default: None.

polling_intervalint or float, optional

Number of seconds to wait between checks for query completion.

hiddenbool, optional

If True (the default), this job will not appear in the Civis UI.

Returns
unload_manifest: dict

A dictionary resembling an AWS manifest file. Has the following keys:

‘query’: str

The query.

‘header’: list of str

The columns from the query.

‘entries’: list of dict

Each dict has the following keys:

‘id’: int

File ID

‘name’: str

Filename

‘size’: int

File size in bytes

‘url’: str

Unsigned S3 URL (‘s3://…’)

‘url_signed’: str

Signed S3 URL (’https://…’)

‘unquoted’: bool

Whether the cells are quoted.

‘compression’: str

Type of compression used.

‘delimiter’: str

Delimiter that separates the cells.

See also

civis.APIClient.scripts.post_sql

Examples

>>> sql = "SELECT * FROM schema.my_big_table"
>>> database = "my_database"
>>> delimiter = "|"
>>> manifest = civis_to_multifile_csv(sql, database, delimiter=delimiter)
>>> ids = [entry['id'] for entry in manifest['entries']]
>>> buf = BytesIO()
>>> civis_to_file(ids[0], buf)
>>> buf.seek(0)
>>> df = pd.read_csv(buf, delimiter=delimiter)