civis.io.civis_to_multifile_csv

civis.io.civis_to_multifile_csv(sql, database, job_name=None, api_key=None, client=None, credential_id=None, include_header=True, compression='none', delimiter='|', unquoted=False, prefix=None, polling_interval=None, hidden=True)[source]

Unload the result of SQL query and return presigned urls.

This function is intended for unloading large queries/tables from redshift as it uses a ‘PARALLEL ON’ S3 unload. It returns a similar manifest file to conventional S3 UNLOAD statements except the CSV parts are accessible via both files endpoint IDs and presigned S3 urls.

Parameters:
sql : str, optional

The SQL select string to be executed.

database : str or int

Execute the query against this database. Can be the database name or ID.

job_name : str, optional

A name to give the job. If omitted, a random job name will be used.

api_key : DEPRECATED str, optional

Your Civis API key. If not given, the CIVIS_API_KEY environment variable will be used.

client : civis.APIClient, optional

If not provided, an civis.APIClient object will be created from the CIVIS_API_KEY.

credential_id : str or int, optional

The database credential ID. If None, the default credential will be used.

include_header: bool, optional

If True include a key in the returned dictionary containing a list of column names. Default: True.

compression: str, optional

Type of compression to use, if any. One of 'none', 'zip', or 'gzip'. Default 'none'.

delimiter: str, optional

Which delimiter to use, if any. One of ',', '     ', or '|'. Default: '|'.

unquoted: bool, optional

Whether or not to quote fields. Default: False.

prefix: str, optional

A user specified filename prefix for the output file to have. Default: None.

polling_interval : int or float, optional

Number of seconds to wait between checks for query completion.

hidden : bool, optional

If True (the default), this job will not appear in the Civis UI.

Returns:
unload_manifest: dict

A dictionary resembling an AWS manifest file. Has the following keys:

‘query’: str

The query.

‘header’: list of str

The columns from the query.

‘entries’: list of dict

Each dict has the following keys:

‘id’: int

File ID

‘name’: str

Filename

‘size’: int

File size in bytes

‘url’: str

Unsigned S3 URL (‘s3://…’)

‘url_signed’: str

Signed S3 URL (‘https://…’)

‘unquoted’: bool

Whether the cells are quoted.

‘compression’: str

Type of compression used.

‘delimiter’: str

Delimiter that separates the cells.

See also

civis.APIClient.scripts.post_sql

Examples

>>> sql = "SELECT * FROM schema.my_big_table"
>>> database = "my_database"
>>> delimiter = "|"
>>> manifest = civis_to_multifile_csv(sql, database, delimiter=delimiter)
>>> ids = [entry['id'] for entry in manifest['entries']]
>>> buf = BytesIO()
>>> civis_to_file(ids[0], buf)
>>> buf.seek(0)
>>> df = pd.read_csv(buf, delimiter=delimiter)