civis.io.civis_to_multifile_csv¶
-
civis.io.
civis_to_multifile_csv
(sql, database, job_name=None, api_key=None, client=None, credential_id=None, include_header=True, compression='none', delimiter='|', unquoted=False, prefix=None, polling_interval=None, hidden=True)[source]¶ Unload the result of SQL query and return presigned urls.
This function is intended for unloading large queries/tables from redshift as it uses a ‘PARALLEL ON’ S3 unload. It returns a similar manifest file to conventional S3 UNLOAD statements except the CSV parts are accessible via both files endpoint IDs and presigned S3 urls.
Parameters: sql : str, optional
The SQL select string to be executed.
database : str or int
Execute the query against this database. Can be the database name or ID.
job_name : str, optional
A name to give the job. If omitted, a random job name will be used.
api_key : DEPRECATED str, optional
Your Civis API key. If not given, the
CIVIS_API_KEY
environment variable will be used.client :
civis.APIClient
, optionalIf not provided, an
civis.APIClient
object will be created from theCIVIS_API_KEY
.credential_id : str or int, optional
The database credential ID. If
None
, the default credential will be used.include_header: bool, optional
If
True
include a key in the returned dictionary containing a list of column names. Default:True
.compression: str, optional
Type of compression to use, if any. One of
'none'
,'zip'
, or'gzip'
. Default'none'
.delimiter, str: optional
Which delimiter to use, if any. One of
','
,' '
, or'|'
. Default:'|'
.unquoted: bool, optional
Whether or not to quote fields. Default:
False
.prefix: str, optional
A user specified filename prefix for the output file to have. Default:
None
.polling_interval : int or float, optional
Number of seconds to wait between checks for query completion.
hidden : bool, optional
If
True
(the default), this job will not appear in the Civis UI.Returns: unload_manifest: dict
A dictionary resembling an AWS manifest file. Has the following keys:
'header'
,'query'
,'entries'
, respresenting the columns from the query, the query itself, and a list of dictionaries for each unloaded CSV part, each containing its file'id'
,'name'
,'size'
, and unsigned and signed S3 urls,'url'
and'url_signed'
, respectively.See also
civis.APIClient.scripts.post_sql
Examples
>>> sql = "SELECT * FROM schema.my_big_table" >>> database = "my_database" >>> delimiter = "|" >>> manifest = civis_multipart_unload(sql, database, delimiter=delimiter) >>> ids = [file['id'] for file in manifest['files']] >>> buf = BytesIO() >>> civis_to_file(ids[0], buf) >>> buf.seek(0) >>> df = pd.read_csv(buf, delimiter=delimiter)