civis.io.civis_to_multifile_csv
civis.io.civis_to_multifile_csv(sql, database, job_name=None, api_key=None, client=None, credential_id=None, include_header=True, compression='none', delimiter='|', unquoted=False, prefix=None, polling_interval=None, hidden=True)
Unload the result of a SQL query and return presigned URLs.
This function is intended for unloading large queries or tables from Redshift, as it uses a 'PARALLEL ON' S3 unload. It returns a manifest similar to that of a conventional S3 UNLOAD statement, except that the CSV parts are accessible via both files endpoint IDs and presigned S3 URLs.
Parameters: sql : str
The SQL select string to be executed.
database : str or int
Execute the query against this database. Can be the database name or ID.
job_name : str, optional
A name to give the job. If omitted, a random job name will be used.
api_key : DEPRECATED str, optional
Your Civis API key. If not given, the CIVIS_API_KEY environment variable will be used.
client : civis.APIClient, optional
If not provided, a civis.APIClient object will be created from the CIVIS_API_KEY.
credential_id : str or int, optional
The database credential ID. If None, the default credential will be used.
include_header : bool, optional
If True, include a key in the returned dictionary containing a list of column names. Default: True.
compression : str, optional
Type of compression to use, if any. One of 'none', 'zip', or 'gzip'. Default: 'none'.
delimiter : str, optional
Which delimiter to use, if any. One of ',', '\t', or '|'. Default: '|'.
unquoted : bool, optional
Whether or not to quote fields. Default: False.
prefix : str, optional
A user-specified filename prefix for the output files. Default: None.
polling_interval : int or float, optional
Number of seconds to wait between checks for query completion.
hidden : bool, optional
If True (the default), this job will not appear in the Civis UI.
Returns: unload_manifest : dict
A dictionary resembling an AWS manifest file. It has the keys 'header', 'query', and 'entries', representing the columns from the query, the query itself, and a list of dictionaries, one per unloaded CSV part. Each entry contains the file's 'id', 'name', and 'size', plus its unsigned and signed S3 URLs, 'url' and 'url_signed', respectively.
See also
civis.APIClient.scripts.post_sql
Examples
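>>> from io import BytesIO
>>> import pandas as pd
>>> from civis.io import civis_to_file, civis_to_multifile_csv
>>> sql = "SELECT * FROM schema.my_big_table"
>>> database = "my_database"
>>> delimiter = "|"
>>> # Unload the query result in parallel and get the manifest back.
>>> manifest = civis_to_multifile_csv(sql, database, delimiter=delimiter)
>>> # Each entry in the manifest describes one unloaded CSV part.
>>> ids = [entry['id'] for entry in manifest['entries']]
>>> # Download the first part into memory and load it with pandas.
>>> buf = BytesIO()
>>> civis_to_file(ids[0], buf)
>>> buf.seek(0)
>>> df = pd.read_csv(buf, delimiter=delimiter)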
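The example above reads only the first part. As a rough sketch of how every part listed in the manifest might be combined, the helper below walks manifest['entries'] and pairs each row with the column names in manifest['header']. The names read_manifest_rows and fetch_part are hypothetical and not part of the civis library. The sketch also assumes that the parts are UTF-8 encoded and carry no header row of their own, which is worth checking against your own output. The stand-in manifest and fetcher exist only so the block runs without Civis credentials.

```python
import csv
import gzip
import io


def read_manifest_rows(manifest, fetch_part, delimiter="|", compression="none"):
    """Yield one dict per row across all CSV parts listed in a manifest.

    `fetch_part` is a hypothetical callable (not a civis function): it takes
    one manifest entry and returns that part's raw bytes, for example by
    downloading entry['url_signed'] or by passing entry['id'] to
    civis.io.civis_to_file with an in-memory buffer.
    """
    header = manifest["header"]
    for entry in manifest["entries"]:
        raw = fetch_part(entry)
        if compression == "gzip":
            # Assumption: with compression='gzip', each part is a gzip stream.
            raw = gzip.decompress(raw)
        # Assumption: parts are UTF-8 text and contain no header row.
        text = io.StringIO(raw.decode("utf-8"), newline="")
        for row in csv.reader(text, delimiter=delimiter):
            yield dict(zip(header, row))


# Stand-in data shaped like the documented manifest keys.
fake_parts = {1: b"1|alice\n2|bob\n", 2: b"3|carol\n"}
fake_manifest = {
    "header": ["id", "name"],
    "query": "SELECT id, name FROM schema.my_big_table",
    "entries": [{"id": 1, "name": "part_0"}, {"id": 2, "name": "part_1"}],
}

rows = list(read_manifest_rows(fake_manifest, lambda entry: fake_parts[entry["id"]]))
```

Passing the fetcher in as an argument keeps the parsing logic independent of how the bytes are obtained, so the same loop works with either the files endpoint IDs or the presigned URLs.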