civis.io.civis_to_multifile_csv¶
-
civis.io.
civis_to_multifile_csv
(sql, database, job_name=None, api_key=None, client=None, credential_id=None, include_header=True, compression='none', delimiter='|', unquoted=False, prefix=None, polling_interval=None, hidden=True)[source]¶ Unload the result of SQL query and return presigned urls.
This function is intended for unloading large queries/tables from redshift as it uses a ‘PARALLEL ON’ S3 unload. It returns a similar manifest file to conventional S3 UNLOAD statements except the CSV parts are accessible via both files endpoint IDs and presigned S3 urls.
Parameters: - sql : str, optional
The SQL select string to be executed.
- database : str or int
Execute the query against this database. Can be the database name or ID.
- job_name : str, optional
A name to give the job. If omitted, a random job name will be used.
- api_key : DEPRECATED str, optional
Your Civis API key. If not given, the
CIVIS_API_KEY
environment variable will be used.- client :
civis.APIClient
, optional If not provided, an
civis.APIClient
object will be created from theCIVIS_API_KEY
.- credential_id : str or int, optional
The database credential ID. If
None
, the default credential will be used.- include_header: bool, optional
If
True
include a key in the returned dictionary containing a list of column names. Default:True
.- compression: str, optional
Type of compression to use, if any. One of
'none'
,'zip'
, or'gzip'
. Default'none'
.- delimiter: str, optional
Which delimiter to use, if any. One of
','
,' '
, or'|'
. Default:'|'
.- unquoted: bool, optional
Whether or not to quote fields. Default:
False
.- prefix: str, optional
A user specified filename prefix for the output file to have. Default:
None
.- polling_interval : int or float, optional
Number of seconds to wait between checks for query completion.
- hidden : bool, optional
If
True
(the default), this job will not appear in the Civis UI.
Returns: - unload_manifest: dict
A dictionary resembling an AWS manifest file. Has the following keys:
- ‘query’: str
The query.
- ‘header’: list of str
The columns from the query.
- ‘entries’: list of dict
Each dict has the following keys:
- ‘id’: int
File ID
- ‘name’: str
Filename
- ‘size’: int
File size in bytes
- ‘url’: str
Unsigned S3 URL (‘s3://…’)
- ‘url_signed’: str
Signed S3 URL (‘https://…’)
- ‘unquoted’: bool
Whether the cells are quoted.
- ‘compression’: str
Type of compression used.
- ‘delimiter’: str
Delimiter that separates the cells.
See also
civis.APIClient.scripts.post_sql
Examples
>>> sql = "SELECT * FROM schema.my_big_table" >>> database = "my_database" >>> delimiter = "|" >>> manifest = civis_to_multifile_csv(sql, database, delimiter=delimiter) >>> ids = [entry['id'] for entry in manifest['entries']] >>> buf = BytesIO() >>> civis_to_file(ids[0], buf) >>> buf.seek(0) >>> df = pd.read_csv(buf, delimiter=delimiter)