civis.io.civis_to_multifile_csv
- civis.io.civis_to_multifile_csv(sql: str, database: str | int, sql_params_arguments: dict | None = None, job_name: str | None = None, client: APIClient | None = None, credential_id: int | None = None, include_header: bool = True, compression: str = 'none', delimiter: str = '|', max_file_size: int | None = None, unquoted: bool = False, prefix: str | None = None, polling_interval: int | float | None = None, hidden: bool = True) dict[source]
Unload the result of SQL query and return presigned urls.
This function is intended for unloading large queries/tables from redshift as it uses a ‘PARALLEL ON’ S3 unload. It returns a similar manifest file to conventional S3 UNLOAD statements except the CSV parts are accessible via both files endpoint IDs and presigned S3 urls.
- Parameters:
- sqlstr
The SQL select string to be executed.
- databasestr or int
Execute the query against this database. Can be the database name or ID.
- sql_params_argumentsdict, optional
A dictionary of SQL query parameters to pass directly to
civis.APIClient.scripts.post_sql. The only allowed keys are"params"(whose value is a list[dict]) and"arguments"(whose value is a dict). Please refer to the linked API documentation for how to format these two keys’ values.- job_namestr, optional
A name to give the job. If omitted, a random job name will be used.
- client
civis.APIClient, optional If not provided, an
civis.APIClientobject will be created from theCIVIS_API_KEY.- credential_idstr or int, optional
The database credential ID. If
None, the default credential will be used.- include_header: bool, optional
If
Trueinclude a key in the returned dictionary containing a list of column names. Default:True.- compression: str, optional
Type of compression to use, if any. One of
'none','zip', or'gzip'. Default'none'.- delimiter: str, optional
Which delimiter to use, if any. One of
',',' ', or'|'. Default:'|'.- max_file_size: int, optional
Maximum number of Megabytes each created file will be.
- unquoted: bool, optional
Whether or not to quote fields. Default:
False.- prefix: str, optional
A user specified filename prefix for the output file to have. Default:
None.- polling_intervalint or float, optional
Number of seconds to wait between checks for query completion.
- hiddenbool, optional
If
True(the default), this job will not appear in the Civis UI.
- Returns:
- unload_manifest: dict
A dictionary resembling an AWS manifest file. Has the following keys:
- ‘query’: str
The query.
- ‘header’: list of str
The columns from the query.
- ‘entries’: list of dict
Each dict has the following keys:
- ‘id’: int
File ID
- ‘name’: str
Filename
- ‘size’: int
File size in bytes
- ‘url’: str
Unsigned S3 URL (‘s3://…’)
- ‘url_signed’: str
Signed S3 URL (’https://…’)
- ‘unquoted’: bool
Whether the cells are quoted.
- ‘compression’: str
Type of compression used.
- ‘delimiter’: str
Delimiter that separates the cells.
See also
civis.APIClient.scripts.post_sql
Examples
>>> import io >>> import civis >>> import pandas as pd >>> sql = "SELECT * FROM schema.my_big_table" >>> database = "my_database" >>> delimiter = "|" >>> manifest = civis.io.civis_to_multifile_csv(sql, database, delimiter=delimiter) >>> ids = [entry['id'] for entry in manifest['entries']] >>> for file_id in ids: >>> buf = io.BytesIO() >>> civis.io.civis_to_file(file_id, buf) >>> buf.seek(0) >>> # Process the data in `buf` for your own application, e.g.: >>> df = pd.read_csv(buf, delimiter=delimiter)