civis.io.civis_to_multifile_csv

civis.io.civis_to_multifile_csv(sql: str, database: str | int, sql_params_arguments: dict | None = None, job_name: str | None = None, client: APIClient | None = None, credential_id: int | None = None, include_header: bool = True, compression: str = 'none', delimiter: str = '|', max_file_size: int | None = None, unquoted: bool = False, prefix: str | None = None, polling_interval: int | float | None = None, hidden: bool = True) dict[source]

Unload the result of SQL query and return presigned urls.

This function is intended for unloading large queries/tables from redshift as it uses a ‘PARALLEL ON’ S3 unload. It returns a similar manifest file to conventional S3 UNLOAD statements except the CSV parts are accessible via both files endpoint IDs and presigned S3 urls.

Parameters:
sqlstr

The SQL select string to be executed.

databasestr or int

Execute the query against this database. Can be the database name or ID.

sql_params_argumentsdict, optional

A dictionary of SQL query parameters to pass directly to civis.APIClient.scripts.post_sql. The only allowed keys are "params" (whose value is a list[dict]) and "arguments" (whose value is a dict). Please refer to the linked API documentation for how to format these two keys’ values.

job_namestr, optional

A name to give the job. If omitted, a random job name will be used.

clientcivis.APIClient, optional

If not provided, an civis.APIClient object will be created from the CIVIS_API_KEY.

credential_idstr or int, optional

The database credential ID. If None, the default credential will be used.

include_header: bool, optional

If True include a key in the returned dictionary containing a list of column names. Default: True.

compression: str, optional

Type of compression to use, if any. One of 'none', 'zip', or 'gzip'. Default 'none'.

delimiter: str, optional

Which delimiter to use, if any. One of ',', '     ', or '|'. Default: '|'.

max_file_size: int, optional

Maximum number of Megabytes each created file will be.

unquoted: bool, optional

Whether or not to quote fields. Default: False.

prefix: str, optional

A user specified filename prefix for the output file to have. Default: None.

polling_intervalint or float, optional

Number of seconds to wait between checks for query completion.

hiddenbool, optional

If True (the default), this job will not appear in the Civis UI.

Returns:
unload_manifest: dict

A dictionary resembling an AWS manifest file. Has the following keys:

‘query’: str

The query.

‘header’: list of str

The columns from the query.

‘entries’: list of dict

Each dict has the following keys:

‘id’: int

File ID

‘name’: str

Filename

‘size’: int

File size in bytes

‘url’: str

Unsigned S3 URL (‘s3://…’)

‘url_signed’: str

Signed S3 URL (’https://…’)

‘unquoted’: bool

Whether the cells are quoted.

‘compression’: str

Type of compression used.

‘delimiter’: str

Delimiter that separates the cells.

See also

civis.APIClient.scripts.post_sql

Examples

>>> import io
>>> import civis
>>> import pandas as pd
>>> sql = "SELECT * FROM schema.my_big_table"
>>> database = "my_database"
>>> delimiter = "|"
>>> manifest = civis.io.civis_to_multifile_csv(sql, database, delimiter=delimiter)
>>> ids = [entry['id'] for entry in manifest['entries']]
>>> for file_id in ids:
>>>     buf = io.BytesIO()
>>>     civis.io.civis_to_file(file_id, buf)
>>>     buf.seek(0)
>>>     # Process the data in `buf` for your own application, e.g.:
>>>     df = pd.read_csv(buf, delimiter=delimiter)