Data Ingestion
==============

CSV Files
---------

The example below shows how to import a local CSV file into an existing Exasol table using `pyexasol` and its `import_from_file` method.

.. code-block:: python

    import pyexasol

    # Connection details
    dsn = 'your_exasol_dsn'
    user = 'your_username'
    password = 'your_password'

    # Connect to Exasol
    conn = pyexasol.connect(dsn=dsn, user=user, password=password)

    # Path to the local CSV file
    file_path = '/path/to/your/file.csv'

    # Import the CSV file into an existing table
    conn.import_from_file(file_path, 'your_schema.your_table')

    # Close the connection when done
    conn.close()

For more detailed information and additional options, refer to the `pyexasol documentation <https://exasol.github.io/pyexasol/master/user_guide/http_transport.html#import-from-file>`_.

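Many CSV files start with a header row. The sketch below shows one way to handle this with `import_params`. `import_csv_with_header` is a hypothetical helper, not part of `pyexasol`. It assumes that the header names match the column names of the target table and that the file is UTF-8 encoded. The `skip`, `columns` and `column_separator` keys come from pyexasol's `import_params`; verify them against the pyexasol documentation for your version.

.. code-block:: python

    import csv


    # Hypothetical helper (not part of pyexasol)
    def import_csv_with_header(conn, file_path, table, delimiter=','):
        """Import a CSV file whose first row holds the column names.

        `conn` is assumed to be a pyexasol connection, or any object with a
        compatible `import_from_file` method.
        """
        # Read only the first row to get the column names
        with open(file_path, newline='', encoding='utf-8') as f:
            header = next(csv.reader(f, delimiter=delimiter))

        import_params = {
            'skip': 1,                      # do not load the header row itself
            'columns': header,              # load fields into these table columns
            'column_separator': delimiter,  # must match the file's separator
        }
        conn.import_from_file(file_path, table, import_params=import_params)
        return header

With `conn` created as in the example above, call `import_csv_with_header(conn, '/path/to/your/file.csv', 'your_schema.your_table')`.
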
Other options to import CSV
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exasol also provides a bulk loader, the `IMPORT` statement, for importing CSV files from various external sources. Details can be found in the `Exasol documentation <https://docs.exasol.com/db/latest/sql/import.htm>`_.

Parquet Files
-------------

The example below shows how to read a local Parquet file into a pandas DataFrame and then insert that data into an Exasol database using `pyexasol`.

Reading the Parquet File
^^^^^^^^^^^^^^^^^^^^^^^^
Use the `pandas.read_parquet` function to read the Parquet file into a DataFrame. This requires a Parquet engine such as `pyarrow` to be installed.

.. code-block:: python

    import pandas as pd

    # Path to the local Parquet file
    file_path = '/path/to/your/file.parquet'

    # Read the Parquet file into a DataFrame
    df = pd.read_parquet(file_path)

Inserting Data into Exasol
^^^^^^^^^^^^^^^^^^^^^^^^^^
Use `pyexasol` to connect to the Exasol database and insert the DataFrame with `import_from_pandas`. As with `import_from_file`, the target table must already exist.

.. code-block:: python

    import pyexasol

    # Connection details
    dsn = 'your_exasol_dsn'
    user = 'your_username'
    password = 'your_password'

    # Connect to Exasol
    conn = pyexasol.connect(dsn=dsn, user=user, password=password)

    # Insert the DataFrame into an existing table
    conn.import_from_pandas(df, 'your_schema.your_table')

    # Close the connection when done
    conn.close()

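`import_from_pandas` does not create the target table. If the table does not exist yet, you could derive a `CREATE TABLE` statement from the DataFrame's dtypes. The sketch below uses a hypothetical helper, `create_table_sql`, and an assumed mapping from pandas dtype names to Exasol column types. Adjust the types to your data, and check that the values pandas writes (for example booleans and timestamps) load as expected. Column names are used unquoted, so they must be valid Exasol identifiers.

.. code-block:: python

    # Assumed mapping from pandas dtype names to Exasol column types;
    # adjust sizes and precision to your data.
    DTYPE_TO_EXASOL = {
        'int64': 'DECIMAL(19,0)',       # wide enough for any 64-bit integer
        'int32': 'DECIMAL(10,0)',       # wide enough for any 32-bit integer
        'float64': 'DOUBLE',
        'bool': 'BOOLEAN',
        'datetime64[ns]': 'TIMESTAMP',
        'object': 'VARCHAR(2000000)',   # strings and mixed values
    }


    # Hypothetical helper (not part of pyexasol or pandas)
    def create_table_sql(table, dtypes):
        """Build a CREATE TABLE statement from a {column: dtype_name} mapping.

        `dtypes` can be obtained from a DataFrame with
        `df.dtypes.astype(str).to_dict()`.
        """
        columns = []
        for name, dtype in dtypes.items():
            # Unknown dtypes fall back to a wide VARCHAR column
            exasol_type = DTYPE_TO_EXASOL.get(dtype, 'VARCHAR(2000000)')
            columns.append(f'    {name} {exasol_type}')
        return f'CREATE TABLE {table} (\n' + ',\n'.join(columns) + '\n)'

For example, run `conn.execute(create_table_sql('your_schema.your_table', df.dtypes.astype(str).to_dict()))` before calling `import_from_pandas`.
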
Other options to import Parquet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To import Parquet files from cloud storage such as Amazon S3 into Exasol, you can also use the Exasol Cloud Storage Extension.
Detailed instructions and examples can be found in the `Cloud Storage Extension User Guide <https://github.com/exasol/cloud-storage-extension/blob/main/doc/user_guide/user_guide.md>`__.

Loading Data from External Sources
----------------------------------
Exasol supports loading data from various external sources using the `IMPORT` statement.
You can connect to external databases via JDBC, or load data from other sources such as cloud object storage and Kafka.

Example: Loading Data from a JDBC Source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of how to load data from a PostgreSQL database via JDBC. It assumes that a PostgreSQL JDBC driver has been installed for your Exasol database.

.. code-block:: python

    import pyexasol

    # Connection details
    dsn = 'your_exasol_dsn'
    user = 'your_username'
    password = 'your_password'

    # Connect to Exasol
    conn = pyexasol.connect(dsn=dsn, user=user, password=password)

    # Define the connection to the PostgreSQL database
    conn.execute("""
        CREATE OR REPLACE CONNECTION my_pg_conn
        TO 'jdbc:postgresql://your_postgresql_host:5432/your_database'
        USER 'your_pg_username'
        IDENTIFIED BY 'your_pg_password'
    """)

    # Import data from PostgreSQL into Exasol
    conn.execute("""
        IMPORT INTO your_schema.your_table
        FROM JDBC AT my_pg_conn
        STATEMENT 'SELECT * FROM your_pg_table'
    """)

    # Close the connection when done
    conn.close()

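The `STATEMENT` clause takes the source query as an SQL string literal, so single quotes inside the query (for example in a `WHERE` filter) have to be doubled. The hypothetical helper below, `jdbc_import_sql`, builds the `IMPORT` statement and escapes only the source query. The table, connection and query names are placeholders; the target table and connection name are inserted as-is, so they must come from trusted input.

.. code-block:: python

    def sql_string_literal(value):
        """Quote a value as an SQL string literal, doubling embedded single quotes."""
        return "'" + value.replace("'", "''") + "'"


    # Hypothetical helper (not part of pyexasol)
    def jdbc_import_sql(target_table, connection_name, source_query):
        """Build an IMPORT ... FROM JDBC statement.

        `target_table` and `connection_name` are inserted unescaped, so they
        must be trusted identifiers; only `source_query` is escaped.
        """
        return (
            f"IMPORT INTO {target_table}\n"
            f"FROM JDBC AT {connection_name}\n"
            f"STATEMENT {sql_string_literal(source_query)}"
        )

The result can be passed to `conn.execute()` in place of the hand-written `IMPORT` statement, for example `conn.execute(jdbc_import_sql('your_schema.your_table', 'my_pg_conn', "SELECT * FROM your_pg_table WHERE status = 'open'"))`.
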
Example: Loading Data from an HTTP Source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of how to load data from a CSV file stored on an HTTPS server:

.. code-block:: python

    import pyexasol

    # Connection details
    dsn = 'your_exasol_dsn'
    user = 'your_username'
    password = 'your_password'

    # Connect to Exasol
    conn = pyexasol.connect(dsn=dsn, user=user, password=password)

    # Import data from a CSV file on an HTTPS server.
    # AT takes the base URL, FILE the file name relative to it;
    # NULL = 'NULL' treats the literal string NULL as a NULL value.
    conn.execute("""
        IMPORT INTO your_schema.your_table
        FROM CSV AT 'https://your_https_server/path/to/your'
        FILE 'file.csv'
        ENCODING = 'UTF-8'
        COLUMN SEPARATOR = ';'
        SKIP = 1
        NULL = 'NULL'
    """)

    # Close the connection when done
    conn.close()

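If the file name or the file options change between loads (for example, one file per day), you might generate the statement in Python rather than edit the SQL by hand. The sketch below is a hypothetical helper, `csv_import_sql`, not part of `pyexasol`. It assumes the server needs no credentials and covers only the `ENCODING`, `COLUMN SEPARATOR` and `SKIP` file options. It splits the location into the base URL after `AT` and the file name after `FILE`.

.. code-block:: python

    # Hypothetical helper (not part of pyexasol)
    def csv_import_sql(target_table, base_url, file_name, column_separator=',',
                       skip=0, encoding='UTF-8'):
        """Build an IMPORT ... FROM CSV AT <url> FILE <name> statement.

        String values are emitted as SQL string literals with embedded single
        quotes doubled; `target_table` is inserted as-is and must be trusted.
        """
        def lit(value):
            return "'" + str(value).replace("'", "''") + "'"

        lines = [
            f"IMPORT INTO {target_table}",
            f"FROM CSV AT {lit(base_url)}",
            f"FILE {lit(file_name)}",
            f"ENCODING = {lit(encoding)}",
            f"COLUMN SEPARATOR = {lit(column_separator)}",
        ]
        if skip:
            # Number of leading rows (such as a header) to ignore
            lines.append(f"SKIP = {int(skip)}")
        return "\n".join(lines)

Pass the result to `conn.execute()`, for example `conn.execute(csv_import_sql('your_schema.your_table', 'https://your_https_server/path/to/your', 'file.csv', column_separator=';', skip=1))`.
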
For more detailed information on loading data from external sources, please refer to the Exasol documentation:

* `Loading Data from External Sources <https://docs.exasol.com/db/latest/loading_data/load_data_from_externalsources.htm>`_.

Using Virtual Schemas
^^^^^^^^^^^^^^^^^^^^^
Virtual schemas in Exasol provide an abstraction layer that makes external data sources accessible through regular SQL commands.
This allows you to query external data as if it were stored in Exasol, without physically loading the data into the database.

For more information on virtual schemas and the supported dialects, please refer to the following resources:

* `Virtual Schemas User Guide <https://github.com/exasol/virtual-schemas/blob/main/doc/user_guide/dialects.md>`_.
* `Virtual Schemas Documentation <https://docs.exasol.com/db/latest/database_concepts/virtual_schemas.htm>`_.