
Commit be44ab3

first version of data ingestion
1 parent 416d1c6 commit be44ab3

File tree

1 file changed: +127 −43 lines changed


doc/data_ingestion.rst

Lines changed: 127 additions & 43 deletions
Data Ingestion
==============

CSV files
---------

The example below shows how to import a CSV file into an Exasol database using `pyexasol` and the `import_from_file` function.
.. code-block:: python

    import pyexasol

    # Connection details
    dsn = 'your_exasol_dsn'
    user = 'your_username'
    password = 'your_password'

    # Connect to Exasol
    conn = pyexasol.connect(dsn=dsn, user=user, password=password)

    # Path to the local CSV file
    file_path = '/path/to/your/file.csv'

    # Import the CSV file into Exasol
    conn.import_from_file(file_path, 'your_schema.your_table')

For more detailed information and additional options, refer to the `pyexasol documentation <https://exasol.github.io/pyexasol/master/user_guide/http_transport.html#import-from-file>`_.
Other options to import CSV
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Exasol also provides a bulk loader to import CSV from various external sources; details can be found in the `Exasol documentation <https://docs.exasol.com/db/latest/sql/import.htm>`_.
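The bulk loader is driven by the SQL ``IMPORT`` statement, which can also be executed through `pyexasol`. The sketch below only assembles such a statement as a string so its shape is visible; the helper name, table, bucket URL, and file path are placeholders, and the option clauses (``COLUMN SEPARATOR``, ``ENCODING``, ``SKIP``) follow the Exasol ``IMPORT`` documentation linked above:

```python
# Hypothetical helper that assembles a bulk-loader IMPORT statement for a
# CSV file; all names and URLs below are placeholders, not real endpoints.
def build_csv_import(table: str, source_url: str, file_name: str,
                     column_separator: str = ";",
                     encoding: str = "UTF-8",
                     skip_rows: int = 1) -> str:
    return (
        f"IMPORT INTO {table}\n"
        f"FROM CSV AT '{source_url}'\n"
        f"FILE '{file_name}'\n"
        f"COLUMN SEPARATOR = '{column_separator}'\n"
        f"ENCODING = '{encoding}'\n"
        f"SKIP = {skip_rows}"
    )

sql = build_csv_import(
    "your_schema.your_table",
    "https://your_bucket.s3.your_region.amazonaws.com",
    "sales_2025/sales.csv",
)
print(sql)
```

With an open `pyexasol` connection, the resulting string could then be run as ``conn.execute(sql)``.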

Parquet Files
-------------

The example below shows how to read a local Parquet file into a pandas DataFrame and then insert that data into an Exasol database using `pyexasol`.
Reading the Parquet File
^^^^^^^^^^^^^^^^^^^^^^^^

Use the `pandas.read_parquet` function to read the Parquet file into a DataFrame.

.. code-block:: python

    import pandas as pd

    # Path to the local Parquet file
    file_path = 'path/to/your/file.parquet'

    # Read the Parquet file into a DataFrame
    df = pd.read_parquet(file_path)
Inserting Data into Exasol
^^^^^^^^^^^^^^^^^^^^^^^^^^

Use the `pyexasol` library to connect to the Exasol database and insert the DataFrame.

.. code-block:: python

    import pyexasol

    # Connection details
    dsn = 'your_exasol_dsn'
    user = 'your_username'
    password = 'your_password'

    # Connect to Exasol
    conn = pyexasol.connect(dsn=dsn, user=user, password=password)

    # Insert the DataFrame into Exasol
    conn.import_from_pandas(df, 'your_schema.your_table')
Other options to import Parquet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To import a Parquet file from, for example, Amazon S3 into Exasol, you can also use the Exasol Cloud Storage Extension.
Detailed instructions and examples can be found in the following `Cloud Storage Extension User Guide <https://github.com/exasol/cloud-storage-extension/blob/main/doc/user_guide/user_guide.md>`__.
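As a rough illustration, such an import is expressed with an ``IMPORT ... FROM SCRIPT`` statement, following the shape shown in the Cloud Storage Extension user guide. The script schema, bucket path, and connection name below are placeholders, and the exact property names should be checked against the guide:

```python
# Sketch of a Cloud Storage Extension import statement for Parquet data.
# CLOUD_STORAGE_EXTENSION, the s3a bucket path, and S3_CONNECTION are
# placeholders; consult the user guide for the names used in your deployment.
statement = """
IMPORT INTO your_schema.your_table
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
  BUCKET_PATH     = 's3a://your-bucket/data/parquet/*'
  DATA_FORMAT     = 'PARQUET'
  CONNECTION_NAME = 'S3_CONNECTION'
"""
print(statement.strip())
```

On an open `pyexasol` connection this would be run as ``conn.execute(statement)``.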
Loading Data from External Sources
----------------------------------

Exasol supports loading data from various external sources using the `IMPORT` statement.
You can connect to external databases via JDBC, or load data from files stored in, for example, cloud object storage, Kafka, and more.

Example: Loading Data from a JDBC Source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example of how to load data from a PostgreSQL database using JDBC:
.. code-block:: python

    import pyexasol

    # Connection details
    dsn = 'your_exasol_dsn'
    user = 'your_username'
    password = 'your_password'

    # Connect to Exasol
    conn = pyexasol.connect(dsn=dsn, user=user, password=password)

    # Define the connection to the PostgreSQL database
    conn.execute("""
        CREATE OR REPLACE CONNECTION my_pg_conn
        TO 'jdbc:postgresql://your_postgresql_host:5432/your_database'
        USER 'your_pg_username'
        IDENTIFIED BY 'your_pg_password'
    """)

    # Import data from PostgreSQL into Exasol
    conn.execute("""
        IMPORT INTO your_schema.your_table
        FROM JDBC AT my_pg_conn
        STATEMENT 'SELECT * FROM your_pg_table'
    """)
Example: Loading Data from an HTTP Source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example of how to load data from a CSV file stored on an HTTPS server:

.. code-block:: python

    import pyexasol

    # Connection details
    dsn = 'your_exasol_dsn'
    user = 'your_username'
    password = 'your_password'

    # Connect to Exasol
    conn = pyexasol.connect(dsn=dsn, user=user, password=password)

    # Import data from a CSV file on an HTTPS server
    conn.execute("""
        IMPORT INTO your_schema.your_table
        FROM CSV AT 'https://your_https_server'
        FILE 'path/to/your/file.csv'
        COLUMN SEPARATOR = ';'
        ENCODING = 'UTF-8'
        SKIP = 1
    """)

For more detailed information on loading data from external sources, please refer to the Exasol documentation:

* `Loading Data from External Sources <https://docs.exasol.com/db/latest/loading_data/load_data_from_externalsources.htm>`_.
Using Virtual Schemas
^^^^^^^^^^^^^^^^^^^^^

Virtual schemas in Exasol provide an abstraction layer that makes external data sources accessible through regular SQL commands.
This allows you to query external data as if it were stored in Exasol, without the need to physically load the data into the database.

For more information on virtual schemas and the supported dialects, please refer to the following resources:

* `Virtual Schemas User Guide <https://github.com/exasol/virtual-schemas/blob/main/doc/user_guide/dialects.md>`_.
* `Virtual Schemas Documentation <https://docs.exasol.com/db/latest/database_concepts/virtual_schemas.htm>`_.
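As a sketch of what this looks like in practice, the statements below create a virtual schema over a JDBC connection and query it like a local table. The adapter script name, connection name, and properties are placeholders and depend on the dialect you deploy; check the user guide above for the actual setup:

```python
# Sketch: SQL for creating and querying a virtual schema.
# adapter_schema.postgres_adapter, my_pg_conn, and your_pg_table are
# placeholders, not names defined by Exasol itself.
create_stmt = (
    "CREATE VIRTUAL SCHEMA pg_virtual "
    "USING adapter_schema.postgres_adapter WITH "
    "CONNECTION_NAME = 'my_pg_conn' "
    "SCHEMA_NAME = 'public'"
)
query_stmt = "SELECT * FROM pg_virtual.your_pg_table"
print(create_stmt)
print(query_stmt)
```

With an open `pyexasol` connection, these would be run as ``conn.execute(create_stmt)`` and ``conn.execute(query_stmt)``.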
