version 0.3
To load data from your files into your DB, first run a test spec from the test config file `dput-test-<source>.py` against your DB. Then `dput` will create the tables `ida` and `ida_lines` in the DB, and these tables will be used to load data.
The `dput` utility loads data from CSV, XLSX, JSON and other text files according to a spec in a config file. Data may be loaded from a single file or from a series of files. If input files are zipped, they are automatically unzipped before loading.
- Test Files to Load
- Basic Usage
- Loading Procedure
- Loading Data into the Default Table
- Loading Selected Rows and Fields
- Loading Data from JSON file
- Loading Data into User-Defined Table
- Loading Data with Unpacking Nested List
- Loading Data into Multiple User-Defined Tables
- Loading Nested Tables into the Default Table
- Loading Special Text Files
- Three Types of `insert_data` Functions
- Command Line Arguments
- Config File Parameters
- Spec Parameters
Files for test loads are found in the `in` directory. Let's look at them to better understand the explanations that follow.
The files named `test` with extensions `.xlsx`, `.csv`, `.json` and `.dat` contain the list of countries with 4 fields:
- country name,
- 2-symbol country code,
- 3-symbol country code,
- numeric country code.
The file series with names `test_000001`, `test_000002`, `test_000003` and extensions `.csv`, `.xlsx` or `.json` contain the same list of countries divided into 3 parts.
Here are the first five and the last five lines from file `test.csv`:
Afghanistan;AF;AFG;004
Aland Islands;AX;ALA;248
Albania;AL;ALB;008
Algeria;DZ;DZA;012
American Samoa;AS;ASM;016
...
Wallis and Futuna Islands;WF;WLF;876
Western Sahara;EH;ESH;732
Yemen;YE;YEM;887
Zambia;ZM;ZMB;894
Zimbabwe;ZW;ZWE;716
The first five and the last five lines from file `test.json`:
[
{"name": "Afghanistan", "alpha2": "AF", "alpha3": "AFG", "code": "004"},
{"alpha2": "AX", "alpha3": "ALA", "code": "248", "name": "Aland Islands"},
{"alpha3": "ALB", "code": "008", "name": "Albania", "alpha2": "AL"},
{"code": "012", "alpha2": "DZ", "name": "Algeria", "alpha3": "DZA"},
{"name": "American Samoa", "code": "016", "alpha2": "AS", "alpha3": "ASM"},
...
{"name": "Wallis and Futuna Islands", "alpha2": "WF", "alpha3": "WLF", "code": "876"},
{"alpha2": "EH", "name": "Western Sahara", "alpha3": "ESH", "code": "732"},
{"alpha2": "YE", "alpha3": "YEM", "name": "Yemen", "code": "887"},
{"alpha2": "ZM", "alpha3": "ZMB", "code": "894", "name": "Zambia"},
{"code": "716", "name": "Zimbabwe", "alpha2": "ZW", "alpha3": "ZWE"}
]
Notice that fields of JSON objects have no particular order.
Here are example specs for SQLite to load data from the test data files into table `ida_lines`. The `"file"` parameter sets the name of a file (found in the `IN_DIR` directory) and its format:
# hello-dput.py
IN_DIR = os.path.join(os.path.dirname(__file__), '..', 'in')
SOURCE = sources['sqlite-source']
specs = {
"hello-csv": {
"file": "test.csv"
},
"hello-xlsx": {
"file": "test.xlsx"
},
"hello-json": {
"file": "test.json",
"insert_data": lambda row: (row["code"], row["name"], row["alpha2"], row["alpha3"])
},
"hello-zip": {
"file": "test_zip.zip"
},
"hello-series": {
"file": "test_000???.csv",
"insert_data": lambda row: (row[3], row[0], row[1])
},
...
}
The "file"
parameter in spec "hello-zip"
tells dput
to first unzip the file and then load it. A zip
archive should contain a single file with the same name as zip
-file and one of extensions .csv
, .xlsx
and .json
.
The "file"
parameter in spec "hello-series"
sets glob-pattern for a series of files to load. The matched files will be loaded in alphabetical order.
The "insert_data"
parameter in spec "hello-json"
defines function to transform JSON object into data row – a list or tuple of values – to be inserted into a database table. The function's input argument is a Python dict
representing a JSON object.
The "insert_data"
parameter in spec "hello-series"
demonstrates the possibility of selecting and rearranging fields of CSV-file. The function's input argument is a list or tuple of values of CSV-file fields.
Data from `xlsx`, `csv` and other text files are loaded line by line, in the order of lines in the file; data from a `json` file is loaded object by object, in the order of the JSON objects in the JSON array.
The procedure of loading data with `dput` optionally includes:
1. checking whether a line meets a certain condition to be loaded;
2. building a list of fields to insert into a database table – when data comes from a `json` file or when only some of the fields are needed;
3. checking whether all the rows inserted into a database table meet certain conditions, including consistency with existing data;
4. copying (and transforming) rows from an interface table into the target database tables where the data belongs.
If rows loaded into the interface table do not meet integrity or consistency requirements, they will not be loaded into the target database tables and the user will get error message(s).
That said, the simplest spec for `dput`
- contains only the name of the file to load,
- does not invoke steps 1–4 described above,
- allows loading all fields of all rows from input files into the default interface table `ida_lines` (see the minimal sketch below).
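In its entirety, such a minimal spec amounts to nothing more than a file name – the `"hello-csv"` spec from the example above is exactly this case:

"hello-csv": {
    "file": "test.csv"
}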
Tables `ida` and `ida_lines` are automatically created in a database when executing specs from the test config files and are afterwards used by default for all user-defined specifications. Here is their structure (in a PostgreSQL DB):
create table if not exists ida (
iload serial not null,
idate timestamptz not null default now(),
istat smallint not null default 0,
imess varchar(4000),
entity varchar(50) not null,
ifile varchar(256) not null,
iuser varchar(30),
arg1 varchar(4000),
arg2 varchar(4000),
...
arg9 varchar(4000),
primary key (iload)
);
create table if not exists ida_lines (
iload int not null,
iline int not null,
ntable smallint not null default -1,
nline int not null default -1,
istat smallint not null default 0,
ierrm varchar(4000),
c1 varchar(4000),
c2 varchar(4000),
...
...
...
c100 varchar(4000),
primary key (iload, iline, ntable, nline),
foreign key (iload) references ida (iload) on delete cascade
);
Table `ida` keeps the facts of loads, in particular:
- load identifier `iload`,
- load time `idate`,
- load status `istat`,
- name of the loaded file `ifile`,
- name of the spec `entity`,
- optional message `imess` written on load completion.
Table `ida_lines` keeps all the lines from the file, and also
- load identifier `iload`,
- line number `iline`,
- "nested table" number `ntable`,
- "nested table" line number `nline`,
- line status `istat`, and
- error message `ierrm` – in case an error occurred.
In order to load data into table(s) other than `ida_lines`, you have to explicitly provide an SQL `insert` statement (or a stored procedure call) in the spec. See further details in Loading Data into User-Defined Table.
Data loaded into the DB via the interface tables `ida` and `ida_lines` is not deleted from these tables right away. The config file parameter `PRESERVE_N_LOADS = 10` tells `dput` to keep the 10 last loads for each spec in the interface tables. When a user runs the 11th load, the oldest of the previous 10 loads is deleted. If you set `PRESERVE_N_LOADS` to 0, then `dput` deletes the data from the interface tables right after executing a spec.
Keeping loaded data in the interface tables for a while may prove helpful when you need to manually check what data was loaded by users or what errors were detected.
Test config files `dput-test-<source>.py` contain comments on all the spec parameters. Read them carefully to familiarize yourself with all the parameters.
Let's see how `dput` works – using test file `test.csv` and specs from the config file `dput-test-postgres.py`:
specs = {
...
"csv_ida_test": {
"source": "postgres-source",
"file": "test.csv",
"validate_actions": [
"""
update ida_lines set
istat = 2,
ierrm = trim(ierrm || ' Empty field.')
where iload = %s
and (c1 is null or c2 is null or c3 is null or c4 is null)
""",
"""
update ida_lines set
istat = 2,
ierrm = trim(ierrm || ' Not ALPHA2 code.')
where iload = %s
and length(c2) != 2
""",
"""
update ida_lines set
istat = 2,
ierrm = trim(ierrm || ' Not ALPHA3 code.')
where iload = %s
and length(c3) != 3
""",
],
"process_actions": [
# just teardown
"delete from ida where iload = %s"
]
},
...
}
According to the above spec, the `dput` utility
1. loads data from file `test.csv` into table `ida_lines` in the database specified by the `"source"` parameter, and inserts into table `ida` a row with the load metadata;
2. executes the SQL statements from the `"validate_actions"` list to verify that the rows loaded into `ida_lines` comply with the requirements;
3. if the verification succeeds, executes the SQL statements from the `"process_actions"` list (these might be calls to stored procedures) to copy data from `ida_lines` into the target tables; otherwise, writes error messages to the log file.
When loading data into table `ida_lines`, the values of the four fields will be put into columns `c1`, `c2`, `c3` and `c4`, respectively. As table `ida_lines` has 100 columns `c1` .. `c100`, the number of fields in a file to load may not be greater than 100.
The status `istat` of the lines right after loading (step 1) equals 0 - Waiting to be processed.
List "validate_actions"
contains SQL statement to verify data compliance to the requirements. These statements get executed by dput
(step 2) after the lines are loaded into table ida_lines
.
The three `update` statements in the spec above check, respectively:
- that all 4 fields are not empty,
- that the length of 2-symbol country code is 2,
- that the length of 3-symbol country code is 3.
If the data does not meet the requirements, the line in `ida_lines` gets marked as erroneous (`istat = 2`) and an error message is put into the field `ierrm`. In this case processing stops after executing the `"validate_actions"`, and `dput` writes the bad line numbers and error messages to the log file.
If "validate_actions"
do not detect errors then dput
executes SQL statements from the list "process_actions"
(step 3) and inserts the verified data from ida_lines
into target tables. In the test spec the list "process_actions"
contains a single statement that deletes loaded data from tables ida
and ida_lines
. Thus, instead of inserting data into target tables, test data is just deleted from the DB.
The SQL statements in the `"validate_actions"` and `"process_actions"` lists contain a bind variable for the load identifier `iload`. While bind variables for PostgreSQL are designated with `%s`, bind variables for other DBs may be designated differently, depending on the Python DB-API module for the database. See the test config files for the other databases: `dput-test-sqlite.py`, `dput-test-oracle.py`, `dput-test-mysql.py`.
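For instance, here is a sketch of the first validation statement rewritten for SQLite, whose `sqlite3` DB-API module designates bind variables with `?` (see `dput-test-sqlite.py` for the actual statements):

"validate_actions": [
    """
    update ida_lines set
        istat = 2,
        ierrm = trim(ierrm || ' Empty field.')
    where iload = ?
        and (c1 is null or c2 is null or c3 is null or c4 is null)
    """
]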
If at least one row in `ida_lines` was marked as erroneous, then `dput` sets the load status `ida.istat` to 2 - Error. If all the loaded rows were processed successfully, then `dput` sets the load status `ida.istat` to 1 - Processing succeeded.
The optional spec parameter `"skip_lines"` specifies the number of lines to skip at the beginning of an input file. For example, the value `1` makes `dput` skip the first line (presumably a header) when loading `csv` or `xlsx` files.
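For instance, a sketch of a spec (the spec and file names are hypothetical) for a CSV file that, unlike `test.csv`, starts with a header line:

"csv_with_header": {
    "file": "countries_with_header.csv",   # hypothetical file whose first line is a header
    "skip_lines": 1
}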
The optional spec parameter `"insert_data"` defines a function that allows you to
- load only explicitly chosen fields,
- load only rows that meet a certain condition.
The `"insert_data"` parameter sets either a lambda function or a regular Python function defined in the same config file.
Specification "selected_ida_test"
below demonstrates just such a selective loading:
"selected_ida_test": {
"file": "test.csv",
#
# tuple of values to insert into ida_lines table
#
"insert_data": lambda row: (row[3], row[0]) if row[0][0] in 'AEIOU' else None
}
The function set with the `"insert_data"` parameter
- is called for each line of the input file,
- gets as its argument a list or a tuple of all the fields' values of a line,
- returns a list or a tuple of the fields' values that should be loaded,
- or returns `None`, in which case the line is skipped and not loaded.
The lambda function in the above spec returns a tuple `(<numeric country code>, <country name>)` for lines where the country name starts with a vowel, and `None` for other lines. Thus, only some fields of some lines are going to be loaded.
The "insert_data"
parameter is mandatory in a spec that loads data from a JSON file, even in case when rows are loaded into the default table ida_lines
. The thing is that the fields of JSON objects – or JSON file "rows" – could be put in any order (see Test Files to Load).
An example specification to load data from `test.json` into table `ida_lines`:
"json_ida_test": {
"file": "test.json",
#
# tuple of values to insert into ida_lines table
#
"insert_data": lambda row: (row["code"], row["name"], row["alpha2"], row["alpha3"])
}
Instead of loading data rows into table `ida_lines`, it is possible to load them into another table. To make `dput` load data into such a table, you should set the spec parameters `"insert_actions"` and `"insert_data"`.
Suppose you have an interface table `test` designed to accept data on countries:
create table if not exists test (
code varchar(3) not null,
name varchar(50) not null,
alpha2 char(2),
alpha3 char(3)
)
The following spec allows loading data from file `test.csv` into table `test`:
"csv_test_test": {
"file": "test.csv",
"insert_actions": """
insert into test (code, name, alpha2, alpha3)
values (%s, %s, %s, %s)
""",
"insert_data": lambda row: (row[3], row[0], row[1], row[2])
}
Spec parameter "insert_actions"
defines SQL statement insert
to insert a row into table test
. As the order of columns in insert
statement differs from the order of fields in the CSV-file (see Test Files to Load), the "insert_data"
parameter defines a Python lambda-function which rearranges the fields in the order required by the SQL statement.
If you specify the columns in the `insert` statement in the order of the fields in the CSV file, then the `"insert_data"` parameter may be omitted:
"csv_test_test": {
"file": "test.csv",
insert_statement": """
insert into test (name, alpha2, alpha3, code)
values (%s, %s, %s, %s)
"""
}
In this case, it is important that each of the four fields of a CSV file line has a corresponding column in table `test`. The `"insert_data"` lambda function, on the other hand, allows choosing (and rearranging) any subset of the fields for insertion.
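For example, here is a sketch of a spec (with a hypothetical name) that loads only the numeric code and the country name into table `test`; it works because columns `alpha2` and `alpha3` are nullable:

"csv_test_partial": {
    "file": "test.csv",
    "insert_actions": """
        insert into test (code, name)
        values (%s, %s)
    """,
    "insert_data": lambda row: (row[3], row[0])
}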
Even when loading data into a user-defined table specified in `"insert_actions"`, the `dput` utility registers the load in the `ida` table with status 0 - Waiting to be processed.
If the SQL statements in the `"validate_actions"` and `"process_actions"` lists set the load status in column `ida.istat` to 2 - Error and put an error message in column `ida.imess`, then `dput` writes that error message to the log file. Otherwise, the load status is set to 1 - Processing succeeded, and a success message is written to the log file.
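As a hedged illustration (not taken from the test config files), a `"validate_actions"` statement for PostgreSQL could fail the whole load like this:

"validate_actions": [
    """
    update ida set
        istat = 2,
        imess = 'No rows were loaded into table test.'
    where iload = %s
        and not exists (select 1 from test)
    """
]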
Each line of the test data file `test_nested_00.csv` has two fields:
- Floor number,
- List of stores on the floor.
Here is the file content:
1;ABC,Bonus,Cosmos,Domus,Eidos
2;Focus,Iris
3;Lotus
4;
The second field is actually a nested data structure. This list should be unnested, or unpacked, in order to load rows without any nested structures into a database table. To achieve this, the `"insert_data"` parameter of spec `"nested_00_ida"` defines a function that takes a tuple as input and returns a list of tuples, each representing a row to load:
"nested_00_ida": {
"file": "test_nested_00.csv",
"insert_data": \
lambda row: [
(row[0], n) for n in row[1].split(',')
] if row[1] else []
}
Here is the result of loading data into table `ida_lines` with spec `nested_00_ida`:
|iload|iline|ntable|nline|c1 |c2 |
|-----|-----|------|-----|---|------|
|42 |1 |0 |1 |1 |ABC |
|42 |1 |0 |2 |1 |Bonus |
|42 |1 |0 |3 |1 |Cosmos|
|42 |1 |0 |4 |1 |Domus |
|42 |1 |0 |5 |1 |Eidos |
|42 |2 |0 |1 |2 |Focus |
|42 |2 |0 |2 |2 |Iris |
|42 |3 |0 |1 |3 |Lotus |
Notice that
- column `iline` contains the line number from the input file,
- column `ntable` contains the number of the unpacked nested structure (or "nested table"),
- column `nline` contains the element number of the unpacked structure (or the row number of the "nested table").
Files of JSON format allow representing nested data structures that map onto more than one table in a relational database.
For example, test data file `test_nested_01.json` contains seven JSON objects with two fields:
- `"region"` – name of a world region or continent,
- `"countries"` – array of JSON objects representing the countries of the region.
Here are the first two JSON objects in the file:
{
"region": "Nowhere",
"countries": []
},
{
"region": "Antarctica",
"countries": [
{"name": "Antarctica", "alpha2": "AQ", "alpha3": "ATA", "code": "010"},
{"name": "Bouvet Island", "alpha2": "BV", "alpha3": "BVT", "code": "074"}
]
},
{
"region": "Africa",
"countries": [
{"name": "Comoros", "alpha2": "KM", "alpha3": "COM", "code": "174"},
{"name": "Djibouti", "alpha2": "DJ", "alpha3": "DJI", "code": "262"},
...
The data in the file naturally maps onto two tables:
- (parent) table of regions, and
- (child) table of countries referring to the table of regions.
In order to load data of a single line (JSON object) into multiple tables, you need to provide an `insert` statement and the values to insert for each table:
"nested_01_test": {
"file": "test_nested_01.json",
"insert_actions": [
"insert into test_region (region, contains) values (?, ?)",
"insert into test_countries (region, code, name) values (?, ?, ?)"
],
"insert_data": [
lambda row: (row['region'], len(row['countries'])),
lambda row: [
(row['region'], n['code'], n['name']) \
for n in row['countries']
] if row['countries'] else []
]
}
Here, the spec parameters `"insert_actions"` and `"insert_data"` are a list of strings and a list of functions, respectively. Each `insert` statement in the `"insert_actions"` list has a corresponding function in the `"insert_data"` list.
Notice that the first function returns a tuple representing a row for the (parent) table of regions, while the second function returns a list of tuples representing a set of rows for the (child) table of countries.
Loading data of a single input line into multiple tables may be useful not only for JSON files. `csv` and `xlsx` files, too, may contain in a single line data that maps onto more than one table. Data in such files might be called denormalized – from the relational point of view.
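For example, the file `test_nested_00.csv` shown above could be loaded into two hypothetical tables `test_floor (floor)` and `test_store (floor, store)` – the table names and columns are assumptions for illustration – with a spec along these lines (bind variable style as for SQLite):

"nested_00_test": {
    "file": "test_nested_00.csv",
    "insert_actions": [
        "insert into test_floor (floor) values (?)",
        "insert into test_store (floor, store) values (?, ?)"
    ],
    "insert_data": [
        lambda row: (row[0],),
        lambda row: [
            (row[0], store) for store in row[1].split(',')
        ] if row[1] else []
    ]
}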
In order to load nested structures into the default interface table `ida_lines`, you should assign the spec parameter `"insert_data"` a list of functions, one per table: the parent one and the child ones.
Let's load the above-described file `test_nested_01.json` into table `ida_lines`:
"nested_01_ida": {
"file": "test_nested_01.json",
"insert_data": [
lambda row: (row['region'], len(row['countries'])),
lambda row: [
(n['code'], n['name'], n['alpha2'], n['alpha3']) \
for n in row['countries']
] if row['countries'] else []
]
}
The first function in the list produces a row for parent table, the second – a set of rows for "nested table" №1, the third (if it were defined) – a set of rows for "nested table" №2, etc.
Here are the first 10 rows inserted into table `ida_lines` with spec `nested_01_ida`:
|iload|iline|ntable|nline|c1 |c2 |c3 |c4 |
|-----|-----|------|-----|----------|-----------------|---|---|
|41 |1 |-1 |-1 |Nowhere |0 | | |
|41 |2 |-1 |-1 |Antarctica|2 | | |
|41 |2 |1 |1 |010 |Antarctica |AQ |ATA|
|41 |2 |1 |2 |074 |Bouvet Island |BV |BVT|
|41 |3 |-1 |-1 |Africa |59 | | |
|41 |3 |1 |1 |174 |Comoros |KM |COM|
|41 |3 |1 |2 |262 |Djibouti |DJ |DJI|
|41 |3 |1 |3 |226 |Equatorial Guinea|GQ |GNQ|
|41 |3 |1 |4 |232 |Eritrea |ER |ERI|
|41 |3 |1 |5 |324 |Guinea |GN |GIN|
Notice that
- column `iline` contains the line number from the input file,
- column `ntable` contains the number of the unpacked nested structure ("nested table"), or `-1` for the parent table,
- column `nline` contains the element number of the unpacked structure (the row number of the "nested table"), or `-1` for the parent table.
Sometimes you need to load into a database data represented in text files of a special format, for example:
- a file where a few lines at the beginning describe the data that follows, or
- a file where lines have fixed-length fields without any delimiters.
Spec parameter "pass_lines": True
tells dput
to pass lines from input text file "as is" to function specified in spec parameter "insert_data"
. Lines are passed as Python strings without any attempted splitting into fields.
Then, the "insert_data"
function parses the string and returns a list or a tuple of fields' values to load into a database table.
Test data file `test_special.dat` contains data on world countries in lines of the following format:
- 2-letter country code – positions from 1 to 2,
- 3-letter country code – positions from 3 to 5,
- 3-digit country code – positions from 6 to 8,
- country name – positions from 9 to the end of line.
Here are the first five and the last five lines from the file:
AFAFG004Afghanistan
AXALA248Aland Islands
ALALB008Albania
DZDZA012Algeria
ASASM016American Samoa
...
WFWLF876Wallis and Futuna Islands
EHESH732Western Sahara
YEYEM887Yemen
ZMZMB894Zambia
ZWZWE716Zimbabwe
Spec "special_03_ida"
allows loading into table ida_lines
four separate fields extracted from input lines with function special_03_data
(it as well might be a lambda-function):
...
def special_03_ida(line):
"""
line - line content, e.g. AFAFG004Afghanistan
"""
return (line[0:2], line[2:5], line[5:8], line[8:].strip())
specs = {
...
"special_03_ida": {
"file": "test_special.dat",
"pass_lines": True,
"insert_data": special_03_ida
},
...
}
Spec parameter "insert_data"
may set functions of three kinds:
Function kind | Description |
---|---|
`<func>(row)` | `row` - a list or tuple of fields' values, or the line from the input file as is if `"pass_lines": True` |
`<func>(line_no, row)` | `line_no` - the number of the line in the file, `row` - a list or tuple of fields' values, or the line from the input file as is if `"pass_lines": True` |
`<func>(iload, iline, row)` | `iload` - the load ID, `iline` - the number of the loaded line, `row` - a list or tuple of fields' values, or the line from the input file as is if `"pass_lines": True` |
When loading data from a single file, the file line number `line_no` and the loaded line number `iline` are the same. However, when loading data from a series of files, the file line number `line_no` resets with each new file, while the loaded line number `iline` keeps incrementing through all the lines of all the files in the series.
Functions `<func>(line_no, row)` allow processing text files where the first few lines contain metadata describing the data that follows. See spec `"special_01_ida"` in the test config files.
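As a rough sketch (not the actual `"special_01_ida"` spec), such a function might skip what it assumes to be the first two metadata lines of every file and load the remaining lines unchanged:

def skip_metadata(line_no, row):
    # assumption for illustration: the first two lines of each file are metadata
    if line_no <= 2:
        return None
    return row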
Functions `<func>(iload, iline, row)` allow loading data into multiple logically linked tables (i.e. a parent table and child ones) by generating unique values for the parent table's primary key and the child tables' foreign key columns. Unique values are generated by combining the `iload` and `iline` values. See spec `"nested_01_keygen"` in the test config files.
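Here is a rough sketch (not the actual `"nested_01_keygen"` spec) of how such functions might derive a surrogate key shared by a parent row and its child rows; the key scheme and table columns are assumptions for illustration:

def keygen_region(iload, iline, row):
    # combine iload and iline into a key that is unique across loads and lines
    region_id = iload * 1000000 + iline
    return (region_id, row['region'])

def keygen_countries(iload, iline, row):
    # the child rows reuse the same key as a reference to their parent row
    region_id = iload * 1000000 + iline
    return [
        (region_id, n['code'], n['name']) for n in row['countries']
    ] if row['countries'] else []

These two functions would then be listed in `"insert_data"` and paired with two `insert` statements in `"insert_actions"` – one for the parent table and one for the child table.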
$ ./dput.py -h
usage: dput.py [-h] [-a ARG] [-d] [-f] [-t] [-u USER] [-v] cfg_file spec [in_file]
Load data from file into DB, as specified in cfg-file spec.
positional arguments:
cfg_file cfg-file name
spec spec name
in_file input file name
options:
-h, --help show this help message and exit
-a ARG, --arg ARG pass one or more arguments to SQL query
-d, --delete delete loaded data file(s)
-f, --force load data file(s) unconditionally
-t, --trace enable tracing
-u USER, --user USER set data loader username
-v, --version show program's version number and exit
Thanks for using dput.py!
The mandatory arguments are the config file name `cfg_file` and the spec name `spec`.
To execute all specs from the config file, use the keyword `all` instead of a spec name.
If `in_file` is provided, then the data will be loaded from the specified file instead of the one defined by the spec parameter `"file"`. The input data format is determined by the file name extension – the same as with the spec parameter `"file"`.
Option `-a` or `--arg` allows loading 1 to 9 arguments into the row of table `ida`, which keeps track of all loads done with `dput`. The loaded arguments are available to the SQL statements defined in the spec parameters `"validate_actions"` and `"process_actions"`. To make it possible to load arguments into table `ida`, you have to define a list of default values for the arguments with the spec parameter `"args"`.
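For example, a sketch of a spec (with a hypothetical name and values) that declares two arguments with default values; the values passed with `-a` presumably end up in `ida.arg1` and `ida.arg2` and are thus visible to the `"validate_actions"` and `"process_actions"` statements:

"csv_ida_args": {
    "file": "test.csv",
    "args": ["EUR", "2024"]    # default values for the first two arguments
}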
Option `-d` or `--delete` instructs the `dput` utility to delete the input file(s) after executing the spec. An input file will not be deleted if errors occur while executing the spec.
Option `-f` or `--force` instructs the `dput` utility to load data from files even when the data from these files has already been loaded. The `dput` utility registers the paths and modification times of loaded files in a special file `~/.<cfg-file>.json` and by default loads only file(s) that have not been loaded earlier.
Option `-u` or `--user` allows setting the username that will be loaded into the row of table `ida`.
Option `-t` or `--trace` instructs `dput` to create a trace file in the directory `~/.dbang`. This is an empty file named `dput#<spec>#<user>#<timestamp>#<status>`, where `<status>` is either 0 - running, 1 - succeeded, or 2 - failed.
Config file parameters are variables with names in uppercase that define context for executing specs from that config file. See also Config Files Structure.
The `dput` config file parameters are described below.
Parameter | Default Value | Description |
---|---|---|
`DEBUGGING` | `False` | Debugging mode? |
`LOGGING` | = `DEBUGGING` | Write to log file? |
`LOG_DIR` | `./` | Path to the directory with log files. |
`IN_DIR` | `./` | Path to the directory with input files to load. |
`ENCODING` * | `locale.getpreferredencoding()` | Input file(s) encoding. |
`CSV_DIALECT` * | `excel` | CSV dialect as defined in Python module `csv`. |
`CSV_DELIMITER` * | `csv.get_dialect(CSV_DIALECT).delimiter` | CSV fields delimiter. |
`PRESERVE_N_LOADS` | `10` | Number of loads per spec to preserve in table `ida`. |
`PRESERVE_N_TRACES` | `10` | Number of trace files per spec to preserve. |
`SOURCE` * | | Name of a data source defined in `sources.py`. |

\* A config file parameter marked with an asterisk may be overridden at spec level with the corresponding spec parameter.
Specs are found in a config file in the `specs` dictionary and contain spec parameters. See also Config Files Structure.
Spec parameters for the `dput` utility are described below. If not explicitly described as mandatory, a spec parameter is optional and may be omitted.
Spec Parameter | Description |
---|---|
`"tags"` | List of tags attached to the spec. |
`"source"` | Name of a data source defined in `sources.py`. This parameter overrides config file parameter `SOURCE`. |
`"doc"` | Short description/comment on the spec. |
`"setup"` | List of SQL statements to be executed at spec startup. |
`"upset"` | List of SQL statements to be executed at spec completion. |
`"force"` | Load data from files unconditionally. |
`"file"` | MANDATORY name of the input file(s) to load. The file name extension determines the input data format. Use a glob pattern to set names for a series of files. |
`"args"` | List of default values for the arguments to be loaded into table `ida`. |
`"encoding"` | Input file encoding. At spec level this parameter overrides config file parameter `ENCODING`. |
`"csv_dialect"` | CSV dialect as defined in Python module `csv`. At spec level this parameter overrides config file parameter `CSV_DIALECT`. |
`"csv_delimiter"` | CSV fields delimiter. At spec level this parameter overrides config file parameter `CSV_DELIMITER`. |
`"skip_lines"` | Number of lines (`int`) to skip at the beginning of input files. |
`"insert_data"` | Python function, or list of Python functions, that transforms a line from an input file (passed as a single `str` or as a list of fields) into a list of values for the columns of a database table. |
`"pass_lines"` | Pass lines from input files to the Python functions specified in `"insert_data"` as plain strings (`str`)? |
`"insert_actions"` | SQL statement or stored procedure call, or a list of them, that inserts data into one or more database table(s). |
`"validate_actions"` | List of SQL statements or stored procedure calls that check the data loaded into the intermediate database table(s) for correctness. |
`"process_actions"` | List of SQL statements or stored procedure calls that move or copy the loaded data from the intermediate table(s) to the target table(s). |