Setting Up Sources¶
Each source on the dataflow is represented as a source object. You can have any number of sources in the dataflow, and they can feed into zero or more destinations.
The following source types are supported by the dataflow engine:
Flat File Sources:
Tree File Sources:
Database Sources:
- Data Model
- Database Table
- SQL Query
All sources can be added to the dataflow by picking a source type on the Toolbox and dropping it on the dataflow. File sources can also be added by dragging-and-dropping a file from an Explorer window. Database sources can be drag-and-dropped from the Data Source Browser. For more details on adding sources to the dataflow, see Introducing Dataflows.
Flat File Sources¶
Delimited File¶
Adding a Delimited File Source object allows you to transfer data from a delimited file. An example of what a delimited file source object looks like is shown below.
To configure the properties of a Delimited File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Fixed-Length File¶
Adding a Fixed-Length File Source object allows you to transfer data from a fixed-length file. An example of what a Fixed-Length File Source object looks like is shown below.
To configure the properties of a Fixed-Length File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Excel File¶
Adding an Excel Workbook Source object allows you to transfer data from an Excel file. An example of what an Excel Workbook Source object looks like is shown below.
To configure the properties of an Excel Workbook Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
Tree File Sources¶
COBOL File¶
Adding a COBOL File Source object allows you to transfer data from a COBOL file. An example of what a COBOL File Source object looks like is shown below.
To configure the properties of a COBOL File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu.
XML/JSON File¶
Adding an XML/JSON File Source object allows you to transfer data from an XML file. An example of what an XML/JSON File Source object looks like is shown below.
To configure the properties of an XML/JSON File Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
General Properties window:
File Path – Specifies the location of the source XML file. Using UNC paths is recommended if running the dataflow on a server.
Note: To open the source file for editing in a new tab, click icon next to the File Path input, and select Edit File.
Schema File Path – Specifies the location of the XSD file controlling the layout of the XML source file.
Note: Centerprise can generate a schema based on the content of the source XML file. The data types will be assigned based on the source file’s content.
To generate the schema, click icon next to the Schema File Path input, and select Generate.
To edit an existing schema, click icon next to the Schema File Path input, and select Edit File. The schema will open for editing in a new tab.
Optional Record Filter Expression – Allows you to enter an expression to selectively filter incoming records according to your criteria. You can use the Expression Builder to help you create your filter expression. For more information on using Expression Builder, see Expression Builder.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
Database Sources¶
Database Table¶
Adding a Database Table Source object allows you to transfer data from a database table. An example of what a Database Table Source object looks like is shown below.
To configure the properties of a Database Table Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your source, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected source.
Pick Source Table window:
Select a source table using the Pick Table dropdown.
- Select Full Load if you want to read the entire table.
- Select Incremental Load Based on Audit Fields to perform an incremental read starting at a record where the previous read left off.
Incremental load based on Audit Fields is based around the concept of Change Data Capture (CDC), which is a set of reading and writing patterns designed to optimize large-scale data transfers by minimizing database writing in order to improve performance. CDC is implemented in Centerprise using Audit Fields pattern. The Audit Fields pattern uses create time or last update time to determine the records that have been inserted or updated since the last transfer and transfers only those records.
Advantages
- Most efficient of CDC patterns. Only records that were modified since the last transfer are retrieved by the query thereby putting little stress on the source database and network bandwidth
Disadvantages
- Requires update date time and/or create date time fields to be present and correctly populated
- Does not capture deletes
- Requires index on the audit field(s) for efficient performance
To use the Audit Fields strategy, select the Audit Field and an optional Alternate Audit Field from the appropriate dropdown menus. Also, specify the path to the file that will store incremental transfer information.
Where Clause window:
You can enter an optional SQL expression serving as a filter for the incoming records. The expression should start with the WHERE word followed by the filter you wish to apply.
For example, WHERE CreatedDtTm >= ‘2001/01/05’
General Options window:
The Comments input allows you to enter comments associated with this object.
SQL Query¶
Adding a SQL Query Source object allows you to transfer data returned by the SQL query. An example of what an SQL Query Source object looks like is shown below.
To configure the properties of a SQL Query Source object after it is added to the dataflow, right-click on its header and select Properties from the context menu. The following properties are available:
Source Connection window – Allows you to enter the connection information for your SQL Query, such as Server Name, Database, and Schema, as well as credentials for connecting to the selected database.
SQL Query Source window:
Enter the SQL expression controlling which records should be returned by this source. The expression should follow SQL syntax conventions for the chosen database provider.
For example, select OrderId, OrderName, CreatedDtTm from Orders.
Source/Destination File Options¶
Source or Destination is a Delimited File
If your source or destination is a Delimited File, you can set the following properties
- First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
- Field Delimiter - Allows you to select the delimiter for the fields. The available choices are
, and . You can also type the delimiter of your choice instead of choosing the available options. - Record Delimiter - Allows you to select the delimiter for the records in the fields. The choices available are carriage-return line-feed combination
, carriage-return and line-feed . You can also type the record delimiter of your choice instead of choosing the available options. For more information on Record Delimiters, please refer to the Glossary. - Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
- Quote Char - Allows you to select the type of quote character to be used in the delimited file. This quote character tells the system to overlook any special characters inside the specified quotation marks. The options available are ” and ’.
You can also use the Build fields from an existing file feature to help you build a destination fields based on an existing file instead of manually typing the layout.
Source or Destination is a Microsoft Excel Worksheet
If the Source and/or the Destination chosen is a Microsoft Excel Worksheet, you can set the following properties:
- First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
- Worksheet - Allows you to select a specific worksheet from the selected Microsoft Excel file.
You can also use the Build fields from an existing file feature to help you build a destination fields based on an existing file instead of manually typing the layout.
Source or Destination is a Fixed Length File
If the Source and/or the Destination chosen is a Fixed Length File, you can set the following properties:
- First Row Contains Header - Check this option if you want the first row of your file to display the column headers. In the case of Source file, this indicates if the source contains headers.
- Record Delimiter - Allows you to select the delimiter for the records in the fields. The choices available are carriage-return line-feed combination
, carriage-return and line-feed . You can also type the record delimiter of your choice instead of choosing the available options. For more information on Record Delimiters, please refer to the Glossary. - Encoding - Allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8)
You can also use the Build fields from an existing file feature to help you build a destination fields based on an existing file instead of manually typing the layout.
Using the Length Markers window, you can create the layout of your fixed-length file, The Length Markers window has a ruled marker placed at the top of the window. To insert a field length marker, you can click in the window at a particular point. For example, if you want to set the length of a field to contain five characters and the field starts at five, then you need to click at the marker position nine.
In case the records don’t have a delimiter and you rely on knowing the size of a record, the number in the RecordLength box is used to specify the character length for a single record.
You can delete a field length marker by clicking the marker.
Source or Destination is an XML file
If the source is an XML file, you can set the following options:
- Source File Path specifies the file path of the source XML file.
- Schema File Path specifies the file path of the XML schema (XSD file) that applies to the selected source XML file.
Note: Centerprise makes it possible to generate an XSD file from the layout of the selected source XML file. This feature is useful when you don’t have the XSD file available. Note that all fields are assigned the data type of String in the generated schema. To use this feature, expand thecontrol and select Generate.
- Record Filter Expression allows you to optionally specify an expression used as a filter for incoming source records from the selected source XML file. The filter can refer to a field or fields inside any node inside the XML hierarchy.
The following options are available for destination XML files.
- Destination File Path specifies the file path of the destination XML file.
- Encoding - Allows you to choose the encoding scheme for the XML file from a list of choices. The default value is Unicode (UTF-8).
- Format XML Output instructs Centerprise to add line breaks to the destination XML file for improved readability.
- Read From Schema File specifies the file path of the XML schema (XSD file) that will be used to generate the destination XML file.
- Root Element specifies the root element from the list of the available elements in the selected schema file.
- Generate Destination XML Schema Based on Source Layout creates the destination XML layout to mirror the layout of the source.
- Root Element specifies the name of the root element for the destination XML file.
- Generate Fields as XML Attributes specifies that fields will be written as XML attributes (as opposed to XML elements) in the destination XML file.
- Record Node specifies the name of the node that will contain each record transferred.
Note: To ensure that your dataflow is runnable on a remote server, please avoid using local paths for the source. Using UNC paths is recommended.
Advanced Flat-File Reading Options¶
When importing from a fixed-width, delimited, or Excel file, you can specify the following advanced reading options:
Header Spans x Rows - If your source file has a header that spans more than 1 row, select the number of rows for the header using this control.
Skip Initial Records - Sets the number of records which you want skipped at the beginning of the file. This option can be set whether or not your source file has a header. If your source file has a header, the first record after the specified number of rows to skip will be used as the header row.
Raw Text Filter - Only records starting with the filter string will be imported. The rest of the records will be filtered.
You can optionally use regular expressions to specify your filter. For example, the regular expression ^[12][4] will only include records starting with 1 or 2, and whose second character is 4.
Note: Centerprise supports Regular Expressions implemented with the Microsoft .NET Framework and uses the Microsoft version of named captures for regular expressions.
Raw Text Filter setting is not available for Excel source files.
Managing Differences between Source Layout and Source File¶
If your source is a fixed-length file, delimited file, or Excel spreadsheet, it may contain an optional header row. A header row is the first record in the file that specifies field names and, in the case of a fixed-length file, the positioning of fields in the record.
If your source file has a header row, you can specify how you want the system to handle the differences between your actual source file, and the source layout specified in the setting. Differences may arise due to the fact that the source file has a different field order from what is specified in the source layout, or it may have extra fields compared to the source layout. Conversely, the source file may have fewer fields than what is defined in the source layout, and the field names may also differ, or may have changed since the time the layout was created.
By selecting from the available options, you can have Centerprise handle those differences exactly as required by your situation. These options are described in more detail below:
Enforce exact header match – Lets Centerprise proceed with the transfer only if the source file’s layout matches the source layout defined in the setting exactly. This includes checking for the same number and order of fields and field names.
Columns order in file may be different from the layout – Lets Centerprise ignore the sequence of fields in the source file, and match them to the source layout using the field names.
Column headers in file may be different from the layout – This mode is used by default whenever the source file does not have a header row. You can also enable it manually if you want to match the first field in the layout with the first field in the source file, the second field in the layout with the second field in the source file, and so on. This option will match the fields using their order as described above even if the field names are not matched successfully. We recommend that you use this mode only if you are sure that the source file has the same field sequence as what is defined in the source layout.
Creating Field Layout¶
The Field Layout window is available in the properties of most objects on the dataflow to help you specify the fields making up the object. The table below explains the attributes you can set in the Field Layout window.
Attribute | Description |
---|---|
Name | The system pre-fills this item for you based on the field header. Field names do not allow spaces. Field names are used to refer to the fields in the Expression Builder or tools where a field is used in a calculation formula. |
Header | Represents the field name specified in the header row of the file. Field headers may contain spaces. |
Data Type | Specifies the data type of a field, such as Integer, Real, String, Date, or Boolean. |
Format | Specifies the format of the values stored in that field, depending on the field’s data type. For example, for dates you can choose between DD-MM-YY, YYYY-MM-DD, or other available formats. |
Start Position | Specifies the position of the field’s first character relative to the beginning of the record. Note: This option is only available for fixed length layout type. |
Length | Specifies the maximum number of characters allotted for a value in the field. The actual value may be shorter than what is allowed by the Length attribute. Note: This option is only available for fixed length and database layout types. |
Column Name | Specifies the column name of the database table. Note: This option is only available in database layout. |
DB Type | Specifies the database specific data type that the system assigns to the field based on the field's data. Each database (Oracle, SQL, Sybase, etc) has its own DB types. For example, Long is only available in Oracle for data type string. Note: This option is only available in database layout. |
Decimal Places | Specifies the number of decimal places for a data type specified as real. Note: This option is only available in database layout. |
Allows Null | Controls whether the field allows blank or NULL values in it. |
Default Value | Specifies the value that is assigned to the field in any one of the following cases:- The source field does not have a value - The field is not found in the source layout- The destination field is not mapped to a source field. Note: This option is only available in destination layout. |
Sequence | Represents the column order in the source file. You can change the column order of the data being imported by simply changing the number in the sequence field. The other fields in the layout will then be reordered accordingly. |
Description | Contains information about the field to help you remember its purpose. |
Alignment | Specifies the positioning of the field’s value relative to the start position of the field. Available alignment modes are LEFT, CENTER, and RIGHT. Note: This option is only available for fixed length layout type. |
Primary Key | Denotes the primary key field (or part of a composite primary key) for the table. Note: This option is only available in database layout. |
System Generated | Indicates that the field will be automatically assigned an increasing Integer number during the transfer. Note: This option is only available in database layout. |
The table below provides a list of all the attributes available for a particular layout type.
Layout Type | Attributes Available |
---|---|
Source Delimited file and Excel worksheet | Name, Header, Data type, Format |
Source Fixed Length file | Name, Header, Data type, Format, Start position, Length |
Source Database Table and SQL query | Column name, Name, Data type, DB type, Length, Decimal places, Allows null |
Destination Delimited file and Excel worksheet | Name, Header, Data type, Format, Allows null, Default value |
Destination Fixed Length file | Sequence, Name, Header, Description, Data type, Format, Start position, Length, Allows null, Default value, Alignment |
Destination Database Table | Column name, Name, Data type, DB type, Length, Decimal places, Allows null, Primary key, System generated |
Using Data Formats¶
Centerprise supports a variety of formats for each data type. For example, for Dates, you can specify the date as “April 12” or “12-Apr-08”. Data Formats can be configured independently for source and for destination, giving you the flexibility to correctly read source data and change its format as it is transferred to destination.
If you are transferring from a flat file (for example, Delimited or Fixed-Width), you can specify the format of a field so that the system can correctly read the data from that field.
If you do not specify a data format, the system will try to guess the correct format for the field. For example, Centerprise is able to correctly interpret any of the following as a Date:
April 12
12-Apr-08
04-12-2008
Saturday, 12 April 2008
and so on
Centerprise comes with a variety of pre-configured formats for each supported data type. These formats are listed in the Sample Formats section below. You can also create and save your own data formats.
To open the Data Formats window, click icon located in the Toolbar at the top of the designer.
To select a data format for a source field, go to Source Fields and expand the Format dropdown menu next to the appropriate field.
Sample Formats
Dates:
Format | Sample Value |
---|---|
dd-MMM-yyyy | 12-Apr-2008 |
yyyy-MM-dd | 2008-04-12 |
dd-MM-yy | 12-04-08 |
MM-dd-yyyy | 04-12-2008 |
MM/dd/yyyy | 04/12/2008 |
MM/dd/yy | 04/12/08 |
dd-MMM-yy | 12-Apr-08 |
M | April 12 |
D | 12 April 2008 |
mm-dd-yyyy hh:mm:ss tt | 04-12-2008 11:04:53 PM |
M/d/yyyy hh:mm:ss tt | 4/12/2008 11:04:53 PM |
Booleans:
Format | Sample Value |
---|---|
Y/N | Y/N |
1/0 | 1/0 |
T/F | T/F |
True/False | True/False |
Integers:
Format | Sample Value |
---|---|
###### | 123456 |
#### | 1234 |
####;0;(####) | -1234 |
.##%;0;(.##%) | 123456789000% |
.##%;(.##%) | 1234567800% |
$###,###,###,### | $1,234,567,890,000 |
$###,###,###,##0 | $1,234,567,890,000 |
###,### | 123450 |
#,# | 1,000 |
##.00 | 35 |
Real Numbers:
Format | Sample Value |
---|---|
###,###.## | 12,345.67 |
##.## | 12.34 |
$###,###,###,### | $1,234,567,890,000 |
$###,###,###,##0 | $1,234,567,890,000 |
.##%;(.##%); | .1234567800% |
.##%;0;(.##%) | .12345678900% |
Numeric Format Specifiers:
Format specifier | Name | Description |
---|---|---|
0 | Zero placeholder | If the value being formatted has a digit in the position where the '0' appears in the format string, then that digit is copied to the result string; otherwise, a '0' appears in the result string. The position of the leftmost '0' before the decimal point and the rightmost '0' after the decimal point determines the range of digits that are always present in the result string. The "00" specifier causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "00" would result in the value 35. |
# | Digit placeholder | If the value being formatted has a digit in the position where the '#' appears in the format string, then that digit is copied to the result string. Otherwise, nothing is stored in that position in the result string. Note that this specifier never displays the '0' character if it is not a significant digit, even if '0' is the only digit in the string. It will display the '0' character if it is a significant digit in the number being displayed. The "##" format string causes the value to be rounded to the nearest digit preceding the decimal, where rounding away from zero is always used. For example, formatting 34.5 with "##" would result in the value 35. |
. | Decimal Point | The first '.' character in the format string determines the location of the decimal separator in the formatted value; any additional '.' characters are ignored. |
, | Thousand separator and number scaling | The ',' character serves as both a thousand separator specifier and a number scaling specifier. Thousand separator specifier: If one or more ',' characters is specified between two digit placeholders (0 or #) that format the integral digits of a number, a group separator character is inserted between each number group in the integral part of the output. Number scaling specifier: If one or more ',' characters is specified immediately to the left of the explicit or implicit decimal point, the number to be formatted is divided by 1000 each time a number scaling specifier occurs. For example, if the string "0,," is used to format the number 100 million, the output is "100". |
% | Percentage placeholder | The presence of a '%' character in a format string causes a number to be multiplied by 100 before it is formatted. The appropriate symbol is inserted in the number itself at the location where the '%' appears in the format string. |
E0E+0E-0e0e+0e-0 | Scientific notation | If any of the strings "E", "E+", "E-", "e", "e+", or "e-" are present in the format string and are followed immediately by at least one '0' character, then the number is formatted using scientific notation with an 'E' or 'e' inserted between the number and the exponent. The number of '0' characters following the scientific notation indicator determines the minimum number of digits to output for the exponent. The "E+" and "e+" formats indicate that a sign character (plus or minus) should always precede the exponent. The "E", "E-", "e", or "e-" formats indicate that a sign character should only precede negative exponents. |
'ABC'"ABC" | Literal string | Characters enclosed in single or double quotes are copied to the result string, and do not affect formatting. |
; | Section separator | The ';' character is used to separate sections for positive, negative, and zero numbers in the format string. If there are two sections in the custom format string, the leftmost section defines the formatting of positive and zero numbers, while the rightmost section defines the formatting of negative numbers. If there are three sections, the leftmost section defines the formatting of positive numbers, the middle section defines the formatting of zero numbers, and the rightmost section defines the formatting of negative numbers. |
Other | All other characters | Any other character is copied to the result string, and does not affect formatting. |