Delimited File Source
Delimited files are one of the most commonly used data sources and are used in a variety of situations. The Delimited File Source object in Astera Centerprise provides the functionality to read data from a delimited file.
In this article, we will cover how to use a Delimited File Source object.
Getting the Delimited File Source Object
1. To get a Delimited File Source object from the Toolbox, go to Toolbox > Sources > Delimited File Source. If you are unable to see the Toolbox, go to View > Toolbox or press Ctrl + Alt + X.
2. Drag-and-drop the Delimited File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we have not configured the object yet.
Configuring the Delimited File Source Object
1. To configure the Delimited File Source object, right-click on its header and select Properties from the context menu.
As soon as you have selected the Properties option from the context menu, a dialog box will open.
This is where you can configure the properties for the Delimited File Source object.
2. The first step is to provide the File Path for the delimited source file. By providing the file path, you establish the connection to the source dataset.
Note: In this case, we are going to be using a delimited file with sample Orders data. This file works with the following options:
- File Contains Headers
- Record Delimiter is specified as CR/LF
3. The dialog box has some other configuration options:
- If the source file contains headers, and you want Centerprise to read headers from the source file, check the File Contains Header option.
- To have your file read in portions, check the Partition File for Reading option. Centerprise will then read the file according to the specified Partition Count. For instance, a file with 1000 rows and a Partition Count of 2 will be read in two partitions of 500 rows each. This is a back-end process that makes data reading more efficient and speeds up processing. It has no effect on your output.
- The Record Delimiter field allows you to select the delimiter that separates the records in the file. The choices available are the carriage-return line-feed combination <CR/LF>, carriage-return (CR), and line-feed (LF). You can also type a record delimiter of your choice instead of choosing from the available options.
- If the records do not have a delimiter and are instead identified by their size, use the Record Length field to specify the character length of a single record.
- The Encoding field allows you to choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
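- A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data. For example, when you import a comma-delimited text file (commas separate the different fields that will be placed in adjacent cells), a comma inside a field value would otherwise be read as a field separator. Enclosing the value in a text qualifier, such as double quotes, keeps it in a single field.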
- To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File option. Centerprise IDE provides extensive user interface capabilities for processing hierarchical structures.
- Use the Null Text option to specify a value in your data that you want Centerprise to replace with a null value.
- Check the Allow Record Delimiter Inside a Field Text option when the record delimiter appears as text inside a field and you want it to be read as is.
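The effect of the Text Qualifier, Null Text, and Allow Record Delimiter Inside a Field Text options can be illustrated outside Centerprise. The following is a minimal Python sketch using the standard csv module. It is not how Centerprise implements these options, and the sample data, the NULL marker, and the read_records helper are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data, not the actual Orders file.
# Record 1 has a comma and a CR/LF inside qualified (quoted) fields.
RAW = (
    "OrderID,ShipName,ShipAddress,ShipRegion\r\n"
    '10248,"Vins et alcools, Chevalier","Building 5\r\nMain Street",NULL\r\n'
    "10249,Toms,Luisenstr. 48,NULL\r\n"
)

NULL_TEXT = "NULL"  # assumed marker, analogous to the Null Text option


def read_records(text, null_text=NULL_TEXT):
    """Parse delimited text; replace values equal to null_text with None."""
    # newline="" leaves line endings untranslated so the csv module can keep
    # a record delimiter that appears inside a qualified field.
    stream = io.StringIO(text, newline="")
    reader = csv.reader(stream, delimiter=",", quotechar='"')
    header = next(reader)
    return [
        {name: (None if value == null_text else value)
         for name, value in zip(header, row)}
        for row in reader
    ]


records = read_records(RAW)
for record in records:
    print(record)
```

Without the double-quote qualifier, the first record would split into extra fields at the embedded comma and break into two records at the embedded CR/LF.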
Advanced File Options
- In the Header spans over field, specify the number of rows that your header occupies. Use this option when your header spans multiple rows.
- Check the Enforce exact header match option if you want the header to be read as it is.
- Check the Column order in file may be different from the layout option if the field order in your source layout is different from the field order in Centerprise’s layout.
- Check the Column headers in file may be different from the layout option if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
- Check the Use SmartMatch with Synonym Dictionary option when the header values vary in the source layout and Centerprise’s layout. You can create a Synonym Dictionary file to store values for alternate headers. You can also use the Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
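As a rough illustration of skipping initial records and matching fields by header name rather than by position, here is a minimal Python sketch. It only mirrors the idea behind these options and does not reflect how Centerprise works internally. The sample data, the LAYOUT list, and the read_by_header helper are hypothetical, and the sketch assumes the skipped rows come before the header row.

```python
import csv
import io

# Hypothetical file: two unwanted lines, then a header whose column
# order differs from the layout defined below.
RAW = (
    "Exported by: demo\r\n"
    "Generated: sample\r\n"
    "ShipCity,OrderID,CustomerID\r\n"
    "Reims,10248,VINET\r\n"
    "Munster,10249,TOMSP\r\n"
)

LAYOUT = ["OrderID", "CustomerID", "ShipCity"]  # assumed layout field order
SKIP_INITIAL_RECORDS = 2


def read_by_header(text, layout, skip):
    """Skip leading lines, then return rows ordered as in the layout."""
    stream = io.StringIO(text, newline="")
    for _ in range(skip):
        next(stream)  # discard unwanted leading lines
    reader = csv.DictReader(stream)  # first remaining line is the header
    missing = [name for name in layout if name not in reader.fieldnames]
    if missing:
        raise ValueError(f"header is missing layout fields: {missing}")
    # Look each field up by name, so file column order does not matter.
    return [[row[name] for name in layout] for row in reader]


rows = read_by_header(RAW, LAYOUT, SKIP_INITIAL_RECORDS)
for row in rows:
    print(row)
```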
Raw text filter
- If you do not want to apply any filter and want to process all records, check No filter. Process all records.
- To process only the records that begin with a specific value, check the Process if begins with option and enter that value in the provided field.
- To process only the records that match a specific expression, check the Process if matches this regular expression option and enter the expression in the provided field.
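The two filter modes can be sketched in Python as a filter applied to raw lines before they are parsed into fields. This is only an illustration of the concept. The sample lines and function names are hypothetical, and whether Centerprise matches the expression against the whole record or any part of it is not stated here, so the sketch assumes a search anywhere in the line.

```python
import re

# Hypothetical raw lines from a file that mixes record types.
LINES = [
    "ORD,10248,VINET",
    "HDR,report header",
    "ORD,10249,TOMSP",
    "TRL,2 records",
]


def filter_begins_with(raw_lines, prefix):
    """Keep only the lines that start with the given value."""
    return [line for line in raw_lines if line.startswith(prefix)]


def filter_regex(raw_lines, pattern):
    """Keep only the lines in which the regular expression is found."""
    compiled = re.compile(pattern)
    return [line for line in raw_lines if compiled.search(line)]


orders = filter_begins_with(LINES, "ORD")
vinet_orders = filter_regex(LINES, r"^ORD,\d+,VINET$")
print(orders)
print(vinet_orders)
```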
String Processing options are useful when you are reading data from a file system and writing it to a database destination.
- Check the Treat empty string as null value option when you have empty cells in the source file and want them to be treated as null values in the database destination that you are writing to. Otherwise, Centerprise will omit them from the output.
- Check the Trim strings option when you want to remove any extra spaces in the field value.
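The string processing options can be pictured with a short Python sketch. It is an illustration under stated assumptions, not Centerprise's implementation: trimming is assumed to remove leading and trailing whitespace, trimming is assumed to run before the empty-string check, and the process_strings helper and sample record are hypothetical.

```python
def process_strings(record, trim=True, empty_as_null=True):
    """Return a copy of the record with string processing applied."""
    result = {}
    for name, value in record.items():
        if trim and value is not None:
            value = value.strip()  # assumed: leading/trailing whitespace only
        if empty_as_null and value == "":
            value = None  # empty string becomes a null value
        result[name] = value
    return result


cleaned = process_strings(
    {"OrderID": " 10248 ", "ShipRegion": "", "ShipCity": "Reims"}
)
print(cleaned)
```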
4. Once you have specified the data reading options on this window, click Next.
The next window is the Layout Builder. On this window, you can modify the layout of the delimited source file.
If you want to add a new field to your layout, go to the last row of the layout, which will be blank, and double-click its Name column. A blinking text cursor will appear. Type in the name of the field you want to add and select its properties. The new field will be added to the source layout.
If you want to delete a field from your layout, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click on the highlighted row. A context menu will appear with the Delete option.
Selecting this option will delete the entire row.
The field is now deleted from the layout and will not appear in the output.
Note: Modifying the layout (adding or deleting fields) from the Layout Builder in Centerprise will not make any changes to the actual source file. The layout is specific to Centerprise only.
5. After you are done customizing the layout, click Next. You will be directed to a new window, Config Parameters. Here, you can define parameters for the Delimited File Source object.
Parameters make flows easier to deploy by eliminating hardcoded values, and they let you change multiple configurations with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
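The fallback behavior described in the note can be sketched in Python. This is a conceptual illustration only. The parameter names, file paths, and the resolve_parameters helper are hypothetical and do not correspond to the actual parameter names on the Config Parameters window.

```python
# Hypothetical defaults, standing in for values set on the properties page.
DEFAULTS = {"FilePath": "C:/data/Orders.csv", "Encoding": "utf-8"}


def resolve_parameters(defaults, overrides):
    """Use an override when one is given; blank values fall back to defaults."""
    resolved = dict(defaults)
    for name, value in overrides.items():
        if value not in (None, ""):
            resolved[name] = value
    return resolved


# FilePath is overridden for deployment; Encoding is left blank.
params = resolve_parameters(
    DEFAULTS, {"FilePath": "D:/prod/Orders.csv", "Encoding": ""}
)
print(params)
```

Keeping the environment-specific value in one parameter means that moving a flow to another environment requires changing that value only.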
6. Once you have configured the source object, click OK.
The Delimited File Source object is now configured and reflects all the modifications that were made in the Layout Builder.
In this case, the modifications that were made are:
- Added the CustomerName column.
- Deleted the ShipCountry column.
You have successfully configured your Delimited File Source object. The fields from the source object can now be mapped to other objects in the dataflow.