Delimited File Source
Delimited files are among the most commonly used data sources and appear in a wide variety of situations. The Delimited File Source object in Astera Centerprise reads data from a delimited file. This article covers how to use the Delimited File Source object.
Getting Delimited File Source Object
This section covers how to add the Delimited File Source object to the dataflow designer from the Toolbox.
1. To get a Delimited File Source from the Toolbox, go to Toolbox > Sources > Delimited File Source. If you’re unable to see the toolbox, go to View > Toolbox or press Ctrl + Alt + X.
2. Drag-and-drop the Delimited File Source object onto the designer.
You can see that the dragged source object is empty right now. This is because we haven’t configured the object yet.
Configuring the Delimited File Source Object
1. To configure the Delimited File Source object, right-click on the header and select Properties from the context menu.
As soon as you’ve selected the Properties option from the context menu, a dialog box will open.
This is where you configure the properties for the Delimited File Source object.
2. First, provide the File Path for the delimited source file. The file path establishes the connection to the source dataset.
Note: This example uses a delimited file with sample Orders data. The file works with the following options:
- File Contains Headers
- Record Delimiter is specified as CR/LF
3. The dialog box has some other configuration options:
If the source file contains a header row and you want Centerprise to read the headers from it, check the File Contains Header box.
If you want your file to be read in portions, for example because it contains more than 1000 rows, check Partition File for Reading. Centerprise will then read the file according to the Partition Count that you specify. For instance, with a partition count of 2, a file with 1000 rows is read in two partitions. This is a back-end process that makes data reading more efficient and speeds up processing. It has no effect on your output.
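As a conceptual illustration only, the idea of reading a file in partitions can be sketched in Python. The function `partition_rows` is hypothetical and is not part of Centerprise; how the product actually divides a file internally is not described here, so near-equal contiguous chunks are an assumption.

```python
def partition_rows(rows, partition_count):
    """Split rows into partition_count contiguous chunks of near-equal size.

    Hypothetical helper: the real partitioning strategy is an assumption.
    """
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    base, extra = divmod(len(rows), partition_count)
    partitions = []
    start = 0
    for i in range(partition_count):
        # The first `extra` partitions take one additional row each.
        size = base + (1 if i < extra else 0)
        partitions.append(rows[start:start + size])
        start += size
    return partitions


rows = [f"row-{n}" for n in range(1000)]
parts = partition_rows(rows, 2)

# Concatenating the partitions reproduces the original order, which is
# why partitioned reading does not change the output.
recombined = [row for part in parts for row in part]
```

Each partition could then be handed to a separate worker, while the recombined result stays identical to a single sequential read.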
The Record Delimiter box lets you select the delimiter that separates the records in the file. The available choices are the carriage-return line-feed combination (<CR/LF>), carriage return (CR), and line feed (LF). You can also type a record delimiter of your choice instead of choosing from the available options.
If the records don’t have a delimiter and you rely on knowing the size of a record, use the Record Length box to specify the character length of a single record.
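The difference between delimiter-based and length-based record splitting can be sketched as follows. Both helper functions and the sample strings are hypothetical illustrations, not Centerprise code.

```python
def split_by_delimiter(text, record_delimiter="\r\n"):
    """Split raw text into records on the given record delimiter."""
    records = text.split(record_delimiter)
    # A trailing delimiter leaves one empty string at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


def split_by_length(text, record_length):
    """Split raw text into fixed-size records when no delimiter exists."""
    if record_length < 1:
        raise ValueError("record_length must be at least 1")
    return [text[i:i + record_length] for i in range(0, len(text), record_length)]


# Hypothetical sample data.
crlf_text = "1,Alice\r\n2,Bob\r\n"   # records separated by CR/LF
fixed_text = "1,Alice2,Bob  "        # two records, 7 characters each

delimited_records = split_by_delimiter(crlf_text)
fixed_records = split_by_length(fixed_text, 7)
```

A custom delimiter works the same way, for example `split_by_delimiter("a|b|c", "|")`.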
The Encoding box lets you choose the encoding scheme for the delimited file from a list of choices. The default value is Unicode (UTF-8).
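A Text Qualifier is a symbol that identifies where text begins and ends. It is used specifically when importing data.
Say you need to import a text file that is comma delimited (commas separate the different fields that will be placed in adjacent cells). If a field value itself contains a comma, enclosing the value in the text qualifier, typically double quotes, tells the reader to treat that comma as part of the text rather than as a field separator.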
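To see why a text qualifier matters, here is a minimal sketch using Python's standard `csv` module, where `quotechar` plays the role of the text qualifier. The sample data is hypothetical and the code only illustrates the concept; it does not show how Centerprise parses files.

```python
import csv
import io

# Hypothetical sample: the customer name contains a comma.
sample = 'OrderID,CustomerName,ShipCity\r\n10248,"Smith, John",Reims\r\n'

# Naive splitting treats the comma inside the name as a field separator.
naive_fields = sample.splitlines()[1].split(",")

# csv.reader honors the text qualifier (quotechar) and keeps the name intact.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter=",", quotechar='"')
qualified_rows = list(reader)
```

Here `naive_fields` ends up with four pieces because the name is split in two, while the second row of `qualified_rows` has the expected three fields.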
To define a hierarchical file layout and process the data file as a hierarchical file, check the This is a Hierarchical File box. The Centerprise IDE provides extensive user interface capabilities for processing hierarchical structures.
Use Null Text to specify a value in your data that you want replaced by a null value.
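The effect of a null text value can be sketched in a few lines of Python. The helper `apply_null_text` and the marker "N/A" are hypothetical; an exact, case-sensitive comparison is an assumption.

```python
def apply_null_text(record, null_text):
    """Replace every field equal to null_text with None (a null value)."""
    # Assumption: the comparison is exact and case-sensitive.
    return [None if value == null_text else value for value in record]


# Hypothetical record in which "N/A" marks a missing value.
cleaned = apply_null_text(["10248", "N/A", "Reims"], "N/A")
```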
Check Allow Record Delimiter Inside a Field Text when the record delimiter appears as text inside your data and you want it to be read as is.
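A record delimiter inside a field is usually only readable when the field is wrapped in the text qualifier. The following sketch uses Python's `csv` module on hypothetical data to show one record whose Notes field contains a CR/LF; it illustrates the idea and is not a description of Centerprise internals.

```python
import csv
import io

# Hypothetical sample: the Notes field of the first record spans two lines.
multiline_sample = 'OrderID,Notes\r\n1,"line one\r\nline two"\r\n2,plain\r\n'

# newline="" keeps the embedded CR/LF untouched so the parser can treat it
# as field text instead of the end of a record.
multiline_rows = list(csv.reader(io.StringIO(multiline_sample, newline="")))

# Number of data records, excluding the header row.
data_record_count = len(multiline_rows) - 1
```

Although the raw text contains four line breaks, the parser yields one header row and two data records.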
Advanced File Options
- In the Header spans over option, enter the number of rows that your header occupies. Use this option when your header spans multiple rows.
- Check Enforce exact header match if you want the header to be read as it is.
- Check Column order in file may be different from the layout if the field order in your source file differs from the field order in the Centerprise layout.
- Check Column headers in file may be different from the layout if you want to use alternate header values for your fields. The Layout Builder lets you specify alternate header values for the fields in the layout.
- Check the Use SmartMatch with Synonym Dictionary option when the header values differ between the source file and the Centerprise layout. You can create a Synonym Dictionary file to store the values for alternate headers. You can also use a Synonym Dictionary file to facilitate automapping between objects on the flow diagram that use alternate names in field layouts.
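The header options above can be pictured as a lookup from layout fields to file columns. The sketch below is a loose illustration under stated assumptions: `match_headers`, the synonym dictionary format, and the sample headers are all hypothetical, and the real SmartMatch behavior may differ.

```python
def match_headers(file_headers, layout_fields, synonyms=None):
    """Map each layout field to the index of its column in the file.

    synonyms maps a layout field name to a list of alternate header names.
    Hypothetical helper, not the SmartMatch algorithm itself.
    """
    synonyms = synonyms or {}
    # Exact, case-sensitive lookup from header text to column position.
    positions = {header: index for index, header in enumerate(file_headers)}
    mapping = {}
    for field in layout_fields:
        candidates = [field] + list(synonyms.get(field, []))
        for name in candidates:
            if name in positions:
                mapping[field] = positions[name]
                break
        else:
            raise KeyError(f"No column found for layout field {field!r}")
    return mapping


# Hypothetical file whose column order and header text differ from the layout.
file_headers = ["Cust Name", "OrderID", "Ship Country"]
layout_fields = ["OrderID", "CustomerName", "ShipCountry"]
synonyms = {"CustomerName": ["Cust Name"], "ShipCountry": ["Ship Country"]}

column_map = match_headers(file_headers, layout_fields, synonyms)
```

Matching by name rather than by position is what allows the column order in the file to differ from the layout.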
To skip any unwanted rows at the beginning of your file, you can specify the number of records that you want to omit through the Skip initial records option.
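Skipping initial records amounts to discarding a fixed number of rows before reading begins. A minimal sketch, with a hypothetical helper and sample lines:

```python
from itertools import islice


def skip_initial_records(records, skip_count):
    """Return the records that remain after dropping the first skip_count."""
    return list(islice(records, skip_count, None))


# Hypothetical file with two unwanted lines before the header row.
raw_lines = ["Orders export", "", "OrderID,ShipCity", "10248,Reims"]
remaining = skip_initial_records(raw_lines, 2)
```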
Raw text filter
- If you don’t want to apply any filter and process all records, check No filter. Process all records.
- To process only the records that begin with a specific value, check Process if begins with and enter that value in the box.
- To process only the records that match a specific expression, check Process if matches this regular expression and enter the expression in the box.
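The two raw text filters can be sketched with plain string and regular expression checks. The helper names and sample records are hypothetical, and whether the product anchors a regular expression to the start of the record is an assumption; this sketch uses `re.search`, which matches anywhere unless the pattern itself is anchored.

```python
import re


def filter_begins_with(records, prefix):
    """Keep only the records whose raw text starts with prefix."""
    return [record for record in records if record.startswith(prefix)]


def filter_matches_regex(records, pattern):
    """Keep only the records whose raw text matches the regular expression."""
    compiled = re.compile(pattern)
    # re.search matches anywhere in the record; anchor the pattern with ^
    # if the match must start at the beginning.
    return [record for record in records if compiled.search(record)]


# Hypothetical raw records.
raw_records = ["ORD,10248,Reims", "HDR,report", "ORD,10249,Lyon", "ORD,abc,Paris"]

order_records = filter_begins_with(raw_records, "ORD")
numeric_orders = filter_matches_regex(raw_records, r"^ORD,\d+,")
```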
String processing options are useful when you are reading data from a file system and writing it to a database destination.
- Check Treat empty string as null value when the source file has empty cells and you want them treated as null values in the database destination that you are writing to; otherwise, Centerprise will omit them from the output.
- Check Trim strings when you want to remove any extra spaces from the field value.
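The two string processing options can be illustrated as a small per-field transformation. The function `process_strings` is hypothetical; trimming leading and trailing whitespace, and applying the trim before the empty-string check, are both assumptions.

```python
def process_strings(record, trim=True, empty_as_null=True):
    """Apply trimming and empty-to-null conversion to each field."""
    processed = []
    for value in record:
        if trim:
            # Assumption: trimming removes leading and trailing whitespace.
            value = value.strip()
        if empty_as_null and value == "":
            value = None
        processed.append(value)
    return processed


# Hypothetical record with padded and empty fields.
processed_record = process_strings(["  10248 ", "", "Reims"])
```

With both options off, the record passes through unchanged.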
4. Once you’ve specified the data reading options on this screen, click Next.
The next screen will show you a Layout Builder. On this screen, you can modify the layout of the delimited source file.
If you want to add a new field to your layout, go to the last row of the layout, which will be blank, and double-click its Name column. A blinking text cursor will appear. Type in the name of the field you want to add and select the subsequent properties for it. A new field will be added to the source layout.
If you want to delete a field from your dataset, click on the serial column of the row that you want to delete. The selected row will be highlighted in blue.
Right-click the highlighted row. A context menu will appear with the Delete option.
Selecting Delete removes the entire row.
The field is now deleted from the layout and won’t appear in the output.
Note: Modifying the layout (adding or deleting fields) from the layout builder screen in Centerprise will not make any changes to the actual source file. The layout is specific to Centerprise only.
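The idea that a layout is a view over the file, not a change to the file, can be sketched as follows. Everything here is a hypothetical illustration: the sample data, the `read_with_layout` helper, and in particular the choice to fill a newly added field with `None` are assumptions, not documented Centerprise behavior.

```python
import csv
import io

# Hypothetical source file contents; this string is never modified.
source_text = "OrderID,ShipCity,ShipCountry\r\n10248,Reims,France\r\n"

# The layout starts out mirroring the file, then is edited.
layout = ["OrderID", "ShipCity", "ShipCountry"]
layout.remove("ShipCountry")    # delete a field from the layout
layout.append("CustomerName")   # add a field that the file does not contain


def read_with_layout(text, layout):
    """Read records and project them onto the fields named in layout."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    # Assumption: a layout field missing from the file is read as None.
    return [{field: row.get(field) for field in layout} for row in reader]


records = read_with_layout(source_text, layout)
```

The source text still contains the ShipCountry column afterwards; only the records produced through the layout differ.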
5. After you’re done customizing the layout, click Next. You will be directed to a new screen - Config Parameters. Here, you can define parameters for the Delimited File Source object.
Parameters make flows easier to deploy by eliminating hardcoded values, and they let you change multiple configurations with a single value change.
Note: Parameters left blank will use their default values assigned on the properties page.
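The fallback rule for blank parameters can be sketched as a simple merge. The parameter names, file paths, and the `resolve_parameters` helper are hypothetical and serve only to illustrate the rule.

```python
def resolve_parameters(defaults, overrides):
    """Combine property-page defaults with parameter overrides."""
    resolved = dict(defaults)
    for name, value in overrides.items():
        # A blank parameter (None or empty string) keeps the default value.
        if value not in (None, ""):
            resolved[name] = value
    return resolved


# Hypothetical defaults from the properties page and parameter values.
defaults = {"FilePath": "C:/data/orders.csv", "Encoding": "utf-8"}
overrides = {"FilePath": "D:/prod/orders.csv", "Encoding": ""}

resolved = resolve_parameters(defaults, overrides)
```

Only FilePath changes here, because the Encoding parameter was left blank.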
6. Once you’ve configured the source object, click OK.
The Delimited File Source object is now configured and reflects all the modifications that were made in the Layout Builder.
In this case, the modifications that were made are:
- Added the CustomerName column.
- Deleted the ShipCountry column.
You have successfully configured your Delimited File Source object. The fields from the source object can now be mapped to other objects in the dataflow.