Add a File Definition

First, open the File Definitions page from the dropdown menu at the top of the application.

file definitions link

The following steps require you to have Administrator, Project Owner or Project Contributor permissions.

Click on Add File Definition at the top-right of this page to add a file definition. You can then provide the fields required for your file definitions on this form.

If you don’t completely understand the purpose of file definitions, the File Definitions Concepts page provides an overview of what file definitions do and how they are configured.

File Definition Fields

Name

Provide a name for the file definition.

Each file definition requires a name. The name acts as a unique label that makes it easy to identify what the file definition is used for.

Type

Choose a file definition type from the dropdown.

File definitions require a type to state what the definition is being used for.

Loome Integrate currently supports 3 different types of File Definitions.

  • File System
  • Azure Blob Folder
  • HDFS/DBFS

To learn more about these types, visit the File Definitions Concepts page.

Path

This is the path to the folder which will be used for the file definition. The format of this depends on the type of file definition.

Formatting File System Paths

File System definitions require the full path to the folder that contains the files you’re working with. Note that Loome Integrate supports both Linux and Windows style paths (e.g. C:\CSVData and /CSVData are handled the same way).

Loome Integrate accepts the same paths you would use in a program like File Explorer, so copying the path from the address bar is an easy way to get it.
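Before pasting a path into the form, it can help to confirm that it is a full path rather than a relative one. The following is a minimal sketch using Python's standard pathlib module; the helper name is_full_path is hypothetical and is not part of Loome Integrate, which may apply its own validation.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_full_path(path: str) -> bool:
    """Return True if the path is absolute in either Windows or Linux style."""
    return PureWindowsPath(path).is_absolute() or PurePosixPath(path).is_absolute()


print(is_full_path(r"C:\CSVData"))  # True: drive letter plus root
print(is_full_path("/CSVData"))     # True: rooted Linux-style path
print(is_full_path("CSVData"))      # False: relative, no drive or root
```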

Formatting Azure Blob Paths

The path for an Azure Blob file definition is just the name of the folder you wish to work in. For example, if you had a container with a folder called Data, then the path for the definition would just be Data.
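To illustrate how a folder name relates to the blobs in a container, the sketch below filters a list of blob names by folder. The blob names and the helper blobs_in_folder are hypothetical, and including nested sub-folders is an assumption of this example rather than documented Loome Integrate behaviour.

```python
def blobs_in_folder(blob_names: list[str], folder: str) -> list[str]:
    """Return the blob names that sit under the given folder of a container."""
    prefix = folder.strip("/") + "/"
    return [name for name in blob_names if name.startswith(prefix)]


# Hypothetical blob names in one container.
container = ["Data/sales.csv", "Data/2024/q1.csv", "Archive/old.csv", "readme.txt"]
print(blobs_in_folder(container, "Data"))
# ['Data/sales.csv', 'Data/2024/q1.csv']
```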

Formatting HDFS/DBFS Paths

As is the case with File System paths, this value is just the full path to the folder you wish to use within the HDFS/DBFS.

Selecting Path using the File Browser

You can optionally select a path using the File Browser.

The File Browser opens after you select the connection and the agent associated with the file system. You can navigate through the browser at the folder level and select the folder containing the files.

Selecting the path using the File Browser is recommended, as it avoids mistakes that can occur when entering a path manually. However, this option is currently only available for browsing git repositories.

file browser button

Browse to your Git Repository using the File Browser

To browse through the git repository, select the file type Script Directory and then click on the File Browser button highlighted in the above image. This will open up a modal prompting you to select the agent and the git repository connection.

file browser connection selection

After selecting the agent and the connection, click Next to load the File Browser. It might take a while to load all the contents of the git repository. You can then navigate within the browser to select any folder.

file browser

The File Browser displays all the folders in the repository. Click a folder name to open that folder and display the files and folders within it.

file browser navigation

The selected path is shown in the path section at the top of the File Browser. Review the path that you have selected and click Submit. The path you selected in the File Browser is then displayed in the Path section.

path selection

File Format

The file format field determines how Loome Integrate reads and writes to files.

In most cases, you would use the “Delimited” format, as it is the standard for flat files.

Format Descriptions

File Format | Description
Delimited | The files being processed are to be delimited using a human readable character or set of characters.
Hex Delimited | The files being processed use a hexadecimal based delimiter.
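As an illustration of what a hexadecimal based delimiter means, the sketch below converts a hex code into the character it names and uses it to split a line. The helper delimiter_from_hex and the sample data are hypothetical, and the exact way Loome Integrate expects a hex delimiter to be entered is not covered here.

```python
def delimiter_from_hex(hex_code: str) -> str:
    """Turn a hex code such as '1F' or '0x1F' into the character it names."""
    return chr(int(hex_code, 16))


unit_separator = delimiter_from_hex("0x1F")  # ASCII unit separator, not printable
line = "1001\x1fAcme Pty Ltd\x1fSydney"
print(line.split(unit_separator))
# ['1001', 'Acme Pty Ltd', 'Sydney']
```

Non-printable characters like this are useful when the data itself may contain commas or tabs.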

Delimiter

This is the character that is used to split cells. In most cases this will be a single character like a comma (.csv). However, if you need to use a whitespace character like a tab (.tsv), you can select it from the delimiter dropdown.

delimiter dropdown
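The effect of the delimiter can be seen with a short sketch using Python's standard csv module. It only illustrates how a delimiter splits a line into cells; the helper read_rows is hypothetical and does not reflect how Loome Integrate parses files internally.

```python
import csv
import io


def read_rows(text: str, delimiter: str) -> list[list[str]]:
    """Split delimited text into rows of cells using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


print(read_rows("id,name\n1,Ada\n", ","))     # comma-separated (.csv)
print(read_rows("id\tname\n1\tAda\n", "\t"))  # tab-separated (.tsv)
# Both calls print [['id', 'name'], ['1', 'Ada']]
```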

Encoding

This is the file encoding that is used for reading and writing to the files associated with this file definition.

If you are unsure about what to use, select “UTF-8” as it supports the widest range of characters and languages.
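The following sketch shows why UTF-8 is a safe default: text containing characters from several languages survives a round trip through UTF-8, while a narrower encoding such as ASCII rejects it. The sample text is arbitrary.

```python
text = "Zoë, Kraków, 東京"

utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8") == text)  # True: the round trip keeps every character

try:
    text.encode("ascii")
except UnicodeEncodeError as error:
    # error.object is the original string; error.start is the first bad position.
    print(f"ASCII cannot store {error.object[error.start]!r}")
```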

Extension

This is the file extension to save and retrieve files with. Common examples for this include csv, dat and txt but it ultimately depends on your requirements.

If you are using this File Definition as a migration target and are unsure as to what extension to use, it is recommended to use csv as you can easily view the contents of the file with Microsoft Excel.

Parquet Support

Loome Integrate Online supports Apache Parquet as a migration target out of the box. This means that if a target connection uses a file definition with the file format “Parquet”, Loome Integrate will automatically output the data to a Parquet file rather than a flat file.

Datetime Offset in Parquet

When datetime columns are exported to Parquet, they are converted into the timestamp format. The timestamp format records the time elapsed since January 1, 1970 (midnight UTC/GMT), so it needs to be combined with a timezone in order to display a readable datetime value.
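The sketch below shows how a single timestamp reads differently depending on the timezone it is combined with. It uses Python's standard datetime module, expresses the timestamp in seconds for simplicity, and uses a fixed +10:00 offset purely for illustration.

```python
from datetime import datetime, timedelta, timezone

epoch_seconds = 0  # the instant January 1, 1970, midnight UTC

utc = timezone.utc
plus_ten = timezone(timedelta(hours=10))  # fixed +10:00 offset, for illustration

print(datetime.fromtimestamp(epoch_seconds, tz=utc).isoformat())
# 1970-01-01T00:00:00+00:00
print(datetime.fromtimestamp(epoch_seconds, tz=plus_ten).isoformat())
# 1970-01-01T10:00:00+10:00
```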

If you have datetimeoffset columns, these contain the timezone information required to correctly create the UTC timestamp in Parquet/Databricks without inferring any information.

If you have a datetime column that has no offset value, there is no way to determine which timezone it belongs in. Loome will use the timezone the agent is currently running in.

You can choose the way datetime columns are handled in Parquet file definitions.

You can leave these as the current default, or you can adjust this by adding a configuration setting to your connection string. You can use this configuration setting to choose the timezone of a datetime column in Parquet, or you can export datetime columns as string literals.

Default Datetime

When exporting into Parquet:

  • The ‘DateTimeOffset’ column type values are adjusted to a UTC date using the offset of the original datetimeoffset value.
  • The ‘DateTime’ column type values are adjusted to a UTC date by providing an offset from the timezone of the Loome Agent running the export.
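The two default rules above can be sketched as follows. This is an illustration of the arithmetic only, not Loome's implementation; the function names are hypothetical and the agent's timezone is assumed to be a fixed +10:00 offset.

```python
from datetime import datetime, timedelta, timezone


def offset_value_to_utc(value: datetime) -> datetime:
    """A datetimeoffset-style value carries its own offset, so nothing is inferred."""
    return value.astimezone(timezone.utc)


def plain_value_to_utc(value: datetime, agent_offset: timedelta) -> datetime:
    """A plain datetime has no offset, so the agent's offset is attached first."""
    return value.replace(tzinfo=timezone(agent_offset)).astimezone(timezone.utc)


with_offset = datetime(2024, 3, 1, 9, 0, tzinfo=timezone(timedelta(hours=-5)))
without_offset = datetime(2024, 3, 1, 9, 0)

print(offset_value_to_utc(with_offset))
# 2024-03-01 14:00:00+00:00
print(plain_value_to_utc(without_offset, timedelta(hours=10)))
# 2024-02-29 23:00:00+00:00
```

The second result depends entirely on the assumed agent offset, which is why the same plain datetime can produce different timestamps when exported by agents in different timezones.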

By default, Loome uses the timezone that the agent is running in to create a UTC-adjusted timestamp for datetime values.

Currently, the defaults for each connection type, when the setting is not specified in the connection string, are:

Connection | Default
Azure Data Lake Storage | LOCAL
Google Bigquery | LOCAL
Snowflake | STRING

Adjust Datetime Data Type

You can change these defaults to suit your own preferences.

You can adjust this on the connections page by adding a configuration setting to your connection string.

For example, if you add UTC as your chosen offset value in your connection string, any UTC values will remain as UTC and will not be adjusted.

You can also choose to leave your column as is and treat it as a string value.

Add ParquetDateTimeDataType=STRING; to export it as a string literal of the datetime value.

To override the datetime handling when using Parquet in conjunction with a target connection (Azure Data Lake Storage, Google Bigquery, and Snowflake), add the following configuration setting.

You can specify:

Configuration Setting | Description
ParquetDateTimeDataType=LOCAL; | Exports inferring a datetimeoffset using the agent's locale
ParquetDateTimeDataType=UTC; | Exports inferring a datetimeoffset of UTC
ParquetDateTimeDataType=STRING; | Exports a string literal of the datetime value

For example, AccountName=YOUR_ACCOUNT_NAME;FileSystem=YOUR_FILE_SYSTEM;AccountKey=YOUR_STORAGE_ACCOUNT_KEY;ParquetDateTimeDataType=UTC;.
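If you build connection strings in a script before pasting them into the connections page, a small helper can add or replace the setting safely. This is a minimal sketch: the function set_parquet_datetime_mode is hypothetical, and it assumes the connection string is a simple list of key=value pairs separated by semicolons, as in the example above.

```python
ALLOWED_MODES = {"LOCAL", "UTC", "STRING"}
SETTING = "ParquetDateTimeDataType"


def set_parquet_datetime_mode(connection_string: str, mode: str) -> str:
    """Return the connection string with ParquetDateTimeDataType set to mode."""
    mode = mode.upper()
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    # Keep every existing key=value pair except a previous copy of the setting.
    parts = [
        part
        for part in connection_string.split(";")
        if part.strip() and part.split("=", 1)[0].strip() != SETTING
    ]
    parts.append(f"{SETTING}={mode}")
    return ";".join(parts) + ";"


base = "AccountName=YOUR_ACCOUNT_NAME;FileSystem=YOUR_FILE_SYSTEM;AccountKey=YOUR_STORAGE_ACCOUNT_KEY;"
print(set_parquet_datetime_mode(base, "UTC"))
```

Splitting each pair on the first "=" only means that values containing "=" (such as base64 account keys) are left intact.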

Header

If your files need a header row (used for displaying the column names), check Header so that Loome Integrate accounts for it in migrations to and from the file definition.
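The role of a header row can be illustrated with Python's standard csv module. The sketch writes a small file with a header and reads it back, treating the first row as column names rather than data; the sample rows are arbitrary and this is not how Loome Integrate itself reads files.

```python
import csv
import io

rows = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}]

# Write with a header row so the column names travel with the data.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())

# Read it back: DictReader treats the first row as column names, not data.
read_back = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(read_back == rows)  # True
```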

Projects

You can also choose whether this file definition will be available to all projects or only to selected projects.

If you choose selected projects, you can then choose from a list of all projects in this tenant. This file definition will only be available in these projects and will not be displayed when creating tasks in other projects.

Selected projects or all projects

You can then Submit this file definition.