Importing data from a text file

Table of contents


Buzzy supports importing data from a Text file using the CSV Import option inside the Data Source feature of the micro-app. Access the properties of the microapp on which the data is to be stored, then simply navigate to the advanced tab section and click on the Data Source tab selecting the CSV import toggle, then choose a .csv file to be imported by clicking on the “Upload CSV” button and selecting the file, if the upload is successful the file name will appear in the text box.

The import process uses JSONata, a lightweight query and transformation language for JSON data. More information for JSONata.

Configuration

Default Configuration

The CSV Import feature-set can import files with either a default configuration or a custom configuration. For the default configuration simply input an empty JSON object (as seen below).

Custom Configuration

The bare minimum required to get the JSON object working are,

  1. source Object: the source Object contains an array called “transforms” in which the jsonATA transform is contained. As a minimum, the jsonATA must contain the transform “$[].data.$[]” to supply an array of objects which is the format needed as input.

  2. fields array containing objects: a minimum of one object must be specified, a key and type identifier must be provided, the key correlates to the field name and the type identifies what type the field is. If the field is unique to each row, then a unique identifier is added, buzzy will also attempt to guess the unique field if no fields property is provided once the preview button is clicked.

Bare minimum custom input

{
   "source": {
    "description": "Optional Description of Config",
    "type": "csv",
    "transforms": [
      "$[].data.$[]"
    ]
  },
  "fields": [
    {
      "key": "EmployeeName",// Name Of A field unique to each row
      "type": "string",
      "unique": true
    }
  ]
}

By default, a comma is used as a delimiter. However in some cases, such as address data, a TAB delimited file is used. To specify the delimiter, modify the source by adding the object “delimiter”. For example, for a TAB file use "delimiter" : "\t".

Transforms - Example custom input with multiple fields

The transforms property provides a quick and easy way for users to manipulate their data, the transforms property accepts a value of type array with each index containing JSONata code, then Buzzy executes the transforms on each row of data, Note that the final output of the transformed data must be of type array of objects.

Fields

The fields array contains objects identifying what type each field is, for numbers (e.g. 13764852) and Booleans (e.g. true), buzzy will automatically detect the field type. If the value is not of these types, then the value is assumed to be a string.

{
  "fields": [
    {
      "key": "ID",
      "type": "number",
      "unique": true
    },
    {
      "key": "Joined",
      "type": "date"
    },
    {
      "key": "Role", //does not need to be added as its a string
      "type": "string"
    },
    {
      "key": "age", //does not need to be added as its a Number
      "type": "number"
    }
  ]
}

Field Conversion

Data types can be converted either implicitly or explicitly.

Implicit conversions are not visible to the user. The system automatically converts the data from one data type to another, this occurs for strings and numbers.

Explicit conversions are based on field types. Two important conversions are Location and Date. Other types of fields such as selection or checklist can be imported into existing fields.

Location

The location field takes an object as input. The object has a “type” attribute and “coordinates” attribute.

“Type”: “point” example

“Type”: “MultiPolygon” example

Sample Location Field Import

The below format allows for points to be displayed on a map whilst also having their information in text form displayed as part of the form data, although with the method below, the longitude and latitude must match the address.

{
  "source": {
    "transforms": [
      "$[].data.{'Department_Name': Department_Name , 'geometry': {'type': 'Point','coordinates': [Longitude,Latitude] ,'address': {'formattedAddress': Address, 'country': 'USA','geometry': {'type': 'Point','coordinates': [Longitude,Latitude]} }} }"
    ]
  },
  "fields": [
    {
      "key": "'Department_Name'",
      "type": "string",
      "unique": true
    },
    {
      "key": "geometry",
      "type": "GeoJSON"
    }
  ]
}

Selection and Checklist

Create the selection and toggle fields first manually - you just have to match the name of the field with your import transform, and it will just store the data in them without overwriting the field definition

Make sure the selection field has all the values in your input data, or use a transform formula to match the input up to your field definition options - e.g. Job: code = ‘1’ ? ‘IT’ : code =’2’ ? ‘HR’ : ‘FIN’ - would turn a number code into a text code for example

For boolean fields you need to convert the input data into a boolean - typically this would just mean testing for your true value whatever that is - e.g. Toggle:input=’true’, or Toggle:input=’on’, etc. - though for some cases, the $boolean() function could handle it - e.g. if it was 0 and 1, or empty and non-empty

Error Handling Strategies

An approach to error handling is to treat the building of an importer as a development task.

  1. Analyse the data - look for variations, inconsistencies, errors, etc.

  2. Create a cutdown test suite of data - pull out all the variations you’ve found and create a test Text File

  3. Use preview to iterate through building a filter/transform - so using the cut down test Text File, use that test and develop an appropriate transform that filters out bad rows, or fixes them in the transform, deals with valid alternatives, defaults, etc.

  4. Do some test imports - once the JSON is building as expected in the preview, test the import, might require some more tweaks here, iterate again

  5. Upload the full file and import all the records

And you’re now set to keep doing this with new exports, or if it’s a one off you’re done.

Example with missing Fields

Example CSV file with Null field values in geometry

id,address,lat,lon
1,1005 N. Glebe Road  Ste 610 ARLINGTON VA 22201-5758,38.883107,-77.115584,
2,6049 Falls Rd BALTIMORE MD 21209-2215,,,
3,1217 Buffalo Road CLAIRFIELD TN 37715-4331,36.586106,-83.953246,

Below is the initial transform which results in an error:

{
  "source": {
  "description": "Test Error Handling",
    "type": "csv",
    "delimiter": ",",
    "transforms": [
      "$[].data.{'id': id , 'geometry': {'type': 'Point','coordinates': [lon,lat] ,'address': {'formattedAddress': address, 'country': 'USA','geometry': {'type': 'Point','coordinates': [lon,lat]} }} }"
    ]
  },
  "fields": [
    {
      "key":"id",
      "type": "string",
      "unique": true
    },
    {
      "key": "formattedAddress",
      "type": "string"
    },
    {
      "key": "geometry",
      "type": "GeoJSON"
    }
  ]
}

The second row in the CSV has empty coordinates, but the transform is not doing any filtering or data conversion, so mongo DB is throwing errors trying to encode the empty coords.

To resolve this issue you either want to filter out that row:

"transforms": [
  "$[].data[$boolean(lat) and $boolean(lon)].{'id': id , 'geometry': {'type': 'Point','coordinates': [lon,lat] ,'address': {'formattedAddress': address, 'country': 'USA','geometry': {'type': 'Point','coordinates': [lon,lat]} }} }"
]

or set the empty coords to some default value:

"transforms": [
  "$[].data.{'id': id , 'geometry': {'type': 'Point','coordinates': [$boolean(lon) ? lon : 0,$boolean(lat) ? lat : 0] ,'address': {'formattedAddress': address, 'country': 'USA','geometry': {'type': 'Point','coordinates': [$boolean(lon) ? lon : 0,$boolean(lat) ? lat : 0]} }} }"
]

$boolean() is used to determine if the field has a value.


Last updated