Prerequisites
- Full Admin user role. For more information, see Manage Accounts.
- JVM version 17 or later.
- The template CSV you upload must meet the following requirements:
  - The file cannot be larger than 10 KB.
  - The second and subsequent rows can contain sample data. However, this data is not indexed unless it also appears in the source data CSV file.
- The DLP indexer supports data files with up to 55 million records. The exact record limit depends on the total number of columns and on how many of those columns are of the Alphanumeric type. The indexer displays the exact limit when you attempt to load a file that exceeds it. If your dataset exceeds the limit, split the records into multiple files. For errors received when indexing a large file, see Memory Tuning for DLP Exact Data Matching Indexer.
- Both the template and source data CSV files must meet the following requirements:
  - A multi-term (multi-word) field can contain a maximum of 6 space-separated words.
  - The files must contain only 1-byte or 2-byte UTF-8 encoded characters.
  - The first row must have between 1 and 50 fields, and every row must have the same number of fields.
  - The first row must specify the name of each field, and each field name must be unique.
  - Data in the second and subsequent rows must comply with the EDM field types and their supported formats (see Exact Data Match Field Types).
  - The field names in the sample data template must match the field names in the source data file and must appear in the same order in both files.
Caution: Do not create, edit, or view the template or source data CSV file using Microsoft Excel, as this may corrupt the file. Use a text editor.
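The file requirements above can be pre-checked locally with a text-based script before you upload anything. The following Python sketch is a hypothetical helper, not part of the DLP indexer: the names `check_edm_csv` and `headers_match` are illustrative, it assumes "10 KB" means 10,240 bytes, it treats "1-byte or 2-byte UTF-8" as code points up to U+07FF, and it does not validate values against the EDM field types.

```python
import csv
import io

MAX_TEMPLATE_BYTES = 10 * 1024  # assumption: "10 KB" means 10,240 bytes
MAX_FIELDS = 50
MAX_WORDS = 6
MAX_CODE_POINT = 0x7FF  # highest code point that UTF-8 encodes in 2 bytes


def first_row(raw: bytes) -> list[str]:
    """Return the header row of a UTF-8 CSV file (empty list if none)."""
    return next(csv.reader(io.StringIO(raw.decode("utf-8"))), [])


def check_edm_csv(raw: bytes, is_template: bool = False) -> list[str]:
    """Return human-readable problems found in a template or source CSV."""
    problems: list[str] = []
    if is_template and len(raw) > MAX_TEMPLATE_BYTES:
        problems.append(f"template is {len(raw)} bytes, limit is {MAX_TEMPLATE_BYTES}")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    # Characters above U+07FF need 3 or 4 bytes in UTF-8.
    wide = sorted({ch for ch in text if ord(ch) > MAX_CODE_POINT})
    if wide:
        problems.append(f"characters needing 3 or 4 UTF-8 bytes: {wide}")
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return problems + ["file has no header row"]
    header = rows[0]
    if not 1 <= len(header) <= MAX_FIELDS:
        problems.append(f"header has {len(header)} fields, expected 1 to {MAX_FIELDS}")
    if len(set(header)) != len(header):
        problems.append("field names in the first row are not unique")
    for n, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {n}: {len(row)} fields, header has {len(header)}")
        for value in row:
            if len(value.split()) > MAX_WORDS:
                problems.append(f"row {n}: more than {MAX_WORDS} words in {value!r}")
    return problems


def headers_match(template: bytes, source: bytes) -> bool:
    """Field names must be identical and in the same order in both files."""
    return first_row(template) == first_row(source)
```

Working on raw bytes rather than an opened text file keeps the size check honest (it measures the file as uploaded) and makes the UTF-8 decode an explicit step instead of something hidden by the platform's default encoding.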
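When a dataset exceeds the record limit the indexer reports, the records must be split into multiple files. The Python sketch below shows one way to do that; it is an illustration, not a tool shipped with the indexer. It assumes one record per line (no quoted fields containing line breaks), and it repeats the header row in every part because the first row of each data file must name the fields. The function name `split_edm_csv` and the `_partN` file naming are hypothetical; set `max_records` from the limit the indexer displays.

```python
from pathlib import Path
from typing import Optional, TextIO


def split_edm_csv(source: Path, out_dir: Path, max_records: int) -> list[Path]:
    """Copy source into parts of at most max_records data lines, each with the header."""
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs: list[Path] = []
    out: Optional[TextIO] = None
    written = 0
    try:
        # newline="" keeps the original line endings byte-for-byte.
        with source.open("r", encoding="utf-8", newline="") as src:
            header = src.readline()
            for line in src:
                if out is None or written == max_records:
                    if out is not None:
                        out.close()
                    path = out_dir / f"{source.stem}_part{len(outputs) + 1}{source.suffix}"
                    out = path.open("w", encoding="utf-8", newline="")
                    out.write(header)  # every part needs its own header row
                    outputs.append(path)
                    written = 0
                out.write(line)
                written += 1
    finally:
        if out is not None:
            out.close()
    return outputs
```

Copying lines verbatim, rather than parsing and re-serializing them with the `csv` module, avoids any change to quoting or whitespace in the values.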
If any value in the source file fails validation against its supported format, the DLP indexer skips that record and continues indexing the remaining records. The indexer also skips records that contain more fields than the template defines, empty rows, and records with an empty primary value. The DLP indexer output reports the position of each skipped record in the file.
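The skip rules described above can be approximated locally, for example to preview which rows would be dropped before running the indexer. The Python sketch below is not the indexer's implementation. The names `preview_skips` and `skip_reason`, the `primary` column index, the stripping of surrounding spaces, and the 9-digit SSN check in the usage example are all assumptions. It also skips rows with fewer fields than the header, which the behavior described above does not state explicitly.

```python
import csv
import io
from typing import Callable, Optional

# Hypothetical per-column checks; the real rules come from the EDM field types.
Validator = Callable[[str], bool]


def skip_reason(values: list[str], width: int, primary: int,
                validators: dict[int, Validator]) -> Optional[str]:
    """Return why a row would be skipped, or None if it would be indexed."""
    if not any(values):
        return "empty row"
    if len(values) != width:  # assumption: short rows are skipped like long ones
        return f"{len(values)} fields, template defines {width}"
    if not values[primary]:
        return "empty primary value"
    for col, is_valid in validators.items():
        if not is_valid(values[col]):
            return f"invalid value in column {col}"
    return None


def preview_skips(text: str, primary: int, validators: dict[int, Validator]):
    """Return (kept_rows, skipped), where skipped holds (line_number, reason)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    kept: list[list[str]] = []
    skipped: list[tuple[int, str]] = []
    for row in reader:
        values = [v.strip() for v in row]  # assumption: surrounding spaces are ignored
        reason = skip_reason(values, len(header), primary, validators)
        if reason is None:
            kept.append(values)
        else:
            skipped.append((reader.line_num, reason))
    return kept, skipped


if __name__ == "__main__":
    sample = "Email,SSN\nr_t@gmail.com, 113011111\n\n,248257990\nx@y.com,12AB\n"
    kept, skipped = preview_skips(
        sample, primary=0, validators={1: lambda v: v.isdigit() and len(v) == 9}
    )
    print(skipped)
    # [(3, 'empty row'), (4, 'empty primary value'), (5, 'invalid value in column 1')]
```

Like the indexer, the sketch keeps going after a bad row and reports its line number, so a single malformed record does not block the rest of the file.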
The following procedure uses this example CSV file for both the template and the indexed source data:
Email,SSN,Passport number - US,Credit card number
r_t@gmail.com, 113011111,123456789USA1234567U1234567, 5205105105105109
r_ppp@gmail.com, 248257990,123456789USA1234567U1234567, 4532237384050172
r_opop@gmail.com, 363265019,123456789USA1234567U1234567, 4532549977031249
ppo_t@gmail.com, 417279936,123456789USA1234567U1234567, 4539157470627290