9.1 School Administration Example, Migrator Input Documents

Existing student, teacher, and course document sets comprise the JSON-to-duality migrator input for the school-administration example.

Note:

The document sets in the examples here are very small. In order to demonstrate the handling of outlier (high-entropy) fields, we use a minFrequency migrator configuration field value of 25, instead of the default value of 5.

A field is an outlier for a given document set if it occurs, or if any of its values occurs with a given type, in less than minFrequency percent of the documents.

  • An outlier field that occurs rarely is either (1) retained in a flex column of a table underlying the duality view or (2) reported in an error log and not used in the duality view, according to the value of configuration field useFlexFields.

  • An outlier field whose value is only rarely of a type other than the usual one is handled differently. Import tries to convert any such rare-type values to the expected type for the field. If conversion fails, the failure is reported in an error log and the field is not used in the duality view.

See Fields Specifying Configuration Parameters for Inference and Generation for information about configuration fields minFrequency and useFlexFields.
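The outlier definition above can be illustrated with a minimal Python sketch. This is not the migrator's implementation: the function names (json_type, find_outliers), the sample documents, and the restriction to top-level fields are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def json_type(value):
    """Return a coarse JSON type name for a Python value (hypothetical helper)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"

def find_outliers(docs, min_frequency=5):
    """Find top-level fields that are rare, or that have a rare value type.

    "Rare" means: occurs in less than min_frequency percent of the documents.
    """
    total = len(docs)
    field_counts = Counter()
    type_counts = defaultdict(Counter)
    for doc in docs:
        for field, value in doc.items():
            field_counts[field] += 1
            type_counts[field][json_type(value)] += 1

    def is_rare(count):
        return 100 * count / total < min_frequency

    rare_fields = {f for f, n in field_counts.items() if is_rare(n)}
    rare_types = {}
    for field, types in type_counts.items():
        if field in rare_fields:
            continue  # already reported as a rare field
        rare = sorted(t for t, n in types.items() if is_rare(n))
        if rare:
            rare_types[field] = rare
    return rare_fields, rare_types

# Hypothetical five-document set: "memo" occurs once, "qty" is a string once.
docs = [
    {"id": 1, "qty": 3, "memo": "x"},
    {"id": 2, "qty": 4},
    {"id": 3, "qty": 5},
    {"id": 4, "qty": 3},
    {"id": 5, "qty": "3"},
]

print(find_outliers(docs, min_frequency=25))  # ({'memo'}, {'qty': ['string']})
print(find_outliers(docs, min_frequency=5))   # (set(), {})
```

With only five documents, a single occurrence already amounts to 20%, so nothing is an outlier at the default value of 5. This is why the examples here raise minFrequency to 25.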

Example 9-1 Student Document Set (Migrator Input)

These are the student documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator. There are no outlier fields; that is, there are no fields that are rare or whose values have rare types.

The documents all have the same fields, but note that field grade is of mixed type: string and number. Neither type is rare as a grade value, however: each occurs in at least 25% of the student documents, 25 being the minFrequency value we use for the examples here.

{"studentId" : 1,
 "name"      : "Donald P.",
 "age"       : 20,
 "courses"   : [ {"courseNumber" : "MATH101",
                  "name"         : "Algebra",
                  "grade"        : 90},
                 {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "grade"        : 90},
                 {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "grade"        : "TBD"} ]}

{"studentId" : 2,
 "name"      : "Elena H.",
 "age"       : 21,
 "courses"   : [ {"courseNumber" : "MATH102",
                  "name"         : "Calculus",
                  "grade"        : 95},
                 {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "grade"        : 75},
                 {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "grade"        : "TBD"} ]}

{"studentId" : 3,
 "name"      : "Francis K.",
 "age"       : 20,
 "courses"   : [ {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "grade"        : 83} ]}

{"studentId" : 4,
 "name"      : "Georgia D.",
 "age"       : 19,
 "courses"   : [ {"courseNumber" : "MATH102",
                  "name"         : "Calculus",
                  "grade"        : 85},
                 {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "grade"        : 75},
                 {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "grade"        : 82} ]}

{"studentId" : 5,
 "name"      : "Hye E.",
 "age"       : 21,
 "courses"   : [ {"courseNumber" : "MATH101",
                  "name"         : "Algebra",
                  "grade"        : 97},
                 {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "grade"        : "TBD"} ]}

{"studentId" : 6,
 "name"      : "Ileana D.",
 "age"       : 21,
 "courses"   : [ {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "grade"        : 95}]}

{"studentId" : 7,
 "name"      : "Jatin S.",
 "age"       : 20,
 "courses"   : [ {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "grade"        : 85},
                 {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "grade"        : "TBD"} ]}

{"studentId" : 8,
 "name"      : "Katie H.",
 "age"       : 21,
 "courses"   : [ {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "grade"        : 90},
                 {"courseNumber" : "CS102",
                  "name"         : "Data Structures",
                  "grade"        : "TBD"} ]}

{"studentId" : 9,
 "name"      : "Luis F.",
 "age"       : 19,
 "courses"   : [ {"courseNumber" : "MATH102",
                  "name"         : "Calculus",
                  "grade"        : 95},
                 {"courseNumber" : "CS101",
                  "name"         : "Algorithms",
                  "grade"        : 75},
                 {"courseNumber" : "MATH103",
                  "name"         : "Advanced Algebra",
                  "grade"        : 85} ]}

{"studentId" : 10,
 "name"      : "Ming L.",
 "age"       : 20,
 "courses"   : [ {"courseNumber" : "MATH102",
                  "name"         : "Calculus",
                  "grade"        : 95} ]}

Compare this with the student document set migrated using the default conversion, Example 9-19. There are no differences, beyond the addition of fields needed for duality-view support generally.
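The statement that neither grade type is rare can be checked with a short Python sketch. It lists only the grade values from the student documents in Example 9-1 and counts, per document, whether a string or a number grade occurs. The helper name type_share is hypothetical, and counting per document (rather than per grade occurrence) is an assumption based on the outlier definition given earlier.

```python
MIN_FREQUENCY = 25  # the minFrequency value used for the examples here

# Grade values per studentId, copied from the student document set.
grades_by_student = {
    1: [90, 90, "TBD"],
    2: [95, 75, "TBD"],
    3: [83],
    4: [85, 75, 82],
    5: [97, "TBD"],
    6: [95],
    7: [85, "TBD"],
    8: [90, "TBD"],
    9: [95, 75, 85],
    10: [95],
}

def type_share(values_by_doc, py_type):
    """Percent of documents in which at least one value has the given type."""
    hits = sum(
        any(isinstance(v, py_type) for v in values)
        for values in values_by_doc.values()
    )
    return 100 * hits / len(values_by_doc)

string_share = type_share(grades_by_student, str)  # 5 of 10 documents
number_share = type_share(grades_by_student, int)  # 10 of 10 documents

print(string_share, number_share)  # 50.0 100.0
print(string_share >= MIN_FREQUENCY and number_share >= MIN_FREQUENCY)  # True
```

Both shares are at or above 25%, so under this reading field grade is of mixed type but is not an outlier.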

Example 9-2 Teacher Document Set (Migrator Input)

These are the teacher documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator. There are no outlier fields; that is, no fields are rare or have values with rare types.

The documents all have the same fields, but note that field phoneNumber is of mixed type: string and array (array of strings). Neither type is rare as a phoneNumber value, however: each occurs in at least 25% of the teacher documents, 25 being the minFrequency value we use for the examples here.

(Note also that the value of one occurrence of field coursesTaught is an empty array.)

{"_id"           : 101,
 "name"          : "Abdul J.",
 "phoneNumber"   : [ "222-555-011", "222-555-012" ],
 "salary"        : 200000,
 "department"    : "Mathematics",
 "coursesTaught" : [ {"courseId"  : "MATH101",
                      "name"      : "Algebra",
                      "classType" : "Online"},
                     {"courseId"  : "MATH102",
                      "name"      : "Calculus",
                      "classType" : "In-person"} ]}

{"_id"           : 102,
 "name"          : "Betty Z.",
 "phoneNumber"   : "222-555-022",
 "salary"        : 300000,
 "department"    : "Computer Science",
 "coursesTaught" : [ {"courseId"  : "CS101",
                      "name"      : "Algorithms",
                      "classType" : "Online"},
                     {"courseId"  : "CS102",
                      "name"      : "Data Structures",
                      "classType" : "In-person"} ]}

{"_id"           : 103,
 "name"          : "Colin J.",
 "phoneNumber"   : [ "222-555-023" ],
 "salary"        : 220000,
 "department"    : "Mathematics",
 "coursesTaught" : [ {"courseId"  : "MATH103",
                      "name"      : "Advanced Algebra",
                      "classType" : "Online"} ]}

{"_id"           : 104,
 "name"          : "Natalie C.",
 "phoneNumber"   : "222-555-044",
 "salary"        : 180000,
 "department"    : "Computer Science",
 "coursesTaught" : []}

Compare this with the teacher document set migrated using the default conversion, Example 9-20. There are no differences, beyond the addition of fields needed for duality-view support generally.

Example 9-3 Course Document Set (Migrator Input)

These are the course documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator. There are two outlier fields, Notes and creditHours:

  • Field Notes is an outlier because it occurs in only one course document (one out of five, 20%, less than the minFrequency value of 25 that we use for the examples here).

  • Field creditHours is an outlier because it has a string value in only one course document (one out of five, 20%, less than the minFrequency value of 25); it has a number value in the other documents.

{"courseId"         : "MATH101",
 "name"             : "Algebra",
 "creditHours"      : 3,
 "students"         : [ {"studentId" : 1, "name" : "Donald P."},
                        {"studentId" : 5, "name" : "Hye E."} ],
 "teacher"          : {"teacherId" : 101, "name" : "Abdul J."},
 "Notes"            : "Prerequisite for Advanced Algebra"}

{"courseId"         : "MATH102",
 "name"             : "Calculus",
 "creditHours"      : 4,
 "students"         : [ {"studentId" : 2,  "name" : "Elena H."},
                        {"studentId" : 10, "name" : "Ming L."},
                        {"studentId" : 9,  "name" : "Luis F."},
                        {"studentId" : 4,  "name" : "Georgia D."} ],
 "teacher"          : {"teacherId" : 101,  "name" : "Abdul J."}}

{"courseId"         : "CS101",
 "name"             : "Algorithms",
 "creditHours"      : 5,
 "students"         : [ {"studentId" : 1, "name" : "Donald P."},
                        {"studentId" : 2, "name" : "Elena H."},
                        {"studentId" : 4, "name" : "Georgia D."},
                        {"studentId" : 9, "name" : "Luis F."},
                        {"studentId" : 7, "name" : "Jatin S."} ],
 "teacher"          : {"teacherId" : 102, "name" : "Betty Z."}}

{"courseId"         : "CS102",
 "name"             : "Data Structures",
 "creditHours"      : 3,
 "students"         : [ {"studentId" : 1, "name" : "Donald P."},
                        {"studentId" : 2, "name" : "Elena H."},
                        {"studentId" : 5, "name" : "Hye E."},
                        {"studentId" : 7, "name" : "Jatin S."},
                        {"studentId" : 8, "name" : "Katie H."} ],
 "teacher"          : {"teacherId" : 102, "name" : "Betty Z."}}

{"courseId"         : "MATH103",
 "name"             : "Advanced Algebra",
 "creditHours"      : "3",
 "students"         : [ {"studentId" : 3, "name" : "Francis K."},
                        {"studentId" : 4, "name" : "Georgia D."},
                        {"studentId" : 8, "name" : "Katie H."},
                        {"studentId" : 9, "name" : "Luis F."},
                        {"studentId" : 6, "name" : "Ileana D."} ],
 "teacher"          : {"teacherId" : 103, "name" : "Colin J."}}

Compare this with the course document set migrated using the default conversion, Example 9-21. There are no differences, beyond the addition of fields needed for duality-view support generally. In particular, outlier fields Notes (rare) and creditHours (rare type) are both present after migration, Notes because it is stored in a flex column, and creditHours because its outlier value for course MATH103 is converted from the string "3" to the number 3.
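The treatment of the two course outliers described above can be sketched in Python. This is only an illustration of the described behavior, not how the migrator is implemented: the function handle_course_outliers, its return shape, and the error messages are hypothetical, and the parameter use_flex_fields merely stands in for configuration field useFlexFields.

```python
def handle_course_outliers(doc, use_flex_fields=True):
    """Return (columns, flex, errors) for one course document.

    columns: field values kept as ordinary data
    flex:    rare fields retained in a flex column
    errors:  messages that would go to an error log
    """
    columns = {k: v for k, v in doc.items() if k != "Notes"}
    flex, errors = {}, []

    # Rare-type outlier: try to convert a string creditHours to a number.
    hours = columns.get("creditHours")
    if isinstance(hours, str):
        try:
            columns["creditHours"] = int(hours)
        except ValueError:
            # Simplification: this sketch drops the value for this document
            # only and records the failure.
            errors.append(f"creditHours: cannot convert {hours!r} to number")
            del columns["creditHours"]

    # Rare-occurrence outlier: keep Notes in a flex column, or log it.
    if "Notes" in doc:
        if use_flex_fields:
            flex["Notes"] = doc["Notes"]
        else:
            errors.append("Notes: rare field not used")

    return columns, flex, errors

math103 = {"courseId": "MATH103", "name": "Advanced Algebra", "creditHours": "3"}
math101 = {"courseId": "MATH101", "name": "Algebra", "creditHours": 3,
           "Notes": "Prerequisite for Advanced Algebra"}

print(handle_course_outliers(math103)[0]["creditHours"])  # 3
print(handle_course_outliers(math101)[1])
# {'Notes': 'Prerequisite for Advanced Algebra'}
print(handle_course_outliers(math101, use_flex_fields=False)[2])
# ['Notes: rare field not used']
```

The first two calls mirror the outcome described above: the string "3" becomes the number 3, and Notes is retained because it is stored in a flex column.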