9.1 School Administration Example, Migrator Input Documents
Existing student, teacher, and course document sets comprise the JSON-to-duality migrator input for the school-administration example.
Note:
The document sets in the examples here are very small. In order to
demonstrate the handling of outlier (high-entropy) fields, we use a
minFrequency migrator configuration field value of 25,
instead of the default value of 5.
A field is an outlier for a given document set if it
occurs, or if any of its values occurs with a given type, in less than
minFrequency percent of the documents.
- An outlier field that occurs rarely is either (1) retained in a flex column of a table underlying the duality view or (2) reported in an error log and not used in the duality view, according to the value of configuration field useFlexFields.
- An outlier field whose values only rarely have some type other than the usual one is handled differently. Import tries to convert each such value of a rare type to the expected type for the field. An unsuccessful conversion is reported in an error log, and the field is not used in the duality view.
See Fields Specifying Configuration Parameters for Inference and Generation for information about configuration fields minFrequency and
useFlexFields.
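The outlier test described in this note can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the migrator's implementation. The names `json_type` and `find_outliers` are hypothetical, and only top-level fields are examined, although the examples in this section also involve nested fields such as grade. A document counts toward a type if the field has a value of that type in that document. "Less than minFrequency percent" is tested with integer arithmetic to avoid rounding issues.

```python
from collections import Counter


def json_type(value):
    """Return the JSON type name of a value produced by json.loads."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def find_outliers(docs, min_frequency=5):
    """Flag rare top-level fields and rare (field, type) pairs in docs.

    A field, or a field with values of a given type, is rare if it occurs
    in less than min_frequency percent of the documents.
    """
    total = len(docs)
    field_counts = Counter()
    type_counts = Counter()
    for doc in docs:
        field_counts.update(doc.keys())
        # Use a set so each (field, type) pair counts at most once per document.
        type_counts.update({(field, json_type(v)) for field, v in doc.items()})

    def is_rare(count):
        # Integer form of: count / total * 100 < min_frequency
        return count * 100 < min_frequency * total

    rare_fields = sorted(f for f, n in field_counts.items() if is_rare(n))
    rare_types = sorted(
        (f, t) for (f, t), n in type_counts.items()
        if is_rare(n) and f not in rare_fields
    )
    return rare_fields, rare_types


# Hypothetical five-document set: "c" is rare, and "a" is rarely a string.
docs = [
    {"a": 1, "b": "x"},
    {"a": 2, "b": "y"},
    {"a": "3", "b": "z", "c": True},
    {"a": 4, "b": "w"},
    {"a": 5, "b": "v"},
]
print(find_outliers(docs, min_frequency=25))  # (['c'], [('a', 'string')])
print(find_outliers(docs, min_frequency=5))   # ([], [])
```

With the default of 5, a single occurrence in five documents (20%) is not rare. With 25, as used in these examples, it is.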
Example 9-1 Student Document Set (Migrator Input)
These are the student documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator. There are no outlier fields; that is, there are no fields that are rare or whose values have rare types.
The documents all have the same fields, but note that field
grade is of mixed type: string and number. However, neither type
is rare as a grade value. A number grade occurs in all ten student
documents, and the string grade "TBD" occurs in five of them (50%).
Neither percentage is less than 25, the minFrequency value
we use for the examples here.
{"studentId" : 1,
"name" : "Donald P.",
"age" : 20,
"courses" : [ {"courseNumber" : "MATH101",
"name" : "Algebra",
"grade" : 90},
{"courseNumber" : "CS101",
"name" : "Algorithms",
"grade" : 90},
{"courseNumber" : "CS102",
"name" : "Data Structures",
"grade" : "TBD"} ]}
{"studentId" : 2,
"name" : "Elena H.",
"age" : 21,
"courses" : [ {"courseNumber" : "MATH102",
"name" : "Calculus",
"grade" : 95},
{"courseNumber" : "CS101",
"name" : "Algorithms",
"grade" : 75},
{"courseNumber" : "CS102",
"name" : "Data Structures",
"grade" : "TBD"} ]}
{"studentId" : 3,
"name" : "Francis K.",
"age" : 20,
"courses" : [ {"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"grade" : 83} ]}
{"studentId" : 4,
"name" : "Georgia D.",
"age" : 19,
"courses" : [ {"courseNumber" : "MATH102",
"name" : "Calculus",
"grade" : 85},
{"courseNumber" : "CS101",
"name" : "Algorithms",
"grade" : 75},
{"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"grade" : 82} ]}
{"studentId" : 5,
"name" : "Hye E.",
"age" : 21,
"courses" : [ {"courseNumber" : "MATH101",
"name" : "Algebra",
"grade" : 97},
{"courseNumber" : "CS102",
"name" : "Data Structures",
"grade" : "TBD"} ]}
{"studentId" : 6,
"name" : "Ileana D.",
"age" : 21,
"courses" : [ {"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"grade" : 95}]}
{"studentId" : 7,
"name" : "Jatin S.",
"age" : 20,
"courses" : [ {"courseNumber" : "CS101",
"name" : "Algorithms",
"grade" : 85},
{"courseNumber" : "CS102",
"name" : "Data Structures",
"grade" : "TBD"} ]}
{"studentId" : 8,
"name" : "Katie H.",
"age" : 21,
"courses" : [ {"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"grade" : 90},
{"courseNumber" : "CS102",
"name" : "Data Structures",
"grade" : "TBD"} ]}
{"studentId" : 9,
"name" : "Luis F.",
"age" : 19,
"courses" : [ {"courseNumber" : "MATH102",
"name" : "Calculus",
"grade" : 95},
{"courseNumber" : "CS101",
"name" : "Algorithms",
"grade" : 75},
{"courseNumber" : "MATH103",
"name" : "Advanced Algebra",
"grade" : 85} ]}
{"studentId" : 10,
"name" : "Ming L.",
"age" : 20,
"courses" : [ {"courseNumber" : "MATH102",
"name" : "Calculus",
"grade" : 95} ]}
Compare this with the student document set migrated using the default conversion, Example 9-19. There are no differences, beyond the addition of fields needed for duality-view support generally.
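The percentages behind the statement that neither grade type is rare can be checked directly. The following Python sketch copies the grade values from the student documents of Example 9-1. For each type, it computes the percentage of student documents that contain at least one grade of that type. Counting per document follows the wording of this example. The helper names are hypothetical.

```python
MIN_FREQUENCY = 25  # minFrequency value used for the examples in this section

# grade values of each student document in Example 9-1, keyed by studentId.
grades_by_student = {
    1: [90, 90, "TBD"],
    2: [95, 75, "TBD"],
    3: [83],
    4: [85, 75, 82],
    5: [97, "TBD"],
    6: [95],
    7: [85, "TBD"],
    8: [90, "TBD"],
    9: [95, 75, 85],
    10: [95],
}


def type_name(value):
    """JSON type name for the grade values above (strings or numbers only)."""
    return "string" if isinstance(value, str) else "number"


def percent_of_docs_by_type(values_by_doc):
    """Percentage of documents having at least one value of each type."""
    counts = {}
    for values in values_by_doc.values():
        for t in {type_name(v) for v in values}:  # count each type once per document
            counts[t] = counts.get(t, 0) + 1
    return {t: 100 * n / len(values_by_doc) for t, n in counts.items()}


percents = percent_of_docs_by_type(grades_by_student)
print(sorted(percents.items()))  # [('number', 100.0), ('string', 50.0)]
print([t for t, p in percents.items() if p < MIN_FREQUENCY])  # []
```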
Example 9-2 Teacher Document Set (Migrator Input)
These are the teacher documents that we assume comprise an existing external document set that serves as input to the JSON-to-duality migrator. There are no outlier fields; that is, no fields are rare or have values with rare types.
The documents have the same fields, but note that field
phoneNumber is of mixed type: string and array (array of
strings). However, neither type is rare as a phoneNumber value.
Each type occurs in two of the four teacher documents (50%), which is
not less than 25%, the minFrequency value we use for the examples here.
(Note also that the value of one occurrence of field
coursesTaught is an empty array.)
{"_id" : 101,
"name" : "Abdul J.",
"phoneNumber" : [ "222-555-011", "222-555-012" ],
"salary" : 200000,
"department" : "Mathematics",
"coursesTaught" : [ {"courseId" : "MATH101",
"name" : "Algebra",
"classType" : "Online"},
{"courseId" : "MATH102",
"name" : "Calculus",
"classType" : "In-person"} ]}
{"_id" : 102,
"name" : "Betty Z.",
"phoneNumber" : "222-555-022",
"salary" : 300000,
"department" : "Computer Science",
"coursesTaught" : [ {"courseId" : "CS101",
"name" : "Algorithms",
"classType" : "Online"},
{"courseId" : "CS102",
"name" : "Data Structures",
"classType" : "In-person"} ]}
{"_id" : 103,
"name" : "Colin J.",
"phoneNumber" : [ "222-555-023" ],
"salary" : 220000,
"department" : "Mathematics",
"coursesTaught" : [ {"courseId" : "MATH103",
"name" : "Advanced Algebra",
"classType" : "Online"} ]}
{"_id" : 104,
"name" : "Natalie C.",
"phoneNumber" : "222-555-044",
"salary" : 180000,
"department" : "Computer Science",
"coursesTaught" : []}
Compare this with the teacher document set migrated using the default conversion, Example 9-20. There are no differences, beyond the addition of fields needed for duality-view support generally.
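The same kind of check applies to phoneNumber in the teacher documents. The sketch below uses hypothetical names and values copied from Example 9-2. It counts how many teacher documents have a string value and how many have an array value.

```python
from collections import Counter

MIN_FREQUENCY = 25  # minFrequency value used for the examples in this section

# phoneNumber value of each teacher document in Example 9-2, keyed by _id.
phone_numbers = {
    101: ["222-555-011", "222-555-012"],
    102: "222-555-022",
    103: ["222-555-023"],
    104: "222-555-044",
}

# Each document has exactly one phoneNumber value, so one type per document.
type_counts = Counter(
    "array" if isinstance(v, list) else "string" for v in phone_numbers.values()
)
total = len(phone_numbers)
for json_type, count in sorted(type_counts.items()):
    percent = 100 * count / total
    verdict = "rare" if percent < MIN_FREQUENCY else "not rare"
    print(f"{json_type}: {count} of {total} documents, {percent:.0f}% ({verdict})")
# array: 2 of 4 documents, 50% (not rare)
# string: 2 of 4 documents, 50% (not rare)
```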
Example 9-3 Course Document Set (Migrator Input)
These are the course documents that we assume comprise an existing
external document set that serves as input to the JSON-to-duality migrator. There
are two outlier fields, Notes and
creditHours:
- Field Notes is an outlier because it occurs in only one course document (one out of five, 20%, less than the minFrequency value of 25 that we use for the examples here).
- Field creditHours is an outlier because it has a string value in less than 25% of the documents; it has a number value in the other documents.
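The two outlier percentages for the course documents can be reproduced as follows. Only fields Notes and creditHours are copied from Example 9-3; the other fields are omitted. The helper percent_of_docs is hypothetical.

```python
MIN_FREQUENCY = 25  # minFrequency value used for the examples in this section

# Notes and creditHours of the five course documents in Example 9-3,
# keyed by courseId (other fields omitted).
courses = {
    "MATH101": {"creditHours": 3, "Notes": "Prerequisite for Advanced Algebra"},
    "MATH102": {"creditHours": 4},
    "CS101": {"creditHours": 5},
    "CS102": {"creditHours": 3},
    "MATH103": {"creditHours": "3"},
}
total = len(courses)


def percent_of_docs(predicate):
    """Percentage of course documents for which predicate(doc) is true."""
    return 100 * sum(1 for doc in courses.values() if predicate(doc)) / total


checks = {
    "Notes present": percent_of_docs(lambda d: "Notes" in d),
    "creditHours is a string": percent_of_docs(lambda d: isinstance(d["creditHours"], str)),
    "creditHours is a number": percent_of_docs(lambda d: isinstance(d["creditHours"], int)),
}
for label, pct in checks.items():
    print(f"{label}: {pct:.0f}% -> {'outlier' if pct < MIN_FREQUENCY else 'ok'}")
# Notes present: 20% -> outlier
# creditHours is a string: 20% -> outlier
# creditHours is a number: 80% -> ok
```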
{"courseId" : "MATH101",
"name" : "Algebra",
"creditHours" : 3,
"students" : [ {"studentId" : 1, "name" : "Donald P."},
{"studentId" : 5, "name" : "Hye E."} ],
"teacher" : {"teacherId" : 101, "name" : "Abdul J."},
"Notes" : "Prerequisite for Advanced Algebra"}
{"courseId" : "MATH102",
"name" : "Calculus",
"creditHours" : 4,
"students" : [ {"studentId" : 2, "name" : "Elena H."},
{"studentId" : 10, "name" : "Ming L."},
{"studentId" : 9, "name" : "Luis F."},
{"studentId" : 4, "name" : "Georgia D."} ],
"teacher" : {"teacherId" : 101, "name" : "Abdul J."}}
{"courseId" : "CS101",
"name" : "Algorithms",
"creditHours" : 5,
"students" : [ {"studentId" : 1, "name" : "Donald P."},
{"studentId" : 2, "name" : "Elena H."},
{"studentId" : 4, "name" : "Georgia D."},
{"studentId" : 9, "name" : "Luis F."},
{"studentId" : 7, "name" : "Jatin S."} ],
"teacher" : {"teacherId" : 102, "name" : "Betty Z."}}
{"courseId" : "CS102",
"name" : "Data Structures",
"creditHours" : 3,
"students" : [ {"studentId" : 1, "name" : "Donald P."},
{"studentId" : 2, "name" : "Elena H."},
{"studentId" : 5, "name" : "Hye E."},
{"studentId" : 7, "name" : "Jatin S."},
{"studentId" : 8, "name" : "Katie H."} ],
"teacher" : {"teacherId" : 102, "name" : "Betty Z."}}
{"courseId" : "MATH103",
"name" : "Advanced Algebra",
"creditHours" : "3",
"students" : [ {"studentId" : 3, "name" : "Francis K."},
{"studentId" : 4, "name" : "Georgia D."},
{"studentId" : 8, "name" : "Katie H."},
{"studentId" : 9, "name" : "Luis F."},
{"studentId" : 6, "name" : "Ileana D."} ],
"teacher" : {"teacherId" : 103, "name" : "Colin J."}}
Compare this with the course document set migrated using the default
conversion, Example 9-21. There are no differences, beyond the addition of fields needed for duality-view
support generally. In particular, outlier fields Notes (rare) and
creditHours (rare type) are both present after migration,
Notes because it is stored in a flex column, and
creditHours because its outlier value for course
MATH103 is converted from the string "3" to
the number 3.
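The treatment of the two course outliers can be illustrated with a hypothetical Python sketch. It assumes that two sets have already been determined: the rare fields, and the fields whose string values are a rare type. Parameter use_flex_fields stands in for configuration field useFlexFields. A plain list stands in for the migrator's error log. A value that cannot be converted is simply omitted from that document. The real import behavior and error-log format are not shown here, and the names prepare_document and to_number are illustrative only.

```python
def to_number(text):
    """Parse a numeric string as an int if possible, else as a float.

    Raises ValueError if text is not numeric.
    """
    try:
        return int(text)
    except ValueError:
        return float(text)


def prepare_document(doc, rare_fields, rare_string_fields, use_flex_fields, errors):
    """Split one document into (columns, flex) values; a hypothetical sketch.

    rare_fields:        outlier fields that are rare (kept in flex or dropped)
    rare_string_fields: fields whose string values are a rare type and are
                        converted to numbers
    errors:             a list standing in for the migrator's error log
    """
    doc_id = doc.get("courseId")
    columns, flex = {}, {}
    for field, value in doc.items():
        if field in rare_fields:
            if use_flex_fields:
                flex[field] = value  # retained, in a flex column
            else:
                errors.append(f"{doc_id}: rare field {field} not used")
        elif field in rare_string_fields and isinstance(value, str):
            try:
                columns[field] = to_number(value)  # e.g. "3" -> 3
            except ValueError:
                errors.append(f"{doc_id}: cannot convert {field}={value!r}")
        else:
            columns[field] = value
    return columns, flex


errors = []
math101 = {"courseId": "MATH101", "creditHours": 3,
           "Notes": "Prerequisite for Advanced Algebra"}
math103 = {"courseId": "MATH103", "creditHours": "3"}
for doc in (math101, math103):
    print(prepare_document(doc, {"Notes"}, {"creditHours"}, True, errors))
# ({'courseId': 'MATH101', 'creditHours': 3}, {'Notes': 'Prerequisite for Advanced Algebra'})
# ({'courseId': 'MATH103', 'creditHours': 3}, {})
print(errors)  # []
```

With use_flex_fields set to False, Notes would instead produce an error-log entry and be left out. A non-numeric string such as "three" would fail conversion and also be logged.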