Data Modeling

Data Model

This document serves as a discussion point for the entities and relationships represented in our data model.

We need to facilitate effective communication between technical and business stakeholders. As a shared responsibility, the creation and maintenance of our data model requires clear understanding and agreement from all parties.

Developers need precise rules to ensure data integrity and to effectively express the business requirements in the data.

Business stakeholders need to ensure the model can capture the required information and answer key business questions, while understanding the constraints and costs associated with any changes to the model.

For example, let's consider these important questions (there are a very large number of such questions):

  • An Org Tier has a Many-to-Many relationship to Persons, which implies ownership. Ownership, in common parlance, tends to be singular. Does this mean that an org_tier can have multiple owners or none at all? If ownership is intended to be singular, can we guarantee that a change in ownership is atomic, i.e., there is never a point in time when an org_tier is without an owner or has multiple owners?

  • Our tables have a common pattern, featuring created_at, updated_at, and retired_at fields. This brings up several points for discussion:

    • Privacy and Data Retention: In line with privacy principles, personal data should not be retained longer than necessary. If a record is marked as retired but not actually deleted, this might conflict with the principle of data minimization and the requirement to delete data that is no longer necessary.
    • Auditing: The updated_at field suggests we might intend to implement some sort of auditing system. If that's the case, should we not also have an updated_by field? In fact, might we not want a historical record of all such updates? This would allow us to track changes over time and identify who made those changes.
    • Impact on Performance: It's worth noting that the addition of fields for tracking and auditing purposes might have an impact on system performance, especially for large and frequently updated tables. How can we balance the need for data integrity and traceability with the requirement for optimal performance?

By addressing these questions and other similar considerations, we can ensure that our data model is robust, flexible, and fully supportive of our business needs while maintaining compliance with our objectives.

Our data model isn't just a technical artifact; it's also a representation of our business. Changes to it can have far-reaching implications. Every decision should be well-considered and agreed upon by all stakeholders.

Entities and Relationships

This is woefully incomplete, just a starting point. ChatGPT helped ;-)

Here is the entity-relationship diagram of the current database schema:

Epicenter Lighthouse Score

Relationships

Organizations

  • An organization is the overarching entity within this model. Each person in the system belongs to exactly one organization, indicating a One-to-Many relationship between Organizations and Persons. This relationship ensures that every person is linked to an organization, establishing clear ownership and accountability.

Org Tiers

  • Org Tiers are the next level in the hierarchy. Each Org Tier belongs to exactly one organization, again indicating a One-to-Many relationship, this time between Organizations and Org Tiers. This relationship serves to segment or structure the organization into manageable parts.

  • Org Tiers then form a Tree (or a Forest in the case of multiple independent Trees), each node representing an Org Tier.

  • Each Org Tier has exactly one parent Org Tier (One-to-One relationship), except for the top-level Org Tiers which have no parent, making them the root nodes of each Tree or Forest.

  • On the other hand, an Org Tier can have multiple child Org Tiers. The expectation is that this would usually be between 0-10 (Many-to-One relationship). The system is therefore designed to manage a reasonable number of relationships, ensuring that it remains efficient and usable.

  • Importantly, the set of transitive "parent" relationships must not form a cycle. This means that you can't have an Org Tier that is indirectly its own parent. This restriction is critical to maintaining the Tree (or Forest) structure and preventing logical inconsistencies or infinite loops in traversals.

This hierarchical data model serves to create a well-defined structure for the organization, with clear relationships between the different entities and levels. It allows for efficient querying of the data, such as finding all persons under a particular Org Tier or determining the top-level Org Tier for a given person.

Between Persons and Organizations

Each person is linked to an organization, suggesting that they are a part of that organization. The field 'organization_id' in the 'persons' table references the 'id' field in the 'organizations' table.

Between Org_tiers and Organizations

Each org_tier is part of an organization, shown by the 'organization_id' field in 'org_tiers' referencing the 'id' field in 'organizations'. Additionally, org_tiers can be part of another org_t

Entities

Here are the beginings of a description of each of the important concepts in the project.

Persons

Persons are individuals within the organization. They hold unique pieces of information about them such as:

  • id: Unique identifier for each person.
  • user_id: Identifier linking the person to a user account.
  • family_name: The family name of the person.
  • given_name: The given name of the person.
  • email: Email of the person.
  • phone: Phone number of the person.
  • organization_id: Identifier linking the person to an organization.

Organizations

Organizations are structured entities that might represent businesses, associations, government agencies, etc. They have details like:

  • id: Unique identifier for each organization.
  • name_en: The English name of the organization.
  • name_fr: The French name of the organization.
  • acronym_en: The English acronym of the organization.
  • acronym_fr: The French acronym of the organization.
  • org_type: The type of organization.
  • url: Website URL of the organization.

Org_tiers

Org_tiers represent different hierarchical levels or departments within an organization. They have attributes such as:

  • id: Unique identifier for each org_tier.
  • organization_id: Identifier linking the org_tier to an organization.
  • tier_level: Numeric value representing the level of the tier.
  • name_en: The English name of the org_tier.
  • name_fr: The French name of the org_tier.
  • primary_domain: The main domain of the org_tier (e.g., public_health, policy, data, etc.)
  • parent_tier: Identifier linking to a parent tier (if any).

Skills

Skills represent capabilities or expertise areas individuals may possess. They include details like:

  • id: Unique identifier for each skill.
  • name_en: The English name of the skill.
  • name_fr: The French name of the skill.
  • description_en: English description of the skill.
  • description_fr: French description of the skill.
  • domain: The domain to which the skill belongs (e.g., public_health, policy, data, etc.)

Capabilities

Capabilities refer to specific skills that individuals possess within the organization, including their level of expertise. They are defined by:

  • id: Unique identifier for each capability.
  • name_en: The English name of the capability.
  • name_fr: The French name of the capability.
  • domain: The domain to which the capability belongs (e.g., public_health, policy, data, etc.)
  • person_id: Identifier linking the capability to a person.
  • skill_id: Identifier linking the capability to a skill.
  • organization_id: Identifier linking the capability to an organization.
  • self_identified_level: The level of the capability as identified by the person (e.g., novice, experienced, etc.)
  • validated_level: The validated level of the capability, if it has been validated.
  • validation_values: Array of validation values, if any.

Schema Summary

The given schema provides a comprehensive structure to express relationships, affiliations, skills, capabilities, roles, and other attributes of persons and organizations in a complex hierarchical system.

  1. Persons: This model represents individuals with fields for a unique user id, family name, given name, email, phone, and their affiliated organization. There are relations established to affiliations, capabilities, tier ownerships in organizations, roles, and team ownerships.

  2. Organizations: The model represents various organizations with a unique name, acronym, type, and URL. It also has a timestamp for creation, updating, and retiring. It establishes relationships with affiliations, capabilities, different tiers within the organization, persons associated, and teams.

  3. Affiliations: This model stores relationships between persons and organizations, along with their roles and the time span of the affiliation.

  4. Capabilities: This model contains details about a person's capabilities in different domains. It includes self-identified levels and validated levels of capability. It has a connection with organizations, persons, skills, and validations.

  5. Org_tiers: The model represents different hierarchical levels within organizations with unique names and primary domains. It maintains a relationship with tier ownerships, its parent tier, other tiers, teams, and the organization itself.

  6. Org_tier_ownerships: This model represents the ownership of organizational tiers by different persons.

  7. Roles: This model represents roles held by persons within teams, along with the associated effort, HR group, and HR level. It maintains relations with tasks and works done in that role.

  8. Skills: The model represents various skills in different domains, each with unique names and descriptions.

  9. Tasks: This model represents various tasks created by different roles. It maintains the intended outcome, final outcome, and approval tier of tasks.

  10. Team_ownerships: This model represents the ownership of teams by different persons.

  11. Teams: The model represents different teams within an organization, each with unique names and descriptions in multiple languages. It has relations with roles, team ownerships, organizational tiers, and the organization itself.

  12. Users: This model represents users of the system, each with unique emails and access keys. It also maintains the approved role and access level of each user.

  13. Valid_roles: This model represents the valid roles that a user can have in the system.

  14. Validations: The model stores validation details about a person's capabilities, done by a validator.

The schema also has enums for access level, capability level, HR group, and skill domain to maintain consistent and valid data across different models.