Managing the Proliferation of Event Types #518

Open
tychonievich opened this issue Jul 22, 2024 · 5 comments

Comments

@tychonievich
Collaborator

tychonievich commented Jul 22, 2024

This is not really a new issue; it is an effort to collect topics from several other issues and discussions and add context for those who haven't followed those conversations. Those other issues are scattered and I'm not confident that I found them all; if I've missed something, please add it here!

If you want to propose specific new events or attributes, see the event and attribute proposal tracker.
If you want to see a large list of possible new attributes and events, see #117.
There are also various issues discussing specific new events or substructures thereof. This issue is instead about reorganizing the entire event/attribute system.

The challenge/situation

GEDCOM currently has 47 event/attribute types (32 event types, 15 attribute types), not counting the generic EVEN and FACT structures. More than 200 additional types have been proposed in various issues here (notably #117).

Long lists of options make user interfaces challenging to create and user decisions hard to guide. For some applications they may also make code lengthy with increased chance of accidentally omitting or cross-coding some component.

The current set has some quite broad types, like DESC, which subsumes multiple extension types some applications support (such as _COLO, _EYEC, _HAIR, _HEIG, _WEIG); and some surprisingly narrow ones, like BARM and BASM being distinct. This inconsistency in specificity helps fuel discontent: those who like specificity wonder why the level of specificity they find in one place is not present in another, and those who like generic flexibility wonder the reverse.

Any type with a high degree of specificity makes translation and multicultural communication challenging.

The current set is not uniformly understood. Some structures have definitions that do not map well to non-English languages or non-Christian-European cultures. Some users apply the closest available structure to each situation not formally covered in the specification, such as using MARB for any marriage-announcement-like event even if it is informal and not a bann, while other users do not do this.

These topics are not restricted to events and attributes; calendars and name parts have both had similar discussions, but with fewer types in the 7.0 specification.

Five Proposed Solutions

  1. Add all the events and attributes that come to mind.

    This approach was rejected by the GEDCOM steering committee in April 2023, at which time the "valuable, absent, and used" criteria were introduced for discussing new event proposals. But that doesn't mean it couldn't be revisited.

  2. Add all the events and attributes that multiple applications support.

    This approach is implicitly the intent of the event and attribute proposal tracker and the "awaiting use" label in the issue tracker.

  3. Use only a small number of types; any additional clarification goes in a free-text field like TYPE or NOTE.

    While this option has been mentioned in passing, I'm not aware of any serious proposal along these lines.

  4. Create a type hierarchy. For example a Marriage Bann ⇒ Marriage Announcement ⇒ Pre-Marriage Event ⇒ Marriage Event ⇒ Family Event ⇒ Event, where "⇒" means "is a subtype of" or "implies" or "is subsumed by".

    This approach is used in some peer specifications, notably schema.org, but has not received much discussion here.

  5. Create a smaller set of broad types, with optional enumerated-value subtypes in a KIND substructure.

    This has two parts:

    1. defining the smaller set of broader types. Several have been proposed:

    2. adding the KIND substructure. A concrete proposal can be found in Add KIND enumerated values for MARR and BURI #322.

    An additional open question is if the enumerations would be singular or plural. We could do any of the following:

    1. One broad type, one specific type.
    2. All of the types in a subtype inheritance path.
    3. Functional tags: a value for any kind of announcement, a value for any kind of pre-event event, a value for any type of religious or church-sponsored event, and so on.

    This proposal has received the most discussion, but also has the most open questions.
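
To make option 4 and the "inheritance path" variant of option 5 more concrete, here is a minimal Python sketch. It assumes a single-parent hierarchy built from the example chain in option 4; the type names are illustrative labels, not GEDCOM tags, and `PARENT`, `inheritance_path`, and `is_subtype_of` are hypothetical names invented for this sketch.

```python
from typing import Optional

# Hypothetical parent links copied from the example chain in option 4.
# These are illustrative labels only, not tags from any GEDCOM version.
PARENT: dict[str, Optional[str]] = {
    "Marriage Bann": "Marriage Announcement",
    "Marriage Announcement": "Pre-Marriage Event",
    "Pre-Marriage Event": "Marriage Event",
    "Marriage Event": "Family Event",
    "Family Event": "Event",
    "Event": None,  # root of the hierarchy
}


def inheritance_path(event_type: str) -> list[str]:
    """Return event_type followed by every broader type, most specific first."""
    path = []
    current: Optional[str] = event_type
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def is_subtype_of(specific: str, broad: str) -> bool:
    """True when `broad` is `specific` itself or one of its broader types."""
    return broad in inheritance_path(specific)
```

Under this sketch, an application that only understands "Family Event" could still place a "Marriage Bann" correctly by walking the path, which is the main appeal of listing all types in an inheritance path.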

Solution implementation options

Assuming we converge on a solution that we like, we could do any of the following:

In 7.1 we could

  • add more event types
  • add KIND to the existing event types
  • deprecate some current event types, leaving them in the spec but stating that they should be changed to a different structure.

In 8.0 we could replace and refactor as much as we wish.

No matter what we do, it is likely that applications will wish to support exporting new data in 7.0 and earlier formats. For example, if we deprecate or remove MARB in favor of some broader structure with a KIND, we should be clear which KIND values imply that the event can be exported as a MARB.
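
A rough Python sketch of such an export rule follows. Everything about the new structure is an assumption made for illustration: the broad tag `_PREM`, the KIND values `BANN` and `NEWSPAPER_NOTICE`, and the names `LEGACY_TAG` and `export_as_7_0` are hypothetical. Only MARB, EVEN, and TYPE come from 7.0, and the emitted lines are simplified (no payloads, dates, or places).

```python
from typing import Optional

# Hypothetical table: (broad tag, KIND value) -> specific 7.0 tag.
# "_PREM" and "BANN" are invented for this sketch; MARB is the real 7.0 tag.
LEGACY_TAG = {
    ("_PREM", "BANN"): "MARB",
}


def export_as_7_0(tag: str, kind: Optional[str]) -> list[str]:
    """Return simplified 7.0 lines for an event given as a broad tag plus KIND."""
    legacy = LEGACY_TAG.get((tag, kind))
    if legacy is not None:
        # A specific 7.0 event type exists for this KIND value.
        return [f"1 {legacy}"]
    # Otherwise fall back to the generic EVEN and keep the detail in TYPE.
    label = kind if kind is not None else tag
    return ["1 EVEN", f"2 TYPE {label}"]
```

For example, `export_as_7_0("_PREM", "BANN")` yields a MARB line, while a KIND with no 7.0 counterpart degrades to EVEN with a TYPE. A table like `LEGACY_TAG` is the piece the specification would need to define for exports to be consistent across applications.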

@Norwegian-Sardines

Thanks Luther for creating this issue. I think I am the one that may have started the conversation to move to the use of <fact>.TYPE to begin managing the proliferation of new event types. With the addition of the KIND tag to enumerate various “like” events as an alternative to using TYPE I personally think we have a winner!

I’ll let others weigh in with their comments, but I’m on board to discuss this option above others going forward!

@cdhorn
Contributor

cdhorn commented Sep 2, 2024

When I brought up adding more event and attribute types in #117 the primary reason for doing so was related to information context. Having a far richer set of specific enumerated types helps preserve context when data is shared between people from different countries.

Think about it in terms of tagging data for machine learning: you want and need the tags to be applied consistently across languages and cultures. And the finer and more detailed the tagging, the more context is preserved and the more value can be extracted.

Should that be accomplished with a flat namespace or a hierarchical namespace? Both have benefits and drawbacks. After some consideration I think the latter, if well thought out, will have more long-term benefits. However, as almost every genealogical site and application today uses a flat namespace, I think changing that should be an 8.0 item. Ideally I would like to see shared events and groups in 8.0 as well, but I know that is wishful thinking.

In the end the primary responsibility of Gedcom is to serve as a data transmission envelope. It will always be a lossy envelope, but each iteration should strive to further improve fidelity.

@pfahlr

pfahlr commented Feb 25, 2025

I may be completely missing something integral to this specification, so just ignore me if this comment seems misguided or whatever.

Coming from an entirely outside perspective as I'm completely new to this specification, it would seem to me that events might be grouped at the highest level by HOW they affect the record in time.

To create a set of event types that covers the range of specific culturally relevant events that exist in the world would require hundreds if not thousands of types. Instead, I imagine they should be broken down into a hierarchy based on how they AFFECT the associated records as well as any logical structures by which other information might emerge from the interaction of various events. By employing this structure it seems some very powerful features of the data might emerge.

This list is by no means exhaustive, but off the top of my head I have the following questions that might be used to classify event types:

  1. Do they simply mark the occurrence of an event?

  2. Do they imply a permanent or temporary change to the record? In the latter case, is there an expectation of an event marking the end of the temporary change, or simply an interval specified by the initial event?

  3. Does an event imply that another event type should not be able to occur afterwards or until the expiration, and if such an event does occur anyway, does this have some kind of meaning? For example, two marriages without a divorce in-between implies certain additional information, but the beginning of a significant non-marriage romantic entanglement does not necessarily imply the end of another. Perhaps the marriage event should not even be considered anything more than a legal familial association in the absence of a corresponding romantic entanglement, which may or may not be a better predictor of the lineage of children arising from such relationships. Certainly what is recorded in official records prior to genetic paternity testing is often not a factual representation of lineage. And, in many cases, other historical evidence might be available.

  4. Does the event mark a change to one of the fields in another record (e.g., a person announcing a change to their gender identity, or an educational milestone resulting in a change to the prefix or suffix of an individual's name)?

Further, and this goes beyond the scope of this topic, should there not be a means of noting a parent-child relationship that is not genetic or even by marriage? A woman or man acts in the role of step-parent to the child of their romantic partner to whom they were never married. Or an adult role model de facto fills the role of a traditionally familial relationship, as in all such Fictive Kin relationships. These relationships have historical significance, huge significance in people's lives, and can direct research regarding yet unknown genetic links between individuals. There should be a means by which they can be noted.

Just the thoughts of this naive observer after reading over the specification. Hopefully someone will find some of it useful.

@Norwegian-Sardines

Can you give some examples of your hierarchy?

> Further, and this goes beyond the scope of this topic, should there not be a means of noting a parent-child relationship that is not genetic or even by marriage. A woman or man acts in the role of step-parent to the child of their romantic partner to which they were never married.

Many of these constructs use made up terms without specific definitions. These roles may have no meaning (or alternate meaning) in other cultures.

This being said, I could capture this in a couple of ways using the current GEDCOM.

I could create a FAMILY_Record with a not married event, and connect the child to the record with a FAMC adoption, and an adoption tag with a mother Attribute.

I could also just use an ASSO relationship with a ROLE of “Other” and a phrase “Janet Jones acting as a step parent”!
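
As an illustration of that second approach, here is a small Python sketch that emits the corresponding lines. It assumes the 7.0 layout as I understand it, where ASSO takes a pointer to an individual, ROLE is a substructure of ASSO, and PHRASE is a substructure of ROLE. The helper name `asso_lines` and the pointer `@I2@` are hypothetical.

```python
def asso_lines(xref: str, phrase: str, level: int = 1) -> list[str]:
    """Build ASSO / ROLE OTHER / PHRASE lines for a free-text association."""
    return [
        f"{level} ASSO {xref}",           # pointer to the associated individual
        f"{level + 1} ROLE OTHER",        # no enumerated role fits, so use OTHER
        f"{level + 2} PHRASE {phrase}",   # human-readable description of the role
    ]


# Hypothetical usage: the child's record points at @I2@ (Janet Jones).
for line in asso_lines("@I2@", "Janet Jones acting as a step parent"):
    print(line)
```

The phrase is free text, so it carries the meaning for human readers but gives software nothing to classify on, which is the trade-off being discussed in this issue.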

> For example, two marriages without a divorce in-between implies certain additional information, but the beginning of a significant non-marriage romantic entanglement does not necessarily imply the end of another.

A Family_Record would not in general have two marriage events, except in very uncommon and not always supported instances, such as the reporting of two marriage events between two persons on different days with no clear date. However, if you don’t have evidence that a divorce occurred, should you assume one did occur? I would not make that assumption!

> Does the event mark a change to one of the fields in another record (i.e., a person announcing a change to their gender identity, an educational milestone resulting in a change to the prefix or suffix of an individual's name)

One event could possibly trigger a change to another event. This logic should be implemented by the application and/or verified by the data entry person! Remember that GEDCOM is a transmittal coding of data with verified evidence.

> Or an adult role model who de facto fills the role of a traditionally familial relationship and all such Fictive Kin relationships. These relationships have historical significance, huge significance in people's lives, and can direct research regarding yet unknown genetic links between individuals.

Before modern times this was one of the definitions of “Fostering”. But again a “Mentor”, “Advisor” non-family relationship (god-parent) can be recorded using an ASSO tag!

@Norwegian-Sardines

As a genealogist I record events and attributes based on evidence. If an event happened on a single day, I record that exact day. If an event occurred over multiple days, I record the date period; if I know the event happened on a non-exact day, I use a date range or an approximate date. I rarely record things without evidence!
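
For readers less familiar with the date forms mentioned above, the following Python sketch shows how I understand them to be written as GEDCOM date values: an exact date, a period (FROM ... TO), a range (BET ... AND), and an approximate date (ABT). The helper names are hypothetical, the dates are made up, and only Gregorian dates are handled.

```python
from datetime import date

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def exact(d: date) -> str:
    """Exact day, written as day, month abbreviation, year."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


def period(start: date, end: date) -> str:
    """The event lasted for the whole span from start to end."""
    return f"FROM {exact(start)} TO {exact(end)}"


def date_range(earliest: date, latest: date) -> str:
    """The event happened at some point between the two dates."""
    return f"BET {exact(earliest)} AND {exact(latest)}"


def approximate(d: date) -> str:
    """The event happened around this date."""
    return f"ABT {exact(d)}"


print(exact(date(1850, 3, 4)))
print(period(date(1850, 3, 4), date(1850, 3, 6)))
print(date_range(date(1850, 3, 1), date(1850, 3, 31)))
print(approximate(date(1850, 3, 4)))
```

The distinction between a period and a range matters for evidence-based recording: a period asserts the event was ongoing throughout, while a range only asserts uncertainty about when a single event occurred.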


4 participants