@bobbai00 bobbai00 commented Dec 17, 2025

What changes were proposed in this PR?

The third-party code (all MIT licensed, Category A) is compatible with Apache License 2.0 but requires proper attribution per Apache policy. This PR addresses license compliance issues identified during an audit:

  1. Restored original MIT license headers for third-party code:

    • common/workflow-operator/src/main/scala/com/kjetland/** (mbknor-jackson-jsonschema)
    • frontend/src/app/common/formly/array.type.ts (Google Angular)
    • frontend/src/app/common/formly/object.type.ts (Google Angular)
    • frontend/src/app/common/formly/multischema.type.ts (Google Angular)
    • frontend/src/app/common/formly/null.type.ts (Google Angular)
  2. Updated LICENSE file with third-party attribution following Apache Spark's approach:

    • Added pointer to licenses/ directory for full license text
    • Listed bundled dependencies with copyright and source attribution
  3. Created licenses/LICENSE-MIT.txt containing the full MIT license text

  4. Updated .licenserc.yaml to exclude third-party files from Apache license header checking

References:

Any related issues, documentation, discussions?

Closes #4135. Related to #4132.

How was this PR tested?

Manual verification that:

  • All MIT-licensed files have correct license headers
  • LICENSE file correctly lists all bundled third-party dependencies
  • licenses/LICENSE-MIT.txt contains the complete MIT license text
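
The header check in the first bullet could also be scripted. The following is only a minimal sketch, not something this PR adds: the marker phrases, the 40-line window, and both function names are assumptions chosen for illustration, and real third-party headers may be worded differently or wrap a phrase across lines.

```python
from pathlib import Path

# Assumed marker phrases (hypothetical); real third-party headers vary,
# so this tuple would need to be adjusted per project.
MIT_MARKERS = ("mit license", "mit-style license", "permission is hereby granted")


def has_mit_header(text: str, max_lines: int = 40) -> bool:
    """Return True if any marker phrase appears in the first max_lines lines."""
    head = "\n".join(text.splitlines()[:max_lines]).lower()
    return any(marker in head for marker in MIT_MARKERS)


def files_missing_mit_header(paths):
    """Return the paths (as strings) whose leading lines contain no MIT marker."""
    missing = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        if not has_mit_header(text):
            missing.append(str(path))
    return missing
```

Passing the third-party files listed in this description to `files_missing_mit_header` would return the ones that still need a manual look. A match only shows that a marker phrase is present, not that the header is the correct one for that file.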

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Code.

…ibution

This commit addresses license compliance issues identified during an audit:

1. Restored original MIT license headers for third-party code:
   - pyright-language-service/src/*.ts (TypeFox monaco-languageclient)
   - common/workflow-operator/src/main/scala/com/kjetland/** (mbknor-jackson-jsonschema)
   - frontend/src/app/common/formly/array.type.ts (Google Angular)

2. Updated LICENSE file with proper third-party attribution section
   including full MIT license text for each bundled dependency

3. Updated .licenserc.yaml to exclude third-party files from
   Apache license header checking

4. Added sbt-license-report plugin (v1.7.0) for automated dependency
   license tracking and compliance auditing

The third-party code (all MIT licensed, Category A) is compatible with
Apache License 2.0 but requires proper attribution per Apache policy.
@github-actions bot added the frontend, docs, service, and common labels Dec 17, 2025
Remove pyright-language-service license header changes as they are
already addressed in PR apache#4132. This commit now focuses only on:

- mbknor-jackson-jsonschema (MIT license attribution)
- Angular array.type.ts (MIT license attribution)
- sbt-license-report plugin for dependency tracking
@github-actions github-actions bot removed the service label Dec 17, 2025
@bobbai00 changed the title from "fix: restore proper license headers for third-party code and add attr…" to "fix: restore proper license headers for third-party code" Dec 17, 2025
@bobbai00 bobbai00 self-assigned this Dec 17, 2025
@parshimers
Member

While this fixes some of the issues, I still think there are more remaining.
Anything under frontend/src/app/common/formly is MIT-licensed by Google, not just that one file.
Furthermore the required attribution from the MIT license belongs in NOTICE, not LICENSE. The text of the Apache License actually refers to the NOTICE file (https://www.apache.org/licenses/LICENSE-2.0.html#redistribution) as the place where these sorts of attributions go.

I would also suggest that this task not be done via Claude or other LLMs. I don't think it's the right tool for the job. Using a script that an LLM might generate could be good, but there is no easy way to check the validity of the output an LLM would generate in this task. In fact it's exceedingly hard to validate this task, and easy to mistake it for being done correctly, as we have seen. Therefore the method in which it is done is important to scrutinize and have a high degree of confidence in.

My method so far has been to look at the diff of the change that added all of the ASF headers (and inadvertently changed some), and look carefully at any instance where lines were removed instead of added. There are not many of these. Each of those should be scrutinized and marked as either appropriate or mistaken.
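
The diff-based method described above can be sketched as a short script. This is an illustration under stated assumptions, not the procedure actually used in this review: it assumes git-format unified diff text (with `diff --git` file headers), and the function name `removed_lines_by_file` is hypothetical.

```python
import re
from collections import defaultdict


def removed_lines_by_file(diff_text: str) -> dict:
    """Map each file path in a git unified diff to the lines the diff removed."""
    removed = defaultdict(list)
    current = None    # old-side path of the file being read
    in_hunk = False   # True once the first "@@" header of a file is seen
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts; reset per-file state.
            current, in_hunk = None, False
        elif line.startswith("--- ") and not in_hunk:
            # Old-side path, e.g. "--- a/src/foo.ts"; "/dev/null" means a new file.
            path = line[4:].strip()
            current = None if path == "/dev/null" else re.sub(r"^a/", "", path)
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and current and line.startswith("-"):
            # A removed line inside a hunk; drop the leading "-" marker.
            removed[current].append(line[1:])
    return dict(removed)
```

The input could be the output of `git show` or `git diff` for the change that added the ASF headers. Files that map to a non-empty list are the ones where something was removed rather than added, so each still needs to be read by hand and marked as appropriate or mistaken.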

@bobbai00
Contributor Author

Thank you for your review! I will manually check files and make the fix complete.

I have one question regarding the location of attribution. Quoting from https://infra.apache.org/licensing-howto.html#permissive-deps,

In LICENSE, add a pointer to the dependency's license within the distribution and a short note summarizing its licensing:
This product bundles SuperWidget 1.2.3, which is available under a
"3-clause BSD" license. For details, see deps/superwidget/.
Under normal circumstances, there is no need to modify NOTICE to mention a bundled dependency.

So it seems the attribution should still go in LICENSE, not NOTICE?

@parshimers
Member

Ah yes, you are right about that being in LICENSE and not NOTICE. Sorry for the misdirection. I am too used to the way it is done in Asterix, where we never modify that file, we only modify the template that surrounds it. So I reflexively look too skeptically at modifications there.

The change is looking better already. Let me know when it's ready for a look.

@bobbai00
Contributor Author

bobbai00 commented Jan 3, 2026

Thanks for confirming. Could you take another look at the current PR?

@parshimers
Member

What about the pyright-language-service files that triggered this, i.e. https://github.com/apache/texera/pull/4132/files? Should those fixes be part of this change?

@bobbai00
Contributor Author

bobbai00 commented Jan 6, 2026

Yes. #4132 is currently a separate PR. Would it be better to merge it into this PR and close that one?

@parshimers
Member

Yeah, I think so. This way this change can contain all of the fixes regarding inadvertently changed license headers.

@bobbai00
Contributor Author

bobbai00 commented Jan 8, 2026

Please take another look. I merged the changes from #4132.

@parshimers
Member

Some things are still missing. For example, frontend/src/assets/svg/operator-view-result.svg has an ambiguous license. It seems like it was obtained from svgrepo.com, so it likely wasn't authored by anyone on Texera, but the license is not noted.

@bobbai00
Contributor Author

Thank you for pointing them out. I fixed some of them. I just checked the folder frontend/src/assets, which contains all the icons used by Texera's GUI. I assume all of them need to carry their original licenses, not the Apache license, right? If so, should I fix them by tracing each one back to its original license and restoring it?

@parshimers
Member

Good job tracking those icons' actual licenses down, I couldn't find them myself. The one that's MIT licensed is fine, and I think the version of the CC license used by the other two is fine. If we can't find the license for the last one, we need to delete it.

@bobbai00
Contributor Author

I found the license for the last one: https://www.shutterstock.com/license, which is incompatible with the Apache License. So I deleted the icon and its usages.

@parshimers
Member

Cool. Hopefully that's everything then.
One thing to be careful of: those 2 CC-BY licensed icons shouldn't be included in the source release (https://www.apache.org/legal/resolved.html#cc-by). It might be easier to just replace them with something else.

LICENSE Outdated
Copyright (c) 2018 Google Inc. All Rights Reserved.
Source: https://angular.io

SVG Icons
Member

I would keep this grouped by license and not by artifact type.

Contributor Author

Fixed

* specific language governing permissions and limitations
* under the License.
*/
//The source file can be referred to: https://github.com/TypeFox/monaco-languageclient/blob/main/packages/examples/src/python/server/main.ts
Member

Shouldn't the MIT license header from TypeFox be here as well?

Contributor Author

Fixed

* under the License.
*/

declare module 'hocon-parser' {
Member

Same here. What's the license of this file? Is it MIT-licensed from TypeFox?

Member

@parshimers parshimers left a comment

A few small things, but I think the hard work of finding all the source inclusions is done.

LICENSE Outdated
CC BY 3.0 License (licenses/LICENSE-CC-BY-3.0.txt)
--------------------------------------------------

NOTE: The following files use CC BY 3.0 license which requires attribution.
Member

It's not that they should be removed in the release; it's that they can only be packaged with binaries and not as part of the source release. It's not exactly clear to me how this applies to an SVG. It is textual, but not in the way source code usually is.

Contributor Author

fixed. I removed it.

Member

@parshimers parshimers left a comment

two more small things

* specific language governing permissions and limitations
* under the License.
*/

Member

Are you sure this file was written by someone in Texera? It seems like src/main.ts refers to it. Was that file taken verbatim from TypeFox, or were this file and that modification done by someone in Texera?

Contributor Author

I didn't find the original file from TypeFox. Should I reach out to the people who introduced this file to confirm?

@bobbai00
Contributor Author

I replaced the two CC-BY icons with MIT-licensed icons.

Development

Successfully merging this pull request may close these issues.

Restore proper license headers for third-party bundled source code