Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: unencoded subpath cannot contain . or .. #368

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jeremylong
Copy link
Contributor

Add test cases to check for invalid . and .. segments in the subpath.

jkowalleck
jkowalleck previously approved these changes Feb 13, 2025
jkowalleck
jkowalleck previously approved these changes Feb 13, 2025
@jkowalleck jkowalleck mentioned this pull request Feb 13, 2025
3 tasks
@jkowalleck
Copy link
Member

@matt-phylum
Copy link
Contributor

I'm wondering if this is actually a good idea. I can agree that the subpath should not contain . or .. segments, and I would prefer if implementations didn't need path normalization logic, but I don't think parsers and formatters should validate that the subpath does not contain . or .. segments. The problem is that if you observe a PURL pkg:golang/google.golang.org/genproto@abcdedf#/googleapis/../api/annotations/ it cannot be understood at all if the parser rejects the .., and it cannot be canonicalized or otherwise parsed and then written back out if the formatter rejects the ... It's syntactically valid and at least the type name version have an clear, unambiguous meaning, but because of the .. everything is thrown away.

@jkowalleck
Copy link
Member

jkowalleck commented Feb 13, 2025

[...] but I don't think parsers and formatters should validate that the subpath does not contain . or .. segments.

why? i mean, this is already part of the spec:

The problem is that if you observe a PURL pkg:golang/google.golang.org/genproto@abcdedf#/googleapis/../api/annotations/ it cannot be understood at all if the parser rejects the .., and it cannot be canonicalized or otherwise parsed and then written back out if the formatter rejects the ... It's syntactically valid and at least the type name version have an clear, unambiguous meaning, but because of the .. everything is thrown away.

per spec, the .. and . are to be discarded, rest is to be kept untouched. to attempt of path-resolution or anything. foo/../bar becomes foo/bar. (not bar)

@matt-phylum
Copy link
Contributor

matt-phylum commented Feb 13, 2025

Then these tests are wrong, right? They have "is_invalid": true, and they include the . and .. in the parsed subpaths.

@jkowalleck

This comment was marked as outdated.

"version": "abcdedf",
"qualifiers": null,
"subpath": "googleapis/./api/annotations",
"is_invalid": true
Copy link
Member

@jkowalleck jkowalleck Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets see how the different test scenarios behave:

  • passing - parsing the test canonical purl then re-building a purl from these parsed components should return the test canonical purl. there si no way o get this self-fulfilling prophecy
  • passed, as the . is expected to be stripped - parsing the test purl should return the components parsed from the test canonical purl
  • passing, as the . is expected to be stripped - parsing the test purl then re-building a purl from these parsed components should return the test canonical purl
  • passing - building a purl from the test components should return the test canonical purl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think also it also fails because the parse succeeds without an error, but is_invalid indicates that an error is expected.

Copy link
Member

@jkowalleck jkowalleck Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revisited the tests. you are right - all test are passing.
will set the is_invalid from true to false

"version": "abcdedf",
"qualifiers": null,
"subpath": "googleapis/../api/annotations",
"is_invalid": true
Copy link
Member

@jkowalleck jkowalleck Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets see how the different test scenarios behave:

  • passing - parsing the test canonical purl then re-building a purl from these parsed components should return the test canonical purl. there si no way o get this self-fulfilling prophecy failing, anyway.
  • passed, as the .. is expected to be stripped - parsing the test purl should return the components parsed from the test canonical purl
  • passed, as the .. is expected to be stripped - parsing the test purl then re-building a purl from these parsed components should return the test canonical purl
  • passing - building a purl from the test components should return the test canonical purl

@jkowalleck jkowalleck self-requested a review February 13, 2025 15:10
"name": "genproto",
"version": "abcdedf",
"qualifiers": null,
"subpath": "googleapis/../api/annotations",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be with the .. or without? If we're using the value as input to verify that formatting removes the .., then we want .. here, but if we're verifying that parsing removes the .. we wouldn't want it here.

Copy link
Member

@jkowalleck jkowalleck Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, but if we're verifying that parsing removes the .. we wouldn't want it here

The same thing probably confused the original author (and me at first review).

that was the reason i looked up the intended test scenarios.

To test ``purl`` parsing and building, a tool can use this test suite and for
every listed test object, run these tests:
- parsing the test canonical ``purl`` then re-building a ``purl`` from these parsed
components should return the test canonical ``purl``
- parsing the test ``purl`` should return the components parsed from the test
canonical ``purl``
- parsing the test ``purl`` then re-building a ``purl`` from these parsed components
should return the test canonical ``purl``
- building a ``purl`` from the test components should return the test canonical ``purl``

so there is no case where any result is compared against the test components.
there are only cases where the test components are used as input and are expected to produce a canonicalized PURL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants