Refactor Linguist::Repository to isolate Rugged usage #7094

Merged on Nov 25, 2024 (4 commits)

@vdye commented Oct 16, 2024

Description

The goal of this change is to add flexibility to how repository data is accessed by Linguist::Repository & Linguist::LazyBlob, allowing users to easily configure an alternative to Rugged.

Internally, Linguist::Repository and Linguist::LazyBlob use Rugged to read Git repository data, including diff, attribute, and blob information. While this works for most repositories, it has limits:

  • Rugged/libgit2 can lag behind feature support in Git (e.g. reftable, previously SHA-256).
  • Rugged is a Git API, which makes using Linguist with other SCMs challenging.

The approach taken here is to replace the Rugged::Repository instance in the Linguist::Repository with a new Linguist::Source::Repository instance. The "source" repository contains functions wrapping what were previously Rugged operations (diff, attribute lookup, etc.). Users can then write their custom implementations of those functions and pass their Linguist::Source::Repository into Linguist::Repository to use them seamlessly.
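
The pattern described above can be sketched as follows. This is a minimal, self-contained illustration rather than Linguist's actual API: the `SourceSketch` module, the `attributes_for` and `read_blob` method names, and the `HashRepository` and `Stats` classes are all hypothetical stand-ins chosen so the example runs without Linguist or Rugged installed.

```ruby
# Hypothetical sketch of a pluggable "source" repository. None of these names
# are taken from Linguist; they only illustrate the shape of the approach.
module SourceSketch
  # Generic interface: every operation raises until a subclass overrides it.
  class Repository
    # Look up attribute values for a path.
    def attributes_for(path, names)
      raise NotImplementedError, "#{self.class} must implement attributes_for"
    end

    # Return the raw content of a file.
    def read_blob(path)
      raise NotImplementedError, "#{self.class} must implement read_blob"
    end
  end

  # A custom backend that serves data from in-memory Hashes instead of Git.
  class HashRepository < Repository
    def initialize(files, attributes = {})
      @files = files
      @attributes = attributes
    end

    def attributes_for(path, names)
      known = @attributes.fetch(path, {})
      names.to_h { |name| [name, known[name]] }
    end

    def read_blob(path)
      @files.fetch(path)
    end
  end

  # Consumer that talks only to the interface, never to a concrete backend.
  class Stats
    def initialize(source)
      @source = source
    end

    def byte_size(path)
      @source.read_blob(path).bytesize
    end
  end
end

repo = SourceSketch::HashRepository.new(
  { "lib/a.rb" => "puts 1\n" },
  { "lib/a.rb" => { "linguist-vendored" => "true" } }
)
stats = SourceSketch::Stats.new(repo)
stats.byte_size("lib/a.rb") # size comes from the Hash backend, not from Git
```

Because the consumer depends only on the interface, swapping the Hash-backed class for one that shells out to `git` or reads another SCM would not require changes to `Stats`.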

This isn't intended to be a breaking change, so there are a few extra things done to avoid compatibility issues with existing usage:

  • If a Rugged::Repository is passed in as the first argument to either the Linguist::Repository or LazyBlob initializer, it is wrapped in a Linguist::Source::RuggedRepository internally.
  • GIT_ATTR_OPTS & GIT_ATTR_FLAGS are Rugged-specific so they're moved to RuggedRepository, but the LazyBlob constants are not removed and instead point to their RuggedRepository counterparts.
  • current_tree and read_index don't make sense for non-Rugged repos (the former returns a Rugged tree instance, the latter is specific to how Rugged needs to look up attributes). They raise NotImplementedError with a message referencing deprecation only if called on a non-Rugged repository instance; otherwise they behave the same way as before.
  • A method_missing implementation is added to Linguist::Source::RuggedRepository to delegate any unmatched method calls to the internal Rugged::Repository instance (in case users are calling Linguist::Repository.repository directly).

The only possible compatibility issue I can imagine is if a user does some kind of type check on Linguist::Repository.repository (previously it was a Rugged::Repository, now it'll be a Linguist::Source::RuggedRepository). That seems highly unlikely, though, and should be a simple fix if needed.
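
The compatibility measures listed above can be illustrated with a small sketch. It is not the code from this PR: `CompatSketch`, `FakeRugged` (a stand-in for `Rugged::Repository` so the example runs without the rugged gem), `RuggedWrapper`, and the `:legacy_tree` return value are all hypothetical, and the `respond_to_missing?` override is an assumption about good practice rather than something the description confirms.

```ruby
# Hypothetical sketch of the backward-compatibility behaviour described above.
module CompatSketch
  # Stand-in for Rugged::Repository.
  class FakeRugged
    def path
      "/tmp/example/.git/"
    end
  end

  # Stand-in for the generic source repository interface.
  class SourceRepository; end

  # Stand-in for the Rugged-backed implementation of that interface.
  class RuggedWrapper < SourceRepository
    def initialize(rugged)
      @rugged = rugged
    end

    # Forward any method the wrapper lacks to the wrapped object.
    def method_missing(name, *args, &block)
      if @rugged.respond_to?(name)
        @rugged.public_send(name, *args, &block)
      else
        super
      end
    end

    # Keep respond_to? consistent with the delegation above.
    def respond_to_missing?(name, include_private = false)
      @rugged.respond_to?(name, include_private) || super
    end
  end

  class Repository
    attr_reader :repository

    # Legacy callers may pass the raw Rugged-like object; wrap it for them.
    def initialize(repo)
      @repository = repo.is_a?(FakeRugged) ? RuggedWrapper.new(repo) : repo
    end

    # Only meaningful for the Rugged-backed implementation.
    def current_tree
      unless @repository.is_a?(RuggedWrapper)
        raise NotImplementedError,
              "current_tree is deprecated and requires a Rugged-backed repository"
      end
      :legacy_tree # placeholder for the value legacy callers would receive
    end
  end
end

legacy = CompatSketch::Repository.new(CompatSketch::FakeRugged.new)
legacy.repository.path                            # delegated to the wrapped object
legacy.repository.is_a?(CompatSketch::FakeRugged) # false: the type-check caveat
```

Note that Ruby's `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` will not catch it; callers who want to handle the deprecated methods gracefully would need to rescue it by name.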


The commits on this branch are organized to be atomic and incrementally reviewable:

  • Commit 1 adds the generic Linguist::Source::Repository and Linguist::Source::Diff interfaces, with all methods raising NotImplementedError to ensure they are overridden by a subclass implementation.
  • Commit 2 adds a Rugged implementation of Linguist::Source::Repository matching existing usage in compute_stats and Linguist::LazyBlob.
  • Commit 3 updates Linguist::Repository to use a Linguist::Source::Repository instead of a Rugged::Repository to read repository content.
  • Commit 4 adds the method_missing implementation to RuggedRepository.

Checklist:

  • I am adding new or changing current functionality

    • I have added or updated the tests for the new or changed functionality.

Add interfaces representing a generic "Repository" and "Diff", containing
functions currently handled by the Rugged repository instance in
'Linguist::Repository'. Inheriting from these interfaces will allow for
alternative implementations of the functions used to traverse and analyze a
repository, e.g. using a different API for Git storage or a different SCM
altogether.

For now, the interfaces are unused.

Signed-off-by: Victoria Dye <[email protected]>

Add Rugged implementations of the 'Repository', 'Diff', and 'Diff::Delta'
interfaces matching existing usage in 'Linguist::Repository' &
'Linguist::LazyBlob'. In a subsequent commit, this will allow us to
substitute an instance of the 'Repository' interface for what is currently
direct usage of a 'Rugged::Repository'.

Signed-off-by: Victoria Dye <[email protected]>

Change the 'repository' argument to 'Linguist::Repository' &
'Linguist::LazyBlob' from a 'Rugged::Repository' to an instance of
'Linguist::Source::Repository'. This will allow users of Linguist to easily
configure and use a custom repository interface.

There are two methods that don't have a clear or useful parallel in a
generic repository interface and are more specific to Rugged: 'read_index'
and 'current_tree'. For both of these methods, raise a 'NotImplementedError'
if any repository instance that's not a 'RuggedRepository' calls them, and
return the legacy value for ones that are 'RuggedRepository'.

Also for backward-compatibility purposes, users can still initialize a
'Linguist::Repository' or 'Linguist::LazyBlob' with a 'Rugged::Repository';
it will be wrapped in the 'Linguist::Source::RuggedRepository' in the
initialization method.

Finally, update 'test_repository.rb' to test both a 'RuggedRepository'
instance and a mocked always-empty repository.

Signed-off-by: Victoria Dye <[email protected]>

Add a 'method_missing' implementation to 'RuggedRepository' to delegate all
unmatched methods to its internal Rugged object. This is done for
backward-compatibility purposes; users of Linguist can access the
'repository' member of 'Linguist::Repository', so they may rely on
interacting directly with the 'Rugged::Repository'. The 'method_missing'
delegate ensures that those interactions will generally continue to work
(the main exception being explicit type checking performed on
'Linguist::Repository.repository').

Signed-off-by: Victoria Dye <[email protected]>

@vdye marked this pull request as ready for review October 18, 2024 15:14
@vdye requested a review from a team as a code owner October 18, 2024 15:14
@lildude added this pull request to the merge queue Nov 25, 2024
Merged via the queue into github-linguist:main with commit eb88732 Nov 25, 2024
5 checks passed