Update pytorch scraper, include various 2.x versions #2513

blahgeek · 2025-05-31T03:23:40Z

1. Various 2.x versions are included separately. PyTorch versions are not backward compatible and each has different compatibility constraints (CUDA versions, etc.), so people may stay on a specific version for an extended period of time.
2. Removed the type replacement table for `get_type`. Instead, get the type from the breadcrumbs directly. IMO this produces better results that match the index of the original website (the left-side menu on docs.pytorch.org). Also, the `TYPE_REPLACEMENT` table was opinionated and hard to maintain across versions.
3. Always include the default entry (removed the `include_default_entry?` function). I don't see a downside to this. Previously some pages were missing because of it (e.g. torchrun: https://docs.pytorch.org/docs/1.13/elastic/run.html).
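To illustrate the breadcrumb idea from the second point, here is a minimal sketch in plain Ruby. It is not the code from this PR: the helper name `type_from_breadcrumbs`, the `"Docs"` root label, and the `"Miscellaneous"` fallback are all assumptions made for the example, and the breadcrumb trail is assumed to arrive as an array of strings with the current page last.

```ruby
# Hypothetical helper (not the PR's actual code): derive an entry's type,
# i.e. its sidebar category, from the page's breadcrumb trail instead of
# looking it up in a hand-maintained replacement table.
#
# `breadcrumbs` is assumed to be an array of strings, outermost item first,
# with the current page as the last item.
def type_from_breadcrumbs(breadcrumbs, fallback: 'Miscellaneous')
  # Drop blank items and the generic root link (assumed to be labelled "Docs").
  trail = breadcrumbs.map(&:strip).reject { |item| item.empty? || item == 'Docs' }

  # The last item is the page itself, so the item before it is its section.
  # A top-level page has no parent section and is used as its own type.
  section = trail.length > 1 ? trail[-2] : trail.first

  # An empty trail yields nil above, so fall back to a catch-all type.
  section || fallback
end

puts type_from_breadcrumbs(['Docs', 'torch.nn', 'Linear'])
```

Because the type comes straight from the page, a new documentation version that renames or adds a section needs no table update, which is the maintenance benefit described above.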

- Updated the versions and releases in the scraper file
- Ensured the license is up-to-date
- Ensured the icons and the SOURCE file in public/icons/your_scraper_name/ are up-to-date if the documentation has a custom icon
- Ensured self.links contains up-to-date urls if self.links is defined
- Tested the changes locally to ensure:
  - The scraper still works without errors
  - The scraped documentation still looks consistent with the rest of DevDocs
  - The categorization of entries is still good

blahgeek · 2025-05-31T03:24:39Z

cc @ArkciaTheDragon for modifying some code from #2137

ArkciaTheDragon · 2025-05-31T09:23:48Z

Tested on PyTorch 2.7 — looks cool!

Also +1 to the suggestion of using separate PyTorch versions. For context: PyTorch 2.7 generates only 16.2MB of offline data, which is actually a bit smaller than Python 3.12’s ~18MB. I hope hosting the versions separately won’t be an issue in terms of space.

simon04

Thank you!

blahgeek requested a review from a team as a code owner May 31, 2025 03:23

pytorch: add redirects

23ffcc3

simon04 approved these changes Jun 1, 2025

simon04 merged commit fafde1b into freeCodeCamp:main Jun 1, 2025
2 checks passed

simon04 added the docs/update label Jun 27, 2025
