Week 14

Weekly Progress Update (April 14–15 & April 21–22)

Meeting Participants: Haocheng Lu, Yufeng Xu, Haochen Yang, Minjun Zhu
Meeting Mode: In Person & Online

Continued working on our individual assigned issues.
Shared debugging strategies and provided feedback on each other’s open pull requests.
Reviewed the progress on issues from previous meetings and identified the ones actively being worked on.
Prepared presentation slides for upcoming meetings.
Practiced delivering the presentation to ensure clarity and flow.

Haocheng Lu:
Added support for manually setting head_dim in Qwen2 MoE models. Updated the configuration, modified the attention module, and added corresponding tests to align with behavior seen in Llama and Mixtral models.
➔ PR #37643
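For context, the idea behind a configurable head_dim can be sketched as follows. This is only an illustrative sketch, not the code from the PR: the names ToyAttentionConfig, resolve_head_dim, and projection_shapes are hypothetical, and the fallback to hidden_size // num_attention_heads is an assumption about the default behavior when head_dim is not set.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ToyAttentionConfig:
    """Hypothetical stand-in for a model config; not the real Qwen2 MoE config."""
    hidden_size: int
    num_attention_heads: int
    head_dim: Optional[int] = None  # explicit override; None means "derive it"


def resolve_head_dim(config: ToyAttentionConfig) -> int:
    """Use the explicit head_dim if given, else derive it from hidden_size (assumed default)."""
    if config.head_dim is not None:
        return config.head_dim
    return config.hidden_size // config.num_attention_heads


def projection_shapes(config: ToyAttentionConfig) -> Dict[str, Tuple[int, int]]:
    """Return (in_features, out_features) for the query and output projections."""
    head_dim = resolve_head_dim(config)
    attn_width = config.num_attention_heads * head_dim
    return {
        "q_proj": (config.hidden_size, attn_width),
        "o_proj": (attn_width, config.hidden_size),
    }


default_cfg = ToyAttentionConfig(hidden_size=2048, num_attention_heads=16)
custom_cfg = ToyAttentionConfig(hidden_size=2048, num_attention_heads=16, head_dim=256)
```

With an explicit head_dim, the attention width (num_attention_heads * head_dim) no longer has to equal hidden_size, so the projection layers must be sized from head_dim rather than from hidden_size alone.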
Yufeng Xu:
Submitted a pull request fixing incorrect installation instructions; it is currently under review.
➔ PR #37640
Haochen Yang:
Investigated a state dictionary bug in IterableDataset (datasets library) and started testing a proposed local fix. Documented findings in the related issue discussions.
Minjun Zhu:
Submitted a PR adding tests for the new Tensor Parallel integration.
➔ PR #37596
Submitted a PR to add resume checkpoint support for the ClearML callback, including a new test file.
➔ PR #37635
Provided the issue opener with base code for fine-tuning the SigLIP2 model.
➔ Issue #37627

Written on or before April 27, 2025