Week 14

Weekly Progress Update (April 14–15 & April 21–22)

Meeting Participants: Haocheng Lu, Yufeng Xu, Haochen Yang, Minjun Zhu
Meeting Mode: In Person & Online

Group Accomplishments

  • Continued working on our individually assigned issues.
  • Shared debugging strategies and provided feedback on each other’s open pull requests.
  • Reviewed progress on issues from previous meetings and identified which ones are still actively being worked on.
  • Prepared presentation slides for upcoming meetings.
  • Practiced delivering the presentation to ensure clarity and flow.

Individual Accomplishments

  • Haocheng Lu:
    Added support for manually setting head_dim in Qwen2 MoE models. Updated the configuration, modified the attention module, and added corresponding tests to align with behavior seen in Llama and Mixtral models.
    PR #37643

  • Yufeng Xu:
    Submitted a pull request fixing incorrect installation instructions; it is currently under review.
    PR #37640

  • Haochen Yang:
    Investigated a state dictionary bug in IterableDataset (datasets library) and started testing a proposed local fix. Documented findings in the related issue discussions.

  • Minjun Zhu:
    Submitted a PR adding tests for the new Tensor Parallel integration.
    PR #37596
    Submitted a PR to add resume checkpoint support for the ClearML callback, including a new test file.
    PR #37635
    Provided the issue opener with base code for fine-tuning the SigLIP2 model.
    Issue #37627
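
Illustration for the head_dim item (PR #37643): the sketch below shows one way an optional head_dim setting might be resolved, assuming it falls back to hidden_size // num_attention_heads when left unset. This is not the code from the PR. ToyConfig, resolve_head_dim, and projection_shapes are hypothetical names made up for this example, and the default sizes are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a model config (not the real Qwen2 MoE config)."""
    hidden_size: int = 2048
    num_attention_heads: int = 16
    head_dim: Optional[int] = None  # None means "derive it from hidden_size"


def resolve_head_dim(config: ToyConfig) -> int:
    """Return the explicit head_dim if set, else the conventional default."""
    if config.head_dim is not None:
        return config.head_dim
    return config.hidden_size // config.num_attention_heads


def projection_shapes(config: ToyConfig) -> dict:
    """Shapes (in_features, out_features) of the query and output projections."""
    head_dim = resolve_head_dim(config)
    q_out = config.num_attention_heads * head_dim
    # With a manual head_dim, q_out no longer has to equal hidden_size,
    # so the output projection must map q_out back to hidden_size.
    return {
        "q_proj": (config.hidden_size, q_out),
        "o_proj": (q_out, config.hidden_size),
    }


if __name__ == "__main__":
    print(projection_shapes(ToyConfig()))             # derived head_dim
    print(projection_shapes(ToyConfig(head_dim=64)))  # manual head_dim
```

The point of the sketch is that once head_dim is set by hand, the attention projections can no longer assume num_attention_heads * head_dim equals hidden_size, which is presumably why the attention module needed changes alongside the configuration.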

Written on or before April 27, 2025