Week14

Week 14 | Presentations

This week was really exciting because we had two project presentations lined up — one from our classmates on Preswald, and one from our team on Huggingface.

Preswald Presentation

Our classmates introduced Preswald, a lightweight framework for building data-visualization dashboards using Python and HTML. Rather than open-sourcing every feature, Preswald follows an Open Core model: the core library is free and extensible, but advanced widgets and enterprise integrations require a paid license.

Key takeaways from their demo:

  • Community health
    • A small but active subreddit keeps conversation going.
    • The official Slack channel exists, but it’s under-moderated and often slow to respond.
  • Core contributions
    1. LaTeX support – rendering math in charts
    2. Sidebar functionality – handling tokenized menu items proved tricky
    3. Interactive chat interface – built against their published docs
    4. Debug panel – streamlined development and visual testing

They were candid about the hurdles they encountered:

  • Sparse documentation made onboarding the codebase a steep climb.
  • Rapid churn in upstream releases frequently broke their local branches.
  • Community support was uneven, leaving some PRs in limbo.

Tip: Schedule quick sync-calls with Preswald maintainers and advocate for a formal Contributing Guide to smooth the initial setup process.


Huggingface Presentation

Our team’s presentation covered our final OSSD project on Huggingface 🤗 — one of the largest open-source AI ecosystems. We structured our talk into five parts:

1. What Is Huggingface?
Huggingface provides a unified hub for ML models, datasets, and tools.

  • Transformers (e.g., DeepSeek, Llama, Nemotron)
  • Diffusers (e.g., Stable Diffusion XL, ControlNet)
  • Datasets (e.g., Wikipedia, CommonsenseQA, COCO)

2. Contribution Statistics

  • Total contributions: 22
    • 11 PRs (4 merged)
    • 11 issues
  • Breakdown by category:
    • Feature Implementation — 3
    • Bug Fixes — 3
    • Bug Analysis — 4
    • Documentation Improvement — 4
    • Community Support — 3
    • Infrastructure Improvement — 3
    • Test Additions — 2

3. Our Workflow

  • Slack coordination: Clear channels for feature requests, reviews, and hotfixes
  • Branch strategy: Each contributor claimed an issue, created a feature branch, then pinged reviewers
  • Pull-request etiquette: Descriptive titles, linked issues, and inline code comments kept reviews efficient

Individual Contributions

  • Haocheng Lu: Patched indexing logic, optimized attention masks, corrected padding in the image processor, fixed documentation links, and added configurable head dimensions for MoE models.
  • Yufeng Xu: Closed outdated bug branches, enforced type consistency, stabilized KV-cache parameters, and improved the installation guide.
  • Haochen Yang: Refactored complex code paths for clarity and reliability, and patched edge-case behaviors in the training loop.
  • Minjun Zhu: Enhanced documentation clarity, wrote comprehensive tests for cache updates and tensor-parallel features, added “resume checkpoint” support in ClearML, and provided example code for fine-tuning SigLIP2.

Group Contribution: Time-Based Strategy

  • Motivation: Step/epoch-based evaluation and saving can drift across different hardware environments.
  • Process:
    1. Claimed the feature request for eval_strategy=TIME.
    2. Iterated with maintainers on PR formatting and unit tests.
    3. Verified through three test scenarios:
      • Evaluation trigger (eval_minutes=1)
      • Checkpoint saving (save_minutes=1)
      • Logging output (logging_minutes=1)

Final Thoughts

Comparing the two projects highlighted the spectrum of open-source engagement:

  • Preswald is nimble and still carving out its ecosystem, but needs stronger documentation and community processes.
  • Huggingface boasts a mature infrastructure with clear contribution pathways, yet still relies on dedicated testers and communicators.

No matter the scale, impactful open-source work comes down to:

  1. Deep codebase familiarity
  2. Transparent communication
  3. Rigorous testing
  4. Persistence through setbacks

Really proud of everyone’s efforts this week! 🎉

Written before or on April 27, 2025