Zilyu Ye

Student majoring in Artificial Intelligence at South China University of Technology


About Me

Hi, I'm Zilyu, a second-year undergraduate student majoring in Artificial Intelligence at South China University of Technology. I'm currently working on a research project on AIGC, supervised by Prof. Qi Liu and Prof. Guo-Jun Qi.

I am looking for Ph.D. positions for Fall 2026 and internship opportunities for Summer 2024.

Research Interests

Computer Vision, Multimodal Learning, AIGC, Diffusion Models, etc.


  • [March 2024] Submitted a paper on the open-domain visual storytelling task to a CVPR 2024 workshop.

Education & Experiences

  • South China University of Technology (SCUT)
    Undergraduate major in Artificial Intelligence, 2022.9 - 2026.6 (expected)
  • Multimodal Computing and Emotional Interaction Lab in SCUT
    Research intern on AIGC supervised by Prof. Qi Liu, 2023.11 - present
  • Machine Perception and Learning (MAPLE) Lab in Westlake University
    Research intern on AIGC supervised by Prof. Guo-Jun Qi, 2024.3 - present
  • Undergraduate Student Robotic Lab (ROBOTIC) in SCUT
    Member of the machine vision team, participating in the ROBOCON competition, 2022.12 - 2023.6

Papers and Projects

OpenStory: A Large-Scale Open-Domain Dataset for Subject-Driven Visual Storytelling

Zilyu Ye*, Jinxiu Liu*, Jinjin Cao, Zhiyang Chen, Ziwei Xuan, Mingyuan Zhou, Qi Liu, Guo-Jun Qi

Recently, the advancement and evolution of generative AI have been highly compelling. In this paper, we present OpenStory, a large-scale dataset tailored for training subject-focused story visualization models to generate coherent and contextually relevant visual narratives. Addressing the challenges of maintaining subject continuity across frames and capturing compelling narratives, we propose an innovative pipeline that automates the extraction of keyframes from open-domain videos. It ingeniously employs vision-language models to generate descriptive captions, which are then refined by a large language model to ensure narrative flow and coherence. Furthermore, advanced subject masking techniques are applied to isolate and segment the primary subjects. Derived from diverse video sources, including YouTube and existing datasets, OpenStory offers a comprehensive open-domain resource, surpassing prior datasets confined to specific scenarios. With automated captioning instead of manual annotation, high-resolution imagery optimized for subject count per frame, and extensive frame sequences ensuring consistent subjects for temporal modeling, OpenStory establishes itself as an invaluable benchmark. It facilitates advancements in subject-focused story visualization, enabling the training of models capable of comprehending and generating intricate multimodal narratives from extensive visual and textual inputs.

VDU@CVPR 2024 workshop, oral

Awards & Honors

  • SCUT Undergraduate Scholarship, 2022 -- Third Prize
  • Excellent Group Enterprise Scholarship, SCUT, 2023 -- Third Prize
  • Asia and Pacific Mathematical Modeling Contest -- Third Prize
  • National College Student Robot Contest (ROBOCON) -- Third Prize
  • National Undergraduate Mathematical Modeling Contest in Guangdong Province -- Second Prize
  • SCUT Future Technology Institute "Alibaba Cloud Cup" Programming Competition -- Third Prize
Last Updated: 4/7/2024, 2:35:57 PM