publications
publications by categories in reverse chronological order. generated by jekyll-scholar.
2024
- Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model. In NeurIPS, 2024
- DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation. In NeurIPS, 2024
- Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning. In Forty-first International Conference on Machine Learning, 2024
- InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model. In Findings of the Association for Computational Linguistics: ACL 2024, 2024
- Law of Vision Representation in MLLMs. arXiv preprint arXiv:2408.16357, 2024
- InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning. arXiv preprint arXiv:2409.12568, 2024
- ViTAR: Vision transformer with any resolution. arXiv preprint arXiv:2403.18361, 2024
- InfiMM-HD: A leap forward in high-resolution multimodal understanding. arXiv preprint arXiv:2403.01487, 2024
- Exploring the Reasoning Abilities of Multimodal Large Language Models: A Comprehensive Survey on Emerging Trends in Multimodal Reasoning. arXiv e-prints, 2024
- COCO is "ALL" You Need for Visual Instruction Fine-tuning. arXiv e-prints, 2024
2023
- Disentangled Representation Learning with Causality for Unsupervised Domain Adaptation. In Proceedings of the 31st ACM International Conference on Multimedia, 2023
- TransMOT: Spatial-temporal graph transformer for multiple object tracking. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023
- MMPTRACK: Large-scale densely annotated multi-camera multiple people tracking benchmark. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023
- Deep frequency filtering for domain generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
- RefineVIS: Video Instance Segmentation with Temporal Attention Refinement. arXiv preprint arXiv:2306.04774, 2023
- Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling. arXiv preprint arXiv:2310.06389, 2023
- Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis. arXiv preprint arXiv:2311.17126, 2023
- Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts. arXiv preprint arXiv:2312.01408, 2023
- InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models. arXiv e-prints, 2023
2022
- Lifelong unsupervised domain adaptive person re-identification with coordinated anti-forgetting and adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
- Qualifier: Question-guided self-attentive multimodal fusion network for audio visual scene-aware dialog. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022
- SA-VQA: structured alignment of visual and semantic representations for visual question answering. arXiv preprint arXiv:2201.10654, 2022
- Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention. arXiv preprint arXiv:2206.07011, 2022
2021
- 4D tracking utilizing depth data from multiple 3D cameras. US Patent 11,062,469, Jul 2021
- Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification. arXiv preprint arXiv:2103.13917, 2021
- Writing by memorizing: Hierarchical retrieval-based medical report generation. arXiv preprint arXiv:2106.06471, 2021
2020
- Double-layer conditional random fields model for human action recognition. Signal Processing: Image Communication, 2020
- Real-time 3D deep multi-camera tracking. arXiv preprint arXiv:2003.11753, 2020
- A benchmark dataset for understandable medical language translation. arXiv preprint arXiv:2012.02420, 2020
2019
- Real-time multiple people hand localization in 4D point clouds. arXiv preprint arXiv:1903.01695, 2019
- Action4D: Online action recognition in the crowd and clutter. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019
- Sentiment recognition for short annotated GIFs using visual-textual fusion. IEEE Transactions on Multimedia, 2019
2018
- Image captioning at will: A versatile scheme for effectively injecting sentiments into image descriptions. arXiv preprint arXiv:1801.10121, 2018
- Touch your heart: A tone-aware chatbot for customer care on social media. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018
- End-to-End Convolutional Semantic Embeddings. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
- Action4D: Real-time action recognition in the crowd and clutter. arXiv preprint arXiv:1806.02424, 2018
- Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM. In ACM Multimedia Conference, Seoul, Korea, 2018
- "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention. In ECCV, 2018
- Risk Prediction on Electronic Health Records with Prior Medical Knowledge. In SIGKDD, 2018
- KAME: Knowledge-based attention model for diagnosis prediction in healthcare. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018
2017
- Image-based appraisal of real estate properties. IEEE Transactions on Multimedia, 2017
- Visual Sentiment Analysis by Attending on Local Image Regions. In Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), 2017
- Cultural Diffusion and Trends in Facebook Photographs. In ICWSM, 2017
- Semantic natural language vector space. US Patent 9,792,534, Oct 2017
- Image captioning with weak supervision. US Patent 9,811,765, Nov 2017
- Aesthetic quality assessment of photos with faces. In Image and Graphics: 9th International Conference, ICIG 2017, Shanghai, China, September 13-15, 2017, Revised Selected Papers, Part III, 2017
- When saliency meets sentiment: Understanding how image content invokes emotion and sentiment. In 2017 IEEE International Conference on Image Processing (ICIP), 2017
- Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017
2016
- Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark. In The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016
- Cross-modality consistent regression for joint visual-textual sentiment analysis of social multimedia. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 2016
- Image captioning with semantic attention. In CVPR, 2016
- Voting with feet: who are leaving Hillary Clinton and Donald Trump. In 2016 IEEE International Symposium on Multimedia (ISM), 2016
- User characteristic prediction using images posted in online social networks. US Patent 9,489,592, Nov 2016
- Sampling for Nyström extension-based spectral clustering: Incremental perspective and novel analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 2016
- Sentiment and Emotion Analysis for Social Multimedia: Methodologies and Applications. In ACM MM (DS), 2016
- Adaptive greedy dictionary selection for web media summarization. IEEE Transactions on Image Processing, 2016
- The effect of pets on happiness: A data-driven approach via large-scale social media. In 2016 IEEE International Conference on Big Data (Big Data), 2016
- A picture tells a thousand words–About you! User interest profiling from user generated visual content. Signal Processing, 2016
- Robust visual-textual sentiment analysis: When attention meets tree-structured recursive neural networks. In Proceedings of the 24th ACM International Conference on Multimedia, 2016
2015
- Robust image sentiment analysis using progressively trained and domain transferred deep networks. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, USA, 2015
- Snap n’ Shop: Visual Search-Based Mobile Shopping Made a Breeze by Machine and Crowd Intelligence. In IEEE International Conference on Semantic Computing, 2015
- Sentiment analysis using social multimedia. Multimedia Data Mining and Analytics: Disruptive Innovation, 2015
- Towards Lifestyle Understanding: Predicting Home and Vacation Locations from User’s Online Photo Collections. In AAAI International Conference on Weblogs and Social Media (ICWSM), 2015
- Joint Visual-Textual Sentiment Analysis with Deep Neural Networks. In ACM Multimedia, 2015
- A Multifaceted Approach to Social Multimedia-based Prediction of Elections. IEEE Transactions on Multimedia, 2015
- A Picture Tells a Thousand Words–About You! User Interest Profiling from User Generated Visual Content. arXiv preprint arXiv:1504.04558, 2015
2014
- Transit tomography using probabilistic time geography: planning routes without a road map. Journal of Location Based Services, 2014
- The eyes of the beholder: Gender prediction using images posted in online social networks. In IEEE International Conference on Data Mining, Workshop on Social Multimedia Data Mining, 2014
- Inferring home location from user’s photo collections based on visual content and mobility patterns. In Proceedings of the 3rd ACM Multimedia Workshop on Geotagging and Its Applications in Multimedia, 2014
2013
- Sentribute: image sentiment analysis from a mid-level perspective. In Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, 2013
- Towards understanding the effectiveness of election related images in social media. In 2013 IEEE 13th International Conference on Data Mining Workshops, 2013
- Towards social imagematics: sentiment analysis in social multimedia. In Proceedings of the Thirteenth International Workshop on Multimedia Data Mining, 2013
- Are there cultural differences in event driven information propagation over social media? In Proceedings of the 2nd International Workshop on Socially-aware Multimedia, 2013
2011
- An improved spectral clustering algorithm based on random walk. Frontiers of Computer Science in China, 2011
- Clusterability analysis and incremental sampling for Nyström extension based spectral clustering. In 2011 IEEE 11th International Conference on Data Mining, 2011