Counting by Visual Prompting

Interactive Object Counting

T-Rex is an object counting model that first detects and then counts any object through visual prompting. Its main features are:

  • Open-Set: T-Rex can count any object, without being constrained to predefined categories.
  • Visual Promptable: Users can provide visual examples to specify the objects for counting.
  • Intuitive Visual Feedback: T-Rex is a detection-based model that allows for visual feedback (i.e. detected boxes), enabling users to assess the accuracy of the result.
  • Interactive: Users can actively participate in the counting process to rectify any errors.

Structure of T-Rex

T-Rex consists of three components: an image encoder, a prompt encoder, and a box decoder. Users interact with them as follows:

  • User Interaction: Users provide visual examples by marking the object of interest on the reference image. T-Rex then detects all instances with a similar pattern in the target image and reports the number of detected boxes as the count.
  • Interactive Prompt Addition: Users can interactively add prompts on missed or falsely detected objects, using T-Rex's visualization output for guidance.
  • Continual Refinement and Efficiency: This process allows for ongoing improvement of T-Rex's predictions, with users having the ability to judge the accuracy of results. Importantly, this interaction is both fast and resource-efficient, as each round of interaction only requires activating T-Rex's decoder.
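The efficiency point above (encode the image once, rerun only the decoder per round) can be illustrated with a toy sketch. Everything here is hypothetical: `ToyCounter`, `encode_image`, and `decode` are illustrative names, not the T-Rex API, and the "features" are plain numbers rather than learned embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCounter:
    """Hypothetical stand-in for an encoder/decoder split; not the T-Rex API."""

    encoder_calls: int = 0
    decoder_calls: int = 0
    _features: dict = field(default_factory=dict)

    def encode_image(self, image_id, pixels):
        # Expensive step in a real model: run once per image, then cache.
        if image_id not in self._features:
            self.encoder_calls += 1
            self._features[image_id] = [float(p) for p in pixels]
        return self._features[image_id]

    def decode(self, features, prompt_indices):
        # Cheap step: rerun on every interaction round with the current prompts.
        self.decoder_calls += 1
        wanted = {features[i] for i in prompt_indices}
        return [i for i, f in enumerate(features) if f in wanted]


model = ToyCounter()
pixels = [1, 2, 1, 3, 2, 1]  # toy "image": each value is one object's appearance

prompts = [0]  # round 1: the user clicks object 0
for round_no in range(2):
    feats = model.encode_image("img-0", pixels)  # cache hit after round 1
    boxes = model.decode(feats, prompts)
    print(f"round {round_no + 1}: count={len(boxes)} boxes={boxes}")
    prompts = prompts + [1]  # round 2: the user adds a prompt on a missed object
```

After two rounds the toy encoder has run once and the toy decoder twice, which is the cost pattern the text describes.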

Workflows of T-Rex

T-Rex offers three major interactive workflows:

  • Positive-only Prompt Mode: T-Rex can detect and then count similar objects in an image from a single click or drawn box. Additional visual prompts can be added for densely packed or small objects.
  • Positive with Negative Prompt Mode: To address false detections caused by similar objects, users can correct the outcome by applying negative prompts to the erroneously detected objects.
  • Cross Image Prompt Mode: This feature supports counting across different reference and target images, ideal for automatic annotation. Users prompt on one image, and T-Rex annotates the others automatically.
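One way to picture how positive and negative prompts interact is a simple similarity rule: keep a candidate if it is similar enough to some positive example and more similar to that example than to any negative one. This is only an illustrative assumption, not a description of T-Rex's internals; the function names, the 2-D feature vectors, and the threshold are all made up for the sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def count_with_prompts(candidates, positives, negatives=(), threshold=0.75):
    """Return indices of candidates matched by the prompts (hypothetical rule)."""
    kept = []
    for idx, feat in enumerate(candidates):
        pos = max(cosine(feat, p) for p in positives)
        # With no negative prompts, nothing can veto a match.
        neg = max((cosine(feat, n) for n in negatives), default=-1.0)
        if pos >= threshold and pos > neg:
            kept.append(idx)
    return kept


# Toy features for four candidate objects in the target image.
candidates = [(1.0, 0.0), (0.98, 0.2), (0.8, 0.6), (0.0, 1.0)]

positive_only = count_with_prompts(candidates, positives=[(1.0, 0.0)])
with_negative = count_with_prompts(
    candidates, positives=[(1.0, 0.0)], negatives=[(0.8, 0.6)]
)
print(len(positive_only), positive_only)
print(len(with_negative), with_negative)
```

In this toy setting, the positive-only call also picks up the look-alike candidate at index 2, and the negative prompt removes it. Cross-image prompting fits the same shape: the positive features would come from a reference image while the candidates come from a different target image.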

Application Forms

T-Rex is designed for counting. Because it is an open-set detection model, it can also be used for any detection task and for automatic annotation scenarios.

Object Counting

T-Rex can be applied to counting in various domains, including but not limited to agriculture, industry, livestock, biology, medicine, retail, electronics, transportation, logistics, and people.

Automatic Annotation

T-Rex is also an open-set object detector that can be applied to automatic annotation. It possesses exceptional zero-shot detection capability and offers strong performance in dense and overlapping scenes.
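For annotation use, detected boxes typically need to be written out in a dataset format. The sketch below converts boxes into COCO-style annotation records. It assumes the detector returns corner-format boxes `(x1, y1, x2, y2)` in pixels; that output format, the helper name `boxes_to_coco`, and the sample values are assumptions for illustration, not part of T-Rex.

```python
def boxes_to_coco(boxes_xyxy, image_id, category_id, start_id=1):
    """Convert (x1, y1, x2, y2) boxes to COCO-style annotation dicts."""
    annotations = []
    for offset, (x1, y1, x2, y2) in enumerate(boxes_xyxy):
        w, h = x2 - x1, y2 - y1
        annotations.append(
            {
                "id": start_id + offset,
                "image_id": image_id,
                "category_id": category_id,
                "bbox": [x1, y1, w, h],  # COCO stores top-left corner plus size
                "area": w * h,
                "iscrowd": 0,
            }
        )
    return annotations


# Made-up detections standing in for a detector's output on one image.
detections = [(10, 20, 50, 60), (100, 40, 130, 90)]
anns = boxes_to_coco(detections, image_id=7, category_id=1)
print(len(anns), anns[0]["bbox"])
```

The number of records equals the object count for that image, so the same output serves both counting and annotation.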

Application Examples

We list some application scenarios of T-Rex to show its powerful zero-shot counting capability.

Authors

Qing Jiang

Feng Li

Tianhe Ren

Shilong Liu

Zhaoyang Zeng

Kent Yu

Lei Zhang

Acknowledgments

We would like to express our deepest gratitude to multiple teams within IDEA for their substantial support in the T-Rex project. We sincerely appreciate the CVR team, whose essential contributions and technical expertise were pivotal in realizing the project's goals. We thank Wei Liu, Xiaohui Wang, and Yakun Hu from the Product team for their strategic insights and innovative input in the development of the demo. Appreciation is also extended to Yuanhao Zhu and Ce Feng from the Front-End team for their technical excellence and dedication. The robust solutions provided by Weiqiang Hu, Xiaoke Jiang, and Zhiqiang Li from the Back-End team were also crucial in supporting the project's infrastructure. We also thank Jie Yang for helpful discussion and Ling-Hao Chen for helping in video demos.

Contact Us

If you have any problems, feel free to contact us.