The average person with a computer will soon have access to the world’s collections of digital video and images. However, unlike text that can be alphabetized or numbers that can be ordered, images and video have no general language to aid in their organization. Tools that can “see” and “understand” the content of imagery are still in their infancy, but they have reached the point where they can provide substantial assistance to users navigating through visual media. This paper describes new tools based on “vision texture” for modeling image and video. The focus of this research is the use of a society of low-level models for performing relatively high-level tasks, such as retrieval and annotation of image and video libraries. This paper surveys recent and ongoing research in this fast-growing area.