Part and Attribute Discovery from Relative Annotations

International Journal of Computer Vision

Abstract

Part and attribute based representations are widely used to support high-level search and retrieval applications. However, learning computer vision models for automatically extracting these from images requires significant effort in the form of part and attribute labels and annotations. We propose an annotation framework based on comparisons between pairs of instances within a set, which aims to reduce the overhead in manually specifying the set of part and attribute labels. Our comparisons are based on intuitive properties such as correspondences and differences, which are applicable to a wide range of categories. Moreover, they require few category specific instructions and lead to simple annotation interfaces compared to traditional approaches. On a number of visual categories we show that our framework can use noisy annotations collected via “crowdsourcing” to discover semantic parts useful for detection and parsing, as well as attributes suitable for fine-grained recognition.

Acknowledgments

Part of the work was done by SM during a workshop (http://www.clsp.jhu.edu/workshops/archive/ws-12/groups/tduosn/) at the CLSP, Johns Hopkins University.

Author information

Corresponding author

Correspondence to Subhransu Maji.

Additional information

Communicated by Serge Belongie and Kristen Grauman.

About this article

Cite this article

Maji, S., Shakhnarovich, G. Part and Attribute Discovery from Relative Annotations. Int J Comput Vis 108, 82–96 (2014). https://doi.org/10.1007/s11263-014-0716-6

  • DOI: https://doi.org/10.1007/s11263-014-0716-6
