DOI: 10.1145/2872427.2882975
research-article

Discovering Structure in the Universe of Attribute Names

Published: 11 April 2016

ABSTRACT

Recently, search engines have invested significant effort in answering entity-attribute queries from structured data, but have focused mostly on queries for frequent attributes. In parallel, several research efforts have demonstrated that there is a long tail of attributes, often thousands per class of entities, that are of interest to users. Researchers are beginning to leverage these new collections of attributes to expand the ontologies that power search engines and to recognize entity-attribute queries. Because of the sheer number of potential attributes, such tasks require us to impose some structure on this long and heavy tail of attributes. This paper introduces the problem of organizing the attributes by expressing the compositional structure of their names as a rule-based grammar. These rules offer a compact and rich semantic interpretation of multi-word attributes, while generalizing from the observed attributes to new unseen ones. The paper describes an unsupervised learning method to generate such a grammar automatically from a large set of attribute names. Experiments show that our method can discover a precise grammar over 100,000 attributes of Countries while providing a 40-fold compaction over the attribute names. Furthermore, our grammar enables us to increase the precision of attributes from 47% to more than 90% with only a minimal curation effort. Thus, our approach provides an efficient and scalable way to expand ontologies with attributes of user interest.
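
To make the idea of a rule-based grammar over attribute names concrete, here is a minimal illustrative sketch. It is not the paper's method or notation: the category names (`Commodity`, `Activity`), the vocabularies, and the helpers `expand` and `matches` are all hypothetical, chosen only to show how a single compositional rule might cover many multi-word attributes and extend to names that were never observed.

```python
from itertools import product

# Hypothetical vocabularies; these are illustrative assumptions,
# not categories or terms taken from the paper.
CATEGORIES = {
    "Commodity": ["coffee", "rice", "wheat"],
    "Activity": ["production", "consumption", "exports"],
}

# A hypothetical rule: an attribute name is a Commodity followed by an Activity.
RULE = ("Commodity", "Activity")


def expand(rule, categories):
    """Return every attribute name that the rule can generate."""
    return {" ".join(words) for words in product(*(categories[c] for c in rule))}


def matches(name, rule, categories):
    """Check whether an attribute name (observed or not) is covered by the rule."""
    tokens = name.split()
    return len(tokens) == len(rule) and all(
        tok in categories[cat] for tok, cat in zip(tokens, rule)
    )


# Three observed attribute names, all covered by the single rule above.
observed = {"coffee production", "rice consumption", "wheat exports"}
generated = expand(RULE, CATEGORIES)

# Names the rule generates that were never observed.
unseen = generated - observed
```

In this toy setting one rule stands in for nine attribute names, and `matches("rice exports", RULE, CATEGORIES)` accepts a name that does not appear in `observed`. This is the kind of compaction and generalization the abstract describes, at a much smaller scale and under the simplifying assumption of fixed two-word names.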

Published in

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN: 9781450341431

Copyright © 2016 held by the International World Wide Web Conference Committee (IW3C2)

Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland

Acceptance Rates

WWW '16 Paper Acceptance Rate: 115 of 727 submissions, 16%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
