ABSTRACT
Recently, search engines have invested significant effort in answering entity-attribute queries from structured data, but have focused mostly on queries for frequent attributes. In parallel, several research efforts have demonstrated that there is a long tail of attributes, often thousands per class of entities, that are of interest to users. Researchers are beginning to leverage these new collections of attributes to expand the ontologies that power search engines and to recognize entity-attribute queries. Because of the sheer number of potential attributes, such tasks require us to impose some structure on this long and heavy tail of attributes. This paper introduces the problem of organizing the attributes by expressing the compositional structure of their names as a rule-based grammar. These rules offer a compact and rich semantic interpretation of multi-word attributes, while generalizing from the observed attributes to new, unseen ones. The paper describes an unsupervised learning method to generate such a grammar automatically from a large set of attribute names. Experiments show that our method can discover a precise grammar over 100,000 attributes of Countries while providing a 40-fold compaction over the attribute names. Furthermore, our grammar enables us to increase the precision of attributes from 47% to more than 90% with only minimal curation effort. Thus, our approach provides an efficient and scalable way to expand ontologies with attributes of user interest.
Index Terms
Discovering Structure in the Universe of Attribute Names