{"id":598,"date":"2016-08-09T11:51:32","date_gmt":"2016-08-09T11:51:32","guid":{"rendered":"http:\/\/doceng2016.caa.tuwien.ac.at\/?page_id=598"},"modified":"2016-08-24T14:12:14","modified_gmt":"2016-08-24T14:12:14","slug":"program-3","status":"publish","type":"page","link":"https:\/\/doceng2016.cvl.tuwien.ac.at\/?page_id=598","title":{"rendered":"Thursday, September 15th"},"content":{"rendered":"<div id=\"DocEng16Program\">\n<table class=\"page_table\" cellpadding=\"0\" cellspacing=\"0\">\n<tbody>\n<tr>\n<td class=\"left_td_column\">       &nbsp; <\/p>\n<div class=\"left_spacer\">&nbsp;<\/div>\n<\/td>\n<td class=\"central_td_column\">\n<div id=\"content\">\n<p class=\"days\"> <a href=\"https:\/\/doceng2016.caa.tuwien.ac.at\/?page_id=583\" class=\"program_days\">Overview<\/a>| <a href=\"https:\/\/doceng2016.cvl.tuwien.ac.at\/?page_id=594\" class=\"program_days\">Wednesday<\/a>| <a class=\"active_day program_days\" href=\"https:\/\/doceng2016.caa.tuwien.ac.at\/?page_id=598\">Thursday<\/a>| <a class=\"program_days\" href=\"https:\/\/doceng2016.caa.tuwien.ac.at\/?page_id=599\">Friday<\/a> <\/p>\n<p>  <script src=\"styles\/program.js\"> <\/script> <script>addEventHandler(window,'load',function () {Program.ping1('\/statistics\/page_access_x.cgi',Program.ping2,'m1=1')})<\/script> <script>Program.data={pr:540,co:148955,pk:'pd:148955:2016-09-15'}<\/script> <\/p>\n<div class=\"session\"> <a name=\"session:10867\"> <\/p>\n<div class=\"heading\"> <span class=\"interval\">09:30-10:30<\/span> <span class=\"title\">Session 8: Keynote II<\/span> <\/div>\n<p> <\/a> <\/p>\n<div class=\"session_chair\">\n<div class=\"chair_word\">Chair: <\/div>\n<div class=\"chair_names\"> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person21.html\" class=\"person\">Tamir Hassan<\/a> <\/div>\n<\/p><\/div>\n<table class=\"talks\">\n<tbody>\n<tr class=\"talk\">\n<td class=\"time\">09:30<\/td>\n<td> <a name=\"talk:31865\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31865\"\/> <a 
href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person242.html\" class=\"person\">Peter Bi\u013eak<\/a> <\/div>\n<div class=\"title\">Design Is Not What You Think It Is<\/div>\n<div class=\"speaker\">SPEAKER: <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person242.html\">Peter Bi\u013eak<\/a> <\/div>\n<div class=\"abstract\">\n<p>ABSTRACT.  In this talk, Peter Bi\u013eak will examine the ways that current publishing  practices are rooted in the 19th century, and how in order to move forward, we may have to go back to the roots and reconnect with readers.  He will also talk about his recent project, Works That Work magazine, which set out to rethink publishing paradigms, starting with its financing, distribution and production. Works That Work aims to discuss design outside of the traditional design discourse, and Bi\u013eak will argue  for widening the understanding of the design discipline.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n<div class=\"session\">\n<div class=\"coffeebreak\"> <span class=\"interval\">10:30-11:00<\/span> <span class=\"title\">Coffee Break<\/span> <\/div>\n<\/p><\/div>\n<div class=\"session\"> <a name=\"session:10868\"> <\/p>\n<div class=\"heading\"> <span class=\"interval\">11:00-12:15<\/span> <span class=\"title\">Session 9: Text Analysis II: Classification<\/span> <\/div>\n<p> <\/a> <\/p>\n<div class=\"session_chair\">\n<div class=\"chair_word\">Chair: <\/div>\n<div class=\"chair_names\"> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person44.html\">Michael Piotrowski<\/a> <\/div>\n<\/p><\/div>\n<table class=\"talks\">\n<tbody>\n<tr class=\"talk\">\n<td class=\"time\">11:00<\/td>\n<td> <a name=\"talk:30517\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:30517\"\/> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person117.html\" class=\"person\">Salvatore Trani<\/a>, <a class=\"person\" 
href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person118.html\">Diego Ceccarelli<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person119.html\">Claudio Lucchese<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person120.html\">Salvatore Orlando<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person121.html\">Raffaele Perego<\/a> <\/div>\n<div class=\"title\">SEL: a Unified Algorithm for Entity Linking and Saliency Detection<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  The Entity Linking task consists in automatically identifying and linking the entities mentioned in a text to their URIs in a given Knowledge Base, e.g., Wikipedia. Entity Linking has a large impact in several text analysis and information retrieval related tasks. This task  is very challenging due to natural language ambiguity. However, not all  the entities mentioned in a document have the same relevance and utility in understanding the topics being discussed. Thus, the related problem of identifying the most relevant entities present in a document,  also known as Salient Entities, is attracting increasing interest. In this paper we propose SEL, a novel supervised two-step algorithm comprehensively addressing both entity linking and saliency detection. The first step is based on a classifier aimed at identifying a set of candidate entities that are likely to be mentioned in the document, thus  maximizing the precision of the method without hindering its recall. The second step is still based on machine learning, and aims at choosing  from the previous set the entities that actually occur in the document.  Indeed, we tested two different versions of the second step, one aimed at solving only the entity linking task, and the other that, besides detecting linked entities, also scores them according to their saliency.  
Experiments conducted on two different datasets show that the proposed algorithm outperforms state-of-the-art competitors, and is able to detect salient entities with high accuracy.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">11:30<\/td>\n<td> <a name=\"talk:30536\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:30536\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person195.html\">Jan Oevermann<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person196.html\">Wolfgang Ziegler<\/a> <\/div>\n<div class=\"title\">Automated Intrinsic Text Classification for Component Content Management Applications in Technical Communication<\/div>\n<div class=\"speaker\">SPEAKER: <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person195.html\" class=\"person\">Jan Oevermann<\/a> <\/div>\n<div class=\"abstract\">\n<p>ABSTRACT.  Classification models are used in component content management to identify content components for retrieval, reuse and distribution. Intrinsic metadata, such as the assigned information class, play an important role in these tasks. With the increasing demand for efficient classification of content components, the sector of technical documentation needs mechanisms that allow for an automation of such tasks. Vector space model based approaches can lead to sufficient results, while maintaining good performance, but they must be adapted to the peculiarities that characterize modular technical documents. <\/p>\n<p>In this paper we will present domain specific differences, as well as  characteristics, that are special to the field of technical documentation and derive methods to adapt widespread classification and retrieval techniques for these tasks. 
We verify our approach with data provided from companies in the sector of plant and mechanical engineering and use it for supervised learning and automated classification.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">11:45<\/td>\n<td> <a name=\"talk:30537\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:30537\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person180.html\">Mario Kubek<\/a> and <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person181.html\" class=\"person\">Herwig Unger<\/a> <\/div>\n<div class=\"title\">Centroid Terms as Text Representatives<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  Algorithms to topically cluster and classify texts rely on information about their semantic distances and similarities. Standard methods based on the bag-of-words model to determine this information return only rough estimations regarding the relatedness of texts. Moreover, they are  per se unable to find generalising terms or abstractions describing the  textual contents. A new method to determine centroid terms in texts and  to evaluate their similarity using those representing terms will be introduced. In first experiments, its results and advantages will be discussed.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">12:00<\/td>\n<td> <a name=\"talk:30529\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:30529\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person214.html\">Phanucheep Chotnithi<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person184.html\">Atsuhiro Takasu<\/a> <\/div>\n<div class=\"title\">Frequent Multi-Byte Character Subtring Extraction using a Succinct Data Structure<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  The frequent string mining is widely used in text processing to extract  text features. 
Most research has focused on text in single-byte characters. However, existing methods run into problems when applied to text represented with multi-byte characters, such as Japanese and Chinese text. The main drawback is the huge memory usage required to handle multi-byte character strings. To solve this problem, we apply the wavelet tree-based compressed suffix array instead of the normal suffix array to reduce the memory usage. Then, we propose a novel technique which utilizes the rank operation to improve the run-time efficiency. The experimental evaluation shows the proposed method reduces the processing time by 45% compared to a method using only a compressed suffix array. It also shows the proposed method reduces the memory usage by 75%.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n<div class=\"session notalk\"> <a name=\"session:10874\"> <\/p>\n<div class=\"heading\"> <span class=\"interval\">12:15-12:45<\/span> <span class=\"title\">Session 10: BoF: The Results<\/span> <\/div>\n<p> <\/a> <\/p>\n<div class=\"session_chair\">\n<div class=\"chair_word\">Chair: <\/div>\n<div class=\"chair_names\"> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person40.html\">Charles Nicholas<\/a> <\/div>\n<\/p><\/div>\n<\/p><\/div>\n<div class=\"session\">\n<div class=\"lunchbreak\"> <span class=\"interval\">12:45-14:15<\/span> <span class=\"title\">Lunch Break<\/span> <\/div>\n<\/p><\/div>\n<div class=\"session notalk\"> <a name=\"session:10875\"> <\/p>\n<div class=\"heading\"> <span class=\"interval\">14:15-14:45<\/span> <span class=\"title\">Session 11: Workshop Session Recap<\/span> <\/div>\n<p> <\/a> <\/p>\n<div class=\"session_chair\">\n<div class=\"chair_word\">Chair: <\/div>\n<div class=\"chair_names\"> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person49.html\" class=\"person\">Sonja Schimmler<\/a> <\/div>\n<\/p><\/div>\n<\/p><\/div>\n<div class=\"session\"> <a name=\"session:10869\"> <\/p>\n<div 
class=\"heading\"> <span class=\"interval\">14:45-15:05<\/span> <span class=\"title\">Session 12: Poster Lightning Talks<\/span> <\/div>\n<p> <\/a> <\/p>\n<table class=\"talks\">\n<tbody>\n<tr class=\"talk\">\n<td class=\"time\">14:45<\/td>\n<td> <a name=\"talk:31336\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31336\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person207.html\">Luciano Cabral<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person208.html\" class=\"person\">Manoel Neto<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person209.html\">Artur Borges<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person102.html\">Rafael Lins<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person210.html\" class=\"person\">Rinaldo Lima<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person100.html\">Rafael Ferreira<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person53.html\">Steven Simske<\/a> and <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person104.html\" class=\"person\">Marcelo Riss<\/a> <\/div>\n<div class=\"title\">Multilingual News Article Summarization  in Mobile Devices \u2013 Demo<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  Mobile devices such as smart phones and tablets are omnipresent in modern societies. Such devices allow browsing the Internet. 
This demo paper briefly describes two tools for mobile devices that attempt to automatically collect the most important information from news articles in web pages.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">14:47<\/td>\n<td> <a name=\"talk:31331\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31331\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person223.html\">Daan Leijen<\/a> <\/div>\n<div class=\"title\">Rendering Mathematic Formulas for the Web in Madoko<\/div>\n<div class=\"speaker\">SPEAKER: <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person223.html\" class=\"person\">Daan Leijen<\/a> <\/div>\n<div class=\"abstract\">\n<p>ABSTRACT.  Madoko is a novel authoring system for writing complex documents. It is especially well suited for complex academic or industrial documents, like scientific articles, reference manuals, or math-heavy presentations. This application note describes progress made over the last year and details how math formulas are rendered in HTML. Moreover, we show how other mechanisms, like replacement rules, help with creating mini domain-specific extensions to cleanly express complex math.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">14:49<\/td>\n<td> <a name=\"talk:31335\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31335\"\/> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person110.html\" class=\"person\">Roya Rastan<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person111.html\">Hye-Young Paik<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person112.html\">John Shepherd<\/a> <\/div>\n<div class=\"title\">A PDF Wrapper for Table Processing<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  We propose a PDF document wrapper system that is specifically targeted at table processing applications. 
We (i) review the PDF specifications and identify particular challenges from the table processing point of view, (ii) specify a table-oriented document model containing the required atomic elements for table extraction and understanding applications. Our evaluation showed that the wrapper was able to detect important features such as page columns, bullets and numbering in all measures, recording over 90% accuracy, leading to better table locating and segmenting.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">14:51<\/td>\n<td> <a name=\"talk:31337\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31337\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person192.html\">Alexey Shigarov<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person193.html\" class=\"person\">Andrey Mikhailov<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person194.html\">Andrey Altaev<\/a> <\/div>\n<div class=\"title\">Configurable Table Structure Recognition in Untagged PDF documents<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  Today, PDF is one of the most popular document formats in the web. Many  PDF documents are not images, but remain untagged. They have no tags for identifying the logical reading order, paragraphs, figures and tables. One of the challenges with these documents is how to extract tables from them. The paper discusses a new system for table structure recognition in untagged PDF documents. It is formulated as a set of configurable parameters and ad-hoc heuristics for recovering table cells. 
We consider two different configurations for the system and demonstrate experimental results based on the existing competition dataset for both of them.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">14:53<\/td>\n<td> <a name=\"talk:31340\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31340\"\/> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person174.html\" class=\"person\">Tobias Gradl<\/a> and <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person175.html\" class=\"person\">Andreas Henrich<\/a> <\/div>\n<div class=\"title\">Extending data models by declaratively specifying contextual knowledge<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  The research data landscape of the arts and humanities is characterized  by a high degree of heterogeneity. To improve interoperability, recent initiatives and research infrastructures are encouraging the use of standards and best practices. However, custom data models are often considered necessary to exactly reflect the requirements of a particular  collection or research project. <\/p>\n<p>To address the needs of scholars in the arts and humanities for a composition of research data irrespective of the degree of structuredness and standardization, we propose a concept on the basis of  formal languages, which facilitates declarative data modeling by respective domain experts. 
By identifying and defining grammatical patterns and deriving transformation functions, the structure of data is generated or extended in accordance with the particular context and needs of the domain.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">14:55<\/td>\n<td> <a name=\"talk:31339\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31339\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person82.html\">Alessandro Calefati<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person80.html\">Ignazio Gallo<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person81.html\" class=\"person\">Alessandro Zamberletti<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person79.html\">Lucia Noce<\/a> <\/div>\n<div class=\"title\">Using Convolutional Neural Networks for Content Extraction from Online Flyers<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  The rise of online shopping has hurt physical retailers, which struggle to persuade customers to buy products in physical stores rather than online. Marketing flyers are a great means to increase the visibility of physical retailers, but the unstructured offers appearing in those documents cannot be easily compared with similar online deals, making it hard for a customer to understand whether it is more convenient to order a product online or to buy it from the physical shop. In this work we tackle this problem, introducing a content extraction algorithm that automatically extracts structured data from flyers. Unlike competing approaches that mainly focus on textual content or simply analyze font type, color and text positioning, we propose a new approach that uses Convolutional Neural Networks to classify words extracted from flyers typically used in marketing materials to attract the attention of readers towards specific deals. 
We obtained good results and a high language and genre independence.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">14:57<\/td>\n<td> <a name=\"talk:31338\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31338\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person94.html\">Tobias Swoboda<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person95.html\">Matthias Hemmje<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person96.html\" class=\"person\">Mihai Dascalu<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person97.html\">Stefan Trausan-Matu<\/a> <\/div>\n<div class=\"title\">Combining Taxonomies using Word2Vec<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  Taxonomies have gained a broad usage in a variety of fields due to their extensibility, as well as their use for classification and knowledge organization. Of particular interest is the digital document management domain in which their hierarchical structure can be effectively employed in order to organize documents into content-specific categories. Common or standard taxonomies (e.g., the ACM Computing Classification System) contain concepts that are too general for conceptualizing specific knowledge domains. In this paper we  introduce a novel automated approach that combines sub-trees from general taxonomies with specialized seed taxonomies by using specific Natural Language Processing techniques. We provide an extensible and generalizable model for combining taxonomies in the practical context of  two very large European research projects. Because the manual combination of taxonomies by domain experts is a highly time consuming task, our model measures the semantic relatedness between concept labels  in CBOW or skip-gram Word2vec vector spaces. 
A preliminary quantitative  evaluation of the resulting taxonomies is performed after applying a greedy algorithm with incremental thresholds used for matching and combining topic labels.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">14:59<\/td>\n<td> <a name=\"talk:31334\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31334\"\/> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person215.html\" class=\"person\">Junki Tanijiri<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person183.html\">Manabu Ohta<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person184.html\" class=\"person\">Atsuhiro Takasu<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person185.html\">Jun Adachi<\/a> <\/div>\n<div class=\"title\">Important Word Organization for Support of Browsing Scholarly Papers Using Author Keywords<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  When beginning researchers read scholarly papers, they often encounter technical terms they do not know, which may take considerable time. We, therefore, have been developing an interface for support of browsing scholarly papers which gives users useful links for technical terms. The  interface displays important words extracted from papers on top of the image of papers. In this study, we propose organizing such important words extracted from papers by using author keywords. We first identify important words in papers and then associate the important words and author keywords by using word2vec. 
Experiments showed that our method improved the classification accuracy of important words compared to a simple baseline and associated an author keyword with about 2.5 relevant important words.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">15:01<\/td>\n<td> <a name=\"talk:31333\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31333\"\/> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person30.html\" class=\"person\">Baoli Li<\/a> <\/div>\n<div class=\"title\">Selecting Features with Class Based and Importance Weighted Document Frequency in Text Classification<\/div>\n<div class=\"speaker\">SPEAKER: <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person30.html\">Baoli Li<\/a> <\/div>\n<div class=\"abstract\">\n<p>ABSTRACT.  Document Frequency (DF) is reported to be a simple yet quite effective measure for feature selection in text classification, which is a key step in processing big textual data collections. The calculation is based on how many documents in a collection contain a feature, which can be a word, a phrase, an n-gram, or a specially derived attribute. It is an unsupervised and class-independent metric. Features with the same DF value may have quite different distributions over different categories, and thus have different discriminative power over categories. On the other hand, different features play different roles in delivering the content of a document. The chosen features are expected to be the important ones, which carry the main information of a document collection. However, the traditional DF metric considers all features equally important. To overcome the above weaknesses of the original document frequency metric, we propose a class-based and importance-weighted document frequency measure that refines the original DF. 
Extensive experiments on two text classification datasets demonstrate the effectiveness of the proposed metric.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">15:03<\/td>\n<td> <a name=\"talk:31332\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:31332\"\/> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person216.html\" class=\"person\">Giorgos Sfikas<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person217.html\" class=\"person\">Georgios Louloudis<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person218.html\">Nikolaos Stamatopoulos<\/a> and <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person219.html\" class=\"person\">Basilis Gatos<\/a> <\/div>\n<div class=\"title\">Bayesian mixture models on connected components for Newspaper article segmentation<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  In this paper we propose a new method for automated segmentation of scanned newspaper pages into articles. Article regions are produced as a  result of merging sub-article level content and title regions. We use a  Bayesian Gaussian mixture model to model page Connected Component information and cluster input into sub-article components. The Bayesian model is conditioned on a prior distribution over region features, aiding classification into titles and content. Using a Dirichlet prior we are able to automatically estimate correctly the number of title and article regions. 
The method is tested on a dataset of digitized historical newspapers, where visual experimental results are very promising.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n<div class=\"session\">\n<div class=\"coffeebreak\"> <span class=\"interval\">15:05-16:00<\/span> <span class=\"title\">Coffee &amp; Poster Session<\/span> <\/div>\n<\/p><\/div>\n<div class=\"session\"> <a name=\"session:10877\"> <\/p>\n<div class=\"heading\"> <span class=\"interval\">16:00-16:45<\/span> <span class=\"title\">Session 13: Text Analysis III: Summarization<\/span> <\/div>\n<p> <\/a> <\/p>\n<table class=\"talks\">\n<tbody>\n<tr class=\"talk\">\n<td class=\"time\">16:00<\/td>\n<td> <a name=\"talk:30532\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:30532\"\/> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person101.html\" class=\"person\">Rodolfo Ferreira<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person100.html\" class=\"person\">Rafael Ferreira<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person102.html\">Rafael Lins<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person103.html\">Hil\u00e1rio Oliveira<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person104.html\">Marcelo Riss<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person53.html\">Steven Simske<\/a> <\/div>\n<div class=\"title\">Appling Link Target Identification and Content Extraction to improve Web News Summarization<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  The existing automatic text summarization systems whenever applied to web-pages of news articles show poor performance as the text is encapsulated within a HTML page. This paper takes advantage of the link identification and content extraction techniques. 
The results show the validity of such a strategy.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">16:15<\/td>\n<td> <a name=\"talk:30531\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:30531\"\/> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person212.html\" class=\"person\">Jamilson Batista Antunes<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person102.html\">Rafael Dueire Lins<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person210.html\" class=\"person\">Rinaldo Lima<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person53.html\" class=\"person\">Steven J. Simske<\/a> and <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person104.html\">Marcelo Riss<\/a> <\/div>\n<div class=\"title\">Towards Cohesive Extractive Summarization through Anaphoric Expression Resolution<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  This paper presents a new method for improving the cohesiveness of summaries generated by extractive summarization systems. The solution presented attempts to improve the legibility and cohesion of the generated summaries through coreference resolution. It is based on a heuristic-based post-processing step that binds dangling coreference to the most important entity in a given coreference chain. The proposed solution was evaluated on the CNN corpus of 3,000 news articles, using  four state-of-the-art summarization systems and seventeen techniques for  sentence scoring proposed in the literature. 
The results obtained may be considered encouraging, as the final summaries reached better ROUGE scores, besides being more cohesive.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<tr class=\"talk\">\n<td class=\"time\">16:30<\/td>\n<td> <a name=\"talk:30530\"\/> <\/p>\n<div class=\"authors\"> <a name=\"talk:30530\"\/> <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person103.html\">Hil\u00e1rio Oliveira<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person210.html\">Rinaldo Lima<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person102.html\">Rafael Lins<\/a>, <a class=\"person\" href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person213.html\">Fred Freitas<\/a>, <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person104.html\" class=\"person\">Marcelo Riss<\/a> and <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person53.html\" class=\"person\">Steven Simske<\/a> <\/div>\n<div class=\"title\">Assessing Concept Weighting in Integer Linear Programming based Single-document Summarization<\/div>\n<div class=\"speaker\"\/>\n<div class=\"abstract\">\n<p>ABSTRACT.  Some of the recent state-of-the-art systems for Automatic Text Summarization rely on the concept-based approach using Integer Linear Programming (ILP), mainly for multidocument summarization. A study on the suitability of such an approach to single-document summarization is still missing, however. This work presents an assessment of several methods of concept weighing for a concept-based ILP approach on the single-document summarization scenario. The unigram and bigram representations for concepts are also investigated. The experimental results obtained on the DUC 2001-2002 and the CNN corpora show that bigrams are more suitable than unigrams for the representation of concepts. 
Among the concept scoring methods investigated, the sentence position method presented the best performance on all evaluation corpora.<\/p>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n<div class=\"session\"> <a name=\"session:108761\"> <\/p>\n<div class=\"heading\"> <span class=\"interval\">16:45-17:30<\/span> <span class=\"title\">Session 14: SIGWEB Presentation<\/span> <\/div>\n<p> <\/a> <\/p>\n<div class=\"session_chair\">\n<div class=\"chair_word\">Chair: <\/div>\n<div class=\"chair_names\"> <a href=\"http:\/\/easychair.org\/smart-program\/DocEng2016\/person243.html\" class=\"person\">Dick Bulterman<\/a> <\/div>\n<\/p><\/div>\n<\/p><\/div>\n<div class=\"session notalk\"> <a name=\"session:10854\"> <\/p>\n<div class=\"heading\"> <span class=\"interval\">19:30-23:59<\/span> <span class=\"title\">Session: Conference Banquet, Rathaus (City Hall)<\/span> <\/div>\n<p> <\/a> <\/div>\n<\/p><\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; &nbsp; Overview| Wednesday| Thursday| Friday 09:30-10:30 Session 8: Keynote II Chair: Tamir Hassan 09:30 Peter Bi\u013eak Design Is 
Not<\/p>\n","protected":false},"author":6,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/doceng2016.cvl.tuwien.ac.at\/index.php?rest_route=\/wp\/v2\/pages\/598"}],"collection":[{"href":"https:\/\/doceng2016.cvl.tuwien.ac.at\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/doceng2016.cvl.tuwien.ac.at\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/doceng2016.cvl.tuwien.ac.at\/index.php?rest_route=\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/doceng2016.cvl.tuwien.ac.at\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=598"}],"version-history":[{"count":5,"href":"https:\/\/doceng2016.cvl.tuwien.ac.at\/index.php?rest_route=\/wp\/v2\/pages\/598\/revisions"}],"predecessor-version":[{"id":636,"href":"https:\/\/doceng2016.cvl.tuwien.ac.at\/index.php?rest_route=\/wp\/v2\/pages\/598\/revisions\/636"}],"wp:attachment":[{"href":"https:\/\/doceng2016.cvl.tuwien.ac.at\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=598"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}