CS Conference Navigator

Improving discovery of relevant computer science research through visualization and clustering

CVPR 2014 (Poster papers)

Other guides: NIPS 2013, CVPR 2013, ICML 2013, NIPS 2012
Visualization of poster papers presented at CVPR 2014
(PDF links coming soon)

Hover over a node to see the paper title. Click on a color to show only the papers connected to that cluster. Zoom and pan with the usual map controls.



Papers are linked together based on TF-IDF similarity and are colored using their predicted topic index.

Toggle the topics below to sort by category. The top 10 words from each cluster are shown.
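For readers who want to reproduce this kind of view, the pipeline is small. Below is a minimal sketch assuming scikit-learn; the toy abstracts list, vocabulary, and cluster count are illustrative stand-ins, not necessarily the settings used to build this page.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in corpus; in practice, one abstract string per paper.
abstracts = [
    "blind deconvolution total variation blur kernel",
    "graph matching spectral signatures laplacian",
    "metric learning distance matrix eigenvalues",
    "stereo matching disparity random forest",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)        # papers x terms TF-IDF matrix

sim = cosine_similarity(X)              # edge weights: link highly similar papers

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
topics = km.labels_                     # predicted topic index -> node color

# Top words per cluster: the largest centroid weights.
terms = vec.get_feature_names_out()
for c, center in enumerate(km.cluster_centers_):
    top = center.argsort()[::-1][:10]
    print(c, [terms[i] for i in top])
```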

Filter current papers by keyword or author:
#11 - Total Variation Blind Deconvolution: The Devil is in the Details [pdf]
Daniele Perrone, Paolo Favaro

Abstract: In this paper we study the problem of blind deconvolution. Our analysis is based on the algorithm of Chan and Wong [1998], which popularized the use of sparse gradient priors via total variation. We use this algorithm because many methods in the literature are essentially adaptations of this framework. The algorithm is an iterative alternating energy minimization where at each step either the sharp image or the blur function is reconstructed. Recent work of Levin et al. [2011] showed that any algorithm that tries to minimize that same energy would fail, as the desired solution has a higher energy than the no-blur solution, where the sharp image is the blurry input and the blur is a Dirac delta. However, experimentally one can observe that Chan and Wong's algorithm converges to the desired solution even when initialized with the no-blur one. We provide both analysis and experiments to resolve this paradox. We find that both claims are right. The key to understanding how this is possible lies in the details of Chan and Wong's implementation and in how seemingly harmless choices result in dramatic effects. Our analysis reveals that the delayed scaling (normalization) in the iterative step of the blur kernel is fundamental to the convergence of the algorithm. This then results in a procedure that eludes the no-blur solution, despite it being a global minimum of the original energy. We introduce an adaptation of
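To make the role of the delayed kernel normalization concrete, here is a rough NumPy/SciPy sketch of a Chan-and-Wong-style alternating scheme in which the kernel's nonnegativity and sum-to-one constraints are applied only after the gradient step. Step sizes, the smoothed TV gradient, and the projection details are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

def tv_grad(u, eps=1e-3):
    # Gradient of the smoothed total-variation prior.
    gx = np.diff(u, axis=1, append=u[:, -1:])
    gy = np.diff(u, axis=0, append=u[-1:, :])
    mag = np.sqrt(gx ** 2 + gy ** 2 + eps)
    px, py = gx / mag, gy / mag
    div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
    return -div

def blind_deconv(f, ksize=9, lam=2e-3, iters=200, tu=1e-3, tk=1e-4):
    u = f.astype(float).copy()           # init sharp image = blurry input
    k = np.zeros((ksize, ksize))
    k[ksize // 2, ksize // 2] = 1.0      # no-blur (Dirac delta) init
    h = ksize // 2
    for _ in range(iters):
        # u-step: gradient descent on 0.5*||k*u - f||^2 + lam*TV(u)
        r = convolve2d(u, k, mode="same") - f
        u -= tu * (correlate2d(r, k, mode="same") + lam * tv_grad(u))
        # k-step: plain gradient step on the data term ...
        r = convolve2d(u, k, mode="same") - f
        g = correlate2d(r, u, mode="full")
        cy, cx = (np.array(g.shape) - 1) // 2
        k -= tk * g[cy - h:cy + h + 1, cx - h:cx + h + 1]
        # ... with the *delayed* projection applied afterwards:
        k = np.maximum(k, 0.0)
        k /= k.sum() + 1e-12
    return u, k

blurry = np.random.default_rng(0).random((32, 32))
u, k = blind_deconv(blurry, iters=20)
```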
Similar papers:
  • Gyro-Based Multi-Image Deconvolution for Removing Handshake Blur [pdf] - Sung Hee Park, Marc Levoy
  • Separable Kernel for Image Deblurring [pdf] - Lu Fang, Haifeng Liu, Feng Wu
  • Blind Multi-Image Restoration [pdf] - Haichao Zhang
  • Discriminative Blur Detection Features [pdf] - Jianping Shi, Li Xu, Jiaya Jia
#15 - Jointly Summarizing Large-Scale Web Images and Videos for the Storyline Reconstruction [pdf]
Gunhee Kim, Leonid Sigal, Eric Xing

Abstract: In this paper, we address the problem of jointly summarizing large-scale Flickr images and YouTube user videos. Starting from the intuition that the characteristics of the two media are different yet complementary, we develop a fast and easily-parallelizable approach for creating not only high-quality video summaries but also a novel structural summary of online images as storyline graphs, which can illustrate various events or activities associated with the topic in the form of a branching network. In our approach, the video summarization is achieved by diversity ranking on the similarity graphs between images and video frames. The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with the assistance of videos. For evaluation, we create datasets of 20 outdoor recreational activities, consisting of 2.7M Flickr images and 16K YouTube user videos. Due to the large-scale nature of our problems, we evaluate our algorithm via crowdsourcing using Amazon Mechanical Turk. In our experiments, we demonstrate that the proposed joint summarization approach outperforms other important baselines and our own methods using videos or images only.
Similar papers:
  • 6 Seconds of Sound and Vision: Creativity in Micro-Videos [pdf] - Miriam Redi, Michele Trevisiol, Rossano Schifanella, Neil O'Hare, Alejandro Jaimes
  • Seeing the Arrow of Time [pdf] - Lyndsey Pickup, Zheng Pan, Donglai Wei, Yichang Shih, Andrew Zisserman, Bill Freeman, Bernhard Schoelkopf
  • Illumination-Aware Age Progression [pdf] - Supasorn Suwajanakorn, Ira Kemelmacher, Steve Seitz
  • Quasi Real-Time Summarization for Consumer Videos [pdf] - Bin Zhao, Eric Xing
#20 - Stable and Informative Spectral Signatures for Graph Matching [pdf]
Nan Hu, Raif Rustamov, Leonidas J. Guibas

Abstract: In this paper, we consider the approximate weighted graph matching problem and introduce stable and informative first and second order compatibility terms suitable for inclusion into the popular integer quadratic program formulation. Our approach relies on a rigorous analysis of stability of spectral signatures based on the graph Laplacian. In the case of the first order term, we derive an objective function that measures both the stability and informativeness of a given spectral signature. By optimizing this objective, we design new spectral node signatures tuned to a specific graph to be matched. We also introduce the pairwise heat kernel distance as a stable second order compatibility term; we justify its plausibility by showing that in a certain limiting case it converges to the classical adjacency matrix-based second order compatibility function. We have tested our approach on a set of synthetic graphs, the widely-used CMU house sequence, and a set of real images. These experiments show the superior performance of our first and second order compatibility terms as compared with the commonly used ones.
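As a small illustration of the second-order term named above, the sketch below (plain NumPy on a toy graph; the time parameter t is an arbitrary choice) evaluates the heat kernel of a graph Laplacian, whose entries H_t[x, y] can serve as pairwise compatibilities.

```python
import numpy as np

def heat_kernel(W, t=1.0):
    L = np.diag(W.sum(axis=1)) - W            # unnormalized graph Laplacian
    lam, phi = np.linalg.eigh(L)              # eigenpairs (L is symmetric)
    # H_t = sum_i exp(-t * lam_i) * phi_i phi_i^T
    return (phi * np.exp(-t * lam)) @ phi.T

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = heat_kernel(W, t=0.5)
print(H[0, 3])   # pairwise heat-kernel value between nodes 0 and 3
```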
Similar papers:
  • Finding Matches in a Haystack: A Max-Pooling Strategy for Graph Matching in the Presence of Outliers [pdf] - Minsu Cho, Jian Sun, Jean Ponce
  • Symmetry-Aware Isometric Matching of Incomplete 3D Surfaces [pdf] - Yusuke Yoshiyasu
  • Unsupervised Learning for Graph Matching: An Attempt to Define and Extract Soft Attributed Patterns [pdf] - Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki
  • Human Action Recognition Based on Context-Dependent Graph Kernels [pdf] - Baoxin Wu, Chunfeng Yuan, Weiming Hu
#22 - Fantope Regularization in Metric Learning [pdf]
Marc Law, Nicolas Thome, Matthieu Cord

Abstract: Metric learning is very useful for image retrieval, classification and identification. This paper introduces a regularization method to explicitly control the rank of a learned symmetric positive semidefinite distance matrix. To this end, we propose to incorporate in the objective function a linear regularization term that consists in minimizing the k smallest eigenvalues of the distance matrix. It is equivalent to minimizing the trace of the product of the distance matrix with a matrix in the convex hull of rank-k projection matrices, called a Fantope. Based on this new regularization method, we derive an optimization scheme to efficiently learn the distance matrix. We demonstrate the effectiveness of the method on synthetic and challenging real datasets of face verification and image classification with relative attributes, on which our method outperforms state-of-the-art metric learning algorithms.
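The regularizer itself is easy to evaluate. A minimal NumPy sketch, assuming a symmetric PSD distance matrix M: the sum of the k smallest eigenvalues equals the inner product of M with a vertex of the Fantope (U U^T, where U stacks the corresponding eigenvectors), which also yields a subgradient.

```python
import numpy as np

def fantope_regularizer(M, k):
    lam, V = np.linalg.eigh(M)          # ascending eigenvalues of symmetric M
    U = V[:, :k]                        # eigenvectors of the k smallest eigenvalues
    reg = np.trace(U.T @ M @ U)         # equals lam[:k].sum()
    grad = U @ U.T                      # subgradient of the regularizer w.r.t. M
    return reg, grad

M = np.diag([0.1, 0.2, 3.0, 4.0])
reg, grad = fantope_regularizer(M, k=2)
print(reg)   # 0.3 = sum of the two smallest eigenvalues
```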
Similar papers:
  • SCAMS: Simultaneous Clustering and Model Selection [pdf] - Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Beyond Comparing Image Pairs: Setwise Active Learning for Relative Attributes [pdf] - Lucy Liang, Kristen Grauman
#23 - One Millisecond Face Alignment with an Ensemble of Regression Trees [pdf]
Vahid Kazemi, Josephine Sullivan

Abstract: This paper addresses the problem of Face Alignment for a single image. We show how an ensemble of regression trees can be used to estimate the face's landmark positions directly from a sparse subset of pixel intensities, achieving super-realtime performance with high quality predictions. We present a general framework based on gradient boosting for learning an ensemble of regression trees that optimizes the sum of squared error loss and naturally handles missing or partially labelled data. We show how using appropriate priors exploiting the structure of image data helps with efficient feature selection. Different regularization strategies and their importance in combating overfitting are also investigated. In addition, we analyse the effect of the quantity of training data on the accuracy of the predictions and explore the effect of data augmentation using synthesized data.
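The gradient-boosting core is compact. The following sketch is not the paper's cascade; on synthetic stand-ins for sparse pixel-intensity features, it just shows how each regression tree fits the residual of the squared-error loss (scikit-learn trees, with an assumed shrinkage of 0.1).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 64))          # stand-in for sparse pixel intensities
y = X[:, 0] * 3 - X[:, 7]          # stand-in for one landmark coordinate

nu, trees = 0.1, []                # shrinkage and the boosted ensemble
pred = np.full_like(y, y.mean())   # initial prediction: the mean
for _ in range(100):
    tree = DecisionTreeRegressor(max_depth=4).fit(X, y - pred)  # fit residual
    trees.append(tree)
    pred += nu * tree.predict(X)

print(np.mean((y - pred) ** 2))    # final training error after boosting
```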
Similar papers:
  • Using a deformation field model for localizing faces and facial points under weak supervision [pdf] - Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • The Fastest Deformable Part Model for Object Detection [pdf] - Junjie Yan, Zhen Lei, Stan Li
  • Incremental Face Alignment in the Wild [pdf] - Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic
#24 - From Categories to Individuals in Real Time --- A Unified Boosting Approach [pdf]
David Hall, Pietro Perona

Abstract: A method for online, real-time learning of individual-object detectors is presented. Starting with a pre-trained boosted category detector, an individual-object detector is trained with near-zero computational cost. The individual detector is obtained by using the same feature cascade as the category detector along with elementary manipulations of the thresholds of the weak classifiers. This is ideal for online operation on a video stream or for interactive learning with a human in the loop. Applications addressed by this technique are re-identification and individual tracking. Experiments on two challenging pedestrian and face datasets indicate that it is indeed possible to learn identity classifiers in real-time; besides being faster to train, our classifier has better detection rates than previous methods.
Similar papers:
  • Bi-label Propagation for Generic Multiple Object Tracking [pdf] - Wenhan Luo, Tae-Kyun Kim, Björn Stenger, Xiaowei Zhao, Roberto Cipolla
  • Multi-fold MIL Training for Weakly Supervised Object Localization [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
  • Active Frame, Location, and Detector Selection for Automated and Manual Video Annotation [pdf] - Vasiliy Karasev, Avinash Ravichandran, Stefano Soatto
  • When 3D Reconstruction Meets Ubiquitous RGB-D Images [pdf] - Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki
#34 - Visual Persuasion: Inferring the Communicative Intents of Images [pdf]
Jungseock Joo, Weixin Li, Francis Steen, Song Chun Zhu

Abstract: In this paper we introduce the novel problem of understanding visual persuasion. Modern mass media and advertising make extensive use of images and video to present arguments and influence public opinion, and their techniques are widely studied in media research, political science, and psychology, typically using small, hand-coded datasets. We propose to extend the significant advances in syntactic analyses, such as the detection and identification of objects and sentiments in images and video, to the higher-level challenge of understanding the underlying communicative intent implied in the images. We define the problem of inferring communicative intents from images in a computational framework, and demonstrate the feasibility of progress in a case study from politics, a domain of intense competitive persuasion with continuously measurable outcomes in opinion polls. To this end, we identify 9 dimensions of persuasive intent latent in images of politicians, e.g., "Trustworthy", as well as 12 syntactical attributes, e.g., "Smile", from which one can semantically infer communicative intents. We present a new dataset of 866 images of politicians labeled with ground-truth intents in the form of rankings. In this application, we show that our learned model predicts communicative intents in a large dataset. These results demonstrate that a systematic focus on visual persuasion opens up the field of computer vision to a new class of investigations around mediated images, inter
Similar papers:
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Using a deformation field model for localizing faces and facial points under weak supervision [pdf] - Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
#35 - Birdsnap: Large-scale Fine-grained Visual Categorization of Birds [pdf]
Thomas Berg, Jiongxin Liu, Seung Woo Lee, Michelle Alexander, David Jacobs, Peter Belhumeur

Abstract: We address the problem of large-scale fine-grained visual categorization, describing new methods we have used to produce an online field guide to 500 North American bird species. We focus on the challenges raised when such a system is asked to distinguish between highly similar species of birds. First, we develop "one-vs-most" classifiers. By eliminating highly similar species during training, these classifiers achieve more accurate and intuitive results. Second, we show how to estimate spatio-temporal class priors from observations that are sampled at irregular and biased locations. We show how these priors can be used to significantly improve performance. We then show recognition performance that significantly exceeds the state-of-the-art on a new, large dataset that we make publicly available. These recognition methods are integrated into the online field guide, which is also publicly available.
Similar papers:
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Similarity Comparisons for Interactive Fine-Grained Categorization [pdf] - Catherine Wah, Grant Van Horn, Steven Branson, Subhransu Maji, Pietro Perona, Serge Belongie
  • Data-driven Flower Petal Modeling with Botany Priors [pdf] - Chenxi Zhang, Mao Ye, Bo Fu, Ruigang Yang
  • Nonparametric Part Transfer for Fine-grained Recognition [pdf] - Christoph Göring, Erik Rodner, Alexander Freytag, Joachim Denzler
#36 - DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting [pdf]
Chen Sun, Ram Nevatia

Abstract: We propose a unified framework to simultaneously classify high-level events, identify important segments and generate descriptions for large amounts of unconstrained web videos. The motivation is our observation that many video events are characterized by certain types of evidence found in important segments. Our goal is to find the important segments and capture their information for event classification and recounting (description). We introduce an evidence localization model (ELM) where evidence types and locations are modeled as latent variables. We impose constraints on global video appearance, local evidence appearance and the temporal structure of evidence types. The model is learned via a max-margin framework and allows efficient inference. Our method does not require annotating sources of evidence, and is jointly optimized for event classification and recounting. Experimental results are shown on the challenging TRECVID 2013 MEDTest dataset.
Similar papers:
  • From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity [pdf] - Nam Vo, Aaron Bobick
  • Event Detection using Multi-Level Relevance Labels and Multiple Features [pdf] - Zhongwen Xu, Ivor W. Tsang, Yi Yang, Zhigang Ma, Alexander Hauptmann
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • A Hierarchical Context Model for Event Recognition in Surveillance Video [pdf] - Xiaoyang Wang, Qiang Ji
#39 - Transfer Joint Matching for Visual Domain Adaptation [pdf]
Mingsheng Long, Jianmin Wang, Guiguang Ding, Philip Yu

Abstract: Visual domain adaptation, which learns an accurate classifier for a new domain using labeled images from an old domain, has shown promising value in computer vision yet remains a challenging problem. Most prior works have explored two learning strategies independently for domain adaptation: feature matching and instance reweighting. In this paper, we show that both strategies are important and inevitable when the domain difference is substantially large. We therefore put forward a novel Transfer Joint Matching (TJM) approach to model them in a unified optimization problem. Specifically, TJM aims to reduce the domain difference by jointly matching the features and reweighting the instances across domains in a principled dimensionality reduction procedure, and constructs a new feature representation that is invariant to both the distribution difference and the irrelevant instances. Comprehensive experimental results verify that TJM can significantly outperform competitive methods for cross-domain image recognition problems.
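One common way to quantify the "domain difference" that feature matching reduces is maximum mean discrepancy (MMD). The sketch below only illustrates that measurement on synthetic features; TJM's actual objective couples it with instance reweighting inside a dimensionality-reduction procedure.

```python
import numpy as np

def mmd_rbf(Xs, Xt, gamma=1.0):
    # MMD^2 with an RBF kernel: E[k(s,s)] + E[k(t,t)] - 2 E[k(s,t)]
    def k(A, B):
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (100, 8))   # source-domain features
Xt = rng.normal(0.5, 1.0, (120, 8))   # shifted target-domain features
print(mmd_rbf(Xs, Xt))                # larger when the domains differ more
```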
Similar papers:
  • Time Machine: Continuous Manifold Based Adaptation for Evolving Visual Domains [pdf] - Judy Hoffman, Trevor Darrell, Kate Saenko
  • Domain Adaptation on the Statistical Manifold [pdf] - Mahsa Baktashmotlagh, Mehrtash Harandi, Brian Lovell, Mathieu Salzmann
  • Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf] - Novi Patricia, Barbara Caputo
  • Recognizing RGB Images by Learning from RGB-D Data [pdf] - Lin Chen, Wen Li, Dong Xu
#40 - Occluding Contours for Multi-View Stereo [pdf]
Qi Shan, Brian Curless, Yasutaka Furukawa, Carlos Hernandez, Steve Seitz

Abstract: This paper leverages occluding contours (aka "internal silhouettes") to improve the performance of multi-view stereo methods. The contributions are 1) a new technique to identify free-space regions arising from occluding contours, and 2) a new approach for incorporating the resulting free-space constraints into Poisson surface reconstruction. The proposed approach outperforms state-of-the-art MVS techniques on challenging Internet datasets, yielding dramatic quality improvements both around object contours and in surface detail.
Similar papers:
  • Probabilistic Labeling Cost for High-Accuracy Multi-view Reconstruction [pdf] - Ilya Kostrikov, Esther Horbert, Bastian Leibe
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - Di Xu, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
#43 - Raw-to-raw: Mapping between image sensor color responses [pdf]
Rang Nguyen, Dilip Prasad, Michael Brown

Abstract: Camera images saved in raw format are being adopted in computer vision tasks since raw values represent minimally processed sensor responses. Camera manufacturers, however, have yet to adopt a standard for raw images, and current raw-RGB values are device specific due to different sensors' spectral sensitivities. This results in significantly different raw images for the same scene captured with different cameras. This paper focuses on estimating a mapping that can convert a raw image of an arbitrary scene and illumination from one camera's raw space to another. To this end, we examine various mapping strategies including linear and non-linear transformations applied both in a global and an illumination-specific manner. We show that illumination-specific mappings give the best result, however, at the expense of requiring a large number of transformations. To address this issue, we introduce an illumination-independent mapping approach that uses white-balancing to assist in reducing the number of required transformations. We show that this approach achieves state-of-the-art results on a range of consumer cameras and images of arbitrary scenes and illuminations.
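The simplest of the mapping strategies examined, a single global linear transform, can be fit in a few lines. A hedged sketch on synthetic correspondences (the 3x3 matrix T_true is made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
raw_a = rng.random((1000, 3))                 # raw-RGB responses from camera A
T_true = np.array([[0.90, 0.10, 0.00],
                   [0.05, 1.10, 0.02],
                   [0.00, 0.08, 0.95]])
raw_b = raw_a @ T_true.T                      # same scene as seen by camera B

# Least-squares fit of a global linear raw-to-raw mapping: raw_a @ T ~ raw_b
T, *_ = np.linalg.lstsq(raw_a, raw_b, rcond=None)
mapped = raw_a @ T
print(np.abs(mapped - raw_b).max())           # near zero on this synthetic data
```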
Similar papers:
  • Simultaneous Localization and Calibration [pdf] - Qian-Yi Zhou, Vladlen Koltun
  • Two-View Camera Calibration for Multi-Layer Flat Refractive Interface [pdf] - Xida Chen, Yee Hong Yang
  • The Photometry of Intrinsic Images [pdf] - Marc Serra, Robert Benavente, Maria Vanrell, Dimitris Samaras, Olivier Penacchio
  • Generalized Pupil-Centric Imaging and Analytical Calibration for a Non-frontal Camera [pdf] - Avinash Kumar, Narendra Ahuja
#45 - Sparse Representation for Edit Propagation of High-Resolution Images [pdf]
Xiaowu Chen, Jianwei Li, Dongqing Zou, Xiaochun Cao, Qinping Zhao, Hao (Richard) Zhang

Abstract: We introduce the use of sparse representation for edit propagation of high-resolution images or video. Previous approaches for edit propagation typically employ a global optimization over the whole set of image pixels, incurring prohibitively high memory and time consumption for high-resolution images. Rather than propagating an edit pixel by pixel, we follow the principle of sparse representation to obtain a compact set of representative samples (or features) and perform edit propagation on the samples instead. The sparse set of samples provides an intrinsic basis for an input image, and the coding coefficients capture the linear relationship between all pixels and the samples. The representative set of samples is computed by a novel scheme which maximizes the KL-divergence between each sample pair. We show several applications of sparsity-based edit propagation including video recoloring, theme editing, and seamless cloning, operating on both color and texture features. We demonstrate that with a sample-to-pixel ratio on the order of 0.01%, signifying a significant reduction in memory, our method still maintains a high degree of visual fidelity.
Similar papers:
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
  • Latent Dictionary Learning for Sparse Representation based Classification [pdf] - Meng Yang, Luc Van Gool
  • Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition [pdf] - Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu
  • Edge-aware Gradient Domain Optimization Framework for Image Filtering by Local Propagation [pdf] - Miao Hua, Xiaohui Bie, Wencheng Wang
#54 - The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities [pdf]
Hilde Kuehne, Ali Arslan, Thomas Serre

Abstract: This paper describes a framework for modeling human activities as temporally structured processes. Our approach is motivated by the inherently hierarchical nature of human activities and the close correspondence between human actions and speech: We model action units using HMMs much like words in speech; action units then form the building blocks for more complex activities using an ``action grammar'', much like words for sentences. To evaluate our approach, we collected a large dataset of daily cooking activities: The dataset includes a total of 52 participants, each performing a total of 10 cooking activities in multiple real-life kitchens, resulting in more than 77 hrs of video footage. We fully annotated the dataset at both a fine, motor-command level and a coarser, goal-oriented level. We test the approach using the HTK toolkit, a state-of-the-art speech recognition engine in combination with different feature descriptors. We evaluate the proposed approach on multiple tasks from activity recognition to frame-based action recognition and semantic parsing. Our results demonstrate the benefits of structured temporal generative approaches over existing discriminative approaches in coping with the complexity of human daily life activities.
Similar papers:
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Incremental Activity Modeling and Recognition in Streaming Videos [pdf] - Mahmudul Hasan, Amit Roy-Chowdhury
  • Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf] - Ivan Lillo, Juan Carlos Niebles, Alvaro Soto
  • From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity [pdf] - Nam Vo, Aaron Bobick
#55 - Newton Greedy Pursuit: a Quadratic Approximation Method for Sparsity-Constrained Optimization [pdf]
Xiao-Tong Yuan, Qingshan Liu

Abstract: First-order greedy selection algorithms have been widely applied to sparsity-constrained convex optimization. The main theme of this type of method is to evaluate the function gradient in the previous iteration to update the non-zero entries and their values in the next iteration. In contrast, relatively less effort has been made in the study of second-order greedy selection methods that additionally utilize Hessian information. Inspired by the classic constrained Newton method, we propose in this paper the NewTon Greedy Pursuit (NTGP) method to approximately minimize a twice-differentiable function under a sparsity constraint. At each iteration, NTGP constructs a second-order Taylor expansion to approximate the cost function, and then estimates the next iterate by optimizing the constructed quadratic model under the sparsity constraint. Theoretical analysis shows that under proper conditions NTGP converges superlinearly until an estimation error bound is reached. We demonstrate the improved computational efficiency of our method over first-order greedy selection methods in sparse logistic regression tasks.
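A rough sketch of the general second-order greedy idea on a toy sparse least-squares problem: minimize the quadratic Taylor model with a (ridge-regularized) Newton step, then hard-threshold back to k non-zeros. NTGP's actual solver for the sparsity-constrained quadratic model is more sophisticated; this only conveys the flavor.

```python
import numpy as np

def newton_greedy(grad, hess, x0, k, iters=20, ridge=1e-6):
    x = x0.copy()
    for _ in range(iters):
        H = hess(x) + ridge * np.eye(x.size)   # Hessian of the quadratic model
        x = x - np.linalg.solve(H, grad(x))    # minimizer of the Taylor model
        keep = np.argsort(np.abs(x))[-k:]      # project onto the sparsity set
        mask = np.zeros_like(x)
        mask[keep] = 1.0
        x *= mask
    return x

# Toy problem: f(x) = 0.5 * ||A x - b||^2 with a 3-sparse ground truth.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20)
x_true[[2, 7, 11]] = [1.0, -2.0, 0.5]
b = A @ x_true
grad = lambda x: A.T @ (A @ x - b)
hess = lambda x: A.T @ A
print(newton_greedy(grad, hess, np.zeros(20), k=3)[[2, 7, 11]])
```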
Similar papers:
  • Sequential Convex Relaxation for Mutual-Information-Based Unsupervised Figure-Ground Segmentation [pdf] - Youngwook Kee, Mohamed Souiai, Daniel Cremers, Junmo Kim
  • A Convex Relaxation of Ambrosio-Tortorelli's Elliptic Functional for the Mumford-Shah Functional [pdf] - Youngwook Kee, Junmo Kim
  • On Projective Reconstruction In Arbitrary Dimensions [pdf] - Behrooz Nasihatkon, Richard Hartley, Jochen Trumpf
  • Generalized Nonconvex Nonsmooth Low-Rank Minimization [pdf] - Canyi Lu, Shuicheng Yan, Zhouchen Lin
#59 - Collaborative Hashing [pdf]
Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang

Abstract: Hashing has become a promising approach for fast similarity search. Most existing hashing research pursues binary codes for a single type of entity by preserving their similarities. In practice, there are many scenarios involving nearest neighbor search on data given in matrix form, where two different, yet naturally associated, types of entities correspond to its two dimensions or views. To fully explore the duality between the two views, we propose a collaborative hashing scheme for data in matrix form to enable fast search in various applications such as image search using bags of words and recommendation using user-item ratings. By simultaneously preserving both the entity similarities in each view and the interrelationship between views, our collaborative hashing effectively learns compact binary codes and the explicit hash functions for out-of-sample extension via alternating optimization. Extensive evaluations are conducted on three well-known datasets for search inside a single view and search across different views, demonstrating that our proposed method outperforms state-of-the-art baselines, with significant relative accuracy gains ranging from 7.67% to 45.87%.
Similar papers:
  • Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification [pdf] - Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang
  • Fast Supervised Hashing with Decision Trees for High-Dimensional Data [pdf] - Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter
  • Adaptive Object Retrieval with Kernel Reconstructive Hashing [pdf] - Haichuan Yang, Xiao Bai, Jun Zhou, Peng Ren, Jian Cheng, Zhihong Zhang
  • Collective Matrix Factorization Hashing for Multimodal Data [pdf] - Guiguang Ding, Yuchen Guo, Jile Zhou
#60 - Timing-Based Local Descriptor for Dynamic Surfaces [pdf]
Tony Tung, Takashi Matsuyama

Abstract: In this paper, we present the first local descriptor designed for dynamic surfaces. A dynamic surface is a surface that can undergo non-rigid deformation (e.g., the human body surface). Using state-of-the-art technology, details on dynamic surfaces such as cloth wrinkles or facial expressions can be accurately reconstructed. Hence, various results (e.g., surface rigidity, elasticity, etc.) could be derived by microscopic categorization of surface elements. We propose a timing-based descriptor to model local spatiotemporal variations of surface intrinsic properties. The low-level descriptor encodes gaps between local event dynamics of neighboring keypoints using the timing structure of linear dynamical systems (LDS). We also introduce the bag-of-timings (BoT) paradigm for surface dynamics characterization. Experiments are performed on synthesized and real-world datasets. We show the proposed descriptor can be used for challenging dynamic surface classification and segmentation with respect to rigidity at surface keypoints.
Similar papers:
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - Di Xu, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
  • Probabilistic Labeling Cost for High-Accuracy Multi-view Reconstruction [pdf] - Ilya Kostrikov, Esther Horbert, Bastian Leibe
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
#66 - Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching [pdf]
Aristotle Spyropoulos, Nikos Komodakis, Philippos Mordohai

Abstract: While machine learning has been instrumental to the ongoing progress in most areas of computer vision, it has not been applied to the problem of stereo matching with similar frequency or success. We present a supervised learning approach for predicting the correctness of stereo matches based on a random forest and a set of features that capture various forms of information about each pixel. We show highly competitive results in predicting the correctness of matches and in confidence estimation, which allows us to rank pixels according to the reliability of their assigned disparities. Moreover, we show how these confidence values can be used to improve the accuracy of disparity maps by integrating them with an MRF-based stereo algorithm. This is an important distinction from the current literature, which has mainly focused on sparsification, removing potentially erroneous disparities to generate quasi-dense disparity maps.
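The supervised core is a standard classifier-as-confidence pattern. A minimal sketch with scikit-learn on synthetic stand-in features (the paper's per-pixel features, drawn from matching-cost statistics, are richer):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Stand-ins for per-pixel matching features; labels say "is this match correct?"
X = rng.random((2000, 6))
y = (X[:, 0] + 0.2 * rng.normal(size=2000) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
confidence = rf.predict_proba(X)[:, 1]     # per-pixel confidence in [0, 1]
order = np.argsort(-confidence)            # rank pixels by disparity reliability
```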
Similar papers:
  • Graph Cut based Continuous Stereo Matching using Locally Shared Labels [pdf] - Tatsunori Taniai, Yasuyuki Matsushita, Takeshi Naemura
  • Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras [pdf] - Can Chen, Haiting Lin, Zhan Yu, Sing Bing Kang, Jingyi Yu
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Stereo under Sequential Optimal Sampling: A Statistical Analysis Framework for Search Space Reduction [pdf] - Yilin Wang, Jan-Michael Frahm, Enrique Dunn, Ke Wang
#70 - The Fastest Deformable Part Model for Object Detection [pdf]
Junjie Yan, Zhen Lei, Stan Li

Abstract: This paper solves the speed bottleneck of the deformable part model (DPM), while maintaining state-of-the-art detection accuracy on challenging datasets. Three prohibitive steps in the cascade version of DPM are accelerated: 2D correlation between the root filter and the feature map, cascade part pruning, and HOG feature extraction. For 2D correlation, the root filter is constrained to be low rank, so that 2D correlation can be calculated by a more efficient linear combination of 1D correlations. A proximal gradient algorithm is adopted to progressively learn the low-rank filter in a discriminative manner. For cascade part pruning, a neighborhood-aware cascade is proposed to capture the dependence in neighborhood regions for aggressive pruning. Instead of explicit computation of part scores, hypotheses can be pruned by scores of neighborhoods under a first-order approximation. For HOG feature extraction, look-up tables are constructed to replace expensive calculations of orientation partition and magnitude with simpler matrix index operations. Extensive experiments show that (a) the proposed method is 4 times faster than the current fastest DPM method with similar accuracy on Pascal VOC, and (b) the proposed method achieves state-of-the-art accuracy on pedestrian and face detection tasks at frame-rate speed.
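The low-rank trick for the root filter is easy to demonstrate: if the 2D filter is (approximately) a sum of a few rank-1 terms, 2D correlation decomposes into cheap column and row 1D correlations. A NumPy/SciPy sketch on a synthetic rank-1 filter:

```python
import numpy as np
from scipy.signal import correlate2d, correlate

def lowrank_correlate(image, filt, rank=2):
    U, s, Vt = np.linalg.svd(filt)             # filt ~ sum_i s_i u_i v_i^T
    out = np.zeros_like(image)
    for i in range(rank):
        col = np.sqrt(s[i]) * U[:, i]
        row = np.sqrt(s[i]) * Vt[i, :]
        # Separable pass: 1D correlation down columns, then along rows.
        tmp = np.apply_along_axis(lambda c: correlate(c, col, mode="same"), 0, image)
        out += np.apply_along_axis(lambda r: correlate(r, row, mode="same"), 1, tmp)
    return out

rng = np.random.default_rng(0)
img = rng.random((64, 64))
f = np.outer(rng.random(7), rng.random(7))     # a rank-1 filter for the demo
full = correlate2d(img, f, mode="same")
fast = lowrank_correlate(img, f, rank=1)
print(np.abs(full - fast).max())               # agrees up to numerical error
```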
Similar papers:
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Word Channel Based Multiscale Pedestrian Detection Without Image Resizing and Using Only One Classifier [pdf] - Arthur Costea, Sergiu Nedevschi
  • One Millisecond Face Alignment with an Ensemble of Regression Trees [pdf] - Vahid Kazemi, Josephine Sullivan
  • Incremental Face Alignment in the Wild [pdf] - Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic
#75 - Packing and Padding: Coupled Multi-index for Accurate Image Retrieval [pdf]
Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian

Abstract: In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has low discriminative power, so false positive matches occur frequently. Apart from the information loss during quantization, another cause is that the SIFT feature only describes the local gradient distribution. To address this problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform feature fusion at the indexing level. Basically, complementary features are coupled into a multi-dimensional inverted index. Each dimension of c-MI corresponds to one kind of feature, and the retrieval process votes for images similar in both SIFT and other feature spaces. Specifically, we exploit the fusion of a local color feature into c-MI. While the precision of visual matching is greatly enhanced, we adopt Multiple Assignment to improve recall. The joint cooperation of SIFT and color features significantly reduces the impact of false positive matches. Extensive experiments on several benchmark datasets demonstrate that c-MI improves the retrieval accuracy significantly, while consuming only half of the query time compared to the baseline. Importantly, we show that c-MI is well complementary to many prior techniques. Assembling these methods, we have obtained an mAP of 85.8% and an N-S score of 3.85 on the Holidays and Ukbench datasets, respectively, which are the best results ever published.
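A toy sketch of the coupled-index idea: key a single inverted index by (SIFT word, color word) pairs, so a query feature only votes for images that agree in both feature spaces. The word IDs and the voting scheme here are simplified assumptions.

```python
from collections import defaultdict

index = defaultdict(list)              # (sift_word, color_word) -> image ids

def add(image_id, features):
    # features: quantized (SIFT word, color word) pairs for one image
    for sift_w, color_w in features:
        index[(sift_w, color_w)].append(image_id)

def query(features):
    votes = defaultdict(int)
    for key in features:
        for image_id in index[key]:
            votes[image_id] += 1
    return sorted(votes, key=votes.get, reverse=True)

add(1, [(10, 3), (42, 7)])
add(2, [(10, 5), (42, 7)])
print(query([(10, 3), (42, 7)]))   # image 1 matches both pairs, image 2 only one
```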
Similar papers:
  • Immediate, scalable object category detection [pdf] - Yusuf Aytar, Andrew Zisserman
  • Bayes Merging of Multiple Vocabularies for Scalable Image Retrieval [pdf] - Liang Zheng, Shengjin Wang, Wengang Zhou, Qi Tian
  • Locally Optimized Product Quantization [pdf] - Yannis Kalantidis, Yannis Avrithis
  • Product Sparse Coding [pdf] - Tiezheng Ge, Kaiming He, Jian Sun
#83 - From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity [pdf]
Nam Vo, Aaron Bobick

Abstract: We propose a probabilistic method for parsing complex activities that are defined as compositions of sub-activities. The temporal structure is represented by a string-length-limited stochastic context-free grammar. Given the grammar, a Bayes network is generated where the variable nodes correspond to the start and end times of component actions, and the network integrates information about the duration of each primitive action, visual detection results for each primitive action, and the activity's temporal structure. At each moment in time during the activity, message passing is used to perform exact inference, yielding the posterior probabilities of the start and end times for each action. We provide demonstrations of this framework being applied to various vision tasks such as action prediction, classification of high-level activities, and temporal segmentation of a test sequence; the method is also applicable in the Human-Robot Interaction domain, where continual prediction of human actions is needed.
Similar papers:
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities [pdf] - Hilde Kuehne, Ali Arslan, Thomas Serre
  • Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf] - Ivan Lillo, Juan Carlos Niebles, Alvaro Soto
#86 - Simultaneous Localization and Calibration [pdf]
Qian-Yi Zhou, Vladlen Koltun

Abstract: We describe an approach for simultaneous localization and calibration of a stream of range images. Our approach jointly optimizes the camera trajectory and a calibration function that corrects the camera's unknown nonlinear distortion. Experiments with real-world benchmark data and synthetic data show that our approach increases the accuracy of camera trajectories and geometric models estimated from range video produced by consumer-grade cameras.
Similar papers:
  • Two-View Camera Calibration for Multi-Layer Flat Refractive Interface [pdf] - Xida Chen, Yee Hong Yang
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Blind Image Quality Assessment using Semi-supervised Rectifier Networks [pdf] - Huixuan Tang, Neel Joshi, Ashish Kapoor
  • Generalized Pupil-Centric Imaging and Analytical Calibration for a Non-frontal Camera [pdf] - Avinash Kumar, Narendra Ahuja
#90 - Global Optimization for Depth Reconstruction from Speckle Patterns [pdf]
Qifeng Chen, Vladlen Koltun

Abstract: We present an approach to increasing the accuracy of range images produced by speckle-based range cameras. Our approach optimizes a global objective on the range image. The optimization is performed by a convergent block coordinate descent scheme that updates a horizontal or vertical line in each iteration. We show that this update can be performed optimally in linear time. The resulting algorithm is extremely efficient and trivially parallelizable. Experiments with ground-truth data demonstrate that our algorithm is significantly more accurate than alternative algorithms for optimizing the same objective and that our approach is significantly more accurate than alternative range image rectification schemes.
Similar papers:
  • Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras [pdf] - Can Chen, Haiting Lin, Zhan Yu, Sing Bing Kang, Jingyi Yu
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Stereo under Sequential Optimal Sampling: A Statistical Analysis Framework for Search Space Reduction [pdf] - Yilin Wang, Jan-Michael Frahm, Enrique Dunn, Ke Wang
  • Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching [pdf] - Aristotle Spyropoulos, Nikos Komodakis, Philippos Mordohai
#100 - Kernel-PCA Analysis of Surface Normals [pdf]
Patrick Snape, Stefanos Zafeiriou

Abstract: We propose a kernel-based framework for computing components from a set of surface normals. This framework allows us to easily demonstrate that component analysis can be performed directly upon normals. We link previously proposed mapping functions, the azimuthal equidistant projection (AEP) and principal geodesic analysis (PGA), to our kernel-based framework. We also propose a new mapping function based upon the cosine distance between normals. We demonstrate the robustness of our proposed kernel when trained with noisy training sets. We also compare our kernels within an existing shape-from-shading (SFS) algorithm. Our spherical representation of normals, when combined with the robust properties of the cosine kernel, produces a very robust subspace analysis technique. In particular, our results within SFS show a substantial qualitative and quantitative improvement over existing techniques.
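Component analysis on normals with a cosine kernel can be prototyped directly with scikit-learn's KernelPCA; note the paper derives its own kernel from the cosine distance, so this off-the-shelf kernel is only in the same spirit.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
normals = rng.normal(size=(200, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)   # unit surface normals

kpca = KernelPCA(n_components=5, kernel="cosine")
components = kpca.fit_transform(normals)     # per-normal subspace coordinates
print(components.shape)                      # (200, 5)
```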
Similar papers:
  • Super Normal Vector for Activity Recognition Using Depth Sequences [pdf] - Xiaodong Yang, Yingli Tian
  • Class Specific 3D Object Shape Priors Using Surface Normals [pdf] - Christian Häne, Nikolay Savinov, Marc Pollefeys
  • High Quality Photometric Reconstruction using a Depth Camera [pdf] - Avishek Chatterjee, Sk Mohammadul Haque, Venu Madhav Govindu
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
#103 - The Shape-Time Random Field for Semantic Video Labeling [pdf]
Andrew Kae, Erik Learned-Miller, Benjamin Marlin

Abstract: We propose a novel discriminative model for semantic labeling in videos by incorporating a temporal shape prior to model both the shape and temporal dependencies of an object in video. While the conditional random field (CRF) can label regions in video frames, and can be extended to incorporate temporal dependencies between frames, it typically lacks a global shape prior, which can be informative. Recent work has shown how to incorporate a global shape prior into a CRF for image labeling, but this prior does not account for temporal dependencies. The conditional restricted Boltzmann machine (CRBM) can model temporal dependencies and has been used to successfully learn walking styles from motion-capture data. In this work we use the CRBM to model not only the shape of an object in a video but also the temporal dependencies of the object from previous frames. We incorporate this CRBM prior (to model the shape and temporal dependencies) along with the CRF (to model local dependencies) to create a new state-of-the-art model for the task of semantic labeling in videos. In particular, we explore the task of labeling faces into Hair/Skin/Background regions in videos from the YouTube Faces Database (YFDB). Our combined approach outperforms two baselines: a CRF with temporal potentials and a CRF with a global shape prior but without temporal dependencies.
Similar papers:
  • The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities [pdf] - Hilde Kuehne, Ali Arslan, Thomas Serre
  • Complex Activity Recognition using Granger Constrained DBN (GCDBN) in Sports and Surveillance Video [pdf] - Eran Swears, Anthony Hoogs, Qiang Ji, Kim Boyer
  • Blind Image Quality Assessment using Semi-supervised Rectifier Networks [pdf] - Huixuan Tang, Neel Joshi, Ashish Kapoor
  • Max-Margin Boltzmann Machines for Object Segmentation [pdf] - Jimei Yang, Simon Safar, Ming-Hsuan Yang
#105 - Bayes Merging of Multiple Vocabularies for Scalable Image Retrieval [pdf]
Liang Zheng, Shengjin Wang, Wengang Zhou, Qi Tian

Abstract: The Bag-of-Words (BoW) representation is widely applied in recent state-of-the-art image retrieval work. In this model, the vocabulary is of key importance. Typically, multiple vocabularies are generated to correct quantization artifacts and improve recall. However, this routine is corrupted by vocabulary correlation, i.e., overlapping among different vocabularies. Vocabulary correlation leads to an over-counting of the indexed features in the overlapped area, or the intersection set, thus compromising the retrieval accuracy. In order to address the correlation problem while preserving the benefit of high recall, this paper proposes a Bayes merging approach to down-weight the indexed features in the intersection set. By explicitly modeling the correlation problem in a probabilistic view, a joint similarity on both image- and feature-level is estimated for the indexed features in the intersection set. We evaluate our method through extensive experiments on three benchmark datasets. Albeit simple, Bayes merging can be applied in various merging tasks, and consistently improves the baselines on multi-vocabulary merging. Moreover, Bayes merging is efficient in terms of both time and memory cost, and yields competitive performance compared with state-of-the-art methods.
Similar papers:
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • Immediate, scalable object category detection [pdf] - Yusuf Aytar, Andrew Zisserman
  • Locality in Generic Instance Search from One Example [pdf] - Ran Tao, Efstratios Gavves, Cees Snoek, Arnold Smeulders
  • Packing and Padding: Coupled Multi-index for Accurate Image Retrieval [pdf] - Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
#108 - Joint Depth Estimation and Camera Shake Removal from Single Blurry Image [pdf]
Zhe Hu, Li Xu, Ming-Hsuan Yang

Abstract: Camera shake during exposure often results in a spatially variant blur in the image. The non-uniform blur is caused not only by the camera motion but also by the depth variation of the scene. Objects close to the camera sensor are likely to appear more blurry than those at a distance. However, recent non-uniform deblurring methods do not explicitly consider the depth factor, or assume fronto-parallel scenes with constant depth for simplicity. While single-image non-uniform deblurring is a challenging problem, the blurry result in fact contains depth information which can be exploited. We propose to jointly estimate scene depth and remove non-uniform blur caused by camera motion by exploiting their underlying geometric relationships, with only a single blurry image as input. Toward this, we present a unified layer-based model for depth-involved deblurring, and develop an expectation-maximization scheme to solve the problem. Experiments on challenging examples demonstrate that both depth estimation and camera shake removal can be well addressed within the unified framework.
Similar papers:
  • Total Variation Blind Deconvolution: The Devil is in the Details [pdf] - Daniele Perrone, Paolo Favaro
  • Deblurring Text Images via L0-Regularized Intensity and Gradient Prior [pdf] - Jinshan Pan, Zhe Hu, Zhixun Su, Ming-Hsuan Yang
  • Separable Kernel for Image Deblurring [pdf] - Lu Fang, Haifeng Liu, Feng Wu
  • Blind Multi-Image Restoration [pdf] - Haichao Zhang
#128 - Unsupervised Learning for Graph Matching: An Attempt to Define and Extract Soft Attributed Patterns [pdf]
Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki

Abstract: Graph matching is a fundamental problem in computer vision, and is widely applied to the matching of 2D and 3D objects. In this paper, we define the soft attributed pattern (SAP) oriented towards attributed relational graphs (ARGs), which describes the pattern of common sub-graphs among the ARGs, considering both the graphical structure and graph attributes. We propose a direct solution to extract the maximal SAP among the ARGs without node enumeration, and thus use it to extend the concept of unsupervised learning for graph matching. Given an initial graph template and a number of ARGs, we modify the graph template into the maximal SAP in an unsupervised fashion, achieving good matching performance between the template and the ARGs. Our method exhibits superior performance to conventional methods for learning graph matching on RGB and RGB-D images.
Similar papers:
  • Inferring Analogous Attributes [pdf] - Chao-Yeh Chen, Kristen Grauman
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
#132 - When 3D Reconstruction Meets Ubiquitous RGB-D Images [pdf]
Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki

Abstract: 3D reconstruction from a single image is a classical problem in computer vision. However, it still poses great challenges for the reconstruction of daily-use objects with irregular shapes. In this paper, we propose to learn 3D reconstruction knowledge from informally captured RGB-D images, which will probably be ubiquitously used in daily life. The learning of 3D reconstruction is defined as a category modeling problem, in which a model for each category is trained to encode category-specific knowledge for 3D reconstruction. The category model estimates the pixel-level 3D structure of an object from its 2D appearance, by taking into account considerable variations in rotation, 3D structure, and texture. Learning 3D reconstruction from ubiquitous RGB-D images creates a new set of challenges. Experimental results have demonstrated the effectiveness of the proposed approach.
Similar papers:
  • Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media [pdf] - Zhenxing Niu, Gang Hua, Xinbo Gao, Qi Tian
  • Unsupervised Learning for Graph Matching: An Attempt to Define and Extract Soft Attributed Patterns [pdf] - Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki
  • Submodular Object Recognition [pdf] - Fan Zhu, Zhuolin Jiang, Ling Shao
  • From Categories to Individuals in Real Time --- A Unified Boosting Approach [pdf] - David Hall, Pietro Perona
#133 - Scalable 3D Tracking of Multiple Interacting Objects [pdf]
Nikolaos Kyriazis, Antonis Argyros

Abstract: We consider the problem of tracking multiple interacting objects in 3D, using RGBD input and by considering a hypothesize-and-test approach. Due to their interaction, objects to be tracked are expected to occlude each other in the field of view of the camera observing them. A naive approach would be to employ a Set of Independent Trackers (SIT) and to assign one tracker to each object. This approach scales well with the number of objects but fails as occlusions become stronger due to their disjoint consideration. The solution representing the current state of the art employs a single Joint Tracker (JT) that accounts for all objects simultaneously. This directly resolves ambiguities due to occlusions but has a computational complexity that grows geometrically with the number of tracked objects. We propose a middle ground, namely an Ensemble of Collaborative Trackers (ECT), that combines best traits from both worlds to deliver a practical and accurate solution to the multi-object 3D tracking problem. We present quantitative and qualitative experiments with several synthetic and real world sequences of diverse complexity. Experiments demonstrate that ECT manages to track far more complex scenes than JT at a computational time that is only slightly larger than that of SIT.
Similar papers:
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - David Joseph Tan, Slobodan Ilic
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Evolutionary Quasi-random Search for Hand Articulations Tracking [pdf] - Iason Oikonomidis, Manolis Lourakis, Antonis Argyros
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#137 - The Role of Context for Object Detection and Semantic Segmentation in the Wild [pdf]
Roozbeh Mottaghi, Xianjie Chen, Xiaobai Liu, Sanja Fidler, Raquel Urtasun, Alan Yuille

Abstract: In this paper we study the role of context in modern detection and segmentation approaches. Towards this goal, we label every pixel of the PASCAL VOC 2010 detection challenge. We believe this data will give plenty of extra challenges to the community, as it provides 548 new object classes for detection and 603 classes for semantic segmentation. We analyze the ability of state-of-the-art methods to perform semantic segmentation in this new setting. Our analyses show that NN-based approaches perform poorly on semantic segmentation of context classes, which shows the variability of PASCAL imagery. Furthermore, the improvement from existing contextual models for detection is rather modest. In order to push forward the performance in this difficult scenario, we propose a novel deformable part-based model, which exploits both local context around each candidate detection as well as global context at the level of the scene. We show that the model significantly helps in detecting objects at all scales.
Similar papers:
  • Understanding Objects in Detail with Fine-grained Attributes [pdf] - Subhransu Maji, Iasonas Kokkinos, Stavros Tsogkas, Ross Girshick, Matthew Blaschko, Esa Rahtu, Juho Kannala, Andrea Vedaldi
  • Detecting Objects using Deformation Dictionaries [pdf] - Bharath Hariharan, Piotr Dollar, Larry Zitnick
  • Incorporating Scene Context and Object Layout into Appearance Modeling [pdf] - Hamid Izadinia, Fereshteh Sadeghi, Ali Farhadi
  • Superpixel-grounded Deformable Part Models [pdf] - Eduard Trulls, Iasonas Kokkinos, Francesc Moreno-Noguer, Alberto Sanfeliu
#142 - Simplex-Based 3D Spatio-Temporal Feature Description for Action Recognition [pdf]
Hao Zhang, Wenjun Zhou, Christopher Reardon, Lynne Parker

Abstract: We present a novel feature description algorithm to describe 3D local spatio-temporal features for human action recognition. Our descriptor avoids the singularity and limited discrimination power issues of traditional 3D descriptors by quantizing and describing visual features in the simplex topological vector space. Specifically, given a feature's support region containing a set of 3D visual cues, we decompose the cues' orientations into three angles, transform the decomposed angles into the simplex space, and describe them in such a space. Then, quadrant decomposition is performed to improve discrimination, and a final feature vector is composed from the resulting histograms. We develop intuitive visualization tools for analyzing feature characteristics in the simplex topological vector space. Experimental results demonstrate that our novel simplex-based orientation decomposition (SOD) descriptor substantially outperforms traditional 3D descriptors on the challenging KTH, UCF Sport, and Hollywood-2 benchmark action datasets. In addition, the results show that our SOD descriptor is a superior individual descriptor for action recognition.
Similar papers:
  • Deeply-Learned Slow Feature Analysis for Action Recognition [pdf] - Lin Sun
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition [pdf] - Waqas Sultani, Imran Saleemi
#152 - Predicting Object Dynamics in Scenes [pdf]
David Fouhey, Larry Zitnick

Abstract: Given a static scene, a human can trivially enumerate the myriad of things that can happen next and also characterize the relative likelihood of each. In the process, we make use of enormous amounts of common-sense knowledge about how the world works. In this paper, we investigate learning this common sense knowledge from data. To overcome a lack of densely annotated spatiotemporal data, we learn from bounding-box-level annotation of sequences of abstract images gathered using crowdsourcing. We demonstrate qualitatively and quantitatively that our models produce convincing scene predictions on both the abstract images as well as natural images taken from the internet.
Similar papers:
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Understanding Objects in Detail with Fine-grained Attributes [pdf] - Subhransu Maji, Iasonas Kokkinos, Stavros Tsogkas, Ross Girshick, Matthew Blaschko, Esa Rahtu, Juho Kannala, Andrea Vedaldi
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
#157 - Interval Tracker: Tracking by Interval Analysis [pdf]
Junseok Kwon, Kyoung Mu Lee

Abstract: This paper proposes a robust tracking method that uses interval analysis. Any single posterior model necessarily includes a modeling uncertainty (error), and thus the posterior should be represented as an interval of probability. The objective of visual tracking then becomes to find the best state that maximizes the posterior while minimizing its interval. By minimizing the interval of the posterior, our method can reduce the modeling uncertainty in the posterior. In this paper, this objective is achieved by the M4 estimation, which combines Maximum a Posteriori (MAP) estimation with Minimum Mean-Square Error (MMSE), Maximum Likelihood (ML), and Minimum Interval Length (MIL) estimations. In the M4 estimation, our method maximizes the posterior over the state obtained by the MMSE estimation. The method also minimizes the interval of the posterior by reducing the gap between its lower and upper bounds. The gap is reduced when the likelihood is maximized by the ML estimation and the interval length of the state is minimized by the MIL estimation. The experimental results demonstrate that M4 estimation can be easily integrated into conventional tracking methods and can greatly enhance their tracking accuracy. On several challenging datasets, our method outperforms state-of-the-art tracking methods.
Similar papers:
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - David Joseph Tan, Slobodan Ilic
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#162 - Scanline Sampler without Detailed Balance: An Efficient MCMC for MRF Optimization [pdf]
Wonsik Kim, Kyoung Mu Lee

Abstract: Markov chain Monte Carlo (MCMC) is an elegant tool, widely used in a variety of areas. In computer vision, it has been used for inference on Markov random field (MRF) models. However, MCMC has received less attention than deterministic approaches, although in theory it converges to the globally optimal solution. The major obstacle is its slow convergence. To develop a faster sampling method, we investigate two ideas: breaking detailed balance and updating multiple nodes at a time. Although detailed balance is considered an essential element of MCMC, it is not actually a necessary condition for convergence. In addition, exploiting the structure of the MRF, we introduce a new kernel which updates multiple nodes along a scanline rather than a single node. These two ideas are integrated in a novel way to develop an efficient method, the scanline sampler without detailed balance. In the experimental section, we apply our method to the OpenGM2 benchmark of MRF optimization and show that the proposed method achieves faster convergence than conventional approaches.
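The paper's kernel deliberately breaks detailed balance, which is hard to compress into a few lines; the sketch below shows only the other ingredient, scanline-sized block updates. It exactly samples one row of a Potts-grid MRF conditioned on its neighbouring rows via forward filtering / backward sampling (note this baseline kernel does satisfy detailed balance):

```python
import numpy as np
from scipy.special import logsumexp, softmax

def sample_row(unary, above, below, lam, rng):
    """Exactly sample one row of a Potts-grid MRF, conditioned on the rows
    above and below it, via forward filtering / backward sampling."""
    W, L = unary.shape
    labels = np.arange(L)
    eff = unary.copy()                   # fold vertical Potts edges into unaries
    for nb in (above, below):
        if nb is not None:
            eff = eff + lam * (labels[None, :] != nb[:, None])
    psi = lam * (labels[:, None] != labels[None, :])   # horizontal Potts cost
    alpha = np.zeros((W, L))             # forward log-messages
    alpha[0] = -eff[0]
    for w in range(1, W):
        alpha[w] = -eff[w] + logsumexp(alpha[w - 1][:, None] - psi, axis=0)
    x = np.empty(W, dtype=int)           # backward sampling pass
    x[-1] = rng.choice(L, p=softmax(alpha[-1]))
    for w in range(W - 2, -1, -1):
        x[w] = rng.choice(L, p=softmax(alpha[w] - psi[:, x[w + 1]]))
    return x

# usage: x = sample_row(np.random.rand(8, 3), None, None, lam=0.5,
#                       rng=np.random.default_rng(0))
```

Sweeping sample_row over all rows (and, transposed, all columns) gives a conventional scanline block-Gibbs sampler; the paper's contribution is to modify such a kernel so that detailed balance no longer holds, gaining convergence speed.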
Similar papers:
  • Efficient Nonlinear Markov Models for Human Motion [pdf] - Andreas Lehrmann, Peter Gehler, Sebastian Nowozin
  • A Probabilistic Framework for Multitarget Tracking with Mutual Occlusions [pdf] - Menglong Yang, Yiguang Liu, Stan Li
  • Ground Plane Estimation using a Hidden Markov Model [pdf] - Ralf Dragon, Luc Van Gool
  • Empirical Minimum Bayes Risk Prediction: How to extract an extra 3% performance from vision models with just two more parameters [pdf] - Vittal Premachandran, Daniel Tarlow, Dhruv Batra
#173 - Depth Enhancement via Low-rank Matrix Completion [pdf]
Si Lu, Xiaofeng Ren, Feng Liu

Abstract: Depth captured by consumer RGB-D cameras is often noisy and misses values at some pixels, especially around object boundaries. Most existing methods complete the missing depth values guided by the corresponding color image. When the color image is noisy or the correlation between color and depth is weak, the depth map cannot be properly enhanced. In this paper, we present a depth map enhancement algorithm that performs depth map completion and de-noising simultaneously. Our method is based on the observation that similar-looking RGB-D patches lie in a very low-dimensional subspace. We can then assemble similar patches into a matrix and enforce this low-rank subspace constraint. The constraint captures the underlying structure in the RGB-D patches and enables robust depth enhancement against noise or weak correlation between color and depth. Based on this subspace constraint, our method formulates depth map enhancement as a low-rank matrix completion problem. Since the appropriate rank varies from matrix to matrix, we develop a data-driven method to automatically determine the rank for each matrix. Experiments on public benchmarks show that our method can effectively enhance depth maps from consumer RGB-D cameras.
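A minimal numpy sketch of the completion step, assuming a fixed rank (the paper instead determines the rank per matrix in a data-driven way). Each column could hold one vectorized RGB-D patch from a group of similar patches:

```python
import numpy as np

def lowrank_complete(M, observed, rank, n_iters=100):
    """Hard-impute: alternate a rank-r SVD truncation with re-imposing the
    observed entries; the missing entries converge to the low-rank estimate."""
    X = np.where(observed, M, M[observed].mean())   # initialize the holes
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r approximation
        X = np.where(observed, M, low)              # keep observed depths fixed
    return X
```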
Similar papers:
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Weighted Nuclear Norm Minimization with Application to Image Denoising [pdf] - Shuhang Gu, Lei Zhang, Xiangchu Feng, Wangmeng Zuo
  • Learning Mid-level Filters for Person Re-identification [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
#178 - Describing Textures in the Wild [pdf]
Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Andrea Vedaldi

Abstract: Patterns and textures are defining characteristics of many natural objects: a shirt can be striped, the wings of a butterfly can be veined, and the skin of an animal can be scaly. Aiming to support this analytical dimension in image understanding, we address the challenging problem of describing textures with semantic attributes. We identify a rich vocabulary of forty-seven texture terms and use them to describe a large dataset of patterns collected in the wild. The resulting Describable Textures Dataset (DTD) is the basis for seeking the best texture representation for recognizing describable texture attributes in images. We port the Improved Fisher Vector (IFV) from object recognition to texture recognition and show that, surprisingly, it outperforms specialized texture descriptors not only on our problem, but also on established material recognition datasets. We also show that the describable attributes are excellent texture descriptors, transferring between datasets and tasks; in particular, combined with IFV, they significantly outperform the state of the art by more than 8% on both the FMD and KTHTIPS-2b benchmarks. We also demonstrate that they produce intuitive descriptions of materials and Internet images.
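As a reference point, the encoding step of the Improved Fisher Vector is compact enough to sketch. This assumes a diagonal-covariance GMM fitted to local descriptors and applies the signed square-root and L2 normalizations that make the vector "improved"; the descriptors here are random stand-ins, not a real texture pipeline:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def improved_fisher_vector(X, gmm):
    """IFV encoding of local descriptors X (N, D) under a 'diag' GMM:
    gradients w.r.t. means and variances, then power and L2 normalization."""
    N, _ = X.shape
    q = gmm.predict_proba(X)                     # (N, K) soft assignments
    pi, mu = gmm.weights_, gmm.means_
    sd = np.sqrt(gmm.covariances_)               # (K, D) for 'diag' covariances
    parts = []
    for k in range(gmm.n_components):
        d = (X - mu[k]) / sd[k]
        parts.append((q[:, k:k+1] * d).sum(0) / (N * np.sqrt(pi[k])))
        parts.append((q[:, k:k+1] * (d**2 - 1)).sum(0) / (N * np.sqrt(2 * pi[k])))
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))       # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)     # L2 normalization

gmm = GaussianMixture(8, covariance_type='diag', random_state=0)
X = np.random.default_rng(0).normal(size=(500, 16))   # stand-in descriptors
print(improved_fisher_vector(X, gmm.fit(X)).shape)    # (2 * 8 * 16,) = (256,)
```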
Similar papers:
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
  • Lacunarity Analysis on Image Patterns for Texture Classification [pdf] - Yuhui Quan, Yong Xu, Yuping Sun, Yu Luo
  • The Synthesizability of texture examples [pdf] - Dengxin Dai, Hayko Riemenschneider, Luc Van Gool
#180 - Parsing World's Skylines using Shape-Constrained MRFs [pdf]
Rashmi Tonge, Subhransu Maji, C.V. Jawahar

Abstract: We propose an approach for extracting the detailed structure of buildings in typical skyline images. Our approach is based on a Markov Random Field (MRF) formulation that exploits the fact that such images contain highly overlapping objects of similar shapes. Our contributions are the following: (1) A dataset of 120 skyline images containing over 4,000 individually labeled buildings, which allows us to quantitatively evaluate the performance of various methods, (2) An analysis of low-level features that are useful for segmentation of buildings, and (3) A shape-constrained MRF that enforces shape priors over the regions. We perform experiments in both automatic and interactive settings, and show that in both cases our formulation offers an order of magnitude speedup over traditional approaches while improving performance.
Similar papers:
  • Generating object segmentation proposals using global and local search [pdf] - Pekka Rantalankila, Juho Kannala, Esa Rahtu
  • MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation [pdf] - Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Zhuowen Tu
  • FastSeg: More Efficiency on Multiple Figure-Ground Segmentations [pdf] - Ahmad Humayun, Fuxin Li, James Rehg
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
#182 - StoryGraphs: Narrative Charts for TV series [pdf]
Makarand Tapaswi, Martin Bäuml, Rainer Stiefelhagen

Abstract: We present a novel way to automatically summarize and represent the storyline of a TV episode by visualizing character interactions in a narrative chart. We also propose a scene detection method, well suited to producing the oversegmented scenes used to partition the video. The positioning of the characters in the chart is formulated as an optimization problem in which we trade off between the aesthetics of the chart and its functionality. Using automatic person identification, we generate StoryGraphs on 3 diverse TV series encompassing a total of 22 episodes. We define quantitative criteria to evaluate StoryGraphs and also compare them against episode summaries to evaluate their ability to provide an episode overview.
Similar papers:
  • Jointly Summarizing Large-Scale Web Images and Videos for the Storyline Reconstruction [pdf] - Gunhee Kim, Leonid Sigal, Eric Xing
  • Orientation Robust Textline Detection in Natural Images [pdf] - Le Kang, Yi Li
  • Region-based Discriminative Feature Pooling for Scene Text Recognition [pdf] - Chen-Yu Lee, Anurag Bhardwaj, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
  • Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition [pdf] - Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu
#196 - Inferring Unseen Views of People [pdf]
Chao-Yeh Chen, Kristen Grauman

Abstract: We pose unseen view synthesis as a probabilistic tensor completion problem. Given images of people organized by their rough viewpoint, we form a 3D appearance tensor indexed by images (pose examples), viewpoints, and image positions. After discovering the low-dimensional latent factors that approximate that tensor, we can impute its missing entries. In this way, we generate novel synthetic views of people---even when they are observed from just one camera viewpoint. We show that the inferred views are both visually and quantitatively accurate. Furthermore, we demonstrate their value for recognizing actions in unseen views and estimating viewpoint in novel images. While existing methods are often forced to choose between data that is either realistic or multi-view, our virtual views offer both, thereby allowing greater robustness to viewpoint in novel images.
Similar papers:
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf] - Ivan Lillo, Juan Carlos Niebles, Alvaro Soto
  • Feature-Independent Action Spotting Without Human Localization, Segmentation or Frame-wise Tracking [pdf] - Chuan Sun, Hassan Foroosh
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
#200 - Orientational Pyramid Matching for Recognizing Indoor Scenes [pdf]
Lingxi Xie, Jingdong Wang, Bo Zhang, Qi Tian

Abstract: Scene recognition is a basic task towards image understanding. Spatial Pyramid Matching (SPM) is a popular solution for spatial context modeling. In this paper, we introduce an alternative approach, Orientational Pyramid Matching (OPM), for orientational context modeling. Our approach is motivated by the observation that the 3D orientations of objects are a crucial factor in discriminating indoor scenes. The novelty lies in that OPM uses the 3D orientations to form the pyramid and produce the pooling regions, unlike SPM, which uses spatial positions to form the pyramid. Experimental results on the challenging MIT Indoor-67 dataset show that OPM achieves performance comparable to SPM, and that OPM and SPM make complementary contributions, with their combination giving state-of-the-art performance.
Similar papers:
  • Super Normal Vector for Activity Recognition Using Depth Sequences [pdf] - Xiaodong Yang, Yingli Tian
  • Bags of Spacetime Energies for Dynamic Scene Recognition [pdf] - Christoph Feichtenhofer, Axel Pinz, Richard Wildes
  • Ask the image: supervised pooling to preserve feature locality [pdf] - Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta, Francesca Odone
  • Learning Important Spatial Pooling Regions for Scene Classification [pdf] - Di Lin, Cewu Lu, Renjie Liao, Jiaya Jia
#212 - Joint Unsupervised Multi-Class Image Segmentation [pdf]
Fan Wang, Qixing Huang, Maks Ovsjanikov, Leonidas J. Guibas

Abstract: Joint segmentation of image sets is a challenging problem, especially when there are multiple objects with variable appearances shared among the images in the collection and the set of objects present in each particular image is itself varying and unknown. In this paper, we present a novel method to jointly segment a set of images with objects from multiple classes. We first establish consistent functional maps across the input images, and introduce a formulation that explicitly models partial similarity across images instead of global consistency. Given the optimized maps across the images, multiple groups of consistent segmentations are found such that they align with segmentation cues in the images, agree with the functional maps, and are mutually exclusive. The proposed fully unsupervised approach exhibits a significant improvement over the state-of-the-art methods, as shown on the co-segmentation data sets MSRC, Flickr, and PASCAL.
Similar papers:
  • Sequential Convex Relaxation for Mutual-Information-Based Unsupervised Figure-Ground Segmentation [pdf] - Youngwook Kee, Mohamed Souiai, Daniel Cremers, Junmo Kim
  • Dense Non-Rigid Shape Correspondence using Random Forests [pdf] - Emanuele Rodola, Samuel Rota Bulò, Thomas Windheuser, Matthias Vestner, Daniel Cremers
  • Laplacian Coordinates for Seeded Image Segmentation [pdf] - Wallace Casaca, Gustavo Nonato, Gabriel Taubin
  • A Convex Relaxation of Ambrosio-Tortorelli's Elliptic Functional for the Mumford-Shah Functional [pdf] - Youngwook Kee, Junmo Kim
#213 - Looking Beyond the Visible Scene [pdf]
Joseph Lim, Aditya Khosla, Antonio Torralba, Byoungkwon An

Abstract: A common thread that ties previous works in scene understanding together is their focus on the aspects directly present in a scene, such as its categorical classification or its set of objects. In this work, we propose to look beyond the visible elements of a scene; we demonstrate that a scene is not just a collection of objects and their configuration or the labels assigned to its pixels; it is much more. From a simple observation of a scene, we can tell a lot about the environment surrounding it, such as the potential establishments near it, the potential crime rate in the area, or even the economic climate. In this work, we explore several such areas from both the human perception and computer vision perspectives. Specifically, we show that it is possible to predict the distance to surrounding establishments, such as McDonald's or hospitals, even from scenes located far from them. We go a step further to show that both humans and computers perform reasonably well at navigating the environment based only on visual cues from scenes that contain no direct information about the target. Lastly, we show that it is possible to predict the crime rate in an area simply by looking at a scene without any real-time criminal activity. Simply put, we illustrate that it is possible to look beyond the visible scene.
Similar papers:
  • Describing Textures in the Wild [pdf] - Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Andrea Vedaldi
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Predicting Object Dynamics in Scenes [pdf] - David Fouhey, Larry Zitnick
  • Computer vision vs. human vision: What can be learned? [pdf] - Ali Borji, Laurent Itti
#218 - Towards Multi-view and Partially-occluded Face Alignment [pdf]
Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan

Abstract: We present a robust algorithm to locate facial landmarks under different views and possibly severe occlusions. To build reliable relationships between face appearance and shape with large view variations, we propose to formulate face alignment as an $\ell_1$-induced Stagewise Relational Dictionary (SRD) learning problem. During each training stage, the SRD model learns a relational dictionary to capture consistent relationships between face appearance and shape, which are respectively modeled by the pose-indexed image features and the shape displacements for current estimated landmarks. During testing, the SRD model automatically selects a sparse set of the most related shape displacements for the testing sample and uses them to refine its shape iteratively. To locate face landmarks under occlusions, we further propose to learn an occlusion dictionary to model different kinds of partial face occlusions. By deploying the occlusion dictionary into the SRD model, the alignment performance for occluded faces can be further improved. Our algorithm is simple, effective, and easy to implement. Extensive experiments on two benchmark datasets and two newly built datasets have demonstrated its superior performances over the state-of-the-art methods, especially for faces with large view variations and/or occlusions.
Similar papers:
  • Parsing Occluded People [pdf] - Golnaz Ghiasi, Yi Yang, Deva Ramanan, Charless Fowlkes
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
  • Latent Dictionary Learning for Sparse Representation based Classification [pdf] - Meng Yang, Luc Van Gool
  • Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model [pdf] - Golnaz Ghiasi, Charless Fowlkes
#220 - Learning Mid-level Filters for Person Re-identification [pdf]
Rui Zhao, Wanli Ouyang, Xiaogang Wang

Abstract: In this paper, we propose a novel approach of learning mid-level filters from automatically discovered patch clusters for person re-identification. It is motivated by our study of what makes a good filter for person re-identification. Our mid-level filters are discriminatively learned for identifying specific visual patterns and distinguishing persons, and have good cross-view invariance. First, local patches are qualitatively measured and classified by their discriminative power. Discriminative and representative patches are collected for filter learning. Second, patch clusters with coherent appearance are obtained by pruning hierarchical clustering trees, and a simple but effective cross-view training strategy is proposed to learn filters that are view-invariant and discriminative. Third, filter responses are integrated with patch matching scores in RankSVM training. The effectiveness of our approach is validated on the VIPeR dataset and the CUHK Campus dataset. The learned mid-level features are complementary to existing handcrafted low-level features, and improve the best Rank-1 matching rate on the VIPeR dataset by 14%.
Similar papers:
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • Filter Pairing Neural Network for Person Re-identification [pdf] - Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang
#230 - Automatic Face Reenactment [pdf]
Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, Christian Theobalt

Abstract: We propose an image-based facial reenactment system that replaces the face of an actor in an existing target video with the face of a user from a source video, while preserving the original target performance. Our system is fully automatic and does not require a database of source expressions. Instead, it is able to produce convincing reenactment results from a short source video of the user performing arbitrary facial gestures captured with an off-the-shelf camera, such as a webcam. Our reenactment pipeline is conceived as part image retrieval and part face transfer: Image retrieval is based on temporal clustering of target frames and a novel image matching metric that combines appearance and motion to select candidate frames from the source video, while face transfer is done by a 2D warping strategy that preserves the user's identity. Our system excels in simplicity because it does not rely on a 3D face model, is robust under head motion, and does not require the source and target performance to be similar. We show convincing reenactment results for videos that we recorded ourselves and for low-quality footage taken from the Internet.
Similar papers:
  • Using a deformation field model for localizing faces and facial points under weak supervision [pdf] - Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
#231 - Illumination-Aware Age Progression [pdf]
Supasorn Suwajanakorn, Ira Kemelmacher, Steve Seitz

Abstract: We present an approach that takes a single photograph of a child as input and automatically produces a series of age-progressed outputs between 1 and 80 years of age, accounting for pose, expression, and illumination. Leveraging thousands of photos of children and adults at many ages from the Internet, we first show how to compute average image subspaces that are pixel-to-pixel aligned and model variable lighting. These averages depict a prototype man and woman aging from 0 to 80, under any desired illumination, and capture the differences in shape and texture between ages. Applying these differences to a new photo yields an age progressed result. Contributions include re-lightable age subspaces, a novel technique for subspace-to-subspace alignment, and the most extensive evaluation of age progression techniques in the literature.
Similar papers:
  • Jointly Summarizing Large-Scale Web Images and Videos for the Storyline Reconstruction [pdf] - Gunhee Kim, Leonid Sigal, Eric Xing
  • Multi-modal Learning in Loosely-organized Web Images [pdf] - Kun Duan, David Crandall, Dhruv Batra
  • Automatic Face Reenactment [pdf] - Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, Christian Theobalt
  • A Study on Cross-Population Age Estimation [pdf] - Chao Zhang, Guodong Guo
#234 - Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification [pdf]
Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang

Abstract: This paper presents a novel algorithm which uses hash bits for efficiently optimizing non-linear kernel SVMs in very large scale visual classification problems. Our key idea is to represent each sample with compact hash bits and define an inner product over these bits, which serves as a surrogate of the original nonlinear kernel. The optimal solution of the nonlinear SVM can then be transformed into solving a linear SVM over the hash bits. The proposed Hash-SVM enjoys both greatly reduced data storage, owing to the compact binary representation, and (sub-)linear training complexity via linear SVM. As a crucial component of Hash-SVM, we propose a novel hashing scheme for arbitrary non-linear kernels via random subspace projection in reproducing kernel Hilbert space. Our comprehensive analysis reveals a well-behaved theoretical bound on the deviation between the proposed hashing-based kernel approximation and the original kernel function. We also derive moderate requirements on the number of hash bits for achieving a satisfactory accuracy level. Several experiments on large-scale visual classification benchmarks are conducted, including one with over 1 million images. The results demonstrate the superiority of our algorithm over the alternatives.
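A toy sketch of the overall recipe: binarize each sample, then train an ordinary linear SVM on the bits. Note the hashing here is plain sign random projection in input space, not the paper's random subspace projection in the reproducing kernel Hilbert space:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

P = np.random.default_rng(0).standard_normal((X.shape[1], 256))

def bits(A):
    # +/-1 hash bits; their inner product acts as the kernel surrogate
    return np.where(A @ P > 0, 1.0, -1.0)

clf = LinearSVC(dual=False).fit(bits(Xtr), ytr)   # linear SVM over hash bits
print("test accuracy:", clf.score(bits(Xte), yte))
```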
Similar papers:
  • Collaborative Hashing [pdf] - Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang
  • Adaptive Object Retrieval with Kernel Reconstructive Hashing [pdf] - Haichuan Yang, Xiao Bai, Jun Zhou, Peng Ren, Jian Cheng, Zhihong Zhang
  • Fast Supervised Hashing with Decision Trees for High-Dimensional Data [pdf] - Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter
  • Collective Matrix Factorization Hashing for Multimodal Data [pdf] - Guiguang Ding, Yuchen Guo, Jile Zhou
#237 - Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts [pdf]
Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Nam-Gyu Cho, Sanja Fidler, Raquel Urtasun, Alan Yuille

Abstract: Detecting objects becomes difficult when we must deal with large shape deformation, occlusion and low resolution. We propose a novel approach to i) handle large deformation and partial occlusion in animals (as examples of highly deformable objects), ii) describe them in terms of body parts, and iii) detect them when their body parts are hard to detect (e.g., animals of low resolution). We represent the holistic object and body parts separately and use a fully connected model to arrange templates for the holistic object and body parts. Our model automatically decouples the holistic object or body parts from the model when they are hard to detect. This enables our model to represent an exponential number of holistic object and body part combinations to better deal with different detectability patterns caused by deformations, occlusions or low resolution. We apply our method to the six animal categories in the Pascal VOC dataset and show that our method significantly improves the state of the art by 4.1 AP and provides a richer representation for objects. During training we use annotations for body parts (e.g., head, torso, etc.). This makes use of a new dataset of fully annotated object parts for Pascal VOC 2010, which provides masks for the parts.
Similar papers:
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Human Body Shape Estimation Using a Multi-Resolution Manifold Forest [pdf] - Frank Perbet, Sam Johnson, Minh-Tri Pham, Björn Stenger
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
  • Human Pose Estimation: New Benchmark and State of the Art Analysis [pdf] - Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
#247 - Quality-based Multimodal Classification Using Tree-Structured Sparsity [pdf]
Soheil Bahrampour, Asok Ray, Nasser Nasrabadi, Kenneth Jenkins

Abstract: Recent studies have demonstrated the advantages of information fusion based on sparsity models for multimodal classification. Among several sparsity models, tree-structured sparsity provides a flexible framework for extracting cross-correlated information from different sources and for enforcing group sparsity at multiple granularities. However, the existing algorithm only solves an approximated version of the cost functional, and the resulting solution is not necessarily sparse at the group level. This paper reformulates the tree-structured sparse model for the multimodal classification task. An accelerated proximal algorithm is proposed to solve the optimization problem; it is an efficient tool for feature-level fusion among either homogeneous or heterogeneous sources of information. In addition, a (fuzzy-set-theoretic) possibilistic scheme is proposed to weight the available modalities, based on their respective reliability, in a joint optimization problem for finding the sparsity codes. This approach provides a general framework for quality-based fusion that offers added robustness to several sparsity-based multimodal classification algorithms. To demonstrate their efficacy, the proposed methods are evaluated on three different applications -- multiview face recognition, multimodal face recognition, and target classification.
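The workhorse inside such accelerated proximal algorithms is the group soft-thresholding operator; a minimal sketch for non-overlapping groups (the tree-structured penalty applies it hierarchically, from leaves to root):

```python
import numpy as np

def group_prox(v, groups, t):
    """Proximal operator of t * sum_g ||v_g||_2: shrink each group's norm
    by t and zero the group out entirely when its norm falls below t."""
    out = v.copy()
    for g in groups:                     # g: index array of one group
        n = np.linalg.norm(v[g])
        out[g] = 0.0 if n <= t else (1.0 - t / n) * v[g]
    return out

# e.g. coefficients of two modalities as two groups
v = np.array([0.3, -0.1, 2.0, 1.5])
print(group_prox(v, [np.array([0, 1]), np.array([2, 3])], t=0.5))
# -> first group is zeroed (norm < 0.5), second is shrunk to [1.6, 1.2]
```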
Similar papers:
  • Topic Modeling of Multimodal Data: an Autoregressive Approach [pdf] - Yin Zheng, Yu-Jin Zhang, Hugo Larochelle
  • Multi-feature Spectral Clustering with Minimax Optimization [pdf] - Hongxing Wang, Chaoqun Weng, Junsong Yuan
  • Robust Surface Reconstruction via Triple Sparsity [pdf] - Hicham Badri, Hussein Yahia, Driss Aboutajdine
  • Multi-Cue Visual Tracking Using Robust Feature-Level Fusion Based on Joint Sparse Representation [pdf] - Xiangyuan Lan, Pong C YUEN, Andy Jinhua Ma
#249 - Diversity-Enhanced Condensation Algorithm and Its Application for Robust and Accurate Endoscope Electromagnetic Tracking [pdf]
Ying Wan, Xiongbiao Luo, Sean He, Jie Yang, Terry Peters, Kensaku Mori

Abstract: This paper proposes a diversity-enhanced condensation algorithm to address the particle degeneracy and impoverishment problems from which particle filtering methods usually suffer. Particle diversity plays an important role in state propagation, since it affects the algorithm's performance. Unfortunately, the condensation algorithm easily gets trapped in local minima due to the shortage of particle modes. We introduce a modified evolutionary computing method, adaptive differential evolution, to resolve particle impoverishment under a proper size of the particle population. We apply the proposed method to endoscope electromagnetic tracking for estimating the three-dimensional motion of the endoscopic camera. Validation on a dynamic phantom shows that our method offers a more robust and accurate tracking framework than previous methods, reducing the tracking error from 4.8 mm to 3.2 mm.
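For illustration, one DE/rand/1 mutation-and-crossover sweep over a particle set, the kind of evolutionary move used to re-inject diversity. The paper's adaptive variant tunes F and CR on the fly, and a filter would keep a mutant only if it improves the particle's weight; neither refinement is reproduced here:

```python
import numpy as np

def de_diversify(particles, F=0.8, CR=0.9, rng=None):
    """DE/rand/1: perturb each particle with a scaled difference of two
    others, then binomially cross over. Requires at least 4 particles."""
    rng = rng or np.random.default_rng()
    N, D = particles.shape
    out = particles.copy()
    for i in range(N):
        a, b, c = rng.choice(np.delete(np.arange(N), i), 3, replace=False)
        mutant = particles[a] + F * (particles[b] - particles[c])
        cross = rng.random(D) < CR
        cross[rng.integers(D)] = True     # guarantee at least one mutated dim
        out[i, cross] = mutant[cross]
    return out

# usage: states = de_diversify(states) between the predict and update steps
```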
Similar papers:
  • Evolutionary Quasi-random Search for Hand Articulations Tracking [pdf] - Iason Oikonomidis, Manolis Lourakis, Antonis Argyros
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Non-Parametric Bayesian Constrained Local Models [pdf] - Pedro Martins, Rui Caseiro, Jorge Batista
  • Region-based particle filter for video object segmentation [pdf] - David Varas, Ferran Marques
#267 - Cross-Scale Cost Aggregation for Stereo Matching [pdf]
Kang Zhang, Yuqiang Fang, Dongbo Min, Lifeng Sun, Shiqiang Yang, Shuicheng Yan, Qi Tian

Abstract: Human beings process stereoscopic correspondence across multiple scales. However, this bio-inspiration is ignored by state-of-the-art cost aggregation methods for dense stereo correspondence. In this paper, a generic cross-scale cost aggregation framework is proposed to allow multi-scale interaction in cost aggregation. We first reformulate cost aggregation from a unified optimization perspective and show that different cost aggregation methods essentially differ in the choices of similarity kernels. Then, an inter-scale regularizer is introduced into optimization and solving this new optimization problem leads to the proposed framework. Since the regularization term is independent of the similarity kernel, various cost aggregation methods can be integrated into the proposed general framework. We show that the cross-scale framework is important as it effectively and efficiently expands state-of-the-art cost aggregation methods and leads to significant improvements, when evaluated on Middlebury, KITTI and New Tsukuba datasets.
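A crude stand-in for the framework, assuming a box filter as the similarity kernel: aggregate the cost volume at several scales and blend the upsampled results with geometrically decaying weights. The paper instead couples the scales through an explicit inter-scale regularizer with a closed-form solution, which this sketch does not reproduce:

```python
import numpy as np
from scipy.ndimage import uniform_filter, zoom

def aggregate_multiscale(cost, n_scales=3, radius=3, lam=0.5):
    """Blend box-filtered versions of a stereo cost volume (H, W, D)
    computed at successively halved resolutions."""
    H, W, _ = cost.shape
    agg, total = np.zeros_like(cost), 0.0
    vol = cost
    for s in range(n_scales):
        f = uniform_filter(vol, size=(2 * radius + 1, 2 * radius + 1, 1))
        up = zoom(f, (H / f.shape[0], W / f.shape[1], 1), order=1)[:H, :W]
        agg += lam**s * up
        total += lam**s
        vol = vol[::2, ::2]              # crude downsampling to the next scale
    return agg / total

# disparity by winner-take-all: disp = aggregate_multiscale(cost).argmin(axis=2)
```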
Similar papers:
  • Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras [pdf] - Can Chen, Haiting Lin, Zhan Yu, Sing Bing Kang, Jingyi Yu
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching [pdf] - Aristotle Spyropoulos, Nikos Komodakis, Philippos Mordohai
  • Stereo under Sequential Optimal Sampling: A Statistical Analysis Framework for Search Space Reduction [pdf] - Yilin Wang, Jan-Michael Frahm, Enrique Dunn, Ke Wang
#276 - Learning an image-based motion context for multiple people tracking [pdf]
Laura Leal-Taixé, Michele Fenzi, Alina Kuznetsova, Bodo Rosenhahn, Silvio Savarese

Abstract: We present a novel method for multiple people tracking that leverages a generalized model for capturing interactions among individuals. At the core of our model is a learned dictionary of interaction feature strings which capture the relationship between the motion of the targets. These feature strings, created from low-level image features, lead to a much richer representation of the physical interactions between targets compared to hand-specified social force models that previous works have introduced for tracking. One disadvantage of using social forces is that all pedestrians need to be detected in order for the forces to be applied, while our method is able to encode the effect of undetected targets, making the tracker more robust to partial occlusions. The interaction feature strings are used in a Random Forest framework to track the targets according to the features surrounding them. Results on six publicly available sequences show that our method outperforms state-of-the-art approaches in multiple people tracking.
Similar papers:
  • Unsupervised Trajectory Modelling using Temporal Information via Minimal Paths [pdf] - Brais Cancela, Alberto Iglesias, Marcos Ortega, Manuel Penedo
  • Word Channel Based Multiscale Pedestrian Detection Without Image Resizing and Using Only One Classifier [pdf] - Arthur Costea, Sergiu Nedevschi
  • Informed Haar-like Features Improve Pedestrian Detection [pdf] - Shanshan Zhang, Christian Bauckhage, Armin Cremers
  • Pedestrian Detection in Low-resolution Imagery by Learning Multi-scale Intrinsic Motion Structures (MIMS) [pdf] - Jiejie Zhu
#282 - Evaluation of Scan-Line Optimization for 3D Medical Image Registration [pdf]
Simon Hermann

Abstract: Scan-line optimization via cost accumulation has become very popular for stereo estimation in computer vision applications and is often combined with a semi-global integration strategy, known as SGM. This paper introduces this combination as a general and effective optimization technique, applying the concept to 3D medical image registration for the first time. The presented algorithm, SGM-3D, employs a coarse-to-fine strategy and reduces the search space dimension for consecutive pyramid levels by a fixed linear rate. This allows it to handle large displacements to the extent required for clinical applications on high-dimensional data. SGM-3D is evaluated in the context of pulmonary motion analysis on the recently extended DIR-lab benchmark, which provides ten 4D computed tomography (CT) image data sets, as well as on ten challenging 3D CT scan pairs from the COPDgene study archive. Results show that both registration error and run-time performance are very competitive with current state-of-the-art methods.
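For reference, the classic SGM scan-line recurrence in one direction (left to right) over a 2D cost volume; SGM-3D lifts the same accumulation to a 3D displacement search inside a coarse-to-fine pyramid, which is not reproduced here:

```python
import numpy as np

def sgm_scanline(cost, P1=1.0, P2=8.0):
    """Accumulate (H, W, D) matching costs left-to-right with the SGM
    recurrence; full SGM sums several such directed accumulations."""
    H, W, D = cost.shape
    L = np.empty_like(cost)
    L[:, 0] = cost[:, 0]
    for x in range(1, W):
        prev = L[:, x - 1]                        # (H, D)
        best = prev.min(axis=1, keepdims=True)    # cheapest previous label
        cand = np.stack([
            prev,                                 # keep the same disparity
            np.roll(prev,  1, axis=1) + P1,       # move one disparity step
            np.roll(prev, -1, axis=1) + P1,       #   (roll wrap-around at the
            np.broadcast_to(best + P2, prev.shape),  # borders is ignored here)
        ]).min(axis=0)
        L[:, x] = cost[:, x] + cand - best        # subtract best to stay bounded
    return L
```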
Similar papers:
  • Fast and Exact: Shape Segmentation Using ADMM and Structured Prediction [pdf] - Haithem Boussaid, Iasonas Kokkinos
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • 3D Modeling from Wide Baseline Range Scans using Contour Coherence [pdf] - Ruizhe Wang, Jongmoo Choi, Gerard Medioni
#288 - Deeply-Learned Slow Feature Analysis for Action Recognition [pdf]
Lin Sun

Abstract: Most previous work on video action recognition primarily uses complex hand-designed local features, such as the popular SIFT, HOG and SURF, but these approaches are time-consuming and difficult to extend to other sensor modalities. Recent studies find that there is no universally best hand-engineered feature for all datasets, and that learning features directly from the dataset itself may be more advantageous. One such endeavor is Slow Feature Analysis (SFA), proposed by Wiskott and Sejnowski \cite{sfa}. SFA can learn invariant and slowly varying features from input signals and has proved valuable in human action recognition \cite{sfa_action}. It has also been observed that multi-layer feature representations have succeeded remarkably in widespread machine learning applications. In this paper, we propose to combine SFA with deep learning techniques to learn hierarchical representations from high-resolution video data. Specifically, we use a two-layered SFA learning structure with 3D convolution and max pooling operations to scale the method up to large inputs. Sharing the same merits as deep learning, the proposed method is generic and fully automated. Our classification results on the Hollywood2, KTH and UCF Sports datasets are superior to most previously published results. To highlight some, on the challenging Hollywood2 dataset, our recognition rate shows approximately $1\%$ improvement in comparison to most hand-designed methods even without supervising and dense sa
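The one-layer, linear core of SFA is compact enough to sketch: whiten the signal, then keep the directions whose temporal derivative has the smallest variance. The paper stacks two such layers with 3D convolution and max pooling, which this sketch does not attempt:

```python
import numpy as np

def linear_sfa(X, n_out):
    """Linear Slow Feature Analysis on a signal X of shape (T, D)."""
    X = X - X.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(X.T))            # whiten the input
    keep = d > 1e-10
    Z = X @ (E[:, keep] / np.sqrt(d[keep]))
    dd, U = np.linalg.eigh(np.cov(np.diff(Z, axis=0).T))
    return Z @ U[:, :n_out]                       # slowest directions first

t = np.linspace(0, 4 * np.pi, 1000)
S = np.c_[np.sin(t), np.sin(11 * t)]              # slow and fast sources
X = S @ np.random.default_rng(0).normal(size=(2, 2))   # mixed observation
slow = linear_sfa(X, 1)          # recovers the slow sine, up to sign and scale
```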
Similar papers:
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Leveraging Hierarchical Parametric Network for Skeletal Joints Action Segmentation and Recognition [pdf] - Di Wu, Ling Shao
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
#289 - Subspace Clustering for Sequential Data [pdf]
Stephen Tierney, Junbin Gao, Yi Guo

Abstract: We propose Ordered Subspace Clustering (OSC) to segment data drawn from a sequentially ordered union of subspaces. Current subspace clustering techniques learn the relationships within a set of data and then use a separate clustering algorithm such as NCut for the final segmentation. In contrast, our technique, under certain conditions, is capable of segmenting clusters intrinsically, without the number of clusters being provided as a parameter. Similar to Sparse Subspace Clustering (SSC), we formulate the problem as one of finding a sparse representation, but include a new penalty term to handle sequential data. We test our method on infrared hyperspectral data, video sequences and face images. Our experiments show that our method, OSC, outperforms the state-of-the-art methods Spatial Subspace Clustering (SpatSC), Low-Rank Representation (LRR) and SSC.
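A compact SSC-style baseline conveys the pipeline: code each point as a sparse combination of the others, then spectrally cluster the affinity. OSC would add a penalty on the difference of consecutive coefficient columns to exploit the sequential ordering, which this sketch omits:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc(X, n_clusters, alpha=0.01):
    """Sparse subspace clustering: rows of X are points; each is regressed
    on all other points with an l1 penalty, giving a sparse affinity."""
    N = X.shape[0]
    Z = np.zeros((N, N))
    for i in range(N):
        idx = np.arange(N) != i
        Z[i, idx] = Lasso(alpha=alpha, max_iter=5000).fit(X[idx].T, X[i]).coef_
    A = np.abs(Z) + np.abs(Z).T                  # symmetrized affinity
    return SpectralClustering(n_clusters, affinity='precomputed').fit_predict(A)
```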
Similar papers:
  • Semi-supervised Spectral Clustering for Image Set Classification [pdf] - Arif Mahmood, Ajmal Mian, Robyn Owens
  • Subspace Tracking under Dynamic Dimensionality for Online Background Subtraction [pdf] - Matthew Berger, Lee Seversky
  • SCAMS: Simultaneous Clustering and Model Selection [pdf] - Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
  • Complex Non-Rigid Motion 3D Reconstruction by Union of Subspaces [pdf] - Yingying Zhu, Dong Huang, Fernando de la Torre, Simon Lucey
#294 - How to Evaluate Foreground Maps? [pdf]
Ran Margolin, Lihi Zelnik-Manor, Ayellet Tal

Abstract: The output of many algorithms in computer-vision is either non-binary maps or binary maps (e.g., salient object detection and object segmentation). Several measures have been suggested to evaluate the accuracy of these foreground maps. In this paper, we show that the most commonly-used measures for evaluating both non-binary maps and binary maps do not always provide a reliable evaluation. This includes the Area-Under-the-Curve measure, the Average-Precision measure, the F-measure, and the evaluation measure of the PASCAL VOC segmentation challenge. We start by identifying three causes of inaccurate evaluation. We then propose a new measure that amends these flaws. An appealing property of our measure is being an intuitive generalization of the F-measure. Finally we propose four meta-measures to compare the adequacy of evaluation measures. We show via experiments that our novel measure is preferable.
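For concreteness, here is the classic F-measure that the paper analyses, with the beta^2 = 0.3 setting common in the saliency literature; the paper's proposed measure generalizes this to weight errors by their location and is not reproduced here:

```python
import numpy as np

def f_measure(fg, gt, beta2=0.3):
    """Classic F-measure between a binarized foreground map and ground truth."""
    fg, gt = fg.astype(bool), gt.astype(bool)
    tp = np.logical_and(fg, gt).sum()
    precision = tp / max(fg.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```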
Similar papers:
  • Joint Motion Segmentation and Background Subtraction in Dynamic Scenes [pdf] - Adeel Mumtaz, Weichen Zhang, Antoni Chan
  • Visual Tracking Using Pertinent Patch Selection and Masking [pdf] - Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim
  • Object Classification with Adaptive Regions [pdf] - Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool
  • Object-based Multiple Foreground Video Co-segmentation [pdf] - Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
#295 - An Automated Estimator of Image Visual Realism Based on Human Cognition [pdf]
Shaojing Fan, Tian-Tsong Ng, Jonathan Herberg, Bryan Koenig, Cheston Tan, Rang-ding Wang

Abstract: Assessing the visual realism of images is increasingly becoming an essential aspect of fields ranging from computer graphics (CG) rendering to photo manipulation. In this paper we systematically evaluate factors underlying human perception of visual realism and use that information to create an automated assessment of visual realism. We make the following unique contributions. First, we established a benchmark dataset of images with empirically determined visual realism scores. Second, we identified attributes potentially related to image realism, and used correlational techniques to determine that realism was most related to image naturalness, familiarity, aesthetics, and semantics. Third, we created an attributes-motivated, automated computational model that estimated image visual realism quantitatively. Using human assessment as a benchmark, the model was below human performance, but outperformed other state-of-the-art algorithms.
Similar papers:
  • Illumination-Aware Age Progression [pdf] - Supasorn Suwajanakorn, Ira Kemelmacher, Steve Seitz
  • Describing Textures in the Wild [pdf] - Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Andrea Vedaldi
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
#302 - Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning [pdf]
Seung-Hwan Bae, Kuk-Jin Yoon

Abstract: Online multi-object tracking aims at producing complete tracks of multiple objects using the information up to the present time. It remains a difficult problem in complex scenes because of frequent occlusion by clutter or other objects, similar appearances of different objects, and so on. In this paper, we propose a robust online multi-object tracking method that handles these difficulties effectively. We first propose a tracklet confidence based on the detectability and continuity of a tracklet, and formulate a multi-object tracking problem based on this confidence. The tracking problem is then solved by associating tracklets in different ways according to their confidence values. Based on this strategy, tracklets sequentially grow with online-provided detections, and fragmented tracklets are linked up with others without any iterative and expensive association steps. For reliable association between tracklets and detections, we also propose a novel online learning method using incremental linear discriminant analysis for discriminating the appearances of objects. By exploiting the proposed learning method, tracklet association can be achieved even under severe occlusion. Experiments with challenging public datasets show clear performance improvement over other batch and online tracking methods.
Similar papers:
  • Multi-target Tracking with Motion Context in Tensor Power Iteration [pdf] - Xinchu Shi, Haibin Ling, Weiming Hu, Chunfeng Yuan
  • An Online Learned Elementary Grouping Model for Multi-target Tracking [pdf] - Xiaojing Chen, Zhen Qin, Le An, Bir Bhanu
  • Tracklet Association with Online Reidentification in Network Flow Optimization for Long-term Multi-Person Tracking [pdf] - Bing Wang, Gang Wang, Kap Luk Chan, Li Wang
  • Multiple Target Tracking Based on Hierarchical Relation Hypergraph [pdf] - Longyin Wen, Wenbo Li, Zhen Lei, Stan Li
#304 - Discrete-Continuous Gradient Orientation Estimation for Faster Unsupervised Segmentation [pdf]
Michael Donoser, Dieter Schmalstieg

Abstract: The state-of-the-art in fully unsupervised segmentation builds hierarchical segmentation structures based on analyzing local feature cues in spectral settings. Due to their impressive performance, such segmentation approaches have become building blocks in many computer vision applications. Nevertheless, the main bottlenecks are still the computationally demanding processes of local feature extraction and subsequent spectral analysis. In this paper, we demonstrate that based on effectively trained random forests aiming at a discrete-continuous optimization of oriented gradient signals, we are able to provide segmentation performance competitive to state-of-the-art (even without any additional spectral analysis) while reducing computation time by a factor of 30. The output of our algorithm is a hierarchy of segmentation results with differing granularity, and in such a way we are able to provide useful input to various computer vision applications at significantly reduced runtime.
Similar papers:
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Learning Mid-level Filters for Person Re-identification [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
  • Transitive Distance Clustering with K-Means Duality [pdf] - Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu
#305 - Good Vibrations: A Modal Analysis Approach for Sequential Non-Rigid Structure from Motion [pdf]
Antonio Agudo, Lourdes Agapito, Begoña Calvo, Jose M. Montiel

Abstract: We propose an online solution to Non-Rigid Structure from Motion that performs camera pose and 3D shape estimation of highly deformable surfaces on a frame-by-frame basis. Our method models non-rigid deformations as a linear combination of mode shapes obtained using modal analysis from continuum mechanics. The shape is first discretized into linear elastic triangles, modelled by means of finite elements, which are used to pose the force balance equations for an undamped free-vibration model. The shape basis computation comes down to solving an eigenvalue problem, without the requirement of a learning step. The camera pose and the time-varying weights that define the shape at each frame are then estimated on the fly, in an online fashion, using bundle adjustment over a sliding window of image frames. The result is a low computational cost method that can run sequentially in real time. We show experimental results on synthetic sequences with ground truth 3D data and real videos for different scenarios ranging from sparse to dense scenes. Our system exhibits a good trade-off between accuracy and computational budget, can handle missing data, and performs favourably compared to competing methods.
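The shape-basis step reduces to a generalized eigenproblem; a tiny sketch with a 1D mass-spring chain standing in for the paper's finite-element triangulation:

```python
import numpy as np
from scipy.linalg import eigh

def mode_shapes(K, M, n_modes):
    """Lowest-frequency mode shapes of the undamped free-vibration problem
    K @ phi = w**2 * M @ phi (generalized symmetric eigenproblem)."""
    w2, Phi = eigh(K, M)                 # eigenvalues come back ascending
    return w2[:n_modes], Phi[:, :n_modes]

n = 6                                    # toy: a chain of unit masses/springs
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # stiffness matrix
M = np.eye(n)                            # lumped mass matrix
w2, Phi = mode_shapes(K, M, 3)
# a deformed shape is then modelled as rest_shape + Phi @ weights
```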
Similar papers:
  • Detecting Objects using Deformation Dictionaries [pdf] - Bharath Hariharan, Piotr Dollar, Larry Zitnick
  • Bayesian Active Contours with Affine-Invariant, Elastic Shape Prior [pdf] - Darshan Bryner, Anuj Srivastava
  • A Procrustean Markov Process for Non-Rigid Structure Recovery [pdf] - Minsik Lee, Chong-Ho Choi, Songhwai Oh
  • Deformable Object Matching via Deformation Decomposition based 2D Label MRF [pdf] - Kangwei Liu, Junge Zhang, Kaiqi Huang, Tieniu Tan
#306 - Incremental Learning of NCM Forests for Large-Scale Image Classification [pdf]
Marko Ristin, Matthieu Guillaumin, Juergen Gall, Luc Van Gool

Abstract: In recent years, large image data sets such as ImageNet, TinyImages or ever-growing social networks like Flickr have emerged, posing new challenges to image classification that were not apparent in smaller image sets. In particular, the efficient handling of dynamically growing data sets, where not only the amount of training images, but also the number of classes increases over time, is a relatively unexplored problem. To remedy this, we introduce Nearest Class Mean Forests (NCMF), a variant of Random Forests where the decision nodes are based on nearest class mean (NCM) classification. NCMFs not only outperform conventional random forests, but are also well suited for integrating new classes. To this end, we propose and compare several approaches to incorporate data from new classes, so as to seamlessly extend the previously trained forest instead of re-training it from scratch. In our experiments, we show that NCMFs trained on small data sets with 10 classes can be extended to large data sets with 1000 classes without significant loss of accuracy.
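The splitting rule is simple enough to sketch: a node stores the means of a few classes seen at training time and routes a sample to the child of the nearest mean. Integrating a new class later only requires computing (or updating) a mean, never re-learning features; the paper compares several such integration strategies, none of which is reproduced here:

```python
import numpy as np

class NCMNode:
    """One decision node of an NCM forest: route a sample to the child
    whose stored class mean is closest."""
    def __init__(self, X, y, n_means=2, rng=None):
        rng = rng or np.random.default_rng()
        picked = rng.choice(np.unique(y), size=n_means, replace=False)
        self.means = np.stack([X[y == c].mean(axis=0) for c in picked])

    def route(self, x):
        return int(np.argmin(np.linalg.norm(self.means - x, axis=1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 4, size=100)
node = NCMNode(X, y, rng=rng)
print(node.route(X[0]))                  # child index for the first sample
```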
Similar papers:
  • Incremental Activity Modeling and Recognition in Streaming Videos [pdf] - Mahmudul Hasan, Amit Roy-Chowdhury
  • Discriminative Feature-to-Point Matching in Image-Based Localization [pdf] - Michael Donoser, Dieter Schmalstieg
  • Dense Non-Rigid Shape Correspondence using Random Forests [pdf] - Emanuele Rodola, Samuel Rota Bulò, Thomas Windheuser, Matthias Vestner, Daniel Cremers
  • Structured Output Random Forests for Accurate Object Detection [pdf] - Samuel Schulter, Christian Leistner, Peter Roth, Horst Bischof
#310 - Asymmetrical Gauss Mixture Models for Point Sets Matching [pdf]
Wenbing Tao, Kun Sun

Abstract: Probabilistic methods based on the Symmetrical Gauss Mixture Model (SGMM) [3,12,7] have achieved great success in point set registration, but are seldom used to find correspondences between two images due to the complexity of the non-rigid transformation and too many outliers. In this paper we propose an Asymmetrical GMM (AGMM) for point set matching between a pair of images. Different from the previous SGMM, the AGMM gives each Gauss component a different weight, related to the feature similarity between the data point and the model point. This leads to two effective algorithms: the Single Gauss Model for Mismatch Rejection (SGMR) algorithm and the AGMM algorithm for point set matching. The SGMR algorithm iteratively filters mismatches by estimating a non-rigid transformation between two images based on the spatial coherence of the point sets. The AGMM algorithm combines the feature information with the position information of SIFT feature points extracted from the images, so that many more correct correspondences can be found with high precision. A number of comparison and evaluation experiments reveal the excellent performance of the proposed SGMR and AGMM algorithms.
Similar papers:
  • Fast Rotation Search with Stereographic Projections for 3D Registration [pdf] - Alvaro Parra Bustos, Tat-Jun Chin, David Suter
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
  • Finding Matches in a Haystack: A Max-Pooling Strategy for Graph Matching in the Presence of Outliers [pdf] - Minsu Cho, Jian Sun, Jean Ponce
#313 - A Compositional Model for Low-Dimensional Image Set Representation [pdf]
Hossein Mobahi, Ce Liu, Bill Freeman

Abstract: Learning a low-dimensional representation of images is useful for various applications in graphics and computer vision. Manifold learning on images addresses this problem. However, existing works either require very dense sampling of the space, or are applicable to patch level only, ignoring global structures in the images. We present a simple method that operates on the entire image, but can learn from small sized datasets. The model relies on a composition structure of color, shape, and appearance. We show that each component can be approximated by a low-dimensional subspace when the others are factored out. Our formulation allows for very efficient learning and experiments show encouraging results.
Similar papers:
  • SteadyFlow: Spatially Smooth Optical Flow for Video Stabilization [pdf] - Shuaicheng Liu, Lu Yuan, Ping Tan, Jian Sun
  • SphereFlow: 6 DoF Scene Flow from RGB-D Pairs [pdf] - Michael Hornacek, Andrew Fitzgibbon, Margrit Gelautz, Carsten Rother
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
#317 - Associative embeddings for large-scale knowledge transfer with self-assessment [pdf]
Alexander Vezhnevets, Vittorio Ferrari

Abstract: We propose a method for knowledge transfer between semantically related classes in ImageNet. By transferring knowledge from the images that have bounding-box annotations to the others, our method is capable of automatically populating ImageNet with many more bounding-boxes and even pixel-level segmentations. The underlying assumption that objects from semantically related classes look alike is formalized in our novel Associative Embedding (AE) representation. AE recovers the latent low-dimensional space of appearance variations among image windows. The dimensions of AE space tend to correspond to aspects of window appearance (e.g. side view, close up, background). We model the overlap of a window with an object using Gaussian Processes (GP) regression, which spreads annotation smoothly through AE space. The probabilistic nature of GP allows our method to perform self-assessment, i.e. assigning a quality estimate to its own output. It enables trading off the amount of returned annotations for their quality. A large scale experiment on 219 classes and 0.5 million images demonstrates that our method outperforms state-of-the-art methods and baselines for both object localization and segmentation. Using self-assessment we can automatically return bounding-box annotations for 30\% of all images with high localization accuracy (i.e. 73\% average overlap with ground-truth).
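A toy stand-in showing how GP regression yields the self-assessment: the predictive standard deviation tells us which outputs to trust. The 5-D "embedding" coordinates and the overlap function here are synthetic placeholders, not the paper's AE space:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
E_train = rng.uniform(-1, 1, (200, 5))         # placeholder AE coordinates
overlap = np.clip(1 - np.linalg.norm(E_train[:, :2], axis=1), 0, 1)

gp = GaussianProcessRegressor(kernel=RBF(0.5), alpha=1e-2).fit(E_train, overlap)

E_test = rng.uniform(-1, 1, (50, 5))
mu, std = gp.predict(E_test, return_std=True)  # mean overlap + uncertainty
keep = std < np.quantile(std, 0.3)             # self-assessment: return only
print(mu[keep])                                # the most trustworthy windows
```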
Similar papers:
  • Instance-weighted Transfer Learning of Active Appearance Models [pdf] - Daniel Haase, Erik Rodner, Joachim Denzler
  • Object Classification with Adaptive Regions [pdf] - Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool
  • Efficient Localization with Fisher Vectors using Approximate Normalizations [pdf] - Dan Oneata, Jakob Verbeek, Cordelia Schmid
  • Multi-fold MIL Training for Weakly Supervised Object Localization [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
#318 - Max-Margin Boltzmann Machines for Object Segmentation [pdf]
Jimei Yang, Simon Safar, Ming-Hsuan Yang

Abstract: We present Max-Margin Boltzmann Machines (MMBMs) for object segmentation. MMBMs are essentially a class of Conditional Boltzmann Machines that model the joint distribution of hidden variables and output labels conditioned on input observations. In addition to image-to-label connections, we build direct image-to-hidden connections to facilitate global shape prediction, and thus derive a simple Iterated Conditional Modes algorithm for efficient maximum a posteriori inference. We formulate a max-margin objective function for discriminative training, and analyze the effects of different margin functions on learning. We evaluate MMBMs using three datasets against state-of-the-art methods to demonstrate the strength of the proposed algorithms.
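The inference routine mentioned above is Iterated Conditional Modes; a plain grid/Potts version conveys the idea (the MMBM version additionally alternates with updates of the hidden units through the image-to-hidden connections, which is not reproduced here):

```python
import numpy as np

def icm(unary, lam=1.0, n_sweeps=5):
    """Iterated Conditional Modes on a 4-connected grid with Potts
    smoothness; unary holds per-pixel label costs of shape (H, W, L)."""
    H, W, L = unary.shape
    x = unary.argmin(axis=2)                  # independent initialization
    labels = np.arange(L)
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                costs = unary[i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        costs += lam * (labels != x[ni, nj])
                x[i, j] = costs.argmin()      # greedy conditional mode
    return x
```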
Similar papers:
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • Blind Image Quality Assessment using Semi-supervised Rectifier Networks [pdf] - Huixuan Tang, Neel Joshi, Ashish Kapoor
  • The Shape-Time Random Field for Semantic Video Labeling [pdf] - Andrew Kae, Erik Learned-Miller, Benjamin Marlin
#324 - Parsing Occluded People [pdf]
Golnaz Ghiasi, Yi Yang, Deva Ramanan, Charless Fowlkes

Abstract: Occlusion poses a significant difficulty for object recognition due to the combinatorial diversity of possible occlusion patterns. We take a strongly supervised, non-parametric approach to modeling occlusion by learning deformable models with many local part mixture templates using large quantities of synthetically generated training data. This allows the model to learn the appearance of different occlusion patterns including figure-ground cues such as the shapes of occluding contours as well as the co-occurrence statistics of occlusion between neighboring parts. We test the resulting model on human pose estimation under heavy occlusion and find it produces improved localization accuracy. The underlying part mixture-structure also allows the model to make compelling predictions of figure-ground-occluder segmentations.
Similar papers:
  • Towards Multi-view and Partially-occluded Face Alignment [pdf] - Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan
  • Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf] - Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler
  • A Probabilistic Framework for Multitarget Tracking with Mutual Occlusions [pdf] - Menglong Yang, Yiguang Liu, Stan Li
  • Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model [pdf] - Golnaz Ghiasi, Charless Fowlkes
#325 - Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model [pdf]
Golnaz Ghiasi, Charless Fowlkes

Abstract: The presence of occluding objects significantly impacts performance of systems for object recognition. However, occlusion is typically treated as an unstructured source of noise and explicit models for occluders have lagged behind those for object appearance and shape. In this paper we describe a hierarchical deformable part model for face detection and keypoint localization that explicitly models occlusions of parts. The proposed model structure makes it possible to augment positive training data with large numbers of synthetically occluded instances. This allows us to easily incorporate the statistics of occlusion patterns in a discriminatively trained model. We test the model on several benchmarks for keypoint localization including challenging sets featuring significant occlusion. We find that the addition of an explicit model of occlusion yields a system that outperforms existing approaches in keypoint localization accuracy.
Similar papers:
  • A Probabilistic Framework for Multitarget Tracking with Mutual Occlusions [pdf] - Menglong Yang, Yiguang Liu, Stan Li
  • Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf] - Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler
  • Towards Multi-view and Partially-occluded Face Alignment [pdf] - Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan
  • Parsing Occluded People [pdf] - Golnaz Ghiasi, Yi Yang, Deva Ramanan, Charless Fowlkes
#328 - Modeling long-tail distributions of object subcategories [pdf]
Xiangxin Zhu, Dragomir Anguelov, Deva Ramanan

Abstract: We argue that object subcategories follow a long-tail distribution: a few subcategories are common, while many are rare. We describe distributed algorithms for learning large-mixture models that capture long-tail distributions, which are hard to model with current approaches. We introduce a generalized notion of mixtures (or subcategories) that allows examples to be shared across multiple subcategories. We optimize our models with a discriminative meanshift clustering algorithm that searches over mixtures in a distributed, brute-force fashion. We have used our scalable system to train tens of thousands of deformable mixtures for VOC objects. We demonstrate significant performance improvements, particularly for object classes that are characterized by large appearance variation.
Similar papers:
  • Parsing Occluded People [pdf] - Golnaz Ghiasi, Yi Yang, Deva Ramanan, Charless Fowlkes
  • Analysis by Synthesis: Object Recognition by Object Reconstruction [pdf] - Mohsen Hejrati, Deva Ramanan
  • Detecting Objects using Deformation Dictionaries [pdf] - Bharath Hariharan, Piotr Dollar, Larry Zitnick
  • Object Discovery and Segmentation via Discriminative Visual Subcategories [pdf] - Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta
#331 - A General and Simple Method for Camera Pose and Focal Length Determination [pdf]
Yinqiang Zheng, Shigeki Sugimoto, Imari Sato, Masatoshi Okutomi

Abstract: In this paper, we revisit the pose determination problem of a partially calibrated camera with unknown focal length, hereafter referred to as the P$n$Pf problem, by using $n$ ($n \geq 4$) 3D-to-2D point correspondences. Our core contribution is to introduce the angle constraint and derive a compact bivariate polynomial equation for each point triplet. Based on this polynomial equation, we propose a truly general method for the P$n$Pf problem, which is suited both to the minimal 4-point based RANSAC application and to large scale scenarios with thousands of points, irrespective of the 3D point configuration. In addition, by solving bivariate polynomial systems via the Sylvester resultant, our method is very simple and easy to implement. Its simplicity is especially obvious when one needs to develop a fast enough solver for the 4-point case. Experimental results have also demonstrated its superiority in accuracy and efficiency when compared with the existing state-of-the-art solutions.
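
The Sylvester-resultant elimination the method relies on can be illustrated with off-the-shelf symbolic tools. A toy sketch, with made-up bivariate polynomials standing in for the paper's per-triplet angle constraints:

```python
# Sketch of the elimination step: the Sylvester resultant removes one
# unknown from a pair of bivariate polynomials, leaving a univariate
# polynomial to solve. The polynomials here are illustrative toys, not
# the actual PnPf constraints.
import sympy as sp

u, v = sp.symbols('u v')
f = u**2 + u*v - 2          # stand-ins for two bivariate constraints
g = u*v + v**2 - 3

r = sp.resultant(f, g, u)   # univariate polynomial in v
v_roots = sp.solve(sp.Eq(r, 0), v)
for v0 in v_roots:
    u_roots = sp.solve(f.subs(v, v0), u)
    print(f"v = {v0}: u candidates {u_roots}")
```
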
Similar papers:
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • Partial Symmetry in Polynomial Systems and Its Application in Computer Vision [pdf] - Yubin Kuang, Yinqiang Zheng, Kalle Astroem
#333 - Computer vision vs. human vision: What can be learned? [pdf]
Ali Borji, Laurent Itti

Abstract: Research in computer vision has resulted in many models, some specialized for one problem, others more general. In the meantime, experimental vision scientists have collected invaluable behavioral data. Here, to help focus research efforts onto the hardest unsolved problems, and bridge computer and human vision, we define a battery of 5 tests that measure the gap between human and machine performances in several dimensions (generalization across scene categories, generalization from images to edge maps and line drawings, invariance to rotation and scaling, local/global information with jumbled images, object recognition performance). These tests assess models in achieving human-level object and scene recognition, irrespective of implementation details (biologically-inspired or not). To objectively quantify this, in addition to accuracy, we also measure the correlation between model and human error patterns. Experimenting over 7 scene and object datasets, where human data is available, and gauging 14 well-established models, we find that none fully resembles humans in all aspects, and we learn from each test which models and features are more promising in approaching humans in the tested dimension. Across all tests, we find that models based on local edge histograms consistently resemble humans more, while several scene statistics or gist models do perform well with both scenes and objects. While computer vision has long been inspired by human vision
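
The error-pattern measure is simple to make concrete: beyond raw accuracy, correlate which individual test images a model gets wrong with which ones humans get wrong. A toy sketch with synthetic error vectors standing in for real judgments:

```python
# Sketch of the error-pattern comparison: two models can have the same
# accuracy yet very different agreement with human mistakes.
import numpy as np

rng = np.random.default_rng(2)
human_errors = rng.integers(0, 2, size=500)                  # 1 = mistake
model_errors = (human_errors ^ (rng.random(500) < 0.3)).astype(int)

accuracy = 1 - model_errors.mean()
pattern_corr = np.corrcoef(human_errors, model_errors)[0, 1]
print(f"accuracy={accuracy:.2f}, human/model error correlation={pattern_corr:.2f}")
```
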
Similar papers:
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Beyond Human Opinion Scores: Blind Image Quality Assessment based on Synthetic Scores [pdf] - Peng Ye, David Doermann
  • Incorporating Scene Context and Object Layout into Appearance Modeling [pdf] - Hamid Izadinia, Fereshteh Sadeghi, Ali Farhadi
  • Looking Beyond the Visible Scene [pdf] - Joseph Lim, Aditya Khosla, Antonio Torralba, Byoungkwon An An
#335 - SteadyFlow: Spatially Smooth Optical Flow for Video Stabilization [pdf]
Shuaicheng Liu, Lu Yuan, Ping Tan, Jian Sun

Abstract: We propose a novel motion model, SteadyFlow, to represent the motion between neighboring video frames for stabilization. A SteadyFlow is an optical flow field on which strong spatial coherence is enforced, so that smoothing feature trajectories can be replaced by smoothing pixel (motion) profiles, i.e. the motion vectors collected at the same pixel location in the SteadyFlow over time. In this way, we can avoid brittle feature tracking in a video stabilization system. Moreover, SteadyFlow is a more general 2D motion model that can deal with spatially-variant motion. We initialize the SteadyFlow with optical flow, then discard discontinuous motions by a spatio-temporal analysis and fill in missing regions by motion completion. Our experiments demonstrate the effectiveness of our stabilization on challenging real-world videos.
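
The pixel-profile idea reduces to temporal smoothing of the motion vectors stacked at each pixel location. A minimal sketch, applying a Gaussian filter along the time axis to stand-in flow fields (a real system would also detect discontinuities and complete missing regions):

```python
# Sketch of pixel-profile smoothing: stack per-frame flow fields, then
# smooth the motion vectors collected at each pixel location over time.
import numpy as np
from scipy.ndimage import gaussian_filter1d

T, Hgt, Wid = 60, 40, 64
flow = np.random.randn(T, Hgt, Wid, 2).astype(np.float32)  # stand-in flow

# Smooth each pixel's temporal motion profile (axis 0 = time).
steady = gaussian_filter1d(flow, sigma=3.0, axis=0)
print("profile variance before/after:",
      flow.var(axis=0).mean(), steady.var(axis=0).mean())
```
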
Similar papers:
  • A Compositional Model for Low-Dimensional Image Set Representation [pdf] - Hossein Mobahi, Ce Liu, Bill Freeman
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
  • Subspace Tracking under Dynamic Dimensionality for Online Background Subtraction [pdf] - Matthew Berger, Lee Seversky
#336 - Nonparametric Context Modeling of Local Appearance for Pose- and Expression-Robust Facial Landmark Localization [pdf]
Brandon Smith, Jonathan Brandt, Zhe Lin, Li Zhang

Abstract: This paper addresses the problem of facial landmark localization on faces with extreme head poses and expressions. We propose a data-driven approach that models the correlations between each landmark and its surrounding appearance features. At runtime, each feature casts a weighted vote to predict landmark locations, where the weight is precomputed to take into account the feature's discriminative power. The feature voting-based landmark detection is more robust than previous local appearance-based detectors; we combine it with non-parametric shape regularization to build a novel facial landmark localization pipeline that is robust to scale, in-plane rotation, expression, and most importantly, extreme head pose. We achieve state-of-the-art performance on two especially challenging datasets populated by faces with extreme head poses and expressions.
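
The voting step can be sketched directly: each feature casts a displaced, weighted vote into an accumulator and the landmark is read off the peak. Positions, offsets and weights below are random stand-ins for the learned quantities:

```python
# Sketch of weighted landmark voting: each local feature votes for a
# landmark location, weighted by its precomputed discriminative power.
import numpy as np

H, W = 120, 100
votes = np.zeros((H, W))
rng = np.random.default_rng(3)
feat_xy = rng.integers(0, [H, W], size=(300, 2))      # feature positions
offsets = rng.integers(-15, 16, size=(300, 2))        # predicted offsets
weights = rng.uniform(0, 1, size=300)                 # discriminative power

for (y, x), (dy, dx), w in zip(feat_xy, offsets, weights):
    py, px = y + dy, x + dx
    if 0 <= py < H and 0 <= px < W:
        votes[py, px] += w

landmark = np.unravel_index(votes.argmax(), votes.shape)
print("predicted landmark location:", landmark)
```
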
Similar papers:
  • Gauss-Newton Constrained Local Models [pdf] - GEORGIOS TZIMIROPOULOS, Maja Pantic
  • Automatic Face Reenactment [pdf] - Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, Christian Theobalt
  • RAPS: Robust and Efficient Automatic Construction of Person-Specific Deformable Models [pdf] - Christos Sagonas, Stefanos Zafeiriou, Yannis Panagakis, Maja Pantic
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
#338 - Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf]
Linchao Bao, Qingxiong Yang, Hailin Jin

Abstract: We present a fast local optical flow algorithm that can handle large displacement motions. Our algorithm is inspired by recent successes of local methods in stereo matching and optical flow as well as approximate nearest neighbor field algorithms. The main novelty is a fast randomized edge-preserving approximate nearest neighbor field algorithm which propagates self-similarity patterns in addition to propagating offsets. Together with a hierarchical matching scheme, our method can produce high-quality flow in a very fast speed. Experimental results on public optical flow benchmarks show that our method is significantly faster than competitors without compromising on quality, especially when scenes contain large motions. In fact, the performance on MPI Sintel benchmark clearly demonstrates the effectiveness of our method for handling large displacement motions.
Similar papers:
  • DAISY Filter Flow: A Generalized Discrete Approach to Dense Correspondences [pdf] - Hongsheng Yang, Wen-Yan Lin, Jiangbo Lu
  • A Compositional Model for Low-Dimensional Image Set Representation [pdf] - Hossein Mobahi, Ce Liu, Bill Freeman
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • SphereFlow: 6 DoF Scene Flow from RGB-D Pairs [pdf] - Michael Hornacek, Andrew Fitzgibbon, Margrit Gelautz, Carsten Rother
#341 - Co-Occurrence Statistics for Zero-Shot Classification [pdf]
Thomas Mensink, Cees Snoek, Efstratios Gavves

Abstract: In this paper we aim for zero-shot classification, that is, visual recognition of an unseen class by using knowledge transfer from known classes. Different from the common strategy in the literature, which requires manually defined attribute-to-class mappings, we rely on easy-to-obtain co-occurrence statistics of class labels harvested from existing annotations, web-search hit counts or image tags. Our main contribution is to use inter-dependencies that arise naturally between classes for zero-shot classification. We propose various similarity metrics for leveraging these co-occurrences, and show that our zero-shot classifiers can serve as priors for few-shot learning. Experiments on three challenging multi-labelled datasets reveal that our proposed zero-shot methods are approaching and occasionally outperforming supervised SVMs. We conclude that co-occurrence statistics suffice for zero-shot classification.
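
The scoring rule can be sketched in a few lines: the score of an unseen class is a similarity-weighted combination of known-class classifier outputs, with similarities derived from co-occurrence counts. All quantities below are illustrative, and the normalized-count similarity is only one of the metrics the paper considers:

```python
# Sketch of co-occurrence-based zero-shot scoring.
import numpy as np

rng = np.random.default_rng(4)
K = 10                                   # known classes
cooc = rng.integers(1, 100, size=K)      # co-occurrence of the unseen class
                                         # with each known class (e.g. from
                                         # web hit counts or image tags)
sim = cooc / cooc.sum()                  # one possible similarity metric
known_scores = rng.normal(size=K)        # known-class classifier outputs
zero_shot_score = sim @ known_scores
print("zero-shot score for unseen class:", zero_shot_score)
```
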
Similar papers:
  • Relative Parts: Disctinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Understanding Objects in Detail with Fine-grained Attributes [pdf] - Subhransu Maji, Iasonas Kokkinos, Stavros Tsogkas, Ross Girshick, Matthew Blaschko, Esa Rahtu, Juho Kannala, Andrea Vedaldi
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Inferring Analogous Attributes [pdf] - Chao-Yeh Chen, Kristen Grauman
#345 - Analysis by Synthesis: Object Recognition by Object Reconstruction [pdf]
Mohsen Hejrati, Deva Ramanan

Abstract: We introduce a new approach for recognizing and reconstructing 3D objects in images. Our approach is based on an analysis by synthesis strategy. We use a forward synthesis model to construct possible geometric interpretations of the world, and then select the interpretation that best agrees with the measured visual evidence. This forward model synthesizes visual templates defined on invariant (HOG) features. These visual templates are discriminatively trained to be accurate for inverse estimation. We introduce an efficient brute-force approach to inference that searches through a large number of candidate reconstructions, returning the optimal one (or multiple likely candidates, if desired). One benefit of such an approach is that recognition is inherently (re)constructive. We show state-of-the-art performance for detection and reconstruction on two challenging 3D object recognition datasets of cars and cuboids.
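
The brute-force inference step has a very direct form: synthesize a template for every candidate interpretation, score them all against the image features, and keep the best. A toy sketch with random stand-ins for the synthesized HOG templates and image features:

```python
# Sketch of brute-force analysis-by-synthesis scoring.
import numpy as np

rng = np.random.default_rng(5)
n_candidates, feat_dim = 5000, 512
templates = rng.normal(size=(n_candidates, feat_dim))  # synthesized templates
image_feat = rng.normal(size=feat_dim)                 # HOG features of window

scores = templates @ image_feat        # one dot product per reconstruction
best = np.argsort(scores)[::-1][:5]    # optimal one, or several candidates
print("top candidate reconstructions:", best, "score:", scores[best[0]])
```
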
Similar papers:
  • Using k-poselets for detecting people and localizing their keypoints [pdf] - Bharath Hariharan, Georgia Gkioxari, Ross Girshick, Jitendra Malik
  • Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model [pdf] - Golnaz Ghiasi, Charless Fowlkes
  • Manifold Based Dynamic Texture Synthesis from Extremely Few Samples [pdf] - Hongteng Xu, Hongyuan Zha, Mark Davenport
  • Unsupervised Learning of Dictionaries of Hierarchical Compositional Models [pdf] - Jifeng Dai, Yi Hong, WENZE Hu, Ying Nian Wu
#346 - Object Classification with Adaptive Regions [pdf]
Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool

Abstract: In object classification, substantial work has gone into improving the low-level representation of an image by considering various aspects such as different features, a number of feature pooling and coding techniques, and different kernels. Unlike these works, in this paper we propose to enhance the semantic representation of an image. We aim to learn the most important visual components of an image and how they interact in order to classify the objects correctly. To achieve our objective, we propose a new latent SVM model for category-level object classification. Starting from image-level annotations, we jointly learn the object class and its context in terms of spatial location (where) and appearance (what). Furthermore, to regularize the complexity of the model we learn the spatial and co-occurrence relations between adjacent regions, such that unlikely configurations are penalized. Experimental results demonstrate that the proposed method can consistently enhance results on the challenging Pascal VOC dataset in terms of classification. We also show how the semantic representation can be exploited for finding similar content.
Similar papers:
  • Joint Motion Segmentation and Background Subtraction in Dynamic Scenes [pdf] - Adeel Mumtaz, Weichen Zhang, Antoni Chan
  • How to Evaluate Foreground Maps? [pdf] - Ran Margolin, Lihi Zelnik-Manor, Ayellet Tal
  • Visual Tracking Using Pertinent Patch Selection and Masking [pdf] - Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim
  • Object-based Multiple Foreground Video Co-segmentation [pdf] - Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
#355 - Joint-Histogram Weighted Median Filter [pdf]
Qi Zhang, Li Xu, Jiaya Jia

Abstract: Weighted median, in the form of either solver or filter, has been employed in a wide range of computer vision applications for its beneficial properties in sparsity representation. But compared with other local filters it is hard to accelerate, due to both the spatially varying weights and the median property. We propose an efficient scheme that reduces the computation complexity from O(r^2) to O(r), where r is the kernel size. Our contribution is a new joint-histogram representation, median tracking, and a new data structure that enables fast data access. The effectiveness of this scheme is demonstrated on optical flow estimation, stereo matching, structure-texture separation, and image filtering, to name a few. The running time is shortened from several minutes to less than 1 second.
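
What the filter computes at each pixel is a weighted median, which has a one-line definition once the values are sorted. A reference (non-accelerated) sketch that just pins down the output; the paper's contribution is evaluating this in O(r) per pixel via the joint histogram and median tracking:

```python
# Reference weighted median for one window: sort values, accumulate
# weights, return the value where the cumulative weight crosses half
# the total.
import numpy as np

def weighted_median(values, weights):
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

vals = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
wts  = np.array([0.1, 0.2, 0.4, 0.1, 0.2])
print(weighted_median(vals, wts))   # 4.0: first value past half the weight
```
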
Similar papers:
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
  • T-Linkage: a Continuous Relaxation of J-Linkage for Multi-Model Fitting [pdf] - Luca Magri, Andrea Fusiello
#356 - Tracklet Association with Online Reidentification in Network Flow Optimization for Long-term Multi-Person Tracking [pdf]
BING WANG, Gang Wang, Kap Luk Chan, LI WANG

Abstract: This paper presents a novel introduction of online reidentification into track fragment (tracklet) association by network flow optimization for long-term multi-person tracking. Different from other network flow formulations, each node in our network represents a tracklet, and each edge represents the likelihood of neighboring tracklets belonging to the same trajectory as measured by our proposed affinity score. In our method, target-specific similarity metrics are learned, leading to the appearance-based models used in the reidentification. Trajectory-based tracklets are refined by the learned metrics to account for appearance consistency and to identify reliable tracklets. The metrics are then re-learned using reliable tracklets for computing tracklet affinity scores. Long-term trajectories are then obtained by network flow optimization. Occlusions and missed detections are handled by a trajectory completion step. Our method is effective for long-term tracking even when the targets are spatially close or completely occluded by others. We validate our proposed framework on several public datasets and show that it outperforms several state-of-the-art methods.
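
The paper solves the association by network flow; as a simplified stand-in, the sketch below links tracklet ends to tracklet starts with the Hungarian algorithm on an affinity matrix, which conveys the same one-to-one association idea without the flow machinery. The affinities are random placeholders for the learned metric scores:

```python
# Simplified tracklet association via optimal bipartite matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(6)
affinity = rng.uniform(size=(8, 8))      # affinity between tracklet ends
                                         # and tracklet starts

row, col = linear_sum_assignment(-affinity)   # maximize total affinity
for i, j in zip(row, col):
    if affinity[i, j] > 0.5:             # threshold out unlikely links
        print(f"tracklet {i} -> tracklet {j} (affinity {affinity[i, j]:.2f})")
```
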
Similar papers:
  • Multi-target Tracking with Motion Context in Tensor Power Iteration [pdf] - Xinchu Shi, Haibin Ling, Weiming Hu, Chunfeng Yuan
  • An Online Learned Elementary Grouping Model for Multi-target Tracking [pdf] - Xiaojing Chen, Zhen Qin, Le An, Bir Bhanu
  • Multiple Target Tracking Based on Hierarchical Relation Hypergraph [pdf] - Longyin Wen, Wenbo Li, Zhen Lei, Stan Li
  • Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning [pdf] - Seung-Hwan Bae, Kuk-Jin Yoon
#368 - Piecewise Planar and Compact Floorplan Reconstruction from Images [pdf]
Ricardo Cabral, Yasutaka Furukawa

Abstract: This paper presents a system that automatically reconstructs piecewise planar and compact floorplans from panorama images, which are then converted to high quality texture-mapped models for free-viewpoint scene visualization. There are two main challenges in image-based floorplan reconstruction. The first challenge is the lack of 3D information that can be extracted from images through Structure from Motion and Multi-View Stereo, since indoor scenes abound with non-diffuse and homogeneous surfaces plus clutter. The second challenge is the need for a sophisticated regularization technique that enforces piecewise planarity to suppress clutter and yields high quality texture mapped models. Our technical contributions are twofold. First, we propose a novel structure classification technique to classify each pixel into three structure regions, which provides 3D cues even from a single image. Second, we cast floorplan reconstruction as a shortest path problem on a specially crafted graph, which enables us to enforce piecewise planarity. Besides producing compact piecewise planar models, this formulation allows us to directly control the output complexity (i.e., the number of vertices). We evaluate our system on a number of real businesses, and show that our texture mapped mesh models provide compelling free-viewpoint visualization experiences, when compared against the state-of-the-art and ground truth.
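
Casting reconstruction as a shortest-path problem means the regularizer is baked into the edge weights. A toy sketch of that formulation: nodes stand for candidate wall-corner states, edge weights trade data fidelity against model complexity, and Dijkstra picks the compact boundary. The graph below is a made-up illustration:

```python
# Sketch of floorplan extraction as a shortest-path problem.
import networkx as nx

G = nx.DiGraph()
# Toy chain of candidate corners with a costly "detour" branch.
G.add_weighted_edges_from([
    ("start", "c1", 1.0), ("c1", "c2", 1.0), ("c2", "end", 1.0),
    ("start", "d1", 0.4), ("d1", "d2", 0.4), ("d2", "d3", 2.5),
    ("d3", "end", 0.4),
])
path = nx.shortest_path(G, "start", "end", weight="weight")
print("floorplan boundary:", path)   # picks the compact low-cost route
```
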
Similar papers:
  • Incorporating Scene Context and Object Layout into Appearance Modeling [pdf] - Hamid Izadinia, Fereshteh Sadeghi, Ali Farhadi
  • DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting [pdf] - Chen Sun, Ram Nevatia
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
  • Fast, Approximate Piecewise-Planar Modeling Based on Sparse Structure-from-Motion and Dense Superpixels [pdf] - Andras Bodis-Szomoru, Hayko Riemenschneider, Luc Van Gool
#372 - Persistent Tracking for Wide Area Aerial Surveillance [pdf]
Jan Prokaj, Gerard Medioni

Abstract: Persistent surveillance of large geographic areas from unmanned aerial vehicles allows us to learn much about the daily activities in the region of interest. Nearly all of the approaches addressing tracking in this imagery are detection based and rely on background subtraction or frame differencing to provide detections. This, however, makes it difficult to track targets once they slow down or stop, which is not acceptable for persistent tracking, our goal. We present a multiple target tracking approach that does not exclusively rely on background subtraction and is better able to track targets through stops. It accomplishes this by effectively running two trackers in parallel: one based on detections from background subtraction providing target initialization and reacquisition, and one based on a target state regressor providing frame to frame tracking. We evaluated the proposed approach on a long sequence from a wide area aerial imagery dataset, and the results show improved object detection rates and id-switch rates with limited increases in false alarms compared to the competition.
Similar papers:
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning [pdf] - Seung-Hwan Bae, Kuk-Jin Yoon
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
  • Multiple Target Tracking Based on Hierarchical Relation Hypergraph [pdf] - Longyin Wen, Wenbo Li, Zhen Lei, Stan Li
#375 - Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf]
Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler

Abstract: Current systems for scene understanding typically represent objects as 2D or 3D bounding boxes. While these representations have proven robust in a variety of applications, they provide only coarse approximations to the true 2D and 3D extent of objects. As a result, object-object interactions, such as occlusions or ground-plane contact, can be represented only superficially. In this paper, we approach the problem of scene understanding from the perspective of 3D shape modeling, and design a 3D scene representation that reasons jointly about the 3D shape of multiple objects. This representation allows to express 3D geometry and occlusion on the fine detail level of individual vertices of 3D wireframe models, and makes it possible to treat dependencies between objects, such as occlusion reasoning, in a deterministic way. In our experiments, we demonstrate the benefit of jointly estimating the 3D shape of multiple objects in a scene over working with coarse boxes, on the recently proposed KITTI dataset of realistic street scenes.
Similar papers:
  • Towards Multi-view and Partially-occluded Face Alignment [pdf] - Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan
  • A Probabilistic Framework for Multitarget Tracking with Mutual Occlusions [pdf] - Menglong Yang, Yiguang Liu, Stan Li
  • Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model [pdf] - Golnaz Ghiasi, Charless Fowlkes
  • Parsing Occluded People [pdf] - Golnaz Ghiasi, Yi Yang, Deva Ramanan, Charless Fowlkes
#378 - Graph Cut based Continuous Stereo Matching using Locally Shared Labels [pdf]
Tatsunori Taniai, Yasuyuki Matsushita, Takeshi Naemura

Abstract: We present an accurate and efficient stereo matching method using locally shared labels, a new labeling scheme that enables spatial propagation in MRF inference using graph cuts. The scheme gives each pixel and region a set of candidate disparity labels, which are randomly initialized, spatially propagated, and refined for continuous disparity estimation. We cast the selection and propagation of locally-defined disparity labels as fusion-based energy minimization. The joint use of graph cuts and locally shared labels has advantages over previous approaches based on fusion moves or belief propagation: it produces submodular moves that yield subproblem optimality; it enables powerful randomized search; it helps to find smooth, locally planar disparity maps, which are reasonable for natural scenes; and it allows parallel computation of both unary and pairwise costs. Our method is evaluated on the Middlebury stereo benchmark and achieves first place in sub-pixel accuracy.
Similar papers:
  • Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras [pdf] - Can Chen, Haiting Lin, Zhan Yu, Sing Bing Kang, Jingyi Yu
  • Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching [pdf] - Aristotle Spyropoulos, Nikos Komodakis, Philippos Mordohai
  • Stereo under Sequential Optimal Sampling: A Statistical Analysis Framework for Search Space Reduction [pdf] - Yilin Wang, Jan-Michael Frahm, Enrique Dunn, Ke Wang
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
#382 - Two-View Camera Calibration for Multi-Layer Flat Refractive Interface [pdf]
Xida Chen, Yee Hong Yang

Abstract: In this paper, we present a novel refractive calibration method for an underwater stereo camera system where both cameras are looking through multiple parallel flat refractive interfaces. At the heart of our method is the important finding that the thickness of the interface can be estimated from a set of pixel correspondences in the stereo images when the refractive axis is given. To the best of our knowledge, such a finding has not been studied or reported. Moreover, by exploring the search space for the refractive axis and using the reprojection error as a measure, both the refractive axis and the thickness of the interface can be recovered simultaneously. Our method does not require any calibration target such as a checkerboard pattern, which may be difficult to manipulate when the cameras are deployed deep undersea. The implementation of our method is simple. In particular, it only requires solving a set of linear equations of the form $Ax = b$ and applies sparse bundle adjustment to refine the initial estimated results. Extensive experiments have been carried out, which include simulations with and without outliers, to verify the correctness of our method as well as to test its robustness to noise and outliers. The results of real experiments are also provided. The accuracy of our results is comparable to that of a state-of-the-art method that requires known 3D geometry of a scene.
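
The estimation pattern described here (a linear solve followed by refinement) is the classic initialize-and-refine pipeline. A minimal sketch, with random stand-ins for the A and b that the refractive geometry would actually supply:

```python
# Sketch of the estimation pattern: stack linear constraints into Ax = b,
# solve by least squares for an initial estimate, then refine with
# (sparse) bundle adjustment.
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(100, 4))            # one row per pixel correspondence
x_true = np.array([1.0, -2.0, 0.5, 3.0])
b = A @ x_true + 0.01 * rng.normal(size=100)   # noisy measurements

x_init, *_ = np.linalg.lstsq(A, b, rcond=None)
print("initial estimate:", x_init)       # then refined by bundle adjustment
```
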
Similar papers:
  • Simultaneous Localization and Calibration [pdf] - Qian-Yi Zhou, Vladlen Koltun
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
  • Generalized Pupil-Centric Imaging and Analytical Calibration for a Non-frontal Camera [pdf] - Avinash Kumar, Narendra Ahuja
#390 - Deblurring Low-light Images with Light Streaks [pdf]
Zhe Hu, Sunghyun Cho, Jue Wang, Ming-Hsuan Yang

Abstract: Images taken in low-light conditions with handheld cameras are often blurry due to the required long exposure time. Although significant progress has been made recently on image deblurring, state-of-the-art approaches often fail on low-light images, as these images do not contain sufficient salient features that deblurring methods rely on. On the other hand, light streaks are common phenomena in low-light images that contain rich blur information, but they have not been extensively explored in previous approaches. In this work, we propose a new method that utilizes light streaks to help deblur low-light images. Our approach first automatically detects useful light streaks in the input image, and then poses them as constraints for estimating the blur kernel in an optimization framework. Experimental results show that by explicitly modeling light streaks in the deblurring process, our approach obtains good results on challenging real-world examples that no previous method could handle.
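
The detection step can be sketched as finding small, very bright, isolated regions; the true streak selection and the kernel-estimation optimization are beyond this snippet, and the threshold and size limits below are assumptions:

```python
# Sketch of candidate light-streak detection via thresholding and
# connected-component analysis.
import numpy as np
from scipy import ndimage

img = np.random.rand(200, 300) * 0.2        # dark stand-in image
img[50:52, 80:110] = 1.0                    # a synthetic light streak

bright = img > 0.9
labels, n = ndimage.label(bright)
sizes = ndimage.sum(bright, labels, index=range(1, n + 1))
streaks = [i + 1 for i, s in enumerate(sizes) if 10 < s < 500]
print("candidate light-streak regions:", streaks)
```
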
Similar papers:
  • Separable Kernel for Image Deblurring [pdf] - Lu Fang, Haifeng Liu, Feng Wu
  • Discriminative Blur Detection Features [pdf] - Jianping Shi, Li Xu, Jiaya Jia
  • Aliasing Detection and Reduction in Plenoptic Imaging [pdf] - Zhaolin Xiao, Qing Wang, Jingyi Yu, Guoqing Zhou
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
#397 - Locally Optimized Product Quantization [pdf]
Yannis Kalantidis, Yannis Avrithis

Abstract: We present a simple vector quantizer that combines low distortion with fast search and apply it to approximate nearest neighbor (ANN) search in high dimensional spaces. Leveraging the very same data structure that is used to provide non-exhaustive search, i.e., inverted lists or a multi-index, the idea is to locally optimize an individual product quantizer (PQ) per cell and use it to encode residuals. Local optimization is over rotation and space decomposition; interestingly, we apply a parametric solution that assumes a normal distribution and is extremely fast to train. With a reasonable space and time overhead that is constant in the data size, we set a new state-of-the-art on several public datasets, including a billion-scale one.
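
The core construction (coarse cells, then a locally trained product quantizer on each cell's residuals) can be sketched with plain k-means. The rotation optimization and the fast parametric training are omitted, and all sizes are toy values:

```python
# Minimal sketch of locally optimized product quantization.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
X = rng.normal(size=(2000, 16))
subs = (slice(0, 8), slice(8, 16))       # two subspaces for the PQ

coarse = KMeans(n_clusters=8, n_init=4, random_state=0).fit(X)
residuals = X - coarse.cluster_centers_[coarse.labels_]

local_pq = {}                            # per-cell product quantizer
for c in range(8):
    R = residuals[coarse.labels_ == c]
    local_pq[c] = [KMeans(n_clusters=16, n_init=2, random_state=0).fit(R[:, s])
                   for s in subs]

# Encode one vector: coarse cell id + one sub-codeword id per subspace.
v = X[0]
c = int(coarse.predict(v[None])[0])
r = v - coarse.cluster_centers_[c]
code = [int(q.predict(r[None, s])[0]) for q, s in zip(local_pq[c], subs)]
print("cell", c, "code", code)
```
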
Similar papers:
  • Packing and Padding: Coupled Multi-index for Accurate Image Retrieval [pdf] - Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
  • Locally Linear Hashing for Extracting Non-Linear Manifolds [pdf] - Go Irie, Zhenguo Li, Xiao-Ming Wu, Shi-Fu Chang
  • Product Sparse Coding [pdf] - Tiezheng Ge, Kaiming He, Jian Sun
  • Distance Encoded Product Quantization [pdf] - Jae-Pil Heo, Zhe Lin, Sung-eui Yoon
#405 - Generalized Nonconvex Nonsmooth Low-Rank Minimization [pdf]
Canyi Lu, Shuicheng Yan, Zhouchen Lin

Abstract: As surrogate functions of $L_0$-norm, many nonconvex penalty functions have been proposed to enhance the sparse vector recovery. It is easy to extend these nonconvex penalty functions on singular values of a matrix to enhance low-rank matrix recovery. However, different from convex optimization, solving the nonconvex low-rank minimization problem is much more challenging than the nonconvex sparse minimization problem. We observe that all the existing nonconvex penalty functions are concave and monotonically increasing on $[0,\infty)$. Thus their gradients (or supergradient at the nonsmooth point) are decreasing functions. Based on this property, we propose an Iteratively Reweighted Nuclear Norm (IRNN) algorithm to solve the nonconvex nonsmooth low-rank minimization problem. IRNN iteratively solves a Weighted Singular Value Thresholding (WSVT) problem. By setting the weight vector as the gradient of the concave penalty function, the WSVT problem has a closed form solution, whose computational cost is the same as Singular Value Thresholding (SVT). In theory, we prove that IRNN decreases the objective function value monotonically, and any limit point is a stationary point. Extensive experiments on both synthetic data and real images demonstrate that the proposed algorithm enhances the low-rank matrix recovery compared with state-of-the-art convex algorithms.
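
The IRNN loop is compact enough to sketch end to end. The version below assumes the log penalty $g(s) = \log(1+s)$ as the concave surrogate (one choice among those the paper covers) and recovers a low-rank matrix by repeated weighted singular value thresholding:

```python
# Minimal IRNN-style iteration: each step is a weighted SVT with weights
# set to the supergradient of the concave penalty at the current
# singular values.
import numpy as np

rng = np.random.default_rng(9)
M = rng.normal(size=(30, 5)) @ rng.normal(size=(5, 30))   # low-rank target
X = np.zeros_like(M)
lam, mu = 1.0, 1.0        # penalty strength, proximal step weight

for _ in range(30):
    # Gradient step on the smooth fit term 0.5 * ||X - M||^2.
    Y = X - (X - M) / mu
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    w = 1.0 / (1.0 + s)                      # supergradient of log(1 + s)
    s_new = np.maximum(s - lam * w / mu, 0)  # weighted SVT, closed form
    X = (U * s_new) @ Vt

print("rank of recovered X:", np.linalg.matrix_rank(X, tol=1e-6))
```
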
Similar papers:
  • Sequential Convex Relaxation for Mutual-Information-Based Unsupervised Figure-Ground Segmentation [pdf] - Youngwook Kee, Mohamed Souiai, Daniel Cremers, Junmo Kim
  • Pseudoconvex Proximal Splitting for $L_\infty$ Problems in Multiview Geometry [pdf] - Anders Eriksson
  • A Convex Relaxation of Ambrosio-Tortorelli's Elliptic Functional for the Mumford-Shah Functional [pdf] - Youngwook Kee, Junmo Kim
  • Weighted Nuclear Norm Minimization with Application to Image Denoising [pdf] - Shuhang Gu, Lei Zhang, Xiangchu Feng, Wangmeng Zuo
#406 - Multiple Target Tracking Based on Hierarchical Relation Hypergraph [pdf]
Longyin Wen, Wenbo Li, Zhen Lei, Stan Li

Abstract: Multi-target tracking is an important but challenging task in the computer vision field. Most of the previous data association based methods merely consider the relationships between detections in a limited local temporal domain, leading to difficulties in handling long-term occlusion and in distinguishing spatially close targets with similar appearance in crowded scenes. In this paper, we propose a novel data association approach based on a hierarchical relation hypergraph, which formulates the tracking task as a hierarchical dense neighborhoods searching problem on a dynamically constructed affinity graph. The relationships between different detections across the spatio-temporal domain are considered in a high-order way, which makes the tracker robust to spatially close targets with similar appearance. Meanwhile, the hierarchical design of the optimization process makes our tracker more robust to long-term occlusion. Extensive experiments on various challenging datasets (i.e. PETS2009, ParkingLot), including both low-density and high-density sequences, validate the superiority of our tracker over other state-of-the-art methods.
Similar papers:
  • Persistent Tracking for Wide Area Aerial Surveillance [pdf] - Jan Prokaj, Gerard Medioni
  • An Online Learned Elementary Grouping Model for Multi-target Tracking [pdf] - Xiaojing Chen, Zhen Qin, Le An, Bir Bhanu
  • Tracklet Association with Online Reidentification in Network Flow Optimization for Long-term Multi-Person Tracking [pdf] - BING WANG, Gang Wang, Kap Luk Chan, LI WANG
  • Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning [pdf] - Seung-Hwan Bae, Kuk-Jin Yoon
#407 - Discriminative Deep Metric Learning for Face Verification in the Wild [pdf]
Junlin Hu, Jiwen Lu, Yap-Peng Tan

Abstract: This paper presents a new discriminative deep metric learning (DDML) method for face verification in the wild. Existing metric learning-based face verification methods aim to learn a Mahalanobis distance metric that simultaneously maximizes the inter-class variations and minimizes the intra-class variations. In contrast, the proposed DDML trains a deep neural network that learns a pair of hierarchical nonlinear transformations to project face pairs into two feature subspaces, one subspace for each sample in the pair, under which the distance of each positive face pair is less than a smaller threshold and that of each negative pair is higher than a larger threshold, so that discriminative information can be exploited in the deep network. Our method achieves state-of-the-art face verification performance on the widely used LFW and YouTube Faces (YTF) datasets.
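
The double-threshold criterion can be written down directly. A toy sketch, with a single random one-layer map standing in for the paper's learned pair of hierarchical networks, and assumed thresholds t1 < t2:

```python
# Sketch of the double-threshold verification criterion: positive pairs
# should fall below t1 in the projected space, negative pairs above t2.
import numpy as np

rng = np.random.default_rng(10)
W = rng.normal(scale=0.1, size=(32, 128))
f = lambda x: np.tanh(W @ x)                  # toy nonlinear projection

x1, x2 = rng.normal(size=128), rng.normal(size=128)
d = np.sum((f(x1) - f(x2)) ** 2)              # squared projected distance

t1, t2 = 5.0, 8.0                             # assumed thresholds, t1 < t2
loss_if_positive = max(0.0, d - t1)           # positive pairs: want d < t1
loss_if_negative = max(0.0, t2 - d)           # negative pairs: want d > t2
print(f"d={d:.2f}, pos-loss={loss_if_positive:.2f}, "
      f"neg-loss={loss_if_negative:.2f}")
```
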
Similar papers:
  • Learning Fine-grained Image Similarity with Deep Ranking [pdf] - Jiang Wang, Yang Song, Thomas Leung, Charles Rosenberg, James Philbin, Bo Chen, Ying Wu
  • Deep Fisher Kernels [pdf] - Mayu Sakurada, Vladyslav Sydorov , Christoph Lampert
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
#411 - Ground Plane Estimation using a Hidden Markov Model [pdf]
Ralf Dragon, Luc Van Gool

Abstract: We focus on the problem of estimating the ground plane orientation and location in a monocular video sequence from a moving observer. Our only assumptions are that the 3D ego motion t and the ground plane normal n are orthogonal, and that n and t are smooth over time. We formulate the problem as a state-continuous Hidden Markov Model (HMM) whose hidden state contains t and n and may be estimated by sampling and decomposing homographies. We show that using blocked Gibbs sampling, we can infer the hidden state with high robustness towards outliers, drifting trajectories, rolling shutter and an imprecise intrinsic calibration. Since our approach does not need any initial orientation prior, it works for arbitrary camera orientations.
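
The two assumptions translate into a simple scoring rule over sampled hidden states: prefer normals orthogonal to the ego-motion and close to the previous estimate. A toy stand-in for the blocked Gibbs inference; the weighting and all vectors are illustrative:

```python
# Sketch of scoring sampled ground normals under the paper's two
# assumptions: n orthogonal to t, and n smooth over time.
import numpy as np

rng = np.random.default_rng(13)
def unit(v): return v / np.linalg.norm(v)

prev_n = unit(np.array([0.0, 1.0, 0.05]))                  # previous normal
samples = [unit(rng.normal(size=3)) for _ in range(1000)]  # candidate normals
t = unit(np.array([0.0, 0.02, 1.0]))                       # current ego-motion

def score(n):
    ortho = -abs(n @ t)              # n should be orthogonal to t
    smooth = n @ prev_n              # n should change slowly over time
    return 5.0 * ortho + smooth      # assumed relative weighting

best = max(samples, key=score)
print("selected ground normal:", np.round(best, 3))
```
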
Similar papers:
  • Efficient pruning LMI conditions for Branch-and-Prune Rank and Chirality-Constrained Estimation of the Dual Absolute Quadric [pdf] - Adlane Habed, Danda Pani Paudel, Cédric Demonceaux, David Fofi
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Fast, Approximate Piecewise-Planar Modeling Based on Sparse Structure-from-Motion and Dense Superpixels [pdf] - Andras Bodis-Szomoru, Hayko Riemenschneider, Luc Van Gool
  • High Accuracy Monocular Localization for Autonomous Driving Using Adaptive Ground Estimation [pdf] - Shiyu Song, Manmohan Chandraker
#414 - Human Body Shape Estimation Using a Multi-Resolution Manifold Forest [pdf]
Frank Perbet, Sam Johnson, Minh-Tri Pham, Björn Stenger

Abstract: This paper proposes a method for estimating the 3D body shape of a person with robustness to clothing. We formulate the problem as optimization over the manifold of valid depth maps of body shapes learned from synthetic training data. The manifold itself is represented using a novel data structure, a Multi-Resolution Manifold Forest (MRMF), which contains vertical edges between tree nodes as well as horizontal edges between nodes that correspond to overlapping partitions. We show that this data structure allows both efficient localization and navigation on the manifold, for on-the-fly building of local linear models (manifold charting). We demonstrate shape estimation results on clothed users, showing significant improvement in accuracy over global shape models and models using pre-computed clusters. We further compare the MRMF with alternative manifold charting methods on a public dataset for reconstructing 3-D motion from noisy 2-D marker observations, obtaining state-of-the-art results.
Similar papers:
  • Reconstructing Evolving Tree Structures in Time Lapse Sequences [pdf] - Przemysław Głowacki, Miguel Pinheiro, Raphael Sznitman, Engin Türetken, Daniel Lebrecht, Anthony Holtmaat, Jan Kybic, Pascal Fua
  • Human Pose Estimation: New Benchmark and State of the Art Analysis [pdf] - Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
  • Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts [pdf] - Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Nam-Gyu Cho, Sanja Fidler, Raquel Urtasun, Alan Yuille
  • Non-rigid Segmentation using Sparse Low Dimensional Manifolds and Deep Belief Networks [pdf] - Jacinto Nascimento, Gustavo Carneiro
#416 - Separation of Line Drawings Based on Split Faces for 3D Object Reconstruction [pdf]
Changqing ZOU

Abstract: Reconstructing 3D objects from single line drawings is often desirable in computer vision and graphics applications. If the line drawing of a complex 3D object is decomposed into primitives of simple shape, the object can be easily reconstructed. We propose an effective method to conduct the line drawing separation and turn a complex line drawing into parametric 3D models. This is achieved by recursively separating the line drawing using two types of split faces. Our experiments show that the proposed separation method can generate more basic and simple line drawings, and its combination with the example-based reconstruction can robustly recover more complex parametric 3D objects than previous methods.
Similar papers:
  • Human Body Shape Estimation Using a Multi-Resolution Manifold Forest [pdf] - Frank Perbet, Sam Johnson, Minh-Tri Pham, Björn Stenger
  • Reconstructing Evolving Tree Structures in Time Lapse Sequences [pdf] - Przemysław Głowacki, Miguel Pinheiro, Raphael Sznitman, Engin Türetken, Daniel Lebrecht, Anthony Holtmaat, Jan Kybic, Pascal Fua
  • Dual-Space Decomposition of 2D Complex Shapes [pdf] - Guilin Liu, Zhonghua Xi, Jyh-Ming Lien
  • Fast and robust identification of persistent homotopy types of noisy images [pdf] - Vitaliy Kurlin
#421 - Product Sparse Coding [pdf]
Tiezheng Ge, Kaiming He, Jian Sun

Abstract: Sparse coding is a widely used technique in computer vision. However, its expensive computational cost can hamper applications, typically when the codebook size must be limited due to concerns on running time. In this paper, we study a special case of sparse coding in which the codebook is a Cartesian product of two subcodebooks. We present algorithms to decompose this sparse coding problem into smaller subproblems, which can be solved separately. Our solution, named Product Sparse Coding (PSC), reduces the time complexity from O(K) to O(\sqrt{K}) in the codebook size $K$. In practice, this can be 20-100x faster than standard sparse coding. In experiments we demonstrate the efficiency and quality of this method on the applications of image classification and
Similar papers:
  • Packing and Padding: Coupled Multi-index for Accurate Image Retrieval [pdf] - Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
  • Distance Encoded Product Quantization [pdf] - Jae-Pil Heo, Zhe Lin, Sung-eui Yoon
  • Locally Optimized Product Quantization [pdf] - Yannis Kalantidis, Yannis Avrithis
  • Additive Quantization for Extreme Vector Compression [pdf] - Artem Babenko, Victor Lempitsky
#423 - Robust and Efficient Full-Angle Quaternions for Matching Arrays of 3D Rotations [pdf]
Stephan Liwicki, Stefanos Zafeiriou, Maja Pantic, Björn Stenger, Minh-Tri Pham

Abstract: Matching sets of features often involves dealing with corrupted data. In this paper, we introduce a new distance for robustly matching arrays of 3D rotations. We show that the distance leads to a new and efficient representation for 3D rotations which we coin the full-angle quaternion (FAQ). We apply the distance and the representation to 3D shape recognition and 2D object tracking from color video. In the former application, we introduce efficient hashing of scaling and translation concurrently. In the latter application, we utilize subspace learning with the proposed FAQ representation. In both cases, our approach outperforms state-of-the-art approaches.
Similar papers:
  • Is Rotation a Nuisance in Shape Recognition? [pdf] - Qiuhong Ke, Yi Li
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification [pdf] - Yadong MU, Gang Hua, Wei Fan, Shi-Fu Chang
  • Fast Rotation Search with Stereographic Projections for 3D Registration [pdf] - Alvaro Parra Bustos, Tat-Jun Chin, David Suter
#427 - Look at the Driver, Look at the Road: No Distraction! No Accident! [pdf]
Mahdi Rezaei, Reinhard Klette

Abstract: The paper proposes an advanced driver-assistance system that correlates the driver's attention to the road and traffic conditions by analyzing both simultaneously. In particular, we aim at the prevention of rear-end crashes due to driver fatigue or distraction. We propose an asymmetric appearance-modeling technique and 2D-to-3D registration to determine the driver's head pose (in 6 degrees of freedom), and to perform yawning detection and head-nodding detection. Global Haar (GHaar) classifiers are used for vehicle detection. Using a fuzzy-logic inference system, we develop an integrated solution to cover all of the above subjects. We demonstrate real-time performance of the proposed method for real-world scenarios.
Similar papers:
  • High Accuracy Monocular Localization for Autonomous Driving Using Adaptive Ground Estimation [pdf] - Shiyu Song, Manmohan Chandraker
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • Learning-by-Synthesis for Appearance-based 3D Gaze Estimation [pdf] - Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato
  • Head Pose Estimation Based on Multivariate Label Distribution [pdf] - Xin Geng, Yu Xia
#430 - Efficient Boosted Exemplar-based Face Detection [pdf]
Haoxiang Li, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Gang Hua

Abstract: Despite the fact that face detection has been studied intensively over the past several decades, the problem is still not completely solved. Challenging conditions, such as extreme pose, lighting, and occlusion, have historically hampered traditional, model-based methods. In contrast, exemplar-based face detection has been shown to be effective, even under these challenging conditions, primarily because a large exemplar database is leveraged to cover all possible visual variations. However, relying heavily on a large exemplar database to deal with face appearance variations makes the detector impractical due to high space and time complexity. We construct an efficient boosted exemplar-based face detector which overcomes the limitations of previous work by being faster, more memory efficient, and more accurate. In our method, exemplars as weak detectors are discriminatively trained and selectively assembled in the boosting framework, which largely reduces the number of required exemplars. Notably, we propose to include non-face images as negative exemplars to actively suppress false detections and further improve detection accuracy. We verify our approach on two public face detection benchmarks and one personal photo album, and achieve significant improvement over the state-of-the-art algorithms in terms of both accuracy and efficiency.
Similar papers:
  • From Categories to Individuals in Real Time --- A Unified Boosting Approach [pdf] - David Hall, Pietro Perona
  • Event Detection using Multi-Level Relevance Labels and Multiple Features [pdf] - Zhongwen Xu, Ivor W. Tsang, Yi Yang, Zhigang Ma, Alexander Hauptmann
  • Nonparametric Context Modeling of Local Appearance for Pose- and Expression-Robust Facial Landmark Localization [pdf] - Brandon Smith, Jonathan Brandt, Zhe Lin, Li Zhang
  • Semantic Object Selection [pdf] - Ejaz Ahmed, Scott Cohen, Brian Price
#436 - Multiscale Combinatorial Grouping [pdf]
Pablo Arbelaez, Jordi Pont-Tuset, Jon Barron, Ferran Marques, Jitendra Malik

Abstract: We propose a unified approach for bottom-up hierarchical image segmentation and object candidate generation for recognition, called Multiscale Combinatorial Grouping (MCG). For this purpose, we first develop a fast normalized cuts algorithm. We then propose a high-performance hierarchical segmenter that makes effective use of multiscale information and diversified inputs. Finally, we propose a grouping strategy that combines our multiscale regions into highly-accurate object candidates by exploring efficiently their combinatorial space. We conduct extensive experiments on both the BSDS500 and on the PASCAL 2012 segmentation datasets, showing that MCG produces state-of-the-art contours, regions and object candidates.
Similar papers:
  • Learning to Group Objects [pdf] - Victoria Yanulevskaya, Jasper Uijlings, Nicu Sebe
  • Generating object segmentation proposals using global and local search [pdf] - Pekka Rantalankila, Juho Kannala, Esa Rahtu
  • Object-based Multiple Foreground Video Co-segmentation [pdf] - Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
  • Manifold Based Dynamic Texture Synthesis from Extremely Few Samples [pdf] - Hongteng Xu, Hongyuan Zha, Mark Davenport
#439 - Non-Parametric Bayesian Constrained Local Models [pdf]
Pedro Martins, Rui Caseiro, Jorge Batista

Abstract: This work presents a novel non-parametric Bayesian formulation for aligning faces in unseen images. Popular approaches, such as the Constrained Local Models (CLM) or the Active Shape Models (ASM), perform facial alignment through a local search, combining an ensemble of detectors with a global optimization strategy that constrains the facial feature points to be within the subspace spanned by a Point Distribution Model (PDM). The global optimization can be posed as a Bayesian inference problem, looking to maximize the posterior distribution of the PDM parameters in a maximum a posteriori (MAP) sense. Previous approaches rely exclusively on Gaussian inference techniques, i.e. both the likelihood (detector responses) and the prior (PDM) are Gaussians, resulting in a posterior which is also Gaussian, whereas in this work the posterior distribution is modeled as being non-parametric by a Kernel Density Estimator (KDE). We show that this posterior distribution can be efficiently inferred using Sequential Monte Carlo methods, in particular a Regularized Particle Filter (RPF). The technique is evaluated in detail on several standard datasets (IMM, BioID, XM2VTS, LFW and FGNET Talking Face) and compared against state-of-the-art CLM methods. We demonstrate that inferring the PDM parameters non-parametrically significantly increases face alignment performance.
Similar papers:
  • RAPS: Robust and Efficient Automatic Construction of Person-Specific Deformable Models [pdf] - Christos Sagonas, Stefanos Zafeiriou, Yannis Panagakis, Maja Pantic
  • Region-based particle filter for video object segmentation [pdf] - David Varas, Ferran Marques
  • Nonparametric Context Modeling of Local Appearance for Pose- and Expression-Robust Facial Landmark Localization [pdf] - Brandon Smith, Jonathan Brandt, Zhe Lin, Li Zhang
  • Gauss-Newton Constrained Local Models [pdf] - GEORGIOS TZIMIROPOULOS, Maja Pantic
#441 - Feature-Independent Action Spotting Without Human Localization, Segmentation or Frame-wise Tracking [pdf]
Chuan Sun, Hassan Foroosh

Abstract: In this paper, we propose an unsupervised framework for action spotting in videos that does not depend on any specific feature (e.g. HOG/HOF, STIP, silhouette, bag-of-words, etc.). Furthermore, our solution requires no human localization, segmentation, or framewise tracking. This is achieved by treating the problem holistically as that of extracting the internal dynamics of video cuboids by modeling them in their natural form as multilinear tensors. To extract their internal dynamics, we devised a novel Two-Phase Decomposition (TP-Decomp) of a tensor that generates very compact and discriminative representations that are robust to even heavily perturbed data. Technically, a Rank-based Tensor Core Pyramid (Rank-TCP) descriptor is generated by combining multiple tensor cores under multiple ranks, allowing video cuboids to be represented in a hierarchical tensor pyramid. The problem then reduces to a template matching problem, which is solved efficiently by using two boosting strategies: (1) to reduce the search space, we filter the dense trajectory cloud extracted from the target video; (2) to boost the matching speed, we perform matching in an iterative coarse-to-fine manner. Experiments on 5 benchmarks show that our method outperforms the current state-of-the-art under various challenging conditions. We also created a challenging dataset called Heavily Perturbed Video Array (HPVA) to validate the robustness of our framework under heavily perturbed situations.
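
Tensor cores of a video cuboid under a chosen rank are the building block of such a descriptor. A minimal sketch of extracting a truncated Tucker/HOSVD core with plain numpy; the paper's Two-Phase Decomposition is more elaborate than this baseline:

```python
# Sketch of truncated HOSVD: per-mode SVD factors, then contract the
# cuboid with each factor to obtain a compact core tensor.
import numpy as np

def mode_unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_core(T, ranks):
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(mode_unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core

cuboid = np.random.randn(16, 16, 8)        # x, y, t video block
core = hosvd_core(cuboid, ranks=(4, 4, 2)) # compact descriptor building block
print("core shape:", core.shape)           # (4, 4, 2)
```
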
Similar papers:
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Efficient feature extraction, encoding and classification for action recognition [pdf] - Vadim Kantorov, Ivan Laptev
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
#446 - A Multigraph Representation for Improved Unsupervised/Semi-supervised Learning of Human Actions [pdf]
Simon Jones, Ling Shao

Abstract: Graph-based methods are a useful class of methods for improving the performance of unsupervised and semi-supervised machine learning tasks, such as clustering or information retrieval. However, the performance of such methods is highly dependent on how well the affinity graph reflects the original data structure. We propose that multimedia such as images or videos consist of multiple separate components, and therefore more than one graph is required to fully capture the relationship between them. Accordingly, we present a new spectral method -- the Feature Grouped Spectral Multigraph (FGSM) -- which comprises the following steps. First, mutually independent subsets of the original feature space are generated through feature clustering. Secondly, a separate graph is generated from each feature subset. Finally, a spectral embedding is calculated on each graph, and the embeddings are scaled/aggregated into a single representation. Using this representation, a variety of experiments are performed on three learning tasks -- clustering, retrieval and recognition -- on human action datasets, demonstrating considerably better performance than the state-of-the-art.
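
The three FGSM steps map almost one-to-one onto standard tools: cluster the feature dimensions, embed each resulting graph spectrally, then scale and concatenate. A minimal sketch with sklearn stand-ins; the paper's grouping and scaling choices may differ:

```python
# Sketch of a feature-grouped spectral multigraph representation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(11)
X = rng.normal(size=(300, 60))             # samples x original features

# Step 1: group feature dimensions by clustering their profiles.
feat_groups = KMeans(n_clusters=3, n_init=4, random_state=0).fit_predict(X.T)

# Steps 2-3: one spectral embedding per feature subset, then aggregate.
parts = []
for g in range(3):
    Xg = X[:, feat_groups == g]
    emb = SpectralEmbedding(n_components=5, random_state=0).fit_transform(Xg)
    parts.append(emb / (np.linalg.norm(emb) + 1e-12))   # scale each embedding
rep = np.hstack(parts)                     # final single representation
print("multigraph representation:", rep.shape)
```
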
Similar papers:
  • Multi-feature Spectral Clustering with Minimax Optimization [pdf] - Hongxing Wang, Chaoqun Weng, Junsong Yuan
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition [pdf] - Waqas Sultani, Imran Saleemi
  • Actionness Ranking with Lattice Conditional Ordinal Random Fields [pdf] - Wei Chen, Caiming Xiong, Jason Corso
#447 - Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media [pdf]
Zhenxing Niu, Gang Hua, Xinbo Gao, Qi Tian

Abstract: In this paper, we address the problem of recognizing images with weakly annotated text tags. Most previous work either cannot be applied to scenarios where the tags are only loosely related to the images, or simply combines the visual and textual content with a pre-fusion at the feature level or a post-fusion at the decision level. Instead, we first encode the text tags as relations among the images, and then propose a semi-supervised relational topic model (ss-RTM) to explicitly model the image content and these relations. In this way, we can efficiently leverage the loosely related tags, and build an intermediate-level representation for a collection of weakly annotated images. The intermediate-level representation can be regarded as a mid-level fusion of the visual and textual content, which is able to explicitly model their intrinsic relationships. Moreover, image category labels are also modeled in the ss-RTM, and recognition can be conducted without training an additional discriminative classifier. Our extensive experiments on social multimedia datasets (images+tags) demonstrate the advantages of the proposed model.
Similar papers:
  • Multi-modal Learning in Loosely-organized Web Images [pdf] - Kun Duan, David Crandall, Dhruv Batra
  • Topic Modeling of Multimodal Data: an Autoregressive Approach [pdf] - Yin Zheng, Yu-Jin Zhang, Hugo Larochelle
  • Tell Me What You See and I will Show You Where It Is [pdf] - Jia Xu, Alexander Schwing, Raquel Urtasun
  • NMF-KNN: Image Annotation using Weighted Multi-view Non-Negative Matrix Factorization [pdf] - Mahdi Kalayeh, Haroon Idrees, Mubarak Shah
#450 - Learning-Based Atlas Selection for Multiple-Atlas Segmentation [pdf]
Gerard Sanroma, Guorong Wu, Yaozong Gao, Dinggang Shen

Abstract: Recently, multi-atlas segmentation (MAS) has achieved great success in the medical imaging area. The key assumption of MAS is that multiple atlases encompass richer anatomical variability than a single atlas. Therefore, we can label the target image more accurately by mapping the label information from the appropriate atlas images that have the most similar structures. The problem of atlas selection, however, remains largely unexplored. Current state-of-the-art MAS methods rely on image similarity to select a set of atlases. Unfortunately, this heuristic criterion is not necessarily related to segmentation performance and thus may undermine segmentation results. To solve this simple but critical problem, we propose a learning-based atlas selection method to pick the best atlases that would eventually lead to more accurate image segmentation. Our idea is to learn the relationship between the pairwise appearance of observed instances (a pair of atlas and target images) and their final labeling performance (in terms of Dice ratio). In this way, we can select the best atlases according to their expected labeling accuracy. It is worth noting that our atlas selection method is general enough to be integrated with existing MAS methods. As shown in the experiments, we achieve significant improvement after we integrate our method with 3 widely used MAS methods on the ADNI and LONI LPBA40 datasets.
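A hedged sketch of the selection idea: regress the expected Dice ratio from pairwise atlas/target appearance and keep the top-ranked atlases. The pairwise features and the regressor below are illustrative placeholders, not the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pair_features(atlas, target):
    """Toy appearance features for an (atlas, target) image pair."""
    d = atlas - target
    return np.array([d.mean(), d.std(), np.abs(d).mean(),
                     np.corrcoef(atlas.ravel(), target.ravel())[0, 1]])

def train_selector(train_pairs, dice_scores):
    X = np.stack([pair_features(a, t) for a, t in train_pairs])
    return RandomForestRegressor(n_estimators=200, random_state=0).fit(X, dice_scores)

def select_atlases(model, atlases, target, k=5):
    X = np.stack([pair_features(a, target) for a in atlases])
    pred = model.predict(X)            # expected labeling accuracy (Dice)
    return np.argsort(pred)[::-1][:k]  # indices of the k most promising atlases
```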
Similar papers:
  • Discriminative Sparse Inverse Covariance Matrix: Application in Brain Functional Network Classification [pdf] - Luping Zhou, Lei Wang, Philip Ogunbona
  • Co-Occurrence Statistics for Zero-Shot Classification [pdf] - Thomas Mensink, Cees Snoek, Efstratios Gavves
  • Robust 3D Tracking with Descriptor Fields [pdf] - Alberto Crivellaro, Vincent Lepetit
  • Patch-based Evaluation of Image Segmentation [pdf] - Christian Ledig, Wenzhe Shi, Wenjia Bai, Daniel Rueckert
#462 - Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf]
George Papandreou, Liang-Chieh Chen, Alan Yuille

Abstract: The goal of this paper is to question the necessity of features like SIFT in categorical visual recognition tasks. As an alternative, we develop a generative model for the raw intensity of image patches and show that it can support image classification performance on par with optimized SIFT-based techniques in a bag-of-visual-words setting. The key ingredient of the proposed model is a compact dictionary of mini-epitomes, learned in an unsupervised fashion on a large collection of images. The use of epitomes allows us to explicitly account for photometric and position variability in image appearance. We show that this flexibility considerably increases the capacity of the dictionary to accurately approximate the appearance of image patches and support recognition tasks. For image classification, we develop histogram-based image encoding methods tailored to the epitomic representation, as well as an ``epitomic footprint'' encoding which is easy to visualize and highlights the generative nature of our model. We discuss in detail computational aspects and develop efficient algorithms to make the model scalable to large tasks. The proposed techniques are evaluated with experiments on the challenging PASCAL VOC-07 image classification benchmark.
Similar papers:
  • Learning Mid-level Filters for Person Re-identification [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
  • Quasi Real-Time Summarization for Consumer Videos [pdf] - Bin Zhao, Eric Xing
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Latent Dictionary Learning for Sparse Representation based Classification [pdf] - Meng Yang, Luc Van Gool
#464 - Model Transport: Towards Scalable Transfer Learning on Manifolds [pdf]
Oren Freifeld, Søren Hauberg, Michael Black

Abstract: We consider the intersection of two research fields: \emph{transfer learning} and \emph{statistics on manifolds}. In particular, we consider, for manifold-valued data, transfer learning of tangent-space models such as Gaussian distributions, PCA, regression, or classifiers. Though one would hope to simply use ordinary Euclidean ($\mathbb{R}^n$) transfer-learning ideas, the manifold structure prevents it. We overcome this by basing our method on (inner-product-preserving) \emph{parallel transport}, a well-known tool used in other problems of statistics on manifolds in computer vision. At first, this straightforward idea seems to suffer from an obvious shortcoming: transporting large datasets is prohibitively expensive, hindering the scalability of the approach. Fortunately, with our approach, \emph{we never transport data}. Rather, we show how the \emph{statistical models} themselves can be transported, and prove that for the above tangent-space models the transport ``commutes'' with learning. Consequently, our compact framework, applicable to a large class of manifolds, is not restricted by the size of either the training or test sets. We demonstrate the approach by transferring PCA and regression models of real-world data involving 3D shapes and image descriptors.
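The core idea, transporting the model rather than the data, can be illustrated with the closed-form parallel transport on the unit sphere; choosing this manifold and a single tangent direction as the "model" are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def transport_sphere(p, q, v):
    """Parallel-transport tangent vector v at p to the tangent space at q
    (p, q unit vectors, p != -q). Inner products are preserved."""
    return v - (np.dot(q, v) / (1.0 + np.dot(p, q))) * (p + q)

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
v = np.array([0.0, 1.0, 1.0])              # a "model" living in T_p S^2
v_q = transport_sphere(p, q, v)
assert abs(np.dot(v_q, q)) < 1e-12         # still tangent, now at q
assert abs(np.linalg.norm(v_q) - np.linalg.norm(v)) < 1e-12  # norm preserved
```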
Similar papers:
  • Domain Adaptation on the Statistical Manifold [pdf] - Mahsa Baktashmotlagh, Mehrtash Harandi, Brian Lovell, Mathieu Salzmann
  • On the quotient representation for the essential manifold [pdf] - Roberto Tron, Kostas Daniilidis
  • Tracking on the Product Manifold of Shape and Orientation for Tractography from Diffusion MRI [pdf] - Yuanxiang Wang, Hesamoddin Salehian, Guang Cheng, Baba Vemuri
  • Covariance descriptors for 3D shape matching and retrieval [pdf] - Hedi Tabia, Hamid Laga, David Picard, Philippe-Henri Gosselin
#471 - Image Preconditioning: Balancing Contrast and Ringing [pdf]
Yu Ji, Jinwei Ye, Sing Bing Kang, Jingyi Yu

Abstract: The goal of image preconditioning is to process an image such that, after being convolved with a known kernel, it appears close to the sharp reference image. In a practical setting, the preconditioned image has significantly higher dynamic range than the latent image. As a result, some form of tone mapping is needed. In this paper, we show how global tone mapping functions affect contrast and ringing in image preconditioning. In particular, we show that linear tone mapping eliminates ringing but incurs severe contrast loss, while non-linear tone mapping functions such as Gamma curves slightly enhance contrast but introduce ringing. To enable quantitative analysis, we design new metrics to measure the contrast of an image with ringing. Specifically, we set out to find its "equivalent ringing-free" image that matches its intensity histogram, and use its contrast as the measure. We illustrate our approach on projector defocus compensation and visual acuity enhancement. Compared with the state-of-the-art, our approach significantly improves the contrast. We believe our technique is the first to analytically trade off between contrast and ringing.
Similar papers:
  • Blind Multi-Image Restoration [pdf] - Haichao Zhang
  • Deblurring Low-light Images with Light Streaks [pdf] - Zhe Hu, Sunghyun Cho, Jue Wang, Ming-Hsuan Yang
  • Separable Kernel for Image Deblurring [pdf] - Lu Fang, Haifeng Liu, Feng Wu
  • Deblurring Text Images via L0-Regularized Intensity and Gradient Prior [pdf] - Jinshan Pan, Zhe Hu, Zhixun Su, Ming-Hsuan Yang
#475 - Complex Non-Rigid Motion 3D Reconstruction by Union of Subspaces [pdf]
Yingying Zhu, Dong Huang, Fernando de la Torre, Simon Lucey

Abstract: With the increasing need of the human action/behaviour analysis community to recover 3D complex nonrigid motion (e.g. multiple actions and human-object/human-human interaction) from 2D projections in image sequences, existing approaches to Non-Rigid Structure from Motion (NRSfM) face the grand challenge of 3D complex nonrigid motion reconstruction. Standard NRSfM models nonrigid motion by a single low-rank subspace [7], while the literature shows that complex nonrigid motion (multiple human actions) stems from a union of subspaces [11, 6, 13]. Solving complex 3D motion in a single subspace, one can only approximate the union of subspaces by its convex envelope, and therefore produce random combinations of the original 3D actions within the envelope. An ideal solution is to cluster the 3D motion into local motion subspaces and apply standard NRSfM in each subspace. However, 3D motion is not available in the first place, and clustering 2D projections does not produce 3D subspaces due to projection ambiguities and relative camera motion. To address this dilemma, we propose to directly solve for complex 3D nonrigid motion that resides in a union of subspaces. By simultaneously solving for NRSfM and subspace clustering, our approach registers the 2D observations in a union of subspaces automatically grouped by the reconstructed 3D motion. Experiments on both synthetic and real videos illustrate the benefits of our approach for the comple
Similar papers:
  • Better Feature Tracking Through Subspace Constraints [pdf] - Bryan Poling, Gilad Lerman, Arthur Szlam
  • Subspace Tracking under Dynamic Dimensionality for Online Background Subtraction [pdf] - Matthew Berger, Lee Seversky
  • SCAMS: Simultaneous Clustering and Model Selection [pdf] - Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
  • Subspace Clustering for Sequential Data [pdf] - Stephen Tierney, Junbin Gao, Yi Guo
#477 - Recognizing RGB Images by Learning from RGB-D Data [pdf]
Lin Chen, Wen Li, Dong Xu

Abstract: In this work, we propose a new framework for recognizing RGB images captured by conventional cameras by leveraging a set of labeled RGB-D data, in which depth features can additionally be extracted from the depth images. We formulate this task as a new unsupervised domain adaptation (UDA) problem, in which we aim to take advantage of the additional depth features in the source domain and also cope with the data distribution mismatch between the source and target domains. To effectively utilize the additional depth features, we seek two optimal projection matrices to map the samples from both domains into a common space, preserving as much correlation as possible between the visual and depth features. To effectively employ the training samples from the source domain for learning the target classifier, we reduce the data distribution mismatch by minimizing the Maximum Mean Discrepancy (MMD) criterion, which compares the data distributions for each type of feature in the common space. Based on the above two motivations, we propose a new SVM based objective function to simultaneously learn the two projection matrices and the optimal target classifier so as to well separate the source samples from different classes when using each type of feature in the common space. An efficient alternating optimization algorithm is developed to solve our new objective function. Comprehensive experiments for object recognition and gender recognition demonstrate the effectiv
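For reference, a small numpy sketch of the (biased) squared-MMD estimator with an RBF kernel, the criterion the method minimizes to compare feature distributions across domains; the kernel choice and bandwidth here are assumptions:

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X (n, d) and Y (m, d)."""
    def k(A, B):
        # pairwise squared distances, then RBF kernel values
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```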
Similar papers:
  • Time Machine: Continuous Manifold Based Adaptation for Evolving Visual Domains [pdf] - Judy Hoffman, Trevor Darrell, Kate Saenko
  • Transfer Joint Matching for Visual Domain Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Philip Yu
  • Domain Adaptation on the Statistical Manifold [pdf] - Mahsa Baktashmotlagh, Mehrtash Harandi, Brian Lovell, Mathieu Salzmann
  • Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf] - Novi Patricia, Barbara Caputo
#481 - Cross-view Action Modeling, Learning and Recognition [pdf]
Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu

Abstract: Existing methods on video-based action recognition are generally view-dependent, i.e., performing recognition from the same views seen in the training data. We present a novel multiview spatio-temporal AND-OR graph (MST-AOG) representation for cross-view action recognition, i.e., the recognition is performed on the video from an unknown and unseen view. As a compositional model, MST-AOG compactly represents the hierarchical combinatorial structures of cross-view actions by explicitly modeling the geometry, appearance and motion variations. This paper proposes effective methods to learn the structure and parameters of MST-AOG. The inference based on MST-AOG enables action recognition from novel views. The training of MST-AOG takes advantage of the 3D human skeleton data obtained from Kinect cameras to avoid annotating enormous multi-view video frames, which is error-prone and time-consuming, but the recognition does not need 3D information and is based on 2D video input. A new Multi-view Action3D dataset has been created and will be released. Extensive experiments have demonstrated that this new action representation significantly improves the accuracy and robustness for cross-view action recognition on 2D videos.
Similar papers:
  • Leveraging Hierarchical Parametric Network for Skeletal Joints Action Segmentation and Recognition [pdf] - Di Wu, Ling Shao
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Depth and Skeleton Associated Action Recognition without Online Accessible RGB-D Cameras [pdf] - Yen-Yu Lin, Ju-Hsuan Hua, Nick Tang, Min-Hung Chen, Hong-Yuan Liao
#482 - Time-Mapping Using Space-Time Saliency [pdf]
Feng Zhou, Sing Bing Kang, Michael Cohen

Abstract: We describe a new approach for generating regular-speed, low-frame-rate (LFR) video from a high-frame-rate (HFR) input while preserving the important moments in the original. We call this {\em time-mapping}, a temporal analogue of spatial tone-mapping from high to low dynamic range. Our approach makes these contributions: (1) a robust space-time saliency method for evaluating visual importance, (2) a re-timing technique to temporally resample based on frame importance, and (3) temporal filters to enhance the rendering of salient motion. Results of our space-time saliency method on a benchmark dataset show it is state-of-the-art. In addition, the benefits of our approach to HFR-to-LFR time-mapping over more direct methods are demonstrated in a user study.
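One plausible reading of the re-timing step, sketched under the assumption that per-frame importance scores are already available: resample at uniform steps of cumulative importance, so salient moments keep more output frames while quiet stretches are skipped through quickly. This is a simplified stand-in, not the paper's filter:

```python
import numpy as np

def retime(importance, n_out):
    """importance: per-input-frame saliency (>= 0). Returns the input-frame
    index to show for each of the n_out output frames."""
    c = np.cumsum(importance) / np.sum(importance)   # "importance time" in [0, 1]
    targets = (np.arange(n_out) + 0.5) / n_out       # uniform output samples
    return np.searchsorted(c, targets)

# Frames around the salient burst are kept; quiet stretches are dropped:
print(retime(np.array([0.1, 0.1, 5.0, 5.0, 0.1, 0.1]), 4))  # -> [2 2 3 3]
```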
Similar papers:
  • A Reverse Hierarchy Model for Predicting Eye Fixations [pdf] - Tianlin Shi, Xiaolin Hu, Ming Liang
  • Salient Region Detection via High-Dimensional Color Transform [pdf] - Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
  • Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [pdf] - Eleonora Vig, Michael Dorr, David Cox
  • Learning optimal features for salient object detection [pdf] - Song Lu, Vijay Mahadevan, Nuno Vasconcelos
#485 - Stereo under Sequential Optimal Sampling: A Statistical Analysis Framework for Search Space Reduction [pdf]
Yilin Wang, Jan-Michael Frahm, Enrique Dunn, Ke Wang

Abstract: We develop a sequential optimal sampling framework for stereo disparity estimation by adapting the Sequential Probability Ratio Test (SPRT) model. The proposed framework operates over local image neighborhoods by iteratively estimating single pixel disparity values until sufficient evidence has been gathered to either validate or contradict the current hypothesis regarding local scene structure. The output of our sampling within a given region is a set of sampled pixel positions along with a robust and compact estimate of the set of disparities contained within that region. Obtaining such a disparity set enables the effective reduction of the disparity search space for all remaining non-sampled pixels. Accordingly, our sampling framework is a general pre-processing mechanism aimed at reducing the computational complexity of disparity search algorithms. We build upon this framework to propose an efficient plane propagation mechanism that leverages the pre-computed sampling positions and the local structure model described by the local disparity set. Our experiments demonstrate the effectiveness and efficiency of the proposed approach compared to the recent state of the art.
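For context, a generic sketch of Wald's SPRT, the statistical engine the framework adapts to disparity hypotheses; the likelihood models here are abstract placeholders, not the paper's disparity model:

```python
from math import log

def sprt(samples, log_lik_h1, log_lik_h0, alpha=0.05, beta=0.05):
    """Returns ('H1' | 'H0' | 'undecided', number of samples consumed)."""
    upper = log((1 - beta) / alpha)   # accept H1 once evidence exceeds this
    lower = log(beta / (1 - alpha))   # accept H0 once evidence drops below this
    llr, n = 0.0, 0
    for x in samples:
        n += 1
        llr += log_lik_h1(x) - log_lik_h0(x)   # accumulate log-likelihood ratio
        if llr >= upper:
            return 'H1', n
        if llr <= lower:
            return 'H0', n
    return 'undecided', n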
Similar papers:
  • Cross-Scale Cost Aggregation for Stereo Matching [pdf] - Kang Zhang, Yuqiang Fang, Dongbo Min, Lifeng Sun, Shiqiang Yang, Shuicheng Yan, Qi Tian
  • Graph Cut based Continuous Stereo Matching using Locally Shared Labels [pdf] - Tatsunori Taniai, Yasuyuki Matsushita, Takeshi Naemura
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching [pdf] - Aristotle Spyropoulos, Nikos Komodakis, Philippos Mordohai
#486 - Empirical Minimum Bayes Risk Prediction: How to extract an extra 3% performance from vision models with just two more parameters [pdf]
Vittal Premachandran, Daniel Tarlow, Dhruv Batra

Abstract: When building vision systems to predict structured objects like image segmentations or human pose, we are often concerned with performing well under a task-specific evaluation measure. An ongoing research challenge is how to make predictions so as to maximize performance on these evaluation measures. In this work, we present a simple meta-algorithm that is surprisingly effective. The algorithm takes as input a model that would normally be the final product, and learns two parameters so as to optimize performance on the task-specific measure. We demonstrate the approach in several domains, taking existing state-of-the-art algorithms and improving performance by up to 5%, simply with two extra parameters.
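The abstract does not spell out the meta-algorithm, but the title points at empirical Minimum Bayes Risk decoding; a hedged sketch of that recipe follows, where treating a temperature on model energies as one of the learned parameters is my assumption, not a detail given above:

```python
import numpy as np

def embr_predict(candidates, energies, utility, T=1.0):
    """candidates: list of structured outputs; energies[i]: model energy of
    candidates[i]; utility(y, y2): task measure, higher is better (e.g. IoU)."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / T)
    w /= w.sum()                          # empirical p(y | x)
    M = len(candidates)
    scores = [sum(w[j] * utility(candidates[i], candidates[j]) for j in range(M))
              for i in range(M)]
    return candidates[int(np.argmax(scores))]   # maximize expected utility
```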
Similar papers:
  • Beyond Pixel Labels: Image Parsing with Object Instances and Occlusion Ordering [pdf] - Joseph Tighe, Marc Niethammer, Svetlana Lazebnik
  • Predicting Object Dynamics in Scenes [pdf] - David Fouhey, Larry Zitnick
  • Using k-poselets for detecting people and localizing their keypoints [pdf] - Bharath Hariharan, Georgia Gkioxari, Ross Girshick, Jitendra Malik
  • Active Annotation Translation [pdf] - Steven Branson, Pietro Perona
#497 - Matrix-Similarity Based Loss Function and Feature Selection for Alzheimer's Disease Diagnosis [pdf]
Xiaofeng Zhu, Heung-Il Suk, Dinggang Shen

Abstract: Recent studies on Alzheimer's Disease (AD) and/or Mild Cognitive Impairment (MCI) diagnosis have shown that the tasks of identifying brain disease and predicting clinical scores are highly related to each other. Furthermore, it has been shown that feature selection with a manifold learning or a sparse model can handle the problems of high feature dimensionality and small sample size. However, the tasks of clinical score regression and clinical label classification were often conducted separately in previous studies. Regarding feature selection, to the best of our knowledge, most previous work considered a loss function defined as an element-wise difference between the target values and the predicted ones. In this paper, we consider the problems of joint regression and classification for AD/MCI diagnosis and propose a novel matrix-similarity based loss function that uses high-level information inherent in the target response matrix and imposes the information to be preserved in the predicted response matrix. The newly devised loss function is combined with a group lasso method for joint feature selection across tasks, i.e., prediction of clinical scores and a class label. In order to validate the effectiveness of the proposed method, we conducted experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, and showed that the newly devised loss function helps enhance the performance of both clinical score prediction and disease status identification, outperforming the state-of-the-art methods.
Similar papers:
  • Structured Output Random Forests for Accurate Object Detection [pdf] - Samuel Schulter, Christian Leistner, Peter Roth, Horst Bischof
  • Incremental Face Alignment in the Wild [pdf] - Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic
  • Discriminative Sparse Inverse Covariance Matrix: Application in Brain Functional Network Classification [pdf] - Luping Zhou, Lei Wang, Philip Ogunbona
  • Learning Important Spatial Pooling Regions for Scene Classification [pdf] - Di Lin, Cewu Lu, Renjie Liao, Jiaya Jia
#498 - Active Frame, Location, and Detector Selection for Automated and Manual Video Annotation [pdf]
Vasiliy Karasev, Avinash Ravichandran, Stefano Soatto

Abstract: We describe an information-driven active selection approach to determine which detectors to deploy at which location in which frame of a video shot to minimize semantic class label uncertainty at every pixel, with the smallest computational cost that ensures a given uncertainty bound. We show minimal performance reduction compared to a ``paragon'' algorithm running all detectors at all locations in all frames, at a small fraction of the computational cost. Our method can handle uncertainty in the labeling mechanism, so it can handle both ``oracles'' (manual annotation) and noisy detectors (automated annotation).
Similar papers:
  • Non-Parametric Bayesian Constrained Local Models [pdf] - Pedro Martins, Rui Caseiro, Jorge Batista
  • Multi-fold MIL Training for Weakly Supervised Object Localization [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
  • Bi-label Propagation for Generic Multiple Object Tracking [pdf] - Wenhan Luo, Tae-Kyun Kim, Björn Stenger, Xiaowei Zhao, Roberto Cipolla
  • From Categories to Individuals in Real Time --- A Unified Boosting Approach [pdf] - David Hall, Pietro Perona
#508 - Saliency Optimization from Robust Background Detection [pdf]
Wangjiang Zhu, Shuang Liang, Yichen Wei, Jian Sun

Abstract: Recent progress in salient object detection has exploited the boundary prior, or background information, to assist other saliency cues such as contrast and achieve state of the art results. However, the usage of the boundary prior is still simple and fragile, and its integration with other cues is mostly heuristic. In this work, we present new methods to address these issues. Firstly, we propose a robust background measure, called \emph{boundary connectivity}. It characterizes the spatial layout of image regions with respect to image boundaries and is much more robust. It has an intuitive geometrical interpretation and provides unique benefits that are absent in previous saliency measures. Secondly, we propose a principled optimization framework to integrate multiple low level cues, including our background measure, to obtain clean and uniform saliency maps. Our formulation is intuitive, efficient and obtains state of the art results on several benchmark datasets.
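A sketch of a boundary-connectivity style measure consistent with the description above; the exact normalization (boundary length divided by the square root of area) is my reading of the paper and should be treated as an assumption:

```python
import numpy as np

def boundary_connectivity(region_mask):
    """region_mask: boolean (H, W) mask of one region (e.g. a superpixel
    cluster). High values suggest the region belongs to the background."""
    border = np.zeros(region_mask.shape, dtype=bool)
    border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
    len_bnd = np.count_nonzero(region_mask & border)  # region pixels on border
    area = np.count_nonzero(region_mask)
    return len_bnd / np.sqrt(area) if area else 0.0
```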
Similar papers:
  • Time-Mapping Using Space-Time Saliency [pdf] - Feng Zhou, Sing Bing Kang, Michael Cohen
  • Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [pdf] - Eleonora Vig, Michael Dorr, David Cox
  • Salient Region Detection via High-Dimensional Color Transform [pdf] - Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
  • Learning optimal features for salient object detection [pdf] - Song Lu, Vijay Mahadevan, Nuno Vasconcelos
#514 - Robust Refinement of GPS-Tags Using RandomWalks with an Adaptive Damping Factor [pdf]
Amir Roshan Zamir

Abstract: The number of GPS-tagged images available on the web is increasing at a rapid rate. The majority of such location tags are specified by the users, either through manual tagging or localization chips embedded in the cameras. However, a known issue with user-shared images is the unreliability of such GPS-tags; in this paper, we propose a method for addressing this problem. We assume that a large dataset of GPS-tagged images, which includes an unknown subset with contaminated tags, is available. We develop a robust method for identification and refinement of the subset with contaminated tags using the rest of the images in the dataset. In the proposed method, we form triplets of matching images and use them for estimating the location of the query image utilizing structure from motion. We generate a large number of such estimations, which include inaccurate ones due to the noisy GPS-tags in the dataset, and perform random walks on them in order to identify the subset with the maximal agreement. Finally, we refine the GPS-tag of the image utilizing the identified consistent subset using a weighted mean. We propose a new damping factor for random walks which adapts itself to various levels of noise in the input. We evaluated the proposed framework on a dataset of over 18k user-shared images; the experiments show it robustly and consistently improves the accuracy of GPS-tags under diverse scenarios.
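A minimal sketch of random-walk scoring over location estimates, PageRank-style; the paper's adaptive damping rule is not reproduced, so the damping factor `d` is left as a plain parameter here:

```python
import numpy as np

def random_walk_scores(W, d=0.85, n_iter=100):
    """W: (n, n) nonnegative agreement matrix between location estimates.
    Returns stationary scores; the maximal-agreement subset gets high mass."""
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    n = len(W)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r = (1.0 - d) / n + d * (P.T @ r)   # damped walk
    return r
```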
Similar papers:
  • Similarity Comparisons for Interactive Fine-Grained Categorization [pdf] - Catherine Wah, Grant Van Horn, Steven Branson, Subhransu Maji, Pietro Perona, Serge Belongie
  • Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media [pdf] - Zhenxing Niu, Gang Hua, Xinbo Gao, Qi Tian
  • Learning Fine-grained Image Similarity with Deep Ranking [pdf] - Jiang Wang, Yang Song, Thomas Leung, Charles Rosenberg, James Philbin, Bo Chen, Ying Wu
  • NMF-KNN: Image Annotation using Weighted Multi-view Non-Negative Matrix Factorization [pdf] - Mahdi Kalayeh, Haroon Idrees, Mubarak Shah
#515 - Color Transfer using Probabilistic Moving Least Squares [pdf]
Youngbae Hwang, Joon-Young Lee, In So Kweon, Seon Joo Kim

Abstract: This paper introduces a new method for color transfer, the process of transferring the color of an image to match that of another image of the same scene. The color of a scene may vary from image to image because the photographs are taken at different times, with different cameras, and under different camera settings. To solve for a full nonlinear and nonparametric color mapping in the 3D RGB color space, we propose a scattered point interpolation scheme using moving least squares and strengthen it with a probabilistic modeling of the color transfer in the 3D color space to deal with mis-alignments and noise. Experiments show the effectiveness of our method over previous color transfer works both quantitatively and qualitatively. In addition, our framework can be applied to various instances of color transfer, such as transferring color between different camera models, camera settings, and illumination conditions, as well as to video color transfer.
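A compact sketch of moving least squares in RGB space under the stated setup, with a Gaussian weight and a locally fitted affine map per query color; the paper's probabilistic treatment of mis-alignments and noise is omitted:

```python
import numpy as np

def mls_color(query, src, dst, sigma=0.1):
    """query: (3,) RGB; src/dst: (n, 3) matched colors in [0, 1]."""
    w = np.exp(-((src - query) ** 2).sum(1) / (2.0 * sigma ** 2)) + 1e-8
    A = np.hstack([src, np.ones((len(src), 1))])       # affine design matrix
    Aw = A * w[:, None]                                # weights move with query
    M = np.linalg.lstsq(Aw.T @ A, Aw.T @ dst, rcond=None)[0]
    return np.append(query, 1.0) @ M                   # locally mapped RGB
```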
Similar papers:
  • Inferring Analogous Attributes [pdf] - Chao-Yeh Chen, Kristen Grauman
  • Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf] - Novi Patricia, Barbara Caputo
  • Nonparametric Part Transfer for Fine-grained Recognition [pdf] - Christoph Göring, Erik Rodner, Alexander Freytag, Joachim Denzler
  • Instance-weighted Transfer Learning of Active Appearance Models [pdf] - Daniel Haase, Erik Rodner, Joachim Denzler
#517 - Single-View 3D Scene Parsing by Attributed Grammar [pdf]
Xiaobai Liu, Yibiao Zhao, Song Chun Zhu

Abstract: In this paper, we present an attributed grammar for parsing man-made outdoor scenes into semantic surfaces while simultaneously recovering their 3D model. The grammar takes superpixels as its terminal nodes and uses five production rules to derive a hierarchical parse graph of the scene. Each graph node corresponds to a surface or a composite of surfaces in the 3D world or the 2D image. Nodes are described by attributes of the global scene model, e.g. focal length and vanishing points, or by surface properties, e.g. surface normal, contact lines with other surfaces, and relative spatial location. Each production rule is associated with equations that constrain the attributes of the parent node and those of its child nodes. Given an input image, our goal is to construct a hierarchical parse graph by recursively applying the five grammar rules while preserving the attribute constraints. We develop an effective top-down/bottom-up cluster sampling procedure which can explore this constrained space efficiently. We evaluate our method on both public benchmarks and newly built datasets, and achieve state-of-the-art performance in terms of layout estimation and region segmentation. We also demonstrate that our method is able to recover a detailed 3D model with relaxed Manhattan structures, which clearly advances the state of the art in single-view 3D reconstruction.
Similar papers:
  • Generating object segmentation proposals using global and local search [pdf] - Pekka Rantalankila, Juho Kannala, Esa Rahtu
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
  • Unsupervised Learning for Graph Matching: An Attempt to Define and Extract Soft Attributed Patterns [pdf] - Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki
  • From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity [pdf] - Nam Vo, Aaron Bobick
#518 - Calibrating a non-isotropic near point light source using a plane [pdf]
Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon

Abstract: We show that a non-isotropic near point light source rigidly attached to a camera can be calibrated using multiple images of a weakly textured planar scene. We prove that if the radiant intensity distribution (RID) of a light source is radially symmetric with respect to its dominant direction, then the shading observed on a Lambertian scene plane is bilaterally symmetric with respect to a 2D line on the plane. The symmetry axis detected in an image provides a linear constraint for estimating the dominant light axis. The light position and RID parameters can then be estimated using a linear method as well. Specular highlights, if available, can also be used for light position estimation. We also extend our method to handle non-Lambertian surfaces, which we model using biquadratic BRDFs. We have evaluated our method on synthetic data. Our experiments on real scenes show that our method works well in practice and enables light calibration without the need for specialized hardware.
Similar papers:
  • Saliency Detection on Light Fields [pdf] - Nianyi Li, Jinwei Ye, Yu Ji, Haibin Ling, Jingyi Yu
  • Backscatter Compensated Photometric Stereo with 3 Sources [pdf] - Chourmouzios Tsiotsios, Maria Angelopoulou, Tae-Kyun Kim, Andrew Davison
  • Aliasing Detection and Reduction in Plenoptic Imaging [pdf] - Zhaolin Xiao, Qing Wang, Jingyi Yu, Guoqing Zhou
  • Deblurring Low-light Images with Light Streaks [pdf] - Zhe Hu, Sunghyun Cho, Jue Wang, Ming-Hsuan Yang
#519 - Exploiting Shading Cues in Kinect IR Images for Geometry Refinement [pdf]
Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon

Abstract: In this paper, we propose a method to refine the geometry of 3D meshes from Kinect Fusion by exploiting shading cues captured from the infrared (IR) camera of Kinect. A major benefit of using the Kinect IR camera instead of an RGB camera is that the IR images captured by Kinect are narrow-band images in which most undesired ambient light is filtered out, making our system robust to natural indoor illumination. We define a near-light IR shading model which describes the captured intensity as a function of surface normals, albedo, lighting direction, and distance between the light source and surface points. To resolve the ambiguity in our model between normals and distance, we utilize an initial 3D mesh from Kinect Fusion and multi-view information to reliably estimate surface details that were not reconstructed by Kinect Fusion. Our approach operates directly on the mesh model for geometry refinement. The effectiveness of our approach is demonstrated through several challenging real-world examples.
Similar papers:
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
  • High Quality Photometric Reconstruction using a Depth Camera [pdf] - Avishek Chatterjee, Sk Mohammadul Haque, Venu Madhav Govindu
  • Better Shading for Better Shape Recovery [pdf] - Moumen El-Melegy
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - Di Xu, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
#520 - Fine-Grained Visual Comparisons with Local Learning [pdf]
Aron Yu, Kristen Grauman

Abstract: Given two images, we want to predict which exhibits a particular visual attribute more than the other---even when the two images are quite similar. Existing relative attribute methods rely on global ranking functions; yet rarely will the visual cues relevant to a comparison be constant for all data, nor will humans' perception of the attribute necessarily permit a global ordering. To address these issues, we propose a local learning approach for fine-grained visual comparisons. Given a novel pair of images, we learn a local ranking model on the fly, using only analogous training comparisons. We show how to identify these analogous pairs using learned metrics. With results on three challenging datasets---including a large newly curated dataset for fine-grained comparisons---our method outperforms state-of-the-art methods for relative attribute prediction.
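A hedged sketch of the on-the-fly local ranking idea: find analogous training comparisons (plain Euclidean k-NN below, where the paper uses learned metrics) and fit a ranking model just for the test pair:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compare_local(x1, x2, pairs, labels, k=50):
    """pairs: (n, 2, d) training image pairs; labels[i] = 1 iff pairs[i, 0]
    shows more of the attribute than pairs[i, 1]. Returns True iff x1 > x2.
    Assumes the k nearest comparisons contain both outcomes."""
    test = np.concatenate([x1, x2])
    dist = np.linalg.norm(pairs.reshape(len(pairs), -1) - test, axis=1)
    idx = np.argsort(dist)[:k]                 # most analogous comparisons
    diffs = pairs[idx, 0] - pairs[idx, 1]      # rank via difference vectors
    clf = LogisticRegression().fit(diffs, labels[idx])
    return clf.decision_function((x1 - x2)[None, :])[0] > 0
```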
Similar papers:
  • Linear Ranking Analysis [pdf] - Weihong Deng, Jiani Hu, Jun Guo
  • Beyond Comparing Image Pairs: Setwise Active Learning for Relative Attributes [pdf] - Lucy Liang, Kristen Grauman
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
#525 - Compact Representation for Image Classification: To Choose or to Compress? [pdf]
Yu Zhang, Jianxin Wu, Jianfei Cai

Abstract: In large scale image classification, features such as Fisher vector or VLAD have achieved state-of-the-art results. However, the combination of large number of examples and high dimensional vectors necessitates dimensionality reduction, in order to reduce its storage and CPU costs to a reasonable range. In spite of the popularity of various feature compression methods, this paper argues that feature selection is a better choice than feature compression. We show that strong multicollinearity among feature dimensions may not exist, which undermines feature compression's effectiveness and renders feature selection a natural choice. We also show that many dimensions are noise and throwing them away is helpful for classification. We propose a supervised mutual information (MI) based importance sorting algorithm to choose features. Combining with 1-bit quantization, MI feature selection has achieved both higher accuracy and less computational cost than state-of-the-art feature compression methods such as product quantization and BPBC.
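A minimal sketch of the recipe as stated: rank dimensions by mutual information with the label, keep the top ones, then 1-bit quantize; thresholding signed feature values at zero is an assumption about the quantizer:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_and_binarize(X, y, n_keep):
    """X: (n_samples, n_dims) features (e.g. Fisher vectors), y: labels."""
    mi = mutual_info_classif(X, y, random_state=0)  # supervised MI per dimension
    keep = np.argsort(mi)[::-1][:n_keep]            # importance-sorted selection
    bits = (X[:, keep] > 0).astype(np.uint8)        # 1-bit quantization of
    return bits, keep                               # signed feature values
```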
Similar papers:
  • Distance Encoded Product Quantization [pdf] - Jae-Pil Heo, Zhe Lin, Sung-eui Yoon
  • Efficient feature extraction, encoding and classification for action recognition [pdf] - Vadim Kantorov, Ivan Laptev
  • Asymmetric sparse kernel approximations for large-scale visual search [pdf] - Damek Davis, Stefano Soatto, Jonathan Balzer
  • Additive Quantization for Extreme Vector Compression [pdf] - Artem Babenko, Victor Lempitsky
#535 - Alert: Predicting Failures [pdf]
Peng Zhang, Jiuling Wang, Ali Farhadi, Martial Hebert, Devi Parikh

Abstract: In real applications, not only is it important for computer vision systems to fail infrequently, it is also important for them to fail gracefully (e.g. with some warning). While the former has been the primary focus of the community, in this work, we hope to draw the community's attention to the latter problem. We introduce ALERT: a straightforward and general system that can predict the likely accuracy (or failure) of any computer vision system on an input instance. We promote two metrics to evaluate such failure prediction systems. We show that ALERT fares surprisingly well at these metrics on a variety of applications such as semantic segmentation, vanishing point and camera parameter estimation, and image memorability prediction. We also explore attribute prediction, where classifiers are typically meant to generalize to new unseen categories. We show that ALERT can be useful in predicting failures of this transfer. Finally, we leverage ALERT to improve the performance of a downstream application of attribute prediction: zero-shot learning. We show that ALERT can outperform several strong baselines for zero-shot learning on four datasets.
Similar papers:
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Discriminative Feature-to-Point Matching in Image-Based Localization [pdf] - Michael Donoser, Dieter Schmalstieg
#542 - Surface Registration by Optimization in Constrained Diffeomorphism Space [pdf]
Wei Zeng, Lok Ming Lui, Xianfeng Gu

Abstract: This work proposes a novel framework for optimization in a constrained diffeomorphism space for deformable surface registration. The registration is formulated as an optimization problem in a constrained diffeomorphism space. First, the diffeomorphism space is modeled as a special complex functional space on the source surface, the Beltrami coefficient space. The landmark constraints and the physical feasibility constraints define subspaces of the Beltrami coefficient space. Then the harmonic energy of the registration is minimized in the constrained subspaces. The minimization is achieved by alternating an optimization step and a projection step. The optimization step diffuses the Beltrami coefficient, and the projection step first deforms the conformal structure by the current Beltrami coefficient, then composes with a harmonic map from the deformed conformal structure to the target. The registration result is diffeomorphic, satisfies both the landmark constraints and the physical constraints, and minimizes the conformality distortion.
Similar papers:
  • Robust Surface Reconstruction via Triple Sparsity [pdf] - Hicham Badri, Hussein Yahia, Driss Aboutajdine
  • Point Matching in the Presence of Outliers in Both Point Sets: A Concave Optimization Approach [pdf] - Wei Lian, Lei Zhang
  • Fast Supervised Hashing with Decision Trees for High-Dimensional Data [pdf] - Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter
  • Saliency Detection on Light Fields [pdf] - Nianyi Li, Jinwei Ye, Yu Ji, Haibin Ling, Jingyi Yu
#553 - Low-Cost Compressive Sensing for Color Video and Depth [pdf]
Xin Yuan, Patrick Llull, Xuejun Liao, Jianbo Yang, David Brady, Guillermo Sapiro, Lawrence Carin

Abstract: A simple and inexpensive (low-power and low-bandwidth) modification is made to a conventional off-the-shelf color video camera, from which we recover multiple color frames for each of the original measured frames, and each of the recovered frames can be focused at a different depth. The recovery of multiple frames for each measured frame is made possible via high-speed coding, manifested via translation of a single coded aperture; the inexpensive translation is constituted by mounting the binary code on a piezoelectric device. To simultaneously recover depth information, a liquid lens is modulated at high speed, via a variable voltage. Consequently, during the aforementioned coding process, the liquid lens allows the camera to sweep the focus through multiple depths. In addition to designing and implementing the camera, fast recovery is achieved by an anytime algorithm exploiting the group-sparsity of wavelet/DCT coefficients.
Similar papers:
  • Fast and Robust Archetypal Analysis for Representation Learning [pdf] - Yuansi Chen, Julien Mairal, Zaid Harchaoui
  • Learning Inhomogeneous FRAME Models for Object Patterns [pdf] - Jianwen Xie, Wenze Hu, Song Chun Zhu, Ying Nian Wu
  • Bregman Divergences for Infinite Dimensional Covariance Matrices [pdf] - Mehrtash Harandi, Mathieu Salzmann, Fatih Porikli
  • Pseudoconvex Proximal Splitting for $L_\infty$ Problems in Multiview Geometry [pdf] - Anders Eriksson
#557 - Deformable Object Matching via Deformation Decomposition based 2D Label MRF [pdf]
Kangwei Liu, Junge Zhang, Kaiqi Huang, Tieniu Tan

Abstract: Deformable object matching, which is also called elastic matching or deformation matching, is an important and challenging problem in computer vision. Although numerous deformation models have been proposed in different tasks, not many of them investigate the intrinsic physics underlying deformation. Due to the lack of physical analysis, these models cannot describe the structure changes of deformable objects very well. Motivated by this, we analyze the deformation physically and propose a novel deformation decomposition model to represent various deformations. Based on the physical model, we formulate the matching problem as a two-dimensional label Markov Random Field. The MRF energy function is derived from the deformation decomposition model. Furthermore, we propose a two-stage method to optimize the MRF energy function. To provide a quantitative benchmark, we build a deformation matching database with an evaluation criterion. Experimental results show that our method outperforms previous approaches especially on complex deformations.
Similar papers:
  • A Procrustean Markov Process for Non-Rigid Structure Recovery [pdf] - Minsik Lee, Chong-Ho Choi, Songhwai Oh
  • Good Vibrations: A Modal Analysis Approach for Sequential Non-Rigid Structure from Motion [pdf] - Antonio Agudo, Lourdes Agapito, Begoña Calvo, Jose M. Montiel
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Detecting Objects using Deformation Dictionaries [pdf] - Bharath Hariharan, Piotr Dollar, Larry Zitnick
#558 - Similarity-Aware Patchwork Assembly for Depth Image Super-Resolution [pdf]
Jing Li, Zhichao Lu, Gang Zeng, Hongbin Zha

Abstract: This paper describes a patchwork assembly algorithm for depth image super-resolution. An input low-resolution depth image is disassembled into parts by matching similar regions on a set of high-resolution training images, and a super-resolution image is then assembled using these corresponding matched counterparts. We convert the super-resolution problem into a Markov random field labeling problem, and propose a unified formulation embedding (1) the consistency between the resolution-enhanced image and the original input, (2) the similarity of disassembled parts with the corresponding regions on training images, (3) the depth smoothness in local neighborhoods, (4) the additional geometric constraints from self-similar structures in the scene, and (5) the boundary coincidence between the resolution-enhanced depth image and an optional aligned high-resolution intensity image. Experimental results on both synthetic and real-world data demonstrate that the proposed algorithm is capable of recovering high-quality depth images with 4x resolution enhancement along each coordinate direction, and that it outperforms the state of the art [14] in both qualitative and quantitative evaluations.
Similar papers:
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • Super-resolving Appearance of 3D Deformable Shapes from Multiple Videos [pdf] - Jean-Sebastien Franco, Vagia Tsiminaki, Edmond Boyer
  • Multipoint Filtering with Local Polynomial Approximation and Range Guidance [pdf] - Xiao Tan, Changming Sun, Tuan Pham
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
#573 - Symmetry-Aware Isometric Matching of Incomplete 3D Surfaces [pdf]
Yusuke Yoshiyasu

Abstract: We present a nonrigid shape matching technique for establishing correspondences of incomplete 3D surfaces that exhibit intrinsic reflectional symmetry. We formulate the shape matching problem as a quadratic assignment problem (QAP) which incorporates point-wise and pairwise matching constraints. The key to solving the symmetry ambiguity problem is to define a point-wise constraint from a local descriptor that is sensitive to local asymmetry, such that we can discriminate global symmetry pairs, e.g. the left hand and the right hand. The proposed descriptor is based on a local depth map whose view-up direction is aligned with the gradient of a scalar field computed on the surface. Because this scalar field is smooth and isometric-invariant, the proposed descriptor is robust to isometric deformations as well as local geometric changes. Incompleteness of input surfaces is handled by constructing a pairwise constraint using the diffusion distance. Since we use a binary representation for the pairwise affinity, our technique is also robust to non-isometric deformations. To solve the QAP efficiently, we propose a graph matching algorithm called iterative spectral relaxation which combines spectral embedding and spectral graph matching. The benefit of this algorithm is its near global convergence, while retaining efficiency. Experimental results show that our method can match a wide range of models and achieve comparable results with other state-of-the-art techniques on a surface correspond
Similar papers:
  • In Search of Inliers: 3D Correspondence by Local and Global Voting [pdf] - Anders Buch, Yang Yang, Norbert Krüger, Henrik Petersen
  • Stable Template-Based Isometric 3D Reconstruction in All Imaging Conditions by Linear Least-Squares [pdf] - Ajad Chhatkuli, Daniel Pizarro, Adrien Bartoli, Toby Collins
  • Dense Non-Rigid Shape Correspondence using Random Forests [pdf] - Emanuele Rodola, Samuel Rota Bulò, Thomas Windheuser, Matthias Vestner, Daniel Cremers
  • User-Specific Hand Modeling from Monocular Depth Sequences [pdf] - Jonathan Taylor, Richard Stebbing, Varun Ramakrishna, Cem Keskin, Jamie Shotton, Shahram Izadi, Andrew Fitzgibbon, Aaron Hertzmann
#578 - Generalized Max Pooling [pdf]
Naila Murray, Florent Perronnin

Abstract: State-of-the-art patch-based image representations involve a pooling operation that aggregates statistics computed from local descriptors. Standard pooling operations include average and max pooling. Average pooling lacks discriminability because the resulting representation is strongly influenced by frequent yet often uninformative descriptors, but only weakly influenced by rare yet potentially highly-informative ones. Max pooling equalizes the influence of frequent and rare descriptors but is only applicable to representations that rely on count statistics, such as the bag-of-visual-words (BOV). We propose a novel pooling mechanism that involves re-weighting the per-patch statistics. It achieves the same equalization effect as max pooling but is applicable beyond the BOV and especially to the state-of-the-art Fisher Vector -- hence the name Generalized Max Pooling (GMP). We show on five public image classification benchmarks that the proposed GMP performs on par with, and sometimes significantly better than, heuristic alternatives.
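The GMP objective admits a closed-form sketch: find a pooled vector whose inner product with every patch encoding is constant, via ridge-regularized least squares. The dual form below assumes the number of patches N is small enough to solve an N x N system:

```python
import numpy as np

def gmp(Phi, lam=1.0):
    """Phi: (D, N) matrix of N patch encodings (e.g. per-patch Fisher vectors).
    Solves min_xi ||Phi^T xi - 1||^2 + lam * ||xi||^2 in the dual."""
    N = Phi.shape[1]
    alpha = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), np.ones(N))
    return Phi @ alpha   # pooled vector with (near-)equal patch influence
```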
Similar papers:
  • Learning Important Spatial Pooling Regions for Scene Classification [pdf] - Di Lin, Cewu Lu, Renjie Liao, Jiaya Jia
  • Bags of Spacetime Energies for Dynamic Scene Recognition [pdf] - Christoph Feichtenhofer, Axel Pinz, Richard Wildes
  • Learning Receptive Fields for Pooling from Tensors of Feature Response [pdf] - Can Xu, Nuno Vasconcelos
  • Ask the image: supervised pooling to preserve feature locality [pdf] - Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta, Francesca Odone
#583 - The Synthesizability of Texture Examples [pdf]
Dengxin Dai, Hayko Riemenschneider, Luc Van Gool

Abstract: While example-based texture synthesis (ETS) has been widely used to generate impressive high-quality textures of desired size, not all images are equally good as examples. In this paper we investigate the problem of predicting the synthesizability of a given image, i.e., how well it can be synthesized by ETS. We introduce a database (32,000 texture samples) in which all images have been annotated in terms of their synthesizability. We design a set of texture features, such as homogeneity, repetitiveness, and regularity, and train a predictor using these features on the data collection. This work is the first attempt to quantify this image property, and we find that the synthesizability of images can be learned and predicted. In experiments, we verify the effectiveness of several designed features, and verify the usefulness of image synthesizability for multiple applications: performing an initial selection of examples for large-scale texture synthesis, trimming images to parts that are more synthesizable, and serving as a feature for image recognition. We also suggest which texture synthesis method is best suited for synthesis of the given image.
Similar papers:
  • Probabilistic Active Appearance Models [pdf] - Joan Alabort-i-Medina, Stefanos Zafeiriou
  • Super-resolving Appearance of 3D Deformable Shapes from Multiple Videos [pdf] - Jean-Sebastien Franco, Vagia Tsiminaki, Edmond Boyer
  • Describing Textures in the Wild [pdf] - Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Andrea Vedaldi
  • Lacunarity Analysis on Image Patterns for Texture Classification [pdf] - Yuhui Quan, Yong Xu, Yuping Sun, Yu Luo
#585 - Unsupervised Learning of Dictionaries of Hierarchical Compositional Models [pdf]
Jifeng Dai, Yi Hong, Wenze Hu, Ying Nian Wu

Abstract: This paper proposes an unsupervised method for learning dictionaries of hierarchical compositional models for representing natural images. Each model is in the form of a template that consists of a small group of part templates that are allowed to shift their locations and orientations relative to each other, and each part template is in turn a composition of Gabor wavelets that are also allowed to shift their locations and orientations relative to each other. Given a set of unannotated training images, a dictionary of such hierarchical templates is learned so that each training image can be represented by a small number of templates that are spatially translated, rotated and scaled versions of the templates in the learned dictionary. The learning algorithm iterates between the following two steps: (1) Image encoding by a template matching pursuit process that involves a bottom-up template matching sub-process and a top-down template localization sub-process. (2) Dictionary re-learning by a shared matching pursuit process. Experimental results show that the proposed approach is capable of learning meaningful templates, and the learned templates are useful for tasks such as domain adaptation and image cosegmentation.
Similar papers:
  • Detecting Objects using Deformation Dictionaries [pdf] - Bharath Hariharan, Piotr Dollar, Larry Zitnick
  • Analysis by Synthesis: Object Recognition by Object Reconstruction [pdf] - Mohsen Hejrati, Deva Ramanan
  • Learning Inhomogeneous FRAME Models for Object Patterns [pdf] - Jianwen Xie, Wenze Hu, Song Chun Zhu, Ying Nian Wu
  • A Novel Chamfer Template Matching Method Using Variational Mean Field [pdf] - Thanh Nguyen
#589 - Temporal Segmentation of Egocentric Videos [pdf]
Chetan Arora, Yair Poleg, Shmuel Peleg

Abstract: The use of wearable cameras makes it possible to record life-logging egocentric videos. Browsing such long unstructured videos is time consuming and tedious. Segmentation into meaningful chapters is an important first step towards adding structure to egocentric videos, enabling efficient browsing, indexing and summarization of the long videos. Two sources of information for video segmentation are (i) the motion of the camera wearer, and (ii) the objects and activities recorded in the video. In this paper we address the motion cues for video segmentation. Motion-based segmentation is especially difficult in egocentric videos when the camera is constantly moving due to natural head movement of the wearer. We propose a robust temporal segmentation of egocentric videos into a hierarchy of motion classes using a new {\em Array of Motion Integrators}. Unlike instantaneous motion vectors, segmentation using integrated motion vectors performs well even in dynamic and crowded scenes. No assumptions are made on the underlying scene structure and the algorithm works in indoor as well as outdoor situations. We demonstrate the effectiveness of our approach using publicly available videos as well as choreographed videos. An approach is also presented to compute the fixation of the wearer's gaze in the walking portions of the egocentric videos.
Similar papers:
  • Complex Activity Recognition using Granger Constrained DBN (GCDBN) in Sports and Surveillance Video [pdf] - Eran Swears, Anthony Hoogs, Qiang Ji, Kim Boyer
  • Head Pose Estimation Based on Multivariate Label Distribution [pdf] - Xin Geng, Yu Xia
  • Geometric Generative Gaze Estimation (G3E) for Remote RGB-D Cameras [pdf] - Kenneth Funes Mora, Jean-Marc Odobez
  • Learning-by-Synthesis for Appearance-based 3D Gaze Estimation [pdf] - Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato
#591 - Efficient Localization with Fisher Vectors using Approximate Normalizations [pdf]
Dan Oneata, Jakob Verbeek, Cordelia Schmid

Abstract: The Fisher vector (FV) representation is a high-dimensional extension of the popular bag-of-word representation. Transformation of the FV by power and $\ell_2$ normalizations has been shown to significantly improve its performance. With these normalizations included, this representation has yielded state-of-the-art results for a wide range of image and video classification and retrieval tasks. The normalizations, however, render the representation non-additive over local descriptors. Combined with its high dimensionality, this makes the FV computationally very expensive for localization tasks. In this paper we first present approximations to both of these normalizations, which yield significant improvements in the memory requirements and computational costs of the FV when used for localization. Second, we show how these approximations can be used to define upper bounds on the score function that can be efficiently evaluated, which paves the way for the use of branch-and-bound search as an alternative to exhaustive scanning window search. We present experimental evaluation results on classification and temporal localization of actions in videos. These show that the proposed approximations lead to speed-ups of at least one order of magnitude, while maintaining state-of-the-art action localization performance.
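For reference, the two normalizations being approximated, in a few lines of numpy; applying them after aggregating per-descriptor contributions is exactly what breaks additivity over local descriptors:

```python
import numpy as np

def normalize_fv(fv, alpha=0.5):
    """Signed power ('square-rooting') then l2 normalization of a Fisher vector."""
    fv = np.sign(fv) * np.abs(fv) ** alpha   # power normalization
    return fv / max(np.linalg.norm(fv), 1e-12)   # l2 normalization
```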
Similar papers:
  • Efficient feature extraction, encoding and classification for action recognition [pdf] - Vadim Kantorov, Ivan Laptev
  • Fisher and VLAD with FLAIR [pdf] - Koen Van de Sande, Cees Snoek, Arnold Smeulders
  • Associative embeddings for large-scale knowledge transfer with self-assessment [pdf] - Alexander Vezhnevets, Vittorio Ferrari
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
#592 - Fast Approximate Inference in Higher Order MRF-MAP Labeling Problems [pdf]
Chetan Arora, S.N. Maheshwari, Subhashis Banerjee, Prem Kalra

Abstract: Use of higher order clique potentials for modeling inference problems has exploded in the last few years. The algorithmic schemes proposed so far do not scale well with increasing clique size, thus limiting their usage to cliques of size 4 in practice. Generic Cuts (GC) of Arora et al. [8] shows that when potentials are submodular, inference problems can be solved optimally in polynomial time for fixed-size cliques. In this paper we report an algorithm called Approximate Cuts (AC), which uses a generalization of the gadget of GC and provides an approximate solution to inference in 2-label MRF-MAP problems with cliques of size k ≥ 2. The algorithm gives the optimal solution for submodular potentials. When potentials are non-submodular, we show that important properties such as weak persistency hold for the solution inferred by AC. AC is a polynomial-time primal-dual approximation algorithm for fixed clique size. We show experimentally that AC not only provides significantly better solutions in practice, it is hundreds of times faster than message-passing schemes like Dual Decomposition [20] and TRWS [17] or reduction-based techniques like [10, 13, 15].
Similar papers:
  • FastSeg: More Efficiency on Multiple Figure-Ground Segmentations [pdf] - Ahmad Humayun, Fuxin Li, James Rehg
  • Higher-Order Clique Reduction Without Auxiliary Variables [pdf] - Hiroshi Ishikawa
  • A Principled Approach for Coarse-to-Fine MAP Inference [pdf] - Christopher Zach
  • Multi Label Generic Cuts: Optimal Inference in Multi Label Multi Clique MRF-MAP Problems [pdf] - Chetan Arora, S.N. Maheshwari
#594 - Multi Label Generic Cuts: Optimal Inference in Multi Label Multi Clique MRF-MAP Problems [pdf]
Chetan Arora, S.N. Maheshwari

Abstract: We propose an algorithm called Multi Label Generic Cuts (MLGC) for computing optimal solutions to MRF-MAP problems with submodular multi-label multi-clique potentials. A transformation is introduced to convert an m-label k-clique problem to an equivalent 2-label (mk)-clique problem. We show that if the original multi-label problem is submodular then the transformed 2-label multi-clique problem is also submodular. We exploit sparseness in the feasible configurations of the transformed 2-label problem to suggest an improvement to Generic Cuts [3] to solve the 2-label problems efficiently. The algorithm runs in time O(m^k n^3) in the worst case (k is the order of cliques, m is the number of labels and n is the number of pixels), generalizing the O(2^k n^3) running time of Generic Cuts. We show experimentally that MLGC is an order of magnitude faster than the current state of the art [17, 19]. While the result of MLGC is optimal for submodular clique potentials, it is significantly better than the compared methods even for problems with non-submodular clique potentials.
Similar papers:
  • Efficient Squared Curvature [pdf] - Claudia Nieuwenhuis, Eno Toeppe, Lena Gorelick, Olga Veksler, Yuri Boykov
  • Higher-Order Clique Reduction Without Auxiliary Variables [pdf] - Hiroshi Ishikawa
  • A Principled Approach for Coarse-to-Fine MAP Inference [pdf] - Christopher Zach
  • Fast Approximate Inference in Higher Order MRF-MAP Labeling Problems [pdf] - Chetan Arora, S.N. Maheshwari, Subhashis Banerjee, Prem Kalra
#595 - Submodular Object Recognition [pdf]
Fan Zhu, Zhuolin Jiang, Ling Shao

Abstract: We present a novel object recognition framework based on multiple figure-ground hypotheses with large object spatial support, generated by bottom-up processes and mid-level cues in an unsupervised manner. We exploit the benefit of regression for discriminating segment categories and qualities, where a regressor is trained for each category using the overlap between each figure-ground segment hypothesis and the ground truth of the target category in an image. Object recognition is achieved by maximizing a submodular objective function, which maximizes the similarities between the selected segments (i.e., facility locations) and their group elements (i.e., clients), penalizes the number of selected segments, and, more importantly, encourages the consistency of object categories corresponding to maximum regression values from different category-specific regressors for the selected segments. The proposed framework achieves impressive recognition results on three benchmark datasets, including PASCAL VOC 2007, Caltech-101 and ETHZ-shape.
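
A minimal sketch of greedily maximizing a facility-location-style submodular objective, the backbone of the selection step described above; the category-consistency term is omitted, and `sim` and `penalty` are illustrative stand-ins:

```python
import numpy as np

def greedy_facility_location(sim, penalty):
    """Greedily maximize the facility-location objective
    f(S) = sum_i max_{j in S} sim[i, j] - penalty * |S|,
    which is submodular, so greedy selection carries approximation
    guarantees. sim[i, j]: similarity of element i to candidate segment j."""
    n = sim.shape[1]
    selected, covered = [], np.zeros(sim.shape[0])
    while len(selected) < n:
        gains = [(np.maximum(covered, sim[:, j]).sum() - covered.sum() - penalty, j)
                 for j in range(n) if j not in selected]
        gain, j = max(gains)
        if gain <= 0:                     # no candidate improves the objective
            break
        selected.append(j)
        covered = np.maximum(covered, sim[:, j])
    return selected

sim = np.random.rand(50, 50)              # element-to-segment similarities
print(greedy_facility_location(sim, penalty=5.0))
```
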
Similar papers:
  • Local Readjustment for High-Resolution 3D Reconstruction [pdf] - Siyu Zhu, Tian Fang, Jianxiong Xiao, Long Quan
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
  • A Bayesian Framework For the Local Configuration of Retinal Junctions [pdf] - Touseef Qureshi, Andrew Hunter, Bashir Al-Diri
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
#598 - A Probabilistic Framework for Multitarget Tracking with Mutual Occlusions [pdf]
Menglong Yang, Yiguang Liu, Stan Li

Abstract: Mutual occlusions among targets can cause track loss or target position deviation. This is because the observation likelihood of an occluded target can vanish even when we have the estimated location of the target. This paper presents a novel probabilistic framework for multitarget tracking with mutual occlusions. The primary contribution of this work is the introduction of a vectorial occlusion variable as part of the solution. The occlusion variable describes the occlusion states of the targets. This forms the basis of the proposed probabilistic framework, with the following further contributions: 1) Likelihood: A new observation likelihood model is presented, in which the likelihood of an occluded target is computed by referring to both the occluded and occluding targets. 2) Prior: A Markov random field (MRF) is used to model the occlusion prior such that less likely "circular" or "cascading" types of occlusions have lower prior probabilities. Both the occlusion prior and the motion prior take into consideration the state of occlusion. 3) Optimization: A real-time RJMCMC-based algorithm with a new move type called "occlusion state update" is presented. Experiments are performed in comparison with several state-of-the-art algorithms. Results show that the proposed framework can handle occlusions well, including even long durations of full occlusion, which may cause tracking failures in traditional methods.
Similar papers:
  • Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf] - Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler
  • Occlusion Geodesics for Online Multi-Object Tracking [pdf] - Horst Possegger, Thomas Mauthner, Peter Roth, Horst Bischof
  • Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model [pdf] - Golnaz Ghiasi, Charless Fowlkes
  • Parsing Occluded People [pdf] - Golnaz Ghiasi, Yi Yang, Deva Ramanan, Charless Fowlkes
#600 - Multi-fold MIL Training for Weakly Supervised Object Localization [pdf]
Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid

Abstract: Object category localization is a challenging and fundamental problem in computer vision. Standard supervised training requires bounding box annotations of object instances. The time-consuming manual annotation process is sidestepped in weakly supervised learning. In this case, the supervised information is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. We represent detection windows using the powerful Fisher vector representation, and reduce the storage and computational costs using a selective search strategy. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when high-dimensional representations, such as the Fisher vector, are used. We present a detailed experimental evaluation using the VOC 2007 dataset. Compared to state-of-the-art weakly supervised detectors, our approach better localizes objects in the training images, which translates into an improvement of detection performance from 15.0% to 22.4% mAP.
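
A runnable toy illustrating the multi-fold relocalization loop described above, with stand-in random features and a nearest-mean scorer in place of Fisher-vector windows and an SVM detector:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pos, n_cand, d, K, n_iters = 20, 30, 16, 5, 3
candidates = rng.normal(size=(n_pos, n_cand, d))   # windows per positive image

def train_detector(feats):
    return feats.mean(axis=0)                       # stand-in for SVM training

def relocalize(detector, windows):
    return int(np.argmax(windows @ detector))       # best-scoring window

locations = np.zeros(n_pos, dtype=int)              # current window per image
folds = np.array_split(np.arange(n_pos), K)
for _ in range(n_iters):
    for k in range(K):
        # Train on all folds except k and relocalize only the held-out fold,
        # so an image's own (possibly wrong) window never reinforces itself.
        train_idx = np.hstack([f for j, f in enumerate(folds) if j != k])
        det = train_detector(candidates[train_idx, locations[train_idx]])
        for i in folds[k]:
            locations[i] = relocalize(det, candidates[i])
print(locations)
```
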
Similar papers:
  • Efficient Localization with Fisher Vectors using Approximate Normalizations [pdf] - Dan Oneata, Jakob Verbeek, Cordelia Schmid
  • Associative embeddings for large-scale knowledge transfer with self-assessment [pdf] - Alexander Vezhnevets, Vittorio Ferrari
  • Tell Me What You See and I will Show You Where It Is [pdf] - Jia Xu, Alexander Schwing, Raquel Urtasun
  • Object Classification with Adaptive Regions [pdf] - Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool
#605 - Fast Supervised Hashing with Decision Trees for High-Dimensional Data [pdf]
Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter

Abstract: Supervised hashing aims to map the original features to compact binary codes that preserve label-based similarity in the Hamming space. Non-linear hash functions have demonstrated an advantage over linear ones due to their powerful generalization capability. In the literature, kernel functions are typically used to achieve non-linearity in hashing, which achieves encouraging retrieval performance at the price of slow evaluation and training. For the first time, we propose to use boosted decision trees to achieve non-linearity in hashing; they are fast to train and evaluate, and hence more suitable for hashing with high-dimensional data. We separate the problem of learning hash functions into two independent sub-problems: binary code inference (via efficient graph cuts) and training of boosted decision trees by fitting the binary codes. Experiments demonstrate that our proposed method significantly outperforms most state-of-the-art methods in retrieval precision and training time. Especially for high-dimensional data, our method is orders of magnitude faster than many methods in terms of training time.
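
A minimal sketch of the second sub-problem under simplifying assumptions: random stand-in binary codes replace the graph-cut inference of the first step, and one scikit-learn boosted-tree classifier is fit per bit:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

n, d, n_bits = 500, 100, 8
X = np.random.randn(n, d)
codes = (np.random.randn(n, n_bits) > 0).astype(int)  # stand-in binary codes

# One boosted-tree hash function per bit, fit to the inferred codes.
hash_functions = [
    GradientBoostingClassifier(n_estimators=50, max_depth=2).fit(X, codes[:, b])
    for b in range(n_bits)
]

def hash_point(x):
    """Map a feature vector to its n_bits-bit binary code."""
    return np.array([h.predict(x.reshape(1, -1))[0] for h in hash_functions])

print(hash_point(X[0]))
```
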
Similar papers:
  • Adaptive Object Retrieval with Kernel Reconstructive Hashing [pdf] - Haichuan Yang, Xiao Bai, Jun Zhou, Peng Ren, Jian Cheng, Zhihong Zhang
  • Collaborative Hashing [pdf] - Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang
  • Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification [pdf] - Yadong MU, Gang Hua, Wei Fan, Shi-Fu Chang
  • Collective Matrix Factorization Hashing for Multimodal Data [pdf] - Guiguang Ding, Yuchen Guo, Jile Zhou
#607 - Point Matching in the Presence of Outliers in Both Point Sets: A Concave Optimization Approach [pdf]
Wei Lian, Lei Zhang

Abstract: Recently, a concave optimization approach has been proposed to solve the robust point matching (RPM) problem. This method is globally optimal, but it requires that each point in the model point set has a counterpart in the data point set. Unfortunately, such a requirement may not be satisfied in some applications due to the presence of outliers in both point sets. To address this problem, we drop this condition and reduce the objective function of RPM to a function with few nonlinear terms by eliminating the transformation variables. The resulting function, however, is no longer quadratic. We prove that it is still concave over the feasible region of point correspondence. The branch-and-bound algorithm can then be used for optimization. To improve the efficiency of the branch-and-bound algorithm, whose bottleneck lies in the computation of the lower bound, we propose a new lower bounding scheme which has a k-cardinality linear assignment formulation and can be efficiently solved. Experimental results demonstrate that the proposed concave optimization algorithm outperforms state-of-the-art methods in robustness to disturbances and point matching accuracy.
Similar papers:
  • Efficient pruning LMI conditions for Branch-and-Prune Rank and Chirality-Constrained Estimation of the Dual Absolute Quadric [pdf] - Adlane Habed, Danda Pani Paudel, Cédric Demonceaux, David Fofi
  • Simplex-Based 3D Spatio-Temporal Feature Description for Action Recognition [pdf] - Hao Zhang, Wenjun Zhou, Christopher Reardon, Lynne Parker
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
#621 - Robust Surface Reconstruction via Triple Sparsity [pdf]
Hicham Badri, Hussein Yahia, Driss Aboutajdine

Abstract: Reconstructing a surface/image from corrupted gradient fields is a crucial step in many imaging applications where a gradient field is subject to both noise and unlocalized outliers, resulting typically in a non-integrable field. We present in this paper a new optimization method for robust surface reconstruction. The proposed formulation is based on a triple sparsity prior: a sparse prior on the residual gradient field and a double sparse prior on the surface itself. We develop an efficient alternate minimization strategy to solve the proposed optimization problem. The method is able to recover a good quality surface from severely corrupted gradients thanks to its ability to handle both noise and outliers. We demonstrate the performance of the proposed method on synthetic and real data. Experiments show that the proposed solution outperforms some existing methods in the three possible cases: noise only, outliers only and mixed noise/outliers.
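
For orientation, a minimal non-robust baseline: the classic least-squares (Frankot-Chellappa) integrator that robust formulations such as the paper's triple-sparsity prior are designed to improve upon when the gradients contain outliers:

```python
import numpy as np

def integrate_gradients(p, q):
    """Least-squares integration of a (possibly non-integrable) gradient
    field (p = dZ/dx, q = dZ/dy) via the classic Frankot-Chellappa
    frequency-domain projection."""
    h, w = p.shape
    wx = np.fft.fftfreq(w) * 2 * np.pi
    wy = np.fft.fftfreq(h) * 2 * np.pi
    u, v = np.meshgrid(wx, wy)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u ** 2 + v ** 2
    denom[0, 0] = 1.0                      # avoid division by zero at DC
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                          # the mean height is unconstrained
    return np.real(np.fft.ifft2(Z))

# Toy usage: noisy gradients of the paraboloid z = x^2 + y^2.
y, x = np.mgrid[0:64, 0:64] / 64.0
p, q = 2 * x + 0.05 * np.random.randn(64, 64), 2 * y
print(integrate_gradients(p, q).shape)
```
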
Similar papers:
  • Timing-Based Local Descriptor for Dynamic Surfaces [pdf] - Tony Tung, Takashi Matsuyama
  • Quality-based Multimodal Classification Using Tree-Structured Sparsity [pdf] - Soheil Bahrampour, Asok Ray, nasser Nasrabadi, Kenneth Jenkins
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
#622 - Robust Estimation of 3D Human Poses from Single Images [pdf]
CHUNYU WANG, Yizhou Wang, Zhouchen Lin, Alan Yuille, Wen Gao

Abstract: Human pose estimation is a key step in action recognition. We propose a method for estimating 3D human poses from single images, which works in conjunction with an existing 2D pose/joint detector. 3D pose estimation is challenging because multiple 3D poses may correspond to the same 2D pose after projection due to the lack of depth information. Moreover, current 2D pose estimators are usually inaccurate, which may cause large errors in 3D pose estimation. We address these challenges in three ways: (i) We represent a 3D pose as a linear combination of a sparse set of bases learned from 3D human skeletons. (ii) We enforce limb length constraints to eliminate anthropomorphically implausible poses. (iii) We estimate a 3D pose by minimizing the $L_1$-norm error between the projection of the 3D joints and the corresponding 2D detections. The $L_1$-norm loss term is robust to inaccurate 2D joint estimates. We use the alternating direction method (ADM) to solve the $L_1$ minimization problem efficiently. Our approach outperforms the state of the art on three benchmark datasets.
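
A minimal sketch of ingredients (i) and (iii) under a simplified orthographic projection, with random stand-ins for the learned bases and 2D detections, and a generic solver in place of the paper's ADM:

```python
import numpy as np
from scipy.optimize import minimize

n_joints, n_bases = 15, 10
B = np.random.randn(n_bases, 3 * n_joints)          # stand-in 3D pose bases
x2d = np.random.randn(2, n_joints)                  # stand-in 2D detections

def objective(c, lam=0.1):
    pose3d = (c @ B).reshape(3, n_joints)           # sparse basis combination
    proj = pose3d[:2]                               # orthographic projection
    # L1 reprojection error (robust to bad detections) + sparsity on c.
    return np.abs(proj - x2d).sum() + lam * np.abs(c).sum()

res = minimize(objective, np.zeros(n_bases), method="Powell")
print(res.fun)
```
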
Similar papers:
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Real-time Simultaneous Pose and Shape Estimation for Articulated Objects with a Single Depth Camera [pdf] - Mao Ye, Ruigang Yang
  • Posebits for Monocular Pose Estimation [pdf] - Gerard Pons-Moll, Bodo Rosenhahn, David Fleet
  • Mixing Body-Part Sequences for Human Pose Estimation [pdf] - Anoop Cherian, Julien Mairal, Karteek Alahari, Cordelia Schmid
#624 - A Minimal Solution to the Generalized Pose-and-Scale Problem [pdf]
Jonathan Ventura, Clemens Arth, Gerhard Reitmayr, Dieter Schmalstieg

Abstract: We propose a solution to a novel generalized camera pose problem which includes the internal scale of the generalized camera as an unknown parameter. This further generalization of the well-known absolute camera pose problem has applications in multi-frame loop closure. While a well-calibrated camera rig has a fixed and known scale, camera trajectories produced by monocular motion estimation necessarily lack a scale estimate. Thus, when performing loop closure in monocular visual odometry, or registering separate structure-from-motion reconstructions, we must estimate a seven degree-of-freedom similarity transform from corresponding observations. Existing approaches solve this problem, in specialized configurations, by aligning 3D triangulated points or individual camera pose estimates. Our approach handles general configurations of rays and points and directly estimates the full similarity transformation from the 2D-3D correspondences. Four correspondences are needed in the minimal case, which has eight possible solutions. The minimal solver can be used in a hypothesize-and-test architecture for robust transformation estimation. Our solver also produces a least-squares estimate in the overdetermined case. The approach is evaluated experimentally on synthetic and real datasets, and is shown to produce higher accuracy solutions to multi-frame loop closure than existing approaches.
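
For intuition about the 7-DoF similarity transform being estimated, a standard least-squares 3D-3D alignment (the Umeyama/Horn closed form) is sketched below; note this is a stand-in to make the transform concrete, not the paper's minimal 2D-3D solver:

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares 7-DoF similarity transform (scale s, rotation R,
    translation t) aligning 3xN point sets so that dst ~ s * R @ src + t."""
    mu_s, mu_d = src.mean(1, keepdims=True), dst.mean(1, keepdims=True)
    src_c, dst_c = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(dst_c @ src_c.T)
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    R = U @ D @ Vt
    s = (S * np.diag(D)).sum() / (src_c ** 2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Toy usage: recover a known similarity transform from 10 correspondences.
src = np.random.randn(3, 10)
s, R, t = 0.5, np.linalg.qr(np.random.randn(3, 3))[0], np.random.randn(3, 1)
if np.linalg.det(R) < 0:
    R = -R
print(umeyama_similarity(src, s * R @ src + t))
```
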
Similar papers:
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svrm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
  • A General and Simple Method for Camera Pose and Focal Length Determination [pdf] - Yinqiang Zheng, Shigeki Sugimoto, Imari Sato, Masatoshi Okutomi
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
  • Relative Pose Estimation for a Multi-Camera System with Known Vertical Direction [pdf] - Gim Hee Lee, Marc Pollefeys, Friedrich Fraundorfer
#631 - Deep Fisher Kernels [pdf]
Mayu Sakurada, Vladyslav Sydorov, Christoph Lampert

Abstract: Fisher Kernels and Deep Belief Networks were two developments with significant impact on large-scale object categorization in recent years. Both approaches were shown to achieve state-of-the-art results on large-scale object categorization datasets, such as ImageNet. Conceptually, however, they are perceived as very different and it is not uncommon for heated debates to spring up when advocates of both paradigms meet at conferences or workshops. In this work, we emphasize the similarities between both architectures rather than their differences, and we argue that such a unified view allows us to transfer ideas from one domain to the other. As a concrete example we introduce a training method that learns a support vector machine classifier with Fisher kernel at the same time as a task-specific data representation. The basis for this is a reinterpretation of a support vector classifier with Fisher kernel as a multi-layer feed-forward network. Its final layer is the classifier, parameterized by a weight vector, and the two previous layers compute Fisher vectors, parameterized by the coefficients of a Gaussian mixture model. We introduce a gradient-descent based learning algorithm that, in contrast to other feature learning techniques, is not just derived from intuition or biological analogy, but has a theoretical justification in the framework of statistical learning theory. Our experiments show that the new training procedure leads to significant improvements in classification.
Similar papers:
  • Learning Fine-grained Image Similarity with Deep Ranking [pdf] - Jiang Wang, Yang Song, Thomas Leung, Charles Rosenberg, James Philbin, Bo Chen, Ying Wu
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
  • Discriminative Deep Metric Learning for Face Verification in the Wild [pdf] - Junlin Hu, Jiwen Lu, Yap-Peng Tan
  • Fisher and VLAD with FLAIR [pdf] - Koen Van de Sande, Cees Snoek, Arnold Smeulders
#637 - MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation [pdf]
Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Zhuowen Tu

Abstract: Interactive segmentation, in which a user provides a bounding box around an object of interest for image segmentation, has been applied to a variety of applications in image editing, crowdsourcing, computer vision, and medical imaging. The challenge of this semi-automatic image segmentation task is to deal with the uncertainty of the foreground object within the bounding box. Here, we turn the interactive image segmentation problem into a multiple instance learning (MIL) formulation, named MILCut, by generating positive bags from pixels of sweeping lines within the bounding box. We provide a justification for our formulation and develop an algorithm with significant performance and efficiency gains over existing state-of-the-art systems. The results on two benchmark datasets for interactive segmentation demonstrate the evident advantage of our approach.
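
A minimal sketch of the bag construction: each sweeping line inside the box forms a positive bag (it must cross the object somewhere), while pixels outside the box serve as negatives; `features` and the box coordinates are illustrative placeholders:

```python
import numpy as np

def sweeping_line_bags(features, box):
    """Build MIL bags from an HxWxD per-pixel feature map and a user box
    (x0, y0, x1, y1): one positive bag per horizontal line inside the box,
    individual negative examples from pixels outside it."""
    x0, y0, x1, y1 = box
    positive_bags = [features[y, x0:x1] for y in range(y0, y1)]
    mask = np.ones(features.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = False
    negatives = features[mask]
    return positive_bags, negatives

features = np.random.rand(100, 120, 16)
bags, negs = sweeping_line_bags(features, (30, 20, 90, 80))
print(len(bags), negs.shape)
```
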
Similar papers:
  • An Exemplar-based CRF for Multi-instance Object Segmentation [pdf] - Xuming He, Stephen Gould
  • Beat the MTurkers: Automatic Image Labeling from Weak 3D Supervision [pdf] - Liang-Chieh Chen, Sanja Fidler, Alan Yuille, Raquel Urtasun
  • Visual Tracking Using Pertinent Patch Selection and Masking [pdf] - Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
#638 - Beyond Comparing Image Pairs: Setwise Active Learning for Relative Attributes [pdf]
Lucy Liang, Kristen Grauman

Abstract: It is useful to automatically compare images based on their visual properties---for example, to predict which image is brighter, more feminine, or more blurry. However, comparative models are inherently more costly to train than their classification counterparts. Manually labeling all pairwise comparisons is intractable, so which pairs should a human supervisor compare? We explore active learning strategies for training relative attribute ranking functions, with the goal of requesting human comparisons only where they are most informative. We introduce a novel setwise criterion that requests a partial ordering for a set of examples that minimizes the cumulative rank margin in attribute space, subject to a visual diversity constraint. The setwise criterion helps amortize effort by identifying mutually informative comparisons, and the diversity requirement safeguards against requests a human viewer will find ambiguous. We develop an efficient strategy to search for sets that meet this criterion. On three challenging datasets, the proposed method outperforms existing active rank learning methods, demonstrating the importance of focusing attention when learning comparative attribute models.
Similar papers:
  • Linear Ranking Analysis [pdf] - Deng Weihong, Jiani Hu, Jun Guo
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Relative Parts: Disctinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
#639 - Immediate, scalable object category detection [pdf]
Yusuf Aytar, Andrew Zisserman

Abstract: The objective of this work is object category detection in large-scale image datasets, where the object category is specified by a sliding window HOG classifier, and retrieval should be immediate at run time in the manner of Video Google. We make the following three contributions: (i) a new image representation based on mid-level discriminative patches, designed to be suited to immediate object category detection and inverted file indexing; (ii) a sparse representation of a HOG classifier using a set of mid-level discriminative classifier patches; and (iii) a fast method for spatially reranking images based on their detections. We evaluate the detection method on the standard PASCAL VOC 2007 dataset, together with an 85K image subset of ImageNet, and demonstrate near state-of-the-art detection performance at low ranks whilst maintaining immediate retrieval speeds. Applications are also demonstrated using an exemplar-SVM for pose matched retrieval.
Similar papers:
  • Real-time Simultaneous Pose and Shape Estimation for Articulated Objects with a Single Depth Camera [pdf] - Mao Ye, Ruigang Yang
  • Packing and Padding: Coupled Multi-index for Accurate Image Retrieval [pdf] - Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
  • Unsupervised Learning of Dictionaries of Hierarchical Compositional Models [pdf] - Jifeng Dai, Yi Hong, WENZE Hu, Ying Nian Wu
  • A Novel Chamfer Template Matching Method Using Variational Mean Field [pdf] - Thanh Nguyen
#642 - Blind Multi-Image Restoration [pdf]
Haichao Zhang

Abstract: Capturing multiple images is a simple way to increase the chance of capturing a good photo with a light-weight hand-held camera, for which camera-shake blur is typically a nuisance. The naive approach of selecting the single best captured photo as output does not take full advantage of all the observations. Conventional multi-image blind deblurring methods can take all observations as input but usually require that the multiple images be well aligned. However, multiple blurry images captured in the presence of camera shake are rarely free from misalignment. Registering multiple blurry images is challenging due to the presence of blur, while deblurring multiple blurry images requires accurate alignment, making the two tasks intrinsically coupled. In this paper, we propose a blind multi-image restoration method which achieves joint alignment and non-uniform deblurring, together with resolution enhancement, from multiple low-quality images. Experiments on several real-world images, with comparisons to previous methods, validate the effectiveness of the proposed method.
Similar papers:
  • Gyro-Based Multi-Image Deconvolution for Removing Handshake Blur [pdf] - Sung Hee Park, Marc Levoy
  • Total Variation Blind Deconvolution: The Devil is in the Details [pdf] - Daniele Perrone, Paolo Favaro
  • Separable Kernel for Image Deblurring [pdf] - Lu Fang, Haifeng Liu, Feng Wu
  • Joint Depth Estimation and Camera Shake Removal from Single Blurry Image [pdf] - Zhe Hu, Li Xu, Ming-Hsuan Yang
#643 - Data-driven Flower Petal Modeling with Botany Priors [pdf]
Chenxi Zhang, Mao Ye, BO FU, Ruigang Yang

Abstract: In this paper we focus on the 3D modeling of flowers, in particular the petals. The complex structure, severe occlusions, and wide variations make the reconstruction of their 3D models a challenging task. Therefore, even though a flower is the most distinctive part of a plant, there has been little modeling study devoted to it. We overcome these challenges by combining data-driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, our method starts with a level-set based segmentation of each individual petal, using both appearance and position information. Each segmented petal is then fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned exemplar petals. Novel constraints based on botany studies, such as the number and spatial layout of petals, are incorporated into the fitting process for realistically reconstructing occluded regions and maintaining correct 3D spatial relations. Finally, the reconstructed petal shape is texture-mapped using the registered color images, with occluded regions filled in by content from visible ones. Experiments show that our approach can obtain realistic models of flowers with noticeable occlusions and shape variations, and is invariant to flower size.
Similar papers:
  • Multi-feature Spectral Clustering with Minimax Optimization [pdf] - Hongxing Wang, Chaoqun Weng, Junsong Yuan
  • Quality Dynamic Human Body Modeling Using a Single Low-cost Depth Camera [pdf] - Qing Zhang, BO FU
  • Probabilistic Active Appearance Models [pdf] - Joan Alabort-i-Medina, Stefanos Zafeiriou
  • BirdMachine: Large-scale Fine-grained Visual Categorization of Birds [pdf] - Thomas Berg, Jiongxin Liu, Seung Woo Lee, Michelle Alexander, David Jacobs, Peter Belhumeur
#644 - Saliency Detection on Light Fields [pdf]
Nianyi Li, Jinwei Ye, Yu Ji, Haibin Ling, Jingyi Yu

Abstract: Existing saliency detection approaches use images as inputs and are sensitive to foreground/background similarities, complex background textures, and occlusions. We explore the problem of using light fields as input for saliency detection. Our technique is enabled by the availability of commercial plenoptic cameras that capture the light field of a scene in a single shot. We show that the unique refocusing capability of light fields provides useful focusness, depth, and objectness cues. We further develop a new saliency detection algorithm tailored for light fields. To validate our approach, we acquire a light field database of a range of indoor and outdoor scenes and generate the ground truth saliency maps. Experiments show that our saliency detection scheme can robustly handle challenging scenarios such as similar foreground and background, cluttered background, complex occlusions, etc., and achieve high accuracy and robustness.
Similar papers:
  • Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [pdf] - Eleonora Vig, Michael Dorr, David Cox
  • Time-Mapping Using Space-Time Saliency [pdf] - Feng Zhou, Sing Bing Kang, Michael Cohen
  • Salient Region Detection via High-Dimensional Color Transform [pdf] - Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
  • Learning optimal features for salient object detection [pdf] - Song Lu, Vijay Mahadevan, Nuno Vasconcelos
#646 - Pedestrian Detection in Low-resolution Imagery by Learning Multi-scale Intrinsic Motion Structures (MIMS) [pdf]
Jiejie Zhu

Abstract: Detecting pedestrians at a distance in large-format wide-area imagery is a challenging problem because of the low ground sampling distance (GSD) and low frame rate of the imagery. In such a scenario, approaches based on appearance cues alone are prone to fail because pedestrians are only a few pixels in size. Frame-differencing and optical flow based approaches also give poor detection results due to noise, camera jitter and parallax. To overcome these challenges, we propose a novel approach that extracts Multi-scale Intrinsic Motion Structure (MIMS) features from pedestrians' motion patterns for pedestrian detection. The MIMS feature encodes the intrinsic motion properties of an object consisting of only a few pixels, and is location, velocity and trajectory-shape invariant. The extracted MIMS representation is highly robust to noise in comparison with other approaches.
Similar papers:
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Informed Haar-like Features Improve Pedestrian Detection [pdf] - Shanshan Zhang, Christian Bauckhage, Armin Cremers
  • Learning an image-based motion context for multiple people tracking [pdf] - Laura Leal-Taixé, Michele Fenzi, Alina Kuznetsova, Bodo Rosenhahn, Silvio Savarese
  • Word Channel Based Multiscale Pedestrian Detection Without Image Resizing and Using Only One Classifier [pdf] - Arthur Costea, Sergiu Nedevschi
#652 - Measuring Distance Between Unordered Sets of Different Sizes [pdf]
Andrew Gardner, Jinko Kanno, Rastko Selmic, Christian Duncan

Abstract: We present a distance metric based upon the notion of minimum-cost injective mappings between sets. Our function satisfies metric properties as long as the cost of the minimum mappings is derived from a semimetric, for which the triangle inequality is not necessarily satisfied. We show that the Jaccard distance (alternatively biotope, Tanimoto, or Marczewski-Steinhaus distance) may be considered the special case for finite sets where costs are derived from the discrete metric. Extensions that allow premetrics (not necessarily symmetric), multisets (generalized to include probability distributions), and multiple mappings are given that expand the versatility of the metric without sacrificing metric properties. The function has potential applications in pattern recognition, machine learning, and information retrieval.
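
A minimal sketch in the spirit of the paper: a minimum-cost injective mapping from the smaller set into the larger one via rectangular linear assignment, plus a per-unmatched-point penalty; the Euclidean cost and the penalty scheme here are illustrative, not the paper's exact construction:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def injective_mapping_distance(A, B, penalty=1.0):
    """Distance between unordered point sets of different sizes: map every
    point of the smaller set injectively to the larger one at minimum total
    cost, charging `penalty` for each unmatched point."""
    if len(A) > len(B):
        A, B = B, A
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)      # rectangular assignment
    return cost[rows, cols].sum() + penalty * (len(B) - len(A))

A = np.random.rand(5, 2)
B = np.random.rand(8, 2)
print(injective_mapping_distance(A, B))
```
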
Similar papers:
  • Asymmetric sparse kernel approximations for large-scale visual search [pdf] - Damek Davis, Stefano Soatto, Jonathan Balzer
  • Simultaneous Twin Kernel Learning for Structured Prediction [pdf] - Chetan Tonde, Ahmed Elgammal
  • T-Linkage: a Continuous Relaxation of J-Linkage for Multi-Model Fitting [pdf] - Luca Magri, Andrea Fusiello
  • Look at the Driver, Look at the Road: No Distraction! No Accident! [pdf] - Mahdi Rezaei, Reinhard Klette
#656 - Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition [pdf]
Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu

Abstract: Driven by a wide range of applications, scene text detection and recognition have become active research topics in computer vision. Though extensively studied, localizing and reading text in uncontrolled environments remain extremely challenging, due to various interference factors. In this paper, we propose a novel multi-scale representation for scene text recognition. This representation consists of a set of detectable primitives, termed strokelets, which capture the essential substructures of characters at different granularities. Strokelets possess four distinctive advantages: (1) Usability: automatically learned from bounding box labels; (2) Robustness: insensitive to interference factors; (3) Generality: applicable to different languages; and (4) Expressivity: effective at describing characters in natural scenes. Extensive experiments on standard benchmarks verify the advantages of strokelets and demonstrate that the proposed algorithm outperforms the state-of-the-art methods in the literature.
Similar papers:
  • Deblurring Text Images via L0-Regularized Intensity and Gradient Prior [pdf] - Jinshan Pan, Zhe Hu, Zhixun Su, Ming-Hsuan Yang
  • StoryGraphs: Narrative Charts for TV series [pdf] - Makarand Tapaswi, Martin Bäuml, Rainer Stiefelhagen
  • Orientation Robust Textline Detection in Natural Images [pdf] - Le Kang, Yi Li
  • Region-based Discriminative Feature Pooling for Scene Text Recognition [pdf] - Chen-Yu Lee, Anurag Bhardwaj, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
#665 - Inferring Analogous Attributes [pdf]
Chao-Yeh Chen, Kristen Grauman

Abstract: The appearance of an attribute can vary considerably from class to class (e.g., a "fluffy" dog vs. a "fluffy" towel), making standard class-independent attribute models break down. Yet, training object-specific models for each attribute can be impractical, and defeats the purpose of using attributes to bridge category boundaries. We propose a novel form of transfer learning that addresses this dilemma. We develop a tensor factorization approach which, given a sparse set of class-specific attribute classifiers, can infer new ones for object-attribute pairs unobserved during training. For example, even though the system has no labeled images of striped dogs, it can use its knowledge of other attributes and objects to tailor "stripedness" to the dog category. With two large-scale datasets, we demonstrate both the need for category-sensitive attributes as well as our method's successful transfer. Our inferred attribute classifiers perform similarly well to those trained with the luxury of labeled class-specific instances, and much better than those restricted to traditional modes of transfer.
Similar papers:
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, philip Torr
  • Relative Parts: Disctinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
#668 - Online Object Tracking, Learning and Parsing with And-Or Graphs [pdf]
Yang Lu, Tianfu Wu, Song Chun Zhu

Abstract: This paper presents a framework for simultaneously tracking, learning and parsing objects with a hierarchical and compositional And-Or graph (AOG) representation. The AOG is discriminatively learned online to account for the appearance (e.g., lighting and partial occlusion) and structural (e.g., different poses and viewpoints) variations of the object itself, as well as the distractors (e.g., similar objects) in the scene background. In tracking, the state of the object (i.e., the bounding box) is inferred by parsing with the current AOG using a spatial-temporal dynamic programming (DP) algorithm. When the AOG grows large for handling objects with large variations in long-term tracking, we propose a bottom-up/top-down scheduling scheme for efficient inference, which performs focused inference with the most stable and discriminative small sub-AOG. During online learning, the AOG is re-learned iteratively in two steps: (i) identifying the false positives and false negatives of the current AOG in a new frame by exploiting the spatial and temporal constraints observed in the trajectory; (ii) updating the structure of the AOG and re-estimating the parameters based on the augmented training dataset. In experiments, the proposed method outperforms state-of-the-art tracking algorithms on a recent public tracking benchmark with 50 testing videos and 29 publicly available trackers evaluated \cite{trackingBenchmark}.
Similar papers:
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - DAVID JOSEPH TAN, Slobodan Ilic
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#669 - Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf]
Mehmet Yumer, Won Chun, Ameesh Makadia

Abstract: We present a novel co-segmentation method for textured 3D shapes. Our algorithm takes a collection of textured shapes belonging to the same category and sparse annotations of foreground segments, and produces a joint dense segmentation of the shapes in the collection. We model the segments present in the shape collection by a collectively trained Gaussian mixture model. The final model segmentation is formulated as an energy minimization across all models jointly, where intra-model edges control the smoothness and separation of model segments, and inter-model edges impart global consistency. We show promising results on two large real-world datasets, and also compare with previous shape-only 3D segmentation methods using publicly available datasets.
Similar papers:
  • Object-based Multiple Foreground Video Co-segmentation [pdf] - Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
  • Submodular Object Recognition [pdf] - Fan Zhu, Zhuolin Jiang, Ling Shao
  • Fast, Approximate Piecewise-Planar Modeling Based on Sparse Structure-from-Motion and Dense Superpixels [pdf] - Andras Bodis-Szomoru, Hayko Riemenschneider, Luc Van Gool
  • Generating object segmentation proposals using global and local search [pdf] - Pekka Rantalankila, Juho Kannala, Esa Rahtu
#670 - Quasi Real-Time Summarization for Consumer Videos [pdf]
Bin Zhao, Eric Xing

Abstract: With the widespread availability of video cameras, we are facing an ever-growing, enormous collection of unedited and unstructured video data. Due to the lack of an automatic way to generate summaries from this large collection of consumer videos, indexing and searching them can be tedious and time-consuming. In this work, we propose online video highlighting, a principled way of generating a short video that summarizes the most important and interesting content of an unedited and unstructured video, which is costly in both time and money to process manually. Specifically, our method learns a dictionary from the given video using group sparse coding, and updates the atoms in the dictionary on-the-fly. A summary video is then generated by combining segments that cannot be sparsely reconstructed using the learned dictionary. The online fashion of our proposed method enables it to process arbitrarily long videos and to start generating summaries before seeing the end of the video. Moreover, the processing time required by our method is close to the original video length, achieving quasi real-time summarization speed. Theoretical analysis, together with experimental results on more than 12 hours of surveillance and YouTube videos, demonstrates the effectiveness of online video highlighting.
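
A minimal sketch of the selection rule, with plain least squares standing in for group sparse coding and without the on-the-fly dictionary update:

```python
import numpy as np

def summary_segments(segments, dictionary, threshold):
    """Pick segments the current dictionary cannot reconstruct well (high
    residual): poorly explained segments are treated as novel content."""
    picked = []
    for i, x in enumerate(segments):              # x: feature of one segment
        coef, *_ = np.linalg.lstsq(dictionary.T, x, rcond=None)
        residual = np.linalg.norm(x - dictionary.T @ coef)
        if residual > threshold:
            picked.append(i)
            # (the online method would also update the dictionary here)
    return picked

dictionary = np.random.randn(32, 64)              # 32 atoms of dimension 64
segments = np.random.randn(20, 64)
print(summary_segments(segments, dictionary, threshold=0.5))
```
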
Similar papers:
  • Semi-Supervised Coupled Dictionary Learning for Person Re-identification [pdf] - Xiao Liu, Mingli Song, Dacheng Tao, Xingchen Zhou, Chun Chen, Jiajun Bu
  • Towards Multi-view and Partially-occluded Face Alignment [pdf] - Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
  • Latent Dictionary Learning for Sparse Representation based Classification [pdf] - Meng Yang, Luc Van Gool
#671 - Beyond Human Opinion Scores: Blind Image Quality Assessment based on Synthetic Scores [pdf]
Peng Ye, David Doermann

Abstract: General purpose blind image quality assessment (BIQA) aims to develop a computational model that can predict the human-perceived quality of distorted images without knowing the non-distorted reference images or having any prior knowledge of the types of image distortions. State-of-the-art general purpose BIQA methods rely on 1) examples of distorted images and 2) corresponding human opinion scores to learn a regression function that maps image features to a quality score. These types of models are considered "opinion-aware" (OA) BIQA models. A large set of human-scored training examples is usually required to train a reliable OA-BIQA model. However, obtaining human opinion scores through subjective testing is often expensive and time-consuming. It is therefore desirable to develop "opinion-free" (OF) BIQA models that do not require human opinion scores for training. This paper proposes BLISS (Blind Learning of Image Quality using Synthetic Scores), a simple yet effective method for extending OA-BIQA models to OF-BIQA models. Instead of training on human opinion scores, we propose to train BIQA models on full-reference (FR) IQA measures. State-of-the-art FR measures yield high correlation with human opinion scores, so they can serve as an approximation to human opinion scores. Unsupervised rank aggregation is applied to combine different FR measures to generate a synthetic score, which serves as a better "gold standard". Extensive experiments on three standard IQA
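
A minimal stand-in for the aggregation step: several FR measure outputs (random placeholders here) are combined by simple rank averaging; the paper's unsupervised rank aggregation scheme is more refined than this Borda-style sketch:

```python
import numpy as np
from scipy.stats import rankdata

# 4 FR-IQA measures, each scoring the same 100 training images.
fr_scores = np.random.rand(4, 100)     # placeholder for real FR outputs
ranks = np.vstack([rankdata(s) for s in fr_scores])
synthetic = ranks.mean(axis=0)         # synthetic "gold standard" target
print(synthetic[:5])
```
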
Similar papers:
  • A 3D Feature for Moving Range Scanning Systems [pdf] - Xiangqi Huang, Bo Zheng, Takeshi Masuda, Katsushi Ikeuchi
  • Visual Persuasion: Inferring the Communicative Intents of Images [pdf] - Jungseock Joo, Weixin Li, Francis Steen, Song Chun Zhu
  • Active Sampling for Subjective Image Quality Assessment [pdf] - Peng Ye, David Doermann
  • Blind Image Quality Assessment using Semi-supervised Rectifier Networks [pdf] - Huixuan Tang, Neel Joshi, Ashish Kapoor
#672 - Detecting Objects using Deformation Dictionaries [pdf]
Bharath Hariharan, Piotr Dollar, Larry Zitnick

Abstract: Several popular and effective object detectors model intra-class variations due to deformations and appearance changes separately. This reduces model complexity while enabling detection of objects across changes in viewpoint, object pose, etc. The Deformable Part Model (DPM) is perhaps the most successful such model to date. A common assumption is that the exponential number of templates enabled by a DPM is critical to its success. In this paper, we show the counter-intuitive result that it is possible to achieve similar accuracy using a small dictionary of global deformations. Each component in our model is represented by a single HOG template and a dictionary of flow fields that determine the deformations the template may undergo. While the number of candidate deformations is dramatically fewer than that of a DPM, the deformed templates tend to be plausible and interpretable. In addition, we discover that the set of deformation bases is actually transferable across object categories and that learning shared bases across similar categories can even boost accuracy.
Similar papers:
  • Unsupervised Learning of Dictionaries of Hierarchical Compositional Models [pdf] - Jifeng Dai, Yi Hong, WENZE Hu, Ying Nian Wu
  • A Novel Chamfer Template Matching Method Using Variational Mean Field [pdf] - Thanh Nguyen
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Deformable Object Matching via Deformation Decomposition based 2D Label MRF [pdf] - Kangwei Liu, zhang Junge, Kaiqi Huang, Tieniu Tan
#675 - Using k-poselets for detecting people and localizing their keypoints [pdf]
Bharath Hariharan, Georgia Gkioxari, Ross Girshick, Jitendra Malik

Abstract: A k-poselet is a Deformable Part Model with k parts, where each of the parts is a poselet, aligned to a specific configuration of keypoints based on ground truth annotations. A separate HOG template is used to learn the appearance of each part. The parts are allowed to move with respect to each other with a deformation cost that is learned at training time. This model is richer than both the traditional version of poselets (Bourdev et al.) and DPMs (Felzenszwalb et al.), and experimental results verify its superiority at person detection as well as keypoint prediction.
Similar papers:
  • Parsing Occluded People [pdf] - Golnaz Ghiasi, Yi Yang, Deva Ramanan, Charless Fowlkes
  • Fast Rotation Search with Stereographic Projections for 3D Registration [pdf] - Alvaro Parra Bustos, Tat-Jun Chin, David Suter
  • Analysis by Synthesis: Object Recognition by Object Reconstruction [pdf] - Mohsen Hejrati, Deva Ramanan
  • Occlusion Coherence: Localizing Occluded Faces with a Hierarchical Deformable Part Model [pdf] - Golnaz Ghiasi, Charless Fowlkes
#678 - Super Normal Vector for Activity Recognition Using Depth Sequences [pdf]
Xiaodong Yang, Yingli Tian

Abstract: This paper presents a new framework for human activity recognition from video sequences captured by a depth camera. We cluster hypersurface normals in depth sequences to form the polynormal, which is used to jointly characterize local motion and shape information. In order to globally capture the spatial and temporal order, an adaptive spatio-temporal pyramid is introduced to subdivide a depth video into a set of space-time grids. We then propose a novel scheme for aggregating the low-level polynormals into the Super Normal Vector (SNV), which can be seen as a simplified version of the Fisher kernel representation. In extensive experiments, we achieve classification results superior to all previously published results on four public benchmark datasets, i.e., MSRAction3D, MSRDailyActivity3D, MSRGesture3D, and MSRActionPairs3D.
Similar papers:
  • Ask the image: supervised pooling to preserve feature locality [pdf] - Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta, Francesca Odone
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Complex Activity Recognition using Granger Constrained DBN (GCDBN) in Sports and Surveillance Video [pdf] - Eran Swears, Anthony Hoogs, Qiang Ji, Kim Boyer
  • Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf] - Ivan Lillo, Juan Carlos Niebles, Alvaro Soto
#689 - Weighted Nuclear Norm Minimization with Application to Image Denoising [pdf]
Shuhang Gu, Lei Zhang, Xiangchu Feng, Wangmeng Zuo

Abstract: As a convex relaxation of the low rank matrix factorization problem, the nuclear norm minimization has been attracting significant research interest in recent years. The standard nuclear norm minimization regularizes each singular value equally to pursue the convexity of the objective function. However, this greatly restricts its capability and flexibility in dealing with many practical problems (e.g., denoising), where the singular values have clear physical meanings and should be treated differently. In this paper we study the weighted nuclear norm minimization (WNNM) problem with F-norm data fidelity, where the singular values are assigned different weights. The solutions of the WNNM problem are analyzed under different weighting conditions. We then apply the proposed WNNM algorithm to image denoising by exploiting the image nonlocal self-similarity. Experimental results clearly show that the proposed WNNM algorithm outperforms many state-of-the-art denoising algorithms such as BM3D in terms of both quantitative measure and visual perception quality.
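
A minimal sketch of the core operation, weighted singular-value soft-thresholding; the weight schedule here is an illustrative placeholder for the paper's choice:

```python
import numpy as np

def weighted_svt(Y, weights):
    """One weighted singular-value thresholding step: shrink each singular
    value of Y by its own weight. With weights in non-descending order
    (matched to the descending singular values) this yields the global
    optimum of the weighted proximal problem analyzed in the paper."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s = np.maximum(s - weights, 0.0)          # weighted soft-thresholding
    return (U * s) @ Vt

# Toy usage: a matrix of similar noisy patches; small weights preserve the
# large (signal) singular values, large weights suppress the small (noise) ones.
Y = np.random.randn(64, 30)
weights = np.linspace(0.5, 8.0, 30)
print(np.linalg.matrix_rank(weighted_svt(Y, weights)))
```
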
Similar papers:
  • Generalized Nonconvex Nonsmooth Low-Rank Minimization [pdf] - Canyi Lu, Shuicheng Yan, Zhouchen Lin
  • Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising [pdf] - Yi Peng, Deyu Meng, Zongben Xu, Biao Zhang, Chenqiang Gao, Yang Yi
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • CID: Combined Image Denoising in Spatial and Frequency Domains Using Web Images [pdf] - Huanjing Yue, Xiaoyan Sun, Jingyu Yang, Feng Wu
#692 - Semi-Supervised Coupled Dictionary Learning for Person Re-identification [pdf]
Xiao Liu, Mingli Song, Dacheng Tao, Xingchen Zhou, Chun Chen, Jiajun Bu

Abstract: The desirability of being able to search for specific persons in surveillance videos captured by different cameras has increasingly motivated interest in the problem of person re-identification, which is a critical yet under-addressed challenge in multi-camera tracking systems. The main difficulty of person re-identification arises from the variations in human appearances from different camera views. In this paper, to bridge the human appearance variations across cameras, two coupled dictionaries that relate to the gallery and probe cameras are jointly learned in the training phase from both labeled and unlabeled images. The labeled training images carry the relationship between features from different cameras, and the abundant unlabeled training images are introduced to exploit the geometry of the marginal distribution for obtaining robust sparse representation. In the testing phase, the feature of each target image from the probe camera is first encoded by the sparse representation and then recovered in the feature space spanned by the images from the gallery camera. The features of the same person from different cameras are similar following the above transformation. Experimental results on publicly available datasets demonstrate the superiority of our method.
Similar papers:
  • Filter Pairing Neural Network for Person Re-identification [pdf] - Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang
  • Dual Linear Regression Based Classification for Face Cluster Recognition [pdf] - Liang Chen
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
  • Latent Dictionary Learning for Sparse Representation based Classification [pdf] - Meng Yang, Luc Van Gool
#703 - Object-based Multiple Foreground Video Co-segmentation [pdf]
Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin

Abstract: We present a video co-segmentation method that uses category-independent object proposals as its basic element and can extract multiple foreground objects in a video set. The use of object elements overcomes limitations of low-level feature representations in separating complex foregrounds and backgrounds. We formulate object-based co-segmentation as a co-selection graph in which regions with foreground-like characteristics are favored while also accounting for intra-video and inter-video foreground coherence. To handle multiple foreground objects, we expand the co-selection graph model into a proposed multi-state selection graph model (MSG) that optimizes the segmentations of different objects jointly. This extension into the MSG can be applied not only to our co-selection graph, but also can be used to turn any standard graph model into a multi-selection solution that can be optimized directly by existing energy minimization techniques. Our experiments show that our object-based multiple foreground video co-segmentation method (ObMiC) compares well to related techniques on both single and multiple foreground cases.
Similar papers:
  • How to Evaluate Foreground Maps? [pdf] - Ran Margolin, Lihi Zelnik-Manor, Ayellet Tal
  • Visual Tracking Using Pertinent Patch Selection and Masking [pdf] - Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim
  • Joint Motion Segmentation and Background Subtraction in Dynamic Scenes [pdf] - Adeel Mumtaz, Weichen Zhang, Antoni Chan
  • Object Classification with Adaptive Regions [pdf] - Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool
#708 - A 3D Feature for Moving Range Scanning Systems [pdf]
Xiangqi Huang, Bo Zheng, Takeshi Masuda, Katsushi Ikeuchi

Abstract: Laser range sensors are often mounted on a moving platform to achieve efficient 3D reconstruction, SLAM, object recognition, etc. However, such a moving system often suffers from the difficulty of matching the distorted 3D range images. In this paper, we propose novel 3D features which can be robustly extracted and matched for distorted range scans captured by a moving system. Our feature extraction employs Morse function theory to construct a measure function whose critical points are invariant under 3D surface distortion. At each critical point, we extract the maximally stable region as the interest region by disconnectivity, as well as the extremal region for comparison. Our feature description is designed as two processes: 1) affine-based normalization and 2) critical net construction. The former normalizes the detected local regions to canonical shapes while the latter connects detected local regions with a subgraph. In experiments, we demonstrate that the proposed 3D feature achieves substantially better performance for distorted surface matching in comparison to state-of-the-art methods.
Similar papers:
  • User-Specific Hand Modeling from Monocular Depth Sequences [pdf] - Jonathan Taylor, Richard Stebbing, Varun Ramakrishna, Cem Keskin, Jamie Shotton, Shahram Izadi, Andrew Fitzgibbon, Aaron Hertzmann
  • Quality Dynamic Human Body Modeling Using a Single Low-cost Depth Camera [pdf] - Qing Zhang, BO FU
  • Timing-Based Local Descriptor for Dynamic Surfaces [pdf] - Tony Tung, Takashi Matsuyama
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - DI XU, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
#709 - Learning Fine-grained Image Similarity with Deep Ranking [pdf]
Jiang Wang, Yang Song, Thomas Leung, Charles Rosenberg, James Philbin, Bo Chen, Ying Wu

Abstract: Learning fine-grained image similarity models is a very challenging task. Fine-grained image similarity is usually characterized by very subtle differences that are difficult to distinguish with hand-crafted features. We propose a ranking model that employs deep neural network learning techniques to learn image similarity models directly from images. We call this a deep ranking model. Compared to similar models based on hand-crafted features, the deep ranking model has a higher learning capacity to better characterize the subtle differences required for fine-grained image similarity. We also propose an effective triplet sampling algorithm to learn the model with distributed asynchronous stochastic gradient descent. The experimental results show that the proposed algorithm outperforms both state-of-the-art hand-crafted visual feature-based methods and deep neural-network classification models.
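
A minimal NumPy sketch of the triplet hinge loss such a ranking network minimizes; the margin and embedding size are illustrative:

```python
import numpy as np

def triplet_hinge_loss(q, pos, neg, margin=1.0):
    """Hinge loss on one (query, more-similar, less-similar) triplet: the
    query embedding must be closer to the positive than to the negative
    by at least `margin`."""
    d_pos = np.sum((q - pos) ** 2)
    d_neg = np.sum((q - neg) ** 2)
    return max(0.0, margin + d_pos - d_neg)

q, pos, neg = (np.random.randn(128) for _ in range(3))
print(triplet_hinge_loss(q, pos, neg))
```
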
Similar papers:
  • Multilabel Ranking with Inconsistent Rankers [pdf] - Xin Geng, Longrun Luo
  • Linear Ranking Analysis [pdf] - Deng Weihong, Jiani Hu, Jun Guo
  • Discriminative Deep Metric Learning for Face Verification in the Wild [pdf] - Junlin Hu, Jiwen Lu, Yap-Peng Tan
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
#711 - Domain Adaptation on the Statistical Manifold [pdf]
Mahsa Baktashmotlagh, Mehrtash Harandi, Brian Lovell, Mathieu Salzmann

Abstract: In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions become similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that probability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this manifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the corresponding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of-the-art results on a standard object recognition benchmark.
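
For reference, a minimal sketch of the (biased) RBF-kernel MMD estimator that the compared approaches rely on, and which the paper replaces with a distance on the statistical manifold:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Squared Maximum Mean Discrepancy between samples X (source domain)
    and Y (target domain) under an RBF kernel; biased estimator."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

X = np.random.randn(100, 5)          # source samples
Y = np.random.randn(100, 5) + 0.5    # shifted target samples
print(mmd_rbf(X, Y))
```
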
Similar papers:
  • Transfer Joint Matching for Visual Domain Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Philip Yu
  • Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf] - Novi Patricia, Barbara Caputo
  • Time Machine: Continuous Manifold Based Adaptation for Evolving Visual Domains [pdf] - Judy Hoffman, Trevor Darrell, Kate Saenko
  • Recognizing RGB Images by Learning from RGB-D Data [pdf] - Lin Chen, Wen Li, Dong Xu
#712 - Human Action Recognition Based on Context-Dependent Graph Kernels [pdf]
Baoxin Wu, Chunfeng Yuan, Weiming Hu

Abstract: Graphs are a powerful tool to model structured objects, but it is nontrivial to measure the similarity between two graphs. In this paper, we construct a two-graph model to represent human actions by recording the spatial and temporal relationships among local features. We also propose a novel family of context-dependent graph kernels (CGKs) to measure similarity between graphs. First, local features are used as the vertices of the two-graph model, and the relationships among local features in the intra-frames and inter-frames are characterized by the edges. Then, the proposed CGKs are applied to measure the similarity between actions represented by the two-graph model. Graphs can be decomposed into a number of primary walk groups with different walk lengths, and our CGKs are based on context-dependent primary walk group matching. Taking advantage of the context information makes the correctly matched primary walk groups dominate in the CGKs and improves the performance of similarity measurement between graphs. Finally, a generalized multiple kernel learning with a proposed l12-norm regularization is applied to combine these CGKs optimally and simultaneously train a set of action classifiers. We conduct a series of experiments on four public action datasets. Our approach achieves performance comparable to the state-of-the-art approaches, which demonstrates the effectiveness of the two-graph model and the CGKs in recognizing human actions.
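As background, the classical geometric random-walk kernel underlying such walk-based graph kernels counts common walks of all lengths via the direct-product graph. The sketch below shows only that generic baseline, not the paper's context-dependent variant; the decay weight `lam` is an assumed parameter that must be smaller than the reciprocal of the product graph's spectral radius for the closed form to be valid.

```python
import numpy as np

def random_walk_kernel(A1, A2, lam=0.01):
    """Geometric random-walk kernel between graphs with adjacencies A1, A2."""
    Ax = np.kron(A1, A2)           # adjacency of the direct-product graph
    n = Ax.shape[0]
    ones = np.ones(n)
    # closed form of sum_k lam^k * (number of common walks of length k)
    return ones @ np.linalg.solve(np.eye(n) - lam * Ax, ones)
```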
Similar papers:
  • Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition [pdf] - Waqas Sultani, Imran Saleemi
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
#717 - Towards Good Practices for Action Video Encoding [pdf]
Jianxin Wu, Yu Zhang

Abstract: Recently, high dimensional representations such as VLAD or FV have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can yield a further accuracy boost comparable to the gains achieved by replacing bag-of-features with the FV or VLAD representation. We empirically evaluated various VLAD improvement techniques to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation of VLAD as a maximum entropy linear feature learning process. Combining this new perspective with observed VLAD data distribution properties, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on 3 benchmark action recognition datasets (UCF101, HMDB51 and YouTube), the bimodal encoding consistently improves VLAD by large margins in action recognition.
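For reference, vanilla VLAD assigns each local descriptor to its nearest codebook center and accumulates the residuals per center. The sketch below shows that baseline with the commonly used power and L2 normalizations (assumed here, not the paper's exact recipe); the proposed bimodal encoding is not reproduced.

```python
import numpy as np

def vlad(descriptors, centers):
    """VLAD encoding: descriptors (N, D), codebook centers (K, D) -> (K*D,)."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)                 # nearest center per descriptor
    v = np.zeros_like(centers)
    for i, x in zip(assign, descriptors):
        v[i] += x - centers[i]                 # accumulate residuals per center
    v = np.sign(v) * np.sqrt(np.abs(v))        # power (signed square-root) normalization
    return (v / (np.linalg.norm(v) + 1e-12)).ravel()  # global L2 normalization
```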
Similar papers:
  • Efficient Localization with Fisher Vectors using Approximate Normalizations [pdf] - Dan Oneata, Jakob Verbeek, Cordelia Schmid
  • Locality in Generic Instance Search from One Example [pdf] - Ran Tao, Efstratios Gavves, Cees Snoek, Arnold Smeulders
  • Efficient feature extraction, encoding and classification for action recognition [pdf] - Vadim Kantorov, Ivan Laptev
  • Fisher and VLAD with FLAIR [pdf] - Koen Van de Sande, Cees Snoek, Arnold Smeulders
#719 - Multi-source Deep Learning for Human Pose Estimation [pdf]
Wanli Ouyang, Xiaogang Wang, Xiao Chu

Abstract: Visual appearance score, appearance mixture type and deformation are three important information sources for human pose estimation. This paper proposes to build a multi-source deep model in order to extract non-linear representation from these different aspects of information sources. With the deep model, the global, high-order human body articulation patterns in these information sources are extracted for pose estimation. The task for estimating body locations and the task for human detection are jointly learned using a unified deep model. The proposed approach can be viewed as a post-processing of pose estimation results and can flexibly integrate with existing methods by taking their information sources as input. By extracting the non-linear representation from multiple information sources, the deep model outperforms state-of-the-art by up to 8.6 percent on three public benchmark datasets.
Similar papers:
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • Learning Fine-grained Image Similarity with Deep Ranking [pdf] - Jiang Wang, Yang Song, Thomas Leung, Charles Rosenberg, James Philbin, Bo Chen, Ying Wu
  • Discriminative Deep Metric Learning for Face Verification in the Wild [pdf] - Junlin Hu, Jiwen Lu, Yap-Peng Tan
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
#721 - Beta Process Multiple Kernel Learning [pdf]
Bingbing Ni, Pierre Moulin

Abstract: In kernel based learning, the kernel trick transforms the original representation of a feature instance into a vector of similarities with the training feature instances, known as the kernel representation. However, feature instances are sometimes ambiguous, and the kernel representation calculated from them does not possess any discriminative information, which can eventually harm the trained classifier. To address this issue, we propose to automatically select good feature instances when calculating the kernel representation in multiple kernel learning. Specifically, for the kernel representation calculated for each input feature instance, we multiply it element-wise with a latent binary vector named instance selection variables, resulting in a new kernel representation with attenuated effect from the similarities calculated on ambiguous feature instances. A Beta process is employed for generating the prior distribution of the introduced latent instance selection variables. We then propose a Bayesian graphical model which integrates both MKL learning and inference for the distribution of the latent instance selection variables. Variational inference is derived for model learning under a max-margin principle. Qualitative and quantitative evaluations on synthetic data, UCL toy datasets, two image classification benchmarks and an action recognition video benchmark demonstrate the effectiveness of the proposed method and its high discriminative capability.
Similar papers:
  • MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation [pdf] - Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Zhuowen Tu
  • Transfer Joint Matching for Visual Domain Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Philip Yu
  • An Exemplar-based CRF for Multi-instance Object Segmentation [pdf] - Xuming He, Stephen Gould
  • Beyond Pixel Labels: Image Parsing with Object Instances and Occlusion Ordering [pdf] - Joseph Tighe, Marc Niethammer, Svetlana Lazebnik
#726 - SCAMS: Simultaneous Clustering and Model Selection [pdf]
Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou

Abstract: While clustering has been well studied in the past decade, model selection has drawn less attention. This paper addresses both problems in a joint manner with an indicator matrix formulation, in which the clustering cost is penalized by a Frobenius inner product term and the group number estimation is achieved by rank minimization. As affinity graphs generally contain positive edge values, a sparsity term is further added to avoid the trivial solution. We then carefully investigate convex relaxations of this unified problem and solve it efficiently using the Alternating Direction Method of Multipliers. The highly constrained nature of the optimization provides our algorithm with the robustness to deal with the varying and often imperfect input affinity matrices arising from different applications and different group numbers. Evaluations on synthetic data as well as two real-world problems show the superiority of the method across a large variety of settings.
Similar papers:
  • Multiple Target Tracking Based on Hierarchical Relation Hypergraph [pdf] - Longyin Wen, Wenbo Li, Zhen Lei, Stan Li
  • Complex Non-Rigid Motion 3D Reconstruction by Union of Subspaces [pdf] - Yingying Zhu, Dong Huang, Fernando de la Torre, Simon Lucey
  • Subspace Clustering for Sequential Data [pdf] - Stephen Tierney, Junbin Gao, Yi Guo
  • Constructing Robust Affinity Graph for Spectral Clustering [pdf] - Xiatian Zhu, Chen Change Loy, Shaogang Gong
#732 - Collective Matrix Factorization Hashing for Multimodal Data [pdf]
Guiguang Ding, Yuchen Guo, Jile Zhou

Abstract: Nearest neighbor search methods based on hashing have attracted considerable attention for effective and efficient large-scale similarity search in the computer vision and information retrieval communities. In this paper, we study the problem of learning hash functions in the context of multimodal data for cross-view similarity search. We put forward a novel hashing method, referred to as Collective Matrix Factorization Hashing (CMFH). CMFH learns unified hash codes by collective matrix factorization with a latent factor model from the different modalities of one instance, which not only supports cross-view search but also increases the search accuracy by merging information from multiple views. We also prove that CMFH, a similarity-preserving hashing learning method, has upper and lower bounds. Extensive experiments verify that CMFH significantly outperforms several state-of-the-art methods on three different datasets.
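The shared-latent-factor idea can be illustrated with a toy two-view alternating least squares, where both modalities are factorized against one latent matrix whose signs serve as the unified codes. Everything here (the ridge regularizer `lam`, the update scheme, taking `sign(V)` directly) is a deliberate simplification for illustration, not the paper's exact formulation, which also learns out-of-sample hash functions.

```python
import numpy as np

def toy_cmf_hashing(X1, X2, k=32, lam=1e-2, iters=50):
    """Two views X1 (d1, n), X2 (d2, n) factorized against a shared V (k, n)."""
    rng = np.random.default_rng(0)
    V = rng.standard_normal((k, X1.shape[1]))
    I = np.eye(k)
    for _ in range(iters):
        G = np.linalg.inv(V @ V.T + lam * I)
        U1, U2 = X1 @ V.T @ G, X2 @ V.T @ G          # per-view loading matrices
        V = np.linalg.solve(U1.T @ U1 + U2.T @ U2 + lam * I,
                            U1.T @ X1 + U2.T @ X2)   # shared latent factors
    return np.sign(V)                                # unified binary codes
```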
Similar papers:
  • Adaptive Object Retrieval with Kernel Reconstructive Hashing [pdf] - Haichuan Yang, Xiao Bai, Jun Zhou, Peng Ren, Jian Cheng, Zhihong Zhang
  • Collaborative Hashing [pdf] - Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang
  • Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification [pdf] - Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang
  • Fast Supervised Hashing with Decision Trees for High-Dimensional Data [pdf] - Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter
#733 - Co-localization in Real-World Images [pdf]
Kevin Tang, Armand Joulin, Li-Jia Li, Li Fei-Fei

Abstract: In this paper, we tackle the problem of co-localization in real-world images. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images. Although similar problems such as co-segmentation and weakly supervised localization have been previously studied, we focus on being able to perform co-localization in real-world settings, which are typically characterized by large amounts of intra-class variation, inter-class diversity, and annotation noise. To address these issues, we present a joint image-box formulation for solving the co-localization problem, and show how it can be relaxed to a convex quadratic program which can be efficiently solved. We perform an extensive evaluation of our method compared to previous state-of-the-art approaches on the challenging PASCAL VOC 2007 and Object Discovery datasets. In addition, we also present a large-scale study of co-localization on ImageNet, involving ground-truth annotations for 3,624 classes and approximately 1 million images.
Similar papers:
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • Multi-fold MIL Training for Weakly Supervised Object Localization [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
  • Scalable Object Detection using Deep Neural Networks [pdf] - Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov
#735 - User-Specific Hand Modeling from Monocular Depth Sequences [pdf]
Jonathan Taylor, Richard Stebbing, Varun Ramakrishna, Cem Keskin, Jamie Shotton, Shahram Izadi, Andrew Fitzgibbon, Aaron Hertzmann

Abstract: This paper presents a method for acquiring dense non-rigid shape and deformation from a single monocular depth sensor. We consider an important special case of acquisition from nonrigid scenes: when a rough model template is available. We focus on modeling the human hand, and assume that a single rough model template is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand models from as few as 15 frames of depth data. We propose an objective that measures the error of fit between each sampled data point and a continuous model surface defined by a rigged control mesh, and use as-rigid-as-possible (ARAP) regularizers to cleanly separate the model and template geometries. Our use of a smooth model based on subdivision surfaces allows simultaneous optimization over both correspondences and model parameters, avoiding the use of iterated closest point (ICP) which can lead to slow convergence. Automatic initialization is obtained using a regression forest trained to infer approximate correspondences. Experiments show that the resulting meshes model the user's hand shape more accurately than just adapting the shape parameters of the skeleton, and that the retargeted skeleton accurately models the user's articulations. We investigate the effect of various modeling choices, and show the benefits of using subdivision surfaces and ARAP regularization.
Similar papers:
  • Evolutionary Quasi-random Search for Hand Articulations Tracking [pdf] - Iason Oikonomidis, Manolis Lourakis, Antonis Argyros
  • Real-time Simultaneous Pose and Shape Estimation for Articulated Objects with a Single Depth Camera [pdf] - Mao Ye, Ruigang Yang
  • Quality Dynamic Human Body Modeling Using a Single Low-cost Depth Camera [pdf] - Qing Zhang, Bo Fu
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - Di Xu, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
#750 - Multi-feature Spectral Clustering with Minimax Optimization [pdf]
Hongxing Wang, Chaoqun Weng, Junsong Yuan

Abstract: In this paper, we propose a novel formulation for multi-feature clustering using minimax optimization. To find a consensus clustering result that is agreeable to all feature modalities, our objective is to find a universal feature embedding, which not only fits each individual feature modality well, but also unifies different feature modalities by minimizing their pairwise disagreements. The loss function consists of both (1) unary embedding cost for each modality, and (2) pairwise disagreement cost for each pair of modalities, with weighting parameters automatically selected to maximize the loss. By performing minimax optimization, we can minimize the loss for a worst case with maximum disagreements, thus can better handle noisy feature modalities. To solve the minimax optimization, an iterative solution is proposed to update the universal embedding, individual embedding, and fusion weights, separately. Our minimax optimization has only one global parameter. The superior results on various multi-feature clustering tasks validate the effectiveness of our approach when compared with the state-of-the-art methods.
Similar papers:
  • Semi-supervised Spectral Clustering for Image Set Classification [pdf] - Arif Mahmood, Ajmal Mian, Robyn Owens
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
  • Transitive Distance Clustering with K-Means Duality [pdf] - Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu
  • A Multigraph Representation for Improved Unsupervised/Semi-supervised Learning of Human Actions [pdf] - Simon Jones, Ling Shao
#751 - DAISY Filter Flow: A Generalized Discrete Approach to Dense Correspondences [pdf]
Hongsheng Yang, Wen-Yan Lin, Jiangbo Lu

Abstract: Establishing dense correspondences reliably between a pair of images is an important vision task with many applications. Though significant advances have been made towards estimating dense stereo and optical flow fields for two images adjacent in viewpoint or in time, building reliable dense correspondence fields for two general images still remains largely unsolved. For instance, two given images sharing some content may exhibit dramatic photometric and geometric variations, or they may depict different 3D scenes with similar scene characteristics. The fundamental challenges of such an image or scene alignment task are often multifold, causing many existing techniques to fall short of producing dense correspondences robustly and efficiently. This paper presents a novel approach called DAISY filter flow to address this challenging task. The DAISY filter flow algorithm leverages and extends a few established techniques: 1) DAISY descriptors, 2) filter-based efficient flow inference, and 3) the PatchMatch fast search. Coupling and optimizing these modules seamlessly, with image segments as the bridge, our approach enables efficient dense descriptor-based correspondence field estimation in a generalized high-dimensional label space, which is augmented by scales and rotations. Experiments on a variety of challenging scenes show that the proposed approach estimates spatially coherent yet discontinuity-preserving image alignment results both robustly and efficiently.
Similar papers:
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • A Compositional Model for Low-Dimensional Image Set Representation [pdf] - Hossein Mobahi, Ce Liu, Bill Freeman
  • SphereFlow: 6 DoF Scene Flow from RGB-D Pairs [pdf] - Michael Hornacek, Andrew Fitzgibbon, Margrit Gelautz, Carsten Rother
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
#766 - Occlusion Geodesics for Online Multi-Object Tracking [pdf]
Horst Possegger, Thomas Mauthner, Peter Roth, Horst Bischof

Abstract: Robust multi-object tracking-by-detection requires the correct assignment of noisy detection results to object trajectories. We address this problem by proposing an online approach based on the observation that object detectors primarily fail if objects are significantly occluded. In contrast to most existing work, we only rely on geometric information to efficiently overcome detection failures. In particular, we exploit the spatio-temporal evolution of occlusion regions, detector reliability, and target motion prediction to robustly handle missed detections. In combination with a conservative association scheme for visible objects, this allows for real-time tracking of multiple objects from a single static camera, even in complex scenarios. Our evaluations on publicly available multi-object tracking benchmark datasets demonstrate superior performance compared to the state-of-the-art in online and offline multi-object tracking.
Similar papers:
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
  • Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning [pdf] - Seung-Hwan Bae, Kuk-Jin Yoon
  • Multi-target Tracking with Motion Context in Tensor Power Iteration [pdf] - Xinchu Shi, Haibin Ling, Weiming Hu, Chunfeng Yuan
  • A Probabilistic Framework for Multitarget Tracking with Mutual Occlusions [pdf] - Menglong Yang, Yiguang Liu, Stan Li
#772 - Multi-target Tracking with Motion Context in Tensor Power Iteration [pdf]
Xinchu Shi, Haibin Ling, Weiming Hu, Chunfeng Yuan

Abstract: Multiple target tracking (MTT) is often formulated as a (multi-frame) data association problem, and different optimization approaches have been proposed to compute the association solution. Most existing approaches, however, treat different targets as independent of each other, thereby ignoring the interaction between subjects. In this paper, we model interactions between neighboring targets by pair-wise motion context, and further encode such context into the global association optimization. To solve the resulting global non-convex maximization, we propose an effective and efficient power iteration framework. This solution enjoys two advantages for MTT: First, it allows us to combine the global energy accumulated from individual trajectories and the between-trajectory interaction energy into a unified optimization, which can be solved by the proposed power iteration algorithm. Second, the framework is flexible enough to accommodate various types of pairwise context models, and we in fact study two different context models in this paper. For evaluation, we apply the proposed methods to four public datasets involving different challenging scenarios such as dense airborne traffic tracking, dense point set tracking, and semi-crowded pedestrian tracking. In all the experiments, our approaches demonstrate very promising results in comparison with state-of-the-art trackers.
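At its core, power iteration repeatedly applies a matrix to a vector and renormalizes, converging to the dominant eigenvector. The paper embeds such an update inside its non-convex association objective; the sketch below shows only the generic fixed-point step, with the iteration budget and tolerance as assumed parameters.

```python
import numpy as np

def power_iteration(M, iters=100, tol=1e-8):
    """Dominant eigenvector of a square matrix M by repeated multiplication."""
    x = np.ones(M.shape[0]) / np.sqrt(M.shape[0])
    for _ in range(iters):
        y = M @ x
        y /= np.linalg.norm(y)       # renormalize to keep the iterate bounded
        if np.linalg.norm(y - x) < tol:
            break
        x = y
    return x
```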
Similar papers:
  • An Online Learned Elementary Grouping Model for Multi-target Tracking [pdf] - Xiaojing Chen, Zhen Qin, Le An, Bir Bhanu
  • Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning [pdf] - Seung-Hwan Bae, Kuk-Jin Yoon
  • Curvilinear Structure Tracking by Low Rank Tensor Approximation with Model Propagation [pdf] - Erkang Cheng, Yu Pang, Ying Zhu, Haibin Ling
  • Tracklet Association with Online Reidentification in Network Flow Optimization for Long-term Multi-Person Tracking [pdf] - Bing Wang, Gang Wang, Kap Luk Chan, Li Wang
#778 - Facial Expression Recognition via a Boosted Deep Belief Network [pdf]
Ping Liu, Shizhong Han, Zibo Meng, Yan Tong

Abstract: A training process for facial expression recognition is usually performed sequentially in three individual stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to search for an optimal combination of feature representation, feature set, and classifier to achieve good recognition performance. This paper presents a novel Boosted Deep Belief Network (BDBN) for performing the three training stages jointly in a unified framework. Through the proposed BDBN framework, a set of features, which is effective to characterize expression-related facial appearance/shape changes, can be learned and selected to form a boosted strong classifier in a statistical way. As learning continues, the strong classifier is improved iteratively and more importantly, the discriminative capabilities of selected features are strengthened as well according to their relative importance to the strong classifier via a joint fine-tune process in the BDBN framework. Extensive experiments on two public databases showed that the BDBN framework yielded significant improvements in facial expression analysis.
Similar papers:
  • Using a deformation field model for localizing faces and facial points under weak supervision [pdf] - Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • Learning Expressionlets on Spatio-Temporal Manifold for Dynamic Facial Expression Recognition [pdf] - Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
#794 - Bayesian Active Contours with Affine-Invariant, Elastic Shape Prior [pdf]
Darshan Bryner, Anuj Srivastava

Abstract: Active contours, especially in conjunction with prior-shape models, have become an important tool in image segmentation. However, most contour methods use shape priors based on similarity-shape analysis, i.e., analysis that is invariant to rotation, translation, and scale. In practice, the training shapes used for prior-shape models may be collected from viewing angles different from those of the test images, requiring invariance to a larger class of transformations. Using an elastic, affine-invariant shape modeling of planar curves, we propose an active contour algorithm in which the training and test shapes can be at arbitrary affine transformations, and the resulting segmentation is robust to perspective skews. We construct a shape space of affine-standardized curves and derive a statistical model for capturing class-specific shape variability. The active contour is then driven by the true gradient of a total energy composed of a data term, a smoothing term, and an affine-invariant shape-prior term. This framework is demonstrated using a number of examples involving the segmentation of occluded or noisy images of targets subject to perspective skew.
Similar papers:
  • Good Vibrations: A Modal Analysis Approach for Sequential Non-Rigid Structure from Motion [pdf] - Antonio Agudo, Lourdes Agapito, Begoña Calvo, José M. Montiel
  • Dense Non-Rigid Shape Correspondence using Random Forests [pdf] - Emanuele Rodolà, Samuel Rota Bulò, Thomas Windheuser, Matthias Vestner, Daniel Cremers
  • Fully Automated Non-rigid Segmentation with Distance Regularized Level Set Evolution Initialized and Constrained by Deep-structured Inference [pdf] - Tuan Ngo, Gustavo Carneiro
  • 3D Modeling from Wide Baseline Range Scans using Contour Coherence [pdf] - Ruizhe Wang, Jongmoo Choi, Gerard Medioni
#795 - Frequency-Based 3D Reconstruction of Transparent and Specular Objects [pdf]
Ding Liu, Xida Chen, Yee-Hong Yang

Abstract: 3D reconstruction of transparent and specular objects is a very challenging topic in computer vision. For transparent and specular objects, which have complex interior and exterior structures that can reflect and refract light in a complex fashion, it is difficult, if not impossible, to use either passive stereo or traditional structured light methods to perform the reconstruction. We propose a frequency-based 3D reconstruction method, which incorporates the frequency-based matting method. Similar to structured light methods, a set of frequency-based patterns is projected onto the object, and a camera captures the scene at the same time. Each pixel of the captured image is analyzed along the time axis and the corresponding signal is transformed to the frequency domain using the Discrete Fourier Transform. Since a frequency is determined only by the source that creates it, the frequency of the signal can uniquely identify the location of the pixel in the patterns. In this way, the correspondences between the pixels in the captured images and the points in the patterns can be acquired. Using a new labelling procedure, the surface of transparent and specular objects can be reconstructed with very encouraging results.
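The decoding step described here (per-pixel temporal DFT, then picking the dominant frequency) can be sketched directly; the snippet below is an illustrative reading of that idea for a temporal image stack, with the DC bin suppressed so that the strongest remaining frequency identifies which projected pattern a pixel saw.

```python
import numpy as np

def dominant_frequency(stack):
    """Per-pixel dominant temporal frequency of a (T, H, W) image stack."""
    spectrum = np.abs(np.fft.rfft(stack, axis=0))  # magnitude spectrum along time
    spectrum[0] = 0.0                              # ignore the DC component
    return spectrum.argmax(axis=0)                 # (H, W) map of frequency bins
```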
Similar papers:
  • Aliasing Detection and Reduction in Plenoptic Imaging [pdf] - Zhaolin Xiao, Qing Wang, Jingyi Yu, Guoqing Zhou
  • Deblurring Low-light Images with Light Streaks [pdf] - Zhe Hu, Sunghyun Cho, Jue Wang, Ming-Hsuan Yang
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
#801 - Bi-label Propagation for Generic Multiple Object Tracking [pdf]
Wenhan Luo, Tae-Kyun Kim, Björn Stenger, Xiaowei Zhao, Roberto Cipolla

Abstract: In this paper, we propose a label propagation framework to handle the multiple object tracking (MOT) problem for a generic object type (cf. pedestrian tracking). Given a target object by an initial bounding box, all objects of the same type are localized together with their identities. We treat this as a problem of propagating bi-labels, i.e. a binary class label for detection and individual object labels for tracking. To propagate the binary class label, we adopt clustered Multiple Task Learning (cMTL) while enforcing spatio-temporal consistency and show that this improves the performance when given limited training data. To track objects, we propagate labels from trajectories to detections based on affinity using appearance, motion, and context. Experiments on public and challenging new sequences show that the proposed method improves over the current state of the art on this task.
Similar papers:
  • Occlusion Geodesics for Online Multi-Object Tracking [pdf] - Horst Possegger, Thomas Mauthner, Peter Roth, Horst Bischof
  • Better Feature Tracking Through Subspace Constraints [pdf] - Bryan Poling, Gilad Lerman, Arthur Szlam
  • Curvilinear Structure Tracking by Low Rank Tensor Approximation with Model Propagation [pdf] - Erkang Cheng, Yu Pang, Ying Zhu, Haibin Ling
  • Subspace Tracking under Dynamic Dimensionality for Online Background Subtraction [pdf] - Matthew Berger, Lee Seversky
#809 - Probabilistic Active Appearance Models [pdf]
Joan Alabort-i-Medina, Stefanos Zafeiriou

Abstract: In this paper we provide the first, to the best of our knowledge, probabilistic formulation of one of the most successful and well-studied statistical models of shape and texture, i.e., Active Appearance Models (AAMs). To this end, we use a simple probabilistic model for texture generation, assuming both Gaussian noise and a Gaussian prior over the latent texture space. We retrieve the shape parameters by formulating a cost function obtained by marginalizing out the latent texture space. This results in a fast implementation when compared to other simultaneous algorithms for fitting AAMs, mainly due to the removal of the calculation of texture parameters. We proceed to demonstrate that, contrary to what is believed regarding the performance of AAMs in generic fitting scenarios, optimization of the proposed cost function produces results that outperform discriminatively trained state-of-the-art algorithms on the problem of facial alignment "in the wild".
Similar papers:
  • The Synthesizability of texture examples [pdf] - Dengxin Dai, Hayko Riemenschneider, Luc Van Gool
  • RAPS: Robust and Efficient Automatic Construction of Person-Specific Deformable Models [pdf] - Christos Sagonas, Stefanos Zafeiriou, Yannis Panagakis, Maja Pantic
  • Automatic Construction of Deformable Models In-The-Wild [pdf] - Epameinondas Antonakos, Stefanos Zafeiriou
  • Gauss-Newton Constrained Local Models [pdf] - Georgios Tzimiropoulos, Maja Pantic
#814 - Transformation Pursuit for Image Classification [pdf]
Mattis Paulin, Jerome REVAUD, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid

Abstract: An approach to learning invariances in image classification is to augment the training set with transformed versions of the original images. However, given a large set of possible transformations, selecting an optimal subset of transformations is challenging. Indeed, transformations are not equally informative, and adding uninformative transformations increases training time with no gain in accuracy. We propose a principled algorithm, Image Transformation Pursuit (ITP), for the automatic selection of transformations. ITP works in a greedy fashion, selecting at each iteration the transformation that yields the highest accuracy gain. ITP also allows efficient exploration of complex transformations, which are combinations of basic transformations. We report results on two public benchmarks: the CUB dataset of bird images and the ImageNet 2010 challenge. On CUB we report an improvement of top-1 accuracy from 28.2% to 45.2%, and on ImageNet an improvement of top-5 accuracy from 70.1% to 74.9%.
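The greedy loop itself is simple to sketch. In the snippet below, `candidates` and `evaluate` are hypothetical placeholders: `evaluate` stands in for retraining the classifier with a given set of transformations and returning validation accuracy, which is where all the real cost of ITP lives.

```python
def transformation_pursuit(candidates, evaluate, budget=4):
    """Greedily pick up to `budget` transformations by marginal accuracy gain."""
    selected = []
    for _ in range(budget):
        gains = [(evaluate(selected + [t]), t)
                 for t in candidates if t not in selected]
        if not gains:
            break
        best_acc, best_t = max(gains, key=lambda g: g[0])
        if selected and best_acc <= evaluate(selected):
            break                    # stop once no candidate improves accuracy
        selected.append(best_t)
    return selected
```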
Similar papers:
  • Discriminative Deep Metric Learning for Face Verification in the Wild [pdf] - Junlin Hu, Jiwen Lu, Yap-Peng Tan
  • Packing and Padding: Coupled Multi-index for Accurate Image Retrieval [pdf] - Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
  • Three Guidelines of Online Learning for Large-Scale Visual Recognition [pdf] - Yoshitaka Ushiku, Tatsuya Harada
  • Simultaneous Twin Kernel Learning for Structured Prediction [pdf] - Chetan Tonde, Ahmed Elgammal
#817 - Dense Non-Rigid Shape Correspondence using Random Forests [pdf]
Emanuele Rodolà, Samuel Rota Bulò, Thomas Windheuser, Matthias Vestner, Daniel Cremers

Abstract: We propose a shape matching method that produces dense correspondences tuned to a specific class of shapes and deformations. In a scenario where this class is represented by a small set of typical example shapes, the proposed method learns a shape descriptor that captures these shapes and their deformations. The approach enables the wave kernel signature to extend the class of recognized deformations from near-isometries to the deformations appearing in the example set, by means of a random forest classifier. With the help of the introduced spatial regularization, the proposed method achieves significant improvements over the baseline approach and obtains state-of-the-art results while keeping short computation times.
Similar papers:
  • Symmetry-Aware Isometric Matching of Incomplete 3D Surfaces [pdf] - Yusuke Yoshiyasu
  • Constructing Robust Affinity Graph for Spectral Clustering [pdf] - Xiatian Zhu, Chen Change Loy, Shaogang Gong
  • Incremental Learning of NCM Forests for Large-Scale Image Classification [pdf] - Marko Ristin, Matthieu Guillaumin, Juergen Gall, Luc Van Gool
  • Structured Output Random Forests for Accurate Object Detection [pdf] - Samuel Schulter, Christian Leistner, Peter Roth, Horst Bischof
#820 - Finding the Subspace Mean or Median to Fit Your Need [pdf]
Timothy Marrinan, Michael Kirby, Bruce Draper, Chris Peterson

Abstract: Many computer vision algorithms employ subspace models to represent data. Many of these approaches benefit from the ability to create an average or prototype for a set of subspaces. The most popular method in these situations is the Karcher mean, also known as the Riemannian center of mass. The prevalence of the Karcher mean may lead some to assume that it provides the best average in all scenarios. However, other subspace averages that appear less frequently in the literature may be more appropriate for certain tasks. The extrinsic manifold mean, the $L_2$-median, and the flag mean are alternative averages that can be substituted directly for the Karcher mean in many applications. This paper evaluates the characteristics and performance of these four averages on synthetic and real-world data. While the Karcher mean generalizes the Euclidean mean to the Grassmann manifold, we show that the extrinsic manifold mean, the $L_2$-median, and the flag mean behave more like medians and are therefore more robust to the presence of outliers among the subspaces being averaged. We also show that while the Karcher mean and $L_2$-median are computed using iterative algorithms, the extrinsic manifold mean and flag mean can be found analytically and are therefore orders of magnitude faster in practice. Finally, we show that the flag mean is a generalization of the extrinsic manifold mean that permits subspaces with different numbers of dimensions to be averaged. The result is a "coo
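Both analytic averages mentioned above are a few lines of linear algebra: the extrinsic manifold mean eigen-decomposes the average projection matrix, and the flag mean takes the left singular vectors of the concatenated orthonormal bases. The sketch below assumes each subspace is given as a matrix with orthonormal columns; `q` is the requested dimension of the average.

```python
import numpy as np

def extrinsic_mean(bases, q):
    """Top-q eigenvectors of the averaged projection matrices U @ U.T."""
    P = sum(U @ U.T for U in bases) / len(bases)
    w, V = np.linalg.eigh(P)
    return V[:, np.argsort(w)[::-1][:q]]

def flag_mean(bases, q):
    """Left singular vectors of the concatenated bases (dimensions may differ)."""
    U, _, _ = np.linalg.svd(np.hstack(bases), full_matrices=False)
    return U[:, :q]
```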
Similar papers:
  • Domain Adaptation on the Statistical Manifold [pdf] - Mahsa Baktashmotlagh, Mehrtash Harandi, Brian Lovell, Mathieu Salzmann
  • Complex Non-Rigid Motion 3D Reconstruction by Union of Subspaces [pdf] - Yingying Zhu, Dong Huang, Fernando de la Torre, Simon Lucey
  • Subspace Clustering for Sequential Data [pdf] - Stephen Tierney, Junbin Gao, Yi Guo
  • Semi-supervised Spectral Clustering for Image Set Classification [pdf] - Arif Mahmood, Ajmal Mian, Robyn Owens
#836 - Deblurring Text Images via L0-Regularized Intensity and Gradient Prior [pdf]
Jinshan Pan, Zhe Hu, Zhixun Su, Ming-Hsuan Yang

Abstract: We propose a simple yet effective L0-regularized prior based on intensity and gradient for text image deblurring. The proposed image prior is motivated by observing distinct properties of text images. Based on this prior, we develop an efficient optimization method to generate reliable intermediate results for kernel estimation. The proposed method does not require any complex filtering strategies to select salient edges, which are critical to state-of-the-art deblurring algorithms. We discuss the relationship with other deblurring algorithms based on edge selection and provide insight into how to select salient edges in a more principled way. In the final latent image restoration step, we develop a simple method to remove artifacts and render better deblurred images. Experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art text image deblurring methods. In addition, we show that the proposed method can be effectively applied to deblur low-illumination images.
Similar papers:
  • Orientation Robust Textline Detection in Natural Images [pdf] - Le Kang, Yi Li
  • Separable Kernel for Image Deblurring [pdf] - Lu Fang, Haifeng Liu, Feng Wu
  • Joint Depth Estimation and Camera Shake Removal from Single Blurry Image [pdf] - Zhe Hu, Li Xu, Ming-Hsuan Yang
  • Blind Multi-Image Restoration [pdf] - Haichao Zhang
#848 - Manifold Based Dynamic Texture Synthesis from Extremely Few Samples [pdf]
Hongteng Xu, Hongyuan Zha, Mark Davenport

Abstract: In this paper, we present a novel method to synthesize dynamic texture sequences from extremely few samples, e.g., merely two possibly disparate frames, leveraging both Markov Random Fields (MRFs) and manifold learning. Decomposing a textural image into a set of patches, we achieve dynamic texture synthesis by estimating sequences of temporal patches. We select candidates for each temporal patch from spatial patches based on MRFs and regard them as samples from a low-dimensional manifold. After mapping candidates to a low-dimensional latent space, we estimate the sequence of temporal patches by finding an optimal trajectory in the latent space. Guided by some key properties of trajectories of realistic temporal patches, we derive a curvature-based trajectory selection algorithm. In contrast to methods based on MRFs or dynamic systems that rely on a large number of samples, our method is able to deal with the case of extremely few samples and requires no training phase. We compare our method with the state of the art and show that our method not only exhibits superior performance on synthesizing textures but also produces results with pleasing visual effects.
Similar papers:
  • Unsupervised Trajectory Modelling using Temporal Information via Minimal Paths [pdf] - Brais Cancela, Alberto Iglesias, Marcos Ortega, Manuel Penedo
  • Super-resolving Appearance of 3D Deformable Shapes from Multiple Videos [pdf] - Jean-Sebastien Franco, Vagia Tsiminaki, Edmond Boyer
  • Analysis by Synthesis: Object Recognition by Object Reconstruction [pdf] - Mohsen Hejrati, Deva Ramanan
  • The Synthesizability of texture examples [pdf] - Dengxin Dai, Hayko Riemenschneider, Luc Van Gool
#849 - Super-Resolving Noisy Images [pdf]
Abhishek Singh, Fatih Porikli, Narendra Ahuja

Abstract: Our goal is to obtain a noise-free, high resolution (HR) image from an observed, noisy, low resolution (LR) image. The conventional approach of preprocessing the image with a denoising algorithm, followed by applying a super-resolution (SR) algorithm, has an important limitation: along with noise, some high frequency content of the image (particularly textural detail) is invariably lost during the denoising step. This 'denoising loss' restricts the performance of the subsequent SR step, wherein the challenge is to synthesize such textural details. In this paper, we show that high frequency details in the noisy image (which are ordinarily removed by denoising algorithms) can be effectively used to obtain the missing textural details in the HR domain. To do so, we first obtain HR versions of both the noisy and the denoised images, using a patch-similarity based SR algorithm. We then show that by taking a convex linear combination of orientation- and frequency-selective bands of the noisy and the denoised HR images, we can obtain a desired HR image where (i) some of the textural signal lost in the denoising step is effectively recovered in the HR domain, and (ii) additional textures can be easily synthesized by appropriately constraining the parameters of the convex combination. We show that this part-recovery and part-synthesis of textures through our algorithm yields HR images that are visually more pleasing than those obtained using the conventional processing pipeline. Furthe
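The convex band-combination step can be illustrated with a crude isotropic FFT split: keep the low frequencies of the denoised HR image, and mix back a fraction of the noisy HR image's high band, where the lost texture lives. The paper uses orientation- and frequency-selective bands; the single cutoff and the weight `alpha` below are illustrative assumptions.

```python
import numpy as np

def blend_bands(hr_noisy, hr_denoised, cutoff=0.15, alpha=0.6):
    """Keep the denoised low band; convexly mix the two images in the high band."""
    F_noisy, F_clean = np.fft.fft2(hr_noisy), np.fft.fft2(hr_denoised)
    fy = np.fft.fftfreq(hr_noisy.shape[0])[:, None]
    fx = np.fft.fftfreq(hr_noisy.shape[1])[None, :]
    low = (np.hypot(fy, fx) < cutoff).astype(float)   # isotropic low-pass mask
    blended = F_clean * low + ((1 - alpha) * F_clean + alpha * F_noisy) * (1 - low)
    return np.real(np.fft.ifft2(blended))
```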
Similar papers:
  • Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising [pdf] - Yi Peng, Deyu Meng, Zongben Xu, Biao Zhang, Chenqiang Gao, Yang Yi
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • Weighted Nuclear Norm Minimization with Application to Image Denoising [pdf] - Shuhang Gu, Lei Zhang, Xiangchu Feng, Wangmeng Zuo
  • CID: Combined Image Denoising in Spatial and Frequency Domains Using Web Images [pdf] - Huanjing Yue, Xiaoyan Sun, Jingyu Yang, Feng Wu
#857 - Local Readjustment for High-Resolution 3D Reconstruction [pdf]
Siyu Zhu, Tian Fang, Jianxiong Xiao, Long Quan

Abstract: Global bundle adjustment usually converges to a nonzero residual and produces sub-optimal camera poses for local areas, which leads to loss of details for high resolution reconstruction. Instead of trying harder to optimize everything globally, we argue that we should live with the non-zero residual and adapt the camera poses to local areas. To this end, we propose a segment-based approach to readjust the camera poses locally and improve the reconstruction for fine geometry details. The key idea is to partition the globally optimized structure-from-motion points into well-conditioned segments for re-optimization, reconstruct their geometry individually, and fuse everything back into a consistent global model. This significantly reduces severe propagated errors and estimation biases caused by the initial global adjustment. The results on several datasets demonstrate that this approach can significantly improve the reconstruction accuracy, while maintaining the consistency of the 3D structure between segments.
Similar papers:
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Submodular Object Recognition [pdf] - Fan Zhu, Zhuolin Jiang, Ling Shao
  • 3D Reconstruction from Accidental Motion [pdf] - Fisher Yu, David Gallup, Steve Seitz
#859 - Dual-Space Decomposition of 2D Complex Shapes [pdf]
Guilin Liu, Zhonghua Xi, Jyh-Ming Lien

Abstract: While techniques that segment shapes into visually meaningful parts have generated impressive results, they have focused only on relatively simple shapes, such as those composed of a single object either without holes or with a few simple holes. In many applications, shapes created from images can contain many overlapping objects and holes. These holes may come from sensor noise, or may be an important part of the shape, and can be arbitrarily complex. The complexities that appear in real-world 2D shapes can pose grand challenges to existing part segmentation methods. In this paper, we propose a new decomposition method, called Dual-space Decomposition, that handles complex 2D shapes by recognizing the importance of holes and classifying holes as either topological noise or structurally important features. Our method creates a nearly convex decomposition of a given shape by segmenting both positive and negative regions of the shape. We compare our results to segmentations produced by non-expert human subjects. Based on two evaluation methods, we show that this new decomposition method creates statistically similar, and sometimes better, segmentations compared to those produced by human subjects.
Similar papers:
  • FAST LABEL: Easy and Efficient Optimization of Joint Multi-Label and Estimation Problems [pdf] - Byung-Woo Hong, Ganesh Sundaramoorthi
  • Separation of Line Drawings Based on Split Faces for 3D Object Reconstruction [pdf] - Changqing Zou
  • Point Matching in the Presence of Outliers in Both Point Sets: A Concave Optimization Approach [pdf] - Wei Lian, Lei Zhang
  • Object Partitioning using Local Convexity [pdf] - Simon Christoph Stein, Jeremie Papon, Markus Schoeler, Florentin Woergoetter
#863 - Aliasing Detection and Reduction in Plenoptic Imaging [pdf]
Zhaolin Xiao, Qing Wang, Jingyi Yu, Guoqing Zhou

Abstract: When using a plenoptic camera for digital refocusing, angular undersampling can cause severe (angular) aliasing artifacts. Previous approaches have focused on avoiding aliasing by pre-processing the acquired light field via prefiltering, demosaicing, reparameterization, etc. In this paper, we present a different solution that first detects and then removes aliasing at the light field refocusing stage. Different from previous frequency-domain aliasing analyses, we carry out a spatial-domain analysis to reveal whether aliasing will occur and uncover where in the image it will occur. The spatial analysis also facilitates easy separation of the aliasing vs. non-aliasing regions and aliasing removal. Experiments on both synthetic scenes and real light field camera array datasets demonstrate that our approach has a number of advantages over classical prefiltering and depth-dependent light field rendering techniques.
Similar papers:
  • Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras [pdf] - Can Chen, Haiting Lin, Zhan Yu, Sing Bing Kang, Jingyi Yu
  • Saliency Detection on Light Fields [pdf] - Nianyi Li, Jinwei Ye, Yu Ji, Haibin Ling, Jingyi Yu
  • Deblurring Low-light Images with Light Streaks [pdf] - Zhe Hu, Sunghyun Cho, Jue Wang, Ming-Hsuan Yang
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
#873 - Finding Vanishing Points via Point Alignments in Image Primal and Dual Domains [pdf]
José Lezama, Rafael Grompone von Gioi, Jean-Michel Morel, Gregory Randall

Abstract: We present a novel method for automatic vanishing point detection based on primal and dual point alignment detection. The very same point alignment detection algorithm is used twice: first in the image domain, to group line segment endpoints into more precise lines; second in the dual domain, where converging lines become aligned points. The use of the recently introduced PCLines dual spaces together with a robust point alignment detector leads to a very accurate algorithm. Experimental results on two public standard datasets show that our method significantly advances the state of the art in the Manhattan world scenario, while producing state-of-the-art performance in non-Manhattan scenes.
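The dual mapping is compact enough to state directly. In the PCLines "straight" space with parallel x- and y-axes a distance d apart, the image line y = m*x + b maps to a single point, so collinear structures become point alignments; the function below is a sketch of that textbook transform (valid for m != 1, with the companion "twisted" space covering the remaining lines), not the paper's detector.

```python
def pclines_straight(m, b, d=1.0):
    """Dual point of the image line y = m*x + b in the PCLines straight space."""
    # all points on the line map to dual lines meeting at this single point
    return d / (1.0 - m), b / (1.0 - m)
```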
Similar papers:
  • A Principled Approach for Coarse-to-Fine MAP Inference [pdf] - Christopher Zach
  • Local Readjustment for High-Resolution 3D Reconstruction [pdf] - Siyu Zhu, Tian Fang, Jianxiong Xiao, Long Quan
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
  • Submodular Object Recognition [pdf] - Fan Zhu, Zhuolin Jiang, Ling Shao
#877 - Investigating Haze-relevant Features in A Learning Framework for Image Dehazing [pdf]
Ketan Tang, Jianchao Yang, Jue Wang

Abstract: Haze is one of the major factors that degrade outdoor images. Removing haze from a single image is known to be severely ill-posed, and assumptions made in previous methods do not hold in many situations. In this paper, we systematically investigate different haze-relevant features in a learning framework to identify the best feature combination for image dehazing. We show that the dark-channel feature is the most informative one for this task, which confirms the observation of He et al. from a learning perspective, while other haze-relevant features also contribute significantly in a complementary way. We also find, surprisingly, that the synthetic hazy image patches we use for feature investigation serve well as training data for real-world images, which allows us to train specific models for specific applications. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods on both synthetic and real-world datasets.
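The dark-channel feature singled out above is easy to compute: a per-pixel minimum over the color channels followed by a local minimum filter (He et al.). The sketch below assumes an RGB image scaled to [0, 1] and an illustrative 15-pixel window.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Dark channel of an (H, W, 3) image: min over channels, then local min."""
    return minimum_filter(img.min(axis=2), size=patch)
```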
Similar papers:
  • Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching [pdf] - Aristotle Spyropoulos, Nikos Komodakis, Philippos Mordohai
  • Deblurring Low-light Images with Light Streaks [pdf] - Zhe Hu, Sunghyun Cho, Jue Wang, Ming-Hsuan Yang
  • Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras [pdf] - Can Chen, Haiting Lin, Zhan Yu, Sing Bing Kang, Jingyi Yu
  • Backscatter Compensated Photometric Stereo with 3 Sources [pdf] - Chourmouzios Tsiotsios, Maria Angelopoulou, Tae-Kyun Kim, Andrew Davison
#880 - Stacked Progressive Auto-Encoder (SPAE) for Face Recognition Across Poses [pdf]
Meina Kan, Shiguang Shan, Hong Chang, Xilin Chen

Abstract: Identifying subjects under variations caused by poses is one of the most challenging tasks in face recognition, since the difference in appearance caused by poses may be even larger than the difference due to identity. Inspired by the observation that pose variations change non-linearly but smoothly, we propose to learn pose-robust features by modeling the complex non-linear transform from non-frontal face images to frontal ones through a deep network in a progressive way, termed stacked progressive auto-encoders (SPAE). Specifically, each shallow progressive auto-encoder of the stacked network is developed to map face images at large poses to a virtual view at smaller poses, while keeping images already at smaller poses unchanged. Stacking multiple such shallow auto-encoders can then convert a non-frontal face image to a frontal one progressively, which means the pose variations are narrowed down to zero step by step. As a result, the outputs of the topmost hidden layers of the stacked network contain very small pose variations, and can be used as pose-robust features for face recognition. An additional attraction of the proposed method is that no pose estimation is needed for the test images. The proposed method is tested on two datasets with pose variations, i.e., the MultiPIE and FERET datasets, and the experimental results demonstrate its superiority over existing works, especially 2D ones.
Similar papers:
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
  • Learning Non-Linear Reconstruction Models for Image Set Classification [pdf] - Munawar Hayat, mohammed Bennamoun, Senjian An
  • Max-Margin Boltzmann Machines for Object Segmentation [pdf] - Jimei Yang, Simon Safar, Ming-Hsuan Yang
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
#900 - Locality in Generic Instance Search from One Example [pdf]
Ran Tao, Efstratios Gavves, Cees Snoek, Arnold Smeulders

Abstract: This paper aims for generic instance search from a single example. Where the state-of-the-art relies on a global image representation for the search, we proceed by including locality at all steps of the method. As the first novelty, we consider many boxes as candidate targets through an efficient point-indexed representation that is independent of the number of boxes considered. The same representation allows, as the second novelty, the application of very large vocabularies in the powerful Fisher vector and VLAD. As the third novelty, we propose to emphasize local search in feature space by an exponential similarity function. Locality is advantageous in instance search as it rests on matching unique details. We demonstrate a substantial increase in generic instance search performance from one example on three standard datasets with buildings, logos, and scenes, from 0.443 to 0.620 in mAP.
Similar papers:
  • Bayes Merging of Multiple Vocabularies for Scalable Image Retrieval [pdf] - Liang Zheng, Shengjin Wang, Wengang Zhou, Qi Tian
  • Deep Fisher Kernels [pdf] - Mayu Sakurada, Vladyslav Sydorov , Christoph Lampert
  • Towards Good Practices for Action Video Encoding [pdf] - Jianxin Wu, Yu Zhang
  • Fisher and VLAD with FLAIR [pdf] - Koen Van de Sande, Cees Snoek, Arnold Smeulders
#907 - High Quality Photometric Reconstruction using a Depth Camera [pdf]
Avishek Chatterjee, Sk Mohammadul Haque, Venu Madhav Govindu

Abstract: In this paper we present a depth-guided photometric 3D reconstruction method that works solely with a depth camera like the Kinect. Existing methods that fuse depth with normal estimates use an external RGB camera to obtain photometric information and treat the depth camera as a black box that provides a low quality depth estimate. Our contributions to such methods are twofold. Firstly, instead of using an extra RGB camera, we use the infra-red (IR) camera of the depth camera system itself to directly obtain high resolution photometric information. We believe that ours is the first method to use an IR depth camera system in this manner. Secondly, photometric methods applied to complex objects result in numerous holes in the reconstructed surface due to shadows and self-occlusions. To mitigate this problem, we develop a simple and effective multiview approach that fuses depth and normal information from multiple viewpoints to build a complete, consistent and accurate 3D surface representation. We demonstrate the efficacy of our method to generate high quality 3D surface reconstructions for some complex 3D figurines.
Similar papers:
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - Di Xu, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Exploiting Shading Cues in Kinect IR Images for Geometry Refinement [pdf] - Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
#911 - Is Rotation a Nuisance in Shape Recognition? [pdf]
Qiuhong Ke, Yi Li

Abstract: Rotation in shape recognition is regarded as a puzzling nuisance in most algorithms. In this paper we address three fundamental issues raised by rotated shapes: 1) is alignment among shapes necessary? If the answer is no, 2) how can information in different rotations of the same shape be exploited? and 3) how can rotation-unaware local features be used for rotation-aware shape recognition? We argue that the origin of these issues is the use of hand-crafted, rotation-unfriendly features and measurements. Therefore our goal is to learn a set of hierarchical features that describe all rotated versions of a shape as one class, with the capability of distinguishing different such classes. We propose to rotate shapes as many times as possible as training samples, and learn the hierarchical feature representation by effectively adopting a convolutional neural network. We further show that our method is very efficient because the convolutional network responses of all n rotated versions of the same shape can be computed at the expense of an O(log n) factor, instead of the naive O(n). We tested the algorithm on three real datasets: the Swedish Leaves dataset, the ETH-80 Shape dataset, and a subset of the recently collected Leafsnap dataset. Our approach used the curvature scale space and outperformed the state of the art.
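The data-augmentation step, rotating each training shape many times so that all rotations form one class, is straightforward to sketch; a small example follows, using SciPy's image rotation helper. The ConvNet itself and the O(log n) evaluation trick are not reproduced here.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_augment(shape_img, n=16):
    """Return n rotated copies of a shape image, evenly spaced over 360 degrees."""
    return [rotate(shape_img, angle, reshape=False, order=0)
            for angle in np.linspace(0.0, 360.0, n, endpoint=False)]
```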
Similar papers:
  • Fast Rotation Search with Stereographic Projections for 3D Registration [pdf] - Alvaro Parra Bustos, Tat-Jun Chin, David Suter
  • Robust and Efficient Full-Angle Quaternions for Matching Arrays of 3D Rotations [pdf] - Stephan Liwicki, Stefanos Zafeiriou, Maja Pantic, Björn Stenger, Minh-Tri Pham
  • Efficient Squared Curvature [pdf] - Claudia Nieuwenhuis, Eno Toeppe, Lena Gorelick, Olga Veksler, Yuri Boykov
  • Noising versus Smoothing for Vertex Identification in Unknown Shapes [pdf] - Konstantinos Raftopoulos, Marin Ferecatu
#912 - Orientation Robust Textline Detection in Natural Images [pdf]
Le Kang, Yi Li

Abstract: Current natural text detection methods focus on bottom-up approaches, which rely heavily on strong multi-stage hypotheses for characters and words, and use strong hand-crafted heuristic rules to group potential elements into text lines. In contrast to this methodology, we propose using weak hypotheses in a similarity clustering framework, followed by a simple region-based filtering as post-processing. We treat text line detection as a graph partitioning problem, where each vertex is represented by a Maximally Stable Extremal Region (MSER). First, weak hypotheses are proposed by grouping MSERs into multiple overlapping regions, based on their spatial alignment with respect to their neighbors. Then, higher-order correlation clustering (HOCC) is used to partition the MSERs into textline candidates, using the hypotheses as soft constraints to enforce long range interactions. We further propose a regularization method to solve the Semidefinite Programming problem in the inference. Finally we use a simple texton-based texture classifier to filter out the non-text areas. This framework allows us to naturally handle multiple orientations, languages and fonts. Experiments show competitive performance using this framework. On a recent dataset, we achieved a 9% performance gain in precision, with recall comparable to competing methods.
Similar papers:
  • Multi-modal Learning in Loosely-organized Web Images [pdf] - Kun Duan, David Crandall, Dhruv Batra
  • Region-based Discriminative Feature Pooling for Scene Text Recognition [pdf] - Chen-Yu Lee, Anurag Bhardwaj, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
  • Deblurring Text Images via L0-Regularized Intensity and Gradient Prior [pdf] - Jinshan Pan, Zhe Hu, Zhixun Su, Ming-Hsuan Yang
  • Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition [pdf] - Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu
#915 - Detection, Rectification and Segmentation of Co-planar Repeated Patterns [pdf]
James Pritts, Ondrej Chum, Jiri Matas

Abstract: This paper presents a novel and general method for detection, rectification and segmentation of imaged co-planar repeated patterns. The only assumption on the image content is that repeated elements of the pattern can be mapped to each other in the scene plane by a set of Euclidean transformations. This is a very general assumption that covers nearly all commonly seen man-made repetitive patterns. In addition, novel linear constraints are exploited that enable geometric ambiguity reduction between the rectification of the imaged pattern and the real-world pattern. The remaining ambiguity is within a similarity if the scene plane contains repeated elements that are rotated differently, or within a similarity with a scale ambiguity along the axis of symmetry if any of the elements are reflected. The method is successfully tested on a broad range of image types, including those where state-of-the-art methods fail.
Similar papers:
  • Mirror Symmetry Histograms for Capturing Geometric Properties in Images [pdf] - Marcelo Cicconet, Davi Geiger, Michael Werman, Kristin Gunsalus
  • Bayesian Active Contours with Affine-Invariant, Elastic Shape Prior [pdf] - Darshan Bryner, Anuj Srivastava
  • Generalized Pupil-Centric Imaging and Analytical Calibration for a Non-frontal Camera [pdf] - Avinash Kumar, Narendra Ahuja
  • Global Optimization for Depth Reconstruction from Speckle Patterns [pdf] - Qifeng Chen, Vladlen Koltun
#917 - Discriminative Blur Detection Features [pdf]
Jianping Shi, Li Xu, Jiaya Jia

Abstract: The common existence of image blur raises a practically important question: what naturally differentiates blurred from unblurred image regions? We address it by studying a few blur feature representations in the image gradient and Fourier domains and through data-driven local filters. Unlike previous methods, which are often based on restoration and deconvolution mechanisms, our features are constructed to enhance discriminative power and are adaptive to varying blur scale in images. To enable evaluation, we build a new blur perception dataset containing thousands of images with labeled ground truth. Our results are applied to facilitate several applications, including blur region segmentation and deblurring.
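A minimal gradient-domain blur cue in the spirit of the paper's gradient features is sketched below; the learned local filters and Fourier-domain features are omitted, so this is only a baseline illustration.

```python
# Low local gradient energy suggests a blurred region (simplified cue).
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def local_gradient_energy(img, win=15):
    """Per-pixel gradient energy averaged over a win x win window."""
    gx, gy = sobel(img, axis=0), sobel(img, axis=1)
    return uniform_filter(np.hypot(gx, gy) ** 2, size=win)
```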
Similar papers:
  • Blind Multi-Image Restoration [pdf] - Haichao Zhang
  • Separable Kernel for Image Deblurring [pdf] - Lu Fang, Haifeng Liu, Feng Wu
  • Gyro-Based Multi-Image Deconvolution for Removing Handshake Blur [pdf] - Sung Hee Park, Marc Levoy
  • Total Variation Blind Deconvolution: The Devil is in the Details [pdf] - Daniele Perrone, Paolo Favaro
#920 - Visual Tracking via Probability Continuous Outlier Model [pdf]
Dong Wang, Huchuan Lu

Abstract: In this paper, we present a novel online visual tracking method based on linear representation. First, we present a novel probability continuous outlier model (PCOM) to depict the continuous outliers that occur in the linear representation model. In the proposed model, each element of the noisy observation sample can either be represented by a PCA subspace with small Gaussian noise or treated as an arbitrary value with a uniform prior, in which the spatial consistency prior is exploited by using a binary Markov random field model. Then, we derive the objective function of the PCOM method, whose solution can be obtained iteratively by outlier-free least squares and standard max-flow/min-cut steps. Finally, based on the proposed PCOM method, we design an effective observation likelihood function and a simple update scheme for visual tracking. Both qualitative and quantitative evaluations demonstrate that our tracker achieves very favorable performance in terms of both accuracy and speed.
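A deliberately simplified sketch of the alternation described above: least squares on the current inliers, then an outlier-mask update. A plain residual threshold stands in for the paper's binary-MRF max-flow/min-cut step, so this is only a rough approximation.

```python
# Alternating outlier-free least squares and a crude outlier-mask update.
import numpy as np

def pcom_like_fit(y, U, thresh=0.1, iters=10):
    """y: (d,) observation; U: (d, k) PCA basis. Returns coefficients, mask."""
    mask = np.ones(len(y), dtype=bool)
    c = np.zeros(U.shape[1])
    for _ in range(iters):
        c, *_ = np.linalg.lstsq(U[mask], y[mask], rcond=None)  # outlier-free LS
        mask = np.abs(y - U @ c) < thresh                      # threshold in place of max-flow
    return c, mask
```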
Similar papers:
  • Multi-Cue Visual Tracking Using Robust Feature-Level Fusion Based on Joint Sparse Representation [pdf] - Xiangyuan Lan, Pong C. Yuen, Andy Jinhua Ma
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - David Joseph Tan, Slobodan Ilic
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#923 - Clothing Co-Parsing by Joint Image Segmentation and Labeling [pdf]
Wei Yang, Liang Lin, Ping Luo

Abstract: This paper aims at developing an integrated system for clothing co-parsing: given a database of clothes/human images that are unsegmented but annotated with tags, jointly parse them into semantic clothing configurations. We propose a data-driven framework consisting of two phases of inference. The first phase, referred to as "image co-segmentation", iteratively extracts consistent regions on images and jointly refines the regions over all images by employing the exemplar-SVM (E-SVM) technique. In the second phase (i.e., "region co-labeling"), we construct a multi-image graphical model by taking the segmented regions as vertices and incorporating several contexts of clothing configuration (e.g., item location and mutual interactions). The joint label assignment can be solved using an efficient message passing algorithm. In addition to evaluating our framework on the Fashionista dataset \cite{Fashion}, we construct a dataset called CCP, consisting of 2098 high-resolution street fashion photos, to demonstrate the performance of our system. We achieve 90.29% / 88.23% segmentation accuracy and 65.52% / 63.89% recognition rate on the Fashionista and CCP datasets, respectively, which is superior to state-of-the-art methods.
Similar papers:
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
  • The Shape-Time Random Field for Semantic Video Labeling [pdf] - Andrew Kae, Erik Learned-Miller, Benjamin Marlin
  • Human Body Shape Estimation Using a Multi-Resolution Manifold Forest [pdf] - Frank Perbet, Sam Johnson, Minh-Tri Pham, Björn Stenger
  • Towards Unified Human Parsing and Pose Estimation [pdf] - Jian Dong, Qiang Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan
#926 - Randomized Max-Margin Compositions for Visual Recognition [pdf]
Angela Eigenstetter, Bjorn Ommer

Abstract: Discriminative part-based models are currently a main theme in object detection. The powerful model that combines all parts is then typically only feasible for a few constituents, which are in turn iteratively trained to make them as strong as possible. We follow the opposite strategy by randomly sampling a large number of instance-specific part classifiers. Due to their number, we cannot directly train a powerful classifier to combine all parts. Therefore, we randomly group them into fewer, overlapping compositions that are trained using a maximum-margin approach. In contrast to the common rationale of compositional approaches, we do not aim for semantically meaningful ensembles. Rather, we seek randomized compositions that are discriminative and generalize over all instances of a category. Our approach not only localizes objects in cluttered scenes, but also explains them by parsing with compositions and their constituent parts. Experiments on PASCAL VOC07, on the VOC10 evaluation server, and on the MIT Indoor scene dataset show the competitive performance of the approach. Moreover, we evaluate the individual contributions and potential of compositions and their parts in separate experiments.
Similar papers:
  • Learning Important Spatial Pooling Regions for Scene Classification [pdf] - Di Lin, Cewu Lu, Renjie Liao, Jiaya Jia
  • Probabilistic Active Appearance Models [pdf] - Joan Alabort-i-Medina, Stefanos Zafeiriou
  • Using k-poselets for detecting people and localizing their keypoints [pdf] - Bharath Hariharan, Georgia Gkioxari, Ross Girshick, Jitendra Malik
  • Unsupervised Learning of Dictionaries of Hierarchical Compositional Models [pdf] - Jifeng Dai, Yi Hong, WENZE Hu, Ying Nian Wu
#935 - Learning Expressionlets on Spatio-Temporal Manifold for Dynamic Facial Expression Recognition [pdf]
Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen

Abstract: Facial expressions are temporally dynamic events which can be decomposed into a set of muscle motions occurring in different facial regions over various time intervals. For dynamic expression recognition, two key issues, temporal alignment and semantics-aware dynamic representation, must be taken into account. In this paper, we attempt to solve both problems via manifold modeling of videos based on a novel mid-level representation, the expressionlet. Specifically, our method contains three key components: 1) each expression video clip is modeled as a spatiotemporal manifold (STM) formed by dense low-level features; 2) a Universal Manifold Model (UMM) is learned over all low-level features and represented as a set of local ST modes to statistically unify all the STMs; 3) the local modes on each STM are instantiated by fitting to the UMM, and the corresponding expressionlet is constructed by modeling the variations in each local ST mode. With the above strategy, expression videos are naturally aligned both spatially and temporally. To enhance the discriminative power, the expressionlet-based STM representation is further processed with discriminant embedding. Our method is evaluated on four public expression databases: CK+, MMI, Oulu-CASIA, and AFEW. In all cases, our method outperforms the known state of the art.
Similar papers:
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • 3D-aided face recognition robust to expression and pose variations [pdf] - Baptiste Chu, Sami Romdhani, Liming Chen
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
#938 - Switchable Deep Network for Pedestrian Detection [pdf]
Ping Luo, Yonglong Tian

Abstract: In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection. The SDN automatically learns hierarchical features, saliency maps, and mixture representations of different body parts. Pedestrian detection faces the challenges of background clutter and large variations in pedestrian appearance due to pose and viewpoint changes and other factors. One of our key contributions is to propose a Switchable Restricted Boltzmann Machine (SRBM) to explicitly model the complex mixture of visual variations at multiple levels. At the feature level, it automatically estimates saliency maps for each test sample in order to separate background clutter from discriminative regions for pedestrian detection. At the part and body levels, it is able to infer the most appropriate template for the mixture models of each part and the whole body. We have devised a new generative algorithm to effectively pre-train the SDN and then fine-tune it with back-propagation. Our approach is evaluated on the Caltech and ETH datasets and achieves state-of-the-art detection performance.
Similar papers:
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • Informed Haar-like Features Improve Pedestrian Detection [pdf] - Shanshan Zhang, Christian Bauckhage, Armin Cremers
  • Filter Pairing Neural Network for Person Re-identification [pdf] - Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
#945 - Patch-based Evaluation of Image Segmentation [pdf]
Christian Ledig, Wenzhe Shi, Wenjia Bai, Daniel Rueckert

Abstract: The quantification of similarity between image segmentations is a complex yet important task. The ideal similarity measure should be unbiased to segmentations of different volume and complexity, and be able to quantify and visualise segmentation bias. Similarity measures based on overlap, e.g. Dice score, or surface distances, e.g. Hausdorff distance, clearly do not satisfy all of these properties. To address this problem, we introduce Patch-based Evaluation of Image Segmentation (PEIS), a general method to assess segmentation quality. Our method is based on finding patch correspondences and the associated patch displacements, which allow the estimation of segmentation bias. We quantify both the agreement of the segmentation boundary and the conservation of the segmentation shape. We further assess the segmentation complexity within patches to weight the contribution of local segmentation similarity to the global score. We evaluate PEIS on both synthetic data and two medical imaging datasets. On synthetic segmentations of different shapes, we provide evidence that PEIS, in comparison to the Dice score, produces more comparable scores, has increased sensitivity and estimates segmentation bias accurately. On cardiac MR images, we demonstrate that PEIS can evaluate the performance of a segmentation method independent of the size or complexity of the segmentation under consideration. On brain MR images, we compare five different automatic hippocampus segmentation techniques using
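For reference, the Dice overlap that PEIS is compared against is a one-liner (standard definition, not the authors' code); PEIS itself is not implemented here.

```python
# Dice score between two binary segmentations.
import numpy as np

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```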
Similar papers:
  • Learning Mid-level Filters for Person Re-identification [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
  • Co-Occurrence Statistics for Zero-Shot Classification [pdf] - Thomas Mensink, Cees Snoek, Efstratios Gavves
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • Learning-Based Atlas Selection for Multiple-Atlas Segmentation [pdf] - Gerard Sanroma, Guorong Wu, Yaozong Gao, Dinggang Shen
#951 - Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf]
Novi Patricia, Barbara Caputo

Abstract: The transfer learning and domain adaptation problems originate from a distribution mismatch between the source and target data distribution. The causes of such mismatch are traditionally considered different. Thus, transfer learning and domain adaptation algorithms are designed to address different issues, and cannot be used in both settings unless substantially modified. Still, one might argue that these problems are just different declinations of learning to learn, i.e. the ability to leverage over prior knowledge when attempting to solve a new task. We propose a learning to learn framework able to leverage over source data regardless of the origin of the distribution mismatch. We consider prior models as experts, and use their output confidence value as features. We use them to build the new target model, combined with the features from the target data through a high-level cue integration scheme. This results in a class of algorithms usable in a plug-and-play fashion over any learning to learn scenario, from binary and multi-class transfer learning to single and multiple source domain adaptation settings. Experiments on several public datasets show that our approach consistently achieves the state of the art.
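A hedged sketch of the "prior models as experts" idea: source classifiers contribute their confidence values as extra feature columns for the target model. sklearn is used for illustration, and the paper's high-level cue integration scheme is richer than this plain concatenation.

```python
# Expert confidences appended as features (simplified cue integration).
import numpy as np
from sklearn.linear_model import LogisticRegression

def augment_with_experts(X_target, experts):
    """Append each expert's decision confidence as a feature column."""
    conf = np.column_stack([e.decision_function(X_target) for e in experts])
    return np.hstack([X_target, conf])

# Hypothetical usage, assuming `experts` were trained on source tasks:
# target_model = LogisticRegression().fit(augment_with_experts(X, y_experts), y)
```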
Similar papers:
  • Color Transfer using Probabilistic Moving Least Squares [pdf] - Youngbae Hwang, Joon-Young Lee, In So Kweon, Seon Joo Kim
  • Transfer Joint Matching for Visual Domain Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Philip Yu
  • Instance-weighted Transfer Learning of Active Appearance Models [pdf] - Daniel Haase, Erik Rodner, Joachim Denzler
  • Recognizing RGB Images by Learning from RGB-D Data [pdf] - Lin Chen, Wen Li, Dong Xu
#956 - Who Do I Look Like? Determining Parent-Offspring Resemblance via Genetic Features [pdf]
Afshin Dehghan, Enrique Ortiz

Abstract: Recent years have seen a major push for face recognition technology due to the large expansion of image sharing on social networks. In this paper, we consider the difficult task of determining parent-offspring resemblance using genetic features to answer the question "Who do I look like?" Although humans can perform this job at a rate higher than chance, it is not clear how they do it [2]. However, recent studies in anthropology [23] have determined which features tend to be the most genetically discriminative. In this study, we aim not only to create an accurate system for resemblance detection, but also to bridge the gap between studies in anthropology and computer vision techniques. We aim to answer two key questions: 1) Do offspring resemble their parents? and 2) Do offspring resemble one parent more than the other? We propose an algorithm that fuses the features and metrics discovered via gated autoencoders with a discriminative neural network layer that learns the optimal, or what we call genetic, features to delineate parent-offspring relationships. We further analyze the correlation between our automatically detected features and those found in anthropological studies. Our method outperforms the state of the art in kinship verification by 3-10%, depending on the relationship, using both specific (father-son, mother-daughter, etc.) and generic models.
Similar papers:
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
  • Discriminative Deep Metric Learning for Face Verification in the Wild [pdf] - Junlin Hu, Jiwen Lu, Yap-Peng Tan
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
#957 - From Human-Annotated to Machine-Discovered Concepts using Consensus Regularization [pdf]
Afshin Dehghan, Haroon Idrees

Abstract: A video captures a sequence and interactions of concepts that can be static, for instance objects or scenes, or dynamic, such as actions. For large datasets containing hundreds of thousands of images or videos, it is impractical to manually annotate all the concepts, or all the instances of a single concept. However, a large set of concepts can be discovered automatically from unlabeled videos which can capture and express the entire dataset. The downside to these machine-discovered concepts is that they are devoid of semantics and interpretation. In this paper, we present an approach that leverages the strengths of human-annotated and machine-discovered concepts by learning a relationship between them. Since instances of a human concept share visual similarity, the proposed approach uses a novel soft-consensus regularization to learn a mapping that enforces instances from each human concept to have similar representations. Testing is performed by projecting the query onto the machine-discovered concepts and new representations, with non-negativity and unit-summation constraints for probabilistic interpretation. We tested our formulation on TRECVID MED and SIN tasks, and obtained encouraging results.
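The non-negativity and unit-summation constraints mentioned at test time amount to projecting onto the probability simplex; the standard sort-based projection is shown below (textbook algorithm, not the authors' code).

```python
# Euclidean projection of v onto {x : x >= 0, sum(x) = 1}.
import numpy as np

def project_simplex(v):
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)
```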
Similar papers:
  • Co-Occurrence Statistics for Zero-Shot Classification [pdf] - Thomas Mensink, Cees Snoek, Efstratios Gavves
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Zero-shot Event Detection using Multi-modal Fusion of Weakly Supervised Concepts [pdf] - Shuang Wu, Florian Luisier, Sravanthi Bondugula, Pradeep Natarajan
  • Video Classification Based on Generalized Maximum Co-occurrence Cliques [pdf] - Amir Roshan Zamir, Shayan Modiri Assari
#966 - Bags of Spacetime Energies for Dynamic Scene Recognition [pdf]
Christoph Feichtenhofer, Axel Pinz, Richard Wildes

Abstract: This paper presents a unified bag of visual words (BoW) framework for dynamic scene recognition. The approach builds on primitive features that uniformly capture spatial and temporal orientation structure of the imagery (e.g., video), as extracted via application of a bank of spatiotemporally oriented filters. Various feature encoding techniques are investigated to abstract the primitives to an intermediate representation that is best suited to dynamic scene representation. Further, a novel approach to adaptive pooling of the encoded features is presented that captures the spatial layout of the scene even while being robust to situations where camera motion and scene dynamics are confounded. The resulting overall approach has been evaluated on two standard, publicly available dynamic scene datasets. The results show that, in comparison to a representative set of alternatives, the proposed approach outperforms the previous state of the art in classification accuracy by 10%.
Similar papers:
  • Orientational Pyramid Matching for Recognizing Indoor Scenes [pdf] - Lingxi Xie, Jingdong Wang, Bo Zhang, Qi Tian
  • Super Normal Vector for Activity Recognition Using Depth Sequences [pdf] - Xiaodong Yang, Yingli Tian
  • Generalized Max Pooling [pdf] - Naila Murray, Florent Perronnin
  • Ask the image: supervised pooling to preserve feature locality [pdf] - Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta, Francesca Odone
#974 - 3D Reconstruction from Accidental Motion [pdf]
Fisher Yu, David Gallup, Steve Seitz

Abstract: We have discovered that 3D reconstruction can be achieved from a single still photographic capture due to accidental motions of the photographer, even while attempting to hold the camera still. We present a novel 3D reconstruction system tailored for this problem that produces depth maps from short video sequences or bursts of still photos from a standard cell phone or point-and-shoot camera, without the need for multi-lens optics, active sensors, or special motions by the photographer. This result leads to the possibility that depth maps of sufficient quality for applications like perspective change, simulated aperture, and object segmentation can come "for free" for a certain fraction of still photographs. Our system first uses bundle adjustment to estimate camera poses, whose initialization and parameterization make use of the small motion assumption. In multiview stereo, we propose building long-range connections between pixels to effectively regularize the noisy photo-consistency measurement.
Similar papers:
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Local Readjustment for High-Resolution 3D Reconstruction [pdf] - Siyu Zhu, Tian Fang, Jianxiong Xiao, Long Quan
#978 - Using Projection Kurtosis Concentration Of Natural Images For Blind Noise Covariance Matrix Estimation [pdf]
Siwei Lyu

Abstract: Kurtosis of 1D projections provides important statistical characteristics of natural images. In this work, we first provide a theoretical underpinning to a recently observed phenomenon known as projection kurtosis concentration: the kurtosis of natural images over different band-pass channels tends to concentrate around a "typical" value. Based on this analysis, we further describe a new method to estimate the covariance matrix of correlated Gaussian noise from a noise-corrupted image using random band-pass filters. We demonstrate the effectiveness of our blind noise covariance matrix estimation method on natural images.
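A small experiment in the spirit of the phenomenon is easy to run: compute the kurtosis of an image's responses to many random zero-mean filters and observe how tightly the values cluster. Random zero-mean kernels are a simplification of proper band-pass filters, so treat this only as a qualitative illustration.

```python
# Kurtosis of responses to random zero-mean filters (qualitative sketch).
import numpy as np
from scipy.ndimage import convolve
from scipy.stats import kurtosis

def projection_kurtoses(img, n_filters=20, size=5, seed=0):
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_filters):
        f = rng.normal(size=(size, size))
        f -= f.mean()                      # zero DC response
        out.append(kurtosis(convolve(img, f).ravel()))
    return np.array(out)
```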
Similar papers:
  • Super-Resolving Noisy Images [pdf] - Abhishek Singh, Fatih Porikli, Narendra Ahuja
  • Blind Multi-Image Restoration [pdf] - Haichao Zhang
  • Total Variation Blind Deconvolution: The Devil is in the Details [pdf] - Daniele Perrone, Paolo Favaro
  • Covariance descriptors for 3D shape matching and retrieval [pdf] - Hedi Tabia, Hamid Laga, David Picard, Philippe-Henri Gosselin
#981 - Ask the image: supervised pooling to preserve feature locality [pdf]
Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta, Francesca Odone

Abstract: In this paper we propose a weighted supervised pooling method for visual recognition systems. We combine a standard Spatial Pyramid Representation, which is commonly adopted to encode spatial information, with a Feature Space Representation favouring semantic information. For the latter, we propose a weighted pooling strategy exploiting data supervision to weigh each local descriptor according to its likelihood of belonging to a given object class. The two representations are then combined adaptively with Multiple Kernel Learning. Experiments on common benchmarks (Caltech-256 and PASCAL VOC-2007) show that our image representation improves the current visual recognition pipeline and is competitive with similar state-of-the-art pooling methods. We also evaluate our method in a real Human-Robot Interaction setting, where the pure Spatial Pyramid Representation does not provide sufficient discriminative power, and obtain a remarkable improvement.
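The pooling step itself reduces to a weighted average; a hedged sketch follows, where the weights are assumed given (the paper learns them from supervision and fuses the result with a spatial pyramid via Multiple Kernel Learning, which is not reproduced here).

```python
# Weighted average pooling of encoded local descriptors.
import numpy as np

def weighted_pool(codes, weights):
    """codes: (n, d) encoded descriptors; weights: (n,) class likelihoods."""
    w = weights / max(weights.sum(), 1e-12)
    return (w[:, None] * codes).sum(axis=0)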
Similar papers:
  • Learning Important Spatial Pooling Regions for Scene Classification [pdf] - Di Lin, Cewu Lu, Renjie Liao, Jiaya Jia
  • Bags of Spacetime Energies for Dynamic Scene Recognition [pdf] - Christoph Feichtenhofer, Axel Pinz, Richard Wildes
  • Learning Receptive Fields for Pooling from Tensors of Feature Response [pdf] - Can Xu, Nuno Vasconcelos
  • Generalized Max Pooling [pdf] - Naila Murray, Florent Perronnin
#987 - Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras [pdf]
Can Chen, Haiting Lin, Zhan Yu, Sing Bing Kang, Jingyi Yu

Abstract: In this paper, we introduce a bilateral consistency metric on the surface camera (SCam) for light field stereo matching to handle significant occlusion. The concept of SCam is used to model angular radiance distribution with respect to a 3D point. Our bilateral consistency metric is used to indicate the probability of occlusions by analyzing the SCams. We further show how to distinguish between on-surface and free space, textured and non-textured, and Lambertian and specular through bilateral SCam analysis. To speed up the matching process, we apply the edge-preserving guided filter on the consistency-disparity curves. Experimental results show that our technique outperforms both the state-of-the-art and the recent light field stereo matching methods, especially near occlusion boundaries.
Similar papers:
  • Graph Cut based Continuous Stereo Matching using Locally Shared Labels [pdf] - Tatsunori Taniai, Yasuyuki Matsushita, Takeshi Naemura
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Stereo under Sequential Optimal Sampling: A Statistical Analysis Framework for Search Space Reduction [pdf] - Yilin Wang, Jan-Michael Frahm, Enrique Dunn, Ke Wang
  • Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching [pdf] - Aristotle Spyropoulos, Nikos Komodakis, Philippos Mordohai
#1001 - Video Classification Based on Generalized Maximum Co-occurrence Cliques [pdf]
Amir Roshan Zamir, Shayan Modiri Assari

Abstract: We address the problem of classifying complex videos based on their content. A typical approach to this problem is performing the classification using semantic attributes, commonly termed concepts, which occur in the video. In this paper, we propose a contextual approach to video classification based on the Generalized Maximum Clique Problem (GMCP) which leverages the co-occurrence of concepts as the context model. Specifically, we propose to represent a class based on the co-occurrence of its concepts and classify a video by matching its semantic co-occurrence pattern to each class representation. We perform the matching using GMCP, which finds the strongest clique of co-occurring semantic concepts in a video. We argue that, in principle, the co-occurrence of concepts yields a richer representation of a video compared to most current approaches. Additionally, we propose a novel optimal solution to GMCP based on Mixed Binary Integer Programming (MBIP). The evaluations show that our approach outperforms several well-established video categorization methods and opens new opportunities for further research in this direction.
Similar papers:
  • A Hierarchical Context Model for Event Recognition in Surveillance Video [pdf] - Xiaoyang Wang, Qiang Ji
  • Event Detection using Multi-Level Relevance Labels and Multiple Features [pdf] - Zhongwen Xu, Ivor W. Tsang, Yi Yang, Zhigang Ma, Alexander Hauptmann
  • Zero-shot Event Detection using Multi-modal Fusion of Weakly Supervised Concepts [pdf] - Shuang Wu, Florian Luisier, Sravanthi Bondugula, Pradeep Natarajan
  • From Human-Annotated to Machine-Discovered Concepts using Consensus Regularization [pdf] - Afshin Dehghan, Haroon Idrees
#1009 - Congruency-Based Reranking [pdf]
Itai Ben Shalom, Adiel Ben Shalom, Noga Levy, Lior Wolf, Tamir Hazan, Nachum Dershowitz, Yaniv Bar, Roni Shweka, Yaacov Choueka

Abstract: We present a tool for re-ranking the results of a specific query by considering the $(n+1) \times (n+1)$ matrix of pairwise similarities among the elements of the set of $n$ retrieved results and the query itself. The re-ranking thus makes use of the similarities between the various results and does not employ additional sources of information. The tool is based on employing graphical Bayesian models, which reinforce retrieved items strongly linked to other retrievals, and on repeated clustering in order to measure the stability of the obtained associations. The utility of the tool is demonstrated within the context of visual search of documents from the Cairo Genizah and for retrieval of paintings by the same artist and of the same style.
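A deliberately simplified reranking rule over the same input as the paper, the (n+1) x (n+1) similarity matrix of query plus results, is sketched below: items strongly linked to other retrievals get promoted. The Bayesian graphical model and repeated-clustering stability test of the paper are not reproduced, and the 0.5 mixing weight is an arbitrary assumption.

```python
# Promote results that are strongly linked to the other retrievals.
import numpy as np

def rerank(S):
    """S: pairwise similarities; row/column 0 is the query. Returns an order."""
    support = S[1:, 1:].sum(axis=1) - S.diagonal()[1:]   # links to other results
    score = S[0, 1:] + 0.5 * support / max(len(S) - 2, 1)
    return np.argsort(-score)                            # indices into results
```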
Similar papers:
  • Immediate, scalable object category detection [pdf] - Yusuf Aytar, Andrew Zisserman
  • Locally Linear Hashing for Extracting Non-Linear Manifolds [pdf] - Go Irie, Zhenguo Li, Xiao-Ming Wu, Shi-Fu Chang
  • Learning Fine-grained Image Similarity with Deep Ranking [pdf] - Jiang Wang, Yang Song, Thomas Leung, Charles Rosenberg, James Philbin, Bo Chen, Ying Wu
  • Geometric Urban Geo-Localization [pdf] - Mayank Bansal, Kostas Daniilidis
#1016 - Predicting User Annoyance Using Image Attributes [pdf]
Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh

Abstract: Computer Vision algorithms make mistakes. In human-centric applications, some mistakes are more annoying to users than others. In order to design algorithms that minimize the annoyance to users, we need access to an annoyance or cost matrix that holds the annoyance of each type of mistake. Such matrices are not readily available, especially for a wide gamut of human-centric applications where annoyance is tied closely to human perception. To avoid having to conduct extensive user studies to gather the annoyance matrix for all possible mistakes, we propose predicting the annoyance of previously unseen mistakes by learning from example mistakes and their corresponding annoyance. We promote the use of attribute-based representations to transfer this knowledge of annoyance. Our experimental results with faces and scenes demonstrate that our approach can predict annoyance more accurately than baselines. We show that as a result, our approach makes less annoying mistakes in a real-world image retrieval application.
Similar papers:
  • Inferring Analogous Attributes [pdf] - Chao-Yeh Chen, Kristen Grauman
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
  • Relative Parts: Disctinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
#1019 - Talking Heads: Detecting Humans and Recognizing Their Interactions [pdf]
Minh Hoai, Andrew Zisserman

Abstract: The objective of this work is to accurately and efficiently detect configurations of one or more people in edited TV material. Such configurations often appear in standard arrangements due to cinematic style, and we take advantage of this to provide scene context. We make the following contributions: first, we introduce a new learnable context aware configuration model for detecting sets of people in TV material. The model predicts the scale and location of each upper body in the configuration, has efficient and globally optimal inference, and is trained using a maximum margin framework. Second, we show that the configuration model outperforms a Deformable Part Model (DPM) for predicting upper body locations in video frames. Experiments are performed over two datasets: the TV Human Interaction dataset and 150 episodes from four different TV shows. We also demonstrate the benefits of the model in recognizing interactions in TV shows.
Similar papers:
  • Human Pose Estimation: New Benchmark and State of the Art Analysis [pdf] - Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
  • Active Frame, Location, and Detector Selection for Automated and Manual Video Annotation [pdf] - Vasiliy Karasev, Avinash Ravichandran, Stefano Soatto
  • Efficient Boosted Exemplar-based Face Detection [pdf] - Haoxiang Li, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Gang Hua
  • Object Classification with Adaptive Regions [pdf] - Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool
#1023 - 3D Modeling from Wide Baseline Range Scans using Contour Coherence [pdf]
Ruizhe Wang, Jongmoo Choi, Gerard Medioni

Abstract: Registering 2 or more range scans is a fundamental problem, with application to 3D modeling. While this problem is well addressed by existing techniques such as ICP when the views overlap significantly, no satisfactory solution exists for wide baseline registration. We propose here a novel approach which leverages contour coherence and allows us to align two wide baseline range scans with limited overlap. We maximize the contour coherence by iteratively building robust corresponding pairs on apparent contours and minimizing their distances. We use the contour coherence under a multi-view rigid registration framework, and this enables the reconstruction of accurate and complete 3D models from as few as 4 frames. We further extend it to handle articulations. After modeling with a few frames, in case higher accuracy is required, more frames can be easily added in a drift-free manner by a conventional registration method. Experimental results on both synthetic and real data demonstrate the effectiveness and robustness of our contour coherence based registration approach to wide baseline range scans, and to 3D modeling.
Similar papers:
  • Real-time Model-based Articulated Object Pose Detection and Tracking with Variable Rigidity Constraints [pdf] - Karl Pauwels, Leonardo Rubio, Eduardo Ros
  • Non-rigid Segmentation using Sparse Low Dimensional Manifolds and Deep Belief Networks [pdf] - Jacinto Nascimento, Gustavo Carneiro
  • Evaluation of Scan-Line Optimization for 3D Medical Image Registration [pdf] - Simon Hermann
  • Quality Dynamic Human Body Modeling Using a Single Low-cost Depth Camera [pdf] - Qing Zhang, Bo Fu
#1024 - Fisher and VLAD with FLAIR [pdf]
Koen Van de Sande, Cees Snoek, Arnold Smeulders

Abstract: A major computational bottleneck in many current algorithms is the evaluation of arbitrary boxes. Dense local analysis and powerful bag-of-words encodings, such as Fisher vectors and VLAD, lead to improved accuracy at the expense of increased computation time. Where a simplification in the representation is tempting, we instead exploit novel representations while maintaining accuracy. We start from state-of-the-art, fast selective search, but our method applies to any initial box-partitioning. By representing the picture as sparse integral images, one per codeword, we achieve a Fast Local Area Independent Representation. FLAIR allows for very fast evaluation of any box encoding and still enables spatial pooling. In FLAIR we achieve VLAD's exact difference coding, even with L2 and power-norms. Finally, by multiple codeword assignments, we achieve exact and approximate Fisher vectors with FLAIR. The result is an 18x speedup, which enables us to set a new state of the art on the challenging 2010 PASCAL VOC objects task and the fine-grained categorization of the 2011 CU-Bird species.
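The per-codeword integral-image idea is simple to demonstrate; below is a minimal dense (not sparse) version, assuming hard assignments, in which the bag-of-words histogram of any box can be read off in constant time per codeword. FLAIR's sparse storage and VLAD/Fisher encodings are not reproduced.

```python
# One integral image per codeword; box histograms in O(n_words) per box.
import numpy as np

def codeword_integrals(assign, n_words):
    """assign: (H, W) hard codeword index per location."""
    H, W = assign.shape
    ints = np.zeros((n_words, H + 1, W + 1))
    for w in range(n_words):
        ints[w, 1:, 1:] = np.cumsum(np.cumsum(assign == w, axis=0), axis=1)
    return ints

def box_histogram(ints, y0, x0, y1, x1):
    """Codeword histogram of the half-open box [y0:y1, x0:x1)."""
    return (ints[:, y1, x1] - ints[:, y0, x1]
            - ints[:, y1, x0] + ints[:, y0, x0])
```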
Similar papers:
  • Efficient Localization with Fisher Vectors using Approximate Normalizations [pdf] - Dan Oneata, Jakob Verbeek, Cordelia Schmid
  • Deep Fisher Kernels [pdf] - Mayu Sakurada, Vladyslav Sydorov, Christoph Lampert
  • Towards Good Practices for Action Video Encoding [pdf] - Jianxin Wu, Yu Zhang
  • Locality in Generic Instance Search from One Example [pdf] - Ran Tao, Efstratios Gavves, Cees Snoek, Arnold Smeulders
#1029 - Stable Learning in Coding Space for Multi-Class Decoding and Its Extension for Multi-Class Hypothesis Transfer Learning [pdf]
Bang Zhang, Yi Wang, Yang Wang, Fang Chen

Abstract: Many prevalent multi-class classification approaches can be unified and generalized by the output coding framework, which usually consists of three phases: (1) coding, (2) learning binary classifiers, and (3) decoding. Most of these approaches focus on the first two phases, and a predefined distance function is used for decoding. In this paper, however, we propose to perform learning in the coding space for more adaptive decoding, thereby improving overall performance. The ramp loss is exploited for measuring multi-class decoding error. The proposed algorithm has uniform stability, is insensitive to data noise, and scales to large datasets. A generalization error bound and numerical results are given, with promising outcomes. The outcome of the coding space learning in turn helps to improve the binary classifiers, which is useful for resolving some difficult machine learning problems. To show this, the proposed method is extended to hypothesis transfer learning (HTL), a transfer learning framework that exploits only source domain hypotheses. Our method efficiently transfers knowledge from multiple source domains to multiple target domains by alternating coding space learning and target domain classifier learning. Empirical results are encouraging.
Similar papers:
  • Learning Inhomogeneous FRAME Models for Object Patterns [pdf] - Jianwen Xie, Wenze Hu, Song Chun Zhu, Ying Nian Wu
  • Recognizing RGB Images by Learning from RGB-D Data [pdf] - Lin Chen, Wen Li, Dong Xu
  • Fast and Robust Archetypal Analysis for Representation Learning [pdf] - Yuansi Chen, Julien Mairal, Zaid Harchaoui
  • Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf] - Novi Patricia, Barbara Caputo
#1041 - PatchMatch Based Joint View Selection and Depthmap Estimation [pdf]
Enliang Zheng, Vladimir Jojic, Enrique Dunn, Jan-Michael Frahm

Abstract: We propose a multi-view depthmap estimation approach aimed at adaptively ascertaining the pixel level data associations between a reference image and all the elements of a source image set. Namely, we address the question: what aggregation subset of the source image set should we use to estimate the depth of a particular pixel in the reference image? We pose the problem within a probabilistic framework that jointly models pixel-level view selection and depthmap estimation given the local pairwise image photoconsistency. The corresponding graphical model is solved by combining variational inference with PatchMatch-like depth sampling and propagation. Experimental results on standard multi-view benchmarks convey the state-of-the-art estimation accuracy afforded by mitigating spurious pixel-level data associations. Conversely, experiments on large internet crowdsourced data demonstrate the robustness of our approach against unstructured and heterogeneous image capture characteristics. Moreover, the linear computational and storage requirements of our formulation, as well as its inherent parallelism, enable an efficient and scalable GPU-based implementation.
Similar papers:
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
  • Stereo under Sequential Optimal Sampling: A Statistical Analysis Framework for Search Space Reduction [pdf] - Yilin Wang, Jan-Michael Frahm, Enrique Dunn, Ke Wang
  • Graph Cut based Continuous Stereo Matching using Locally Shared Labels [pdf] - Tatsunori Taniai, Yasuyuki Matsushita, Takeshi Naemura
#1045 - Asymmetric sparse kernel approximations for large-scale visual search [pdf]
Damek Davis, Stefano Soatto, Jonathan Balzer

Abstract: We introduce an asymmetric sparse approximate embedding optimized for fast kernel comparison operations arising in large-scale visual search. In contrast to other methods, which perform an explicit approximate embedding using kernel PCA followed by a distance compression technique in $\mathbb{R}^d$, losing information at both steps, our method utilizes the implicit kernel representation directly. In addition, we empirically demonstrate that our method needs no explicit training step and can operate with a dictionary of random exemplars from the dataset. We evaluate our method on three benchmark image retrieval datasets: SIFT1M, ImageNet, and 80M-TinyImages.
Similar papers:
  • Compact Representation for Image Classification: To Choose or to Compress? [pdf] - Yu Zhang, Jianxin Wu, Jianfei Cai
  • Distance Encoded Product Quantization [pdf] - Jae-Pil Heo, Zhe Lin, Sung-eui Yoon
  • Adaptive Object Retrieval with Kernel Reconstructive Hashing [pdf] - Haichuan Yang, Xiao Bai, Jun Zhou, Peng Ren, Jian Cheng, Zhihong Zhang
  • Additive Quantization for Extreme Vector Compression [pdf] - Artem Babenko, Victor Lempitsky
#1052 - Multiple Granularity Analysis for Fine-grained Action Detection [pdf]
Bingbing Ni, Pierre Moulin

Abstract: We propose to decompose the fine-grained human activity analysis problem into two sequential tasks of increasing granularity. First, we infer the rough interaction status (i.e., which object is being manipulated). Knowing that the major challenge is frequent mutual occlusion during manipulation, we propose an interaction tracking framework in which hand/object position and interaction status are jointly tracked by explicitly modeling the contextual information between occlusion and interaction status. Second, the inferred hand/object position and rough interaction status are utilized to form a more compact and discriminative action representation and detection strategy, which effectively prunes a large number of motion features from irrelevant spatio-temporal positions. We perform comprehensive experiments on two challenging fine-grained activity datasets (i.e., cooking actions) and the results show that the proposed framework achieves high accuracy and robustness in tracking multiple mutually occluded hands/objects during manipulation, as well as a significant improvement in fine-grained action recognition accuracy over state-of-the-art methods.
Similar papers:
  • Evolutionary Quasi-random Search for Hand Articulations Tracking [pdf] - Iason Oikonomidis, Manolis Lourakis, Antonis Argyros
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
#1063 - Active Annotation Translation [pdf]
Steven Branson, Pietro Perona

Abstract: We introduce a general framework for quickly augmenting a dataset containing pre-existing source annotations of one type (e.g., segmentations) with a new type of target annotation (e.g., part annotations). As annotators label new target annotations, we incrementally learn a translator from source to target labels as well as a computer-vision-based structured predictor. These two components are combined to form an improved prediction system that is used to accelerate collection of target annotations via active learning and interactive labeling techniques. We show how the method can be applied to a wide variety of computer vision learning problems and annotation schemes, including bounding boxes, segmentations, 2D and 3D part-based systems, and class and attribute labels. The proposed system will be a useful tool toward exploring new types of representations beyond simple bounding boxes, object segmentations, and class labels, toward building interactive methods for evolving definitions of part, attribute, and action vocabularies without relabeling the entire dataset, and toward finding new ways to exploit existing large datasets with traditional types of annotations like SUN~\cite{xiao2010sun}, ImageNet~\cite{imagenet}, and Pascal VOC~\cite{everingham2010pascal}.
Similar papers:
  • Structured Output Random Forests for Accurate Object Detection [pdf] - Samuel Schulter, Christian Leistner, Peter Roth, Horst Bischof
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
  • Understanding Objects in Detail with Fine-grained Attributes [pdf] - Subhransu Maji, Iasonas Kokkinos, Stavros Tsogkas, Ross Girshick, Matthew Blaschko, Esa Rahtu, Juho Kannala, Andrea Vedaldi
  • Topic Modeling of Multimodal Data: an Autoregressive Approach [pdf] - Yin Zheng, Yu-Jin Zhang, Hugo Larochelle
#1064 - Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf]
Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers

Abstract: This paper proposes a novel photometric stereo solution to jointly estimate surface normals and scattering parameters from a flat homogeneous translucent object. Similar to classic photometric stereo, our method only requires as few as three observations of the translucent object under directional lighting. Naively applying classic photometric stereo results in blurred photometric normals. We develop a novel blind deconvolution algorithm based on inverse rendering for recovering the sharp surface normals and the material properties. We demonstrate our method on a variety of translucent objects.
Similar papers:
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Kernel-PCA Analysis of Surface Normals [pdf] - Patrick Snape, Stefanos Zafeiriou
  • Backscatter Compensated Photometric Stereo with 3 Sources [pdf] - Chourmouzios Tsiotsios, Maria Angelopoulou, Tae-Kyun Kim, Andrew Davison
  • High Quality Photometric Reconstruction using a Depth Camera [pdf] - Avishek Chatterjee, Sk Mohammadul Haque, Venu Madhav Govindu
#1070 - Three Guidelines of Online Learning for Large-Scale Visual Recognition [pdf]
Yoshitaka Ushiku, Tatsuya Harada

Abstract: Combinations of high-dimensional features and linear classifiers are widely used today for large-scale visual recognition. Numerous so-called mid-level features have been developed and mutually compared on an experimental basis. Although various learning methods for linear classification have also been proposed in the machine learning and natural language processing literature, they have rarely been evaluated for visual recognition. In this paper, we give guidelines via an investigation of state-of-the-art online learning methods for linear classifiers. Many of these methods have been evaluated only on toy data and natural language processing problems such as document classification; we therefore give them a unified interpretation from the viewpoint of visual recognition. Results of controlled comparisons indicate three guidelines that might change the pipeline for visual recognition.
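For concreteness, one representative online learner of the kind such studies compare is the Passive-Aggressive algorithm; its PA-I update is shown below in textbook form (not necessarily among the exact methods the paper tests).

```python
# One PA-I step for a binary linear classifier; y in {-1, +1}.
import numpy as np

def pa_update(w, x, y, C=1.0):
    loss = max(0.0, 1.0 - y * w.dot(x))           # hinge loss on this sample
    tau = min(C, loss / max(x.dot(x), 1e-12))     # clipped step size
    return w + tau * y * x
```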
Similar papers:
  • Randomized Max-Margin Compositions for Visual Recognition [pdf] - Angela Eigenstetter, Bjorn Ommer
  • Stable Learning in Coding Space for Multi-Class Decoding and Its Extension for Multi-Class Hypothesis Transfer Learning [pdf] - Bang Zhang, Yi Wang, Yang Wang, Fang Chen
  • Compact Representation for Image Classification: To Choose or to Compress? [pdf] - Yu Zhang, Jianxin Wu, Jianfei Cai
  • Transformation Pursuit for Image Classification [pdf] - Mattis Paulin, Jerome REVAUD, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid
#1071 - The Secrets of Salient Object Segmentation [pdf]
Yin Li, Xiaodi Hou, Christof Koch, James Rehg, Alan Yuille

Abstract: In this paper we provide an extensive evaluation of fixation prediction and salient object segmentation algorithms, as well as statistics of major datasets. Our analysis identifies a serious design flaw of existing salient object benchmarks, which we call dataset design bias: the over-emphasis of stereotypical concepts of saliency. The dataset design bias not only creates a discomforting disconnection between fixations and salient object segmentation, but also misleads algorithm design. Based on our analysis, we propose a new high quality dataset that offers both fixation and salient object segmentation ground truth. With fixations and salient objects presented simultaneously, we are able to bridge the gap between the two and propose a novel method for salient object segmentation. Our model gives superior performance in segmenting salient objects. We report significant benchmark progress on existing datasets, as well as on our newly proposed salient object segmentation dataset.
Similar papers:
  • Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [pdf] - Eleonora Vig, Michael Dorr, David Cox
  • Learning optimal features for salient object detection [pdf] - Song Lu, Vijay Mahadevan, Nuno Vasconcelos
  • A Reverse Hierarchy Model for Predicting Eye Fixations [pdf] - Tianlin Shi, Xiaolin Hu, Ming Liang
  • Salient Region Detection via High-Dimensional Color Transform [pdf] - Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
#1085 - Higher-Order Clique Reduction Without Auxiliary Variables [pdf]
Hiroshi Ishikawa

Abstract: We introduce a method to reduce most higher-order terms of Markov Random Fields with binary labels into lower-order ones without introducing any new variables, while keeping the minimizer of the energy unchanged. While the method does not reduce all terms, it can be used with existing techniques that transform arbitrary terms (by introducing auxiliary variables), improving their speed. The method eliminates a higher-order term in the polynomial representation of the energy by finding a value assignment to the variables involved that cannot be part of a global minimum, and increasing the potential value only when that particular combination occurs, by the exact amount that makes the potential lower order. We also introduce a heuristic that foregoes the guarantee of exact equivalence of the minimizer in favor of speed. With experiments on the same field-of-experts dataset used in previous work, we show that roof-dual labeling after the reduction labels significantly more variables and the energy converges more rapidly.
Similar papers:
  • A General and Simple Method for Camera Pose and Focal Length Determination [pdf] - Yinqiang Zheng, Shigeki Sugimoto, Imari Sato, Masatoshi Okutomi
  • Fast Approximate Inference in Higher Order MRF-MAP Labeling Problems [pdf] - Chetan Arora, S.N. Maheshwari, Subhashis Banerjee, Prem Kalra
  • Multi Label Generic Cuts: Optimal Inference in Multi Label Multi Clique MRF-MAP Problems [pdf] - Chetan Arora, S.N. Maheshwari
  • Partial Symmetry in Polynomial Systems and Its Application in Computer Vision [pdf] - Yubin Kuang, Yinqiang Zheng, Kalle Åström
#1086 - Discriminative Ferns Ensemble for Hand Pose Recognition [pdf]
Eyal Krupka, Aharon Bar Hillel, Ben Klein, Alon Vinnikov, Daniel Freedman, Simon Stachniak

Abstract: We present the Discriminative Ferns Ensemble (DFE) classifier for efficient visual object recognition. The classifier architecture is designed to optimize both classification speed and accuracy when a large training set is available. Speed is obtained using simple binary features and direct indexing into a set of tables, and accuracy by using a large capacity model and careful discriminative optimization. The proposed framework is applied to the problem of hand pose recognition in depth and infra-red images, using a very large training set. Both the accuracy and the classification time obtained are considerably superior to relevant competing methods, allowing one to reach accuracy targets with run times orders of magnitude faster than the competition. We also show empirically that using DFE, we can significantly reduce classification time by increasing training sample size for a fixed target accuracy.
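The speed argument is easy to see in code: binary pixel comparisons index directly into per-fern score tables. In the toy sketch below the tables are random placeholders; in DFE they are trained discriminatively, which this sketch does not attempt.

```python
# A toy ferns-style classifier: binary comparisons index into score tables.
import numpy as np

class Fern:
    def __init__(self, n_pixels, depth, n_classes, rng):
        self.pairs = rng.integers(0, n_pixels, size=(depth, 2))
        self.table = rng.normal(size=(2 ** depth, n_classes))

    def score(self, img):
        v = img.ravel()
        bits = v[self.pairs[:, 0]] > v[self.pairs[:, 1]]      # binary features
        idx = int(bits.dot(1 << np.arange(len(bits))))        # table index
        return self.table[idx]

rng = np.random.default_rng(0)
ferns = [Fern(32 * 32, depth=8, n_classes=5, rng=rng) for _ in range(10)]
img = rng.random((32, 32))
prediction = int(np.argmax(sum(f.score(img) for f in ferns)))
```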
Similar papers:
  • Evolutionary Quasi-random Search for Hand Articulations Tracking [pdf] - Iason Oikonomidis, Manolis Lourakis, Antonis Argyros
  • Distance Encoded Product Quantization [pdf] - Jae-Pil Heo, Zhe Lin, Sung-eui Yoon
  • One Millisecond Face Alignment with an Ensemble of Regression Trees [pdf] - Vahid Kazemi, Josephine Sullivan
  • Discriminative Feature-to-Point Matching in Image-Based Localization [pdf] - Michael Donoser, Dieter Schmalstieg
#1089 - Large-scale visual font recognition [pdf]
Guang Chen, Jianchao Yang, Hailin Jin, Jonathan Brandt, Eli Shechtman, Aseem Agarwala, Tony Han

Abstract: This paper addresses the large-scale visual font recognition (VFR) problem, which aims at automatic identification of the typeface, weight, and slope of the text in an image or photo without any knowledge of content. Although visual font recognition has many practical applications, it has largely been neglected by the vision community. To address the VFR problem, we construct a large-scale dataset containing 2,420 font classes, which easily exceeds the scale of most image categorization datasets in computer vision. As font recognition is inherently dynamic and open-ended, i.e., new classes and data for existing categories are constantly added to the database over time, we propose a scalable solution based on the nearest class mean classifier (NCM). The core algorithm is built on local feature embedding, local feature metric learning and max-margin template selection, which is naturally amenable to NCM and thus to such open-ended classification problems. The new algorithm can generalize to new classes and new data at little added cost. Extensive experiments demonstrate that our approach is very effective on our synthetic test images, and achieves promising results on real-world test images.
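The open-ended property comes from the NCM core: adding a new font class only requires computing one more class mean. A bare-bones version follows; the paper's learned metric and max-margin template selection are omitted.

```python
# A minimal nearest class mean classifier.
import numpy as np

class NCM:
    def __init__(self):
        self.means, self.labels = [], []

    def add_class(self, label, X):
        """X: (n, d) training features for one class."""
        self.means.append(X.mean(axis=0))
        self.labels.append(label)

    def predict(self, x):
        d = [np.linalg.norm(x - m) for m in self.means]
        return self.labels[int(np.argmin(d))]
```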
Similar papers:
  • Orientation Robust Textline Detection in Natural Images [pdf] - Le Kang, Yi Li
  • Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition [pdf] - Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu
  • Region-based Discriminative Feature Pooling for Scene Text Recognition [pdf] - Chen-Yu Lee, Anurag Bhardwaj, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
  • A Novel Chamfer Template Matching Method Using Variational Mean Field [pdf] - Thanh Nguyen
#1092 - Bregman Divergences for Infinite Dimensional Covariance Matrices [pdf]
Mehrtash Harandi, Mathieu Salzmann, Fatih Porikli

Abstract: We introduce an approach to computing and comparing Covariance Descriptors (CovDs) in infinite-dimensional spaces. CovDs have become increasingly popular to address classification problems in computer vision. While CovDs offer some robustness to measurement variations, they also throw away part of the information contained in the original data by only retaining the second-order statistics over the measurements. Here, we propose to overcome this limitation by first mapping the original data to a high-dimensional Hilbert space, and only then compute the CovDs. We show that several Bregman divergences can be computed between the resulting CovDs in Hilbert space via the use of kernels. We then exploit these divergences for classification purposes. Our experiments demonstrate the benefits of our approach on several tasks, such as material and texture recognition, person re-identification, and action recognition from motion capture data.
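As one concrete instance of the divergences involved, the sketch below computes the Jensen-Bregman LogDet (Stein) divergence between two finite-dimensional covariance descriptors. The paper's contribution is evaluating such divergences between infinite-dimensional CovDs via kernels; this toy version stays in the finite-dimensional setting, and the feature matrices are illustrative.

import numpy as np

def jensen_bregman_logdet(A, B):
    """Jensen-Bregman LogDet (Stein) divergence between SPD matrices:
    logdet((A+B)/2) - (1/2) logdet(AB)."""
    _, ld_mid = np.linalg.slogdet((A + B) / 2.0)
    _, ld_a = np.linalg.slogdet(A)
    _, ld_b = np.linalg.slogdet(B)
    return ld_mid - 0.5 * (ld_a + ld_b)

def covariance_descriptor(F):
    # F: (n_observations, d) feature matrix for one image region;
    # a small ridge keeps the matrix safely positive definite.
    return np.cov(F, rowvar=False) + 1e-6 * np.eye(F.shape[1])

rng = np.random.default_rng(2)
A = covariance_descriptor(rng.standard_normal((100, 5)))
B = covariance_descriptor(rng.standard_normal((100, 5)) * 2.0)
print(jensen_bregman_logdet(A, B))
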
Similar papers:
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
  • Simultaneous Twin Kernel Learning for Structured Prediction [pdf] - Chetan Tonde, Ahmed Elgammal
  • Covariance descriptors for 3D shape matching and retrieval [pdf] - Hedi Tabia, Hamid Laga, David Picard, Philippe-Henri Gosselin
  • Pseudoconvex Proximal Splitting for $L_\infty$ Problems in Multiview Geometry [pdf] - Anders Eriksson
#1096 - Multilabel Ranking with Inconsistent Rankers [pdf]
Xin Geng, Longrun Luo

Abstract: While most existing multilabel ranking methods assume the availability of a single objective label ranking for each instance in the training set, this paper deals with a more common case where subjective, inconsistent rankings from multiple rankers are associated with each instance. The key idea is to learn a latent preference distribution for each instance. The proposed method mainly includes two steps. The first step is to generate a common preference distribution that is most compatible with all the personal rankings. The second step is to learn a mapping from the instances to the preference distributions. The proposed preference distribution learning (PDL) method is applied to the problem of natural scene image annotation. Experimental results show that PDL can effectively incorporate the information given by the inconsistent rankers, and perform remarkably better than the compared state-of-the-art multilabel ranking algorithms.
Similar papers:
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Linear Ranking Analysis [pdf] - Deng Weihong, Jiani Hu, Jun Guo
  • Learning Fine-grained Image Similarity with Deep Ranking [pdf] - Jiang Wang, Yang Song, Thomas Leung, Charles Rosenberg, James Philbin, Bo Chen, Ying Wu
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
#1109 - Deep Learning Hidden Identity Features for Face Verification [pdf]
Yi Sun, Xiaogang Wang, Xiaoou Tang

Abstract: This paper proposes a set of effective high-level features, referred to as hidden identity features (HIFs), for face verification. The HIFs are taken from the last hidden layer neuron activations of deep convolutional networks (ConvNets). When trained as classifiers to recognize thousands of face identities in the training set simultaneously, these deep ConvNets gradually form high-level features in the top layers, which are more relevant to face identities. With this extremely challenging recognition task as supervision, we learn features that consistently correspond to identity and generalize well to new identities at test time. Moreover, we found that classifying a large number of training identities while retaining a small number of last hidden layer neurons is key to learning compact and discriminative features. The proposed features are extracted from various face regions to form complementary and over-complete representations. Any state-of-the-art classifier can be learned on these high-level representations for face verification. The performance of our model is among the best of all published methods on LFW, despite using only weakly aligned faces.
Similar papers:
  • Max-Margin Boltzmann Machines for Object Segmentation [pdf] - Jimei Yang, Simon Safar, Ming-Hsuan Yang
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Discriminative Deep Metric Learning for Face Verification in the Wild [pdf] - Junlin Hu, Jiwen Lu, Yap-Peng Tan
#1110 - Latent Dictionary Learning for Sparse Representation based Classification [pdf]
Meng Yang, Luc Van Gool

Abstract: Dictionary learning (DL) for sparse coding has shown promising results in classification tasks, but how to adaptively build a relationship between dictionary atoms and class labels remains an important open question. Existing dictionary learning approaches simply fix each dictionary atom beforehand as either class-specific or shared by all classes, and ignore updating this relationship during DL. To address this issue, in this paper we propose a novel latent dictionary learning (LDL) method to learn a discriminative dictionary and adaptively build its relationship to class labels. Each dictionary atom is jointly learned with a latent vector, which associates this atom with the representation of different classes. More specifically, we introduce a latent representation model, in which discrimination of the learned dictionary is exploited via minimizing the within-class scatter of coding coefficients and the latent-value weighted dictionary coherence. The optimal solution is efficiently obtained by the proposed solving algorithm. Correspondingly, a latent sparse representation based classifier is also presented. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse representation and dictionary learning approaches for action, gender and face recognition.
Similar papers:
  • Semi-Supervised Coupled Dictionary Learning for Person Re-identification [pdf] - Xiao Liu, Mingli Song, Dacheng Tao, Xingchen Zhou, Chun Chen, Jiajun Bu
  • Towards Multi-view and Partially-occluded Face Alignment [pdf] - Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
  • Quasi Real-Time Summarization for Consumer Videos [pdf] - Bin Zhao, Eric Xing
#1111 - CID: Combined Image Denoising in Spatial and Frequency Domains Using Web Images [pdf]
Huanjing Yue, Xiaoyan Sun, Jingyu Yang, Feng Wu

Abstract: In this paper, we propose a novel two-step scheme to filter heavy noise from images with the assistance of retrieved Web images. There are two key technical contributions in our scheme. First, for every noisy image block, we build two three-dimensional (3D) data cubes by using similar blocks in retrieved Web images and similar non-local blocks within the noisy image, respectively. To better use their correlations, we propose different denoising strategies. The denoising in the 3D cube built upon the retrieved images is performed as median filtering in the spatial domain, whereas the denoising in the other 3D cube is performed in the frequency domain. These two denoising results are then combined in the frequency domain to produce a denoised image. Second, to handle heavy noise, we further propose using the denoised image to improve image registration of the retrieved Web images, 3D cube building, and the estimation of filtering parameters in the frequency domain. Afterwards, the proposed denoising is performed on the noisy image again to generate the final denoised result. Our experimental results show that when the noise is high, the proposed scheme outperforms BM3D by more than 2 dB in PSNR, with a clearly visible improvement in visual quality.
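A heavily simplified sketch of the two-branch idea, assuming the similar patches have already been retrieved and registered: the external stack is filtered by a pixelwise median in the spatial domain, the internal stack by hard DCT thresholding in the frequency domain, and the two results are averaged as a stand-in for the paper's combination step. The registration, cube construction, and parameter estimation stages are all omitted, and the threshold is illustrative.

import numpy as np
from scipy.fft import dctn, idctn

def denoise_patch(noisy, external_stack, internal_stack, thresh=0.5):
    # Spatial branch: pixelwise median across retrieved (external) patches.
    spatial = np.median(np.stack([noisy] + list(external_stack)), axis=0)
    # Frequency branch: average DCT coefficients of non-local (internal)
    # patches, then hard-threshold small coefficients.
    coeffs = np.stack([dctn(p, norm='ortho') for p in internal_stack]).mean(axis=0)
    coeffs[np.abs(coeffs) < thresh] = 0.0
    frequency = idctn(coeffs, norm='ortho')
    return 0.5 * (spatial + frequency)   # toy combination step

rng = np.random.default_rng(3)
clean = np.outer(np.hanning(8), np.hanning(8))
noisy = clean + 0.3 * rng.standard_normal((8, 8))
ext = [clean + 0.3 * rng.standard_normal((8, 8)) for _ in range(5)]
internal = [noisy] + [clean + 0.3 * rng.standard_normal((8, 8)) for _ in range(4)]
print(np.abs(denoise_patch(noisy, ext, internal) - clean).mean())
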
Similar papers:
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising [pdf] - Yi Peng, Deyu Meng, Zongben Xu, Biao Zhang, Chenqiang Gao, Yang Yi
  • Super-Resolving Noisy Images [pdf] - Abhishek Singh, Fatih Porikli, Narendra Ahuja
  • Weighted Nuclear Norm Minimization with Application to Image Denoising [pdf] - Shuhang Gu, Lei Zhang, Xiangchu Feng, Wangmeng Zuo
#1112 - A New Perspective on Material Classification and Ink Identification [pdf]
Rakesh Shiradkar, Li Shen, George Landon, Sim Heng Ong, Ping Tan

Abstract: The surface bi-directional reflectance distribution function (BRDF) can be used to distinguish different materials. The BRDFs of many real materials are near isotropic and can be approximated well by a 2D function. However, when the camera principal axis is coincident with the surface normal of the material sample, the captured BRDF slice is nearly 1D, which suffers from significant information loss. Thus, dramatic improvement in classification performance can be achieved by simply setting the camera at a slanted view to capture a larger portion of the BRDF domain. We further use a handheld flashlight camera to capture a 1D BRDF slice for material classification. This 1D slice captures important reflectance properties such as specular reflection and retro-reflectance. We apply these results to ink classification, which can be used in forensics and in analyzing historical manuscripts. For the first time, we show that most inks on the market can be well distinguished by their reflectance properties. Our system achieves $85\%$ overall classification accuracy over $55$ different inks with a 2D BRDF slice, and $71\%$ accuracy with a 1D BRDF slice.
Similar papers:
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - Di Xu, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
  • The Photometry of Intrinsic Images [pdf] - Marc Serra, Robert Benavente, Maria Vanrell, Dimitris Samaras, Olivier Penacchio
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
#1120 - Fast Rotation Search with Stereographic Projections for 3D Registration [pdf]
Alvaro Parra Bustos, Tat-Jun Chin, David Suter

Abstract: Recently there has been a surge of interest to use branch-and-bound (bnb) optimisation for 3D point cloud registration. While bnb guarantees globally optimal solutions, it is usually too slow to be practical. A fundamental source of difficulty is the search for the rotation parameters in the 3D rigid transform. In this work, assuming that the translation parameters are known, we focus on constructing a fast rotation search algorithm. With respect to an inherently robust geometric matching criterion, we propose a novel bounding function for bnb that allows rapid evaluation. Underpinning our bounding function is the usage of stereographic projections to precompute and spatially index all possible point matches. This yields a robust and global algorithm that is significantly faster than previous methods. To conduct full 3D registration, the translation can be supplied by 3D feature matching, or by another optimisation framework that provides the translation. On various challenging point clouds, including those taken out of lab settings, our approach demonstrates superior efficiency.
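The geometric primitive underpinning the bounding function is the stereographic projection, which maps points on the unit sphere (minus the projection pole) to a plane where they can be spatially indexed. A minimal sketch of just the projection; the bounding function and the branch-and-bound machinery are omitted.

import numpy as np

def stereographic(points):
    """Project unit vectors (north pole excluded) onto the plane z = 0
    from the north pole (0, 0, 1)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.column_stack([x / (1.0 - z), y / (1.0 - z)])

rng = np.random.default_rng(4)
v = rng.standard_normal((5, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)   # points on the unit sphere
print(stereographic(v))
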
Similar papers:
  • Using k-poselets for detecting people and localizing their keypoints [pdf] - Bharath Hariharan, Georgia Gkioxari, Ross Girshick, Jitendra Malik
  • Is Rotation a Nuisance in Shape Recognition? [pdf] - Qiuhong Ke, Yi Li
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • Robust and Efficient Full-Angle Quaternions for Matching Arrays of 3D Rotations [pdf] - Stephan Liwicki, Stefanos Zafeiriou, Maja Pantic, Björn Stenger, Minh-Tri Pham
#1121 - Filter Pairing Neural Network for Person Re-identification [pdf]
Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang

Abstract: Person re-identification is to match pedestrian images from disjoint camera views detected by pedestrian detectors. Challenges are presented in the form of complex variations of lighting, poses, viewpoints, blurring effects, image resolutions, camera settings, occlusions and background clutter across camera views. In addition, misalignment introduced by the pedestrian detector will affect most existing person re-identification methods that use manually cropped pedestrian images and assume perfect detection. In this paper, we propose a novel filter pairing neural network (FPNN) to jointly handle misalignment, photometric and geometric transforms, occlusions and background clutter. All the key components are jointly optimized to maximize the strength of each component when cooperating with others. In contrast to existing works that use handcrafted features, our method automatically learns features optimal for the re-identification task from data. The learned filter pairs encode photometric transforms. Its deep architecture makes it possible to model a mixture of complex photometric and geometric transforms. We build the world's largest benchmark dataset with 13,164 images of 1,360 pedestrians and will release it to the public. Unlike existing datasets, which only provide manually cropped pedestrian images, our dataset provides automatically detected bounding boxes for evaluation close to practical applications. Our neural network significantly outperforms state-of-the-art methods.
Similar papers:
  • Semi-Supervised Coupled Dictionary Learning for Person Re-identification [pdf] - Xiao Liu, Mingli Song, Dacheng Tao, Xingchen Zhou, Chun Chen, Jiajun Bu
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Learning Mid-level Filters for Person Re-identification [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
#1128 - Sequential Convex Relaxation for Mutual-Information-Based Unsupervised Figure-Ground Segmentation [pdf]
Youngwook Kee, Mohamed Souiai, Daniel Cremers, Junmo Kim

Abstract: We propose an optimization algorithm for mutual-information-based unsupervised figure-ground separation. The algorithm jointly estimates the color distributions of the foreground and background, and separates these based on their mutual information with geometric regularity. To this end, we revisit the mutual information and reformulate it in terms of the photometric variable and the indicator function; and propose a sequential convex optimization strategy for solving the non-convex optimization problem that arises. We minimize a sequence of convex sub-problems for the mutual-information-based non-convex energy functional and we efficiently attain high quality solutions for challenging figure-ground segmentation problems. We demonstrate the capacity of our approach in numerous experiments that show convincing fully unsupervised figure-ground separation, in terms of both segmentation quality and robustness to initialization.
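For intuition, the sketch below estimates the mutual information between a gray-level image and a binary figure-ground indicator from their joint histogram; this is the (unregularized) quantity the method maximizes, and a mask aligned with the image statistics scores much higher than a random one. The bin count and toy image are illustrative.

import numpy as np

def mutual_information(image, mask, bins=32):
    """Empirical MI between a photometric variable (gray level) and a
    binary figure-ground indicator, from their joint histogram."""
    joint, _, _ = np.histogram2d(image.ravel(), mask.ravel().astype(float),
                                 bins=[bins, 2])
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(5)
img = np.where(rng.random((64, 64)) < 0.5, 0.2, 0.8) \
      + 0.05 * rng.standard_normal((64, 64))
print(mutual_information(img, img > 0.5))                  # high MI
print(mutual_information(img, rng.random((64, 64)) > 0.5)) # near zero
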
Similar papers:
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • FAST LABEL: Easy and Efficient Optimization of Joint Multi-Label and Estimation Problems [pdf] - Byung-Woo Hong, Ganesh Sundaramoorthi
  • Joint Unsupervised Multi-Class Image Segmentation [pdf] - Fan Wang, Qixing Huang, Maks Ovsjanikov, Leonidas J. Guibas
  • A Convex Relaxation of Ambrosio-Tortorelli's Elliptic Functional for the Mumford-Shah Functional [pdf] - Youngwook Kee, Junmo Kim
#1129 - A Convex Relaxation of Ambrosio-Tortorelli's Elliptic Functional for the Mumford-Shah Functional [pdf]
Youngwook Kee, Junmo Kim

Abstract: In this paper we revisit Ambrosio-Tortorelli's nonconvex elliptic functional for approximating the Mumford-Shah functional. We then propose a convex relaxation for it in order to compute both globally optimal and visually better solutions, rather than solving the nonconvex functional directly---this relaxation is the main contribution of this paper. Inspired by McCormick's seminal work on factorable nonconvex problems, we split a nonconvex product term that arises in the Ambrosio-Tortorelli functional in such a way that a typical alternating gradient method guarantees a globally optimal solution, without taking the coupling effects completely away. Furthermore, we not only provide a fruitful analysis of the proposed relaxation, but also demonstrate its capacity in numerous experiments that show convincing results compared to a naive extension of the McCormick relaxation and its quadratic variant. Indeed, we believe the proposed relaxation opens up the possibility of convexifying a new class of functions in the context of energy minimization for computer vision.
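For readers unfamiliar with McCormick envelopes: a bilinear term $w = uv$ with $u \in [u_L, u_U]$ and $v \in [v_L, v_U]$ is sandwiched between two pairs of linear functions, and replacing the product by these bounds is the classical relaxation the paper builds on. A minimal numeric check:

import numpy as np

def mccormick_bounds(u, v, uL, uU, vL, vU):
    """McCormick under- and over-estimators of w = u*v on the box
    [uL, uU] x [vL, vU]."""
    under = np.maximum(uL * v + vL * u - uL * vL,
                       uU * v + vU * u - uU * vU)
    over = np.minimum(uU * v + vL * u - uU * vL,
                      uL * v + vU * u - uL * vU)
    return under, over

u, v = 0.3, 0.7
lo, hi = mccormick_bounds(u, v, 0.0, 1.0, 0.0, 1.0)
print(lo <= u * v <= hi)  # True: the envelope sandwiches the product
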
Similar papers:
  • Decorrelated Vectorial Total Variation [pdf] - Shunsuke Ono, Isao Yamada
  • Generalized Nonconvex Nonsmooth Low-Rank Minimization [pdf] - Canyi Lu, Shuicheng Yan, Zhouchen Lin
  • Joint Unsupervised Multi-Class Image Segmentation [pdf] - Fan Wang, Qixing Huang, Maks Ovsjanikov, Leonidas J. Guibas
  • Sequential Convex Relaxation for Mutual-Information-Based Unsupervised Figure-Ground Segmentation [pdf] - Youngwook Kee, Mohamed Souiai, Daniel Cremers, Junmo Kim
#1132 - Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf]
Di Xu, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham

Abstract: Reconstructing the shape of a 3D object from multi-view images under unknown, general illumination is a fundamental problem in computer vision, and high-quality reconstruction is usually challenging, especially when a high level of detail is needed. This paper presents a total variation (TV) based approach for recovering surface details using shading and multi-view stereo (MVS). Behind the approach are two important, previously overlooked observations: (1) the illumination over the surface of an object tends to be piecewise smooth, and (2) the recovery of surface orientation alone is not sufficient for reconstructing geometry. Thus we introduce TV to regularize the lighting and use the visual hull to constrain partial vertices. The reconstruction is formulated as a constrained TV-minimization problem that treats the shape and lighting as unknowns simultaneously. An augmented Lagrangian method is proposed to quickly solve the TV-minimization problem. As a result, our approach is robust, stable, and able to efficiently recover high-quality surface details even when starting from a coarse MVS. These advantages are demonstrated by experiments with synthetic and real-world examples.
Similar papers:
  • User-Specific Hand Modeling from Monocular Depth Sequences [pdf] - Jonathan Taylor, Richard Stebbing, Varun Ramakrishna, Cem Keskin, Jamie Shotton, Shahram Izadi, Andrew Fitzgibbon, Aaron Hertzmann
  • Better Shading for Better Shape Recovery [pdf] - Moumen El-Melegy
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Exploiting Shading Cues in Kinect IR Images for Geometry Refinement [pdf] - Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon
#1139 - Fully Automated Non-rigid Segmentation with Distance Regularized Level Set Evolution Initialized and Constrained by Deep-structured Inference [pdf]
Tuan Ngo, Gustavo Carneiro

Abstract: We propose a new fully automated non-rigid segmentation approach based on the distance regularized level set method that is initialized and constrained by the results of a structured inference using deep belief networks. This recently proposed level-set formulation achieves reasonably accurate results in several segmentation problems, and has the advantage of eliminating periodic re-initializations during the optimization process, thereby avoiding numerical errors. Nevertheless, when applied to challenging problems, such as left ventricle segmentation from short-axis cine magnetic resonance (MR) images, the accuracy obtained by this distance regularized level set is lower than the state of the art. The main reasons behind this lower accuracy are the dependence on a good initial guess for the level set optimization and on reliable appearance models. We address these two issues with an innovative structured inference using deep belief networks that produces a reliable initial guess and appearance model. The effectiveness of our method is demonstrated on the MICCAI 2009 left ventricle segmentation challenge, where we show that our approach achieves one of the most competitive results (in terms of segmentation accuracy) in the field.
Similar papers:
  • Total-Variation Minimization on Unstructured Volumetric Mesh: Biophysical Applications on Reconstruction of 3D Ischemic Myocardium [pdf] - Jingjia Xu, Azar Rahimi Dehaghani, Fei Gao, Linwei Wang
  • 3D Modeling from Wide Baseline Range Scans using Contour Coherence [pdf] - Ruizhe Wang, Jongmoo Choi, Gerard Medioni
  • Non-rigid Segmentation using Sparse Low Dimensional Manifolds and Deep Belief Networks [pdf] - Jacinto Nascimento, Gustavo Carneiro
  • Fast and Exact: Shape Segmentation Using ADMM and Structured Prediction [pdf] - Haithem Boussaid, Iasonas Kokkinos
#1147 - Non-rigid Segmentation using Sparse Low Dimensional Manifolds and Deep Belief Networks [pdf]
Jacinto Nascimento, Gustavo Carneiro

Abstract: In this paper, we propose a new methodology for segmenting non-rigid visual objects, where the search procedure is conducted directly on a sparse low-dimensional manifold, guided by the classification results computed from a deep belief network. Our main contribution is the fact that we do not rely on the typical sub-division of segmentation tasks into rigid detection and non-rigid delineation. Instead, the non-rigid segmentation is performed directly, where points in the sparse low-dimensional manifold can be mapped to an explicit contour representation in image space. Our proposal shows significantly smaller search and training complexities, given that the dimensionality of the manifold is much smaller than the dimensionality of the aforementioned search spaces for rigid detection and non-rigid delineation, and that we no longer require a two-stage segmentation process. We focus on the problem of left ventricle endocardial segmentation from ultrasound images, and lip segmentation from frontal facial images using the extended Cohn-Kanade (CK+) database. Our experiments show that the use of sparse low-dimensional manifolds reduces the search and training complexities of current segmentation approaches without a significant impact on the segmentation accuracy shown by state-of-the-art approaches.
Similar papers:
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Fully Automated Non-rigid Segmentation with Distance Regularized Level Set Evolution Initialized and Constrained by Deep-structured Inference [pdf] - Tuan Ngo, Gustavo Carneiro
  • 3D Modeling from Wide Baseline Range Scans using Contour Coherence [pdf] - Ruizhe Wang, Jongmoo Choi, Gerard Medioni
  • Human Body Shape Estimation Using a Multi-Resolution Manifold Forest [pdf] - Frank Perbet, Sam Johnson, Minh-Tri Pham, Björn Stenger
#1148 - 3D-aided face recognition robust to expression and pose variations [pdf]
Baptiste Chu, Sami Romdhani, Liming Chen

Abstract: Expression and pose variations are major challenges for reliable face recognition (FR) in 2D. In this paper, we aim to endow state-of-the-art face recognition SDKs with robustness to facial expression variations and pose changes by using an extended 3D Morphable Model (3DMM) which isolates identity variations from those due to facial expressions. Specifically, given a probe with expression, a novel view of the face is generated where the pose is rectified and the expression neutralized. We present two methods of expression neutralization. The first one uses prior knowledge to infer the neutral expression image from an input image. The second method, specifically designed for verification, is based on the transfer of the gallery face expression to the probe. Experiments using the rectified and neutralized views with a standard commercial FR SDK on two 2D face databases, namely Multi-PIE and AR, show significant performance improvement of the commercial SDK in dealing with expression and pose variations and demonstrate the effectiveness of the proposed approach.
Similar papers:
  • Merging SVMs with Linear Discriminant Analysis: A Combined Model [pdf] - Symeon Nikitidis, Stefanos Zafeiriou, Maja Pantic
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
  • Learning Expressionlets on Spatio-Temporal Manifold for Dynamic Facial Expression Recognition [pdf] - Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen
#1151 - Remote Heart Rate Measurement From Face Videos Under Realistic Situations [pdf]
Xiaobai Li, Jie Chen, Guoying Zhao, Matti Pietikäinen

Abstract: Heart rate is an important indicator of people's physiological state. Recently, several papers have reported methods that can measure heart rate remotely from face videos. Those methods work well on stationary subjects under well-controlled conditions, but their performance significantly degrades if the videos are recorded under more challenging conditions, specifically when subjects' motions and illumination variations are involved. We propose a framework which utilizes face tracking and Normalized Least Mean Square adaptive filtering methods to counter their influence. We test our framework on the large, difficult, public MAHNOB-HCI database and demonstrate that our method substantially outperforms all previous methods. We also use our method for long-term heart rate monitoring in a game evaluation scenario and achieve promising results.
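A minimal sketch of the Normalized Least Mean Square stage named above: an adaptive filter predicts the part of the observed trace explained by a reference interference signal (e.g., a tracked head-motion trace) and subtracts it, leaving the pulse-related residual. The signals, filter order, and step size below are illustrative, not the paper's settings.

import numpy as np

def nlms_cancel(reference, observed, order=8, mu=0.5, eps=1e-8):
    w = np.zeros(order)
    cleaned = np.zeros_like(observed)
    for n in range(order, len(observed)):
        x = reference[n - order:n][::-1]   # recent reference samples
        y = w @ x                          # predicted interference
        e = observed[n] - y                # residual = cleaned sample
        w += mu * e * x / (x @ x + eps)    # normalized LMS update
        cleaned[n] = e
    return cleaned

t = np.arange(0, 10, 0.02)
pulse = 0.1 * np.sin(2 * np.pi * 1.2 * t)      # ~72 bpm pulse component
motion = np.sin(2 * np.pi * 0.3 * t)           # illustrative motion trace
cleaned = nlms_cancel(motion, pulse + motion)
print(np.abs(cleaned[200:] - pulse[200:]).mean())  # small once adapted
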
Similar papers:
  • Fully Automated Non-rigid Segmentation with Distance Regularized Level Set Evolution Initialized and Constrained by Deep-structured Inference [pdf] - Tuan Ngo, Gustavo Carneiro
  • Total-Variation Minimization on Unstructured Volumetric Mesh: Biophysical Applications on Reconstruction of 3D Ischemic Myocardium [pdf] - Jingjia Xu, Azar Rahimi Dehaghani, Fei Gao, Linwei Wang
  • Seeing the Arrow of Time [pdf] - Lyndsey Pickup, Zheng Pan, Donglai Wei, Yichang Shih, Andrew Zisserman, Bill Freeman, Bernhard Schoelkopf
  • Automatic Face Reenactment [pdf] - Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, Christian Theobalt
#1157 - Discriminative Feature-to-Point Matching in Image-Based Localization [pdf]
Michael Donoser, Dieter Schmalstieg

Abstract: The prevalent approach to image-based localization is to match interest points detected in the query image to a sparse 3D point cloud representing the known world. The obtained correspondences are then used to recover a precise camera pose. In this field, the state of the art often ignores the availability of a set of 2D descriptors per 3D point, for example by representing each 3D point by only its centroid. In this paper we demonstrate that these sets contain useful information that can be exploited by formulating matching as a discriminative classification problem. Since memory demands and computational complexity are crucial in such a setup, we base our algorithm on the efficient and effective random fern principle. We propose an extension which projects features to fern-specific embedding spaces, which yields improved matching rates at short runtimes. Experiments show that our novel formulation provides improved matching performance in comparison to the standard nearest neighbor approach and that we outperform related methods in our localization scenario.
Similar papers:
  • Incremental Learning of NCM Forests for Large-Scale Image Classification [pdf] - Marko Ristin, Matthieu Guillaumin, Juergen Gall, Luc Van Gool
  • Minimal Scene Descriptions from Structure from Motion Models [pdf] - Song Cao, Noah Snavely
  • Alert: Predicting Failures [pdf] - Peng Zhang, Jiuling Wang, Ali Farhadi, Martial Hebert, Devi Parikh
  • Discriminative Ferns Ensemble for Hand Pose Recognition [pdf] - Eyal Krupka, Aharon Bar Hillel, Ben Klein, Alon Vinnikov, Daniel Freedman, Simon Stachniak
#1162 - Action localization by tubelets from motion [pdf]
Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek

Abstract: This paper considers the problem of action localization, where the objective is to determine when and where certain actions appear. We introduce a sampling strategy to produce 2D+t sequences of bounding boxes, called tubelets. Compared to state-of-the-art techniques, this drastically reduces the number of hypotheses that are likely to include the action of interest. Our method is inspired by a recent technique introduced in the context of image localization. Beyond considering this technique for the first time for videos, we revisit this strategy for 2D+t sequences obtained from super-voxels. Our sampling strategy advantageously exploits a criterion that reflects how action-related motion deviates from background motion. We demonstrate the benefit of our approach by extensive experiments on two public datasets: UCF Sports and MSR-II. Our approach significantly outperforms the state-of-the-art on both datasets, while restricting the search for actions to a fraction of possible bounding box sequences.
Similar papers:
  • Actionness Ranking with Lattice Conditional Ordinal Random Fields [pdf] - Wei Chen, Caiming Xiong, Jason Corso
  • Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition [pdf] - Waqas Sultani, Imran Saleemi
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
#1163 - Noising versus Smoothing for Vertex Identification in Unknown Shapes [pdf]
Konstantinos Raftopoulos, Marin Ferecatu

Abstract: A method is presented for identifying vertices and estimating local shape features, e.g. curvature, on a shape's boundary. The boundary is seen as a real function, and a study of a certain distance function reveals, almost counter-intuitively, that vertices can be defined and localized better in the presence of high frequency Fourier components (hfFc). The proposed method works on both smooth and noisy shapes, with the presence of hfFc improving on the results obtained for the smoothed version. Experiments with noise and a comparison to the Local Area Integral Invariant descriptor (LAII) validate the method.
Similar papers:
  • Dual-Space Decomposition of 2D Complex Shapes [pdf] - Guilin Liu, Zhonghua Xi, Jyh-Ming Lien
  • FAST LABEL: Easy and Efficient Optimization of Joint Multi-Label and Estimation Problems [pdf] - Byung-Woo Hong, Ganesh Sundaramoorthi
  • Is Rotation a Nuisance in Shape Recognition? [pdf] - Qiuhong Ke, Yi Li
  • Efficient Squared Curvature [pdf] - Claudia Nieuwenhuis, Eno Toeppe, Lena Gorelick, Olga Veksler, Yuri Boykov
#1164 - Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf]
Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu

Abstract: We present an approach, MSIL-CRF, that incorporates multiple instance learning (MIL) into conditional random fields (CRFs). It generalizes CRFs to work on training data with uncertain labels by the principle of MIL. In this work, it is applied to reducing the manual effort of annotating training data for semantic segmentation. Specifically, we consider the setting in which the training dataset for semantic segmentation is a mixture of a few object segments and an abundant set of objects' bounding boxes. Our goal is to infer the unknown object segments enclosed by the bounding boxes so that they can serve as training data for semantic segmentation. To this end, we generate multiple segment hypotheses for each bounding box with the assumption that at least one hypothesis is close to the ground truth. By treating a bounding box as a bag with its segment hypotheses as structured instances, MSIL-CRF selects the most likely segment hypotheses by leveraging the knowledge derived from both the labeled and uncertain training data. The experimental results on the Pascal VOC segmentation task demonstrate that MSIL-CRF can provide effective alternatives to manually labeled segments for semantic segmentation.
Similar papers:
  • Scalable Object Detection using Deep Neural Networks [pdf] - Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
  • MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation [pdf] - Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Zhuowen Tu
  • Submodular Object Recognition [pdf] - Fan Zhu, Zhuolin Jiang, Ling Shao
#1180 - Constructing Robust Affinity Graph for Spectral Clustering [pdf]
Xiatian Zhu, Chen Change Loy, Shaogang Gong

Abstract: It is desirable for spectral clustering to have as input robust and meaningful affinity/similarity graphs in order to form clusters with desired structures that can better support human intuition. To construct such affinity graphs is non-trivial due to the ambiguity and uncertainty inherent in the raw data. In contrast to most existing clustering methods that typically employ all available features to construct affinity matrices with the Euclidean distance, which is often not an accurate representation of the underlying data structures, we propose a novel unsupervised approach to generating more robust affinity graphs via identifying and exploiting discriminative features for improving spectral clustering. Specifically, our model is capable of capturing and combining subtle similarity information distributed over discriminative feature subspaces for better revealing the latent data distribution and thereby leading to improved data clustering, especially with heterogeneous data sources. We demonstrate the efficacy of the proposed approach on challenging image and video datasets.
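To show where a more robust affinity graph plugs in, here is a compact normalized spectral clustering pipeline that takes an affinity matrix W as input; the paper's contribution is constructing a better W from discriminative feature subspaces, for which the Gaussian affinity below is merely a stand-in.

import numpy as np

def spectral_clustering(W, k):
    # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                                  # k smallest eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    rng = np.random.default_rng(0)
    centers = U[rng.choice(len(U), k, replace=False)]
    for _ in range(20):                              # tiny k-means on the embedding
        labels = ((U[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.stack([U[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
W = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 0.5)  # Gaussian affinity
np.fill_diagonal(W, 0.0)
print(spectral_clustering(W, 2))
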
Similar papers:
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
  • Multiple Target Tracking Based on Hierarchical Relation Hypergraph [pdf] - Longyin Wen, Wenbo Li, Zhen Lei, Stan Li
  • A Multigraph Representation for Improved Unsupervised/Semi-supervised Learning of Human Actions [pdf] - Simon Jones, Ling Shao
  • SCAMS: Simultaneous Clustering and Model Selection [pdf] - Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
#1184 - Reconstructing Evolving Tree Structures in Time Lapse Sequences [pdf]
Przemysław Głowacki, Miguel Pinheiro, Raphael Sznitman, Engin Türetken, Daniel Lebrecht, Anthony Holtmaat, Jan Kybic, Pascal Fua

Abstract: We propose an approach to reconstructing tree structures that evolve over time in 2D images and 3D image stacks such as neuronal axons or plant branches. Instead of reconstructing structures in each image independently, we do so for all images simultaneously to take advantage of temporal-consistency constraints. We show that this problem can be formulated as a Quadratic Mixed Integer Program and solved efficiently. The outcome of our approach is a framework that provides substantial improvements in reconstructions over traditional single time-instance formulations. Furthermore, an added benefit of our approach is the ability to automatically detect places where significant changes have occurred over time, which is challenging when considering large amounts of data.
Similar papers:
  • Human Body Shape Estimation Using a Multi-Resolution Manifold Forest [pdf] - Frank Perbet, Sam Johnson, Minh-Tri Pham, Björn Stenger
  • Efficient Nonlinear Markov Models for Human Motion [pdf] - Andreas Lehrmann, Peter Gehler, Sebastian Nowozin
  • Efficient Hierarchical Graph-Based Segmentation of RGBD Videos [pdf] - Steven Hickson, Irfan Essa, Henrik Christensen, Stan Birchfield
  • FastSeg: More Efficiency on Multiple Figure-Ground Segmentations [pdf] - ahmad Humayun, Fuxin Li, James Rehg
#1185 - Beyond Pixel Labels: Image Parsing with Object Instances and Occlusion Ordering [pdf]
Joseph Tighe, Marc Niethammer, Svetlana Lazebnik

Abstract: This work proposes a method to interpret a scene by assigning a semantic label at every pixel and inferring the spatial extent of individual object instances together with their occlusion relationships. Starting with an initial pixel labeling and a set of candidate object masks for a given test image, we select a subset of objects that explain the image well and have valid overlap relationships and occlusion ordering. This is done by solving an integer quadratic program, either with a greedy method or a standard solver. Then we alternate between using the object predictions to improve the pixel labels and using the pixel labels to improve the object predictions. The proposed system obtains promising results on two challenging subsets of the LabelMe dataset, the largest of which contains 45,676 images and 232 classes.
Similar papers:
  • Towards Unified Human Parsing and Pose Estimation [pdf] - Jian Dong, Qiang Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan
  • Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf] - Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler
  • Beta Process Multiple Kernel Learning [pdf] - Bingbing Ni, Pierre Moulin
  • An Exemplar-based CRF for Multi-instance Object Segmentation [pdf] - Xuming He, Stephen Gould
#1187 - Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf]
Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer

Abstract: We propose a real-time, outlier-robust and accurate solution to the Perspective-n-Point (PnP) problem. The main advantages of our solution are twofold: first, it integrates the outlier rejection within the pose estimation pipeline with negligible computational overhead; and second, it scales to an arbitrarily large number of correspondences. Given a set of 3D-to-2D matches, we formulate the pose estimation problem as a low-rank homogeneous system whose solution lies in its 1D null space. Outlier correspondences are those rows of the linear system which perturb the null space, and they are progressively detected by projecting them on an iteratively estimated solution of the null space. Since our outlier removal process is based on an algebraic criterion which does not require computing the full pose and reprojecting all 3D points back onto the image plane at each step, we achieve speed gains of more than 100x compared to RANSAC strategies. An extensive experimental evaluation shows that our solution yields accurate pose estimates in situations with up to 50% of outliers, and can process more than 1000 correspondences in less than 5 ms.
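A generic sketch of the rejection loop described above, with a random stand-in for the paper's correspondence system: estimate the 1D null space by SVD, score each row by its algebraic residual against the current null-space estimate, keep the best rows, and iterate. No reprojection or full-pose computation is needed inside the loop; the matrix A, keep fraction, and iteration count are illustrative.

import numpy as np

def null_space_with_rejection(A, iters=10, keep=0.8):
    rows = np.arange(len(A))
    for _ in range(iters):
        _, _, Vt = np.linalg.svd(A[rows], full_matrices=False)
        x = Vt[-1]                         # current null-space estimate
        residuals = np.abs(A @ x)          # algebraic error per row
        order = np.argsort(residuals)
        rows = order[:max(4, int(keep * len(order)))]
    return x, rows

rng = np.random.default_rng(7)
x_true = rng.standard_normal(6); x_true /= np.linalg.norm(x_true)
A = rng.standard_normal((100, 6))
A -= np.outer(A @ x_true, x_true)          # make all rows satisfy A x = 0
A[:20] = rng.standard_normal((20, 6))      # then corrupt 20% as outliers
x_est, inliers = null_space_with_rejection(A)
print(abs(abs(x_est @ x_true) - 1.0) < 1e-6)  # null space recovered
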
Similar papers:
  • T-Linkage: a Continuous Relaxation of J-Linkage for Multi-Model Fitting [pdf] - Luca Magri, Andrea Fusiello
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
#1192 - A Hierarchical Probabilistic Model for Facial Feature Detection [pdf]
Yue Wu, Ziheng Wang, Qiang Ji

Abstract: Facial feature detection from facial images has attracted great attention in the field of computer vision. It is a nontrivial task since the appearance and shape of the face tend to change under different conditions. In this paper, we propose a hierarchical probabilistic model that can infer the true locations of facial features given the image measurements, even in the presence of significant facial expression and pose variation. The hierarchical model implicitly captures the lower-level shape variations of facial components using the mixture model. Furthermore, at the higher level, it also learns the joint relationship among facial components, the facial expression, and the pose information through automatic structure learning and parameter estimation of the probabilistic model. Experimental results on benchmark databases demonstrate the effectiveness of the proposed hierarchical probabilistic model.
Similar papers:
  • Learning Expressionlets on Spatio-Temporal Manifold for Dynamic Facial Expression Recognition [pdf] - Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • Using a deformation field model for localizing faces and facial points under weak supervision [pdf] - Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
#1194 - Active Sampling for Subjective Image Quality Assessment [pdf]
Peng Ye, David Doermann

Abstract: Subjective Image Quality Assessment (IQA) is the most reliable way to evaluate the visual quality of digital images perceived by the end user. It is often used to construct image quality datasets and provide the groundtruth for building and evaluating objective quality measures. Subjective tests based on the Mean Opinion Score (MOS) have been widely used in previous studies, but have many known problems such as an ambiguous scale definition and dissimilar interpretations of the scale among subjects. To overcome these limitations, Paired Comparison (PC) tests have been proposed as an alternative and are expected to yield more reliable results. However, PC tests can be expensive and time consuming, since for $n$ images they require $\binom{n}{2}$ comparisons. We present a hybrid subjective test which combines MOS and PC tests via a unified probabilistic model and an active sampling method. The proposed method actively constructs a set of queries consisting of MOS and PC tests based on the expected information gain provided by each test and can effectively reduce the number of tests required for achieving a target accuracy. Our method can be used in conventional laboratory studies as well as crowdsourcing experiments. Experimental results show the proposed method outperforms state-of-the-art subjective IQA tests in a crowdsourced setting.
Similar papers:
  • Quality Assessment for Comparing Image Enhancement Algorithms [pdf] - Zhengying Chen, Tingting Jiang, Yonghong Tian
  • Beyond Comparing Image Pairs: Setwise Active Learning for Relative Attributes [pdf] - Lucy Liang, Kristen Grauman
  • Multilabel Ranking with Inconsistent Rankers [pdf] - Xin Geng, Longrun Luo
  • Beyond Human Opinion Scores: Blind Image Quality Assessment based on Synthetic Scores [pdf] - Peng Ye, David Doermann
#1195 - Informed Haar-like Features Improve Pedestrian Detection [pdf]
Shanshan Zhang, Christian Bauckhage, Armin Cremers

Abstract: We propose a simple yet effective detector for pedestrian detection. The basic idea is to incorporate common sense and everyday knowledge into the design of simple and computationally efficient features. As pedestrians usually appear upright in image or video data, the problem of pedestrian detection is considerably simpler than general-purpose people detection. We therefore employ a statistical model of the upright human body where the head, the upper body, and the lower body are treated as three distinct components. Our main contribution is to systematically design a pool of rectangular templates that are tailored to this shape model. As we incorporate different kinds of low-level measurements, the resulting multi-modal and multi-channel Haar-like features represent characteristic differences between parts of the human body yet are robust against variations in clothing or environmental settings. Our approach avoids exhaustive searches over all possible configurations of rectangle features and does not rely on random sampling either. It thus marks a middle ground among recently published techniques and yields efficient, low-dimensional, yet highly discriminative features. Experimental results on the INRIA and Caltech pedestrian datasets show that our detector reaches state-of-the-art performance at low computational costs and that our features are robust against occlusions.
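The computational trick that keeps such rectangle-template features cheap is the integral image: any rectangle sum costs four lookups, so a two-rectangle difference feature costs eight. A minimal sketch with an illustrative upper-minus-lower template (not one of the paper's designed templates):

import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column so that rectangle sums
    need only four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def rect_sum(ii, top, left, h, w):
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])

def haar_two_rect(ii, top, left, h, w):
    # Difference between upper and lower halves of a rectangle.
    return (rect_sum(ii, top, left, h // 2, w)
            - rect_sum(ii, top + h // 2, left, h // 2, w))

rng = np.random.default_rng(8)
img = rng.random((128, 64))
ii = integral_image(img)
print(haar_two_rect(ii, 10, 10, 20, 12))
print(np.isclose(rect_sum(ii, 10, 10, 20, 12), img[10:30, 10:22].sum()))
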
Similar papers:
  • Learning an image-based motion context for multiple people tracking [pdf] - Laura Leal-Taixé, Michele Fenzi, Alina Kuznetsova, Bodo Rosenhahn, Silvio Savarese
  • Pedestrian Detection in Low-resolution Imagery by Learning Multi-scale Intrinsic Motion Structures (MIMS) [pdf] - Jiejie Zhu
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Word Channel Based Multiscale Pedestrian Detection Without Image Resizing and Using Only One Classifier [pdf] - Arthur Costea, Sergiu Nedevschi
#1197 - Shadow Removal from Single RGB-D Images [pdf]
Yao Xiao, Efstratios Tsougenis, Chi-keung Tang

Abstract: We present the first automatic method to remove shadows from single RGB-D images. Using normal cues directly derived from depth, we can remove hard and soft shadows while preserving surface texture and shading. Our key assumption is that pixels with similar normals, spatial locations and chromaticity should have similar colors. A modified nonlocal matching is used to compute a shadow confidence map that localizes hard shadow boundaries well, thus handling hard and soft shadows within the same framework. We compare our results against those produced by state-of-the-art shadow removal on single RGB images and by intrinsic image decomposition on standard RGB-D datasets.
Similar papers:
  • Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising [pdf] - Yi Peng, Deyu Meng, Zongben Xu, Biao Zhang, Chenqiang Gao, Yang Yi
  • Better Shading for Better Shape Recovery [pdf] - Moumen El-Melegy
  • Two-Class Weather Labeling [pdf] - Cewu Lu, Di Lin, Jiaya Jia, Chi-keung Tang
  • Automatic Feature Learning for Robust Shadow Detection [pdf] - Salman Khan, mohammed Bennamoun, Ferdous Sohel, Roberto Togneri
#1200 - Sign Language Spotting using Hierarchical Sequential Patterns with Temporal Intervals [pdf]
Nicolas Pugeault, Eng-Jon Ong, Richard Bowden, Oscar Koller

Abstract: This paper tackles the problem of spotting a set of signs occurring in videos containing sequences of signs. To achieve this, we propose to model the spatio-temporal signatures of a sign using an extension of sequential patterns that contain temporal intervals, called Sequential Interval Patterns (SIPs). We then propose a novel multi-class classifier that organises different sequential interval patterns in a hierarchical tree structure called a Hierarchical SIP Tree (HSP-Tree). This allows one to exploit any subsequence sharing that exists between different SIPs of different classes. Multiple trees are then combined together into a forest of HSP-Trees, resulting in a strong classifier that can be used to spot signs. We then show how the HSP-Forest can be used to spot sequences of signs that occur in an input video. We have evaluated the method on both concatenated sequences of isolated signs and continuous sign sequences. We also show that the proposed method is superior in robustness and accuracy to a state-of-the-art sign recogniser when applied to spotting a sequence of signs.
Similar papers:
  • Semi-supervised Spectral Clustering for Image Set Classification [pdf] - Arif Mahmood, Ajmal Mian, Robyn Owens
  • Reconstructing Evolving Tree Structures in Time Lapse Sequences [pdf] - Przemysław Głowacki, Miguel Pinheiro, Raphael Sznitman, Engin Türetken, Daniel Lebrecht, Anthony Holtmaat, Jan Kybic, Pascal Fua
  • Efficient Nonlinear Markov Models for Human Motion [pdf] - Andreas Lehrmann, Peter Gehler, Sebastian Nowozin
  • Interval Tracker: Tracking by Interval Analysis [pdf] - Junseok Kwon, Kyoung Mu Lee
#1201 - Accurate Localization and Pose Estimation for Large 3D Models [pdf]
Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl

Abstract: We consider the problem of localizing a novel image in a large 3D model. In principle, this is just an instance of camera pose estimation, but the scale introduces some challenging problems. For one, it makes the correspondence problem very difficult and it is likely that there will be a significant rate of outliers to handle. In this paper we use recent theoretical as well as technical advances to tackle these problems. Many modern cameras and phones have gravitational sensors that allow us to reduce the search space. Further, there are new techniques to efficiently and reliably deal with extreme rates of outliers. We extend these methods to camera pose estimation by using accurate approximations and fast polynomial solvers. Experimental results are given that demonstrate that it is possible to reliably estimate the camera pose despite more than 99% of outlier correspondences.
Similar papers:
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
  • T-Linkage: a Continuous Relaxation of J-Linkage for Multi-Model Fitting [pdf] - Luca Magri, Andrea Fusiello
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
#1206 - Efficient Nonlinear Markov Models for Human Motion [pdf]
Andreas Lehrmann, Peter Gehler, Sebastian Nowozin

Abstract: Dynamic Bayesian networks such as Hidden Markov Models (HMMs) are successfully used as probabilistic models for human motion. The use of hidden variables makes them expressive models, but inference is only approximate and requires procedures such as particle filters or Markov chain Monte Carlo methods. In this work we propose to instead use simple Markov models that only model observed quantities. We retain a highly expressive dynamic model by using interactions that are nonlinear and non-parametric. A presentation of our approach in terms of latent variables shows that the computation of exact log-likelihoods grows only logarithmically in the number of latent states. We validate our model on human motion capture data and demonstrate state-of-the-art performance on action recognition and motion completion tasks.
Similar papers:
  • Max-Margin Boltzmann Machines for Object Segmentation [pdf] - Jimei Yang, Simon Safar, Ming-Hsuan Yang
  • A Procrustean Markov Process for Non-Rigid Structure Recovery [pdf] - Minsik Lee, Chong-Ho Choi, Songhwai Oh
  • Leveraging Hierarchical Parametric Network for Skeletal Joints Action Segmentation and Recognition [pdf] - Di Wu, Ling Shao
  • Optimizing Average Precision using Weakly Supervised Data [pdf] - Aseem Behl, M. Pawan Kumar, C.V. Jawahar
#1207 - Object Partitioning using Local Convexity [pdf]
Simon Christoph Stein, Jeremie Papon, Markus Schoeler, Florentin Woergoetter

Abstract: The problem of how to arrive at an appropriate 3D-segmentation of a scene remains difficult. While current state-of-the-art methods continue to gradually improve in benchmark performance, they also grow more and more complex, for example by incorporating chains of classifiers, which require training on large manually annotated datasets. As an alternative to this, we present a new, efficient learning- and model-free approach for the segmentation of 3D point clouds into object parts. The algorithm begins by decomposing the scene into an adjacency-graph of surface patches based on a voxel grid. Edges in the graph are then classified as either convex or concave using a novel combination of simple criteria which operate on the local geometry of these patches. In this way the graph is divided into locally convex connected subgraphs, which -- with high accuracy -- represent object parts. Additionally, we propose a novel depth-dependent voxel grid to deal with the decreasing point-density at far distances in the point clouds. This improves segmentation, allowing the use of fixed parameters for vastly different scenes. The algorithm is straightforward to implement and requires no training data, while nevertheless producing results that are comparable to state-of-the-art methods which incorporate high-level concepts involving classification, learning and model fitting.
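A simplified version of the patch-edge classification step, assuming centroids and unit normals per surface patch: the connection is treated as convex when the normals "open away" from each other along the centroid displacement. The tolerance value and geometry below are illustrative, and the paper's additional criteria are omitted.

import numpy as np

def edge_is_convex(c1, n1, c2, n2, angle_tol_deg=2.0):
    """Convexity test for the edge between two adjacent patches with
    centroids c1, c2 and unit normals n1, n2: convex if n1 leans away
    from the displacement direction relative to n2, beyond a tolerance."""
    d = c2 - c1
    d = d / np.linalg.norm(d)
    margin = np.sin(np.deg2rad(angle_tol_deg))
    return float(n1 @ d - n2 @ d) < -margin

# Two patches on a convex ridge: normals tilt away from each other.
c1, n1 = np.array([0., 0., 0.]), np.array([-0.196, 0., 0.981])
c2, n2 = np.array([1., 0., 0.]), np.array([0.196, 0., 0.981])
print(edge_is_convex(c1, n1, c2, n2))   # True (convex ridge)
# Concave valley: normals tilt towards each other.
print(edge_is_convex(c1, n2, c2, n1))   # False (concave)
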
Similar papers:
  • Image Reconstruction from Bag-of-Visual-Words [pdf] - Hiroharu Kato, Tatsuya Harada
  • Class Specific 3D Object Shape Priors Using Surface Normals [pdf] - Christian Häne, Nikolay Savinov, Marc Pollefeys
  • Probabilistic Labeling Cost for High-Accuracy Multi-view Reconstruction [pdf] - Ilya Kostrikov, Esther Horbert, Bastian Leibe
  • Iterative Multilevel MRF Leveraging Context and Voxel Information for Brain Tumour Segmentation in MRI [pdf] - Nagesh Subbanna, Doina Precup, Tal Arbel
#1213 - Minimal Scene Descriptions from Structure from Motion Models [pdf]
Song Cao, Noah Snavely

Abstract: How much data do we need to describe a location? We explore this question in the context of 3D scene reconstructions created from running structure from motion on large Internet photo collections, where reconstructions can contain many millions of 3D points. We consider several methods for computing much more compact representations of such reconstructions for the task of location recognition, with the goal of maintaining good performance with very small models. In particular, we introduce a new method for computing compact models that takes into account both image-point relationships, as well as feature distinctiveness, and show that this method produces small models that yield better recognition performance than previous model reduction techniques.
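One standard skeleton for computing such compact models is greedy coverage: repeatedly keep the 3D point seen by the most not-yet-covered database images. The sketch below runs that skeleton on hypothetical visibility lists; the paper's method additionally accounts for image-point relationships and feature distinctiveness, which this sketch ignores.

def greedy_k_cover(point_to_images, k):
    """Keep up to k points, each time choosing the point that covers the
    most images not yet covered (plain greedy set cover)."""
    covered, kept = set(), []
    remaining = dict(point_to_images)
    for _ in range(k):
        best = max(remaining, key=lambda p: len(remaining[p] - covered))
        kept.append(best)
        covered |= remaining.pop(best)
        if not remaining:
            break
    return kept, covered

pts = {  # hypothetical visibility lists: point id -> images seeing it
    0: {0, 1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {0, 5}, 4: {1},
}
kept, covered = greedy_k_cover(pts, k=2)
print(kept, covered)  # e.g. [0, 2] covering images {0, 1, 2, 3, 4, 5}
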
Similar papers:
  • Congruency-Based Reranking [pdf] - Itai Ben Shalom, Adiel Ben Shalom, Noga Levy, Lior Wolf, Tamir Hazan, Nachum Dershowitz, Yaniv Bar, Roni Shweka, Yaacov Choueka
  • Locally Optimized Product Quantization [pdf] - Yannis Kalantidis, Yannis Avrithis
  • Geometric Urban Geo-Localization [pdf] - Mayank Bansal, Kostas Daniilidis
  • Discriminative Feature-to-Point Matching in Image-Based Localization [pdf] - Michael Donoser, Dieter Schmalstieg
#1219 - Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf]
Sudipta Sinha, Daniel Scharstein, Richard Szeliski

Abstract: We present a stereo algorithm designed for speed and efficiency that uses local slanted plane sweeps to propose disparity hypotheses for a semi-global matching algorithm. Our local plane hypotheses are derived from initial sparse feature correspondences followed by an iterative clustering step. Local plane sweeps are then performed around each slanted plane to produce out-of-plane parallax and matching-cost estimates. A final global optimization stage, implemented using semi-global matching, assigns each pixel to one of the local plane hypotheses. By only exploring a small fraction of the whole disparity space volume, our technique achieves significant speedups over previous algorithms and achieves state-of-the-art accuracy on high-resolution stereo pairs of up to 19 megapixels.
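Each local sweep evaluates disparity hypotheses induced by a slanted plane $d(x, y) = a x + b y + c$. The sketch below merely materializes such a hypothesis map for illustrative plane parameters; the matching-cost computation and the semi-global optimization stage are omitted.

import numpy as np

def plane_disparities(plane, h, w):
    """Disparity map induced by a slanted plane d(x, y) = a*x + b*y + c,
    the per-plane hypothesis each local sweep evaluates."""
    a, b, c = plane
    ys, xs = np.mgrid[0:h, 0:w]
    return a * xs + b * ys + c

hyps = [(-0.02, 0.0, 12.0), (0.0, 0.03, 7.5)]   # hypothetical local planes
maps = [plane_disparities(p, 4, 5) for p in hyps]
print(maps[0])
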
Similar papers:
  • Light Field Stereo Matching Using Bilateral Statistics of Surface Cameras [pdf] - Can Chen, Haiting Lin, Zhan Yu, Sing Bing Kang, Jingyi Yu
  • Graph Cut based Continuous Stereo Matching using Locally Shared Labels [pdf] - Tatsunori Taniai, Yasuyuki Matsushita, Takeshi Naemura
  • Learning to Detect Ground Control Points for Improving the Accuracy of Stereo Matching [pdf] - Aristotle Spyropoulos, Nikos Komodakis, Philippos Mordohai
  • Stereo under Sequential Optimal Sampling: A Statistical Analysis Framework for Search Space Reduction [pdf] - Yilin Wang, Jan-Michael Frahm, Enrique Dunn, Ke Wang
#1223 - Two-Class Weather Labeling [pdf]
Cewu Lu, Di Lin, Jiaya Jia, Chi-keung Tang

Abstract: Given a single outdoor image, this paper proposes a collaborative learning approach for labeling the image as either sunny or cloudy. Never adequately addressed, this two-class labeling problem is by no means trivial given the great variety of outdoor images. Our weather feature combines everyday weather cues after properly encoding them into feature vectors. These encoded cues then work collaboratively in synergy under a unified optimization framework that is aware of the presence (or absence) of a given weather cue during learning and classification. Extensive experiments and comparisons are performed to verify our method. Our other contribution is a new weather image dataset of 10K sunny and cloudy images, which will be made freely available together with the executable of our implementation.
Similar papers:
  • Object Discovery and Segmentation via Discriminative Visual Subcategories [pdf] - Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta
  • Investigating Haze-relevant Features in A Learning Framework for Image Dehazing [pdf] - Ketan Tang, Jianchao Yang, Jue Wang
  • Automatic Feature Learning for Robust Shadow Detection [pdf] - Salman Khan, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri
  • Shadow Removal from Single RGB-D Images [pdf] - Yao Xiao, Efstratios Tsougenis, Chi-keung Tang
#1226 - A Procrustean Markov Process for Non-Rigid Structure Recovery [pdf]
Minsik Lee, Chong-Ho Choi, Songhwai Oh

Abstract: Recovering a non-rigid 3D structure from a series of 2D observations is still a difficult problem to solve accurately. Many constraints have been proposed to facilitate the recovery, and one of the most successful is smoothness, due to the fact that most real-world objects change continuously. However, many existing methods require the degree of smoothness to be determined beforehand, which is not viable in practical situations. In this paper, we propose a new probabilistic model that incorporates the smoothness constraint without requiring any prior knowledge. Our approach regards the sequence of 3D shapes as a simple stationary Markov process with Procrustes alignment, whose parameters are learned during the fitting process. The Markov process is assumed to be stationary because deformation is finite and recurrent in general, and the 3D shapes are assumed to be Procrustes aligned in order to discriminate deformation from motion. The proposed method outperforms the state-of-the-art methods, while its computation time remains moderate compared to other existing methods.
Similar papers:
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Good Vibrations: A Modal Analysis Approach for Sequential Non-Rigid Structure from Motion [pdf] - Antonio Agudo, Lourdes Agapito, Begoña Calvo, Jose M. Montiel
  • Deformable Object Matching via Deformation Decomposition based 2D Label MRF [pdf] - Kangwei Liu, Junge Zhang, Kaiqi Huang, Tieniu Tan
  • Efficient Nonlinear Markov Models for Human Motion [pdf] - Andreas Lehrmann, Peter Gehler, Sebastian Nowozin
#1228 - Distance Encoded Product Quantization [pdf]
Jae-Pil Heo, Zhe Lin, Sung-eui Yoon

Abstract: Many binary code embedding techniques have been proposed for large-scale approximate nearest neighbor search in computer vision. Recently, product quantization, which encodes the cluster index in each subspace, has been shown to provide impressive accuracy for nearest neighbor search. In this paper, we explore a simple question: is it best to use all the bit budget for encoding a cluster index in each subspace? We have found that as data points are located farther away from the centers of their clusters, the error of estimated distances among those points becomes larger. To address this issue, we propose a novel encoding scheme that distributes the available bit budget between encoding the cluster index and the quantized distance between a point and its cluster center. We also propose two different distance metrics tailored to our encoding scheme. We have tested our method against state-of-the-art techniques on several well-known benchmarks, and found that our method consistently improves accuracy over the other tested methods. This result is achieved mainly because our method accurately estimates distances between two data points with the new binary codes and distance metric.
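
The bit-budget split is easy to demonstrate for a single subspace: below, 4 bits index one of 16 centroids and 2 bits quantize the residual point-to-centroid distance. This is a hedged toy; the paper learns the quantization boundaries and pairs the codes with two tailored distance metrics.

    import numpy as np

    def encode_subvector(x, centroids, dist_bins):
        # Return (cluster_index, distance_code) for one subvector.
        d = np.linalg.norm(centroids - x, axis=1)
        k = int(np.argmin(d))
        # Bucket the distance to the winning centroid into 2 bits.
        q = int(np.clip(np.digitize(d[k], dist_bins) - 1, 0, len(dist_bins) - 2))
        return k, q

    rng = np.random.default_rng(0)
    centroids = rng.normal(size=(16, 8))     # 4 bits for the index
    dist_bins = np.linspace(0.0, 4.0, 5)     # 4 buckets -> 2 bits
    print(encode_subvector(rng.normal(size=8), centroids, dist_bins))
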
Similar papers:
  • Additive Quantization for Extreme Vector Compression [pdf] - Artem Babenko, Victor Lempitsky
  • Locally Linear Hashing for Extracting Non-Linear Manifolds [pdf] - Go Irie, Zhenguo Li, Xiao-Ming Wu, Shih-Fu Chang
  • Locally Optimized Product Quantization [pdf] - Yannis Kalantidis, Yannis Avrithis
  • Adaptive Object Retrieval with Kernel Reconstructive Hashing [pdf] - Haichuan Yang, Xiao Bai, Jun Zhou, Peng Ren, Jian Cheng, Zhihong Zhang
#1233 - Gyro-Based Multi-Image Deconvolution for Removing Handshake Blur [pdf]
Sung Hee Park, Marc Levoy

Abstract: Image deblurring to remove blur caused by camera shake has been intensively studied. Nevertheless, most methods are brittle and computationally expensive. In this paper we analyze multi-image approaches, which capture and combine multiple frames in order to make deblurring more robust and tractable. In particular, we compare the performance of two approaches: align-and-average and multi-image deconvolution. Our deconvolution is non-blind, using a blur model obtained from real camera motion as measured by a gyroscope. We show that in most situations such deconvolution outperforms align-and-average. We also show, perhaps surprisingly, that deconvolution does not benefit from increasing exposure time beyond a certain threshold. To demonstrate the effectiveness and efficiency of our method, we apply it to still-resolution imagery of natural scenes captured using a mobile camera with flexible camera control and an attached gyroscope.
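
The fusion step of non-blind multi-image deconvolution has a compact closed form in the Fourier domain. The sketch below assumes the per-frame blur kernels are already known (e.g., integrated from gyroscope readings) and uses a plain Tikhonov prior; the paper's estimator handles noise and boundaries with more care.

    import numpy as np

    def multi_image_deconv(images, kernels, lam=1e-2):
        # images: list of HxW blurry frames; kernels: list of HxW PSFs,
        # zero-padded and centered at pixel (0, 0). Implements
        # x = IFFT( sum_i conj(K_i) Y_i / (sum_i |K_i|^2 + lam) ).
        num, den = 0.0, lam
        for y, k in zip(images, kernels):
            K = np.fft.fft2(k)
            num = num + np.conj(K) * np.fft.fft2(y)
            den = den + np.abs(K) ** 2
        return np.real(np.fft.ifft2(num / den))
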
Similar papers:
  • Separable Kernel for Image Deblurring [pdf] - Lu Fang, Haifeng Liu, Feng Wu
  • Total Variation Blind Deconvolution: The Devil is in the Details [pdf] - Daniele Perrone, Paolo Favaro
  • Blind Multi-Image Restoration [pdf] - Haichao Zhang
  • Discriminative Blur Detection Features [pdf] - Jianping Shi, Li Xu, Jiaya Jia
#1234 - Joint Motion Segmentation and Background Subtraction in Dynamic Scenes [pdf]
Adeel Mumtaz, Weichen Zhang, Antoni Chan

Abstract: We propose a joint foreground-background mixture model (FBM) that simultaneously performs background subtraction and motion segmentation in complex dynamic scenes. Our FBM consists of a set of location-specific dynamic texture (DT) components, for modeling local background motion, and a set of global DT components, for modeling consistent foreground motion. We derive an EM algorithm for estimating the parameters of the FBM. We also apply spatial constraints to the FBM using a Markov random field grid, and derive a corresponding variational approximation for inference. Unlike existing approaches to background subtraction, our FBM does not require a manually selected threshold or a separate training video. Unlike existing motion segmentation techniques, our FBM can segment foreground motions over complex backgrounds with mixed motions, and detect stopped objects. Since most dynamic scene datasets only contain videos with a single foreground object over a simple background, we develop a new challenging dataset with multiple foreground objects over complex dynamic backgrounds. In experiments, we show that jointly modeling the background and foreground segments with the FBM yields significant improvements in accuracy on both background subtraction and motion segmentation, compared to state-of-the-art methods.
Similar papers:
  • How to Evaluate Foreground Maps? [pdf] - Ran Margolin, Lihi Zelnik-Manor, Ayellet Tal
  • Visual Tracking Using Pertinent Patch Selection and Masking [pdf] - Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim
  • Object Classification with Adaptive Regions [pdf] - Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool
  • Object-based Multiple Foreground Video Co-segmentation [pdf] - Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
#1237 - Leveraging Hierarchical Parametric Network for Skeletal Joints Action Segmentation and Recognition [pdf]
Di Wu, Ling Shao

Abstract: Over the last few years, with the popularity of the Kinect, there has been renewed interest in developing methods for human gesture and action recognition from 3D skeletal data. A number of approaches have been proposed to extract representative features from 3D skeletal data, such as, most commonly, hard-wired geometric or bio-inspired shape context features. We propose a hierarchical, dynamic framework that first extracts high-level skeletal joint features and then uses the learned representation to estimate the emission probability needed to infer the action class. Gaussian mixture models are primarily used for modeling the emission distribution of hidden Markov models. We show that better action recognition using skeletal features can be achieved by replacing Gaussian mixture models with deep neural networks that contain many layers of features to predict probability distributions over states of hidden Markov models. The framework can be easily extended to include an ergodic state to segment and recognize actions simultaneously.
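
The hybrid piece, replacing GMM emissions with network outputs, hinges on one identity: a network trained to predict p(state | features) yields a scaled emission likelihood p(features | state) proportional to p(state | features) / p(state). A hedged sketch of that conversion plus standard Viterbi decoding (names are illustrative, not the authors' code):

    import numpy as np

    def scaled_emissions(posteriors, state_priors, eps=1e-8):
        # posteriors: T x S network outputs (rows sum to 1);
        # state_priors: length-S empirical state frequencies.
        return posteriors / (state_priors[None, :] + eps)

    def viterbi(log_emis, log_trans, log_init):
        T, S = log_emis.shape
        delta = log_init + log_emis[0]
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + log_trans    # scores[i, j]: i -> j
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + log_emis[t]
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]
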
Similar papers:
  • Deeply-Learned Slow Feature Analysis for Action Recognition [pdf] - Lin Sun
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
#1238 - T-Linkage: a Continuous Relaxation of J-Linkage for Multi-Model Fitting [pdf]
Luca Magri, Andrea Fusiello

Abstract: This paper presents an improvement of the J-linkage algorithm for fitting multiple instances of a model to noisy data corrupted by outliers. The binary preference analysis implemented by J-linkage is replaced by a continuous (soft, or fuzzy) generalization that proves to perform better than J-linkage on simulated data, and compares favorably with state-of-the-art methods on public-domain real datasets.
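
Two ingredients carry the continuous relaxation: hard preference sets become soft votes that decay with the residual, and clusters are merged under the Tanimoto distance rather than the Jaccard distance. A minimal sketch (the time constant tau and the 5*tau cutoff are illustrative choices):

    import numpy as np

    def preference(residuals, tau):
        # Soft preference of one point for each sampled hypothesis.
        p = np.exp(-residuals / tau)
        p[residuals > 5.0 * tau] = 0.0   # hard cutoff, as in J-linkage
        return p

    def tanimoto_distance(p, q):
        pq = float(np.dot(p, q))
        den = float(np.dot(p, p) + np.dot(q, q) - pq)
        return 1.0 if den == 0.0 else 1.0 - pq / den
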
Similar papers:
  • Joint-Histogram Weighted Median Filter [pdf] - Qi Zhang, Li Xu, Jiaya Jia
  • Subspace Clustering for Sequential Data [pdf] - Stephen Tierney, Junbin Gao, Yi Guo
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
#1248 - Partial Symmetry in Polynomial Systems and Its Application in Computer Vision [pdf]
Yubin Kuang, Yinqiang Zheng, Kalle Astroem

Abstract: Polynomial solving is one of the key components for solving geometry problems in computer vision. Fast and stable polynomial solvers are essential for numerous applications, e.g.\ minimal problems or finding all stationary points of certain algebraic errors. Recently, full symmetry in polynomial systems has been utilized to simplify and speed up state-of-the-art polynomial solvers based on the Gr{\"o}bner basis method \cite{ask2012exploiting}. In this paper, we further explore partial symmetry (i.e.\ only a subset of the unknowns are symmetric) in polynomial systems. We develop novel numerical schemes to utilize such partial symmetry. We then demonstrate the advantage of our schemes in several computer vision problems. In both synthetic and real experiments, we show that utilizing partial symmetry allows us to obtain faster and more accurate polynomial solvers than the general solvers.
Similar papers:
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
  • Mirror Symmetry Histograms for Capturing Geometric Properties in Images [pdf] - Marcelo Cicconet, Davi Geiger, Michael Werman, Kristin Gunsalus
  • Higher-Order Clique Reduction Without Auxiliary Variables [pdf] - Hiroshi Ishikawa
  • A General and Simple Method for Camera Pose and Focal Length Determination [pdf] - Yinqiang Zheng, Shigeki Sugimoto, Imari Sato, Masatoshi Okutomi
#1249 - Decorrelated Vectorial Total Variation [pdf]
Shunsuke Ono, Isao Yamada

Abstract: This paper proposes a new vectorial total variation prior (VTV) for color images. Different from existing VTVs, our VTV, named the decorrelated vectorial total variation prior (D-VTV), measures the discrete gradients of the luminance component and those of the chrominance component separately, which significantly reduces undesirable uneven color effects. Moreover, a higher-order generalization of the D-VTV, which we call the decorrelated vectorial total generalized variation prior (D-VTGV), is also developed to avoid the staircasing effect that accompanies the use of VTVs. A noteworthy property of the D-VT(G)V is that it enables us to efficiently minimize objective functions involving it by a primal-dual splitting method. Experimental results illustrate their utility.
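
The decorrelation idea can be shown in a few lines: map RGB to one luminance and two chrominance channels, then accumulate a vectorial TV per group. The orthonormal opponent-style basis and the chroma weight below are assumptions for illustration; the paper defines its own transform and adds the higher-order D-VTGV.

    import numpy as np

    def _grad(u):
        gx = np.diff(u, axis=1, append=u[:, -1:])   # forward differences,
        gy = np.diff(u, axis=0, append=u[-1:, :])   # replicated boundary
        return gx, gy

    def d_vtv(rgb, w=0.5):
        B = np.array([[1., 1., 1.], [1., 0., -1.], [1., -2., 1.]])
        B /= np.linalg.norm(B, axis=1, keepdims=True)
        lum, c1, c2 = np.tensordot(rgb, B, axes=([2], [1])).transpose(2, 0, 1)
        def vtv(channels):               # coupled (vectorial) TV of a group
            s = 0.0
            for ch in channels:
                gx, gy = _grad(ch)
                s = s + gx**2 + gy**2
            return np.sqrt(s).sum()
        return vtv([lum]) + w * vtv([c1, c2])
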
Similar papers:
  • Sequential Convex Relaxation for Mutual-Information-Based Unsupervised Figure-Ground Segmentation [pdf] - Youngwook Kee, Mohamed Souiai, Daniel Cremers, Junmo Kim
  • A Convex Relaxation of Ambrosio-Tortorelli's Elliptic Functional for the Mumford-Shah Functional [pdf] - Youngwook Kee, Junmo Kim
  • Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising [pdf] - Yi Peng, Deyu Meng, Zongben Xu, Biao Zhang, Chenqiang Gao, Yang Yi
  • Weighted Nuclear Norm Minimization with Application to Image Denoising [pdf] - Shuhang Gu, Lei Zhang, Xiangchu Feng, Wangmeng Zuo
#1269 - Locally Linear Hashing for Extracting Non-Linear Manifolds [pdf]
Go Irie, Zhenguo Li, Xiao-Ming Wu, Shih-Fu Chang

Abstract: Most hashing methods aim to preserve either the variance (e.g. PCA-based hashing) or the pairwise affinity (e.g. spectral hashing) of data manifolds. However, neither property is adequate to capture their non-linear geometric structures. In this paper, we tackle this problem by exploring the locally linear structures of manifolds. We propose a new hashing method to reconstruct these locally linear structures in the binary Hamming space, where they are learned by locality-sensitive sparse coding. The problem is naturally cast as a joint minimization of reconstruction error and quantization error, which is NP-hard. Nevertheless, a local optimum can be obtained efficiently via alternating optimization between optimal reconstruction and quantization. Our method distinguishes itself from others in its remarkable ability to extract nearest neighbors of the query lying on the same manifold, instead of merely in the ambient space. We perform extensive experiments on various image benchmark datasets. Our results improve on the performance of state-of-the-art methods typically by 28-74%, and by 627% in the best case on face data.
Similar papers:
  • Fast Supervised Hashing with Decision Trees for High-Dimensional Data [pdf] - Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter
  • Collaborative Hashing [pdf] - Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang
  • Collective Matrix Factorization Hashing for Multimodal Data [pdf] - Guiguang Ding, Yuchen Guo, Jile Zhou
  • Adaptive Object Retrieval with Kernel Reconstructive Hashing [pdf] - Haichuan Yang, Xiao Bai, Jun Zhou, Peng Ren, Jian Cheng, Zhihong Zhang
#1272 - Gesture Recognition Portfolios for Personalization [pdf]
Angela Yao, Luc Van Gool, Pushmeet Kohli

Abstract: Human gestures, like speech and handwriting, are often unique to the individual. Training a generic classifier which is applicable to everyone can be very difficult and as such, it has become a standard to use personalized classifiers in speech and handwriting recognition. In this paper, we address the problem of personalization in the context of gesture recognition, and propose a novel and extremely efficient way of doing personalization. Unlike traditional personalization methods which learn a single classifier that later gets adapted, our approach learns a set (portfolio) of classifiers during training, one of which is selected for each test subject based on the personalization data. We formulate classifier personalization as a selection problem and propose several algorithms to compute the set of candidate classifiers. Our experiments show that such an approach is much more efficient than adapting the classifier parameters but can still achieve comparable or better results.
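
The test-time logic reduces to picking the portfolio member that fits the few labelled personalization samples best, e.g. as below, where the classifier objects are hypothetical stand-ins with a scikit-learn-style predict():

    import numpy as np

    def select_from_portfolio(portfolio, X_personal, y_personal):
        # Return the pre-trained classifier with the highest accuracy
        # on the user's personalization data.
        accs = [np.mean(clf.predict(X_personal) == y_personal)
                for clf in portfolio]
        return portfolio[int(np.argmax(accs))]
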
Similar papers:
  • A Learning-to-Rank Approach for Image Color Enhancement [pdf] - Jianzhou Yan, Stephen Lin, Sing Bing Kang, Xiaoou Tang
  • Transfer Joint Matching for Visual Domain Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Philip Yu
  • Beta Process Multiple Kernel Learning [pdf] - Bingbing Ni, Pierre Moulin
  • Leveraging Hierarchical Parametric Network for Skeletal Joints Action Segmentation and Recognition [pdf] - Di Wu, Ling Shao
#1280 - Additive Quantization for Extreme Vector Compression [pdf]
Artem Babenko, Victor Lempitsky

Abstract: We introduce a new compression scheme for high-dimensional vectors that approximates the vectors using sums of M codewords coming from M different codebooks. We show that the proposed scheme permits efficient distance and scalar product computations between compressed and uncompressed vectors. We further suggest vector encoding and codebook learning algorithms that can minimize the coding error within the proposed scheme. In the experiments, we demonstrate that the proposed compression can be used instead of or together with product quantization. Compared to product quantization and its optimized versions, the proposed compression approach leads to lower coding approximation errors, higher accuracy of approximate nearest neighbor search in the datasets of visual descriptors, and lower image classification error, whenever the classifiers are learned on or applied to compressed vectors.
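
The additive model, x approximated by the sum of one codeword per codebook, is most easily seen through a greedy encoder that peels off one codeword at a time; the paper itself uses a stronger encoder, so treat this as a sketch of the representation rather than of their algorithm.

    import numpy as np

    def aq_encode_greedy(x, codebooks):
        # codebooks: list of M arrays, each K x D. Returns M indices.
        residual = x.copy()
        codes = []
        for C in codebooks:
            i = int(np.argmin(np.linalg.norm(residual[None, :] - C, axis=1)))
            codes.append(i)
            residual = residual - C[i]
        return codes

    def aq_decode(codes, codebooks):
        return sum(C[i] for i, C in zip(codes, codebooks))
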
Similar papers:
  • Distance Encoded Product Quantization [pdf] - Jae-Pil Heo, Zhe Lin, Sung-eui Yoon
  • Asymmetric sparse kernel approximations for large-scale visual search [pdf] - Damek Davis, Stefano Soatto, Jonathan Balzer
  • Product Sparse Coding [pdf] - Tiezheng Ge, Kaiming He, Jian Sun
  • Compact Representation for Image Classification: To Choose or to Compress? [pdf] - Yu Zhang, Jianxin Wu, Jianfei Cai
#1282 - Covariance descriptors for 3D shape matching and retrieval [pdf]
Hedi Tabia, Hamid Laga, David Picard, Philippe-Henri Gosselin

Abstract: Several descriptors have been proposed in the past for 3D shape analysis, yet none of them achieves the best performance on all shape classes. In this paper we propose a novel method for 3D shape analysis using the covariance matrices of the descriptors rather than the descriptors themselves. Covariance matrices enable efficient fusion of different types of features and modalities. They capture the geometric and the spatial properties as well as their correlation within the same representation. Covariance matrices, however, lie on the manifold of Symmetric Positive Definite (SPD) tensors, a special type of Riemannian manifold, which makes comparison and clustering of such matrices challenging. In this paper we study covariance matrices in their native space and make use of geodesic distances on the manifold as a dissimilarity measure. We demonstrate the performance of this metric on 3D face matching and recognition tasks. We then generalize the Bag of Features paradigm, originally designed in Euclidean spaces, to the Riemannian manifold of SPD matrices. We propose a new clustering procedure that takes into account the geometry of the Riemannian manifold. We evaluate the performance of the proposed framework on 3D shape matching and retrieval applications and demonstrate its superiority compared to descriptor-based techniques.
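
The dissimilarity everything rests on is the affine-invariant geodesic distance on the SPD manifold, d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F, which reduces to the generalized eigenvalues of the pair. A small sketch of that standard formula:

    import numpy as np
    from scipy.linalg import eigh

    def spd_geodesic(A, B):
        lam = eigh(B, A, eigvals_only=True)  # solves B v = lam A v
        return float(np.sqrt(np.sum(np.log(lam) ** 2)))

    A = np.array([[2.0, 0.3], [0.3, 1.0]])
    print(spd_geodesic(A, np.eye(2)))        # distance of A from identity
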
Similar papers:
  • On the quotient representation for the essential manifold [pdf] - Roberto Tron, Kostas Daniilidis
  • Using Projection Kurtosis Concentration Of Natural Images For Blind Noise Covariance Matrix Estimation [pdf] - Siwei Lyu
  • Model Transport: Towards Scalable Transfer Learning on Manifolds [pdf] - Oren Freifeld, Soren Hauberg, Michael Black
  • Tracking on the Product Manifold of Shape and Orientation for Tractography from Diffusion MRI [pdf] - Yuanxiang Wang, Hesamoddin Salehian, Guang Cheng, Baba Vemuri
#1287 - Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf]
Amal Delaunoy, Marc Pollefeys

Abstract: Motivated by a Bayesian view of the multi-view 3D reconstruction problem, we propose a dense 3D reconstruction technique that jointly refines the shape and the camera parameters of a scene by minimizing the photometric reprojection error between a generated model and the observed images, hence considering all pixels in the original images. The minimization is performed using a gradient descent scheme coherent with the shape representation (here a triangular mesh), where we carefully derive evolution equations including the derivatives of the visibility function. This can be used as a last refinement step in 3D reconstruction pipelines and helps improve the quality of the 3D reconstruction by estimating the 3D shape and camera calibration more accurately. Examples are shown for multi-view stereo, where the texture is also jointly optimized and improved, but the method could be used in any generative approach dealing with multi-view reconstruction settings (i.e. depth map fusion, multi-view photometric stereo).
Similar papers:
  • Local Readjustment for High-Resolution 3D Reconstruction [pdf] - Siyu Zhu, Tian Fang, Jianxiong Xiao, Long Quan
  • High Quality Photometric Reconstruction using a Depth Camera [pdf] - Avishek Chatterjee, Sk Mohammadul Haque, Venu Madhav Govindu
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - Di Xu, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
#1291 - Super-resolving Appearance of 3D Deformable Shapes from Multiple Videos [pdf]
Jean-Sebastien Franco, Vagia Tsiminaki, Edmond Boyer

Abstract: We examine the problem of retrieving and super-resolving the appearance of objects observed in multiple videos under small object motions. Super-resolution has been vastly explored in the case of monocular video, where the data redundancy necessary to reconstruct the image stems from temporal accumulation. On the other hand, a handful of methods have examined texture super-resolution of a static 3D object observed from several cameras, where the data redundancy is obtained through the different viewpoints. We introduce a unified framework to leverage both possibilities for super-resolution, which uniformly deals with any source of geometric variability. To this end, we use 2D warps for all views and temporal frames, and a simple linear projection model from texture to image space. Despite its simplicity, the method is able to successfully improve the texture appearance with temporal information, as shown experimentally. Additionally, we show that our method obtains better results than state-of-the-art 3D shape super-resolution methods existing for the static case.
Similar papers:
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Describing Textures in the Wild [pdf] - Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Andrea Vedaldi
  • Lacunarity Analysis on Image Patterns for Texture Classification [pdf] - Yuhui Quan, Yong Xu, Yuping Sun, Yu Luo
  • The Synthesizability of texture examples [pdf] - Dengxin Dai, Hayko Riemenschneider, Luc Van Gool
#1314 - Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf]
Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar

Abstract: The notion of relative attributes as introduced by Parikh and Grauman (ICCV, 2011) [24] provides an appealing way of comparing two images based on their visual properties (or attributes) such as ``smiling'' for face images, ``naturalness'' for outdoor images, etc. For learning such attributes, a Ranking SVM based formulation was proposed that uses globally represented pairs of annotated images. In this paper, we extend this idea towards learning relative attributes using local parts that are shared across categories. First, instead of using a global representation, we introduce a part-based representation combining a pair of images that specifically compares corresponding parts. Then, with each part we associate a locally adaptive ``significance-coefficient'' that represents its discriminative ability with respect to a particular attribute. For each attribute, the significance-coefficients are learned simultaneously with a max-margin ranking model in an iterative manner. Compared to the baseline method, the new method not only achieves a significant improvement in relative attribute prediction accuracy, but is also shown to significantly improve the performance of relative-attribute-feedback-based interactive image search.
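
Underneath both the baseline and the extension sits a pairwise max-margin ranker: for each ordered pair (i above j), a linear scorer w should satisfy w.(x_i - x_j) >= 1. A subgradient-descent sketch of that baseline objective (the paper additionally learns per-part significance coefficients):

    import numpy as np

    def rank_svm(X, pairs, lam=0.1, lr=0.05, epochs=500):
        # X: n x d features; pairs: (i, j) tuples with i ranked above j.
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            viol = np.zeros_like(w)
            for i, j in pairs:
                d = X[i] - X[j]
                if w @ d < 1.0:          # margin violated by this pair
                    viol += d
            w -= lr * (lam * w - viol / len(pairs))
        return w
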
Similar papers:
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
#1315 - SeamSeg: Video Object Segmentation using Patch Seams [pdf]
Avinash Ramakanth, Venkatesh Babu Radhakrishnan

Abstract: In this paper, we propose a video object segmentation algorithm by extending the formulation of seams from image and video retargeting. In retargeting, the primary aim is to reduce the image size while preserving the salient image contents. To achieve this, the energy function used is based on edge strength. Typically, seams, which are connected paths of low energy, are utilised for retargeting. Here, we modify the formulation of seams to facilitate robust video object segmentation. The energy function associated with the proposed video seams provides temporal linking of objects across frames, while accurately modelling object motion. The proposed energy function takes into account the similarity of patches along the seam, temporal consistency of motion and spatial coherency of seams. Label propagation in the boundary regions, the most critical step in accurate object segmentation, is achieved with high fidelity, utilising the proposed video seams. To achieve accurate object segmentation without additional overheads, we curtail the error propagation from boundary regions using rough-set based modelling. The performance of the proposed approach is evaluated on benchmark datasets and found to outperform existing supervised and unsupervised state-of-the-art approaches.
Similar papers:
  • Visual Tracking Using Pertinent Patch Selection and Masking [pdf] - Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • Single Image Super-resolution using Deformable Patches [pdf] - Yu Zhu, Yanning Zhang, Alan Yuille
  • Learning Mid-level Filters for Person Re-identification [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
#1319 - Beat the MTurkers: Automatic Image Labeling from Weak 3D Supervision [pdf]
Liang-Chieh Chen, Sanja Fidler, Alan Yuille, Raquel Urtasun

Abstract: Labeling large-scale datasets with very accurate object segmentations is an elaborate task that requires a high degree of quality control and an expenditure of at least tens of thousands of dollars. Thus, coming up with solutions that can automatically perform labeling given weak supervision is key to reducing this cost. In this paper we show how to exploit 3D information (i.e., stereo and/or point clouds) to automatically generate very accurate object segmentations given annotated 3D bounding boxes. We formulate the problem as one of inference in a binary MRF which exploits appearance models, stereo and/or noisy point clouds, a repository of 3D CAD models, as well as topological constraints. We demonstrate the effectiveness of our approach in the context of autonomous driving, and show that we can segment cars with 86% intersection-over-union, performing as well as highly recommended MTurkers!
Similar papers:
  • Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf] - Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler
  • Visual Tracking Using Pertinent Patch Selection and Masking [pdf] - Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim
  • MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation [pdf] - Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Zhuowen Tu
  • Aerial Reconstructions via Probabilistic Data Fusion [pdf] - Randi Cabezas, Oren Freifeld, Guy Rosman, John Fisher III
#1325 - A Hierarchical Context Model for Event Recognition in Surveillance Video [pdf]
Xiaoyang Wang, Qiang Ji

Abstract: Due to great challenges such as tremendous intra-class variations and low image resolution, context information plays an increasingly important role for accurate and robust event recognition in surveillance videos. Context information can generally be divided into feature-level context, semantic-level context, and prior-level context. These three levels of context provide crucial bottom-up, mid-level, and top-down information that can benefit the recognition task itself. Unlike existing research, which generally integrates context information at only one of the three levels, we propose a hierarchical context model that simultaneously exploits contexts at all three levels and systematically incorporates them into event recognition. To tackle the learning and inference challenges brought in by the model hierarchy, we develop complete learning and inference algorithms for the proposed hierarchical context model based on the variational Bayes method. Experiments on the VIRAT 1.0 and 2.0 Ground Datasets demonstrate the effectiveness of the proposed hierarchical context model for improving event recognition performance even under great challenges like large intra-class variations and low image resolution.
Similar papers:
  • Video Classification Based on Generalized Maximum Co-occurrence Cliques [pdf] - Amir Roshan Zamir, Shayan Modiri Assari
  • Complex Activity Recognition using Granger Constrained DBN (GCDBN) in Sports and Surveillance Video [pdf] - Eran Swears, Anthony Hoogs, Qiang Ji, Kim Boyer
  • DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting [pdf] - Chen Sun, Ram Nevatia
  • Event Detection using Multi-Level Relevance Labels and Multiple Features [pdf] - Zhongwen Xu, Ivor W. Tsang, Yi Yang, Zhigang Ma, Alexander Hauptmann
#1328 - Tell Me What You See and I will Show You Where It Is [pdf]
Jia Xu, Alexander Schwing, Raquel Urtasun

Abstract: We tackle the problem of weakly labeled semantic segmentation, where the information is given only in the form of image tags that encode which classes are present in the scene. This is an extremely difficult problem, as no pixel-wise labelings are available, not even at training time. In this paper, we show that this problem can be formalized as performing learning and inference in a latent structured prediction framework. The graphical model encodes the presence and absence of a class as well as the assignments of semantic labels to super-pixels. As a consequence, we are able to leverage techniques and algorithms with good theoretical properties. We demonstrate the effectiveness of our approach on the challenging SIFT Flow dataset and show superior performance to the state-of-the-art.
Similar papers:
  • Multi-fold MIL Training for Weakly Supervised Object Localization [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
  • Optimizing Average Precision using Weakly Supervised Data [pdf] - Aseem Behl, M. Pawan Kumar, C.V. Jawahar
  • Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media [pdf] - Zhenxing Niu, Gang Hua, Xinbo Gao, Qi Tian
  • NMF-KNN: Image Annotation using Weighted Multi-view Non-Negative Matrix Factorization [pdf] - Mahdi Kalayeh, Haroon Idrees, Mubarak Shah
#1330 - Event Detection using Multi-Level Relevance Labels and Multiple Features [pdf]
Zhongwen Xu, Ivor W. Tsang, Yi Yang, Zhigang Ma, Alexander Hauptmann

Abstract: We address the challenging problem of utilizing related exemplars for complex event detection when multiple features are available. Related exemplars are labeled as related to the event but are not exact matches. Related exemplars share certain positive elements of the event, but have no uniform pattern due to the huge variance in relevance levels among different related exemplars. None of the existing multiple-feature fusion methods can deal with related exemplars. In this paper, we propose an algorithm which adaptively utilizes the related exemplars by cross-feature learning. Ordinal labels are used to represent the multiple relevance levels of the related videos. Label candidates of related exemplars are generated by exploring the possible relevance levels of each related exemplar via a cross-feature voting strategy. A maximum-margin criterion is then applied in our framework to discriminate the positive and negative exemplars, as well as the related exemplars from different relevance levels. We test our algorithm on the large-scale TRECVID 2011 dataset and it achieves promising performance.
Similar papers:
  • Efficient Boosted Exemplar-based Face Detection [pdf] - Haoxiang Li, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Gang Hua
  • Video Classification Based on Generalized Maximum Co-occurrence Cliques [pdf] - Amir Roshan Zamir, Shayan Modiri Assari
  • DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting [pdf] - Chen Sun, Ram Nevatia
  • A Hierarchical Context Model for Event Recognition in Surveillance Video [pdf] - Xiaoyang Wang, Qiang Ji
#1331 - Fast and Robust Archetypal Analysis for Representation Learning [pdf]
Yuansi Chen, Julien Mairal, Zaid Harchaoui

Abstract: We revisit a pioneering unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis has not gained much popularity, even though it produces more interpretable models than other alternatives. Because no efficient implementation has ever been made publicly available, its application to impactful problems has been severely limited. Our paper addresses this issue in order to reinstate archetypal analysis. We develop a fast optimization scheme based on an active-set strategy, and provide the first scalable open-source implementation. Then, we demonstrate the usefulness of archetypal analysis for computer vision tasks, such as codebook learning, signal classification, and large-scale image collection visualization.
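
For readers new to the model: archetypal analysis approximates data X (d x n) as X = Z A with archetypes Z = X B, where every column of A and B lies on the probability simplex. The sketch below shows the simplex projection and a projected-gradient solve for A given Z; it conveys the constraints, while the paper's active-set solver is what makes the problem fast at scale.

    import numpy as np

    def project_simplex(v):
        # Euclidean projection of v onto {a : a >= 0, sum(a) = 1}.
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1.0)[0][-1]
        return np.maximum(v - (css[rho] - 1.0) / (rho + 1), 0.0)

    def decode(X, Z, n_iter=300):
        # min_A ||X - Z A||_F^2 s.t. each column of A is on the simplex.
        A = np.full((Z.shape[1], X.shape[1]), 1.0 / Z.shape[1])
        step = 1.0 / np.linalg.norm(Z.T @ Z, 2)   # 1 / Lipschitz constant
        for _ in range(n_iter):
            G = Z.T @ (Z @ A - X)
            A = np.apply_along_axis(project_simplex, 0, A - step * G)
        return A

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 40))
    Z = X[:, rng.choice(40, 4, replace=False)]    # crude initial archetypes
    print(np.linalg.norm(X - Z @ decode(X, Z)) / np.linalg.norm(X))
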
Similar papers:
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
  • Learning Inhomogeneous FRAME Models for Object Patterns [pdf] - Jianwen Xie, Wenze Hu, Song Chun Zhu, Ying Nian Wu
  • Latent Dictionary Learning for Sparse Representation based Classification [pdf] - Meng Yang, Luc Van Gool
  • Product Sparse Coding [pdf] - Tiezheng Ge, Kaiming He, Jian Sun
#1332 - Superpixel-grounded Deformable Part Models [pdf]
Eduard Trulls, Iasonas Kokkinos, Francesc Moreno-Noguer, Alberto Sanfeliu

Abstract: In this work we propose a simple and fast technique for combining bottom-up segmentation, in the form of SLIC superpixels, with Deformable Part Models (DPMs). Our approach can be understood as `cleaning up' the low-level HOG features by exploiting the spatial support of SLIC superpixels; effectively, we split feature variation into object-specific changes and generic background/contextual changes. Rather than committing to a single segmentation, we use a large pool of SLIC superpixels and combine these in a scale-, position- and object-dependent manner to build soft segmentation masks. The segmentation masks can be computed fast enough that we can repeat this process over every candidate window, during training and detection, for both the root and part filters. We use these masks to construct enhanced, background-invariant features to train DPMs. We test our approach on the PASCAL VOC 2007 dataset, where it outperforms the standard DPM in 13 out of 15 classes, yielding an average increase of 1.7% AP. Additionally, we demonstrate the robustness of this approach by extending it to dense SIFT descriptors for large-displacement optical flow.
Similar papers:
  • Fast, Approximate Piecewise-Planar Modeling Based on Sparse Structure-from-Motion and Dense Superpixels [pdf] - Andras Bodis-Szomoru, Hayko Riemenschneider, Luc Van Gool
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
  • An Exemplar-based CRF for Multi-instance Object Segmentation [pdf] - Xuming He, Stephen Gould
  • Generating object segmentation proposals using global and local search [pdf] - Pekka Rantalankila, Juho Kannala, Esa Rahtu
#1336 - Relative Pose Estimation for a Multi-Camera System with Known Vertical Direction [pdf]
Gim Hee Lee, Marc Pollefeys, Friedrich Fraundorfer

Abstract: In this paper, we present our minimal 4-point and linear 8-point algorithms to estimate the relative pose of a multi-camera system with known vertical direction, i.e. known absolute roll and pitch angles. We solve the minimal 4-point algorithm with the hidden variable resultant method and show that it leads to an 8-degree univariate polynomial that gives up to 8 real solutions. We identify a degenerate case of the linear 8-point algorithm when it is solved with the standard Singular Value Decomposition (SVD) method and adopt a simple alternative solution which is easy to implement. We show that our proposed algorithms can be used efficiently within RANSAC for robust estimation. We evaluate the accuracy of our proposed algorithms by comparison with various existing algorithms for multi-camera systems in simulations, and show the feasibility of our proposed algorithms with results from multiple real-world datasets.
Similar papers:
  • Head Pose Estimation Based on Multivariate Label Distribution [pdf] - Xin Geng, Yu Xia
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • A Minimal Solution to the Generalized Pose-and-Scale Problem [pdf] - Jonathan Ventura, Clemens Arth, Gerhard Reitmayr, Dieter Schmalstieg
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
#1337 - Mirror Symmetry Histograms for Capturing Geometric Properties in Images [pdf]
Marcelo Cicconet, Davi Geiger, Michael Werman, Kristin Gunsalus

Abstract: We propose a data structure that captures global geometric properties in images: histograms of mirror symmetry coefficients. We compute such a coefficient for every pair of pixels taking into account their respective tangents and group them in a 6-dimensional histogram. By marginalizing this symmetry histogram in various ways, we develop algorithms for a range of applications: recovery of the contour representation of an image; detection of nearly-circular cells; location of the main axis of reflection symmetry; detection of cell-division in movies of developing embryos; detection of worm-tips and indirect cell-counting via Machine Learning. Our approach generalizes a series of histogram-related methods, and the proposed algorithms perform with state-of-the-art accuracy.
Similar papers:
  • Detection, Rectification and Segmentation of Co-planar Repeated Patterns [pdf] - James Pritts, Ondrej Chum, Jiri Matas
  • Symmetry-Aware Isometric Matching of Incomplete 3D Surfaces [pdf] - Yusuke Yoshiyasu
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
  • Partial Symmetry in Polynomial Systems and Its Application in Computer Vision [pdf] - Yubin Kuang, Yinqiang Zheng, Kalle Astroem
#1339 - Region-based Discriminative Feature Pooling for Scene Text Recognition [pdf]
Chen-Yu Lee, Anurag Bhardwaj, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu

Abstract: We present a new feature representation method for the scene text recognition problem, particularly focusing on improving scene character recognition. Many existing methods rely on histogram of oriented gradient (HOG) features or part-based models, which do not span the feature space well for characters in natural scene images, especially given the large variation in fonts with cluttered backgrounds. In this work, we propose a discriminative feature pooling method that automatically learns the most informative sub-regions of each scene character within a multi-class classification framework, where each sub-region seamlessly integrates a set of low-level image features through integral images. The proposed feature representation is compact, computationally efficient, and able to effectively model distinctive spatial structures of each individual character class. Extensive experiments conducted on challenging datasets (Chars74K, ICDAR'03, ICDAR'11, SVT) show that our method significantly outperforms existing methods on scene character classification and scene text recognition tasks.
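
The integral-image device that makes scoring many candidate sub-regions cheap: after one cumulative pass over a feature channel, any rectangular sum costs four lookups.

    import numpy as np

    def integral_image(f):
        return np.pad(f.cumsum(0).cumsum(1), ((1, 0), (1, 0)))

    def region_sum(ii, top, left, bottom, right):
        # Sum of f[top:bottom, left:right] from the padded integral image.
        return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

    f = np.arange(12.0).reshape(3, 4)
    ii = integral_image(f)
    assert region_sum(ii, 1, 1, 3, 3) == f[1:3, 1:3].sum()
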
Similar papers:
  • Large-scale visual font recognition [pdf] - Guang Chen, Jianchao Yang, Hailin Jin, Jonathan Brandt, Eli Shechtman, Aseem Agarwala, Tony Han
  • Orientation Robust Textline Detection in Natural Images [pdf] - Le Kang, Yi Li
  • StoryGraphs: Narrative Charts for TV series [pdf] - Makarand Tapaswi, Martin Bäuml, Rainer Stiefelhagen
  • Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition [pdf] - Cong Yao, Xiang Bai, Baoguang Shi, Wenyu Liu
#1341 - Turning Mobile Phones into 3D Scanners [pdf]
Kalin Kolev, Petri Tanskanen, Pablo Speciale, Marc Pollefeys

Abstract: In this paper, we propose an efficient and accurate scheme for the integration of multiple stereo-based depth measurements. For each provided depth map a confidence-based weight is assigned to each depth estimate by evaluating local geometry orientation, underlying camera setting and photometric evidence. Subsequently, all hypotheses are fused together into a compact and consistent 3D model. Thereby, visibility conflicts are identified and resolved, and fitting measurements are averaged with regard to their confidence scores. The individual stages of the proposed approach are validated by comparing it to two alternative techniques which rely on a conceptually different fusion scheme and a different confidence inference, respectively. Pursuing live 3D reconstruction on mobile devices as a primary goal, we demonstrate that the developed method can easily be integrated into a system for monocular interactive 3D modeling by substantially improving its accuracy while adding an almost negligible overhead to its performance and retaining its interactive potential.
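
A per-pixel caricature of confidence-based fusion, with a crude stand-in for the visibility-conflict handling: hypotheses that disagree strongly with the weighted consensus are dropped before re-averaging. The threshold and rule are illustrative only, not the paper's exact scheme.

    import numpy as np

    def fuse_depths(depths, confidences, conflict_tol=0.1):
        depths = np.asarray(depths, float)
        w = np.asarray(confidences, float)
        est = np.sum(w * depths) / np.sum(w)          # first consensus
        keep = np.abs(depths - est) < conflict_tol * est
        if keep.any():                                # drop conflicting views
            est = np.sum(w[keep] * depths[keep]) / np.sum(w[keep])
        return est

    print(fuse_depths([1.00, 1.02, 1.60], [0.9, 0.8, 0.2]))
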
Similar papers:
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
  • High Quality Photometric Reconstruction using a Depth Camera [pdf] - Avishek Chatterjee, Sk Mohammadul Haque, Venu Madhav Govindu
#1346 - Predicting Multiple Attributes via Relative Multi-task Learning [pdf]
Lin Chen, Qiang Zhang, Baoxin Li

Abstract: Relative attribute learning aims to learn ranking functions describing the relative strength of attributes. Most current learning approaches learn ranking functions for each attribute independently, without considering possible intrinsic relatedness among the attributes. For a problem involving multiple attributes, it is reasonable to assume that exploiting such relatedness would benefit learning, especially when the number of labeled training pairs is very limited. In this paper, we propose a relative multi-attribute learning framework that integrates relative attributes into a multi-task learning scheme. The formulation allows us to exploit the advantages of state-of-the-art regularization-based multi-task learning for improved attribute learning. In particular, using joint feature learning as a case study, we evaluated our framework with both synthetic data and two real datasets. Experimental results suggest that the proposed framework yields clear performance gains in ranking accuracy and zero-shot learning accuracy over existing methods of independent relative attribute learning and multi-task learning.
Similar papers:
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
#1351 - Lacunarity Analysis on Image Patterns for Texture Classification [pdf]
Yuhui Quan, Yong Xu, Yuping Sun, Yu Luo

Abstract: This paper introduces a statistical approach to texture description, which can achieve highly discriminative ability for classifying texture images under a wide range of transformations, including photometric changes and geometric changes. The proposed method is based on the concept of lacunarity of the image patterns. Built upon the local binary patterns that are encoded at multiple scales, lacunarity analysis is applied to capture the self-similar behavior of the local structures. The proposed texture descriptor was applied to texture classification. Our method has demonstrated excellent performance in comparison with the existing state-of-the-art approaches on four challenging benchmark datasets.
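
The underlying statistic is classical gliding-box lacunarity: at scale r, slide an r x r box over the (binary) pattern, record its mass M, and set L(r) = var(M)/mean(M)^2 + 1. A sketch using cumulative sums for the box masses (the descriptor applies this to multi-scale local binary patterns, not to a raw random image as here):

    import numpy as np

    def lacunarity(binary, r):
        f = binary.astype(float)
        ii = np.pad(f.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
        # Mass of every r x r window, via four integral-image lookups.
        M = ii[r:, r:] - ii[:-r, r:] - ii[r:, :-r] + ii[:-r, :-r]
        mu = M.mean()
        return M.var() / mu**2 + 1.0 if mu > 0 else np.inf

    pattern = np.random.rand(64, 64) < 0.3
    print(lacunarity(pattern, 4))
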
Similar papers:
  • Probabilistic Active Appearance Models [pdf] - Joan Alabort-i-Medina, Stefanos Zafeiriou
  • Super-resolving Appearance of 3D Deformable Shapes from Multiple Videos [pdf] - Jean-Sebastien Franco, Vagia Tsiminaki, Edmond Boyer
  • Describing Textures in the Wild [pdf] - Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Andrea Vedaldi
  • The Synthesizability of texture examples [pdf] - Dengxin Dai, Hayko Riemenschneider, Luc Van Gool
#1362 - Pseudoconvex Proximal Splitting for $L_\infty$ Problems in Multiview Geometry [pdf]
Anders Eriksson

Abstract: In this paper we study optimization methods for minimizing large-scale pseudoconvex $L_\infty$ problems in multiview geometry. We present a novel algorithm for solving this class of problem based on proximal splitting methods. We provide a brief derivation of the proposed method along with a general convergence analysis. The resulting meta-algorithm requires very little effort in terms of implementation and instead makes use of existing advanced solvers for non-linear optimization. Preliminary experiments on a number of real image datasets indicate that the proposed method experimentally matches or outperforms current state-of-the-art solvers for this class of problems.
Similar papers:
  • FAST LABEL: Easy and Efficient Optimization of Joint Multi-Label and Estimation Problems [pdf] - Byung-Woo Hong, Ganesh Sundaramoorthi
  • Low-Cost Compressive Sensing for Color Video and Depth [pdf] - Xin Yuan, Patrick Llull, Xuejun Liao, Jianbo Yang, David Brady, Guillermo Sapiro, Lawrence Carin
  • Generalized Nonconvex Nonsmooth Low-Rank Minimization [pdf] - Canyi Lu, Shuicheng Yan, Zhouchen Lin
  • Bregman Divergences for Infinite Dimensional Covariance Matrices [pdf] - Mehrtash Harandi, Mathieu Salzmann, Fatih Porikli
#1371 - High Accuracy Monocular Localization for Autonomous Driving Using Adaptive Ground Estimation [pdf]
Shiyu Song, Manmohan Chandraker

Abstract: Scale drift is a crucial challenge that prevents monocular autonomous driving from emulating the performance of stereo. This paper presents a real-time monocular SFM system that corrects for scale drift using a highly effective cue combination framework for ground plane estimation, yielding accuracy comparable to stereo over long driving sequences. Our ground plane estimation uses multiple cues like sparse features, dense inter-frame stereo and (when applicable) object bounding boxes. A data-driven mechanism is proposed to learn models from training data that relate observation covariances for each cue to error behavior of its underlying variables. During testing, this allows per-frame adaptation of observation covariances based on relative confidences inferred from visual data. Our framework significantly boosts not only the accuracy of monocular self-localization, but also that of applications like object localization that rely on the ground plane. Experiments on the KITTI dataset demonstrate the accuracy of our ground plane estimation, monocular SFM and object localization relative to ground truth, with detailed comparisons to prior art.
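
The reason the ground plane fixes scale drift is worth one line of code: if the camera is mounted at a known metric height h above the road and monocular SFM currently estimates its distance to the ground plane as d (in SFM units), the metric scale factor is h / d. The adaptive part of the paper lies in estimating d and its confidence, not in this final step.

    def scale_from_ground_plane(d_estimated, camera_height):
        # d_estimated: camera-to-ground distance in (drifting) SFM units;
        # camera_height: known mounting height in meters.
        return camera_height / d_estimated

    # e.g. a KITTI-like rig mounted 1.7 m above the road:
    print(scale_from_ground_plane(d_estimated=0.85, camera_height=1.7))  # 2.0
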
Similar papers:
  • Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf] - Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Fast, Approximate Piecewise-Planar Modeling Based on Sparse Structure-from-Motion and Dense Superpixels [pdf] - Andras Bodis-Szomoru, Hayko Riemenschneider, Luc Van Gool
  • Ground Plane Estimation using a Hidden Markov Model [pdf] - Ralf Dragon, Luc Van Gool
#1381 - Multi-Cue Visual Tracking Using Robust Feature-Level Fusion Based on Joint Sparse Representation [pdf]
Xiangyuan Lan, Pong C. Yuen, Andy Jinhua Ma

Abstract: The use of multiple features for tracking has proven effective because the limitations of each feature can be compensated. Since different types of variations, such as illumination, occlusion and pose changes, may occur in a video sequence, especially in long sequences, how to dynamically select the appropriate features is one of the key problems in this approach. To address this issue in multi-cue visual tracking, this paper proposes a new joint sparse representation model for robust feature-level fusion. The proposed method dynamically removes unstable features from the fusion for tracking by exploiting the advantages of sparse representation. As a result, robust tracking performance is obtained. Experimental results on publicly available videos show that the proposed method outperforms both existing sparse-representation-based and fusion-based trackers.
Similar papers:
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - David Joseph Tan, Slobodan Ilic
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#1382 - Mixing Body-Part Sequences for Human Pose Estimation [pdf]
Anoop Cherian, Julien Mairal, Karteek Alahari, Cordelia Schmid

Abstract: In this paper, we present a method for estimating articulated human poses in videos. We cast this as an optimization problem defined on body parts with spatio-temporal links between them. Previous approaches for addressing this intractable problem have used different approximate solutions. Although such methods perform well on certain body parts, e.g. head, their performance on lower arms, i.e. elbows, wrists, remains poor. We present an alternative approximate method adapted to the pose estimation problem. Firstly, our approach takes into account temporal links with subsequent frames for the less-certain parts, namely elbows and wrists. Secondly, our method decomposes poses into limbs, generates limb sequences across time, and recomposes poses by mixing these body part sequences. We introduce a new dataset ``Poses in the Wild'', which is more challenging than existing ones, with sequences containing background clutter, occlusions, and severe camera motion. We experimentally compare our method with recent works on this new dataset as well as on two publicly available datasets, and show significant improvement.
Similar papers:
  • Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf] - Ivan Lillo, Juan Carlos Niebles, Alvaro Soto
  • Human Pose Estimation: New Benchmark and State of the Art Analysis [pdf] - Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
  • Posebits for Monocular Pose Estimation [pdf] - Gerard Pons-Moll, Bodo Rosenhahn, David Fleet
  • Robust Estimation of 3D Human Poses from Single Images [pdf] - Chunyu Wang, Yizhou Wang, Zhouchen Lin, Alan Yuille, Wen Gao
#1389 - 3D Pose from Motion for Cross-view Action Recognition via Non-linear Circulant Temporal Encoding [pdf]
Ankur Gupta, Julieta Martinez, Jim Little, Robert Woodham

Abstract: We describe a new approach to transfer knowledge across views for action recognition by using examples from a large collection of unlabelled motion capture (mocap) data to connect different views. We achieve this by directly matching purely motion based features from videos to mocap. Our approach is able to recover 3D pose sequences without performing any body part tracking. We use these matches to generate multiple motion projections and thus add view invariance to our action recognition model. We also introduce a closed form solution for approximate non-linear Circulant Temporal Encoding (nCTE), which allows us to efficiently perform the matches in the frequency domain. We test our approach on the challenging unsupervised modality of the IXMAS dataset, and use publicly available motion capture data for matching. Without any additional annotation effort, we are able to significantly outperform the current state-of-the-art.
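
The efficiency of circulant temporal encoding comes from one fact: scoring a query sequence against every circular time shift of a stored sequence is a correlation, so the whole sweep collapses to a few FFTs. A single-channel, linear sketch (the paper contributes the non-linear, regularized variant):

    import numpy as np

    def circulant_scores(x, y):
        # scores[k] = sum_n x[n] * y[(n + k) mod T], for all k at once.
        X, Y = np.fft.fft(x), np.fft.fft(y)
        return np.real(np.fft.ifft(np.conj(X) * Y))

    t = np.arange(128)
    x = np.sin(0.3 * t)
    y = np.roll(x, 17)                             # y lags x by 17 frames
    print(int(np.argmax(circulant_scores(x, y))))  # recovers the lag: 17
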
Similar papers:
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Posebits for Monocular Pose Estimation [pdf] - Gerard Pons-Moll, Bodo Rosenhahn, David Fleet
#1391 - Better Feature Tracking Through Subspace Constraints [pdf]
Bryan Poling, Gilad Lerman, Arthur Szlam

Abstract: Feature tracking in video is a crucial task in computer vision. Usually, the tracking problem is handled one feature at a time, using a single-feature tracker like the Kanade-Lucas-Tomasi algorithm, or one of its derivatives. While this approach works quite well when dealing with high-quality video and ``strong'' features, it often falters when faced with dark and noisy video containing low-quality features. We present a framework for jointly tracking a set of features, which enables sharing information between the different features in the scene. We show that our method can be employed to track features for both rigid and nonrigid motions (possibly of few moving bodies) even when some features are occluded. Furthermore, it can be used to significantly improve tracking results in poorly-lit scenes (where there is a mix of good and bad features). Our approach does not require direct modeling of the structure or the motion of the scene, and runs in real time on a single CPU core.
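
The subspace constraint in isolation: stack all feature displacements into a matrix and keep only its top few principal components, so weak features are corrected by the shared global motion. A denoising sketch of that one idea, not of the full tracker:

    import numpy as np

    def project_to_subspace(D, rank=3):
        # D: (2 * n_features) x n_frames matrix of displacements.
        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        s[rank:] = 0.0
        return (U * s) @ Vt

    rng = np.random.default_rng(1)
    modes = rng.normal(size=(40, 3))               # 20 features, 3 motion modes
    D = modes @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(40, 30))
    err = np.linalg.norm(D - project_to_subspace(D)) / np.linalg.norm(D)
    print(round(err, 3))                           # small: noise removed
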
Similar papers:
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - DAVID JOSEPH TAN, Slobodan Ilic
  • Subspace Tracking under Dynamic Dimensionality for Online Background Subtraction [pdf] - Matthew Berger, Lee Seversky
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#1394 - Efficient Computation of Relative Pose for Multi-Camera Systems [pdf]
Laurent Kneip, Hongdong Li

Abstract: We present a novel solution to compute the relative pose of a generalized camera. Existing solutions are either not general, have too high computational complexity, or require too many correspondences, which impedes efficient or accurate use within RANSAC schemes. We factorize the problem as a low-dimensional, iterative optimization over relative rotation only, directly derived from well-known epipolar constraints. Common generalized cameras often consist of camera clusters, and give rise to omni-directional landmark observations. We prove that our iterative scheme performs well in such practically relevant situations, eventually resulting in computational efficiency similar to linear solvers, and accuracy close to bundle adjustment, while using fewer correspondences. Experiments on both virtual and real multi-camera systems prove superior overall performance for robust, real-time multi-camera motion estimation.
Similar papers:
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • A Minimal Solution to the Generalized Pose-and-Scale Problem [pdf] - Jonathan Ventura, Clemens Arth, Gerhard Reitmayr, Dieter Schmalstieg
  • Relative Pose Estimation for a Multi-Camera System with Known Vertical Direction [pdf] - Gim Hee Lee, Marc Pollefeys, Friedrich Fraundorfer
#1399 - Aerial Reconstructions via Probabilistic Data Fusion [pdf]
Randi Cabezas, Oren Freifeld, Guy Rosman, John Fisher III

Abstract: We propose an integrated probabilistic model for multi-modal fusion of aerial imagery and LiDAR data. The resulting model allows for reconstruction and analysis of large 3D scenes. An advantage of the approach is that it explicitly models uncertainty, allows for missing data, and provides a consistent framework for incorporating additional measurement modalities. As compared with image-based methods, dense reconstruction of complex urban scenes is feasible with relatively few observations. Furthermore, the proposed model allows one to estimate absolute scale and orientation and reason about other aspects of the scene, e.g., detecting moving objects. As formulated, the model lends itself to massively-parallel computation; that is, utilizing both general-purpose and domain-specific components of modern graphics hardware, we are able to do fast inference over complex and detailed scenes. We demonstrate our results on large-scale reconstruction of an urban terrain from LiDAR and visual aerial photography data.
Similar papers:
  • Geometric Urban Geo-Localization [pdf] - Mayank Bansal, Kostas Daniilidis
  • Turning Mobile Phones into 3D Scanners [pdf] - Kalin Kolev, Petri Tanskanen, Pablo Speciale, Marc Pollefeys
  • Pedestrian Detection in Low-resolution Imagery by Learning Multi-scale Intrinsic Motion Structures (MIMS) [pdf] - Jiejie Zhu
  • Beat the MTurkers: Automatic Image Labeling from Weak 3D Supervision [pdf] - Liang-Chieh Chen, Sanja Fidler, Alan Yuille, Raquel Urtasun
#1404 - Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf]
Ivan Lillo, Juan Carlos Niebles, Alvaro Soto

Abstract: This paper proposes a framework for recognizing complex human activities in videos. Our method describes human activities in a hierarchical discriminative model that operates at three semantic levels. At the lower level, body poses are encoded in a representative but discriminative pose dictionary. At the intermediate level, encoded poses span a space on which simple human actions are composed. At the highest level, our model captures temporal and spatial compositions of actions into complex human activities. Our human activity classifier simultaneously models which body parts are relevant to the action of interest as well as their appearance and composition using a discriminative approach. By formulating model learning in a max-margin framework, our approach achieves powerful multi-class discrimination while providing useful annotations at the intermediate semantic level. We show how our hierarchical compositional model provides natural handling of occlusions, as well as novel compositions. To evaluate the effectiveness of our proposed framework, we introduce a new dataset of composed human activities. We provide empirical evidence that our method achieves state-of-the-art classification accuracies.
Similar papers:
  • Incremental Activity Modeling and Recognition in Streaming Videos [pdf] - MAHMUDUL HASAN, Amit Roy-Chowdhury
  • The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities [pdf] - Hilde Kuehne, Ali Arslan, Thomas Serre
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity [pdf] - Nam Vo, Aaron Bobick
#1405 - Unifying Spatial and Attribute Selection for Distracter-resilient Tracking [pdf]
Nan Jiang, Ying Wu

Abstract: Visual distracters are detrimental and generally very difficult to handle in target tracking, because they generate false positive candidates for target matching. The resilience of region-based matching to the distracters depends not only on the matching metric, but also on the characteristics of the target region to be matched. The two tasks, i.e., learning the best metric and selecting the distracter-resilient target regions, actually correspond to the attribute selection and spatial selection processes in human visual perception. This paper presents an initial attempt to unify the modeling of these two tasks for an effective solution, based on the introduction of a new quantity called Soft Visual Margin. As a function of both matching metric and spatial location, it measures the discrimination between the target and its spatial distracters, and characterizes the reliability of matching. Different from other formulations of margin, this new quantity is analytical and is insensitive to noisy data. This paper presents a novel method to jointly determine the best spatial location and the optimal metric. Based on this, a solid distracter-resilient region tracker is designed, and its effectiveness is validated and demonstrated through extensive experiments.
Similar papers:
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Beyond Comparing Image Pairs: Setwise Active Learning for Relative Attributes [pdf] - Lucy Liang, Kristen Grauman
#1406 - Complex Activity Recognition using Granger Constrained DBN (GCDBN) in Sports and Surveillance Video [pdf]
Eran Swears, Anthony Hoogs, Qiang Ji, Kim Boyer

Abstract: Modeling interactions of multiple co-occurring objects in a complex activity is becoming increasingly popular in the video domain. The Dynamic Bayesian Network (DBN) has been applied to this problem in the past due to its natural ability to statistically capture complex temporal dependencies. However, standard DBN structure learning algorithms learn generatively, require manual structure definitions, and/or are computationally complex or restrictive. We propose a novel structure learning solution that fuses the Granger Causality statistic, a direct measure of temporal dependence, with the Adaboost feature selection algorithm to automatically constrain the temporal links of a DBN in a discriminative manner. This approach enables us to completely define the DBN structure prior to parameter learning, which reduces computational complexity in addition to providing a more descriptive structure. We refer to this modeling approach as the Granger Constrained DBN (GCDBN). Our experiments show how the GCDBN outperforms two of the most relevant state-of-the-art graphical models in complex activity classification on handball video data, surveillance data, and synthetic data.
Similar papers:
  • Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf] - Ivan Lillo, Juan Carlos Niebles, Alvaro Soto
  • Incremental Activity Modeling and Recognition in Streaming Videos [pdf] - MAHMUDUL HASAN, Amit Roy-Chowdhury
  • Super Normal Vector for Activity Recognition Using Depth Sequences [pdf] - Xiaodong Yang, Yingli Tian
  • A Hierarchical Context Model for Event Recognition in Surveillance Video [pdf] - Xiaoyang Wang, Qiang Ji
#1411 - Head Pose Estimation Based on Multivariate Label Distribution [pdf]
Xin Geng, Yu Xia

Abstract: Accurate ground truth pose is essential to the training of most existing head pose estimation algorithms. However, in many cases, the ``ground truth'' pose is obtained in rather subjective ways, such as asking the human subjects to stare at different markers on the wall. In such cases, it is better to use soft labels rather than explicit hard labels as the ground truth. Therefore, this paper proposes to associate a multivariate label distribution (MLD) to each image. An MLD covers a neighborhood around the original pose. Labeling the images with MLD can not only alleviate the problem of inaccurate pose labels, but also boost the number of training examples associated with each pose without actually increasing the total number of training examples. Two algorithms are proposed to learn from the MLD by minimizing the weighted Jeffrey's divergence between the predicted MLD and the ground truth MLD. Experimental results show that the MLD-based methods perform significantly better than the compared state-of-the-art head pose estimation algorithms.
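For concreteness, Jeffrey's divergence is the symmetrized KL divergence between two label distributions. A minimal version over flattened MLD arrays (the epsilon is an illustrative stabilizer, not a value from the paper):

    import numpy as np

    def jeffrey_divergence(p, q, eps=1e-12):
        # p, q: non-negative arrays over the pose neighborhood, any shape.
        p = p / p.sum()
        q = q / q.sum()
        # KL(p||q) + KL(q||p) collapses into one symmetric sum.
        return float(np.sum((p - q) * (np.log(p + eps) - np.log(q + eps))))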
Similar papers:
  • Look at the Driver, Look at the Road: No Distraction! No Accident! [pdf] - Mahdi Rezaei, Reinhard Klette
  • Relative Pose Estimation for a Multi-Camera System with Known Vertical Direction [pdf] - Gim Hee Lee, Marc Pollefeys, Friedrich Fraundorfer
  • Learning-by-Synthesis for Appearance-based 3D Gaze Estimation [pdf] - Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
#1413 - Total-Variation Minimization on Unstructured Volumetric Mesh: Biophysical Applications on Reconstruction of 3D Ischemic Myocardium [pdf]
Jingjia Xu, Azar Rahimi Dehaghani, Fei Gao, Linwei Wang

Abstract: This paper describes the development and application of a new approach to total-variation (TV) minimization for reconstruction problems on geometrically-complex and unstructured volumetric mesh. The driving application of this study is the reconstruction of 3D ischemic regions in the heart from noninvasive body-surface potential data, where the use of a TV-prior can be expected to promote the reconstruction of two piecewise smooth regions of healthy and ischemic electrical properties with localized gradient in between. Compared to TV minimization on regular grids of pixels/voxels, the complex unstructured volumetric mesh of the heart poses unique challenges including the impact of mesh resolutions on the TV-prior and the difficulty of gradient calculation. In this paper, we introduce a variational TV-prior and, when combined with the iteratively re-weighted least-square concept, a new algorithm for TV minimization that is computationally efficient and robust to the discretization resolution. In a large set of simulation studies as well as two initial real-data studies, we demonstrate that the use of a TV prior outperforms L2-based penalties in the reconstruction of ischemic regions, and that the proposed TV-minimization algorithm shows higher accuracy, robustness, and computational efficiency compared to that with the commonly used discrete TV prior. Furthermore, we also compare the performance of the proposed TV minimization algorithm in combination with an L2- versus L1-based
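The iteratively re-weighted least-squares idea turns the non-smooth TV term into a sequence of weighted quadratic problems. A generic sketch for min_x ||Ax - b||^2 + lam ||Dx||_1 with a discrete gradient operator D; the paper's mesh-specific variational prior and parameter choices are not reproduced here:

    import numpy as np

    def tv_irls(A, b, D, lam=1.0, iters=20, eps=1e-6):
        # A: forward model, b: measurements, D: gradient operator
        # (e.g. over mesh edges); eps guards against division by zero.
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        for _ in range(iters):
            w = 1.0 / np.sqrt((D @ x) ** 2 + eps)   # reweight the TV term
            H = A.T @ A + lam * D.T @ (w[:, None] * D)
            x = np.linalg.solve(H, A.T @ b)
        return x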
Similar papers:
  • Class Specific 3D Object Shape Priors Using Surface Normals [pdf] - Christian Häne, Nikolay Savinov, Marc Pollefeys
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
#1421 - RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf]
Tak-Wai Hui, King-Ngi Ngan

Abstract: Low-cost RGB-D imaging systems such as Kinect are widely utilized for dense 3D reconstruction. However, Kinect generally suffers from two major problems: the spatial resolution of the depth image is low, and the depth image often contains numerous holes where no depth measurements are available, which can be due to the poor infra-red reflectance of some objects in the scene. Since the spatial resolution of the color image is higher than that of the depth image, this paper introduces a new method to enhance the depth images from a moving Kinect using the depth cue from the induced optical flow. We not only fill holes in the raw depth image, but also recover the fine details of the scene. We address the problem of depth image enhancement by minimizing an energy functional. In order to reduce the computational complexity, we treat the textured and homogeneous regions in the color image differently. Experimental results on real-image data are provided to show the effectiveness of the proposed method.
Similar papers:
  • SteadyFlow: Spatially Smooth Optical Flow for Video Stabilization [pdf] - Shuaicheng Liu, Lu Yuan, Ping Tan, Jian Sun
  • A Compositional Model for Low-Dimensional Image Set Representation [pdf] - Hossein Mobahi, Ce Liu, Bill Freeman
  • SphereFlow: 6 DoF Scene Flow from RGB-D Pairs [pdf] - Michael Hornacek, Andrew Fitzgibbon, Margrit Gelautz, Carsten Rother
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
#1423 - Semantic Object Selection [pdf]
Ejaz Ahmed, Scott Cohen, Brian Price

Abstract: Interactive object segmentation has great practical importance in computer vision. Many interactive methods have been proposed that utilize user input in the form of mouse clicks and mouse strokes, often requiring a lot of user intervention. In this paper, we present a system with a far simpler input method: the user need only give the name of the desired object. With the tag provided by the user, we query a text image database to gather exemplars of the object. Using object proposals and borrowing ideas from image retrieval and object detection, the object is localized in the image. An appearance model generated from the exemplars and the location prior are used in an energy minimization framework to select the object. Our method outperforms the state-of-the-art on existing datasets and on a more challenging dataset we collected.
Similar papers:
  • Learning optimal features for salient object detection [pdf] - Song Lu, Vijay Mahadevan, Nuno Vasconcelos
  • Time-Mapping Using Space-Time Saliency [pdf] - Feng Zhou, Sing Bing Kang, Michael Cohen
  • Salient Region Detection via High-Dimensional Color Transform [pdf] - Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
  • Efficient Boosted Exemplar-based Face Detection [pdf] - Haoxiang Li, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Gang Hua
#1432 - Learning-by-Synthesis for Appearance-based 3D Gaze Estimation [pdf]
Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato

Abstract: Inferring human gaze from low-resolution eye images is still a challenging task despite its practical importance in many application scenarios. This paper presents a learning-by-synthesis approach to accurate image-based gaze estimation that is person- and head pose-independent. Unlike existing appearance-based methods that assume person-specific training data, we use a large amount of cross-subject training data to train a 3D gaze estimator. We collect the largest fully calibrated multi-view gaze dataset to date and perform a 3D reconstruction in order to generate dense training data of eye images. By using the synthesized dataset to learn a random regression forest, we show that our method outperforms existing methods that use low-resolution eye images.
Similar papers:
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • Head Pose Estimation Based on Multivariate Label Distribution [pdf] - Xin Geng, Yu Xia
  • Temporal Segmentation of Egocentric Videos [pdf] - Chetan Arora, Yair Poleg, Shmuel Peleg
  • Geometric Generative Gaze Estimation (G3E) for Remote RGB-D Cameras [pdf] - Kenneth Funes Mora, Jean-Marc Odobez
#1436 - Time Machine: Continuous Manifold Based Adaptation for Evolving Visual Domains [pdf]
Judy Hoffman, Trevor Darrell, Kate Saenko

Abstract: We pose the following question: what happens when test data not only differs from training data, but differs from it in a continually evolving way? The classic domain adaptation paradigm considers the world to be separated into stationary domains with clear boundaries between them. However, in many real-world applications, examples cannot be naturally separated into discrete domains, but arise from a continuously evolving underlying process. Examples include video with gradually changing lighting and spam email with evolving spammer tactics. We formulate a novel problem of adapting to such continuous domains, and present a solution based on smoothly varying embeddings. Recent work has shown the utility of considering discrete visual domains as fixed points embedded in a manifold of lower-dimensional subspaces. Adaptation can be achieved via transforms or kernels learned between such stationary source and target subspaces. We propose a method to consider non-stationary domains, which we refer to as Continuous Manifold Adaptation (CMA). We treat each target sample as potentially being drawn from a different subspace on the domain manifold, and present a novel technique for continuous transform-based adaptation. Our approach can learn to distinguish categories using training data collected at some point in the past, and continue to update its model of the categories for some time into the future, without receiving any additional labels. Experiments on two visual datasets demonst
Similar papers:
  • Transfer Joint Matching for Visual Domain Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Philip Yu
  • Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf] - Novi Patricia, Barbara Caputo
  • Domain Adaptation on the Statistical Manifold [pdf] - Mahsa Baktashmotlagh, Mehrtash Harandi, Brian Lovell, Mathieu Salzmann
  • Recognizing RGB Images by Learning from RGB-D Data [pdf] - Lin Chen, Wen Li, Dong Xu
#1437 - Real-time Simultaneous Pose and Shape Estimation for Articulated Objects with a Single Depth Camera [pdf]
Mao Ye, Ruigang Yang

Abstract: In this paper we present a novel real-time algorithm for simultaneous pose and shape estimation for articulated objects, such as human beings and animals. The key of our pose estimation component is to embed the articulated deformation model with exponential-maps-based parametrization into a Gaussian Mixture Model. Benefiting from the probabilistic measurement model, our algorithm requires no explicit point correspondences, as opposed to most existing methods. Consequently, our approach is less sensitive to local minima and handles fast and complex motions well. Extensive evaluations on publicly available datasets demonstrate that our method outperforms most state-of-the-art pose estimation algorithms by a large margin, especially in the case of challenging motions. Moreover, our novel shape adaptation algorithm based on the same probabilistic model automatically captures the shape of the subjects during the dynamic pose estimation process. Experiments show that our shape estimation method achieves accuracy comparable to the state of the art, yet requires no extra calibration procedure.
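The correspondence-free property comes from GMM-style soft assignments: each observed depth point receives responsibilities over all model vertices instead of one hard nearest-neighbor match. A toy E-step under that reading (names are illustrative):

    import numpy as np

    def soft_assign(points, vertices, sigma):
        # points: N x 3 depth observations, vertices: M x 3 deformed model.
        d2 = ((points[:, None, :] - vertices[None, :, :]) ** 2).sum(-1)
        w = np.exp(-0.5 * d2 / sigma ** 2)          # Gaussian responsibilities
        return w / (w.sum(axis=1, keepdims=True) + 1e-12)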
Similar papers:
  • Quality Dynamic Human Body Modeling Using a Single Low-cost Depth Camera [pdf] - Qing Zhang, BO FU
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - DAVID JOSEPH TAN, Slobodan Ilic
  • User-Specific Hand Modeling from Monocular Depth Sequences [pdf] - Jonathan Taylor, Richard Stebbing, Varun Ramakrishna, Cem Keskin, Jamie Shotton, Shahram Izadi, Andrew Fitzgibbon, Aaron Hertzmann
  • A Novel Chamfer Template Matching Method Using Variational Mean Field [pdf] - Thanh Nguyen
#1438 - Multi-modal Learning in Loosely-organized Web Images [pdf]
Kun Duan, David Crandall, Dhruv Batra

Abstract: Photo-sharing websites have become very popular in the last few years, leading to huge collections of online images. In addition to image data, these websites collect a variety of multi-modal metadata about photos including text tags, captions, GPS coordinates, camera metadata, user profiles, etc. However, this metadata is not well constrained and is often noisy, sparse, or missing altogether. In this paper, we propose a framework to model these "loosely organized" multi-modal datasets, and show how to perform loosely-supervised learning using a novel latent Conditional Random Field framework. We also show how to learn parameters of the LCRF automatically from a small set of validation data, using Information Theoretic Metric Learning (ITML) to learn distance functions and a structural SVM formulation to learn the potential functions. We apply our framework to four datasets of images from Flickr, evaluating our approach both qualitatively and quantitatively against several baselines.
Similar papers:
  • NMF-KNN: Image Annotation using Weighted Multi-view Non-Negative Matrix Factorization [pdf] - Mahdi Kalayeh, Haroon Idrees, Mubarak Shah
  • Tell Me What You See and I will Show You Where It Is [pdf] - Jia Xu, Alexander Schwing, Raquel Urtasun
  • Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media [pdf] - Zhenxing Niu, Gang Hua, Xinbo Gao, Qi Tian
  • Orientation Robust Textline Detection in Natural Images [pdf] - Le Kang, Yi Li
#1440 - Visual Tracking Using Pertinent Patch Selection and Masking [pdf]
Dae-Youn Lee, Jae-Young Sim, Chang-Su Kim

Abstract: A novel visual tracking algorithm using patch-based appearance models is proposed in this paper. We first divide the bounding box of a target object into multiple patches and then select only pertinent patches, which occur repeatedly near the center of the bounding box, to construct the foreground appearance model. We also divide the input image into non-overlapping blocks, construct a background model at each block location, and integrate these background models for tracking. Using the appearance models, we obtain an accurate foreground probability map. Finally, we estimate the optimal object position by maximizing the likelihood, which is obtained by convolving the foreground probability map with the pertinence mask. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art tracking algorithms significantly in terms of center position errors and success rates.
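The final localization step described here reduces to a single correlation: slide the pertinence mask over the foreground probability map and take the argmax. A minimal sketch, assuming both inputs are 2-D arrays:

    import numpy as np
    from scipy.signal import fftconvolve

    def best_position(prob_map, mask):
        # Correlation is convolution with a flipped kernel; the peak of the
        # resulting likelihood surface is the estimated object position.
        likelihood = fftconvolve(prob_map, mask[::-1, ::-1], mode='same')
        return np.unravel_index(np.argmax(likelihood), likelihood.shape)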
Similar papers:
  • How to Evaluate Foreground Maps? [pdf] - Ran Margolin, Lihi Zelnik-Manor, Ayellet Tal
  • Joint Motion Segmentation and Background Subtraction in Dynamic Scenes [pdf] - Adeel Mumtaz, Weichen Zhang, Antoni Chan
  • Object Classification with Adaptive Regions [pdf] - Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool
  • Object-based Multiple Foreground Video Co-segmentation [pdf] - Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
#1441 - Image Reconstruction from Bag-of-Visual-Words [pdf]
Hiroharu Kato, Tatsuya Harada

Abstract: The objective of this paper is image reconstruction from the Bag-of-Visual-Words (BoVW), which is the de facto standard feature for image retrieval and recognition. Despite its wide use, no one has reconstructed an original image from BoVW. This task is challenging for two reasons: 1) BoVW contains quantization errors when local descriptors are assigned to visual words. 2) BoVW lacks the geometric information of local descriptors, since the occurrences of visual words are counted while ignoring their locations. To tackle this difficult task, we use a large-scale image database to estimate the spatial arrangement of local descriptors; this turns the task into a jigsaw puzzle problem with adjacency and global location costs of local descriptors. Solving this optimization problem is also challenging because it is known to be NP-hard. We propose a heuristic but efficient method to optimize it. To underscore the effectiveness of our method, we apply it to BoVWs calculated from about 100 different categories, and demonstrate that our method can, surprisingly, reconstruct original images, although the image features lack spatial information and include quantization errors.
Similar papers:
  • Immediate, scalable object category detection [pdf] - Yusuf Aytar, Andrew Zisserman
  • Packing and Padding: Coupled Multi-index for Accurate Image Retrieval [pdf] - Liang Zheng, Shengjin Wang, Ziqiong Liu, Qi Tian
  • Who Do I Look Like? Determining Parent-Offspring Resemblance via Genetic Features [pdf] - Afshin Dehghan, Enrique Ortiz
  • Object Partitioning using Local Convexity [pdf] - Simon Christoph Stein, Jeremie Papon, Markus Schoeler, Florentin Woergoetter
#1444 - Rigid Motion Segmentation using Randomized Voting [pdf]
Heechul Jung, Jeongwoo Ju, Junmo Kim

Abstract: In this paper, we propose a novel rigid motion segmentation algorithm called randomized voting (RV). This algorithm is based on epipolar geometry, and computes a score using the distance between a feature point and the corresponding epipolar line. This score is accumulated and utilized for final grouping. Our algorithm basically deals with two frames, so it is also applicable to the two-view motion segmentation problem. For evaluation of our algorithm, the Hopkins 155 dataset, which is a representative test set for rigid motion segmentation, is adopted; its sequences contain two or three rigid motions. Our algorithm provides the most accurate motion segmentation results among all of the state-of-the-art algorithms, with an average error rate of 0.77%. In addition, when there is measurement noise, our algorithm is comparable with other state-of-the-art algorithms.
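The score driving the voting is the distance between a feature point and its epipolar line. A minimal symmetric version for a batch of correspondences, assuming a fundamental matrix F has been hypothesized for one motion group:

    import numpy as np

    def epipolar_distances(F, x1, x2):
        # x1, x2: N x 2 matched points in the two frames.
        p1 = np.hstack([x1, np.ones((len(x1), 1))])
        p2 = np.hstack([x2, np.ones((len(x2), 1))])
        l2 = p1 @ F.T                     # epipolar lines in image 2
        l1 = p2 @ F                       # epipolar lines in image 1
        d2 = np.abs((l2 * p2).sum(1)) / np.hypot(l2[:, 0], l2[:, 1])
        d1 = np.abs((l1 * p1).sum(1)) / np.hypot(l1[:, 0], l1[:, 1])
        return d1 + d2                    # small = consistent with this motion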
Similar papers:
  • Local Readjustment for High-Resolution 3D Reconstruction [pdf] - Siyu Zhu, Tian Fang, Jianxiong Xiao, Long Quan
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
  • In Search of Inliers: 3D Correspondence by Local and Global Voting [pdf] - Anders Buch, Yang Yang, Norbert Krüger, Henrik Petersen
#1445 - Salient Region Detection via High-Dimensional Color Transform [pdf]
Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim

Abstract: In this paper, we introduce a novel technique to automatically detect the salient region of an image via a high-dimensional color transform. Our main idea is to represent the saliency map of an image as a linear combination of coordinates in a high-dimensional color space where salient regions and backgrounds can be distinctively separated. This is based on the observation that salient regions often have distinctive colors compared to the background in human perception, but human perception is often complicated and highly nonlinear. By mapping a low-dimensional RGB color to a feature vector in a high-dimensional color space, we show that we can linearly separate the salient regions from the background by finding an optimal linear combination of color coefficients in the high-dimensional color space. Our high-dimensional color space incorporates multiple color representations, including RGB, CIELab, and HSV, along with gamma corrections to enrich its representative power. Our experimental results on three benchmark datasets show that our technique is effective, and it is computationally efficient in comparison to previous state-of-the-art techniques.
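A rough sketch of the feature construction: concatenate several color representations, plus gamma-corrected copies, per pixel; a single weight vector fit on foreground/background seeds then scores saliency linearly. The gamma set below is an assumption for illustration:

    import numpy as np
    import cv2

    def hd_color_features(img_bgr, gammas=(0.5, 2.0)):
        # img_bgr: uint8 BGR image.
        rgb = img_bgr.astype(np.float32) / 255.0
        lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB).astype(np.float32) / 255.0
        hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
        feats = [rgb, lab, hsv] + [rgb ** g for g in gammas]
        stack = np.concatenate(feats, axis=2)
        # One row of stacked color coordinates per pixel; saliency is then
        # rows @ w for a weight vector w fit on seed pixels.
        return stack.reshape(-1, 3 * len(feats))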
Similar papers:
  • Time-Mapping Using Space-Time Saliency [pdf] - Feng Zhou, Sing Bing Kang, Michael Cohen
  • Saliency Optimization from Robust Background Detection [pdf] - Wangjiang Zhu, Shuang Liang, Yichen Wei, Jian Sun
  • Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [pdf] - Eleonora Vig, Michael Dorr, David Cox
  • Learning optimal features for salient object detection [pdf] - Song Lu, Vijay Mahadevan, Nuno Vasconcelos
#1459 - Blind Image Quality Assessment using Semi-supervised Rectifier Networks [pdf]
Huixuan Tang, Neel Joshi, Ashish Kapoor

Abstract: It is often desirable to evaluate the quality of images with a perceptually relevant measure that does not require a reference image. Recent approaches to this problem have used human-provided quality scores with machine learning to learn a measure. The biggest hurdles to these efforts are: 1) the difficulty of generalizing across diverse types of distortions and 2) collecting the enormous amount of human-scored training data that is needed to learn the measure. We present a new blind image quality measure that addresses these difficulties by learning a robust, nonlinear kernel regression function using a rectifier neural network. The method is pre-trained with unlabeled data and fine-tuned with labeled data. It generalizes across a large set of images and distortion types without the need for a large amount of labeled data. We evaluate our approach on two benchmark datasets and show that our method outperforms the current state of the art. Furthermore, we show that our semi-supervised approach is robust to using varying amounts of labeled data.
Similar papers:
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities [pdf] - Hilde Kuehne, Ali Arslan, Thomas Serre
  • The Shape-Time Random Field for Semantic Video Labeling [pdf] - Andrew Kae, Erik Learned-Miller, Benjamin Marlin
  • Max-Margin Boltzmann Machines for Object Segmentation [pdf] - Jimei Yang, Simon Safar, Ming-Hsuan Yang
#1469 - Quality Assessment for Comparing Image Enhancement Algorithms [pdf]
Zhengying Chen, Tingting Jiang, Yonghong Tian

Abstract: As image enhancement algorithms have proliferated in recent years, comparing the performance of different image enhancement algorithms has become a task in its own right. In this paper, we propose a framework for quality assessment that compares image enhancement algorithms. Unlike traditional image quality assessment approaches, we focus on the relative quality ranking between enhanced images rather than giving an absolute quality score for a single enhanced image. We construct a dataset which contains source images in bad visibility and their enhanced images processed by different enhancement algorithms, and then perform subjective assessment in a pair-wise way to get the relative ranking of these enhanced images. A rank function is trained to fit the subjective assessment results, and can be used to predict ranks of new enhanced images, which indicate the relative quality of enhancement algorithms. The experimental results show that our proposed approach statistically outperforms state-of-the-art general-purpose NR-IQA algorithms.
Similar papers:
  • Feature-Independent Action Spotting Without Human Localization, Segmentation or Frame-wise Tracking [pdf] - Chuan Sun, Hassan Foroosh
  • Similarity-Aware Patchwork Assembly for Depth Image Super-Resolution [pdf] - Jing Li, Zhichao Lu, Gang Zeng, Hongbin Zha
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • A Learning-to-Rank Approach for Image Color Enhancement [pdf] - Jianzhou Yan, Stephen Lin, Sing Bing Kang, Xiaoou Tang
#1474 - Tracking on the Product Manifold of Shape and Orientation for Tractography from Diffusion MRI [pdf]
YUANXIANG WANG, Hesamoddin Salehian, Guang Cheng, Baba Vemuri

Abstract: Tractography refers to the process of tracing out the nerve fiber bundles from diffusion Magnetic Resonance Imaging (dMRI) data acquired either in vivo or ex vivo. Tractography is a mature research topic within the field of diffusion MRI analysis; nevertheless, several new methods are proposed on a regular basis, as the problem is not fully solved. Tractography is usually applied to the model (used to represent the diffusion MR signal or a derived quantity) reconstructed from the acquired data. Separating the shape and orientation of these models was previously shown to approximately preserve diffusion anisotropy (a useful bio-marker) in the ubiquitous problem of interpolation. However, no further intrinsic geometric properties of this framework have been exploited to date in the literature. In this paper, we propose a new intrinsic recursive filter on the product manifold of shape and orientation. The recursive filter, dubbed IUKFPro, is a generalization of the unscented Kalman filter (UKF) to this product manifold. The salient contributions of this work are: (1) A new intrinsic UKF for the product manifold of shape and orientation. (2) Derivation of the Riemannian geometry of the product manifold. (3) IUKFPro is tested on synthetic and real data sets from various tractography challenge competitions. From the experimental results, it is evident that IUKFPro performs better than several competing schemes in the literature with regard to some of the err
Similar papers:
  • Domain Adaptation on the Statistical Manifold [pdf] - Mahsa Baktashmotlagh, Mehrtash Harandi, Brian Lovell, Mathieu Salzmann
  • On the quotient representation for the essential manifold [pdf] - Roberto Tron, Kostas Daniilidis
  • Model Transport: Towards Scalable Transfer Learning on Manifolds [pdf] - Oren Freifeld, Soren Hauberg, Michael Black
  • Covariance descriptors for 3D shape matching and retrieval [pdf] - Hedi Tabia, Hamid Laga, David Picard, Philippe-Henri Gosselin
#1483 - Optimizing Average Precision using Weakly Supervised Data [pdf]
Aseem Behl, M. Pawan Kumar, C.V. Jawahar

Abstract: The performance of binary classification tasks, such as action classification and object detection, is often measured in terms of the average precision (AP). Yet it is common practice in computer vision to employ the support vector machine (SVM) classifier, which optimizes a surrogate 0-1 loss. The popularity of SVM can be attributed to its empirical performance. Specifically, in fully supervised settings, SVM tends to provide similar accuracy to the AP-SVM classifier, which directly optimizes an AP-based loss. However, we hypothesize that in the significantly more challenging and practically useful setting of weakly supervised learning, it becomes crucial to optimize the right accuracy measure. In order to test this hypothesis, we propose a novel latent AP-SVM that minimizes a carefully designed upper bound on the AP-based loss function over a weakly supervised dataset. Using publicly available datasets, we demonstrate the advantage of our approach over standard loss-based binary classifiers on two challenging problems: action classification and character recognition.
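For reference, the average precision of a ranking is the mean of the precision values at the rank of each positive example; the AP-based loss is then 1 - AP. A minimal computation, handy for checking any AP-optimizing learner:

    import numpy as np

    def average_precision(scores, labels):
        # scores: classifier outputs; labels: 1 for positive, 0 for negative.
        order = np.argsort(-scores)
        hits = np.asarray(labels)[order] > 0
        precision = np.cumsum(hits) / (np.arange(len(hits)) + 1)
        return precision[hits].mean()     # precision averaged over positives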
Similar papers:
  • Latent Dictionary Learning for Sparse Representation based Classification [pdf] - Meng Yang, Luc Van Gool
  • Efficient Nonlinear Markov Models for Human Motion [pdf] - Andreas Lehrmann, Peter Gehler, Sebastian Nowozin
  • Object Classification with Adaptive Regions [pdf] - Hakan Bilen, Marco Pedersoli, Vinay Namboodiri, Tinne Tuytelaars, Luc Van Gool
  • Tell Me What You See and I will Show You Where It Is [pdf] - Jia Xu, Alexander Schwing, Raquel Urtasun
#1488 - Pyramid-based Visual Tracking Using Sparsity Represented Mean Transform [pdf]
Zhe Zhang, Kin Hong Wong

Abstract: In this paper, we propose a robust method for visual tracking relying on mean shift, sparse coding, and spatial pyramids. First, we extend the original mean shift approach to handle orientation space and scale space, and name this new method the mean transform. The mean transform estimates the motion, including the location, orientation, and scale, of the object window of interest simultaneously and effectively. Second, a pixel-wise dense patch sampling technique and a region-wise trivial template designing scheme are introduced, which enable our approach to run very accurately and efficiently. Additionally, instead of using either a holistic representation or a local representation only, we apply spatial pyramids combining these two representations to deal with partial occlusion robustly. As the experimental results show, our approach outperforms state-of-the-art methods on many benchmark sequences.
Similar papers:
  • Multi-Cue Visual Tracking Using Robust Feature-Level Fusion Based on Joint Sparse Representation [pdf] - Xiangyuan Lan, Pong C YUEN, Andy Jinhua Ma
  • Speeding Up Tracking by Ignoring Features [pdf] - Lu Zhang, Hamdi Dibeklioglu, Laurens van der Maaten
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#1492 - Similarity Comparisons for Interactive Fine-Grained Categorization [pdf]
Catherine Wah, Grant Van Horn, Steven Branson, Subhransu Maji, Pietro Perona, Serge Belongie

Abstract: Current human-in-the-loop fine-grained visual categorization systems depend on a predefined vocabulary of attributes and parts, usually determined by experts. In this work, we move away from that expert-driven and attribute-centric paradigm and present a novel interactive classification system that incorporates computer vision and perceptual similarity metrics in a unified framework. At test time, users are asked to judge relative similarity between a query image and various sets of images; these general queries do not require expert-defined terminology and are applicable to other domains and basic-level categories, enabling a flexible, efficient, and scalable system for fine-grained categorization with humans in the loop. Our system outperforms existing state-of-the-art systems for relevance feedback-based image retrieval as well as interactive classification, resulting in a reduction of up to 43% in the average number of questions needed to correctly classify an image.
Similar papers:
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Error-tolerant Scribbles Based Interactive Image Segmentation [pdf] - Junjie Bai, Xiaodong Wu
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
#1497 - A Cause and Effect analysis of motion trajectories for modeling actions [pdf]
Sanath Narayan, Kalpathi Ramakrishnan

Abstract: An action is typically composed of different parts of the object moving in particular sequences. The presence of different motions (represented as a 1D histogram) has been used in the traditional bag-of-words (BoW) approach for recognizing actions. However, the interactions among the motions also form a crucial part of an action. Different object parts have varying degrees of interaction with the other parts during an action cycle. It is these interactions we want to quantify in order to bring in additional information about the actions. In this paper we propose a causality-based approach for quantifying the interactions to aid action classification. Granger causality is used to compute the cause and effect relationships for pairs of motion trajectories of a video. A 2D histogram descriptor for the video is constructed using these pairwise measures. Our proposed method of obtaining pairwise measures for videos is also applicable to large datasets. We have conducted experiments on challenging action recognition databases such as HMDB51 and UCF50 and shown that our causality descriptor helps in encoding additional information regarding the actions and outperforms state-of-the-art approaches.
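A pairwise Granger measure of this kind can be computed by comparing autoregressive fits of one trajectory's signal with and without the other's past. A sketch with an illustrative lag order; the paper's exact statistic and its binning into the 2D histogram are not reproduced:

    import numpy as np

    def granger_stat(x, y, p=2):
        # How much does the past of x improve a linear prediction of y?
        T = len(y)
        Y = y[p:]
        lag = lambda s: np.column_stack([s[p - k:T - k] for k in range(1, p + 1)])
        def rss(A):
            coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
            r = Y - A @ coef
            return r @ r
        rss_r = rss(lag(y))                         # y's own past only
        rss_f = rss(np.hstack([lag(y), lag(x)]))    # plus x's past
        # F-like statistic (no intercept term; larger = stronger causality)
        return ((rss_r - rss_f) / p) / (rss_f / (T - 3 * p))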
Similar papers:
  • Feature-Independent Action Spotting Without Human Localization, Segmentation or Frame-wise Tracking [pdf] - Chuan Sun, Hassan Foroosh
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Subspace Tracking under Dynamic Dimensionality for Online Background Subtraction [pdf] - Matthew Berger, Lee Seversky
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
#1500 - An Exemplar-based CRF for Multi-instance Object Segmentation [pdf]
Xuming He, Stephen Gould

Abstract: We address the problem of joint detection and segmentation of multiple object instances in an image, a key step towards scene understanding. Inspired by data-driven methods, we propose an exemplar-based approach to the task of instance segmentation, in which a set of reference image/shape masks is used to find multiple objects. We design a novel CRF framework that jointly models object appearance, shape deformation, and object occlusion. To tackle the challenging MAP inference problem, we derive an alternating procedure that interleaves object segmentation and shape/appearance adaptation. We evaluate our method on two datasets with instance labels and show promising results.
Similar papers:
  • Beta Process Multiple Kernel Learning [pdf] - Bingbing Ni, Pierre Moulin
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
  • Superpixel-grounded Deformable Part Models [pdf] - Eduard Trulls, Iasonas Kokkinos, Francesc Moreno-Noguer, Alberto Sanfeliu
  • Beyond Pixel Labels: Image Parsing with Object Instances and Occlusion Ordering [pdf] - Joseph Tighe, Marc Niethammer, Svetlana Lazebnik
#1518 - Generating object segmentation proposals using global and local search [pdf]
Pekka Rantalankila, Juho Kannala, Esa Rahtu

Abstract: We present a method for generating object segmentation proposals from groups of superpixels. The goal is to propose accurate segmentations for all objects of an image. The proposed object hypotheses can be used as input to object detection systems and thereby improve efficiency by replacing exhaustive search. The segmentations are generated in a class-independent manner and therefore the computational cost of the approach is independent of the number of object classes. Our approach combines both global and local search in the space of sets of superpixels. The local search is implemented by greedily merging adjacent pairs of superpixels to build a bottom-up segmentation hierarchy. The regions from such a hierarchy directly provide a part of our region proposals. The global search provides the other part by performing a set of graph cut segmentations on a superpixel graph obtained from an intermediate level of the hierarchy. The parameters of the graph cut problems are learnt in such a manner that they provide complementary sets of regions. Experiments with Pascal VOC images show that we reach state-of-the-art performance with greatly reduced computational cost.
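The local-search half can be pictured as a best-first merge loop over the superpixel graph. A sketch assuming integer region ids, a symmetric adjacency dict, and additive region features such as color histograms; the learned, complementary similarities of the paper are replaced by a single callable:

    import heapq, itertools

    def greedy_merge(adjacency, features, sim):
        # adjacency: id -> set of neighbour ids; features: id -> histogram
        # (merged by addition); sim: similarity of two histograms.
        nbrs = {r: set(n) for r, n in adjacency.items()}
        alive, merges = set(nbrs), []
        heap = [(-sim(features[a], features[b]), a, b)
                for a in nbrs for b in nbrs[a] if a < b]
        heapq.heapify(heap)
        fresh = itertools.count(max(alive) + 1)
        while heap and len(alive) > 1:
            _, a, b = heapq.heappop(heap)
            if a not in alive or b not in alive:
                continue                  # stale pair: one side already merged
            c = next(fresh)               # new region from merging a and b
            features[c] = features[a] + features[b]
            nbrs[c] = (nbrs[a] | nbrs[b]) - {a, b}
            alive -= {a, b}; alive.add(c)
            merges.append((a, b, c))
            for n in nbrs[c]:
                nbrs[n] -= {a, b}; nbrs[n].add(c)
                heapq.heappush(heap, (-sim(features[c], features[n]), n, c))
        return merges                     # the bottom-up hierarchy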
Similar papers:
  • An Exemplar-based CRF for Multi-instance Object Segmentation [pdf] - Xuming He, Stephen Gould
  • Fast, Approximate Piecewise-Planar Modeling Based on Sparse Structure-from-Motion and Dense Superpixels [pdf] - Andras Bodis-Szomoru, Hayko Riemenschneider, Luc Van Gool
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
  • Superpixel-grounded Deformable Part Models [pdf] - Eduard Trulls, Iasonas Kokkinos, Francesc Moreno-Noguer, Alberto Sanfeliu
#1530 - Learning Important Spatial Pooling Regions for Scene Classification [pdf]
DI LIN, Cewu Lu, Renjie Liao, Jiaya Jia

Abstract: We address the false response influence problem when learning and applying discriminative parts to construct the mid-level representation in scene classification. False responses are often caused by the complexity of latent image structure when part filters are convolved with input images. This problem makes the mid-level representation, even after pooling, not distinct enough to classify input data correctly into scene categories. Our solution is to learn important spatial pooling regions along with their appearance. Our experiments show that this new framework significantly suppresses false responses and produces good results on several datasets, including MIT-Indoor, 15-Scene, and UIUC 8-Sport. When combined with global image features, we achieve state-of-the-art performance.
Similar papers:
  • Learning Receptive Fields for Pooling from Tensors of Feature Response [pdf] - Can Xu, Nuno Vasconcelos
  • Orientational Pyramid Matching for Recognizing Indoor Scenes [pdf] - Lingxi Xie, Jingdong Wang, Bo Zhang, Qi Tian
  • Generalized Max Pooling [pdf] - Naila Murray, Florent Perronnin
  • Ask the image: supervised pooling to preserve feature locality [pdf] - Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta, Francesca Odone
#1531 - Multipoint Filtering with Local Polynomial Approximation and Range Guidance [pdf]
Xiao Tan, Changming Sun, Tuan Pham

Abstract: This paper presents a novel method for performing guided image filtering using multipoint local polynomial approximation (LPA) with range guidance. In our method, the LPA is extended from a pointwise model into a multipoint model for reliable filtering and better preservation of image gradients, which usually contain the essential information in the image to be filtered. In addition, we develop a scheme for generating a spatially adaptive support region around each point in constant time, invariant to the size of the region. By using a hybrid of the local polynomial model and color/intensity-based range guidance, the proposed method not only preserves edges but also does a much better job of preserving gradients than existing popular filtering methods. Our method proves effective in a number of applications: depth image upsampling, joint image de-noising, detail enhancement, and image abstraction. Experimental results show that our method provides better results than state-of-the-art methods while also being computationally efficient.
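For context, the widely used guided filter of He et al. is the degree-one special case of such local polynomial models: each output window is an affine function of the guide. A compact single-channel version with illustrative defaults:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def guided_filter(I, p, r=8, eps=1e-3):
        # I: guide image, p: image to be filtered (float 2-D arrays).
        win = 2 * r + 1
        mI, mp = uniform_filter(I, win), uniform_filter(p, win)
        varI = uniform_filter(I * I, win) - mI * mI
        covIp = uniform_filter(I * p, win) - mI * mp
        a = covIp / (varI + eps)          # local linear (degree-1) coefficients
        b = mp - a * mI
        return uniform_filter(a, win) * I + uniform_filter(b, win)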
Similar papers:
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • CID: Combined Image Denoising in Spatial and Frequency Domains Using Web Images [pdf] - Huanjing Yue, Xiaoyan Sun, Jingyu Yang, Feng Wu
  • Similarity-Aware Patchwork Assembly for Depth Image Super-Resolution [pdf] - Jing Li, Zhichao Lu, Gang Zeng, Hongbin Zha
  • Cross-Scale Cost Aggregation for Stereo Matching [pdf] - Kang Zhang, Yuqiang Fang, Dongbo Min, Lifeng Sun, Shiqiang Yang, Shuicheng Yan, Qi Tian
#1532 - Real-time Model-based Articulated Object Pose Detection and Tracking with Variable Rigidity Constraints [pdf]
Karl Pauwels, Leonardo Rubio, Eduardo Ros

Abstract: A novel model-based approach is introduced for real-time detection and tracking of the pose of general articulated objects. A variety of dense motion and depth cues are integrated into a novel articulated Iterative Closest Point (ICP) approach. The proposed method can independently track the six-degrees-of-freedom pose of over a hundred rigid parts in real time while, at the same time, imposing articulation constraints on the relative motion of different parts. We propose a novel rigidization framework for optimally handling unobservable parts during tracking. This involves rigidly attaching the minimal number of unseen parts to the rest of the structure in order to most effectively use the currently available knowledge. We show how this framework can also be used for detection rather than tracking, which allows for automatic system initialization or for incorporating pose estimates obtained from independent object part detectors. Improved performance over alternative solutions is shown on real-world sequences.
Similar papers:
  • Real-time Simultaneous Pose and Shape Estimation for Articulated Objects with a Single Depth Camera [pdf] - Mao Ye, Ruigang Yang
  • Better Feature Tracking Through Subspace Constraints [pdf] - Bryan Poling, Gilad Lerman, Arthur Szlam
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
#1538 - Separable Kernel for Image Deblurring [pdf]
Lu Fang, Haifeng Liu, Feng Wu

Abstract: In this paper, we approach the image deblurring problem from a completely new perspective by proposing a separable kernel to represent the inherent properties of the camera and scene system. Specifically, we decompose a blur kernel into three individual descriptors (trajectory, intensity, and point spread function) so that they can be optimized separately. To demonstrate the advantages, we extract one-pixel-width trajectories of blur kernels and propose a random perturbation algorithm to optimize them while still keeping their continuity. In many cases where current deblurring approaches fall into local minima, excellent deblurred results and correct blur kernels can be obtained by individually optimizing the kernel trajectories. Our work strongly suggests that more constraints and priors should be introduced to blur kernels in solving the deblurring problem, because blur kernels have lower dimensions than images.
Similar papers:
  • Discriminative Blur Detection Features [pdf] - Jianping Shi, Li Xu, Jiaya Jia
  • Total Variation Blind Deconvolution: The Devil is in the Details [pdf] - Daniele Perrone, Paolo Favaro
  • Joint Depth Estimation and Camera Shake Removal from Single Blurry Image [pdf] - Zhe Hu, Li Xu, Ming-Hsuan Yang
  • Blind Multi-Image Restoration [pdf] - Haichao Zhang
#1546 - Discrete-Continuous Depth Estimation from a Single Image [pdf]
Miaomiao Liu, Mathieu Salzmann, Xuming He

Abstract: In this paper, we tackle the problem of estimating the depth of a scene from a single image. This is a challenging task, since a single image on its own does not provide any depth cue. To address this, we exploit the availability of a pool of images for which the depth is known. More specifically, we formulate monocular depth estimation as a discrete-continuous optimization problem, where the continuous variables encode the depth of the superpixels in the input image, and the discrete ones represent relationships between neighboring superpixels. The solution to this discrete-continuous optimization problem is then obtained by performing inference in a graphical model using particle belief propagation. The unary potentials in this graphical model are computed by making use of the images with known depth. We demonstrate the effectiveness of our model in both indoor and outdoor scenarios. Our experimental evaluation shows that our depth estimates are more accurate than those of existing methods on standard datasets.
Similar papers:
  • Generating object segmentation proposals using global and local search [pdf] - Pekka Rantalankila, Juho Kannala, Esa Rahtu
  • Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf] - Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler
  • An Exemplar-based CRF for Multi-instance Object Segmentation [pdf] - Xuming He, Stephen Gould
  • Fast, Approximate Piecewise-Planar Modeling Based on Sparse Structure-from-Motion and Dense Superpixels [pdf] - Andras Bodis-Szomoru, Hayko Riemenschneider, Luc Van Gool
#1548 - A Learning-to-Rank Approach for Image Color Enhancement [pdf]
Jianzhou Yan, Stephen Lin, Sing Bing Kang, Xiaoou Tang

Abstract: We present a machine-learned ranking approach for automatically enhancing the color of a photograph. Unlike previous techniques that train on pairs of images before and after adjustment by a human user, our method takes into account the intermediate steps taken in the enhancement process, which provide detailed information on the person's color preferences. To make use of this data, we formulate the color enhancement task as a learning-to-rank problem in which ordered pairs of images are used for training, and then various color enhancements of a novel input image can be evaluated from their corresponding rank values. From the parallels between the decision tree structures we use for ranking and the decisions made by a human during the editing process, we posit that breaking a full enhancement sequence into individual steps can facilitate training. Our experiments show that this approach compares well to existing methods for automatic color enhancement.
Similar papers:
  • Similarity Comparisons for Interactive Fine-Grained Categorization [pdf] - Catherine Wah, Grant Van Horn, Steven Branson, Subhransu Maji, Pietro Perona, Serge Belongie
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
  • Depth Enhancement via Low-rank Matrix Completion [pdf] - Si Lu, Xiaofeng Ren, Feng Liu
  • Quality Assessment for Comparing Image Enhancement Algorithms [pdf] - Zhengying Chen, Tingting Jiang, Yonghong Tian
#1552 - Gait Recognition under Speed Transition [pdf]
Al Mansur, Rasyid Aqmar, Yasushi Makihara, Yasushi Yagi

Abstract: This paper describes a method of gait recognition from accelerated or decelerated gait image sequences. As a speed change occurs due to a change of pitch (the first-order derivative of a phase, namely, a gait stance) and/or stride, we model this speed change using a cylindrical manifold whose azimuth and height correspond to the phase and the stride, respectively. A radial basis function (RBF) interpolation framework is used to learn subject-specific mapping matrices for mapping from the manifold to image space. Given an input gait image sequence with speed transitions from a test subject, we estimate the mapping matrix of the test subject as well as the phase and stride sequence using an energy minimization framework. The following three points are considered: (1) fitness of the synthesized images to the input image sequence as well as to an eigenspace constructed by exemplars of training subjects; (2) smoothness of the phase and the stride sequence; and (3) pitch and stride fitness to the pitch-stride preference model. Using the estimated mapping matrix, we synthesize a constant-speed gait image sequence, and extract a conventional period-based gait feature from it for matching. We conducted experiments using real gait image sequences with speed transitions from 179 subjects and demonstrated the effectiveness of the proposed method.
Similar papers:
  • Dual Linear Regression Based Classification for Face Cluster Recognition [pdf] - Liang Chen
  • Multilabel Ranking with Inconsistent Rankers [pdf] - Xin Geng, Longrun Luo
  • Efficient feature extraction, encoding and classification for action recognition [pdf] - Vadim Kantorov, Ivan Laptev
  • Human Body Shape Estimation Using a Multi-Resolution Manifold Forest [pdf] - Frank Perbet, Sam Johnson, Minh-Tri Pham, Bjrn Stenger
#1553 - Towards Unified Human Parsing and Pose Estimation [pdf]
Jian Dong, Qiang Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan

Abstract: We study the problem of human body configuration analysis, more specifically, human parsing and human pose estimation. These two tasks, i.e., identifying the semantic regions and body joints respectively over the human body image, are intrinsically highly correlated. However, previous works generally solve these two problems separately or iteratively. In this work, we propose a unified framework for simultaneous human parsing and pose estimation based on semantic parts. By utilizing Parselets~\cite{ICCV_2013_Parselet} and Mixture of Joint-Group Templates (MJGT) as the representations for these semantic parts, we seamlessly formulate the human parsing and pose estimation problem jointly within a unified framework via a tailored And-Or graph. A novel Grid Layout Feature is then designed to effectively capture the spatial co-occurrence/occlusion information between/within the Parselets and MJGTs. Thus the mutually complementary nature of these two tasks can be harnessed to boost the performance of each other. The resultant unified model can be solved using the structure learning framework in a principled way. Comprehensive evaluations on two benchmark datasets for both human parsing and pose estimation tasks demonstrate the effectiveness of the proposed framework when compared with the state-of-the-art methods.
Similar papers:
  • An Exemplar-based CRF for Multi-instance Object Segmentation [pdf] - Xuming He, Stephen Gould
  • Clothing Co-Parsing by Joint Image Segmentation and Labeling [pdf] - Wei Yang, Liang Lin, Ping Luo
  • Beyond Pixel Labels: Image Parsing with Object Instances and Occlusion Ordering [pdf] - Joseph Tighe, Marc Niethammer, Svetlana Lazebnik
  • Efficient Structured Parsing of Facades Using Dynamic Programming [pdf] - Andrea Cohen, Alexander Schwing, Marc Pollefeys
#1554 - Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [pdf]
Eleonora Vig, Michael Dorr, David Cox

Abstract: Saliency prediction typically relies on hand-crafted (multiscale) features that are combined in different ways to form a "master" saliency map, which encodes local image conspicuity. Recent improvements to the state of the art on standard benchmarks such as MIT1003 have been achieved mostly by incrementally adding more and more hand-tuned features (such as car or face detectors) to existing models. In contrast, we here follow an entirely automatic data-driven approach that performs a large scale search for optimal features. We identify those instances of a richly-parameterized bio-inspired model family (hierarchical neuromorphic networks) that successfully predict image saliency. Because of the high dimensionality of this parameter space, we use automated hyperparameter optimization to efficiently guide the search. The optimal blend of such multilayer features combined with a simple linear classifier achieves excellent performance on several image saliency benchmarks. Models outperform the state of the art on MIT1003, on which features and classifiers are learned. Without additional training, these models generalize well to two other image saliency data sets, Toronto and NUSEF, despite their different image content. Finally, our algorithm scores best of all the 19 models evaluated to date on the MIT300 saliency challenge, which uses a hidden test set to facilitate an unbiased comparison.
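
The search itself is generic hyperparameter optimization: sample a model configuration, extract features, score them with a linear classifier, keep the best. The sketch below uses plain random search over a toy stand-in model family; the paper uses a more sophisticated automated optimizer, and every parameter name here is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))                 # stand-in image features
y = (X[:, :3].sum(axis=1) > 0).astype(int)     # stand-in fixated / not-fixated labels

def sample_params():
    # Toy stand-in for the network hyperparameters searched in the paper.
    return {"n_filters": int(rng.choice([8, 16, 32])),
            "nonlinearity": rng.choice(["relu", "abs"]),
            "C": float(10.0 ** rng.uniform(-3, 2))}

def extract(X, p, seed=0):
    # Placeholder "model family": a random filter bank plus a nonlinearity.
    proj = np.random.default_rng(seed).normal(size=(X.shape[1], p["n_filters"]))
    H = X @ proj
    return np.maximum(H, 0) if p["nonlinearity"] == "relu" else np.abs(H)

best_params, best_score = None, -np.inf
for _ in range(20):                            # budgeted random search
    p = sample_params()
    clf = LogisticRegression(C=p["C"], max_iter=1000)
    score = cross_val_score(clf, extract(X, p), y, cv=3).mean()
    if score > best_score:
        best_params, best_score = p, score
print(best_params, round(best_score, 3))
```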
Similar papers:
  • Salient Region Detection via High-Dimensional Color Transform [pdf] - Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
  • Time-Mapping Using Space-Time Saliency [pdf] - Feng Zhou, Sing Bing Kang, Michael Cohen
  • A Reverse Hierarchy Model for Predicting Eye Fixations [pdf] - Tianlin Shi, Xiaolin Hu, Ming Liang
  • Learning optimal features for salient object detection [pdf] - Song Lu, Vijay Mahadevan, Nuno Vasconcelos
#1556 - NMF-KNN: Image Annotation using Weighted Multi-view Non-Negative Matrix Factorization [pdf]
Mahdi Kalayeh, Haroon Idrees, Mubarak Shah

Abstract: Real-world image databases such as Flickr are characterized by the continuous addition of new images. Recent approaches to image annotation - the problem of assigning tags to images - have two major drawbacks. First, either models are learned using the entire training data, or, to handle the issue of dataset imbalance, tag-specific discriminative models are trained. Such models become obsolete and require relearning when new images and tags are added to the database. Second, the task of feature fusion is typically dealt with using ad hoc approaches. In this paper, we present a weighted extension of Multi-view Non-Negative Matrix Factorization (NMF) to address the aforementioned drawbacks. The key idea is to learn a query-specific generative model on the features of nearest neighbors using the proposed NMF-KNN, which imposes a consensus constraint on the coefficient matrices across different features. This forces the coefficient vectors across features to be consistent and thus naturally solves the problem of feature fusion, while the weight matrices introduced in the proposed formulation alleviate the issue of dataset imbalance. Furthermore, our approach, being query-specific, is agnostic to the addition of images and tags to the database. We tested our method on two datasets used for evaluation of image annotation and obtained competitive results.
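
To make the factorization step concrete, here is a heavily simplified Python sketch of multi-view NMF in which one coefficient matrix H is shared exactly across views; the paper instead imposes a soft consensus constraint across per-view coefficient matrices plus learned weight matrices, both of which are omitted here.

```python
import numpy as np

def multiview_nmf(views, k=5, iters=200, eps=1e-9):
    """Multi-view NMF with a single coefficient matrix H shared by all views.
    (A simplification: NMF-KNN uses a soft consensus across per-view H's
    plus weight matrices; those are omitted in this sketch.)"""
    rng = np.random.default_rng(0)
    n = views[0].shape[1]
    H = rng.random((k, n))
    Ws = [rng.random((X.shape[0], k)) for X in views]
    for _ in range(iters):
        for W, X in zip(Ws, views):               # per-view basis update
            W *= (X @ H.T) / (W @ H @ H.T + eps)  # multiplicative rule
        num = sum(W.T @ X for W, X in zip(Ws, views))
        den = sum(W.T @ W @ H for W in Ws)
        H *= num / (den + eps)                    # shared coefficient update
    return Ws, H

# Two toy "views" (e.g. color and texture features) of the same 50 items.
rng = np.random.default_rng(1)
views = [np.abs(rng.normal(size=(30, 50))), np.abs(rng.normal(size=(20, 50)))]
Ws, H = multiview_nmf(views)
```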
Similar papers:
  • Multi-modal Learning in Loosely-organized Web Images [pdf] - Kun Duan, David Crandall, Dhruv Batra
  • Robust Refinement of GPS-Tags Using Random Walks with an Adaptive Damping Factor [pdf] - Amir Roshan Zamir
  • Tell Me What You See and I will Show You Where It Is [pdf] - Jia Xu, Alexander Schwing, Raquel Urtasun
  • Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media [pdf] - Zhenxing Niu, Gang Hua, Xinbo Gao, Qi Tian
#1558 - A Depth-Aware Descriptor for Action Recognition [pdf]
Cewu Lu, Jiaya Jia, Chi-keung Tang

Abstract: We develop a binary action descriptor that is depth-aware and thus achieves, for the same action type, good invariance under varying time, scale, viewpoint, rotation and background. It is robust to occlusion and data corruption as well. The descriptor runs very fast thanks to its binary features. Working together with a standard learning algorithm, the proposed descriptor achieves state-of-the-art or even better performance on benchmark datasets in our extensive experimental validation, with impressive runtime performance.
Similar papers:
  • From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity [pdf] - Nam Vo, Aaron Bobick
  • Actionness Ranking with Lattice Conditional Ordinal Random Fields [pdf] - Wei Chen, Caiming Xiong, Jason Corso
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Hervé Jégou, Patrick Bouthemy, Cees Snoek
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
#1565 - Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition [pdf]
Waqas Sultani, Imran Saleemi

Abstract: This paper attempts to address the problem of recognizing human actions while training and testing on distinct datasets, when test videos are neither labeled nor available during training. In this scenario, learning of a joint vocabulary or domain transfer techniques are not applicable. In attempting the problem at hand, we explore the reasons for poor classifier performance when tested on novel datasets, and quantify the effect of scene backgrounds on action representations and recognition. We perform different types of partitioning of the gist feature space for several datasets and compute measures of background scene complexity, as well as the extent to which scenes are helpful in action classification. We then propose a new process to obtain a measure of confidence in each pixel of the video being a foreground region, using motion, appearance, and saliency together in a 3D MRF based framework. We also propose multiple ways to exploit the foreground confidence: to improve the bag-of-words vocabulary, the histogram representation of a video, and a novel histogram decomposition based representation and kernel. We have performed extensive experiments on several datasets, showing improved recognition accuracy compared to baseline methods, especially when training and testing across datasets.
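
The confidence-weighted histogram idea can be illustrated in a few lines: each local descriptor votes into the codebook, but its vote is scaled by the foreground confidence at its location. A minimal sketch follows (hard vector quantization, synthetic data; the paper additionally decomposes the histogram by confidence level).

```python
import numpy as np

def fg_weighted_bow(descriptors, fg_conf, vocab):
    """Bag-of-words histogram where each local feature's vote is scaled by
    the foreground confidence at its location (a sketch of the idea only)."""
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)                       # hard vector quantization
    hist = np.zeros(len(vocab))
    np.add.at(hist, words, fg_conf)            # confidence-weighted votes
    return hist / (hist.sum() + 1e-9)

rng = np.random.default_rng(0)
desc = rng.normal(size=(500, 32))              # e.g. local motion descriptors
conf = rng.uniform(size=500)                   # per-feature foreground confidence
vocab = rng.normal(size=(100, 32))             # learned codebook
h = fg_weighted_bow(desc, conf, vocab)
```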
Similar papers:
  • Actionness Ranking with Lattice Conditional Ordinal Random Fields [pdf] - Wei Chen, Caiming Xiong, Jason Corso
  • Object-based Multiple Foreground Video Co-segmentation [pdf] - Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Hervé Jégou, Patrick Bouthemy, Cees Snoek
#1566 - Error-tolerant Scribbles Based Interactive Image Segmentation [pdf]
Junjie Bai, Xiaodong Wu

Abstract: Scribbles in scribble-based interactive segmentation such as graph-cut are usually assumed to be perfectly accurate, i.e., foreground scribble pixels will never be segmented as background in the final segmentation. However, it can be hard to draw perfectly accurate scribbles, especially on fine structures of the image or on mobile touch-screen devices. In this paper, we propose a novel ratio energy function that tolerates errors in the user input while encouraging maximum use of the user input information. More specifically, the ratio energy aims to minimize the graph-cut energy while maximizing the user input respected in the segmentation. The ratio energy function can be exactly optimized using an efficient iterated graph cut algorithm. The robustness of the proposed method is validated on the GrabCut dataset using both synthetic scribbles and manual scribbles. The experimental results show that the proposed algorithm is robust to the errors in the user input and preserves the "anchoring" capability of the user input.
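
Ratio energies of this kind are classically minimized by Dinkelbach-style iterations, each of which solves a parametric subproblem min_x E_num(x) - lambda * E_den(x); in the paper's setting that subproblem would be a graph cut. The sketch below shows only the outer iteration, on a four-pixel toy example with a brute-force inner solver standing in for the cut (all costs are made up).

```python
import numpy as np
from itertools import product

def dinkelbach(E_num, E_den, labelings, tol=1e-9):
    """Minimize E_num(x)/E_den(x) by repeatedly solving the parametric
    problem min_x E_num(x) - lam * E_den(x).  The inner solver here is
    brute force over all labelings; a real implementation would use an
    s-t graph cut."""
    lam = 0.0
    while True:
        x = min(labelings, key=lambda x: E_num(x) - lam * E_den(x))
        new_lam = E_num(x) / E_den(x)
        if abs(new_lam - lam) < tol:
            return x, new_lam
        lam = new_lam

# Tiny example: segmentation cost vs. amount of user scribble respected.
unary = np.array([0.2, 0.9, 0.4, 0.1])         # cost of labeling each pixel fg
scribble = np.array([1, 1, 0, 0])              # user marked pixels 0,1 as fg
E_num = lambda x: float(np.where(np.array(x) == 1, unary, 1 - unary).sum())
E_den = lambda x: float((np.array(x) * scribble).sum() + 1)  # +1 avoids /0
labelings = list(product([0, 1], repeat=4))
x, ratio = dinkelbach(E_num, E_den, labelings)
```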
Similar papers:
  • Object-based Multiple Foreground Video Co-segmentation [pdf] - Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
  • Semantic Object Selection [pdf] - Ejaz Ahmed, Scott Cohen, Brian Price
  • MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation [pdf] - Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Zhuowen Tu
  • Similarity Comparisons for Interactive Fine-Grained Categorization [pdf] - Catherine Wah, Grant Van Horn, Steven Branson, Subhransu Maji, Pietro Perona, Serge Belongie
#1567 - Efficient Squared Curvature [pdf]
Claudia Nieuwenhuis, Eno Toeppe, Lena Gorelick, Olga Veksler, Yuri Boykov

Abstract: Curvature has received increased attention as an important alternative to length-based regularization in computer vision. In contrast to length, it preserves elongated structures and fine details. Existing approaches are either inefficient, or have low angular resolution and yield results with strong block artifacts. We derive a new model for computing squared curvature based on integral geometry. The model counts responses of straight-line triple cliques. The corresponding energy decomposes into submodular and supermodular pairwise potentials. We show that this energy can be efficiently minimized even for high angular resolutions using the trust region framework. Our results confirm that we obtain accurate and visually pleasing solutions without strong artifacts at reasonable runtimes.
Similar papers:
  • Fast Approximate Inference in Higher Order MRF-MAP Labeling Problems [pdf] - Chetan Arora, S.N. Maheshwari, Subhashis Banerjee, Prem Kalra
  • Multi Label Generic Cuts: Optimal Inference in Multi Label Multi Clique MRF-MAP Problems [pdf] - Chetan Arora, S.N. Maheshwari
  • Is Rotation a Nuisance in Shape Recognition? [pdf] - Qiuhong Ke, Yi Li
  • Noising versus Smoothing for Vertex Identification in Unknown Shapes [pdf] - Konstantinos Raftopoulos, Marin Ferecatu
#1570 - Backscatter Compensated Photometric Stereo with 3 Sources [pdf]
Chourmouzios Tsiotsios, Maria Angelopoulou, Tae-Kyun Kim, Andrew Davison

Abstract: Photometric stereo offers the possibility of object shape reconstruction via reasoning about the amount of light reflected from oriented surfaces. However, in murky media such as sea water, the illuminating light interacts with the medium and some of it is backscattered towards the camera. Due to this additive light component, the standard Photometric Stereo equations lead to poor quality shape estimation. Previous authors have attempted to reformulate the approach but have either neglected backscatter entirely or disregarded its non-uniformity on the sensor when camera and lights are close to each other. We show that effective compensation for the backscatter component permits a linear formulation of Photometric Stereo which recovers an accurate normal map using only 3 lights. Our backscatter compensation method for point sources can be used to estimate the uneven backscatter directly from single images without any prior knowledge about the characteristics of the medium or the scene. We support our method by comparing with previous approaches through extensive experimental results, where a variety of objects are imaged in a large water tank whose turbidity is systematically increased, and show reconstruction quality that degrades little relative to clean-water results even at a very significant scattering level.
Similar papers:
  • Aliasing Detection and Reduction in Plenoptic Imaging [pdf] - Zhaolin Xiao, Qing Wang, Jingyi Yu, Guoqing Zhou
  • Deblurring Low-light Images with Light Streaks [pdf] - Zhe Hu, Sunghyun Cho, Jue Wang, Ming-Hsuan Yang
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
#1588 - Discriminative Sparse Inverse Covariance Matrix: Application in Brain Functional Network Classification [pdf]
Luping Zhou, Lei Wang, Philip Ogunbona

Abstract: Recent studies show that mental disorders change the functional organization of the brain, which can be investigated via various imaging techniques. Analyzing such changes is becoming critical as it could provide new biomarkers for diagnosing and monitoring the progression of the diseases. Functional connectivity analysis studies the covarying activity of neuronal populations in different brain regions. Sparse inverse covariance estimation (SICE), also known as graphical LASSO, is one of the most important tools for functional connectivity analysis; it estimates the interregional partial correlations of the brain. Although increasingly used for predicting mental disorders, SICE is basically a generative method that may not necessarily perform well at classifying neuroimaging data. In this paper, we propose a learning framework to effectively improve the discriminative power of SICEs by taking advantage of the samples in the opposite class. We formulate our objective as convex optimization problems for both one-class and two-class classification. By analyzing these optimization problems, we not only solve them efficiently in their dual form, but also gain insights into this new learning framework. The proposed framework is applied to analyzing the brain metabolic covariant networks built upon FDG-PET images for the prediction of Alzheimer's disease, and shows significant improvement in classification performance for both one-class and two-class scenarios.
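
The generative starting point, SICE/graphical LASSO, is available off the shelf; the sketch below fits it to synthetic region-wise signals using scikit-learn's GraphicalLasso. The toy data and the alpha value are illustrative, and the paper's discriminative extension is not reproduced here.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Toy stand-in for per-subject region-wise brain signals (n_samples x n_regions).
rng = np.random.default_rng(0)
signals = rng.normal(size=(120, 15))

model = GraphicalLasso(alpha=0.1).fit(signals)   # sparse inverse covariance
precision = model.precision_                     # partial correlations live here

# Nonzero off-diagonal entries define the estimated functional connections.
edges = np.argwhere(np.triu(np.abs(precision) > 1e-4, k=1))
print(len(edges), "connections")
```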
Similar papers:
  • Learning-Based Atlas Selection for Multiple-Atlas Segmentation [pdf] - Gerard Sanroma, Guorong Wu, Yaozong Gao, Dinggang Shen
  • Reconstructing Evolving Tree Structures in Time Lapse Sequences [pdf] - Przemysław Głowacki, Miguel Pinheiro, Raphael Sznitman, Engin Türetken, Daniel Lebrecht, Anthony Holtmaat, Jan Kybic, Pascal Fua
  • Joint Unsupervised Multi-Class Image Segmentation [pdf] - Fan Wang, Qixing Huang, Maks Ovsjanikov, Leonidas J. Guibas
  • Iterative Multilevel MRF Leveraging Context and Voxel Information for Brain Tumour Segmentation in MRI [pdf] - Nagesh Subbanna, Doina Precup, Tal Arbel
#1594 - FAST LABEL: Easy and Efficient Optimization of Joint Multi-Label and Estimation Problems [pdf]
Byung-Woo Hong, Ganesh Sundaramoorthi

Abstract: In this paper, we derive an easy-to-implement and efficient algorithm for solving multi-label image partitioning problems (specifically for the problem addressed by Region Competition) where it is desired to jointly determine a parameter for each of the regions defined by the partition. Given an estimate of the parameters, a fast approximate solution to the multi-label sub-problem is derived by a global update using simple smoothing and thresholding steps. The method is empirically validated to be robust to fine details of the image that plague local solutions. Further, in comparison to global methods for the multi-label problem, our method is more efficient and it is easy for a non-specialist to implement. Indeed, we give sample Matlab code for the multi-label Chan-Vese problem in this paper! We perform experiments to compare the proposed method to the state-of-the-art in multi-label solutions to Region Competition and show our method achieves equal or better accuracy, with the advantage being speed and ease of implementation.
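
In the same spirit as the paper's smoothing-and-thresholding update, here is one illustrative Python step for a piecewise-constant (Chan-Vese-like) multi-label model: compute a data term per label, smooth it, let every pixel take the cheapest label, then refit the region parameters. The exact quantity being smoothed and all constants are assumptions that differ from the authors' formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fast_label_step(image, means, sigma=2.0):
    """One smoothing-and-thresholding update (illustrative only).
    Each label's data term is smoothed, then every pixel takes the
    cheapest label, and the region means are refit."""
    costs = np.stack([(image - m) ** 2 for m in means])      # data terms
    smoothed = np.stack([gaussian_filter(c, sigma) for c in costs])
    labels = smoothed.argmin(0)                              # threshold step
    new_means = [image[labels == k].mean() if (labels == k).any() else m
                 for k, m in enumerate(means)]               # region parameters
    return labels, new_means

rng = np.random.default_rng(0)
img = np.kron(rng.integers(0, 3, (4, 4)).astype(float), np.ones((16, 16)))
img += 0.1 * rng.normal(size=img.shape)                      # noisy 3-region image
labels, means = fast_label_step(img, means=[0.0, 1.0, 2.0])
```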
Similar papers:
  • Discriminative Sparse Inverse Covariance Matrix: Application in Brain Functional Network Classification [pdf] - Luping Zhou, Lei Wang, Philip Ogunbona
  • A Convex Relaxation of Ambrosio-Tortorelli's Elliptic Functional for the Mumford-Shah Functional [pdf] - Youngwook Kee, Junmo Kim
  • Pseudoconvex Proximal Splitting for $L_\infty$ Problems in Multiview Geometry [pdf] - Anders Eriksson
  • Sequential Convex Relaxation for Mutual-Information-Based Unsupervised Figure-Ground Segmentation [pdf] - Youngwook Kee, Mohamed Souiai, Daniel Cremers, Junmo Kim
#1600 - Dense Semantic Image Segmentation with Objects and Attributes [pdf]
Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr

Abstract: The concepts of objects and attributes are both important for precisely describing images, since verbal descriptions often contain both adjectives and nouns (e.g. 'I see a shiny red wall'). In this paper, we formulate the problem of joint visual attribute and object class image segmentation as a dense multi-labeling problem, where each pixel in an image can be associated with both an object class and a set of visual attribute labels. In order to learn the label correlations, we adopt a boosting-based piecewise training approach with respect to the visual appearance and co-occurrence cues. We use a filtering-based mean-field approximation approach for efficient joint inference. Further, we develop a hierarchical model to incorporate region-level object and attribute information. Experiments on the aPascal, CORE and attribute-augmented NYU indoor scenes datasets show that the proposed approach is able to achieve state-of-the-art results.
Similar papers:
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Inferring Analogous Attributes [pdf] - Chao-Yeh Chen, Kristen Grauman
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
#1603 - Learning Non-Linear Reconstruction Models for Image Set Classification [pdf]
Munawar Hayat, Mohammed Bennamoun, Senjian An

Abstract: We propose a deep learning framework for image set classification with application to face recognition. An Adaptive Deep Network Template (ADNT) is defined whose parameters are initialized by performing unsupervised pre-training in a layer-wise fashion using Gaussian Restricted Boltzmann Machines (GRBMs). The pre-initialized ADNT is then trained separately for the images of each class, and class-specific models are learnt. Based on the minimum reconstruction error from the learnt class-specific models, a majority voting strategy is used for classification. The proposed framework is extensively evaluated for the task of image set classification based face recognition on Honda/UCSD, CMU Mobo, YouTube Celebrities and a Kinect dataset. Our experimental results show that the proposed method achieves the best performance on all datasets, with a 9% relative increase in performance over the existing state of the art on the challenging YouTube Celebrities dataset.
Similar papers:
  • Discriminative Deep Metric Learning for Face Verification in the Wild [pdf] - Junlin Hu, Jiwen Lu, Yap-Peng Tan
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
  • Exploiting Shading Cues in Kinect IR Images for Geometry Refinement [pdf] - Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon
#1606 - Depth and Skeleton Associated Action Recognition without Online Accessible RGB-D Cameras [pdf]
Yen-Yu Lin, Ju-Hsuan Hua, Nick Tang, Min-Hung Chen, Hong-Yuan Liao

Abstract: The recent advances in RGB-D cameras have allowed us to better solve increasingly complex computer vision tasks. However, modern RGB-D cameras are still restricted by their short effective distances. This limitation may make RGB-D cameras not online accessible in practice, degrading their applicability. We propose an alternative scenario to address this problem, and illustrate it with the application to action recognition. We use Kinect to collect offline an auxiliary, multi-modal database, in which not only the RGB videos but also the depth maps and skeleton structures of actions of interest are available. Our approach aims to enhance action recognition in RGB videos by leveraging this extra database. Specifically, it optimizes a feature transformation by which the actions to be recognized can be concisely reconstructed from entries in the auxiliary database. In this way, the inter-database variations are adapted. More importantly, each action can be augmented with additional depth and skeleton images retrieved from the auxiliary database. The proposed approach has been evaluated on three benchmarks of action recognition. The promising results demonstrate that the augmented depth and skeleton features can lead to a remarkable boost in recognition accuracy.
Similar papers:
  • From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity [pdf] - Nam Vo, Aaron Bobick
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
#1609 - Subspace Tracking under Dynamic Dimensionality for Online Background Subtraction [pdf]
Matthew Berger, Lee Seversky

Abstract: Long-term modeling of background motion in videos is an important and challenging problem arising in numerous applications such as segmentation and event recognition. A major challenge in modeling the background from point trajectories lies in dealing with the variable duration of trajectories, which can be due to such factors as trajectories entering and leaving the frame or occlusion from different depth layers. This work proposes an online method for background modeling of dynamic point trajectories via tracking of a linear subspace describing the background motion. To cope with variability in trajectory durations, we cast subspace tracking as an instance of subspace estimation under missing data, using a least-absolute-deviations formulation to robustly estimate the background in the presence of arbitrary foreground motion. Relative to previous works, our approach is extremely fast and scales to arbitrarily long videos by processing new frames sequentially as they arrive.
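
The robust projection step, fitting an incomplete, outlier-contaminated trajectory vector onto the current subspace under a least-absolute-deviations criterion, can be approximated numerically with iteratively reweighted least squares. A sketch on synthetic data follows; IRLS here is a generic stand-in, not necessarily how the paper solves its LAD subproblem.

```python
import numpy as np

def l1_project(U, y, mask, iters=20, eps=1e-6):
    """Least-absolute-deviations fit of an incomplete vector y (observed
    where mask is True) onto subspace basis U, via iteratively
    reweighted least squares."""
    A, b = U[mask], y[mask]
    w = np.ones(len(b))
    for _ in range(iters):
        c = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)[0]
        r = np.abs(A @ c - b)
        w = 1.0 / np.sqrt(np.maximum(r, eps))  # reweight toward the L1 objective
    return c

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.normal(size=(100, 5)))[0]   # background motion basis
y = U @ rng.normal(size=5)                       # a background trajectory
y[:10] += 5.0                                    # foreground corruption
mask = rng.uniform(size=100) > 0.3               # missing entries
c = l1_project(U, y, mask)
residual = np.abs(U @ c - y)                     # large on the corrupted rows
```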
Similar papers:
  • Subspace Clustering for Sequential Data [pdf] - Stephen Tierney, Junbin Gao, Yi Guo
  • SteadyFlow: Spatially Smooth Optical Flow for Video Stabilization [pdf] - Shuaicheng Liu, Lu Yuan, Ping Tan, Jian Sun
  • Bi-label Propagation for Generic Multiple Object Tracking [pdf] - Wenhan Luo, Tae-Kyun Kim, Björn Stenger, Xiaowei Zhao, Roberto Cipolla
  • Better Feature Tracking Through Subspace Constraints [pdf] - Bryan Poling, Gilad Lerman, Arthur Szlam
#1611 - Adaptive Object Retrieval with Kernel Reconstructive Hashing [pdf]
Haichuan Yang, Xiao Bai, Jun Zhou, Peng Ren, Jian Cheng, Zhihong Zhang

Abstract: Hashing is very useful for fast approximate similarity search on large databases. In unsupervised settings, most hashing methods aim at preserving the similarity defined by Euclidean distance. Hash codes generated by these approaches only preserve the Hamming distances corresponding to pairwise Euclidean distances, ignoring the local distribution of each data point. This objective does not hold for k-nearest neighbor search. In this paper, we first propose a new adaptive similarity measure which is consistent with k-NN search, and prove that it leads to a valid kernel. We then propose a hashing scheme which uses binary codes to preserve the kernel function. Using low-rank approximation, our hashing framework is more effective than existing methods that preserve similarity over an arbitrary kernel. The proposed kernel function, hashing framework, and their combination have demonstrated significant advantages compared with several state-of-the-art methods.
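
For readers unfamiliar with binary-code retrieval, the sketch below shows the mechanics: generate codes, then rank by Hamming distance. It uses plain sign-of-random-projection (LSH-style) codes purely as a stand-in; the paper's contribution is precisely that the codes are learned to reconstruct a kernel, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 64))              # database features
q = rng.normal(size=64)                        # query feature

# LSH-style codes (sign of random projections) as a stand-in for the
# learned kernel-reconstructive codes of the paper.
n_bits = 32
planes = rng.normal(size=(64, n_bits))
codes = (X @ planes > 0)                       # (N, n_bits) boolean code matrix
q_code = (q @ planes > 0)

# Hamming ranking: count differing bits, return the closest items.
hamming = (codes != q_code).sum(axis=1)
top10 = np.argsort(hamming)[:10]
print(top10)
```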
Similar papers:
  • Collaborative Hashing [pdf] - Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang
  • Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification [pdf] - Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang
  • Locally Linear Hashing for Extracting Non-Linear Manifolds [pdf] - Go Irie, Zhenguo Li, Xiao-Ming Wu, Shih-Fu Chang
  • Collective Matrix Factorization Hashing for Multimodal Data [pdf] - Guiguang Ding, Yuchen Guo, Jile Zhou
#1643 - Learning Inhomogeneous FRAME Models for Object Patterns [pdf]
Jianwen Xie, Wenze Hu, Song Chun Zhu, Ying Nian Wu

Abstract: The FRAME (Filters, Random field, And Maximum Entropy) model is a spatially stationary (homogeneous) Markov random field model for texture patterns. The model is a maximum entropy distribution that reproduces the observed marginal histograms of responses from a bank of filters, where the histograms are pooled spatially over all the image pixels. In this article, we investigate an inhomogeneous version of the FRAME model and apply it to modeling object patterns. The inhomogeneous FRAME is a non-stationary Markov random field model that reproduces the observed distributions or statistics of filter responses at all the individual locations, scales and orientations without spatial pooling. Our experiments show that the inhomogeneous FRAME model is capable of generating a wide variety of object patterns in natural images. We then propose a sparsified version of the inhomogeneous FRAME model where the model reproduces observed statistical properties at selected locations, scales and orientations. We propose to select these locations, scales and orientations by a shared sparse coding scheme, and we explore the connection between the sparse FRAME model and the linear additive sparse coding model. Our experiments show that it is possible to learn sparse FRAME models in an unsupervised fashion and that the learned models are useful for object classification.
Similar papers:
  • A Novel Chamfer Template Matching Method Using Variational Mean Field [pdf] - Thanh Nguyen
  • Fast and Robust Archetypal Analysis for Representation Learning [pdf] - Yuansi Chen, Julien Mairal, Zaid Harchaoui
  • Product Sparse Coding [pdf] - Tiezheng Ge, Kaiming He, Jian Sun
  • Unsupervised Learning of Dictionaries of Hierarchical Compositional Models [pdf] - Jifeng Dai, Yi Hong, Wenze Hu, Ying Nian Wu
#1646 - Automatic Feature Learning for Robust Shadow Detection [pdf]
Salman Khan, Mohammed Bennamoun, Ferdous Sohel, Roberto Togneri

Abstract: We present a practical framework to automatically detect shadows in real world scenes from a single photograph. Previous works on shadow detection put a lot of effort into designing shadow-variant and shadow-invariant hand-crafted features. In contrast, our framework automatically learns the most relevant features in a supervised manner using multiple convolutional deep neural networks (ConvNets). The 7-layer network architecture of each ConvNet consists of alternating convolution and sub-sampling layers. The proposed framework learns features at the super-pixel level and along the object boundaries. In both cases, features are extracted using a context-aware window centered at interest points. The predicted posteriors based on the learned features are fed to a conditional random field model to generate smooth shadow contours. Our proposed framework consistently performs better than the state of the art on all major shadow databases (collected under a variety of conditions).
Similar papers:
  • Deep Learning Hidden Identity Features for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • Discriminative Deep Metric Learning for Face Verification in the Wild [pdf] - Junlin Hu, Jiwen Lu, Yap-Peng Tan
  • Two-Class Weather Labeling [pdf] - Cewu Lu, Di Lin, Jiaya Jia, Chi-keung Tang
  • Shadow Removal from Single RGB-D Images [pdf] - Yao Xiao, Efstratios Tsougenis, Chi-keung Tang
#1658 - In Search of Inliers: 3D Correspondence by Local and Global Voting [pdf]
Anders Buch, Yang Yang, Norbert Krüger, Henrik Petersen

Abstract: We present a method for finding correspondences between 3D models. From an initial set of feature correspondences, our method uses a fast voting scheme to separate the inliers from the outliers. The novelty of our method lies in the use of a combination of local and global constraints to determine if a vote should be cast. On a local scale, we use simple, low-level geometric invariants. On a global scale, we apply covariant constraints for finding compatible correspondences. We guide the sampling for collecting voters by downward dependencies on previous voting stages. All of this together results in an accurate matching procedure. We evaluate our algorithm by controlled and comparative testing on different datasets, giving superior performance compared to state-of-the-art methods. In a final experiment, we apply our method to 3D object detection, showing the potential use of our method within higher-level vision.
Similar papers:
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
  • Symmetry-Aware Isometric Matching of Incomplete 3D Surfaces [pdf] - Yusuke Yoshiyasu
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
  • Rigid Motion Segmentation using Randomized Voting [pdf] - Heechul Jung, Jeongwoo Ju, Junmo Kim
#1662 - Probabilistic Labeling Cost for High-Accuracy Multi-view Reconstruction [pdf]
Ilya Kostrikov, Esther Horbert, Bastian Leibe

Abstract: In this paper, we propose a novel labeling cost for globally optimal continuous optimization for multi-view reconstruction. Existing approaches use data terms with specific weaknesses that are vulnerable to common challenges, such as low-textured regions or specularities. Our new probabilistic method implicitly discards outliers and can be shown to become more exact the closer we get to the true object surface. Our approach achieves top results among all published methods on the Middlebury DINO SPARSE dataset and also delivers accurate results on several other datasets with widely varying challenges, for which it works in unchanged form.
Similar papers:
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - DI XU, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
  • Class Specific 3D Object Shape Priors Using Surface Normals [pdf] - Christian Häne, Nikolay Savinov, Marc Pollefeys
  • Timing-Based Local Descriptor for Dynamic Surfaces [pdf] - Tony Tung, Takashi Matsuyama
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
#1672 - Unified Face Analysis by Iterative Multi-Output Random Forests [pdf]
Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo

Abstract: In this paper, we present a unified method for face image analysis, i.e., jointly estimating facial pose and expression and detecting facial landmarks in real-world facial images. The relations among the tasks are fully exploited to boost the performance of each task. To achieve this goal, we cast it as a joint probability estimation problem and propose an iterative Multi-Output Random Forests (iMORF) algorithm. Specifically, a hierarchical face analysis forest is learned to perform classification of head pose and facial expression at the top level. With the latent shape prior provided by the estimated head pose and facial expression, more accurate facial landmark detection is obtained at the bottom level. Once we have the predicted facial landmarks, shape-related geometric features are extracted together with the image appearance features to further improve the estimation of the head pose and facial expression. These two steps, for pose/expression and landmarks, are iterated until convergence, i.e., until there is no change in the estimated landmark positions. Experiments on publicly available real-world face datasets demonstrate that the performance of all individual tasks is greatly improved by our iMORF algorithm, and that our method outperforms the state of the art.
Similar papers:
  • Using a deformation field model for localizing faces and facial points under weak supervision [pdf] - Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool
  • Automatic Face Reenactment [pdf] - Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, Christian Theobalt
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
#1678 - SphereFlow: 6 DoF Scene Flow from RGB-D Pairs [pdf]
Michael Hornacek, Andrew Fitzgibbon, Margrit Gelautz, Carsten Rother

Abstract: We address the problem of computing dense scene flow between a pair of consecutive RGB-D frames. We seek correspondences between the two frames with respect to patches of 3D points that we identify as the inliers of spheres. Our main contribution is to show that by reasoning in terms of such patches under 6 DoF rigid body motions in 3D, we succeed in obtaining compelling results without relying on either of two simplifying assumptions that permeate much of the earlier literature: brightness constancy or local surface planarity. As a consequence, our output is a dense field of 6 DoF 3D rigid body motions, in contrast to the 3D translations that are the norm in scene flow. Reasoning in terms of 6 DoF motions additionally allows us to introduce a 6 DoF consistency check for the flow computed in both directions, a patchwise silhouette check to help reason about alignments in occlusion areas, and an intuitive local rigidity prior to promote smoothness of the flow fields. We carry out our optimization in two steps, obtaining a first correspondence field using PatchMatch, and subsequently using $\alpha$-expansion to jointly handle occlusions and regularize the field. We show attractive flow results on challenging synthetic and real-world scenes that push the practical limits of the aforementioned assumptions.
Similar papers:
  • DAISY Filter Flow: A Generalized Discrete Approach to Dense Correspondences [pdf] - Hongsheng Yang, Wen-Yan Lin, Jiangbo Lu
  • A Compositional Model for Low-Dimensional Image Set Representation [pdf] - Hossein Mobahi, Ce Liu, Bill Freeman
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
#1685 - Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf]
Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja

Abstract: Part-based visual tracking is advantageous due to its robustness against partial occlusion. However, how to effectively exploit the confidence scores of individual parts to construct a robust tracker is still a challenging problem. In this paper, we address this problem by simultaneously matching parts in each of multiple frames, which is realized by a locality-constrained low-rank sparse learning method that establishes multi-frame part correspondences through optimization of partial permutation matrices. The proposed part matching tracker (PMT) has a number of attractive properties. (1) It exploits the spatial-temporal locality-constrained property for robust part matching. (2) It matches local parts from multiple frames jointly by considering their low-rank and sparse structure information, which can effectively handle part appearance variations due to occlusion or noise. (3) The PMT model has an inbuilt mechanism for leveraging multi-mode target templates, so that the dilemma of template updating when encountering occlusion in tracking can be better handled. This contrasts with existing methods that only do part matching between a pair of frames. We evaluate PMT and compare with 10 popular state-of-the-art methods on challenging benchmarks. Experimental results show that PMT consistently outperforms these existing trackers.
Similar papers:
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Multi-Cue Visual Tracking Using Robust Feature-Level Fusion Based on Joint Sparse Representation [pdf] - Xiangyuan Lan, Pong C. Yuen, Andy Jinhua Ma
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - David Joseph Tan, Slobodan Ilic
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
#1687 - Finding Matches in a Haystack: A Max-Pooling Strategy for Graph Matching in the Presence of Outliers [pdf]
Minsu Cho, Jian Sun, Jean Ponce

Abstract: A major challenge in real-world matching problems is to tolerate the numerous outliers arising in typical visual tasks. Variations in object appearance, shape, and structure within the same object class make it hard to distinguish inliers from outliers due to clutter. In this paper, we propose a novel approach to graph matching, which is not only resilient to deformations but also remarkably tolerant to outliers. By adopting a max-pooling strategy within the graph matching framework, the proposed algorithm evaluates each candidate match using its most promising neighbors, and gradually propagates the corresponding scores to update the neighbors. As the final output, it assigns a reliable score to each match together with its supporting neighbors, thus providing contextual information for further verification. We demonstrate the robustness and utility of our method with synthetic and real image experiments.
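
The max-pooling idea can be phrased as a small change to a power-iteration matching scheme: when a candidate match gathers support from another node, it takes only that node's best candidate instead of summing over all of them. A toy sketch follows, with random symmetric compatibilities and greedy decoding; all sizes and constants are illustrative.

```python
import numpy as np

def max_pool_matching(W, n1, n2, iters=30):
    """Power-iteration-style graph matching where each node contributes
    only its best (max) candidate assignment -- the max-pooling idea.
    W[(i,a),(j,b)] scores the compatibility of matches i->a and j->b."""
    x = np.ones(n1 * n2)
    for _ in range(iters):
        S = W * x[None, :]                       # weight votes by current scores
        # For each candidate match, pool the support of every node j by
        # taking the max over j's candidate assignments b, then sum over j.
        pooled = S.reshape(n1 * n2, n1, n2).max(axis=2).sum(axis=1)
        x = pooled / np.linalg.norm(pooled)
    return x.reshape(n1, n2)                     # soft assignment scores

rng = np.random.default_rng(0)
n1 = n2 = 5
W = np.abs(rng.normal(size=(n1 * n2, n1 * n2)))
W = (W + W.T) / 2                                # symmetric compatibilities
scores = max_pool_matching(W, n1, n2)
match = scores.argmax(axis=1)                    # greedy decoding per node
```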
Similar papers:
  • Generalized Max Pooling [pdf] - Naila Murray, Florent Perronnin
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
  • Ask the image: supervised pooling to preserve feature locality [pdf] - Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta, Francesca Odone
  • Asymmetrical Gauss Mixture Models for Point Sets Matching [pdf] - Wenbing Tao, Kun Sun
#1690 - Histograms of Pattern Sets for Image Classification and Object Recognition [pdf]
Winn Voravuthikunchai, Bruno Crémilleux, Frédéric Jurie

Abstract: This paper introduces a novel image representation capturing feature dependencies through the mining of meaningful combinations of visual features. This representation leads to a compact and discriminative encoding of images that can be used for image classification, object detection or object recognition. The method relies on (i) multiple random projections of the input space followed by local binarization of projected histograms encoded as sets of items, and (ii) the representation of images as Histograms of Pattern Sets (HoPS). The approach is validated on four publicly available datasets (Daimler Pedestrian Classification, Oxford Flowers Classification, KTH Texture Categorization, PASCAL VOC2007), allowing comparisons with many recent approaches. The proposed image representation reaches state-of-the-art performance on each of these datasets.
Similar papers:
  • Pedestrian Detection in Low-resolution Imagery by Learning Multi-scale Intrinsic Motion Structures (MIMS) [pdf] - Jiejie Zhu
  • A Study on Cross-Population Age Estimation [pdf] - Chao Zhang, Guodong Guo
  • Unsupervised Learning for Graph Matching: An Attempt to Define and Extract Soft Attributed Patterns [pdf] - Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki
  • Lacunarity Analysis on Image Patterns for Texture Classification [pdf] - Yuhui Quan, Yong Xu, Yuping Sun, Yu Luo
#1694 - Robust 3D Tracking with Descriptor Fields [pdf]
Alberto Crivellaro, Vincent Lepetit

Abstract: We introduce a method that can register challenging specular and poorly textured 3D environments, on which previous approaches fail. We assume that a small set of reference images of the environment and a partial 3D model are already available. Like previous approaches, we register the input images by aligning them with one of the reference images using the 3D information. However, previous approaches typically rely on the pixel intensities for the alignment, which is prone to fail in the presence of specularities or in the absence of texture. A key component of our approach is an efficient novel local descriptor that we use to describe each image location. We show that we can rely on this descriptor in place of the intensity to significantly improve the alignment robustness at a minor increase in computational cost, and we explain why our descriptor performs so well.
Similar papers:
  • Learning-Based Atlas Selection for Multiple-Atlas Segmentation [pdf] - Gerard Sanroma, Guorong Wu, Yaozong Gao, Dinggang Shen
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - David Joseph Tan, Slobodan Ilic
#1700 - Using a deformation field model for localizing faces and facial points under weak supervision [pdf]
Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool

Abstract: Face detection and facial point localization are interconnected tasks. Recently it has been shown that solving these two tasks jointly with a mixture of trees of parts (MTP) leads to state-of-the-art results. However, MTP, like most of the methods for facial point localization proposed so far, requires a complete annotation of the training data at the facial point level. This is used to predefine the structure of the trees and to place the parts correctly. In this work we extend the mixtures from trees to more general loopy graphs. In this way we can learn, in a weakly supervised manner (using only the face location and orientation), a powerful deformable detector that implicitly aligns its parts to the detected face in the image. By attaching some reference points to the correct parts of our detector we can then localize the facial points. In terms of detection our method clearly outperforms the state of the art, even when competing with methods that use facial point annotations during training. Additionally, without any facial point annotation at the level of individual training images, our method can localize facial points with an accuracy similar to fully supervised approaches.
Similar papers:
  • Automatic Face Reenactment [pdf] - Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, Christian Theobalt
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
#1701 - Seeing What You're Told: Sentence-Guided Activity Recognition In Video [pdf]
Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Siskind

Abstract: We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, thereby providing a medium, not only for top-down and bottom-up integration, but also for multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions) in the form of whole sentential descriptions mediated by a grammar, guides the activity-recognition process. Further, the utility and expressiveness of our framework is demonstrated by performing three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions of video, and query-based video search, simply by leveraging the framework in different manners.
Similar papers:
  • DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting [pdf] - Chen Sun, Ram Nevatia
  • Persistent Tracking for Wide Area Aerial Surveillance [pdf] - Jan Prokaj, Gerard Medioni
  • Visual Semantic Search: Retrieving Videos via Complex Textual Queries [pdf] - Dahua Lin, Sanja Fidler, Chen Kong, Raquel Urtasun
  • What are you talking about? Text-to-Image Co-reference [pdf] - Chen Kong, Sanja Fidler, Mohit Bansal, Dahua Lin, Raquel Urtasun
#1709 - Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising [pdf]
Yi Peng, Deyu Meng, Zongben Xu, Biao Zhang, Chenqiang Gao, Yang Yi

Abstract: As compared to the conventional RGB or gray-scale image, the multispectral image (MSI) delivers a more faithful representation of real scenes, and greatly enhances the performance of many computer vision tasks. In practice, however, an MSI is always corrupted by various noises. In this paper we propose an effective MSI denoising approach by combinatorially considering two intrinsic characteristics underlying an MSI: the nonlocal similarity over space and the global correlation across spectrum. Specifically, by explicitly considering the spatial self-similarity of an MSI we construct a nonlocal tensor dictionary learning model with a group-block-sparsity constraint, which makes similar full-band patches (FBPs) share the same atoms from the spatial and spectral dictionaries. Furthermore, by exploiting the spectral correlation of an MSI and assuming over-redundancy of dictionaries, the constrained nonlocal MSI dictionary learning model can be decomposed into a series of unconstrained low-rank tensor approximation problems, which can be readily solved by off-the-shelf higher order statistics. Experimental results show that our method outperforms all state-of-the-art MSI denoising methods under comprehensive quantitative performance measures.
Similar papers:
  • Transitive Distance Clustering with K-Means Duality [pdf] - Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
  • CID: Combined Image Denoising in Spatial and Frequency Domains Using Web Images [pdf] - Huanjing Yue, Xiaoyan Sun, Jingyu Yang, Feng Wu
  • Weighted Nuclear Norm Minimization with Application to Image Denoising [pdf] - Shuhang Gu, Lei Zhang, Xiangchu Feng, Wangmeng Zuo
#1711 - Class Specific 3D Object Shape Priors Using Surface Normals [pdf]
Christian Häne, Nikolay Savinov, Marc Pollefeys

Abstract: Dense 3D reconstruction of real world objects containing textureless, reflective and specular parts is a challenging task. Using general smoothness priors such as surface area regularization can lead to defects in form of disconnected parts or unwanted indentations. We argue that this problem can be solved by exploiting the object class specific local surface orientations, e.g. a car is always close to horizontal in the roof area. Therefore, we formulate an object class specific shape prior in form of spatially varying anisotropic smoothness terms. The parameters of the shape prior are extracted from training data. We detail how our shape prior formulation directly fits into recently proposed volumetric multi-label reconstruction approaches. This allows a segmentation between the object and its supporting ground. In our experimental evaluation we show reconstructions using our trained shape prior on several challenging datasets.
Similar papers:
  • Occluding Contours for Multi-View Stereo [pdf] - Qi Shan, Brian Curless, Yasutaka Furukawa, Carlos Hernandez, Steve Seitz
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
  • High Quality Photometric Reconstruction using a Depth Camera [pdf] - Avishek Chatterjee, Sk Mohammadul Haque, Venu Madhav Govindu
  • Probabilistic Labeling Cost for High-Accuracy Multi-view Reconstruction [pdf] - Ilya Kostrikov, Esther Horbert, Bastian Leibe
#1729 - Human Shape and Pose Tracking Using Keyframes [pdf]
Chun-Hao Huang, Edmond Boyer, Slobodan Ilic

Abstract: This paper considers human motion tracking with multi-view set-ups and investigates a robust strategy that learns key poses online to drive a shape tracking method. The interest arises with realistic dynamic scenes where occlusions or segmentation errors occur. The resulting corrupted observations present missing data and outliers that deteriorate tracking results. In order to cope with such data we propose to use key poses of the tracked person as multiple reference models. In contrast to many existing approaches that rely on a single reference model, multiple templates represent a larger variability of human poses. They can therefore provide better initial hypotheses when tracking with ambiguous and noisy data. Our approach identifies these reference models online, during tracking, as distinctive keyframes. The most suitable one is then chosen as the reference model for the tracking initialization at each frame. In addition, taking advantage of the proximity between successive frames, an efficient outlier handling technique is proposed to prevent the model from associating to irrelevant outliers. The two strategies are successfully combined with a surface deformation framework that estimates both the pose and the shape. Evaluations and comparisons on existing datasets demonstrate the benefit of the approach with respect to the state of the art.
Similar papers:
  • Probabilistic Labeling Cost for High-Accuracy Multi-view Reconstruction [pdf] - Ilya Kostrikov, Esther Horbert, Bastian Leibe
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Timing-Based Local Descriptor for Dynamic Surfaces [pdf] - Tony Tung, Takashi Matsuyama
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
#1737 - Speeding Up Tracking by Ignoring Features [pdf]
Lu Zhang, Hamdi Dibeklioglu, Laurens van der Maaten

Abstract: Most modern object trackers combine a motion prior with sliding-window detection, using binary classifiers that predict the presence of the target object based on histogram features. Although the accuracy of such trackers is generally very good, they are often impractical because of their high computational requirements. To resolve this problem, the paper presents a new approach that limits the computational costs of trackers by ignoring features in image regions that, after inspecting a few features, are unlikely to contain the target object. To this end, we derive an upper bound on the probability that a location is most likely to contain the target object, and we ignore (features in) locations for which this upper bound is small. We demonstrate the effectiveness of our new approach in experiments with model-free and model-based trackers that use linear models in combination with HOG features. The results of our experiments demonstrate that our approach allows us to reduce the average number of inspected features by up to 90% without affecting the accuracy of the tracker.
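
The mechanics of feature-skipping can be illustrated with a naive worst-case bound: accumulate the linear score feature by feature and abandon the window once even the most optimistic remaining contribution cannot reach the detection threshold. The paper derives a probabilistic bound instead; everything below (features in [0,1], the inspection order, the threshold) is an illustrative assumption.

```python
import numpy as np

def score_with_early_reject(w, feats, order, keep_threshold):
    """Accumulate a linear detection score feature by feature and reject
    the window as soon as a worst-case bound on the remaining contribution
    cannot lift the score above the threshold.  Assumes feats lie in [0,1],
    so each remaining term contributes at most |w|."""
    rest = np.cumsum(np.abs(w[order][::-1]))[::-1]   # suffix sums of |w|
    s = 0.0
    for k, idx in enumerate(order):
        s += w[idx] * feats[idx]
        bound = rest[k + 1] if k + 1 < len(order) else 0.0
        if s + bound < keep_threshold:
            return None                              # reject early, skip the rest
    return s

rng = np.random.default_rng(0)
w = rng.normal(size=324)                             # e.g. a linear HOG template
feats = rng.uniform(0, 1, size=324)                  # features of one window
order = np.argsort(-np.abs(w))                       # inspect strong weights first
print(score_with_early_reject(w, feats, order, keep_threshold=5.0))
```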
Similar papers:
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Pyramid-based Visual Tracking Using Sparsity Represented Mean Transform [pdf] - Zhe Zhang, Kin Hong Wong
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#1738 - A Bayesian Framework For the Local Configuration of Retinal Junctions [pdf]
Touseef Qureshi, Andrew Hunter, Bashir Al-Diri

Abstract: Retinal images contain forests of mutually intersecting and overlapping venous and arterial vascular trees. The geometry of these trees shows adaptation to vascular diseases including diabetes, stroke and hypertension. Segmentation of the retinal vascular network is complicated by inconsistent vessel contrast, fuzzy edges, variable image quality, media opacities, complex intersections and overlaps. This paper presents a Bayesian approach to resolving the configuration of vascular junctions to correctly construct the vascular trees. A probabilistic model of vascular joints (terminals, bridges and bifurcations) and their configuration in junctions is built, and Maximum A Posteriori (MAP) estimation is used to select the most likely configurations. The model is built using a reference set of 4208 joints extracted from the DRIVE public domain vascular segmentation dataset, and evaluated on 4361 joints from the DRIVE test set, demonstrating an accuracy of 95.2%.
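
The MAP selection itself is a small computation: combine the learned configuration prior with the likelihood of the observed joint measurements and take the argmax. A toy sketch with made-up numbers, purely to show the mechanics:

```python
import numpy as np

# Toy MAP selection over junction configurations: each configuration is an
# assignment of the incident vessel segments to joint types (terminal /
# bridge / bifurcation).  The priors and likelihoods are invented numbers.
configs = ["two terminals", "one bridge", "bifurcation + terminal"]
log_prior = np.log([0.2, 0.5, 0.3])     # would be learned from reference joints
log_like = np.log([0.01, 0.30, 0.08])   # fit of widths/angles to the joint models

log_post = log_prior + log_like         # posterior up to a constant
best = configs[int(np.argmax(log_post))]
print(best)                             # -> "one bridge"
```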
Similar papers:
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
  • Local Readjustment for High-Resolution 3D Reconstruction [pdf] - Siyu Zhu, Tian Fang, Jianxiong Xiao, Long Quan
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
  • Submodular Object Recognition [pdf] - Fan Zhu, Zhuolin Jiang, Ling Shao
#1741 - Fast, Approximate Piecewise-Planar Modeling Based on Sparse Structure-from-Motion and Dense Superpixels [pdf]
Andras Bodis-Szomoru, Hayko Riemenschneider, Luc Van Gool

Abstract: We present a novel approach for producing dense reconstructions from multiple images and from the underlying sparse Structure-from-Motion (SfM) data in an efficient way. State-of-the-art Multi-View Stereo (MVS) algorithms deliver dense depth maps and/or complex meshes with very high detail, and redundancy over regular surfaces. In turn, our interest lies in a light-weight method that is applicable at large scale, primarily in the field of urban scene reconstruction from ground-based images. To overcome the problem of sparsity, we assume piecewise planarity of man-made scenes and exploit both visibility information and a fast over-segmentation of the images. The reconstruction problem is posed as an energy formulation of a multi-view plane labelling problem, which we solve jointly over the superpixels while avoiding expensive photoconsistency computations. The resulting planar primitives, augmented by detailed superpixel boundaries, are computed in about 10 s per image.
Similar papers:
  • Generating object segmentation proposals using global and local search [pdf] - Pekka Rantalankila, Juho Kannala, Esa Rahtu
  • Discrete-Continuous Depth Estimation from a Single Image [pdf] - Miaomiao Liu, Mathieu Salzmann, Xuming He
  • Efficient High-Resolution Stereo Matching using Local Plane Sweeps [pdf] - Sudipta Sinha, Daniel Scharstein, Richard Szeliski
  • Ground Plane Estimation using a Hidden Markov Model [pdf] - Ralf Dragon, Luc Van Gool
#1743 - Linear Ranking Analysis [pdf]
Weihong Deng, Jiani Hu, Jun Guo

Abstract: We extend the classical linear discriminant analysis (LDA) technique to linear ranking analysis (LRA) by considering the ranking order of class centroids on the projected subspace. Under the constraint on the ranking order of the classes, two criteria are proposed: 1) minimization of the classification error with the assumption that each class is homogeneous Gaussian distributed; 2) maximization of the sum (average) of the $k$ minimum distances of all neighboring-class (centroid) pairs. Both criteria can be efficiently solved by convex optimization for the one-dimensional subspace. A greedy algorithm is applied to extend the results to the multi-dimensional subspace. Experimental results show that 1) LRA with both criteria achieves state-of-the-art performance on the tasks of ranking learning and zero-shot learning; and 2) the maximum margin criterion provides a discriminative subspace selection method, which can significantly remedy the class separation problem in comparison with several representative extensions of LDA.
Similar papers:
  • Beyond Comparing Image Pairs: Setwise Active Learning for Relative Attributes [pdf] - Lucy Liang, Kristen Grauman
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Fine-Grained Visual Comparisons with Local Learning [pdf] - Aron Yu, Kristen Grauman
#1754 - A Principled Approach for Coarse-to-Fine MAP Inference [pdf]
Christopher Zach

Abstract: In this work we reconsider labeling problems with (virtually) continuous state spaces, which are of relevance in low level computer vision. In order to cope with such huge state spaces, multi-scale methods have been proposed to approximately solve such labeling tasks. Although performing well in many cases, these methods usually do not come with any guarantees on the returned solution. A general and principled approach to solving labeling problems is based on the well-known linear programming relaxation, which appears prohibitive for large state spaces at first glance. We demonstrate that a coarse-to-fine exploration strategy in the label space is able to optimize the LP relaxation for non-trivial problem instances with reasonable run-times and moderate memory requirements.
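
The toy sketch below illustrates the coarse-to-fine idea on a chain-structured labeling problem. It uses exact Viterbi at each level with a heuristic refinement window, whereas the paper refines the LP relaxation itself and therefore retains optimality guarantees; all parameters here are illustrative.

import numpy as np

def viterbi(unary, smooth):
    # Exact chain MAP: unary[i, l] costs, pairwise smooth * |l - l'|.
    n, L = unary.shape
    labels = np.arange(L)
    cost = unary[0].copy()
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        pair = smooth * np.abs(labels[:, None] - labels[None, :])
        total = cost[:, None] + pair
        back[i] = total.argmin(axis=0)
        cost = total.min(axis=0) + unary[i]
    path = [int(cost.argmin())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return path[::-1]

def coarse_to_fine(unary, smooth, stride=8, radius=8):
    # Solve on a subsampled label set, then restrict each node to a
    # window around its coarse label and re-solve.
    coarse = viterbi(unary[:, ::stride], smooth * stride)
    n, L = unary.shape
    refined = np.full_like(unary, 1e9)
    for i, c in enumerate(coarse):
        lo = max(0, c * stride - radius)
        hi = min(L, c * stride + radius + 1)
        refined[i, lo:hi] = unary[i, lo:hi]
    return viterbi(refined, smooth)

unary = np.random.rand(20, 256)     # 20 nodes, 256 fine labels
print(coarse_to_fine(unary, smooth=0.01))
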
Similar papers:
  • Discrete-Continuous Depth Estimation from a Single Image [pdf] - Miaomiao Liu, Mathieu Salzmann, Xuming He
  • Fast and Exact: Shape Segmentation Using ADMM and Structured Prediction [pdf] - Haithem Boussaid, Iasonas Kokkinos
  • Fast Approximate Inference in Higher Order MRF-MAP Labeling Problems [pdf] - Chetan Arora, S.N. Maheshwari, Subhashis Banerjee, Prem Kalra
  • Multi Label Generic Cuts: Optimal Inference in Multi Label Multi Clique MRF-MAP Problems [pdf] - Chetan Arora, S.N. Maheshwari
#1756 - Structured Output Random Forests for Accurate Object Detection [pdf]
Samuel Schulter, Christian Leistner, Peter Roth, Horst Bischof

Abstract: In this paper, we present a novel object detection approach that is capable of regressing the aspect ratio of objects, which results in accurately predicted bounding boxes having high overlap with the ground truth. In contrast to most recent works, we employ a Random Forest for learning a template-based model but exploit the nature of this learning algorithm to predict arbitrary output spaces. In this way, we can simultaneously predict the object probability of a window in a sliding window approach and regress its aspect ratio with a single model. Furthermore, we also exploit the additional information of the aspect ratio during the training of the structured output Random Forest, resulting in better detection models. Our experiments demonstrate that (i) our approach gives comparable or even better results on standard detection benchmarks, (ii) the structured output prediction of the Random Forest delivers more accurate bounding boxes in terms of overlap with ground truth, especially when tightening the evaluation criterion, and (iii) the detector itself becomes better merely by including the structured output information during training.
Similar papers:
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
  • Scalable Object Detection using Deep Neural Networks [pdf] - Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov
  • Dense Non-Rigid Shape Correspondence using Random Forests [pdf] - Emanuele Rodola, Samuel Rota Bulo', Thomas Windheuser, Matthias Vestner, Daniel Cremers
  • Incremental Learning of NCM Forests for Large-Scale Image Classification [pdf] - Marko Ristin, Matthieu Guillaumin, Juergen Gall, Luc Van Gool
#1762 - Geometric Urban Geo-Localization [pdf]
Mayank Bansal, Kostas Daniilidis

Abstract: We propose a purely geometric correspondence-free approach to urban geo-localization using 3D point-ray features extracted from the Digital Elevation Map of an urban environment. We derive a novel formulation for estimating the camera pose locus using 3D-to-2D correspondence of a single point and a single direction alone. We show how this allows us to compute putative correspondences between building corners in the DEM and the query image by exhaustively combining pairs of point-ray features. Then, we employ the two-point method to estimate both the camera pose and compute correspondences between buildings in the DEM and the query image. Finally, we show that the computed camera poses can be efficiently ranked by a simple skyline projection step using building edges from the DEM. Our experimental evaluation illustrates the promise of a purely geometric approach to the urban geo-localization problem.
Similar papers:
  • Fast and Reliable Two-View Translation Estimation [pdf] - Johan Fredriksson, Olof Enqvist, Fredrik Kahl
  • Minimal Scene Descriptions from Structure from Motion Models [pdf] - Song Cao, Noah Snavely
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
  • Congruency-Based Reranking [pdf] - Itai Ben Shalom, Adiel Ben Shalom, Noga Levy, Lior Wolf, Tamir Hazan, Nachum Dershowitz, Yaniv Bar, Roni Shweka, Yaacov Choueka
#1765 - Semi-supervised Spectral Clustering for Image Set Classification [pdf]
Arif Mahmood, Ajmal Mian, Robyn Owens

Abstract: We present an image set classification algorithm based on unsupervised clustering of labeled training and unlabeled test data, where labels are only used in the stopping criterion. The probability distribution of each class over the set of clusters is used to define a true set-based similarity measure. To this end, we propose an iterative sparse spectral clustering algorithm. In each iteration, the proximity matrix is efficiently recomputed to better represent the local subspace structure. Initial clusters capture the global data structure and finer clusters at the later stages capture the subtle class differences not visible at the global scale. Image sets are compactly represented with multiple Grassmannian manifolds, which are subsequently embedded in Euclidean space with the proposed spectral clustering algorithm. We also propose an efficient eigenvector solver which not only reduces the computational cost of spectral clustering manyfold but also improves the clustering quality and final classification results. Experiments on five standard datasets and comparison with seven existing techniques show the efficacy of our algorithm.
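
For reference, one pass of a standard spectral clustering step looks as follows; this is the textbook normalized-Laplacian recipe, not the paper's sparse iterative solver or its efficient eigenvector method, and sigma/k are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def spectral_step(X, k, sigma=1.0):
    # Gaussian affinity -> normalized Laplacian -> k leading
    # eigenvectors -> k-means on the embedded rows.
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)
    Dm = np.diag(1 / np.sqrt(W.sum(1) + 1e-12))
    L = np.eye(len(X)) - Dm @ W @ Dm
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 5])
print(spectral_step(X, k=2)[:10])
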
Similar papers:
  • Multi-feature Spectral Clustering with Minimax Optimization [pdf] - Hongxing Wang, Chaoqun Weng, Junsong Yuan
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
  • Transitive Distance Clustering with K-Means Duality [pdf] - Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu
  • Dual Linear Regression Based Classification for Face Cluster Recognition [pdf] - Liang Chen
#1767 - Posebits for Monocular Pose Estimation [pdf]
Gerard Pons-Moll, Bodo Rosenhahn, David Fleet

Abstract: In this work we address the problem of 3D human pose estimation from a single image. This constitutes an extremely hard problem due to inherent depth ambiguities corresponding to bits of information that are not observable by typical generative models. We address this by introducing posebits. Posebits are units of information that resolve typical ambiguities in monocular imagery. They are boolean geometric relationships between body parts designed to provide qualitative information about poses (e.g., left leg in front of right leg, or hands close to each other). We infer posebits bottom-up from image features using structural SVMs. Then, pose samples consistent with the posebits are sampled and evaluated against the image in a top-down fashion. Using posebits as a mid-layer representation for inference has several other advantages: First, pose estimation becomes a much less ambiguous task conditioned on posebits. Second, annotation simplifies to answering a small set of simple yes/no questions, and 3D MoCap data can be easily clustered into semantically similar classes. This allows for fast collection of large datasets, in contrast to manual annotation of 3D poses from images. There exist several new potential applications of posebits; here we show how they can be used successfully to estimate pose from a single image and for semantic image retrieval.
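
A toy illustration of how posebits can be read off 3D joint positions. The joint names, the two relations and the 0.25 m threshold are invented for the example; the paper defines its own posebit set and infers the bits from image features with structural SVMs rather than from known 3D joints.

import numpy as np

def posebits(joints):
    bits = {}
    # "left leg in front of right leg": compare depth (z) of the ankles.
    bits["lleg_in_front"] = joints["l_ankle"][2] < joints["r_ankle"][2]
    # "hands close to each other": Euclidean distance under a threshold.
    d = np.linalg.norm(joints["l_hand"] - joints["r_hand"])
    bits["hands_close"] = d < 0.25  # metres, arbitrary threshold
    return bits

joints = {"l_ankle": np.array([0.0, 0.0, 1.0]),
          "r_ankle": np.array([0.2, 0.0, 1.4]),
          "l_hand": np.array([0.3, 1.0, 1.1]),
          "r_hand": np.array([0.4, 1.0, 1.2])}
print(posebits(joints))   # both bits True for this toy pose
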
Similar papers:
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Robust Estimation of 3D Human Poses from Single Images [pdf] - CHUNYU WANG, Yizhou Wang, Zhouchen Lin, Alan Yuille, Wen Gao
  • Mixing Body-Part Sequences for Human Pose Estimation [pdf] - Anoop Cherian, Julien Mairal, Karteek Alahari, Cordelia Schmid
  • 3D Pose from Motion for Cross-view Action Recognition via Non-linear Circulant Temporal Encoding [pdf] - Ankur Gupta, Martinez Julieta, Jim Little, Robert Woodham
#1768 - On Projective Reconstruction In Arbitrary Dimensions [pdf]
Behrooz Nasihatkon, Richard Hartley, Jochen Trumpf

Abstract: We study the theory of projective reconstruction for multiple projections from an arbitrary dimensional projective space into lower-dimensional spaces. This problem is important due to its applications in the analysis of dynamical scenes. The current theory, due to Hartley and Schaffalitzky, is based on the Grassmann tensor, generalizing the ideas of fundamental matrix, trifocal tensor and quadrifocal tensor in the well-studied case of 3D to 2D projections. We present a theory whose point of departure is the projective equations rather than the Grassmann tensor. This is a better fit for the analysis of approaches such as bundle adjustment and projective factorization which seek to directly solve the projective equations. In a first step, we prove that there is a unique Grassmann tensor corresponding to each set of image points, a question that remained open in the work of Hartley and Schaffalitzky. Then, we prove that projective equivalence follows from the set of projective equations, provided that the depths are all nonzero. Finally, we demonstrate possible wrong solutions to the projective factorization problem, where not all the projective depths are restricted to be nonzero.
Similar papers:
  • Transitive Distance Clustering with K-Means Duality [pdf] - Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu
  • Newton Greedy Pursuit: a Quadratic Approximation Method for Sparsity-Constrained Optimization [pdf] - Xiao-Tong Yuan, Qingshan Liu
  • Feature-Independent Action Spotting Without Human Localization, Segmentation or Frame-wise Tracking [pdf] - Chuan Sun, Hassan Foroosh
  • Efficient pruning LMI conditions for Branch-and-Prune Rank and Chirality-Constrained Estimation of the Dual Absolute Quadric [pdf] - Adlane Habed, Danda Pani Paudel, Cédric Demonceaux, David Fofi
#1770 - An Online Learned Elementary Grouping Model for Multi-target Tracking [pdf]
Xiaojing Chen, Zhen Qin, Le An, Bir Bhanu

Abstract: We introduce an online approach to learn possible elementary groups (groups that contain only two targets) for inferring high level context that can be used to improve multi-target tracking in a data-association based framework. Unlike most existing association-based tracking approaches that use only low level information (e.g., time, appearance, and motion) to build the affinity model and consider each target as an independent agent, we learn social grouping behavior online to provide additional information for producing more robust tracklet affinities. The social grouping behavior of pairwise targets is first learned from confident tracklets and encoded in a disjoint grouping graph. The grouping graph is further completed with the help of group tracking. The proposed method is efficient and can be easily integrated into any basic affinity model. We evaluate our approach on two public datasets, and show significant improvements compared with the state-of-the-art methods.
Similar papers:
  • Multi-target Tracking with Motion Context in Tensor Power Iteration [pdf] - Xinchu Shi, Haibin Ling, Weiming Hu, Chunfeng Yuan
  • Tracklet Association with Online Reidentification in Network Flow Optimization for Long-term Multi-Person Tracking [pdf] - BING WANG, Gang Wang, Kap Luk Chan, LI WANG
  • Multiple Target Tracking Based on Hierarchical Relation Hypergraph [pdf] - Longyin Wen, Wenbo Li, Zhen Lei, Stan Li
  • Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning [pdf] - Seung-Hwan Bae, Kuk-Jin Yoon
#1771 - Efficient Structured Parsing of Facades Using Dynamic Programming [pdf]
Andrea Cohen, Alexander Schwing, Marc Pollefeys

Abstract: We propose a sequential optimization technique for segmenting a rectified image of a facade into semantic categories. Our method retrieves a parsing which respects common architectural constraints and also returns a certificate for global optimality. In contrast to the proposed method, the facade labeling problem is typically tackled as a classification task or as grammar parsing; neither approach is capable of fully exploiting the regularity of the problem. Our technique therefore very significantly improves the accuracy compared to the state-of-the-art while being an order of magnitude faster. In addition, in over 90% of the test images we obtain a certificate for optimality.
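
As a rough illustration of dynamic programming over architectural structure, the sketch below segments the rows of a rectified facade into horizontal bands that must follow a fixed top-to-bottom class order; the paper's parser handles much richer 2D constraints and returns optimality certificates, so this is only the flavor of the approach.

import numpy as np

def band_parse(row_cost, order):
    # dp[i, k]: min cost of labeling the first i rows, ending in band k,
    # where bands follow the given top-to-bottom class order.
    n = row_cost.shape[0]
    K = len(order)
    INF = 1e18
    dp = np.full((n + 1, K), INF)
    dp[0, 0] = 0.0
    choice = np.zeros((n + 1, K), dtype=int)
    for i in range(1, n + 1):
        for k in range(K):
            stay = dp[i - 1, k] + row_cost[i - 1, order[k]]
            adv = dp[i - 1, k - 1] + row_cost[i - 1, order[k]] if k > 0 else INF
            dp[i, k], choice[i, k] = (stay, k) if stay <= adv else (adv, k - 1)
    k = int(dp[n].argmin())
    labels = []
    for i in range(n, 0, -1):   # backtrack the band of every row
        labels.append(order[k])
        k = choice[i, k]
    return labels[::-1]

row_cost = np.random.rand(30, 4)   # 30 rows, 4 semantic classes
print(band_parse(row_cost, order=[0, 1, 2, 3]))
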
Similar papers:
  • A Principled Approach for Coarse-to-Fine MAP Inference [pdf] - Christopher Zach
  • Piecewise Planar and Compact Floorplan Reconstruction from Images [pdf] - Ricardo Cabral, Yasutaka Furukawa
  • Beyond Pixel Labels: Image Parsing with Object Instances and Occlusion Ordering [pdf] - Joseph Tighe, Marc Niethammer, Svetlana Lazebnik
  • Towards Unified Human Parsing and Pose Estimation [pdf] - Jian Dong, Qiang Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan
#1775 - Simultaneous Twin Kernel Learning for Structured Prediction [pdf]
Chetan Tonde, Ahmed Elgammal

Abstract: Many problems in computer vision, including human pose estimation, image segmentation, handwritten digit reconstruction and others, can be posed as structured prediction problems. Kernel methods for structured prediction, like structured support vector machines (SVMStruct), twin Gaussian processes (TGPs), structured Gaussian processes (GPStruct), vector-valued RKHSs and many others, offer a powerful way of solving these problems. However, for all of these kernel-based approaches, a poor choice of the kernel often results in reduced performance. Learning the kernel function has received significant interest, but most of the techniques are computationally expensive, restrictive in terms of the kernels they can learn, or focus only on learning kernels on inputs (one-way). In this work, we propose a novel technique for learning the kernels on both inputs and outputs simultaneously. We call this approach Twin Kernel Learning (TKL). This technique is general in the sense that it can learn arbitrary kernels, and includes 'one-way' kernel learning as a special case. We formulate this problem specifically for the case of structured prediction using Twin Gaussian Processes, where we learn the covariance functions of both inputs and outputs and compare it with the baseline results where no kernel learning is performed. We demonstrate through our experimental evaluation on several synthetic and real world datasets that we can consistently improve the performance of our algorithms with a le
Similar papers:
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
  • Random Laplace Feature Maps for Semigroup Kernels on Histograms [pdf] - Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron, Michael Mahoney
  • Bregman Divergences for Infinite Dimensional Covariance Matrices [pdf] - Mehrtash Harandi, Mathieu Salzmann, Fatih Porikli
  • Transformation Pursuit for Image Classification [pdf] - Mattis Paulin, Jerome REVAUD, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid
#1776 - Fast and Exact: Shape Segmentation Using ADMM and Structured Prediction [pdf]
Haithem Boussaid, Iasonas Kokkinos

Abstract: In this work we address the multi-label shape segmentation problem in the energy minimization setting, by using a graphical model for an ensemble of shapes where each shape is represented as a cyclic graph and shape consistency is enforced by additional inter-shape connections. Our contributions are two-fold: firstly, we build on Dual Decomposition (DD) to efficiently solve the resulting optimization problems. We decompose the model's graph into a set of open, chain-structured, graphs that can be rapidly optimized using Dynamic Programming/Generalized Distance Transforms; we achieve rapid convergence by using the Alternating Direction Method of Multipliers (ADMM) and show that for graphs with spatial variables, as is the case for shape models, ADMM yields substantially faster convergence than plain DD-based methods. Secondly, we employ structured prediction to encompass loss functions that better match the medical image segmentation performance criteria: using the commonly employed mean contour distance (MCD) as a structured loss during training, we obtain a clear performance improvement. We obtain systematic improvements over the current state-of-the-art in a large X-Ray image segmentation benchmark, demonstrating the merit of exact and efficient inference with sophisticated, structured models.
Similar papers:
  • 3D Modeling from Wide Baseline Range Scans using Contour Coherence [pdf] - Ruizhe Wang, Jongmoo Choi, Gerard Medioni
  • Patch-based Evaluation of Image Segmentation [pdf] - Christian Ledig, Wenzhe Shi, Wenjia Bai, Daniel Rueckert
  • A Principled Approach for Coarse-to-Fine MAP Inference [pdf] - Christopher Zach
  • Fully Automated Non-rigid Segmentation with Distance Regularized Level Set Evolution Initialized and Constrained by Deep-structured Inference [pdf] - Tuan Ngo, Gustavo Carneiro
#1781 - Word Channel Based Multiscale Pedestrian Detection Without Image Resizing and Using Only One Classifier [pdf]
Arthur Costea, Sergiu Nedevschi

Abstract: Most pedestrian detection approaches that achieve high accuracy and precision and can be used for real-time applications are based on histograms of gradient orientations. Multiscale detection is attained by resizing the image several times and recomputing the image features, or by using multiple classifiers for different scales. In this paper we present a pedestrian detection approach that uses the same classifier for all pedestrian scales, based on image features computed for a single scale. We go beyond the low level pixel-wise gradient orientation bins and use higher level visual words from a trained dictionary. Boosting is used to learn classification features from integral visual word channels. The proposed approach is evaluated on multiple datasets and achieves outstanding results on the INRIA and Caltech-USA benchmarks, outperforming current state of the art methods. By using a GPU implementation we achieve a classification rate of over 10 million bounding boxes per second and a 16 FPS rate for multiscale detection in a 640×480 image.
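
Integral channel evaluations of the kind the abstract mentions reduce to constant-time box sums over an integral image; a minimal sketch, where the random array stands in for one visual-word channel:

import numpy as np

def integral(channel):
    # Integral image with a zero top row/left column for O(1) box sums.
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1))
    ii[1:, 1:] = channel.cumsum(0).cumsum(1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    # Sum of channel[y0:y1, x0:x1] in constant time.
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

chan = np.random.rand(480, 640)       # stand-in visual-word channel
ii = integral(chan)
assert np.isclose(box_sum(ii, 10, 20, 50, 80), chan[10:50, 20:80].sum())
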
Similar papers:
  • Learning an image-based motion context for multiple people tracking [pdf] - Laura Leal-Taixé, Michele Fenzi, Alina Kuznetsova, Bodo Rosenhahn, Silvio Savarese
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
  • Pedestrian Detection in Low-resolution Imagery by Learning Multi-scale Intrinsic Motion Structures (MIMS) [pdf] - Jiejie Zhu
  • Informed Haar-like Features Improve Pedestrian Detection [pdf] - Shanshan Zhang, Christian Bauckhage, Armin Cremers
#1786 - Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf]
Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan

Abstract: Motivated by multi-distribution divergences, which originate in information theory, we propose a notion of `multi-point' kernels and study their applications. We study a class of kernels based on Jensen-type divergences and show that these can be extended to measure similarity among multiple points. We study tensor flattening methods and develop a multi-point (kernel) spectral clustering (MSC) method. We further emphasize a special case of the proposed kernels, which is a multi-point extension of the linear (dot-product) kernel, and show the existence of a cubic-time tensor flattening algorithm in this case. Finally, we illustrate the usefulness of our contributions using standard data sets and image segmentation tasks.
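
For the two-point case, a Jensen-Shannon (Jensen-type) similarity between histograms can be written down directly; a small sketch, with the paper's multi-point extension generalizing this to several distributions:

import numpy as np

def js_kernel(p, q, eps=1e-12):
    # Jensen-Shannon similarity between two histograms (two-point case).
    p = p / p.sum(); q = q / q.sum()
    m = 0.5 * (p + q)
    def H(x):  # Shannon entropy, base 2
        return -(x * np.log2(x + eps)).sum()
    jsd = H(m) - 0.5 * (H(p) + H(q))   # JS divergence, in [0, 1]
    return 1.0 - jsd                    # turn divergence into similarity

print(js_kernel(np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])))
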
Similar papers:
  • Random Laplace Feature Maps for Semigroup Kernels on Histograms [pdf] - Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron, Michael Mahoney
  • Multi-feature Spectral Clustering with Minimax Optimization [pdf] - Hongxing Wang, Chaoqun Weng, Junsong Yuan
  • Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising [pdf] - Yi Peng, Deyu Meng, Zongben Xu, Biao Zhang, Chenqiang Gao, Yang Yi
  • Transitive Distance Clustering with K-Means Duality [pdf] - Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu
#1804 - Evolutionary Quasi-random Search for Hand Articulations Tracking [pdf]
Iason Oikonomidis, Manolis Lourakis, Antonis Argyros

Abstract: We present a new method for tracking the 3D position, global orientation and full articulation of human hands. Inspired by recent advances in model-based, hypothesize-and-test methods, the high-dimensional parameter space of hand configurations is explored with a novel evolutionary optimization technique. The proposed method capitalizes on the fact that the quasi-random samples of the Sobol sequence have low discrepancy and exhibit a more uniform coverage of the sampled space compared to random samples obtained from the uniform distribution. The method has been tested on the problems of tracking the articulation of a single hand (27D parameter space) and two hands (54D space). Extensive experiments have been carried out with synthetic and real data, in comparison with state of the art methods. The quantitative evaluation shows that the new approach achieves a speed-up of four (single-hand tracking) and eight (two-hand tracking) without compromising tracking accuracy. Interestingly, the proposed method is preferable to the state of the art either in the case of limited computational resources or in the case of more complex (i.e., higher-dimensional) problems, a fact that considerably broadens the applicability of the method in a number of application domains.
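
A minimal sketch of the quasi-random sampling idea using SciPy's Sobol generator: draw a low-discrepancy sample of the pose box and keep the best hypothesis. This is one "generation" without the evolutionary updates, and the 27-D quadratic objective is a stand-in for the real rendering-based error.

import numpy as np
from scipy.stats import qmc

def quasi_random_search(objective, lower, upper, n_pow2=10, seed=0):
    # Evaluate the objective on a low-discrepancy Sobol sample of the
    # search box and keep the best point.
    sampler = qmc.Sobol(d=len(lower), scramble=True, seed=seed)
    pts = qmc.scale(sampler.random_base2(m=n_pow2), lower, upper)
    scores = np.array([objective(p) for p in pts])
    return pts[scores.argmin()], scores.min()

lower, upper = -np.ones(27), np.ones(27)   # toy 27-D pose box
best, err = quasi_random_search(lambda p: np.sum(p ** 2), lower, upper)
print(err)
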
Similar papers:
  • Real-time Model-based Articulated Object Pose Detection and Tracking with Variable Rigidity Constraints [pdf] - Karl Pauwels, Leonardo Rubio, Eduardo Ros
  • Multiple Granularity Analysis for Fine-grained Action Detection [pdf] - Bingbing Ni, Pierre Moulin
  • User-Specific Hand Modeling from Monocular Depth Sequences [pdf] - Jonathan Taylor, Richard Stebbing, Varun Ramakrishna, Cem Keskin, Jamie Shotton, Shahram Izadi, Andrew Fitzgibbon, Aaron Hertzmann
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
#1810 - Seeing the Arrow of Time [pdf]
Lyndsey Pickup, Zheng Pan, Donglai Wei, Yichang Shih, Andrew Zisserman, Bill Freeman, Bernhard Schoelkopf

Abstract: We explore whether we can observe Time's Arrow in a temporal sequence--is it possible to tell whether a video is running forwards or backwards? We investigate this somewhat philosophical question using computer vision and machine learning techniques. We explore three methods by which we might detect Time's Arrow in video sequences, based on distinct ways in which motion in video sequences might be asymmetric in time. We demonstrate good video forwards/backwards classification results on a selection of YouTube video clips, and on natively-captured sequences (with no temporally-dependent video compression). The motions our models have learned help discriminate forwards from backwards time.
Similar papers:
  • DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting [pdf] - Chen Sun, Ram Nevatia
  • 6 Seconds of Sound and Vision: Creativity in Micro-Videos [pdf] - Miriam Redi, Michele Trevisiol, Rossano Schifanella, Neil O'Hare, Alejandro Jaimes
  • A Cause and Effect analysis of motion trajectories for modeling actions [pdf] - Sanath Narayan, Kalpathi Ramakrishnan
  • SteadyFlow: Spatially Smooth Optical Flow for Video Stabilization [pdf] - Shuaicheng Liu, Lu Yuan, Ping Tan, Jian Sun
#1813 - Actionness Ranking with Lattice Conditional Ordinal Random Fields [pdf]
Wei Chen, Caiming Xiong, Jason Corso

Abstract: Action analysis in image and video has been attracting more and more attention in the computer vision community. Recognizing specific actions in video clips has been the main focus. We move in a new, more general direction in this paper and ask the critical fundamental questions: what is action, how is action different from motion, and, in a given image or video, where is the action? We study the philosophical and visual characteristics of action, which lead us to define actionness: intentional bodily movement of biological agents (people, animals). To solve the general problem, we propose the lattice conditional ordinal random field model, which incorporates local evidence as well as neighboring order agreement. We implement the new model in the continuous domain and apply it to scoring actionness in both image and video datasets. Our experiments demonstrate not only that our new random field model can outperform the popular ranking SVM but also that action is indeed distinct from motion.
Similar papers:
  • Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition [pdf] - Waqas Sultani, Imran Saleemi
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
#1816 - Gauss-Newton Constrained Local Models [pdf]
GEORGIOS TZIMIROPOULOS, Maja Pantic

Abstract: Arguably, Constrained Local Models (CLMs) are one of the most prominent approaches for fitting deformable models, with impressive results recently reported for both controlled lab and unconstrained settings. Fitting in most CLM methods is typically formulated as a two-step process during which local templates are first correlated with the image to yield a filter response for each landmark, and then shape optimization is performed over these filter responses. We argue that such a fitting strategy may be problematic because the optimization of shape and appearance is decoupled. To address this limitation, in this paper, we propose a new model/fitting strategy which results in a joint translational motion model for the model parts, so that a cost function of shape and appearance is jointly minimized using Gauss-Newton optimization. We additionally show how significant computational reductions can be achieved by building a full model during training but then efficiently optimizing the proposed cost function on a sparse grid during fitting. This results in a complexity that could allow a close-to-real-time implementation. We coin the proposed formulation Gauss-Newton CLM (GN-CLM). Finally, we compare its performance against another recently proposed state-of-the-art CLM method and show that the proposed GN-CLM outperforms it by a large margin.
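
The joint optimization rests on standard Gauss-Newton updates; a generic textbook sketch (with a toy exponential-fit residual, not the GN-CLM shape-and-appearance cost):

import numpy as np

def gauss_newton(residual, jacobian, p0, n_iter=20):
    # Repeatedly solve the normal equations: p <- p - (J^T J)^{-1} J^T r
    # (small damping added for numerical safety).
    p = p0.astype(float)
    for _ in range(n_iter):
        r = residual(p)
        J = jacobian(p)
        step = np.linalg.solve(J.T @ J + 1e-8 * np.eye(len(p)), J.T @ r)
        p = p - step
    return p

# Toy least-squares fit of y = a * exp(b * x).
x = np.linspace(0, 1, 50); y = 2.0 * np.exp(-1.5 * x)
res = lambda p: p[0] * np.exp(p[1] * x) - y
jac = lambda p: np.stack([np.exp(p[1] * x), p[0] * x * np.exp(p[1] * x)], 1)
print(gauss_newton(res, jac, np.array([1.0, 0.0])))   # ~[2.0, -1.5]
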
Similar papers:
  • Nonparametric Context Modeling of Local Appearance for Pose- and Expression-Robust Facial Landmark Localization [pdf] - Brandon Smith, Jonathan Brandt, Zhe Lin, Li Zhang
  • Automatic Construction of Deformable Models In-The-Wild [pdf] - Epameinondas Antonakos, Stefanos Zafeiriou
  • RAPS: Robust and Efficient Automatic Construction of Person-Specific Deformable Models [pdf] - Christos Sagonas, Stefanos Zafeiriou, Yannis Panagakis, Maja Pantic
  • Probabilistic Active Appearance Models [pdf] - Joan Alabort-i-Medina, Stefanos Zafeiriou
#1827 - Incremental Face Alignment in the Wild [pdf]
Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic

Abstract: The development of facial databases with an abundance of annotated facial data captured under unconstrained 'in-the-wild' conditions has made discriminative facial deformable models the de facto choice for generic facial landmark localization. Even though very good performance for facial landmark localization has been shown by many recently proposed discriminative techniques, when it comes to applications that require excellent accuracy, such as facial behaviour analysis and facial motion capture, semi-automatic person-specific or even tedious manual tracking is still the preferred choice. One way to construct a person-specific model automatically is through incremental updating of the generic model. This paper deals with the problem of updating a discriminative facial deformable model, a problem that has not been thoroughly studied in the literature. In particular, we study for the first time, to the best of our knowledge, strategies to update a discriminative model that is trained by a cascade of regressors. We propose very efficient strategies to update the model and we show that it is possible to automatically construct robust discriminative person- and imaging-condition-specific models 'in-the-wild' that outperform state-of-the-art generic face alignment strategies.
Similar papers:
  • The Fastest Deformable Part Model for Object Detection [pdf] - Junjie Yan, Zhen Lei, Stan Li
  • Automatic Construction of Deformable Models In-The-Wild [pdf] - Epameinondas Antonakos, Stefanos Zafeiriou
  • RAPS: Robust and Efficient Automatic Construction of Person-Specific Deformable Models [pdf] - Christos Sagonas, Stefanos Zafeiriou, Yannis Panagakis, Maja Pantic
  • One Millisecond Face Alignment with an Ensemble of Regression Trees [pdf] - Vahid Kazemi, Josephine Sullivan
#1829 - Hierarchical Feature Hashing for Fast Dimensionality Reduction [pdf]
Bin Zhao, Eric Xing

Abstract: The curse of dimensionality is a practical and challenging problem in image categorization, especially in cases with a large number of classes. Multi-class classification encounters severe computational and storage problems when dealing with these large scale tasks. In this paper, we propose hierarchical feature hashing to effectively reduce the dimensionality of the parameter space without sacrificing classification accuracy, while at the same time exploiting the information in the semantic taxonomy among categories. We provide detailed theoretical analysis of the proposed hashing method. Moreover, experimental results on object recognition and scene classification further demonstrate the effectiveness of hierarchical feature hashing.
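
The core mechanism is the hashing trick applied along a class's taxonomy path, so that related classes share hashed features; a minimal sketch. The bucket count, signing scheme and the use of Python's per-run-salted built-in hash are illustrative (a real system would use a stable hash such as MurmurHash), and the exact scheme in the paper differs in details.

import numpy as np

def hierarchical_hash(tokens, taxonomy_path, dim=2 ** 18):
    # Hash every (taxonomy node, token) pair into a signed bucket, so
    # ancestors in the class taxonomy contribute shared features.
    v = np.zeros(dim)
    for node in taxonomy_path:            # e.g. root -> animal -> dog
        for t in tokens:
            h = hash((node, t))
            v[h % dim] += 1 if (h >> 1) % 2 == 0 else -1
    return v

v = hierarchical_hash(["sift_412", "sift_7"], ["root", "animal", "dog"])
print(np.count_nonzero(v))
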
Similar papers:
  • Collaborative Hashing [pdf] - Xianglong Liu, Junfeng He, Cheng Deng, Bo Lang
  • Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification [pdf] - Yadong MU, Gang Hua, Wei Fan, Shi-Fu Chang
  • Collective Matrix Factorization Hashing for Multimodal Data [pdf] - Guiguang Ding, Yuchen Guo, Jile Zhou
  • Fast Supervised Hashing with Decision Trees for High-Dimensional Data [pdf] - Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, David Suter
#1830 - Geometric Generative Gaze Estimation (G3E) for Remote RGB-D Cameras [pdf]
Kenneth Funes Mora, Jean-Marc Odobez

Abstract: We propose a head pose independent gaze estimation model for distant RGB-D cameras. It relies on a geometric understanding of the 3D gaze action and generation of eye images. By introducing a semantic segmentation of the eye region within a generative process, the model (i) avoids the critical feature tracking of geometrical approaches requiring high resolution images; (ii) decouples the person dependent geometry from the ambient conditions, allowing adaptation to different conditions without retraining. Priors in the generative framework are adequate for training from few samples. In addition, the model is capable of gaze extrapolation allowing for less restrictive training schemes. Comparisons with state of the art methods validate these properties which make our method highly valuable for addressing many diverse tasks in sociology, HRI and HCI.
Similar papers:
  • Look at the Driver, Look at the Road: No Distraction! No Accident! [pdf] - Mahdi Rezaei, Reinhard Klette
  • Head Pose Estimation Based on Multivariate Label Distribution [pdf] - Xin Geng, Yu Xia
  • Temporal Segmentation of Egocentric Videos [pdf] - Chetan Arora, Yair Poleg, Shmuel Peleg
  • Learning-by-Synthesis for Appearance-based 3D Gaze Estimation [pdf] - Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato
#1835 - Efficient feature extraction, encoding and classification for action recognition [pdf]
Vadim Kantorov, Ivan Laptev

Abstract: Local video features provide state-of-the-art performance for action recognition. While the accuracy of action recognition has been continuously improved over the recent years, the low speed of feature extraction and subsequent recognition prevents current methods from scaling up to real-size problems. We address this issue and first develop highly efficient video features using motion information in video compression. We next explore feature encoding by Fisher vectors and demonstrate accurate action recognition using fast linear classifiers. Our method improves the speed of video feature extraction, feature encoding and action classification by two orders of magnitude at the cost of minor reduction in recognition accuracy. We validate our approach and compare it to the state of the art on three recent action recognition datasets.
Similar papers:
  • A Depth-Aware Descriptor for Action Recognition [pdf] - Cewu Lu, Jiaya Jia, Chi-keung Tang
  • Cross-view Action Modeling, Learning and Recognition [pdf] - Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song Chun Zhu
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • Towards Good Practices for Action Video Encoding [pdf] - Jianxin Wu, Yu Zhang
#1837 - Tissue Classification via Multispectral Convolutional Sparse Coding [pdf]
Yin Zhou, Hang Chang, Kenneth Barner, Paul Spellman, Bahram Parvin

Abstract: Image-based classification of tissue histology plays an important role in predicting clinical outcomes. However, this task is very challenging due to the presence of large technical variations (e.g., fixation, staining) and biological heterogeneities (e.g., cell type, cell state). In the field of biomedical imaging, for the purposes of visualization and/or quantification, different stains are typically used for different targets of interest (e.g., cellular/subcellular events), which generates multispectral data (images) through various types of microscopes and, as a result, provides the possibility of learning biological component-specific features by exploiting multispectral information. We propose a multispectral feature learning model that automatically learns a set of convolution filter banks from separate spectra to efficiently discover the intrinsic tissue morphometric signatures, based on convolutional sparse coding (CSC). The learned feature representations are then aggregated through the spatial pyramid matching framework (SPM) and finally classified using a linear SVM. The proposed system has been evaluated using two large-scale tumor cohorts, collected from The Cancer Genome Atlas (TCGA). Experimental results show that the proposed model 1) outperforms systems utilizing sparse coding for unsupervised feature learning (e.g., PSDSPM [8]); 2) is competitive with systems built upon features with biological prior knowledge (e.g., SMLSPM [7]).
Similar papers:
  • Deeply-Learned Slow Feature Analysis for Action Recognition [pdf] - LIN SUN
  • Is Rotation a Nuisance in Shape Recognition? [pdf] - Qiuhong Ke, Yi Li
  • Iterative Multilevel MRF Leveraging Context and Voxel Information for Brain Tumour Segmentation in MRI [pdf] - Nagesh Subbanna, Doina Precup, Tal Arbel
  • Switchable Deep Network for Pedestrian Detection [pdf] - Ping Luo, Yonglong Tian
#1840 - Nonparametric Part Transfer for Fine-grained Recognition [pdf]
Christoph Göring, Erik Rodner, Alexander Freytag, Joachim Denzler

Abstract: In the following paper, we present an approach for fine-grained recognition based on a new part detection method. In particular, we propose a nonparametric label transfer technique which transfers part constellations from objects with similar global shapes. The possibility for transferring part annotations to unseen images allows for coping with a high degree of pose and view variations in scenarios where traditional detection models (such as deformable part models) fail. Our approach is especially valuable for fine-grained recognition scenarios where intraclass variations are extremely high, and precisely localized features need to be extracted. Furthermore, we show the importance of carefully designed visual extraction strategies, such as combination of complementary feature types and iterative image segmentation, and the resulting impact on the recognition performance. In experiments, our simple yet powerful approach achieves 35.9% and 57.8% accuracy on the CUB-2010 and 2011 bird datasets, which is the current best performance for these benchmarks.
Similar papers:
  • Associative embeddings for large-scale knowledge transfer with self-assessment [pdf] - Alexander Vezhnevets, Vittorio Ferrari
  • Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf] - Novi Patricia, Barbara Caputo
  • Instance-weighted Transfer Learning of Active Appearance Models [pdf] - Daniel Haase, Erik Rodner, Joachim Denzler
  • Color Transfer using Probabilistic Moving Least Squares [pdf] - Youngbae Hwang, Joon-Young Lee, In So Kweon, Seon Joo Kim
#1843 - On the quotient representation for the essential manifold [pdf]
Roberto Tron, Kostas Daniilidis

Abstract: The essential matrix, which encodes the epipolar constraint between projected points in two views, is a cornerstone of modern computer vision. Previous works have proposed different characterizations of the space of essential matrices as a Riemannian manifold. However, these works either do not consider the symmetric role played by the two views, or do not fully take into account the geometric peculiarities of the epipolar constraint. We address these limitations and give a characterization as a quotient manifold which preserves the geometrical interpretation in terms of camera poses. While our main focus is on theoretical aspects, we include experiments in pose averaging, and show that the proposed formulation produces a meaningful distance between essential matrices.
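
For context, the essential matrix the abstract refers to is built from a relative pose (R, t) as E = [t]_x R and satisfies the epipolar constraint x2^T E x1 = 0; a quick numerical check on a toy point:

import numpy as np

def essential(R, t):
    # E = [t]_x R: essential matrix from a relative camera pose.
    tx = np.array([[0, -t[2], t[1]],
                   [t[2], 0, -t[0]],
                   [-t[1], t[0], 0]])
    return tx @ R

R = np.eye(3); t = np.array([1.0, 0.0, 0.0])
E = essential(R, t)
X = np.array([0.3, -0.2, 5.0])     # 3D point in the first camera frame
x1 = X / X[2]                      # normalized image point, view 1
X2 = R @ X + t
x2 = X2 / X2[2]                    # normalized image point, view 2
print(x2 @ E @ x1)                 # ~0 up to round-off
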
Similar papers:
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
  • Tracking on the Product Manifold of Shape and Orientation for Tractography from Diffusion MRI [pdf] - YUANXIANG WANG, Hesamoddin Salehian, Guang Cheng, Baba Vemuri
  • Model Transport: Towards Scalable Transfer Learning on Manifolds [pdf] - Oren Freifeld, Soren Hauberg, Michael Black
  • Covariance descriptors for 3D shape matching and retrieval [pdf] - Hedi Tabia, Hamid Laga, David Picard, Philippe-Henri Gosselin
#1848 - Topic Modeling of Multimodal Data: an Autoregressive Approach [pdf]
Yin Zheng, Yu-Jin Zhang, Hugo Larochelle

Abstract: Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to deal with multimodal data, such as in image annotation tasks. Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as simultaneous image classification and annotation. Specifically, we propose SupDocNADE, a supervised extension of DocNADE, that increases the discriminative power of the hidden topic features by incorporating label information into the training objective of the model and show how to employ SupDocNADE to learn a joint representation from image visual words, annotation words and class label information. We also describe how to leverage information about the spatial position of the visual words for SupDocNADE to achieve better performance in a simple, yet effective manner. We test our model on the LabelMe and UIUC-Sports datasets and show that it compares favorably to other topic models such as the supervised variant of LDA and a Spatial Matching Pyramid (SPM) approach.
Similar papers:
  • Efficient Nonlinear Markov Models for Human Motion [pdf] - Andreas Lehrmann, Peter Gehler, Sebastian Nowozin
  • Max-Margin Boltzmann Machines for Object Segmentation [pdf] - Jimei Yang, Simon Safar, Ming-Hsuan Yang
  • Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media [pdf] - Zhenxing Niu, Gang Hua, Xinbo Gao, Qi Tian
  • Active Annotation Translation [pdf] - Steven Branson, Pietro Perona
#1849 - What are you talking about? Text-to-Image Co-reference [pdf]
Chen Kong, Sanja Fidler, Mohit Bansal, Dahua Lin, Raquel Urtasun

Abstract: In this paper we exploit complex sentential descriptions of RGB-D scenes in order to improve 3D object detection as well as to determine which particular object each noun/pronoun is referring to in the image. Towards this goal, we developed a structure prediction model that is able to parse both the image, in terms of 3D object cuboids, and the complex sentences describing the visual content. We demonstrate the effectiveness of our approach on the challenging NYU-RGBD dataset, which we enrich with complex descriptions, and show that our approach can improve 3D detection as well as scene classification, and is able to reliably solve the text-to-image alignment problem. Furthermore, by employing the visual information, our approach is able to beat the Stanford parser in estimating co-references.
Similar papers:
  • Understanding Objects in Detail with Fine-grained Attributes [pdf] - Subhransu Maji, Iasonas Kokkinos, Stavros Tsogkas, Ross Girshick, Matthew Blaschko, Esa Rahtu, Juho Kannala, Andrea Vedaldi
  • Orientation Robust Textline Detection in Natural Images [pdf] - Le Kang, Yi Li
  • Seeing What You're Told: Sentence-Guided Activity Recognition In Video [pdf] - Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Siskind
  • Visual Semantic Search: Retrieving Videos via Complex Textual Queries [pdf] - Dahua Lin, Sanja Fidler, Chen Kong, Raquel Urtasun
#1851 - Curvilinear Structure Tracking by Low Rank Tensor Approximation with Model Propagation [pdf]
Erkang Cheng, Yu Pang, Ying Zhu, Haibin Ling

Abstract: Robust tracking of deformable objects like catheters or vascular structures in X-ray images is an important technique used in image guided medical interventions for effective motion compensation and dynamic multi-modality image fusion. Tracking of such anatomical structures and devices is very challenging due to large appearance changes, the low visibility of X-ray images and the deformable nature of the underlying motion field, a result of complex 3D anatomical movements projected into 2D images. To address these issues, we propose a new deformable tracking method using a tensor-based algorithm with model propagation. Specifically, deformable tracking is formulated as a multi-dimensional assignment problem which is solved by rank-1 l1 tensor approximation. The model prior is propagated in the course of deformable tracking. Both the higher order information and the model prior provide powerful discriminative cues for reducing ambiguity arising from the complex background, and consequently improve the tracking robustness. To validate the proposed approach, we applied it to catheter and vascular structure tracking and tested on X-ray fluoroscopic sequences obtained from 17 clinical cases. The results show, both quantitatively and qualitatively, that our approach achieves a mean tracking error of 1.4 pixels for vascular structure tracking and 1.3 pixels for catheter tracking.
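
A sketch of the rank-1 tensor approximation at the core of such formulations, via alternating power iterations on a 3-way affinity tensor; the paper additionally imposes the l1/assignment structure and propagates the model prior, so this shows only the basic decomposition step.

import numpy as np

def rank1_tensor(T, n_iter=50):
    # Alternating power iterations for the best rank-1 factors (u, v, w)
    # of a 3-way tensor T.
    I, J, K = T.shape
    u, v, w = np.ones(I), np.ones(J), np.ones(K)
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    return u, v, w

T = np.random.rand(4, 5, 6)   # toy multi-frame affinity tensor
u, v, w = rank1_tensor(T)
print(np.round(u, 2))
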
Similar papers:
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - DAVID JOSEPH TAN, Slobodan Ilic
  • Multi-target Tracking with Motion Context in Tensor Power Iteration [pdf] - Xinchu Shi, Haibin Ling, Weiming Hu, Chunfeng Yuan
#1858 - A Study on Cross-Population Age Estimation [pdf]
Chao Zhang, Guodong Guo

Abstract: We study the problem of cross-population age estimation. Human aging is determined by the genes and influenced by many factors. Different populations, e.g., males and females, Caucasian and Asian, may age differently. Previous research has discovered the aging difference among different populations, and reported large errors in age estimation when crossing gender and/or ethnicity. In this paper we propose novel methods for cross-population age estimation with good performance. The proposed methods are based on projecting the different aging patterns into a common space where the aging patterns can be correlated even though they come from different populations. The projections are also discriminative between age classes due to the integration of the classical discriminant analysis technique. Further, we study the amount of data needed in the target population to learn a cross-population age estimator. Finally, we study the feasibility of multi-source cross-population age estimation. Experiments are conducted on a large database of more than 21,000 face images selected from MORPH. Our studies are valuable for significantly reducing the burden of training data collection for age estimation on a new population, by utilizing existing aging patterns even from different populations.
Similar papers:
  • Merging SVMs with Linear Discriminant Analysis: A Combined Model [pdf] - Symeon Nikitidis, Stefanos Zafeiriou, Maja Pantic
  • Linear Ranking Analysis [pdf] - Deng Weihong, Jiani Hu, Jun Guo
  • Histograms of Pattern Sets for Image Classification and Object Recognition [pdf] - Winn Voravuthikunchai, bruno Cremilleux, Frederic Jurie
  • Illumination-Aware Age Progression [pdf] - Supasorn Suwajanakorn, Ira Kemelmacher, Steve Seitz
#1868 - Edge-aware Gradient Domain Optimization Framework for Image Filtering by Local Propagation [pdf]
Miao Hua, Xiaohui Bie, Wencheng Wang

Abstract: Gradient domain methods are popular for image processing; however, even the edge-preserving ones cannot preserve edges well in some cases. In this paper, we present edge-aware constraints that better preserve edges for general gradient domain image filtering, and we theoretically analyse why those constraints are edge-aware. Our edge-aware constraints are easy to implement, fast to compute and can be seamlessly integrated into the general gradient domain optimization framework. The new gradient domain optimization framework better preserves edges while maintaining image filtering effects similar to the original image filters. We also demonstrate the strength of our edge-aware constraints on various applications such as image smoothing, image colorization and Poisson image cloning.
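
The unconstrained backbone of gradient-domain filtering is a screened-Poisson least-squares problem; a 1D sketch of solving min_f ||f - g||^2 + lam ||Df - v||^2, with the paper's contribution being the edge-aware constraints added on top of such a system:

import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import spsolve

def gradient_domain_1d(g, v, lam):
    # Normal equations: (I + lam * D^T D) f = g + lam * D^T v,
    # where D is the forward-difference operator.
    n = len(g)
    D = diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    A = identity(n) + lam * (D.T @ D)
    return spsolve(A.tocsc(), g + lam * (D.T @ v))

g = np.r_[np.zeros(50), np.ones(50)] + 0.1 * np.random.randn(100)
v = np.zeros(99)                 # target gradients: flat -> smoothing
print(gradient_domain_1d(g, v, lam=5.0)[:3])
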
Similar papers:
  • Domain Adaptation on the Statistical Manifold [pdf] - Mahsa Baktashmotlagh, Mehrtash Harandi, Brian Lovell, Mathieu Salzmann
  • Occluding Contours for Multi-View Stereo [pdf] - Qi Shan, Brian Curless, Yasutaka Furukawa, Carlos Hernandez, Steve Seitz
  • Sparse Representation for Edit Propagation of High-Resolution Images [pdf] - Xiaowu Chen, Jianwei Li, Dongqing Zou, Xiaochun Cao, Qinping Zhao, Hao (Richard) Zhang
  • Recognizing RGB Images by Learning from RGB-D Data [pdf] - Lin Chen, Wen Li, Dong Xu
#1869 - Visual Semantic Search: Retrieving Videos via Complex Textual Queries [pdf]
Dahua Lin, Sanja Fidler, Chen Kong, Raquel Urtasun

Abstract: In this paper, we tackle the problem of semantic retrieval of videos from complex queries. Towards this goal we first parse the descriptions into a semantic graph, which is then matched to visual concepts using a generalized bipartite matching algorithm. Our approach exploits object appearance, motion and spatial relations, and learns the importance of each term using structure prediction. We demonstrate the effectiveness of our approach on a new dataset designed specifically for semantic search in the context of autonomous driving. We show that our approach is able to locate a major portion of the objects described in the query with high accuracy, and improve the relevance in video retrieval.
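
The matching step can be illustrated with a bipartite assignment between query terms and detected visual concepts; a sketch using the Hungarian solver, where the term names, concept names and hand-set relevance matrix stand in for the weights the paper learns with structured prediction:

import numpy as np
from scipy.optimize import linear_sum_assignment

terms = ["car", "pedestrian", "left of"]
concepts = ["vehicle-track-3", "person-track-1", "spatial-left"]
score = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.8, 0.1],
                  [0.0, 0.1, 0.7]])
rows, cols = linear_sum_assignment(-score)   # negate to maximize score
for r, c in zip(rows, cols):
    print(terms[r], "->", concepts[c], score[r, c])
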
Similar papers:
  • A Multigraph Representation for Improved Unsupervised/Semi-supervised Learning of Human Actions [pdf] - Simon Jones, Ling Shao
  • Online Object Tracking, Learning and Parsing with And-Or Graphs [pdf] - Yang Lu, Tianfu Wu, Song Chun Zhu
  • Seeing What You're Told: Sentence-Guided Activity Recognition In Video [pdf] - Siddharth Narayanaswamy, Andrei Barbu, Jeffrey Siskind
  • What are you talking about? Text-to-Image Co-reference [pdf] - Chen Kong, Sanja Fidler, Mohit Bansal, Dahua Lin, Raquel Urtasun
#1874 - Unsupervised Trajectory Modelling using Temporal Information via Minimal Paths [pdf]
Brais Cancela, Alberto Iglesias, Marcos Ortega, Manuel Penedo

Abstract: This paper presents a novel methodology for modelling pedestrian trajectories over a scene, based on the hypothesis that, when people try to reach a destination, they use the path that takes the least time, taking into account environmental information like the type of terrain or what other people did before. Thus, a minimal path approach can be used to model human trajectory behaviour. We develop a modified Fast Marching Method that allows us to include both velocity and orientation in the front propagation approach, without increasing its computational complexity. Combining all the information, we create a time surface that shows the time a target needs to reach any given position in the scene. We also create different metrics in order to compare the time surface against the real behaviour. Experimental results over a public dataset prove the correctness of the initial hypothesis.
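
Dijkstra's algorithm on a grid gives a discrete stand-in for the Fast Marching time surface: the arrival time at each cell under a spatially varying speed. The paper's modified FMM solves the continuous eikonal equation and additionally handles orientation, so this is only the intuition.

import heapq
import numpy as np

def minimal_time(speed, start):
    # Dijkstra on a 4-connected grid; traversing a cell costs 1/speed.
    H, W = speed.shape
    T = np.full((H, W), np.inf)
    T[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        t, (y, x) = heapq.heappop(pq)
        if t > T[y, x]:
            continue
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                nt = t + 1.0 / max(speed[ny, nx], 1e-9)
                if nt < T[ny, nx]:
                    T[ny, nx] = nt
                    heapq.heappush(pq, (nt, (ny, nx)))
    return T

T = minimal_time(np.ones((50, 50)), (0, 0))
print(T[49, 49])   # 98.0 on a uniform-speed grid
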
Similar papers:
  • Timing-Based Local Descriptor for Dynamic Surfaces [pdf] - Tony Tung, Takashi Matsuyama
  • Manifold Based Dynamic Texture Synthesis from Extremely Few Samples [pdf] - Hongteng Xu, Hongyuan Zha, Mark Davenport
  • Subspace Tracking under Dynamic Dimensionality for Online Background Subtraction [pdf] - Matthew Berger, Lee Seversky
  • Learning an image-based motion context for multiple people tracking [pdf] - Laura Leal-Taixé, Michele Fenzi, Alina Kuznetsova, Bodo Rosenhahn, Silvio Savarese
#1875 - Incremental Activity Modeling and Recognition in Streaming Videos [pdf]
MAHMUDUL HASAN, Amit Roy-Chowdhury

Abstract: Human activity recognition in videos is a difficult but widely studied problem in computer vision due to its numerous practical applications. Most of the state-of-the-art approaches to human activity recognition need an intensive training stage and assume that all of the training examples are labeled and available beforehand. But these assumptions are unrealistic for many applications where we have to deal with streaming videos. In these continuous streaming videos, as new activities are seen, they can be leveraged to improve the current activity recognition model. In this work, we aim to develop an incremental activity learning framework that is able to continuously update the activity models and learn new ones as more videos are seen. Our proposed approach leverages state-of-the-art machine learning tools, most notably active learning systems, and leads to the development of an online activity recognition framework for streaming videos. It does not require tedious manual labeling of every incoming example of each activity class. We perform rigorous experiments on challenging human activity datasets, which demonstrate the robustness of our incremental activity modeling framework.
Similar papers:
  • Complex Activity Recognition using Granger Constrained DBN (GCDBN) in Sports and Surveillance Video [pdf] - Eran Swears, Anthony Hoogs, Qiang Ji, Kim Boyer
  • From Stochastic Grammar to Bayes Network: Probabilistic Parsing of Complex Activity [pdf] - Nam Vo, Aaron Bobick
  • The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities [pdf] - Hilde Kuehne, Ali Arslan, Thomas Serre
  • Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf] - Ivan Lillo, Juan Carlos Niebles, Alvaro Soto
#1891 - Multi-Forest Tracker: A Chameleon in Tracking [pdf]
DAVID JOSEPH TAN, Slobodan Ilic

Abstract: In this paper, we address the problem of object tracking in intensity images and depth data. We propose a generic framework that can be used either for tracking 2D templates in RGB images or for tracking 3D objects in depth images. To overcome problems like occlusions, strong illumination changes and motion blur, which notoriously cause energy minimization-based tracking methods to get trapped in local minima, we propose a learning-based method that is robust to all these problems. We use random forests to learn the relation between the parameters that define the object's motion and the changes it induces on the image intensities or the point cloud of the template. It follows that, when the template moves, we use the change in the image intensities or point cloud to predict the parameters of this motion. This leads to extremely fast tracking, running at less than 2 ms per frame, and is robust to occlusions when tracking in intensity or depth images. Moreover, it demonstrates extreme robustness to strong illumination changes when tracking in intensity images, and high robustness when tracking 3D objects from arbitrary viewpoints, even in the presence of motion blur that causes missing or erroneous data in depth images. Exhaustive experimental evaluation and comparison to related approaches strongly demonstrate the benefits of our method.
Similar papers:
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Multi-Cue Visual Tracking Using Robust Feature-Level Fusion Based on Joint Sparse Representation [pdf] - Xiangyuan Lan, Pong C YUEN, Andy Jinhua Ma
  • Partial Occlusion Handling for Visual Tracking via Robust Part Matching [pdf] - Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
#1895 - Reliable Multi-view Stereopsis Evaluation [pdf]
Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola

Abstract: The seminal multiple view stereo benchmark evaluations from Middlebury and by Strecha et al. have played a major role in propelling the development of multi-view stereopsis methodology. Although influential, these benchmark datasets are limited in scope, with few reference scenes. Here, we take these works a step further by proposing a new multi-view stereo dataset, which is an order of magnitude larger in the number of scenes and significantly more diverse. Specifically, we propose a dataset containing 80 scenes of large variability. Each scene consists of 49 or 64 accurate camera positions and reference structured light scans, all acquired by a 6-axis industrial robot. To apply this dataset, we propose an extension of the evaluation protocol from the Middlebury evaluation, reflecting the more complex geometry of some of our scenes. The proposed dataset is used to evaluate the state-of-the-art multi-view stereo algorithms of Tola et al., Campbell et al. and Furukawa et al. We hereby demonstrate the usability of the dataset as well as gain insight into the workings and challenges of multi-view stereopsis. Through these experiments we empirically validate some of the central hypotheses of multi-view stereopsis, as well as determine and reaffirm some of its central challenges.
Similar papers:
  • Scattering Parameters and Surface Normals from Homogeneous Translucent Materials using Photometric Stereo [pdf] - Bo Dong, Kathleen Moore, Weiyi Zhang, Pieter Peers
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - DI XU, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
  • Timing-Based Local Descriptor for Dynamic Surfaces [pdf] - Tony Tung, Takashi Matsuyama
  • Probabilistic Labeling Cost for High-Accuracy Multi-view Reconstruction [pdf] - Ilya Kostrikov, Esther Horbert, Bastian Leibe
#1913 - Understanding Objects in Detail with Fine-grained Attributes [pdf]
Subhransu Maji, Iasonas Kokkinos, Stavros Tsogkas, Ross Girshick, Matthew Blaschko, Esa Rahtu, Juho Kannala, Andrea Vedaldi

Abstract: Each of 7,413 aeroplane instances is annotated with segmentations for five part types (bottom) and their modifiers (top). The internal variability of the data is significant, including modern large airliners, ancient biplanes and triplanes, jet planes, propeller planes, gliders, etc. For convenience, aeroplanes are divided into "typical" (planes with one wing, one fuselage, and one vertical stabilizer) and "atypical" (planes with more diverse structure); this subdivision can be used to define "easy" and "hard" subsets of the data. Several detailed modifiers are associated with parts. For example, the undercarriage wheel-group modifier specifies whether an undercarriage has one wheel on one axle, two wheels on one axle, and so on.
Similar papers:
  • Predicting User Annoyance Using Image Attributes [pdf] - Gordon Christie, Amar Parkash, Ujwal Krothapalli, Devi Parikh
  • Predicting Multiple Attributes via Relative Multi-task Learning [pdf] - Lin Chen, Qiang Zhang, Baoxin Li
  • Relative Parts: Distinctive Parts for Learning Relative Attributes [pdf] - Yashaswi Verma, Ramachandruni Sandeep, C.V. Jawahar
  • Dense Semantic Image Segmentation with Objects and Attributes [pdf] - Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip Torr
#1914 - Efficient Hierarchical Graph-Based Segmentation of RGBD Videos [pdf]
Steven Hickson, Irfan Essa, Henrik Christensen, Stan Birchfield

Abstract: We present an efficient and scalable algorithm for segmenting 3D RGBD point clouds by combining depth, color, and temporal information using a multistage, hierarchical graph-based approach. The algorithm processes a moving window over several point clouds to group similar regions over a graph, resulting in an initial over-segmentation. These regions are then merged to yield a dendrogram using agglomerative clustering via a minimum spanning tree algorithm. Bipartite graph matching at a given level of the hierarchical tree yields the final segmentation of the point clouds by maintaining region identities over arbitrarily long periods of time. We show that a multistage segmentation with depth then color yields better results than a linear combination of depth and color. Due to its incremental processing, our algorithm can process videos of arbitrary length in a streaming pipeline. The algorithm's ability to produce robust, efficient segmentation is demonstrated with numerous experimental results on challenging sequences from our own as well as public RGBD datasets.
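The agglomerative stage can be pictured as minimum-spanning-tree clustering over a region-adjacency graph, with a dendrogram level selected by an edge-weight cut. A toy sketch with SciPy (a single dissimilarity per edge is assumed here for brevity; the paper merges on depth first and then on color, rather than on one combined weight):

    import numpy as np
    from scipy.sparse import coo_matrix
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

    def mst_segments(n_regions, edges, weights, cut):
        """edges: (i, j) region pairs; weights: strictly positive dissimilarities.
        Builds the MST, keeps merges cheaper than `cut`, labels the components."""
        i, j = np.asarray(edges).T
        W = coo_matrix((weights, (i, j)), shape=(n_regions, n_regions))
        mst = minimum_spanning_tree(W).tocoo()
        keep = mst.data < cut                      # drop expensive merges
        pruned = coo_matrix((mst.data[keep], (mst.row[keep], mst.col[keep])),
                            shape=(n_regions, n_regions))
        return connected_components(pruned, directed=False)[1]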
Similar papers:
  • Object Partitioning using Local Convexity [pdf] - Simon Christoph Stein, Jeremie Papon, Markus Schoeler, Florentin Woergoetter
  • Fast Edge-Preserving PatchMatch for Large Displacement Optical Flow [pdf] - Linchao Bao, Qingxiong Yang, Hailin Jin
  • RGB-D Depth Map Enhancement with Depth and Motion in Complement [pdf] - Tak-Wai Hui, King-Ngi Ngan
  • Reconstructing Evolving Tree Structures in Time Lapse Sequences [pdf] - Przemysław Głowacki, Miguel Pinheiro, Raphael Sznitman, Engin Turetken, Daniel Lebrecht, Anthony Holtmaat, Jan Kybic, Pascal Fua
#1919 - Iterative Multilevel MRF Leveraging Context and Voxel Information for Brain Tumour Segmentation in MRI [pdf]
Nagesh Subbanna, Doina Precup, Tal Arbel

Abstract: In this paper, we introduce a fully automated multistage graphical probabilistic framework to segment brain tumours from multimodal Magnetic Resonance Images (MRIs) acquired from real patients. As a starting point, a Bayesian classification of the tumour is derived based on Gabor texture features, and subsequent computations are focused on areas of high tumour probability. An iterative, multistage Markov Random Field (MRF) framework is then devised to classify the various tumour subclasses (e.g. edema, tumour core, enhancing tumour and necrotic core). At the voxel level, an adapted MRF is devised based on both local observations and neighbouring class and intensity features. This leads to over-segmentation and numerous false positive tumour subclass regions. A higher-level MRF is then devised in order to leverage both contextual texture information and the relative spatial consistency of the tumour subclass positions. Here, each node represents a possible subclass region and the graphical model takes the form of an irregular lattice. The higher-level, regional information is then passed back down to the voxel-based MRF for further refinement, and the two stages iterate until convergence. Experiments are performed on publicly available patient brain tumour images from the MICCAI 2012 [BRATS 2012] and 2013 [BRATS 2013] Brain Tumour Segmentation Challenges and compared to the top performing techniques. The results demonstrate that the method achieves top performance.
Similar papers:
  • Tracking on the Product Manifold of Shape and Orientation for Tractography from Diffusion MRI [pdf] - Yuanxiang Wang, Hesamoddin Salehian, Guang Cheng, Baba Vemuri
  • Patch-based Evaluation of Image Segmentation [pdf] - Christian Ledig, Wenzhe Shi, Wenjia Bai, Daniel Rueckert
  • Object Partitioning using Local Convexity [pdf] - Simon Christoph Stein, Jeremie Papon, Markus Schoeler, Florentin Woergoetter
  • Discriminative Sparse Inverse Covariance Matrix: Application in Brain Functional Network Classification [pdf] - Luping Zhou, Lei Wang, Philip Ogunbona
#1933 - Stable Template-Based Isometric 3D Reconstruction in All Imaging Conditions by Linear Least-Squares [pdf]
Ajad Chhatkuli, Daniel Pizarro, Adrien Bartoli, Toby Collins

Abstract: It has been recently shown that reconstructing an isometric surface from a single input image matched to a 3D template was a well-posed problem. This however does not tell us how reconstruction algorithms will behave in practical conditions, where the amount of perspective is generally small and the projection thus behaves like weak-perspective or orthography. We here bring answers to what is theoretically recoverable in such imaging conditions, and explain why existing convex numerical solutions and analytical solutions to 3D reconstruction will become unstable. We then propose a new algorithm which works under all imaging conditions, from strong to loose perspective. We empirically show that the gain of stability is tremendous, bringing our results close to the iterative minimization of a statistically-optimal cost. Our algorithm has a low complexity, is simple and uses only one round of linear least-squares.
Similar papers:
  • Detecting Objects using Deformation Dictionaries [pdf] - Bharath Hariharan, Piotr Dollar, Larry Zitnick
  • A Novel Chamfer Template Matching Method Using Variational Mean Field [pdf] - Thanh Nguyen
  • A General and Simple Method for Camera Pose and Focal Length Determination [pdf] - Yinqiang Zheng, Shigeki Sugimoto, Imari Sato, Masatoshi Okutomi
  • Symmetry-Aware Isometric Matching of Incomplete 3D Surfaces [pdf] - Yusuke Yoshiyasu
#1934 - Zero-shot Event Detection using Multi-modal Fusion of Weakly Supervised Concepts [pdf]
Shuang Wu, Florian Luisier, Sravanthi Bondugula, Pradeep Natarajan

Abstract: Current state-of-the-art systems for visual content analysis require large training sets for each class of interest, and performance degrades rapidly with fewer examples. In this paper, we present a general framework for the zero-shot learning problem of performing high-level event detection with no training exemplars, using only textual descriptions. This task goes beyond the traditional zero-shot framework of adapting a given set of classes with training data to unseen classes. We leverage video and image collections with free-form text descriptions from widely available web sources to learn a large bank of concepts, in addition to using several off-the-shelf concept detectors, speech, and video text for representing videos. We utilize natural language processing technologies to generate event description features. The extracted features are then projected to a common high-dimensional space using text expansion, and similarity is computed in this space. We present extensive experimental results on the large TRECVID MED corpus to demonstrate our approach. Our results show that the proposed concept detection methods significantly outperform current attribute classifiers such as Classemes [31], ObjectBank [19], and SUN attributes [25]. Further, we find that fusion, both within as well as between modalities, is crucial for optimal performance.
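The core zero-shot scoring step amounts to embedding the textual event description and each video's concept-detector responses in a common space and ranking by cosine similarity. A toy sketch (the concept names, scores and vocabulary-as-common-space shortcut are illustrative assumptions, not the paper's full text-expansion pipeline):

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    concept_names = ["dog", "ball", "grass", "jumping", "person"]  # toy bank
    event_query = "a person playing with a dog jumping for a ball"

    # Videos represented by their concept-detector scores (one row each).
    video_scores = np.array([[0.9, 0.7, 0.5, 0.8, 0.9],
                             [0.0, 0.1, 0.2, 0.0, 0.9]])

    # Embed the query in the same concept space via TF-IDF over concept names.
    vec = TfidfVectorizer(vocabulary=concept_names)
    q = vec.fit_transform([event_query]).toarray()

    ranking = cosine_similarity(q, video_scores)[0].argsort()[::-1]
    print(ranking)   # videos ordered by zero-shot relevance to the query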
Similar papers:
  • Orientation Robust Textline Detection in Natural Images [pdf] - Le Kang, Yi Li
  • Visual Semantic Search: Retrieving Videos via Complex Textual Queries [pdf] - Dahua Lin, Sanja Fidler, Chen Kong, Raquel Urtasun
  • From Human-Annotated to Machine-Discovered Concepts using Consensus Regularization [pdf] - Afshin Dehghan, Haroon Idrees
  • Video Classification Based on Generalized Maximum Co-occurrence Cliques [pdf] - Amir Roshan Zamir, Shayan Modiri Assari
#1940 - Merging SVMs with Linear Discriminant Analysis: A Combined Model [pdf]
Symeon Nikitidis, Stefanos Zafeiriou, Maja Pantic

Abstract: A key problem often encountered by many learning algorithms in computer vision dealing with high-dimensional data is the so-called "curse of dimensionality", which arises when the available training samples are fewer than the input feature space dimensionality. To remedy this problem, we propose a joint dimensionality reduction and classification framework by formulating an optimization problem within the maximum margin class separation task. The proposed optimization problem is solved using alternating optimization, where we jointly compute the low-dimensional maximum margin projections and the separating hyperplanes in the projection subspace. Moreover, in order to reduce the computational cost of the developed optimization algorithm, we incorporate orthogonality constraints on the derived projection bases and show that the resulting combined model is an alternation between identifying the optimal separating hyperplanes and performing a linear discriminant analysis on the support vectors. Experiments on facial expression and object recognition validate the effectiveness of the proposed method against state-of-the-art dimensionality reduction algorithms.
Similar papers:
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • Learning Expressionlets on Spatio-Temporal Manifold for Dynamic Facial Expression Recognition [pdf] - Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
  • Facial Expression Recognition via a Boosted Deep Belief Network [pdf] - Ping Liu, Shizhong Han, Zibo Meng, Yan Tong
#1943 - Dual Linear Regression Based Classification for Face Cluster Recognition [pdf]
Liang Chen

Abstract: We are dealing with the face cluster recognition problem, where there are multiple images per subject in both gallery and probe sets, and no clear spatio-temporal relation among the multiple images of each subject is guaranteed. Considering that the image vectors of each subject, either in gallery or in probe, span a subspace, we develop an algorithm, Dual Linear Regression Classification (DLRC), for the face cluster recognition problem, in which the distance between two subspaces is defined as the similarity value between a gallery subject and a probe subject. DLRC attempts to find a "virtual" face image located in the intersection of the subspaces spanned by both clusters of face images. The "distance" between the "virtual" face images reconstructed from both subspaces is then taken as the distance between these two subspaces. We further prove that this distance can be formulated under a single linear regression model, where we can indeed find the "distance" without reconstructing the "virtual" face images. Extensive experimental evaluations demonstrate the effectiveness of the DLRC algorithm compared to other algorithms.
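The single-regression view can be sketched directly: searching for a gallery combination that matches a probe combination is one least-squares solve. A minimal sketch, fixing the first probe coefficient to 1 to rule out the trivial all-zero solution (a simplifying assumption; the paper derives its own constrained formulation):

    import numpy as np

    def dlrc_distance(G, P):
        """G: (d, m) gallery images as columns; P: (d, n) probe images.
        Solves min ||G a - P b|| with b[0] fixed to 1 (anti-trivial constraint)."""
        A = np.hstack([G, -P[:, 1:]])             # unknowns: a and b[1:]
        coef, *_ = np.linalg.lstsq(A, P[:, 0], rcond=None)
        a, b_rest = coef[:G.shape[1]], coef[G.shape[1]:]
        virtual_gallery = G @ a                   # "virtual" face in gallery span
        virtual_probe = P[:, 0] + P[:, 1:] @ b_rest
        return np.linalg.norm(virtual_gallery - virtual_probe)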
Similar papers:
  • Distance Encoded Product Quantization [pdf] - Jae-Pil Heo, Zhe Lin, Sung-eui Yoon
  • 3D-aided face recognition robust to expression and pose variations [pdf] - Baptiste Chu, Sami Romdhani, Liming Chen
  • Semi-Supervised Coupled Dictionary Learning for Person Re-identification [pdf] - Xiao Liu, Mingli Song, Dacheng Tao, Xingchen Zhou, Chun Chen, Jiajun Bu
  • Semi-supervised Spectral Clustering for Image Set Classification [pdf] - Arif Mahmood, Ajmal Mian, Robyn Owens
#1954 - Transitive Distance Clustering with K-Means Duality [pdf]
Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu

Abstract: We propose a very intuitive and simple approximation to conventional spectral clustering methods. It effectively alleviates the computational burden of spectral clustering, reducing the time complexity from O(n^3) to O(n^2), while achieving better performance in our experiments. Specifically, by using a more realistic and effective distance together with the "k-means duality" property, our algorithm can handle datasets with complex cluster shapes, multi-scale clusters and noise. We also show its superiority in a series of real applications, including digit clustering and image segmentation.
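The transitive distance referred to here is the minimax path distance, which can be computed exactly from the minimum spanning tree: processing MST edges in increasing order with a union-find, every pair first joined at weight w is at transitive distance w. A sketch under those assumptions (k-means or spectral clustering is then run on these distances):

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    def transitive_distances(W):
        """W: (n, n) symmetric matrix of strictly positive dissimilarities.
        Returns the minimax-path (transitive) distance between all pairs."""
        n = W.shape[0]
        mst = minimum_spanning_tree(W).tocoo()
        parent = list(range(n))
        members = {r: [r] for r in range(n)}      # points per component root
        D = np.zeros((n, n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for k in np.argsort(mst.data):            # Kruskal-style assembly
            i, j, w = int(mst.row[k]), int(mst.col[k]), mst.data[k]
            ri, rj = find(i), find(j)
            for a in members[ri]:                 # pairs first joined here...
                for b in members[rj]:
                    D[a, b] = D[b, a] = w         # ...get transitive distance w
            parent[rj] = ri
            members[ri] += members.pop(rj)
        return D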
Similar papers:
  • Decomposable Nonlocal Tensor Dictionary Learning for Multispectral Image Denoising [pdf] - Yi Peng, Deyu Meng, Zongben Xu, Biao Zhang, Chenqiang Gao, Yang Yi
  • Multi-feature Spectral Clustering with Minimax Optimization [pdf] - Hongxing Wang, Chaoqun Weng, Junsong Yuan
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
  • Semi-supervised Spectral Clustering for Image Set Classification [pdf] - Arif Mahmood, Ajmal Mian, Robyn Owens
#1958 - Learning Receptive Fields for Pooling from Tensors of Feature Response [pdf]
Can Xu, Nuno Vasconcelos

Abstract: A new method for learning pooling receptive fields for recognition is presented. The method exploits the statistics of the 3D tensor of SIFT responses to an image. It is argued that the eigentensors of this tensor contain the information necessary for learning class-specific pooling receptive fields. It is shown that this information can be extracted by a simple PCA analysis of a specific tensor flattening. A novel algorithm is then proposed for fitting box-like receptive fields to the eigenimages extracted from a collection of images. The resulting receptive fields can be combined with any of the recently popular coding strategies for image classification. This combination is experimentally shown to improve classification accuracy for both vector quantization and Fisher vector (FV) encodings. It is then shown that the combination of the FV encoding with the proposed receptive fields has state-of-the-art performance for both object recognition and scene classification. Finally, when compared with previous attempts at learning receptive fields for pooling, the method is simpler and achieves better results.
Similar papers:
  • Orientational Pyramid Matching for Recognizing Indoor Scenes [pdf] - Lingxi Xie, Jingdong Wang, Bo Zhang, Qi Tian
  • Bags of Spacetime Energies for Dynamic Scene Recognition [pdf] - Christoph Feichtenhofer, Axel Pinz, Richard Wildes
  • Generalized Max Pooling [pdf] - Naila Murray, Florent Perronnin
  • Ask the image: supervised pooling to preserve feature locality [pdf] - Sean Ryan Fanello, Nicoletta Noceti, Carlo Ciliberto, Giorgio Metta, Francesca Odone
#1962 - Generalized Pupil-Centric Imaging and Analytical Calibration for a Non-frontal Camera [pdf]
Avinash Kumar, Narendra Ahuja

Abstract: We consider the problem of calibrating a small field of view central perspective non-frontal camera whose lens and sensor may not lie on parallel planes due to manufacturing imperfections or intentional tilting. Generally, all lens-sensor configurations can be modeled as non-frontal to varying degrees. For modeling non-frontal sensors, approaches based on a generic rotation matrix (three Euler angles) relating lens and sensor lead to additional degrees of freedom which make the linear calibration equations under-determined. This problem is altogether avoided by a different decentering-distortion-based approach, which models the effect of non-frontalness on image formation. This model is approximate, can handle only small tilts and cannot estimate the tilt explicitly. Thus, it cannot be used to calibrate cameras where tilt is important, e.g. a tilt-shift camera. Also, calibrating a rotation-based non-frontal sensor in a pupil-centric setting has been shown to be more accurate in estimating sensor tilt than using a thin-lens setting. But prior work has developed pupil-centric imaging for a single-axis lens-sensor tilt, while real cameras have arbitrary tilt. In this paper, we focus on non-frontal calibration based on rotation modeling and first show that only two Euler angles are sufficient to parameterize sensor tilt. Second, we generalize pupil-centric imaging to an arbitrarily rotated sensor. Third, we propose to use a novel pupil-centric base
Similar papers:
  • Raw-to-raw: Mapping between image sensor color responses [pdf] - Rang Nguyen, Dilip Prasad, Michael Brown
  • Photometric Bundle Adjustment for Dense Multi-View 3D Modeling [pdf] - Amal Delaunoy, Marc Pollefeys
  • Two-View Camera Calibration for Multi-Layer Flat Refractive Interface [pdf] - Xida Chen, Yee Hong Yang
  • Simultaneous Localization and Calibration [pdf] - Qian-Yi Zhou, Vladlen Koltun
#1963 - Confidence-Rated Multiple Instance Boosting for Object Detection [pdf]
Karim Ali, Kate Saenko

Abstract: Over the past years, Multiple Instance Learning (MIL) has proven to be an effective framework for learning with weakly labeled data. Applications of MIL to object detection, however, were limited to handling the uncertainties of manual annotations. In this paper, we propose a new MIL method for object detection that is capable of handling noisier automatically obtained annotations. Our approach consists in first obtaining confidence estimates over the label space and second incorporating these estimates within a new Boosting procedure. We demonstrate the efficiency of our procedure on two detection tasks, namely horse detection and pedestrian detection, where the training data is primarily annotated by a coarse area of interest detector and show substantial improvements over existing MIL methods. In both cases, we demonstrate that an efficient appearance model can be learned using our approach.
Similar papers:
  • Beta Process Multiple Kernel Learning [pdf] - Bingbing Ni, Pierre Moulin
  • Multi-fold MIL Training for Weakly Supervised Object Localization [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
  • MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation [pdf] - Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Zhuowen Tu
  • Informed Haar-like Features Improve Pedestrian Detection [pdf] - Shanshan Zhang, Christian Bauckhage, Armin Cremers
#1964 - 6 Seconds of Sound and Vision: Creativity in Micro-Videos [pdf]
Miriam Redi, Michele Trevisiol, Rossano Schifanella, Neil O'Hare, Alejandro Jaimes

Abstract: The general notion of creativity, as opposed to related concepts such as beauty or interestingness, has not been studied from the perspective of automatic analysis of multimedia content. Meanwhile, short online videos shared on social media platforms, or micro-videos, have arisen as a new medium for creative expression. In this paper we study creative micro-videos in an effort to understand the features that make a video creative, and to address the problem of automatic detection of creative content. Defining creative videos as those that are novel and have aesthetic value, we conduct a crowdsourcing experiment to create a dataset of 4,000 micro-videos labelled as creative and non-creative. We propose a set of computational features that we map to the components of our definition of creativity, and conduct an analysis to determine which of these features correlate most with creative videos. Finally, we evaluate a supervised approach to automatically detect creative videos, with promising results, showing that it is necessary to model both aesthetic value and novelty to achieve optimal classification accuracy.
Similar papers:
  • Jointly Summarizing Large-Scale Web Images and Videos for the Storyline Reconstruction [pdf] - Gunhee Kim, Leonid Sigal, Eric Xing
  • Visual Persuasion: Inferring the Communicative Intents of Images [pdf] - Jungseock Joo, Weixin Li, Francis Steen, Song Chun Zhu
  • Zero-shot Event Detection using Multi-modal Fusion of Weakly Supervised Concepts [pdf] - Shuang Wu, Florian Luisier, Sravanthi Bondugula, Pradeep Natarajan
  • Seeing the Arrow of Time [pdf] - Lyndsey Pickup, Zheng Pan, Donglai Wei, Yichang Shih, Andrew Zisserman, Bill Freeman, Bernhard Schoelkopf
#1972 - RAPS: Robust and Efficient Automatic Construction of Person-Specific Deformable Models [pdf]
Christos Sagonas, Stefanos Zafeiriou, Yannis Panagakis, Maja Pantic

Abstract: Construction of Facial Deformable Models (FDMs) is a very active research field in Computer Vision, mainly due to their numerous applications and to the very challenging nature of the problem itself: the face is a highly deformable object whose appearance drastically changes under different poses, expressions and illuminations. Several methodologies have recently appeared for constructing generic FDMs that can be robustly fitted to static images, mainly for facial landmark localization. However, for tasks that require very high accuracy, for example behaviour analysis or facial motion capture, person-specific FDMs are mainly applied, requiring manual facial landmark annotation for each person and person-specific training. Recently, due to advancements in automatic subspace recovery and image congealing, it has become possible to learn a person-specific model by applying image congealing methodologies to a set of images of the person. Unfortunately, these methodologies involve time-consuming optimization procedures requiring eigendecompositions of high-dimensional matrices. In this paper, by using a generic texture model, we show that it is not only possible to reduce the computational complexity but also to increase landmark localization accuracy. Finally, we show that the proposed method is not only faster but also robust to gross non-Gaussian noise compared to state-of-the-art methods.
Similar papers:
  • Automatic Construction of Deformable Models In-The-Wild [pdf] - Epameinondas Antonakos, Stefanos Zafeiriou
  • A Hierarchical Probabilistic Model for Facial Feature Detection [pdf] - Yue Wu, Ziheng Wang, Qiang Ji
  • Unified Face Analysis by Iterative Multi-Output Random Forests [pdf] - Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
  • Using a deformation field model for localizing faces and facial points under weak supervision [pdf] - Marco Pedersoli, Tinne Tuytelaars, Luc Van Gool
#1982 - Persistence-based Object Recognition [pdf]
Frederic Chazal, Maksim Ovsjanikov, Chunyuan Li

Abstract: This paper presents a framework for object recognition using topological persistence. In particular, we show that the so-called persistence diagrams built from functions defined on the objects can serve as compact and informative descriptors for images and shapes. Complementary to the bag-of-features representation, which captures the distribution of values of a given function, persistence diagrams can be used to characterize its structural properties, reflecting spatial information in an invariant way. In practice, the choice of function is simple: each dimension of the feature vector can be viewed as a function. The proposed method is general: it can work on various multimedia data, including 2D shapes, textures and triangle meshes. Extensive experiments on 3D shape retrieval, hand gesture recognition and texture classification demonstrate the performance of the proposed method in comparison with state-of-the-art methods. Additionally, our approach yields higher recognition accuracy when used in conjunction with the bag-of-features.
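For intuition, the 0-dimensional persistence diagram of a function on a graph can be computed with a union-find sweep: a component is born at its local minimum and dies when it merges into an older component (the elder rule). A self-contained sketch of this special case (higher-dimensional diagrams require a full persistent-homology library; zero-length pairs are typically discarded):

    def find(parent, x):
        """Union-find root lookup with path compression."""
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def persistence0(values, edges):
        """values[i]: function value at vertex i; edges: (i, j) pairs.
        Returns (birth, death) pairs of 0-dim sublevel-set persistence;
        the component of the global minimum never dies and is omitted."""
        order = sorted(range(len(values)), key=lambda v: values[v])
        parent, birth, pairs = {}, {}, []
        for v in order:                              # sweep vertices by value
            parent[v] = v
            birth[v] = values[v]                     # a new component is born
            for a, b in edges:
                n = b if a == v else a if b == v else None
                if n is None or n not in parent:
                    continue                         # neighbour not alive yet
                rn, rv = find(parent, n), find(parent, v)
                if rn != rv:                         # elder rule: younger dies
                    young, old = (rn, rv) if birth[rn] > birth[rv] else (rv, rn)
                    pairs.append((birth[young], values[v]))
                    parent[young] = old
        return pairs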
Similar papers:
  • The Synthesizability of texture examples [pdf] - Dengxin Dai, Hayko Riemenschneider, Luc Van Gool
  • Lacunarity Analysis on Image Patterns for Texture Classification [pdf] - Yuhui Quan, Yong Xu, Yuping Sun, Yu Luo
  • Stable and Informative Spectral Signatures for Graph Matching [pdf] - Nan Hu, Raif Rustamov, Leonidas J. Guibas
  • Fast and robust identification of persistent homotopy types of noisy images [pdf] - Vitaliy Kurlin
#1983 - Region-based particle filter for video object segmentation [pdf]
David Varas, Ferran Marques

Abstract: We present a video object segmentation approach that extends the particle filter to a region-based image representation. Image partition is considered part of the particle filter measurement, which enriches the available information and leads to a re-formulation of the particle filter. The prediction step uses a co-clustering between the previous image object partition and a partition of the current one, which allows us to tackle the evolution of non-rigid structures. Particles are defined as unions of regions in the current image partition and their propagation is computed through a single co-clustering. The proposed technique is assessed quantitatively on the SegTrack dataset and qualitatively on the LabelMe Video dataset, leading to satisfactory perceptual results outperforming state-of-the-art methods.
Similar papers:
  • Visual Tracking via Probability Continuous Outlier Model [pdf] - Dong Wang, Huchuan Lu
  • Scalable 3D Tracking of Multiple Interacting Objects [pdf] - Nikolaos Kyriazis, Antonis Argyros
  • Non-Parametric Bayesian Constrained Local Models [pdf] - Pedro Martins, Rui Caseiro, Jorge Batista
  • Diversity-Enhanced Condensation Algorithm and Its Application for Robust and Accurate Endoscope Electromagnetic Tracking [pdf] - Ying Wan, Xiongbiao Luo, Sean He, Jie Yang, Terry Peters, Kensaku Mori
#1997 - Fast and robust identification of persistent homotopy types of noisy images [pdf]
Vitaliy Kurlin

Abstract: We present a fast algorithm to identify the topological shape (homotopy type) of a noisy dotted image in the plane. The algorithm has O(n log n) time and O(n) space in the number n of points in a given image. The only input is a point cloud. The output is the number of all non-trivial loops that persist (have a long life span) when the image is analyzed at all possible scales. We give theoretical guarantees when the algorithm correctly identifies the homotopy type by using only a noisy sample of a triangulable set.
Similar papers:
  • Fast Approximate Inference in Higher Order MRF-MAP Labeling Problems [pdf] - Chetan Arora, S.N. Maheshwari, Subhashis Banerjee, Prem Kalra
  • Reconstructing Evolving Tree Structures in Time Lapse Sequences [pdf] - Przemysław Głowacki, Miguel Pinheiro, Raphael Sznitman, Engin Turetken, Daniel Lebrecht, Anthony Holtmaat, Jan Kybic, Pascal Fua
  • Separation of Line Drawings Based on Split Faces for 3D Object Reconstruction [pdf] - Changqing Zou
  • Persistence-based Object Recognition [pdf] - Frederic Chazal, Maksim Ovsjanikov, Chunyuan Li
#1999 - Learning optimal features for salient object detection [pdf]
Song Lu, Vijay Mahadevan, Nuno Vasconcelos

Abstract: We introduce a novel approach for salient object detection. The approach starts by partitioning an image into superpixels, and computing two types of features for each superpixel. One is the bottom-up saliency of the superpixel region, and the other is a set of "objectness" features that are informative of how likely the superpixel is to be part of an object. A graph is then formed with the superpixels as nodes, and edge weights representing a measure of similarity between two superpixels. Starting from an arbitrary initialization, the saliency information is propagated over the graph using a random walk process, whose equilibrium state yields the object saliency map. Unlike other graph based salient object detection approaches, we learn the initial salient seed locations using a large margin framework. We show that the proposed approach outperforms the state of the art on a number of salient object detection datasets.
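The propagation step is essentially a personalized-PageRank-style random walk over the superpixel graph, iterated to equilibrium. A minimal sketch, assuming the affinity matrix and seed vector are given (the paper learns the seed locations with a large-margin framework):

    import numpy as np

    def propagate_saliency(W, seed, alpha=0.85, n_iter=100):
        """W: (n, n) nonnegative superpixel affinities with nonzero rows;
        seed: (n,) initial saliency. Returns the random-walk equilibrium."""
        P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transitions
        s = seed / seed.sum()
        for _ in range(n_iter):
            s = alpha * P.T @ s + (1 - alpha) * seed
        return s / s.max()                        # normalized saliency map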
Similar papers:
  • Time-Mapping Using Space-Time Saliency [pdf] - Feng Zhou, Sing Bing Kang, Michael Cohen
  • A Reverse Hierarchy Model for Predicting Eye Fixations [pdf] - Tianlin Shi, Xiaolin Hu, Ming Liang
  • Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [pdf] - Eleonora Vig, Michael Dorr, David Cox
  • Salient Region Detection via High-Dimensional Color Transform [pdf] - Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
#2002 - Fast and Reliable Two-View Translation Estimation [pdf]
Johan Fredriksson, Olof Enqvist, Fredrik Kahl

Abstract: It has long been recognized that one of the fundamental difficulties in the estimation of two-view epipolar geometry is the capability of handling outliers. In this paper, we develop a fast and tractable algorithm that maximizes the number of inliers under the assumption of a purely translating camera. Compared to classical random sampling methods, our approach is guaranteed to compute the optimal solution of a cost function based on reprojection errors, and it has better time complexity. The performance is in fact independent of the inlier/outlier ratio of the data. This opens up a more reliable approach to robust ego-motion estimation. Our basic translation estimator can be embedded into a system that computes the full camera rotation. We demonstrate the applicability in several difficult settings with large amounts of outliers. It turns out to be particularly tractable for small rotations and rotations around one axis (which is the case for cellular phones, where the gravitation axis can be measured). Experimental results show that, compared to standard RANSAC methods based on minimal solvers, our algorithm produces more accurate estimates in the presence of large outlier ratios.
Similar papers:
  • Fast Rotation Search with Stereographic Projections for 3D Registration [pdf] - Alvaro Parra Bustos, Tat-Jun Chin, David Suter
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
  • Accurate Localization and Pose Estimation for Large 3D Models [pdf] - Linus Svärm, Olof Enqvist, Magnus Oskarsson, Fredrik Kahl
  • Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection [pdf] - Luis Ferraz, Xavier Binefa, Francesc Moreno-Noguer
#2003 - Single Image Super-resolution using Deformable Patches [pdf]
Yu Zhu, Yanning Zhang, Alan Yuille

Abstract: We propose a deformable patch-based method for single image super-resolution. Under the concept of deformation, a patch is regarded not as a fixed vector but as a flexible deformation flow. Via deformable patches, the dictionary can cover patterns that do not explicitly appear in it, thus becoming more expressive. We present an energy function with slow, smooth and flexible priors for the deformation model. During example-based super-resolution, we develop a deformation similarity based on the minimized energy function for basic patch matching. For robustness, we combine multiple deformed patches for the final reconstruction. Experiments evaluate the deformation effectiveness and super-resolution visual quality, showing that deformable patches help improve representation accuracy and produce better results than state-of-the-art methods.
Similar papers:
  • Detecting Objects using Deformation Dictionaries [pdf] - Bharath Hariharan, Piotr Dollar, Larry Zitnick
  • Learning Mid-level Filters for Person Re-identification [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
  • Deformable Object Matching via Deformation Decomposition based 2D Label MRF [pdf] - Kangwei Liu, zhang Junge, Kaiqi Huang, Tieniu Tan
  • Modeling Image Patches with a Generic Dictionary of Mini-Epitomes [pdf] - George Papandreou, Liang-Chieh Chen, Alan Yuille
#2005 - Learning to Group Objects [pdf]
Victoria Yanulevskaya, Jasper Uijlings, Nicu Sebe

Abstract: This paper presents a novel method to generate a hypothesis set of class-independent object regions. It has been shown that such object regions can be used to focus computer vision techniques on the parts of an image that matter most, leading to significant improvements in both object localisation and semantic segmentation in recent years. Of course, the higher the quality of the class-independent object regions, the better subsequent computer vision algorithms can perform. In this paper we focus on generating higher-quality object hypotheses. We start from an oversegmentation, for which we propose to extract a wide variety of region features. We group regions together in a hierarchical fashion, for which we train a Random Forest that predicts the best possible merge at each stage of the hierarchy. Hence, unlike other approaches, we use relatively powerful features and classifiers at an early stage of the generation of likely object regions. Finally, we identify and combine stable regions in order to capture objects that consist of dissimilar parts. We show on the PASCAL 2007 and 2012 datasets that our method yields higher-quality regions than competing approaches while at the same time being more computationally efficient.
Similar papers:
  • Structured Output Random Forests for Accurate Object Detection [pdf] - Samuel Schulter, Christian Leistner, Peter Roth, Horst Bischof
  • Discrete-Continuous Gradient Orientation Estimation for Faster Unsupervised Segmentation [pdf] - Michael Donoser, Dieter Schmalstieg
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
  • Submodular Object Recognition [pdf] - Fan Zhu, Zhuolin Jiang, Ling Shao
#2010 - Object Discovery and Segmentation via Discriminative Visual Subcategories [pdf]
Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta

Abstract: In this paper, we propose a simple yet surprisingly powerful approach that combines the strength of generative modeling for segmentation with the effectiveness of discriminative models for detection, yielding an algorithm that can discover objects and their segmentations from noisy Internet images. The key idea behind our approach is to learn and exploit top-down priors for joint segmentation. Unlike previous approaches, which build a single prior model for each semantic class, our approach develops prior models for visually homogeneous clusters called visual subcategories. Our approach jointly discovers these visual subcategories and learns a segmentation prior model for each subcategory. The strong priors learned from these visual subcategories are then combined with discriminatively trained detectors and bottom-up cues to produce clean object segmentations. Our experimental results indicate state-of-the-art performance on the difficult dataset introduced by [34].
Similar papers:
  • Multi-fold MIL Training for Weakly Supervised Object Localization [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
  • Transitive Distance Clustering with K-Means Duality [pdf] - Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu
  • Semi-supervised Spectral Clustering for Image Set Classification [pdf] - Arif Mahmood, Ajmal Mian, Robyn Owens
  • Modeling long-tail distributions of object subcategories [pdf] - Xiangxin Zhu, Dragomir Anguelov, Deva Ramanan
#2017 - Better Shading for Better Shape Recovery [pdf]
Moumen El-Melegy

Abstract: The basic idea of shape from shading is to infer the shape of a surface from its shading information in a single image. Since this problem is ill-posed, a number of simplifying assumptions have often been used; however, they rarely hold in practice. This paper presents a simple shading-correction algorithm that transforms the image into a new image that better satisfies the assumptions typically needed by existing algorithms, thus improving the accuracy of shape recovery. The algorithm takes advantage of some local shading measures that have been derived under these assumptions. The method is successfully evaluated on real data with ground-truth 3D shapes.
Similar papers:
  • Reliable Multi-view Stereopsis Evaluation [pdf] - Anders Dahl, Henrik Aanæs, Rasmus Jensen, George Vogiatzis, Engin Tola
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
  • Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo [pdf] - DI XU, Qi Duan, Jianmin Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
  • Exploiting Shading Cues in Kinect IR Images for Geometry Refinement [pdf] - Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon
#2022 - Human Pose Estimation: New Benchmark and State of the Art Analysis [pdf]
Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele

Abstract: Human pose estimation has made significant progress during the last years. However, current benchmark datasets are limited in their coverage of the overall pose estimation challenges; still, they serve as the common sources to evaluate, train and compare different models on. In this paper we introduce a novel benchmark dataset that makes a significant advance in terms of diversity and difficulty, a contribution that we feel is required for future developments in human body models. This comprehensive dataset was collected using an established taxonomy of over 600 human activities. The collected images cover a wider range of human poses than previous datasets, including sports, recreational and household activities, and also include special cases such as frontal views and strongly articulated people. We provide a rich set of labels including positions of body joints, full 3D torso and head orientation, and occlusion labels for joints and body parts, along with activity labels. For each image we provide adjacent video frames to facilitate the use of motion information. Given these rich annotations, we perform a detailed analysis of leading human pose estimation approaches and provide insights into the successes and failures of these methods.
Similar papers:
  • Multi-source Deep Learning for Human Pose Estimation [pdf] - Wanli Ouyang, Xiaogang Wang, Xiao Chu
  • Mixing Body-Part Sequences for Human Pose Estimation [pdf] - Anoop Cherian, Julien Mairal, Karteek Alahari, Cordelia Schmid
  • Discriminative Hierarchical Modeling of Spatio-Temporally Composable Human Activities [pdf] - Ivan Lillo, Juan Carlos Niebles, Alvaro Soto
  • Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts [pdf] - Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Nam-Gyu Cho, Sanja Fidler, Raquel Urtasun, Alan Yuille
#2027 - Automatic Construction of Deformable Models In-The-Wild [pdf]
Epameinondas Antonakos, Stefanos Zafeiriou

Abstract: Deformable objects are everywhere: faces, cars, bicycles, chairs, etc. Recently, there has been a wealth of research on training deformable models for object detection, part localization and recognition using annotated data. In order to train deformable models with good generalization ability, a large amount of carefully annotated data is required, which is a highly time-consuming and costly task. We propose the first - to the best of our knowledge - method for automatic construction of deformable models using images captured in totally unconstrained conditions, recently referred to as in-the-wild. The only requirements of the method are a crude bounding box object detector and a priori knowledge of the object's shape (e.g. a point distribution model). The object detector can be as simple as the Viola-Jones algorithm (e.g. even the cheapest digital camera features a robust face detector). The 2D shape model can be created simply by deforming a 3D CAD model of the object and projecting it to the camera plane. In our experiments on facial deformable models, we show that the proposed automatically built model not only performs well, but also outperforms discriminative models trained on carefully annotated data. To the best of our knowledge, this is the first time it is shown that an automatically constructed model can perform as well as methods trained directly on annotated data.
Similar papers:
  • Incremental Face Alignment in the Wild [pdf] - Akshay Asthana, Stefanos Zafeiriou, Shiyang Cheng, Maja Pantic
  • Probabilistic Active Appearance Models [pdf] - Joan Alabort-i-Medina, Stefanos Zafeiriou
  • Gauss-Newton Constrained Local Models [pdf] - GEORGIOS TZIMIROPOULOS, Maja Pantic
  • RAPS: Robust and Efficient Automatic Construction of Person-Specific Deformable Models [pdf] - Christos Sagonas, Stefanos Zafeiriou, Yannis Panagakis, Maja Pantic
#2039 - Scalable Object Detection using Deep Neural Networks [pdf]
Dumitru Erhan, Christian Szegedy, Alexander Toshev, Dragomir Anguelov

Abstract: Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bounding box and a confidence score for each object category in the image. Such a model captures the whole-image context around the objects but cannot handle multiple instances of the same object in the image without naively replicating the number of outputs for each instance. In this work, we propose a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest. The model naturally handles a variable number of instances for each class and allows for cross-class generalization at the highest levels of the network. We are able to obtain competitive recognition performance on VOC2007 and ILSVRC2012, while using only the top few predicted locations in each image and a small number of neural network evaluations.
Similar papers:
  • Action localization by tubelets from motion [pdf] - Mihir Jain, Jan Van Gemert, Herve Jegou, Patrick Bouthemy, Cees Snoek
  • Structured Output Random Forests for Accurate Object Detection [pdf] - Samuel Schulter, Christian Leistner, Peter Roth, Horst Bischof
  • Multiple Structured-Instance Learning for Semantic Segmentation with Uncertain Training Data [pdf] - Feng-Ju Chang, Yen-Yu Lin, Kuang-Jui Hsu
  • Co-localization in Real-World Images [pdf] - Kevin Tang, Armand Joulin, Li-Jia Li, Li Fei-Fei
#2040 - The Photometry of Intrinsic Images [pdf]
Marc Serra, Robert Benavente, Maria Vanrell, Dimitris Samaras, Olivier Penacchio

Abstract: Intrinsic characterization of scenes is often the best way to overcome the illumination variability artifacts that complicate most computer vision problems, from 3D reconstruction to object or material recognition. This paper examines the deficiency of existing intrinsic image models in accurately accounting for the effects of illuminant color and sensor characteristics in the estimation of intrinsic images, and presents a generic framework which incorporates insights from color constancy research into the intrinsic image decomposition problem. The proposed mathematical formulation includes information about the color of the illuminant and the effects of the camera sensors, both of which modify the observed color of the reflectance of the objects in the scene during the acquisition process. By modeling these effects, we get a "truly intrinsic" reflectance image, which we call absolute reflectance, that is invariant to changes of illuminant or camera sensors. This model allows us to represent a wide range of intrinsic image decompositions depending on the specific assumptions about the geometric properties of the scene configuration and the spectral properties of the light source and the acquisition system, thus unifying previous models in a single general framework. We demonstrate that even partial information about sensors significantly improves the estimated reflectance images, making our method applicable to a wide range of sensors. We validate our general intrinsic image
Similar papers:
  • Aliasing Detection and Reduction in Plenoptic Imaging [pdf] - Zhaolin Xiao, Qing Wang, Jingyi Yu, Guoqing Zhou
  • Deblurring Low-light Images with Light Streaks [pdf] - Zhe Hu, Sunghyun Cho, Jue Wang, Ming-Hsuan Yang
  • Better Shading for Better Shape Recovery [pdf] - Moumen El-Melegy
  • Calibrating a non-isotropic near point light source using a plane [pdf] - Jaesik Park, Sudipta Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon
#2047 - Scalable Multitask Representation Learning for Scene Classification [pdf]
Maksim Lapin, Matthias Hein, Bernt Schiele

Abstract: The basic idea of multitask learning is that learning tasks jointly is better than learning each task individually. In particular, if only a few training samples are available for each task, sharing a jointly trained representation with related tasks helps to improve performance. In this paper we propose a novel multitask learning method which jointly learns a low-dimensional representation and the corresponding classifiers thus profiting from inter-class relations. Our method scales with respect to the original dimension of the features and thus can be used for very high-dimensional feature representations such as the Fisher Vector. Our multitask learning approach outperforms the current state of the art on the SUN397 scene classification benchmark consistently for varying numbers of training samples.
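One standard way to realize such joint learning is to alternate between the shared low-dimensional projection and the per-task predictors, each update being a regularized least-squares solve. A toy sketch of that alternation (squared loss, closed-form updates and toy-sized dimensions are simplifying assumptions, not the paper's objective or solver):

    import numpy as np

    def multitask_fit(X, Y, k=5, lam=1e-2, n_iter=20):
        """X: (n, d) features; Y: (n, T), one target column per task.
        Alternates a shared projection U (d, k) with task weights W (k, T).
        The vec-trick solve below is only practical for small d and k."""
        n, d = X.shape
        T = Y.shape[1]
        U = np.random.default_rng(0).standard_normal((d, k)) * 0.01
        G = X.T @ X
        for _ in range(n_iter):
            Z = X @ U                                    # shared representation
            # Task step: ridge regression per task in the k-dim space.
            W = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ Y)
            # Representation step: solve for vec(U) via the Kronecker identity
            # vec(G U w w^T) = kron(G, w w^T) vec(U) (row-major flattening).
            A = sum(np.kron(G, np.outer(W[:, t], W[:, t])) for t in range(T))
            b = sum(np.kron(X.T @ Y[:, t], W[:, t]) for t in range(T))
            U = np.linalg.solve(A + lam * np.eye(d * k), b).reshape(d, k)
        return U, W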
Similar papers:
  • Product Sparse Coding [pdf] - Tiezheng Ge, Kaiming He, Jian Sun
  • A Principled Approach for Coarse-to-Fine MAP Inference [pdf] - Christopher Zach
  • Fast and Robust Archetypal Analysis for Representation Learning [pdf] - Yuansi Chen, Julien Mairal, Zaid Harchaoui
  • Deep Fisher Kernels [pdf] - Mayu Sakurada, Vladyslav Sydorov , Christoph Lampert
#2051 - Incorporating Scene Context and Object Layout into Appearance Modeling [pdf]
Hamid Izadinia, Fereshteh Sadeghi, Ali Farhadi

Abstract: A scene category imposes tight distributions over the kinds of objects that might appear in the scene, the appearance of those objects and their layout. In this paper, we propose a method to learn scene structures that can encode three main interlacing components of a scene: the scene category, the context-specific appearance of objects, and their layout. Our experimental evaluations show that our learned scene structures outperform the state-of-the-art Deformable Part Models method at detecting objects in a scene. Our scene structure provides a level of scene understanding that is amenable to deep inferences, such as intelligent predictions about a covered part of an image. The scene structures can also generate features that can later be used for scene categorization, on which we also show promising results.
Similar papers:
  • Are Cars Just 3D Boxes? - Jointly Estimating the 3D Shape of Multiple Objects [pdf] - Muhammad Zeeshan Zia, Michael Stark, Konrad Schindler
  • The Role of Context for Object Detection and Semantic Segmentation in the Wild [pdf] - Roozbeh Mottaghi, Xianjie Chen, Xiaobai Liu, Sanja Fidler, Raquel Urtasun, Alan Yuille
  • When 3D Reconstruction Meets Ubiquitous RGB-D Images [pdf] - Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki
  • Orientational Pyramid Matching for Recognizing Indoor Scenes [pdf] - Lingxi Xie, Jingdong Wang, Bo Zhang, Qi Tian
#2052 - Instance-weighted Transfer Learning of Active Appearance Models [pdf]
Daniel Haase, Erik Rodner, Joachim Denzler

Abstract: There has been a lot of work on face modeling, analysis, and landmark detection, with Active Appearance Models being one of the most successful techniques. A major drawback of these models is the large number of detailed annotated training examples needed for learning. Therefore, we present a transfer learning method that is able to learn from related training data using an instance-level transfer technique. Our method is derived using a generalization of importance sampling and, in contrast to previous work, we explicitly tackle the transfer already during learning instead of adapting the fitting process. In our studied application of face landmark detection, we efficiently transfer facial expressions from other human individuals and are thus able to learn a precise face active appearance model from only the neutral faces of a single individual. Our approach is evaluated on two common face datasets and outperforms previous transfer methods.
Similar papers:
  • Automatic Face Reenactment [pdf] - Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormaehlen, Patrick Perez, Christian Theobalt
  • Nonparametric Part Transfer for Fine-grained Recognition [pdf] - Christoph Göring, Erik Rodner, Alexander Freytag, Joachim Denzler
  • Learning to Learn, from Transfer Learning to Domain Adaptation: A Unifying Perspective [pdf] - Novi Patricia, Barbara Caputo
  • Color Transfer using Probabilistic Moving Least Squares [pdf] - Youngbae Hwang, Joon-Young Lee, In So Kweon, Seon Joo Kim
#2064 - FastSeg: More Efficiency on Multiple Figure-Ground Segmentations [pdf]
Ahmad Humayun, Fuxin Li, James Rehg

Abstract: Recently, figure-ground segmentation algorithms that generate a pool of overlapping segment proposals have become popular. These algorithms have high recall on most objects in a scene and can be used to generate boundary-aligned proposals for subsequent object recognition engines, achieving excellent performance. What has remained unexplored is the idea of obtaining such a hypothesis pool in a computationally efficient way. By precomputing a graph which can be used for parametric min-cut over different seed enumerations, we save the time spent on generating the segment pool. In addition, we make design choices that avoid extensive computation and achieve better efficiency without losing performance. In particular, we show that the segmentation performance of our algorithm is similar to the state-of-the-art on the PASCAL VOC dataset, while being an order of magnitude faster.
Similar papers:
  • Submodular Object Recognition [pdf] - Fan Zhu, Zhuolin Jiang, Ling Shao
  • Generating object segmentation proposals using global and local search [pdf] - Pekka Rantalankila, Juho Kannala, Esa Rahtu
  • Reconstructing Evolving Tree Structures in Time Lapse Sequences [pdf] - Przemysaw Gowacki, Miguel Pinheiro, Raphael Sznitman , Engin Turetken, Daniel Lebrecht, Anthony Holtmaat, Jan Kybic, Pascal Fua
  • Co-Segmentation of Textured 3D Shapes with Sparse Annotations [pdf] - Mehmet Yumer, Won Chun, Ameesh Makadia
#2072 - Random Laplace Feature Maps for Semigroup Kernels on Histograms [pdf]
Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron, Michael Mahoney

Abstract: To dramatically accelerate the training and testing complexity of nonlinear kernel methods, several recent papers have proposed explicit embeddings of the input data into low-dimensional feature spaces where fast linear methods can instead be used to generate approximate solutions. Analogous to random Fourier feature maps for approximating shift-invariant kernels on R^d, such as the Gaussian kernel, we develop a new randomized technique, called random Laplace features, to approximate a family of kernel functions adapted to the semigroup structure of R_+^d. This is the space in which histograms and other non-negative data representations reside. We provide theoretical results on the uniform convergence of random Laplace features. Empirical analyses on image classification and surveillance event detection tasks demonstrate the attractiveness of using random Laplace features relative to several other feature maps proposed in the literature.
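The construction mirrors random Fourier features: a semigroup kernel on R_+^d can be written as an expectation of products of decaying exponentials, k(x, y) = E_w[exp(-<w, x>) exp(-<w, y>)], so sampling w and evaluating these exponentials yields an explicit feature map. A sketch, with a standard exponential sampling measure chosen purely for illustration:

    import numpy as np

    def random_laplace_features(X, D=256, seed=0):
        """X: (n, d) nonnegative data such as histograms. Returns (n, D)
        features whose inner products approximate a semigroup kernel."""
        rng = np.random.default_rng(seed)
        omega = rng.exponential(scale=1.0, size=(D, X.shape[1]))  # sampling measure
        return np.exp(-X @ omega.T) / np.sqrt(D)

    # Phi = random_laplace_features(X); Phi @ Phi.T then approximates
    # k(x, y) = E_w[exp(-<w, x + y>)], the Laplace transform of the measure.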
Similar papers:
  • Asymmetric sparse kernel approximations for large-scale visual search [pdf] - Damek Davis, Stefano Soatto, Jonathan Balzer
  • Human Action Recognition Based on Context-Dependent Graph Kernels [pdf] - Baoxin Wu, Chunfeng Yuan, Weiming Hu
  • Simultaneous Twin Kernel Learning for Structured Prediction [pdf] - Chetan Tonde, Ahmed Elgammal
  • Spectral Clustering with Jensen-type kernels and their multi-point extensions [pdf] - Debarghya Ghoshdastidar, Ambedkar Dukkipati, Ajay Adsul, Aparna Vijayan
#2080 - Laplacian Coordinates for Seeded Image Segmentation [pdf]
Wallace Casaca, Gustavo Nonato, Gabriel Taubin

Abstract: Seed-based image segmentation methods have gained much attention lately, mainly due to their good performance in segmenting complex images with little user interaction. Such popularity has leveraged the development of many new variations of seed-based image segmentation, which vary greatly in mathematical formulation and complexity. Most existing methods in fact rely on complex mathematical formulations that typically do not guarantee a unique solution for the segmentation problem while still being prone to being trapped in local minima. In this work we present a novel framework for seed-based image segmentation that is mathematically simple, easy to implement, and guaranteed to produce a unique solution. Moreover, the formulation exhibits anisotropic behavior, that is, pixels sharing similar attributes are kept closer to each other while big jumps are naturally imposed on the boundary between image regions, thus ensuring better fitting on object boundaries. We show that the proposed framework outperforms state-of-the-art techniques in terms of quantitative quality metrics as well as qualitative visual results.
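Seeded formulations in this family typically reduce to a single sparse positive-definite linear solve: a Laplacian quadratic with soft constraints at the seeds, which is what guarantees a unique solution. A small random-walker-style sketch under those assumptions (the paper's exact energy and constraints differ):

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import spsolve

    def seeded_segmentation(img, fg, bg, beta=50.0, mu=100.0):
        """img: (h, w) grayscale in [0, 1]; fg, bg: boolean seed masks.
        Solves (L + mu*S) x = mu*s, then thresholds x at 0.5."""
        h, w = img.shape
        idx = np.arange(h * w).reshape(h, w)
        g = img.ravel()
        # 4-connected edges weighted by intensity similarity.
        i = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
        j = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
        v = np.exp(-beta * (g[i] - g[j]) ** 2)
        W = sparse.coo_matrix((v, (i, j)), shape=(h * w, h * w))
        W = W + W.T
        L = sparse.diags(np.asarray(W.sum(axis=1)).ravel()) - W
        S = sparse.diags((fg | bg).ravel().astype(float))   # seed indicator
        s = fg.ravel().astype(float)                        # 1 at fg seeds
        x = spsolve((L + mu * S).tocsc(), mu * s)
        return (x > 0.5).reshape(h, w)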
Similar papers:
  • Beat the MTurkers: Automatic Image Labeling from Weak 3D Supervision [pdf] - Liang-Chieh Chen, Sanja Fidler, Alan Yuille, Raquel Urtasun
  • Edge-aware Gradient Domain Optimization Framework for Image Filtering by Local Propagation [pdf] - Miao Hua, Xiaohui Bie, Wencheng Wang
  • Multi-feature Spectral Clustering with Minimax Optimization [pdf] - Hongxing Wang, Chaoqun Weng, Junsong Yuan
  • Joint Unsupervised Multi-Class Image Segmentation [pdf] - Fan Wang, Qixing Huang, Maks Ovsjanikov, Leonidas J. Guibas
#2081 - Quality Dynamic Human Body Modeling Using a Single Low-cost Depth Camera [pdf]
Qing Zhang, Bo Fu

Abstract: In this paper we present a novel autonomous pipeline to build a personalized parametric model (pose-driven avatar) using only a single depth sensor. Our method first captures a few high-quality scans of the user rotating herself at multiple poses from different views. We fit each incomplete scan using template fitting techniques with a generic human template, and register all scans to every pose using global consistency constraints. After registration, these watertight models under different poses are used to train a parametric model in a fashion similar to the SCAPE method. Once the parametric model is built, it can be used as an animatable avatar or, more interestingly, for creating dynamic 3D models from single-view depth videos. Experimental results demonstrate the effectiveness of our system in producing dynamic models.
Similar papers:
  • Exploiting Shading Cues in Kinect IR Images for Geometry Refinement [pdf] - Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon
  • 3D Modeling from Wide Baseline Range Scans using Contour Coherence [pdf] - Ruizhe Wang, Jongmoo Choi, Gerard Medioni
  • Real-time Simultaneous Pose and Shape Estimation for Articulated Objects with a Single Depth Camera [pdf] - Mao Ye, Ruigang Yang
  • User-Specific Hand Modeling from Monocular Depth Sequences [pdf] - Jonathan Taylor, Richard Stebbing, Varun Ramakrishna, Cem Keskin, Jamie Shotton, Shahram Izadi, Andrew Fitzgibbon, Aaron Hertzmann
#2083 - A Reverse Hierarchy Model for Predicting Eye Fixations [pdf]
Tianlin Shi, Xiaolin Hu, Ming Liang

Abstract: A number of psychological and physiological lines of evidence suggest that visual attention works in a coarse-to-fine way, which lays a basis for the reverse hierarchy theory (RHT). This theory states that attention propagates from the top level of the visual hierarchy, which processes the gist and abstract information of the input, to the bottom level, which processes local details. Inspired by the theory, we develop a computational model for saliency detection in images. First, the original image is downsampled to different scales to constitute a fine-to-coarse pyramid. Then, saliency on each layer is obtained by image super-resolution reconstruction from the layer above, and is defined as the unpredictability of this coarse-to-fine reconstruction. Finally, the saliency on each layer of the pyramid is converted into stochastic fixations through a probabilistic model, where attention initiates from the top layer and propagates down the pyramid. Extensive experiments on two standard eye-tracking datasets show that the proposed method achieves results competitive with state-of-the-art models.
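The per-layer saliency is a reconstruction residual: whatever a coarser layer fails to predict about the finer layer is marked salient. A toy sketch with SciPy, using plain spline-based zoom as a stand-in for the paper's super-resolution reconstruction and omitting the probabilistic fixation model:

    import numpy as np
    from scipy.ndimage import zoom

    def pyramid_saliency(img, n_levels=4):
        """img: (h, w) grayscale array. Sums coarse-to-fine residuals."""
        saliency = np.zeros(img.shape)
        for k in range(1, n_levels + 1):
            coarse = zoom(img, 0.5 ** k)                  # fine-to-coarse pyramid
            up = zoom(coarse, (img.shape[0] / coarse.shape[0],
                               img.shape[1] / coarse.shape[1]))
            h, w = min(up.shape[0], img.shape[0]), min(up.shape[1], img.shape[1])
            saliency[:h, :w] += np.abs(img[:h, :w] - up[:h, :w])  # residual
        return saliency / saliency.max()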
Similar papers:
  • Time-Mapping Using Space-Time Saliency [pdf] - Feng Zhou, Sing Bing Kang, Michael Cohen
  • Salient Region Detection via High-Dimensional Color Transform [pdf] - Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
  • Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images [pdf] - Eleonora Vig, Michael Dorr, David Cox
  • Learning optimal features for salient object detection [pdf] - Song Lu, Vijay Mahadevan, Nuno Vasconcelos
#2102 - A Novel Chamfer Template Matching Method Using Variational Mean Field [pdf]
Thanh Nguyen

Abstract: This paper proposes a novel mean-field-based Chamfer template matching method. In our method, each template is represented as a field model, and matching a template to an input image is formulated as maximum a posteriori estimation in that field model. A variational approach is then adopted to approximate this estimation. The proposed method was applied to two different variants of Chamfer template matching and evaluated on the task of object detection. Experimental results on benchmark datasets, including ETHZ Shape Classes and INRIA Horses, show that the proposed method significantly improves the accuracy of template matching while sacrificing little efficiency. Comparisons with other recent template matching algorithms also demonstrate the robustness of the proposed method.
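
For context, below is the classic Chamfer matching baseline that such methods build on: the cost of placing a template at a location is the mean distance from each template edge point to the nearest image edge, read off a precomputed distance transform. This is only the standard baseline (in brute-force form), not the paper's variational mean-field formulation.

```python
# Classic Chamfer matching via a Euclidean distance transform.
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_cost_map(edge_map, template_points):
    """edge_map: 2D bool array of detected image edges.
    template_points: (N, 2) int array of (row, col) template edge offsets."""
    # Distance from every pixel to its nearest edge pixel.
    dt = distance_transform_edt(~edge_map)
    h, w = edge_map.shape
    th = template_points[:, 0].max() + 1
    tw = template_points[:, 1].max() + 1
    cost = np.full((h - th + 1, w - tw + 1), np.inf)
    for r in range(cost.shape[0]):          # brute-force sliding window
        for c in range(cost.shape[1]):
            cost[r, c] = dt[template_points[:, 0] + r,
                            template_points[:, 1] + c].mean()
    return cost  # low values indicate good template placements
```
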
Similar papers:
  • Unsupervised Learning of Dictionaries of Hierarchical Compositional Models [pdf] - Jifeng Dai, Yi Hong, Wenze Hu, Ying Nian Wu
  • Multi-Forest Tracker: A Chameleon in Tracking [pdf] - David Joseph Tan, Slobodan Ilic
  • Real-time Simultaneous Pose and Shape Estimation for Articulated Objects with a Single Depth Camera [pdf] - Mao Ye, Ruigang Yang
  • Immediate, scalable object category detection [pdf] - Yusuf Aytar, Andrew Zisserman
#2113 - Efficient Pruning LMI Conditions for Branch-and-Prune Rank and Chirality-Constrained Estimation of the Dual Absolute Quadric [pdf]
Adlane Habed, Danda Pani Paudel, Cédric Demonceaux, David Fofi

Abstract: We present a new globally optimal algorithm for self-calibrating a moving camera with constant parameters. Our method aims at estimating the Dual Absolute Quadric (DAQ) under the rank-3 and, optionally, camera-center chirality constraints. We employ the branch-and-prune paradigm and explore a space of only 5 parameters. Pruning in our method relies on solving Linear Matrix Inequality (LMI) feasibility and Generalized Eigenvalue (GEV) problems that depend solely upon the entries of the DAQ. These LMI and GEV problems are used to rule out branches of the search tree in which no quadric satisfying the rank and chirality conditions on camera centers can exist. The chirality LMI conditions are obtained under the mild assumption that the camera rotates by no more than $90^\circ$ between consecutive views. Unlike existing global methods for DAQ estimation, our algorithm can optimize a normalized objective and achieves global optimality in a competitive running time.
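
To illustrate the search strategy, here is a generic branch-and-prune skeleton: boxes of the 5-parameter space are recursively split, and a box is discarded whenever a pruning test certifies it contains no admissible solution. In the paper that test is an LMI/GEV feasibility problem; here `feasible` and `score` are hypothetical callbacks standing in for it and for the objective.

```python
# Generic branch-and-prune over an axis-aligned box of parameters.
import numpy as np

def branch_and_prune(lo, hi, feasible, score, tol=1e-3):
    """lo, hi: bounds of the initial search box (length-5 arrays here).
    feasible(lo, hi) -> False only if the box provably has no solution.
    score(x) -> objective value at a point; lower is better."""
    best_x, best_f = None, np.inf
    stack = [(np.asarray(lo, float), np.asarray(hi, float))]
    while stack:
        lo_, hi_ = stack.pop()
        if not feasible(lo_, hi_):
            continue                      # prune: box certified empty
        mid = 0.5 * (lo_ + hi_)
        f = score(mid)
        if f < best_f:
            best_x, best_f = mid, f
        if np.max(hi_ - lo_) < tol:
            continue                      # box small enough: stop splitting
        axis = int(np.argmax(hi_ - lo_))  # split along widest dimension
        left_hi, right_lo = hi_.copy(), lo_.copy()
        left_hi[axis] = mid[axis]
        right_lo[axis] = mid[axis]
        stack.append((lo_, left_hi))
        stack.append((right_lo, hi_))
    return best_x, best_f
```
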
Similar papers:
  • Sequential Convex Relaxation for Mutual-Information-Based Unsupervised Figure-Ground Segmentation [pdf] - Youngwook Kee, Mohamed Souiai, Daniel Cremers, Junmo Kim
  • Efficient Computation of Relative Pose for Multi-Camera Systems [pdf] - Laurent Kneip, Hongdong Li
  • Ground Plane Estimation using a Hidden Markov Model [pdf] - Ralf Dragon, Luc Van Gool
  • On Projective Reconstruction In Arbitrary Dimensions [pdf] - Behrooz Nasihatkon, Richard Hartley, Jochen Trumpf