CS Conference Navigator

Improving discovery of relevant computer science research through visualization and clustering

ICCV 2013

Other guides: NIPS 2013, CVPR 2013, ICML 2013, NIPS 2012
Visualization of publicly available papers presented at ICCV 2013

Hover over a node to see the paper title. Click on a color to only show papers connected to that cluster. Zoom and move around with normal map controls.



Papers are linked together based on TF-IDF similarity and are colored using their predicted topic index.
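The linking step amounts to computing TF-IDF vectors for each abstract and comparing them with cosine similarity. A minimal sketch of that computation (the toy documents are hypothetical and this is generic code, not this site's actual pipeline):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors for a list of tokenized documents (as sparse dicts)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    return [{t: (c / len(doc)) * math.log(n / df[t])
             for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy "abstracts" (hypothetical, for illustration only):
docs = ["gaze estimation eye tracking".split(),
        "gaze patterns for calibration".split(),
        "dictionary learning for sparse coding".split()]
vecs = tfidf_vectors(docs)
# The two gaze papers should come out more similar than gaze vs. dictionaries:
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # → True
```

Papers whose similarity exceeds a threshold get an edge in the graph.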

Toggle the topics below to sort by category. The top 10 words from each cluster are shown.

Filter current papers by keyword or author:
Compensating for Motion during Direct-Global Separation [pdf]
Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan

Abstract: Separating the direct and global components of radiance can aid shape recovery algorithms and can provide useful information about materials in a scene. Practical methods for finding the direct and global components use multiple images captured under varying illumination patterns and require the scene, light source and camera to remain stationary during the image acquisition process. In this paper, we develop a motion compensation method that relaxes this condition and allows direct-global separation to be performed on video sequences of dynamic scenes captured by moving projector-camera systems. Key to our method is being able to register frames in a video sequence to each other in the presence of time varying, high frequency active illumination patterns. We compare our motion compensated method to alternatives such as single shot separation and frame interleaving as well as ground truth. We present results on challenging video sequences that include various types of motions and deformations in scenes that contain complex materials like fabric, skin, leaves and wax.
Similar papers:
  • A General Dense Image Matching Framework Combining Direct and Feature-Based Costs [pdf] - Jim Braux-Zin, Romain Dupont, Adrien Bartoli
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Towards Motion Aware Light Field Video for Dynamic Scenes [pdf] - Salil Tambe, Ashok Veeraraghavan, Amit Agrawal
  • Illuminant Chromaticity from Image Sequences [pdf] - Veronique Prinet, Dani Lischinski, Michael Werman
  • Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain [pdf] - Ying Fu, Antony Lam, Imari Sato, Takahiro Okabe, Yoichi Sato
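For background, the classic high-frequency separation that this line of work builds on (Nayar et al.) reduces, for a pattern with half the source elements lit, to a simple per-pixel rule. A minimal sketch, assuming the per-pixel max and min over the shifted patterns are already available (illustrative values, not the paper's motion-compensated pipeline):

```python
def separate_direct_global(l_max, l_min):
    """Per-pixel separation under a high-frequency pattern with half the
    source elements lit (Nayar et al.'s relation):
        direct = L_max - L_min,   global = 2 * L_min
    l_max / l_min are the per-pixel max / min over the shifted patterns."""
    direct = [mx - mn for mx, mn in zip(l_max, l_min)]
    global_ = [2 * mn for mn in l_min]
    return direct, global_

# Integer radiance values for two pixels (illustrative numbers):
d, g = separate_direct_global([90, 50], [20, 10])
print(d, g)  # → [70, 40] [40, 20]
```

The contribution of the paper above is making the per-pixel max/min meaningful when the scene moves between pattern shifts.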
Extrinsic Camera Calibration without a Direct View Using Spherical Mirror [pdf]
Amit Agrawal

Abstract: We consider the problem of estimating the extrinsic parameters (pose) of a camera with respect to a reference 3D object without a direct view. Since the camera does not view the object directly, previous approaches have utilized reflections in a planar mirror to solve this problem. However, a planar mirror based approach requires a minimum of three reflections and has degenerate configurations where estimation fails. In this paper, we show that the pose can be obtained using a single reflection in a spherical mirror of known radius. This makes our approach simpler and easier in practice. In addition, unlike planar mirrors, the spherical mirror based approach does not have any degenerate configurations, leading to a robust algorithm. While a planar mirror reflection results in a virtual perspective camera, a spherical mirror reflection results in a non-perspective axial camera. The axial nature of rays allows us to compute the axis (direction of sphere center) and a few pose parameters in a linear fashion. We then derive an analytical solution to obtain the distance to the sphere center and the remaining pose parameters and show that it corresponds to solving a 16th degree equation. We present comparisons with a recent method that uses planar mirrors and show that our approach recovers more accurate pose in the presence of noise. Extensive simulations and results on real data validate our algorithm.
Similar papers:
  • Refractive Structure-from-Motion on Underwater Images [pdf] - Anne Jordt-Sedlazeck, Reinhard Koch
  • Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion [pdf] - Pierre Moulon, Pascal Monasse, Renaud Marlet
  • Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach [pdf] - R. Melo, M. Antunes, J.P. Barreto, G. Falcao, N. Goncalves
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • Revisiting the PnP Problem: A Fast, General and Optimal Solution [pdf] - Yinqiang Zheng, Yubin Kuang, Shigeki Sugimoto, Kalle Astrom, Masatoshi Okutomi
Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search [pdf]
Dror Aiger, Efi Kokiopoulou, Ehud Rivlin

Abstract: We propose two solutions for both nearest neighbors and range search problems. For the nearest neighbors problem, we propose a c-approximate solution for the restricted version of the decision problem with bounded radius, which is then reduced to the nearest neighbors problem by a known reduction. For range searching we propose a scheme that learns the parameters in a learning stage, adapting them to the case of a set of points with low intrinsic dimension that are embedded in a high dimensional space (a common scenario for image point descriptors). We compare our algorithms to the best known methods for these problems, i.e. LSH, ANN and FLANN. We show analytically and experimentally that we can do better for a moderate approximation factor. Our algorithms are trivial to parallelize. In the experiments conducted, running on a couple of million images, our algorithms show meaningful speed-ups when compared with the above mentioned methods.
Similar papers:
  • Fast Neighborhood Graph Search Using Cartesian Concatenation [pdf] - Jing Wang, Jingdong Wang, Gang Zeng, Rui Gan, Shipeng Li, Baining Guo
  • Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval [pdf] - Yannis Avrithis
  • Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval [pdf] - Cai-Zhi Zhu, Herve Jegou, Shin'Ichi Satoh
  • Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation [pdf] - Basura Fernando, Tinne Tuytelaars
  • Fast Subspace Search via Grassmannian Based Hashing [pdf] - Xu Wang, Stefan Atev, John Wright, Gilad Lerman
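The grid-hashing idea behind such methods can be illustrated with a toy single-grid sketch: quantize points into a randomly shifted uniform grid and treat bucket-mates as candidate neighbors. This is only the core intuition (the paper's algorithm uses multiple grids and learned parameters; all names here are illustrative):

```python
import random
from collections import defaultdict

def build_grid(points, cell, seed=0):
    """Hash points into one randomly shifted uniform grid. Real random-grid
    schemes use several independent grids (and rotations) so that nearby
    points share a cell in at least one grid with high probability."""
    rng = random.Random(seed)
    dim = len(points[0])
    shift = [rng.uniform(0, cell) for _ in range(dim)]
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(int((x + s) // cell) for x, s in zip(p, shift))
        buckets[key].append(i)
    return buckets, shift

def query(q, buckets, shift, cell):
    """Candidate near neighbors: the indices stored in the query's cell."""
    key = tuple(int((x + s) // cell) for x, s in zip(q, shift))
    return buckets.get(key, [])

pts = [(0.10, 0.10), (0.12, 0.11), (5.0, 5.0)]
buckets, shift = build_grid(pts, cell=1.0)
print(sorted(query((0.11, 0.10), buckets, shift, 1.0)))
```

With this seed, the two points near the origin share a cell, while the far point does not; a single grid can of course split near neighbors across a cell boundary, which is why multiple shifted grids are used.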
Pose Estimation and Segmentation of People in 3D Movies [pdf]
Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev

Abstract: We seek to obtain a pixel-wise segmentation and pose estimation of multiple people in a stereoscopic video. This involves challenges such as dealing with unconstrained stereoscopic video, non-stationary cameras, and complex indoor and outdoor dynamic scenes. The contributions of our work are two-fold: First, we develop a segmentation model incorporating person detection, pose estimation, as well as colour, motion, and disparity cues. Our new model explicitly represents depth ordering and occlusion. Second, we introduce a stereoscopic dataset with frames extracted from feature-length movies StreetDance 3D and Pina. The dataset contains 2727 realistic stereo pairs and includes annotation of human poses, person bounding boxes, and pixel-wise segmentations for hundreds of people. The dataset is composed of indoor and outdoor scenes depicting multiple people with frequent occlusions. We demonstrate results on our new challenging dataset, as well as on the H2view dataset from (Sheasby et al. ACCV 2012).
Similar papers:
  • Estimating Human Pose with Flowing Puppets [pdf] - Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
  • A Rotational Stereo Model Based on XSlit Imaging [pdf] - Jinwei Ye, Yu Ji, Jingyi Yu
  • Line Assisted Light Field Triangulation and Stereo Matching [pdf] - Zhan Yu, Xinqing Guo, Haibing Lin, Andrew Lumsdaine, Jingyi Yu
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • PM-Huber: PatchMatch with Huber Regularization for Stereo Matching [pdf] - Philipp Heise, Sebastian Klose, Brian Jensen, Alois Knoll
Measuring Flow Complexity in Videos [pdf]
Saad Ali

Abstract: In this paper a notion of flow complexity that measures the amount of interaction among objects is introduced and an approach to compute it directly from a video sequence is proposed. The approach employs particle trajectories as the input representation of motion and maps it into a braid based representation. The mapping is based on the observation that 2D trajectories of particles take the form of a braid in space-time due to the intermingling among particles over time. As a result of this mapping, the problem of estimating the flow complexity from particle trajectories becomes the problem of estimating braid complexity, which in turn can be computed by measuring the topological entropy of a braid. For this purpose, recently developed mathematical tools from braid theory are employed which allow rapid computation of the topological entropy of braids. The approach is evaluated on a dataset consisting of open source videos depicting variations in terms of types of moving objects, scene layout, camera view angle, motion patterns, and object densities. The results show that the proposed approach is able to quantify the complexity of the flow, and at the same time provides useful insights about the sources of the complexity.
Similar papers:
  • Online Motion Segmentation Using Dynamic Label Propagation [pdf] - Ali Elqursh, Ahmed Elgammal
  • Towards Understanding Action Recognition [pdf] - Hueihan Jhuang, Juergen Gall, Silvia Zuffi, Cordelia Schmid, Michael J. Black
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Action Recognition with Improved Trajectories [pdf] - Heng Wang, Cordelia Schmid
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
Handwritten Word Spotting with Corrected Attributes [pdf]
Jon Almazan, Albert Gordo, Alicia Fornes, Ernest Valveny

Abstract: We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to a unified representation of word images and strings, which seamlessly allows one to perform both query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme to correct the attribute scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results.
Similar papers:
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
  • A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects [pdf] - Xiaoyang Wang, Qiang Ji
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • Image Retrieval Using Textual Cues [pdf] - Anand Mishra, Karteek Alahari, C.V. Jawahar
  • Recognizing Text with Perspective Distortion in Natural Scenes [pdf] - Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
Calibration-Free Gaze Estimation Using Human Gaze Patterns [pdf]
Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab

Abstract: We present a novel method to auto-calibrate gaze estimators based on gaze patterns obtained from other viewers. Our method is based on the observation that the gaze patterns of humans are indicative of where a new viewer will look [12]. When a new viewer is looking at a stimulus, we first estimate a topology of gaze points (initial gaze points). Next, these points are transformed so that they match the gaze patterns of other humans to find the correct gaze points. In a flexible uncalibrated setup with a web camera and no chin rest, the proposed method was tested on ten subjects and ten images. The method estimates the gaze points after looking at a stimulus for a few seconds with an average accuracy of 4.3. Although the reported performance is lower than what could be achieved with dedicated hardware or a calibrated setup, the proposed method still provides sufficient accuracy to trace the viewer's attention. This is promising considering the fact that auto-calibration is done in a flexible setup, without the use of a chin rest, and based only on a few seconds of gaze initialization data. To the best of our knowledge, this is the first work to use human gaze patterns in order to auto-calibrate gaze estimators.
Similar papers:
  • Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics [pdf] - Nicolas Riche, Matthieu Duvinage, Matei Mancas, Bernard Gosselin, Thierry Dutoit
  • Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf] - Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
  • Predicting Primary Gaze Behavior Using Social Saliency Fields [pdf] - Hyun Soo Park, Eakta Jain, Yaser Sheikh
  • Semantically-Based Human Scanpath Estimation with HMMs [pdf] - Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin
  • Learning to Predict Gaze in Egocentric Video [pdf] - Yin Li, Alireza Fathi, James M. Rehg
Monte Carlo Tree Search for Scheduling Activity Recognition [pdf]
Mohamed R. Amer, Sinisa Todorovic, Alan Fern, Song-Chun Zhu

Abstract: This paper presents an efficient approach to video parsing. Our videos show a number of co-occurring individual and group activities. To address challenges of the domain, we use an expressive spatiotemporal AND-OR graph (ST-AOG) that jointly models activity parts, their spatiotemporal relations, and context, as well as enables multitarget tracking. The standard ST-AOG inference is prohibitively expensive in our setting, since it would require running a multitude of detectors and tracking their detections in long video footage. This problem is addressed by formulating a cost-sensitive inference of ST-AOG as Monte Carlo Tree Search (MCTS). For querying an activity in the video, MCTS optimally schedules a sequence of detectors and trackers to be run, and where they should be applied in the space-time volume. Evaluation on the benchmark datasets demonstrates that MCTS enables speed-ups of two orders of magnitude without compromising accuracy relative to the standard cost-insensitive inference.
Similar papers:
  • From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding [pdf] - Weiyu Zhang, Menglong Zhu, Konstantinos G. Derpanis
  • Learning Maximum Margin Temporal Warping for Action Recognition [pdf] - Jiang Wang, Ying Wu
  • The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection [pdf] - Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
  • Concurrent Action Detection with Structural Prediction [pdf] - Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
  • YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition [pdf] - Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko
Allocentric Pose Estimation [pdf]
M. Jose Antonio, Luc De_Raedt, Tinne Tuytelaars

Abstract: The task of object pose estimation has been a challenge since the early days of computer vision. To estimate the pose (or viewpoint) of an object, people have mostly looked at object intrinsic features, such as shape or appearance. Surprisingly, informative features provided by other, external elements in the scene have so far mostly been ignored. At the same time, contextual cues have been shown to be of great benefit for related tasks such as object detection or action recognition. In this paper, we explore how information from other objects in the scene can be exploited for pose estimation. In particular, we look at object configurations. We show that, starting from noisy object detections and pose estimates, exploiting the estimated pose and location of other objects in the scene can help to estimate the objects' poses more accurately. We explore both a camera-centered as well as an object-centered representation for relations. Experiments on the challenging KITTI dataset show that object configurations can indeed be used as a complementary cue to appearance-based pose estimation. In addition, object-centered relational representations can also assist object detection.
Similar papers:
  • Discovering Object Functionality [pdf] - Bangpeng Yao, Jiayuan Ma, Li Fei-Fei
  • Monocular Image 3D Human Pose Estimation under Self-Occlusion [pdf] - Ibrahim Radwan, Abhinav Dhall, Roland Goecke
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Parsing IKEA Objects: Fine Pose Estimation [pdf] - Joseph J. Lim, Hamed Pirsiavash, Antonio Torralba
  • Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf] - Jiongxin Liu, Peter N. Belhumeur
Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees [pdf]
Oisin Mac Aodha, Gabriel J. Brostow

Abstract: Typical approaches to classification treat class labels as disjoint. For each training example, it is assumed that there is only one class label that correctly describes it, and that all other labels are equally bad. We know, however, that good and bad labels are too simplistic in many scenarios, hurting accuracy. In the realm of example dependent cost-sensitive learning, each label is instead a vector representing a data point's affinity for each of the classes. At test time, our goal is not to minimize the misclassification rate, but to maximize that affinity. We propose a novel example dependent cost-sensitive impurity measure for decision trees. Our experiments show that this new impurity measure improves test performance while still retaining the fast test times of standard classification trees. We compare our approach to classification trees and other cost-sensitive methods on three computer vision problems, tracking, descriptor matching, and optical flow, and show improvements in all three domains.
Similar papers:
  • Efficient 3D Scene Labeling Using Fields of Trees [pdf] - Olaf Kahler, Ian Reid
  • Unsupervised Random Forest Manifold Alignment for Lipreading [pdf] - Yuru Pei, Tae-Kyun Kim, Hongbin Zha
  • Random Forests of Local Experts for Pedestrian Detection [pdf] - Javier Marin, David Vazquez, Antonio M. Lopez, Jaume Amores, Bastian Leibe
  • Latent Task Adaptation with Large-Scale Hierarchies [pdf] - Yangqing Jia, Trevor Darrell
  • Alternating Regression Forests for Object Detection and Pose Estimation [pdf] - Samuel Schulter, Christian Leistner, Paul Wohlhart, Peter M. Roth, Horst Bischof
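To make the hard-label vs. affinity contrast concrete, here is a toy comparison between standard Gini impurity and a simple affinity-aware variant. The variant is a hypothetical stand-in for illustration, not the paper's exact impurity measure:

```python
def gini(labels):
    """Standard Gini impurity over hard class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def affinity_gini(affinities):
    """A simple affinity-aware variant (hypothetical, NOT the paper's exact
    measure): average the per-class affinity vectors in the node, normalize,
    and apply the Gini formula to the resulting soft distribution."""
    n = len(affinities)
    k = len(affinities[0])
    mean = [sum(a[c] for a in affinities) / n for c in range(k)]
    total = sum(mean)
    probs = [m / total for m in mean]
    return 1.0 - sum(p * p for p in probs)

# Two examples whose hard labels (argmax) agree on class 0, so hard-label
# Gini calls the node pure, while the affinity vectors show residual cost:
aff = [[0.9, 0.1], [0.6, 0.4]]
print(gini([0, 0]))                  # → 0.0
print(round(affinity_gini(aff), 3))  # → 0.375
```

The point is that affinity vectors let a split criterion see cost structure that hard labels throw away.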
Higher Order Matching for Consistent Multiple Target Tracking [pdf]
Chetan Arora, Amir Globerson

Abstract: This paper addresses the data assignment problem in multi frame multi object tracking in video sequences. Traditional methods employing maximum weight bipartite matching offer limited temporal modeling. It has recently been shown [6, 8, 24] that incorporating higher order temporal constraints improves the assignment solution. Finding maximum weight matching with higher order constraints is however NP-hard, and the solutions proposed until now have either been greedy [8] or rely on greedy rounding of the solution obtained from spectral techniques [15]. We propose a novel algorithm to find the approximate solution to the data assignment problem with higher order temporal constraints using the method of dual decomposition and the MPLP message passing algorithm [21]. We compare the proposed algorithm with an implementation of [8] and [15] and show that the proposed technique provides a better solution, with a bound on the approximation factor for each inferred solution.
Similar papers:
  • Elastic Net Constraints for Shape Matching [pdf] - Emanuele Rodola, Andrea Torsello, Tatsuya Harada, Yasuo Kuniyoshi, Daniel Cremers
  • Joint Optimization for Consistent Multiple Graph Matching [pdf] - Junchi Yan, Yu Tian, Hongyuan Zha, Xiaokang Yang, Ya Zhang, Stephen M. Chu
  • Orderless Tracking through Model-Averaged Posterior Estimation [pdf] - Seunghoon Hong, Suha Kwak, Bohyung Han
  • Conservation Tracking [pdf] - Martin Schiegg, Philipp Hanslovsky, Bernhard X. Kausler, Lars Hufnagel, Fred A. Hamprecht
  • Shufflets: Shared Mid-level Parts for Fast Object Detection [pdf] - Iasonas Kokkinos
Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval [pdf]
Yannis Avrithis

Abstract: Inspired by the close relation between nearest neighbor search and clustering in high-dimensional spaces as well as the success of one helping to solve the other, we introduce a new paradigm where both problems are solved simultaneously. Our solution is recursive, not in the size of input data but in the number of dimensions. One result is a clustering algorithm that is tuned to small codebooks but does not need all data in memory at the same time and is practically constant in the data size. As a by-product, a tree structure performs either exact or approximate quantization on trained centroids, the latter being not very precise but extremely fast. A lesser contribution is a new indexing scheme for image retrieval that exploits multiple small codebooks to provide an arbitrarily fine partition of the descriptor space. Large scale experiments on public datasets exhibit state of the art performance and remarkable generalization.
Similar papers:
  • What is the Most Efficient Way to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search? [pdf] - Masakazu Iwamura, Tomokazu Sato, Koichi Kise
  • Joint Inverted Indexing [pdf] - Yan Xia, Kaiming He, Fang Wen, Jian Sun
  • Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors [pdf] - Nakamasa Inoue, Koichi Shinoda
  • Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search [pdf] - Dror Aiger, Efi Kokiopoulou, Ehud Rivlin
  • Fast Neighborhood Graph Search Using Cartesian Concatenation [pdf] - Jing Wang, Jingdong Wang, Gang Zeng, Rui Gan, Shipeng Li, Baining Guo
Finding Causal Interactions in Video Sequences [pdf]
Mustafa Ayazoglu, Burak Yilmaz, Mario Sznaier, Octavia Camps

Abstract: This paper considers the problem of detecting causal interactions in video clips. Specifically, the goal is to detect whether the actions of a given target can be explained in terms of the past actions of a collection of other agents. We propose to solve this problem by recasting it into a directed graph topology identification, where each node corresponds to the observed motion of a given target, and each link indicates the presence of a causal correlation. As shown in the paper, this leads to a block-sparsification problem that can be efficiently solved using a modified Group-Lasso type approach, capable of handling missing data and outliers (due for instance to occlusion and mis-identified correspondences). Moreover, this approach also identifies time instants where the interactions between agents change, thus providing event detection capabilities. These results are illustrated with several examples involving nontrivial interactions amongst several human subjects.
Similar papers:
  • GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity [pdf] - Jia Xu, Vamsi K. Ithapu, Lopamudra Mukherjee, James M. Rehg, Vikas Singh
  • New Graph Structured Sparsity Model for Multi-label Image Annotations [pdf] - Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
  • Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps [pdf] - Jiajia Luo, Wei Wang, Hairong Qi
  • Perspective Motion Segmentation via Collaborative Clustering [pdf] - Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou
  • Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications [pdf] - Yu-Tseh Chi, Mohsen Ali, Muhammad Rushdi, Jeffrey Ho
Randomized Ensemble Tracking [pdf]
Qinxun Bai, Zheng Wu, Stan Sclaroff, Margrit Betke, Camille Monnier

Abstract: We propose a randomized ensemble algorithm to model the time-varying appearance of an object for visual tracking. In contrast with previous online methods for updating classifier ensembles in tracking-by-detection, the weight vector that combines weak classifiers is treated as a random variable and the posterior distribution for the weight vector is estimated in a Bayesian manner. In essence, the weight vector is treated as a distribution that reflects the confidence among the weak classifiers used to construct and adapt the classifier ensemble. The resulting formulation models the time-varying discriminative ability among weak classifiers so that the ensembled strong classifier can adapt to the varying appearance, backgrounds, and occlusions. The formulation is tested in a tracking-by-detection implementation. Experiments on 28 challenging benchmark videos demonstrate that the proposed method can achieve results comparable to and often better than those of state-of-the-art approaches.
Similar papers:
  • Regionlets for Generic Object Detection [pdf] - Xiaoyu Wang, Ming Yang, Shenghuo Zhu, Yuanqing Lin
  • Robust Object Tracking with Online Multi-lifespan Dictionary Learning [pdf] - Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
  • Online Robust Non-negative Dictionary Learning for Visual Tracking [pdf] - Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
  • Efficient Pedestrian Detection by Directly Optimizing the Partial Area under the ROC Curve [pdf] - Sakrapee Paisitkriangkrai, Chunhua Shen, Anton Van Den Hengel
  • Handling Occlusions with Franken-Classifiers [pdf] - Markus Mathias, Rodrigo Benenson, Radu Timofte, Luc Van_Gool
Unsupervised Domain Adaptation by Domain Invariant Projection [pdf]
Mahsa Baktashmotlagh, Mehrtash T. Harandi, Brian C. Lovell, Mathieu Salzmann

Abstract: Domain-invariant representations are key to addressing the domain shift problem, where the training and test examples follow different distributions. Existing techniques that have attempted to match the distributions of the source and target domains typically compare these distributions in the original feature space. This space, however, may not be directly suitable for such a comparison, since some of the features may have been distorted by the domain shift, or may be domain specific. In this paper, we introduce a Domain Invariant Projection approach: an unsupervised domain adaptation method that overcomes this issue by extracting the information that is invariant across the source and target domains. More specifically, we learn a projection of the data to a low-dimensional latent space where the distance between the empirical distributions of the source and target examples is minimized. We demonstrate the effectiveness of our approach on the task of visual object recognition and show that it outperforms state-of-the-art methods on a standard domain adaptation benchmark dataset.
Similar papers:
  • Transfer Feature Learning with Joint Distribution Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Jiaguang Sun, Philip S. Yu
  • Domain Transfer Support Vector Ranking for Person Re-identification without Target Camera Label Information [pdf] - Andy J. Ma, Pong C. Yuen, Jiawei Li
  • Frustratingly Easy NBNN Domain Adaptation [pdf] - Tatiana Tommasi, Barbara Caputo
  • Unsupervised Visual Domain Adaptation Using Subspace Alignment [pdf] - Basura Fernando, Amaury Habrard, Marc Sebban, Tinne Tuytelaars
  • Domain Adaptive Classification [pdf] - Fatemeh Mirrashed, Mohammad Rastegari
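The distribution-distance idea can be illustrated with a linear-kernel Maximum Mean Discrepancy, which reduces to the squared distance between the two sample means. This sketch omits the learned projection that the paper above optimizes, and the sample values are illustrative:

```python
def mmd_linear(xs, ys):
    """Maximum Mean Discrepancy with a linear kernel: reduces to the squared
    Euclidean distance between the two sample means."""
    d = len(xs[0])
    mx = [sum(x[j] for x in xs) / len(xs) for j in range(d)]
    my = [sum(y[j] for y in ys) / len(ys) for j in range(d)]
    return sum((a - b) ** 2 for a, b in zip(mx, my))

src = [[0.0, 0.0], [2.0, 2.0]]  # source samples, mean (1, 1)
tgt = [[1.0, 1.0], [3.0, 3.0]]  # target samples, mean (2, 2)
print(mmd_linear(src, tgt))  # → 2.0
```

Domain adaptation methods of this family search for a mapping of the data under which such a discrepancy between source and target becomes small.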
Space-Time Robust Representation for Action Recognition [pdf]
Nicolas Ballas, Yi Yang, Zhen-Zhong Lan, Bertrand Delezoide, Francoise Preteux, Alexander Hauptmann

Abstract: We address the problem of action recognition in unconstrained videos. We propose a novel content driven pooling that leverages space-time context while being robust toward global space-time transformations. Being robust to such transformations is of primary importance in unconstrained videos, where the action localizations can drastically shift between frames. Our pooling identifies regions of interest using video structural cues estimated by different saliency functions. To combine the different structural information, we introduce an iterative structure learning algorithm, WSVM (weighted SVM), that determines the optimal saliency layout of an action model through a sparse regularizer. A new optimization method is proposed to solve the WSVM's highly non-smooth objective function. We evaluate our approach on standard action datasets (KTH, UCF50 and HMDB). Most noticeably, the accuracy of our algorithm reaches 51.8% on the challenging HMDB dataset, outperforming the state of the art by 7.3% in relative terms.
Similar papers:
  • Saliency Detection via Dense and Sparse Reconstruction [pdf] - Xiaohui Li, Huchuan Lu, Lihe Zhang, Xiang Ruan, Ming-Hsuan Yang
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
  • Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf] - Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
  • Efficient Salient Region Detection with Soft Image Abstraction [pdf] - Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
  • Contextual Hypergraph Modeling for Salient Object Detection [pdf] - Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel
Active Learning of an Action Detector from Untrimmed Videos [pdf]
Sunil Bandla, Kristen Grauman

Abstract: Collecting and annotating videos of realistic human actions is tedious, yet critical for training action recognition systems. We propose a method to actively request the most useful video annotations among a large set of unlabeled videos. Predicting the utility of annotating unlabeled video is not trivial, since any given clip may contain multiple actions of interest, and it need not be trimmed to temporal regions of interest. To deal with this problem, we propose a detection-based active learner to train action category models. We develop a voting-based framework to localize likely intervals of interest in an unlabeled clip, and use them to estimate the total reduction in uncertainty that annotating that clip would yield. On three datasets, we show our approach can learn accurate action detectors more efficiently than alternative active learning strategies that fail to accommodate the untrimmed nature of real video data.
Similar papers:
  • A Convex Optimization Framework for Active Learning [pdf] - Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sasrty
  • Learning Maximum Margin Temporal Warping for Action Recognition [pdf] - Jiang Wang, Ying Wu
  • From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding [pdf] - Weiyu Zhang, Menglong Zhu, Konstantinos G. Derpanis
  • The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection [pdf] - Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
  • Concurrent Action Detection with Structural Prediction [pdf] - Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf]
Chenglong Bao, Jian-Feng Cai, Hui Ji

Abstract: In recent years, how to learn a dictionary from input images for sparse modelling has been a very active topic in image processing and recognition. Most existing dictionary learning methods consider an over-complete dictionary, e.g. the K-SVD method. Often they require solving some minimization problem that is very challenging in terms of computational feasibility and efficiency. However, if the correlations among dictionary atoms are not well constrained, the redundancy of the dictionary does not necessarily improve the performance of sparse coding. This paper proposes a fast orthogonal dictionary learning method for sparse image representation. With comparable performance on several image restoration tasks, the proposed method is much more computationally efficient than over-complete dictionary based learning methods.
Similar papers:
  • Anchored Neighborhood Regression for Fast Example-Based Super-Resolution [pdf] - Radu Timofte, Vincent De_Smet, Luc Van_Gool
  • Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition [pdf] - Hans Lobel, Rene Vidal, Alvaro Soto
  • Robust Dictionary Learning by Error Source Decomposition [pdf] - Zhuoyuan Chen, Ying Wu
  • Multi-attributed Dictionary Learning for Sparse Coding [pdf] - Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
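One reason orthogonality buys speed: with an orthogonal dictionary D, the l0-regularized sparse coding step has a closed form (hard-thresholding of the analysis coefficients D^T X), whereas over-complete dictionaries require iterative pursuit. A minimal illustrative sketch, with a random orthogonal dictionary standing in for a learned one (names hypothetical, not the paper's code):

```python
import numpy as np

def sparse_code_orthogonal(X, D, lam):
    """Closed-form sparse coding for an orthogonal dictionary D:
    argmin_C ||X - D C||_F^2 + lam ||C||_0 is hard-thresholding
    of D^T X at sqrt(lam)."""
    C = D.T @ X
    C[np.abs(C) < np.sqrt(lam)] = 0.0
    return C

rng = np.random.default_rng(0)
# Random orthogonal dictionary via QR decomposition (stand-in for learning).
D, _ = np.linalg.qr(rng.standard_normal((8, 8)))
X = rng.standard_normal((8, 5))
C = sparse_code_orthogonal(X, D, lam=0.5)
# Reconstruction error equals the energy of the discarded coefficients.
print(np.linalg.norm(X - D @ C))
```

The learning stage alternates this coding step with a dictionary update constrained to the orthogonal group, which is what keeps the whole pipeline cheap relative to K-SVD-style updates.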
Fast High Dimensional Vector Multiplication Face Recognition [pdf]
Oren Barkan, Jonathan Weill, Lior Wolf, Hagai Aronowitz

Abstract: This paper advances descriptor-based face recognition by suggesting a novel usage of descriptors to form an over-complete representation, and by proposing a new metric learning pipeline within the same/not-same framework. First, the Over-Complete Local Binary Patterns (OCLBP) face representation scheme is introduced as a multi-scale modified version of the Local Binary Patterns (LBP) scheme. Second, we propose an efficient matrix-vector multiplication-based recognition system. The system is based on Linear Discriminant Analysis (LDA) coupled with Within Class Covariance Normalization (WCCN). This is further extended to the unsupervised case by proposing an unsupervised variant of WCCN. Lastly, we introduce Diffusion Maps (DM) for non-linear dimensionality reduction as an alternative to the Whitened Principal Component Analysis (WPCA) method which is often used in face recognition. We evaluate the proposed framework on the LFW face recognition dataset under the restricted, unrestricted and unsupervised protocols. In all three cases we achieve very competitive results.
Similar papers:
  • A Scalable Unsupervised Feature Merging Approach to Efficient Dimensionality Reduction of High-Dimensional Visual Data [pdf] - Lingqiao Liu, Lei Wang
  • Deep Learning Identity-Preserving Face Space [pdf] - Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Face Recognition via Archetype Hull Ranking [pdf] - Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang
  • Similarity Metric Learning for Face Recognition [pdf] - Qiong Cao, Yiming Ying, Peng Li
  • Hidden Factor Analysis for Age Invariant Face Recognition [pdf] - Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
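OCLBP is described as a multi-scale, over-complete variant of Local Binary Patterns; for reference, the base single-scale 3x3 LBP operator that it extends (this is the classic scheme, not the paper's OCLBP) can be sketched as:

```python
import numpy as np

def lbp_3x3(img):
    """Basic 3x3 Local Binary Patterns: each interior pixel gets an
    8-bit code, one bit per neighbor whose value is >= the center."""
    c = img[1:-1, 1:-1]
    # 8 neighbors in clockwise order, each contributing one bit.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        n = img[1 + dy : img.shape[0] - 1 + dy,
                1 + dx : img.shape[1] - 1 + dx]
        code |= (n >= c).astype(np.uint8) << bit
    return code

flat = np.full((5, 5), 7, dtype=np.uint8)
print(lbp_3x3(flat)[0, 0])  # -> 255: every neighbor >= the center on a flat patch
```

The over-complete version pools such codes over multiple radii and shifted blocks to form the richer face representation the abstract refers to.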
Volumetric Semantic Segmentation Using Pyramid Context Features [pdf]
Jonathan T. Barron, Mark D. Biggin, Pablo Arbelaez, David W. Knowles, Soile V.E. Keranen, Jitendra Malik

Abstract: We present an algorithm for the per-voxel semantic segmentation of a three-dimensional volume. At the core of our algorithm is a novel pyramid context feature, a descriptive representation designed such that exact per-voxel linear classification can be made extremely efficient. This feature not only allows for efficient semantic segmentation but enables other aspects of our algorithm, such as novel learned features and a stacked architecture that can reason about self-consistency. We demonstrate our technique on 3D fluorescence microscopy data of Drosophila embryos for which we are able to produce extremely accurate semantic segmentations in a matter of minutes, and for which other algorithms fail due to the size and high-dimensionality of the data, or due to the difficulty of the task.
Similar papers:
  • A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis [pdf] - Fabio Galasso, Naveen Shankar Nagaraja, Tatiana Jimenez Cardenas, Thomas Brox, Bernt Schiele
  • Progressive Multigrid Eigensolvers for Multiscale Spectral Segmentation [pdf] - Michael Maire, Stella X. Yu
  • Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks [pdf] - Mojtaba Seyedhosseini, Mehdi Sajjadi, Tolga Tasdizen
  • Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology [pdf] - Hang Chang, Yin Zhou, Paul Spellman, Bahram Parvin
  • Pyramid Coding for Functional Scene Element Recognition in Video Scenes [pdf] - Eran Swears, Anthony Hoogs, Kim Boyer
A Robust Analytical Solution to Isometric Shape-from-Template with Focal Length Calibration [pdf]
Adrien Bartoli, Daniel Pizarro, Toby Collins

Abstract: We study the uncalibrated isometric Shape-from-Template problem, which consists in estimating an isometric deformation from a template shape to an input image whose focal length is unknown. Our method is the first that combines the following features: solving for both the 3D deformation and the camera's focal length, involving only local analytical solutions (there is no numerical optimization), being robust to mismatches, handling general surfaces and running extremely fast. This was achieved through two key steps. First, an uncalibrated 3D deformation is computed thanks to a novel piecewise weak-perspective projection model. Second, the camera's focal length is estimated and enables upgrading the 3D deformation to metric. We use a variational framework, implemented using a smooth function basis and sampled local deformation models. The only degeneracy for focal length estimation, which we easily detect, is a flat and fronto-parallel surface. Experimental results on simulated and real datasets show that our method achieves a 3D shape accuracy slightly below state of the art methods using a precalibrated or the true focal length, and a focal length accuracy slightly below static calibration methods.
Similar papers:
  • A Generic Deformation Model for Dense Non-rigid Surface Registration: A Higher-Order MRF-Based Approach [pdf] - Yun Zeng, Chaohui Wang, Xianfeng Gu, Dimitris Samaras, Nikos Paragios
  • Hierarchical Data-Driven Descent for Efficient Optimal Deformation Estimation [pdf] - Yuandong Tian, Srinivasa G. Narasimhan
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length [pdf] - Zuzana Kukelova, Martin Bujnak, Tomas Pajdla
  • Pose Estimation with Unknown Focal Length Using Points, Directions and Lines [pdf] - Yubin Kuang, Kalle Astrom
How Do You Tell a Blackbird from a Crow? [pdf]
Thomas Berg, Peter N. Belhumeur

Abstract: How do you tell a blackbird from a crow? There has been great progress toward automatic methods for visual recognition, including fine-grained visual categorization in which the classes to be distinguished are very similar. In a task such as bird species recognition, automatic recognition systems can now exceed the performance of non-experts: most people are challenged to name a couple dozen bird species, let alone identify them. This leads us to the question: can a recognition system show humans what to look for when identifying classes (in this case birds)? In the context of fine-grained visual categorization, we show that we can automatically determine which classes are most visually similar, discover what visual features distinguish very similar classes, and illustrate the key features in a way meaningful to humans. Running these methods on a dataset of bird images, we can generate a visual field guide to birds which includes a tree of similarity that displays the similarity relations between all species, pages for each species showing the most similar other species, and pages for each pair of similar species illustrating their differences.
Similar papers:
  • Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction [pdf] - Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
  • Quadruplet-Wise Image Similarity Learning [pdf] - Marc T. Law, Nicolas Thome, Matthieu Cord
  • Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf] - Jiongxin Liu, Peter N. Belhumeur
  • Fine-Grained Categorization by Alignments [pdf] - E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
  • Symbiotic Segmentation and Part Localization for Fine-Grained Categorization [pdf] - Yuning Chai, Victor Lempitsky, Andrew Zisserman
PhotoOCR: Reading Text in Uncontrolled Conditions [pdf]
Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven

Abstract: We describe PhotoOCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern datacenter-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency; mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.
Similar papers:
  • Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors [pdf] - Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang
  • From Where and How to What We See [pdf] - S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Ecksteinz, B.S. Manjunath
  • Scene Text Localization and Recognition with Oriented Stroke Detection [pdf] - Lukas Neumann, Jiri Matas
  • Recognizing Text with Perspective Distortion in Natural Scenes [pdf] - Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
  • Image Retrieval Using Textual Cues [pdf] - Anand Mishra, Karteek Alahari, C.V. Jawahar
Finding Actors and Actions in Movies [pdf]
P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, J. Sivic

Abstract: We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in the feature-length movies Casablanca and American Beauty.
Similar papers:
  • Action and Event Recognition with Fisher Vectors on a Compact Feature Set [pdf] - Dan Oneata, Jakob Verbeek, Cordelia Schmid
  • Learning Maximum Margin Temporal Warping for Action Recognition [pdf] - Jiang Wang, Ying Wu
  • Concurrent Action Detection with Structural Prediction [pdf] - Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
  • Latent Multitask Learning for View-Invariant Action Recognition [pdf] - Behrooz Mahasseni, Sinisa Todorovic
  • The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection [pdf] - Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf]
Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti

Abstract: Significant recent progress has been made in developing high-quality saliency models. However, less effort has been undertaken on fair assessment of these models, over large standardized datasets and correctly addressing confounding factors. In this study, we pursue a critical and quantitative look at challenges (e.g., center-bias, map smoothing) in saliency modeling and the way they affect model accuracy. We quantitatively compare 32 state-of-the-art models (using the shuffled AUC score to discount center-bias) on 4 benchmark eye movement datasets, for prediction of human fixation locations and scanpath sequence. We also account for the role of map smoothing. We find that, although model rankings vary, some (e.g., AWS, LG, AIM, and HouNIPS) consistently outperform other models over all datasets. Some models work well for prediction of both fixation locations and scanpath sequence (e.g., Judd, GBVS). Our results show low prediction accuracy for models over emotional stimuli from the NUSEF dataset. Our last benchmark, for the first time, gauges the ability of models to decode the stimulus category from statistics of fixations, saccades, and model saliency values at fixated locations. In this test, ITTI and AIM models win over other models. Our benchmark provides a comprehensive high-level picture of the strengths and weaknesses of many popular models, and suggests future research directions in saliency modeling.
Similar papers:
  • Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics [pdf] - Nicolas Riche, Matthieu Duvinage, Matei Mancas, Bernard Gosselin, Thierry Dutoit
  • Saliency Detection: A Boolean Map Approach [pdf] - Jianming Zhang, Stan Sclaroff
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
  • Efficient Salient Region Detection with Soft Image Abstraction [pdf] - Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
  • Contextual Hypergraph Modeling for Salient Object Detection [pdf] - Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel
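The shuffled AUC score used in this benchmark discounts center bias by drawing negative samples from fixation locations of *other* images rather than uniformly from the image. A small sketch of that idea (tie handling omitted; function names are illustrative, not the benchmark's code):

```python
import numpy as np

def auc(pos, neg):
    """Area under the ROC curve via the rank-sum (Mann-Whitney)
    statistic; assumes no tied scores."""
    scores = np.concatenate([pos, neg])
    ranks = scores.argsort().argsort() + 1  # 1-based ranks
    u = ranks[: len(pos)].sum() - len(pos) * (len(pos) + 1) / 2
    return u / (len(pos) * len(neg))

def shuffled_auc(sal_map, fixations, other_fixations):
    """Positives: saliency at this image's fixations.
    Negatives: saliency at fixation locations from other images,
    which penalizes models that merely predict the center."""
    pos = sal_map[tuple(np.array(fixations).T)]
    neg = sal_map[tuple(np.array(other_fixations).T)]
    return auc(pos, neg)

sal = np.zeros((5, 5))
sal[2, 2] = 1.0  # saliency peak exactly at the fixated location
print(shuffled_auc(sal, [(2, 2)], [(0, 0), (4, 4)]))  # -> 1.0
```

Because human fixations are themselves center-biased, a model that only predicts a central blob scores near chance under this metric, which is why the benchmark adopts it.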
Event Recognition in Photo Collections with a Stopwatch HMM [pdf]
Lukas Bossard, Matthieu Guillaumin, Luc Van_Gool

Abstract: The task of recognizing events in photo collections is central for automatically organizing images. It is also very challenging, because of the ambiguity of photos across different event classes and because many photos do not convey enough relevant information. Unfortunately, the field still lacks standard evaluation data sets to allow comparison of different approaches. In this paper, we introduce and release a novel data set of personal photo collections containing more than 61,000 images in 807 collections, annotated with 14 diverse social event classes. Casting collections as sequential data, we build upon recent and state-of-the-art work in event recognition in videos to propose a latent sub-event approach for event recognition in photo collections. However, photos in collections are sparsely sampled over time and come in bursts, from which transpires the importance of specific moments for the photographers. Thus, we adapt a discriminative hidden Markov model to allow the transitions between states to be a function of the time gap between consecutive images, which we coin the Stopwatch Hidden Markov model (SHMM). In our experiments, we show that our proposed model outperforms approaches based only on feature pooling or a classical hidden Markov model. With an average accuracy of 56%, we also highlight the difficulty of the data set and the need for future advances in event recognition in photo collections.
Similar papers:
  • Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach [pdf] - Arash Vahdat, Kevin Cannons, Greg Mori, Sangmin Oh, Ilseo Kim
  • Dynamic Pooling for Complex Event Recognition [pdf] - Weixin Li, Qian Yu, Ajay Divakaran, Nuno Vasconcelos
  • How Related Exemplars Help Complex Event Detection in Web Videos? [pdf] - Yi Yang, Zhigang Ma, Zhongwen Xu, Shuicheng Yan, Alexander G. Hauptmann
  • Event Detection in Complex Scenes Using Interval Temporal Constraints [pdf] - Yifan Zhang, Qiang Ji, Hanqing Lu
  • Modeling 4D Human-Object Interactions for Event and Object Recognition [pdf] - Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu
Estimating the Material Properties of Fabric from Video [pdf]
Katherine L. Bouman, Bei Xiao, Peter Battaglia, William T. Freeman

Abstract: Passively estimating the intrinsic material properties of deformable objects moving in a natural environment is essential for scene understanding. We present a framework to automatically analyze videos of fabrics moving under various unknown wind forces, and recover two key material properties of the fabric: stiffness and area weight. We extend features previously developed to compactly represent static image textures to describe video textures, such as fabric motion. A discriminatively trained regression model is then used to predict the physical properties of fabric from these features. The success of our model is demonstrated on a new, publicly available database of fabric videos with corresponding measured ground truth material properties. We show that our predictions are well correlated with ground truth measurements of stiffness and density for the fabrics. Our contributions include: (a) a database that can be used for training and testing algorithms for passively predicting fabric properties from video, (b) an algorithm for predicting the material properties of fabric from a video, and (c) a perceptual study of humans' ability to estimate the material properties of fabric from videos and images.
Similar papers:
  • Action Recognition with Improved Trajectories [pdf] - Heng Wang, Cordelia Schmid
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Optimal Orthogonal Basis and Image Assimilation: Motion Modeling [pdf] - Etienne Huot, Giuseppe Papari, Isabelle Herlin
  • A Simple Model for Intrinsic Image Decomposition with Depth Cues [pdf] - Qifeng Chen, Vladlen Koltun
  • Matching Dry to Wet Materials [pdf] - Yaser Yacoob
Local Signal Equalization for Correspondence Matching [pdf]
Derek Bradley, Thabo Beeler

Abstract: Correspondence matching is one of the most common problems in computer vision, and it is often solved using photo-consistency of local regions. These approaches typically assume that the frequency content in the local region is consistent in the image pair, such that matching is performed on similar signals. However, in many practical situations this is not the case; for example, with low depth of field cameras a scene point may be out of focus in one view and in-focus in the other, causing a mismatch of frequency signals. Furthermore, this mismatch can vary spatially over the entire image. In this paper we propose a local signal equalization approach for correspondence matching. Using a measure of local image frequency, we equalize local signals using an efficient scale-space image representation such that their frequency contents are optimally suited for matching. Our approach allows better correspondence matching, which we demonstrate with a number of stereo reconstruction examples on synthetic and real datasets.
Similar papers:
  • Subpixel Scanning Invariant to Indirect Lighting Using Quadratic Code Length [pdf] - Nicolas Martin, Vincent Couture, Sebastien Roy
  • Multi-channel Correlation Filters [pdf] - Hamed Kiani Galoogahi, Terence Sim, Simon Lucey
  • Depth from Combining Defocus and Correspondence Using Light-Field Cameras [pdf] - Michael W. Tao, Sunil Hadap, Jitendra Malik, Ravi Ramamoorthi
  • Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain [pdf] - Ying Fu, Antony Lam, Imari Sato, Takahiro Okabe, Yoichi Sato
  • A Rotational Stereo Model Based on XSlit Imaging [pdf] - Jinwei Ye, Yu Ji, Jingyi Yu
Bayesian 3D Tracking from Monocular Video [pdf]
Ernesto Brau, Jinyan Guan, Kyle Simek, Luca Del Pero, Colin Reimer Dawson, Kobus Barnard

Abstract: We develop a Bayesian modeling approach for tracking people in 3D from monocular video with unknown cameras. Modeling in 3D provides natural explanations for occlusions and smoothness discontinuities that result from projection, and allows priors on velocity and smoothness to be grounded in physical quantities: meters and seconds vs. pixels and frames. We pose the problem in the context of data association, in which observations are assigned to tracks. A correct application of Bayesian inference to multi-target tracking must address the fact that the model's dimension changes as tracks are added or removed, and thus posterior densities of different hypotheses are not comparable. We address this by marginalizing out the trajectory parameters so the resulting posterior over data associations has constant dimension. This is made tractable by using (a) Gaussian process priors for smooth trajectories and (b) approximately Gaussian likelihood functions. Our approach provides a principled method for incorporating multiple sources of evidence; we present results using both optical flow and object detector outputs. Results are comparable to recent work on 3D tracking and, unlike others, our method requires no pre-calibrated cameras.
Similar papers:
  • The Way They Move: Tracking Multiple Targets with Similar Appearance [pdf] - Caglayan Dicle, Octavia I. Camps, Mario Sznaier
  • Topology-Constrained Layered Tracking with Latent Flow [pdf] - Jason Chang, John W. Fisher_III
  • Orderless Tracking through Model-Averaged Posterior Estimation [pdf] - Seunghoon Hong, Suha Kwak, Bohyung Han
  • Conservation Tracking [pdf] - Martin Schiegg, Philipp Hanslovsky, Bernhard X. Kausler, Lars Hufnagel, Fred A. Hamprecht
  • Latent Data Association: Bayesian Model Selection for Multi-target Tracking [pdf] - Aleksandr V. Segal, Ian Reid
A General Dense Image Matching Framework Combining Direct and Feature-Based Costs [pdf]
Jim Braux-Zin, Romain Dupont, Adrien Bartoli

Abstract: Dense motion field estimation (typically optical flow, stereo disparity and surface registration) is a key computer vision problem. Many solutions have been proposed to compute small or large displacements, narrow or wide baseline stereo disparity, but a unified methodology is still lacking. We here introduce a general framework that robustly combines direct and feature-based matching. The feature-based cost is built around a novel robust distance function that handles keypoints and weak features such as segments. It allows us to use putative feature matches, which may contain mismatches, to guide dense motion estimation out of local minima. Our framework uses a robust direct data term (AD-Census). It is implemented with a powerful second order Total Generalized Variation regularization with external and self-occlusion reasoning. Our framework achieves state of the art performance in several cases (standard optical flow benchmarks, wide-baseline stereo and non-rigid surface registration). Our framework has a modular design that customizes to specific application needs.
Similar papers:
  • Action Recognition with Improved Trajectories [pdf] - Heng Wang, Cordelia Schmid
  • Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation [pdf] - Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu
  • Optical Flow via Locally Adaptive Fusion of Complementary Data Costs [pdf] - Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
  • DeepFlow: Large Displacement Optical Flow with Deep Matching [pdf] - Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
Robust Face Landmark Estimation under Occlusion [pdf]
Xavier P. Burgos-Artizzu, Pietro Perona, Piotr Dollar

Abstract: Human faces captured in real-world conditions present large variations in shape and occlusions due to differences in pose, expression, use of accessories such as sunglasses and hats, and interactions with objects (e.g. food). Current face landmark estimation approaches struggle under such conditions since they fail to provide a principled way of handling outliers. We propose a novel method, called Robust Cascaded Pose Regression (RCPR), which reduces exposure to outliers by detecting occlusions explicitly and using robust shape-indexed features. We show that RCPR improves on previous landmark estimation methods on three popular face datasets (LFPW, LFW and HELEN). We further explore RCPR's performance by introducing a novel face dataset focused on occlusion, composed of 1,007 faces presenting a wide range of occlusion patterns. RCPR reduces failure cases by half on all four datasets, at the same time as it detects face occlusions with 80/40% precision/recall.
Similar papers:
  • Fast Face Detector Training Using Tailored Views [pdf] - Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
  • Modifying the Memorability of Face Photographs [pdf] - Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva
  • Exemplar-Based Graph Matching for Robust Facial Landmark Localization [pdf] - Feng Zhou, Jonathan Brandt, Zhe Lin
  • Cascaded Shape Space Pruning for Robust Facial Landmark Detection [pdf] - Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
  • Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model [pdf] - Xiang Yu, Junzhou Huang, Shaoting Zhang, Wang Yan, Dimitris N. Metaxas
Nested Shape Descriptors [pdf]
Jeffrey Byrne, Jianbo Shi

Abstract: In this paper, we propose a new family of binary local feature descriptors called nested shape descriptors. These descriptors are constructed by pooling oriented gradients over a large geometric structure called the Hawaiian earring, which is constructed with a nested correlation structure that enables a new robust local distance function called the nesting distance. This distance function is unique to the nested descriptor and provides robustness to outliers from order statistics. In this paper, we define the nested shape descriptor family and introduce a specific member called the seed-of-life descriptor. We perform a trade study to determine optimal descriptor parameters for the task of image matching. Finally, we evaluate performance compared to state-of-the-art local feature descriptors on the VGG-Affine image matching benchmark, showing significant performance gains. Our descriptor is the first binary descriptor to outperform SIFT on this benchmark.
Similar papers:
  • BOLD Features to Detect Texture-less Objects [pdf] - Federico Tombari, Alessandro Franchi, Luigi Di_Stefano
  • To Aggregate or Not to aggregate: Selective Match Kernels for Image Search [pdf] - Giorgos Tolias, Yannis Avrithis, Herve Jegou
  • SIFTpack: A Compact Representation for Efficient SIFT Matching [pdf] - Alexandra Gilinsky, Lihi Zelnik Manor
  • Shape Index Descriptors Applied to Texture-Based Galaxy Analysis [pdf] - Kim Steenstrup Pedersen, Kristoffer Stensbo-Smidt, Andrew Zirm, Christian Igel
  • An Adaptive Descriptor Design for Object Recognition in the Wild [pdf] - Zhenyu Guo, Z. Jane Wang
Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition [pdf]
Ricardo Cabral, Fernando De_La_Torre, Joao P. Costeira, Alexandre Bernardino

Abstract: Bilinear factorization: min_{U,V} f(X - U V^T). Nuclear norm regularization: min_Z f(X - Z) + λ||Z||_*. Variational definition of the nuclear norm: ||Z||_* = min_{Z = U V^T} (1/2)(||U||_F^2 + ||V||_F^2). Unified model: min_{U,V} f(X - U V^T) + (λ/2)(||U||_F^2 + ||V||_F^2). Low rank models have been widely used for the representation of shape, appearance or motion in computer vision problems. Traditional approaches to fit low rank models make use of an explicit bilinear factorization. These approaches benefit from fast numerical methods for optimization and easy kernelization. However, they suffer from serious local minima problems depending on the loss function and the amount/type of missing data. Recently, these low-rank models have alternatively been formulated as convex problems using the nuclear norm regularizer; unlike factorization methods, their numerical solvers are slow and it is unclear how to kernelize them or to impose a rank a priori. This paper proposes a unified approach to bilinear factorization and nuclear norm regularization that inherits the benefits of both. We analyze the conditions under which these approaches are equivalent. Moreover, based on this analysis, we propose a new optimization algorithm and a rank continuation strategy that outperform state-of-the-art approaches for Robust PCA, Structure from Motion and Photometric Stereo with outliers and missing data.
Similar papers:
  • Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras [pdf] - Jae-Hak Kim, Yuchao Dai, Hongdong Li, Xin Du, Jonghyuk Kim
  • Non-convex P-Norm Projection for Robust Sparsity [pdf] - Mithun Das Gupta, Sanjeev Kumar
  • Bayesian Robust Matrix Factorization for Image and Video Processing [pdf] - Naiyan Wang, Dit-Yan Yeung
  • Robust Matrix Factorization with Unknown Noise [pdf] - Deyu Meng, Fernando De_La_Torre
  • Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision [pdf] - Tae-Hyun Oh, Hyeongwoo Kim, Yu-Wing Tai, Jean-Charles Bazin, In So Kweon
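The variational definition quoted in the abstract, ||Z||_* = min_{Z=UV^T} (1/2)(||U||_F^2 + ||V||_F^2), is attained by the balanced factors U = A sqrt(S), V = B sqrt(S) from the SVD Z = A S B^T; that identity is what links the two formulations the paper unifies. A quick numerical check of it (illustrative sketch, not the paper's solver):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))

# Nuclear norm = sum of singular values.
A, s, Bt = np.linalg.svd(Z, full_matrices=False)
nuc = s.sum()

# Balanced factors U = A sqrt(S), V = B sqrt(S) reproduce Z ...
U = A * np.sqrt(s)
V = Bt.T * np.sqrt(s)
assert np.allclose(U @ V.T, Z)

# ... and attain the variational bound (1/2)(||U||_F^2 + ||V||_F^2).
bound = 0.5 * (np.linalg.norm(U, "fro") ** 2 + np.linalg.norm(V, "fro") ** 2)
print(abs(bound - nuc))  # ~0: the balanced factorization attains the minimum
```

Any unbalanced factorization (e.g. scaling U up and V down) only increases the Frobenius penalty, which is why minimizing the unified objective over U, V recovers the nuclear-norm-regularized solution when the factor rank is large enough.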
Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model [pdf]
Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang

Abstract: Automatic image categorization has become increasingly important with the development of the Internet and the growth in the size of image databases. Although image categorization can be formulated as a typical multi-class classification problem, two major challenges are raised by real-world images. On one hand, though using more labeled training data may improve the prediction performance, obtaining the image labels is a time-consuming as well as biased process. On the other hand, more and more visual descriptors have been proposed to describe objects and scenes appearing in images, and different features describe different aspects of the visual characteristics. Therefore, how to integrate heterogeneous visual features to do semi-supervised learning is crucial for categorizing large-scale image data. In this paper, we propose a novel approach to integrate heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. Considering each type of feature as one modality, and taking advantage of the large amount of unlabeled data information, our new adaptive multi-modal semi-supervised classification (AMMSS) algorithm learns a commonly shared class indicator matrix and the weights for different modalities (image features) simultaneously.
Similar papers:
  • Ensemble Projection for Semi-supervised Image Classification [pdf] - Dengxin Dai, Luc Van_Gool
  • A Convex Optimization Framework for Active Learning [pdf] - Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sastry
  • Learning CRFs for Image Parsing with Adaptive Subgradient Descent [pdf] - Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan
  • New Graph Structured Sparsity Model for Multi-label Image Annotations [pdf] - Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
  • Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification [pdf] - Bo Wang, Zhuowen Tu, John K. Tsotsos
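The idea behind AMMSS — treating each feature type as one modality, combining per-modality similarity graphs with weights, and propagating labels from labeled to unlabeled images — can be caricatured in a few lines. This is a generic graph-based label-propagation sketch with fixed (not learned) modality weights, not the authors' AMMSS optimization; all arrays and parameters are illustrative.

```python
import numpy as np

def propagate_labels(similarities, weights, Y, alpha=0.9, iters=50):
    """Toy multi-modal label propagation: each modality contributes a
    similarity graph; the weighted sum is row-normalized into a
    transition matrix, then labels are diffused over the graph."""
    W = sum(w * S for w, S in zip(weights, similarities))
    W = W / W.sum(axis=1, keepdims=True)       # row-stochastic
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * W @ F + (1 - alpha) * Y    # diffuse, clamp toward labels
    return F.argmax(axis=1)

# Two "modalities" over 4 images; images 0 and 3 are labeled (classes 0, 1).
S1 = np.array([[1, .9, .1, 0], [.9, 1, .1, 0], [.1, .1, 1, .9], [0, 0, .9, 1.]])
S2 = S1.copy()  # second modality agrees in this toy example
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1.]])
pred = propagate_labels([S1, S2], [0.5, 0.5], Y)   # → [0, 0, 1, 1]
```

AMMSS additionally learns the modality weights jointly with the class indicator matrix; here they are simply fixed to 0.5 each.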
New Graph Structured Sparsity Model for Multi-label Image Annotations [pdf]
Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang

Abstract: In multi-label image annotations, because each image is associated to multiple categories, the semantic terms (label classes) are not mutually exclusive. Previous research showed that such label correlations can largely boost the annotation accuracy. However, all existing methods only directly apply the label correlation matrix to enhance the label inference and assignment without further learning the structural information among classes. In this paper, we model the label correlations using the relational graph, and propose a novel graph structured sparse learning model to incorporate the topological constraints of the relation graph in multi-label classifications. As a result, our new method will capture and utilize the hidden class structures in the relational graph to improve the annotation results. In the proposed objective, a large number of structured sparsity-inducing norms are utilized, thus the optimization becomes difficult. To solve this problem, we derive an efficient optimization algorithm with proved convergence. We perform extensive experiments on six multi-label image annotation benchmark data sets. In all empirical results, our new method shows better annotation results than the state-of-the-art approaches.
Similar papers:
  • Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation [pdf] - Suyog Dutt Jain, Kristen Grauman
  • Joint Optimization for Consistent Multiple Graph Matching [pdf] - Junchi Yan, Yu Tian, Hongyuan Zha, Xiaokang Yang, Ya Zhang, Stephen M. Chu
  • Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification [pdf] - Bo Wang, Zhuowen Tu, John K. Tsotsos
  • Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model [pdf] - Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
  • Learning Graphs to Match [pdf] - Minsu Cho, Karteek Alahari, Jean Ponce
An Enhanced Structure-from-Motion Paradigm Based on the Absolute Dual Quadric and Images of Circular Points [pdf]
Lilian Calvet, Pierre Gurdjos

Abstract: This work aims at introducing a new unified Structure-from-Motion (SfM) paradigm in which images of circular point-pairs can be combined with images of natural points. An imaged circular point-pair encodes the 2D Euclidean structure of a world plane and can easily be derived from the image of a planar shape, especially those including circles. A classical SfM method generally runs two steps: first a projective factorization of all matched image points (into projective cameras and points), and second a camera self-calibration that updates the obtained world from projective to Euclidean. This work shows how to introduce images of circular points in these two SfM steps, while its key contribution is to provide the theoretical foundations for combining classical linear self-calibration constraints with additional ones derived from such images. We show that the two proposed SfM steps clearly contribute to better results than the classical approach. We validate our contributions on synthetic and real images.
Similar papers:
  • Monocular Image 3D Human Pose Estimation under Self-Occlusion [pdf] - Ibrahim Radwan, Abhinav Dhall, Roland Goecke
  • Pose Estimation with Unknown Focal Length Using Points, Directions and Lines [pdf] - Yubin Kuang, Kalle Astrom
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • A Flexible Scene Representation for 3D Reconstruction Using an RGB-D Camera [pdf] - Diego Thomas, Akihiro Sugimoto
  • Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras [pdf] - Jae-Hak Kim, Yuchao Dai, Hongdong Li, Xin Du, Jonghyuk Kim
A Practical Transfer Learning Algorithm for Face Verification [pdf]
Xudong Cao, David Wipf, Fang Wen, Genquan Duan, Jian Sun

Abstract: Face verification involves determining whether a pair of facial images belongs to the same or different subjects. This problem can prove to be quite challenging in many important applications where labeled training data is scarce, e.g., family album photo organization software. Herein we propose a principled transfer learning approach for merging plentiful source-domain data with limited samples from some target domain of interest to create a classifier that ideally performs nearly as well as if rich target-domain data were present. Based upon a surprisingly simple generative Bayesian model, our approach combines a KL-divergence-based regularizer/prior with a robust likelihood function, leading to a scalable implementation via the EM algorithm. As justification for our design choices, we later use principles from convex analysis to recast our algorithm as an equivalent structured rank minimization problem, leading to a number of interesting insights related to solution structure and feature-transform invariance. These insights help both to explain the effectiveness of our algorithm and to elucidate a wide variety of related Bayesian approaches. Experimental testing with challenging datasets validates the utility of the proposed algorithm.
Similar papers:
  • Modifying the Memorability of Face Photographs [pdf] - Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva
  • Similarity Metric Learning for Face Recognition [pdf] - Qiong Cao, Yiming Ying, Peng Li
  • Frustratingly Easy NBNN Domain Adaptation [pdf] - Tatiana Tommasi, Barbara Caputo
  • Transfer Feature Learning with Joint Distribution Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Jiaguang Sun, Philip S. Yu
  • Hidden Factor Analysis for Age Invariant Face Recognition [pdf] - Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
Similarity Metric Learning for Face Recognition [pdf]
Qiong Cao, Yiming Ying, Peng Li

Abstract: Recently, a considerable amount of effort has been devoted to the problem of unconstrained face verification, where the task is to predict whether pairs of images are from the same person or not. This problem is challenging due to the large variations in face images. In this paper, we develop a novel regularization framework to learn similarity metrics for unconstrained face verification. We formulate its objective function by incorporating robustness to the large intra-personal variations and the discriminative power of novel similarity metrics. In addition, our formulation is a convex optimization problem, which guarantees the existence of its global solution. Experiments show that our proposed method achieves state-of-the-art results on the challenging Labeled Faces in the Wild (LFW) database [10].
Similar papers:
  • Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers [pdf] - Martin Kostinger, Paul Wohlhart, Peter M. Roth, Horst Bischof
  • Fast High Dimensional Vector Multiplication Face Recognition [pdf] - Oren Barkan, Jonathan Weill, Lior Wolf, Hagai Aronowitz
  • Quadruplet-Wise Image Similarity Learning [pdf] - Marc T. Law, Nicolas Thome, Matthieu Cord
  • Face Recognition via Archetype Hull Ranking [pdf] - Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang
  • From Point to Set: Extend the Learning of Distance Metrics [pdf] - Pengfei Zhu, Lei Zhang, Wangmeng Zuo, David Zhang
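A similarity metric of the kind learned above typically scores a face pair with a bilinear form s(x, y) = xᵀMy, where M is learned from training pairs. The sketch below shows only that generic scoring step, not the paper's regularized formulation (which combines similarity and distance terms); the vectors, M, and threshold are all made up for illustration.

```python
import numpy as np

def bilinear_similarity(x, y, M):
    """Score a face pair with a bilinear form s(x, y) = x^T M y.
    A symmetric positive semidefinite M generalizes the dot product,
    which is recovered when M is the identity."""
    return float(x @ M @ y)

x = np.array([1.0, 0.0])          # feature vector of face image 1
y = np.array([0.8, 0.6])          # feature vector of face image 2
M = np.eye(2)                     # identity: plain dot product
same = bilinear_similarity(x, y, M) > 0.5   # threshold tuned on training pairs
```

With M = I the score here is 0.8, so the pair would be declared a match at this (arbitrary) threshold; learning M reshapes the feature space so that intra-personal variation scores high and inter-personal pairs score low.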
SYM-FISH: A Symmetry-Aware Flip Invariant Sketch Histogram Shape Descriptor [pdf]
Xiaochun Cao, Hua Zhang, Si Liu, Xiaojie Guo, Liang Lin

Abstract: Recently, studies on sketches, such as sketch retrieval and sketch classification, have received more attention in the computer vision community. One of the most fundamental and essential problems is how to describe a sketch image more effectively. Many existing descriptors, such as shape context, have achieved great success. In this paper, we propose a new descriptor, namely the Symmetry-Aware Flip Invariant Sketch Histogram (SYM-FISH), to refine the shape context feature. Its extraction process includes three steps. First, the Flip Invariant Sketch Histogram (FISH) descriptor is extracted on the input image, which is a flip-invariant version of the shape context feature. Then we explore the symmetry character of the image by calculating the kurtosis coefficient. Finally, the SYM-FISH is generated by constructing a symmetry table. The new SYM-FISH descriptor supplements the original shape context by encoding symmetric information, which is a pervasive characteristic of natural scenes and objects. We evaluate the efficacy of the novel descriptor in two applications, i.e., sketch retrieval and sketch classification. Extensive experiments on three datasets demonstrate the effectiveness and robustness of the proposed SYM-FISH descriptor.
Similar papers:
  • Learning a Dictionary of Shape Epitomes with Applications to Image Labeling [pdf] - Liang-Chieh Chen, George Papandreou, Alan L. Yuille
  • Nested Shape Descriptors [pdf] - Jeffrey Byrne, Jianbo Shi
  • Cosegmentation and Cosketch by Unsupervised Learning [pdf] - Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
  • Detecting Curved Symmetric Parts Using a Deformable Disc Model [pdf] - Tom Sie Ho Lee, Sanja Fidler, Sven Dickinson
  • 3D Sub-query Expansion for Improving Sketch-Based Multi-view Image Retrieval [pdf] - Yen-Liang Lin, Cheng-Yu Huang, Hao-Jeng Wang, Winston Hsu
Symbiotic Segmentation and Part Localization for Fine-Grained Categorization [pdf]
Yuning Chai, Victor Lempitsky, Andrew Zisserman

Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly-discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
Similar papers:
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Semantic Segmentation without Annotating Segments [pdf] - Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
  • Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf] - Jiongxin Liu, Peter N. Belhumeur
  • Fine-Grained Categorization by Alignments [pdf] - E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
  • Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction [pdf] - Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
Rectangling Stereographic Projection for Wide-Angle Image Visualization [pdf]
Che-Han Chang, Min-Chun Hu, Wen-Huang Cheng, Yung-Yu Chuang

Abstract: This paper proposes a new projection model for mapping a hemisphere to a plane. Such a model can be useful for viewing wide-angle images. Our model consists of two steps. In the first step, the hemisphere is projected onto a swung surface constructed from a circular profile and a rounded rectangular trajectory. The second step maps the projected image on the swung surface onto the image plane through perspective projection. We also propose a method for automatically determining proper parameters for the projection model based on image content. The proposed model has several advantages. It is simple, efficient and easy to control. Most importantly, it makes a better compromise between distortion minimization and line preservation than popular projection models, such as the stereographic and Pannini projections. Experiments and analysis demonstrate the effectiveness of our model.
Similar papers:
  • Elastic Net Constraints for Shape Matching [pdf] - Emanuele Rodola, Andrea Torsello, Tatsuya Harada, Yasuo Kuniyoshi, Daniel Cremers
  • Content-Aware Rotation [pdf] - Kaiming He, Huiwen Chang, Jian Sun
  • Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach [pdf] - R. Melo, M. Antunes, J.P. Barreto, G. Falcao, N. Goncalves
  • Coherent Object Detection with 3D Geometric Context from a Single Image [pdf] - Jiyan Pan, Takeo Kanade
  • Lifting 3D Manhattan Lines from a Single Image [pdf] - Srikumar Ramalingam, Matthew Brand
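For context, the stereographic projection that this paper uses as a baseline has a very simple closed form: a point on the unit hemisphere is projected from the opposite pole onto the image plane. The sketch below is that standard baseline only, not the paper's two-step rectangling model.

```python
import numpy as np

def stereographic(X, Y, Z):
    """Conformal stereographic projection of a point (X, Y, Z) on the
    unit hemisphere (Z >= 0) onto the image plane, projecting from the
    opposite pole (0, 0, -1)."""
    return 2.0 * X / (1.0 + Z), 2.0 * Y / (1.0 + Z)

# The optical axis maps to the image center...
u, v = stereographic(0.0, 0.0, 1.0)    # → (0.0, 0.0)
# ...and a point on the rim (90° field angle) lands at radius 2.
ur, vr = stereographic(1.0, 0.0, 0.0)  # → (2.0, 0.0)
```

Stereographic projection preserves local angles (hence faces look natural) but bends straight lines; perspective projection does the opposite, which is the distortion-versus-line-preservation trade-off the paper's model negotiates.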
Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology [pdf]
Hang Chang, Yin Zhou, Paul Spellman, Bahram Parvin

Abstract: Image-based classification of histology sections, in terms of distinct components (e.g., tumor, stroma, normal), provides a series of indices for tumor composition. Furthermore, aggregation of these indices, from each whole slide image (WSI) in a large cohort, can provide predictive models of the clinical outcome. However, the performance of existing techniques is hindered by the large technical variations and biological heterogeneities that are always present in a large cohort. We propose a system that automatically learns a series of basis functions for representing the underlying spatial distribution using stacked predictive sparse decomposition (PSD). The learned representation is then fed into the spatial pyramid matching framework (SPM) with a linear SVM classifier. The system has been evaluated for classification of (a) distinct histological components for two cohorts of tumor types, and (b) colony organization of normal and malignant cell lines in 3D cell culture models. Throughput has been increased through the use of graphical processing units (GPUs), and evaluation indicates superior performance compared with previous research.
Similar papers:
  • Constructing Adaptive Complex Cells for Robust Visual Tracking [pdf] - Dapeng Chen, Zejian Yuan, Yang Wu, Geng Zhang, Nanning Zheng
  • Robust Dictionary Learning by Error Source Decomposition [pdf] - Zhuoyuan Chen, Ying Wu
  • Latent Space Sparse Subspace Clustering [pdf] - Vishal M. Patel, Hien Van Nguyen, Rene Vidal
  • Low-Rank Sparse Coding for Image Classification [pdf] - Tianzhu Zhang, Bernard Ghanem, Si Liu, Changsheng Xu, Narendra Ahuja
  • Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications [pdf] - Yu-Tseh Chi, Mohsen Ali, Muhammad Rushdi, Jeffrey Ho
Topology-Constrained Layered Tracking with Latent Flow [pdf]
Jason Chang, John W. Fisher_III

Abstract: We present an integrated probabilistic model for layered object tracking that combines dynamics on implicit shape representations, topological shape constraints, adaptive appearance models, and layered flow. The generative model combines the evolution of appearances and layer shapes with a Gaussian process flow and explicit layer ordering. Efficient MCMC sampling algorithms are developed to enable a particle filtering approach while reasoning about the distribution of object boundaries in video. We demonstrate the utility of the proposed tracking algorithm on a wide variety of video sources while achieving state-of-the-art results on a boundary-accurate tracking dataset.
Similar papers:
  • Estimating Human Pose with Flowing Puppets [pdf] - Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
  • Optical Flow via Locally Adaptive Fusion of Complementary Data Costs [pdf] - Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
  • Fast Object Segmentation in Unconstrained Video [pdf] - Anestis Papazoglou, Vittorio Ferrari
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
Efficient and Robust Large-Scale Rotation Averaging [pdf]
Avishek Chatterjee, Venu Madhav Govindu

Abstract: In this paper we address the problem of robust and efficient averaging of relative 3D rotations. Apart from having an interesting geometric structure, robust rotation averaging addresses the need for a good initialization for the large-scale optimization used in structure-from-motion pipelines. Such pipelines often use unstructured image datasets harvested from the internet, thereby requiring an initialization method that is robust to outliers. Our approach works on the Lie group structure of 3D rotations and solves the problem of large-scale robust rotation averaging in two ways. Firstly, we use modern l1 optimizers to carry out robust averaging of relative rotations that is efficient, scalable and robust to outliers. In addition, we also develop a two-step method that uses the l1 solution as an initialisation for an iteratively reweighted least squares (IRLS) approach. These methods achieve excellent results on large-scale, real-world datasets and significantly outperform existing methods, i.e. the state-of-the-art discrete-continuous optimization method of [3] as well as the Weiszfeld method of [8]. We demonstrate the efficacy of our method on two large-scale real-world datasets and also provide the results of the two aforementioned methods for comparison.
Similar papers:
  • Revisiting the PnP Problem: A Fast, General and Optimal Solution [pdf] - Yinqiang Zheng, Yubin Kuang, Shigeki Sugimoto, Kalle Astrom, Masatoshi Okutomi
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • Content-Aware Rotation [pdf] - Kaiming He, Huiwen Chang, Jian Sun
  • Direct Optimization of Frame-to-Frame Rotation [pdf] - Laurent Kneip, Simon Lynen
  • Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion [pdf] - Pierre Moulon, Pascal Monasse, Renaud Marlet
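The l1-averaging idea above can be illustrated in one dimension: IRLS with weights 1/|residual| is the Weiszfeld iteration, which converges to a median-like estimate that resists gross outliers. This is only a scalar caricature (real rotation angles near zero, no SO(3) Lie-group machinery); the measurements and tolerances are made up.

```python
import numpy as np

def weiszfeld_average(angles, iters=30, eps=1e-3):
    """1-D Weiszfeld iteration: IRLS with weights 1/|r| converges to the
    l1 (median-like) average. Angles are in radians and assumed to stay
    well within one wrap, so no angle normalization is needed."""
    theta = float(np.mean(angles))             # least-squares initialization
    for _ in range(iters):
        r = np.maximum(np.abs(angles - theta), eps)   # guard the division
        w = 1.0 / r
        theta = float(np.sum(w * angles) / np.sum(w))
    return theta

measurements = np.array([0.10, 0.12, 0.11, 0.09, 1.5])  # one gross outlier
est = weiszfeld_average(measurements)   # ≈ 0.11, outlier down-weighted
```

A plain mean of these measurements would be pulled to about 0.38 by the outlier; the IRLS iteration settles near the inlier consensus, which is why such an l1 solution makes a good initialization for a subsequent least-squares refinement.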
A Generalized Low-Rank Appearance Model for Spatio-temporally Correlated Rain Streaks [pdf]
Yi-Lei Chen, Chiou-Ting Hsu

Abstract: In this paper, we propose a novel low-rank appearance model for removing rain streaks. Different from previous work, our method needs neither rain pixel detection nor time-consuming dictionary learning stage. Instead, as rain streaks usually reveal similar and repeated patterns on imaging scene, we propose and generalize a low-rank model from matrix to tensor structure in order to capture the spatio-temporally correlated rain streaks. With the appearance model, we thus remove rain streaks from image/video (and also other high-order image structure) in a unified way. Our experimental results demonstrate competitive (or even better) visual quality and efficient run-time in comparison with state of the art.
Similar papers:
  • Discriminant Tracking Using Tensor Representation with Semi-supervised Improvement [pdf] - Jin Gao, Junliang Xing, Weiming Hu, Steve Maybank
  • Robust Tucker Tensor Decomposition for Effective Image Representation [pdf] - Miao Zhang, Chris Ding
  • Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision [pdf] - Tae-Hyun Oh, Hyeongwoo Kim, Yu-Wing Tai, Jean-Charles Bazin, In So Kweon
  • DCSH - Matching Patches in RGBD Images [pdf] - Yaron Eshet, Simon Korman, Eyal Ofek, Shai Avidan
  • Single-Patch Low-Rank Prior for Non-pointwise Impulse Noise Removal [pdf] - Ruixuan Wang, Emanuele Trucco
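The low-rank premise above — repeated, similar streak patterns make the rain layer approximately low-rank — reduces, in the matrix case, to a truncated SVD (the Eckart-Young best rank-k approximation). The sketch below shows only that generic building block, not the paper's tensor generalization; the toy "streak" matrix is invented.

```python
import numpy as np

def low_rank_approx(A, rank):
    """Best rank-k approximation of A via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# A streak pattern repeated identically down the rows is rank 1,
# so a rank-1 approximation recovers it exactly.
streak = np.outer(np.ones(4), np.array([0.0, 1.0, 0.0, 1.0]))
recon = low_rank_approx(streak, rank=1)
```

In the paper this low-rank structure is imposed on the rain component of a video tensor, so the streaks can be separated from the scene without per-pixel rain detection or dictionary learning.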
A Simple Model for Intrinsic Image Decomposition with Depth Cues [pdf]
Qifeng Chen, Vladlen Koltun

Abstract: We present a model for intrinsic decomposition of RGB-D images. Our approach analyzes a single RGB-D image and estimates albedo and shading fields that explain the input. To disambiguate the problem, our model estimates a number of components that jointly account for the reconstructed shading. By decomposing the shading field, we can build in assumptions about image formation that help distinguish reflectance variation from shading. These assumptions are expressed as simple nonlocal regularizers. We evaluate the model on real-world images and on a challenging synthetic dataset. The experimental results demonstrate that the presented approach outperforms prior models for intrinsic decomposition of RGB-D images.
Similar papers:
  • Estimating the Material Properties of Fabric from Video [pdf] - Katherine L. Bouman, Bei Xiao, Peter Battaglia, William T. Freeman
  • SGTD: Structure Gradient and Texture Decorrelating Regularization for Image Decomposition [pdf] - Qiegen Liu, Jianbo Liu, Pei Dong, Dong Liang
  • A Method of Perceptual-Based Shape Decomposition [pdf] - Chang Ma, Zhongqian Dong, Tingting Jiang, Yizhou Wang, Wen Gao
  • Compensating for Motion during Direct-Global Separation [pdf] - Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan
  • Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain [pdf] - Ying Fu, Antony Lam, Imari Sato, Takahiro Okabe, Yoichi Sato
Accurate and Robust 3D Facial Capture Using a Single RGBD Camera [pdf]
Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai

Abstract: This paper presents an automatic and robust approach that accurately captures high-quality 3D facial performances using a single RGBD camera. The key to our approach is to combine the power of automatic facial feature detection and image-based 3D nonrigid registration techniques for 3D facial reconstruction. In particular, we develop a robust and accurate image-based nonrigid registration algorithm that incrementally deforms a 3D template mesh model to best match observed depth image data and important facial features detected from single RGBD images. The whole process is fully automatic and robust because it is based on a single-frame facial registration framework. The system is flexible because it does not require any strong 3D facial priors such as blendshape models. We demonstrate the power of our approach by capturing a wide range of 3D facial expressions using a single RGBD camera and achieve state-of-the-art accuracy by comparing against alternative methods.
Similar papers:
  • Internet Based Morphable Model [pdf] - Ira Kemelmacher-Shlizerman
  • Cascaded Shape Space Pruning for Robust Facial Landmark Detection [pdf] - Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
  • Facial Action Unit Event Detection by Cascade of Tasks [pdf] - Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang
  • Capturing Global Semantic Relationships for Facial Action Unit Recognition [pdf] - Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji
  • Like Father, Like Son: Facial Expression Dynamics for Kinship Verification [pdf] - Hamdi Dibeklioglu, Albert Ali Salah, Theo Gevers
Constructing Adaptive Complex Cells for Robust Visual Tracking [pdf]
Dapeng Chen, Zejian Yuan, Yang Wu, Geng Zhang, Nanning Zheng

Abstract: Representation is a fundamental problem in object tracking. Conventional methods track the target by describing its local or global appearance. In this paper we show that, besides these two paradigms, the composition of local region histograms can also provide diverse and important object cues. We use cells to extract local appearance, and construct complex cells to integrate the information from cells. With different spatial arrangements of cells, complex cells can explore various contextual information at multiple scales, which is important for improving tracking performance. We also develop a novel template-matching algorithm for object tracking, where the template is composed of temporally varying cells and has two layers to capture the target and background appearance respectively. An adaptive weight is associated with each complex cell to cope with occlusion as well as appearance variation. A fusion weight is associated with each complex cell type to preserve global distinctiveness. Our algorithm is evaluated on 25 challenging sequences, and the results not only confirm the contribution of each component in our tracking system, but also outperform other competing trackers.
Similar papers:
  • Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines [pdf] - Shuran Song, Jianxiong Xiao
  • Robust Object Tracking with Online Multi-lifespan Dictionary Learning [pdf] - Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
  • Decomposing Bag of Words Histograms [pdf] - Ankit Gandhi, Karteek Alahari, C.V. Jawahar
  • A Color Constancy Model with Double-Opponency Mechanisms [pdf] - Shaobing Gao, Kaifu Yang, Chaoyi Li, Yongjie Li
  • Conservation Tracking [pdf] - Martin Schiegg, Philipp Hanslovsky, Bernhard X. Kausler, Lars Hufnagel, Fred A. Hamprecht
Group Norm for Learning Structured SVMs with Unstructured Latent Variables [pdf]
Daozheng Chen, Dhruv Batra, William T. Freeman

Abstract: Latent variable models have been applied to a number of computer vision problems. However, the complexity of the latent space is typically left as a free design choice. A larger latent space results in a more expressive model, but such models are prone to overfitting and are slower to perform inference with. The goal of this paper is to regularize the complexity of the latent space and learn which hidden states are really relevant for prediction. Specifically, we propose using group-sparsity-inducing regularizers such as l1-l2 to estimate the parameters of Structured SVMs with unstructured latent variables. Our experiments on digit recognition and object detection show that our approach is indeed able to control the complexity of the latent space without any significant loss in accuracy of the learnt model.
Similar papers:
  • Building Part-Based Object Detectors via 3D Geometry [pdf] - Abhinav Shrivastava, Abhinav Gupta
  • Latent Task Adaptation with Large-Scale Hierarchies [pdf] - Yangqing Jia, Trevor Darrell
  • Training Deformable Part Models with Decorrelated Features [pdf] - Ross Girshick, Jitendra Malik
  • Learning to Share Latent Tasks for Action Recognition [pdf] - Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
  • Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach [pdf] - Arash Vahdat, Kevin Cannons, Greg Mori, Sangmin Oh, Ilseo Kim
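The l1-l2 (group lasso) regularizer mentioned above sums the l2 norms of parameter blocks, so whole blocks (here, whole latent states) can be driven exactly to zero. A minimal sketch of the penalty and its proximal (group soft-thresholding) step, with made-up weights and groups:

```python
import numpy as np

def group_l1l2(w, groups):
    """l1/l2 group norm: sum over groups of the l2 norm of each block."""
    return sum(np.linalg.norm(w[g]) for g in groups)

def group_soft_threshold(w, groups, lam):
    """Proximal step for the group norm: shrink each block toward zero,
    zeroing any block whose l2 norm falls below lam."""
    out = w.copy()
    for g in groups:
        n = np.linalg.norm(w[g])
        out[g] = 0.0 if n <= lam else (1 - lam / n) * w[g]
    return out

w = np.array([3.0, 4.0, 0.1, 0.1])          # two latent-state blocks
groups = [slice(0, 2), slice(2, 4)]
penalty = group_l1l2(w, groups)              # 5.0 + ~0.141
shrunk = group_soft_threshold(w, groups, lam=0.5)  # weak block zeroed out
```

In the paper's setting, a zeroed block corresponds to a latent state that is pruned from the model, which is how the regularizer controls latent-space complexity.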
Learning a Dictionary of Shape Epitomes with Applications to Image Labeling [pdf]
Liang-Chieh Chen, George Papandreou, Alan L. Yuille

Abstract: The first main contribution of this paper is a novel method for representing images based on a dictionary of shape epitomes. These shape epitomes represent the local edge structure of the image and include hidden variables to encode shifts and rotations. They are learnt in an unsupervised manner from ground-truth edges. This dictionary is compact but is also able to capture the typical shapes of edges in natural images. In this paper, we illustrate shape epitomes by applying them to the image labeling task. In other work, described in the supplementary material, we apply them to edge detection and image modeling. We apply shape epitomes to image labeling by using Conditional Random Field (CRF) models. They are alternatives to the superpixel or pixel representations used in most CRFs. In our approach, the shape of an image patch is encoded by a shape epitome from the dictionary. Unlike the superpixel representation, our method avoids making early decisions which cannot be reversed. Our resulting hierarchical CRFs efficiently capture both local and global class co-occurrence properties. We demonstrate the quantitative and qualitative properties of our approach with image labeling experiments on two standard datasets: MSRC-21 and Stanford Background.
Similar papers:
  • Shape Anchors for Data-Driven Multi-view Reconstruction [pdf] - Andrew Owens, Jianxiong Xiao, Antonio Torralba, William Freeman
  • A Framework for Shape Analysis via Hilbert Space Embedding [pdf] - Sadeep Jayasumana, Mathieu Salzmann, Hongdong Li, Mehrtash Harandi
  • Cascaded Shape Space Pruning for Robust Facial Landmark Detection [pdf] - Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
  • Semantic Segmentation without Annotating Segments [pdf] - Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
  • Cosegmentation and Cosketch by Unsupervised Learning [pdf] - Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
NEIL: Extracting Visual Knowledge from Web Data [pdf]
Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta

Abstract: We propose NEIL (Never Ending Image Learner), a computer program that runs 24 hours per day and 7 days per week to automatically extract visual knowledge from Internet data. NEIL uses a semi-supervised learning algorithm that jointly discovers common sense relationships (e.g., Corolla is a kind of/looks similar to Car, Wheel is a part of Car) and labels instances of the given visual categories. It is an attempt to develop the world's largest visual structured knowledge base with minimum human labeling effort. As of 10th October 2013, NEIL has been continuously running for 2.5 months on a 200-core cluster (more than 350K CPU hours) and has an ontology of 1152 object categories, 1034 scene categories and 87 attributes. During this period, NEIL has discovered more than 1700 relationships and has labeled more than 400K visual instances.
Similar papers:
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • Human Attribute Recognition by Rich Appearance Dictionary [pdf] - Jungseock Joo, Shuo Wang, Song-Chun Zhu
  • From Subcategories to Visual Composites: A Multi-level Framework for Object Detection [pdf] - Tian Lan, Michalis Raptis, Leonid Sigal, Greg Mori
  • A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects [pdf] - Xiaoyang Wang, Qiang Ji
Robust Dictionary Learning by Error Source Decomposition [pdf]
Zhuoyuan Chen, Ying Wu

Abstract: Sparsity models have recently shown great promise in many vision tasks. Using a learned dictionary in sparsity models can in general outperform predefined bases on clean data. In practice, both training and testing data may be corrupted and contain noise and outliers. Although recent studies have attempted to cope with corrupted data and achieved encouraging results in the testing phase, how to handle corruption in the training phase still remains a very difficult problem. In contrast to most existing methods that learn the dictionary from clean data, this paper is targeted at handling corruptions and outliers in training data for dictionary learning. We propose a general method to decompose the reconstructive residual into two components: a non-sparse component for small universal noise and a sparse component for large outliers, respectively. In addition, further analysis reveals the connection between our approach and the partial dictionary learning approach, updating only part of the prototypes (the informative codewords) with the remaining (noisy) codewords fixed. Experiments on synthetic data as well as real applications have shown satisfactory performance of this new robust dictionary learning approach.
Similar papers:
  • Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications [pdf] - Yu-Tseh Chi, Mohsen Ali, Muhammad Rushdi, Jeffrey Ho
  • Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution [pdf] - Mehrtash Harandi, Conrad Sanderson, Chunhua Shen, Brian Lovell
  • Multi-attributed Dictionary Learning for Sparse Coding [pdf] - Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
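The error-source decomposition described in the abstract above splits the reconstruction residual into a dense part for small universal noise and a sparse part for large outliers. The sparse part of such a split is commonly obtained with soft-thresholding, the proximal operator of an l1 penalty. Below is a minimal sketch of that decomposition step only, not the authors' full dictionary-learning algorithm; the threshold `lam` and the toy data are illustrative assumptions.

```python
import numpy as np

def decompose_residual(X, D, A, lam=1.0):
    """Split the reconstruction residual X - D @ A into a sparse
    outlier part S (entries larger than lam in magnitude survive)
    and a dense small-noise part E, via l1 soft-thresholding."""
    R = X - D @ A                                    # full residual
    S = np.sign(R) * np.maximum(np.abs(R) - lam, 0)  # sparse outliers
    E = R - S                                        # small dense noise
    return S, E

# Toy example: small Gaussian noise everywhere, one large outlier entry.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(5, 4))
X[2, 1] += 10.0                          # inject a gross outlier
D = np.zeros((5, 3)); A = np.zeros((3, 4))  # trivial model: X itself is the residual
S, E = decompose_residual(X, D, A, lam=1.0)
print(np.count_nonzero(S))               # only the injected outlier survives
```

In a full robust dictionary-learning loop this decomposition would alternate with updates of the codes `A` and the dictionary `D`.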
Efficient Salient Region Detection with Soft Image Abstraction [pdf]
Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook

Abstract: Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial distribution of image pixels, the proposed representation abstracts out unnecessary image details, allowing the assignment of comparable saliency values across similar regions, and producing perceptually accurate salient region detection. We evaluate our salient region detection approach on the largest publicly available dataset with pixel accurate annotations. The experimental results show that the proposed method outperforms 18 alternate methods, reducing the mean absolute error by 25.2% compared to the previous best result, while being computationally more efficient.
Similar papers:
  • Salient Region Detection by UFO: Uniqueness, Focusness and Objectness [pdf] - Peng Jiang, Haibin Ling, Jingyi Yu, Jingliang Peng
  • Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf] - Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
  • Saliency Detection: A Boolean Map Approach [pdf] - Jianming Zhang, Stan Sclaroff
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
  • Contextual Hypergraph Modeling for Salient Object Detection [pdf] - Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel
Rank Minimization across Appearance and Shape for AAM Ensemble Fitting [pdf]
Xin Cheng, Sridha Sridharan, Jason Saragih, Simon Lucey

Abstract: Active Appearance Models (AAMs) employ a paradigm of inverting a synthesis model of how an object can vary in terms of shape and appearance. As a result, the ability of AAMs to register an unseen object image is intrinsically linked to two factors. First, how well the synthesis model can reconstruct the object image. Second, the degrees of freedom in the model. Fewer degrees of freedom yield a higher likelihood of good fitting performance. In this paper we look at how these seemingly contrasting factors can complement one another for the problem of AAM fitting of an ensemble of images stemming from a constrained set (e.g. an ensemble of face images of the same person).
Similar papers:
  • Robust Face Landmark Estimation under Occlusion [pdf] - Xavier P. Burgos-Artizzu, Pietro Perona, Piotr Dollar
  • Exemplar-Based Graph Matching for Robust Facial Landmark Localization [pdf] - Feng Zhou, Jonathan Brandt, Zhe Lin
  • Cascaded Shape Space Pruning for Robust Facial Landmark Detection [pdf] - Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
  • Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model [pdf] - Xiang Yu, Junzhou Huang, Shaoting Zhang, Wang Yan, Dimitris N. Metaxas
  • Optimization Problems for Fast AAM Fitting in-the-Wild [pdf] - Georgios Tzimiropoulos, Maja Pantic
Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications [pdf]
Yu-Tseh Chi, Mohsen Ali, Muhammad Rushdi, Jeffrey Ho

Abstract: This paper proposes a novel approach for sparse coding that further improves upon the sparse representation-based classification (SRC) framework. The proposed framework, Affine-Constrained Group Sparse Coding (ACGSC), extends the current SRC framework to classification problems with multiple input samples. Geometrically, the affine-constrained group sparse coding essentially searches for the vector in the convex hull spanned by the input vectors that can best be sparse coded using the given dictionary. The resulting objective function is still convex and can be efficiently optimized using an iterative block-coordinate descent scheme that is guaranteed to converge. Furthermore, we provide a form of sparse recovery result that guarantees, at least theoretically, that the classification performance of the constrained group sparse coding should be at least as good as the group sparse coding. We have evaluated the proposed approach using three different recognition experiments that involve illumination variation of faces and textures, and face recognition under occlusions. Preliminary experiments have demonstrated the effectiveness of the proposed approach, and in particular, the results from the recognition/occlusion experiment are surprisingly accurate and robust.
Similar papers:
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
  • Low-Rank Sparse Coding for Image Classification [pdf] - Tianzhu Zhang, Bernard Ghanem, Si Liu, Changsheng Xu, Narendra Ahuja
  • Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution [pdf] - Mehrtash Harandi, Conrad Sanderson, Chunhua Shen, Brian Lovell
  • Robust Dictionary Learning by Error Source Decomposition [pdf] - Zhuoyuan Chen, Ying Wu
  • Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps [pdf] - Jiajia Luo, Wei Wang, Hairong Qi
Multi-attributed Dictionary Learning for Sparse Coding [pdf]
Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai

Abstract: We present a multi-attributed dictionary learning algorithm for sparse coding. Considering training samples with multiple attributes, a new distance matrix is proposed by jointly incorporating data and attribute similarities. Then, an objective function is presented to learn category-dependent dictionaries that are compact (closeness of dictionary atoms based on data distance and attribute similarity), reconstructive (low reconstruction error with correct dictionary) and label-consistent (encouraging the labels of dictionary atoms to be similar). We have demonstrated our algorithm on action classification and face recognition tasks on several publicly available datasets. Experimental results with improved performance over previous dictionary learning methods are shown to validate the effectiveness of the proposed algorithm.
Similar papers:
  • Robust Dictionary Learning by Error Source Decomposition [pdf] - Zhuoyuan Chen, Ying Wu
  • Learning View-Invariant Sparse Representations for Cross-View Action Recognition [pdf] - Jingjing Zheng, Zhuolin Jiang
  • Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps [pdf] - Jiajia Luo, Wei Wang, Hairong Qi
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
Learning Graphs to Match [pdf]
Minsu Cho, Karteek Alahari, Jean Ponce

Abstract: Many tasks in computer vision are formulated as graph matching problems. Despite the NP-hard nature of the problem, fast and accurate approximations have led to significant progress in a wide range of applications. Learning graph models from observed data, however, still remains a challenging issue. This paper presents an effective scheme to parameterize a graph model, and learn its structural attributes for visual object matching. For this, we propose a graph representation with histogram-based attributes, and optimize them to increase the matching accuracy. Experimental evaluations on synthetic and real image datasets demonstrate the effectiveness of our approach, and show significant improvement in matching accuracy over graphs with pre-defined structures.
Similar papers:
  • Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features [pdf] - K.C. Amit Kumar, Christophe De_Vleeschouwer
  • Combining the Right Features for Complex Event Recognition [pdf] - Kevin Tang, Bangpeng Yao, Li Fei-Fei, Daphne Koller
  • Improving Graph Matching via Density Maximization [pdf] - Chao Wang, Lei Wang, Lingqiao Liu
  • Joint Optimization for Consistent Multiple Graph Matching [pdf] - Junchi Yan, Yu Tian, Hongyuan Zha, Xiaokang Yang, Ya Zhang, Stephen M. Chu
  • Learning Graph Matching: Oriented to Category Modeling from Cluttered Scenes [pdf] - Quanshi Zhang, Xuan Song, Xiaowei Shao, Huijing Zhao, Ryosuke Shibasaki
Modeling the Calibration Pipeline of the Lytro Camera for High Quality Light-Field Image Reconstruction [pdf]
Donghyeon Cho, Minhaeng Lee, Sunyeong Kim, Yu-Wing Tai

Abstract: Light-field imaging systems have received much attention recently as the next generation camera model. A light-field imaging system consists of three parts: data acquisition, manipulation, and application. Given an acquisition system, it is important to understand how a light-field camera converts from its raw image to its resulting refocused image. In this paper, using the Lytro camera as an example, we describe step-by-step procedures to calibrate a raw light-field image. In particular, we are interested in knowing the spatial and angular coordinates of the micro lens array and the resampling process for image reconstruction. Since Lytro uses a hexagonal arrangement of a micro lens image, additional treatments in calibration are required. After calibration, we analyze and compare the performances of several resampling methods for image reconstruction with and without calibration. Finally, a learning based interpolation method is proposed which demonstrates a higher quality image reconstruction than previous interpolation methods, including a method used in Lytro software.
Similar papers:
  • Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain [pdf] - Ying Fu, Antony Lam, Imari Sato, Takahiro Okabe, Yoichi Sato
  • First-Photon Imaging: Scene Depth and Reflectance Acquisition from One Detected Photon per Pixel [pdf] - Ahmed Kirmani, Dongeek Shin, Dheera Venkatraman, Franco N. C. Wong, Vivek K Goyal
  • Illuminant Chromaticity from Image Sequences [pdf] - Veronique Prinet, Dani Lischinski, Michael Werman
  • Structured Light in Sunlight [pdf] - Mohit Gupta, Qi Yin, Shree K. Nayar
  • Towards Motion Aware Light Field Video for Dynamic Scenes [pdf] - Salil Tambe, Ashok Veeraraghavan, Amit Agrawal
A Learning-Based Approach to Reduce JPEG Artifacts in Image Matting [pdf]
Inchang Choi, Sunyeong Kim, Michael S. Brown, Yu-Wing Tai

Abstract: Single image matting techniques assume high-quality input images. The vast majority of images on the web and in personal photo collections are encoded using JPEG compression. JPEG images exhibit quantization artifacts that adversely affect the performance of matting algorithms. To address this situation, we propose a learning-based post-processing method to improve the alpha mattes extracted from JPEG images. Our approach learns a set of sparse dictionaries from training examples that are used to transfer details from high-quality alpha mattes to alpha mattes corrupted by JPEG compression. Three different dictionaries are defined to accommodate different object structure (long hair, short hair, and sharp boundaries). A back-projection criterion combined within an MRF framework is used to automatically select the best dictionary to apply on the object's local boundary. We demonstrate that our method produces superior results over existing state-of-the-art matting algorithms on a variety of inputs and compression levels.
Similar papers:
  • Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition [pdf] - Hans Lobel, Rene Vidal, Alvaro Soto
  • Anchored Neighborhood Regression for Fast Example-Based Super-Resolution [pdf] - Radu Timofte, Vincent De_Smet, Luc Van_Gool
  • Multi-attributed Dictionary Learning for Sparse Coding [pdf] - Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
Segmentation Driven Object Detection with Fisher Vectors [pdf]
Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid

Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the PASCAL VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.
Similar papers:
  • Semantic Segmentation without Annotating Segments [pdf] - Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
  • Detecting Dynamic Objects with Multi-view Background Subtraction [pdf] - Raul Diaz, Sam Hallman, Charless C. Fowlkes
  • Codemaps - Segment, Classify and Search Objects Locally [pdf] - Zhenyang Li, Efstratios Gavves, Koen E.A. van_de_Sande, Cees G.M. Snoek, Arnold W.M. Smeulders
  • Prime Object Proposals with Randomized Prim's Algorithm [pdf] - Santiago Manen, Matthieu Guillaumin, Luc Van_Gool
  • From Subcategories to Visual Composites: A Multi-level Framework for Object Detection [pdf] - Tian Lan, Michalis Raptis, Leonid Sigal, Greg Mori
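For readers unfamiliar with the representation named in the abstract above: in its standard form (Perronnin et al.), the Fisher vector encodes a set of T local descriptors x_t against a K-component diagonal-covariance GMM with weights w_k, means μ_k and deviations σ_k. The mean-gradient block, for example, is

```latex
\mathcal{G}^{X}_{\mu,k} = \frac{1}{T\sqrt{w_k}} \sum_{t=1}^{T} \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k},
\qquad
\gamma_t(k) = \frac{w_k\, \mathcal{N}(x_t;\, \mu_k, \sigma_k)}{\sum_{j=1}^{K} w_j\, \mathcal{N}(x_t;\, \mu_j, \sigma_j)} .
```

This is the textbook formula for the FV representation in general, not a detail taken from the paper above; the paper computes such FVs over SIFT and color descriptors within candidate detection windows.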
Cosegmentation and Cosketch by Unsupervised Learning [pdf]
Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu

Abstract: Cosegmentation refers to the problem of segmenting multiple images simultaneously by exploiting the similarities between the foreground and background regions in these images. The key issue in cosegmentation is to align common objects between these images. To address this issue, we propose an unsupervised learning framework for cosegmentation, by coupling cosegmentation with what we call cosketch. The goal of cosketch is to automatically discover a codebook of deformable shape templates shared by the input images. These shape templates capture distinct image patterns and each template is matched to similar image patches in different images. Thus the cosketch of the images helps to align foreground objects, thereby providing crucial information for cosegmentation. We present a statistical model whose energy function couples cosketch and cosegmentation. We then present an unsupervised learning algorithm that performs cosketch and cosegmentation by energy minimization. Experiments show that our method outperforms state of the art methods for cosegmentation on the challenging MSRC and iCoseg datasets. We also illustrate our method on a new dataset called Coseg-Rep where cosegmentation can be performed within a single image with repetitive patterns.
Similar papers:
  • Online Robust Non-negative Dictionary Learning for Visual Tracking [pdf] - Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
  • Semantic Segmentation without Annotating Segments [pdf] - Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
  • SYM-FISH: A Symmetry-Aware Flip Invariant Sketch Histogram Shape Descriptor [pdf] - Xiaochun Cao, Hua Zhang, Si Liu, Xiaojie Guo, Liang Lin
  • Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach [pdf] - Reyes Rios-Cabrera, Tinne Tuytelaars
  • Learning a Dictionary of Shape Epitomes with Applications to Image Labeling [pdf] - Liang-Chieh Chen, George Papandreou, Alan L. Yuille
Ensemble Projection for Semi-supervised Image Classification [pdf]
Dengxin Dai, Luc Van_Gool

Abstract: This paper investigates the problem of semi-supervised classification. Unlike previous methods that regularize classifying boundaries with unlabeled data, our method learns a new image representation from all available data (labeled and unlabeled) and performs plain supervised learning with the new feature. In particular, an ensemble of image prototype sets is sampled automatically from the available data, to represent a rich set of visual categories/attributes. Discriminative functions are then learned on these prototype sets, and images are represented by the concatenation of their projected values onto the prototypes (similarities to them) for further classification. Experiments on four standard datasets show three interesting phenomena: (1) our method consistently outperforms previous methods for semi-supervised image classification; (2) our method combines well with these methods; and (3) our method works well for self-taught image classification, where unlabeled data do not come from the same distribution as labeled ones, but rather from a random collection of images.
Similar papers:
  • Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model [pdf] - Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang
  • Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification [pdf] - Bo Wang, Zhuowen Tu, John K. Tsotsos
  • Learning CRFs for Image Parsing with Adaptive Subgradient Descent [pdf] - Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan
  • A Convex Optimization Framework for Active Learning [pdf] - Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sasrty
  • Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers [pdf] - Martin Kostinger, Paul Wohlhart, Peter M. Roth, Horst Bischof
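The Ensemble Projection abstract above describes a concrete pipeline: sample prototype sets from all available data, learn a discriminative function per set, and represent each image by the concatenation of its projected scores onto the prototypes. The sketch below illustrates that pipeline in miniature; the pseudo-class construction (nearest neighbours of random seed images) and the ridge least-squares discriminants are stand-in assumptions, not the authors' exact choices.

```python
import numpy as np

def ensemble_projection(X, n_sets=5, protos_per_set=3, samples_per_proto=4, seed=0):
    """Sketch of the Ensemble Projection idea: sample several prototype
    sets from the (possibly unlabeled) data X (n_images, dim), fit a
    simple discriminant per set, and represent every image by its
    projected scores onto the prototypes."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    feats = []
    for _ in range(n_sets):
        # A prototype set: a few random seed images, each grown into a
        # pseudo-class of its nearest neighbours.
        seeds = rng.choice(n, size=protos_per_set, replace=False)
        d = ((X[:, None, :] - X[seeds][None, :, :]) ** 2).sum(-1)   # (n, protos)
        members = np.argsort(d, axis=0)[:samples_per_proto]         # nearest images
        Xtr = X[members.T.reshape(-1)]                              # stacked pseudo-classes
        Ytr = np.repeat(np.eye(protos_per_set), samples_per_proto, axis=0)
        # Ridge least-squares discriminant: projection scores = X @ W
        W = np.linalg.solve(Xtr.T @ Xtr + 1e-3 * np.eye(dim), Xtr.T @ Ytr)
        feats.append(X @ W)                                         # similarities to prototypes
    return np.concatenate(feats, axis=1)  # (n, n_sets * protos_per_set)

X = np.random.default_rng(1).normal(size=(20, 8))
F = ensemble_projection(X)
print(F.shape)  # (20, 15): 5 prototype sets x 3 prototypes each
```

The new representation `F` would then be fed to a plain supervised classifier, as the abstract describes.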
Example-Based Facade Texture Synthesis [pdf]
Dengxin Dai, Hayko Riemenschneider, Gerhard Schmitt, Luc Van_Gool

Abstract: There is an increased interest in the efficient creation of city models, be it virtual or as-built. We present a method for synthesizing complex, photo-realistic facade images from a single example. After parsing the example image into its semantic components, a tiling for it is generated. Novel tilings can then be created, yielding facade textures with different dimensions or with occluded parts inpainted. A genetic algorithm guides the novel facades as well as inpainted parts to be consistent with the example, both in terms of their overall structure and their detailed textures. Promising results for multiple standard datasets, in particular for the different building styles they contain, demonstrate the potential of the method.
Similar papers:
  • A Deformable Mixture Parsing Model with Parselets [pdf] - Jian Dong, Qiang Chen, Wei Xia, Zhongyang Huang, Shuicheng Yan
  • Exemplar Cut [pdf] - Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
  • Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers [pdf] - Phillip Isola, Ce Liu
  • Real-World Normal Map Capture for Nearly Flat Reflective Surfaces [pdf] - Bastien Jacquet, Christian Hane, Kevin Koser, Marc Pollefeys
  • SGTD: Structure Gradient and Texture Decorrelating Regularization for Image Decomposition [pdf] - Qiegen Liu, Jianbo Liu, Pei Dong, Dong Liang
Space-Time Tradeoffs in Photo Sequencing [pdf]
Tali Dekel_(Basha), Yael Moses, Shai Avidan

Abstract: Photo-sequencing is the problem of recovering the temporal order of a set of still images of a dynamic event, taken asynchronously by a set of uncalibrated cameras. Solving this problem is a first, crucial step for analyzing (or visualizing) the dynamic content of the scene captured by a large number of freely moving spectators. We propose a geometric-based solution, followed by rank aggregation, to the photo-sequencing problem. Our algorithm trades spatial certainty for temporal certainty. Whereas the previous solution proposed by [4] relies on two images taken from the same static camera to eliminate uncertainty in space, we drop the static-camera assumption and replace it with temporal information available from images taken from the same (moving) camera. Our method thus overcomes the limitation of the static-camera assumption, and scales much better with the duration of the event and the spread of cameras in space. We present successful results on challenging real data sets and large scale synthetic data (250 images).
Similar papers:
  • Towards Motion Aware Light Field Video for Dynamic Scenes [pdf] - Salil Tambe, Ashok Veeraraghavan, Amit Agrawal
  • Dynamic Pooling for Complex Event Recognition [pdf] - Weixin Li, Qian Yu, Ajay Divakaran, Nuno Vasconcelos
  • Camera Alignment Using Trajectory Intersections in Unsynchronized Videos [pdf] - Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath
  • Lifting 3D Manhattan Lines from a Single Image [pdf] - Srikumar Ramalingam, Matthew Brand
  • Event Detection in Complex Scenes Using Interval Temporal Constraints [pdf] - Yifan Zhang, Qiang Ji, Hanqing Lu
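The photo-sequencing abstract above ends with a rank-aggregation step: many noisy partial temporal orders must be merged into one consensus order. A standard, simple way to do that kind of aggregation is Borda counting, sketched below. This illustrates rank aggregation in general, not the paper's specific geometric pipeline; the vote lists are made-up toy data.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Aggregate partial rankings (lists of item ids, earliest first)
    with a Borda-style score: an item earns points for every item it
    precedes within each ranking, normalised by that ranking's length.
    Higher total score = earlier consensus position."""
    score = defaultdict(float)
    for order in rankings:
        k = len(order)
        for pos, item in enumerate(order):
            score[item] += (k - 1 - pos) / max(k - 1, 1)
    return sorted(score, key=lambda item: -score[item])

# Three noisy partial orders over images 0..4 (the third has a swap).
votes = [[0, 1, 2, 3], [0, 2, 1, 3, 4], [1, 0, 2, 4]]
print(borda_aggregate(votes))  # → [0, 1, 2, 3, 4]
```

Because each ranking only has to be internally ordered, the aggregation tolerates images that appear in some cameras' sequences but not others.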
Visual Reranking through Weakly Supervised Multi-graph Learning [pdf]
Cheng Deng, Rongrong Ji, Wei Liu, Dacheng Tao, Xinbo Gao

Abstract: Visual reranking has been widely deployed to refine the quality of conventional content-based image retrieval engines. The current trend lies in employing a crowd of retrieved results stemming from multiple feature modalities to boost the overall performance of visual reranking. However, a major challenge pertaining to current reranking methods is how to take full advantage of the complementary property of distinct feature modalities. Given a query image and one feature modality, a regular visual reranking framework treats the top-ranked images as pseudo positive instances, which are inevitably noisy, making it difficult to reveal this complementary property and thus leading to inferior reranking performance. This paper proposes a novel image reranking approach by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which the intra-graph and inter-graph constraints are simultaneously imposed to encode affinities in a single graph and consistency across different graphs. Moreover, weakly supervised learning driven by image attributes is performed to denoise the pseudo-labeled instances, thereby highlighting the unique strength of each individual feature modality. Meanwhile, such learning can yield a few anchors in graphs that vitally enable the alignment and fusion of multiple graphs. As a result, an edge weight matrix learned from the fused graph automatically gives the ordering to the initially retrieved results. We evaluate our approach on four benchmark
Similar papers:
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
  • Learning Graphs to Match [pdf] - Minsu Cho, Karteek Alahari, Jean Ponce
  • Attribute Adaptation for Personalized Image Search [pdf] - Adriana Kovashka, Kristen Grauman
  • A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects [pdf] - Xiaoyang Wang, Qiang Ji
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
Detecting Dynamic Objects with Multi-view Background Subtraction [pdf]
Raul Diaz, Sam Hallman, Charless C. Fowlkes

Abstract: The confluence of robust algorithms for structure from motion along with high-coverage mapping and imaging of the world around us suggests that it will soon be feasible to accurately estimate camera pose for a large class of photographs taken in outdoor, urban environments. In this paper, we investigate how such information can be used to improve the detection of dynamic objects such as pedestrians and cars. First, we show that when rough camera location is known, we can utilize detectors that have been trained with a scene-specific background model in order to improve detection accuracy. Second, when precise camera pose is available, dense matching to a database of existing images using multi-view stereo provides a way to eliminate static backgrounds such as building facades, akin to the background subtraction often used in video analysis. We evaluate these ideas using a dataset of tourist photos with estimated camera pose. For template-based pedestrian detection, we achieve a 50 percent boost in average precision over baseline.
Similar papers:
  • NYC3DCars: A Dataset of 3D Vehicles in Geographic Context [pdf] - Kevin Matzen, Noah Snavely
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation [pdf] - Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Segmentation Driven Object Detection with Fisher Vectors [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Like Father, Like Son: Facial Expression Dynamics for Kinship Verification [pdf]
Hamdi Dibeklioglu, Albert Ali Salah, Theo Gevers

Abstract: Kinship verification from facial appearance is a difficult problem. This paper explores the possibility of employing facial expression dynamics in this problem. By using features that describe facial dynamics and spatio-temporal appearance over smile expressions, we show that it is possible to improve the state of the art in this problem, and verify that it is indeed possible to recognize kinship by resemblance of facial expressions. The proposed method is tested on different kin relationships. On average, 72.89% verification accuracy is achieved on spontaneous smiles.
Similar papers:
  • Cascaded Shape Space Pruning for Robust Facial Landmark Detection [pdf] - Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
  • Modifying the Memorability of Face Photographs [pdf] - Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva
  • Capturing Global Semantic Relationships for Facial Action Unit Recognition [pdf] - Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji
  • Facial Action Unit Event Detection by Cascade of Tasks [pdf] - Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang
  • Accurate and Robust 3D Facial Capture Using a Single RGBD Camera [pdf] - Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai
The Way They Move: Tracking Multiple Targets with Similar Appearance [pdf]
Caglayan Dicle, Octavia I. Camps, Mario Sznaier

Abstract: We introduce a computationally efficient algorithm for multi-object tracking by detection that addresses four main challenges: appearance similarity among targets, missing data due to targets being out of the field of view or occluded behind other objects, crossing trajectories, and camera motion. The proposed method uses motion dynamics as a cue to distinguish targets with similar appearance, minimize target mis-identification and recover missing data. Computational efficiency is achieved by using a Generalized Linear Assignment (GLA) coupled with efficient procedures to recover missing data and estimate the complexity of the underlying dynamics. The proposed approach works with tracklets of arbitrary length and does not assume a dynamical model a priori, yet it captures the overall motion dynamics of the targets. Experiments using challenging videos show that this framework can handle complex target motions, non-stationary cameras and long occlusions, in scenarios where appearance cues are not available or poor.
Similar papers:
  • Conservation Tracking [pdf] - Martin Schiegg, Philipp Hanslovsky, Bernhard X. Kausler, Lars Hufnagel, Fred A. Hamprecht
  • Bayesian 3D Tracking from Monocular Video [pdf] - Ernesto Brau, Jinyan Guan, Kyle Simek, Luca Del Pero, Colin Reimer Dawson, Kobus Barnard
  • Latent Data Association: Bayesian Model Selection for Multi-target Tracking [pdf] - Aleksandr V. Segal, Ian Reid
  • Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features [pdf] - K.C. Amit Kumar, Christophe De_Vleeschouwer
  • Simultaneous Clustering and Tracklet Linking for Multi-face Tracking in Videos [pdf] - Baoyuan Wu, Siwei Lyu, Bao-Gang Hu, Qiang Ji
Facial Action Unit Event Detection by Cascade of Tasks [pdf]
Xiaoyu Ding, Wen-Sheng Chu, Fernando De_La_Torre, Jeffery F. Cohn, Qiao Wang

Abstract: Automatic facial Action Unit (AU) detection from video is a long-standing problem in facial expression analysis. AU detection is typically posed as a classification problem between frames or segments of positive examples and negative ones, where existing work emphasizes the use of different features or classifiers. In this paper, we propose a method called Cascade of Tasks (CoT) that combines the use of different tasks (i.e., frame, segment and transition) for AU event detection. We train CoT in a sequential manner embracing diversity, which ensures robustness and generalization to unseen data. In addition to conventional frame-based metrics that evaluate frames independently, we propose a new event-based metric to evaluate detection performance at the event level. We show how the CoT method consistently outperforms state-of-the-art approaches in both frame-based and event-based metrics, across three public datasets that differ in complexity: CK+, FERA and RU-FACS.
Similar papers:
  • Event Detection in Complex Scenes Using Interval Temporal Constraints [pdf] - Yifan Zhang, Qiang Ji, Hanqing Lu
  • Modeling 4D Human-Object Interactions for Event and Object Recognition [pdf] - Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu
  • Like Father, Like Son: Facial Expression Dynamics for Kinship Verification [pdf] - Hamdi Dibeklioglu, Albert Ali Salah, Theo Gevers
  • Accurate and Robust 3D Facial Capture Using a Single RGBD Camera [pdf] - Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai
  • Capturing Global Semantic Relationships for Facial Action Unit Recognition [pdf] - Ziheng Wang, Yongqiang Li, Shangfei Wang, Qiang Ji
Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification [pdf]
Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos

Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of the css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets
Similar papers:
  • Video Event Understanding Using Natural Language Descriptions [pdf] - Vignesh Ramanathan, Percy Liang, Li Fei-Fei
  • Handwritten Word Spotting with Corrected Attributes [pdf] - Jon Almazan, Albert Gordo, Alicia Fornes, Ernest Valveny
  • Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes [pdf] - Sukrit Shankar, Joan Lasenby, Roberto Cipolla
  • Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation [pdf] - Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
  • Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes [pdf] - Dahua Lin, Jianxiong Xiao
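The css-LDA descriptor described above, per-class topic counts replacing a single shared topic set, can be illustrated with a toy hard-assignment sketch. Here `topic_word` (class-specific topic-word probabilities) is a hypothetical input; the paper infers these distributions with proper LDA inference rather than hard argmax assignment:

```python
import numpy as np

def css_lda_histogram(word_ids, topic_word, n_classes, n_topics):
    """Toy css-LDA-style descriptor: per-class topic counts.

    topic_word: array of shape (n_classes, n_topics, vocab_size) holding
    class-specific topic-word probabilities (assumed given here).
    Each visual word is hard-assigned to its most likely topic under
    every class's topic set, yielding an (n_classes * n_topics) vector
    analogous to the BoW histogram mentioned in the abstract.
    """
    hist = np.zeros((n_classes, n_topics))
    for w in word_ids:
        for c in range(n_classes):
            hist[c, np.argmax(topic_word[c, :, w])] += 1
    return hist.ravel()
```

The resulting vector has one block of topic counts per class, so an SVM trained on it can exploit class-specific co-occurrence statistics.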
Multi-view Object Segmentation in Space and Time [pdf]
Abdelaziz Djelouah, Jean-Sebastien Franco, Edmond Boyer, Francois Le_Clerc, Patrick Perez

Abstract: In this paper, we address the problem of object segmentation in multiple views or videos when two or more viewpoints of the same scene are available. We propose a new approach that propagates segmentation coherence information in both space and time, hence allowing evidence in one image to be shared over the complete set. To this aim the segmentation is cast as a single efficient labeling problem over space and time with graph cuts. In contrast to most existing multi-view segmentation methods that rely on some form of dense reconstruction, ours only requires a sparse 3D sampling to propagate information between viewpoints. The approach is thoroughly evaluated on standard multi-view datasets, as well as on videos. With static views, results compete with state-of-the-art methods but are achieved with significantly fewer viewpoints. With multiple videos, we report results that demonstrate the benefit of segmentation propagation through temporal cues.
Similar papers:
  • GrabCut in One Cut [pdf] - Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
  • Fast Object Segmentation in Unconstrained Video [pdf] - Anestis Papazoglou, Vittorio Ferrari
  • Temporally Consistent Superpixels [pdf] - Matthias Reso, Jorn Jachalsky, Bodo Rosenhahn, Jorn Ostermann
  • Online Video SEEDS for Temporal Window Objectness [pdf] - Michael Van_Den_Bergh, Gemma Roig, Xavier Boix, Santiago Manen, Luc Van_Gool
  • Semi-supervised Learning for Large Scale Image Cosegmentation [pdf] - Zhengxiang Wang, Rujie Liu
Structured Forests for Fast Edge Detection [pdf]
Piotr Dollar, C. Lawrence Zitnick

Abstract: Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computationally efficient edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. Our novel approach to learning decision trees robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated. The result is an approach that obtains real-time performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing that our learned edge models generalize well across datasets.
Similar papers:
  • Random Forests of Local Experts for Pedestrian Detection [pdf] - Javier Marin, David Vazquez, Antonio M. Lopez, Jaume Amores, Bastian Leibe
  • Dynamic Structured Model Selection [pdf] - David Weiss, Benjamin Sapp, Ben Taskar
  • Efficient 3D Scene Labeling Using Fields of Trees [pdf] - Olaf Kahler, Ian Reid
  • Alternating Regression Forests for Object Detection and Pose Estimation [pdf] - Samuel Schulter, Christian Leistner, Paul Wohlhart, Peter M. Roth, Horst Bischof
  • Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria [pdf] - Christoph Straehle, Ullrich Koethe, Fred A. Hamprecht
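The label-discretization step at the heart of structured forests, mapping structured edge masks to a discrete space where standard information gain applies, can be sketched roughly as follows. This is a toy version using random "same segment?" pixel-pair tests and a quantile split; the paper's exact mapping differs in detail:

```python
import numpy as np

def discretize_structured_labels(masks, n_pairs=64, n_classes=2, seed=0):
    """Map structured labels (binary mask patches) to a small discrete
    label set, in the spirit of structured forests: sample random
    pixel-pair equality tests on each mask, project the resulting
    binary codes onto their top principal direction, and quantize so
    that standard information gain can be evaluated at a tree split.
    """
    rng = np.random.default_rng(seed)
    flat = np.array([m.ravel() for m in masks], dtype=float)
    i = rng.integers(0, flat.shape[1], n_pairs)
    j = rng.integers(0, flat.shape[1], n_pairs)
    codes = (flat[:, i] == flat[:, j]).astype(float)  # pairwise tests
    centered = codes - codes.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]                 # top principal direction
    # quantile-based quantization into n_classes discrete labels
    qs = np.quantile(proj, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(proj, qs)
```

Once every structured label at a node carries a discrete surrogate label, an ordinary decision-tree split criterion can be applied unchanged.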
A Deformable Mixture Parsing Model with Parselets [pdf]
Jian Dong, Qiang Chen, Wei Xia, Zhongyang Huang, Shuicheng Yan

Abstract: In this work, we address the problem of human parsing, namely partitioning the human body into semantic regions, by using the novel Parselet representation. Previous works often consider solving the problem of human pose estimation as the prerequisite of human parsing. We argue that these approaches cannot obtain optimal pixel-level parsing due to the inconsistent targets between these tasks. In this paper, we propose to use Parselets as the building blocks of our parsing model. Parselets are a group of parsable segments which can generally be obtained by low-level over-segmentation algorithms and bear strong semantic meaning. We then build a Deformable Mixture Parsing Model (DMPM) for human parsing to simultaneously handle the deformation and multi-modalities of Parselets. The proposed model has two unique characteristics: (1) the possible numerous modalities of Parselet ensembles are exhibited as the And-Or structure of sub-trees; (2) to further solve the practical problem of Parselet occlusion or absence, we directly model the visibility property at some leaf nodes. The DMPM thus directly solves the problem of human parsing by searching for the best graph configuration from a pool of Parselet hypotheses without intermediate tasks. Comprehensive evaluations demonstrate the encouraging performance of the proposed approach.
Similar papers:
  • Learning CRFs for Image Parsing with Adaptive Subgradient Descent [pdf] - Honghui Zhang, Jingdong Wang, Ping Tan, Jinglu Wang, Long Quan
  • Pedestrian Parsing via Deep Decompositional Network [pdf] - Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers [pdf] - Phillip Isola, Ce Liu
  • Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items [pdf] - Kota Yamaguchi, M. Hadi Kiapour, Tamara L. Berg
Stable Hyper-pooling and Query Expansion for Event Detection [pdf]
Matthijs Douze, Jerome Revaud, Cordelia Schmid, Herve Jegou

Abstract: This paper makes two complementary contributions to event retrieval in large collections of videos. First, we propose hyper-pooling strategies that encode the frame descriptors into a representation of the video sequence in a stable manner. Our best choices compare favorably with regular pooling techniques based on k-means quantization. Second, we introduce a technique to improve the ranking. It can be interpreted either as a query expansion method or as a similarity adaptation based on the local context of the query video descriptor. Experiments on public benchmarks show that our methods are complementary and improve event retrieval results, without sacrificing efficiency.
Similar papers:
  • Fast Subspace Search via Grassmannian Based Hashing [pdf] - Xu Wang, Stefan Atev, John Wright, Gilad Lerman
  • Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search [pdf] - Dror Aiger, Efi Kokiopoulou, Ehud Rivlin
  • To Aggregate or Not to aggregate: Selective Match Kernels for Image Search [pdf] - Giorgos Tolias, Yannis Avrithis, Herve Jegou
  • Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval [pdf] - Cai-Zhi Zhu, Herve Jegou, Shin'Ichi Satoh
  • Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation [pdf] - Basura Fernando, Tinne Tuytelaars
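As a rough illustration of the query-expansion idea in the abstract above, here is classic average query expansion, a standard retrieval baseline rather than the authors' similarity-adaptation method:

```python
import numpy as np

def average_query_expansion(query, database, top_k=3):
    """Classic average query expansion (AQE): rank the database by
    cosine similarity, then re-issue the query as the L2-normalized
    mean of the original descriptor and its top-k neighbors. The
    paper's technique adapts similarities using the query's local
    context instead; this baseline just shows the mechanism.
    """
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    ranks = np.argsort(-(db @ q))           # cosine-similarity ranking
    expanded = q + db[ranks[:top_k]].sum(axis=0)
    return expanded / np.linalg.norm(expanded)
```

The expanded descriptor is then used for a second retrieval pass, typically improving recall for the event depicted in the query video.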
PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects [pdf]
Stefan Duffner, Christophe Garcia

Abstract: In this paper, we present a novel algorithm for fast tracking of generic objects in videos. The algorithm uses two components: a detector that makes use of the generalised Hough transform with pixel-based descriptors, and a probabilistic segmentation method based on global models for foreground and background. These components are used for tracking in a combined way, and they adapt each other in a co-training manner. Through effective model adaptation and segmentation, the algorithm is able to track objects that undergo rigid and non-rigid deformations and considerable shape and appearance variations. The proposed tracking method has been thoroughly evaluated on challenging standard videos, and outperforms state-of-the-art tracking methods designed for the same task. Finally, the proposed models allow for an extremely efficient implementation, and thus tracking is very fast.
Similar papers:
  • STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data [pdf] - Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
  • Orderless Tracking through Model-Averaged Posterior Estimation [pdf] - Seunghoon Hong, Suha Kwak, Bohyung Han
  • Initialization-Insensitive Visual Tracking through Voting with Salient Local Features [pdf] - Kwang Moo Yi, Hawook Jeong, Byeongho Heo, Hyung Jin Chang, Jin Young Choi
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Robust Object Tracking with Online Multi-lifespan Dictionary Learning [pdf] - Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Accurate Blur Models vs. Image Priors in Single Image Super-resolution [pdf]
Netalee Efrat, Daniel Glasner, Alexander Apartsin, Boaz Nadler, Anat Levin

Abstract: Over the past decade, single image Super-Resolution (SR) research has focused on developing sophisticated image priors, leading to significant advances. Estimating and incorporating the blur model, that relates the high-res and low-res images, has received much less attention, however. In particular, the reconstruction constraint, namely that the blurred and downsampled high-res output should approximately equal the low-res input image, has been either ignored or applied with default fixed blur models. In this work, we examine the relative importance of the image prior and the reconstruction constraint. First, we show that an accurate reconstruction constraint combined with a simple gradient regularization achieves SR results almost as good as those of state-of-the-art algorithms with sophisticated image priors. Second, we study both empirically and theoretically the sensitivity of SR algorithms to the blur model assumed in the reconstruction constraint. We find that an accurate blur model is more important than a sophisticated image prior. Finally, using real camera data, we demonstrate that the default blur models of various SR algorithms may differ from the camera blur, typically leading to over-smoothed results. Our findings highlight the importance of accurately estimating camera blur in reconstructing raw low-res images acquired by an actual camera.
Similar papers:
  • On One-Shot Similarity Kernels: Explicit Feature Maps and Properties [pdf] - Stefanos Zafeiriou, Irene Kotsia
  • Forward Motion Deblurring [pdf] - Shicheng Zheng, Li Xu, Jiaya Jia
  • Deblurring by Example Using Dense Correspondence [pdf] - Yoav Hacohen, Eli Shechtman, Dani Lischinski
  • Dynamic Scene Deblurring [pdf] - Tae Hyun Kim, Byeongjoo Ahn, Kyoung Mu Lee
  • Nonparametric Blind Super-resolution [pdf] - Tomer Michaeli, Michal Irani
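The reconstruction constraint discussed in the abstract above can be written as an explicit energy: ||(k * x)↓s − y||² plus a simple squared-gradient regularizer. A minimal sketch, assuming zero-padded blur and integer downsampling (the function and parameter names are illustrative, not the authors' code):

```python
import numpy as np

def conv2_same(img, k):
    """Naive 'same' 2D convolution with zero padding (toy helper)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k[::-1, ::-1])
    return out

def sr_energy(x_hr, y_lr, kernel, scale, lam=0.1):
    """SR energy: data term ||(kernel * x_hr) downsampled - y_lr||^2
    plus lam times a squared-gradient regularizer. The paper's point
    is that an inaccurate `kernel` here hurts SR results more than a
    weak image prior does.
    """
    blurred = conv2_same(x_hr, kernel)
    data = np.sum((blurred[::scale, ::scale] - y_lr) ** 2)
    grad = np.sum(np.diff(x_hr, axis=0) ** 2) + np.sum(np.diff(x_hr, axis=1) ** 2)
    return data + lam * grad
```

Minimizing this energy over `x_hr` (e.g. by gradient descent) is the "accurate reconstruction constraint plus simple gradient regularization" setup the abstract describes.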
Restoring an Image Taken through a Window Covered with Dirt or Rain [pdf]
David Eigen, Dilip Krishnan, Rob Fergus

Abstract: Photographs taken through a window are often compromised by dirt or rain present on the window surface. Common cases of this include pictures taken from inside a vehicle, or outdoor security cameras mounted inside a protective enclosure. At capture time, defocus can be used to remove the artifacts, but this relies on achieving a shallow depth-of-field and placement of the camera close to the window. Instead, we present a post-capture image processing solution that can remove localized rain and dirt artifacts from a single image. We collect a dataset of clean/corrupted image pairs which are then used to train a specialized form of convolutional neural network. This learns how to map corrupted image patches to clean ones, implicitly capturing the characteristic appearance of dirt and water droplets in natural images. Our models demonstrate effective removal of dirt and rain in outdoor test conditions.
Similar papers:
  • Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks [pdf] - Mojtaba Seyedhosseini, Mehdi Sajjadi, Tolga Tasdizen
  • Hybrid Deep Learning for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • A Non-parametric Bayesian Network Prior of Human Pose [pdf] - Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin
  • Pedestrian Parsing via Deep Decompositional Network [pdf] - Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Face Recognition Using Face Patch Networks [pdf] - Chaochao Lu, Deli Zhao, Xiaoou Tang
On the Mean Curvature Flow on Graphs with Applications in Image and Manifold Processing [pdf]
Abdallah El_Chakik, Abderrahim Elmoataz, Ahcene Sadi

Abstract: In this paper, we propose an adaptation and transcription of the mean curvature level set equation on a general discrete domain (weighted graphs with arbitrary topology). We introduce perimeters on graphs using difference operators and define the curvature as the first variation of these perimeters. Our proposed approach to mean curvature unifies both local and non-local notions of mean curvature on Euclidean domains. Furthermore, it allows the extension to the processing of manifolds and data which can be represented by graphs.
Similar papers:
  • Joint Optimization for Consistent Multiple Graph Matching [pdf] - Junchi Yan, Yu Tian, Hongyuan Zha, Xiaokang Yang, Ya Zhang, Stephen M. Chu
  • Learning Graphs to Match [pdf] - Minsu Cho, Karteek Alahari, Jean Ponce
  • Curvature-Aware Regularization on Riemannian Submanifolds [pdf] - Kwang In Kim, James Tompkin, Christian Theobalt
  • Partial Enumeration and Curvature Regularization [pdf] - Carl Olsson, Johannes Ulen, Yuri Boykov, Vladimir Kolmogorov
  • Shortest Paths with Curvature and Torsion [pdf] - Petter Strandmark, Johannes Ulen, Fredrik Kahl, Leo Grady
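The graph quantities mentioned above can be made concrete. One common discretization along these lines (notation assumed for illustration; the paper's exact operators may differ):

```latex
% Weighted difference operator and graph perimeter:
(\nabla_w f)(u,v) = \sqrt{w(u,v)}\,\bigl(f(v) - f(u)\bigr),
\qquad
\operatorname{Per}_w(A) = \sum_{u \in A}\sum_{v \notin A} w(u,v).
% Taking the curvature \kappa_w(u) as the first variation of this
% perimeter yields a level-set-style evolution of f on the graph:
f^{\,t+1}(u) = f^{\,t}(u)
  + \Delta t \,\bigl\|(\nabla_w f^{\,t})(u)\bigr\|\,\kappa_w(u).
```

With a fully connected weighted graph this recovers a non-local curvature flow, while a grid graph with nearest-neighbor weights recovers the usual local one, which is the unification the abstract refers to.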
A Convex Optimization Framework for Active Learning [pdf]
Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sastry

Abstract: In many image/video/web classification problems, we have access to a large number of unlabeled samples. However, it is typically expensive and time consuming to obtain labels for the samples. Active learning is the problem of progressively selecting and annotating the most informative unlabeled samples, in order to obtain a high classification performance. Most existing active learning algorithms select only one sample at a time prior to retraining the classifier. Hence, they are computationally expensive and cannot take advantage of parallel labeling systems such as Mechanical Turk. On the other hand, algorithms that allow the selection of multiple samples prior to retraining the classifier may select samples that have significant information overlap, or they involve solving a non-convex optimization. More importantly, the majority of active learning algorithms are developed for a certain classifier type such as SVM. In this paper, we develop an efficient active learning framework based on convex programming, which can select multiple samples at a time for annotation. Unlike the state of the art, our algorithm can be used in conjunction with any type of classifier, including those of the family of the recently proposed Sparse Representation-based Classification (SRC). We use the two principles of classifier uncertainty and sample diversity in order to guide the optimization program towards selecting the most informative unlabeled samples, which have th
Similar papers:
  • Active MAP Inference in CRFs for Efficient Semantic Segmentation [pdf] - Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
  • Discriminant Tracking Using Tensor Representation with Semi-supervised Improvement [pdf] - Jin Gao, Junliang Xing, Weiming Hu, Steve Maybank
  • Active Learning of an Action Detector from Untrimmed Videos [pdf] - Sunil Bandla, Kristen Grauman
  • Collaborative Active Learning of a Kernel Machine Ensemble for Recognition [pdf] - Gang Hua, Chengjiang Long, Ming Yang, Yan Gao
  • Active Visual Recognition with Expertise Estimation in Crowdsourcing [pdf] - Chengjiang Long, Gang Hua, Ashish Kapoor
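The two selection principles named in the abstract, classifier uncertainty and sample diversity, can be illustrated with a greedy stand-in for the paper's convex program. Entropy as the uncertainty measure and cosine similarity as the redundancy penalty are assumptions made for this sketch:

```python
import numpy as np

def select_batch(probs, feats, k, beta=1.0):
    """Greedy batch selection for active learning: repeatedly pick the
    sample with the highest predictive entropy, penalized by its
    maximum similarity to samples already chosen. This is a heuristic
    stand-in for the paper's convex program, not their formulation.
    probs: (n, n_classes) classifier posteriors; feats: (n, d) features.
    """
    ent = -np.sum(probs * np.log(probs + 1e-12), axis=1)  # uncertainty
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    chosen = []
    for _ in range(k):
        redundancy = np.zeros(len(ent))
        if chosen:
            redundancy = np.max(f @ f[chosen].T, axis=1)  # diversity
        score = ent - beta * redundancy
        score[chosen] = -np.inf
        chosen.append(int(np.argmax(score)))
    return chosen
```

Because the whole batch is chosen before retraining, the selected samples can be sent to a parallel labeling system such as Mechanical Turk, which is the use case the abstract motivates.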
Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions [pdf]
Mohamed Elhoseiny, Babak Saleh, Ahmed Elgammal

Abstract: The main question we address in this paper is how to use purely textual descriptions of categories, with no training images, to learn visual classifiers for these categories. We propose an approach for zero-shot learning of object categories where the description of unseen categories comes in the form of typical text such as an encyclopedia entry, without the need for explicitly defined attributes. We propose and investigate two baseline formulations, based on regression and domain adaptation. Then, we propose a new constrained optimization formulation that combines a regression function and a knowledge transfer function with additional constraints to predict the classifier parameters for new classes. We applied the proposed approach on two fine-grained categorization datasets, and the results indicate successful classifier prediction.
Similar papers:
  • Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes [pdf] - Sukrit Shankar, Joan Lasenby, Roberto Cipolla
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • Translating Video Content to Natural Language Descriptions [pdf] - Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, Bernt Schiele
  • Domain Adaptive Classification [pdf] - Fatemeh Mirrashed, Mohammad Rastegari
  • Frustratingly Easy NBNN Domain Adaptation [pdf] - Tatiana Tommasi, Barbara Caputo
Online Motion Segmentation Using Dynamic Label Propagation [pdf]
Ali Elqursh, Ahmed Elgammal

Abstract: The vast majority of work on motion segmentation adopts the affine camera model due to its simplicity. Under the affine model, the motion segmentation problem becomes that of subspace separation. Due to this assumption, such methods are mainly offline and exhibit poor performance when the assumption is not satisfied. This is made evident in state-of-the-art methods that relax this assumption by using piecewise affine spaces and spectral clustering techniques to achieve better results. In this paper, we formulate the problem of motion segmentation as that of manifold separation. We then show how label propagation can be used in an online framework to achieve manifold separation. The performance of our framework is evaluated on a benchmark dataset and achieves competitive performance while being online.
Similar papers:
  • Camera Alignment Using Trajectory Intersections in Unsynchronized Videos [pdf] - Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath
  • Video Co-segmentation for Meaningful Action Extraction [pdf] - Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Perspective Motion Segmentation via Collaborative Clustering [pdf] - Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou
  • Robust Trajectory Clustering for Motion Segmentation [pdf] - Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
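Label propagation, the core tool the paper builds on, can be sketched in its standard offline form (this is the generic algorithm on an affinity graph, not the paper's online dynamic variant):

```python
import numpy as np

def propagate_labels(W, labels, mask, n_iter=50, alpha=0.9):
    """Standard label propagation: iterate Y <- alpha * S @ Y +
    (1 - alpha) * Y0 with the row-normalized affinity matrix S,
    clamping the labeled rows each iteration.
    W: (n, n) nonnegative affinities; labels: int array of length n;
    mask: bool array, True where the label is known.
    """
    n_cls = labels.max() + 1
    Y0 = np.zeros((len(labels), n_cls))
    Y0[mask, labels[mask]] = 1.0
    S = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = alpha * (S @ Y) + (1 - alpha) * Y0
        Y[mask] = Y0[mask]                # clamp known labels
    return Y.argmax(axis=1)
```

In the motion-segmentation setting, nodes would be point trajectories and `W` their affinities on the estimated manifolds; labels diffuse from confidently assigned trajectories to the rest.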
Semi-dense Visual Odometry for a Monocular Camera [pdf]
Jakob Engel, Jurgen Sturm, Daniel Cremers

Abstract: We propose a fundamentally novel approach to real-time visual odometry for a monocular camera. It allows us to benefit from the simplicity and accuracy of dense tracking, which does not depend on visual features, while running in real-time on a CPU. The key idea is to continuously estimate a semi-dense inverse depth map for the current frame, which in turn is used to track the motion of the camera using dense image alignment. More specifically, we estimate the depth of all pixels which have a non-negligible image gradient. Each estimate is represented as a Gaussian probability distribution over the inverse depth. We propagate this information over time, and update it with new measurements as new images arrive. In terms of tracking accuracy and computational speed, the proposed method compares favorably to both state-of-the-art dense and feature-based visual odometry and SLAM algorithms. As our method runs in real-time on a CPU, it is of large practical value for robotics and augmented reality applications. 1. Towards Dense Monocular Visual Odometry: Tracking a hand-held camera and recovering the three-dimensional structure of the environment in real-time is among the most prominent challenges in computer vision. In the last years, dense approaches to these challenges have become increasingly popular: instead of operating solely on visual feature positions, they reconstruct and track on the whole image using a surface-based map and thereby are fundamentally different
Similar papers:
  • PM-Huber: PatchMatch with Huber Regularization for Stereo Matching [pdf] - Philipp Heise, Sebastian Klose, Brian Jensen, Alois Knoll
  • Live Metric 3D Reconstruction on Mobile Phones [pdf] - Petri Tanskanen, Kalin Kolev, Lorenz Meier, Federico Camposeco, Olivier Saurer, Marc Pollefeys
  • A Rotational Stereo Model Based on XSlit Imaging [pdf] - Jinwei Ye, Yu Ji, Jingyi Yu
  • A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution [pdf] - Martin Kiechle, Simon Hawe, Martin Kleinsteuber
  • Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation [pdf] - David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof
DCSH - Matching Patches in RGBD Images [pdf]
Yaron Eshet, Simon Korman, Eyal Ofek, Shai Avidan

Abstract: We extend patch-based methods to work on patches in 3D space. We start with Coherency Sensitive Hashing [12] (CSH), which is an algorithm for matching patches between two RGB images, and extend it to work with RGBD images. This is done by warping all 3D patches to a common virtual plane in which CSH is performed. To avoid noise due to warping of patches of various normals and depths, we estimate a group of dominant planes and compute CSH on each plane separately, before merging the matching patches. The result is DCSH - an algorithm that matches world (3D) patches in order to guide the search for image plane matches. An independent contribution is an extension of CSH, which we term Social-CSH. It allows a major speedup of the k nearest neighbor (kNN) version of CSH - its runtime growing linearly, rather than quadratically, in k. Social-CSH is used as a subcomponent of DCSH when many NNs are required, as in the case of image denoising. We show the benefits of using depth information for image reconstruction and image denoising, demonstrated on several RGBD images.
Similar papers:
  • Nonparametric Blind Super-resolution [pdf] - Tomer Michaeli, Michal Irani
  • Person Re-identification by Salience Matching [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
  • Shape Anchors for Data-Driven Multi-view Reconstruction [pdf] - Andrew Owens, Jianxiong Xiao, Antonio Torralba, William Freeman
  • Fast Direct Super-Resolution by Simple Functions [pdf] - Chih-Yuan Yang, Ming-Hsuan Yang
  • Single-Patch Low-Rank Prior for Non-pointwise Impulse Noise Removal [pdf] - Ruixuan Wang, Emanuele Trucco
Co-segmentation by Composition [pdf]
Alon Faktor, Michal Irani

Abstract: Given a set of images which share an object from the same semantic category, we would like to co-segment the shared object. We define good co-segments to be ones which can be easily composed (like a puzzle) from large pieces of other co-segments, yet are difficult to compose from remaining image parts. These pieces must not only match well but also be statistically significant (hard to compose at random). This gives rise to co-segmentation of objects in very challenging scenarios with large variations in appearance, shape and large amounts of clutter. We further show how multiple images can collaborate and score each other's co-segments to improve the overall fidelity and accuracy of the co-segmentation. Our co-segmentation can be applied both to large image collections, as well as to very few images (where there is too little data for unsupervised learning). At the extreme, it can be applied even to a single image, to extract its co-occurring objects. Our approach obtains state-of-the-art results on benchmark datasets. We further show very encouraging co-segmentation results on the challenging PASCAL-VOC dataset.
Similar papers:
  • Cosegmentation and Cosketch by Unsupervised Learning [pdf] - Jifeng Dai, Ying Nian Wu, Jie Zhou, Song-Chun Zhu
  • Semantic Segmentation without Annotating Segments [pdf] - Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
  • Video Segmentation by Tracking Many Figure-Ground Segments [pdf] - Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
  • Salient Region Detection by UFO: Uniqueness, Focusness and Objectness [pdf] - Peng Jiang, Haibin Ling, Jingyi Yu, Jingliang Peng
  • Efficient Salient Region Detection with Soft Image Abstraction [pdf] - Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
Relative Attributes for Large-Scale Abandoned Object Detection [pdf]
Quanfu Fan, Prasad Gabbur, Sharath Pankanti

Abstract: Effective reduction of false alarms in large-scale video surveillance is rather challenging, especially for applications where abnormal events of interest rarely occur, such as abandoned object detection. We develop an approach to prioritize alerts by ranking them, and demonstrate its great effectiveness in reducing false positives while keeping good detection accuracy. Our approach benefits from a novel representation of abandoned object alerts by relative attributes, namely staticness, foregroundness and abandonment. The relative strengths of these attributes are quantified using a ranking function [19] learnt on suitably designed low-level spatial and temporal features. These attributes of varying strengths are not only powerful in distinguishing abandoned objects from false alarms such as people and light artifacts, but also computationally efficient for large-scale deployment. With these features, we apply a linear ranking algorithm to sort alerts according to their relevance to the end-user. We test the effectiveness of our approach on both public data sets and large ones collected from the real world.
Similar papers:
  • Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms [pdf] - Yu Pang, Haibin Ling
  • Attribute Adaptation for Personalized Image Search [pdf] - Adriana Kovashka, Kristen Grauman
  • Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes [pdf] - Sukrit Shankar, Joan Lasenby, Roberto Cipolla
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
Supervised Binary Hash Code Learning with Jensen Shannon Divergence [pdf]
Lixin Fan

Abstract: This paper proposes to learn binary hash codes within a statistical learning framework, in which an upper bound of the probability of Bayes decision errors is derived for different forms of hash functions and a rigorous proof of the convergence of the upper bound is presented. Consequently, minimizing such an upper bound leads to consistent performance improvements of existing hash code learning algorithms, regardless of whether the original algorithms are unsupervised or supervised. This paper also illustrates a fast hash coding method that exploits simple binary tests to achieve orders of magnitude improvement in coding speed as compared to projection-based methods.
Similar papers:
  • What is the Most EfficientWay to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search? [pdf] - Masakazu Iwamura, Tomokazu Sato, Koichi Kise
  • Complementary Projection Hashing [pdf] - Zhongming Jin, Yao Hu, Yue Lin, Debing Zhang, Shiding Lin, Deng Cai, Xuelong Li
  • Learning Hash Codes with Listwise Supervision [pdf] - Jun Wang, Wei Liu, Andy X. Sun, Yu-Gang Jiang
  • Large-Scale Video Hashing via Structure Learning [pdf] - Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang
  • A General Two-Step Approach to Learning-Based Hashing [pdf] - Guosheng Lin, Chunhua Shen, David Suter, Anton van_den_Hengel
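The speed gap between projection-based hashing and the simple binary tests mentioned above can be seen directly: a binary test thresholds a single feature dimension, which is the special case of a projection whose column is an indicator vector, so coding costs O(bits) instead of O(d x bits) per sample. Function names and the learned `dims`/`thresholds` below are hypothetical illustrations:

```python
import numpy as np

def projection_hash(X, R, t):
    """Projection-based hashing: bit = [X @ R > t].
    Each bit costs a full d-dimensional dot product."""
    return (X @ R > t).astype(np.uint8)

def binary_test_hash(X, dims, thresholds):
    """Hashing by simple binary tests: each bit thresholds one feature
    dimension (dims/thresholds assumed learned offline), so coding a
    sample is just `bits` comparisons."""
    return (X[:, dims] > thresholds).astype(np.uint8)
```

When `R` consists of indicator columns the two functions produce identical codes, which is what makes binary tests a drop-in, much faster encoder.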
Unbiased Metric Learning: On the Utilization of Multiple Datasets and Web Images for Softening Bias [pdf]
Chen Fang, Ye Xu, Daniel N. Rockmore

Abstract: Many standard computer vision datasets exhibit biases due to a variety of sources including illumination condition, imaging system, and the preferences of dataset collectors. Biases like these can have downstream effects in the use of vision datasets in the construction of generalizable techniques, especially for the goal of creating a classification system capable of generalizing to unseen and novel datasets. In this work we propose Unbiased Metric Learning (UML), a metric learning approach, to achieve this goal. UML operates in the following two steps: (1) By varying hyperparameters, it learns a set of less biased candidate distance metrics on training examples from multiple biased datasets. The key idea is to learn a neighborhood for each example, which consists of not only examples of the same category from the same dataset, but also those from other datasets. The learning framework is based on structural SVM. (2) We do model validation on a set of weakly-labeled web images retrieved by issuing class labels as keywords to a search engine. The metric with the best validation performance is selected. Although the web images sometimes have noisy labels, they often tend to be less biased, which makes them suitable for the validation set in our task. Cross-dataset image classification experiments are carried out. Results show significant performance improvement on four well-known computer vision datasets.
Similar papers:
  • Similarity Metric Learning for Face Recognition [pdf] - Qiong Cao, Yiming Ying, Peng Li
  • Frustratingly Easy NBNN Domain Adaptation [pdf] - Tatiana Tommasi, Barbara Caputo
  • Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics [pdf] - Nicolas Riche, Matthieu Duvinage, Matei Mancas, Bernard Gosselin, Thierry Dutoit
  • Quadruplet-Wise Image Similarity Learning [pdf] - Marc T. Law, Nicolas Thome, Matthieu Cord
  • From Point to Set: Extend the Learning of Distance Metrics [pdf] - Pengfei Zhu, Lei Zhang, Wangmeng Zuo, David Zhang
Large-Scale Image Annotation by Efficient and Robust Kernel Metric Learning [pdf]
Zheyun Feng, Rong Jin, Anil Jain

Abstract: One of the key challenges in search-based image annotation models is to define an appropriate similarity measure between images. Many kernel distance metric learning (KML) algorithms have been developed in order to capture the nonlinear relationships between visual features and semantics of the images. One fundamental limitation in applying KML to image annotation is that it requires converting image annotations into binary constraints, leading to a significant information loss. In addition, most KML algorithms suffer from high computational cost due to the requirement that the learned matrix be positive semi-definite (PSD). In this paper, we propose a robust kernel metric learning (RKML) algorithm based on the regression technique that is able to directly utilize image annotations. The proposed method is also computationally more efficient because the PSD property is automatically ensured by regression. We provide theoretical guarantees for the proposed algorithm, and verify its efficiency and effectiveness for image annotation by comparing it to state-of-the-art approaches for both distance metric learning and image annotation.
Similar papers:
  • Efficient 3D Scene Labeling Using Fields of Trees [pdf] - Olaf Kahler, Ian Reid
  • A Framework for Shape Analysis via Hilbert Space Embedding [pdf] - Sadeep Jayasumana, Mathieu Salzmann, Hongdong Li, Mehrtash Harandi
  • On One-Shot Similarity Kernels: Explicit Feature Maps and Properties [pdf] - Stefanos Zafeiriou, Irene Kotsia
  • Incorporating Cloud Distribution in Sky Representation [pdf] - Kuan-Chuan Peng, Tsuhan Chen
  • How Do You Tell a Blackbird from a Crow? [pdf] - Thomas Berg, Peter N. Belhumeur
Super-resolution via Transform-Invariant Group-Sparse Regularization [pdf]
Carlos Fernandez-Granda, Emmanuel J. Candes

Abstract: We present a framework to super-resolve planar regions found in urban scenes and other man-made environments by taking into account their 3D geometry. Such regions have highly structured straight edges, but this prior is challenging to exploit due to deformations induced by the projection onto the imaging plane. Our method factors out such deformations by using recently developed tools based on convex optimization to learn a transform that maps the image to a domain where its gradient has a simple group-sparse structure. This allows us to obtain a novel convex regularizer that enforces global consistency constraints between the edges of the image. Computational experiments with real images show that this data-driven approach to the design of regularizers promoting transform-invariant group sparsity is very effective at high super-resolution factors. We view our approach as complementary to most recent super-resolution methods, which tend to focus on hallucinating high-frequency textures.
Similar papers:
  • Deblurring by Example Using Dense Correspondence [pdf] - Yoav Hacohen, Eli Shechtman, Dani Lischinski
  • Structured Forests for Fast Edge Detection [pdf] - Piotr Dollar, C. Lawrence Zitnick
  • Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications [pdf] - Yu-Tseh Chi, Mohsen Ali, Muhammad Rushdi, Jeffrey Ho
  • Accurate Blur Models vs. Image Priors in Single Image Super-resolution [pdf] - Netalee Efrat, Daniel Glasner, Alexander Apartsin, Boaz Nadler, Anat Levin
  • Fast Direct Super-Resolution by Simple Functions [pdf] - Chih-Yuan Yang, Ming-Hsuan Yang
Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation [pdf]
Basura Fernando, Tinne Tuytelaars

Abstract: In this paper we present a new method for object retrieval starting from multiple query images. The use of multiple queries allows for a more expressive formulation of the query object including, e.g., different viewpoints and/or viewing conditions. This, in turn, leads to more diverse and more accurate retrieval results. When no query images are available to the user, they can easily be retrieved from the internet using a standard image search engine. In particular, we propose a new method based on pattern mining. Using the minimal description length principle, we derive the most suitable set of patterns to describe the query object, with patterns corresponding to local feature configurations. This results in a powerful object-specific mid-level image representation. The archive can then be searched efficiently for similar images based on this representation, using a combination of two inverted file systems. Since the patterns already encode local spatial information, good results on several standard image retrieval datasets are obtained even without costly re-ranking based on geometric verification.
Similar papers:
  • Image Retrieval Using Textual Cues [pdf] - Anand Mishra, Karteek Alahari, C.V. Jawahar
  • Stable Hyper-pooling and Query Expansion for Event Detection [pdf] - Matthijs Douze, Jerome Revaud, Cordelia Schmid, Herve Jegou
  • Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search [pdf] - Dror Aiger, Efi Kokiopoulou, Ehud Rivlin
  • Fast Subspace Search via Grassmannian Based Hashing [pdf] - Xu Wang, Stefan Atev, John Wright, Gilad Lerman
  • Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval [pdf] - Cai-Zhi Zhu, Herve Jegou, Shin'Ichi Satoh
Unsupervised Visual Domain Adaptation Using Subspace Alignment [pdf]
Basura Fernando, Amaury Habrard, Marc Sebban, Tinne Tuytelaars

Abstract: In this paper, we introduce a new domain adaptation (DA) algorithm where the source and target domains are represented by subspaces described by eigenvectors. In this context, our method seeks a domain adaptation solution by learning a mapping function which aligns the source subspace with the target one. We show that the solution of the corresponding optimization problem can be obtained in a simple closed form, leading to an extremely fast algorithm. We use a theoretical result to tune the unique hyperparameter corresponding to the size of the subspaces. We run our method on various datasets and show that, despite its intrinsic simplicity, it outperforms state-of-the-art DA methods.
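The closed-form alignment described in the abstract is short enough to sketch directly. Below is a toy NumPy illustration (function names and the toy data are ours, not the authors' code): the source and target PCA bases are aligned by the matrix M = Xs^T Xt, and data from both domains are then compared in the aligned low-dimensional space.

```python
import numpy as np

def pca_basis(X, d):
    # Columns are the top-d principal directions of the centered data X (n x D).
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T                          # D x d, orthonormal columns

def subspace_alignment(S, T, d):
    # Closed-form alignment of the source PCA basis with the target one;
    # source data are then compared to target data in the aligned space.
    Xs, Xt = pca_basis(S, d), pca_basis(T, d)
    M = Xs.T @ Xt                            # the learned mapping, d x d
    return S @ Xs @ M, T @ Xt                # aligned source, projected target

# Toy usage: the target domain is a rotated copy of the source domain.
rng = np.random.default_rng(0)
S = rng.normal(size=(100, 5))
R = np.eye(5)
c, s = np.cos(0.5), np.sin(0.5)
R[:2, :2] = [[c, -s], [s, c]]
T = S @ R.T
Sa, Ta = subspace_alignment(S, T, d=2)
```

A nearest-neighbor classifier trained on the aligned source features `Sa` can then be applied to `Ta`; the subspace dimension `d` is the single hyperparameter the paper tunes.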
Similar papers:
  • Cross-View Action Recognition over Heterogeneous Feature Spaces [pdf] - Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia
  • Frustratingly Easy NBNN Domain Adaptation [pdf] - Tatiana Tommasi, Barbara Caputo
  • Domain Transfer Support Vector Ranking for Person Re-identification without Target Camera Label Information [pdf] - Andy J. Ma, Pong C. Yuen, Jiawei Li
  • Domain Adaptive Classification [pdf] - Fatemeh Mirrashed, Mohammad Rastegari
  • Unsupervised Domain Adaptation by Domain Invariant Projection [pdf] - Mahsa Baktashmotlagh, Mehrtash T. Harandi, Brian C. Lovell, Mathieu Salzmann
Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation [pdf]
David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof

Abstract: In this work we present a novel method for the challenging problem of depth image upsampling. Modern depth cameras such as Kinect or Time of Flight cameras deliver dense, high quality depth measurements but are limited in their lateral resolution. To overcome this limitation we formulate a convex optimization problem using higher order regularization for depth image upsampling. In this optimization an anisotropic diffusion tensor, calculated from a high resolution intensity image, is used to guide the upsampling. We derive a numerical algorithm based on a primal-dual formulation that is efficiently parallelized and runs at multiple frames per second. We show that this novel upsampling clearly outperforms state of the art approaches in terms of speed and accuracy on the widely used Middlebury 2007 datasets. Furthermore, we introduce novel datasets with highly accurate groundtruth, which, for the first time, enable benchmarking of depth upsampling methods using real sensor data.
Similar papers:
  • Live Metric 3D Reconstruction on Mobile Phones [pdf] - Petri Tanskanen, Kalin Kolev, Lorenz Meier, Federico Camposeco, Olivier Saurer, Marc Pollefeys
  • Depth from Combining Defocus and Correspondence Using Light-Field Cameras [pdf] - Michael W. Tao, Sunil Hadap, Jitendra Malik, Ravi Ramamoorthi
  • First-Photon Imaging: Scene Depth and Reflectance Acquisition from One Detected Photon per Pixel [pdf] - Ahmed Kirmani, Dongeek Shin, Dheera Venkatraman, Franco N. C. Wong, Vivek K Goyal
  • Semi-dense Visual Odometry for a Monocular Camera [pdf] - Jakob Engel, Jurgen Sturm, Daniel Cremers
  • A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution [pdf] - Martin Kiechle, Simon Hawe, Martin Kleinsteuber
Corrected-Moment Illuminant Estimation [pdf]
Graham D. Finlayson

Abstract: Image colors are biased by the color of the prevailing illumination. As such, the color at a pixel cannot always be used directly in solving vision tasks from recognition to tracking to general scene understanding. Illuminant estimation algorithms attempt to infer the color of the light incident in a scene, and then a color cast removal step discounts the color bias due to illumination. However, despite sustained research since almost the inception of computer vision, progress has been modest. The best algorithms, now often built on top of expensive feature extraction and machine learning, are only about twice as good as the simplest approaches. This paper, in effect, will show how simple moment-based algorithms such as Gray-World can, with the addition of a simple correction step, deliver much improved illuminant estimation performance. The corrected Gray-World algorithm maps the mean image color using a fixed per-camera 3×3 matrix transform. More generally, our moment approach employs 1st, 2nd and higher order moments of colors or features such as color derivatives, and these again are linearly corrected to give an illuminant estimate. The question of how to correct the moments is an important one, yet we will show a simple alternating least-squares training procedure suffices. Remarkably, across the major datasets
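The corrected Gray-World idea admits a very small sketch. The version below is a toy illustration under our own assumptions: a one-shot least-squares fit stands in for the paper's alternating least-squares procedure, and all function names are ours.

```python
import numpy as np

def gray_world(img):
    # img: H x W x 3 array. Classic Gray-World estimate: the normalized
    # mean RGB of the image is taken as the illuminant color.
    e = img.reshape(-1, 3).mean(axis=0)
    return e / np.linalg.norm(e)

def learn_correction(moments, illuminants):
    # Fit a fixed 3x3 correction matrix C mapping moment vectors to
    # ground-truth illuminants by least squares. (The paper uses an
    # alternating least-squares procedure; this one-shot fit is a
    # simplification for illustration.)
    C, *_ = np.linalg.lstsq(moments, illuminants, rcond=None)
    return C

def corrected_estimate(img, C):
    # Corrected Gray-World: the moment-based estimate is linearly
    # mapped by the per-camera matrix C.
    return gray_world(img) @ C
```

The same shape extends to higher-order moments: stack more moment statistics per image into the rows of `moments` and `C` simply grows wider.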
Similar papers:
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • Latent Task Adaptation with Large-Scale Hierarchies [pdf] - Yangqing Jia, Trevor Darrell
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
  • Towards Understanding Action Recognition [pdf] - Hueihan Jhuang, Juergen Gall, Silvia Zuffi, Cordelia Schmid, Michael J. Black
  • Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers [pdf] - Martin Kostinger, Paul Wohlhart, Peter M. Roth, Horst Bischof
Structured Learning of Sum-of-Submodular Higher Order Energy Functions [pdf]
Alexander Fix, Thorsten Joachims, Sam Park, Ramin Zabih

Abstract: Submodular functions can be exactly minimized in polynomial time, and the special case that graph cuts solve with max flow [19] has had significant impact in computer vision [5, 21, 28]. In this paper we address the important class of sum-of-submodular (SoS) functions [2, 18], which can be efficiently minimized via a variant of max flow called submodular flow [6]. SoS functions can naturally express higher order priors involving, e.g., local image patches; however, it is difficult to fully exploit their expressive power because they have so many parameters. Rather than trying to formulate existing higher order priors as an SoS function, we take a discriminative learning approach, effectively searching the space of SoS functions for a higher order prior that performs well on our training set. We adopt a structural SVM approach [15, 34] and formulate the training problem in terms of quadratic programming; as a result we can efficiently search the space of SoS priors via an extended cutting-plane algorithm. We also show how the state-of-the-art max flow method for vision problems [11] can be modified to efficiently solve the submodular flow problem. Experimental comparisons are made against the OpenCV implementation of the GrabCut interactive segmentation technique [28], which uses hand-tuned parameters instead of machine learning. On a standard dataset [12] our method learns higher order priors with hundreds of parameter values, and produces significantly better s
Similar papers:
  • Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees [pdf] - Aastha Jain, Shuanak Chatterjee, Rene Vidal
  • Active MAP Inference in CRFs for Efficient Semantic Segmentation [pdf] - Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
  • Optical Flow via Locally Adaptive Fusion of Complementary Data Costs [pdf] - Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
  • Potts Model, Parametric Maxflow and K-Submodular Functions [pdf] - Igor Gridchyn, Vladimir Kolmogorov
Data-Driven 3D Primitives for Single Image Understanding [pdf]
David F. Fouhey, Abhinav Gupta, Martial Hebert

Abstract: What primitives should we use to infer the rich 3D world behind an image? We argue that these primitives should be both visually discriminative and geometrically informative and we present a technique for discovering such primitives. We demonstrate the utility of our primitives by using them to infer 3D surface normals given a single image. Our technique substantially outperforms the state-of-the-art and shows improved cross-dataset performance.
Similar papers:
  • 3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding [pdf] - Scott Satkin, Martial Hebert
  • Point-Based 3D Reconstruction of Thin Objects [pdf] - Benjamin Ummenhofer, Thomas Brox
  • A Generic Deformation Model for Dense Non-rigid Surface Registration: A Higher-Order MRF-Based Approach [pdf] - Yun Zeng, Chaohui Wang, Xianfeng Gu, Dimitris Samaras, Nikos Paragios
  • Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects [pdf] - Michael Weinmann, Aljosa Osep, Roland Ruiters, Reinhard Klein
  • Building Part-Based Object Detectors via 3D Geometry [pdf] - Abhinav Shrivastava, Abhinav Gupta
EVSAC: Accelerating Hypotheses Generation by Modeling Matching Scores with Extreme Value Theory [pdf]
Victor Fragoso, Pradeep Sen, Sergio Rodriguez, Matthew Turk

Abstract: Algorithms based on RANSAC that estimate models using feature correspondences between images can slow down tremendously when the percentage of correct correspondences (inliers) is small. In this paper, we present a probabilistic parametric model that allows us to assign confidence values for each matching correspondence and therefore accelerates the generation of hypothesis models for RANSAC under these conditions. Our framework leverages Extreme Value Theory to accurately model the statistics of matching scores produced by a nearest-neighbor feature matcher. Using a new algorithm based on this model, we are able to estimate accurate hypotheses with RANSAC at low inlier ratios significantly faster than previous state-of-the-art approaches, while still performing comparably when the number of inliers is large. We present results of homography and fundamental matrix estimation experiments for both SIFT and SURF matches that demonstrate that our method leads to accurate and fast model estimations.
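The confidence-weighted sampling idea can be sketched as follows. Note the heavy hedging: the rank-based weighting below is only a placeholder for the extreme-value model the paper actually fits to matching scores, and all function names are ours.

```python
import numpy as np

def confidence_weights(scores):
    # Turn matching scores (lower descriptor distance = better match)
    # into sampling probabilities. EVSAC fits an extreme-value
    # distribution to the score statistics; this rank-based weighting
    # is a simple stand-in for that model, used only for illustration.
    ranks = scores.argsort().argsort()        # 0 = best match
    w = 1.0 / (1.0 + ranks)
    return w / w.sum()

def sample_minimal_set(weights, k, rng):
    # Draw a minimal sample (e.g. k=4 for a homography) biased toward
    # high-confidence correspondences instead of uniformly; this is the
    # mechanism that speeds up hypothesis generation at low inlier ratios.
    return rng.choice(len(weights), size=k, replace=False, p=weights)
```

Each RANSAC iteration then builds its hypothesis from `sample_minimal_set` rather than a uniform draw, so likely inliers are tried first.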
Similar papers:
  • Elastic Net Constraints for Shape Matching [pdf] - Emanuele Rodola, Andrea Torsello, Tatsuya Harada, Yasuo Kuniyoshi, Daniel Cremers
  • Person Re-identification by Salience Matching [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
  • Improving Graph Matching via Density Maximization [pdf] - Chao Wang, Lei Wang, Lingqiao Liu
  • DeepFlow: Large Displacement Optical Flow with Deep Matching [pdf] - Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
  • Multiple Non-rigid Surface Detection and Registration [pdf] - Yi Wu, Yoshihisa Ijiri, Ming-Hsuan Yang
Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain [pdf]
Ying Fu, Antony Lam, Imari Sato, Takahiro Okabe, Yoichi Sato

Abstract: Hyperspectral imaging is beneficial to many applications but current methods do not consider fluorescent effects which are present in everyday items ranging from paper, to clothing, to even our food. Furthermore, everyday fluorescent items exhibit a mix of reflectance and fluorescence. So proper separation of these components is necessary for analyzing them. In this paper, we demonstrate efficient separation and recovery of reflective and fluorescent emission spectra through the use of high frequency illumination in the spectral domain. With the obtained fluorescent emission spectra from our high frequency illuminants, we then present, to our knowledge, the first method for estimating the fluorescent absorption spectrum of a material given its emission spectrum. Conventional bispectral measurement of absorption and emission spectra needs to examine all combinations of incident and observed light wavelengths. In contrast, our method requires only two hyperspectral images. The effectiveness of our proposed methods is then evaluated through a combination of simulation and real experiments. We also demonstrate an application of our method to synthetic relighting of real scenes.
Similar papers:
  • Subpixel Scanning Invariant to Indirect Lighting Using Quadratic Code Length [pdf] - Nicolas Martin, Vincent Couture, Sebastien Roy
  • Illuminant Chromaticity from Image Sequences [pdf] - Veronique Prinet, Dani Lischinski, Michael Werman
  • Structured Light in Sunlight [pdf] - Mohit Gupta, Qi Yin, Shree K. Nayar
  • Matching Dry to Wet Materials [pdf] - Yaser Yacoob
  • Compensating for Motion during Direct-Global Separation [pdf] - Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan
A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis [pdf]
Fabio Galasso, Naveen Shankar Nagaraja, Tatiana Jimenez Cardenas, Thomas Brox, Bernt Schiele

Abstract: Video segmentation research is currently limited by the lack of a benchmark dataset that covers the large variety of subproblems appearing in video segmentation and that is large enough to avoid overfitting. Consequently, there is little analysis of video segmentation which generalizes across subtasks, and it is not yet clear which and how video segmentation should leverage the information from the still-frames, as previously studied in image segmentation, alongside video specific information, such as temporal volume, motion and occlusion. In this work we provide such an analysis based on annotations of a large video dataset, where each video is manually segmented by multiple persons. Moreover, we introduce a new volume-based metric that includes the important aspect of temporal consistency, that can deal with segmentation hierarchies, and that reflects the tradeoff between over-segmentation and segmentation accuracy.
Similar papers:
  • Perspective Motion Segmentation via Collaborative Clustering [pdf] - Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Online Motion Segmentation Using Dynamic Label Propagation [pdf] - Ali Elqursh, Ahmed Elgammal
  • Fast Object Segmentation in Unconstrained Video [pdf] - Anestis Papazoglou, Vittorio Ferrari
Multi-channel Correlation Filters [pdf]
Hamed Kiani Galoogahi, Terence Sim, Simon Lucey

Abstract: Modern descriptors like HOG and SIFT are now commonly used in vision for pattern detection within image and video. From a signal processing perspective, this detection process can be efficiently posed as a correlation/convolution between a multi-channel image and a multi-channel detector/filter which results in a single-channel response map indicating where the pattern (e.g. object) has occurred. In this paper, we propose a novel framework for learning a multi-channel detector/filter efficiently in the frequency domain, both in terms of training time and memory footprint, which we refer to as a multi-channel correlation filter. To demonstrate the effectiveness of our strategy, we evaluate it across a number of visual detection/localization tasks where we: (i) exhibit superior performance to current state of the art correlation filters, and (ii) superior computational and memory efficiencies compared to state of the art spatial detectors.
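The frequency-domain training the abstract describes can be sketched as a per-frequency ridge solve: at each Fourier frequency the C channels decouple into a small C×C linear system, which is why training is cheap. This is a simplified MOSSE-style illustration under our own assumptions, not the authors' implementation.

```python
import numpy as np

def mccf(images, target, lam=1e-2):
    # Learn a multi-channel correlation filter in the frequency domain.
    # images: list of H x W x C training patches; target: H x W desired
    # response map (e.g. a peak at the object location).
    H, W, C = images[0].shape
    Y = np.fft.fft2(target)
    A = np.zeros((H, W, C, C), dtype=complex)    # per-frequency autocorrelation
    b = np.zeros((H, W, C), dtype=complex)       # per-frequency cross terms
    for img in images:
        X = np.fft.fft2(img, axes=(0, 1))        # per-channel spectra
        A += np.conj(X)[..., :, None] * X[..., None, :]
        b += np.conj(X) * Y[..., None]
    A += lam * np.eye(C)                         # ridge regularizer
    return np.linalg.solve(A, b[..., None])[..., 0]   # H x W x C filter spectrum

def respond(img, Hf):
    # Sum the per-channel filtered spectra and return the response map.
    X = np.fft.fft2(img, axes=(0, 1))
    return np.real(np.fft.ifft2((X * Hf).sum(axis=-1)))
```

`np.linalg.solve` broadcasts over the leading H×W axes, so all per-frequency systems are solved in one call; the response peak of `respond` localizes the pattern.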
Similar papers:
  • Exemplar-Based Graph Matching for Robust Facial Landmark Localization [pdf] - Feng Zhou, Jonathan Brandt, Zhe Lin
  • Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model [pdf] - Xiang Yu, Junzhou Huang, Shaoting Zhang, Wang Yan, Dimitris N. Metaxas
  • Training Deformable Part Models with Decorrelated Features [pdf] - Ross Girshick, Jitendra Malik
  • Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition [pdf] - Joao F. Henriques, Joao Carreira, Rui Caseiro, Jorge Batista
  • Local Signal Equalization for Correspondence Matching [pdf] - Derek Bradley, Thabo Beeler
Decomposing Bag of Words Histograms [pdf]
Ankit Gandhi, Karteek Alahari, C.V. Jawahar

Abstract: We aim to decompose a global histogram representation of an image into histograms of its associated objects and regions. This task is formulated as an optimization problem, given a set of linear classifiers, which can effectively discriminate the object categories present in the image. Our decomposition bypasses harder problems associated with accurately localizing and segmenting objects. We evaluate our method on a wide variety of composite histograms, and also compare it with MRF-based solutions. In addition to merely measuring the accuracy of decomposition, we also show the utility of the estimated object and background histograms for the task of image classification on the PASCAL VOC 2007 dataset.
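One simple way to realize such a decomposition, sketched here only as a stand-in for the paper's classifier-driven formulation, is to express the global bag-of-words histogram as a nonnegative mixture of per-category prototype histograms (the prototypes and function names below are our own constructions):

```python
import numpy as np

def nnls_decompose(h, prototypes, iters=500):
    # Approximate the global BoW histogram h as a nonnegative mixture of
    # per-category prototype histograms, via Lee-Seung style
    # multiplicative updates (which keep the weights nonnegative).
    A = np.stack(prototypes, axis=1).astype(float)   # V x K, one column per category
    AtA, Ath = A.T @ A, A.T @ h
    w = np.ones(A.shape[1])
    for _ in range(iters):
        w *= Ath / (AtA @ w + 1e-12)
    return w, A * w                                  # mixture weights, per-part histograms
```

Each column of the returned `A * w` is the share of the global histogram attributed to one category, without ever localizing or segmenting the object.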
Similar papers:
  • A Method of Perceptual-Based Shape Decomposition [pdf] - Chang Ma, Zhongqian Dong, Tingting Jiang, Yizhou Wang, Wen Gao
  • Detecting Avocados to Zucchinis: What Have We Done, and Where Are We Going? [pdf] - Olga Russakovsky, Jia Deng, Zhiheng Huang, Alexander C. Berg, Li Fei-Fei
  • Shape Index Descriptors Applied to Texture-Based Galaxy Analysis [pdf] - Kim Steenstrup Pedersen, Kristoffer Stensbo-Smidt, Andrew Zirm, Christian Igel
  • A Color Constancy Model with Double-Opponency Mechanisms [pdf] - Shaobing Gao, Kaifu Yang, Chaoyi Li, Yongjie Li
  • Constructing Adaptive Complex Cells for Robust Visual Tracking [pdf] - Dapeng Chen, Zejian Yuan, Yang Wu, Geng Zhang, Nanning Zheng
A Color Constancy Model with Double-Opponency Mechanisms [pdf]
Shaobing Gao, Kaifu Yang, Chaoyi Li, Yongjie Li

Abstract: The double-opponent color-sensitive cells in the primary visual cortex (V1) of the human visual system (HVS) have long been recognized as the physiological basis of color constancy. We introduce a new color constancy model by imitating the functional properties of the HVS from the retina to the double-opponent cells in V1. The idea behind the model originates from the observation that the color distribution of the responses of double-opponent cells to the input color-biased images coincides well with the light source direction. Then the true illuminant color of a scene is easily estimated by searching for the maxima of the separate RGB channels of the responses of double-opponent cells in the RGB space. Our systematical experimental evaluations on two commonly used image datasets show that the proposed model can produce competitive results in comparison to the complex state-of-the-art approaches, but with a simple implementation and without the need for training.
Similar papers:
  • Efficient Image Dehazing with Boundary Constraint and Contextual Regularization [pdf] - Gaofeng Meng, Ying Wang, Jiangyong Duan, Shiming Xiang, Chunhong Pan
  • Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain [pdf] - Ying Fu, Antony Lam, Imari Sato, Takahiro Okabe, Yoichi Sato
  • Decomposing Bag of Words Histograms [pdf] - Ankit Gandhi, Karteek Alahari, C.V. Jawahar
  • Illuminant Chromaticity from Image Sequences [pdf] - Veronique Prinet, Dani Lischinski, Michael Werman
  • Constructing Adaptive Complex Cells for Robust Visual Tracking [pdf] - Dapeng Chen, Zejian Yuan, Yang Wu, Geng Zhang, Nanning Zheng
Discriminant Tracking Using Tensor Representation with Semi-supervised Improvement [pdf]
Jin Gao, Junliang Xing, Weiming Hu, Steve Maybank

Abstract: Visual tracking has witnessed growing methods in object representation, which is crucial to robust tracking. The dominant mechanism in object representation is using image features encoded in a vector as observations to perform tracking, without considering that an image is intrinsically a matrix, or a 2nd-order tensor. Thus approaches following this mechanism inevitably lose a lot of useful information, and therefore cannot fully exploit the spatial correlations within the 2D image ensembles. In this paper, we address an image as a 2nd-order tensor in its original form, and find a discriminative linear embedding space approximation to the original nonlinear submanifold embedded in the tensor space based on the graph embedding framework. We specially design two graphs for characterizing the intrinsic local geometrical structure of the tensor space, so as to retain more discriminant information when reducing the dimension along certain tensor dimensions. However, spatial correlations within a tensor are not limited to the elements along these dimensions. This means that some part of the discriminant information may not be encoded in the embedding space. We introduce a novel technique called semi-supervised improvement to iteratively adjust the embedding space to compensate for the loss of discriminant information, hence improving the performance of our tracker. Experimental results on challenging videos demonstrate the effectiveness and robustness of the prop
Similar papers:
  • Tracking via Robust Multi-task Multi-view Joint Sparse Representation [pdf] - Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
  • Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms [pdf] - Yu Pang, Haibin Ling
  • Robust Object Tracking with Online Multi-lifespan Dictionary Learning [pdf] - Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
  • A Convex Optimization Framework for Active Learning [pdf] - Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sasrty
  • Robust Tucker Tensor Decomposition for Effective Image Representation [pdf] - Miao Zhang, Chris Ding
Fine-Grained Categorization by Alignments [pdf]
E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars

Abstract: The aim of this paper is fine-grained categorization without human interaction. Different from prior work, which relies on detectors for specific object parts, we propose to localize distinctive details by roughly aligning the objects using just the overall shape, since implicit to fine-grained categorization is the existence of a super-class shape shared among all classes. The alignments are then used to transfer part annotations from training images to test images (supervised alignment), or to blindly yet consistently segment the object in a number of regions (unsupervised alignment). We furthermore argue that in the distinction of fine-grained sub-categories, classification-oriented encodings like Fisher vectors are better suited for describing localized information than popular matching oriented features like HOG. We evaluate the method on the CUB-2011 Birds and Stanford Dogs fine-grained datasets, outperforming the state-of-the-art.
Similar papers:
  • Predicting an Object Location Using a Global Image Representation [pdf] - Jose A. Rodriguez Serrano, Diane Larlus
  • Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf] - Jiongxin Liu, Peter N. Belhumeur
  • Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction [pdf] - Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
  • Symbiotic Segmentation and Part Localization for Fine-Grained Categorization [pdf] - Yuning Chai, Victor Lempitsky, Andrew Zisserman
  • Codemaps - Segment, Classify and Search Objects Locally [pdf] - Zhenyang Li, Efstratios Gavves, Koen E.A. van_de_Sande, Cees G.M. Snoek, Arnold W.M. Smeulders
SIFTpack: A Compact Representation for Efficient SIFT Matching [pdf]
Alexandra Gilinsky, Lihi Zelnik Manor

Abstract: Computing distances between large sets of SIFT descriptors is a basic step in numerous algorithms in computer vision. When the number of descriptors is large, as is often the case, computing these distances can be extremely time consuming. In this paper we propose the SIFTpack: a compact way of storing SIFT descriptors, which enables significantly faster calculations between sets of SIFTs than the current solutions. SIFTpack can be used to represent SIFTs densely extracted from a single image or sparsely from multiple different images. We show that the SIFTpack representation saves both storage space and run time, for both finding nearest neighbors and for computing all distances between all descriptors. The usefulness of SIFTpack is also demonstrated as an alternative implementation for K-means dictionaries of visual words.
Similar papers:
  • Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval [pdf] - Yannis Avrithis
  • An Adaptive Descriptor Design for Object Recognition in the Wild [pdf] - Zhenyu Guo, Z. Jane Wang
  • Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition [pdf] - Hans Lobel, Rene Vidal, Alvaro Soto
  • What is the Most EfficientWay to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search? [pdf] - Masakazu Iwamura, Tomokazu Sato, Koichi Kise
  • Nested Shape Descriptors [pdf] - Jeffrey Byrne, Jianbo Shi
Training Deformable Part Models with Decorrelated Features [pdf]
Ross Girshick, Jitendra Malik

Abstract: In this paper, we show how to train a deformable part model (DPM) fast, typically in less than 20 minutes, or four times faster than the current fastest method, while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is latent LDA, a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and part-based models, and have practical implications for speeding up tasks such as model selection.
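The closed-form update that replaces hard-negative mining is classical LDA: whiten the mean positive feature against background statistics, w = Σ⁻¹(μ_pos − μ_bg). A minimal sketch (the function name and the ridge term are our own choices, not the paper's code):

```python
import numpy as np

def lda_template(pos_feats, bg_mean, bg_cov, ridge=1e-3):
    # Closed-form LDA detector template: whiten the positive mean
    # against background statistics, w = Sigma^{-1} (mu_pos - mu_bg).
    # No hard-negative mining is needed: bg_mean and bg_cov are
    # estimated once from generic images and reused for every category,
    # so training a new template costs one linear solve.
    mu_pos = np.asarray(pos_feats).mean(axis=0)
    S = bg_cov + ridge * np.eye(bg_cov.shape[0])
    return np.linalg.solve(S, mu_pos - bg_mean)
```

Because the background statistics are shared, only `mu_pos` changes per category, which is the source of the large training speedup over latent SVM.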
Similar papers:
  • POP: Person Re-identification Post-rank Optimisation [pdf] - Chunxiao Liu, Chen Change Loy, Shaogang Gong, Guijin Wang
  • Learning Discriminative Part Detectors for Image Classification and Cosegmentation [pdf] - Jian Sun, Jean Ponce
  • Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach [pdf] - Arash Vahdat, Kevin Cannons, Greg Mori, Sangmin Oh, Ilseo Kim
  • Group Norm for Learning Structured SVMs with Unstructured Latent Variables [pdf] - Daozheng Chen, Dhruv Batra, William T. Freeman
  • Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition [pdf] - Joao F. Henriques, Joao Carreira, Rui Caseiro, Jorge Batista
Hidden Factor Analysis for Age Invariant Face Recognition [pdf]
Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang

Abstract: Age invariant face recognition has received increasing attention due to its great potential in real world applications. In spite of the great progress in face recognition techniques, reliably recognizing faces across ages remains a difficult task. The facial appearance of a person changes substantially over time, resulting in significant intra-class variations. Hence, the key to tackle this problem is to separate the variation caused by aging from the person-specific features that are stable. Specifically, we propose a new method, called Hidden Factor Analysis (HFA). This method captures the intuition above through a probabilistic model with two latent factors: an identity factor that is age-invariant and an age factor affected by the aging process. Then, the observed appearance can be modeled as a combination of the components generated based on these factors. We also develop a learning algorithm that jointly estimates the latent factors and the model parameters using an EM procedure. Extensive experiments on two well-known public domain face aging datasets: MORPH (the largest public face aging database) and FGNET, clearly show that the proposed method achieves notable improvement over state-of-the-art algorithms.
Similar papers:
  • Robust Feature Set Matching for Partial Face Recognition [pdf] - Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
  • Deep Learning Identity-Preserving Face Space [pdf] - Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Modifying the Memorability of Face Photographs [pdf] - Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva
  • Fast Face Detector Training Using Tailored Views [pdf] - Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
  • Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition [pdf] - Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
Potts Model, Parametric Maxflow and K-Submodular Functions [pdf]
Igor Gridchyn, Vladimir Kolmogorov

Abstract: The problem of minimizing the Potts energy function frequently occurs in computer vision applications. One way to tackle this NP-hard problem was proposed by Kovtun [20, 21]. It identifies a part of an optimal solution by running k maxflow computations, where k is the number of labels. The number of labeled pixels can be significant in some applications, e.g. 50-93% in our tests for stereo. We show how to reduce the runtime to O(log k) maxflow computations (or one parametric maxflow computation). Furthermore, the output of our algorithm allows speeding up the subsequent alpha expansion for the unlabeled part, or can be used as is for time-critical applications. To derive our technique, we generalize the algorithm of Felzenszwalb et al. [7] for Tree Metrics. We also show a connection to k-submodular functions from combinatorial optimization, and discuss k-submodular relaxations for general energy functions.
Similar papers:
  • Learning Graphs to Match [pdf] - Minsu Cho, Karteek Alahari, Jean Ponce
  • Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees [pdf] - Aastha Jain, Shuanak Chatterjee, Rene Vidal
  • Efficient 3D Scene Labeling Using Fields of Trees [pdf] - Olaf Kahler, Ian Reid
  • Structured Learning of Sum-of-Submodular Higher Order Energy Functions [pdf] - Alexander Fix, Thorsten Joachims, Sam Park, Ramin Zabih
  • Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria [pdf] - Christoph Straehle, Ullrich Koethe, Fred A. Hamprecht
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition [pdf]
Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko

Abstract: Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities in-the-wild. We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction for a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help choose an appropriate level of generalization, and priors learned from web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects; we also use a web-scale language model to fill in novel verbs, i.e. when the verb does not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate it is able to generate short sentence descriptions of video clips better than baseline approaches.
Similar papers:
  • Video Event Understanding Using Natural Language Descriptions [pdf] - Vignesh Ramanathan, Percy Liang, Li Fei-Fei
  • Learning the Visual Interpretation of Sentences [pdf] - C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende
  • ACTIVE: Activity Concept Transitions in Video Event Classification [pdf] - Chen Sun, Ram Nevatia
  • Monte Carlo Tree Search for Scheduling Activity Recognition [pdf] - Mohamed R. Amer, Sinisa Todorovic, Alan Fern, Song-Chun Zhu
  • Translating Video Content to Natural Language Descriptions [pdf] - Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, Bernt Schiele
An Adaptive Descriptor Design for Object Recognition in the Wild [pdf]
Zhenyu Guo, Z. Jane Wang

Abstract: Digital images nowadays show large appearance variabilities in picture style, in terms of color tone, contrast, vignetting, etc. These picture styles are directly related to the scene radiance, the image pipeline of the camera, and post-processing functions (e.g., photography effect filters). Due to the complexity and nonlinearity of these factors, popular gradient-based image descriptors generally are not invariant to different picture styles, which can degrade the performance of object recognition. Given that images shared online or created by individual users are taken with a wide range of devices and may be processed by various post-processing functions, finding a robust object recognition system is useful and challenging. In this paper, we investigate the influence of picture styles on object recognition by making a connection between image descriptors and a pixel mapping function g, and accordingly propose an adaptive approach based on a g-incorporated kernel descriptor and multiple kernel learning, without estimating or specifying the image styles used in training and testing. We conduct experiments on the Domain Adaptation data set, the Oxford Flower data set, and several variants of the Flower data set obtained by introducing popular photography effects through post-processing. The results demonstrate that the proposed method consistently yields recognition improvements over standard descriptors in all studied cases.
Similar papers:
  • Frustratingly Easy NBNN Domain Adaptation [pdf] - Tatiana Tommasi, Barbara Caputo
  • On One-Shot Similarity Kernels: Explicit Feature Maps and Properties [pdf] - Stefanos Zafeiriou, Irene Kotsia
  • To Aggregate or Not to aggregate: Selective Match Kernels for Image Search [pdf] - Giorgos Tolias, Yannis Avrithis, Herve Jegou
  • Nested Shape Descriptors [pdf] - Jeffrey Byrne, Jianbo Shi
  • Nonparametric Blind Super-resolution [pdf] - Tomer Michaeli, Michal Irani
Support Surface Prediction in Indoor Scenes [pdf]
Ruiqi Guo, Derek Hoiem

Abstract: In this paper, we present an approach to predict the extent and height of supporting surfaces such as tables, chairs, and cabinet tops from a single RGBD image. We define support surfaces to be horizontal, planar surfaces that can physically support objects and humans. Given an RGBD image, our goal is to localize the height and full extent of such surfaces in 3D space. To achieve this, we created a labeling tool and annotated 1449 images with rich, complete 3D scene models in the NYU dataset. We extract ground truth from the annotated dataset and develop a pipeline for predicting floor space, walls, and the height and full extent of support surfaces. Finally, we match the predicted extent with annotated scenes in training scenes and transfer the support surface configuration from the training scenes. We evaluate the proposed approach on our dataset and demonstrate its effectiveness in understanding scenes in 3D space.
Similar papers:
  • Data-Driven 3D Primitives for Single Image Understanding [pdf] - David F. Fouhey, Abhinav Gupta, Martial Hebert
  • 3D Scene Understanding by Voxel-CRF [pdf] - Byung-Soo Kim, Pushmeet Kohli, Silvio Savarese
  • Point-Based 3D Reconstruction of Thin Objects [pdf] - Benjamin Ummenhofer, Thomas Brox
  • Coherent Object Detection with 3D Geometric Context from a Single Image [pdf] - Jiyan Pan, Takeo Kanade
  • Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers [pdf] - Phillip Isola, Ce Liu
Video Co-segmentation for Meaningful Action Extraction [pdf]
Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou

Abstract: Given a pair of videos having a common action, our goal is to simultaneously segment this pair of videos to extract this common action. As a preprocessing step, we first remove background trajectories by a motion-based figure-ground segmentation. To remove the remaining background and those extraneous actions, we propose the trajectory co-saliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process which can compare trajectories with different lengths and not necessarily spatiotemporally aligned, and yet be discriminative enough despite significant intra-class variation in the common action. We further leverage graph matching to enforce geometric coherence between regions so as to reduce feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as a binary labeling of a Markov Random Field, in which the data term is measured by the trajectory co-saliency and the smoothness term is measured by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips that have animal actions as well as human actions. Experimental results show that the proposed method performs well in common action extraction.
Similar papers:
  • Action Recognition with Actons [pdf] - Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
  • Learning Maximum Margin Temporal Warping for Action Recognition [pdf] - Jiang Wang, Ying Wu
  • Concurrent Action Detection with Structural Prediction [pdf] - Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
  • Camera Alignment Using Trajectory Intersections in Unsynchronized Videos [pdf] - Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath
  • From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding [pdf] - Weiyu Zhang, Menglong Zhu, Konstantinos G. Derpanis
Fibonacci Exposure Bracketing for High Dynamic Range Imaging [pdf]
Mohit Gupta, Daisuke Iso, Shree K. Nayar

Abstract: Exposure bracketing for high dynamic range (HDR) imaging involves capturing several images of the scene at different exposures. If either the camera or the scene moves during capture, the captured images must be registered. Large exposure differences between bracketed images lead to inaccurate registration, resulting in artifacts such as ghosting (multiple copies of scene objects) and blur. We present two techniques, one for image capture (Fibonacci exposure bracketing) and one for image registration (generalized registration), to prevent such motion-related artifacts. Fibonacci bracketing involves capturing a sequence of images such that each exposure time is the sum of the previous N (N > 1) exposures. Generalized registration involves estimating motion between sums of contiguous sets of frames, instead of between individual frames. Together, the two techniques ensure that motion is always estimated between frames of the same total exposure time. This results in HDR images and videos which have both a large dynamic range and minimal motion-related artifacts. We show, by results for several real-world indoor and outdoor scenes, that the proposed approach significantly outperforms several existing bracketing schemes.
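The exposure rule is concrete enough to sketch. Below is a minimal Python illustration; the seed exposures and the default N=2 are assumptions for demonstration, not the paper's camera settings:

```python
def bracketing_sequence(n_frames, N=2, base=1):
    """Exposure times where each new exposure equals the sum of the
    previous N exposures (N=2 yields the Fibonacci sequence)."""
    times = [base] * N  # seed exposures (illustrative assumption)
    while len(times) < n_frames:
        times.append(sum(times[-N:]))
    return times

seq = bracketing_sequence(8)  # [1, 1, 2, 3, 5, 8, 13, 21]
# Key property: frame t has the same total exposure as the sum of the
# previous N frames, so motion can always be estimated between two
# equally-exposed (real or summed) images.
for t in range(2, len(seq)):
    assert seq[t] == sum(seq[t - 2:t])
```

This is why the generalized registration step compares a frame against the sum of the preceding frames rather than against any single frame.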
Similar papers:
  • Geometric Registration Based on Distortion Estimation [pdf] - Wei Zeng, Mayank Goswami, Feng Luo, Xianfeng Gu
  • Go-ICP: Solving 3D Registration Efficiently and Globally Optimally [pdf] - Jiaolong Yang, Hongdong Li, Yunde Jia
  • Street View Motion-from-Structure-from-Motion [pdf] - Bryan Klingner, David Martin, James Roseborough
  • A Unified Rolling Shutter and Motion Blur Model for 3D Visual Registration [pdf] - Maxime Meilland, Tom Drummond, Andrew I. Comport
  • Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging [pdf] - Hae-Gon Jeon, Joon-Young Lee, Yudeog Han, Seon Joo Kim, In So Kweon
Non-convex P-Norm Projection for Robust Sparsity [pdf]
Mithun Das Gupta, Sanjeev Kumar

Abstract: In this paper, we investigate the properties of the Lp norm (p ≤ 1) within a projection framework. We start with the KKT equations of the non-linear optimization problem and then use its key properties to arrive at an algorithm for Lp norm projection on the non-negative simplex. We compare with L1 projection, which needs prior knowledge of the true norm, as well as hard-thresholding-based sparsification proposed in recent compressed sensing literature. We show performance improvements compared to these techniques across different vision applications.
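For context, the standard Euclidean (L2) projection onto the simplex, which the paper's non-convex Lp projection generalizes away from, can be sketched with the well-known sort-based algorithm. This is a baseline for intuition only; the paper's KKT-derived Lp procedure is not reproduced here:

```python
def project_simplex(v, z=1.0):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = z}.
    Sort-based algorithm: find the largest threshold theta such that
    clipping v - theta at zero yields a point summing to z."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - z) / i
        if ui - t > 0:       # ui still survives clipping at this threshold
            theta = t
    return [max(vi - theta, 0.0) for vi in v]
```

For example, `project_simplex([2.0, 0.0])` clips to `[1.0, 0.0]`, and any output sums to `z` with non-negative entries.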
Similar papers:
  • Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision [pdf] - Tae-Hyun Oh, Hyeongwoo Kim, Yu-Wing Tai, Jean-Charles Bazin, In So Kweon
  • Elastic Net Constraints for Shape Matching [pdf] - Emanuele Rodola, Andrea Torsello, Tatsuya Harada, Yasuo Kuniyoshi, Daniel Cremers
  • A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding [pdf] - Wangmeng Zuo, Deyu Meng, Lei Zhang, Xiangchu Feng, David Zhang
  • Correntropy Induced L2 Graph for Robust Subspace Clustering [pdf] - Canyi Lu, Jinhui Tang, Min Lin, Liang Lin, Shuicheng Yan, Zhouchen Lin
  • Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition [pdf] - Ricardo Cabral, Fernando De_La_Torre, Joao P. Costeira, Alexandre Bernardino
Structured Light in Sunlight [pdf]
Mohit Gupta, Qi Yin, Shree K. Nayar

Abstract: Strong ambient illumination severely degrades the performance of structured light based techniques. This is especially true in outdoor scenarios, where the structured light sources have to compete with sunlight, whose power is often 2-5 orders of magnitude larger than the projected light. In this paper, we propose the concept of light-concentration to overcome strong ambient illumination. Our key observation is that given a fixed light (power) budget, it is always better to allocate it sequentially in several portions of the scene, as compared to spreading it over the entire scene at once. For a desired level of accuracy, we show that by distributing light appropriately, the proposed approach requires 1-2 orders lower acquisition time than existing approaches. Our approach is illumination-adaptive as the optimal light distribution is determined based on a measurement of the ambient illumination level. Since current light sources have a fixed light distribution, we have built a prototype light source that supports flexible light distribution by controlling the scanning speed of a laser scanner. We show several high quality 3D scanning results in a wide range of outdoor scenarios. The proposed approach will benefit 3D vision systems that need to operate outdoors under extreme ambient illumination levels on a limited time and power budget.
Similar papers:
  • Separating Reflective and Fluorescent Components Using High Frequency Illumination in the Spectral Domain [pdf] - Ying Fu, Antony Lam, Imari Sato, Takahiro Okabe, Yoichi Sato
  • Modeling the Calibration Pipeline of the Lytro Camera for High Quality Light-Field Image Reconstruction [pdf] - Donghyeon Cho, Minhaeng Lee, Sunyeong Kim, Yu-Wing Tai
  • Toward Guaranteed Illumination Models for Non-convex Objects [pdf] - Yuqian Zhang, Cun Mu, Han-Wen Kuo, John Wright
  • Illuminant Chromaticity from Image Sequences [pdf] - Veronique Prinet, Dani Lischinski, Michael Werman
  • First-Photon Imaging: Scene Depth and Reflectance Acquisition from One Detected Photon per Pixel [pdf] - Ahmed Kirmani, Dongeek Shin, Dheera Venkatraman, Franco N. C. Wong, Vivek K Goyal
The Interestingness of Images [pdf]
Michael Gygli, Helmut Grabner, Hayko Riemenschneider, Fabian Nater, Luc Van_Gool

Abstract: We investigate human interest in photos. Based on our own and others' psychological experiments, we identify various cues for interestingness, namely aesthetics, unusualness and general preferences. For the ranking of retrieved images, interestingness is more appropriate than cues proposed earlier. Interestingness is, for example, correlated with what people believe they will remember. This is opposed to actual memorability, which is uncorrelated to both of them. We introduce a set of features computationally capturing the three main aspects of visual interestingness that we propose and build an interestingness predictor from them. Its performance is shown on three datasets with varying context, reflecting diverse levels of prior knowledge of the viewers.
Similar papers:
  • What Do You Do? Occupation Recognition in a Photo via Social Context [pdf] - Ming Shao, Liangyue Li, Yun Fu
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • Multi-view Object Segmentation in Space and Time [pdf] - Abdelaziz Djelouah, Jean-Sebastien Franco, Edmond Boyer, Francois Le_Clerc, Patrick Perez
  • Online Video SEEDS for Temporal Window Objectness [pdf] - Michael Van_Den_Bergh, Gemma Roig, Xavier Boix, Santiago Manen, Luc Van_Gool
  • Multi-channel Correlation Filters [pdf] - Hamed Kiani Galoogahi, Terence Sim, Simon Lucey
Deblurring by Example Using Dense Correspondence [pdf]
Yoav Hacohen, Eli Shechtman, Dani Lischinski

Abstract: This paper presents a new method for deblurring photos using a sharp reference example that contains some shared content with the blurry photo. Most previous deblurring methods that exploit information from other photos require an accurately registered photo of the same static scene. In contrast, our method aims to exploit reference images where the shared content may have undergone substantial photometric and non-rigid geometric transformations, as these are the kind of reference images most likely to be found in personal photo albums. Our approach builds upon a recent method for example-based deblurring using non-rigid dense correspondence (NRDC) [11] and extends it in two ways. First, we suggest exploiting information from the reference image not only for blur kernel estimation, but also as a powerful local prior for the non-blind deconvolution step. Second, we introduce a simple yet robust technique for spatially varying blur estimation, rather than assuming spatially uniform blur. Unlike the above previous method, which has proven successful only with simple deblurring scenarios, we demonstrate that our method succeeds on a variety of real-world examples. We provide quantitative and qualitative evaluation of our method and show that it outperforms the state-of-the-art.
Similar papers:
  • A Unified Rolling Shutter and Motion Blur Model for 3D Visual Registration [pdf] - Maxime Meilland, Tom Drummond, Andrew I. Comport
  • Nonparametric Blind Super-resolution [pdf] - Tomer Michaeli, Michal Irani
  • Accurate Blur Models vs. Image Priors in Single Image Super-resolution [pdf] - Netalee Efrat, Daniel Glasner, Alexander Apartsin, Boaz Nadler, Anat Levin
  • Forward Motion Deblurring [pdf] - Shicheng Zheng, Li Xu, Jiaya Jia
  • Dynamic Scene Deblurring [pdf] - Tae Hyun Kim, Byeongjoo Ahn, Kyoung Mu Lee
High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination [pdf]
Yudeog Han, Joon-Young Lee, In So Kweon

Abstract: We present a novel framework to estimate the detailed shape of diffuse objects with uniform albedo from a single RGB-D image. To estimate accurate lighting in a natural illumination environment, we introduce a general lighting model consisting of two components: global and local models. The global lighting model is estimated from the RGB-D input using the low-dimensional characteristic of a diffuse reflectance model. The local lighting model represents spatially varying illumination, and it is estimated by using the smoothly-varying characteristic of illumination. With both the global and local lighting models, we can estimate complex lighting variations in uncontrolled natural illumination conditions accurately. For high quality shape capture, a shape-from-shading approach is applied with the estimated lighting model. Since the entire process is done with a single RGB-D input, our method is capable of capturing the high quality shape details of a dynamic object under natural illumination. Experimental results demonstrate the feasibility and effectiveness of our method, which dramatically improves shape details of the rough depth input.
Similar papers:
  • Illuminant Chromaticity from Image Sequences [pdf] - Veronique Prinet, Dani Lischinski, Michael Werman
  • A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution [pdf] - Martin Kiechle, Simon Hawe, Martin Kleinsteuber
  • Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation [pdf] - David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof
  • Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects [pdf] - Michael Weinmann, Aljosa Osep, Roland Ruiters, Reinhard Klein
  • Internet Based Morphable Model [pdf] - Ira Kemelmacher-Shlizerman
Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution [pdf]
Mehrtash Harandi, Conrad Sanderson, Chunhua Shen, Brian Lovell

Abstract: Recent advances in computer vision and machine learning suggest that a wide range of problems can be addressed more appropriately by considering non-Euclidean geometry. In this paper we explore sparse dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping, which enables us to devise a closed-form solution for updating a Grassmann dictionary, atom by atom. Furthermore, to handle non-linearity in data, we propose a kernelised version of the dictionary learning algorithm. Experiments on several classification tasks (face recognition, action recognition, dynamic texture classification) show that the proposed approach achieves considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as the kernelised Affine Hull Method and graph-embedding Grassmann discriminant analysis.
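A standard embedding of this kind maps a subspace span(U), with U having orthonormal columns, to its projection matrix P = U Uᵀ, a symmetric matrix; the Frobenius distance between projection matrices is then a (chordal) distance between subspaces. The sketch below illustrates that idea only; the paper's exact isometric mapping and dictionary update are not reproduced:

```python
import math

def proj_matrix(U):
    """P = U U^T for U given as a list of orthonormal columns in R^d."""
    d = len(U[0])
    return [[sum(col[i] * col[j] for col in U) for j in range(d)]
            for i in range(d)]

def frob_dist(P, Q):
    """Frobenius distance between two square matrices of equal size."""
    n = len(P)
    return math.sqrt(sum((P[i][j] - Q[i][j]) ** 2
                         for i in range(n) for j in range(n)))

# Two 1-D subspaces (lines) in R^2: span(e1) and span((1,1)/sqrt(2)).
P1 = proj_matrix([[1.0, 0.0]])
s = 1.0 / math.sqrt(2.0)
P2 = proj_matrix([[s, s]])
d = frob_dist(P1, P2)  # distance between the two lines in the embedding
```

Because P is invariant to the choice of orthonormal basis for the subspace, sparse coding can operate on the embedded symmetric matrices with ordinary Euclidean machinery.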
Similar papers:
  • Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps [pdf] - Jiajia Luo, Wei Wang, Hairong Qi
  • Log-Euclidean Kernels for Sparse Representation and Dictionary Learning [pdf] - Peihua Li, Qilong Wang, Wangmeng Zuo, Lei Zhang
  • Multi-attributed Dictionary Learning for Sparse Coding [pdf] - Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
Viewing Real-World Faces in 3D [pdf]
Tal Hassner

Abstract: We present a data-driven method for estimating the 3D shapes of faces viewed in single, unconstrained photos (aka in-the-wild). Our method was designed with an emphasis on robustness and efficiency, with the explicit goal of deployment in real-world applications which reconstruct and display faces in 3D. Our key observation is that for many practical applications, warping the shape of a reference face to match the appearance of a query is enough to produce realistic impressions of the query's 3D shape. Doing so, however, requires matching visual features between the (possibly very different) query and reference images, while ensuring that a plausible face shape is produced. To this end, we describe an optimization process which seeks to maximize the similarity of appearances and depths, jointly, to those of a reference model. We describe our system for monocular face shape reconstruction and present both qualitative and quantitative experiments, comparing our method against alternative systems and demonstrating its capabilities. Finally, as a testament to its suitability for real-world applications, we offer an open, online implementation of our system, providing unique means of instant 3D viewing of faces appearing in web photos.
Similar papers:
  • Semi-dense Visual Odometry for a Monocular Camera [pdf] - Jakob Engel, Jurgen Sturm, Daniel Cremers
  • Modifying the Memorability of Face Photographs [pdf] - Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva
  • Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition [pdf] - Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
  • Internet Based Morphable Model [pdf] - Ira Kemelmacher-Shlizerman
  • Fast Face Detector Training Using Tailored Views [pdf] - Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
Content-Aware Rotation [pdf]
Kaiming He, Huiwen Chang, Jian Sun

Abstract: We present an image editing tool called Content-Aware Rotation. Casually shot photos can appear tilted, and are often corrected by rotation and cropping. This trivial solution may remove desired content and hurt image integrity. Instead of doing rigid rotation, we propose a warping method that creates the perception of rotation and avoids cropping. Human vision studies suggest that the perception of rotation is mainly due to horizontal/vertical lines. We design an optimization-based method that preserves the rotation of horizontal/vertical lines, maintains the completeness of the image content, and reduces the warping distortion. An efficient algorithm is developed to address the challenging optimization. We demonstrate our content-aware rotation method on a variety of practical cases.
Similar papers:
  • Revisiting the PnP Problem: A Fast, General and Optimal Solution [pdf] - Yinqiang Zheng, Yubin Kuang, Shigeki Sugimoto, Kalle Astrom, Masatoshi Okutomi
  • Rectangling Stereographic Projection for Wide-Angle Image Visualization [pdf] - Che-Han Chang, Min-Chun Hu, Wen-Huang Cheng, Yung-Yu Chuang
  • Direct Optimization of Frame-to-Frame Rotation [pdf] - Laurent Kneip, Simon Lynen
  • Efficient and Robust Large-Scale Rotation Averaging [pdf] - Avishek Chatterjee, Venu Madhav Govindu
  • Lifting 3D Manhattan Lines from a Single Image [pdf] - Srikumar Ramalingam, Matthew Brand
PM-Huber: PatchMatch with Huber Regularization for Stereo Matching [pdf]
Philipp Heise, Sebastian Klose, Brian Jensen, Alois Knoll

Abstract: Most stereo correspondence algorithms match support windows at integer-valued disparities and assume a constant disparity value within the support window. The recently proposed PatchMatch stereo algorithm [7] overcomes this limitation of previous algorithms by directly estimating planes. This work presents a method that integrates the PatchMatch stereo algorithm into a variational smoothing formulation using quadratic relaxation. The resulting algorithm allows the explicit regularization of the disparity and normal gradients using the estimated plane parameters. Evaluation of our method on the Middlebury benchmark shows that our method outperforms the traditional integer-valued disparity strategy as well as the original algorithm and its variants in sub-pixel accurate disparity estimation.
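The Huber penalty named in the title is quadratic near zero and linear in the tails, which is what makes it a robust regularizer for gradients. A minimal sketch (the threshold `delta` is an illustrative default, not the paper's setting):

```python
def huber(x, delta=1.0):
    """Huber penalty: 0.5*x^2 for |x| <= delta, linear growth beyond.
    Small gradients are smoothed quadratically while large jumps
    (e.g. at depth discontinuities) are penalized only linearly."""
    a = abs(x)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

For instance, `huber(0.5)` returns `0.125` (quadratic regime) while `huber(2.0)` returns `1.5` (linear regime), far below the quadratic value `2.0`.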
Similar papers:
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
  • Semi-dense Visual Odometry for a Monocular Camera [pdf] - Jakob Engel, Jurgen Sturm, Daniel Cremers
  • Pose Estimation and Segmentation of People in 3D Movies [pdf] - Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev
  • Line Assisted Light Field Triangulation and Stereo Matching [pdf] - Zhan Yu, Xinqing Guo, Haibing Lin, Andrew Lumsdaine, Jingyi Yu
  • A Rotational Stereo Model Based on XSlit Imaging [pdf] - Jinwei Ye, Yu Ji, Jingyi Yu
Real-Time Body Tracking with One Depth Camera and Inertial Sensors [pdf]
Thomas Helten, Meinard Muller, Hans-Peter Seidel, Christian Theobalt

Abstract: In recent years, the availability of inexpensive depth cameras, such as the Microsoft Kinect, has boosted the research in monocular full body skeletal pose tracking. Unfortunately, existing trackers often fail to capture poses where a single camera provides insufficient data, such as non-frontal poses, and all other poses with body part occlusions. In this paper, we present a novel sensor fusion approach for real-time full body tracking that succeeds in such difficult situations. It takes inspiration from previous tracking solutions, and combines a generative tracker and a discriminative tracker retrieving closest poses in a database. In contrast to previous work, both trackers employ data from a low number of inexpensive body-worn inertial sensors. These sensors provide reliable and complementary information when the monocular depth information alone is not sufficient. We also contribute new algorithmic solutions to best fuse depth and inertial data in both trackers. One is a new visibility model to determine global body pose, occlusions and usable depth correspondences and to decide what data modality to use for discriminative tracking. We also contribute a new inertial-based pose retrieval, and an adapted late fusion step to calculate the final body pose.
Similar papers:
  • Two-Point Gait: Decoupling Gait from Body Shape [pdf] - Stephen Lombardi, Ko Nishino, Yasushi Makihara, Yasushi Yagi
  • Monocular Image 3D Human Pose Estimation under Self-Occlusion [pdf] - Ibrahim Radwan, Abhinav Dhall, Roland Goecke
  • Estimating Human Pose with Flowing Puppets [pdf] - Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose? [pdf] - Elisabeta Marinoiu, Dragos Papava, Cristian Sminchisescu
Beyond Hard Negative Mining: Efficient Detector Learning via Block-Circulant Decomposition [pdf]
Joao F. Henriques, Joao Carreira, Rui Caseiro, Jorge Batista

Abstract: Competitive sliding window detectors require vast training sets. Since a pool of natural images provides a nearly endless supply of negative samples, in the form of patches at different scales and locations, training with all the available data is considered impractical. A staple of current approaches is hard negative mining, a method of selecting relevant samples, which is nevertheless expensive. Given that samples at slightly different locations have overlapping support, there seems to be an enormous amount of duplicated work. It is natural, then, to ask whether these redundancies can be eliminated. In this paper, we show that the Gram matrix describing such data is block-circulant. We derive a transformation based on the Fourier transform that block-diagonalizes the Gram matrix, at once eliminating redundancies and partitioning the learning problem. This decomposition is valid for any dense features and several learning algorithms, and takes full advantage of modern parallel architectures. Surprisingly, it allows training with all the potential samples in sets of thousands of images. By considering the full set, we generate in a single shot the optimal solution, which is usually obtained only after several rounds of hard negative mining. We report speed gains on Caltech Pedestrians and INRIA Pedestrians of over an order of magnitude, allowing training on a desktop computer in a couple of minutes.
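The core linear-algebra fact being exploited is that a circulant matrix is diagonalized by the discrete Fourier transform. The toy example below (pure stdlib, with an invented 4x4 circulant standing in for one block of the Gram matrix) verifies the convolution theorem that makes the block-diagonalization work:

```python
import cmath

def circulant(c):
    """Circulant matrix whose first column is c; each column is a
    cyclic shift of the previous one."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]

def dft(v):
    """Naive discrete Fourier transform of a real/complex vector."""
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]

# A circulant matrix acts as circular convolution, so in the Fourier
# basis it is diagonal: DFT(C @ x) == DFT(c) * DFT(x) elementwise.
c = [4.0, 1.0, 0.0, 1.0]   # toy "Gram row" from cyclically shifted patches
x = [1.0, 2.0, 3.0, 4.0]
C = circulant(c)
Cx = [sum(C[i][j] * x[j] for j in range(4)) for i in range(4)]
lhs = dft(Cx)
rhs = [fc * fx for fc, fx in zip(dft(c), dft(x))]
```

Because multiplication by C becomes elementwise multiplication after the DFT, a big structured system splits into many small independent ones, which is the partitioning the abstract refers to.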
Similar papers:
  • Multi-channel Correlation Filters [pdf] - Hamed Kiani Galoogahi, Terence Sim, Simon Lucey
  • POP: Person Re-identification Post-rank Optimisation [pdf] - Chunxiao Liu, Chen Change Loy, Shaogang Gong, Guijin Wang
  • Efficient Pedestrian Detection by Directly Optimizing the Partial Area under the ROC Curve [pdf] - Sakrapee Paisitkriangkrai, Chunhua Shen, Anton Van Den Hengel
  • Random Forests of Local Experts for Pedestrian Detection [pdf] - Javier Marin, David Vazquez, Antonio M. Lopez, Jaume Amores, Bastian Leibe
  • Training Deformable Part Models with Decorrelated Features [pdf] - Ross Girshick, Jitendra Malik
Orderless Tracking through Model-Averaged Posterior Estimation [pdf]
Seunghoon Hong, Suha Kwak, Bohyung Han

Abstract: We propose a novel offline tracking algorithm based on model-averaged posterior estimation through patch matching across frames. Contrary to existing online and offline tracking methods, our algorithm is not based on temporally-ordered estimates of target state but attempts to select easy-to-track frames first out of the remaining ones, without exploiting temporal coherency of the target. The posterior of the selected frame is estimated by propagating densities from the already tracked frames in a recursive manner. The density propagation across frames is implemented by an efficient patch matching technique, which is useful for our algorithm since it does not require a motion smoothness assumption. Also, we present a hierarchical approach, where a small set of key frames are tracked first and non-key frames are handled by local key frames. Our tracking algorithm is conceptually well-suited for sequences with abrupt motion, shot changes, and occlusion. We compare our tracking algorithm with existing techniques on real videos with such challenges and illustrate its superior performance qualitatively and quantitatively.
Similar papers:
  • Latent Data Association: Bayesian Model Selection for Multi-target Tracking [pdf] - Aleksandr V. Segal, Ian Reid
  • Tracking via Robust Multi-task Multi-view Joint Sparse Representation [pdf] - Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao
  • Initialization-Insensitive Visual Tracking through Voting with Salient Local Features [pdf] - Kwang Moo Yi, Hawook Jeong, Byeongho Heo, Hyung Jin Chang, Jin Young Choi
  • PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects [pdf] - Stefan Duffner, Christophe Garcia
  • Robust Object Tracking with Online Multi-lifespan Dictionary Learning [pdf] - Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Tracking via Robust Multi-task Multi-view Joint Sparse Representation [pdf]
Zhibin Hong, Xue Mei, Danil Prokhorov, Dacheng Tao

Abstract: Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit cues from multiple views, including various types of visual features such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix into two collaborative components, which enables a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
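The "closed-form updates" in proximal gradient methods come from proximal operators; for an ℓ1 sparsity penalty the proximal map is elementwise soft-thresholding. A minimal sketch of one plain (non-accelerated) proximal gradient step on a toy 1-D lasso problem, under assumed toy data, and not the paper's multi-task multi-view solver:

```python
def soft_threshold(v, t):
    # Closed-form proximal operator of t * ||.||_1, applied elementwise:
    # argmin_x 0.5 * (x - v)^2 + t * |x|  =  sign(v) * max(|v| - t, 0)
    return [max(abs(x) - t, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]

def ista_step(A, b, x, lam, step):
    # One proximal gradient (ISTA) step for
    #   min_x 0.5 * ||A x - b||^2 + lam * ||x||_1
    r = [sum(A[i][j] * x[j] for j in range(len(x))) - b[i]
         for i in range(len(b))]                       # residual A x - b
    g = [sum(A[i][j] * r[i] for i in range(len(b)))
         for j in range(len(x))]                       # gradient A^T r
    return soft_threshold([x[j] - step * g[j] for j in range(len(x))],
                          step * lam)

# Toy example: identity A, so the step shrinks x toward b by lam.
x_new = ista_step([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [0.0, 0.0], 0.5, 1.0)
```

Accelerated Proximal Gradient adds a Nesterov momentum extrapolation between such steps; the per-step shrinkage update stays closed-form as above.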
Similar papers:
  • Orderless Tracking through Model-Averaged Posterior Estimation [pdf] - Seunghoon Hong, Suha Kwak, Bohyung Han
  • PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects [pdf] - Stefan Duffner, Christophe Garcia
  • Finding the Best from the Second Bests - Inhibiting Subjective Bias in Evaluation of Visual Tracking Algorithms [pdf] - Yu Pang, Haibin Ling
  • Robust Object Tracking with Online Multi-lifespan Dictionary Learning [pdf] - Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
  • Online Robust Non-negative Dictionary Learning for Visual Tracking [pdf] - Naiyan Wang, Jingdong Wang, Dit-Yan Yeung
Multi-scale Topological Features for Hand Posture Representation and Analysis [pdf]
Kaoning Hu, Lijun Yin

Abstract: In this paper, we propose a multi-scale topological feature representation for automatic analysis of hand posture. Such topological features have the advantage of being posture-dependent while being preserved under certain variations of illumination, rotation, personal dependency, etc. Our method studies the topology of the holes between the hand region and its convex hull. Inspired by the principle of Persistent Homology, which is the theory of computational topology for topological feature analysis over multiple scales, we construct the multi-scale Betti Numbers matrix (MSBNM) for the topological feature representation. In our experiments, we used 12 different hand postures and compared our features with three popular features (HOG, MCT, and Shape Context) on different data sets. In addition to hand postures, we also extend the feature representation to arm postures. The results demonstrate the feasibility and reliability of the proposed method.
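Betti numbers count topological features: Betti-0 is the number of connected components, Betti-1 the number of holes. As a toy illustration of the Betti-0 quantity only, the sketch below counts 4-connected components of a binary grid with union-find; the paper's MSBNM tracks such counts over multiple scales of a hand silhouette, which this sketch does not attempt:

```python
def betti0(grid):
    # Betti-0 (number of connected components) of the foreground pixels
    # in a binary grid, using union-find with 4-connectivity.
    rows, cols = len(grid), len(grid[0])
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    for i in range(rows):
        for j in range(cols):
            if grid[i][j]:
                parent[(i, j)] = (i, j)
    for i in range(rows):
        for j in range(cols):
            if grid[i][j]:
                if i and grid[i - 1][j]:
                    union((i, j), (i - 1, j))
                if j and grid[i][j - 1]:
                    union((i, j), (i, j - 1))
    return len({find(p) for p in parent})

# Hypothetical mask with three separate foreground blobs.
grid = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 1]]
```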
Similar papers:
  • Fingerspelling Recognition with Semi-Markov Conditional Random Fields [pdf] - Taehwan Kim, Greg Shakhnarovich, Karen Livescu
  • Model Recommendation with Virtual Probes for Egocentric Hand Detection [pdf] - Cheng Li, Kris M. Kitani
  • Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests [pdf] - Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim
  • Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data [pdf] - Srinath Sridhar, Antti Oulasvirta, Christian Theobalt
  • Efficient Hand Pose Estimation from a Single Depth Image [pdf] - Chi Xu, Li Cheng
Recognising Human-Object Interaction via Exemplar Based Modelling [pdf]
Jian-Fang Hu, Wei-Shi Zheng, Jianhuang Lai, Shaogang Gong, Tao Xiang

Abstract: Human action can be recognised from a single still image by modelling human-object interaction (HOI), which infers the mutual spatial structure information between human and object as well as their appearance. Existing approaches rely heavily on accurate detection of human and object, and estimation of human pose. They are thus sensitive to large variations of human poses, occlusion and unsatisfactory detection of small-size objects. To overcome this limitation, a novel exemplar based approach is proposed in this work. Our approach learns a set of spatial pose-object interaction exemplars, which are density functions describing, in a probabilistic way, how a person is interacting spatially with a manipulated object for different activities. A representation based on our HOI exemplars thus has great potential for being robust to errors in human/object detection and pose estimation. A new framework, consisting of the proposed exemplar based HOI descriptor and an activity specific matching model that learns the parameters, is formulated for robust human activity recognition. Experiments on two benchmark activity datasets demonstrate that the proposed approach obtains state-of-the-art performance.
Similar papers:
  • The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection [pdf] - Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
  • Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf] - Jiongxin Liu, Peter N. Belhumeur
  • Discovering Object Functionality [pdf] - Bangpeng Yao, Jiayuan Ma, Li Fei-Fei
  • Modeling 4D Human-Object Interactions for Event and Object Recognition [pdf] - Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu
  • Exemplar Cut [pdf] - Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
Collaborative Active Learning of a Kernel Machine Ensemble for Recognition [pdf]
Gang Hua, Chengjiang Long, Ming Yang, Yan Gao

Abstract: Active learning is an effective way of engaging users to interactively train models for visual recognition. The vast majority of previous works, if not all of them, focused on active learning with a single human oracle. The problem of active learning with multiple oracles in a collaborative setting has not been well explored. Moreover, most of the previous works assume that the labels provided by the human oracles are noise-free, which may often be violated in reality. We present a collaborative computational model for active learning with multiple human oracles. It leads not only to an ensemble kernel machine that is robust to label noise, but also to a principled label quality measure that detects irresponsible labelers online. Instead of running independent active learning processes for each individual human oracle, our model captures the inherent correlations among the labelers through the data shared among them. Our simulation experiments and experiments with real crowd-sourced noisy labels demonstrate the efficacy of our model.
Similar papers:
  • Active MAP Inference in CRFs for Efficient Semantic Segmentation [pdf] - Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
  • Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification [pdf] - Bo Wang, Zhuowen Tu, John K. Tsotsos
  • Active Learning of an Action Detector from Untrimmed Videos [pdf] - Sunil Bandla, Kristen Grauman
  • A Convex Optimization Framework for Active Learning [pdf] - Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sasrty
  • Active Visual Recognition with Expertise Estimation in Crowdsourcing [pdf] - Chengjiang Long, Gang Hua, Ashish Kapoor
Coupled Dictionary and Feature Space Learning with Applications to Cross-Domain Image Synthesis and Recognition [pdf]
De-An Huang, Yu-Chiang Frank Wang

Abstract: Cross-domain image synthesis and recognition are typically considered as two distinct tasks in the areas of computer vision and pattern recognition. Therefore, it is not clear whether approaches addressing one task can be easily generalized or extended for solving the other. In this paper, we propose a unified model for coupled dictionary and feature space learning. The proposed learning model not only observes a common feature space for associating cross-domain image data for recognition purposes; the derived feature space is also able to jointly update the dictionaries in each image domain for improved representation. This is why our method can be applied to both cross-domain image synthesis and recognition problems. Experiments on a variety of synthesis and recognition tasks such as single image super-resolution, cross-view action recognition, and sketch-to-photo face recognition verify the effectiveness of our proposed learning model.
Similar papers:
  • Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps [pdf] - Jiajia Luo, Wei Wang, Hairong Qi
  • Multi-attributed Dictionary Learning for Sparse Coding [pdf] - Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
  • Learning View-Invariant Sparse Representations for Cross-View Action Recognition [pdf] - Jingjing Zheng, Zhuolin Jiang
Coupling Alignments with Recognition for Still-to-Video Face Recognition [pdf]
Zhiwu Huang, Xiaowei Zhao, Shiguang Shan, Ruiping Wang, Xilin Chen

Abstract: Still-to-Video (S2V) face recognition systems typically need to match faces in low-quality videos captured under unconstrained conditions against high-quality still face images, which is very challenging because of noise, image blur, low face resolutions, varying head pose, complex lighting, and alignment difficulty. To address the problem, one solution is to select the frames of best quality from the videos (hereinafter called quality alignment in this paper). Meanwhile, the faces in the selected frames should also be geometrically aligned to the well-aligned still faces in the gallery. In this paper, we discover that the interactions among the three tasks (quality alignment, geometric alignment and face recognition) can benefit from each other, and thus the tasks should be performed jointly. With this in mind, we propose a Coupling Alignments with Recognition (CAR) method to tightly couple these tasks via low-rank regularized sparse representation in a unified framework. Our method makes the three tasks promote each other through a joint optimization in an Augmented Lagrange Multiplier routine. Extensive experiments on two challenging S2V datasets demonstrate that our method outperforms the state-of-the-art methods impressively.
Similar papers:
  • Hidden Factor Analysis for Age Invariant Face Recognition [pdf] - Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
  • Modifying the Memorability of Face Photographs [pdf] - Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva
  • Fast Face Detector Training Using Tailored Views [pdf] - Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
  • Robust Feature Set Matching for Partial Face Recognition [pdf] - Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
  • Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition [pdf] - Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors [pdf]
Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang

Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text-line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confidence map. Our method was evaluated on two benchmark datasets, ICDAR 2005 and ICDAR 2011; the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.
Similar papers:
  • Recognizing Text with Perspective Distortion in Natural Scenes [pdf] - Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
  • Image Retrieval Using Textual Cues [pdf] - Anand Mishra, Karteek Alahari, C.V. Jawahar
  • PhotoOCR: Reading Text in Uncontrolled Conditions [pdf] - Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven
  • Scene Text Localization and Recognition with Oriented Stroke Detection [pdf] - Lukas Neumann, Jiri Matas
  • From Where and How to What We See [pdf] - S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Ecksteinz, B.S. Manjunath
Optimal Orthogonal Basis and Image Assimilation: Motion Modeling [pdf]
Etienne Huot, Giuseppe Papari, Isabelle Herlin

Abstract: This paper describes the modeling and numerical computation of orthogonal bases, which are used to describe images and motion fields. Motion estimation from image data is then studied on subspaces spanned by these bases. A reduced model is obtained as the Galerkin projection on these subspaces of a physical model, based on Euler and optical flow equations. A data assimilation method is studied, which assimilates coefficients of image data in the reduced model in order to estimate motion coefficients. The approach is first quantified on synthetic data: it demonstrates the interest of model reduction as a compromise between result quality and computational cost. Results obtained on real data are then displayed so as to illustrate the method.
Similar papers:
  • Perspective Motion Segmentation via Collaborative Clustering [pdf] - Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Action Recognition with Improved Trajectories [pdf] - Heng Wang, Cordelia Schmid
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
  • Mining Motion Atoms and Phrases for Complex Action Recognition [pdf] - Limin Wang, Yu Qiao, Xiaoou Tang
Markov Network-Based Unified Classifier for Face Identification [pdf]
Wonjun Hwang, Kyungshik Roh, Junmo Kim

Abstract: We propose a novel unifying framework using a Markov network to learn the relationship between multiple classifiers in face recognition. We assume that we have several complementary classifiers, and assign observation nodes to the features of a query image and hidden nodes to the features of gallery images. We connect each hidden node to its corresponding observation node and to the hidden nodes of other neighboring classifiers. For each observation-hidden node pair, we collect a set of gallery candidates that are most similar to the observation instance, and the relationship between the hidden nodes is captured in terms of the similarity matrix between the collected gallery images. Posterior probabilities in the hidden nodes are computed by the belief-propagation algorithm. The novelty of the proposed framework is the method that takes into account the classifier dependency using the results of each neighboring classifier. We present extensive results on two different evaluation protocols, known and unknown image variation tests, using three different databases, which show that the proposed framework always leads to good accuracy in face recognition.
Similar papers:
  • Coupling Alignments with Recognition for Still-to-Video Face Recognition [pdf] - Zhiwu Huang, Xiaowei Zhao, Shiguang Shan, Ruiping Wang, Xilin Chen
  • Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition [pdf] - Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
  • Hidden Factor Analysis for Age Invariant Face Recognition [pdf] - Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
  • Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person [pdf] - Meng Yang, Luc Van_Gool, Lei Zhang
  • Robust Feature Set Matching for Partial Face Recognition [pdf] - Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors [pdf]
Nakamasa Inoue, Koichi Shinoda

Abstract: Assigning a visual code to a low-level image descriptor, which we call code assignment, is the most computationally expensive part of image classification algorithms based on the bag of visual words (BoW) framework. This paper proposes a fast computation method, Neighbor-to-Neighbor (NTN) search, for this code assignment. Based on the fact that image features from an adjacent region are usually similar to each other, this algorithm effectively reduces the cost of calculating the distance between a codeword and a feature vector. This method can be applied not only to a hard codebook constructed by vector quantization (NTN-VQ), but also to a soft codebook, a Gaussian mixture model (NTN-GMM). We evaluated this method on the PASCAL VOC 2007 classification challenge task. NTN-VQ reduced the assignment cost by 77.4% in super-vector coding, and NTN-GMM reduced it by 89.3% in Fisher-vector coding, without any significant degradation in classification performance.
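One simple way to exploit the adjacent-region similarity the abstract describes (a hedged sketch under assumed toy data, not the paper's exact NTN algorithm) is to seed each descriptor's nearest-codeword search with the previous descriptor's assignment and use that distance as an early-termination bound on partial distance computations:

```python
def assign_codes(descriptors, codebook):
    # Hard VQ code assignment with a spatial-coherence warm start:
    # for each descriptor, first evaluate the codeword chosen for the
    # previous (adjacent) descriptor, then scan the remaining codewords,
    # aborting each squared-distance accumulation once it exceeds the
    # current best. Similar neighbors make the bound tight immediately.
    def sqdist(a, b, bound):
        s = 0.0
        for x, y in zip(a, b):
            s += (x - y) ** 2
            if s >= bound:          # early abort: cannot beat the bound
                return bound
        return s

    codes, prev = [], 0
    for d in descriptors:
        best, best_d = prev, sqdist(d, codebook[prev], float("inf"))
        for k, c in enumerate(codebook):
            if k == best:
                continue
            dk = sqdist(d, c, best_d)
            if dk < best_d:
                best, best_d = k, dk
        codes.append(best)
        prev = best                 # warm start for the next descriptor
    return codes

# Toy 2-D descriptors scanned in spatial order, with a 2-word codebook.
codes = assign_codes([[1.0, 1.0], [1.0, 2.0], [9.0, 9.0]],
                     [[0.0, 0.0], [10.0, 10.0]])
```

The result is exact (identical to brute force); only the arithmetic performed per codeword is reduced, which is the spirit of cutting code-assignment cost.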
Similar papers:
  • Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search [pdf] - Dror Aiger, Efi Kokiopoulou, Ehud Rivlin
  • What is the Most Efficient Way to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search? [pdf] - Masakazu Iwamura, Tomokazu Sato, Koichi Kise
  • Fast Neighborhood Graph Search Using Cartesian Concatenation [pdf] - Jing Wang, Jingdong Wang, Gang Zeng, Rui Gan, Shipeng Li, Baining Guo
  • Low-Rank Sparse Coding for Image Classification [pdf] - Tianzhu Zhang, Bernard Ghanem, Si Liu, Changsheng Xu, Narendra Ahuja
  • Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval [pdf] - Yannis Avrithis
Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers [pdf]
Phillip Isola, Ce Liu

Abstract: To quickly synthesize complex scenes, digital artists often collage together visual elements from multiple sources: for example, mountains from New Zealand behind a Scottish castle with wisps of Saharan sand in front. In this paper, we propose to use a similar process in order to parse a scene. We model a scene as a collage of warped, layered objects sampled from labeled, reference images. Each object is related to the rest by a set of support constraints. Scene parsing is achieved through analysis-by-synthesis. Starting with a dataset of labeled exemplar scenes, we retrieve a dictionary of candidate object segments that match a query image. We then combine elements of this set into a scene collage that explains the query image. Beyond just assigning object labels to pixels, scene collaging produces a lot more information, such as the number of each type of object in the scene, how they support one another, the ordinal depth of each object, and, to some degree, occluded content. We exploit this representation for several applications: image editing, random scene synthesis, and image-to-anaglyph.
Similar papers:
  • Support Surface Prediction in Indoor Scenes [pdf] - Ruiqi Guo, Derek Hoiem
  • A Deformable Mixture Parsing Model with Parselets [pdf] - Jian Dong, Qiang Chen, Wei Xia, Zhongyang Huang, Shuicheng Yan
  • Exemplar Cut [pdf] - Jimei Yang, Yi-Hsuan Tsai, Ming-Hsuan Yang
  • Action Recognition and Localization by Hierarchical Space-Time Segments [pdf] - Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
  • Video Segmentation by Tracking Many Figure-Ground Segments [pdf] - Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
What is the Most Efficient Way to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search? [pdf]
Masakazu Iwamura, Tomokazu Sato, Koichi Kise

Abstract: Approximate nearest neighbor search (ANNS) is a basic and important technique used in many tasks such as object recognition. It involves two processes: selecting nearest neighbor candidates and performing a brute-force search of these candidates. Only the former, though, has scope for improvement. Most existing methods approximate the space by quantization. They then calculate all the distances between the query and all the quantized values (e.g., clusters or bit sequences), and select a fixed number of candidates close to the query. The performance of a method is evaluated based on accuracy as a function of the number of candidates. This evaluation seems rational but poses a serious problem: it ignores the computational cost of the selection process itself. In this paper, we propose a new ANNS method that takes into account the costs of the selection process. Whereas existing methods employ computationally expensive techniques such as comparison sorts and heaps, the proposed method does not. This realizes a significantly more efficient search. We have succeeded in reducing computation times by one-third compared with the state-of-the-art in an experiment using 100 million SIFT features.
Similar papers:
  • Joint Inverted Indexing [pdf] - Yan Xia, Kaiming He, Fang Wen, Jian Sun
  • Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search [pdf] - Dror Aiger, Efi Kokiopoulou, Ehud Rivlin
  • Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval [pdf] - Yannis Avrithis
  • Fast Neighborhood Graph Search Using Cartesian Concatenation [pdf] - Jing Wang, Jingdong Wang, Gang Zeng, Rui Gan, Shipeng Li, Baining Guo
  • Fast Subspace Search via Grassmannian Based Hashing [pdf] - Xu Wang, Stefan Atev, John Wright, Gilad Lerman
Real-World Normal Map Capture for Nearly Flat Reflective Surfaces [pdf]
Bastien Jacquet, Christian Hane, Kevin Koser, Marc Pollefeys

Abstract: Although specular objects have gained interest in recent years, virtually no approaches exist for markerless reconstruction of reflective scenes in the wild. In this work, we present a practical approach to capturing normal maps in real-world scenes using video only. We focus on nearly planar surfaces such as windows, facades of glass or metal, or frames, screens and other indoor objects, and show how normal maps of these can be obtained without the use of an artificial calibration object. Rather, we track the reflections of real-world straight lines while moving with a hand-held or vehicle-mounted camera in front of the object. In contrast to error-prone local edge tracking, we obtain the reflections by a robust, global segmentation technique on an ortho-rectified 3D video cube that also naturally allows efficient user interaction. Then, at each point of the reflective surface, the resulting 2D-curve to 3D-line correspondence provides a novel quadratic constraint on the local surface normal. This allows us to globally solve for the shape under integrability and smoothness constraints, and easily supports the use of multiple lines. We demonstrate the technique on several objects and facades.
Similar papers:
  • Matching Dry to Wet Materials [pdf] - Yaser Yacoob
  • Forward Motion Deblurring [pdf] - Shicheng Zheng, Li Xu, Jiaya Jia
  • Refractive Structure-from-Motion on Underwater Images [pdf] - Anne Jordt-Sedlazeck, Reinhard Koch
  • Exploiting Reflection Change for Automatic Reflection Removal [pdf] - Yu Li, Michael S. Brown
  • Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects [pdf] - Michael Weinmann, Aljosa Osep, Roland Ruiters, Reinhard Klein
Adapting Classification Cascades to New Domains [pdf]
Vidit Jain, Sachin Sudhakar Farfade

Abstract: Classification cascades have been very effective for object detection. Such a cascade fails to perform well in data domains with variations in appearance that may not be captured in the training examples. This limited generalization severely restricts the domains for which cascades can be used effectively. A common approach to address this limitation is to train a new cascade of classifiers from scratch for each new domain. Building separate detectors for each of the different domains requires huge annotation and computational effort, making this approach not scalable to a large number of data domains. Here we present an algorithm for quickly adapting a pre-trained cascade of classifiers using a small number of labeled positive instances from a different yet similar data domain. In our experiments with images of human babies and human-like characters from movies, we demonstrate that the adapted cascade significantly outperforms both the original cascade and one trained from scratch using the given training examples.
Similar papers:
  • Multi-stage Contextual Deep Learning for Pedestrian Detection [pdf] - Xingyu Zeng, Wanli Ouyang, Xiaogang Wang
  • Efficient Pedestrian Detection by Directly Optimizing the Partial Area under the ROC Curve [pdf] - Sakrapee Paisitkriangkrai, Chunhua Shen, Anton Van Den Hengel
  • Domain Adaptive Classification [pdf] - Fatemeh Mirrashed, Mohammad Rastegari
  • Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation [pdf] - Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang
  • Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection [pdf] - Tianfu Wu, Song-Chun Zhu
Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees [pdf]
Aastha Jain, Shuanak Chatterjee, Rene Vidal

Abstract: We propose an exact, general and efficient coarse-to-fine energy minimization strategy for semantic video segmentation. Our strategy is based on a hierarchical abstraction of the supervoxel graph that allows us to minimize an energy defined at the finest level of the hierarchy by minimizing a series of simpler energies defined over coarser graphs. The strategy is exact, i.e., it produces the same solution as minimizing over the finest graph. It is general, i.e., it can be used to minimize any energy function (e.g., unary, pairwise, and higher-order terms) with any existing energy minimization algorithm (e.g., graph cuts and belief propagation). It also gives significant speedups in inference for several datasets with varying degrees of spatio-temporal continuity. We also discuss the strengths and weaknesses of our strategy relative to existing hierarchical approaches, and the kinds of image and video data that provide the best speedups.
Similar papers:
  • Uncertainty-Driven Efficiently-Sampled Sparse Graphical Models for Concurrent Tumor Segmentation and Atlas Registration [pdf] - Sarah Parisot, William Wells_III, Stephane Chemouny, Hugues Duffau, Nikos Paragios
  • Flattening Supervoxel Hierarchies by the Uniform Entropy Slice [pdf] - Chenliang Xu, Spencer Whitt, Jason J. Corso
  • Efficient 3D Scene Labeling Using Fields of Trees [pdf] - Olaf Kahler, Ian Reid
  • Potts Model, Parametric Maxflow and K-Submodular Functions [pdf] - Igor Gridchyn, Vladimir Kolmogorov
  • Active MAP Inference in CRFs for Efficient Semantic Segmentation [pdf] - Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
Efficient Higher-Order Clustering on the Grassmann Manifold [pdf]
Suraj Jain, Venu Madhav Govindu

Abstract: The higher-order clustering problem arises when data is drawn from multiple subspaces or when observations fit a higher-order parametric model. Most solutions to this problem either decompose higher-order similarity measures for use in spectral clustering or explicitly use low-rank matrix representations. In this paper we present our approach of Sparse Grassmann Clustering (SGC), which combines attributes of both categories. While we decompose the higher-order similarity tensor, we cluster data by directly finding a low-dimensional representation without explicitly building a similarity matrix. By exploiting recent advances in online estimation on the Grassmann manifold (GROUSE), we develop an efficient and accurate algorithm that works with individual columns of similarities or partial observations thereof. Since it avoids the storage and decomposition of large similarity matrices, our method is efficient and scalable, and has low memory requirements even for large-scale data. We demonstrate the performance of our SGC method on a variety of segmentation problems, including planar segmentation of Kinect depth maps and motion segmentation of the Hopkins 155 dataset, for which we achieve performance comparable to the state-of-the-art.
Similar papers:
  • Perspective Motion Segmentation via Collaborative Clustering [pdf] - Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou
  • Correntropy Induced L2 Graph for Robust Subspace Clustering [pdf] - Canyi Lu, Jinhui Tang, Min Lin, Liang Lin, Shuicheng Yan, Zhouchen Lin
  • Distributed Low-Rank Subspace Segmentation [pdf] - Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan
  • Latent Space Sparse Subspace Clustering [pdf] - Vishal M. Patel, Hien Van Nguyen, Rene Vidal
  • Robust Subspace Clustering via Half-Quadratic Minimization [pdf] - Yingya Zhang, Zhenan Sun, Ran He, Tieniu Tan
Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation [pdf]
Suyog Dutt Jain, Kristen Grauman

Abstract: The mode of manual annotation used in an interactive segmentation algorithm affects both its accuracy and ease-of-use. For example, bounding boxes are fast to supply, yet may be too coarse to get good results on difficult images; freehand outlines are slower to supply and more specific, yet they may be overkill for simple images. Whereas existing methods assume a fixed form of input no matter the image, we propose to predict the tradeoff between accuracy and effort. Our approach learns whether a graph cuts segmentation will succeed if initialized with a given annotation mode, based on the image's visual separability and foreground uncertainty. Using these predictions, we optimize the mode of input requested on new images a user wants segmented. Whether given a single image that should be segmented as quickly as possible, or a batch of images that must be segmented within a specified time budget, we show how to select the easiest modality that will be sufficiently strong to yield high quality segmentations. Extensive results with real users and three datasets demonstrate the impact.
Similar papers:
  • Implied Feedback: Learning Nuances of User Behavior in Image Search [pdf] - Devi Parikh, Kristen Grauman
  • Multi-view Object Segmentation in Space and Time [pdf] - Abdelaziz Djelouah, Jean-Sebastien Franco, Edmond Boyer, Francois Le_Clerc, Patrick Perez
  • Symbiotic Segmentation and Part Localization for Fine-Grained Categorization [pdf] - Yuning Chai, Victor Lempitsky, Andrew Zisserman
  • GrabCut in One Cut [pdf] - Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
  • Semantic Segmentation without Annotating Segments [pdf] - Wei Xia, Csaba Domokos, Jian Dong, Loong-Fah Cheong, Shuicheng Yan
A Framework for Shape Analysis via Hilbert Space Embedding [pdf]
Sadeep Jayasumana, Mathieu Salzmann, Hongdong Li, Mehrtash Harandi

Abstract: We propose a framework for 2D shape analysis using positive definite kernels defined on Kendall's shape manifold. Different representations of 2D shapes are known to generate different nonlinear spaces. Due to the nonlinearity of these spaces, most existing shape classification algorithms resort to nearest neighbor methods and to learning distances on shape spaces. Here, we propose to map shapes on Kendall's shape manifold to a high dimensional Hilbert space where Euclidean geometry applies. To this end, we introduce a kernel on this manifold that permits such a mapping, and prove its positive definiteness. This kernel lets us extend kernel-based algorithms developed for Euclidean spaces, such as SVM, MKL and kernel PCA, to the shape manifold. We demonstrate the benefits of our approach over the state-of-the-art methods on shape classification, clustering and retrieval.
Similar papers:
  • Accurate Blur Models vs. Image Priors in Single Image Super-resolution [pdf] - Netalee Efrat, Daniel Glasner, Alexander Apartsin, Boaz Nadler, Anat Levin
  • Learning a Dictionary of Shape Epitomes with Applications to Image Labeling [pdf] - Liang-Chieh Chen, George Papandreou, Alan L. Yuille
  • Cascaded Shape Space Pruning for Robust Facial Landmark Detection [pdf] - Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
  • Nonparametric Blind Super-resolution [pdf] - Tomer Michaeli, Michal Irani
  • On One-Shot Similarity Kernels: Explicit Feature Maps and Properties [pdf] - Stefanos Zafeiriou, Irene Kotsia
Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging [pdf]
Hae-Gon Jeon, Joon-Young Lee, Yudeog Han, Seon Joo Kim, In So Kweon

Abstract: Finding a good binary sequence is critical in determining the performance of coded exposure imaging, but previous methods mostly rely on a random search for finding the binary codes, which could easily fail to find good long sequences due to the exponentially growing search space. In this paper, we present a new computationally efficient algorithm for generating the binary sequence, which is especially well suited for longer sequences. We show that the concept of the low autocorrelation binary sequence, well exploited in the information theory community, can be applied to generating the fluttering patterns of the shutter, propose a new measure of a good binary sequence, and present a new algorithm that modifies the Legendre sequence for coded exposure imaging. Experiments using both synthetic and real data show that our new algorithm consistently generates better binary sequences for the coded exposure problem, yielding better deblurring and resolution enhancement results compared to previous methods for generating the binary codes.
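The classical Legendre sequence the abstract builds on is easy to construct: for an odd prime p, bit i is 1 iff i is a nonzero quadratic residue mod p. A minimal sketch (the paper's modification of the sequence and its new quality measure are not reproduced here; p = 31 is an arbitrary example length):

```python
import numpy as np

def legendre_sequence(p):
    """Classical Legendre sequence of odd prime length p:
    bit i is 1 iff i is a nonzero quadratic residue mod p."""
    residues = {(i * i) % p for i in range(1, p)}
    return np.array([1 if i in residues else 0 for i in range(p)])

def autocorrelation(seq):
    """Aperiodic autocorrelation of the +/-1 version of a binary code;
    low off-peak values are what 'low autocorrelation' refers to."""
    s = 2 * seq.astype(float) - 1.0          # map {0,1} -> {-1,+1}
    return np.correlate(s, s, mode="full")   # peak at zero lag (center)

seq = legendre_sequence(31)
corr = autocorrelation(seq)
```

Exactly (p-1)/2 bits are set, and the zero-lag correlation equals p; a good fluttering pattern keeps the remaining lags small.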
Similar papers:
  • A Unified Rolling Shutter and Motion Blur Model for 3D Visual Registration [pdf] - Maxime Meilland, Tom Drummond, Andrew I. Comport
  • A New Adaptive Segmental Matching Measure for Human Activity Recognition [pdf] - Shahriar Shariat, Vladimir Pavlovic
  • Towards Motion Aware Light Field Video for Dynamic Scenes [pdf] - Salil Tambe, Ashok Veeraraghavan, Amit Agrawal
  • Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences [pdf] - Bing Su, Xiaoqing Ding
  • Fibonacci Exposure Bracketing for High Dynamic Range Imaging [pdf] - Mohit Gupta, Daisuke Iso, Shree K. Nayar
Towards Understanding Action Recognition [pdf]
Hueihan Jhuang, Juergen Gall, Silvia Zuffi, Cordelia Schmid, Michael J. Black

Abstract: Although action recognition in videos is widely studied, current methods often fail on real-world datasets. Many recent approaches improve accuracy and robustness to cope with challenging video sequences, but it is often unclear what affects the results most. This paper attempts to provide insights based on a systematic performance evaluation using thoroughly-annotated data of human actions. We annotate human Joints for the HMDB dataset (J-HMDB). This annotation can be used to derive ground truth optical flow and segmentation. We evaluate current methods using this dataset and systematically replace the output of various algorithms with ground truth. This enables us to discover what is important: for example, should we work on improving flow algorithms, estimating human bounding boxes, or enabling pose estimation? In summary, we find that high-level pose features greatly outperform low/mid level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information. We also find that the accuracy of a top-performing action recognition framework can be greatly increased by refining the underlying low/mid level features; this suggests it is important to improve optical flow and human detection algorithms. Our analysis and J-HMDB dataset should facilitate a deeper understanding of action recognition algorithms.
Similar papers:
  • From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding [pdf] - Weiyu Zhang, Menglong Zhu, Konstantinos G. Derpanis
  • Action Recognition with Improved Trajectories [pdf] - Heng Wang, Cordelia Schmid
  • Concurrent Action Detection with Structural Prediction [pdf] - Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
  • Estimating Human Pose with Flowing Puppets [pdf] - Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
  • The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection [pdf] - Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
Category-Independent Object-Level Saliency Detection [pdf]
Yangqing Jia, Mei Han

Abstract: It is known that purely low-level saliency cues such as frequency do not lead to a good salient object detection result, and that high-level knowledge needs to be adopted for successful discovery of task-independent salient objects. In this paper, we propose an efficient way to combine such high-level saliency priors and low-level appearance models. We obtain the high-level saliency prior with the objectness algorithm to find potential object candidates without the need of category information, and then enforce the consistency among the salient regions using a Gaussian MRF with the weights scaled by diverse density, which emphasizes the influence of potential foreground pixels. Our model obtains saliency maps that assign high scores for the whole salient object, and achieves state-of-the-art performance on benchmark datasets covering various foreground statistics.
Similar papers:
  • Saliency Detection via Dense and Sparse Reconstruction [pdf] - Xiaohui Li, Huchuan Lu, Lihe Zhang, Xiang Ruan, Ming-Hsuan Yang
  • Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf] - Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
  • Efficient Salient Region Detection with Soft Image Abstraction [pdf] - Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
  • Salient Region Detection by UFO: Uniqueness, Focusness and Objectness [pdf] - Peng Jiang, Haibin Ling, Jingyi Yu, Jingliang Peng
  • Contextual Hypergraph Modeling for Salient Object Detection [pdf] - Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel
Latent Task Adaptation with Large-Scale Hierarchies [pdf]
Yangqing Jia, Trevor Darrell

Abstract: Recent years have witnessed the success of large-scale image classification systems that are able to identify objects among thousands of possible labels. However, it is yet unclear how general classifiers, such as ones trained on ImageNet, can be optimally adapted to specific tasks, each of which only covers a semantically related subset of all the objects in the world. It is inefficient and suboptimal to retrain classifiers whenever a new task is given, and is inapplicable when tasks are not given explicitly, but implicitly specified as a set of image queries. In this paper we propose a novel probabilistic model that jointly identifies the underlying task and performs prediction with a linear-time probabilistic inference algorithm, given a set of query images from a latent task. We present efficient ways to estimate parameters for the model, and an open-source toolbox to train classifiers distributedly at a large scale. Empirical results based on the ImageNet data showed significant performance increase over several baseline algorithms.
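The flavor of inferring a latent task from a set of queries can be sketched with a toy Bayesian computation. Everything here is hypothetical (the task names, numbers, and the uniform-label-within-task assumption are illustrative, not the paper's model):

```python
import numpy as np

# toy setup: 4 labels, 2 candidate tasks defined as label subsets
tasks = {"pets": [0, 1], "vehicles": [2, 3]}
prior = {"pets": 0.5, "vehicles": 0.5}

def task_posterior(query_probs, tasks, prior):
    """Posterior over latent tasks given per-query class probabilities,
    assuming each query's label is drawn uniformly from the task's subset."""
    post = {}
    for t, labels in tasks.items():
        # likelihood of all queries under task t (queries independent)
        lik = np.prod([probs[labels].sum() / len(labels)
                       for probs in query_probs])
        post[t] = prior[t] * lik
    z = sum(post.values())
    return {t: v / z for t, v in post.items()}

# two queries whose classifier scores lean toward labels 0 and 1
q = [np.array([0.7, 0.2, 0.05, 0.05]), np.array([0.1, 0.8, 0.05, 0.05])]
post = task_posterior(q, tasks, prior)
# the query set strongly favors the "pets" task
```

The point of the sketch: a few query images together can pin down the task even when each individual prediction is ambiguous.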
Similar papers:
  • Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation [pdf] - Basura Fernando, Tinne Tuytelaars
  • Group Norm for Learning Structured SVMs with Unstructured Latent Variables [pdf] - Daozheng Chen, Dhruv Batra, William T. Freeman
  • Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval [pdf] - Cai-Zhi Zhu, Herve Jegou, Shin'Ichi Satoh
  • Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees [pdf] - Oisin Mac Aodha, Gabriel J. Brostow
  • Learning to Share Latent Tasks for Action Recognition [pdf] - Qiang Zhou, Gang Wang, Kui Jia, Qi Zhao
A Global Linear Method for Camera Pose Registration [pdf]
Nianjuan Jiang, Zhaopeng Cui, Ping Tan

Abstract: We present a linear method for global camera pose registration from pairwise relative poses encoded in essential matrices. Our method minimizes an approximate geometric error to enforce the triangular relationship in camera triplets. This formulation does not suffer from the typical unbalanced scale problem of linear methods relying on pairwise translation direction constraints (i.e. an algebraic error), nor from the system degeneracy caused by collinear motion. In the case of three cameras, our method provides a good linear approximation of the trifocal tensor. It can be directly scaled up to register multiple cameras. The results obtained are accurate for point triangulation and can serve as a good initialization for final bundle adjustment. We evaluate the algorithm's performance with different types of data and demonstrate its effectiveness. Our system achieves good accuracy and robustness, and outperforms some well-known systems in efficiency.
Similar papers:
  • Revisiting the PnP Problem: A Fast, General and Optimal Solution [pdf] - Yinqiang Zheng, Yubin Kuang, Shigeki Sugimoto, Kalle Astrom, Masatoshi Okutomi
  • Street View Motion-from-Structure-from-Motion [pdf] - Bryan Klingner, David Martin, James Roseborough
  • Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras [pdf] - Jae-Hak Kim, Yuchao Dai, Hongdong Li, Xin Du, Jonghyuk Kim
  • Refractive Structure-from-Motion on Underwater Images [pdf] - Anne Jordt-Sedlazeck, Reinhard Koch
  • Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion [pdf] - Pierre Moulon, Pascal Monasse, Renaud Marlet
Saliency Detection via Absorbing Markov Chain [pdf]
Bowen Jiang, Lihe Zhang, Huchuan Lu, Chuan Yang, Ming-Hsuan Yang

Abstract: In this paper, we formulate saliency detection via an absorbing Markov chain on an image graph model. We jointly consider the appearance divergence and spatial distribution of salient objects and the background. The virtual boundary nodes are chosen as the absorbing nodes in a Markov chain, and the absorbed time from each transient node to the boundary absorbing nodes is computed. The absorbed time of a transient node measures its global similarity with all absorbing nodes, and thus salient objects can be consistently separated from the background when the absorbed time is used as a metric. Since the time from a transient node to the absorbing nodes relies on the weights along the path and their spatial distance, a background region at the center of the image may appear salient. We further exploit the equilibrium distribution in an ergodic Markov chain to reduce the absorbed time in long-range smooth background regions. Extensive experiments on four benchmark datasets demonstrate the robustness and efficiency of the proposed method against the state-of-the-art methods.
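The absorbed time the abstract refers to is a standard absorbing-Markov-chain quantity: with the transient-to-transient block Q of the transition matrix, the fundamental matrix N = (I - Q)^-1 gives the expected steps to absorption as its row sums. A minimal sketch on a hypothetical 3-node chain (the image graph construction itself is not reproduced):

```python
import numpy as np

def absorbed_times(P, transient_idx):
    """Expected number of steps to absorption for each transient node of
    an absorbing Markov chain with row-stochastic transition matrix P."""
    Q = P[np.ix_(transient_idx, transient_idx)]        # transient -> transient
    N = np.linalg.inv(np.eye(len(transient_idx)) - Q)  # fundamental matrix
    return N @ np.ones(len(transient_idx))             # row sums = absorbed time

# toy chain: nodes 0 and 1 are transient, node 2 is absorbing (a boundary node)
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.25, 0.50],
              [0.00, 0.00, 1.00]])
t = absorbed_times(P, [0, 1])
# node 1 sits closer to the boundary, so it is absorbed sooner (lower time),
# i.e. it would be scored as less salient than node 0
```

In the paper's setting, nodes far (in the weighted-path sense) from all boundary absorbing nodes get large absorbed times and are scored as salient.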
Similar papers:
  • Salient Region Detection by UFO: Uniqueness, Focusness and Objectness [pdf] - Peng Jiang, Haibin Ling, Jingyi Yu, Jingliang Peng
  • Saliency Detection via Dense and Sparse Reconstruction [pdf] - Xiaohui Li, Huchuan Lu, Lihe Zhang, Xiang Ruan, Ming-Hsuan Yang
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
  • Efficient Salient Region Detection with Soft Image Abstraction [pdf] - Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
  • Contextual Hypergraph Modeling for Salient Object Detection [pdf] - Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel
Salient Region Detection by UFO: Uniqueness, Focusness and Objectness [pdf]
Peng Jiang, Haibin Ling, Jingyi Yu, Jingliang Peng

Abstract: The goal of saliency detection is to locate important pixels or regions in an image which attract humans' visual attention the most. This is a fundamental task whose output may serve as the basis for further computer vision tasks like segmentation, resizing, tracking and so forth. In this paper we propose a novel salient region detection algorithm by integrating three important visual cues, namely uniqueness, focusness and objectness (UFO). In particular, uniqueness captures the appearance-derived visual contrast; focusness reflects the fact that salient regions are often photographed in focus; and objectness helps keep the completeness of detected salient regions. While uniqueness has long been used for saliency detection, integrating focusness and objectness for this purpose is new. In fact, focusness and objectness both provide important saliency information complementary to uniqueness. In our experiments using public benchmark datasets, we show that, even with a simple pixel level combination of the three components, the proposed approach yields significant improvement compared with previously reported methods.
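A "simple pixel level combination of the three components" can be sketched as below. The multiplicative rule and the toy 2x2 maps are illustrative assumptions, not the paper's exact combination:

```python
import numpy as np

def normalize(m):
    """Scale a cue map to [0, 1]; constant maps become all zeros."""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def combine_ufo(uniqueness, focusness, objectness):
    """Pixel-level fusion of the three cues (illustrative rule):
    a pixel is salient only if all three cues agree."""
    u, f, o = (normalize(c) for c in (uniqueness, focusness, objectness))
    return normalize(u * f * o)

# toy maps: the top-left pixel scores high on all three cues
U = np.array([[0.90, 0.20], [0.10, 0.10]])
F = np.array([[0.80, 0.30], [0.20, 0.10]])
O = np.array([[0.95, 0.10], [0.10, 0.05]])
S = combine_ufo(U, F, O)
```

The product form suppresses pixels where any single cue is weak, which is one way the cues act as complementary evidence.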
Similar papers:
  • Saliency Detection via Absorbing Markov Chain [pdf] - Bowen Jiang, Lihe Zhang, Huchuan Lu, Chuan Yang, Ming-Hsuan Yang
  • Saliency Detection: A Boolean Map Approach [pdf] - Jianming Zhang, Stan Sclaroff
  • Efficient Salient Region Detection with Soft Image Abstraction [pdf] - Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
  • Contextual Hypergraph Modeling for Salient Object Detection [pdf] - Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
Complementary Projection Hashing [pdf]
Zhongming Jin, Yao Hu, Yue Lin, Debing Zhang, Shiding Lin, Deng Cai, Xuelong Li

Abstract:
Similar papers:
  • Fast Subspace Search via Grassmannian Based Hashing [pdf] - Xu Wang, Stefan Atev, John Wright, Gilad Lerman
  • Supervised Binary Hash Code Learning with Jensen Shannon Divergence [pdf] - Lixin Fan
  • Learning Hash Codes with Listwise Supervision [pdf] - Jun Wang, Wei Liu, Andy X. Sun, Yu-Gang Jiang
  • A General Two-Step Approach to Learning-Based Hashing [pdf] - Guosheng Lin, Chunhua Shen, David Suter, Anton van_den_Hengel
  • Large-Scale Video Hashing via Structure Learning [pdf] - Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang
Human Attribute Recognition by Rich Appearance Dictionary [pdf]
Jungseock Joo, Shuo Wang, Song-Chun Zhu

Abstract: We present a part-based approach to the problem of human attribute recognition from a single image of a human body. To recognize the attributes of a human from the body parts, it is important to reliably detect the parts. This is a challenging task due to the geometric variation such as articulation and view-point changes, as well as the appearance variation of the parts arising from versatile clothing types. Prior works have primarily focused on handling geometric variation by relying on pre-trained part detectors or pose estimators, which require manual part annotation, but the appearance variation has been relatively neglected in these works. This paper explores the importance of the appearance variation, which is directly related to the main task, attribute recognition. To this end, we propose to learn a rich appearance part dictionary of humans with significantly less supervision, by decomposing the image lattice into overlapping windows at multiple scales and iteratively refining local appearance templates. We also present quantitative results in which our proposed method outperforms the existing approaches.
Similar papers:
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
  • Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction [pdf] - Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
  • Strong Appearance and Expressive Spatial Models for Human Pose Estimation [pdf] - Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
  • A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects [pdf] - Xiaoyang Wang, Qiang Ji
Refractive Structure-from-Motion on Underwater Images [pdf]
Anne Jordt-Sedlazeck, Reinhard Koch

Abstract: In underwater environments, cameras need to be confined in an underwater housing, viewing the scene through a piece of glass. In case of flat port underwater housings, light rays entering the camera housing are refracted twice, due to the different medium densities of water, glass, and air. This causes the usually straight rays of light to bend and the commonly used pinhole camera model to become invalid. When using the pinhole camera model without explicitly modeling refraction in Structure-from-Motion (SfM) methods, a systematic model error occurs. Therefore, in this paper, we propose a system for computing the camera path and 3D points with explicit incorporation of refraction, using new methods for pose estimation. Additionally, a new error function is introduced for non-linear optimization, especially bundle adjustment. The proposed method increases reconstruction accuracy and is evaluated in a set of experiments, where its performance is compared to SfM with the perspective camera model.
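The refraction at each flat-port interface follows Snell's law, which has a standard vector form. A minimal sketch of a single air-to-water interface (the paper models two interfaces, water-glass and glass-air, and embeds this in pose estimation; the 1.333 index and ray below are just example values):

```python
import numpy as np

def refract(d, n, n1, n2):
    """Refract unit direction d at a planar interface with unit normal n
    (pointing toward the incoming ray), refractive indices n1 -> n2.
    Returns None on total internal reflection."""
    r = n1 / n2
    cos_i = -np.dot(n, d)                     # incidence angle cosine
    sin_t2 = r * r * (1.0 - cos_i * cos_i)    # sin^2 of refraction angle
    if sin_t2 > 1.0:
        return None
    cos_t = np.sqrt(1.0 - sin_t2)
    return r * d + (r * cos_i - cos_t) * n    # refracted unit direction

# air -> water (index ~1.333): a ray 30 degrees off the normal bends toward it
d = np.array([np.sin(np.radians(30)), 0.0, -np.cos(np.radians(30))])
n = np.array([0.0, 0.0, 1.0])                 # interface normal, toward the air side
t = refract(d, n, 1.0, 1.333)
```

Because the bent ray no longer passes through a single center of projection, the pinhole model breaks, which is exactly the systematic error the paper sets out to remove.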
Similar papers:
  • Real-World Normal Map Capture for Nearly Flat Reflective Surfaces [pdf] - Bastien Jacquet, Christian Hane, Kevin Koser, Marc Pollefeys
  • SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels [pdf] - Jianxiong Xiao, Andrew Owens, Antonio Torralba
  • Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras [pdf] - Jae-Hak Kim, Yuchao Dai, Hongdong Li, Xin Du, Jonghyuk Kim
  • Street View Motion-from-Structure-from-Motion [pdf] - Bryan Klingner, David Martin, James Roseborough
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
Efficient 3D Scene Labeling Using Fields of Trees [pdf]
Olaf Kahler, Ian Reid

Abstract: We address the problem of 3D scene labeling in a structured learning framework. Unlike previous work which uses structured Support Vector Machines, we employ the recently described Decision Tree Field and Regression Tree Field frameworks, which learn the unary and binary terms of a Conditional Random Field from training data. We show this has significant advantages in terms of inference speed, while maintaining similar accuracy. We also demonstrate empirically the importance for overall labeling accuracy of features that make use of prior knowledge about the coarse scene layout, such as the location of the ground plane. We show how this coarse layout can be estimated by our framework automatically, and that this information can be used to bootstrap improved accuracy in the detailed labeling.
Similar papers:
  • Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees [pdf] - Oisin Mac Aodha, Gabriel J. Brostow
  • Alternating Regression Forests for Object Detection and Pose Estimation [pdf] - Samuel Schulter, Christian Leistner, Paul Wohlhart, Peter M. Roth, Horst Bischof
  • Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors [pdf] - Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun
  • Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees [pdf] - Aastha Jain, Shuanak Chatterjee, Rene Vidal
  • Potts Model, Parametric Maxflow and K-Submodular Functions [pdf] - Igor Gridchyn, Vladimir Kolmogorov
From Where and How to What We See [pdf]
S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Ecksteinz, B.S. Manjunath

Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data, without utilizing any image information. The proposed algorithm spatially clusters the eye tracking data obtained on an image into coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset of 300 images selected from the ICDAR, Street-view, Flickr and Oxford-IIIT Pet datasets, collected from 15 subjects.
Similar papers:
  • Recognizing Text with Perspective Distortion in Natural Scenes [pdf] - Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
  • Scene Text Localization and Recognition with Oriented Stroke Detection [pdf] - Lukas Neumann, Jiri Matas
  • Image Retrieval Using Textual Cues [pdf] - Anand Mishra, Karteek Alahari, C.V. Jawahar
  • PhotoOCR: Reading Text in Uncontrolled Conditions [pdf] - Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven
  • Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors [pdf] - Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang
Drosophila Embryo Stage Annotation Using Label Propagation [pdf]
Tomas Kazmar, Evgeny Z. Kvon, Alexander Stark, Christoph H. Lampert

Abstract: In this work we propose a system for automatic classification of Drosophila embryos into developmental stages. While the system is designed to solve an actual problem in biological research, we believe that the principle underlying it is interesting not only for biologists, but also for researchers in computer vision. The main idea is to combine two orthogonal sources of information: one is a classifier trained on strongly invariant features, which makes it applicable to images of very different conditions, but also leads to rather noisy predictions. The other is a label propagation step based on a more powerful similarity measure that, however, is only consistent within specific subsets of the data at a time. In our biological setup, the information sources are the shape and the staining patterns of embryo images. We show experimentally that while neither of the methods can be used by itself to achieve satisfactory results, their combination achieves prediction quality comparable to human performance.
Similar papers:
  • Multi-stage Contextual Deep Learning for Pedestrian Detection [pdf] - Xingyu Zeng, Wanli Ouyang, Xiaogang Wang
  • Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection [pdf] - Tianfu Wu, Song-Chun Zhu
  • Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks [pdf] - Mojtaba Seyedhosseini, Mehdi Sajjadi, Tolga Tasdizen
  • Adapting Classification Cascades to New Domains [pdf] - Vidit Jain, Sachin Sudhakar Farfade
  • Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification [pdf] - Bo Wang, Zhuowen Tu, John K. Tsotsos
Internet Based Morphable Model [pdf]
Ira Kemelmacher-Shlizerman

Abstract: In this paper we present a new concept of building a morphable model directly from photos on the Internet. Morphable models showed very impressive results more than a decade ago, and could potentially have a huge impact on all aspects of face modeling and recognition. One of the challenges, however, is to capture and register 3D laser scans of a large number of people and facial expressions. Nowadays, there are enormous amounts of face photos on the Internet, a large portion of which has semantic labels. We propose a framework to build a morphable model directly from photos; the framework includes dense registration of Internet photos, as well as new single view shape reconstruction and modification algorithms.
Similar papers:
  • Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model [pdf] - Xiang Yu, Junzhou Huang, Shaoting Zhang, Wang Yan, Dimitris N. Metaxas
  • Accurate and Robust 3D Facial Capture Using a Single RGBD Camera [pdf] - Yen-Lin Chen, Hsiang-Tao Wu, Fuhao Shi, Xin Tong, Jinxiang Chai
  • Cascaded Shape Space Pruning for Robust Facial Landmark Detection [pdf] - Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
  • High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination [pdf] - Yudeog Han, Joon-Young Lee, In So Kweon
  • Viewing Real-World Faces in 3D [pdf] - Tal Hassner
Modifying the Memorability of Face Photographs [pdf]
Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva

Abstract: Contemporary life bombards us with many new images of faces every day, which poses non-trivial constraints on human memory. The vast majority of face photographs are intended to be remembered, either because of personal relevance, commercial interests or because the pictures were deliberately designed to be memorable. Can we make a portrait more memorable or more forgettable automatically? Here, we provide a method to modify the memorability of individual face photographs, while keeping the identity and other facial traits (e.g. age, attractiveness, and emotional magnitude) of the individual fixed. We show that face photographs manipulated to be more memorable (or more forgettable) are indeed more often remembered (or forgotten) in a crowd-sourcing experiment, with an accuracy of 74%. Quantifying and modifying the memorability of a face lends itself to many useful applications in computer vision and graphics, such as mnemonic aids for learning, photo editing applications for social networks and tools for designing memorable advertisements.
Similar papers:
  • Pose-Free Facial Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model [pdf] - Xiang Yu, Junzhou Huang, Shaoting Zhang, Wang Yan, Dimitris N. Metaxas
  • Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation [pdf] - Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang
  • Hidden Factor Analysis for Age Invariant Face Recognition [pdf] - Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
  • Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition [pdf] - Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
  • Fast Face Detector Training Using Tailored Views [pdf] - Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution [pdf]
Martin Kiechle, Simon Hawe, Martin Kleinsteuber

Abstract: High-resolution depth maps can be inferred from low-resolution depth measurements and an additional high-resolution intensity image of the same scene. To that end, we introduce a bimodal co-sparse analysis model, which is able to capture the interdependency of registered intensity and depth information. This model is based on the assumption that the co-supports of corresponding bimodal image structures are aligned when computed by a suitable pair of analysis operators. No analytic form of such operators exists, and we propose a method for learning them from a set of registered training signals. This learning process is done offline and returns a bimodal analysis operator that is universally applicable to natural scenes. We use this to exploit the bimodal co-sparse analysis model as a prior for solving inverse problems, which leads to an efficient algorithm for depth map super-resolution.
Similar papers:
  • Live Metric 3D Reconstruction on Mobile Phones [pdf] - Petri Tanskanen, Kalin Kolev, Lorenz Meier, Federico Camposeco, Olivier Saurer, Marc Pollefeys
  • Depth from Combining Defocus and Correspondence Using Light-Field Cameras [pdf] - Michael W. Tao, Sunil Hadap, Jitendra Malik, Ravi Ramamoorthi
  • First-Photon Imaging: Scene Depth and Reflectance Acquisition from One Detected Photon per Pixel [pdf] - Ahmed Kirmani, Dongeek Shin, Dheera Venkatraman, Franco N. C. Wong, Vivek K Goyal
  • Semi-dense Visual Odometry for a Monocular Camera [pdf] - Jakob Engel, Jurgen Sturm, Daniel Cremers
  • Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation [pdf] - David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof
3D Scene Understanding by Voxel-CRF [pdf]
Byung-Soo Kim, Pushmeet Kohli, Silvio Savarese

Abstract: Scene understanding is an important yet very challenging problem in computer vision. In the past few years, researchers have taken advantage of the recent diffusion of depth-RGB (RGB-D) cameras to help simplify the problem of inferring scene semantics. However, while the added 3D geometry is certainly useful to segment out objects with different depth values, it also adds complications in that the 3D geometry is often incorrect because of noisy depth measurements, and the actual 3D extent of the objects is usually unknown because of occlusions. In this paper we propose a new method that allows us to jointly refine the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the 3D reconstruction. This is achieved by introducing a new model which we call Voxel-CRF. The Voxel-CRF model is based on the idea of constructing a conditional random field over a 3D volume of interest which captures the semantic and 3D geometric relationships among different elements (voxels) of the scene. Such a model allows us to jointly estimate (1) a dense voxel-based 3D reconstruction and (2) the semantic labels associated with each voxel, even in the presence of partial occlusions, using an approximate yet efficient inference strategy. We evaluated our method on the challenging NYU Depth dataset (Version 1 and 2). Experimental results show that our method achieves competitive accuracy in inferring scene semantics and visually appealing
Similar papers:
  • Holistic Scene Understanding for 3D Object Detection with RGBD Cameras [pdf] - Dahua Lin, Sanja Fidler, Raquel Urtasun
  • Support Surface Prediction in Indoor Scenes [pdf] - Ruiqi Guo, Derek Hoiem
  • Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors [pdf] - Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun
  • STAR3D: Simultaneous Tracking and Reconstruction of 3D Objects Using RGB-D Data [pdf] - Carl Yuheng Ren, Victor Prisacariu, David Murray, Ian Reid
  • Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences [pdf] - Frank Steinbrucker, Christian Kerl, Daniel Cremers
Curvature-Aware Regularization on Riemannian Submanifolds [pdf]
Kwang In Kim, James Tompkin, Christian Theobalt

Abstract: One fundamental assumption in object recognition as well as in other computer vision and pattern recognition problems is that the data generation process lies on a manifold and that it respects the intrinsic geometry of the manifold. This assumption is held in several successful algorithms for diffusion and regularization, in particular, in graph-Laplacian-based algorithms. We claim that the performance of existing algorithms can be improved if we additionally account for how the manifold is embedded within the ambient space, i.e., if we consider the extrinsic geometry of the manifold. We present a procedure for characterizing the extrinsic (as well as intrinsic) curvature of a manifold M which is described by a sampled point cloud in a high-dimensional Euclidean space. Once estimated, we use this characterization in general diffusion and regularization on M, and form a new regularizer on a point cloud. The resulting re-weighted graph Laplacian demonstrates superior performance over the classical graph Laplacian in semi-supervised learning and spectral clustering.
Similar papers:
  • Total Variation Regularization for Functions with Values in a Manifold [pdf] - Jan Lellmann, Evgeny Strekalovskiy, Sabrina Koetter, Daniel Cremers
  • Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects [pdf] - Michael Weinmann, Aljosa Osep, Roland Ruiters, Reinhard Klein
  • Partial Enumeration and Curvature Regularization [pdf] - Carl Olsson, Johannes Ulen, Yuri Boykov, Vladimir Kolmogorov
  • Shortest Paths with Curvature and Torsion [pdf] - Petter Strandmark, Johannes Ulen, Fredrik Kahl, Leo Grady
  • On the Mean Curvature Flow on Graphs with Applications in Image and Manifold Processing [pdf] - Abdallah El_Chakik, Abderrahim Elmoataz, Ahcene Sadi
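The re-weighted Laplacian in the entry above modifies the classical graph Laplacian built from a sampled point cloud. As background only (this is the standard construction, not the authors' curvature-aware re-weighting; the kernel bandwidth and neighbor count are illustrative), a minimal sketch:

```python
import numpy as np

def graph_laplacian(X, k=3, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W for a point cloud X (n x d),
    with Gaussian edge weights over a symmetrized k-nearest-neighbor graph."""
    n = X.shape[0]
    # pairwise squared Euclidean distances, shape (n, n)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        # connect each point to its k nearest neighbors (index 0 is the point itself)
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)        # symmetrize the adjacency
    L = np.diag(W.sum(1)) - W     # degree matrix minus adjacency
    return L

X = np.random.default_rng(0).normal(size=(20, 3))
L = graph_laplacian(X)
assert np.allclose(L.sum(1), 0)   # every row of a graph Laplacian sums to zero
```

The curvature-aware method replaces the uniform edge weights with weights informed by the estimated extrinsic curvature; the graph construction itself is unchanged.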
Dynamic Scene Deblurring [pdf]
Tae Hyun Kim, Byeongjoo Ahn, Kyoung Mu Lee

Abstract: Most conventional single image deblurring methods assume that the underlying scene is static and the blur is caused only by camera shake. In this paper, in contrast to this restrictive assumption, we address the deblurring problem of general dynamic scenes which contain multiple moving objects as well as camera shake. In the case of dynamic scenes, moving objects and the background have different blur motions, so segmentation of the motion blur is required for deblurring each distinct blur motion accurately. Thus, we propose a novel energy model designed as the weighted sum of multiple blur data models, which estimates the different motion blurs, their associated pixel-wise weights, and the resulting sharp image. In this framework, the local weights are determined adaptively and take high values when the corresponding data models have high data fidelity. The weight information is then used for segmenting the motion blur. Non-local regularization of the weights is also incorporated to produce more reliable segmentation results. A convex optimization-based method is used to solve the proposed energy model. Experimental results demonstrate that our method outperforms conventional approaches in deblurring both dynamic and static scenes.
Similar papers:
  • A Unified Rolling Shutter and Motion Blur Model for 3D Visual Registration [pdf] - Maxime Meilland, Tom Drummond, Andrew I. Comport
  • Nonparametric Blind Super-resolution [pdf] - Tomer Michaeli, Michal Irani
  • Accurate Blur Models vs. Image Priors in Single Image Super-resolution [pdf] - Netalee Efrat, Daniel Glasner, Alexander Apartsin, Boaz Nadler, Anat Levin
  • Forward Motion Deblurring [pdf] - Shicheng Zheng, Li Xu, Jiaya Jia
  • Deblurring by Example Using Dense Correspondence [pdf] - Yoav Hacohen, Eli Shechtman, Dani Lischinski
Fingerspelling Recognition with Semi-Markov Conditional Random Fields [pdf]
Taehwan Kim, Greg Shakhnarovich, Karen Livescu

Abstract: Recognition of gesture sequences is in general a very difficult problem, but in certain domains the difficulty may be mitigated by exploiting the domain's grammar. One such grammatically constrained gesture sequence domain is sign language. In this paper we investigate the case of fingerspelling recognition, which can be very challenging due to the quick, small motions of the fingers. Most prior work on this task has assumed a closed vocabulary of fingerspelled words; here we study the more natural open-vocabulary case, where the only domain knowledge is the possible fingerspelled letters and statistics of their sequences. We develop a semi-Markov conditional model approach, where feature functions are defined over segments of video and their corresponding letter labels. We use classifiers of letters and linguistic handshape features, along with expected motion profiles, to define segmental feature functions. This approach improves the letter error rate (Levenshtein distance between hypothesized and correct letter sequences) from 16.3% using a hidden Markov model baseline to 11.6% using the proposed semi-Markov model.
Similar papers:
  • PhotoOCR: Reading Text in Uncontrolled Conditions [pdf] - Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven
  • Translating Video Content to Natural Language Descriptions [pdf] - Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, Bernt Schiele
  • Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data [pdf] - Srinath Sridhar, Antti Oulasvirta, Christian Theobalt
  • Efficient Hand Pose Estimation from a Single Depth Image [pdf] - Chi Xu, Li Cheng
  • Multi-scale Topological Features for Hand Posture Representation and Analysis [pdf] - Kaoning Hu, Lijun Yin
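The letter error rate reported above is the Levenshtein (edit) distance between hypothesized and reference letter sequences, normalized by reference length. A minimal sketch of that evaluation metric (not of the semi-Markov model itself):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b,
    computed row by row with O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = cur
    return prev[-1]

def letter_error_rate(hyp, ref):
    """Edit distance normalized by the reference length."""
    return levenshtein(hyp, ref) / len(ref)

print(letter_error_rate("KITTEN", "SITTING"))  # 3 edits over 7 reference letters
```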
Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras [pdf]
Jae-Hak Kim, Yuchao Dai, Hongdong Li, Xin Du, Jonghyuk Kim

Abstract: We present a new multi-view 3D Euclidean reconstruction method for arbitrary uncalibrated radially-symmetric cameras, which needs no calibration or any camera model parameters other than radial symmetry. It is built on the radial 1D camera model [25], a unified mathematical abstraction of different types of radially-symmetric cameras. We formulate the problem of multi-view reconstruction for radial 1D cameras as a matrix rank minimization problem. An efficient implementation based on alternating direction continuation is proposed to handle the scalability issue for real-world applications. Our method applies to a wide range of omnidirectional cameras including both dioptric and catadioptric (central and non-central) cameras. Additionally, our method deals with complete and incomplete measurements elegantly under a unified framework. Experiments on both synthetic and real images from various types of cameras validate the superior performance of our new method in terms of numerical accuracy and robustness.
Similar papers:
  • Refractive Structure-from-Motion on Underwater Images [pdf] - Anne Jordt-Sedlazeck, Reinhard Koch
  • Unsupervised Intrinsic Calibration from a Single Frame Using a "Plumb-Line" Approach [pdf] - R. Melo, M. Antunes, J.P. Barreto, G. Falcao, N. Goncalves
  • An Enhanced Structure-from-Motion Paradigm Based on the Absolute Dual Quadric and Images of Circular Points [pdf] - Lilian Calvet, Pierre Gurdjos
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length [pdf] - Zuzana Kukelova, Martin Bujnak, Tomas Pajdla
Optical Flow via Locally Adaptive Fusion of Complementary Data Costs [pdf]
Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee

Abstract: Many state-of-the-art optical flow estimation algorithms optimize the data and regularization terms to solve ill-posed problems. In this paper, in contrast to the conventional optical flow framework that uses a single or fixed data model, we study a novel framework that employs a locally varying data term adaptively combining multiple different types of data models. The locally adaptive data term greatly reduces matching ambiguity due to the complementary nature of the multiple data models. The optimal number of complementary data models is learnt by minimizing the redundancy among them under the minimum description length (MDL) constraint. From these chosen data models, a new optical flow estimation energy model is designed as the weighted sum of the multiple data models, and a convex optimization-based, highly effective and practical solution that finds the optical flow as well as the weights is proposed. Comparative experimental results on the Middlebury optical flow benchmark show that the proposed method using the complementary data models outperforms the state-of-the-art methods.
Similar papers:
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation [pdf] - Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu
  • A General Dense Image Matching Framework Combining Direct and Feature-Based Costs [pdf] - Jim Braux-Zin, Romain Dupont, Adrien Bartoli
  • DeepFlow: Large Displacement Optical Flow with Deep Matching [pdf] - Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
First-Photon Imaging: Scene Depth and Reflectance Acquisition from One Detected Photon per Pixel [pdf]
Ahmed Kirmani, Dongeek Shin, Dheera Venkatraman, Franco N. C. Wong, Vivek K Goyal

Abstract: Capturing depth and reflectance images using active illumination despite the detection of little light backscattered from the scene has wide-ranging applications in computer vision. Conventionally, even with single-photon detectors, a large number of detected photons is needed at each pixel location to mitigate Poisson noise. Here, using only the first detected photon at each pixel location, we capture both the 3D structure and reflectivity of the scene, demonstrating greater photon efficiency than previous work. Our computational imager combines physically accurate photon-counting statistics with exploitation of spatial correlations present in real-world scenes. We experimentally achieve millimeter-accurate, sub-pulse-width depth resolution and 4-bit reflectivity contrast, simultaneously, using only the first photon detection per pixel, even in the presence of high background noise. Our technique enables rapid, low-power, and noise-tolerant active optical imaging.
Similar papers:
  • Semi-dense Visual Odometry for a Monocular Camera [pdf] - Jakob Engel, Jurgen Sturm, Daniel Cremers
  • Towards Motion Aware Light Field Video for Dynamic Scenes [pdf] - Salil Tambe, Ashok Veeraraghavan, Amit Agrawal
  • Structured Light in Sunlight [pdf] - Mohit Gupta, Qi Yin, Shree K. Nayar
  • A Joint Intensity and Depth Co-sparse Analysis Model for Depth Map Super-resolution [pdf] - Martin Kiechle, Simon Hawe, Martin Kleinsteuber
  • Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation [pdf] - David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof
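The full method above fuses photon-counting statistics with spatial priors; as a hedged sketch of only the underlying time-of-flight relation it starts from, each pixel's first-photon arrival time maps to a raw depth estimate via the round-trip distance (the function name and 6.67 ns example are illustrative, not from the paper):

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def depth_from_arrival_time(t_seconds):
    """Round-trip time of flight: the pulse travels to the scene and back,
    so depth z = c * t / 2."""
    return C * t_seconds / 2.0

# a round-trip delay of about 6.67 ns corresponds to roughly 1 m of depth
print(depth_from_arrival_time(6.67e-9))
```

Poisson noise and background photons corrupt these raw per-pixel estimates, which is why the paper's spatial-correlation priors are needed on top of this relation.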
Street View Motion-from-Structure-from-Motion [pdf]
Bryan Klingner, David Martin, James Roseborough

Abstract: We describe a structure-from-motion framework that handles generalized cameras, such as moving rolling-shutter cameras, and works at an unprecedented scale (billions of images covering millions of linear kilometers of roads) by exploiting a good relative pose prior along vehicle paths. We exhibit a planet-scale, appearance-augmented point cloud constructed with our framework and demonstrate its practical use in correcting the pose of a street-level image collection.
Similar papers:
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • Refractive Structure-from-Motion on Underwater Images [pdf] - Anne Jordt-Sedlazeck, Reinhard Koch
  • SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels [pdf] - Jianxiong Xiao, Andrew Owens, Antonio Torralba
  • A Unified Rolling Shutter and Motion Blur Model for 3D Visual Registration [pdf] - Maxime Meilland, Tom Drummond, Andrew I. Comport
  • Rolling Shutter Stereo [pdf] - Olivier Saurer, Kevin Koser, Jean-Yves Bouguet, Marc Pollefeys
Direct Optimization of Frame-to-Frame Rotation [pdf]
Laurent Kneip, Simon Lynen

Abstract: This work makes use of a novel, recently proposed epipolar constraint for computing the relative pose between two calibrated images. By enforcing the coplanarity of epipolar plane normal vectors, it constrains the three degrees of freedom of the relative rotation between two camera views directly, independently of the translation. The present paper shows how the approach can be extended to n points and translated into an efficient eigenvalue minimization over the three rotational degrees of freedom. Each iteration in the non-linear optimization has constant execution time, independently of the number of features. Two global optimization approaches are proposed. The first consists of an efficient Levenberg-Marquardt scheme with randomized initial value, which already leads to stable and accurate results. The second consists of a globally optimal branch-and-bound algorithm based on a bound on the eigenvalue variation derived from symmetric eigenvalue-perturbation theory. Analysis of the cost function reveals insights into the nature of a specific relative pose problem and outlines the complexity under different conditions. The algorithm shows state-of-the-art performance w.r.t. essential-matrix based solutions, and a frame-to-frame application to a video sequence immediately leads to an alternative, real-time visual odometry solution. Note: All algorithms in this paper are made available in the OpenGV library. Please visit http://laurentkneip.
Similar papers:
  • Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length [pdf] - Zuzana Kukelova, Martin Bujnak, Tomas Pajdla
  • Go-ICP: Solving 3D Registration Efficiently and Globally Optimally [pdf] - Jiaolong Yang, Hongdong Li, Yunde Jia
  • Efficient and Robust Large-Scale Rotation Averaging [pdf] - Avishek Chatterjee, Venu Madhav Govindu
  • Revisiting the PnP Problem: A Fast, General and Optimal Solution [pdf] - Yinqiang Zheng, Yubin Kuang, Shigeki Sugimoto, Kalle Astrom, Masatoshi Okutomi
  • Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion [pdf] - Pierre Moulon, Pascal Monasse, Renaud Marlet
Shufflets: Shared Mid-level Parts for Fast Object Detection [pdf]
Iasonas Kokkinos

Abstract: We present a method to identify and exploit structures that are shared across different object categories, by using sparse coding to learn a shared basis for the part and root templates of Deformable Part Models (DPMs). Our first contribution consists in using Shift-Invariant Sparse Coding (SISC) to learn mid-level elements that can translate during coding. This results in systematically better approximations than those attained using standard sparse coding. To emphasize that the learned mid-level structures are shiftable, we call them shufflets. Our second contribution consists in using the resulting score to construct probabilistic upper bounds on the exact template scores, instead of taking them at face value as is common in current works. We integrate shufflets in Dual-Tree Branch-and-Bound and cascade-DPMs and demonstrate that we can achieve a substantial acceleration, with practically no loss in performance.
Similar papers:
  • Low-Rank Sparse Coding for Image Classification [pdf] - Tianzhu Zhang, Bernard Ghanem, Si Liu, Changsheng Xu, Narendra Ahuja
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
  • Go-ICP: Solving 3D Registration Efficiently and Globally Optimally [pdf] - Jiaolong Yang, Hongdong Li, Yunde Jia
  • Abnormal Event Detection at 150 FPS in MATLAB [pdf] - Cewu Lu, Jianping Shi, Jiaya Jia
  • Higher Order Matching for Consistent Multiple Target Tracking [pdf] - Chetan Arora, Amir Globerson
A New Image Quality Metric for Image Auto-denoising [pdf]
Xiangfei Kong, Kuan Li, Qingxiong Yang, Liu Wenyin, Ming-Hsuan Yang

Abstract: This paper proposes a new non-reference image quality metric that can be adopted by state-of-the-art image/video denoising algorithms for auto-denoising. The proposed metric is extremely simple and can be implemented in four lines of Matlab code. The basic assumption employed by the proposed metric is that the noise should be independent of the original image. A direct measurement of this dependence is, however, impractical due to the relatively low accuracy of existing denoising methods. The proposed metric thus aims at maximizing the structure similarity between the input noisy image and the estimated image noise around homogeneous regions, and the structure similarity between the input noisy image and the denoised image around highly-structured regions; it is computed as the linear correlation coefficient of the two corresponding structure similarity maps. Numerous experimental results demonstrate that the proposed metric not only outperforms the current state-of-the-art non-reference quality metric quantitatively and qualitatively, but also better maintains temporal coherence when used for video denoising.
Similar papers:
  • A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models [pdf] - Peihua Li, Qilong Wang, Lei Zhang
  • Similarity Metric Learning for Face Recognition [pdf] - Qiong Cao, Yiming Ying, Peng Li
  • From Point to Set: Extend the Learning of Distance Metrics [pdf] - Pengfei Zhu, Lei Zhang, Wangmeng Zuo, David Zhang
  • Single-Patch Low-Rank Prior for Non-pointwise Impulse Noise Removal [pdf] - Ruixuan Wang, Emanuele Trucco
  • Joint Noise Level Estimation from Personal Photo Collections [pdf] - Yichang Shih, Vivek Kwatra, Troy Chinen, Hui Fang, Sergey Ioffe
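The final score in the metric above is the linear (Pearson) correlation coefficient of two structure-similarity maps. A minimal sketch of just that last step (the map names are illustrative; computing the similarity maps themselves is the paper's contribution):

```python
import numpy as np

def linear_correlation(map_a, map_b):
    """Pearson correlation coefficient between two same-shape 2D maps,
    treated as flat vectors: centered dot product over the product of norms."""
    a = np.ravel(map_a) - np.mean(map_a)
    b = np.ravel(map_b) - np.mean(map_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
m1 = rng.random((8, 8))
assert abs(linear_correlation(m1, m1) - 1.0) < 1e-12   # identical maps: +1
assert abs(linear_correlation(m1, -m1) + 1.0) < 1e-12  # negated map: -1
```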
Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers [pdf]
Martin Kostinger, Paul Wohlhart, Peter M. Roth, Horst Bischof

Abstract: In this paper, we raise important issues concerning the evaluation complexity of existing Mahalanobis metric learning methods. The complexity scales linearly with the size of the dataset, which is especially cumbersome at large scale or for real-time applications with a limited time budget. To alleviate this problem we propose to represent the dataset by a fixed number of discriminative prototypes. In particular, we introduce a new method that jointly chooses the positioning of the prototypes and optimizes the Mahalanobis distance metric with respect to them. We show that choosing the positioning of the prototypes and learning the metric in parallel leads to a drastically reduced evaluation effort while maintaining the discriminative essence of the original dataset. Moreover, for most problems our method, performing k-nearest prototype (k-NP) classification on the condensed dataset, leads to even better generalization compared to k-NN classification using all data. Results on a variety of challenging benchmarks demonstrate the power of our method. These include standard machine learning datasets as well as the challenging Public Figures Face Database. On the competitive machine learning benchmarks we are comparable to the state-of-the-art while being more efficient. On the face benchmark we clearly outperform the state-of-the-art in Mahalanobis metric learning with drastically reduced evaluation effort.
Similar papers:
  • A Max-Margin Perspective on Sparse Representation-Based Classification [pdf] - Zhaowen Wang, Jianchao Yang, Nasser Nasrabadi, Thomas Huang
  • Quadruplet-Wise Image Similarity Learning [pdf] - Marc T. Law, Nicolas Thome, Matthieu Cord
  • Similarity Metric Learning for Face Recognition [pdf] - Qiong Cao, Yiming Ying, Peng Li
  • Ensemble Projection for Semi-supervised Image Classification [pdf] - Dengxin Dai, Luc Van_Gool
  • From Point to Set: Extend the Learning of Distance Metrics [pdf] - Pengfei Zhu, Lei Zhang, Wangmeng Zuo, David Zhang
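A sketch of the k-nearest-prototype (k-NP) classification step described above, assuming the prototypes and the Mahalanobis matrix M have already been learned (both are placeholders here; the paper learns them jointly). The point is that queries compare against a few prototypes rather than the whole training set:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x-y)^T M (x-y) for a PSD matrix M."""
    d = x - y
    return float(d @ M @ d)

def knp_classify(x, prototypes, labels, M, k=1):
    """Label a query by majority vote among its k nearest prototypes."""
    dists = [mahalanobis_sq(x, p, M) for p in prototypes]
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

M = np.eye(2)                                # identity metric = plain Euclidean
protos = np.array([[0.0, 0.0], [5.0, 5.0]])  # one illustrative prototype per class
labels = ["a", "b"]
print(knp_classify(np.array([0.5, -0.2]), protos, labels, M))  # near class "a"
```

Evaluation cost is O(number of prototypes) per query instead of O(dataset size), which is the paper's motivation.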
Attribute Adaptation for Personalized Image Search [pdf]
Adriana Kovashka, Kristen Grauman

Abstract: Current methods learn monolithic attribute predictors, with the assumption that a single model is sufficient to reflect human understanding of a visual attribute. However, in reality, humans vary in how they perceive the association between a named property and image content. For example, two people may have slightly different internal models for what makes a shoe look formal, or they may disagree on which of two scenes looks more cluttered. Rather than discount these differences as noise, we propose to learn user-specific attribute models. We adapt a generic model trained with annotations from multiple users, tailoring it to satisfy user-specific labels. Furthermore, we propose novel techniques to infer user-specific labels based on transitivity and contradictions in the user's search history. We demonstrate that adapted attributes improve accuracy over both existing monolithic models as well as models that learn from scratch with user-specific data alone. In addition, we show how adapted attributes are useful to personalize image search, whether with binary or relative attributes.
Similar papers:
  • A Deep Sum-Product Architecture for Robust Facial Attributes Analysis [pdf] - Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Attribute Pivots for Guiding Relevance Feedback in Image Search [pdf] - Adriana Kovashka, Kristen Grauman
  • A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects [pdf] - Xiaoyang Wang, Qiang Ji
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
Attribute Pivots for Guiding Relevance Feedback in Image Search [pdf]
Adriana Kovashka, Kristen Grauman

Abstract: In interactive image search, a user iteratively refines his results by giving feedback on exemplar images. Active selection methods aim to elicit useful feedback, but traditional approaches suffer from expensive selection criteria and cannot predict informativeness reliably due to the imprecision of relevance feedback. To address these drawbacks, we propose to actively select pivot exemplars for which feedback in the form of a visual comparison will most reduce the system's uncertainty. For example, the system might ask, "Is your target image more or less crowded than this image?" Our approach relies on a series of binary search trees in relative attribute space, together with a selection function that predicts the information gain were the user to compare his envisioned target to the next node deeper in a given attribute's tree. It makes interactive search more efficient than existing strategies, both in terms of the system's selection time and the user's feedback effort.
Similar papers:
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects [pdf] - Xiaoyang Wang, Qiang Ji
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
  • Attribute Adaptation for Personalized Image Search [pdf] - Adriana Kovashka, Kristen Grauman
  • Implied Feedback: Learning Nuances of User Behavior in Image Search [pdf] - Devi Parikh, Kristen Grauman
Pose Estimation with Unknown Focal Length Using Points, Directions and Lines [pdf]
Yubin Kuang, Kalle Astrom

Abstract: In this paper, we study the geometry problems of estimating camera pose with unknown focal length using combinations of geometric primitives. We consider points, lines, and also rich features such as quivers, i.e. points with one or more directions. We formulate the problems as polynomial systems where the constraints for different primitives are handled in a unified way. We develop efficient polynomial solvers for each of the derived cases with different combinations of primitives. The availability of these solvers enables robust pose estimation with unknown focal length for wider classes of features. Such rich features allow for fewer feature correspondences and generate larger inlier sets with higher probability. We demonstrate in synthetic experiments that our solvers are fast and numerically stable. For real images, we show that our solvers can be used in RANSAC loops to provide good initial solutions.
Similar papers:
  • Lifting 3D Manhattan Lines from a Single Image [pdf] - Srikumar Ramalingam, Matthew Brand
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • Revisiting the PnP Problem: A Fast, General and Optimal Solution [pdf] - Yinqiang Zheng, Yubin Kuang, Shigeki Sugimoto, Kalle Astrom, Masatoshi Okutomi
  • Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length [pdf] - Zuzana Kukelova, Martin Bujnak, Tomas Pajdla
  • A Robust Analytical Solution to Isometric Shape-from-Template with Focal Length Calibration [pdf] - Adrien Bartoli, Daniel Pizarro, Toby Collins
Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length [pdf]
Zuzana Kukelova, Martin Bujnak, Tomas Pajdla

Abstract: The problem of determining the absolute position and orientation of a camera from a set of 2D-to-3D point correspondences is one of the most important problems in computer vision, with a broad range of applications. In this paper we present a new solution to the absolute pose problem for a camera with unknown radial distortion and unknown focal length from five 2D-to-3D point correspondences. Our new solver is numerically more stable, more accurate, and significantly faster than the existing state-of-the-art minimal four-point absolute pose solvers for this problem. Moreover, our solver yields fewer solutions and can handle larger radial distortions. The new solver is straightforward and uses only simple concepts from linear algebra; it is therefore simpler than the state-of-the-art Gröbner basis solvers. We compare our new solver with the existing state-of-the-art solvers and show its usefulness on synthetic and real datasets.
Similar papers:
  • Direct Optimization of Frame-to-Frame Rotation [pdf] - Laurent Kneip, Simon Lynen
  • Revisiting the PnP Problem: A Fast, General and Optimal Solution [pdf] - Yinqiang Zheng, Yubin Kuang, Shigeki Sugimoto, Kalle Astrom, Masatoshi Okutomi
  • A Robust Analytical Solution to Isometric Shape-from-Template with Focal Length Calibration [pdf] - Adrien Bartoli, Daniel Pizarro, Toby Collins
  • Multi-view 3D Reconstruction from Uncalibrated Radially-Symmetric Cameras [pdf] - Jae-Hak Kim, Yuchao Dai, Hongdong Li, Xin Du, Jonghyuk Kim
  • Pose Estimation with Unknown Focal Length Using Points, Directions and Lines [pdf] - Yubin Kuang, Kalle Astrom
Discriminative Label Propagation for Multi-object Tracking with Sporadic Appearance Features [pdf]
K.C. Amit Kumar, Christophe De_Vleeschouwer

Abstract: Given a set of plausible detections, detected at each time instant independently, we investigate how to associate them across time. This is done by propagating labels on a set of graphs that capture how the spatio-temporal and the appearance cues promote the assignment of identical or distinct labels to a pair of nodes. The graph construction is driven by the locally linear embedding (LLE) of either the spatio-temporal or the appearance features associated with the detections. Interestingly, the neighborhood of a node in each appearance graph is defined to include all nodes for which the appearance feature is available (except the ones that coexist at the same time). This allows us to connect nodes that share the same appearance even if they are temporally distant, which gives our framework the uncommon ability to exploit appearance features that are available only sporadically along the sequence of detections. Once the graphs have been defined, multi-object tracking is formulated as the problem of finding a label assignment that is consistent with the constraints captured by each of the graphs. This results in a difference-of-convex program that can be solved efficiently. Experiments are performed on a basketball dataset and several well-known pedestrian datasets in order to validate the effectiveness of the proposed solution.
Similar papers:
  • Conservation Tracking [pdf] - Martin Schiegg, Philipp Hanslovsky, Bernhard X. Kausler, Lars Hufnagel, Fred A. Hamprecht
  • Combining the Right Features for Complex Event Recognition [pdf] - Kevin Tang, Bangpeng Yao, Li Fei-Fei, Daphne Koller
  • The Way They Move: Tracking Multiple Targets with Similar Appearance [pdf] - Caglayan Dicle, Octavia I. Camps, Mario Sznaier
  • GrabCut in One Cut [pdf] - Meng Tang, Lena Gorelick, Olga Veksler, Yuri Boykov
  • Learning Graphs to Match [pdf] - Minsu Cho, Karteek Alahari, Jean Ponce
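The LLE step above expresses each detection's feature vector as an affine combination of its neighbors' features; those reconstruction weights then drive the graph's edge affinities. A minimal numpy sketch of the standard LLE weight computation (an illustrative sketch, not the paper's implementation; the `reg` stabilizer is an assumed default):

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Locally linear embedding reconstruction weights: the affine
    combination of `neighbors` (rows) that best reconstructs `x`."""
    Z = neighbors - x                          # center neighbors at x
    C = Z @ Z.T                                # local Gram matrix
    C += reg * np.trace(C) * np.eye(len(C))    # regularize for stability
    w = np.linalg.solve(C, np.ones(len(C)))    # solve C w = 1
    return w / w.sum()                         # enforce sum-to-one

x = np.array([0.5, 0.5])
nbrs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = lle_weights(x, nbrs)
# by symmetry each neighbor gets weight ~0.25, and w @ nbrs reconstructs x
```

The sum-to-one constraint makes the weights invariant to translations of the feature space, which is what lets the same weights transfer to the embedding.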
Camera Alignment Using Trajectory Intersections in Unsynchronized Videos [pdf]
Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath

Abstract: This paper addresses the novel and challenging problem of aligning camera views that are unsynchronized by low and/or variable frame rates using object trajectories. Unlike existing trajectory-based alignment methods, our method does not require frame-to-frame synchronization. Instead, we propose using the intersections of corresponding object trajectories to match views. To find these intersections, we introduce a novel trajectory matching algorithm based on matching Spatio-Temporal Context Graphs (STCGs). These graphs represent the distances between trajectories in time and space within a view, and are matched to an STCG from another view to find the corresponding trajectories. To the best of our knowledge, this is one of the first attempts to align views that are unsynchronized with variable frame rates. The results on simulated and real-world datasets show trajectory intersections are a viable feature for camera alignment, and that the trajectory matching method performs well in real-world scenarios.
Similar papers:
  • Inferring "Dark Matter" and "Dark Energy" from Videos [pdf] - Dan Xie, Sinisa Todorovic, Song-Chun Zhu
  • Joint Subspace Stabilization for Stereoscopic Video [pdf] - Feng Liu, Yuzhen Niu, Hailin Jin
  • Online Motion Segmentation Using Dynamic Label Propagation [pdf] - Ali Elqursh, Ahmed Elgammal
  • Robust Trajectory Clustering for Motion Segmentation [pdf] - Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
  • Video Co-segmentation for Meaningful Action Extraction [pdf] - Jiaming Guo, Zhuwen Li, Loong-Fah Cheong, Steven Zhiying Zhou
From Subcategories to Visual Composites: A Multi-level Framework for Object Detection [pdf]
Tian Lan, Michalis Raptis, Leonid Sigal, Greg Mori

Abstract: The appearance of an object changes profoundly with pose, camera view and interactions of the object with other objects in the scene. This makes it challenging to learn detectors based on an object-level label (e.g., car). We postulate that having a richer set of labelings (at different levels of granularity) for an object, including finer-grained subcategories consistent in appearance and view, and higher-order composites (contextual groupings of objects consistent in their spatial layout and appearance), can significantly alleviate these problems. However, obtaining such a rich set of annotations, including annotation of an exponentially growing set of object groupings, is simply not feasible. We propose a weakly-supervised framework for object detection where we discover the subcategories and the composites automatically with only traditional object-level category labels as input. To this end, we first propose an exemplar-SVM-based clustering approach, with latent SVM refinement, that discovers a variable-length set of discriminative subcategories for each object class. We then develop a structured model for object detection that captures interactions among object subcategories and automatically discovers semantically meaningful and discriminatively relevant visual composites. We show that this model produces state-of-the-art performance on the UIUC phrase object detection benchmark.
Similar papers:
  • Holistic Scene Understanding for 3D Object Detection with RGBD Cameras [pdf] - Dahua Lin, Sanja Fidler, Raquel Urtasun
  • NEIL: Extracting Visual Knowledge from Web Data [pdf] - Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta
  • Mining Motion Atoms and Phrases for Complex Action Recognition [pdf] - Limin Wang, Yu Qiao, Xiaoou Tang
  • Hierarchical Part Matching for Fine-Grained Visual Categorization [pdf] - Lingxi Xie, Qi Tian, Richang Hong, Shuicheng Yan, Bo Zhang
  • Segmentation Driven Object Detection with Fisher Vectors [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
Quadruplet-Wise Image Similarity Learning [pdf]
Marc T. Law, Nicolas Thome, Matthieu Cord

Abstract: This paper introduces a novel similarity learning framework. Working with inequality constraints involving quadruplets of images, our approach aims at efficiently modeling similarity from rich or complex semantic label relationships. From these quadruplet-wise constraints, we propose a similarity learning framework relying on a convex optimization scheme. We then study how our metric learning scheme can exploit specific class relationships, such as class ranking (relative attributes) and class taxonomy. We show that classification using the learned metrics gets improved performance over state-of-the-art methods on several datasets. We also evaluate our approach in a new application to learn similarities between webpage screenshots in a fully unsupervised way.
Similar papers:
  • Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers [pdf] - Martin Kostinger, Paul Wohlhart, Peter M. Roth, Horst Bischof
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
  • Similarity Metric Learning for Face Recognition [pdf] - Qiong Cao, Yiming Ying, Peng Li
  • From Point to Set: Extend the Learning of Distance Metrics [pdf] - Pengfei Zhu, Lei Zhang, Wangmeng Zuo, David Zhang
  • Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes [pdf] - Sukrit Shankar, Joan Lasenby, Roberto Cipolla
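Each quadruplet constraint above says that one image pair should be closer under the learned metric than another pair: d(i,j) < d(k,l). Training penalizes violations with a hinge term. A hedged numpy sketch of that penalty (the function name and margin value are illustrative, not from the paper):

```python
def quadruplet_hinge(d_ij, d_kl, margin=1.0):
    """Hinge penalty for one quadruplet constraint d(i,j) + margin <= d(k,l):
    pair (i, j) should be strictly more similar than pair (k, l)."""
    return max(0.0, margin + d_ij - d_kl)

# a satisfied constraint contributes zero loss; a violated one is penalized
quadruplet_hinge(0.2, 2.0)   # -> 0.0 (constraint holds with margin to spare)
quadruplet_hinge(1.5, 1.0)   # -> 1.5 (violation of 0.5 plus the margin of 1)
```

Summing this penalty over all quadruplet constraints, with distances parameterized by the metric, yields the convex objective the abstract refers to.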
Detecting Curved Symmetric Parts Using a Deformable Disc Model [pdf]
Tom Sie Ho Lee, Sanja Fidler, Sven Dickinson

Abstract: Symmetry is a powerful shape regularity that has been exploited by perceptual grouping researchers in both human and computer vision to recover part structure from an image without a priori knowledge of scene content. Drawing on the concept of a medial axis, defined as the locus of centers of maximal inscribed discs that sweep out a symmetric part, we model part recovery as the search for a sequence of deformable maximal inscribed disc hypotheses generated from a multiscale superpixel segmentation, a framework proposed by [13]. However, we learn affinities between adjacent superpixels in a space that is invariant to bending and tapering along the symmetry axis, enabling us to capture a wider class of symmetric parts. Moreover, we introduce a global cost that perceptually integrates the hypothesis space by combining a pairwise and a higher-level smoothing term, which we minimize globally using dynamic programming. The new framework is demonstrated on two datasets, and is shown to significantly outperform the baseline [13].
Similar papers:
  • A Method of Perceptual-Based Shape Decomposition [pdf] - Chang Ma, Zhongqian Dong, Tingting Jiang, Yizhou Wang, Wen Gao
  • Temporally Consistent Superpixels [pdf] - Matthias Reso, Jorn Jachalsky, Bodo Rosenhahn, Jorn Ostermann
  • Pose-Configurable Generic Tracking of Elongated Objects [pdf] - Daniel Wesierski, Patrick Horain
  • Building Part-Based Object Detectors via 3D Geometry [pdf] - Abhinav Shrivastava, Abhinav Gupta
  • SYM-FISH: A Symmetry-Aware Flip Invariant Sketch Histogram Shape Descriptor [pdf] - Xiaochun Cao, Hua Zhang, Si Liu, Xiaojie Guo, Liang Lin
Deterministic Fitting of Multiple Structures Using Iterative MaxFS with Inlier Scale Estimation [pdf]
Kwang Hee Lee, Sang Wook Lee

Abstract: (not recoverable from the source PDF; only the header "2013 IEEE International Conference on Computer Vision" survived extraction)
Similar papers:
  • A Global Linear Method for Camera Pose Registration [pdf] - Nianjuan Jiang, Zhaopeng Cui, Ping Tan
  • Latent Task Adaptation with Large-Scale Hierarchies [pdf] - Yangqing Jia, Trevor Darrell
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
  • Towards Understanding Action Recognition [pdf] - Hueihan Jhuang, Juergen Gall, Silvia Zuffi, Cordelia Schmid, Michael J. Black
  • Joint Learning of Discriminative Prototypes and Large Margin Nearest Neighbor Classifiers [pdf] - Martin Kostinger, Paul Wohlhart, Peter M. Roth, Horst Bischof
Minimal Basis Facility Location for Subspace Segmentation [pdf]
Choon-Meng Lee, Loong-Fah Cheong

Abstract: In contrast to the current motion segmentation paradigm that assumes independence between the motion subspaces, we approach the motion segmentation problem by seeking the parsimonious basis set that can represent the data. Our formulation explicitly looks for the overlap between subspaces in order to achieve a minimal basis representation. This parsimonious basis set is important for the performance of our model selection scheme because the sharing of basis results in savings of model complexity cost. We propose the use of an affinity propagation based method to determine the number of motions. The key lies in the incorporation of a global cost model into the factor graph, serving the role of model complexity. The introduction of this global cost model requires additional message updates in the factor graph. We derive an efficient update for the new messages associated with this global cost model. An important step in the use of affinity propagation is the subspace hypotheses generation. We use the row-sparse convex proxy solution as an initialization strategy. We further encourage the selection of subspace hypotheses with shared basis by integrating a discount scheme that lowers the factor graph facility cost based on shared basis. We verified the model selection and classification performance of our proposed method on both the original Hopkins 155 dataset and the more balanced Hopkins 380 dataset.
Similar papers:
  • GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity [pdf] - Jia Xu, Vamsi K. Ithapu, Lopamudra Mukherjee, James M. Rehg, Vikas Singh
  • Latent Space Sparse Subspace Clustering [pdf] - Vishal M. Patel, Hien Van Nguyen, Rene Vidal
  • Distributed Low-Rank Subspace Segmentation [pdf] - Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan
  • Perspective Motion Segmentation via Collaborative Clustering [pdf] - Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou
  • Robust Subspace Clustering via Half-Quadratic Minimization [pdf] - Yingya Zhang, Zhenan Sun, Ran He, Tieniu Tan
Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time [pdf]
Yong Jae Lee, Alexei A. Efros, Martial Hebert

Abstract: We present a weakly-supervised visual data mining approach that discovers connections between recurring mid-level visual elements in historic (temporal) and geographic (spatial) image collections, and attempts to capture the underlying visual style. In contrast to existing discovery methods that mine for patterns that remain visually consistent throughout the dataset, our goal is to discover visual elements whose appearance changes due to change in time or location, i.e., that exhibit consistent stylistic variations across the label space (date or geo-location). To discover these elements, we first identify groups of patches that are style-sensitive. We then incrementally build correspondences to find the same element across the entire dataset. Finally, we train style-aware regressors that model each element's range of stylistic differences. We apply our approach to date and geo-location prediction and show substantial improvement over several baselines that do not model visual style. We also demonstrate the method's effectiveness on the related task of fine-grained classification.
Similar papers:
  • Detecting Dynamic Objects with Multi-view Background Subtraction [pdf] - Raul Diaz, Sam Hallman, Charless C. Fowlkes
  • Human Attribute Recognition by Rich Appearance Dictionary [pdf] - Jungseock Joo, Shuo Wang, Song-Chun Zhu
  • Learning Discriminative Part Detectors for Image Classification and Cosegmentation [pdf] - Jian Sun, Jean Ponce
  • Pyramid Coding for Functional Scene Element Recognition in Video Scenes [pdf] - Eran Swears, Anthony Hoogs, Kim Boyer
  • Data-Driven 3D Primitives for Single Image Understanding [pdf] - David F. Fouhey, Abhinav Gupta, Martial Hebert
A Non-parametric Bayesian Network Prior of Human Pose [pdf]
Andreas M. Lehrmann, Peter V. Gehler, Sebastian Nowozin

Abstract: Having a sensible prior of human pose is a vital ingredient for many computer vision applications, including tracking and pose estimation. While the application of global non-parametric approaches and parametric models has led to some success, finding the right balance in terms of flexibility and tractability, as well as estimating model parameters from data, has turned out to be challenging. In this work, we introduce a sparse Bayesian network model of human pose that is non-parametric with respect to the estimation of both its graph structure and its local distributions. We describe an efficient sampling scheme for our model and show its tractability for the computation of exact log-likelihoods. We empirically validate our approach on the Human 3.6M dataset and demonstrate superior performance to global models and parametric networks. We further illustrate our model's ability to represent and compose poses not present in the training set (compositionality) and describe a speed-accuracy trade-off that allows real-time scoring of poses.
Similar papers:
  • Allocentric Pose Estimation [pdf] - M. Jose Antonio, Luc De_Raedt, Tinne Tuytelaars
  • Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data [pdf] - Srinath Sridhar, Antti Oulasvirta, Christian Theobalt
  • Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests [pdf] - Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Monocular Image 3D Human Pose Estimation under Self-Occlusion [pdf] - Ibrahim Radwan, Abhinav Dhall, Roland Goecke
Total Variation Regularization for Functions with Values in a Manifold [pdf]
Jan Lellmann, Evgeny Strekalovskiy, Sabrina Koetter, Daniel Cremers

Abstract: While total variation is among the most popular regularizers for variational problems, its extension to functions with values in a manifold is an open problem. In this paper, we propose the first algorithm to solve such problems which applies to arbitrary Riemannian manifolds. The key idea is to reformulate the variational problem as a multilabel optimization problem with an infinite number of labels. This leads to a hard optimization problem which can be approximately solved using convex relaxation techniques. The framework can be easily adapted to different manifolds including spheres and three-dimensional rotations, and allows us to obtain accurate solutions even with a relatively coarse discretization. With numerous examples we demonstrate that the proposed framework can be applied to variational models that incorporate chromaticity values, normal fields, or camera trajectories.
Similar papers:
  • Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints [pdf] - Masoud S. Nosrati, Shawn Andrews, Ghassan Hamarneh
  • Optical Flow via Locally Adaptive Fusion of Complementary Data Costs [pdf] - Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
  • Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution [pdf] - Mehrtash Harandi, Conrad Sanderson, Chunhua Shen, Brian Lovell
  • Curvature-Aware Regularization on Riemannian Submanifolds [pdf] - Kwang In Kim, James Tompkin, Christian Theobalt
  • Manifold Based Face Synthesis from Sparse Samples [pdf] - Hongteng Xu, Hongyuan Zha
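For intuition, the discrete total variation of a manifold-valued signal replaces Euclidean differences between consecutive samples with geodesic distances. A small numpy sketch for the circle S^1, where the geodesic is the shorter of the two arcs (an illustrative special case, not the paper's convex-relaxation solver):

```python
import numpy as np

def tv_circle(theta):
    """Discrete total variation of a signal with values on the circle S^1:
    the sum of shortest-arc (geodesic) distances between consecutive samples."""
    d = np.abs(np.diff(theta)) % (2 * np.pi)
    return np.sum(np.minimum(d, 2 * np.pi - d))

# a wrap from 6.2 rad to 0.1 rad is a short geodesic step (~0.18 rad),
# not the Euclidean jump of ~6.1 rad
tv_circle(np.array([6.2, 0.1]))
```

The same idea applies to the chromaticity, normal-field, and rotation examples in the abstract, with the circle replaced by the sphere or SO(3) and its geodesic distance.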
Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation [pdf]
Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu

Abstract: Estimating a dense correspondence field between successive video frames, under large displacement, is important in many visual learning and recognition tasks. We propose a novel sparse-to-dense matching method for motion field estimation and occlusion detection. As an alternative to the current coarse-to-fine approaches from the optical flow literature, we start from the higher level of sparse matching with rich appearance and geometric constraints collected over extended neighborhoods, using an occlusion-aware, locally affine model. Then, we move towards the simpler, but denser, classic flow field model, with an interpolation procedure that offers a natural transition between the sparse and the dense correspondence fields. We experimentally demonstrate that our appearance features and our complex geometric constraints permit correct motion estimation even in difficult cases of large displacements and significant appearance changes. We also propose a novel classification method for occlusion detection that works in conjunction with the sparse-to-dense matching model. We validate our approach on the newly released Sintel dataset and obtain state-of-the-art results.
Similar papers:
  • Modeling Occlusion by Discriminative AND-OR Structures [pdf] - Bo Li, Wenze Hu, Tianfu Wu, Song-Chun Zhu
  • A General Dense Image Matching Framework Combining Direct and Feature-Based Costs [pdf] - Jim Braux-Zin, Romain Dupont, Adrien Bartoli
  • Optical Flow via Locally Adaptive Fusion of Complementary Data Costs [pdf] - Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
  • DeepFlow: Large Displacement Optical Flow with Deep Matching [pdf] - Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling [pdf]
Evgeny Levinkov, Mario Fritz

Abstract: Semantic road labeling is a key component of systems that aim at assisted or even autonomous driving. Considering that such systems continuously operate in the real world, unforeseen conditions not represented in any conceivable training procedure are likely to occur on a regular basis. In order to equip systems with the ability to cope with such situations, we would like to enable adaptation to such new situations and conditions at runtime. Existing adaptive methods for image labeling either require labeled data from the new condition or even operate globally on a complete test set. Neither is a desirable mode of operation for a system as described above, where new images arrive sequentially and conditions may vary. We study the effect of changing test conditions on scene labeling methods based on a new diverse street scene dataset. We propose a novel approach that can operate in such conditions and is based on a sequential Bayesian model update in order to robustly integrate the arriving images into the adaptation procedure.
Similar papers:
  • A Convex Optimization Framework for Active Learning [pdf] - Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sasrty
  • A Practical Transfer Learning Algorithm for Face Verification [pdf] - Xudong Cao, David Wipf, Fang Wen, Genquan Duan, Jian Sun
  • Randomized Ensemble Tracking [pdf] - Qinxun Bai, Zheng Wu, Stan Sclaroff, Margrit Betke, Camille Monnier
  • Efficient 3D Scene Labeling Using Fields of Trees [pdf] - Olaf Kahler, Ian Reid
  • Robust Object Tracking with Online Multi-lifespan Dictionary Learning [pdf] - Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models [pdf]
Peihua Li, Qilong Wang, Lei Zhang

Abstract: The similarity or distance measure between Gaussian mixture models (GMMs) plays a crucial role in content-based image matching. Though the Earth Mover's Distance (EMD) has shown its advantages in matching histogram features, its potential in matching GMMs remains unclear and not fully explored. To address this problem, we propose a novel EMD methodology for GMM matching. We first present a sparse representation based EMD called SR-EMD by exploiting the sparse property of the underlying problem. SR-EMD is more efficient and robust than the conventional EMD. Second, we present two novel ground distances between component Gaussians based on information geometry. The perspective from Riemannian geometry distinguishes the proposed ground distances from the classical entropy- or divergence-based ones. Furthermore, motivated by the success of distance metric learning for vector data, we make the first attempt to learn the EMD distance metrics between GMMs by using a simple yet effective supervised pair-wise based method. It can adapt the distance metrics between GMMs to specific classification tasks. The proposed method is evaluated on both simulated data and benchmark real databases and achieves very promising performance.
Similar papers:
  • Recursive Estimation of the Stein Center of SPD Matrices and Its Applications [pdf] - Hesamoddin Salehian, Guang Cheng, Baba C. Vemuri, Jeffrey Ho
  • A New Image Quality Metric for Image Auto-denoising [pdf] - Xiangfei Kong, Kuan Li, Qingxiong Yang, Liu Wenyin, Ming-Hsuan Yang
  • Similarity Metric Learning for Face Recognition [pdf] - Qiong Cao, Yiming Ying, Peng Li
  • From Point to Set: Extend the Learning of Distance Metrics [pdf] - Pengfei Zhu, Lei Zhang, Wangmeng Zuo, David Zhang
  • Log-Euclidean Kernels for Sparse Representation and Dictionary Learning [pdf] - Peihua Li, Qilong Wang, Wangmeng Zuo, Lei Zhang
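For histograms on a shared 1-D grid, the EMD that the abstract builds on has a closed form: the L1 distance between the two cumulative distributions. A minimal numpy sketch of that special case (the GMM matching in the paper instead solves a transportation problem over component Gaussians):

```python
import numpy as np

def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms on the same bins
    (unit bin spacing): the L1 distance between their CDFs."""
    p = np.asarray(p, float) / np.sum(p)   # normalize to unit total mass
    q = np.asarray(q, float) / np.sum(q)
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))

# shifting all mass by one bin costs exactly one unit of work
emd_1d([1, 0, 0], [0, 1, 0])   # -> 1.0
```

Unlike bin-wise distances, this cost grows with how far mass must move, which is what makes EMD attractive as a ground distance between distributions.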
Codemaps - Segment, Classify and Search Objects Locally [pdf]
Zhenyang Li, Efstratios Gavves, Koen E.A. van_de_Sande, Cees G.M. Snoek, Arnold W.M. Smeulders

Abstract: In this paper we aim for segmentation and classification of objects. We propose codemaps, a joint formulation of the classification score and the local neighborhood it belongs to in the image. We obtain the codemap by reordering the encoding, pooling and classification steps over lattice elements. Beyond existing linear decompositions, which emphasize only the efficiency benefits for localized search, we make three novel contributions. As a preliminary, we provide a theoretical generalization of the sufficient mathematical conditions under which image encodings and classification become locally decomposable. As the first novelty, we introduce l2 normalization for arbitrarily shaped image regions, which is fast enough for semantic segmentation using our Fisher codemaps. Second, using the same lattice across images, we propose kernel pooling, which embeds nonlinearities into codemaps for object classification by explicit or approximate feature mappings. Results demonstrate that l2-normalized Fisher codemaps improve the state-of-the-art in semantic segmentation for PASCAL VOC. For object classification the addition of nonlinearities brings us on par with the state-of-the-art, but is 3x faster. Because of the codemaps' inherent efficiency, we can reach significant speed-ups for localized search as well. We exploit the efficiency gain for our third novelty: object segment retrieval using a single query image only.
Similar papers:
  • To Aggregate or Not to aggregate: Selective Match Kernels for Image Search [pdf] - Giorgos Tolias, Yannis Avrithis, Herve Jegou
  • Action and Event Recognition with Fisher Vectors on a Compact Feature Set [pdf] - Dan Oneata, Jakob Verbeek, Cordelia Schmid
  • Segmentation Driven Object Detection with Fisher Vectors [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
  • Stable Hyper-pooling and Query Expansion for Event Detection [pdf] - Matthijs Douze, Jerome Revaud, Cordelia Schmid, Herve Jegou
  • Fine-Grained Categorization by Alignments [pdf] - E. Gavves, B. Fernando, C.G.M. Snoek, A.W.M. Smeulders, T. Tuytelaars
Contextual Hypergraph Modeling for Salient Object Detection [pdf]
Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel

Abstract: Salient object detection aims to locate objects that capture human attention within images. Previous approaches often pose this as a problem of image contrast analysis. In this work, we model an image as a hypergraph that utilizes a set of hyperedges to capture the contextual properties of image pixels or regions. As a result, the problem of salient object detection becomes one of finding salient vertices and hyperedges in the hypergraph. The main advantage of hypergraph modeling is that it takes into account each pixel's (or region's) affinity with its neighborhood as well as its separation from the image background. Furthermore, we propose an alternative approach based on center-versus-surround contextual contrast analysis, which performs salient object detection by optimizing a cost-sensitive support vector machine (SVM) objective function. Experimental results on four challenging datasets demonstrate the effectiveness of the proposed approaches against the state-of-the-art approaches to salient object detection.
Similar papers:
  • Salient Region Detection by UFO: Uniqueness, Focusness and Objectness [pdf] - Peng Jiang, Haibin Ling, Jingyi Yu, Jingliang Peng
  • Saliency Detection in Large Point Sets [pdf] - Elizabeth Shtrom, George Leifman, Ayellet Tal
  • Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf] - Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
  • Efficient Salient Region Detection with Soft Image Abstraction [pdf] - Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
Dynamic Pooling for Complex Event Recognition [pdf]
Weixin Li, Qian Yu, Ajay Divakaran, Nuno Vasconcelos

Abstract: The problem of adaptively selecting pooling regions for the classification of complex video events is considered. Complex events are defined as events composed of several characteristic behaviors, whose temporal configuration can change from sequence to sequence. A dynamic pooling operator is defined so as to enable a unified solution to the problems of event-specific video segmentation, temporal structure modeling, and event detection. Video is decomposed into segments, and the segments most informative for detecting a given event are identified, so as to dynamically determine the pooling operator most suited for each sequence. This dynamic pooling is implemented by treating the locations of characteristic segments as hidden information, which is inferred, on a sequence-by-sequence basis, via a large-margin classification rule with latent variables. Although the feasible set of segment selections is combinatorial, it is shown that a globally optimal solution to the inference problem can be obtained efficiently, through the solution of a series of linear programs. Besides the coarse-level location of segments, a finer model of video structure is implemented by jointly pooling features of segment tuples. Experimental evaluation demonstrates that the resulting event detector has state-of-the-art performance on challenging video datasets.
Similar papers:
  • Action and Event Recognition with Fisher Vectors on a Compact Feature Set [pdf] - Dan Oneata, Jakob Verbeek, Cordelia Schmid
  • How Related Exemplars Help Complex Event Detection in Web Videos? [pdf] - Yi Yang, Zhigang Ma, Zhongwen Xu, Shuicheng Yan, Alexander G. Hauptmann
  • Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach [pdf] - Arash Vahdat, Kevin Cannons, Greg Mori, Sangmin Oh, Ilseo Kim
  • Event Detection in Complex Scenes Using Interval Temporal Constraints [pdf] - Yifan Zhang, Qiang Ji, Hanqing Lu
  • Modeling 4D Human-Object Interactions for Event and Object Recognition [pdf] - Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu
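The latent segment selection above admits a very simple special case: if the classifier scores segments linearly and pools by averaging a fixed number of them, the optimal selection is just the top-k segments by score. A hedged numpy sketch of that special case (an illustration only; the paper's inference handles richer selections via linear programs):

```python
import numpy as np

def dynamic_pool(segment_scores, k):
    """Latent dynamic pooling, linear special case: select the k segments
    most informative for the event and pool (average) only those.
    For a linear score, this top-k choice maximizes the pooled score."""
    idx = np.argsort(segment_scores)[-k:]            # indices of the k best segments
    return idx, float(np.mean(segment_scores[idx]))  # selection and pooled score

scores = np.array([0.1, 2.0, -0.5, 1.2, 0.3])
idx, pooled = dynamic_pool(scores, 2)
# selects segments 1 and 3; pooled score (2.0 + 1.2) / 2 = 1.6
```

The selected indices play the role of the hidden variables: each sequence gets its own pooling region, inferred from its own segment scores.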
Exploiting Reflection Change for Automatic Reflection Removal [pdf]
Yu Li, Michael S. Brown

Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different viewpoints. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scene while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients as belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the geometry of the background or reflected scenes, nor does it require the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straightforward and produces good results compared with existing methods.
Similar papers:
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
  • Pedestrian Parsing via Deep Decompositional Network [pdf] - Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Topology-Constrained Layered Tracking with Latent Flow [pdf] - Jason Chang, John W. Fisher_III
  • Detecting Dynamic Objects with Multi-view Background Subtraction [pdf] - Raul Diaz, Sam Hallman, Charless C. Fowlkes
  • Real-World Normal Map Capture for Nearly Flat Reflective Surfaces [pdf] - Bastien Jacquet, Christian Hane, Kevin Koser, Marc Pollefeys
Learning to Predict Gaze in Egocentric Video [pdf]
Yin Li, Alireza Fathi, James M. Rehg

Abstract: We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in a camera wearer's behavior. Specifically, we compute the camera wearer's head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate that we get a significant performance boost in recognizing daily actions and segmenting foreground objects by plugging our gaze predictions into state-of-the-art methods.
Similar papers:
  • Saliency and Human Fixations: State-of-the-Art and Study of Comparison Metrics [pdf] - Nicolas Riche, Matthieu Duvinage, Matei Mancas, Bernard Gosselin, Thierry Dutoit
  • Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf] - Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
  • Predicting Primary Gaze Behavior Using Social Saliency Fields [pdf] - Hyun Soo Park, Eakta Jain, Yaser Sheikh
  • Semantically-Based Human Scanpath Estimation with HMMs [pdf] - Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin
  • Calibration-Free Gaze Estimation Using Human Gaze Patterns [pdf] - Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab
Log-Euclidean Kernels for Sparse Representation and Dictionary Learning [pdf]
Peihua Li, Qilong Wang, Wangmeng Zuo, Lei Zhang

Abstract: Symmetric positive definite (SPD) matrices have been widely used in image and vision problems. Recently there has been growing interest in studying sparse representation (SR) of SPD matrices, motivated by the great success of SR for vector data. Though the space of SPD matrices is well known to form a Lie group that is a Riemannian manifold, existing work fails to take full advantage of its geometric structure. This paper attempts to tackle this problem by proposing a kernel-based method for SR and dictionary learning (DL) of SPD matrices. We show that the space of SPD matrices, with the operations of logarithmic multiplication and scalar logarithmic multiplication defined in the Log-Euclidean framework, is a complete inner product space. We can thus develop a broad family of kernels that satisfy Mercer's condition. These kernels characterize the geodesic distance and can be computed efficiently. We also consider the geometric structure in the DL process by updating atom matrices in the Riemannian space instead of in the Euclidean space. The proposed method is evaluated on various vision problems and shows notable performance gains over the state of the art.
Similar papers:
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
  • A Framework for Shape Analysis via Hilbert Space Embedding [pdf] - Sadeep Jayasumana, Mathieu Salzmann, Hongdong Li, Mehrtash Harandi
  • A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models [pdf] - Peihua Li, Qilong Wang, Lei Zhang
  • Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution [pdf] - Mehrtash Harandi, Conrad Sanderson, Chunhua Shen, Brian Lovell
Model Recommendation with Virtual Probes for Egocentric Hand Detection [pdf]
Cheng Li, Kris M. Kitani

Abstract: Egocentric cameras can be used to benefit such tasks as analyzing fine motor skills, recognizing gestures and learning about hand-object manipulation. To enable such technology, we believe that the hands must be detected at the pixel level to gain important information about the shape of the hands and fingers. We show that the problem of pixel-wise hand detection can be effectively solved by posing the problem as a model recommendation task. As such, the goal of a recommendation system is to recommend the n-best hand detectors based on the probe set, a small amount of labeled data from the test distribution. This requirement of a probe set is a serious limitation in many applications, such as egocentric hand detection, where the test distribution may be continually changing. To address this limitation, we propose the use of virtual probes which can be automatically extracted from the test distribution. The key idea is that many features, such as the color distribution or relative performance between two detectors, can be used as a proxy for the probe set. In our experiments we show that the recommendation paradigm is well equipped to handle complex changes in the appearance of the hands in first-person vision. In particular, we show how our system is able to generalize to new scenarios by testing our model across multiple users.
Similar papers:
  • POP: Person Re-identification Post-rank Optimisation [pdf] - Chunxiao Liu, Chen Change Loy, Shaogang Gong, Guijin Wang
  • Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data [pdf] - Srinath Sridhar, Antti Oulasvirta, Christian Theobalt
  • Efficient Hand Pose Estimation from a Single Depth Image [pdf] - Chi Xu, Li Cheng
  • Multi-scale Topological Features for Hand Posture Representation and Analysis [pdf] - Kaoning Hu, Lijun Yin
  • Robust Feature Set Matching for Partial Face Recognition [pdf] - Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
Modeling Occlusion by Discriminative AND-OR Structures [pdf]
Bo Li, Wenze Hu, Tianfu Wu, Song-Chun Zhu

Abstract: Occlusion presents a challenge for detecting objects in real-world applications. To address this issue, this paper models object occlusion with an AND-OR structure which (i) represents occlusion at the semantic part level, and (ii) captures the regularities of different occlusion configurations (i.e., the different combinations of object part visibilities). This paper focuses on car detection on streets. Since annotating part occlusion on real images is time-consuming and error-prone, we propose to learn the AND-OR structure automatically using synthetic images of CAD models placed at different relative positions. The model parameters are learned from real images under the latent structural SVM (LSSVM) framework. In inference, an efficient dynamic programming (DP) algorithm is utilized. In experiments, we test our method on both car detection and car view estimation. Experimental results show that (i) our CAD simulation strategy is capable of generating occlusion patterns for real scenarios, (ii) the proposed AND-OR structure model is effective for modeling occlusions, outperforming the deformable part-based model (DPM) [6, 10] in car detection on both our self-collected street-parking dataset and the Pascal VOC 2007 car dataset [4], and (iii) the learned model is on par with the state-of-the-art methods on car view estimation tested on two public datasets.
Similar papers:
  • Monocular Image 3D Human Pose Estimation under Self-Occlusion [pdf] - Ibrahim Radwan, Abhinav Dhall, Roland Goecke
  • Pedestrian Parsing via Deep Decompositional Network [pdf] - Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation [pdf] - Marius Leordeanu, Andrei Zanfir, Cristian Sminchisescu
  • Handling Occlusions with Franken-Classifiers [pdf] - Markus Mathias, Rodrigo Benenson, Radu Timofte, Luc Van_Gool
  • Learning People Detectors for Tracking in Crowded Scenes [pdf] - Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele
Motion-Aware KNN Laplacian for Video Matting [pdf]
Dingzeyu Li, Qifeng Chen, Chi-Keung Tang

Abstract: This paper demonstrates how the nonlocal principle benefits video matting via the KNN Laplacian, which comes with a straightforward implementation using motion-aware K nearest neighbors. In hindsight, the fundamental problem to solve in video matting is to produce spatio-temporally coherent clusters of moving foreground pixels. When used as described, the motion-aware KNN Laplacian is effective in addressing this fundamental problem, as demonstrated by sparse user markups typically on only one frame in a variety of challenging examples featuring ambiguous foreground and background colors, changing topologies with disocclusion, significant illumination changes, fast motion, and motion blur. When working with existing Laplacian-based systems, our Laplacian is expected to benefit them immediately with improved clustering of moving foreground pixels.
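As a rough sketch of the idea (not the paper's implementation), a KNN matting Laplacian can be built by collecting the K nearest neighbors of each pixel in a feature space that includes motion cues, then forming L = D - W over the resulting affinity graph. The feature layout and affinity weights below are illustrative assumptions.

```python
import numpy as np

def knn_laplacian(features, k=3):
    # Build a KNN affinity graph over per-pixel feature vectors
    # (e.g. color + position + optical-flow components) and return
    # the graph Laplacian L = D - W used by Laplacian-based matting.
    n = len(features)
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]       # k nearest neighbors, excluding self
        W[i, nbrs] = 1.0 / (1.0 + d2[i, nbrs])  # similarity weight (illustrative kernel)
    W = np.maximum(W, W.T)                       # symmetrize the graph
    return np.diag(W.sum(1)) - W

rng = np.random.default_rng(0)
F = rng.random((10, 7))   # 10 "pixels", 7-D features (RGB + xy + 2-D flow)
L = knn_laplacian(F)
```

Appending flow components to each feature vector is what makes the neighborhoods motion-aware: pixels moving together become neighbors even when their colors are ambiguous.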
Similar papers:
  • Compensating for Motion during Direct-Global Separation [pdf] - Supreeth Achar, Stephen T. Nuske, Srinivasa G. Narasimhan
  • Estimating Human Pose with Flowing Puppets [pdf] - Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
  • Fast Object Segmentation in Unconstrained Video [pdf] - Anestis Papazoglou, Vittorio Ferrari
  • Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations [pdf] - Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Perspective Motion Segmentation via Collaborative Clustering [pdf]
Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou

Abstract: This paper addresses real-world challenges in the motion segmentation problem, including perspective effects, missing data, and an unknown number of motions. It first formulates the 3-D motion segmentation from two perspective views as a subspace clustering problem, utilizing the epipolar constraint of an image pair. It then combines the point correspondence information across multiple image frames via a collaborative clustering step, in which tight integration is achieved via a mixed-norm optimization scheme. For model selection, we propose an over-segment and merge approach, where the merging step is based on the property of the l1-norm of the mutual sparse representation of two over-segmented groups. The resulting algorithm can deal with incomplete trajectories and perspective effects substantially better than state-of-the-art two-frame and multi-frame methods. Experiments on a 62-clip dataset show the significant superiority of the proposed idea in both segmentation accuracy and model selection.
Similar papers:
  • Latent Space Sparse Subspace Clustering [pdf] - Vishal M. Patel, Hien Van Nguyen, Rene Vidal
  • Minimal Basis Facility Location for Subspace Segmentation [pdf] - Choon-Meng Lee, Loong-Fah Cheong
  • Robust Subspace Clustering via Half-Quadratic Minimization [pdf] - Yingya Zhang, Zhenan Sun, Ran He, Tieniu Tan
  • Online Motion Segmentation Using Dynamic Label Propagation [pdf] - Ali Elqursh, Ahmed Elgammal
  • Robust Trajectory Clustering for Motion Segmentation [pdf] - Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation [pdf]
Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang

Abstract: We propose an unsupervised detector adaptation algorithm to adapt any offline-trained face detector to a specific collection of images, and hence achieve better accuracy. The core of our detector adaptation algorithm is a probabilistic elastic part (PEP) model, which is offline trained with a set of face examples. It produces a statistically-aligned part-based face representation, namely the PEP representation. To adapt a general face detector to a collection of images, we compute the PEP representations of the candidate detections from the general face detector, and then train a discriminative classifier with the top positives and negatives. Then we re-rank all the candidate detections with this classifier. This way, a face detector tailored to the statistics of the specific image collection is adapted from the original detector. We present extensive results on three datasets with two state-of-the-art face detectors. The significant improvement of detection accuracy over these state-of-the-art face detectors strongly demonstrates the efficacy of the proposed face detector adaptation algorithm.
Similar papers:
  • Deep Learning Identity-Preserving Face Space [pdf] - Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition [pdf] - Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
  • Modifying the Memorability of Face Photographs [pdf] - Aditya Khosla, Wilma A. Bainbridge, Antonio Torralba, Aude Oliva
  • Hidden Factor Analysis for Age Invariant Face Recognition [pdf] - Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
  • Fast Face Detector Training Using Tailored Views [pdf] - Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
Saliency Detection via Dense and Sparse Reconstruction [pdf]
Xiaohui Li, Huchuan Lu, Lihe Zhang, Xiang Ruan, Ming-Hsuan Yang

Abstract: In this paper, we propose a visual saliency detection algorithm from the perspective of reconstruction errors. The image boundaries are first extracted via superpixels as likely cues for background templates, from which dense and sparse appearance models are constructed. For each image region, we first compute dense and sparse reconstruction errors. Second, the reconstruction errors are propagated based on the contexts obtained from K-means clustering. Third, pixel-level saliency is computed by an integration of multi-scale reconstruction errors and refined by an object-biased Gaussian model. We apply the Bayes formula to integrate saliency measures based on dense and sparse reconstruction errors. Experimental results show that the proposed algorithm performs favorably against seventeen state-of-the-art methods in terms of precision and recall. In addition, the proposed algorithm is demonstrated to be more effective in highlighting salient objects uniformly and robust to background noise.
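The dense reconstruction error in the first step can be sketched as follows: fit a PCA basis to the boundary (background-template) features and score each region by how poorly that basis reconstructs it, so regions unlike the background come out with high error. The feature dimensions and data here are synthetic stand-ins.

```python
import numpy as np

def dense_reconstruction_error(regions, background, n_components=2):
    # Fit a PCA basis to background-template features and score each
    # region by its reconstruction error under that basis; high error
    # suggests the region does not look like background, i.e. is salient.
    mu = background.mean(0)
    _, _, Vt = np.linalg.svd(background - mu, full_matrices=False)
    P = Vt[:n_components]                     # top PCA directions of the background
    recon = mu + (regions - mu) @ P.T @ P     # project onto the basis and back
    return np.linalg.norm(regions - recon, axis=1)

rng = np.random.default_rng(1)
bg = rng.normal(size=(50, 5))                 # boundary-superpixel features
fg = bg.mean(0) + 10.0                        # a region far from the background model
errs = dense_reconstruction_error(np.vstack([bg[:3], fg[None]]), bg)
```

The last region, which deviates strongly from the background statistics, receives a much larger reconstruction error than the background-like regions.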
Similar papers:
  • Saliency Detection via Absorbing Markov Chain [pdf] - Bowen Jiang, Lihe Zhang, Huchuan Lu, Chuan Yang, Ming-Hsuan Yang
  • Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf] - Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
  • Category-Independent Object-Level Saliency Detection [pdf] - Yangqing Jia, Mei Han
  • Efficient Salient Region Detection with Soft Image Abstraction [pdf] - Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
  • Contextual Hypergraph Modeling for Salient Object Detection [pdf] - Xi Li, Yao Li, Chunhua Shen, Anthony Dick, Anton Van_Den_Hengel
Video Segmentation by Tracking Many Figure-Ground Segments [pdf]
Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg

Abstract: We propose an unsupervised video segmentation approach that simultaneously tracks multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated by a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed form. In addition, a new composite statistical inference approach is proposed for refining the obtained segment tracks; it breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on the dataset, showing its efficiency and robustness to challenges in different video sequences.
Similar papers:
  • BOLD Features to Detect Texture-less Objects [pdf] - Federico Tombari, Alessandro Franchi, Luigi Di_Stefano
  • A New Adaptive Segmental Matching Measure for Human Activity Recognition [pdf] - Shahriar Shariat, Vladimir Pavlovic
  • Fast Object Segmentation in Unconstrained Video [pdf] - Anestis Papazoglou, Vittorio Ferrari
  • Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding [pdf] - Daniel M. Steinberg, Oscar Pizarro, Stefan B. Williams
  • Action Recognition and Localization by Hierarchical Space-Time Segments [pdf] - Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff
Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf]
Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han

Abstract: We propose an online algorithm to extract a human by foreground/background segmentation and estimate the pose of the human from videos captured by moving cameras. We claim that a virtuous cycle can be created by appropriate interactions between the two modules to solve individual problems. This joint estimation problem is divided into two subproblems, foreground/background segmentation and pose tracking, which alternate iteratively for optimization: the segmentation step generates a foreground mask for human pose tracking, and the human pose tracking step provides a foreground response map for segmentation. The final solution is obtained when the iterative procedure converges. We evaluate our algorithm quantitatively and qualitatively on real videos involving various challenges, and present its outstanding performance compared to state-of-the-art techniques for segmentation and pose estimation.
Similar papers:
  • Pose Estimation and Segmentation of People in 3D Movies [pdf] - Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev
  • Monocular Image 3D Human Pose Estimation under Self-Occlusion [pdf] - Ibrahim Radwan, Abhinav Dhall, Roland Goecke
  • Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf] - Jiongxin Liu, Peter N. Belhumeur
  • Real-Time Body Tracking with One Depth Camera and Inertial Sensors [pdf] - Thomas Helten, Meinard Muller, Hans-Peter Seidel, Christian Theobalt
  • Estimating Human Pose with Flowing Puppets [pdf] - Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
Parsing IKEA Objects: Fine Pose Estimation [pdf]
Joseph J. Lim, Hamed Pirsiavash, Antonio Torralba

Abstract: We address the problem of localizing and estimating the fine pose of objects in an image with exact 3D models. Our main focus is to unify contributions from the 1970s with recent advances in object detection: using local keypoint detectors to find candidate poses and scoring the global alignment of each candidate pose to the image. Moreover, we provide a new dataset containing finely aligned objects with their exactly matched 3D models, and a set of models for widely used objects. We evaluate our algorithm on both object detection and fine pose estimation, and show that our method outperforms state-of-the-art algorithms.
Similar papers:
  • Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data [pdf] - Srinath Sridhar, Antti Oulasvirta, Christian Theobalt
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Monocular Image 3D Human Pose Estimation under Self-Occlusion [pdf] - Ibrahim Radwan, Abhinav Dhall, Roland Goecke
  • Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf] - Jiongxin Liu, Peter N. Belhumeur
  • Allocentric Pose Estimation [pdf] - M. Jose Antonio, Luc De_Raedt, Tinne Tuytelaars
3D Sub-query Expansion for Improving Sketch-Based Multi-view Image Retrieval [pdf]
Yen-Liang Lin, Cheng-Yu Huang, Hao-Jeng Wang, Winston Hsu

Abstract: We propose a 3D sub-query expansion approach for boosting sketch-based multi-view image retrieval. The core idea of our method is to automatically convert two (guided) 2D sketches into an approximated 3D sketch model, and then generate multi-view sketches as expanded sub-queries to improve the retrieval performance. To learn the weights among synthesized views (sub-queries), we present a new multi-query feature to model the similarity between sub-queries and dataset images, and formulate it into a convex optimization problem. Our approach shows superior performance compared with the state-of-the-art approach on a public multi-view image dataset. Moreover, we also conduct sensitivity tests to analyze the parameters of our approach based on the gathered user sketches.
Similar papers:
  • Image Retrieval Using Textual Cues [pdf] - Anand Mishra, Karteek Alahari, C.V. Jawahar
  • Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval [pdf] - Cai-Zhi Zhu, Herve Jegou, Shin'Ichi Satoh
  • Stable Hyper-pooling and Query Expansion for Event Detection [pdf] - Matthijs Douze, Jerome Revaud, Cordelia Schmid, Herve Jegou
  • Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation [pdf] - Basura Fernando, Tinne Tuytelaars
  • SYM-FISH: A Symmetry-Aware Flip Invariant Sketch Histogram Shape Descriptor [pdf] - Xiaochun Cao, Hua Zhang, Si Liu, Xiaojie Guo, Liang Lin
A General Two-Step Approach to Learning-Based Hashing [pdf]
Guosheng Lin, Chunhua Shen, David Suter, Anton van_den_Hengel

Abstract: Most existing approaches to hashing apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of the method to respond to the data, and can result in complex optimization problems that are difficult to solve. Here we propose a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. This framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: hash bit learning and hash function learning based on the learned bits. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training standard binary classifiers. Both problems have been extensively studied in the literature. Our extensive experiments demonstrate that the proposed framework is effective, flexible and outperforms the state-of-the-art.
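The two-step decomposition can be illustrated with simple stand-ins: sign-of-PCA bits replace the paper's binary quadratic step, and per-bit least-squares linear classifiers serve as the learned hash functions. All modeling choices below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))   # training features

# Step 1: hash-bit learning (illustrative: sign of centered PCA projections
# stands in for solving the paper's binary quadratic problems).
mu = X.mean(0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
bits = np.sign((X - mu) @ Vt[:4].T)            # 4-bit target codes in {-1, +1}

# Step 2: hash-function learning -- one least-squares linear classifier
# per bit, trained to reproduce the learned target bits.
W = np.linalg.lstsq(np.c_[X, np.ones(len(X))], bits, rcond=None)[0]

def hash_fn(x):
    # Apply the learned per-bit linear classifiers to a new sample.
    return np.sign(np.append(x, 1.0) @ W)

codes = np.array([hash_fn(x) for x in X])
agreement = (codes == bits).mean()             # how well step 2 fits step 1's bits
```

The appeal of the decomposition is visible even in this toy: any bit-learning scheme can be paired with any classifier family in step 2 without re-deriving a coupled optimization.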
Similar papers:
  • Structured Learning of Sum-of-Submodular Higher Order Energy Functions [pdf] - Alexander Fix, Thorsten Joachims, Sam Park, Ramin Zabih
  • Learning Hash Codes with Listwise Supervision [pdf] - Jun Wang, Wei Liu, Andy X. Sun, Yu-Gang Jiang
  • Complementary Projection Hashing [pdf] - Zhongming Jin, Yao Hu, Yue Lin, Debing Zhang, Shiding Lin, Deng Cai, Xuelong Li
  • Large-Scale Video Hashing via Structure Learning [pdf] - Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang
  • Supervised Binary Hash Code Learning with Jensen Shannon Divergence [pdf] - Lixin Fan
Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes [pdf]
Dahua Lin, Jianxiong Xiao

Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes, i.e., the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called the Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
Similar papers:
  • Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding [pdf] - Daniel M. Steinberg, Oscar Pizarro, Stefan B. Williams
  • Box in the Box: Joint 3D Layout and Object Reasoning from Single Images [pdf] - Alexander G. Schwing, Sanja Fidler, Marc Pollefeys, Raquel Urtasun
  • Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors [pdf] - Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun
  • Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation [pdf] - Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
  • Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification [pdf] - Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Holistic Scene Understanding for 3D Object Detection with RGBD Cameras [pdf]
Dahua Lin, Sanja Fidler, Raquel Urtasun

Abstract: In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.
Similar papers:
  • Coherent Object Detection with 3D Geometric Context from a Single Image [pdf] - Jiyan Pan, Takeo Kanade
  • From Subcategories to Visual Composites: A Multi-level Framework for Object Detection [pdf] - Tian Lan, Michalis Raptis, Leonid Sigal, Greg Mori
  • Segmentation Driven Object Detection with Fisher Vectors [pdf] - Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid
  • Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors [pdf] - Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun
  • Active MAP Inference in CRFs for Efficient Semantic Segmentation [pdf] - Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
Robust Non-parametric Data Fitting for Correspondence Modeling [pdf]
Wen-Yan Lin, Ming-Ming Cheng, Shuai Zheng, Jiangbo Lu, Nigel Crook

Abstract: We propose a generic method for obtaining non-parametric image warps from noisy point correspondences. Our formulation integrates a Huber function into a motion coherence framework. This makes our fitting function especially robust to piecewise correspondence noise (where an image section is consistently mismatched). By utilizing over-parameterized curves, we can generate realistic non-parametric image warps from very noisy correspondences. We also demonstrate how our algorithm can be used to help stitch images taken from a panning camera by warping the images onto a virtual push-broom camera imaging plane.
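The Huber function at the heart of such robust fitting behaves quadratically for small residuals and linearly for large ones, which is what limits the influence of consistently mismatched correspondences. A minimal version (the threshold delta is chosen for illustration):

```python
import numpy as np

def huber(r, delta=1.0):
    # Huber penalty: quadratic for |r| <= delta, linear beyond it,
    # so large outlier residuals contribute only linearly to the fit.
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

small = huber(np.array([0.5]))[0]    # quadratic regime: 0.5 * 0.25 = 0.125
big = huber(np.array([10.0]))[0]     # linear regime: 1.0 * (10 - 0.5) = 9.5
```

Under a squared loss the outlier residual of 10 would cost 50; the Huber penalty caps it at 9.5, so a few bad correspondences cannot dominate the warp.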
Similar papers:
  • DeepFlow: Large Displacement Optical Flow with Deep Matching [pdf] - Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
  • Optical Flow via Locally Adaptive Fusion of Complementary Data Costs [pdf] - Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
  • A General Dense Image Matching Framework Combining Direct and Feature-Based Costs [pdf] - Jim Braux-Zin, Romain Dupont, Adrien Bartoli
  • Piecewise Rigid Scene Flow [pdf] - Christoph Vogel, Konrad Schindler, Stefan Roth
  • Optimization Problems for Fast AAM Fitting in-the-Wild [pdf] - Georgios Tzimiropoulos, Maja Pantic
A Scalable Unsupervised Feature Merging Approach to Efficient Dimensionality Reduction of High-Dimensional Visual Data [pdf]
Lingqiao Liu, Lei Wang

Abstract: To achieve a good trade-off between recognition accuracy and computational efficiency, it is often necessary to reduce high-dimensional visual data to medium-dimensional data. For this task, even applying a simple full-matrix-based linear projection causes significant computation and memory use. When the number of visual data points is large, efficiently learning such a projection can itself become a problem. The recent feature merging approach offers an efficient way to reduce the dimensionality, requiring only a single scan of the features to perform the reduction. However, existing merging algorithms do not scale well with high-dimensional data, especially in the unsupervised case. To address this problem, we formulate unsupervised feature merging as a PCA problem imposed with a special structure constraint. By exploiting its connection with k-means, we transform this constrained PCA problem into a feature clustering problem. Moreover, we employ the hashing technique to improve its scalability. These produce a scalable feature merging algorithm for our dimensionality reduction task. In addition, we develop an extension of this method by leveraging the neighborhood structure in the data to further improve dimensionality reduction performance. Further, we explore the incorporation of bipolar merging, a variant of the merging function which allows the subtraction operation, into our algorithms. Through three applications in visual recognition, we demonstrate that our
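The k-means connection can be sketched directly: run k-means over the feature *dimensions* (treating each column of the data matrix as a point) and merge each resulting cluster of dimensions by summation. This is a toy rendition of the clustering view, not the paper's hashing-accelerated algorithm.

```python
import numpy as np

def merge_features(X, n_groups, n_iter=20, seed=0):
    # Cluster the dimensions of X with plain k-means (each column of X
    # is one "point"), then reduce each sample by summing the dimensions
    # assigned to the same cluster -- a single scan over the features.
    D = X.T                                     # one point per feature dimension
    rng = np.random.default_rng(seed)
    centers = D[rng.choice(len(D), n_groups, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((D[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for g in range(n_groups):
            if (labels == g).any():
                centers[g] = D[labels == g].mean(0)
    # Merge: sum the original features within each cluster of dimensions.
    Xr = np.stack([X[:, labels == g].sum(1) for g in range(n_groups)], axis=1)
    return Xr, labels

rng = np.random.default_rng(2)
X = rng.random((30, 12))                        # 30 samples, 12-D features
Xr, labels = merge_features(X, n_groups=4)
```

Because every original dimension lands in exactly one group, the per-sample feature mass is preserved; only the resolution of the representation drops.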
Similar papers:
  • Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences [pdf] - Bing Su, Xiaoqing Ding
  • Large-Scale Video Hashing via Structure Learning [pdf] - Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang
  • Complementary Projection Hashing [pdf] - Zhongming Jin, Yao Hu, Yue Lin, Debing Zhang, Shiding Lin, Deng Cai, Xuelong Li
  • Latent Space Sparse Subspace Clustering [pdf] - Vishal M. Patel, Hien Van Nguyen, Rene Vidal
  • Fast High Dimensional Vector Multiplication Face Recognition [pdf] - Oren Barkan, Jonathan Weill, Lior Wolf, Hagai Aronowitz
Automatic Kronecker Product Model Based Detection of Repeated Patterns in 2D Urban Images [pdf]
Juan Liu, Emmanouil Psarakis, Ioannis Stamos

Abstract: Repeated patterns (such as windows, tiles, balconies and doors) are prominent and significant features in urban scenes. Therefore, detection of these repeated patterns becomes very important for city scene analysis. This paper attacks the problem of repeated pattern detection in a precise, efficient and automatic way, by combining traditional feature extraction with a Kronecker product low-rank modeling approach. Our method is tailored for 2D images of building facades. We have developed algorithms for automatic selection of a representative texture within facade images using vanishing points and Harris corners. After rectifying the input images, we describe novel algorithms that extract repeated patterns by using Kronecker product based modeling that is based on a solid theoretical foundation. Our approach is unique and has never been used for facade analysis. We have tested our algorithms on a large set of images.
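A standard way to fit a Kronecker product model, used here as an illustrative stand-in for the paper's modeling, is the Van Loan-style rearrangement: reorder the matrix's blocks into rows and take a rank-1 SVD, which recovers the tiling grid and the repeated tile.

```python
import numpy as np

def nearest_kronecker(A, m1, n1, m2, n2):
    # Nearest Kronecker product via rearrangement: stack each (m2 x n2)
    # block of A as a row, take a rank-1 SVD, and reshape the singular
    # vectors into the grid factor B and tile factor C with A ~ kron(B, C).
    R = np.array([A[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2].ravel()
                  for i in range(m1) for j in range(n1)])
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    B = (np.sqrt(S[0]) * U[:, 0]).reshape(m1, n1)
    C = (np.sqrt(S[0]) * Vt[0]).reshape(m2, n2)
    return B, C

# A synthetic "facade": a 2x3 grid of identical 2x2 tiles is an exact
# Kronecker product, so the factorization recovers it exactly.
tile = np.array([[1.0, 2.0], [3.0, 4.0]])
grid = np.ones((2, 3))
A = np.kron(grid, tile)
B, C = nearest_kronecker(A, 2, 3, 2, 2)
err = np.linalg.norm(A - np.kron(B, C))
```

On real facade images the product is only approximate, and the size of the residual indicates how well a candidate tiling explains the repeated structure.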
Similar papers:
  • Manipulation Pattern Discovery: A Nonparametric Bayesian Approach [pdf] - Bingbing Ni, Pierre Moulin
  • SGTD: Structure Gradient and Texture Decorrelating Regularization for Image Decomposition [pdf] - Qiegen Liu, Jianbo Liu, Pei Dong, Dong Liang
  • Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision [pdf] - Tae-Hyun Oh, Hyeongwoo Kim, Yu-Wing Tai, Jean-Charles Bazin, In So Kweon
  • Log-Euclidean Kernels for Sparse Representation and Dictionary Learning [pdf] - Peihua Li, Qilong Wang, Wangmeng Zuo, Lei Zhang
  • Subpixel Scanning Invariant to Indirect Lighting Using Quadratic Code Length [pdf] - Nicolas Martin, Vincent Couture, Sebastien Roy
Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf]
Jiongxin Liu, Peter N. Belhumeur

Abstract: In this paper, we propose a novel approach for bird part localization, targeting fine-grained categories with wide variations in appearance due to different poses (including aspect and orientation) and subcategories. As it is challenging to represent such variations across a large set of diverse samples with tractable parametric models, we turn to individual exemplars. Specifically, we extend the exemplar-based models in [4] by enforcing pose and subcategory consistency at the parts. During training, we build pose-specific detectors scoring part poses across subcategories, and subcategory-specific detectors scoring part appearance across poses. At the testing stage, likely exemplars are matched to the image, suggesting part locations whose pose and subcategory consistency are well supported by the image cues. From these hypotheses, the part configuration can be predicted with very high accuracy. Experimental results demonstrate significant performance gains from our method on an extensive dataset, CUB-200-2011 [30], for both localization and classification tasks.
Similar papers:
  • Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction [pdf] - Ning Zhang, Ryan Farrell, Forrest Iandola, Trevor Darrell
  • Parsing IKEA Objects: Fine Pose Estimation [pdf] - Joseph J. Lim, Hamed Pirsiavash, Antonio Torralba
  • Monocular Image 3D Human Pose Estimation under Self-Occlusion [pdf] - Ibrahim Radwan, Abhinav Dhall, Roland Goecke
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Allocentric Pose Estimation [pdf] - M. Jose Antonio, Luc De_Raedt, Tinne Tuytelaars
Joint Subspace Stabilization for Stereoscopic Video [pdf]
Feng Liu, Yuzhen Niu, Hailin Jin

Abstract: Shaky stereoscopic video is not only unpleasant to watch but may also cause 3D fatigue. Stabilizing the left and right views of a stereoscopic video separately using a monocular stabilization method tends to both introduce undesirable vertical disparities and damage horizontal disparities, which may destroy the stereoscopic viewing experience. In this paper, we present a joint subspace stabilization method for stereoscopic video. We prove that the low-rank subspace constraint for monocular video [10] also holds for stereoscopic video. In particular, the feature trajectories from the left and right videos share the same subspace. Based on this proof, we develop a stereo subspace stabilization method that jointly computes a common subspace from the left and right videos and uses it to stabilize the two videos simultaneously. Our method meets the stereoscopic constraints without 3D reconstruction or explicit left-right correspondence. We test our method on a variety of stereoscopic videos with different scene content and camera motion. The experiments show that our method achieves high-quality stabilization for stereoscopic video in a robust and efficient way.
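The shared-subspace claim can be illustrated with synthetic trajectories: if left- and right-view feature tracks are generated from one common time basis, stacking them into a single matrix yields a jointly low-rank factorization recoverable by a truncated SVD. The rank and sizes below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
r, n_frames = 3, 40
basis = rng.normal(size=(r, 2 * n_frames))        # shared subspace over time (x, y per frame)
coeff_L = rng.normal(size=(25, r))                # left-view trajectory coefficients
coeff_R = rng.normal(size=(25, r))                # right-view trajectory coefficients
M = np.vstack([coeff_L @ basis, coeff_R @ basis]) # joint left+right trajectory matrix

# Recover the common subspace with a truncated SVD of the stacked matrix.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
M_r = (U[:, :r] * S[:r]) @ Vt[:r]                 # rank-r reconstruction
residual = np.linalg.norm(M - M_r)
```

Because both views draw on the same basis, the stacked matrix has rank r rather than 2r, which is what lets a single smoothed subspace stabilize both videos consistently.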
Similar papers:
  • Robust Subspace Clustering via Half-Quadratic Minimization [pdf] - Yingya Zhang, Zhenan Sun, Ran He, Tieniu Tan
  • Robust Trajectory Clustering for Motion Segmentation [pdf] - Feng Shi, Zhong Zhou, Jiangjian Xiao, Wei Wu
  • Perspective Motion Segmentation via Collaborative Clustering [pdf] - Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou
  • Online Motion Segmentation Using Dynamic Label Propagation [pdf] - Ali Elqursh, Ahmed Elgammal
  • Camera Alignment Using Trajectory Intersections in Unsynchronized Videos [pdf] - Thomas Kuo, Santhoshkumar Sunderrajan, B.S. Manjunath
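The low-rank subspace constraint the abstract builds on says that a matrix of feature trajectories (coordinates stacked over frames) is well approximated by a low-rank factorization. A minimal truncated-SVD sketch of that idea (matrix shapes, the rank, and the synthetic data are illustrative assumptions, not the authors' pipeline):

```python
import numpy as np

def low_rank_basis(trajectories, rank=9):
    """Fit a rank-r basis to a (2T x N) trajectory matrix via truncated
    SVD: each column is one feature track with x/y coordinates stacked
    over T frames. Returns (basis, coefficients)."""
    U, s, Vt = np.linalg.svd(trajectories, full_matrices=False)
    basis = U[:, :rank] * s[:rank]   # (2T, rank)
    coeffs = Vt[:rank, :]            # (rank, N)
    return basis, coeffs

# Synthetic check: tracks generated from a rank-5 model are recovered
# (almost) exactly by a rank-5 basis.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(40, 5))    # 20 frames, x/y stacked
true_coeffs = rng.normal(size=(5, 100))  # 100 feature tracks
W = true_basis @ true_coeffs
basis, coeffs = low_rank_basis(W, rank=5)
err = np.linalg.norm(W - basis @ coeffs) / np.linalg.norm(W)
```

The joint method's claim is that left- and right-view tracks share one such subspace, so a single basis fitted to both views stabilizes them consistently.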
POP: Person Re-identification Post-rank Optimisation [pdf]
Chunxiao Liu, Chen Change Loy, Shaogang Gong, Guijin Wang

Abstract: Owing to visual ambiguities and disparities, person re-identification methods inevitably produce suboptimal rank lists, which still require exhaustive human eyeballing to identify the correct target from hundreds of different likely candidates. Existing re-identification studies focus on improving the ranking performance, but rarely look into the critical problem of optimising the time-consuming and error-prone post-rank visual search at the user end. In this study, we present a novel one-shot Post-rank OPtimisation (POP) method, which allows a user to quickly refine their search by either one-shot or a couple of sparse negative selections during a re-identification process. We conduct systematic behavioural studies to understand users' searching behaviour and show that the proposed method allows correct re-identification to converge 2.6 times faster than the conventional exhaustive search. Importantly, through extensive evaluations we demonstrate that the method is capable of achieving significant improvement over the state-of-the-art distance metric learning based ranking models, even with just one-shot feedback optimisation, by as much as over 30% performance improvement for rank-1 re-identification on the VIPeR and i-LIDS datasets.
Similar papers:
  • Model Recommendation with Virtual Probes for Egocentric Hand Detection [pdf] - Cheng Li, Kris M. Kitani
  • Person Re-identification by Salience Matching [pdf] - Rui Zhao, Wanli Ouyang, Xiaogang Wang
  • Robust Feature Set Matching for Partial Face Recognition [pdf] - Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
  • Attribute Pivots for Guiding Relevance Feedback in Image Search [pdf] - Adriana Kovashka, Kristen Grauman
  • Implied Feedback: Learning Nuances of User Behavior in Image Search [pdf] - Devi Parikh, Kristen Grauman
Semantically-Based Human Scanpath Estimation with HMMs [pdf]
Huiying Liu, Dong Xu, Qingming Huang, Wen Li, Min Xu, Stephen Lin

Abstract: We present a method for estimating human scanpaths, which are sequences of gaze shifts that follow visual attention over an image. In this work, scanpaths are modeled based on three principal factors that influence human attention, namely low-level feature saliency, spatial position, and semantic content. Low-level feature saliency is formulated as transition probabilities between different image regions based on feature differences. The effect of spatial position on gaze shifts is modeled as a Lévy flight with the shifts following a 2D Cauchy distribution. To account for semantic content, we propose to use a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well-suited for this purpose in that 1) the hidden states, obtained by unsupervised learning, can represent latent semantic concepts, 2) the prior distribution of the hidden states describes visual attraction to the semantic concepts, and 3) the transition probabilities represent human gaze shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye gaze data verify the effectiveness of this method.
Similar papers:
  • Saliency Detection via Absorbing Markov Chain [pdf] - Bowen Jiang, Lihe Zhang, Huchuan Lu, Chuan Yang, Ming-Hsuan Yang
  • Analysis of Scores, Datasets, and Models in Visual Saliency Prediction [pdf] - Ali Borji, Hamed R. Tavakoli, Dicky N. Sihite, Laurent Itti
  • Predicting Primary Gaze Behavior Using Social Saliency Fields [pdf] - Hyun Soo Park, Eakta Jain, Yaser Sheikh
  • Learning to Predict Gaze in Egocentric Video [pdf] - Yin Li, Alireza Fathi, James M. Rehg
  • Calibration-Free Gaze Estimation Using Human Gaze Patterns [pdf] - Fares Alnajar, Theo Gevers, Roberto Valenti, Sennay Ghebreab
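To make the spatial-position factor concrete: an isotropic 2D Cauchy distribution is a multivariate t with one degree of freedom, which is easy to sample and produces the heavy-tailed, Lévy-flight-style jumps the abstract describes. A minimal sketch (the scale, seed, and sample count are illustrative assumptions, not values from the paper):

```python
import numpy as np

def sample_gaze_shifts(n, scale=20.0, seed=0):
    """Sample n 2D gaze shifts from an isotropic bivariate Cauchy
    (a multivariate t with 1 degree of freedom): normal draws divided
    by the square root of an independent chi-square(1) draw."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 2))
    u = rng.chisquare(df=1, size=(n, 1))
    return scale * z / np.sqrt(u)

shifts = sample_gaze_shifts(10000)
lengths = np.linalg.norm(shifts, axis=1)
# Heavy tails: most shifts are short, but a few jumps are far larger
# than the median -- the hallmark of a Levy flight.
```

Unlike a Gaussian, the Cauchy has no finite mean or variance, so occasional very long saccade-like jumps are expected rather than pathological.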
SGTD: Structure Gradient and Texture Decorrelating Regularization for Image Decomposition [pdf]
Qiegen Liu, Jianbo Liu, Pei Dong, Dong Liang

Abstract: This paper presents a novel structure gradient and texture decorrelating regularization (SGTD) for image decomposition. The model is motivated by the assumption that the structure gradient and texture components should be properly decorrelated for a successful decomposition. The proposed model consists of the data fidelity term, total variation regularization and the SGTD regularization. An augmented Lagrangian method is proposed to address this optimization problem, by first transforming the unconstrained problem to an equivalent constrained problem and then applying an alternating direction method to iteratively solve the subproblems. Experimental results demonstrate that the proposed method performs better than or comparably to state-of-the-art methods.
Similar papers:
  • A Method of Perceptual-Based Shape Decomposition [pdf] - Chang Ma, Zhongqian Dong, Tingting Jiang, Yizhou Wang, Wen Gao
  • A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models [pdf] - Peihua Li, Qilong Wang, Lei Zhang
  • A Simple Model for Intrinsic Image Decomposition with Depth Cues [pdf] - Qifeng Chen, Vladlen Koltun
  • Example-Based Facade Texture Synthesis [pdf] - Dengxin Dai, Hayko Riemenschneider, Gerhard Schmitt, Luc Van_Gool
  • Shape Index Descriptors Applied to Texture-Based Galaxy Analysis [pdf] - Kim Steenstrup Pedersen, Kristoffer Stensbo-Smidt, Andrew Zirm, Christian Igel
Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition [pdf]
Hans Lobel, Rene Vidal, Alvaro Soto

Abstract: Currently, Bag-of-Visual-Words (BoVW) and part-based methods are the most popular approaches for visual recognition. In both cases, a mid-level representation is built on top of low-level image descriptors and top-level classifiers use this mid-level representation to achieve visual recognition. While in current part-based approaches mid- and top-level representations are usually jointly trained, this is not the usual case for BoVW schemes. A main reason for this is the complex data association problem related to the large dictionary size typically needed by BoVW approaches. As a further observation, typical solutions based on BoVW and part-based representations are usually limited to extensions of binary classification schemes, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach to visual recognition based on a BoVW scheme that jointly learns suitable mid- and top-level representations. Furthermore, using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test our proposed method using several popular benchmark datasets. As our main result, we demonstrate that, by coupling the learning of mid- and top-level representations, the proposed approach fosters sharing of discriminative visual words among target classes, achieving state-of-the-art recognition performance with far fewer visual words than previous approaches.
Similar papers:
  • Coupled Dictionary and Feature Space Learning with Applications to Cross-Domain Image Synthesis and Recognition [pdf] - De-An Huang, Yu-Chiang Frank Wang
  • Robust Dictionary Learning by Error Source Decomposition [pdf] - Zhuoyuan Chen, Ying Wu
  • Multi-attributed Dictionary Learning for Sparse Coding [pdf] - Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
Two-Point Gait: Decoupling Gait from Body Shape [pdf]
Stephen Lombardi, Ko Nishino, Yasushi Makihara, Yasushi Yagi

Abstract: Human gait modeling (e.g., for person identification) largely relies on image-based representations that muddle gait with body shape. Silhouettes, for instance, inherently entangle body shape and gait. For gait analysis and recognition, decoupling these two factors is desirable. Most importantly, once decoupled, they can be combined for the task at hand, but not if left entangled in the first place. In this paper, we introduce Two-Point Gait, a gait representation that encodes the limb motions regardless of the body shape. Two-Point Gait is directly computed on the image sequence based on the two-point statistics of optical flow fields. We demonstrate its use for exploring the space of human gait and gait recognition under large clothing variation. The results show that we can achieve state-of-the-art person recognition accuracy on a challenging dataset.
Similar papers:
  • Towards Understanding Action Recognition [pdf] - Hueihan Jhuang, Juergen Gall, Silvia Zuffi, Cordelia Schmid, Michael J. Black
  • Joint Segmentation and Pose Tracking of Human in Natural Videos [pdf] - Taegyu Lim, Seunghoon Hong, Bohyung Han, Joon Hee Han
  • Real-Time Body Tracking with One Depth Camera and Inertial Sensors [pdf] - Thomas Helten, Meinard Muller, Hans-Peter Seidel, Christian Theobalt
  • Strong Appearance and Expressive Spatial Models for Human Pose Estimation [pdf] - Leonid Pishchulin, Mykhaylo Andriluka, Peter Gehler, Bernt Schiele
  • Estimating Human Pose with Flowing Puppets [pdf] - Silvia Zuffi, Javier Romero, Cordelia Schmid, Michael J. Black
Active Visual Recognition with Expertise Estimation in Crowdsourcing [pdf]
Chengjiang Long, Gang Hua, Ashish Kapoor

Abstract: We present a noise-resilient probabilistic model for active learning of a Gaussian process classifier from crowds, i.e., a set of noisy labelers. It explicitly models both the overall label noise and the expertise level of each individual labeler in two levels of flip models. Expectation propagation is adopted for efficient approximate Bayesian inference of our probabilistic model for classification, based on which a generalized EM algorithm is derived to estimate both the global label noise and the expertise of each individual labeler. The probabilistic nature of our model immediately allows the adoption of the prediction entropy and estimated expertise for active selection of data samples to be labeled, and active selection of high-quality labelers to label the data, respectively. We apply the proposed model to three visual recognition tasks, i.e., object category recognition, gender recognition, and multi-modal activity recognition, on three datasets with real crowd-sourced labels from Amazon Mechanical Turk. The experiments clearly demonstrate the efficacy of the proposed model.
Similar papers:
  • Active MAP Inference in CRFs for Efficient Semantic Segmentation [pdf] - Gemma Roig, Xavier Boix, Roderick De_Nijs, Sebastian Ramos, Koljia Kuhnlenz, Luc Van_Gool
  • Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification [pdf] - Bo Wang, Zhuowen Tu, John K. Tsotsos
  • Active Learning of an Action Detector from Untrimmed Videos [pdf] - Sunil Bandla, Kristen Grauman
  • A Convex Optimization Framework for Active Learning [pdf] - Ehsan Elhamifar, Guillermo Sapiro, Allen Yang, S. Shankar Sasrty
  • Collaborative Active Learning of a Kernel Machine Ensemble for Recognition [pdf] - Gang Hua, Chengjiang Long, Ming Yang, Yan Gao
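The prediction-entropy criterion the abstract mentions is a standard active-learning building block: query the samples whose predictive distribution is most uncertain. A minimal sketch for binary predictions (the probabilities below are made up for illustration; this is not the paper's GP-based model):

```python
import numpy as np

def predictive_entropy(p):
    """Entropy of a Bernoulli predictive probability p = P(y=1 | x).
    Entropy-based active selection queries the samples where this is
    largest, i.e. where the classifier is least certain."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

probs = np.array([0.5, 0.9, 0.99])  # hypothetical predictive probabilities
# The sample with p = 0.5 is maximally uncertain and would be queried first.
query_idx = int(np.argmax(predictive_entropy(probs)))
```

The paper additionally weighs estimated labeler expertise when choosing who should label the queried sample; that second criterion is omitted here.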
Transfer Feature Learning with Joint Distribution Adaptation [pdf]
Mingsheng Long, Jianmin Wang, Guiguang Ding, Jiaguang Sun, Philip S. Yu

Abstract: Transfer learning is established as an effective technology in computer vision for leveraging rich labeled data in the source domain to build an accurate classifier for the target domain. However, most prior methods have not simultaneously reduced the difference in both the marginal distribution and conditional distribution between domains. In this paper, we put forward a novel transfer learning approach, referred to as Joint Distribution Adaptation (JDA). Specifically, JDA aims to jointly adapt both the marginal distribution and conditional distribution in a principled dimensionality reduction procedure, and construct a new feature representation that is effective and robust for substantial distribution differences. Extensive experiments verify that JDA can significantly outperform several state-of-the-art methods on four types of cross-domain image classification problems.
Similar papers:
  • Domain Transfer Support Vector Ranking for Person Re-identification without Target Camera Label Information [pdf] - Andy J. Ma, Pong C. Yuen, Jiawei Li
  • Frustratingly Easy NBNN Domain Adaptation [pdf] - Tatiana Tommasi, Barbara Caputo
  • Domain Adaptive Classification [pdf] - Fatemeh Mirrashed, Mohammad Rastegari
  • Unsupervised Visual Domain Adaptation Using Subspace Alignment [pdf] - Basura Fernando, Amaury Habrard, Marc Sebban, Tinne Tuytelaars
  • Unsupervised Domain Adaptation by Domain Invariant Projection [pdf] - Mahsa Baktashmotlagh, Mehrtash T. Harandi, Brian C. Lovell, Mathieu Salzmann
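A common way to measure the marginal-distribution difference that methods in this family reduce is the Maximum Mean Discrepancy (MMD). With a linear kernel, the empirical MMD is just the squared distance between the domain feature means. A toy sketch with synthetic data (the shapes, shift, and "adaptation" step are illustrative assumptions, not JDA's actual projection):

```python
import numpy as np

def mmd_linear(Xs, Xt):
    """Empirical MMD with a linear kernel: squared distance between the
    source and target feature means. Joint-adaptation methods minimise
    this term (plus a per-class analogue for conditional distributions)
    while learning a low-dimensional projection."""
    return float(np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2))

rng = np.random.default_rng(1)
Xs = rng.normal(loc=0.0, size=(200, 5))          # source features
Xt = rng.normal(loc=1.0, size=(200, 5))          # shifted target domain
# Crude "adaptation": translate the target onto the source mean.
Xt_adapted = Xt - Xt.mean(axis=0) + Xs.mean(axis=0)
```

After the translation the linear-kernel MMD is (numerically) zero, illustrating what the marginal term of the objective rewards.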
From Semi-supervised to Transfer Counting of Crowds [pdf]
Chen Change Loy, Shaogang Gong, Tao Xiang

Abstract: Regression-based techniques have shown promising results for people counting in crowded scenes. However, most existing techniques require expensive and laborious data annotation for model training. In this study, we propose to address this problem from three perspectives: (1) Instead of exhaustively annotating every single frame, the most informative frames are selected for annotation automatically and actively. (2) Rather than learning from only labelled data, the abundant unlabelled data are exploited. (3) Labelled data from other scenes are employed to further alleviate the burden of data annotation. All three ideas are implemented in a unified active and semi-supervised regression framework with the ability to perform transfer learning, by exploiting the underlying geometric structure of crowd patterns via manifold analysis. Extensive experiments validate the effectiveness of our approach.
Similar papers:
  • A Practical Transfer Learning Algorithm for Face Verification [pdf] - Xudong Cao, David Wipf, Fang Wen, Genquan Duan, Jian Sun
  • Transfer Feature Learning with Joint Distribution Adaptation [pdf] - Mingsheng Long, Jianmin Wang, Guiguang Ding, Jiaguang Sun, Philip S. Yu
  • Alternating Regression Forests for Object Detection and Pose Estimation [pdf] - Samuel Schulter, Christian Leistner, Paul Wohlhart, Peter M. Roth, Horst Bischof
  • Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests [pdf] - Danhang Tang, Tsz-Ho Yu, Tae-Kyun Kim
  • Manifold Based Face Synthesis from Sparse Samples [pdf] - Hongteng Xu, Hongyuan Zha
Abnormal Event Detection at 150 FPS in MATLAB [pdf]
Cewu Lu, Jianping Shi, Jiaya Jia

Abstract: Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Based on the inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves decent performance in the detection phase without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem into one in which only a few costless small-scale least-squares optimization steps are involved. Our method reaches high detection rates on benchmark datasets at a speed of 140-150 frames per second on average when computing on an ordinary desktop PC using MATLAB.
Similar papers:
  • Event Recognition in Photo Collections with a Stopwatch HMM [pdf] - Lukas Bossard, Matthieu Guillaumin, Luc Van_Gool
  • Latent Space Sparse Subspace Clustering [pdf] - Vishal M. Patel, Hien Van Nguyen, Rene Vidal
  • How Related Exemplars Help Complex Event Detection in Web Videos? [pdf] - Yi Yang, Zhigang Ma, Zhongwen Xu, Shuicheng Yan, Alexander G. Hauptmann
  • Event Detection in Complex Scenes Using Interval Temporal Constraints [pdf] - Yifan Zhang, Qiang Ji, Hanqing Lu
  • Modeling 4D Human-Object Interactions for Event and Object Recognition [pdf] - Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu
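The core of the speed argument above is that scoring a test feature reduces to tiny least-squares problems: a feature that is well reconstructed by a basis learned from normal events is normal, and a large residual flags an anomaly. A simplified single-basis sketch (the paper learns many small sparse combinations; here one fixed synthetic basis stands in for them):

```python
import numpy as np

def reconstruction_error(x, B):
    """Least-squares reconstruction error of feature x under basis B.
    Solving min_c ||x - B c||^2 has a closed form, which is why this
    style of detector can run at video rate."""
    c, *_ = np.linalg.lstsq(B, x, rcond=None)
    return float(np.linalg.norm(x - B @ c))

rng = np.random.default_rng(2)
B = rng.normal(size=(50, 8))         # basis standing in for "normal" patterns
normal = B @ rng.normal(size=8)      # lies exactly in the span of B
abnormal = rng.normal(size=50)       # generic vector, mostly outside the span
```

Thresholding the residual then gives a per-feature normal/abnormal decision.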
Correlation Adaptive Subspace Segmentation by Trace Lasso [pdf]
Canyi Lu, Jiashi Feng, Zhouchen Lin, Shuicheng Yan

Abstract: This paper studies the subspace segmentation problem. Given a set of data points drawn from a union of subspaces, the goal is to partition them into the underlying subspaces they were drawn from. The spectral clustering method is used as the framework. It requires finding an affinity matrix which is close to block diagonal, with nonzero entries corresponding to the data point pairs from the same subspace. In this work, we argue that both sparsity and the grouping effect are important for subspace segmentation. A sparse affinity matrix tends to be block diagonal, with fewer connections between data points from different subspaces. The grouping effect ensures that highly correlated data, which are usually from the same subspace, can be grouped together. Sparse Subspace Clustering (SSC), by using l1-minimization, encourages sparsity for data selection, but it lacks the grouping effect. On the contrary, Low-Rank Representation (LRR), by rank minimization, and Least Squares Regression (LSR), by l2-regularization, exhibit a strong grouping effect, but they fall short in subset selection. Thus the obtained affinity matrix is usually very sparse by SSC, yet very dense by LRR and LSR. In this work, we propose the Correlation Adaptive Subspace Segmentation (CASS) method by using trace Lasso. CASS is a data-correlation-dependent method which simultaneously performs automatic data selection and groups correlated data together. It can be regarded as a method which adap
Similar papers:
  • Perspective Motion Segmentation via Collaborative Clustering [pdf] - Zhuwen Li, Jiaming Guo, Loong-Fah Cheong, Steven Zhiying Zhou
  • Latent Space Sparse Subspace Clustering [pdf] - Vishal M. Patel, Hien Van Nguyen, Rene Vidal
  • Distributed Low-Rank Subspace Segmentation [pdf] - Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan
  • Robust Subspace Clustering via Half-Quadratic Minimization [pdf] - Yingya Zhang, Zhenan Sun, Ran He, Tieniu Tan
  • Correntropy Induced L2 Graph for Robust Subspace Clustering [pdf] - Canyi Lu, Jinhui Tang, Min Lin, Liang Lin, Shuicheng Yan, Zhouchen Lin
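The "close to block diagonal" intuition in the abstract can be checked with a graph Laplacian: for an ideal affinity matrix, the multiplicity of the zero eigenvalue equals the number of subspaces. A toy sketch with a hand-built two-block affinity (not the trace-Lasso affinity itself):

```python
import numpy as np

def n_components_from_affinity(W, tol=1e-8):
    """Count connected components of an affinity matrix via its
    unnormalised graph Laplacian L = D - W: the multiplicity of the zero
    eigenvalue equals the number of components. A block-diagonal affinity
    therefore yields one component per subspace, which is why spectral
    clustering recovers the segmentation from such a matrix."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < tol))

# Two perfectly separated blocks of three points each -> two components.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
```

Real affinities from SSC/LRR/LSR/CASS are only approximately block diagonal, so in practice one runs k-means on the bottom eigenvectors rather than counting exact zeros.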
Correntropy Induced L2 Graph for Robust Subspace Clustering [pdf]
Canyi Lu, Jinhui Tang, Min Lin, Liang Lin, Shuicheng Yan, Zhouchen Lin

Abstract: In this paper, we study the robust subspace clustering problem, which aims to cluster the given, possibly noisy, data points into their underlying subspaces. A large pool of previous subspace clustering methods focus on graph construction via different regularizations of the representation coefficients. We instead focus on the robustness of the model to non-Gaussian noises. We propose a new robust clustering method using the correntropy induced metric, which is robust for handling non-Gaussian and impulsive noises. We also further extend the method to handle data with outlier rows/features. The multiplicative form of half-quadratic optimization is used to optimize the non-convex correntropy objective function of the proposed models. Extensive experiments on face datasets demonstrate that the proposed methods are more robust to corruptions and occlusions.
Similar papers:
  • GOSUS: Grassmannian Online Subspace Updates with Structured-Sparsity [pdf] - Jia Xu, Vamsi K. Ithapu, Lopamudra Mukherjee, James M. Rehg, Vikas Singh
  • Distributed Low-Rank Subspace Segmentation [pdf] - Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan
  • Latent Space Sparse Subspace Clustering [pdf] - Vishal M. Patel, Hien Van Nguyen, Rene Vidal
  • Correlation Adaptive Subspace Segmentation by Trace Lasso [pdf] - Canyi Lu, Jiashi Feng, Zhouchen Lin, Shuicheng Yan
  • Robust Subspace Clustering via Half-Quadratic Minimization [pdf] - Yingya Zhang, Zhenan Sun, Ran He, Tieniu Tan
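The correntropy induced metric corresponds to the Welsch loss, which grows like a squared error for small residuals but saturates for large ones, so impulsive outliers cannot dominate the objective. Its multiplicative half-quadratic form turns the non-convex problem into iteratively reweighted least squares. A minimal sketch of the two ingredients (σ is a bandwidth the user would tune; this is not the paper's full solver):

```python
import numpy as np

def welsch_loss(r, sigma=1.0):
    """Correntropy-induced (Welsch) loss: approximately r^2 / (2 sigma^2)
    for small residuals, saturating at 1 for large ones -- the property
    that makes it robust to impulsive, non-Gaussian noise."""
    return 1.0 - np.exp(-(r ** 2) / (2.0 * sigma ** 2))

def half_quadratic_weight(r, sigma=1.0):
    """Auxiliary weight from the multiplicative half-quadratic form:
    residuals near zero get weight ~1, large residuals (outliers) get
    weight ~0 and are effectively dropped from each reweighted
    least-squares step."""
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))
```

Alternating between updating these weights and solving the resulting weighted least-squares problem is the standard half-quadratic optimization loop the abstract refers to.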
Face Recognition Using Face Patch Networks [pdf]
Chaochao Lu, Deli Zhao, Xiaoou Tang

Abstract: When face images are taken in the wild, the large variations in facial pose, illumination, and expression make face recognition challenging. The most fundamental problem for face recognition is to measure the similarity between faces. Traditional measurements such as various mathematical norms, the Hausdorff distance, and approximate geodesic distance cannot accurately capture the structural information between faces in such complex circumstances. To address this issue, we develop a novel face patch network, based on which we define a new similarity measure called the random path (RP) measure. The RP measure is derived from the collective similarity of paths by performing random walks in the network. It can globally characterize the contextual and curved structures of the face space. To apply the RP measure, we construct two kinds of networks: the in-face network and the out-face network. The in-face network is drawn from any two face images and captures the local structural information. The out-face network is constructed from all the training face patches, thereby modeling the global structures of face space. The two face networks are structurally complementary and can be combined together to improve the recognition performance. Experiments on the Multi-PIE and LFW benchmarks show that the RP measure outperforms most of the state-of-the-art algorithms for face recognition.
Similar papers:
  • Fast Face Detector Training Using Tailored Views [pdf] - Kristina Scherbaum, James Petterson, Rogerio S. Feris, Volker Blanz, Hans-Peter Seidel
  • Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition [pdf] - Yizhe Zhang, Ming Shao, Edward K. Wong, Yun Fu
  • Hidden Factor Analysis for Age Invariant Face Recognition [pdf] - Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
  • Face Recognition via Archetype Hull Ranking [pdf] - Yuanjun Xiong, Wei Liu, Deli Zhao, Xiaoou Tang
  • Deep Learning Identity-Preserving Face Space [pdf] - Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Image Set Classification Using Holistic Multiple Order Statistics Features and Localized Multi-kernel Metric Learning [pdf]
Jiwen Lu, Gang Wang, Pierre Moulin

Abstract: This paper presents a new approach for image set classification, where each training and testing example contains a set of image instances of an object captured from varying viewpoints or under varying illuminations. While a number of image set classification methods have been proposed in recent years, most of them model each image set as a single linear subspace or a mixture of linear subspaces, which may lose some discriminative information for classification. To address this, we propose exploring multiple order statistics as features of image sets, and develop a localized multi-kernel metric learning (LMKML) algorithm to effectively combine different order statistics information for classification. Our method achieves state-of-the-art performance on four widely used databases including the Honda/UCSD, CMU MoBo, and YouTube face datasets, and the ETH-80 object dataset.
Similar papers:
  • Hidden Factor Analysis for Age Invariant Face Recognition [pdf] - Dihong Gong, Zhifeng Li, Dahua Lin, Jianzhuang Liu, Xiaoou Tang
  • On One-Shot Similarity Kernels: Explicit Feature Maps and Properties [pdf] - Stefanos Zafeiriou, Irene Kotsia
  • Similarity Metric Learning for Face Recognition [pdf] - Qiong Cao, Yiming Ying, Peng Li
  • Robust Feature Set Matching for Partial Face Recognition [pdf] - Renliang Weng, Jiwen Lu, Junlin Hu, Gao Yang, Yap-Peng Tan
  • From Point to Set: Extend the Learning of Distance Metrics [pdf] - Pengfei Zhu, Lei Zhang, Wangmeng Zuo, David Zhang
A Deep Sum-Product Architecture for Robust Facial Attributes Analysis [pdf]
Ping Luo, Xiaogang Wang, Xiaoou Tang

Abstract: Recent works have shown that facial attributes are useful in a number of applications such as face recognition and retrieval. However, estimating attributes in images with large variations remains a big challenge. This challenge is addressed in this paper. Unlike existing methods that assume the independence of attributes during their estimation, our approach captures the interdependencies of local regions for each attribute, as well as the high-order correlations between different attributes, which makes it more robust to occlusions and misdetection of face regions. First, we model region interdependencies with a discriminative decision tree, where each node consists of a detector and a classifier trained on a local region. The detector allows us to locate the region, while the classifier determines the presence or absence of an attribute. Second, correlations of attributes and attribute predictors are modeled by organizing all of the decision trees into a large sum-product network (SPN), which is learned by the EM algorithm and yields the most probable explanation (MPE) of the facial attributes in terms of the regions' localization and classification. Experimental results on a large data set with 22,400 images show the effectiveness of the proposed approach.
Similar papers:
  • Hybrid Deep Learning for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • Attribute Adaptation for Personalized Image Search [pdf] - Adriana Kovashka, Kristen Grauman
  • Attribute Dominance: What Pops Out? [pdf] - Naman Turakhia, Devi Parikh
  • Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing [pdf] - Amir Sadovnik, Andrew Gallagher, Devi Parikh, Tsuhan Chen
  • A Unified Probabilistic Approach Modeling Relationships between Attributes and Objects [pdf] - Xiaoyang Wang, Qiang Ji
Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps [pdf]
Jiajia Luo, Wei Wang, Hairong Qi

Abstract: Human action recognition based on the depth information provided by commodity depth sensors is an important yet challenging task. The noisy depth maps, different lengths of action sequences, and free styles in performing actions may cause large intra-class variations. In this paper, a new framework based on sparse coding and temporal pyramid matching (TPM) is proposed for depth-based human action recognition. In particular, a discriminative class-specific dictionary learning algorithm is proposed for sparse coding. By adding group sparsity and geometry constraints, features can be well reconstructed by the sub-dictionary belonging to the same class, and the geometry relationships among features are also kept in the calculated coefficients. The proposed approach is evaluated on two benchmark datasets captured by depth cameras. Experimental results show that the proposed algorithm repeatedly achieves performance superior to the state-of-the-art algorithms. Moreover, the proposed dictionary learning method also outperforms classic dictionary learning approaches.
Similar papers:
  • Semi-supervised Robust Dictionary Learning via Efficient l-Norms Minimization [pdf] - Hua Wang, Feiping Nie, Weidong Cai, Heng Huang
  • Learning View-Invariant Sparse Representations for Cross-View Action Recognition [pdf] - Jingjing Zheng, Zhuolin Jiang
  • Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications [pdf] - Yu-Tseh Chi, Mohsen Ali, Muhammad Rushdi, Jeffrey Ho
  • Multi-attributed Dictionary Learning for Sparse Coding [pdf] - Chen-Kuo Chiang, Te-Feng Su, Chih Yen, Shang-Hong Lai
  • Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration [pdf] - Chenglong Bao, Jian-Feng Cai, Hui Ji
Pedestrian Parsing via Deep Decompositional Network [pdf]
Ping Luo, Xiaogang Wang, Xiaoou Tang

Abstract: We propose a new Deep Decompositional Network (DDN) for parsing pedestrian images into semantic regions, such as hair, head, body, arms, and legs, where the pedestrians can be heavily occluded. Unlike existing methods based on template matching or Bayesian inference, our approach directly maps low-level visual features to the label maps of body parts with DDN, which is able to accurately estimate complex pose variations with good robustness to occlusions and background clutter. DDN jointly estimates occluded regions and segments body parts by stacking three types of hidden layers: occlusion estimation layers, completion layers, and decomposition layers. The occlusion estimation layers estimate a binary mask, indicating which part of a pedestrian is invisible. The completion layers synthesize low-level features of the invisible part from the original features and the occlusion mask. The decomposition layers directly transform the synthesized visual features to label maps. We devise a new strategy to pre-train these hidden layers, and then fine-tune the entire network using stochastic gradient descent. Experimental results show that our approach achieves better segmentation accuracy than the state-of-the-art methods on pedestrian images with or without occlusions. Another important contribution of this paper is that it provides a large-scale benchmark human parsing dataset that includes 3,673 annotated samples collected from 171 surveillance videos. It is 20 times large
Similar papers:
  • Modeling Occlusion by Discriminative AND-OR Structures [pdf] - Bo Li, Wenze Hu, Tianfu Wu, Song-Chun Zhu
  • A Deep Sum-Product Architecture for Robust Facial Attributes Analysis [pdf] - Ping Luo, Xiaogang Wang, Xiaoou Tang
  • Hybrid Deep Learning for Face Verification [pdf] - Yi Sun, Xiaogang Wang, Xiaoou Tang
  • Multi-stage Contextual Deep Learning for Pedestrian Detection [pdf] - Xingyu Zeng, Wanli Ouyang, Xiaogang Wang
  • Joint Deep Learning for Pedestrian Detection [pdf] - Wanli Ouyang, Xiaogang Wang
A Method of Perceptual-Based Shape Decomposition [pdf]
Chang Ma, Zhongqian Dong, Tingting Jiang, Yizhou Wang, Wen Gao

Abstract: In this paper, we propose a novel perception-based shape decomposition method which aims to decompose a shape into semantically meaningful parts. In addition to three popular perception rules (the Minima rule, the Short-cut rule and the Convexity rule) in shape decomposition, we propose a new rule, named the part-similarity rule, to encourage consistent partition of similar parts. The problem is formulated as a quadratically constrained quadratic program (QCQP) and is solved by a trust-region method. Experimental results on the MPEG-7 dataset show that we can obtain a shape decomposition more consistent with human perception compared with other state-of-the-art methods, both qualitatively and quantitatively. Finally, we show the advantage of semantic parts over non-meaningful parts in object detection on the ETHZ dataset.
Similar papers:
  • Building Part-Based Object Detectors via 3D Geometry [pdf] - Abhinav Shrivastava, Abhinav Gupta
  • A Fully Hierarchical Approach for Finding Correspondences in Non-rigid Shapes [pdf] - Ivan Sipiran, Benjamin Bustos
  • Bird Part Localization Using Exemplar-Based Models with Enforced Pose and Subcategory Consistency [pdf] - Jiongxin Liu, Peter N. Belhumeur
  • Learning a Dictionary of Shape Epitomes with Applications to Image Labeling [pdf] - Liang-Chieh Chen, George Papandreou, Alan L. Yuille
  • Cascaded Shape Space Pruning for Robust Facial Landmark Detection [pdf] - Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
Action Recognition and Localization by Hierarchical Space-Time Segments [pdf]
Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, Stan Sclaroff

Abstract: We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments and preserves their hierarchical and temporal relationships. Using a simple linear SVM on the resulting bag of hierarchical space-time segments representation, we attain action recognition performance better than, or comparable to, the state of the art on two challenging benchmark datasets, while also producing good action localization results.
Similar papers:
  • Action Recognition with Actons [pdf] - Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
  • Concurrent Action Detection with Structural Prediction [pdf] - Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
  • The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection [pdf] - Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
  • Learning Maximum Margin Temporal Warping for Action Recognition [pdf] - Jiang Wang, Ying Wu
  • Video Segmentation by Tracking Many Figure-Ground Segments [pdf] - Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, James M. Rehg
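The abstract above classifies a video by pooling its space-time segments into a bag-of-segments histogram and scoring it with a linear SVM. A minimal sketch of that final step, assuming segments have already been quantized against a codebook (the function name, vocabulary size, and weights below are illustrative, not from the paper):

```python
import numpy as np

def bag_of_segments_histogram(segment_codes, vocab_size):
    """L1-normalized histogram over quantized space-time segment codes."""
    hist = np.bincount(segment_codes, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)

# Hypothetical codes for one video, produced offline by clustering
# segment descriptors into a 4-word codebook.
h = bag_of_segments_histogram(np.array([0, 2, 2, 1, 2]), vocab_size=4)

# A linear SVM then scores the video as w . h + b, with w, b learned
# elsewhere; here they are arbitrary placeholder values.
w, b = np.array([0.5, -0.2, 1.0, 0.0]), -0.3
score = float(w @ h) + b
```

The histogram itself is what the linear SVM consumes; hierarchy and temporal relations from the representation would enter through how segments are extracted and grouped, which this sketch does not model.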
Constant Time Weighted Median Filtering for Stereo Matching and Beyond [pdf]
Ziyang Ma, Kaiming He, Yichen Wei, Jian Sun, Enhua Wu

Abstract: Despite continuous advances in local stereo matching over the years, most efforts have gone into developing robust cost computation and aggregation methods; little attention has been paid to disparity refinement. In this work, we study weighted median filtering for disparity refinement. We discover that with this refinement, even simple box filter aggregation achieves accuracy comparable to various sophisticated aggregation methods (with the same refinement). This is due to the weighted median filter's property of removing outlier error while respecting edges and structures, and it reveals that the previously overlooked refinement step can be at least as crucial as aggregation. We also develop the first constant time algorithm for the previously time-consuming weighted median filter, making the simple combination of box aggregation + weighted median an attractive solution in practice for both speed and accuracy. As a byproduct, the fast weighted median filter unleashes its potential in other applications that were hampered by high complexity. We show its superiority in applications such as depth upsampling, clip-art JPEG artifact removal, and image stylization.
Similar papers:
  • Multi-channel Correlation Filters [pdf] - Hamed Kiani Galoogahi, Terence Sim, Simon Lucey
  • Training Deformable Part Models with Decorrelated Features [pdf] - Ross Girshick, Jitendra Malik
  • Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation [pdf] - David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Ruether, Horst Bischof
  • A Rotational Stereo Model Based on XSlit Imaging [pdf] - Jinwei Ye, Yu Ji, Jingyi Yu
  • PM-Huber: PatchMatch with Huber Regularization for Stereo Matching [pdf] - Philipp Heise, Sebastian Klose, Brian Jensen, Alois Knoll
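The paper's contribution is a constant time weighted median algorithm; as a reference point for what the operator computes, here is a naive (per-window, sort-based) weighted median sketched in NumPy, applied to one sample of a disparity row with edge-aware weights from a guide intensity signal. The signals, window radius, and bandwidth are made-up illustrative values, and this is not the authors' O(1) algorithm:

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: sort values, return the first value at which the
    cumulative weight reaches half of the total weight."""
    order = np.argsort(values)
    v = np.asarray(values, float)[order]
    w = np.asarray(weights, float)[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

# Toy example: a noisy disparity row filtered at position i with weights
# from a guide (intensity) signal, so the depth edge at index 3 is respected.
guide = np.array([10.0, 10.0, 10.0, 200.0, 200.0, 200.0])
disp = np.array([5.0, 5.0, 9.0, 12.0, 12.0, 12.0])  # 9.0 is an outlier
i, r, sigma = 2, 2, 30.0
lo, hi = max(0, i - r), min(len(disp), i + r + 1)
w = np.exp(-((guide[lo:hi] - guide[i]) ** 2) / (2 * sigma**2))
filtered = weighted_median(disp[lo:hi], w)  # outlier replaced by a same-side value
```

Pixels across the intensity edge get near-zero weight, so the outlier is pulled toward its own region's disparity rather than blurred across the edge, which is the edge-respecting behavior the abstract describes.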
Domain Transfer Support Vector Ranking for Person Re-identification without Target Camera Label Information [pdf]
Andy J. Ma, Pong C. Yuen, Jiawei Li

Abstract: This paper addresses a new person re-identification problem without label information for persons under non-overlapping target cameras. Given the matched (positive) and unmatched (negative) image pairs from source domain cameras, as well as unmatched (negative) image pairs which can be easily generated from target domain cameras, we propose a Domain Transfer Ranked Support Vector Machines (DTRSVM) method for re-identification under target domain cameras. To overcome the problems introduced by the absence of matched (positive) image pairs in the target domain, we relax the discriminative constraint to a necessary condition relying only on the positive mean in the target domain. By estimating the target positive mean using source and target domain data, we develop a new discriminative model with high confidence in the target positive mean and low confidence in target negative image pairs. Since the necessary condition may not truly preserve discriminability, multi-task support vector ranking is proposed to incorporate the training data from the source domain with label information. Experimental results show that the proposed DTRSVM outperforms existing methods without using label information in target cameras, and the top-30 rank accuracy can be improved by up to 9.40% on publicly available person re-identification datasets.
Similar papers:
  • Cross-View Action Recognition over Heterogeneous Feature Spaces [pdf] - Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia
  • Frustratingly Easy NBNN Domain Adaptation [pdf] - Tatiana Tommasi, Barbara Caputo
  • Unsupervised Domain Adaptation by Domain Invariant Projection [pdf] - Mahsa Baktashmotlagh, Mehrtash T. Harandi, Brian C. Lovell, Mathieu Salzmann
  • Unsupervised Visual Domain Adaptation Using Subspace Alignment [pdf] - Basura Fernando, Amaury Habrard, Marc Sebban, Tinne Tuytelaars
  • Domain Adaptive Classification [pdf] - Fatemeh Mirrashed, Mohammad Rastegari
Latent Multitask Learning for View-Invariant Action Recognition [pdf]
Behrooz Mahasseni, Sinisa Todorovic

Abstract: This paper presents an approach to view-invariant action recognition, where human poses and motions exhibit large variations across different camera viewpoints. When each viewpoint of a given set of action classes is specified as a learning task, multitask learning appears suitable for achieving view invariance in recognition. We extend standard multitask learning to allow identifying: (1) latent groupings of action views (i.e., tasks), and (2) discriminative action parts, along with joint learning of all tasks. This is because it seems reasonable to expect that certain distinct views are more correlated than others, and thus identifying correlated views could improve recognition. Also, part-based modeling is expected to improve robustness against self-occlusion when actors are imaged from different views. Results on benchmark datasets show that we outperform standard multitask learning by 21.9%, and state-of-the-art alternatives by 4.56%.
Similar papers:
  • Learning View-Invariant Sparse Representations for Cross-View Action Recognition [pdf] - Jingjing Zheng, Zhuolin Jiang
  • Action Recognition with Actons [pdf] - Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
  • The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection [pdf] - Mihai Zanfir, Marius Leordeanu, Cristian Sminchisescu
  • Learning Maximum Margin Temporal Warping for Action Recognition [pdf] - Jiang Wang, Ying Wu
  • Concurrent Action Detection with Structural Prediction [pdf] - Ping Wei, Nanning Zheng, Yibiao Zhao, Song-Chun Zhu
Progressive Multigrid Eigensolvers for Multiscale Spectral Segmentation [pdf]
Michael Maire, Stella X. Yu

Abstract: We reexamine the role of multiscale cues in image segmentation using an architecture that constructs a globally coherent scale-space output representation. This characteristic is in contrast to many existing works on bottom-up segmentation, which prematurely compress information into a single scale. The architecture is a standard extension of Normalized Cuts from an image plane to an image pyramid, with cross-scale constraints enforcing consistency in the solution while allowing emergence of coarse-to-fine detail. We observe that multiscale processing, in addition to improving segmentation quality, offers a route by which to speed computation. We make a significant algorithmic advance in the form of a custom multigrid eigensolver for constrained Angular Embedding problems possessing coarse-to-fine structure. Multiscale Normalized Cuts is a special case. Our solver builds atop recent results on randomized matrix approximation, using a novel interpolation operation to mold its computational strategy according to cross-scale constraints in the problem definition. Applying our solver to multiscale segmentation problems demonstrates speedup by more than an order of magnitude. This speedup is at the algorithmic level and carries over to any implementation target.
Similar papers:
  • Volumetric Semantic Segmentation Using Pyramid Context Features [pdf] - Jonathan T. Barron, Mark D. Biggin, Pablo Arbelaez, David W. Knowles, Soile V.E. Keranen, Jitendra Malik
  • Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length [pdf] - Zuzana Kukelova, Martin Bujnak, Tomas Pajdla
  • Efficient Higher-Order Clustering on the Grassmann Manifold [pdf] - Suraj Jain, Venu Madhav Govindu
  • Coarse-to-Fine Semantic Video Segmentation Using Supervoxel Trees [pdf] - Aastha Jain, Shuanak Chatterjee, Rene Vidal
  • Distributed Low-Rank Subspace Segmentation [pdf] - Ameet Talwalkar, Lester Mackey, Yadong Mu, Shih-Fu Chang, Michael I. Jordan
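The multigrid eigensolver above accelerates the eigenproblem at the heart of Normalized Cuts. For orientation, a minimal single-scale sketch of that core computation: build a normalized graph Laplacian from a toy affinity matrix (a stand-in for pixel affinities), take its second eigenvector (the Fiedler vector), and threshold it to bipartition the graph. This uses a dense NumPy eigensolve on a 6-node toy graph, not the paper's multiscale multigrid solver:

```python
import numpy as np

# Toy affinity: two 3-node cliques joined by one weak edge.
n = 6
W = np.zeros((n, n))
for i in range(3):
    for j in range(3):
        if i != j:
            W[i, j] = W[i + 3, j + 3] = 1.0
W[2, 3] = W[3, 2] = 0.05  # weak cross-cluster link

# Normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}.
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

# Second-smallest eigenvector solves the relaxed normalized cut;
# rescaling by D^{-1/2} recovers the generalized eigenvector.
vals, vecs = np.linalg.eigh(L_sym)
fiedler = D_inv_sqrt @ vecs[:, 1]
labels = (fiedler > 0).astype(int)  # threshold at zero to bipartition
```

The two cliques receive opposite signs in the Fiedler vector, so thresholding separates them. The paper's solver addresses the cost of this eigensolve at scale, exploiting coarse-to-fine cross-scale constraints that this flat sketch does not have.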