Simplicity Alip

Post on 22-Nov-2015





this presentation is all about simplicity


  • Jia Li, Ph.D.

    The Pennsylvania State University

    Image Retrieval and Annotation via a Stochastic Modeling Approach

  • Outline

    • Introduction
    • Image retrieval: SIMPLIcity
    • Automatic annotation: ALIP
    • A stochastic modeling approach
    • Conclusions and future work

  • Image Retrieval

    • The retrieval of relevant images from an image database on the basis of automatically derived image features
    • Applications: biomedicine, defense, commercial, cultural, education, entertainment, Web
    • Approaches:
      • Color layout
      • Region based
      • User feedback

  • Building, sky, lake, landscape, Europe, tree

    Can a computer do this?

  • Outline

    • Introduction
    • Image retrieval: SIMPLIcity
    • Automatic annotation: ALIP
    • A stochastic modeling approach
    • Conclusions and future work

  • The SIMPLIcity System

    Semantics-sensitive Integrated Matching for Picture LIbraries

    Major features:
    • Sensitive to semantics: combines semantic classification with image retrieval
    • Region-based retrieval: wavelet-based feature extraction and k-means clustering
    • Reduced sensitivity to inaccurate segmentation and a simple user interface: Integrated Region Matching (IRM)

  • Wavelets

  • Fast Image Segmentation

    • Partition an image into blocks of 4×4 pixels
    • Extract wavelet-based features from each block
    • Use the k-means algorithm to cluster the feature vectors into regions
    • Compute the shape feature by normalized inertia
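
    The segmentation steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SIMPLIcity implementation: the feature set (tile mean plus Haar detail energies), the plain k-means loop, the inertia formula (one common definition), and every function name are choices made here for the example.

```python
import numpy as np

def block_features(img, block=4):
    """One feature vector per block x block tile of a grayscale image.

    Illustrative features (an assumption, not the system's exact set): the
    tile mean plus the RMS of the three detail bands of a one-level Haar
    transform computed inside the tile. `block` must be even.
    """
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape[0] // block, img.shape[1] // block
    # Shape (rows, cols, block, block): one small tile per grid cell.
    tiles = img[:rows * block, :cols * block]
    tiles = tiles.reshape(rows, block, cols, block).swapaxes(1, 2)
    a, b = tiles[..., 0::2, 0::2], tiles[..., 0::2, 1::2]
    c, d = tiles[..., 1::2, 0::2], tiles[..., 1::2, 1::2]
    details = [(a - b + c - d) / 2.0,   # differences across columns
               (a + b - c - d) / 2.0,   # differences across rows
               (a - b - c + d) / 2.0]   # diagonal differences
    feats = [tiles.mean(axis=(-2, -1))]
    feats += [np.sqrt((band ** 2).mean(axis=(-2, -1))) for band in details]
    return np.stack(feats, axis=-1).reshape(-1, 4), (rows, cols)

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep empty clusters where they are
                centers[j] = x[labels == j].mean(axis=0)
    return labels, centers

def segment(img, k=2, block=4):
    """Label map with one region index per block."""
    feats, grid = block_features(img, block)
    labels, _ = kmeans(feats, k)
    return labels.reshape(grid)

def normalized_inertia(mask, gamma=1.0):
    """Sum of ||x - centroid||^gamma over the region, divided by area^(1 + gamma/2)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([ys, xs], axis=1).astype(float)
    dist = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    return (dist ** gamma).sum() / len(pts) ** (1 + gamma / 2)
```

    Calling `segment(img, k=2)` on a grayscale array returns a grid of region labels, and `normalized_inertia(labels == r)` gives a scale-normalized shape value for region `r` measured in block units.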

  • IRM: Integrated Region Matching

    IRM defines an image-to-image distance as a weighted sum of region-to-region distances.

    The weighting matrix is determined by significance constraints and a greedy algorithm based on the most-similar-highest-priority (MSHP) principle.
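
    A minimal sketch of how such a greedy weighting could work, assuming each region carries a significance value (for example its area fraction) and that the significances of each image sum to 1. The function name `irm_distance` and its arguments are hypothetical; the region-to-region distances are taken as given.

```python
import numpy as np

def irm_distance(d, p, q):
    """Greedy most-similar-highest-priority matching (illustrative sketch).

    d : (m, n) array of region-to-region distances
    p : significances of the m regions of image 1 (assumed to sum to 1)
    q : significances of the n regions of image 2 (assumed to sum to 1)
    Returns the weighted-sum distance and the weighting matrix s.
    """
    d = np.asarray(d, dtype=float)
    p = np.array(p, dtype=float)   # copies, because they are consumed below
    q = np.array(q, dtype=float)
    s = np.zeros_like(d)
    # Visit region pairs from the most similar to the least similar.
    for flat in np.argsort(d, axis=None):
        i, j = np.unravel_index(flat, d.shape)
        w = min(p[i], q[j])        # as much weight as both regions still allow
        if w <= 0:
            continue
        s[i, j] = w
        p[i] -= w
        q[j] -= w
    return float((s * d).sum()), s
```

    Because every region ends up matched to at least one region of the other image, a region that was split or merged by the segmentation still contributes to the distance through its closest counterparts.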

  • A 3-D Example for IRM

  • IRM: Major Advantages

    • Reduces the influence of inaccurate segmentation
    • Helps to clarify the semantics of a particular region given its neighbors
    • Provides the user with a simple interface

  • Experiments and Results

    Speed
    • 800 MHz Pentium PC with Linux OS
    • Databases:
      • General-purpose image DB of 200,000 images (60,000 photographs + 140,000 hand-drawn art images)
      • 70,000 pathology image segments
    • Image indexing time: one second per image
    • Image retrieval time:
      • Without the scalable IRM: 1.5 seconds/query CPU time
      • With the scalable IRM: 0.15 second/query CPU time
      • External query: one extra second of CPU time


  • Current SIMPLIcity System

    Query Results

  • External Query

  • Robustness to Image Alterations

    • 10% brighten on average
    • 8% darken
    • Blurring with a 15x15 Gaussian filter
    • 70% sharpen
    • 20% more saturation
    • 10% less saturation
    • Shape distortions
    • Cropping, shifting, rotation

  • Status of SIMPLIcity

    • Researchers from more than 40 institutions and government agencies requested and obtained SIMPLIcity
    • We applied SIMPLIcity to:
      • Automatic image classification
      • Searching of pathological images
      • Searching of art and cultural images

  • Outline

    • Introduction
    • Image retrieval: SIMPLIcity
    • Automatic annotation: ALIP
    • A stochastic modeling approach
    • Conclusions and future work

  • Image Database

    • The image database contains categorized images.
    • Each category is annotated with a few words, for example:
      • Landscape, glacier
      • Africa, wildlife
    • Each category of images is referred to as a concept.

  • A Category of Images

    Annotation: man, male, people, cloth, face

  • ALIP: Automatic Linguistic Indexing for Pictures

    • Learn relations between annotation words and images using the training database.
    • Profile each category by a statistical image model: the 2-D Multiresolution Hidden Markov Model (2-D MHMM).
    • Assess the similarity between an image and a category by the image's likelihood under the category's profiling model.

  • Training Process

  • Automatic Annotation Process

  • Model: 2-D MHMM

    • Represent images by local features extracted at multiple resolutions.
    • Model the feature vectors and their inter- and intra-scale dependence.
    • 2-D MHMM finds modes of the feature vectors and characterizes their spatial dependence.

  • 2D HMM

    Regard an image as a grid. A feature vector is computed for each node.

    • Each node exists in a hidden state.
    • The states are governed by a Markov mesh (a causal Markov random field).
    • Given the state, the feature vector is conditionally independent of other feature vectors and follows a normal distribution.
    • The states are introduced to efficiently model the spatial dependence among feature vectors.
    • The states are not observable, which makes estimation difficult.
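
    The two modeling assumptions above can be written compactly. The notation is introduced here for illustration and is not taken from the slides: s_{i,j} is the hidden state at node (i,j), u_{i,j} its feature vector, and "earlier" refers to raster-scan order.

```latex
% Markov mesh: the state at (i,j) depends on the past only through
% the states directly above and directly to the left.
P\bigl(s_{i,j} \mid \text{states and feature vectors at all earlier nodes}\bigr)
  = a_{m,n,l}, \qquad m = s_{i-1,j},\; n = s_{i,j-1},\; l = s_{i,j}

% Gaussian emission: given its state, a feature vector is independent
% of everything else and normally distributed.
u_{i,j} \mid (s_{i,j} = s) \;\sim\; \mathcal{N}(\mu_s, \Sigma_s)
```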

  • 2D HMM

    The underlying states are governed by a Markov mesh.


  • 2D MHMM

    • An image is a pyramid grid.
    • A Markovian dependence is assumed across resolutions.
    • Given the state of a parent node, the states of its child nodes follow a Markov mesh with transition probabilities depending on the parent state.

  • 2D MHMM

    First-order Markov dependence across resolutions.

  • 2D MHMM

    The child nodes at resolution r of node (k,l) at resolution r-1:

    Conditional independence given the parent state:
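
    Assuming a quadtree pyramid in which every node has four children at the next finer resolution, the two relations named above can be written as follows. The symbols D(k,l) for the set of children, s^{(r)}_{i,j} for the state of node (i,j) at resolution r, and N^{(r)} for the set of nodes at resolution r are notation chosen here.

```latex
% Children of node (k,l): the 2x2 group beneath it at the next finer resolution.
D(k,l) = \{(2k,2l),\ (2k+1,2l),\ (2k,2l+1),\ (2k+1,2l+1)\}

% Given the parent states, groups of siblings under different parents
% are independent of one another.
P\bigl\{ s^{(r)}_{i,j} : (i,j) \in N^{(r)} \,\bigm|\, s^{(r-1)}_{k,l} : (k,l) \in N^{(r-1)} \bigr\}
  = \prod_{(k,l) \in N^{(r-1)}}
    P\bigl\{ s^{(r)}_{i,j} : (i,j) \in D(k,l) \,\bigm|\, s^{(r-1)}_{k,l} \bigr\}
```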

  • Annotation Process

    • Rank the categories by the likelihood of the image to be annotated under their profiling 2-D MHMMs.
    • Select annotation words from those used to describe the top-ranked categories.
    • Compute a statistical significance for each candidate word and select the words that are unlikely to have appeared by chance.
    • Favor the selection of rare words.
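
    One way to make "unlikely to have appeared by chance" concrete is a hypergeometric tail probability: if a word annotates m of the n categories, how likely is it to show up in at least j of k categories drawn at random? The sketch below assumes that test; the function names and the 0.05 threshold are illustrative choices, and the top-ranked categories are taken as already selected by likelihood.

```python
from math import comb

def chance_probability(j, k, m, n):
    """P(word appears in at least j of k categories drawn at random
    from n categories, of which m are annotated with the word)."""
    total = sum(comb(m, i) * comb(n - m, k - i)
                for i in range(j, min(k, m) + 1))
    return total / comb(n, k)

def select_words(top_categories, all_categories, threshold=0.05):
    """Keep candidate words whose chance probability is at most `threshold`.

    top_categories : word sets of the k top-ranked categories
    all_categories : word sets of all n categories (includes the top ones)
    Returns the selected words, most significant first.
    """
    n, k = len(all_categories), len(top_categories)
    chosen = {}
    for word in set().union(*top_categories):
        j = sum(word in words for words in top_categories)
        m = sum(word in words for words in all_categories)
        p = chance_probability(j, k, m, n)
        if p <= threshold:
            chosen[word] = p
    return sorted(chosen, key=chosen.get)
```

    A word used by only a few categories gets a small probability as soon as it recurs among the top-ranked ones, so this test favors rare words without a separate rule.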

  • Initial Experiment

    • 600 concepts, each trained with 40 images
    • 15 minutes of Pentium CPU time per concept; each concept is trained only once
    • Highly parallelizable algorithm

  • Preliminary Results

    Computer prediction:
    • People, Europe, man-made, water
    • Building, sky, lake, landscape, Europe, tree
    • People, Europe, female
    • Food, indoor, cuisine, dessert
    • Snow, animal, wildlife, sky, cloth, ice, people

  • More Results

  • Results: using our own photographs

    • P: photographer annotation
    • Underlined words: words predicted by the computer
    • (Parentheses): words not in the learned dictionary of the computer

  • Systematic Evaluation

    10 classes: Africa, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains, food.

  • 600-class Classification

    • Task: classify a given image into one of the 600 semantic classes
    • Gold standard: the photographer/publisher classification
    • This procedure provides lower bounds on the accuracy measures because:
      • There can be overlaps of semantics among classes (e.g., Europe vs. France vs. Paris, or tigers I vs. tigers II)
      • Training images in the same class may not be visually similar (e.g., the class of sport events includes different sports and different shooting angles)
    • Result: with 11,200 test images, ALIP selected the exact class as its best choice 15% of the time
    • That is, ALIP picks the exact class about 90 times more often than a random-drawing system
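
    The factor of 90 follows from comparing the 15% exact-match rate with a baseline that draws one of the 600 classes uniformly at random (the uniform baseline is the assumption here):

```latex
P_{\text{random}} = \frac{1}{600} \approx 0.167\%, \qquad
\frac{P_{\text{ALIP}}}{P_{\text{random}}} = \frac{0.15}{1/600} = 0.15 \times 600 = 90
```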

  • More Information

    J. Li, J. Z. Wang, "Automatic linguistic indexing of pictures by a statistical modeling approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1075-1088, 2003.

  • Conclusions

    • SIMPLIcity system
    • Automatic Linguistic Indexing of Pictures
      • Highly challenging
      • Much more to be explored
    • Statistical modeling has shown some success.

  • Future Work

    • Explore new methods for better accuracy
      • Refine statistical modeling of images
      • Learning from 3D medical images
      • Refine matching schemes
    • Apply these methods to
      • Special image databases
      • Very large databases
    • Integration with large-scale information systems