Table A.1 presents examples supplementing Table 6 in the main paper, which illustrates how the parameter prediction network understands questions. Before fine-tuning, the retrieved sentences are determined by common words (here, "feet"), whereas after fine-tuning they focus on the task to be solved (here, reading written text). More examples of retrieved sentences can be found at the link below.
Before fine-tuning | After fine-tuning |
What is the orange item below the person's feet? | What is written on the surfboard? |
What is near the baby elephant's feet? | What is written on the surfboard? |
What color are the shoes on the person's feet? | What is written on the skateboard? |
What are the long sticks under the person's feet? | What is written on the bottom of the snowboard? |
What is on the person's feet? | What is written on the airplane's tail? |
What is underneath the girl's feet? | What is written on the hydrant? |
What are on the person's feet? | What is written on the signpost? |
What is below the zebra's feet? | What is written on the pipes? |
What are on the child's feet? | What is written on the leftmost pant leg? |
What color are the bear's feet? | What is written on the ramp? |
What color are the horse's feet? | What is written on the kayak? |
What shape is on the bottom of the bear's feet? | What is written on the black bag ? |
Where are the brown faced dog's feet? | What is written on the wall next to the skier? |
What is attached to the man's feet? | What is written on the chalkboard? |
What is laying by the man's feet? | What is written on the ramp rail? |
What is on everyone's feet? | What is written at the top of the racket body? |
What is on the bottom of the man's feet? | What is written on the wall of the pitch? |
What is on the woman's feet? | What is written on the donuts? |
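The retrieval behind a table like this can be sketched as a nearest-neighbor search over question embeddings. The sketch below is only illustrative: the similarity measure (cosine), the function names `cosine_similarity` and `retrieve_nearest`, and the two-dimensional toy embeddings are all assumptions, not the procedure or the representations used in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_nearest(query_vec, candidates, k=3):
    """Return the k candidate questions whose embeddings are most similar to the query.

    candidates is a list of (question_text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hand-made toy embeddings (hypothetical; not produced by any real question encoder).
candidates = [
    ("What is on the person's feet?", [0.9, 0.1]),
    ("What is written on the hydrant?", [0.1, 0.9]),
    ("What color are the bear's feet?", [0.8, 0.3]),
]

# A query embedding that lies close to the "What is written on ..." direction.
print(retrieve_nearest([0.0, 1.0], candidates, k=1))
```

Under this reading, fine-tuning changes the embedding space so that questions posing the same task land close together, rather than questions that merely share a word.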
Figure A.1 presents examples supplementing Figure 4(a) in the main paper, which illustrates the network's ability to perform different recognition tasks depending on the question. The network often fails on questions involving tasks that are difficult to learn from image-level annotations alone (e.g., object detection). More results can be found at the link below.
Figure A.2 presents examples supplementing Figure 4(b) in the main paper, which illustrates that the network performs the recognition task determined by the question reasonably well across different images. More results can be found at the link below.
Q: What game is he playing?
DPPnet: baseball | DPPnet: tennis | DPPnet: wii | DPPnet: frisbee |
Q: Where is the horse ?
DPPnet: sidewalk | DPPnet: street | DPPnet: in field | DPPnet: behind fence |