With the ever-increasing urbanization process, the traffic jam has become a common problem in the metropolises around the world, making the traffic speed prediction a crucial and fundamental task. This task is difficult due to the dynamic and intrinsic complexity of the traffic environment in urban cities, yet the emergence of crowd map query data sheds new light on it. In general, a burst of crowd map queries for the same destination in a short duration (called "hotspot'') could lead to traffic congestion. For example, queries of the Capital Gym burst on weekend evenings lead to traffic jams around the gym. However, unleashing the power of crowd map queries is challenging due to the innate spatiotemporal characteristics of the crowd queries. To bridge the gap, this paper firstly discovers hotspots underlying crowd map queries. These discovered hotspots address the spatiotemporal variations. Then Dest-ResNet (Deep spatiotemporal Residual Network) is proposed for hotspot traffic speed prediction. Dest-ResNet is a sequence learning framework that jointly deals with two sequences in different modalities, i.e., the traffic speed sequence and the query sequence. The main idea of Dest-ResNet is to learn to explain and amend the errors caused when the unimodal information is applied individually. In this way, Dest-ResNet addresses the temporal causal correlation between queries and the traffic speed. As a result, Dest-ResNet shows a 30% relative boost over the state-of-the-art methods on real-world datasets from Baidu Map.