Despite the importance of video as a medium for communication, video editing remains a skill with a high barrier to entry. To democratize access to content creation, we introduce the Video Editing and Reasoning Agent (VERA), an AI agent that edits videos based on human instructions. VERA combines an encoded video representation, viewer context, and the user's command to complete a variety of editing tasks. To support interaction with the agent, we contribute two interfaces: a voice-based interface that provides feedback one command at a time, and a text-based interface that displays an aggregated command history. We evaluate VERA with human participants and find statistically significant evidence of a priming effect: the interface participants use first shapes both their subsequent interactions with the agent and their perception of it. Participants who began with the voice-based interface tended to issue commands that were broader in scope and more abstract than those of participants who began with the text-based interface. Voice users were also more likely to perceive the agent as a collaborator rather than an assistant. These results inform the design of human-agent creative collaboration systems that help people realize their creative potential.