Conventional statistical programming languages (R, Stata, Julia, etc.) have long been popular tools in empirical economics research, but they are limited in two dimensions. First, querying a dataset in a language like Stata is time-consuming and error-prone, especially for users without a solid grounding in computer science. Second, the queries themselves must be rigid call-and-return statements: when a user asks Stata for the results of a regression of “x” on “y,” she must already know the relationship she wants to examine and the best method of analysis (here, linear regression).
To address these limitations, my partner and I are building a new tool: a graphical user interface application named, at least for now, “Athena.” On the front end, it uses machine learning to process natural-language queries, so users never have to learn any “syntax” to work with Athena. On the back end, Athena permits more flexible, open-ended querying than any other language we know of.
An example to illustrate: suppose you want to regress “x” on “y” using an instrument. If you use Stata, you must choose an instrument, Google the right syntax (find it here: https://www.stata.com/manuals13/rivregress.pdf), and then write a flawless line of code. If you use Athena, you can simply ask it, in plain English, to open your dataset and find the ten strongest instruments to use in a regression of “x” on “y.” At that point you can put your economic “detective hat” on to determine which instruments are worth looking into.
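The text does not say how Athena decides which instruments are “strongest.” One common yardstick is the first-stage F-statistic: regress the endogenous regressor (here “y,” since “x” is the outcome) on each candidate instrument and rank candidates by how well they predict it. The sketch below is a minimal illustration under that assumption only, testing one instrument at a time with an intercept and homoskedastic errors; the function names `first_stage_f` and `rank_instruments` are hypothetical and are not Athena's actual implementation. Note that strength is the only part the data can rank: whether an instrument satisfies the exclusion restriction cannot be tested this way, which is exactly where the “detective hat” comes in.

```python
import numpy as np


def first_stage_f(endog: np.ndarray, instrument: np.ndarray) -> float:
    """First-stage F-statistic for a single candidate instrument.

    Regresses the endogenous regressor on an intercept plus the instrument.
    With one excluded instrument, F equals the squared t-statistic on its slope.
    Assumes homoskedastic errors (a simplifying assumption for this sketch).
    """
    n = endog.shape[0]
    X = np.column_stack([np.ones(n), instrument])  # intercept + instrument
    coef, *_ = np.linalg.lstsq(X, endog, rcond=None)
    resid = endog - X @ coef
    sigma2 = resid @ resid / (n - 2)  # residual variance with n - 2 degrees of freedom
    cov = sigma2 * np.linalg.inv(X.T @ X)  # OLS covariance matrix of the coefficients
    t_stat = coef[1] / np.sqrt(cov[1, 1])
    return float(t_stat ** 2)


def rank_instruments(endog: np.ndarray, candidates: dict, top: int = 10) -> list:
    """Return the `top` (name, F) pairs, strongest first (hypothetical helper)."""
    scores = {name: first_stage_f(endog, z) for name, z in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Synthetic data: "y" is driven by z0, so z0 should rank first.
    rng = np.random.default_rng(0)
    n = 500
    candidates = {f"z{i}": rng.normal(size=n) for i in range(15)}
    y = 2.0 * candidates["z0"] + rng.normal(size=n)
    for name, f_stat in rank_instruments(y, candidates, top=10):
        print(f"{name}: F = {f_stat:.1f}")
```

Ranking one candidate at a time keeps the example simple; a fuller tool would likely also consider sets of instruments jointly, control variables, and heteroskedasticity-robust statistics.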