Description:

CodeAbility gives those who don't have full use of their hands the opportunity to code.

Inspiration:

Repetitive strain injuries, early-onset arthritis, and amputation cause many programmers to lose the ability to practice their craft. In fact, the sister of one of our members is hindered in exactly this way. Existing solutions often rely on speech-to-text software, but none of them manages to be both efficient and user-friendly. Common speech software is user-friendly in that it lets users speak plain English, but it fails to efficiently produce programming syntax; other software relies on heavy jargon and obscure keywords, saddling users with a steep learning curve. With all the made-up keywords, listening to someone use such software sounds like a foreign language. In addition, all speech-to-text software is clumsy when navigating through lines and pages of code. With CodeAbility, we wanted to build a program that combines natural speech input, efficient navigation, and fast syntax entry in a single system.

What it does:

Our project converts natural-sounding human speech directly into Python, provides spoken commands for editing and revising previous input, and adds a foot-pedal system for fast, simple navigation. The user can dictate as though they were explaining an algorithm to a friend, and the program turns that speech into properly formatted code with correct syntax. The user can then issue voice commands, modeled on the commands of the popular editor Vim, to edit and act on the file. Finally, fully customizable foot pedals let the user move between lines and tabs as easily as they would with a mouse.
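For example, dictating something along the lines of "define a function called greet taking name, then print name" (illustrative phrasing, not our exact grammar) would land in the editor as:

    def greet(name):
        print(name)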

How we built it:

The Google Speech API powers our speech recognition pipeline. We take the resulting transcription of our structured English and generate an abstract syntax tree. The syntax tree represents the grammar of our constructed language and lets us convert it to runnable Python; it also resolves ambiguous phrases and deep, nested constructions. Finally, we pass the generated Python to a keyboard emulator, which enters it into Atom through a Vim emulation layer, using Vim commands to navigate the file and place the code correctly.

For hardware, we used OnShape to design the foot pedals, iterating on the ergonomics so the user can actuate them with minimal effort. We also added a status indicator on the press-to-dictate pedal to show whether the program is listening for commands. For the electronics, we created an extension board for the Arduino to communicate with the foot pedals, with all the necessary components for debouncing and input management, and mounted the whole setup on a single platform so the pedals stay in one place. The Arduino sends serial events to a Python script, which processes the inputs and fires keyboard shortcuts according to a user-configurable JSON file and the state of the secondary-function pedal. The same script also controls the dictation indicator light.
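To give a flavor of the parsing step, here is a minimal sketch of the LARK pipeline: a toy grammar far smaller than our real one, with hypothetical rules, but the same shape (structured English in, a parse tree in the middle, Python source out).

    from lark import Lark, Transformer

    # Toy grammar: understands "set <name> to <number>" and "print <name>".
    grammar = r"""
        start: statement+
        statement: assign | printstmt
        assign: "set" NAME "to" NUMBER
        printstmt: "print" NAME
        NAME: /[a-z_]+/
        NUMBER: /[0-9]+/
        %import common.WS
        %ignore WS
    """

    class ToPython(Transformer):
        # Each method rewrites one grammar rule's subtree into Python text.
        def assign(self, items):
            name, number = items
            return f"{name} = {number}"
        def printstmt(self, items):
            return f"print({items[0]})"
        def statement(self, items):
            return items[0]
        def start(self, items):
            return "\n".join(items)

    tree = Lark(grammar).parse("set count to 5 print count")
    print(ToPython().transform(tree))
    # count = 5
    # print(count)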
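On the pedal side, the computer script boils down to something like the sketch below. The library choices (pyserial and pynput) and the file name pedal_map.json are illustrative assumptions; the description above only pins down the serial events and the user-editable JSON mapping.

    import json
    import serial                                # pyserial (assumed)
    from pynput.keyboard import Controller, Key  # keyboard emulation (assumed)

    keyboard = Controller()

    # Hypothetical config, e.g. {"PEDAL_LEFT": ["ctrl", "tab"],
    #                            "PEDAL_RIGHT": ["ctrl", "shift", "tab"]}
    with open("pedal_map.json") as f:
        pedal_map = json.load(f)

    def press_combo(names):
        # "ctrl" -> Key.ctrl; single characters stay plain character keys.
        keys = [getattr(Key, n, n) for n in names]
        for k in keys:
            keyboard.press(k)
        for k in reversed(keys):
            keyboard.release(k)

    ser = serial.Serial("/dev/ttyUSB0", 9600)    # port name is machine-specific
    while True:
        event = ser.readline().decode().strip()  # Arduino sends one event per line
        if event in pedal_map:
            press_combo(pedal_map[event])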

Challenges we ran into:

The three of us on the team doing software happened to be running three different operating systems, so we ran into problems with key mappings and line endings in keyboard emulation. We also had to figure out how to dictate certain phrases unambiguously, such as nested lists (we decided on a different delimiter per list), functions passed into functions, and telling a user command apart from a variable of the same name.

A related challenge was the Google API applying grammar when we actually wanted the opposite. 'To' was a word Google frequently inserted while trying to make a sentence sound better, and it caused us a lot of problems: once the speech has been interpreted by the API, the original meaning is lost, so we couldn't tell whether the word was added by Google or the user was trying to say 'to', 'too', or 'two'. In the end, we identified the specific phrases where 'to' was commonly inserted and filtered it out when necessary, and we made 'too' a reserved word to limit confusion. We also struggled with Google's speech recognition API because it wouldn't let us supply our own vocabulary list, so our recognition code had to be adaptive and forgiving whenever Google didn't know, or frequently misheard, a word we needed.

I (Ian) had to come back to Python after six years, so jumping back in was certainly a challenge. When it came to coding the computer side of the Arduino pedal interface, I was rather daunted at first and didn't really know where to begin. It took several iterations, but by the end I was getting used to the syntax again (I can't say how many times I reached for semicolons and curly brackets) and to thinking in the language. I'm happy that I came out with a working script and got comfortable working in Python again.

I (Chris) had to design the foot pedals so that the buttons would be depressed exactly when the pedal was parallel to the base. Designing the pedals was challenging enough without measurement tools such as calipers, and the version 0 pedals came out far too large to be usable, more clumsy than useful. After several revisions, though, I was able to scale the pedals to a reasonable size, incorporate the correct button geometry, and let the user depress the pedals with minimal effort.
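To make the 'to' cleanup concrete, the filter amounted to something like this stripped-down sketch (the real phrase list was longer, and these entries are hypothetical):

    DROP_TO_BEFORE = {"print", "return"}   # contexts where Google liked to insert "to"

    def normalize(words):
        out = []
        for i, w in enumerate(words):
            if w == "to" and i + 1 < len(words) and words[i + 1] in DROP_TO_BEFORE:
                continue        # almost certainly inserted by Google: drop it
            if w == "two":
                w = "2"         # spoken digit becomes a literal
            out.append(w)       # "too" passes through: it's a reserved word
        return out

    print(normalize("set x to two then to print x".split()))
    # ['set', 'x', 'to', '2', 'then', 'print', 'x']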

Accomplishments that we're proud of:

I (Robert) had no prior experience with domain-specific languages, so I'm quite proud of the language that I wrote. I went in worried that I'd spend the weekend in ambiguity hell, but the reality was much more pleasant. The parts of my language that I'm most proud of are a few clever tricks that keep it both terse and precise. Ask me about the "quantity" keyword and the varied list delimiters!

I (Lilia) am pretty proud of getting the speech to parse correctly. I especially had to consider how to tell the listener to start and stop without interrupting the flow of listening and streaming the audio. I ended up going with a customizable timeout in conjunction with a keyword ('exit') that shuts the listener down.

I (Sana) am proud because, with a little over a year of coding experience, I managed to make the keyboard emulator work despite all the cross-operating-system issues, as well as issues with specific characters that the API had mapped incorrectly.

I (Ian), still with limited Python experience, was very proud to implement the configurable, dynamically switching Python script that runs on the computer and sends key presses on serial events from the Arduino. Implementing the secondary-function key, which switches the push-to-talk pedal between momentary and toggle modes, was a difficult logic puzzle, especially doing it in a completely end-user-configurable manner. In the end, I was very proud that I got it all to work smoothly.

I (Chris) was very proud of designing the foot pedals from scratch and correctly working around the awkward button geometries. As mentioned before, the pedal-button interaction during depression had many intricacies, but I was able to overcome them. I was also very proud of designing the small circuit that bridges the gap between the buttons and the Arduino, since it lets the Arduino read the button inputs without any weird software trickery.

In general: on the software side, our code has minimal latency and excellent word recognition. On the hardware side, everything was custom designed from scratch in CAD, then 3D-printed and assembled. The pedals have virtually no latency, run in the background without user intervention, and are completely reconfigurable to any hotkey.
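For the curious, the start/stop logic Lilia describes boils down to something like this sketch (simplified; it uses the speech_recognition helper library for brevity, whereas the real version streams audio to the Google Cloud API):

    import speech_recognition as sr

    def listen_loop(timeout_s=5.0):
        # Stop on either a silence timeout or the reserved "exit" keyword.
        r = sr.Recognizer()
        with sr.Microphone() as mic:
            while True:
                try:
                    audio = r.listen(mic, timeout=timeout_s)  # configurable timeout
                except sr.WaitTimeoutError:
                    break                                     # silence: stop listening
                try:
                    phrase = r.recognize_google(audio)
                except sr.UnknownValueError:
                    continue                                  # unintelligible: keep going
                if phrase.strip().lower() == "exit":
                    break                                     # reserved keyword: stop
                yield phrase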

What we learned:

I (Robert) have been doing software for a long time, but had never touched compiler or language design (not even with a ten-foot pole!); it just seemed so boring. In reality, the technical core of this project is a transpiler from structured English to Python, which gave me a genuinely interesting, and quite pleasant, look at the parts of a compiler (tokenizer, lexer, abstract syntax trees, and so on).

I (Chris) have been using CAD for a while, but while designing the pedals I learned several new features of OnShape that streamlined the design process.

In general: on the software side, all of us implemented an abstract syntax tree for the first time, and used the Google Cloud Speech Recognition API for the first time. We learned a lot about phrasing English so that our program could map it cleanly to Python while it still reads as meaningful English. We spent a lot of time debating the best way to input things, and we created alternate inputs for many commands, because natural speech offers many different phrasings that feel natural to different people. On the pedal interface side, we had never communicated over serial from an Arduino to a Python script, or sent keystrokes from Python to the computer.

What's next:

Ideally, we'd finish a feature-complete mapping of structured English to Python, so that any valid Python can be generated directly from speech. We would also like to improve real-time interpretation, so that when the Google API makes a silly mistake, the user can correct it rather than the program refusing to compile the garbled input and forcing them to start from scratch. After that, expanding to other terse languages, such as Go or JavaScript, would be natural: because we built the parser on LARK, our codebase is already set up to make adding another language straightforward.

Built with:

Pedals:
Hardware: an Arduino interfacing with 3D-printed pedals, driven by momentary switches, wired up with various circuit components, and linked to the computer by a USB cable to send information over serial.
Software: Python, Arduino C++

Voice:
Software: Google Cloud Speech Recognition API, Python, LARK

Prizes we're going for:

HAVIT RGB Mechanical Keyboard

Google Home Mini

$100 Amazon Gift Cards

Grand Prize

Raspberry Pi Arcade Gaming Kit

Team Members

Lilia Heinold, Sanskriti Sharma, Christopher Zhu, Ian Richardson, Robert Cunningham
View on GitHub