Illinois ASE REU

Sunday, June 26, 2016

From Accountancy to CS: Code Hunt Research and R Programming

6.26.16

Codehunt Research and R programming

From Accountancy to CS

I have just finished my third year at Illinois. Coming from an accounting background, I initially had little coding experience apart from some Visual Basic in Excel. My background consists of economics, accounting, and statistical analysis. However, this was not a barrier when it came to learning my first programming language, R.

After being introduced to the current progress of Code Hunt and the direction in which our research would be headed, I began teaching myself the basics of R using a program called RStudio. I watched videos on YouTube, Khan Academy, and Lynda to learn the basics of coding and the functionality of R. My first main task in Code Hunt was to understand the structure of the game and analyze the raw user data to produce statistics based on last year's 48-hour player competition.

At first glance, R seemed very complex and impossible to figure out without any prior programming knowledge. I began testing it out by running simple lines of code to import data sets from .csv files, sort them, create subsets, and build matrices. Once I felt comfortable, I moved on to plotting data with bar plots and box plots. It took a fair amount of time to grasp where each variable goes in the code and to troubleshoot error messages, but it was worth figuring out. I have now created a range of graphs and a multitude of R scripts to be used for further analysis of our data.

One of my latest challenges was exploring GitHub and Bitbucket to clone source code and make commits, which seem to be common in the realm of computer science. With the help of my team I was able to figure out how to use the command line to run git pull and git push and finally upload files into our team repository. Overall, every time I open R and begin digging through the data, or clock into the lab and meet with our team, I find more things to learn, which is the exciting part of research.

Joshua Reed

Monday, June 20, 2016

Learning from Code Hunt

In the early stages of research, I devoted my time to learning Python. To my surprise, the language was rather intuitive and user-friendly, and learning to program in it was not difficult at all. The one thing that stood out is that variables are not declared with explicit data types, which can make their types ambiguous. After learning how to code in Python, my first task was to look into the metadata extraction program that a student had previously written. Originally, this program would iterate through all user data in the data release and write certain information about each level into a text file. The only user data recorded for each level was the number of attempts the person made on it. The program did not differentiate between submissions in Java and submissions in C#. After some modifications and multiple revisions, the program was able to give us more information about each player's submissions. It can also create two data sets: one for Java and one for C#.
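To make the idea concrete, here is a minimal sketch of that kind of extraction, not the actual program. It assumes a hypothetical layout of one folder per user, one subfolder per level, and one file per attempt, with the file extension (.java or .cs) indicating the language. The names count_attempts and write_report are invented for illustration.

```python
import os
from collections import Counter

# Assumed (hypothetical) layout: <root>/<user>/<level>/<attempt file>, where
# the file extension tells us the language of the submission.
LANGUAGE_BY_EXTENSION = {".java": "Java", ".cs": "C#"}


def count_attempts(root):
    """Return {language: Counter({(user, level): number_of_attempts})}."""
    counts = {language: Counter() for language in LANGUAGE_BY_EXTENSION.values()}
    for user in sorted(os.listdir(root)):
        user_dir = os.path.join(root, user)
        if not os.path.isdir(user_dir):
            continue  # skip stray files next to the user folders
        for level in sorted(os.listdir(user_dir)):
            level_dir = os.path.join(user_dir, level)
            if not os.path.isdir(level_dir):
                continue
            for name in os.listdir(level_dir):
                extension = os.path.splitext(name)[1].lower()
                language = LANGUAGE_BY_EXTENSION.get(extension)
                if language is not None:
                    counts[language][(user, level)] += 1
    return counts


def write_report(counts, language, path):
    """Write one tab-separated 'user, level, attempts' line per pair."""
    with open(path, "w") as report:
        for (user, level), attempts in sorted(counts[language].items()):
            report.write(f"{user}\t{level}\t{attempts}\n")
```

Calling write_report once with "Java" and once with "C#" would produce the two separate data sets described above.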

For the past few weeks, the research that I have been conducting with my team has been a valuable learning experience. Engineers usually have to apply critical thinking in order to solve problems or decipher data that is presented to them. So far Dr. Xie has helped me and the research team use our thinking and problem-solving skills to understand what we can learn from Code Hunt users based on their submissions. Currently, we are gathering materials to write a research paper on our Code Hunt users. There is already a research paper that has been published. What we plan to do is draw conclusions that would be useful to more than one group of people. Those stakeholders include professors in computer science, game designers for Code Hunt, companies that seek to teach their employees how to program, and even the users themselves. There is much more to accomplish in the next 10 days before our first deadline.

Tuesday, August 25, 2015

Python vs. C#

When I was first given the task of creating the metadata extraction program, I was given a choice between Python and C#. I had to weigh my options carefully because I needed to consider which language would allow me to complete the task with the fewest complications. Before I made my decision I looked at the following factors: cross-platform development, availability of language features, syntax, and familiarity. I required cross-platform development because I knew from the beginning that I needed to create a program that would run on Windows and Unix-based operating systems. Language features were important because certain directory navigation actions and functions would be imperative to constructing my program. These functions needed to be in standard importable modules or libraries for the language that I chose. Syntax was important for an obvious reason: I needed something that was easy for me to read.
Finally, familiarity was key because it would be better for me to have experience with the language or a similar language. In the end, I chose Python because it had all these features and I had already coded smaller programs in it. I have done things with C and Java, but I felt that the leap to C# would slow down my progress with the task. Coupled with the fact that the General Use Machine Learning for Learning Library by Khan Academy was written in Python, it was an easy decision to make.
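As an illustration of the directory navigation features mentioned above, here is a small sketch using only Python's standard library. The function name list_files is made up for this example; os.walk and os.path.join are the standard calls that keep the code independent of the operating system.

```python
import os


def list_files(start, suffix):
    """Yield every file under `start` whose name ends with `suffix`."""
    # os.walk visits `start` and all of its subfolders on any platform.
    for folder, _subfolders, filenames in os.walk(start):
        for filename in sorted(filenames):
            if filename.endswith(suffix):
                # os.path.join inserts the correct separator for the current
                # OS: a backslash on Windows, a forward slash on Unix.
                yield os.path.join(folder, filename)
```

For instance, list(list_files("data", ".cs")) would collect every C# file below a folder named data (a hypothetical folder name) without the code ever spelling out a path separator.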

Sunday, August 23, 2015

Google is Better than Noodles

Upon completion of my data extraction program, it is vital that I reflect on what I have experienced in the process of its creation. Spending several hours typing and debugging code yielded my first self-designed program as an engineer and programmer. It pleases me to learn that I have the capability to design and realize a project, but there is something more important that I have learnt. Google is the greatest invention conceived by the human race since fire. There were several times during my journey to complete my program that I would encounter enigmatic bugs and Google would come to my rescue. An example of this is when I was attempting to provide support for Windows in the data extraction program. I had researched the directory layout for Windows systems and discovered that it looked something like this: C:\Users\fisiaka2. The Windows directory system uses backslashes, unlike Unix directory systems, which use forward slashes. To me this seemed like a simple implementation: ask the user what system they were using and choose the starting string based on that. Never in my wildest dreams did I expect a basic data type like a string to make me question my competence as a programmer. I received a plethora of errors no matter how I organized my strings for the Windows platform. Bewildered and frustrated, I googled the errors that IDLE (Python's Integrated Development and Learning Environment) was maliciously spewing at me. “Why do solutions to these errors always end up so simple?” I asked myself. One user on Stack Overflow explained that a backslash is not treated as a regular character in a Python string literal: it starts an escape sequence. To make backslashes part of a string, the string must be written as a raw string! All I had to do was put the letter r before the quotation marks in the string, and just like that Windows support had been provided. So, this is a huge shoutout to Google for being so useful.
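The short sketch below shows the behaviour in question. The path C:\temp\files is a made-up example chosen because it contains the escape sequences \t and \f. The original path would fail differently in Python 3, where \U starts a Unicode escape and an ordinary literal containing it is a syntax error.

```python
import ntpath

# In an ordinary literal a backslash starts an escape sequence: "\t" becomes
# a tab and "\f" a form feed, so this path is silently mangled.
mangled = "C:\temp\files"

# A raw string keeps every backslash exactly as typed.
raw = r"C:\temp\files"

# Doubling each backslash is the other common fix.
escaped = "C:\\temp\\files"

print(len(mangled), len(raw))  # the mangled string is two characters shorter
print(raw == escaped)          # both fixes give the same string

# ntpath applies Windows path rules on any OS, which is handy for checking.
print(ntpath.join(r"C:\Users", "fisiaka2"))
```

One caveat worth knowing: a raw string cannot end with a single backslash, so a trailing separator still has to be added some other way, for example with a path-joining function.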

Saturday, August 1, 2015

The Internet As An Educational Tool: Accessibility Of Information On An Ever Growing Worldwide Web

If sifting through numerous sites on the web has taught me anything, it's that the dissemination of information is still very chaotic. You get a sense of that on social media and news outlets, but I'm reflecting on education. Take the example of computing and developer support. I previously assumed that information sources and forums such as GitHub, MSDN, Stack Overflow and many more would bring a semblance of direction to exploring uncharted topics in computing and programming. I might have been a bit too hopeful. I often find myself unable to find information that should be relatively easy to obtain, and sometimes even information that I know exists on the web. The most recurring issue, however, is information that falls short of my needs. While some of this can be chalked up to the imperfections of search engines (which are nonetheless immeasurably helpful), there is a more obvious reason: data on the web is the input of people. Information gets left out, mixed up, generalized to a fault, or narrowed to a single use case. The problem that is then presented to you, the consumer of such information, is to unravel it. This in itself is a good thing, as there is no better way to gain mastery of it. But when pieces are left out, you can hardly piece together a coherent picture, much less a full one. Add the proneness to error of unregulated (or badly regulated) sources and you have a picture which may not even be correct. If you want to avoid this, you are forced to limit your search to sites of established credibility or collaborative user input. The amount of time needed to consume and vet information from all sources has long been noted: a single Google search returns millions of results, only a small fraction of which are relevant to the purpose of the search.

So what happens when you do find information relevant to your cause? One possibility is that you have exactly what you need. Another is that you get something that you can adapt to your need, based on your previous knowledge of the subject matter, on common sense, or on the directions of a contributor. A third possibility is that you find information that you cannot act on: information that is beyond your comprehension, or not applicable to your case. For much of my use of the internet, I haven't had to entertain this third possibility, but as I have delved into topics of increasing complexity, it has become an all too common theme in both my research project and my education. Why?

I have two reasons. The first is that some information on the internet comes without structure. In most fields there is support for navigating processes, but such support is robust only at basic levels. Anything beyond that comes in bits and pieces. Let me introduce an example I have encountered in my current research project: dupFinder is not well documented, perhaps because it is mostly experienced users of code analysis software who need it, so it takes certain knowledge for granted. The instructions for its use that are available on the web are all duplicates of the original: scant, vague, and containing no reference to information that could help a newcomer work up to the level of demystifying it. MSDN, on the other hand, is well documented, because it is intended to be a reference point for developers of varying experience. But once you move into areas of programming that do not directly involve languages and their syntax, the extent of its support is overreached, and the chaos of information takes over. The fall-off in documented support as I progress from basic programming to more complex programming is quite sudden. The other reason is that there aren't as many people operating at the higher levels of complexity, and even fewer of them contribute to these sources.

As the internet grows, it's been heralded as the educational tool of the future. But if these conditions remain while more people turn to the internet to learn, won't it be as problematic as ever at the advanced levels? Or will the supply of information at advanced levels grow? We can't leave this one to chance: we've got to start thinking critically about how we organize the data we add to the web.

Friday, July 17, 2015

Is technology hindering communication?

Am I the only person who's noticed that the rapid evolution of technology has been reducing interpersonal communication? Have you ever been walking down the street alone, noticed a person walking by the opposite way, and pulled your phone out to avoid making eye contact with that person? Have you ever been in a room full of strangers and, instead of introducing yourself and starting a conversation, gone to Instagram, Twitter, or Facebook for entertainment to pass the time? I am aware that technology evolves to make life easier, and cell phones make it easier to stay connected and communicate with your friends, but will we ever meet new people if we continue this way? The generations after mine are more tech savvy than ever, but I'm guessing their interpersonal skills are lacking. Kids these days substitute video games for board games, NBA 2K for actually going to the basketball courts, and so on.

The most important question here is: will this cause the younger generations to lack interpersonal skills? Is this actually a problem that society faces today, or is it just society evolving as it has always done? Interpersonal skills, teamwork, leadership, and communication are four of the top five skills that employers look for, all of which are hard to acquire if kids never seek out different opportunities because everything is already at their fingertips on the internet. Maybe I'm just sour that things are easier for the younger generations than they were for me. Or maybe it's a real problem with society today that should be studied so that technology is used and developed for the better. Only time will tell.

Monday, June 22, 2015

My Journey Into New Territory: C#, Roslyn and APIs

I've been trying to teach myself C#, mostly by playing the Code Hunt puzzles to learn its differences from C and C++, and by writing basic programs. I did this in the name of speedy learning, something I previously considered my forte. As it turns out, this isn't a good way to get a solid foundation in any language, so writing more detailed code in C# for my research tasks has been rough. I've had to go back and relearn everything I knew about functions, objects, classes, overloading, and file I/O from C and C++, because besides the C in its name, C# doesn't have much in common with the other languages I know. The sunny side, however, is that I have now been exposed to completely new methods and system calls that implicitly perform a whole bunch of operations. I am now confident that when dealing with advanced tasks, C# has my back. I've also had to learn about file and directory manipulation, and from navigating and searching directories to creating and writing files, I've gotten a lot of insight into the workings of installer packages. My conclusion is thus: C# is pretty neat. I think I'll make it my default language.
In the past weeks I've had to learn about APIs, a new topic for me. An Application Programming Interface (API) consists of building blocks for software. APIs determine connections and compatibility between software components, and they can be used for tasks such as creating graphical user interfaces. An API gives you a set of routines, and you make calls to the API to do what you want. They are used in just about every website you love, from Google Maps and YouTube to Twitter and Amazon. The API I will be using is the Code Hunt REST API, which essentially provides core data from the website, for example user data such as experience and the number of attempts for a given level. Every programmer is going to come into solid contact with APIs, so this is a very important experience. I'm going to get my hands dirty with the Code Hunt API as soon as I can figure out what calls I can make to it, and just how it can work with a compiler.
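To give a feel for what "making a call" to a REST API involves, here is a generic sketch in Python. Everything specific in it is an assumption for illustration: the base URL is a placeholder, and the attempts resource and its user and level parameters are invented, not taken from the real Code Hunt API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder base URL: hypothetical, NOT the real Code Hunt endpoint.
BASE_URL = "https://example.invalid/api"


def build_url(resource, **params):
    """Build the URL of a GET request from a resource name and parameters."""
    query = urlencode(sorted(params.items()))
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


def parse_attempts(body):
    """Read the (hypothetical) 'attempts' field from a JSON response body."""
    return json.loads(body)["attempts"]


def fetch_attempts(user, level):
    """Send the request and decode the reply; needs a real server to run."""
    with urlopen(build_url("attempts", user=user, level=level)) as response:
        return parse_attempts(response.read().decode("utf-8"))
```

The pattern is the same for most REST APIs: build a URL that names the resource, send an HTTP request, and decode the structured response, which is often JSON.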
When I first began coding I came across Roslyn, the .NET Compiler Platform for C# and Visual Basic. Because its compilers come with APIs for code analysis, it will be a very useful tool in the weeks ahead. It is only available on Visual Studio 2015, so if you intend to use it, you have to download and install VS 2015 Community or Enterprise (Community is free but Enterprise costs $$$). I plan to use it to parse the user code obtained from the Code Hunt API. Looking at sample code that shows how the two work together is quite daunting. I can hardly get a sense of which calls are to the REST API and which code is for Roslyn. I will have to test the sample code to find out which is which. That will be my exploration for the immediate future. I'll let you know what I find.