# 302-Specific

There are a lot of resources that will benefit you in a class like 302. Here is a nice, long list.

## R and R Studio

We’re using R, an incredibly powerful statistical software package, in this class. Unfortunately, it does come with a bit of a learning curve. You won’t be required to learn the nitty-gritty ins and outs of the R language, but you will need to know the basics of what R can do. Luckily, we have RStudio to help out with that. Below are some introductory articles and videos.

R itself simply needs to be installed; we’ll be working in RStudio rather than the basic R console.

Pick the version for your operating system.

• Quick-R R Tutorial

A quick (obviously) introduction to some of the most basic aspects of R.

• The built-in help in RStudio

Inside RStudio, simply type `help.start()` in the console and voilà: the manuals and reference materials appear in the Help pane. If you prefer a more organized manual, here’s a getting-started book.

• R for Cats

In case you like cats.

• Cheatsheets

Print them out and keep them at your desk. The RStudio IDE and RMarkdown cheatsheets are particularly useful.

• swirl

swirl is a fantastic collection of courses (ranging between 10 and 20 minutes each) designed to help you learn R programming while immersed in R!

• RMarkdown for Beginners

Guess what this website is built in. Yep! RMarkdown. If you want to create your assignments in RMarkdown and submit them that way, you are more than welcome to. You can even use an R package called papaja to produce properly formatted APA papers.

• Pandoc

If you need to convert files from one markup format into another, pandoc is your Swiss Army knife. (Note from Dr S: it converts between a huge range of formats, but you’ll never need to touch it directly; everything happens through RStudio.)

• LaTeX (variations)

At some point you will want to save your R results as PDFs. To do this you will need to install LaTeX, a mathematical typesetting system; R Markdown uses it behind the scenes to build PDF output, including properly typeset equations. Here’s a good but brief introduction to LaTeX with RStudio. You can install MiKTeX on Windows and MacTeX on a Mac.

With it, you can write something like:

`$$y_{ij} = b_{ij} + \beta_{0} + \beta_{1}$$`

and get the following displayed in your document:

$$y_{ij} = b_{ij} + \beta_{0} + \beta_{1}$$

## Big Data

• What Is Big Data? A Super Simple Explanation For Everyone

The term “Big Data” may have been around for some time now, but there is still quite a lot of confusion about what it actually means. In truth, the concept is continually evolving and being reconsidered, as it remains the driving force behind many ongoing waves of digital transformation, including artificial intelligence, data science and the Internet of Things. But what exactly is Big Data and how is it changing our world?

• What is Big Data?

Big data encompasses a wide range of analytics and data-gathering strategies. Essentially, it’s the ability to capture, store and analyze data on a mass scale to inform business decisions. It follows basic logic: The more you know about a problem or issue, the more reliable the solution.

## Data Mining

• What is Data Mining?

Data Mining is an analytic process designed to explore data (usually large amounts of data - typically business or market related - also known as “big data”) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is prediction - and predictive data mining is the most common type of data mining and one that has the most direct business applications. The process of data mining consists of three stages: (1) the initial exploration, (2) model building or pattern identification with validation/verification, and (3) deployment (i.e., the application of the model to new data in order to generate predictions).

• Revealing Online Learning Behaviors and Activity Patterns and Making Predictions with Data Mining Techniques in Online Teaching

(Abstract) This study was conducted with data mining (DM) techniques to analyze various patterns of online learning behaviors, and to make predictions on learning outcomes. Statistical models and machine learning DM techniques were conducted to analyze 17,934 server logs to investigate 98 undergraduate students’ learning behaviors in an online business course in Taiwan. The study scientifically identified students’ behavioral patterns and preferences in the online learning processes, differentiated active and passive learners, and found important parameters for performance prediction. The results also demonstrated how data mining techniques might be utilized to help improve online teaching and learning with suggestions for online instructors, instructional designers and courseware developers.

• How to Catch a Liar on the Internet

Technology makes it easier than ever to play fast and loose with the truth—but easier than ever to get caught.

• R and Data Mining

This website presents documents, examples, tutorials and resources on R and data mining.
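The StatSoft description above breaks data mining into three stages: exploration, model building with validation on a separate subset, and deployment to new data. Here is a minimal sketch of those stages, for illustration only and not StatSoft's own procedure. It uses plain Python rather than R, made-up data (hours studied vs. exam score), and a simple least-squares line as the "model"; real projects use far more data and richer models.

```python
from statistics import mean

# Invented example data: hours studied vs. exam score for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 72, 79, 83]

# Stage 1, initial exploration: look at basic summaries first.
print("mean hours:", mean(hours), "| mean score:", round(mean(scores), 2))

# Stage 2, model building with validation: fit on one subset,
# then check the fitted pattern against a held-out subset.
train_x, train_y = hours[:6], scores[:6]
test_x, test_y = hours[6:], scores[6:]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


a, b = fit_line(train_x, train_y)


def predict(x):
    """Apply the fitted line to a single input value."""
    return a + b * x


# Validation: average size of the prediction errors on the held-out cases.
mae = mean(abs(predict(x) - y) for x, y in zip(test_x, test_y))
print(f"validation mean absolute error: {mae:.2f}")

# Stage 3, deployment: apply the validated model to new cases.
new_hours = [9, 10]
print("predicted scores:", [round(predict(x), 1) for x in new_hours])
```

In R, the fitting step would typically be a single call to `lm()`; the point here is only the explore, validate, then deploy sequence.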

## Text Mining

• Text Mining with R

This book serves as an introduction to text mining using the tidytext package and other tidy tools in R. The functions provided by the tidytext package are relatively simple; what is important are the possible applications. Thus, this book provides compelling examples of real text mining problems.

• Text Mining (Big Data, Unstructured Data)

The purpose of Text Mining is to process unstructured (textual) information, extract meaningful numeric indices from the text, and, thus, make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms. Information can be extracted to derive summaries for the words contained in the documents or to compute summaries for the documents based on the words contained in them. Hence, you can analyze words, clusters of words used in documents, etc., or you could analyze documents and determine similarities between them or how they are related to other variables of interest in the data mining project. In the most general terms, text mining will “turn text into numbers” (meaningful indices), which can then be incorporated in other analyses such as predictive data mining projects, the application of unsupervised learning methods (clustering), etc. These methods are described and discussed in great detail in the comprehensive overview work by Manning and Schütze (2002), and for an in-depth treatment of these and related topics as well as the history of this approach to text mining, we highly recommend that source.

• Why Text Mining May Be the Next Big Thing

“Big Data” is a hot topic in the business world these days. But there’s a subset of this broad field that has yet to take a turn in the spotlight. It’s called “text mining,” and you’re probably going to be hearing a lot more about it over the coming months and years. Basically, text mining is the process of combing through countless pages of plain-language digitized text to find useful information that’s been hiding in plain sight.

This post is an outline of discussion topics I’m proposing for a workshop at NASSR2012 (a conference of Romanticists). I’m putting it on the blog since some of the links might be useful for a broader audience.

• Text mining: what do publishers have against this hi-tech research tool?

Researchers push for end to publishers’ default ban on computer scanning of tens of thousands of papers to find links between genes and diseases
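To make the idea of text mining "turning text into numbers" concrete, here is a minimal bag-of-words sketch: it counts the words in each document to build a document-term matrix, then compares documents with cosine similarity. It is written in plain Python rather than with tidytext, and the three example documents and the deliberately crude tokenizer are assumptions for illustration only.

```python
import math
import re
from collections import Counter

# Three made-up "documents".
docs = {
    "d1": "Data mining finds patterns in large data sets.",
    "d2": "Text mining turns text into numbers.",
    "d3": "Mining text data reveals patterns in documents.",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters (a very crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


# Word counts per document: these are the "numbers" text mining produces.
counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
vocab = sorted(set().union(*counts.values()))

# Document-term matrix: one row per document, one column per vocabulary word.
dtm = {name: [c[w] for w in vocab] for name, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two count vectors (1.0 = same word profile)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


for first, second in [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]:
    print(first, second, round(cosine(dtm[first], dtm[second]), 3))
```

Documents d1 and d3 share more words ("data", "mining", "patterns", "in") than d1 and d2 do, so they come out as more similar. Once text is in this numeric form, it can feed the clustering and predictive methods described elsewhere on this page.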

## Cluster Analysis

• Clustering: An Introduction

Clustering can be considered the most important unsupervised learning problem; so, like every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects which are “similar” between them and are “dissimilar” to the objects belonging to other clusters.

• Cluster Analysis Introduction (StatSoft)

The term cluster analysis (first used by Tryon, 1939) encompasses a number of different algorithms and methods for grouping objects of similar kind into respective categories. A general question facing researchers in many areas of inquiry is how to organize observed data into meaningful structures, that is, to develop taxonomies.

• Hierarchical Clustering Algorithms

An introduction to hierarchical clustering algorithms.
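As a rough illustration of "organizing objects into groups whose members are similar", here is a minimal sketch of agglomerative hierarchical clustering with single linkage. Every point starts as its own cluster, and the two closest clusters are merged repeatedly until a chosen number remain. Single linkage (distance between the closest members) is only one choice; complete and average linkage are common alternatives. The points are invented, and the code is plain Python; in R you would normally use `dist()` and `hclust()` instead.

```python
import math

# Invented 2-D points (think: two test scores per student).
points = {
    "A": (1.0, 1.0), "B": (1.5, 1.2), "C": (1.2, 0.8),
    "D": (5.0, 5.0), "E": (5.5, 4.8),
    "F": (9.0, 1.0),
}


def single_linkage(c1, c2):
    """Distance between two clusters = distance between their closest members."""
    return min(math.dist(points[p], points[q]) for p in c1 for q in c2)


def agglomerate(names, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [frozenset([n]) for n in names]
    merges = []  # (cluster, cluster, distance) for each merge: the dendrogram's steps
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        d = single_linkage(clusters[i], clusters[j])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), round(d, 3)))
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


clusters, merges = agglomerate(list(points), k=3)
for step in merges:
    print("merged", step)
print("final clusters:", sorted(sorted(c) for c in clusters))
```

Recording every merge, rather than just the final groups, is what makes the method "hierarchical": cutting the sequence of merges at different points gives different numbers of clusters.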

## Privacy, Ethics, and Social Issues

• Ethics, Big Data, and Analytics: A Model for Application

The use of big data and analytics to predict student success presents unique ethical questions for higher education administrators relating to the nature of knowledge; in education, “to know” entails an obligation to act on behalf of the student. The Potter Box framework can help administrators address these questions and provide a framework for action.

• The Promise of Big Data in Public Safety and Justice

Making data easier to digest for more law enforcement users.

• How the NSA Spied on Americans Before the Internet

In May 1984 — an apt year for columns about “Big Brother” — The Post’s Michael Schrage warned of a future in which the government could snoop on unsuspecting citizens by subpoenaing their floppy disks. Personal computers were new, expensive and not particularly common; the first dot-com domain wasn’t even registered until the following year.

• NSA gathered thousands of Americans’ e-mails before court ordered it to revise its tactics

For several years, the National Security Agency unlawfully gathered tens of thousands of e-mails and other electronic communications between Americans as part of a now-revised collection method, according to a 2011 secret court opinion.

In general, there is no end to the resources that could help you in a class. Regardless, here’s a finite collection.

# Writing

Not everything here is going to apply to every assignment, but all of it will come in handy at some point.

• Purdue’s Online Writing Lab (OWL)

Bookmark this. Get to know it. Shower it with affection. This will likely be your best friend in university.

• Microsoft Word APA template

This is a great way to make sure you’re actually following the required style guide, especially if you’ve never done it before. Of course, I would suggest you try an automated process (such as RMarkdown with papaja) that lets you focus more on the quality of your content and less on how it looks.

• Reverse dictionary

Incredibly handy little tool, this. For when you just can’t think of the right word or concept. From the website: “This tool lets you describe a concept and get back a list of words and phrases related to that concept. Your description can be anything at all: a single word, a few words, or even a whole sentence. Type in your description and hit Enter (or select a word that shows up in the autocomplete preview) to see the related words.”

• The De-Jargonizer

Shows you just how accessible your writing is in terms of scientific jargon.

• Writing a Literature Review

The literature review is intended to be a comprehensive look at the literature on a particular topic. Writing one can be devilishly difficult. Here is a guide.

• Writing a paper with Citavi

In case you go down the Citavi rabbit hole.

• Accidental Plagiarism and How to Avoid It

Using a citation management system like the ones below can go a very long way toward preventing this. A very important read.

# Tutoring

UA South provides free tutoring in writing, math, and various related subjects, at multiple locations and fully online. Students can access free tutoring in person at our Cochise and Yuma County locations and at the UA Think Tank in Tucson, as well as fully online through the UA Think Tank.

To find tutoring hours and availability near you, please select your learning center’s location below.

# Databases

A quick resource. Good for getting an overview, but it is certainly not a replacement for the more specialized databases (see below).

A great way to search a wide range of social science research. Filters are your friend. It covers many peer-reviewed journals and will likely serve as your source for much of the foundational theory and core research. Tip: pay attention to the suggested readings after finding an article that’s very well suited to your search.

• ERIC

Education Resources Information Center, sponsored by the U.S. Department of Education. Its focus is on pedagogy and education in general, not technology specifically (though technology research is included).

• LearnTechLib

Formerly EdITLib. Contains a considerable amount of research from technology-focused conference proceedings, journals, and eBooks. Great place to find niche research.

• CrossRef

Found the perfect article and want to see what has cited it since and moved the research forward? This is your best friend.

• Scopus

Much like CrossRef, but I’ve had better results with it.

# Software

• UA Library’s Citation Tools Overview

The library put together a brief page with descriptions and a comparison of the various citation management choices. It’s worth a look.

• Citavi

I love Citavi and really wish I could get into it more than I do. Maybe it’ll work for you. It’s very in-depth.

• EndNote Online

I believe EndNote is the library’s citation management software of choice.

• RefWorks

A popular web-based citation manager.

• Zotero

Another very popular citation and reference management system. It seems to be especially popular in the humanities.

• Mendeley

My citation software of choice, and I’ve tried a lot of them. The downside (if this bothers you): it was purchased by Elsevier.