Twitter Data

Twitter & The School of Data Science

**Researchers who have been using or are planning on using SDS data resources (Twitter data & SOPHI), please see an important announcement here**

Social media emerged as a mainstream research domain, and an area of intense interest, in the mid-to-late 2000s.

As a result, UNC Charlotte’s School of Data Science became one of the first institutions of its kind to develop a formal relationship with Twitter for the purpose of acquiring and maintaining social media data in volumes sufficient for research. We remain, in many ways, a model for others.

Now, with more than twenty approved use-cases on file, the School is thrilled to enter our sixth year of partnership with Twitter.

Available Types of Twitter Data

  • The 1% Stream (the Spritzer stream) – A free, real-time feed accessible to virtually anyone. It is a portion of the Twitter real-time stream, but it is not a true statistical representation of the whole body of traffic, because Twitter “massages” the feed to determine its content. For example, Twitter excludes tweets that contain certain information specific to the financial health of companies, which readers could use to assess potential stock market actions. We store the content locally for use by researchers.

This data is ideal for researchers interested in time series analysis in which sampling is not an issue (e.g., aggregate analysis). The downside is that the data is currently available only in raw JSON form, split into numerous ten-minute files. To use it, a researcher therefore needs the programming skills to aggregate, filter, and otherwise handle raw JSON files.
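As a rough illustration of the aggregation and filtering described above, here is a minimal Python sketch. It rests on several assumptions that are not confirmed by this page: that each ten-minute file has a `.json` extension and holds one JSON object per line, that tweets carry the `text` and `created_at` fields of the classic Twitter v1.1 format, and that timestamps look like `Wed Oct 10 20:19:24 +0000 2018`. The function names and the directory layout are hypothetical; adjust them to the files you actually receive.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Assumed layout of the "created_at" field (classic Twitter v1.1 format).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def iter_tweets(directory):
    """Yield one dict per tweet from every *.json file in `directory`.

    Assumes one JSON object per line; blank and unparseable lines are skipped.
    """
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # blank line
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # truncated or corrupt line
                # Keep only records that look like tweets; streams may also
                # contain other message types (for example, delete notices).
                if isinstance(record, dict) and "text" in record and "created_at" in record:
                    yield record


def hourly_counts(directory, keyword):
    """Count tweets whose text contains `keyword` (case-insensitive), per UTC hour."""
    counts = Counter()
    needle = keyword.lower()
    for tweet in iter_tweets(directory):
        if needle in tweet["text"].lower():
            created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
            hour = created.astimezone(timezone.utc).strftime("%Y-%m-%d %H:00")
            counts[hour] += 1
    return counts
```

A call such as `hourly_counts("/path/to/stream_files", "vaccine")` (a hypothetical path and keyword) would return a `Counter` keyed by hour. Reading line by line keeps memory use low, which matters when many ten-minute files are processed in one pass.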

  • The Twitter Academic Research Product Track (ARPT) – Academic researchers can now use the Twitter Academic Research Product Track API. This tool gives researchers free access to data sets. We encourage all UNC Charlotte faculty members to apply for the Academic Research Product Track. Additional information can be found at:
  • The Twitter Decahose Feed – SDS has access to the Twitter Decahose Feed. This is a live-stream feed consisting of a statistically relevant 10% sample of the volume of data across Twitter.
  • Existing Datasets – We currently hold datasets that we have previously extracted, for use by University-affiliated researchers. These datasets are available for classwork. Please contact Rick Hudson for information on how to obtain approval for access to legacy datasets.
  • The COVID-19 Data Feed – This is a real-time feed from Twitter. The content of the stream is based on parameters that Twitter has selected as indicating content specific to COVID-19. Access to this feed is restricted to UNC Charlotte personnel. Collaborators external to UNC Charlotte who want to use data from this source must request access directly from Twitter.

SDS is currently working to develop a platform to host this feed. Until it is completed, any UNC Charlotte faculty interested in requesting a sample file should contact Rick Hudson.

Requesting Twitter Resources

  • The 1% Stream and Existing Datasets: Contact Rick Hudson for more information.
  • The Twitter Decahose Feed: Interested researchers can contact Rick Hudson for access.