Leveraging APIs in SAS to Create Interactive Visualizations · 2015-09-23

Leveraging APIs in SAS® to Create Interactive Visualizations
Brian Bahmanyar, Cal Poly, San Luis Obispo, California
Rebecca Ottesen, Cal Poly, San Luis Obispo, California

ABSTRACT

The Internet is one of the richest data resources for statisticians and data analysts. Therefore, it is important to understand the ways in which one can programmatically gather and present web-based data. APIs make it possible for programmers to stream the offerings of big web services. We investigated various APIs and websites to outline the most practical methods for statisticians to tap into these data sources. After gathering and cleaning the data in SAS we implemented a 3D interactive visualization using R and Google Earth™.

INTRODUCTION

The Internet is an exceptionally large and, for the most part, free data resource for data scientists, statisticians, and data analysts. Therefore, it is desirable for data professionals to be able to efficiently gather, present, and perform analyses with data presented on websites. However, websites are designed for human use and interaction. For example, to gather Twitter posts on a particular topic you can point your web browser to https://twitter.com and then begin the cumbersome task of copying and pasting relevant tweets into a spreadsheet for review and analysis; clearly this is not a scalable way to gather data.

Thus there is a need for a method to programmatically grab a subset of a website’s data for personal use: the application programming interface (API). APIs allow clients to make structured requests to servers, which return structured responses. The most common web protocol between clients and servers is the Hypertext Transfer Protocol (HTTP). There are several request methods that ask servers to perform different tasks; because we are concerned with collecting data, we will focus on the GET method. Base SAS has an HTTP procedure, which allows us to issue HTTP GET requests and receive structured responses that can be stored in a text file. Once these data have been obtained and read into a proper data set, they can be explored in many creative ways.
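As a minimal sketch of such a request, the step below issues a single GET with PROC HTTP and echoes the first lines of the stored response to the log. The URL http://example.com/ and the fileref resp are placeholders chosen for illustration, not part of the workflow described in this paper, and the code assumes the SAS session has network access.

```sas
/* Temporary file that will hold the response body (fileref name is arbitrary) */
FILENAME resp TEMP;

/* Issue one HTTP GET request; the URL is a placeholder, not a real API endpoint */
PROC HTTP
    METHOD = "GET"
    URL    = "http://example.com/"
    OUT    = resp;
RUN;

/* Echo the first few lines of the stored response to the SAS log */
DATA _NULL_;
    INFILE resp LRECL=32767 TRUNCOVER;
    INPUT line $CHAR200.;
    PUT line;
    IF _N_ >= 5 THEN STOP;
RUN;
```

Because the response is stored as plain text, the same fileref can be handed to an INFILE statement in a later DATA step.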

In this paper we will discuss methods to efficiently collect and interactively visualize data on data science jobs gathered from Glassdoor’s and Indeed’s APIs. With such data we are often presented with many variables recorded on a large number of observations. These data become difficult to digest with a static graph, which leads to a need for a way to examine the data interactively. An interactive display gives the user a way to quickly understand the big picture of what the data are saying and then look more closely at different variables for particular subsets or observations. The R package RKML provides a way to create 3D interactive visualizations atop Google Earth for data with geospatial qualities. This paper focuses on the data-driven workflow of using SAS to make HTTP requests to APIs based on the responses received from other APIs. Once the data have been collected and formatted in SAS, they are exported to R to create markup documents that can be used to interact with the data in Google Earth™.

GLASSDOOR’S AND INDEED’S APIS

Both Glassdoor and Indeed are employment-related websites that offer free APIs to those who register for free accounts. Both sites handle authentication by providing keys, which must be explicitly passed into every HTTP call. Despite their similarities, Glassdoor and Indeed offer different kinds of employment-related data. Indeed has one API, which takes location parameters and a job title query and returns information on up to 25 specific job openings matching your specifications. Glassdoor supports several different interfaces, two of which are discussed in this paper. The first (the Job Stats API) takes parameters for country and job title of interest and returns aggregate counts of matching job openings by state. The second (the Company Search API) takes parameters for a location and company of interest and returns information on the company, such as its overall rating (and a rating count), pros and cons of working there, and links to the company website and logo. The information coming from these three APIs is complementary: it allows us first to see the locations where the most jobs are concentrated, then to search relevant job openings, and finally to review the ratings of the companies advertising these opportunities.

Computer software is much faster and far less prone to error than a human, so we want to automate our workflow and have SAS® do most of the work. We have to explicitly make the first API call using the HTTP procedure, but the key to efficiency is to use the data we receive to drive subsequent API calls. To achieve this, we first make a call to Glassdoor’s Job Stats API using “data science” as our job title of interest. This returns the number of data science related job openings per state. Then we take the top fifty percent of states with the most job openings and pass them dynamically to Indeed’s API to retrieve data on specific job postings in each of those states. We use these data to extract the company names from each of the job postings, in each of the top states, and pass those companies into Glassdoor’s Company Search API to retrieve information on each specific company. The workflow for this process is shown in Figure 0.

Figure 0 - Automated API Workflow

1. Give country of interest; receive frequencies of data science jobs by state

2. Give states of interest; receive data science related job postings by state

3. Give company from job posting; receive overall rating, pros/cons, link to website, etc.

DYNAMIC API REQUESTS WITH PROC SQL

PROC SQL was used to create SAS macro variables from data gathered with previous API responses. In Figure 0, going from Step 1 to Step 2 requires us to iterate over the resulting states of interest and to make a separate Indeed Job Search API call for each state. To accomplish this, we use PROC SQL to take the states that represent the top 50% with regard to the number of data science related jobs, according to the Glassdoor Job Stats API, and pass their names into a delimited macro variable.

Figure 1 - PROC SQL to Create Macro String
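The figure itself did not survive in this copy, so the following is a small self-contained sketch of the idea rather than the authors’ exact code. It assumes a hypothetical table jobs_per_state with made-up counts, finds the median with PROC MEANS, and then uses PROC SQL with INTO ... SEPARATED BY to build a dash-delimited macro variable of the states at or above the median. Spaces in state names are already replaced by commas, as in the appendix.

```sas
/* Hypothetical stand-in for the Job Stats results; the counts are made up */
DATA jobs_per_state;
    INPUT state :$20. num_jobs;
    DATALINES;
California 500
New,York 300
Texas 200
Ohio 100
;
RUN;

/* Median number of jobs, stored in a one-row data set */
PROC MEANS DATA=jobs_per_state NOPRINT;
    VAR num_jobs;
    OUTPUT OUT=percentile P50=p50;
RUN;

/* Concatenate the states at or above the median into one macro variable */
PROC SQL NOPRINT;
    SELECT state
        INTO :top_states_list SEPARATED BY "-"
        FROM jobs_per_state, percentile
        WHERE num_jobs >= p50
        ORDER BY num_jobs DESC;
    /* SQLOBS holds the number of rows the SELECT returned */
    %LET state_count = &SQLOBS;
QUIT;

%PUT &=top_states_list &=state_count;
```

The dash is used as the delimiter because the state values themselves may contain commas.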

Knowing the number of states in this string, a SAS %DO loop is used to iterate over the dynamically created state names using %SCAN. Each state name can then be passed into the request URL so that a separate API call is made automatically for each state. The same technique is used to make a Glassdoor Company Search API call for each job, which requires two %DO loops: the outer loop iterates over the states and the inner loop iterates over the jobs within each state. It would be error prone and extremely tedious to make all these calls one by one, so using software to automate this process dynamically simplifies our workflow and maximizes our efficiency.
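The loop described above can be sketched as follows. The macro name loopStates and the list values are hypothetical, and the %PUT statement stands in for the PROC HTTP call that the real workflow would make on each pass.

```sas
/* Hypothetical values standing in for the macro variables built by PROC SQL */
%LET top_states_list = California-New,York-Texas;
%LET state_count     = 3;

%MACRO loopStates;
    %LOCAL i state;
    %DO i = 1 %TO &state_count;
        /* %BQUOTE masks the commas inside the list; %STR(-) names the delimiter */
        %LET state = %SCAN(%BQUOTE(&top_states_list), &i, %STR(-));
        /* A real run would place &state in the request URL here */
        %PUT NOTE: request &i would be made for state=&state;
    %END;
%MEND loopStates;

%loopStates;
```

A second %DO loop nested inside this one, driven by a list of companies, gives the two-level iteration over states and jobs.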

STRUCTURE OF OUR HTTP REQUESTS

The structures of URLs for all the API calls used are very similar because they are all GET requests. The GET method simply retrieves data and makes no modifications to resources on the server.

In Figure 2, the portions of the URL colored in red in the original figure do not change among API calls; they represent the rules for the language that the server understands. The portions colored in blue, such as SITE_NAME, API_ENDPOINT, and the parameter names and values, change from API to API; they represent what we want to say to the server. The server takes this message and returns a structured response, whose structure is covered briefly in the next section. For Glassdoor’s APIs, the SITE_NAME is glassdoor and the API_ENDPOINT is api/api.htm. Certain parameters, such as authentication information and response format, are required for every request, while other parameters are optional. These parameters, whether required or optional, can be supplied in any order.

http://api.SITE_NAME.com/API_ENDPOINT?PARAM1=VAL1&PARAM2=VAL2&PARAM3=VAL3…

Figure 2 - GET Method General Structure

Figure 3 displays some of the parameters that can be specified in Glassdoor’s Job Stats interface. Other optional parameters can be found in Glassdoor’s API documentation, linked in the References section at the end of the paper.
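To make the general structure in Figure 2 concrete, here is a hedged sketch that assembles a Job Stats request URL from macro variables. The parameter names follow the appendix program; the credential values are placeholders, the helper variables base and params are introduced only for this illustration, and the userip and useragent parameters used in the appendix are left out for brevity.

```sas
/* Placeholder credentials and query; replace with your own values */
%LET tp = YOUR_TP;
%LET tk = YOUR_TK;
%LET jt = data,scientist;

/* Pieces that vary from API to API */
%LET site_name    = glassdoor;
%LET api_endpoint = api/api.htm;

/* %NRSTR keeps each literal ampersand from being read as a macro variable reference */
%LET base   = http://api.&site_name..com/&api_endpoint;
%LET params = v=1%NRSTR(&t.p)=&tp%NRSTR(&t.k)=&tk%NRSTR(&format)=json;
%LET params = &params%NRSTR(&action)=jobs-stats%NRSTR(&jt)=&jt;
%LET url    = &base?&params;

/* Inspect the assembled URL in the log before passing it to PROC HTTP */
%PUT &url;
```

Building the URL in pieces avoids stray blanks from line breaks, which matters because a URL cannot contain spaces.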

Figure 3 - Glassdoor Job Stats API Parameters

Glassdoor’s Company Search API has the same SITE_NAME and API_ENDPOINT, with slightly different parameters as listed in Figure 4.

Figure 4 - Glassdoor Company Search API Parameters

For Indeed’s API the SITE_NAME is indeed, the API_ENDPOINT is ads/apisearch, and options for parameters are shown in Figure 5.

Figure 5 - Indeed Job Search API Parameters

Now that we have made sense of the structure of our requests to the server, let’s observe the structure of the responses we get back.

STRUCTURE OF SERVER HTTP RESPONSES

The Extensible Markup Language (XML) and JavaScript Object Notation (JSON) are two of the most popular data exchange formats. We requested that the server respond in JSON format because it is much less verbose, and thus technically faster (although any speed differences for responses of this size are negligible), and because it is much more readable.

Figure 6 - XML Response

Figure 7 - JSON Response

Figures 6 and 7 show, respectively, XML and JSON responses to the same Indeed Job Search API call. The XML response resembles a tree data structure and wraps every piece of data in opening and closing tags. The JSON response resembles a map, or dictionary, data structure. Typically one of the JSON keys maps to an array (or list) of maps, each of which represents a single result object.

READING JSON RESPONSES IN SAS

The HTTP procedure was used to make HTTP GET requests and to store the JSON responses as text files. These JSON text files can be read into SAS using a DATA step with the SCANOVER option on the INFILE statement. With SCANOVER, an @’STRING’ pointer in the INPUT statement scans ahead in the input until STRING is found and then reads the text that follows STRING on that line. This is extremely useful when parsing JSON because of its key-value structure.

Figure 8 - Turning JSON into a SAS dataset

Figure 8 shows the DATA step that was used to read in JSON responses from Indeed’s Job Search API (shown as an example in Figure 7). As is evident from the INPUT statement, the SCANOVER option is used to find the keys of interest and read in their corresponding values. We can then use either the COMPRESS() or SUBSTR() function to clean the extra comma and quotation characters at the end of each line. We used COMPRESS() because SUBSTR() would also require the LENGTH() function to determine the end point of the substring.
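The reading and cleaning steps can be tried without calling any API. The sketch below first writes a few lines shaped like a pretty-printed JSON response to a temporary file and then reads them back with SCANOVER. The key names and the spacing around the colon follow the Indeed fields in the appendix; the values, the fileref demo, and the data set name jobs are made up for illustration.

```sas
FILENAME demo TEMP;

/* Write a tiny JSON-like file; the values are invented for this example */
DATA _NULL_;
    FILE demo;
    PUT '{';
    PUT '  "results" : [ {';
    PUT '    "jobtitle" : "Data Scientist",';
    PUT '    "company" : "Example Corp",';
    PUT '    "city" : "San Luis Obispo"';
    PUT '  } ]';
    PUT '}';
RUN;

DATA jobs;
    INFILE demo LRECL=32000 TRUNCOVER SCANOVER;

    /* Each @'key' pointer scans forward until the key is found,
       then the rest of that line is read as the value */
    INPUT @'"jobtitle" : ' title   $200.
          @'"company" : '  company $200.
          @'"city" : '     city    $200.;

    /* Remove the quotation marks and the trailing comma left over from the JSON */
    title   = COMPRESS(title,   ',"');
    company = COMPRESS(company, ',"');
    city    = COMPRESS(city,    ',"');
RUN;

PROC PRINT DATA=jobs;
RUN;
```

One limitation of this design is that COMPRESS() removes every comma and quotation mark, including any that are legitimately part of a value, so it suits short fields better than free text.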

In this DATA step we also prepare the variables that will be passed into the next API call. URLs, of course, cannot contain spaces, so we use a combination of the TRANWRD() and TRIM() functions to strip the trailing whitespace and replace embedded spaces with commas. Now the city and company variables are formatted properly and ready to be passed as parameters into Glassdoor’s Company Search API.
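A minimal sketch of this preparation step, using a made-up city value, is shown below.

```sas
DATA _NULL_;
    LENGTH city $200;
    city = "San Luis Obispo";

    /* TRIM drops the trailing blanks that pad the variable, so only the
       embedded blanks are replaced with commas */
    city = TRANWRD(TRIM(city), ' ', ',');

    /* Write the result to the log */
    PUT city=;
RUN;
```

Without TRIM(), the blanks that pad the 200-character variable would also be turned into commas.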

After SAS makes our final API call, our SAS data sets are ready to be merged and exported for use in R.

KEYHOLE MARKUP LANGUAGE AND GOOGLE EARTH™

The Keyhole Markup Language (KML) is an XML notation that can be used to represent spatial data on three-dimensional earth browsers. Duncan Temple Lang wrote the RKML package, which provides R users with functionality to generate KML documents.
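To show what such a document contains, the sketch below writes a bare KML file with one Placemark per row of a data set. This is not the authors’ method (they generate KML with RKML in R); it is a SAS illustration of the markup only, with a made-up company name and coordinates, and it does not escape characters such as & or < that would need escaping in real data.

```sas
/* Hypothetical job record; the name and coordinates are made up */
DATA jobs;
    INPUT company :$20. latitude longitude;
    DATALINES;
ExampleCorp 35.28 -120.66
;
RUN;

/* Point this fileref at a .kml path instead of TEMP to open the file in Google Earth */
FILENAME kmlout TEMP;

DATA _NULL_;
    SET jobs END=last;
    FILE kmlout;
    LENGTH line $400;

    /* Document header, written once */
    IF _N_ = 1 THEN DO;
        PUT '<?xml version="1.0" encoding="UTF-8"?>';
        PUT '<kml xmlns="http://www.opengis.net/kml/2.2">';
        PUT '<Document>';
    END;

    /* One Placemark per job; KML coordinates are written as longitude,latitude */
    line = CATS('<Placemark><name>', company, '</name><Point><coordinates>',
                longitude, ',', latitude, '</coordinates></Point></Placemark>');
    PUT line;

    /* Document footer, written once */
    IF last THEN DO;
        PUT '</Document>';
        PUT '</kml>';
    END;
RUN;
```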

Figure 9 – ‘Barebones’ Google Earth Visualization (SF/Silicon Valley)

Figure 9 shows a ‘barebones’ KML document opened in Google Earth™. The left image shows a concentration of data science related jobs in the Silicon Valley/San Francisco area that is so dense that it is difficult to see the individual jobs. Fortunately, Google Earth provides the ability to expand a cluster of jobs, as seen in the image on the right side of Figure 9. However, this visualization is neither very interactive nor aesthetically pleasing. To enhance it, the KML markup can embed other technologies such as HTML and CSS. HTML, or HyperText Markup Language, is the standard markup language used to create web pages. It is the glue that holds together and gives structure to the many images, objects, and text that make up a standard web page. CSS, or Cascading Style Sheets, is a styling language used to modify the appearance of a document written in a markup language such as HTML. Embedding icon style options, HTML, and CSS into the KML documents allows us to enhance the functionality and appearance of our visualization.

Figure 10 – ‘Enhanced’ Google Earth Visualization

Figure 10 shows two pictures of the Google Earth visualization after it has been augmented with icon styles, HTML, and CSS. The left image in Figure 10 shows cleaner icons in two colors, blue and orange. A blue icon tells us that the company had an ‘adjusted rating’ in the top 50 percent, while an orange icon represents the lower 50 percent. Companies with few reviews tended to have unfairly high ratings, so an ‘adjusted rating’ was calculated by multiplying each company’s overall rating by its share of the total number of ratings received in that state’s results. The next improvement, which comes from using HTML, is that when we click on a company in Google Earth a pop-up is displayed, as shown on the right side of Figure 10. This pop-up is an HTML table, which consolidates text, an image, and a link to the original job posting into one window. It brings together the data collected from the Indeed and Glassdoor Company Search APIs to give us an overview of the job openings and companies for a given job search keyword.
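The adjusted rating and the two icon classes are computed in the R program in the appendix. As a hedged sketch, the same arithmetic can be written in SAS as follows; the three companies and their numbers are made up, and the data set and variable names are chosen only for this illustration.

```sas
/* Hypothetical ratings; a company with few reviews has the highest raw rating */
DATA ratings;
    INPUT company :$20. rating rating_count;
    DATALINES;
A 4.8 5
B 3.9 400
C 3.5 95
;
RUN;

/* Adjusted rating = rating times the company's share of all ratings;
   PROC SQL remerges the overall SUM back onto every row */
PROC SQL;
    CREATE TABLE adjusted AS
    SELECT company, rating, rating_count,
           rating * rating_count / SUM(rating_count) AS rate_adj
    FROM ratings;
QUIT;

/* Median adjusted rating, used as the cut point between the two classes */
PROC MEANS DATA=adjusted NOPRINT;
    VAR rate_adj;
    OUTPUT OUT=cut MEDIAN=median_adj;
RUN;

/* Companies at or below the median are labeled Poor, the rest Good */
DATA classified;
    IF _N_ = 1 THEN SET cut(KEEP=median_adj);
    SET adjusted;
    LENGTH class $4;
    IF rate_adj <= median_adj THEN class = "Poor";
    ELSE class = "Good";
RUN;
```

With these numbers, company A, which has the highest raw rating but only five reviews, ends up with the lowest adjusted rating.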

While HTML is used to structure the table, CSS determines the font style and the spacing between rows, and gives the table a hover effect. As the user’s mouse moves over the fields in the table, they become tinted, as shown in the Summary field in Figure 10. The static images shown in this paper do not do the interactive visualization justice. This visualization gives the user the full range of features Google Earth offers, such as zooming into street view, coupled with easy access to the data we lay over it. When using this tool online, it is easy to navigate around the United States, select a job in a location of interest, investigate details about that job or company, and even click the Indeed link to open the original job posting within Google Earth.

CONCLUSION

This paper demonstrates how to use SAS to make data-driven calls to multiple APIs, parse the JSON responses into tables, and join the data together into an analyzable data set. PROC HTTP is a simple way to call APIs in order to create a text-based file that can be read into a SAS data set. The SCANOVER INFILE option is key for importing JSON files into SAS in a DATA step. Once the JSON (or XML) file is read in as a SAS data set, it is simple to use PROC SQL to create dynamic macro variables that correspond to information gathered from the API call. These macro variables can be used as parameters in other HTTP calls to obtain more API data. After gathering and processing this information, the possibilities are endless for statistical displays and visualizations. For example, exporting the data into R to create an interactive visualization over Google Earth enabled us to explore the data much more efficiently than a static plot would allow. Obtaining API-based data with SAS can provide a dynamic and efficient mechanism with which users can explore the offerings of big web services.

REFERENCES

• Nolan, D., & Lang, D. (2014). In XML and Web Technologies for Data Sciences with R (pp. 581-599).

• Choy, M., & Kyong, S. (2013). “Efficient extraction of JSON information in SAS® using the SCANOVER function”, Proceedings of the SAS Global 2013 Conference, http://support.sas.com/resources/papers/proceedings13/296-2013.pdf

• Glassdoor API Documentation. Retrieved September 01, 2015, http://www.glassdoor.com/api/index.htm

• Indeed Publisher Program. Retrieved September 12, 2015, http://www.indeed.com/publisher

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Brian Bahmanyar
E-mail: [email protected]
Enterprise: Cal Poly, San Luis Obispo
LinkedIn: https://www.linkedin.com/in/bbahmany

APPENDIX SAS PROGRAM /************************************************************************************    AUTHORS:          Brian  Bahmanyar  &  Rebecca  Ottesen  INSTITUTION:  Cal  Poly,  San  Luis  Obispo    PARAMETERS:            ip  =  the  internet  protocol  (IP)  address  of  your  device          tp  =  Glassdoor  partner  identification     tk  =  Glassdoor  partner  key          jt  =  Glassdoor  job  title  search  query  (comma  separated)          publisher_id  =  Indeed  publisher  idenitification          query  =  Indeed  job  title  search  query  (plus  separated)      PRECONDITIONS:          -­‐  Register  for  free  Glassdoor  and  Indeed  accounts,  then  request  your  API  keys                  -­‐  Glassdoor:  http://www.glassdoor.com/api/index.htm                  -­‐  Indeed:        http://www.indeed.com/publisher            -­‐  Fill  in  all  DIRECTORY_PATHS  and  authentication  information  fields                  ****  There  are  FILENAMES  that  had  to  be  dynamically  created  and  changed  in  the                              %DO  loops,  be  sure  to  change  those  as  well  ****    To  run  program  simply  load  the  3  macros  and  call  the  %EXECUTE  macro,  which  makes  calls    to  helper  macros  and  serves  as  a  main  macro.    OUTPUTS:          -­‐  several  SAS  data  sets          -­‐  comma  delimited  text  files  with  job  results  for  each  state    ************************************************************************************/    FILENAME  states    "C:\DIRECTORY_PATH\states.txt";  FILENAME  company  "C:\DIRECTORY_PATH\company.txt";    OPTIONS  NOQUOTELENMAX;    %LET  ip  =  YOUR_IP;  %LET  tp  =  YOUR_TP;  %LET  tk  =  YOUR_TK;  %LET  publisher_id  =  YOUR_PUB_ID;    %LET  jt          =  data,scientist;  %LET  query    =  data+scientist;    %EXECUTE;    %MACRO  EXECUTE;         %getTopStates;  

Page 9: Leveraging APIs in SAS to Create Interactive Visualizations · 2015-09-23 · Leveraging APIs in SAS to Create Interactive Visualizations, continued 3 required or optional, can be

Leveraging APIs in SAS to Create Interactive Visualizations, continued

9

    PROC  SQL  NOPRINT;       SELECT  state         INTO  :  top_states_list  SEPARATED  BY  "-­‐"         FROM  top_states;     QUIT;       PROC  SQL  NOPRINT;       SELECT  file_names         INTO  :  file_names_list  SEPARATED  BY  "-­‐"         FROM  top_states;     QUIT;         %getJobs;  %MEND;    %MACRO  getTopStates;     PROC  HTTP     METHOD      =  "get"     URL      =    "  http://api.glassdoor.com/api/api.htm?v=1  %NRSTR(&t.p)=&tp  %NRSTR(&t.k)=&tk  %NRSTR(&userip)=&ip  %NRSTR(&useragent=)  %NRSTR(&format)=json  %NRSTR(&action)=jobs-­‐stats  %NRSTR(&jt)=&jt  %NRSTR(&country)=us  %NRSTR(&returnStates)=true  %NRSTR(&admLevelRequested)=1  "     OUT      =  states;     RUN;       DATA  jobs_per_state;       INFILE  states  LRECL=32000  TRUNCOVER  SCANOVER;         INPUT    @'"numJobs":  '      num_jobs_str    $200.                    @'"name":  '            state                  $200.;         state              =  COMPRESS(state,',"');       state                =  TRANWRD(TRIM(state),'  ',',');       num_jobs_str  =  COMPRESS(num_jobs_str,  ',');           num_jobs  =  INPUT(num_jobs_str,  12.);           DROP  num_jobs_str;       PROC  SORT;       BY  DESCENDING  num_jobs;       PROC  MEANS  P50  NOPRINT;       VAR  num_jobs;       OUTPUT  OUT  =  percentile  P50=P50;       RUN;       DATA  top_states;       IF  _N_  =  1  THEN  SET  percentile;       SET  jobs_per_state;         IF  num_jobs  >=  P50;         file_names  =  TRANWRD(state,',','_');         CALL  SYMPUT("state_count",  _N_);     RUN;  

Page 10: Leveraging APIs in SAS to Create Interactive Visualizations · 2015-09-23 · Leveraging APIs in SAS to Create Interactive Visualizations, continued 3 required or optional, can be

Leveraging APIs in SAS to Create Interactive Visualizations, continued

10

    PROC  datasets;       DELETE  Percentile;     QUIT;    %MEND;    %MACRO  getJobs;       %DO  i=1  %TO  &state_count;         %LET  state          =  %SCAN(%BQUOTE(&top_states_list),&i,'-­‐');       %LET  file_name  =  %SCAN(%BQUOTE(&file_names_list),&i,'-­‐');         FILENAME  loc  "C:\DIRECTORY_PATH\&file_name..txt";         PROC  HTTP       METHOD      =  "get"       URL      =    "  http://api.indeed.com/ads/apisearch?  %NRSTR(publisher)=&publisher_id  %NRSTR(&format)=json  %NRSTR(&limit)=25  %NRSTR(&q)=&query  %NRSTR(&l)=&state  %NRSTR(&sort)=relevance  %NRSTR(&radius)=200  %NRSTR(&latlong)=1  %NRSTR(&co)=us  %NRSTR(&userip)=&ip  %NRSTR(&useragent=)  %NRSTR(&v)=2  "       OUT      =  loc;       RUN;         DATA  &file_name._jobs;         INFILE  loc  lrecl=32000  truncover  scanover;           INPUT    @'"jobtitle"  :  '                                    title          $200.                @'"company"  :  '                              company      $200.                @'"city"  :  '                                    city          $200.                @'"state"  :  '                                  state          $200.                @'"date"  :  '                                    date          $200.                @'"snippet"  :  '                              summary      $255.                @'"url"  :  '                                      url          $200.                @'"latitude"  :  '                            latitude    $200.                @'"longitude"  :  '                          longitude  $200.                
@'"formattedRelativeTime"  :  '  rel_time    $200.;           id  +  1;           title        =  COMPRESS(title        ,  ',"');         company      =  COMPRESS(company    ,  ',"');         city        =  COMPRESS(city          ,  ',"');         state        =  COMPRESS(state        ,  ',"');         date        =  COMPRESS(date          ,  ',"');         summary      =  COMPRESS(summary    ,  ',"');         url        =  COMPRESS(url            ,  ',"');         rel_time    =  COMPRESS(rel_time  ,  ',"');         latitude    =  COMPRESS(latitude  ,',');         longitude  =  COMPRESS(longitude,',');           city        =  TRANWRD(TRIM(city),  '  ',  ',');         company  =  TRANWRD(TRIM(company),  '  ',  ',');       RUN;         PROC  SQL  NOPRINT;  

Page 11: Leveraging APIs in SAS to Create Interactive Visualizations · 2015-09-23 · Leveraging APIs in SAS to Create Interactive Visualizations, continued 3 required or optional, can be

Leveraging APIs in SAS to Create Interactive Visualizations, continued

11

      SELECT  city           INTO  :city_list  separated  by  "-­‐"           FROM  &file_name._jobs;       QUIT;         PROC  SQL  NOPRINT;         SELECT  company           INTO  :company_list  separated  by  "-­‐"           FROM  &file_name._jobs;       QUIT;         %DO  j=1  %to  25;           %LET  city        =  %SCAN(%BQUOTE(&city_list),&j,'-­‐');         %LET  company  =  %SCAN(%BQUOTE(&company_list),&j,'-­‐');           PROC  HTTP         METHOD      =  "get"         URL      =    "  http://api.glassdoor.com/api/api.htm?  v=1  %NRSTR(&format)=json  %NRSTR(&t.p)=&tp  %NRSTR(&t.k)=&tk  %NRSTR(&userip)=&ip  %NRSTR(&action)=employers  %NRSTR(&q)=&company  %NRSTR(&city)=&city  %NRSTR(&state)=&state  %NRSTR(&useragent)=  "         OUT      =  company;         RUN;           DATA  temp;           INFILE  company  lrecl=32000  truncover  scanover;             INPUT   @'"name":  '      name     $200.                                                                                    @'"industry":  '                    industry   $200.                                                                                    @'"numberOfRatings":  '      rating_count   $200.                                                                                    @'"squareLogo":  '                logo     $200.                                                                                    @'"overallRating":  '          rating     $200.                                                                                    @'"pros":  '                            pros     $200.                                                                                    
@'"cons":  '                            cons     $200.;           id  =  &j;           IF  _N_  =  1;         RUN;           %IF  &j  =  1  %THEN  %DO;         DATA  &file_name._ratings;           SET  temp;         RUN;         %END;           %ELSE  %DO;         DATA  &file_name._ratings;           SET  &file_name._ratings  temp;         RUN;         %END;         %END;       DATA  &file_name;       MERGE  &file_name._jobs  (IN=job)  &file_name._ratings  (IN=rate);       BY  id;       name                  =  COMPRESS(name,'",');       rating_count  =  COMPRESS(rating_count,',');       rating              =  COMPRESS(rating,',');  


        industry     = COMPRESS(industry,'",');
        logo         = COMPRESS(logo,'",');
        latitude     = COMPRESS(latitude,',');
        longitude    = COMPRESS(longitude,',');
        IF rate AND job;
    RUN;

    PROC EXPORT
        DATA=&file_name
        OUTFILE="C:\DIRECTORY_PATH\&file_name..txt"
        DBMS=dlm;
        DELIMITER=",";
    RUN;

    PROC DATASETS;
        DELETE &file_name._ratings &file_name._jobs;
    QUIT;

  %END;
%MEND;

R PROGRAM

### Author:       Brian Bahmanyar
### Organization: Cal Poly, San Luis Obispo
###
### The following R script takes the CSV exported from my attached SAS Program and creates KML documents
###     to be opened and explored in Google Earth
###
### PRECONDITION: The CSV directory should only contain the CSV relevant
###               The KML directory is where the KML files will be output

library(XML)
library(RKML)
library(Rcompression)

miss = c(","," ","")
setwd("DIRECTORY_PATH/CSVs")
len = length(dir())

styles = list(
    Poor = list(
        IconStyle = list(
            scale="1",
            Icon=c(href="http://maps.google.com/mapfiles/kml/paddle/orange-circle.png"))),
    Good = list(
        IconStyle = list(
            scale="1",
            Icon=c(href="http://maps.google.com/mapfiles/kml/paddle/ltblu-circle.png"))))

for (i in 1:len) {

    unChecked = read.csv(file=dir()[i], as.is=T, na.strings=miss)
    dSet = na.omit(unChecked)
    dSet$city = gsub(",", " ", dSet$city)

    rateCntSum = sum(dSet$rating_count)
    rateWeight = dSet$rating_count/rateCntSum
    dSet$rateAdj = dSet$rating*rateWeight

    dSet$class = NULL
    for (j in 1:nrow(dSet))
    {
        if( dSet$rateAdj[j] <= summary(dSet$rateAdj)[3] )
        {
            dSet$class[j] = "Poor"


        } else
        {
            dSet$class[j] = "Good"
        }
    }
    kmlName = dSet$state[1]
    folderName = "Data Science Jobs"

    popUp = sprintf(paste(
        '<style>
            table, th, td {border: none;}
            table {display: table; }
            table.hoverable > tbody > tr {
                -webkit-transition: background-color .25s ease;
                -moz-transition: background-color .25s ease;
                -o-transition: background-color .25s ease;
                -ms-transition: background-color .25s ease;
                transition: background-color .25s ease;
            }
            table.hoverable > tbody > tr:hover { background-color: #f2f2f2; }
            thead {border-bottom: 1px solid #d0d0d0; }
            td, th {
                font-family: Verdana;
                display: table-cell;
                text-align: left;
                vertical-align: middle;
            }
        </style>',

        '<table class="hoverable">
            <tr><td><b>Title:</b></td><td>%s</td></tr>',
            "<tr><td><b>Industry:</b></td><td>%s</td></tr>",
            '<tr><td><b>Logo:</b></td><td><img src="%s" style="width:100px;height:78px;"></td></tr>',
            "<tr><td><b>Location:</b></td><td>%s, %s</td></tr>",
            "<tr><td><b>Summary:</b></td><td>%s</td></tr>",
            "<tr><td><b>Rel. Time:</b></td><td>%s</td></tr>",
            "<tr><td><b>Rating:</b></td><td>%s (%s)</td></tr>",
            "<tr><td><b>Pros:</b></td><td>%s</td></tr>",
            "<tr><td><b>Cons:</b></td><td>%s</td></tr>",
            '<tr><td><b>Posting:</b></td><td><a href=%s>Indeed Page</a></td></tr>
        </table>'),
        dSet$title,
        dSet$industry,
        dSet$logo,
        dSet$city, dSet$state,
        dSet$summary,
        dSet$rel_time,
        dSet$rating, dSet$rating_count,
        dSet$pros,
        dSet$cons,
        dSet$url)

    print(sprintf("LOGGING: Iteration %d",i))
    eval(parse(text=sprintf("doc%d = kmlPoints(dSet,
                                docName=kmlName,
                                folderName=folderName,
                                description=popUp,
                                docStyles=styles,
                                style=dSet$class)",i)))

    setwd("DIRECTORY_PATH/KMLs")
    eval(parse(text=sprintf('saveXML(doc%d, "%d.kml")',i,i)))
    setwd("DIRECTORY_PATH/CSVs")
}

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.