TRANSCRIPT
References Visualization: Milestone 3
Arthur Wollocko, David Hu, Pat Thontirawong, Mohammad Fahad Sheikh
Introduction to References Vis
● Text-based search
● Identification of interest
● Continued interest...
● Repeat
Demonstration!!!
Design Stages... Iterative design, with significant work/concepts early
● Keep Vis and Interaction at the forefront of development -- Differentiators!
● Process:
○ Started with pen-and-paper mockups -> Fed initial discussions
○ Took pen-and-paper concepts to domain experts -> Initial feedback
○ Integrated feedback and brainstorming concepts into designs -> Draw.io mockups
○ Prototyped! -> Needed to verify the technical stack was sound (particularly given the team's lack of familiarity with the tech)
○ Implemented scoped prototype -> Limited dataset, full frontend/backend support
○ Expanded scope
=> PROFIT
Software Evolution: Design to Implementation
Design Stages... Competitive Analysis
● Three players in the field:
○ SciMAT - Network analysis toolkit for broad applications (not tailored to references)
○ CiteSpace - Visualizations specifically targeted at academic literature
○ CitNetExplorer - Focuses on the "Web of Science" DB, with limited interaction
● Deficiencies:
○ Desktop clients (typically Java)
○ Lack of updated support (CiteSpace's documentation and download links are down)
○ Require technical domain experts
○ Require dataset configuration by the user
○ Difficult and non-intuitive UIs (information overload)
Design Stages... Features we focused on to mitigate past mistakes
● Web-based application, to allow functionality anywhere
● Utilize ubiquitous practices for user input
● Focus on interactions with the data, allowing natural user exploration
● Utilization of multiple visualization dimensions (color, opacity, highlight,
gradient)
● Merging capability - Conduct multiple searches in one location
● Robust filtering/querying structure
● Versioning on API
Tech Stack
● Backend: Python + MySQL:
○ Flask: Backend REST
○ NetworkX: Graph building
○ DiffLib: Text matching
○ Pandas: DataFrames and analysis
○ Redis RQ: Long-running task queue
● Frontend: JavaScript + Firebase:
○ VueJS: User interface
○ D3: Visualizations
○ Firebase: Persistence and login
Tech Stack: Interesting Design Choices -- What worked?
● Our own database instead of the PubMed API:
○ Faster backend (vs. a PubMed API call, ~0.5 second)
○ No limits on query volume (PubMed rate-limits its API)
● Client-server architecture:
○ Easy to get started -- limited resources with a small team
● MySQL:
○ Easy to get started (familiarity among the team)
○ Created a Python script that runs a weekly update
● NetworkX graph library:
○ Easy to analyze network data (outputs to the format needed for the vis)
○ Slow when the graph is big; we refactored and implemented a Redis Queue
● Firebase:
○ Convenient for setting up authentication and session storage/persistence
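The NetworkX and Redis RQ points above can be sketched as follows. This is an illustrative stand-in, not the project's actual graph.py: `build_citation_graph`, `to_d3_payload`, and the sample edges are all made up for the example.

```python
import networkx as nx

def build_citation_graph(citations):
    """Build a directed citation graph from (citing_id, cited_id) pairs.

    Hypothetical helper; the real backend presumably pulls edges
    from MySQL (e.g., via Pandas) rather than taking a list.
    """
    G = nx.DiGraph()
    G.add_edges_from(citations)
    return G

def to_d3_payload(G):
    # node_link_data emits the {"nodes": [...], "links": [...]} shape
    # that D3 force layouts consume directly.
    return nx.node_link_data(G)

# For big graphs this build was slow, so the deck describes moving it
# onto Redis RQ instead of running it in the request cycle, roughly:
#   from redis import Redis
#   from rq import Queue
#   Queue(connection=Redis()).enqueue(build_citation_graph, citations)

G = build_citation_graph([("p1", "p2"), ("p1", "p3"), ("p2", "p3")])
payload = to_d3_payload(G)
```

Keeping the expensive build out of the request handler is the key design choice: the REST endpoint only enqueues the job and polls for the result.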
Test Plan and Results
Backend Unit Testing
- Target code coverage > 80-90% (achieved 81%)
- Added enhanced code coverage in the production version (graph.py)
User Acceptance Testing
- Recruited 6 pilot testers (professors, data scientists, researchers)
- Generated a UAT analysis document with desired observation points
Frontend Unit Testing
- Target code coverage for the Redux state machine > 80-90% (achieved 82%)
- Need to implement view coverage and integration testing for the production version
Test Plan and Results: User Acceptance Test Summary
● Template from San Francisco State University's UAT
○ Widely used in tech fields to conduct UAT testing
○ Outlines testing procedure, assumptions, risks, desired observations
○ Link to template
● Our analysis focused on:
○ Utility of our application
○ Usefulness to those in the domain
● Results summary: Overall, we found that users:
○ Identified key papers that they wanted in less than 10 minutes
○ Understood the UI and the historical layering of search
○ Appreciated the richness of search and the interconnections between different queries
○ Requested more complex search capabilities (ANDs, ORs in search)
○ Wanted QoL improvements, and author-based search/visualizations
“Whoa, so this paper is cited by ALL of these?”
“Helped me identify the key players in this space within the first few seconds”
Software Evolution: Summary Overview, Milestone 1 -> Milestone 3
User Interface / Front-end:
● Default D3 visualization ("Nodes move when scaling!!!") -> Customizing D3 for static/dynamic presentation ("Nodes at rest!!!")
● Pen-and-paper mockups -> Simple ball-and-stick prototype -> Add color, opacity, interaction -> Full-fledged prototype
● Single search -> Merged search -> Cached/persistent merged search
● Add search engine functionality
Software Evolution: Backend -- PubMed Data and APIs
Milestone 1
Sample data from PubMed loaded to SQLite and AWS RDS (~3,000 articles)
Basic API routes added:
- article/id
- article/id/incoming-citation
- article/id/outgoing-citation
- journal/id
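A minimal sketch of what these Milestone 1 routes could look like in Flask. The in-memory dictionaries stand in for the real SQLite/AWS RDS store, and the response shapes are inferred from the route names, not taken from the actual codebase.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Toy stand-in for the article store (made-up data, illustrative only).
ARTICLES = {1: {"id": 1, "title": "Example article", "journal_id": 10}}
INCOMING = {1: [2, 3]}  # PMIDs that cite article 1

@app.route("/article/<int:article_id>")
def get_article(article_id):
    # Real code would query the database and 404 on a missing id.
    return jsonify(ARTICLES[article_id])

@app.route("/article/<int:article_id>/incoming-citation")
def get_incoming(article_id):
    return jsonify({"article_id": article_id,
                    "incoming": INCOMING.get(article_id, [])})

# Flask's built-in test client exercises routes without a running server.
client = app.test_client()
resp = client.get("/article/1")
```

The test client shown at the end is also how the backend unit-testing coverage described later can be driven.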
Milestone 2
Graph API routes added:
- graph/id
- graph/title
- article/period
- article/citation-range
More info in API responses:
- PageRank
- Citation count
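With NetworkX in the stack, the PageRank and citation-count fields added here are each roughly one call. The graph below is made-up sample data, where an edge A -> B means "A cites B".

```python
import networkx as nx

# Tiny illustrative citation graph (not real PubMed data).
G = nx.DiGraph([("a", "b"), ("c", "b"), ("d", "b"), ("d", "a")])

ranks = nx.pagerank(G, alpha=0.85)     # PageRank score per article
citation_counts = dict(G.in_degree())  # how many times each article is cited

# "b" is cited by three articles, so it should top both metrics.
most_cited = max(citation_counts, key=citation_counts.get)
```

Both values can be computed once per graph build and attached to each node in the API response, rather than recomputed per request.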
Milestone 3
API added:
- Support keyword search
- Support exact title search
Features added:
- Full PubMed (~1.5 million articles)
- Scripted weekly database update
- Enabled task queuing on Redis RQ
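The scripted weekly database update might look something like the idempotent upsert below. This is a sketch under assumptions: sqlite3 and the two-column schema stand in for the team's MySQL setup, and `weekly_update` / `fetched_articles` are hypothetical names, not the project's.

```python
import sqlite3

def weekly_update(conn, fetched_articles):
    """Upsert freshly fetched (pmid, title) rows.

    Running the script twice must not duplicate rows -- an upsert keyed
    on the primary key makes the weekly refresh safe to re-run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article (pmid INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO article (pmid, title) VALUES (?, ?) "
        "ON CONFLICT(pmid) DO UPDATE SET title = excluded.title",
        fetched_articles,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
weekly_update(conn, [(1, "Old title")])
weekly_update(conn, [(1, "Revised title"), (2, "New article")])
rows = dict(conn.execute("SELECT pmid, title FROM article"))
```

Because PubMed rate-limits its API (noted in the lessons learned), batching the fetch into a scheduled job like this also keeps request frequency under control.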
Development Process Reflection: Requirements and Estimates
● Finished all requirements and some stretch goals
● Overestimated:
○ Graph visualization (D3 and NetworkX made it easier)
● Underestimated:
○ UX design and brainstorming
○ Search query (due to redesign)
○ Project planning
○ Report
Development Process Reflection: Lessons Learned
● Conduct research into similar systems early -- ideas for design
● Have a UX person "on staff"
● Take a deeper dive into graph DBs -- optimization of search
● Increase the scale of the DB earlier -- we had issues late with scaling (fixed)
● Difflib truncated search results when comparing them, causing us to lose massive amounts of data -- fixed, but annoying
● Vue/D3 lack clean integration -- they didn't play nice initially
● PubMed limits queries to its API -- we had to adjust the frequency of import scripts
● Timezones for team members -- working from different timezones is hard!
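The Difflib truncation lesson above is easy to reproduce: `difflib.get_close_matches` returns at most `n` results (default 3), so matches above the cutoff beyond that are silently dropped unless `n` is raised. The title list here is made-up sample data.

```python
import difflib

titles = [
    "deep learning", "deep learning review", "deep learning survey",
    "deep learning methods", "deep learning in biology",
]

# Default n=3 silently truncates: only the 3 best matches come back,
# even though all five titles clear the similarity cutoff.
truncated = difflib.get_close_matches("deep learning", titles)

# Passing n=len(candidates) keeps every match above the cutoff.
full = difflib.get_close_matches("deep learning", titles,
                                 n=len(titles), cutoff=0.6)
```

This is exactly the kind of loss that looks like a data problem but is actually a default-parameter problem.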
Questions? Thank You!
Arthur Wollocko, David Hu, Pat Thontirawong, Mohammad Fahad Sheikh
Incorporation of Feedback... Significant positive feedback from domain specialists...
Positive Feedback -> Continuation/Response
● Good initial visualizations -> Expanding! Planned features include additional dimensions of visualization: color, opacity, size
● Good choice of metadata/search options; will help researchers explore deeper -> Added additional options: "Keywords", date ranges, and "Minimum Citation" counts
● Good intuitive UI design (e.g., coloring of nodes, like hrefs) -> Expanding on coloration concepts, applying individual colors for each search
● Persistence and mobility -> Utilizing Firebase for persistence, expanding the content persisted
● Interaction with a traditionally NON-interactive process -> Allowing additional dimensions of data exploration like panning/zooming, and utilization of space in the graph
● Great testing infrastructure -> Continuously identifying researchers to test the application
Incorporation of Feedback... Addressing concerns from domain experts...
Feedback for Improvement -> Continuation/Response
● User has to get the PMID from PubMed -> Scrapped this! No longer a companion app to PubMed; all search is conducted within the RVT
● Scope of effort / mental sanity -> Down-scoped effort to ensure a usable product while learning tech stack requirements; documented additional features for the production version
● Requires more than just a single search to find topics/papers of interest -> Allow for merging of searches, and multiple searches; expanding: added search history and focus/removal of searches
● Lack of instruction for application usage -> Provided an in-application tutorial; expanding tooltips and explanations
● Potential to overwhelm users -> Keeping the UI simple, and limiting search/visualizations to protect against overload
● Too much empty space -> Redefining our use of space for future efforts
Tools used by team
● GitHub
○ This was our primary collaboration arena
○ General feel: We don't know how people existed before source control and collaboration tools
● Trello
○ Project management/task distribution
○ General feel: On a small team, it might add more overhead than a simple Excel spreadsheet
● Heroku
○ Server hosting
○ General feel: Heroku makes pushing your local development environment to the web easy; Git remotes are a godsend
● Slack and Google Drive
○ Collaboration and communication
○ General feel: Revolutionized communication for distributed teams
References
● Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359-377. doi:10.1002/asi.20317
● Cobo, M., López-Herrera, A., Herrera-Viedma, E., & Herrera, F. (2012). SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology, 63(8), 1609-1630. doi:10.1002/asi.22688
● Eck, N. J., & Waltman, L. (2014). CitNetExplorer: A new software tool for analyzing and visualizing citation networks. Journal of Informetrics, 8(4), 802-823. doi:10.1016/j.joi.2014.07.006
Documentation (Dev) Links -- Open link provided for your viewing pleasure:
https://docs.google.com/document/d/1IwmwuWisdjM6b3rtisXh9r6KdQQP6NJ5zpDG3On8x8U/edit?usp=sharing