TRANSCRIPT
References Visualization: Milestone 3
Arthur Wollocko, David Hu, Pat Thontirawong, Mohammad Fahad Sheikh
Introduction to References Vis
● Text-based search
● Identification of interest
● Continued interest...
● Repeat
Demonstration!!!
Design Stages... Iterative design, with significant work/concepts early
● Keep Vis and Interaction at the forefront of development -- Differentiators!
● Process:
○ Started with pen-and-paper mockups -> Fed initial discussions
○ Took pen-and-paper concepts to domain experts -> Initial feedback
○ Integrated feedback and brainstorming concepts into designs -> Draw.io mockups
○ Prototyped! -> Needed to verify the technical stack was sound (particularly given the team's lack of familiarity with the tech)
○ Implemented scoped prototype -> Limited dataset, full frontend/backend support
○ Expanded scope
=> PROFIT
Software Evolution: Design to Implementation
Design Stages... Competitive Analysis
● Three players in the field:
○ SciMAT - Network analysis toolkit for broad applications (not tailored to references)
○ CiteSpace - Visualizations specifically targeted at academic literature
○ CitNetExplorer - Focuses on the "Web of Science" DB, with limited interaction
● Deficiencies:
○ Desktop clients (typically Java)
○ Lack of updated support (CiteSpace's documentation and download links are down)
○ Require technical domain experts
○ Require dataset configuration by the user
○ Difficult and non-intuitive UIs (information overload)
Design Stages... Features we focused on to mitigate past mistakes
● Web-based application, to allow functionality anywhere
● Utilize ubiquitous practices for user input
● Focus on interactions with the data, allowing natural user exploration
● Utilization of multiple visualization dimensions (color, opacity, highlight,
gradient)
● Merging capability - Conduct multiple searches in one location
● Robust filtering/querying structure
● Versioning on API
Tech Stack
● Backend: Python + MySQL:
○ Flask: Backend REST
○ NetworkX: Graph building
○ DiffLib: Text matching
○ Pandas: DataFrames and analysis
○ Redis RQ: Long-running task queue
● Frontend: JavaScript + Firebase:
○ VueJS: User interface
○ D3: Visualizations
○ Firebase: Persistence and login
Tech Stack: Interesting Design Choices -- What worked?
● Our own database instead of the PubMed API:
○ Faster backend (vs. a PubMed API call, ~0.5 second)
○ No limits on query volume (PubMed rate-limits its API)
● Client-server architecture:
○ Easy to get started -- limited resources with a small team
● MySQL:
○ Easy to get started (familiarity among the team)
○ Created a Python script that runs a weekly update
● NetworkX graph library:
○ Easy to analyze network data (outputs to the format needed for the vis)
○ Slow when the graph is big; we refactored and implemented a Redis Queue
● Firebase:
○ Convenient for setting up authentication and session storage/persistence
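The NetworkX and Redis RQ points above can be sketched as follows. This is an illustrative stand-in, not the project's actual graph.py: `build_citation_graph`, `to_d3_payload`, and the sample edges are all made up for the example.

```python
import networkx as nx

def build_citation_graph(citations):
    """Build a directed citation graph from (citing_id, cited_id) pairs.

    Hypothetical helper; the real backend presumably pulls edges
    from MySQL (e.g., via Pandas) rather than taking a list.
    """
    G = nx.DiGraph()
    G.add_edges_from(citations)
    return G

def to_d3_payload(G):
    # node_link_data emits the {"nodes": [...], "links": [...]} shape
    # that D3 force layouts consume directly.
    return nx.node_link_data(G)

# For big graphs this build was slow, so the deck describes moving it
# onto Redis RQ instead of running it in the request cycle, roughly:
#   from redis import Redis
#   from rq import Queue
#   Queue(connection=Redis()).enqueue(build_citation_graph, citations)

G = build_citation_graph([("p1", "p2"), ("p1", "p3"), ("p2", "p3")])
payload = to_d3_payload(G)
```

Keeping the expensive build out of the request handler is the key design choice: the REST endpoint only enqueues the job and polls for the result.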
Test Plan and Results
Backend Unit Testing
- Target code coverage > 80-90% (achieved 81%)
- Added enhanced code coverage in the production version (graph.py)
User Acceptance Testing
- Recruited 6 pilot testers (professors, data scientists, researchers)
- Generated a UAT analysis document with desired observation points
Frontend Unit Testing
- Target code coverage for the Redux state machine > 80-90% (achieved 82%)
- Need to implement view coverage and integration testing for the production version
Test Plan and Results: User Acceptance Test Summary
● Template from San Francisco State University's UAT
○ Widely used in tech fields to conduct UAT testing
○ Outlines testing procedure, assumptions, risks, desired observations
○ Link to template
● Our analysis focused on:
○ Utility of our application
○ Usefulness to those in the domain
● Results summary: Overall, we found that users:
○ Identified key papers that they wanted in less than 10 minutes
○ Understood the UI and the historical layering of search
○ Appreciated the richness of search and the interconnections between different queries
○ Requested more complex search capabilities (ANDs, ORs in search)
○ Wanted QoL improvements, and author-based search/visualizations
“Whoa, so this paper is cited by ALL of these?”
“Helped me identify the key players in this space within the first few seconds”
Software Evolution: Summary Overview, Milestone 1 -> Milestone 3
User Interface / Front-end:
● Default D3 visualization ("Nodes move when scaling!!!") -> Customizing D3 for static/dynamic presentation ("Nodes at rest!!!")
● Pen-and-paper mockups -> Simple ball-and-stick prototype -> Add color, opacity, interaction -> Full-fledged prototype
● Single search -> Merged search -> Cached/persistent merged search
● Add search engine functionality
Software Evolution: Backend -- PubMed Data and APIs
Milestone 1
Sample data from PubMed loaded to SQLite and AWS RDS (~3,000 articles)
Basic API routes added:
- article/id
- article/id/incoming-citation
- article/id/outgoing-citation
- journal/id
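A minimal sketch of what these Milestone 1 routes could look like in Flask. The in-memory dictionaries stand in for the real SQLite/AWS RDS store, and the response shapes are inferred from the route names, not taken from the actual codebase.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Toy stand-in for the article store (made-up data, illustrative only).
ARTICLES = {1: {"id": 1, "title": "Example article", "journal_id": 10}}
INCOMING = {1: [2, 3]}  # PMIDs that cite article 1

@app.route("/article/<int:article_id>")
def get_article(article_id):
    # Real code would query the database and 404 on a missing id.
    return jsonify(ARTICLES[article_id])

@app.route("/article/<int:article_id>/incoming-citation")
def get_incoming(article_id):
    return jsonify({"article_id": article_id,
                    "incoming": INCOMING.get(article_id, [])})

# Flask's built-in test client exercises routes without a running server.
client = app.test_client()
resp = client.get("/article/1")
```

The test client shown at the end is also how the backend unit-testing coverage described later can be driven.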
Milestone 2
Graph API routes added:
- graph/id
- graph/title
- article/period
- article/citation-range
More info in API responses:
- PageRank
- Citation count
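With NetworkX in the stack, the PageRank and citation-count fields added here are each roughly one call. The graph below is made-up sample data, where an edge A -> B means "A cites B".

```python
import networkx as nx

# Tiny illustrative citation graph (not real PubMed data).
G = nx.DiGraph([("a", "b"), ("c", "b"), ("d", "b"), ("d", "a")])

ranks = nx.pagerank(G, alpha=0.85)     # PageRank score per article
citation_counts = dict(G.in_degree())  # how many times each article is cited

# "b" is cited by three articles, so it should top both metrics.
most_cited = max(citation_counts, key=citation_counts.get)
```

Both values can be computed once per graph build and attached to each node in the API response, rather than recomputed per request.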
Milestone 3
API added:
- Support keyword search
- Support exact title search
Features added:
- Full PubMed (~1.5 million articles)
- Scripted weekly database update
- Enabled task queuing on Redis RQ
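The scripted weekly database update might look something like the idempotent upsert below. This is a sketch under assumptions: sqlite3 and the two-column schema stand in for the team's MySQL setup, and `weekly_update` / `fetched_articles` are hypothetical names, not the project's.

```python
import sqlite3

def weekly_update(conn, fetched_articles):
    """Upsert freshly fetched (pmid, title) rows.

    Running the script twice must not duplicate rows -- an upsert keyed
    on the primary key makes the weekly refresh safe to re-run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS article (pmid INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO article (pmid, title) VALUES (?, ?) "
        "ON CONFLICT(pmid) DO UPDATE SET title = excluded.title",
        fetched_articles,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
weekly_update(conn, [(1, "Old title")])
weekly_update(conn, [(1, "Revised title"), (2, "New article")])
rows = dict(conn.execute("SELECT pmid, title FROM article"))
```

Because PubMed rate-limits its API (noted in the lessons learned), batching the fetch into a scheduled job like this also keeps request frequency under control.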
Development Process Reflection: Requirements and Estimates
● Finished all requirements and some stretch goals
● Overestimated:
○ Graph visualization (D3 and NetworkX made it easier)
● Underestimated:
○ UX design and brainstorming
○ Search query (due to redesign)
○ Project planning
○ Report
Development Process Reflection: Lessons Learned
● Conduct research into similar systems early -- ideas for design
● Have a UX person "on staff"
● Take a deeper dive into graph DBs -- optimization of search
● Increase the scale of the DB earlier -- we had issues late with scaling (fixed)
● Difflib truncated search results when comparing them, causing us to lose massive amounts of data -- fixed, but annoying
● Vue/D3 lack clean integration -- they didn't play nice initially
● PubMed limits queries to its API -- we had to adjust the frequency of import scripts
● Timezones for team members -- working from different timezones is hard!
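The Difflib truncation lesson above is easy to reproduce: `difflib.get_close_matches` returns at most `n` results (default 3), so matches above the cutoff beyond that are silently dropped unless `n` is raised. The title list here is made-up sample data.

```python
import difflib

titles = [
    "deep learning", "deep learning review", "deep learning survey",
    "deep learning methods", "deep learning in biology",
]

# Default n=3 silently truncates: only the 3 best matches come back,
# even though all five titles clear the similarity cutoff.
truncated = difflib.get_close_matches("deep learning", titles)

# Passing n=len(candidates) keeps every match above the cutoff.
full = difflib.get_close_matches("deep learning", titles,
                                 n=len(titles), cutoff=0.6)
```

This is exactly the kind of loss that looks like a data problem but is actually a default-parameter problem.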
Questions? Thank You!
Arthur Wollocko, David Hu, Pat Thontirawong, Mohammad Fahad Sheikh
Incorporation of Feedback... Significant positive feedback from domain specialists...
Positive Feedback -> Continuation/Response
● Good initial visualizations -> Expanding! Planned features include additional dimensions of visualization: color, opacity, size
● Good choice of metadata/search options; will help researchers explore deeper -> Added additional options: "Keywords", date ranges, and "Minimum Citation" counts
● Good intuitive UI design (e.g., coloring of nodes, like hrefs) -> Expanding on coloration concepts, applying individual colors for each search
● Persistence and mobility -> Utilizing Firebase for persistence, expanding the content persisted
● Interaction with a traditionally NON-interactive process -> Allowing additional dimensions of data exploration like panning/zooming, and utilization of space in the graph
● Great testing infrastructure -> Continuously identifying researchers to test the application
Incorporation of Feedback... Addressing concerns from domain experts...
Feedback for Improvement -> Continuation/Response
● User has to get the PMID from PubMed -> Scrapped this! No longer a companion app to PubMed; all search is conducted within the RVT
● Scope of effort / mental sanity -> Down-scoped effort to ensure a usable product while learning tech stack requirements; documented additional features for the production version
● Requires more than just a single search to find topics/papers of interest -> Allow for merging of searches, and multiple searches; expanding: added search history and focus/removal of searches
● Lack of instruction for application usage -> Provided an in-application tutorial; expanding tooltips and explanations
● Potential to overwhelm users -> Keeping the UI simple, and limiting search/visualizations to protect against overload
● Too much empty space -> Redefining our use of space for future efforts
Tools used by team
● GitHub
○ This was our primary collaboration arena
○ General feel: We don't know how people existed before source control and collaboration tools
● Trello
○ Project management/task distribution
○ General feel: On a small team, it might add more overhead than a simple Excel spreadsheet
● Heroku
○ Server hosting
○ General feel: Heroku makes pushing your local development environment to the web easy; Git remotes are a godsend
● Slack and Google Drive
○ Collaboration and communication
○ General feel: Revolutionized communication for distributed teams
References
● Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359-377. doi:10.1002/asi.20317
● Cobo, M., López-Herrera, A., Herrera-Viedma, E., & Herrera, F. (2012). SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology, 63(8), 1609-1630. doi:10.1002/asi.22688
● Eck, N. J., & Waltman, L. (2014). CitNetExplorer: A new software tool for analyzing and visualizing citation networks. Journal of Informetrics, 8(4), 802-823. doi:10.1016/j.joi.2014.07.006
Documentation (Dev) Links -- Open link provided for your viewing pleasure:
https://docs.google.com/document/d/1IwmwuWisdjM6b3rtisXh9r6KdQQP6NJ5zpDG3On8x8U/edit?usp=sharing