David Smith
Associate Professor
Research interests
- Efficient inference for machine learning models with complex latent structure
- Modeling natural language structures, such as morphology, syntax, and semantics
- Modeling mutations in texts as they propagate through social networks, and variation in language across space and time
- Interactive information retrieval and machine learning for expert users
Education
- PhD in Computer Science, Johns Hopkins University
- BA in Classics, Harvard University
Biography
David A. Smith is an associate professor in the Khoury College of Computer Sciences at Northeastern University, based in Boston. He is a founding member of the NULab for Texts, Maps, and Networks, Northeastern University’s center for the digital humanities and computational social sciences.
Prior to joining Northeastern, Smith was a professor at the University of Massachusetts and a contributor to Tufts University's Perseus Digital Library, one of the most widely used linguistic and cultural research systems in the humanities. His research has been funded by the NSF, NEH, DARPA, ONR, AFRL, the Mellon Foundation, and Google, and he has published widely in natural language processing and computational linguistics, information retrieval, digital libraries, digital humanities, and political science.
Recent publications
- MONSTERMASH: Multidirectional, Overlapping, Nested, Spiral Text Extraction for Recognition Models of Arabic-Script Handwriting
Citation: Danlu Chen, Jacob Murel, Taimoor Shahid, Xiang Zhang, Jonathan Parkes Allen, Taylor Berg-Kirkpatrick, David A. Smith. (2024). MONSTERMASH: Multidirectional, Overlapping, Nested, Spiral Text Extraction for Recognition Models of Arabic-Script Handwriting. ICDAR (Workshops 2), 87-101. https://doi.org/10.1007/978-3-031-70642-4_6
- Retrieving and Analyzing Translations of American Newspaper Comics with Visual Evidence
Citation: Jacob Murel, David A. Smith. (2024). Retrieving and Analyzing Translations of American Newspaper Comics with Visual Evidence. ICDAR (Workshops 1), 125-137. https://doi.org/10.1007/978-3-031-70645-5_9
- Self-training and Active Learning with Pseudo-relevance Feedback for Handwriting Detection in Historical Print
Citation: Jacob Murel, David A. Smith. (2024). Self-training and Active Learning with Pseudo-relevance Feedback for Handwriting Detection in Historical Print. ICDAR (3), 305-324. https://doi.org/10.1007/978-3-031-70543-4_18
- Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription
Citation: Jaydeep Borkar, David A. Smith. (2024). Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription. CoRR, abs/2407.00250. https://doi.org/10.48550/arXiv.2407.00250
- Detecting Manuscript Annotations in Historical Print: Negative Evidence and Evaluation Metrics
Citation: Jacob Murel, David A. Smith. (2024). Detecting Manuscript Annotations in Historical Print: Negative Evidence and Evaluation Metrics. ICPRAM, 745-752. https://doi.org/10.5220/0012365600003654
- Automatic Collation for Diversifying Corpora: Commonly Copied Texts as Distant Supervision for Handwritten Text Recognition
Citation: David A. Smith, Jacob Murel, Jonathan Parkes Allen, Matthew Thomas Miller. (2023). Automatic Collation for Diversifying Corpora: Commonly Copied Texts as Distant Supervision for Handwritten Text Recognition. CHR, 206-221. https://ceur-ws.org/Vol-3558/paper1708.pdf
- Testing the Limits of Neural Sentence Alignment Models on Classical Greek and Latin Texts and Translations
Citation: Caroline Craig, Kartik Goyal, Gregory R. Crane, Farnoosh Shamsian, David A. Smith. (2023). Testing the Limits of Neural Sentence Alignment Models on Classical Greek and Latin Texts and Translations. CHR, 530-553. https://ceur-ws.org/Vol-3558/paper6193.pdf
- Composition and Deformance: Measuring Imageability with a Text-to-Image Model
Citation: Si Wu, David A. Smith. (2023). Composition and Deformance: Measuring Imageability with a Text-to-Image Model. CoRR, abs/2306.03168. https://doi.org/10.48550/arXiv.2306.03168
- Adapting Transformer Language Models for Predictive Typing in Brain-Computer Interfaces
Citation: Shijia Liu, David A. Smith. (2023). Adapting Transformer Language Models for Predictive Typing in Brain-Computer Interfaces. CoRR, abs/2305.03819. https://doi.org/10.48550/arXiv.2305.03819
- An Experiment in Live Collaborative Programming on the Croquet Shared Experience Platform
Citation: Yoshiki Ohshima, Aran Lunzer, Jenn Evans, Vanessa Freudenberg, Brian Upton, David A. Smith. (2022). An Experiment in Live Collaborative Programming on the Croquet Shared Experience Platform. Programming, 46-53. https://doi.org/10.1145/3532512.3535224
- Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records
Citation: Helen O'Neill, Anne Welsh, David A. Smith, Glenn Roe, Melissa Terras. (2021). Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records. Digit. Scholarsh. Humanit., 36, 1013-1029. https://doi.org/10.1093/llc/fqab010
- Content-based Models of Quotation
Citation: Ansel MacLaughlin, David A. Smith. (2021). Content-based Models of Quotation. EACL, 2296-2314. https://doi.org/10.18653/v1/2021.eacl-main.195
- Contrastive Training for Models of Information Cascades
Citation: Shaobin Xu, David A. Smith. (2018). Contrastive Training for Models of Information Cascades. AAAI, 483-490. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17086
- Detecting and Evaluating Local Text Reuse in Social Networks
Citation: Shaobin Xu, David Smith, Abigail Mullen, Ryan Cordell. (2014). Detecting and Evaluating Local Text Reuse in Social Networks. ACL Joint Workshop on Social Dynamics and Personal Attributes in Social Media.