dc.contributor.advisor: Liu, Jyh-Charn
dc.creator: Lin, Jason
dc.date.accessioned: 2021-01-04T16:45:08Z
dc.date.available: 2022-05-01T07:13:29Z
dc.date.created: 2020-05
dc.date.issued: 2020-04-16
dc.date.submitted: May 2020
dc.identifier.uri: https://hdl.handle.net/1969.1/191745
dc.description.abstract: Mathematical language plays an essential role in conceptualizing the technical content of scientific publications. It applies words, symbols, and rules to constitute any sophisticated technical discussion. Existing technologies can recognize mathematical objects (MOs) in digital documents and use MOs and keywords to locate relevant resources. However, very few applications have succeeded at computer-based content analysis, because the boundaries and semantics of technical content are obscured. In this dissertation, we introduce the concept of the reasoning block (RB) to mimic the divide-and-conquer process of human writing and reading. The RB model develops MO-based foundational solutions to address the challenges of reversing the original linear descriptions back to their logical non-linear structure. A system model requires annotations of both constraint expressions and textual declarations to enhance the mapping of problem settings and physical semantics. These two components highlight the information readers need to know about the system model proposed in a paper. Reliable indicators such as mathematical symbols, stop words, and punctuation are used as features to distinguish constraint expressions from other MOs. We investigate both a greedy approach based on local optima and a probabilistic approach based on Bayes’ theorem. Mining the textual declarations of MOs requires overcoming the challenges of tagging, chunking, and pairing in sentences that mix words and MOs (MWM). We propose a second-order hidden Markov model and a frequent pattern mining toolkit for tagging and chunking MWM sentences, respectively. The final pairing of MOs and their declarations depends on three layers of information (spatial, semantic, and syntactic) about the intermediate tokens that connect them.
Finally, the above analytical products are integrated to transform each publication into a hierarchical structure known as the MO reasoning (MOR) graph, which consists of RBs in logical flows. Redundant MOs and their dependencies are removed based on the minimum information required to cover all relations between MOs and words. The MOR graph is used as the technical essence to discover new forms of document fingerprints based on different writing styles in various domains. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Mathematical object [en]
dc.subject: Content analysis [en]
dc.subject: Reasoning graph [en]
dc.title: Modeling of Reasoning Flows in Scientific Publications [en]
dc.type: Thesis [en]
thesis.degree.department: Computer Science and Engineering [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Texas A&M University [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: Doctoral [en]
dc.contributor.committeeMember: Ioerger, Thomas R
dc.contributor.committeeMember: Jiang, Anxiao
dc.contributor.committeeMember: Pickens, Adam
dc.type.material: text [en]
dc.date.updated: 2021-01-04T16:45:09Z
local.embargo.terms: 2022-05-01
local.etdauthor.orcid: 0000-0001-8013-9923

