Plagiarism Checker

Plagiarism Study: Design of a Plagiarism Detection System

INTRODUCTION

1.1 Background of Study

'Plagiarism is the act of taking the writings of another person and passing them off as one's own. The fraudulence is closely related to forgery and piracy, practices generally in violation of copyright laws.'

(Encyclopedia Britannica, cited in Asim, MEr Ali, HM, Dahwa, A, & Vaclav, S 2011).

The word plagiarism derives from the Latin word plagiarius, which means kidnapper (Wikipedia 2015).

1.1.1 Definitions

  • According to Merriam-Webster's online dictionary, plagiarism (pronounced /ˈpleɪdʒərɪzəm/) is 'the act of using another person's words or ideas without giving credit to that person' (Merriam-Webster, 2014).
  • The University of Oxford describes plagiarism as 'The copying or paraphrasing of other people's ideas without full acknowledgement'.
  • In their own words, Oxford Dictionaries describes plagiarism as 'The practice of taking someone else's work or ideas and passing them off as one's own'. (Oxford Dictionaries Language Matters, 2014).
    1.1.2 State of Plagiarism in Nigeria

    Up until recently, universities in Nigeria have been plagued by the menace called plagiarism. A gross offence against the intellectual property of an individual, plagiarism is now advancing to the frontline of criminal activities perpetrated in Nigeria, especially in her secondary schools and tertiary institutions. But for the timely intervention of vice chancellors in Nigerian universities, many Nigerian students would not be aware that the word plagiarism exists. Plagiarism in Nigeria can be seen in the following instances:

  • Some students 'copy and paste' materials for their assignments from the internet without a thought for who the author of the work is.
  • Others read through and paraphrase a work without ascribing the work to the author.
  • Some even take the work of an author from another institution or another part of the world and pass it off as theirs; the only difference between the real author's work and the plagiarist's work is that the plagiarized work now carries the personal information of the plagiarist instead of that of the author.
  • Some students copy the assignments of their colleagues and submit them as theirs. This is especially seen in programming classes.
  • Researchers use data sourced by other researchers without acknowledging the researchers who took the time to source the data.

    Plagiarism does not seem to be limited to the four walls of Nigerian institutions, as the outside community is also affected:

  • Seminarians use the work of other people without making reference to the owner of the work.
  • Newspaper columnists run the works of other columnists under their own names.
  • Television presenters run the exact script produced by other presenters.
  • Blog owners copy and paste content from various sources to make up their own stories without proper acknowledgement (Chiagozie 2012).

    1.1.3 Consequences of Plagiarism
    The consequences of plagiarism are far reaching and quite devastating. Six consequences of plagiarism according to iThenticate (from Turnitin) are:

  • Personal damage: Plagiarism can destroy a student's reputation and can even lead to suspension or expulsion from school. It may also be reflected in the student's record, thereby permanently damaging the student's image.
  • Professional damage: Plagiarism may damage the career of a public figure, politician or business professional. A plagiarist may be relieved of his position in an organization and may not be able to obtain another job.
  • Damage to academic reputation: A plagiarist who is caught may suddenly come to the end of his or her academic career and lose the ability to publish academic papers.
  • Legal consequences: Copyright laws are against plagiarism, therefore an author whose work has been plagiarized has the right to sue the plagiarist. This may result in imprisonment or other prescribed punishments.
  • Financial consequences: A plagiarist may have to pay a huge amount of money if found guilty.
  • Consequences of research plagiarism: Plagiarism in research is especially serious, as it could cause severe damage to the research work (iThenticate 2014).

    1.1.4 Types of Plagiarism Detection

  • Manual plagiarism detection: This is done manually by a person.
  • Automatic plagiarism detection: This is performed using plagiarism detection software. Automatic detection of plagiarism is usually faster and more efficient. (Wikipedia 2014).
    1.1.5 Major aspects involved in the automatic detection of plagiarism
    The methods used for automatic detection of plagiarism can be grouped thus:
    1.1.5.1 External plagiarism detection methods: 'External plagiarism detection deals with the problem of finding plagiarized passages in a suspicious document based on a reference corpus.' External plagiarism detection methods are used by many of the available plagiarism detection software packages, such as Turnitin and WriteCheck.
    1.1.5.2 Intrinsic plagiarism detection methods: 'Intrinsic plagiarism detection does not use external knowledge and tries to identify discrepancies in style within a suspicious document.' (Gabriel, Gaston and Sebastian 2011)
    1.1.6 Examples of Plagiarism Detection Software
    1. PlagAware
    2. Turnitin
    3. MOSS
    4. Plagium
    5. Doc Cop
    6. Viper
    7. TurnItOutSafely
    8. CheckForPlagiarism.net
    9. iThenticate
    10. The Plagiarism Checker
    11. ACNP (Anti Copy and Paste)
    12. Ephorus
    13. PlagScan
    (Radim 2007; PlagiarismChecker 2015)

    1.1.7 Measures used for assessing plagiarism detection software
    For information retrieval systems (Wikipedia 2015), performance and correctness are assessed using a number of quantities. The most commonly used measures are:

  • Precision: This is the fraction of the retrieved documents that are relevant to the user's information need.
  • Recall: This is the fraction of the documents relevant to the query that have been successfully retrieved.
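    The two measures can be illustrated with a minimal sketch. The document identifiers below are hypothetical and serve only to show the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["d1", "d2", "d3", "d4"]  # hypothetical ids returned by a system
relevant = ["d2", "d4", "d5"]         # hypothetical ids that are truly relevant
precision, recall = precision_recall(retrieved, relevant)
```

    Here two of the four retrieved documents are relevant, so precision is 2/4, and two of the three relevant documents were retrieved, so recall is 2/3.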

    In the third international competition held in 2011 on plagiarism detection (Martins et al. 2011), the following measures were used to grade the submitted entries.

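  • Precision: This was measured using the formula below.

    $$\mathrm{prec}(S, R) = \frac{1}{|R|} \sum_{r \in R} \frac{\left| \bigcup_{s \in S} (s \sqcap r) \right|}{|r|}$$

    (Martins et al. 2011)

  • Recall:

    $$\mathrm{rec}(S, R) = \frac{1}{|S|} \sum_{s \in S} \frac{\left| \bigcup_{r \in R} (s \sqcap r) \right|}{|s|}$$

    (Martins et al. 2011)

  • Granularity: This measure was introduced to take care of overlapping or multiple detections for a single plagiarism case.

    $$\mathrm{gran}(S, R) = \frac{1}{|S_R|} \sum_{s \in S_R} |R_s|$$

    (Martins et al. 2011)

  • Plagdet: The overall score, which combines the three measures, was obtained using:

    $$\mathrm{plagdet}(S, R) = \frac{F_1}{\log_2(1 + \mathrm{gran}(S, R))}$$

    (Martins et al. 2011)
    where:
    $F_1$ is the equally weighted harmonic mean of precision and recall,
    $S$ is the set of plagiarism cases in the corpus,
    $R$ is the set of plagiarism cases reported by a particular plagiarism detector (the detections),
    each case $s \in S$ is represented as a set of references to the characters of $d_{plg}$ and $d_{src}$ forming the passages $s_{plg}$ and $s_{src}$, that is, $s = (s_{plg}, d_{plg}, s_{src}, d_{src})$,
    each detection $r \in R$ is represented in the same way as $s$,
    $s \sqcap r = s \cap r$ if $r$ detects $s$, and $\emptyset$ otherwise,
    $S_R \subseteq S$ are the cases detected by detections in $R$,
    $R_s \subseteq R$ are the detections of $s$.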
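    The following is a minimal sketch of how these four measures can be computed, using the standard PAN definitions (macro-averaged precision and recall, granularity, and plagdet as F1 divided by log2(1 + granularity)). It makes a simplifying assumption: every case and every detection is a set of (document, character offset) references, and a detection is taken to detect a case whenever the two sets share at least one reference. The helper `span` and the sample data are hypothetical.

```python
import math


def span(doc, start, end):
    """Hypothetical helper: character references of doc[start:end]."""
    return frozenset((doc, i) for i in range(start, end))


def macro_precision(cases, detections):
    if not detections:
        return 0.0
    total = 0.0
    for r in detections:
        covered = set()
        for s in cases:
            covered |= s & r  # s ⊓ r is empty when r does not detect s
        total += len(covered) / len(r)
    return total / len(detections)


def macro_recall(cases, detections):
    if not cases:
        return 0.0
    total = 0.0
    for s in cases:
        covered = set()
        for r in detections:
            covered |= s & r
        total += len(covered) / len(s)
    return total / len(cases)


def granularity(cases, detections):
    # Number of detections per detected case, averaged over detected cases.
    counts = [sum(1 for r in detections if s & r) for s in cases]
    counts = [c for c in counts if c > 0]
    return sum(counts) / len(counts) if counts else 1.0


def plagdet(cases, detections):
    p = macro_precision(cases, detections)
    r = macro_recall(cases, detections)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return f1 / math.log2(1 + granularity(cases, detections))


# One plagiarism case reported as two adjacent detections.
cases = [span("susp", 0, 100)]
detections = [span("susp", 0, 50), span("susp", 50, 100)]
score = plagdet(cases, detections)
```

    In this example precision and recall are both perfect, but the single case is reported in two pieces, so the granularity of 2 lowers the overall score to 1 / log2(3).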

    1.2 Statement of Problem
    Many plagiarism detection software packages exist, implemented either as online applications or as desktop applications. Most popular plagiarism detection software combines the following corpora to check for plagiarism:

  • A reference corpus of documents in their databases.
  • A reference corpus of documents uploaded by the user.
  • Documents and web pages on the World Wide Web.
    Turnitin, the plagiarism detection software recently contracted for fighting plagiarism in Nigerian universities (The Punch 2013), combines the three in the detection of plagiarism.
    To date, no plagiarism detection software has been developed in Nigeria. We rely on software from abroad, which is mostly tailored to the needs of the environment where it was developed. Also, the majority of research works and other works in Nigerian universities are as yet unpublished and therefore not likely to be on the web or in the databases of popular plagiarism detection software. Hence, if an award-winning research work is yet to be published, a plagiarist may use that work without being detected.
    There is therefore a need to create a reference corpus where source documents of research works done in Nigerian universities can be kept for testing plagiarism.
    1.3 Research Objectives
    This research should accomplish the following:
    I. Review some important methodologies that have been used to detect external plagiarism.
    II. Create a model for a corpus where all submitted projects (published or not) in Nigerian universities can be collected and used as source documents for testing whether a suspected work contains plagiarized content.
    III. Develop an online system that is able to detect, based on project titles, whether a project already existing in the reference corpus has been plagiarized by a new project.
    1.4 Research Methodology
    The following methods will be used to accomplish the aims and objectives of this project:
    The reference corpus will be created with the aid of a database from a database management system.
    The technology behind web crawling will be used to obtain the subset of documents that will be tested against the suspected document.
    1.5 Scope of Study
    As a case study, the model for the corpus will be based on past projects from the Department of Computer Science, University of Lagos. This project focuses mainly on external plagiarism detection, which is only one step in the whole process of detecting plagiarism automatically.

    CHAPTER TWO
    LITERATURE REVIEW
    2.1 External plagiarism detection
    External plagiarism detection is the detection of plagiarism in a suspicious document against a reference corpus. This reference corpus may be collected in a database, obtained from an institution, or be a set of documents the user (or customer) wants to test against. It may also be a collection of documents sourced from the internet.
    2.1.1 Stages in external plagiarism detection
    Two major stages are usually involved in external plagiarism detection, which are:
    I. Search Space Reduction: this is the search for possible source documents in the reference corpus against which a suspicious document will be tested. It is necessary in order to reduce the number of documents in the reference corpus that will be used to detect plagiarism in a suspicious document.
    II. Exhaustive Search: this entails thoroughly searching each document in the resulting search space to find the ones that were actually plagiarized. It is required in order to identify the parts of a suspicious document that have been plagiarized.
    2.2 Methodologies that have been used in external plagiarism detection
    Various methods have been used to detect plagiarism.
    2.2.1 A Language Dependent Methodology That Uses N-Grams
    2.2.1.1 N-Grams Description
    'An n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus'. (Wikipedia 2015).
    'An n-gram is an n-character slice of a larger string'. (William and John 1994)
    Sometimes spaces may be attached to the beginning and end of the string to be sliced to pad it. N-grams of a string can be taken in different ways and for different sizes.
    For example, given the string 'she takes taxi to school daily', the word n-grams are as follows, where '_' denotes a padding blank:
    1-grams: 'she', 'takes', 'taxi', 'to', 'school', 'daily'.
    2-grams (di-grams): ('_ she'), 'she takes', 'takes taxi', 'taxi to', 'to school', 'school daily', ('daily _').
    3-grams (tri-grams): ('_ _ she', '_ she takes'), 'she takes taxi', 'takes taxi to', 'taxi to school', 'to school daily', ('school daily _', 'daily _ _').
    Those in brackets are included when blanks are appended to the beginning and end of the string.
    Also, for character n-grams, the word FATHER would contain the following:
    1-grams: 'F', 'A', 'T', 'H', 'E', 'R'
    2-grams: '_F', 'FA', 'AT', 'TH', 'HE', 'ER', 'R_'
    3-grams: '__F', '_FA', 'FAT', 'ATH', 'THE', 'HER', 'ER_', 'R__'
    4-grams: '___F', '__FA', '_FAT', 'FATH', 'ATHE', 'THER', 'HER_', 'ER__', 'R___'
    And so on.
    (Wikipedia 2015; William and John 1994)
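    The word and character n-grams described above can be generated with a short sketch such as the one below. The function names are illustrative, and padding with n - 1 underscores on each side is an assumption chosen to match the FATHER example.

```python
def word_ngrams(text, n, pad=False):
    """Word n-grams of text; with pad=True, n - 1 blanks ('_') are added at each end."""
    words = text.split()
    if pad:
        words = ["_"] * (n - 1) + words + ["_"] * (n - 1)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_ngrams(word, n, pad=True):
    """Character n-grams of word, padded with n - 1 underscores at each end."""
    if pad:
        word = "_" * (n - 1) + word + "_" * (n - 1)
    return [word[i:i + n] for i in range(len(word) - n + 1)]


bigrams = word_ngrams("she takes taxi to school daily", 2)
father_trigrams = char_ngrams("FATHER", 3)
```

    For the sample sentence, `bigrams` holds the five unpadded word 2-grams, and `father_trigrams` holds the eight padded character 3-grams of FATHER.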
    2.2.1.2 Implementation
    In Approaches for Intrinsic and External plagiarism Detection by Gabriel, Gaston and Sebastian, n-grams were used to detect external plagiarism.

  • Search space reduction: Gabriel et al. used word 4-grams after removing stop words from the documents. A document qualifies as a relevant document in the resulting search space if it shares with the suspicious document at least two word 4-grams that are close enough to be in the same paragraph.
  • Exhaustive search: For the exhaustive search, Gabriel et al. used word tri-grams with the stop words left intact.
    The following results were obtained:
    Overall Score   Precision   Recall      Granularity
    0.3468605       0.2257937   0.9116530   1.0611984
    (Gabriel, Gaston, Sebastian and Juan 2011)
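    The search space reduction step described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: the stop word list is a tiny hypothetical one, and the requirement that the shared 4-grams lie close enough to be in the same paragraph is omitted.

```python
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "in", "and"}  # tiny illustrative list


def content_4grams(text):
    """Set of word 4-grams of text after stop word removal."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return {tuple(words[i:i + 4]) for i in range(len(words) - 3)}


def candidate_sources(suspicious, corpus, min_shared=2):
    """Ids of corpus documents sharing at least min_shared 4-grams with the suspicious text."""
    suspicious_grams = content_4grams(suspicious)
    return [
        doc_id
        for doc_id, text in corpus.items()
        if len(suspicious_grams & content_4grams(text)) >= min_shared
    ]


corpus = {  # hypothetical reference corpus
    "src1": "the quick brown fox jumps over the lazy dog near the river bank",
    "src2": "plagiarism detection systems compare documents against a reference corpus",
}
candidates = candidate_sources("a quick brown fox jumps over a lazy dog", corpus)
```

    Only `src1` shares enough 4-grams with the suspicious sentence, so it is the only document passed on to the exhaustive search.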
    2.2.1.3 Defects of the plagiarism detecting methodology
  • It cannot detect plagiarism in languages other than English.
  • It cannot detect plagiarism when synonyms of the words in the reference text are used in the given text.
    2.2.1.4 Advantages of using n-grams
    a) N-grams are relatively simple to use and understand. (Wikipedia 2015)
    b) Since any string is decomposed into small parts, any errors that are present tend to affect only a limited number of those parts, leaving the remainder intact. (William and John 1994)
    c) They do not require any linguistic knowledge, since the string is simply broken down. (Jian-Yung et al. 1934).
    d) They are scalable, as n can easily be increased or decreased to meet requirements. (Wikipedia 2015).
    e) They are robust to spelling variations. (Rammal and Sanan 2011).
    2.2.1.5 Disadvantages of using n-grams
    a) 'Models derived directly from the n-gram frequency counts have severe problems when confronted with any n-grams that have not explicitly been seen before'. (Wikipedia 2015a)
    The plagiarism detection software entered by Gabriel et al. in the PAN 2011 competition (plagiarism detection section) won them third place.
    2.2.2 A Language Independent Methodology for Detecting Plagiarism
    2.2.2.1 Description
    To detect plagiarism across different languages, a translation engine such as the Google language translation API can be used to translate all texts written in a different language into the working language, say English, before further processing is done.

     

    2.2.2.2 Implementation
    In their report 'Improved implementation for finding text similarities in large collection of data', Grman and Ravas described their winning methodology for external plagiarism detection.
    They divided the detection into three stages:

  • Pre-processing
  • Detection of passage pairs
  • Post-processing
    Pre-processing: This stage involves the pre-processing of the input data, which is plain text. With the aim of reducing the amount of data and enabling efficient comparison of words, Grman and Ravas took the following steps to pre-process the input data:
    1. The input text is first translated into English if necessary (for example, using Google Translate).
    2. Words are extracted in terms of char, offset and length.
    3. The words are normalized through stemming and synonym extraction (synonyms of each word are obtained using applications like WordNet).
    4. The original file of words is converted into a binary file of word invariant.
    Detection of passage pairs
    The objective here was to check whether a suspicious document matches reference documents under the following conditions:
  • Words have been changed.
  • Words have been omitted from or added to the passage.
  • No minimum or maximum passage length is given.
    The degree of similarity of a pair of passages (from the suspicious and reference documents) is the number of elements $N_{sr}(I_s, I_r)$ in the intersection of the words in the suspicious passage and the words in the reference passage:
    $N_{sr}(I_s, I_r) = |I_s \cap I_r|$
    where $I_s$ is the passage from the suspicious document and
    $I_r$ is the passage from the reference document.
    If $N_{sr}$ goes beyond the threshold value, then plagiarism is suspected in passage $I_s$.
    If the detected areas are adjacent, they are merged into a single area. Thereafter, each resulting area ($I_s$ and $I_r$) is separated into pairs of distinct areas ($I_{si}$ and $I_{rj}$, where i = 1, 2, 3, ... and j = 1, 2, 3, ...) with the following characteristics:
    I. $I_{si}$ and $I_{rj}$ either start or end in a word belonging to the set $I_s \cap I_r$.
    II. The ratios $q_{si}$ and $q_{rj}$ exceed $q_{min}$ (a selected threshold)

    and

    III. $N_{sr}(I_s, I_r) > N_{Threshold}$
    where $N_{Threshold}$ is the threshold on the number of elements in the intersection of the passage from the suspicious document and that from the reference document.
    They took $q_{min} = 0.9$ and $N_{Threshold} = 15$.
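    The similarity count $N_{sr}$ can be sketched as below. The sketch assumes the passages have already been pre-processed into lists of normalized words (stemming and synonym handling are not shown), and it reads 'goes beyond the threshold' as a strict comparison. The function names and sample passages are hypothetical.

```python
N_THRESHOLD = 15  # value reported by Grman and Ravas


def passage_similarity(suspicious_words, reference_words):
    """Number of distinct words shared by the two passages."""
    return len(set(suspicious_words) & set(reference_words))


def is_suspected(suspicious_words, reference_words, threshold=N_THRESHOLD):
    """True when the shared word count goes beyond the threshold."""
    return passage_similarity(suspicious_words, reference_words) > threshold


suspicious = "students copy material from internet without credit".split()
reference = "many students copy material from the internet".split()
shared = passage_similarity(suspicious, reference)
```

    The two short passages share five distinct words, which is below the reported threshold of 15, so this pair would not be flagged unless a smaller threshold were chosen.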
    Post-processing
    This was done to achieve the following goals:
    I. Removal of overlapping passages in suspicious documents.
    II. Increase of the global score by reducing granularity and increasing precision.
    To achieve this, the result from the previous stage was subjected to three monitored threshold quantities.

  • T1 was the threshold for t1, where t1 is the share of word number in all passages of the suspicious and reference documents.
  • T2 was the threshold for t2, where t2 is the share of word number in all passages of the suspicious and reference documents where the words were expressed by the number of characters.
  • T3 was the threshold for t3, where t3 is the minimum length of passage expressed by the number of characters.

    During the experiments, the values used for the three thresholds were 50, 60 and 70 for T1, 50 and 60 for T2, and 150 and 200 for T3.
    PlagDet     Precision   Recall      Granularity   T1   T2   T3
    0.5569      0.396916    0.938023    1.002249      70   60   200
    0.615389    0.473128    0.892744    1.006975      50   50   150
    Results of the experiment for different values of T1, T2 and T3, with the use of synonyms and without the removal of stop words (Grman and Ravas 2011)
    The plagiarism detection software entered by Grman and Ravas took first place in PAN 2011 (plagiarism detection section), using the same corpus as the one used by Gabriel et al.
    2.2.2.3 Advantages of a language independent plagiarism detector
    1. It is able to detect plagiarized works written in a language different from that of the original writer.
    2. It is able to detect plagiarism even when synonyms of the words in the source document are used in the suspicious document.
    2.2.2.4 Critique of this methodology
    1. The report by Grman and Ravas on the implementation of their plagiarism detection system makes no mention of a search space reduction technique. It is therefore assumed that all reference documents were used in testing the input (suspicious) text for plagiarism, which would probably have led to a longer running time. The application might have performed better if the search space had been reduced.
    2.2.3 A Cluster based methodology
    In 2010, a cluster based method for detecting plagiarism was proposed by Du Zou, Wei-Jiang Long and Zhang Ling. This method mainly uses the Winnowing algorithm to detect plagiarism, but employs clustering and merging to increase the effectiveness of the methodology even when text is obfuscated by inserting text amidst copied text.
    2.2.3.1 Description

  • Winnowing Algorithm
    Winnowing, the major algorithm employed by the trio, is used for selecting fingerprints from hashes of k-grams (Elbegbayan 2005).
    Hashes-of-k-grams fingerprinting uses the concept of k-grams in determining the fingerprint of a document. The fingerprint of a document is obtained in four steps.
    First, irrelevant features in the text, such as punctuation marks that do not add to the meaning of the text, are removed.
    Secondly, the document is split into k-grams, where k is a parameter chosen by the user.
    For example, the statement 'Rasheedat is a girl' can be split into 5-grams as: rashe ashee sheed heeda eedat edati datis atisa tisag isagi sagir agirl.
    Thirdly, each k-gram is hashed (using a hashing function) into a value.
    Fourthly, a subset of the hashes is selected to represent the fingerprint of the document.
    Different methods have been used for this last step to determine which hashes should be selected to form the fingerprint of the document, and one of them is the Winnowing algorithm.
    The Winnowing algorithm uses the concept of moving windows to select the hashes that will form the fingerprint of the document.
    First, a window size w is chosen, where w is the number of consecutive k-gram hashes in a window and is a parameter chosen by the user.
    Next, windows are created from the hashes of the document by starting from the beginning of the document and moving one hash at a time until the last hash of the document has been enclosed in a window.
    Next, the smallest hash in each window is picked to form part of the fingerprint of the document. If the hash at a given position was already selected in a previous window and remains the minimum in subsequent windows, it is not added again since it has already been chosen (Elbegbayan 2005).
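    The steps above can be sketched as follows. Two choices here are assumptions rather than part of the description: CRC32 stands in for the unspecified hashing function, and ties within a window are broken by taking the rightmost minimum. The function names are illustrative.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every character k-gram of text after dropping non-alphanumeric characters."""
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return [
        zlib.crc32(cleaned[i:i + k].encode("utf-8"))
        for i in range(len(cleaned) - k + 1)
    ]


def winnow(hashes, w):
    """Select (hash, position) pairs: the minimum of each window of w consecutive hashes."""
    fingerprint = []
    last_position = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Rightmost occurrence of the minimum inside this window.
        position = start + max(i for i, h in enumerate(window) if h == smallest)
        if position != last_position:  # skip a hash already selected at this position
            fingerprint.append((smallest, position))
            last_position = position
    return fingerprint


sample_hashes = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
fingerprint = winnow(sample_hashes, w=4)
```

    With a window of four hashes, the sample sequence yields the fingerprint (17, 3), (17, 6), (8, 8), (39, 11), (17, 15), where the second element of each pair is the position of the selected hash.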
  • Clustering in plagiarism detection
    The clustering method used by Zou and his teammates to detect plagiarism in 2010 was adapted from that used by Basile et al. (2008).
    Basile and his teammates used clustering by:
    I. First encoding the suspicious text and the source text using a T9-like coding
    II. Matching similar texts in the suspicious document to those in the source document
    III. Using a joining algorithm to join close matches to obtain a larger percentage of the plagiarized text
    IV. Tuning the result obtained
    I. T9 Encoding
    The original text (for both the suspicious and source documents) is coded in a T9-like coding that represents the text in a new alphabet consisting of (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). It transforms three or four different letters of the original alphabet into the same character: for example, a, b, c are each transformed to 2, d, e, f to 3, and so on. New lines and blank spaces are transformed to 0, and other symbols are transformed to 1. In T9 coding, a long T9 sequence (10 to 15 characters) has in most cases an 'almost unique' translation into a sentence that makes sense in the original language.
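    A minimal sketch of such a T9-like encoding is given below. It assumes the standard phone keypad grouping of letters (abc to 2, def to 3, and so on up to wxyz to 9), which may differ in detail from the grouping used by Basile et al.

```python
T9_GROUPS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
T9_MAP = {letter: digit for digit, letters in T9_GROUPS.items() for letter in letters}


def t9_encode(text):
    """Encode text over the alphabet 0-9: letters by keypad group, blanks and new lines as 0, other symbols as 1."""
    encoded = []
    for ch in text.lower():
        if ch in T9_MAP:
            encoded.append(T9_MAP[ch])
        elif ch in " \n":
            encoded.append("0")
        else:
            encoded.append("1")
    return "".join(encoded)


code = t9_encode("Hello, world")
```

    The phrase 'Hello, world' becomes 435561096753: the comma maps to 1 and the blank to 0.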
    II. Matching similar texts in the suspicious document to those in the source document
    A text fragment in the suspicious document that matches one in the source document is stored in a list if:
  • it is the longest match in the source document starting from any possible starting position in the suspicious document,
  • it is longer than a fixed threshold, and
  • it is not part of a match already in the list.
    To locate matches in both documents, the last previous position of the same string of length 7 is stored for every starting position in the source document, and the position of the last occurrence of each possible string of length 7 is stored in a vector of size $10^7$.
    III. Using a joining algorithm to join close matches.
    Before the joining algorithm is applied, the matches for both texts (source and suspicious) are represented in a two-dimensional plot. Positions in the suspicious text are represented on the x axis and positions in the source text on the y axis. Hence each match of length l, starting at x in the suspicious document and at y in the source document, draws a line from (x, y) to (x + l, y + l).
    Matches are then joined if the following conditions hold simultaneously:
  • The matches are subsequent in the x coordinate.
  • The distance between the matches on the x axis ($d_x$) is greater than or equal to zero but shorter than or equal to the length on the x axis ($l_x$) of the longer of the two matches, scaled by a certain $\delta_x$, that is, $0 \le d_x \le \delta_x l_x$.
  • The distance between the matches on the y axis ($d_y$) is greater than or equal to zero but shorter than or equal to the length on the y axis ($l_y$) of the longer of the two matches, scaled by a certain $\delta_y$, that is, $0 \le d_y \le \delta_y l_y$.
    IV. Tuning the result obtained
    The results are tuned by applying different values of $\delta_x$ and $\delta_y$ in the joining algorithm to obtain different degrees of granularity, precision and recall.
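    The joining conditions can be sketched as follows. This is a simplified greedy version that only tries to join each match to the one immediately before it in x order; the names `delta_x` and `delta_y` for the two scale factors, and the sample matches, are assumptions made for illustration.

```python
def join_matches(matches, delta_x, delta_y):
    """Join close matches.

    matches: list of (x, y, length) triples.
    Returns a list of (x, y, length_x, length_y) boxes.
    """
    boxes = sorted((x, y, length, length) for x, y, length in matches)
    joined = []
    for box in boxes:
        if joined:
            px, py, plx, ply = joined[-1]
            x, y, lx, ly = box
            dx = x - (px + plx)  # gap on the x (suspicious) axis
            dy = y - (py + ply)  # gap on the y (source) axis
            if (0 <= dx <= delta_x * max(plx, lx)
                    and 0 <= dy <= delta_y * max(ply, ly)):
                # Extend the previous box so that it also covers this match.
                joined[-1] = (px, py, x + lx - px, y + ly - py)
                continue
        joined.append(box)
    return joined


matches = [(0, 100, 20), (25, 127, 30), (200, 500, 15)]  # hypothetical matches
joined = join_matches(matches, delta_x=0.5, delta_y=0.5)
```

    The first two matches are separated by small gaps on both axes and are merged into one larger detection, while the third is too far away and stays separate. Larger scale factors join more matches, which lowers granularity at the risk of lower precision.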
    2.2.3.2 Implementation
    Zou and his team mates used a clustering algorithm to detect plagiarism in 2010. They achieved this by breaking down their methodology into three steps:
  • Pre-selecting: This is done to reduce the search space of given source documents.
  • Locating: This compares the suspicious document with each candidate document to obtain the copied text.
  • Post-processing: This is done to remove some text fragments that were not actually plagiarized from the final result.
  • Pre-selecting
    This stage is a search space reduction stage. It is performed to reduce the number of source documents that will be tested against the suspicious document. Pre-selecting helps to reduce the total running time needed to detect actual plagiarism, saving about 90% of the running time used up if pre-selecting is not used.
    At first, they used the method of C. Basile et al. (2008) for calculating the distance between each suspicious document and each source document, and selected the 50 closest source documents for further processing.
    In their paper, C. Basile et al. (2008) used an algorithm to obtain the distance between a suspicious document and each source document, and selected the 10 closest for further processing.
    They first obtained the fingerprint of each document by transforming the text into a sequence of word lengths, such that the sentence 'Deborah is a girl' is represented as 7214. All word lengths greater than 9 were cut to 9 so that all words could be represented in the new alphabet (0, ..., 9).
    Secondly, the 8-grams of this coded version were obtained.
    Thirdly, the distance between the suspicious document and the source document was calculated from their fingerprints using this formula:

    $$d_n(x, y) = \frac{1}{|D_n(x)| + |D_n(y)|} \sum_{\omega \in D_n(x) \cup D_n(y)} \left( \frac{f_x(\omega) - f_y(\omega)}{f_x(\omega) + f_y(\omega)} \right)^2$$

    C. Basile et al. (2008)
    where:

  • x and y are a pair of texts,
  • $\omega$ is an arbitrary n-gram (here, an 8-gram),
  • $f_x(\omega)$ is the relative frequency with which $\omega$ appears in x,
  • $D_n(x)$ is the set of all n-grams that have nonzero frequency in x, also called the n-gram dictionary of x.
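    A minimal sketch of this pre-selection distance follows. It assumes that the distance is the sum, over the union of the two n-gram dictionaries, of the squared relative difference of the n-gram frequencies, normalized by the combined dictionary size. Splitting words on whitespace is a simplification, and the function names are illustrative.

```python
from collections import Counter


def word_length_code(text):
    """Encode text as a string of word lengths, with lengths above 9 cut to 9."""
    return "".join(str(min(len(word), 9)) for word in text.split())


def ngram_frequencies(code, n=8):
    """Relative frequency of every n-gram of the coded text."""
    counts = Counter(code[i:i + n] for i in range(len(code) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def ngram_distance(fx, fy):
    """Distance between two n-gram frequency dictionaries (0 = identical, 1 = disjoint)."""
    if not fx and not fy:
        return 0.0
    total = 0.0
    for gram in set(fx) | set(fy):
        a, b = fx.get(gram, 0.0), fy.get(gram, 0.0)
        total += ((a - b) / (a + b)) ** 2
    return total / (len(fx) + len(fy))


code = word_length_code("Deborah is a girl")
fx = ngram_frequencies(code, n=2)
fy = ngram_frequencies("3333", n=2)
```

    The sample sentence is coded as 7214. Two documents with identical n-gram frequencies are at distance 0 and two documents with no n-gram in common are at distance 1, so ranking the source documents by this value and keeping the smallest ones gives the candidate set.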
    Zou et al. applied this formula and picked the 50 source documents nearest to a suspicious document. They found that using Winnowing to obtain the documents' fingerprints before calculating the distance was quite inefficient and inaccurate.
    Hence, to improve the efficiency and accuracy of the pre-selection stage, they first represented two different documents by fingerprint vectors D1 and D2. Two documents were considered the same (successive same fingerprint) if the number of different fingerprints in their fingerprint vectors was less than a particular threshold.
    The source documents with successive same fingerprints as the suspicious document are then selected as candidate documents.
  • Locating
    This is done to compare a suspicious document with each source document in the set of candidate documents obtained for that suspicious document from the previous stage.
    The steps for locating are:
    1. Preprocessing
    2. Sampling
    3. Clustering and merging
    Preprocessing: This involves the removal of all symbols that do not affect the meaning of the document. The position of each word is also recorded before and after the removal of these symbols.
    Sampling: Using the Winnowing algorithm with a window size of 6 and overlapping word 5-grams, the sample fingerprint vector of each document is obtained. This fingerprint vector also records both the start and end positions in the original text. Afterwards, the inverted index of each document can also be generated.

    The following is used to compute the position of the original text.

    where:

  • SPi is the start position represented by the ith fingerprint
  • EPi is the end position represented by the ith fingerprint
  • Pcur is the beginning position of the current window
  • W is the window size
  • K is the length of the text
    Clustering and merging: The two fingerprint vectors obtained previously are compared by representing them in a two-dimensional plane to obtain a list of matches between the source and suspicious documents. An example is given below.

    [Figure: matches between the source and suspicious documents plotted in the plane. Zou et al. (2010).]
    In the above figure, non-obfuscated plagiarism is represented by lines while obfuscated plagiarism is shown by squares.
    To locate the copied text in a document, an improved longest common substring method is used to merge all common substrings whose separating distance is less than a given threshold. Thereafter, to take care of obfuscated text, clustering is applied.
    Consider an obfuscated passage (of axis $a_i$ and width $2\varepsilon_y$) with a class $a_{class}$, where the line $a_i$ (a non-obfuscated plagiarized fragment of the text) is in $a_{class}$. If there is another line $a_j$ in the passage, and the distance between $a_i$ and $a_j$ along the passage direction is less than a threshold $\varepsilon_x$, then $a_j$ is merged into $a_{class}$.
    That is, given $a_i$ and a class $a_{class}$ with $a_i \in a_{class}$, if there exists $j > i$ satisfying the conditions
    $|x_i + y_i - (x_j + y_j)| \le \varepsilon_y$
    $|(x_i + y_i + l_i) - (x_j + y_j + l_j)| - 1.4 (l_i + l_j) \le \varepsilon_x$
    then $a_j \in a_{class}$.
    Hence the lines in the shadow squares are merged, thereby reducing the impact of obfuscation.
  • Post-Processing
    The merging and clustering done while trying to locate plagiarized text introduces three types of errors:
    1. Error resulting from the angle of the merged line deviating from the 45° expected of a plagiarized part of a text. This usually occurs because the lengths of the source fragment and the suspicious fragment are different. The recommended solution is to discard the merged parts that led to this error.
    2. Error resulting from merging sparse points. This can be resolved by discarding those copied texts whose calculated similarity is less than a threshold (say 0.05).
    3. Error resulting from the fact that copied texts found in the suspicious document are repeated more than once in the source document. This was resolved by first calculating the product of the similarity of each similar text and its length. The similar text with the largest product is picked while the others are discarded.
    2.2.3.3 Advantages of the Methodology
    1. The cluster based methodology for detecting plagiarism employed by Zou and his teammates is useful in detecting both non-obfuscated and obfuscated plagiarism.
    2. It also saves time in detecting plagiarism, since a search space reduction is done.
    2.2.3.4 Disadvantages of the Methodology
    1. Plagiarism may not be detected when synonyms of words in the source document are used in the suspicious document.
    2. It does not take care of plagiarism done in a language different from that used in the source document.

    CHAPTER THREE
    SYSTEM ANALYSIS AND DESIGN

    Perhaps the very first step that should be taken when detecting plagiarism against a reference corpus is to search the corpus for a document that has the same title as the suspicious document.
    Cases abound (even in Nigeria) in which the only change a plagiarist makes to a stolen document is the substitution of the plagiarist's name for the genuine author's name.
    According to Du Zou, Wei-jiang Long and Zhang Ling, supposing it takes an average of 100 ms to process a pair of documents, the total computation time will be more than 200 days.
    Also, a survey of most algorithms used in plagiarism detection shows that they thoroughly scan the documents in a reference corpus without first checking, through a title check, whether a plagiarist has plagiarized 100% of a genuine document in the reference corpus.
    3.1 System Analysis
    The literature review examined two methods of detecting plagiarism. Although both are excellent algorithms, neither considers that a plagiarist might submit an existing work in the corpus as his own without even changing the title.
    The proposed system provides one module of a plagiarism detection system. It checks whether the reference corpus already contains project documentation with the same title as a suspicious project documentation, thereby quickly detecting wholesale plagiarism of a source document in the corpus.
    A plagiarism detection system that first checks whether the reference corpus contains a document with the same title as a suspicious document has the following advantages:

  • It has a faster running time, since it only searches through the titles of documents in the reference corpus to find the original document that has been plagiarized.
  • It is a basic first stage in determining whether plagiarism has actually taken place.
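The title check described above can be illustrated with a small in-memory sketch in Java. It is not the system's actual code: the class and method names are hypothetical, and the normalization step (trimming, collapsing whitespace, ignoring letter case) is an assumption about what counts as "the same title". A hash map keyed by the normalized title is used so that a lookup does not require scanning the documents themselves.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class TitleCheckSketch {
    // Maps a normalized title to the ID of the paper that holds it.
    private final Map<String, String> paperIdByTitle = new HashMap<>();

    /** Assumed normalization: trim, collapse runs of whitespace, ignore case. */
    public static String normalize(String title) {
        return title.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Adds a paper to the corpus; returns false if the title is already taken. */
    public boolean register(String paperId, String title) {
        return paperIdByTitle.putIfAbsent(normalize(title), paperId) == null;
    }

    /** Returns the ID of the paper with the same title, if there is one. */
    public Optional<String> findSameTitle(String suspiciousTitle) {
        return Optional.ofNullable(paperIdByTitle.get(normalize(suspiciousTitle)));
    }
}
```

With this sketch, registering a second paper whose title differs only in case or spacing is refused, which mirrors the rule that a title already in the repository cannot be reused.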
    3.1.1 System Architecture
    The system is able to do the following:
  • Store a copy of the documentation of students' projects.
  • Store the necessary information about the author and supervisor(s) of a project.
  • Allow a user to check whether a chosen project title is the same as that of a project documentation in the reference corpus.
  • Allow a user to view existing project documentation.
  • Allow a user to view contact information about the author and his or her supervisor.
  • Allow an authorized user to edit information already entered about a project documentation.

    Title Detection System is a web application, which gives its users easy access to it wherever they may be.
    Title Detection System opens with a title page through which a user can log in to access the modules apportioned to him or her.
    There are three major types of users of the system, namely:

  • The Head of Department
  • A Supervisor
  • An Author
    After login, a welcome page opens that displays the modules activated for that user, from which he or she can open the module needed. The welcome pages are:

    The Welcome page with modules activated for an HOD that has not mapped his HOD ID to his supervisor ID

    The Welcome page with modules activated for a Temporary author

    The Welcome page with modules activated for an Author.

    The Welcome page with the module activated for a Temporary Supervisor.

    The Welcome page with modules activated for a Supervisor.
    Description of each module
  • Create Supervisor: This module is accessible only to the HOD of the department, who is automatically assigned a login username and password when the department is created in the system. He uses this module to create temporary login details with which supervisors can log into the system to register their details. These login details are valid for only three (3) days, after which they are deleted from the system to free the storage space. Hence the supervisor must register within three days of the creation of the temporary login details.
  • Create Author: A registered supervisor uses this module to create temporary login details with which an author can register his details. These login details are valid for only three (3) days, after which they are deleted from the system to free the storage space. Hence the author must register within three days of the creation of the temporary login details.
  • Register Supervisor: A lecturer who is to become a supervisor for the first time uses this module to register as a supervisor after logging in with the temporary login details supplied to him by the Head of Department. On successful registration, a supervisor identification number is automatically assigned to him, to be used as the username for subsequent logins with the default password of '1234567'.
  • Register Author: A new author to the system uses this module to register as an author after logging in with the temporary login details supplied to him by his major supervisor. On successful registration, an author identification number is automatically assigned to him, to be used as the username for subsequent logins with the default password of '1234567'.
  • Check Title: This module is accessible only to a registered supervisor or a Head of Department. It allows the user to check whether a title supplied by students who are about to engage in a research work already exists in the reference corpus.
    If the same title is found, a table is displayed that allows the user to:
  • Download the paper
  • View information about the author and supervisor(s)
    If a project with the same title is not found, a page is displayed that allows the supervisor to upload the paper by filling in some essential details. Alternatively, the author may use the Upload Paper module to upload the paper himself.
  • Upload Paper: This module is used by the author to upload a paper with an approved title to the reference corpus. A paper that has the same title as a paper in the repository cannot be uploaded to the system. An undergraduate or MSc student can give details of only one supervisor, namely his major supervisor. A PhD student (or a person of higher rank) is allowed to provide details of a maximum of two more supervisors after giving information about his major supervisor. The paper is uploaded temporarily into the system, where it awaits approval from the author's major supervisor before it is permanently stored.
  • Approve Uploads: After an author submits a paper, a supervisor uses this module to verify and approve the upload before the paper is permanently inserted into the system. As soon as the upload is approved and permanently entered into the system, the temporary upload is deleted from the system.
    Note: In this project, "paper" may refer to a published paper, the documentation of an undergraduate's project work, or a dissertation or thesis from a master's or PhD student.
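The three-day validity of temporary login details described for the Create Supervisor and Create Author modules can be sketched in Java as follows. This is a minimal in-memory illustration under stated assumptions, not the system's code: the class name and methods are hypothetical, passwords are left out for brevity, and expired entries are purged whenever a new entry is created.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

final class TemporaryLoginSketch {
    /** Validity period stated in the module descriptions. */
    public static final Duration VALIDITY = Duration.ofDays(3);

    // Creation time of each temporary username (passwords omitted in this sketch).
    private final Map<String, Instant> createdAtByUsername = new HashMap<>();

    /** Creates temporary login details; returns false if the username clashes. */
    public boolean create(String username, Instant now) {
        purgeExpired(now);
        return createdAtByUsername.putIfAbsent(username, now) == null;
    }

    /** True while the temporary details exist and are at most three days old. */
    public boolean isValid(String username, Instant now) {
        Instant createdAt = createdAtByUsername.get(username);
        return createdAt != null
                && Duration.between(createdAt, now).compareTo(VALIDITY) <= 0;
    }

    /** Removes every entry older than the validity period. */
    public void purgeExpired(Instant now) {
        createdAtByUsername.values().removeIf(
                createdAt -> Duration.between(createdAt, now).compareTo(VALIDITY) > 0);
    }

    public int size() {
        return createdAtByUsername.size();
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the expiry rule easy to check with fixed timestamps.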
    3.1.2 Functional Requirements of the System
    IPO (Input Process Output) chart of the proposed system
    Each entry gives the product function, followed by its input, process and output.
    1. Title Page/Login Page
       Input: Login details.
       Process: The Login servlet checks the database for matching login details.
       Output: An appropriate welcome page is displayed if matching login details are found in the database. Otherwise the user is prompted to enter the correct login details.
    2. Welcome Page
       Input: The task to be performed is selected here.
       Process: The server checks for the page referenced by the selected task link.
       Output: The appropriate page is displayed by the server.
    3. Create Supervisor
       Input: The temporary login username and password for a new supervisor to the system (Title Detection System) are entered here.
       Process: A Java servlet processes this information and sends it to the database for storage.
       Output: These details are stored in the temporary supervisor table in the database.
    4. Create Author
       Input: The temporary login username and password of a new author to the system (Title Detection System) are entered here.
       Process: A Java servlet processes this information and sends it to the database for storage.
       Output: These details are stored in the temporary author table in the database.
    5. Register Author
       Input: Information about a new author is entered here.
       Process: A Java servlet processes this information and sends it to the database for storage.
       Output: The information about the author is saved and a web page is displayed showing the user the entered information.
    6. Register Supervisor
       Input: Information about a new supervisor is entered here.
       Process: A Java servlet processes this information and sends it to the database for storage.
       Output: The information about the supervisor is saved and a web page is displayed showing the user the entered information.
    7. Author/Supervisor Mapping
       Input: The author ID and supervisor ID of the Head of Department are entered here.
       Process: A Java servlet processes this information and sends it to the database for storage.
       Output: The author ID and supervisor ID fields of that HOD's record are updated to their actual values.
    8. Upload Paper
       Input: A paper with an approved title is entered here, together with other information about the paper such as its title, the author ID and the supervisor IDs.
       Process: A Java servlet processes the entered information and the paper and saves them to temporary storage in the database.
       Output: The uploaded paper is stored and a printable report of the information about the newly uploaded paper is displayed.
    9. Check Title
       Input: The title of the new paper is entered from CheckTitle.html.
       Process: A servlet collects the entered title and checks whether the same title is in the database. This is done by using an SQL SELECT statement to select the paper ID with the entered title as reference.
       Output: If a paper with the same title is found in the database, a page is displayed through which information about the author or supervisors of the project can be accessed by clicking the appropriate link. The paper may also be viewed by clicking the appropriate link.
    10. Approve Upload
       Input: The user selects the uploaded paper he wants approved.
       Process: A servlet collects the entered information and uses an SQL INSERT statement to send it to the database for permanent storage.
       Output: The approved paper is stored in permanent storage and a success message is displayed.
    11. View Paper
       Input: View Paper is selected from the table showing information about the document with the same title.
       Process: The server calls a servlet that retrieves the paper from the database using an SQL SELECT statement with the title of the paper as reference.
       Output: The paper is displayed to the user through the browser.
    12. Author's Information View
       Input: The view link in the Author's Information column is selected.
       Process: A servlet is called to collect the required information from the database using an SQL SELECT statement.
       Output: The information about the author and supervisor(s) is displayed.
    13. Temporary Supervisor trigger
       Input: An insert, update or delete is made to the supervisor table.
       Process: This Oracle trigger removes all entries in the temporary supervisor table that are older than three days.
       Output: Entries in the temporary supervisor table that are older than three days are cleared from the database.
    14. Temporary Author trigger
       Input: An insert, update or delete is made to the author table.
       Process: This Oracle trigger removes all entries in the temporary author table that are older than three days.
       Output: Entries in the temporary author table that are older than three days are cleared from the database.
    15. Temporary Upload trigger
       Input: An insert is made to the projects table.
       Process: This trigger removes the entry with the same information from the temporary upload table.
       Output: The entry in the temporary upload table that has just been inserted into the projects table is removed.
    16. Department HOD Login trigger
       Input: A department is created.
       Process: A new entry containing the HOD username and password is made in the Department_HOD_login table through an SQL INSERT statement.
       Output: The username and password of the HOD are created.
    17. Paper Information Edit
       Input: The edit link in the Paper Information Edit column is selected.
       Process: A servlet takes the title of the paper and retrieves the information concerning that paper using an SQL SELECT statement.
       Output: A page containing the information about that paper is displayed in an editable format. The page also contains a Save button.
    18. Paper Information Edit Upload
       Input: The necessary information is edited and the Save button is activated.
       Process: A servlet is called that saves the edited information to the database using an SQL UPDATE statement.
       Output: A printable report of the paper's information showing the new changes is displayed.
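The Check Title process in the chart (an SQL SELECT that fetches the paper ID using the entered title as reference) might look like the following JDBC sketch. It is illustrative only: the PROJECTS table is named in this report, but the `paper_id` and `title` column names, the class name and the method name are assumptions. A parameterized `PreparedStatement` is used because the title is typed in by the user and should not be concatenated into the SQL text.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

final class CheckTitleQuerySketch {
    // Column names paper_id and title are hypothetical; only PROJECTS comes from the report.
    public static final String FIND_BY_TITLE =
            "SELECT paper_id FROM projects WHERE title = ?";

    private CheckTitleQuerySketch() { }

    /** Returns the ID of the paper with exactly this title, if one exists. */
    public static Optional<String> findPaperId(Connection connection, String title)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(FIND_BY_TITLE)) {
            // Bind the user-supplied title as a parameter rather than splicing it into the SQL.
            statement.setString(1, title);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next() ? Optional.of(rows.getString(1)) : Optional.empty();
            }
        }
    }
}
```

A servlet could call `findPaperId` with a connection obtained from the application's data source and show the Same Title page when the result is present.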

    3.1.3 User Descriptions
    3.1.3.1 UML (Unified Modeling Language) Use Case Table of the system
    Each entry gives the user, followed by the product functions available to that user and the user's role.
    1. Head of Department
       Product functions: Title Check, Upload Paper, Approve Paper, Create Supervisor, Create Author, Register Supervisor, Author-Supervisor Mapping
       Role: This user is allowed to check whether a submitted or proposed work has the same title as a work in the reference corpus.
    2. Supervisor
       Product functions: Title Check, Upload Paper, Approve Paper, Create Author, Author-Supervisor Mapping
       Role: This user is allowed to enter student information concerning a verified paper and save it to the database. He is also allowed to edit the information of already uploaded papers where errors were made or updates are required.
    3. Author
       Product functions: Upload Paper
       Role: An author uses this to upload a paper to the system and to enter basic information about the supervisor(s) of the project.
    4. Temporary Supervisor
       Product functions: Register Supervisor
       Role: A supervisor uses this to enter his personal information into the system.
    5. Temporary Author
       Product functions: Register Author
       Role: An author uses this to enter his personal information into the system.
    Use Case Table of the System
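The use case table can be expressed in Java as a lookup from user type to activated modules, which is one way a welcome page could decide which links to show. This is a sketch under assumptions: the enum and class names are hypothetical, and the contents of each set simply follow the table above.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class RoleModulesSketch {
    public enum Role {
        HEAD_OF_DEPARTMENT, SUPERVISOR, AUTHOR, TEMPORARY_SUPERVISOR, TEMPORARY_AUTHOR
    }

    public enum SystemModule {
        TITLE_CHECK, UPLOAD_PAPER, APPROVE_PAPER, CREATE_SUPERVISOR, CREATE_AUTHOR,
        REGISTER_SUPERVISOR, REGISTER_AUTHOR, AUTHOR_SUPERVISOR_MAPPING
    }

    // Modules activated for each user type, copied from the use case table.
    private static final Map<Role, Set<SystemModule>> ACTIVATED = new EnumMap<>(Role.class);

    static {
        ACTIVATED.put(Role.HEAD_OF_DEPARTMENT, EnumSet.of(
                SystemModule.TITLE_CHECK, SystemModule.UPLOAD_PAPER,
                SystemModule.APPROVE_PAPER, SystemModule.CREATE_SUPERVISOR,
                SystemModule.CREATE_AUTHOR, SystemModule.REGISTER_SUPERVISOR,
                SystemModule.AUTHOR_SUPERVISOR_MAPPING));
        ACTIVATED.put(Role.SUPERVISOR, EnumSet.of(
                SystemModule.TITLE_CHECK, SystemModule.UPLOAD_PAPER,
                SystemModule.APPROVE_PAPER, SystemModule.CREATE_AUTHOR,
                SystemModule.AUTHOR_SUPERVISOR_MAPPING));
        ACTIVATED.put(Role.AUTHOR, EnumSet.of(SystemModule.UPLOAD_PAPER));
        ACTIVATED.put(Role.TEMPORARY_SUPERVISOR, EnumSet.of(SystemModule.REGISTER_SUPERVISOR));
        ACTIVATED.put(Role.TEMPORARY_AUTHOR, EnumSet.of(SystemModule.REGISTER_AUTHOR));
    }

    private RoleModulesSketch() { }

    /** True if the given module is activated for the given type of user. */
    public static boolean isActivated(Role role, SystemModule module) {
        return ACTIVATED.get(role).contains(module);
    }
}
```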
    3.1.3.2 UML (Unified Modeling Language) Use Case Diagrams of the system.

     

    3.2 System Design
    3.2.1 DATABASE DESIGN

    The diagram above is the Entity-Relationship diagram of the system.
    NOTE: The Temporary upload table is a temporary storage location for project documentation before it is transferred into the PROJECTS table. Both approved and disapproved projects are deleted from the Temporary upload table.
    The Temporary author login table and the Temporary supervisor login table store the login information that allows an author or a supervisor to gain access to the system in order to register. As soon as an author is registered, he or she is assigned permanent login details with which to upload project documentation.
    Database Description of the above ER model

  • AUTHOR
  • SUPERVISOR
  • PROGRAMME
  • DEPARTMENT
  • FACULTY
  • INSTITUTION

  • AUTHOR_MAJOR_SUPERVISOR
  • SUPPORTING_SUPERVISORS
  • AUTHOR_SUPERVISOR
  • PROJECTS
  • AUTHOR_LOGIN
  • SUPERVISOR_LOGIN
  • DEPARTMENT_HOD_LOGIN

    Database tables used for temporary storage
  • TEMPORARY_AUTHOR_LOGIN
  • TEMPORARY_SUPERVISOR_LOGIN
  • T_AUTHOR_MAJOR_SUPERVISOR
  • T_SUPPORTING_SUPERVISORS
  • TEMPORARY_AUTHOR_SUPERVISOR
  • TEMPORARY_UPLOAD

    TRIGGERS USED TO AUTOMATICALLY UPDATE TABLES
    Each entry gives the trigger, followed by its function.
    1. AUTHOR_LOGIN_TRIGGER: Inserts login details for a newly registered author.
    2. TEMPORARY_AUTHOR_TRIGGER: Deletes from the Temporary Author Login table all temporary login details that have existed for more than three (3) days.
    3. SUPERVISOR_LOGIN_TRIGGER: Inserts login details for a newly registered supervisor.
    4. TEMPORARY_SUPERVISOR_TRIGGER: Deletes from the Temporary Supervisor Login table all temporary login details that have existed for more than three (3) days.
    5. TEMPORARY_UPLOAD_TRIGGER: Deletes approved projects from the Temporary_upload table.
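The effect of TEMPORARY_UPLOAD_TRIGGER on the approval workflow can be modelled in plain Java. The sketch below is not the Oracle trigger itself: it uses two in-memory maps as stand-ins for the Temporary_upload and PROJECTS tables, and the class and method names are hypothetical. It only illustrates the rule that an approved paper moves to permanent storage and its temporary entry disappears.

```java
import java.util.HashMap;
import java.util.Map;

final class ApprovalWorkflowSketch {
    // Stand-ins for the Temporary_upload and PROJECTS tables: title -> author ID.
    private final Map<String, String> temporaryUpload = new HashMap<>();
    private final Map<String, String> projects = new HashMap<>();

    /** Stores a paper temporarily; refused if the title is already in PROJECTS. */
    public boolean upload(String title, String authorId) {
        if (projects.containsKey(title)) {
            return false;
        }
        temporaryUpload.put(title, authorId);
        return true;
    }

    /** Moves an awaiting paper into PROJECTS and clears its temporary entry. */
    public boolean approve(String title) {
        String authorId = temporaryUpload.get(title);
        if (authorId == null || projects.containsKey(title)) {
            return false;
        }
        projects.put(title, authorId);
        // In the database this removal is what TEMPORARY_UPLOAD_TRIGGER performs.
        temporaryUpload.remove(title);
        return true;
    }

    /** A disapproved paper is simply removed from temporary storage. */
    public void disapprove(String title) {
        temporaryUpload.remove(title);
    }

    public boolean isAwaitingApproval(String title) {
        return temporaryUpload.containsKey(title);
    }

    public boolean isApproved(String title) {
        return projects.containsKey(title);
    }
}
```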

    3.2.2 Java Program Design
    UML (Unified Modeling Language) Activity diagrams of the Java programs

    UML (Unified Modeling Language) Class diagrams of the Java programs

    GRAPHICAL USER INTERFACES (WEB PAGES)

  • LOGIN PAGE
    Title Detection System opens with a login page through which the user enters his or her login credentials. Login credentials entered help determine the type of user gaining access to the system, thus determining the modules to be activated for that user.
  • WELCOME PAGES
    The welcome page introduces the user to the system as well as displays the modules of the system that are available to the user. The different welcome pages for different users are displayed below.

    Head of Department Welcome Page
    The Welcome page for this user (Head of Department) allows him to do the following:
  • Create authors
  • Create supervisors
  • Register supervisors
  • Upload papers
  • Check titles
  • Approve uploaded projects in which he acted as major supervisor
  • Map his author ID and supervisor ID to his username as Head of Department

    The Head of Department creates temporary login credentials for a supervisor or an author through the Create Supervisor or Create Author page respectively, by clicking on the appropriate link.
    Temporary Supervisor Page
    This page is displayed to a user with the temporary login credentials of a supervisor to allow him to register himself as a supervisor. Only the Register Supervisor module is activated for this user.

     

    Supervisor Welcome page

    The Supervisor welcome page allows a supervisor to do the following:
  • Create authors
  • Upload papers
  • Check titles
  • Approve uploaded projects in which he acted as major supervisor
    Temporary Author Welcome Page
    This page allows an author to register his or her information using his or her temporary author credentials.


    Author Welcome page
    This page is the portal through which an author can upload papers.

  • CREATE SUPERVISOR PAGE
    This page can only be accessed by a valid Head of Department in the system by clicking on the Create Supervisor link on the Welcome page. It allows the user to create temporary supervisor login details.

    An error page is displayed if the credentials are not accepted by the system.
    Supervisor Registration Page
    This page is used by a supervisor to register his information in the system. After successful registration, the supervisor is assigned a permanent username and password. With these login details, the user can then proceed to perform other functions.
  • Create Author web page
    This is available to the Head of Department and supervisors.
    Note: Temporary login credentials are removed from the Temporary_Author table in the database once they are at least three days old. The removal is triggered by an insert, update or delete action on the Temporary_Author table.
    An error page is displayed if the credentials are not accepted by the system.
  • Author Registration page
    This page is used by an author to register his information in the system. After successful registration, the author is assigned a permanent username and password. With these login details, the user can then proceed to perform other functions.
  • Upload Paper page
    This is the portal through which an academic paper is uploaded into the system. Undergraduate and Master's students supply only the major supervisor ID, while Doctor of Philosophy students supply the major supervisor ID and, where applicable, the supervisor IDs of other supporting supervisors.
  • Supervisor Author Mapping page
    This maps the supervisor ID and author ID of the Head of Department to his HOD ID.
  • CHECK TITLE page
    This page gives access to the Head of Department and Supervisors to check if the title of a project already exists in the reference corpus.
  • About page
    This gives a general description of what the system is about.
  • Contact page
    This page gives some information about the author and supervisor of the project as well as the institution where the project was conducted.
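The supervisor-ID rule stated for the Upload Paper page can be sketched in Java as a small helper. This is an assumption-laden illustration rather than the system's servlet code: the names are hypothetical, supporting IDs supplied by undergraduate or Master's authors are silently ignored (following the statement that only the major supervisor ID is acknowledged for them), and more than two supporting supervisors for a PhD author is treated as an error.

```java
import java.util.ArrayList;
import java.util.List;

final class SupervisorRuleSketch {
    public enum Programme { UNDERGRADUATE, MASTERS, PHD }

    /** Maximum number of supporting supervisors a PhD author may supply. */
    public static final int MAX_SUPPORTING = 2;

    private SupervisorRuleSketch() { }

    /** Returns the supervisor IDs that would be recorded for an upload. */
    public static List<String> acknowledgedSupervisors(Programme programme,
                                                       String majorSupervisorId,
                                                       List<String> supportingSupervisorIds) {
        if (majorSupervisorId == null || majorSupervisorId.trim().isEmpty()) {
            throw new IllegalArgumentException("a major supervisor ID is required");
        }
        List<String> acknowledged = new ArrayList<>();
        acknowledged.add(majorSupervisorId);
        // Only PhD authors may add supporting supervisors, and at most two of them.
        if (programme == Programme.PHD) {
            if (supportingSupervisorIds.size() > MAX_SUPPORTING) {
                throw new IllegalArgumentException("at most two supporting supervisors");
            }
            acknowledged.addAll(supportingSupervisorIds);
        }
        return acknowledged;
    }
}
```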

    CHAPTER FOUR
    SYSTEM IMPLEMENTATION
    4.1 Implementation Platform
    Title Detection System was implemented on the following platform.
    4.1.1 Hardware Requirement
    The system on which Title Detection System was built had the following hardware configuration:
  • Computer Processor: Pentium
  • Hard Disk: 454.9 Gigabytes
  • RAM (Random Access Memory): 4.0 Gigabytes
  • Clock Speed: 2.13 GHz
    4.1.2 Database Server Configuration
  • Computer Processor: Pentium
  • Hard Disk: 454.9 Gigabytes
  • RAM (Random Access Memory): 4.0 Gigabytes
  • Clock Speed: 2.13 GHz
    4.1.3 Software Requirement
    The software used to build Title Detection System includes:
  • Operating System: Windows 7
  • Web Server: Apache Tomcat 8.0.3.0
  • Front End and User Interfaces: HTML (HyperText Markup Language), JavaScript, CSS (Cascading Style Sheets)
  • Database Specification: Oracle Database Management System
  • IDE (Integrated Development Environment): NetBeans
  • Reporting Tool: Jasper Reports
  • Report Designer: iReport
  • Browser: Internet Explorer, Mozilla Firefox, Google Chrome
  • Programming Languages: Java, SQL (Structured Query Language), PL/SQL (Procedural Language/Structured Query Language), HTML (HyperText Markup Language), JavaScript, CSS (Cascading Style Sheets)

    4.2 User Guide

  • LOGIN PAGE
    Title Detection System opens with a login page through which the user enters his or her login credentials. The credentials entered determine the type of user trying to gain access to the system, and hence the modules to be activated for that user.
    To log into the system, the user must supply his or her username (Head of Department ID, Temporary Supervisor Username, Supervisor ID, Temporary Author Username or Author ID as appropriate) and password.
    The user then clicks the Login button to submit the login details for verification.
  • WELCOME PAGES
    The welcome page introduces the user to the system and displays the modules of the system that are available to a verified user. The contents of the welcome page served to a user are determined by the type of user accessing the system. The different welcome pages for different users are displayed below.
  • Head of Department Welcome Page
    The Welcome page for this user (Head of Department) provides modules that allow him to create authors, create supervisors, register supervisors, upload papers, check titles, approve uploaded projects in which he acted as major supervisor, and map his author ID and supervisor ID to his username as Head of Department.

    Head of Department Welcome Page
    The Head of Department creates temporary login credentials for a supervisor on the Create Supervisor page, which is opened by clicking the Create Supervisor link.
  • Temporary Supervisor Page
    This page allows a supervisor with temporary login credentials to register his personal details before he is assigned valid login credentials as a supervisor.

    Temporary Supervisor Welcome page
  • Supervisor Welcome Page
    This page contains modules that assist a supervisor to perform his supervisory role.

    Supervisor Welcome Page
    The Supervisor welcome page allows a supervisor to create authors, upload papers, check titles and approve uploaded projects in which he acted as major supervisor.
  • Temporary Author Welcome Page
    This page allows an author to register his or her information using his or her temporary author credentials. Consequently, the author is assigned his or her login credentials.


    Temporary Author Welcome Page

  • Author Welcome page
    This page is the portal through which a registered author in the system can upload papers.

    Author Welcome Page
  • Creating temporary login credentials for a supervisor
    To create temporary login credentials for a supervisor, the head of the supervisor's department must first log in using his HOD login credentials, after which he can open the Create Supervisor page using the Create Supervisor link on his welcome page.
    This page can only be accessed by a valid Head of Department in the system. The user (HOD) must then supply the information needed to create temporary login details for a supervisor.
    After this, he or she submits the information using the submit button.

    Sample Input for Create Supervisor Page
    A success message is then displayed to inform the user that the temporary login details for each user entered have been activated.

    Sample Output for Successful Creation of Supervisor Page
    Note: Temporary login credentials are deactivated and removed from the system once they are at least three days old. The removal is triggered by an insert, update or delete action on the Temporary_Supervisor table.
    An error page is displayed if the credentials are not accepted by the system. This may be due to a clash of usernames in the system.
  • Registering the personal details of a supervisor
    A supervisor whose personal information has not yet been registered in the system must first log into the system using the temporary login credentials supplied by his or her Head of Department.
    He or she must then click on the Register Supervisor link on his or her welcome page in order to enter his or her information into the system.
    He or she then enters the required information and submits it using the submit button.

    Sample Input for Supervisor Registration Page
    Upon successful registration of details, a document containing the entered information and other information is displayed to the user.


    Sample Result from the Successful Registration of Supervisor
    Otherwise, an error page is displayed indicating the possible cause of the rejection by the system.
  • CREATING TEMPORARY LOGIN CREDENTIALS FOR AN AUTHOR
    To create temporary login credentials for an author, the author's supervisor must first log into the system as a supervisor, after which he or she can open the Create Author page using the Create Author link on his or her welcome page.
    This page can only be accessed by a valid supervisor in the system. The user (supervisor) must then supply the information needed to create temporary login details for the author(s).
    After this, he or she submits the information using the submit button.
    Sample Input for Create Authors Page
    A success message is then displayed to inform the user that the temporary login details for each user entered have been activated.
    Successful Creation of Temporary Author Page Output
    An error page is displayed if the credentials are not accepted by the system. This may be due to a clash of usernames in the system.
    Note: Temporary login credentials are deactivated and removed from the system once they are at least three days old. The removal is triggered by an insert, update or delete action on the Temporary_Author table.
  • Registering the personal details of an author
    An author whose personal information has not yet been registered in the system must first log into the system using the temporary login credentials supplied by his or her supervisor.
    He or she must then click on the Register Author link on his or her welcome page in order to enter his or her information into the system.
    He or she then enters the required information and submits it using the submit button.

    Sample Input for Register Author Information Page
    Upon successful registration of details, a document containing the entered information and other information is displayed to the user.


    Sample Result for Successful Registration of Author Page

    Otherwise, an error page is displayed indicating the possible cause of the rejection by the system.

  • Mapping the Supervisor ID and Author ID to a Head of Department Username
    The Head of Department is required to map his author ID or supervisor ID to his username so that any module that requires one of these IDs can be activated for him.
    The Head of Department must first log in using his or her HOD username.
    Afterwards, he or she should click on the Supervisor/Author Mapping link to open the mapping page.
    The user should then supply one or both of the IDs requested before clicking on the Map button to submit the information.

    Sample Input for Map Author Supervisor Page
    A success message is then displayed to inform the user that the detail(s) have been activated.

    Sample Output Page for Map Author Supervisor
    An error page is displayed if the credentials are not accepted by the system. This may happen if the information supplied is incorrect or invalid.
  • Uploading a paper into the system
    A registered author who wants to upload his or her academic paper must first log into the system using his or her login credentials.
    Thereafter he or she clicks on the Upload Paper link on his or her welcome page. The Upload Paper page is accessible to registered authors, supervisors and Heads of Department in the system.
    He or she supplies the required information and submits it using the submit button.
    Note: Only the major supervisor ID will be acknowledged for authors uploading project reports for undergraduate and Master's programmes.
    Doctor of Philosophy authors uploading their theses must supply the major supervisor ID together with the supervisor IDs of other supporting supervisors where applicable.

    Sample Input for Upload Paper Page
    Upon successful upload of an academic paper, a message is displayed informing the user that the paper now awaits approval.

    Sample Output for Successful Upload of Paper Page
    Otherwise, an error page is displayed, possibly indicating the cause of the rejection by the system.
  • Approving an uploaded project
    This module is available only to supervisors. A Head of Department who has mapped his or her supervisor ID to his or her HOD username may also use this module.
    The registered user must first log into the system using his or her login details.
    Afterwards, the user clicks on the Approve Upload link on his or her Welcome page.
    If there are uploaded papers awaiting his approval as major supervisor, a page is displayed containing a table that shows the information uploaded for each project. Otherwise, a page containing an empty table is displayed.

    Sample Output for Approve Upload Page
    To view the uploaded paper and check whether it satisfies his requirements, the supervisor should click on the View button under the View Paper column.
    To disapprove the uploaded paper, the supervisor must click on the Disapprove button under the Disapprove Paper column.
    To approve paper(s), the supervisor must first select the check box of each paper he or she wants to approve before clicking the Approve Selected Papers button.
    A success message is then displayed to confirm the approval of the papers. Otherwise, an error message is displayed. An error will result if there is a paper with the same title in the reference corpus.
  • Checking for a paper in the reference corpus whose title is the same as that of a given paper
    A registered supervisor or Head of Department is allowed to perform this function.
    The user must first log into the system using his or her login credentials.
    Afterwards, he or she clicks on the Check Title link on his or her Welcome page to open the Check Title page.
    He or she then supplies the title to be checked against the reference corpus.

    Sample Input for Check Title Page
    If the same title is found in the reference corpus, the Same Title web page is displayed; otherwise a Unique Title web page is displayed.


    Sample Output for Same Title Page
    To view the information about the author and supervisors of a project, the user should click on the View Details link in the Author & Supervisor Details column.

    Sample Result of document opened for View Author & Supervisor Details
    To open the paper with the same title as the one that was checked for, the user must click on the Open Paper link in the View Paper column.

    Sample Result for Open Paper Page
    When no paper is found in the reference corpus that has the same title as the supplied title, a unique title page is displayed.
    To upload the paper immediately, the user can click on the Upload link seen on the Unique Paper Title Page.

    Result for Unique Paper Title Page
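
    The title check described above can be sketched as follows. This is a minimal illustration, not the system's actual code: the `Paper` record, the `check_title` function and the rule that titles are compared case-insensitively with whitespace collapsed are all assumptions made for the example, since the text does not state how titles are matched.

```python
from typing import NamedTuple


class Paper(NamedTuple):
    """Hypothetical record for a paper held in the reference corpus."""
    title: str
    author: str
    supervisors: tuple


def normalize_title(title: str) -> str:
    """Lower-case a title and collapse whitespace (an assumed matching rule)."""
    return " ".join(title.lower().split())


def check_title(query: str, corpus):
    """Return the page to display and any papers whose title matches the query."""
    key = normalize_title(query)
    matches = [paper for paper in corpus if normalize_title(paper.title) == key]
    page = "Same Title" if matches else "Unique Title"
    return page, matches


if __name__ == "__main__":
    corpus = [Paper("Design of a Plagiarism Detection System",
                    "Student A", ("Supervisor B",))]
    print(check_title("design of a plagiarism detection system", corpus))
    print(check_title("Face Recognition Attendance System", corpus))
```

    A production system would more likely run this comparison as an indexed database query than as a scan over a list; the list is used here only to keep the sketch self-contained.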
    4.3 Discussion of Results
    The Title Detection System achieved its research objectives, namely:
    I. Research was conducted into some important methodologies that have been used to detect external plagiarism.
    II. A model was created for a corpus in which all submitted projects (published or not) in Nigerian universities can be collected and used as source documents for testing whether a suspected work contains plagiarized content.
    III. An online system was developed that can detect, based on project titles, whether a new project has plagiarized a project that already exists in the reference corpus.
    4.4 Significant contribution to knowledge or real life
    i. The Check Title module of the Title Detection System allows a supervisor to quickly detect whether the reference corpus already holds a valid research work with the same title as a new project that is yet to be entered into the system.
    ii. Early detection of an existing academic paper with the same title as a proposed project prevents time being wasted on a project that the system would later reject as plagiarized.
    iii. Early detection can also prevent reinventing the wheel when a proposed project has already been carried out by another person with optimal results.
    iv. Even late detection of an existing project in the reference corpus bearing the same title as a submitted project can expose plagiarism and prevent fraud, since a student may have deliberately delayed the check until the last minute in order to submit a plagiarized paper.
    v. The reference corpus produced by this project could significantly extend the reference corpora of advanced plagiarism detection systems such as Turnitin, enabling more effective detection, since both published and unpublished academic papers from all Nigerian universities would be stored in it.
    vi. The level of plagiarism in the nation is expected to fall sharply once the Title Detection System is in use, since students will know that there is no hiding place: a student who takes the academic work of a student in another university and submits it as his or her own is likely to be caught.
    vii. Originality and uniqueness among Nigerian students should increase, as they will be more or less compelled to carry out their research themselves.
    viii. This project has reviewed proven methodologies for detecting plagiarism even in translated and highly obfuscated texts, adding to the knowledge of students who may previously have been unaware of such methodologies.

    CHAPTER FIVE
    CONCLUSION AND RECOMMENDATION
    5.1 Summary of findings
    This research reviewed several methodologies that other authors have implemented for detecting plagiarism. These methodologies include:

  • A language-dependent n-gram methodology by Gabriel et al. (2011) that detects plagiarism by:
    I. Reducing the search space of source documents to be thoroughly checked. This was accomplished by using four-grams after the removal of stop words.
    II. Searching the resulting documents thoroughly using tri-grams.
  • A language-independent methodology by Grman and Ravas (2011) that detects plagiarism by:
    I. Preprocessing the source and suspicious documents: any document written in a language other than English is translated into English using Google Translate, words are extracted and normalized, and the resulting file of words is converted to a binary file of word invariants.
    II. Detecting a plagiarized document, using a set of formulas, even when words have been changed, omitted or added.
    III. Post-processing the resulting file to remove overlapping passages from the file of the suspicious document, thereby increasing the precision and reducing the granularity of the result.

  • A cluster-based methodology that detects plagiarism by:
    I. Preprocessing the source and suspicious documents to remove irrelevant symbols that do not affect the meaning of the document and, subsequently, noting the position of each word in the document.
    II. Obtaining sample fingerprints of each document by applying the Winnowing algorithm with a window size of 6 over overlapping word 5-grams, noting the first and last position of each document, and creating an inverted index of the documents before clustering and merging the suspicious document with each source document.
    III. Post-processing the resulting output to ensure that errors introduced by the clustering and merging above are corrected in the final output.
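
    The fingerprinting step of the cluster-based methodology can be illustrated with the sketch below. It uses the parameters quoted above (word 5-grams, window size 6), but everything else is an assumption made for illustration: the function names, the use of a truncated MD5 digest as the hash, and the rule of taking the rightmost minimum on ties, which follows the usual formulation of Winnowing. The clustering, merging and post-processing steps are not shown.

```python
import hashlib
import re


def word_ngrams(text, n=5):
    """Return overlapping word n-grams as (start_word_index, words) pairs."""
    words = re.findall(r"\w+", text.lower())
    return [(i, tuple(words[i:i + n])) for i in range(len(words) - n + 1)]


def stable_hash(gram):
    """Hash an n-gram deterministically (built-in hash() is salted per run)."""
    digest = hashlib.md5(" ".join(gram).encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def winnow(text, n=5, window=6):
    """Keep the minimum hash of every window of `window` consecutive hashes.

    Returns a set of (hash, start_word_index) fingerprints. Ties are broken
    by choosing the rightmost minimum in the window.
    """
    hashes = [(stable_hash(gram), pos) for pos, gram in word_ngrams(text, n)]
    if not hashes:
        return set()
    fingerprints = set()
    for start in range(max(1, len(hashes) - window + 1)):
        win = hashes[start:start + window]
        fingerprints.add(min(win, key=lambda hp: (hp[0], -hp[1])))
    return fingerprints


def build_inverted_index(sources):
    """Map each fingerprint hash to the (doc_id, position) pairs where it occurs."""
    index = {}
    for doc_id, text in sources.items():
        for h, pos in winnow(text):
            index.setdefault(h, []).append((doc_id, pos))
    return index


def shared_fingerprints(suspicious, index):
    """Return, per source document, the (suspicious_pos, source_pos) matches."""
    hits = {}
    for h, pos in winnow(suspicious):
        for doc_id, src_pos in index.get(h, []):
            hits.setdefault(doc_id, []).append((pos, src_pos))
    return hits
```

    With these parameters, any passage of at least 10 consecutive words copied verbatim (window size plus n-gram length minus one) yields at least one shared fingerprint, which is what makes the inverted index lookup a useful first filter before clustering.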
    5.2 Conclusion
    External plagiarism can be efficiently detected using the studied methodologies. The Title Detection System is a stepping stone towards implementing these methodologies in the Nigerian community, as it provides a foundation on which an advanced plagiarism detection system tailored to the needs of Nigerian tertiary institutions can be built. Awareness of the Title Detection System should in itself reduce the level of plagiarism and foster a mindset of creativity and originality in Nigerian students. This should lead to more creative inventions by students, benefiting not only them but Nigeria and the world as a whole. It should likewise reduce the theft of intellectual property and help authors gain the true dividends of their work.
    5.3 Limitations
    The database model could have been refined to reduce the storage needed for authors and supervisors by creating a single table named Person that holds each individual's information, and then mapping each supervisor or author to that individual's record in the Person table. This was discovered late and would have been implemented but for the time needed to integrate it into the whole system.
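
    The refinement described above could look roughly like the following sketch. It is an assumption-laden illustration rather than the project's schema: SQLite is used only because it ships with Python, and the table and column names other than Person (`role_assignment`, `full_name`, `email`, `university`, `role`) are hypothetical.

```python
import sqlite3

# Each individual is stored once in `person`; roles reference that record.
SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    email     TEXT UNIQUE
);
CREATE TABLE role_assignment (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    university TEXT NOT NULL,
    role       TEXT NOT NULL CHECK (role IN ('author', 'supervisor')),
    PRIMARY KEY (person_id, university, role)
);
"""


def create_database():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_person(conn, full_name, email):
    """Insert one individual and return the generated person_id."""
    cur = conn.execute(
        "INSERT INTO person (full_name, email) VALUES (?, ?)",
        (full_name, email),
    )
    return cur.lastrowid


def assign_role(conn, person_id, university, role):
    """Map an existing person to an author or supervisor role at a university."""
    conn.execute(
        "INSERT INTO role_assignment (person_id, university, role) VALUES (?, ?, ?)",
        (person_id, university, role),
    )


def roles_of(conn, person_id):
    """List the (university, role) pairs held by one person."""
    rows = conn.execute(
        "SELECT university, role FROM role_assignment "
        "WHERE person_id = ? ORDER BY university, role",
        (person_id,),
    )
    return rows.fetchall()
```

    Because personal details live in a single row, a person who supervises at one university and is an author at another needs only two small rows in `role_assignment` rather than two full copies of his or her details.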
    5.4 Further Research And Recommendations

  • The literature studied describes proven ways of detecting plagiarism against a reference corpus even when the suspicious paper has been highly obfuscated or translated into a language different from that used by the original author. These methodologies can be implemented, and even improved upon, to develop a better and more efficient plagiarism detection system than Turnitin, which is presently used by some academics in Nigerian universities.
  • To reduce the storage wasted in creating separate supervisor and author records for one person at each university where that person is engaged, the database can be designed so that a person's details are collected once, when he or she is first admitted into the system, and thereafter mapped to the author or supervisor roles held at each of those universities.
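
    As an illustration of the first recommendation, the two-stage n-gram methodology summarized in Section 5.1 could be prototyped along the lines below. This is a sketch under stated assumptions, not the method of Gabriel et al.: the tiny stop-word list, the `min_shared` threshold, the choice to keep stop words in the tri-gram stage and the overlap score are all hypothetical choices made for the example.

```python
import re

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are", "by"}


def tokenize(text, drop_stop_words=False):
    """Split text into lower-case word tokens, optionally dropping stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def ngrams(words, n):
    """Return the set of overlapping word n-grams in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def candidate_sources(suspicious, sources, min_shared=1):
    """Stage 1: keep sources sharing at least `min_shared` four-grams
    with the suspicious text, computed after stop-word removal."""
    susp = ngrams(tokenize(suspicious, drop_stop_words=True), 4)
    return [
        doc_id
        for doc_id, text in sources.items()
        if len(susp & ngrams(tokenize(text, drop_stop_words=True), 4)) >= min_shared
    ]


def trigram_overlap(suspicious, source):
    """Stage 2: fraction of the suspicious text's tri-grams found in the source."""
    susp = ngrams(tokenize(suspicious), 3)
    if not susp:
        return 0.0
    return len(susp & ngrams(tokenize(source), 3)) / len(susp)
```

    Only the documents returned by `candidate_sources` would then be passed to `trigram_overlap`, so the more expensive comparison runs on a reduced search space.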

    REFERENCES
    1. ACM Digital Library.
    2. Aldrian, O (2012), Text Searching Algorithms. Available from: http://www.comp.nus.edu.sg/~rahul/allfiles/aldrian-text-searching.pdf . [July 3, 2015].
    3. Asim, MEA, Hussam, MDA, Vaclav 2011. 'Overview and comparison of plagiarism detection tools', Proceedings of the Dateso 2011: Annual International Workshop on Databases, Texts, Specifications and Objects, pp161-172. Available from: http://ceur-ws.org/Vol-706/poster22.pdf [March 12, 2015].
    4. Basile, C, Dario, B, Emanuele, C, Giampaolo, C and Mirko, DE (2009). 'A plagiarism detection procedure in three steps: selection, matches and squares'. 3rd PAN Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, pp. 19-23, 2009. Available from: http://ceur-ws.org/Vol-502/paper3.pdf . [May 8, 2015].
    5. Chiagozie, FN 2012, 'Opinion: Plagiarism and the Nigerian writer', YNaija.com 2015. Available from: http://ynaija.com/opinion-plagiarism-and-the-nigerian-writer/ . [May 11, 2015].
    6. Du, Z, Wei_jiang, L, Zhang, L (2010), 'A cluster-based plagiarism detection method'. Proceedings of the 2nd international I. Available from: http://ceur-ws.org/Vol-1176/CLEF2010wn-PAN-ZouEt2010.pdf.
    7. Encyclopedia Britannica 2013, Plagiarism. Available from: http://www.britannica.com/EBchecked/topic/462640/plagiarism. [ February 20, 2015].
    8. Gabriel, O, Gaston, L, Sebastian, AR, Juan, DV, 2011. Approaches for intrinsic and external plagiarism detection. Available from: http://www.uni-weimar.de/medien/webis/research/events/pan-11/pan11-papers-final/pan11-plagiarism-detection/oberreuter11-notebook.pdf . [March 11, 2015].
    9. Grman, J, Ravas, R, 2011. Improved implementation for finding text similarities in large collection of data. Available from: http://ceur-ws.org/Vol-1177/CLEF2011wn-PAN-GrmanEt2011.pdf
    10. Grman, J, Ravas, R (2011). Improved implementation for finding text similarities in large collection of data. Available from: citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.9169 . [ March 11, 2015].
    11. Hiremath, SA, Otari, MS, (2014), 'Plagiarism Detection-Different Methods and Their Analysis: Review', International Journal of Innovative Research in Advanced Engineering (IJIRAE),Volume 1 Issue 7. Available from: http://www.ijirae.com/volumes/vol1/issue7/AUCS10085.06.pdf
    12. Jian-Yun, N, Jiangfeng, G, Jian, Z, Ming, Z, 1934. The use of word and n-grams for Chinese information retrieval. Available from: http://research.microsoft.com/pubs/68843/words_ngrams_chinese_learning.pdf. [March 11, 2015].
    13. Jure, L and Anand, R n.d. Clustering algorithms. Available from: http://web.stanford.edu/class/cs345a/slides/12-clustering.pdf [May 8, 2015]
    14. Luhn, HP, (1998), H. P. Luhn and Automatic Indexing. Available from: https://www.ischool.utexas.edu/~ssoy/organizing/l391d2c.htm.
    15. Martins, P, Andreas, E, Alberto, B, Benno, S & Paolo, R 2011. 'Overview on 3rd international competition on plagiarism detection'. Available from: http://www.uni-weimar.de/medien/webis/publications/papers/stein_2011t.pdf. [March 11, 2015].
    16. Merriam Webster Learner's Dictionary, 2014, Plagiarism. Available from: <http://www.learnersdictionary.com/definition/plagiarism>.
    17. N Ebbegbayan, 2005, Winnowing, a document fingerprinting algorithm, Department of Computer Science, Linkoping University. Available from: https://www.ida.liu.se/~TDDC03/oldprojects/2005/final-projects/prj10.pdf . [May 9, 2015].
    18. Oxford Dictionaries Language Matters, 2014, Plagiarism. Available from: http://www.oxforddictionaries.com/us/definition/american_english/plagiarism [February 20, 2015].
    19. Plagiarism Checker n.d., Plagiarism Checker Reviews. Available from: //www.plagiarismchecker.net/plagiarism-checker-reviews.php . [June 24, 2015].
    20. Project Rennaisance (n.d). The 6 most important sales and marketing books you should have (picture). Available from: http://cs.stanford.edu/people/eroberts/cs201/projects/honor-code/tech.htm [July 2, 2015].
    21. Radim, R, (2007), Semantics-based plagiarism detection. PhD Thesis, Masaryk University. Available from: http://is.muni.cz/th/39672/fi_r/teze.pdf
    22. Rammal, M, Sanan, M, 2011. Improving Arabic information retrieval system using n-gram method. Available from: http://www.wseas.us/e-library/transactions/computers/2011/52-429.pdf [March 11, 2015].
    23. Rijsbergen, CJ, 1979. Information Retrieval 2nd edn. Available from: http://www.dcs.gla.ac.uk/Keith/Preface.html. [March 11, 2015].
    24. Segun, O 2013, 'Vice chancellors tackle plagiarism with technology', The Punch 9 April. Available from: http://www.punchng.com/education/vice-chancellors-tackle-plagiarism-with-technology/. [February 20, 2015].
    25. Stackoverflow (2013), Checking which checkboxes are selected using java (a jsp). Available from: http://stackoverflow.com/questions/15775412/checking-which-check-boxes-are-selected-using-java-a-jsp. [July 3, 2015].
    26. Stackoverflow (2010), Java I/O-Text file-How to check for content? Available from: http://stackoverflow.com/questions/2328735/java-file-i-o-text-file-how-to-check-for-content . [July 3, 2015].
    27. Turnitin, Plagiarism and the Web: Myths and Realities. Available from: http://turnitin.com/static/resources/documentation/turnitin/company/Turnitin_Whitepaper_Plagiarism_Web.pdf
    28. University of Oxford n.d., Plagiarism. Available from: <http://www.ox.ac.uk/students/academic/guidance/skills/plagiarism>.
    29. William, BC & John MT, 1994. N-gram text-based categorization. Available from: http://odur.let.rug.nl/~vannoord/TextCat/textcat.pdf. [March 11, 2015].
    30. Wikipedia, (2015a). N-gram. Available from: http://en.wikipedia.org/wiki/N-gram. [March 11, 2015].
    31. Wikipedia (2014b), Plagiarism. Available from: http://en.wikipedia.org/wiki/Plagiarism [December 11, 2014].
    32. Wikipedia, (2015c). Information Retrieval. Available from: http://en.wikipedia.org/wiki/Information_retrieval. [March 11, 2015].

    APPENDICES
    1. Ahmad, GL and Aijaz A, (2011). Plagiarism detection in Java. Available from: http://www.diva-portal.org/smash/get/diva2:428025/FULLTEXT01.pdf
    2. Akin, O, (2014), Intellectual property rights protection and the challenge of avoiding the trap of plagiarism. Available from: http://www.unaab.edu.ng/attachments/intellectualrightprofomotayo.pdf
    3. Arun, KJ (2012) ,' Similarity Overlap Metric and Greedy String Tiling' PAN 2012: Plagiarism Detection Notebook for PAN at CLEF 2012. Available from: http://ceur-ws.org/Vol-1178/CLEF2012wn-PAN-Jayapal2012.pdf
    4. Anthony O (2013), Efficient clustering based plagiarism detection system using IPPDC. Available from: http://digitalcommons.csbsju.edu/cgi/viewcontent.cgi?article=1015&context=honors_theses .[July 3, 2015].
    5. Benno S, Sven MVE (2006),Near Similarity search and plagiarism analysis. Available from: http://www.uni-weimar.de/medien/webis/publications/papers/stein_2006a.pdf
    6. Bensal, ER and Miraflores, ES (2013), Shall we turn to Turnitin? Available from: http://callej.org/journal/14-2/Bensal_Miraflores_Tan_2013.pdf .[July 3, 2015].
    7. Butters, D n.d. Comparison of matching algorithms used in three plagiarism detection systems. Available from: https://www.cs.auckland.ac.nz/courses/compsci725s2c/archive/termpapers/725butters.pdf .[July 3, 2015].
    8. Copyright Act (1990). Available from: http://www.nigeria-law.org/CopyrightAct.htm
    9. Dean dad (2013), Confession of a community college dean, Dean dad: Blog. Available from: http://suburbdad.blogspot.com/2006/10/thoughts-on-turnitincom.html . [July 3, 2015].
    10. Digital Library
    11. Emmanuel, O (2012). 'Plagiarism the story of Sanusi and Zakaria' The Nigerian Voice(2012). Available from: http://www.thenigerianvoice.com/news/96225/50/plagiarism-the-story-of-sanusi-and-zakaria.html . [July 3, 2015].
    12. Graduate Program Harvard Law School, n.d., Plagiarism tutorial. Available from: http://learning.law.harvard.edu/graduateprogram/Open_Plagiarism_Tutorial.pdf
    13. Hiremath, SA and Otari, MS (2014), 'Plagiarism detection - different methods and their analysis: review', International Journal of Innovative Research in Advanced Engineering (IJIRAE), volume 1, issue 7. Available from: http://www.ijirae.com/volumes/vol1/issue7/AUCS10085.06.pdf . [July 3, 2015].
    14. International School of Management, n.d. Academic Integrity and Culture Sensitivity. Available from: http://www.ism.edu.ng/brochure/academic-integrity.pdf
    15. Jens, T n.d., Analysis of Turnitin.com. Available from: https://www.cs.auckland.ac.nz/courses/compsci725s2c/archive/termpapers/jrotzky.pdf .
    16. Nnamdi, F (2012), 'Plagiarism: Nigeria's central bank governor sued', PM News Nigeria. Available from: http://www.pmnewsnigeria.com/2012/04/23/plagiarism-nigerias-central-bank-governor-sued/
    17. Rasia, N, Sheena,K (2013), 'Extrinsic Plagiarism Detection in Text Combining Vector Space Model and Fuzzy Semantic Similarity Scheme', International Journal of Advanced Computing, Engineering and Application (IJACEA), Vol. 2, No. 6. Available from: http://www.iracst.org/ijacea/papers/vol2no62013/1vol2no6.pdf
    18. Reena, K, Preeti, MC, Vaibhav, J and Kuldeep, R, (2013) 'Semantically Detecting Plagiarism for Research Papers', International Journal of Engineering Research and Applications (IJERA), Vol. 3, Issue 3, May-Jun 2013, pp.077-080 . Available from: http://www.ijera.com/papers/Vol3_issue3/P33077080.pdf
    19. Yelsew80 (2013), 'African first as Nigerian universities deploy top plagiarism detection software', Nairaland forum. Available from: http://www.nairaland.com/1234086/african-first-nigerian-universities-deploy. [July 3, 2015].
    20. Richa (2014), JavaScript with HREF : Using JavaScript inside the A Link Tag. Udemy blog: Blog. Available from: https://blog.udemy.com/javascript-href/ . [July 2, 2015].
    21. Sattyam, KM and Manish, P (2010), ' Efficient matching algorithm for offline text', Global journal of computer science and technology, vol. 10, issue 11, pp 23 -28. Available from: http://globaljournals.org/GJCST_Volume10/4-An-Efficient-Word-Matching-Algorithm-For-off-Line-Text.pdf .[July 3, 2015].
    22. Saul, S, Daniel, SW and Alex, A, Winnowing: Local algorithms for document fingerprinting. Available from: http://igm.univ-mlv.fr/~mac/ENS/DOC/sigmod03-1.pdf [May 5, 2015].
    23. Tshepo, B (2010), Turning to turnitin to fight plagiarism among university students. Available from: http://ifets.info/journals/13_2/1.pdf . [July 3, 2015].
    24. Wikipedia, (2015d), Checksum. Available from: http://en.wikipedia.org/wiki/Checksum [May 5, 2015].
    25. Wikipedia, (2015e), Fingerprint computing. Available from: http://en.wikipedia.org/wiki/Fingerprint_(computing) [May 5, 2015].
    26. Wikipedia, (2015e), Inverted Index. Available from: http://en.wikipedia.org/wiki/Inverted_index [May 8, 2015].
    27. Wikipedia, (2015d), Obfuscation. Available from: http://en.wikipedia.org/wiki/Obfuscation [May 8, 2015]
    28. Wikipedia, (2015d), Perceptron Available from: http://en.wikipedia.org/wiki/Perceptron [May 8, 2015]
    29. Wikipedia, (2015e), Real valued function. Available from: http://en.wikipedia.org/wiki/Real-valued_function [May 5, 2015].
    30. Wikipedia, (2015e), Winnow algorithm. Available from: http://en.wikipedia.org/wiki/Winnow_(algorithm) [May 5, 2015].

    Keywords for index: Design plagiarism software, plagiarism detection software, creation of plagiarism scanner, plagiarism detection problem.

    This essay was donated by a student on 3.7.2015 in exchange for a free plagiarism scan.