Yap Peak Fei, Prof. Ts. Dr. Ting Choo Yee, Hairul Azhar Abdul Rashid
Description of Invention
The Industry Cadetship programme assigns penultimate-year students to the companies that best suit their profiles, bridging the gap between academic learning and practical skills. However, manual analysis of educational data for these assignments is time-consuming. This study therefore aimed to (i) propose an algorithm for student-company assignment based on student and company profiles, (ii) propose a method for assigning a supervisor to a company, and (iii) use similarity measures to recommend companies with similar characteristics. Data were collected from a university's student, company, and lecturer datasets. To assign students to companies, Haversine, OpenStreetMap, and NetworkX were used to calculate the shortest distance between students and companies, with results evaluated using the mean, variance, standard deviation, and utilization rate. For lecturer assignment, a comparative analysis was carried out on models with and without embeddings. The non-embedding analysis employed Count Vectorizer and Label Encoder alongside similarity measures (Cosine Similarity, Jaccard Similarity, Euclidean Distance, Manhattan Distance, and Hamming Distance) to compute similarity scores between lecturer expertise and company descriptions. Embeddings from Voyage AI, BERT, RoBERTa, and GloVe were used with Cosine Similarity to assess the alignment between domain descriptions and company or lecturer information. A comparison of embeddings from 100-word and 50-word descriptions showed that longer descriptions improved performance, with accuracy increasing from 0.6154 to 0.7692, precision from 0.5744 to 0.7751, and the F1-score from 0.5807 to 0.7484. Lecturers were assigned to companies based on the highest similarity scores, and the assignments were evaluated using accuracy, precision, recall, and F1-score. The results showed that embedding techniques significantly enhanced the matching process, with accuracy rising from 0.63 to 0.86, recall from 0.62 to 0.85, and the F1-score from 0.67 to 0.88.
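The non-embedding lecturer matching described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a plain word-count representation built with the Python standard library as a stand-in for Count Vectorizer, shows only two of the listed measures (Cosine and Jaccard Similarity), and uses hypothetical lecturer and company descriptions. The function names are likewise illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word tokens (stand-in for a count vectorizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description matches nothing
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Jaccard similarity between the vocabularies of two descriptions."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def assign_lecturers(lecturers, companies):
    """Assign to each company the lecturer with the highest cosine similarity."""
    lecturer_vecs = {name: bag_of_words(text) for name, text in lecturers.items()}
    assignments = {}
    for company, description in companies.items():
        company_vec = bag_of_words(description)
        assignments[company] = max(
            lecturer_vecs,
            key=lambda name: cosine_similarity(lecturer_vecs[name], company_vec),
        )
    return assignments


# Hypothetical expertise and company descriptions, for illustration only.
lecturers = {
    "Lecturer A": "machine learning data analytics recommender systems",
    "Lecturer B": "optical fibre photonics telecommunications networks",
}
companies = {
    "Company X": "data analytics and machine learning consultancy",
    "Company Y": "telecommunications provider building fibre networks",
}

print(assign_lecturers(lecturers, companies))
```

In an embedding-based variant, `bag_of_words` would be replaced by dense vectors from an embedding model, with the cosine step adapted to dense arrays; the highest-score assignment rule would stay the same.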
For lecturer-company distance analysis, the OSMnx method using the bounding box approach demonstrated the best performance, with a mean absolute error (MAE) of 0.62 km and a root mean square error (RMSE) of 0.89 km, highlighting its effectiveness in accurately estimating distances.
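As a rough illustration of how such distance estimates might be scored, the sketch below computes MAE and RMSE for a set of estimated distances against reference distances. To stay self-contained it uses the Haversine great-circle formula as the estimator rather than OSMnx road-network routing, so it does not reproduce the bounding box approach. The Earth radius constant, the coordinates, and the reference distances are all assumed values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mae(estimated, reference):
    """Mean absolute error between estimated and reference distances."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def rmse(estimated, reference):
    """Root mean square error between estimated and reference distances."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(reference))


# Hypothetical (lecturer, company) coordinate pairs and reference distances in km.
pairs = [
    ((2.9278, 101.6419), (3.1390, 101.6869)),
    ((2.9278, 101.6419), (3.0738, 101.5183)),
]
reference_km = [26.0, 22.5]  # made-up ground-truth values for illustration

estimated_km = [haversine_km(*src, *dst) for src, dst in pairs]
print("MAE (km):", mae(estimated_km, reference_km))
print("RMSE (km):", rmse(estimated_km, reference_km))
```

RMSE penalises large individual errors more heavily than MAE, which is why reporting both gives a fuller picture of an estimator's behaviour.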