MicroTCA Logbook
Message ID: 95     Entry time: 08 Apr 2014, 15:00
Author: Frank Babies 
Type: Hardware Changes 
Category: exflutcadev 
Subject: analysing sda hdd exflutcadev 

Kernel messages from today:

Apr  8 11:01:13 frank-pczW0712 kernel: [86807.520045] EXT4-fs (sdc1): error count: 13
Apr  8 11:01:13 frank-pczW0712 kernel: [86807.520052] EXT4-fs (sdc1): initial error at 1396349073: ext4_reserve_inode_write:4483
Apr  8 11:01:13 frank-pczW0712 kernel: [86807.520057] EXT4-fs (sdc1): last error at 1396353000: ext4_remount:4576
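The "initial error at" and "last error at" values in the ext4 messages look like Unix epoch seconds. A minimal sketch for converting them to readable UTC times, assuming that interpretation (the helper name epoch_to_utc is made up for illustration):

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> str:
    """Format a Unix timestamp (seconds since 1970-01-01) as a UTC string."""
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")


# Values copied from the EXT4-fs kernel messages above.
initial_error = epoch_to_utc(1396349073)
last_error = epoch_to_utc(1396353000)

print("initial error:", initial_error)  # 2014-04-01 10:44:33 UTC
print("last error:   ", last_error)     # 2014-04-01 11:50:00 UTC
```

Under that assumption, both recorded ext4 errors date from 1 April 2014, about a week before this entry, and the kernel is only repeating the stored error count today.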

 

smartmontools shows a large number of read and ECC errors (see the raw values of Raw_Read_Error_Rate and Hardware_ECC_Recovered below), but the one-hour extended self-test completed without error.
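The large raw values may not be plain error counts. A widely circulated but vendor-unconfirmed reading for some Seagate drives treats the 48-bit raw value of Raw_Read_Error_Rate and Seek_Error_Rate as two packed fields: the upper 16 bits as the error count and the lower 32 bits as the number of operations. Whether this disk follows that convention is an assumption; the output below does not state the vendor. A sketch of the split, using raw values from the attribute table in this entry:

```python
def split_raw48(raw: int) -> tuple[int, int]:
    """Split a 48-bit SMART raw value into (upper 16 bits, lower 32 bits).

    ASSUMPTION: the packed layout is a community interpretation for some
    Seagate drives, not something confirmed by this log or by the vendor.
    """
    high16 = (raw >> 32) & 0xFFFF
    low32 = raw & 0xFFFFFFFF
    return high16, low32


# Raw values copied from the smartctl attribute table in this entry.
read_errors, read_ops = split_raw48(156219000)
seek_errors, seek_ops = split_raw48(34453485116)

print("Raw_Read_Error_Rate:", read_errors, "errors /", read_ops, "operations")
print("Seek_Error_Rate:    ", seek_errors, "errors /", seek_ops, "operations")
```

If the assumption holds, the read error field is 0 and the seek error field is 8, which would fit the clean self-test result. If it does not hold, the raw numbers have to be read some other way, and the normalized VALUE/THRESH columns remain the safer indicator.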

root@frank-pczW0712:~# smartctl -a /dev/sdc

.......

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   074   063   044    Pre-fail  Always       -       156219000
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       62
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   070   060   030    Pre-fail  Always       -       34453485116
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9002
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       62
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   064   045    Old_age   Always       -       33 (Min/Max 25/34)
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   021   016   000    Old_age   Always       -       156219000
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
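smartctl treats an attribute as failed when its normalized VALUE is at or below THRESH, regardless of how large RAW_VALUE looks. A minimal sketch that parses a few rows copied from the table above and reports the margin for each; the parsing approach and the names parse_row and SAMPLE are illustrative, not part of smartmontools:

```python
# A few rows copied verbatim from the attribute table in this entry.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   074   063   044    Pre-fail  Always       -       156219000
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   070   060   030    Pre-fail  Always       -       34453485116
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   067   064   045    Old_age   Always       -       33 (Min/Max 25/34)
195 Hardware_ECC_Recovered  0x001a   021   016   000    Old_age   Always       -       156219000
"""


def parse_row(line: str) -> dict:
    """Split one attribute row into its ten columns (RAW_VALUE may contain spaces)."""
    fields = line.split(None, 9)
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "raw": fields[9].strip(),
    }


rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]

# Margin between the normalized value and its failure threshold.
margins = {row["name"]: row["value"] - row["thresh"] for row in rows}

# An attribute counts as failed when VALUE <= THRESH.
failed = [row["name"] for row in rows if row["value"] <= row["thresh"]]

for name, margin in margins.items():
    print(f"{name:<24} margin {margin:>3}")
print("failed attributes:", failed)
```

For the sampled rows no attribute is at or below its threshold, which matches the empty WHEN_FAILED column.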

 

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
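The 49.710-day wrap is what a 32-bit millisecond counter would give; that the counter is 32 bits wide is inferred from this figure, not stated in the output. A short sketch of the arithmetic, plus a hypothetical helper that converts the DDd+hh:mm:SS.sss format into milliseconds:

```python
import re

MS_PER_DAY = 24 * 60 * 60 * 1000

# ASSUMPTION: the timer is a 32-bit millisecond counter.
wrap_days = 2**32 / MS_PER_DAY
print(f"wrap after {wrap_days:.3f} days")


def powered_up_ms(text: str) -> int:
    """Convert a Powered_Up_Time string such as '20d+08:56:43.218' to milliseconds."""
    match = re.fullmatch(r"(\d+)d\+(\d{2}):(\d{2}):(\d{2})\.(\d{3})", text)
    if match is None:
        raise ValueError(f"unexpected Powered_Up_Time format: {text!r}")
    days, hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis


# Timestamp of the last READ DMA EXT before Error 2 in this entry.
last_command_ms = powered_up_ms("20d+08:56:43.218")
print(last_command_ms, "ms since power on")
```

Because of the wrap, these timestamps only order commands within one power cycle and within a 49.7-day window; the "power-on lifetime" hours are the value to use for dating an error.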

 

Error 2 occurred at disk power-on lifetime: 6383 hours (265 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 d7 b0 a4 00  Error: UNC at LBA = 0x00a4b0d7 = 10793175

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 d0 f8 80 b0 a4 e0 00  20d+08:56:43.218  READ DMA EXT
  25 d0 f8 88 af a4 e0 00  20d+08:56:43.210  READ DMA EXT
  25 d0 10 78 b3 a4 e0 00  20d+08:56:43.195  READ DMA EXT
  25 d0 f8 80 b2 a4 e0 00  20d+08:56:43.136  READ DMA EXT
  25 d0 f8 88 b1 a4 e0 00  20d+08:56:43.010  READ DMA EXT

 

Error 1 occurred at disk power-on lifetime: 6383 hours (265 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 d2 b0 a4 00  Error: UNC at LBA = 0x00a4b0d2 = 10793170

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 d0 f8 80 e8 95 e0 00  20d+08:56:31.300  READ DMA EXT
  25 d0 f8 88 e7 95 e0 00  20d+08:56:31.267  READ DMA EXT
  25 d0 10 78 df c0 e0 00  20d+08:56:27.840  READ DMA EXT
  25 d0 f8 80 de c0 e0 00  20d+08:56:27.831  READ DMA EXT
  25 d0 f8 80 e8 b0 e0 00  20d+08:56:23.092  READ DMA EXT
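The LBA that smartctl prints for each UNC error can be reproduced from the SN, CL and CH register bytes shown in the two error records. A small sketch of that arithmetic (the function name is illustrative):

```python
def lba_from_registers(sn: int, cl: int, ch: int) -> int:
    """Combine Sector Number (low byte), Cylinder Low and Cylinder High into an LBA."""
    return (ch << 16) | (cl << 8) | sn


# Register bytes copied from the "After command completion" lines above.
error2_lba = lba_from_registers(sn=0xD7, cl=0xB0, ch=0xA4)
error1_lba = lba_from_registers(sn=0xD2, cl=0xB0, ch=0xA4)

print(hex(error2_lba), error2_lba)  # 0xa4b0d7 10793175
print(hex(error1_lba), error1_lba)  # 0xa4b0d2 10793170
print("distance in sectors:", error2_lba - error1_lba)  # 5
```

The two uncorrectable reads are only five sectors apart and were logged within about 12 seconds of each other, so they most likely point at one small damaged region rather than at two unrelated faults.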

 

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  extended offline    completed without error       00%      9002         -
# 2  Short offline       Completed without error       00%      9001         -
# 3  Short offline       Interrupted (host reset)      00%      8834         -
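Comparing the power-on lifetime of the two logged ATA errors (6383 hours) with the lifetime of the extended self-test (9002 hours) shows how old the errors are. A quick sketch of the arithmetic, using only numbers from this entry:

```python
ERROR_LIFETIME_HOURS = 6383      # "Error 1/2 occurred at disk power-on lifetime"
SELF_TEST_LIFETIME_HOURS = 9002  # extended offline self-test, also Power_On_Hours

# Reproduce smartctl's "265 days + 23 hours" breakdown.
error_days, error_hours = divmod(ERROR_LIFETIME_HOURS, 24)

# Power-on time that passed between the errors and the clean self-test.
elapsed_hours = SELF_TEST_LIFETIME_HOURS - ERROR_LIFETIME_HOURS
elapsed_days, elapsed_rest = divmod(elapsed_hours, 24)

print(f"errors at {error_days} days + {error_hours} hours")
print(f"self-test ran {elapsed_hours} power-on hours later "
      f"({elapsed_days} days + {elapsed_rest} hours)")
```

Both UNC errors were recorded roughly 2619 power-on hours before the self-test, so the clean extended test does not contradict the error log; it only shows that the surface read back without error at the time of the test.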

 

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

 
